######################################################################
 
README
 
######################################################################

=====================================
Usage Description
=====================================
The alignment pipeline aligns the other 14 species to the cucumber reference genome using LASTZ and MULTIZ.
It runs in three steps: (1) dataset preparation; (2) pair-wise alignment using LASTZ; (3) merging of the pair-wise alignments into a 15-way multiple alignment using MULTIZ.

 
=====================================
Directory Contents
=====================================
 
This directory includes the README, an input directory, an output directory and a scripts directory.
 
 
The directory is laid out as follows:
 
	alignment.sh file
	scripts directory
		run.sh
		pl_scripts
			pslSplitOnTarget_myself.pl
			remove_nonbase.pl
			scaffold_length_filter.pl
	Input directory
		genome_fasta directory
			species[1~15]_scaffold.fa files
		GAP_matrix_plant
	Output directory
		pair_wise_maf directory
			Cucumber_spec.maf files
		15way.multiz
	README file
 
 
=====================================
run.sh file
=====================================
##############################################################################################
# Step1 prepare_dataset
# Usage: Filter out the scaffolds whose length is less than length_cutoff
#      Dependency_tools: scaffold_length_filter.pl, remove_nonbase.pl
#      Input: scaffold or chromosome fasta files of each species
#      Output: species_scaffold_good.fa
# Step2 Pair-wise alignment using the LASTZ tool with the cucumber genome as reference
#       This script was written with reference to an online resource
#       From 2.1 to 2.8 step by step
# Usage: Align each of the other 14 species genomes to the reference genome
#      Dependency_tools: lastz-1.02.00, ucsc_tool, pslSplitOnTarget_myself.pl, GAP_matrix_plant
#      Input: target_genome: Cucumber_scaffold_good.fa; query_genome: each of the other 14 genomes
#      Output: maf_files [Multiple Alignment Format]
# Step3 Multiple alignment using the MULTIZ tool to merge the 14 pair-wise alignments above
#       From 3.1 to 3.14 step by step
# Usage: Species are indexed by d, where d increases with the divergence from
#       cucumber according to the phylogenetic tree in Figure 1
#      Dependency_tools: multiz-tba.012109
#      Input: the 14 pair-wise maf files [named by species_Cucumber.maf]
#      Output: n-way maf_files [n=3~15]
##############################################################################################
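
As an illustration of the Step3 merge order, the sketch below prints (without running) one multiz command per merge: it starts from the two pair-wise alignments closest to cucumber and adds one species at a time until the n-way alignment is complete. It is a minimal sketch and not the code in run.sh. The function name plan_progressive_merge, the Cucumber_spN.maf inputs and the Nway.maf output names are hypothetical, and the positional form "multiz file1 file2 1" is an assumption that should be checked against the usage message of the installed multiz-tba version.

```shell
# Print (do not run) a progressive merge plan for pair-wise MAF files.
# Arguments: pair-wise MAF files, ordered by increasing divergence
# from the reference. Output names such as 3way.maf are hypothetical.
plan_progressive_merge() {
    if [ "$#" -lt 2 ]; then
        echo "need at least two pair-wise MAF files" >&2
        return 1
    fi
    current="$1"
    shift
    n=2   # species covered so far: the reference plus one query
    for next_maf in "$@"; do
        n=$((n + 1))
        out="${n}way.maf"
        # Assumed call form: multiz file1 file2 v, result on stdout.
        echo "multiz $current $next_maf 1 > $out"
        current="$out"
    done
}
```

Called with the 14 pair-wise files in order of divergence, this prints 13 commands, the last of which writes 15way.maf.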


=====================================
scripts directory
=====================================
This directory provides the three Perl scripts that alignment.sh depends on.
(1) pslSplitOnTarget_myself.pl is used to rewrite the format of all psl files in step 2.3 of alignment.sh.
(2) remove_nonbase.pl is used to remove non-base characters from the scaffolds.
(3) scaffold_length_filter.pl is used to remove scaffolds whose length is less than length_cutoff.
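
For readers without the Perl scripts at hand, the length filter of Step1 can be approximated in shell. The sketch below is a hypothetical stand-in, not the contents of scaffold_length_filter.pl: the function name filter_scaffolds_by_length is made up for illustration, and it assumes that a scaffold is kept when its total sequence length is at least the cutoff. Each kept sequence is written on a single line after its header.

```shell
# Hypothetical stand-in for the Step1 length filter.
# Usage: filter_scaffolds_by_length MIN_LEN input.fa > output.fa
# Keeps FASTA records whose total sequence length is >= MIN_LEN.
filter_scaffolds_by_length() {
    awk -v min="$1" '
        # Print the buffered record if it is long enough.
        function emit() {
            if (header != "" && length(seq) >= min) print header "\n" seq
        }
        /^>/ { emit(); header = $0; seq = ""; next }   # new record starts
        { seq = seq $0 }                               # join wrapped lines
        END { emit() }                                 # last record
    ' "$2"
}
```

For example, "filter_scaffolds_by_length 1000 species1_scaffold.fa > species1_scaffold_good.fa" would drop every scaffold shorter than 1000 bases; the cutoff value here is only an example.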

=====================================
Input directory
=====================================
This directory provides the genome datasets of the 15 species as fasta files, plus the GAP_matrix_plant file required in alignment.sh.
GAP_matrix_plant is used in the LASTZ step.

=====================================
Output directory
=====================================
This directory provides the main output files: the pair-wise alignments and the multiple alignment (15way.multiz).


pair_wise_maf directory
--------------------------------------------------------
Contains 14 {reference}_{species}.maf files, where reference=Cucumber.
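
Before the MULTIZ step it can help to confirm that all pair-wise alignments are present. The check below is a minimal sketch under stated assumptions: the function name check_pairwise_mafs is hypothetical, and it assumes the files follow the Cucumber_{species}.maf pattern described above and that 14 files are expected unless another count is given.

```shell
# Hypothetical sanity check for the pair_wise_maf directory.
# Usage: check_pairwise_mafs DIR [EXPECTED_COUNT]   (default: 14)
check_pairwise_mafs() {
    dir="$1"
    expected="${2:-14}"
    count=0
    for f in "$dir"/Cucumber_*.maf; do
        # An unmatched glob stays literal, so test that the file exists.
        if [ -e "$f" ]; then
            count=$((count + 1))
        fi
    done
    if [ "$count" -ne "$expected" ]; then
        echo "expected $expected pair-wise MAF files in $dir, found $count" >&2
        return 1
    fi
    echo "found $count pair-wise MAF files in $dir"
}
```

The function returns a non-zero status when the count does not match, so it can be used as a guard in front of the merge step.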

=====================================
README file
=====================================
It is this file.

 



