Editing User Preferences
The aPhyloGeo software lets users customize their preferences through a YAML configuration file, which gathers all run parameters in one place. Below is an example configuration file, followed by an explanation of each parameter:
Configuration File Example
file_name: './datasets/example/geo.csv'
specimen: 'id'
dist_threshold: 60
window_size: 200
step_size: 100
bootstrap_threshold: 100
reference_gene_dir: './datasets/example'
reference_gene_file: 'sequences.fasta'
makeDebugFiles: True
alignment_method: '1' # Options: 1: pairwiseAligner, 2: MUSCLE, 3: CLUSTALW, 4: MAFFT
distance_method: '1' # Options: 1: Least-Square distance, 2: Robinson-Foulds distance, 3: Euclidean distance (DendroPy)
fit_method: '1' # Options: 1: Wider Fit by elongating with Gap (starAlignment), 2: Narrow-fit prevent elongation with gap when possible
tree_type: '1' # Options: 1: BioPython consensus tree, 2: FastTree application
rate_similarity: 90
method_similarity: '1' # Options: 1: Hamming distance, 2: Levenshtein distance, 3: Damerau-Levenshtein distance, 4: Jaro similarity, 5: Jaro-Winkler similarity, 6: Smith–Waterman similarity, 7: Jaccard similarity, 8: Sørensen-Dice similarity
preprocessing_genetic: 1 # Enable genetic preprocessing (1 = yes, 0 = no)
preprocessing_climatic: 1 # Enable climatic preprocessing (1 = yes, 0 = no)
preprocessing_threshold_genetic: 0.2 # Proportion of gaps allowed per column in genetic alignments
preprocessing_threshold_climatic: 0.7 # Variance threshold for filtering climatic features
permutations_mantel_test: 999 # Number of permutations for significance testing
permutations_protest: 999 # Number of permutations for the PROTEST analysis
mantel_test_method: "pearson" # Correlation method ('pearson', 'spearman', 'kendalltau')
statistical_test: '0' # Options: 0: Both tests, 1: Mantel test, 2: Procrustes analysis
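As a rough illustration of how the coded options in this file could be read and checked, here is a minimal Python sketch. It is not aPhyloGeo's own loader: the names `load_preferences`, `describe_options`, and `OPTION_TABLES` are hypothetical, the option tables are simply copied from the comments in the example above, and reading the file assumes the third-party PyYAML package is installed.

```python
# Hypothetical helpers for illustration only; not part of aPhyloGeo's API.
# The option tables are copied from the comments in the example configuration.
OPTION_TABLES = {
    "alignment_method": {
        "1": "pairwiseAligner", "2": "MUSCLE", "3": "CLUSTALW", "4": "MAFFT",
    },
    "distance_method": {
        "1": "Least-Square distance",
        "2": "Robinson-Foulds distance",
        "3": "Euclidean distance",
    },
    "tree_type": {"1": "BioPython consensus tree", "2": "FastTree application"},
    "statistical_test": {
        "0": "Mantel test and Procrustes analysis",
        "1": "Mantel test",
        "2": "Procrustes analysis",
    },
}


def load_preferences(path):
    """Read a YAML preferences file into a dict (requires PyYAML)."""
    import yaml  # imported lazily so describe_options works without PyYAML

    with open(path, encoding="utf-8") as handle:
        return yaml.safe_load(handle)


def describe_options(params):
    """Translate coded options such as alignment_method: '1' into names."""
    described = {}
    for key, table in OPTION_TABLES.items():
        if key not in params:
            continue
        code = str(params[key])  # accept '1' as well as a bare 1
        if code not in table:
            raise ValueError(
                f"{key}: unknown option {code!r}, expected one of {sorted(table)}"
            )
        described[key] = table[code]
    return described


if __name__ == "__main__":
    example = {"alignment_method": "1", "tree_type": "2", "statistical_test": "0"}
    for key, meaning in describe_options(example).items():
        print(f"{key}: {meaning}")
```

Checking the codes up front means a mistyped value such as `alignment_method: '7'` is reported before any analysis starts, rather than failing partway through a run.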
User Preferences Options
File Name: Path to the input data file (`./datasets/example/geo.csv` in the example).
Specimen: Identifier for the specimens in the dataset (`id` in the example).
Bootstrap Threshold: Threshold on the number of bootstrap replicates generated for each sub-MSA (`100` in the example).
Distance Threshold: Distance threshold between the genetic tree and the climatic tree for each sub-MSA (`60` in the example).
Window Size: Size of the sliding window (`200` in the example).
Step Size: Step by which the sliding window advances (`100` in the example).
Data Names: List of Newick file names for each dataset.
Reference Gene Directory: Directory containing reference gene data (`./datasets/example` in the example).
Reference Gene File: File containing reference gene sequences (`sequences.fasta` in the example).
Make Debug Files: Option to generate debug files (`True` or `False`).
Alignment Method: Algorithm used for sequence alignment (`'1'` in the example). To use the MUSCLE, CLUSTALW, or MAFFT alignment method, please make sure to follow the installation instructions provided in the Alignment Dependencies Installation section.
Distance Method: Distance measure selection (`'1'` in the example).
Fit Method: Gap elongation strategy (`'1'` in the example).
Tree Inference Method / Tree Type: The choice of tree inference method (`'1'` in the example).
Rate Similarity: Similarity rate between sequences, used to remove sub-MSAs with a high similarity value (`90` in the example).
Method Similarity: The choice of similarity method (`'1'` in the example).
Genetic Preprocessing: Enable or disable filtering of alignment columns with gaps (`preprocessing_genetic: 1` to enable, `0` to disable).
Climatic Preprocessing: Enable or disable variance-based filtering of climatic features (`preprocessing_climatic: 1` to enable, `0` to disable).
Genetic Preprocessing Threshold: Maximum allowed proportion of gaps per column in the alignment (`preprocessing_threshold_genetic: 0.2` in the example).
Climatic Preprocessing Threshold: Minimum variance threshold to retain a climatic feature (`preprocessing_threshold_climatic: 0.7` in the example).
Mantel Test Permutations: Number of permutations for the Mantel test (`permutations_mantel_test: 999` in the example).
PROTEST Permutations: Number of random permutations for the PROTEST (Procrustes randomization test) to assess statistical significance (`permutations_protest: 999` in the example).
Mantel Test Method: Correlation method for the Mantel test (`"pearson"` in the example).
Statistical Test: Select which statistical test(s) to perform for global correlation between climatic and genetic matrices (`statistical_test: '0'` in the example). Options:
0: Run both Mantel and Procrustes/PROTEST
1: Run only the Mantel test
2: Run only Procrustes + PROTEST
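To make the Window Size, Step Size, and Genetic Preprocessing Threshold options above more concrete, the following Python sketch shows one way such values could be applied to an alignment. It is an illustration under stated assumptions, not aPhyloGeo's implementation: the function names `sliding_windows` and `drop_gappy_columns` are hypothetical, only full windows are produced (whether the software keeps a trailing partial window is not stated here), and the gap threshold is treated as inclusive.

```python
# Hypothetical helpers for illustration only; not part of aPhyloGeo's API.

def sliding_windows(alignment_length, window_size, step_size):
    """Return (start, end) column ranges, end exclusive, for each full window.

    Assumption: a trailing window shorter than window_size is dropped.
    """
    if window_size <= 0 or step_size <= 0:
        raise ValueError("window_size and step_size must be positive")
    last_start = alignment_length - window_size
    return [
        (start, start + window_size)
        for start in range(0, last_start + 1, step_size)
    ]


def drop_gappy_columns(sequences, max_gap_fraction):
    """Keep only alignment columns whose share of '-' is <= max_gap_fraction.

    `sequences` maps a specimen id to its aligned sequence; all sequences
    must have the same length. Assumption: the threshold is inclusive.
    """
    names = list(sequences)
    rows = [sequences[name] for name in names]
    if len({len(row) for row in rows}) > 1:
        raise ValueError("aligned sequences must all have the same length")
    kept = [
        column
        for column in zip(*rows)
        if column.count("-") / len(column) <= max_gap_fraction
    ]
    return {
        name: "".join(column[i] for column in kept)
        for i, name in enumerate(names)
    }


if __name__ == "__main__":
    # window_size: 200 and step_size: 100 on a 500-column alignment
    print(sliding_windows(500, 200, 100))
    # → [(0, 200), (100, 300), (200, 400), (300, 500)]

    toy = {"s1": "AC-T", "s2": "A--T", "s3": "ACGT", "s4": "ACGT", "s5": "ACGT"}
    print(drop_gappy_columns(toy, 0.2))
```

With a step size smaller than the window size, consecutive windows overlap, so each alignment column can contribute to more than one sub-MSA.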