Editing User Preferences

The aPhyloGeo software lets users customize their preferences through a YAML configuration file, which gathers all run parameters in one place. Below is an example configuration file, followed by an explanation of each parameter:

Configuration File Example

file_name: './datasets/example/geo.csv'
specimen: 'id'
dist_threshold: 60
window_size: 200
step_size: 100
bootstrap_threshold: 100
reference_gene_dir: './datasets/example'
reference_gene_file: 'sequences.fasta'
makeDebugFiles: True
alignment_method: '1' # Options: 1: pairwiseAligner, 2: MUSCLE, 3: CLUSTALW, 4: MAFFT
distance_method: '1' # Options: 1: Least-Square distance, 2: Robinson-Foulds distance, 3: Euclidean distance (DendroPy)
fit_method: '1' # Options: 1: Wider fit, elongating with gaps (starAlignment), 2: Narrow fit, preventing gap elongation when possible
tree_type: '1' # Options: 1: BioPython consensus tree, 2: FastTree application
rate_similarity: 90
method_similarity: '1' # Options: 1: Hamming distance, 2: Levenshtein distance, 3: Damerau-Levenshtein distance, 4: Jaro similarity, 5: Jaro-Winkler similarity, 6: Smith–Waterman similarity, 7: Jaccard similarity, 8: Sørensen-Dice similarity
preprocessing_genetic: 1                # Enable genetic preprocessing (1 = yes, 0 = no)
preprocessing_climatic: 1               # Enable climatic preprocessing (1 = yes, 0 = no)
preprocessing_threshold_genetic: 0.2    # Proportion of gaps allowed per column in genetic alignments
preprocessing_threshold_climatic: 0.7   # Variance threshold for filtering climatic features
permutations_mantel_test: 999 # Number of permutations for significance testing
permutations_protest: 999 # Number of permutations for the PROTEST analysis
mantel_test_method: "pearson" # Correlation method ('pearson', 'spearman', 'kendalltau')
statistical_test: '0' # Options: 0: Both tests, 1: Mantel test, 2: Procrustes analysis
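
The option codes in the comments above can be read as small lookup tables. The following is a minimal sketch, not aPhyloGeo's own code: `OPTIONS` and `describe_options` are hypothetical names introduced here, and the dictionary passed in is assumed to come from whatever YAML parser loads the configuration file.

```python
# Option tables copied from the comments in the example configuration.
# The names OPTIONS and describe_options are illustrative assumptions,
# not part of the aPhyloGeo API.
OPTIONS = {
    "alignment_method": {
        "1": "pairwiseAligner",
        "2": "MUSCLE",
        "3": "CLUSTALW",
        "4": "MAFFT",
    },
    "distance_method": {
        "1": "Least-Square distance",
        "2": "Robinson-Foulds distance",
        "3": "Euclidean distance",
    },
    "tree_type": {
        "1": "BioPython consensus tree",
        "2": "FastTree application",
    },
    "statistical_test": {
        "0": "Both tests",
        "1": "Mantel test",
        "2": "Procrustes analysis",
    },
}


def describe_options(params):
    """Translate option codes into readable names, rejecting unknown codes."""
    described = {}
    for key, table in OPTIONS.items():
        code = str(params[key])  # codes are quoted strings in the example file
        if code not in table:
            raise ValueError(
                f"{key}: unknown option {code!r}, expected one of {sorted(table)}"
            )
        described[key] = table[code]
    return described


# Values taken from the example configuration above.
example = {
    "alignment_method": "1",
    "distance_method": "1",
    "tree_type": "1",
    "statistical_test": "0",
}
print(describe_options(example))
```

A check like this catches a mistyped code (for example `alignment_method: '5'`) before a long analysis starts.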

User Preferences Options

  • File Name: Path to the input data file (`./datasets/example/geo.csv` in the example).

  • Specimen: Identifier for the specimens in the dataset (`id` in the example).

  • Bootstrap Threshold: Threshold on the number of replicates generated for each sub-MSA (100 in the example).

  • Distance Threshold: Distance threshold between the genetic tree and the climatic tree for each sub-MSA (60 in the example).

  • Window Size: Size of the sliding window.

  • Step Size: Sliding window advancement step.

  • Data Names: List of newick file names for each dataset.

  • Reference Gene Directory: Directory containing reference gene data (`./datasets/example` in the example).

  • Reference Gene File: File containing reference gene sequences (`sequences.fasta` in the example).

  • Make Debug Files: Option to generate debug files (True or False).

  • Alignment Method: Algorithm used for sequence alignment ('1' in the example). To use the MUSCLE, CLUSTALW, or MAFFT alignment method, follow the installation instructions in the Alignment Dependencies Installation section.

  • Distance Method: Choice of distance measure ('1' in the example).

  • Fit Method: Choice of gap-elongation behavior ('1' in the example).

  • Tree Inference Method / Tree Type: Choice of tree inference method ('1' in the example).

  • Rate Similarity: Similarity rate between sequences, used to filter out sub-MSAs with a high similarity value (90 in the example).

  • Method Similarity: Choice of similarity method ('1' in the example).

  • Genetic Preprocessing: Enable or disable filtering of alignment columns with gaps (`preprocessing_genetic: 1` to enable, `0` to disable).

  • Climatic Preprocessing: Enable or disable variance-based filtering of climatic features (`preprocessing_climatic: 1` to enable, `0` to disable).

  • Genetic Preprocessing Threshold: Maximum allowed proportion of gaps per column in the alignment (preprocessing_threshold_genetic: 0.2 in the example).

  • Climatic Preprocessing Threshold: Minimum variance threshold to retain a climatic feature (preprocessing_threshold_climatic: 0.7 in the example).

  • Mantel Test Permutations: Number of permutations for the Mantel test (permutations_mantel_test: 999 in the example).

  • PROTEST Permutations: Number of random permutations for the PROTEST (Procrustes randomization test) to assess statistical significance (permutations_protest: 999 in the example).

  • Mantel Test Method: Correlation method for the Mantel test ("pearson" in the example).

  • Statistical Test: Select which statistical test(s) to perform for global correlation between climatic and genetic matrices (statistical_test: '0' in the example). Options: 0 runs both Mantel and Procrustes/PROTEST; 1 runs only the Mantel test; 2 runs only Procrustes + PROTEST.
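
To illustrate how the Window Size and Step Size options interact, here is a minimal sketch of sliding-window positions over an alignment. It is not taken from aPhyloGeo: the function name `sliding_windows`, the 0-based end-exclusive coordinates, and the choice to truncate the final window at the end of the alignment are all assumptions made for this example.

```python
def sliding_windows(alignment_length, window_size, step_size):
    """Return (start, end) pairs for each window; end is exclusive.

    Assumption: the last window is truncated at the alignment end
    rather than dropped. aPhyloGeo may handle the tail differently.
    """
    if window_size <= 0 or step_size <= 0:
        raise ValueError("window_size and step_size must be positive")
    windows = []
    start = 0
    while start < alignment_length:
        end = min(start + window_size, alignment_length)
        windows.append((start, end))
        if end == alignment_length:
            break  # the alignment is fully covered
        start += step_size
    return windows


# window_size and step_size from the example configuration;
# the alignment length of 500 is a made-up value.
print(sliding_windows(500, window_size=200, step_size=100))
# → [(0, 200), (100, 300), (200, 400), (300, 500)]
```

With a step size smaller than the window size, as in the example configuration, consecutive windows overlap; each window would correspond to one sub-MSA.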