Editing User Preferences

The aPhyloGeo software lets users customize its behavior through a YAML configuration file that groups all analysis parameters in one place. Below is an example configuration file, followed by an explanation of each parameter:

Configuration File Example

file_name: './datasets/example/geo.csv'
specimen: 'id'
dist_threshold: 60
window_size: 200
step_size: 100
bootstrap_threshold: 100
reference_gene_dir: './datasets/example'
reference_gene_file: 'sequences.fasta'
makeDebugFiles: True
alignment_method: '1' # Options: 1: pairwiseAligner, 2: MUSCLE, 3: CLUSTALW, 4: MAFFT
distance_method: '1' # Options: 1: Least-Square distance, 2: Robinson-Foulds distance, 3: Euclidean distance (DendroPy)
fit_method: '1' # Options: 1: Wider Fit by elongating with Gap (starAlignment), 2: Narrow-fit prevent elongation with gap when possible
tree_type: '1' # Options: 1: BioPython consensus tree, 2: FastTree application
rate_similarity: 90
method_similarity: '1' # Options: 1: Hamming distance, 2: Levenshtein distance, 3: Damerau-Levenshtein distance, 4: Jaro similarity, 5: Jaro-Winkler similarity, 6: Smith–Waterman similarity, 7: Jaccard similarity, 8: Sørensen-Dice similarity
preprocessing_genetic: 1                # Enable genetic preprocessing (1 = yes, 0 = no)
preprocessing_climatic: 1               # Enable climatic preprocessing (1 = yes, 0 = no)
preprocessing_threshold_genetic: 0.2    # Proportion of gaps allowed per column in genetic alignments
preprocessing_threshold_climatic: 0.7   # Variance threshold for filtering climatic features
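
Once the file has been parsed into a Python dictionary (for example with a YAML library), the option codes can be sanity-checked before running an analysis. The following is a minimal sketch, not aPhyloGeo's own validation: the helper `check_preferences` and the table `ALLOWED_CODES` are hypothetical names, the allowed codes are copied from the comments in the example above, and the range check on the genetic threshold assumes that a "proportion" lies between 0 and 1.

```python
# Hypothetical validation helper; the key names come from the example file,
# but the checks themselves are assumptions, not aPhyloGeo's own logic.

# Allowed option codes, copied from the comments in the example file.
ALLOWED_CODES = {
    "alignment_method": {"1", "2", "3", "4"},
    "distance_method": {"1", "2", "3"},
    "fit_method": {"1", "2"},
    "tree_type": {"1", "2"},
    "method_similarity": {"1", "2", "3", "4", "5", "6", "7", "8"},
}


def check_preferences(prefs):
    """Return a list of human-readable problems found in a parsed config."""
    problems = []

    # Option codes are compared as strings, so both '1' and 1 are accepted.
    for key, allowed in ALLOWED_CODES.items():
        value = prefs.get(key)
        if str(value) not in allowed:
            problems.append(f"{key}: expected one of {sorted(allowed)}, got {value!r}")

    # The preprocessing switches are documented as 1 (yes) or 0 (no).
    for key in ("preprocessing_genetic", "preprocessing_climatic"):
        if prefs.get(key) not in (0, 1):
            problems.append(f"{key}: expected 0 or 1, got {prefs.get(key)!r}")

    # Assumption: a proportion of gaps must lie in the interval [0, 1].
    gap_limit = prefs.get("preprocessing_threshold_genetic")
    if not isinstance(gap_limit, (int, float)) or not 0 <= gap_limit <= 1:
        problems.append(f"preprocessing_threshold_genetic: expected 0..1, got {gap_limit!r}")

    return problems


# The values below mirror the example configuration file.
example = {
    "alignment_method": "1",
    "distance_method": "1",
    "fit_method": "1",
    "tree_type": "1",
    "method_similarity": "1",
    "preprocessing_genetic": 1,
    "preprocessing_climatic": 1,
    "preprocessing_threshold_genetic": 0.2,
}
print(check_preferences(example))
```

Returning a list of messages rather than raising on the first problem makes it possible to report every issue in the file at once.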

User Preferences Options

  • File Name: Path to the input data file (`./datasets/example/geo.csv` in the example).

  • Specimen: Identifier for the specimens in the dataset (`id` in the example).

  • Bootstrap Threshold: Threshold on the number of bootstrap replicates generated for each sub-MSA (`100` in the example).

  • Distance Threshold: Threshold on the distance between the genetic tree and the climatic tree for each sub-MSA (`60` in the example).

  • Window Size: Size of the sliding window.

  • Step Size: Distance by which the sliding window advances at each step (`100` in the example).

  • Data Names: List of Newick file names for each dataset (not set in the example above).

  • Reference Gene Directory: Directory containing reference gene data (`./datasets/example` in the example).

  • Reference Gene File: File containing reference gene sequences (`sequences.fasta` in the example).

  • Make Debug Files: Option to generate debug files (True or False).

  • Alignment Method: Algorithm used for sequence alignment. Options: `1: pairwiseAligner, 2: MUSCLE, 3: CLUSTALW, 4: MAFFT` (`'1'` in the example).

  • Distance Method: Distance measure to use. Options: `1: Least-Square distance, 2: Robinson-Foulds distance, 3: Euclidean distance (DendroPy)` (`'1'` in the example).

  • Fit Method: Whether sequences may be elongated with gaps when fitting the alignment. Options: `1: Wider Fit by elongating with Gap (starAlignment), 2: Narrow-fit prevent elongation with gap when possible` (`'1'` in the example).

  • Tree Inference Method: Method used to infer trees, set with the `tree_type` key. Options: `1: BioPython consensus tree, 2: FastTree application` (`'1'` in the example).

  • Rate Similarity: Similarity threshold between sequences, used to remove sub-MSAs whose similarity is too high (`90` in the example).

  • Method Similarity: Method used to measure similarity. Options: `1: Hamming distance, 2: Levenshtein distance, 3: Damerau-Levenshtein distance, 4: Jaro similarity, 5: Jaro-Winkler similarity, 6: Smith–Waterman similarity, 7: Jaccard similarity, 8: Sørensen-Dice similarity` (`'1'` in the example).

  • Genetic Preprocessing: Enable or disable filtering of alignment columns with gaps (`preprocessing_genetic: 1` to enable, `0` to disable).

  • Climatic Preprocessing: Enable or disable variance-based filtering of climatic features (`preprocessing_climatic: 1` to enable, `0` to disable).

  • Genetic Preprocessing Threshold: Maximum allowed proportion of gaps per column in the alignment (`preprocessing_threshold_genetic: 0.2` in the example).

  • Climatic Preprocessing Threshold: Minimum variance threshold to retain a climatic feature (`preprocessing_threshold_climatic: 0.7` in the example).
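
To make the sliding-window and gap-threshold options more concrete, here is a small illustrative sketch. It is not aPhyloGeo's implementation: the function names `window_ranges` and `drop_gappy_columns` are hypothetical, the sketch assumes that only full windows are kept (a trailing partial window is dropped), and it assumes that gaps are written as `-`.

```python
def window_ranges(alignment_length, window_size, step_size):
    """Return (start, end) column ranges for full windows; end is exclusive."""
    last_start = alignment_length - window_size
    # Assumption: windows that would run past the end of the alignment are skipped.
    return [(start, start + window_size)
            for start in range(0, last_start + 1, step_size)]


def drop_gappy_columns(sequences, max_gap_proportion):
    """Keep only the columns whose proportion of gaps is at most the threshold."""
    rows = list(sequences.values())
    kept_columns = [
        col for col in range(len(rows[0]))
        if sum(row[col] == "-" for row in rows) / len(rows) <= max_gap_proportion
    ]
    return {name: "".join(seq[col] for col in kept_columns)
            for name, seq in sequences.items()}


# window_size: 200 and step_size: 100 on a 500-column alignment.
print(window_ranges(500, 200, 100))
# [(0, 200), (100, 300), (200, 400), (300, 500)]

# preprocessing_threshold_genetic: 0.2 on a toy three-sequence alignment.
toy_alignment = {"s1": "AC-T", "s2": "A--T", "s3": "ACGT"}
print(drop_gappy_columns(toy_alignment, 0.2))
# {'s1': 'AT', 's2': 'AT', 's3': 'AT'}
```

In the toy alignment, the second and third columns contain gaps in one third and two thirds of the sequences respectively, so both exceed the 0.2 limit and are removed.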

To use the MUSCLE, CLUSTALW, or MAFFT alignment methods, follow the installation instructions provided in the Alignment Dependencies Installation section.
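
Before selecting one of these methods, it can help to confirm that the corresponding program is reachable from the command line. The sketch below uses Python's standard `shutil.which` for that check. The executable names in `ALIGNER_EXECUTABLES` are assumptions (they vary by platform and installation), the helper `missing_aligner` is a hypothetical name, and the sketch assumes that method `1` (pairwiseAligner) needs no external program.

```python
import shutil

# Assumed executable names; adjust them to match the local installation.
ALIGNER_EXECUTABLES = {"2": "muscle", "3": "clustalw", "4": "mafft"}


def missing_aligner(alignment_method):
    """Return the executable name if it is needed but not on PATH, else None."""
    executable = ALIGNER_EXECUTABLES.get(str(alignment_method))
    if executable is not None and shutil.which(executable) is None:
        return executable
    return None


missing = missing_aligner("4")
if missing is not None:
    print(f"'{missing}' was not found on PATH; see the installation instructions.")
```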