Tutorial
Description
Cumacea (Crustacea: Peracarida) from the deep North Atlantic to the Arctic Ocean
The study area lies in the northern North Atlantic and includes the Icelandic Sea, the Denmark Strait, and the Norwegian Sea. The specimens examined were collected during the IceAGE project (Icelandic marine Animals: Genetics and Ecology; cruise M85/3 in 2011), which studied the deep continental slopes and abyssal waters around Iceland (Meißner et al., 2018). The included specimens were sampled from August 30 to September 22, 2011, at depths ranging from 316 to 2568 m. Details of the sampling plan, sample processing, DNA extraction, PCR amplification, sequencing, and the extracted and aligned DNA sequences are given in Uhlir et al. (2021). Refer to the example Cumacea FASTA file and example Cumacea CSV file for guidance.
Input
The algorithm takes two files as input with the following definitions:
🧬 Genetic file with FASTA extension. The first file or set of files contains the genetic sequences of the species selected for the study. The file name should identify the gene, so the following naming convention is strongly recommended: gene_name.fasta. The file should contain genetic variants (e.g., SNPs) and their associated metadata (e.g., sample IDs, location information).
⛅ Climatic file with CSV extension (Comma-Separated Values). The second file contains the habitat information for the species selected for the study. Each row corresponds to one species identifier and each column to one climatic condition. It should include relevant climatic variables (e.g., temperature, precipitation) for each geographic location represented in your genetic data, and its columns must be clearly labeled to match the expected format.
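As a rough illustration of the genetic input described above, the sketch below reads FASTA records with the standard library only and derives the gene name from the file name. The helper names (`gene_name_from_path`, `parse_fasta`) and the specimen identifiers are hypothetical and are not part of aPhyloGeo; the package has its own loading code.

```python
from pathlib import Path
from typing import Dict, Iterable, List


def gene_name_from_path(path: str) -> str:
    """Return the gene name encoded in a file name such as 'gene_name.fasta'."""
    return Path(path).stem


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each sequence identifier (first word of a '>' header) to its sequence."""
    records: Dict[str, List[str]] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            header = line[1:].split()
            current = header[0] if header else ""
            records[current] = []
        elif current is not None:
            records[current].append(line)  # sequences may span several lines
    return {name: "".join(parts) for name, parts in records.items()}


# Hypothetical two-record example; a real file would be read with open(path).
example = """>specimen_A
ACGTAC
GTTA
>specimen_B
ACGTACGTTG
""".splitlines()

sequences = parse_fasta(example)
print(gene_name_from_path("datasets/gene_name.fasta"), sorted(sequences))
```

Keeping the identifier as the first word of the header makes it easy to match each sequence against a row of the climatic file later on.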
Preparing Your Data
Include relevant climatic variables (e.g., temperature, precipitation) for each geographic location represented in your genetic data. Column headers must be clearly labeled to match the expected format. Refer to the example files in the datasets directory for guidance.
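Before running an analysis, it can help to check that the climatic table is consistent with the genetic data. The following is a minimal sketch, assuming an identifier column named `id` and numeric climatic columns; both the column name and the function `check_climatic_table` are illustrative assumptions, not the format aPhyloGeo requires.

```python
import csv
import io
from typing import List, Set


def check_climatic_table(csv_text: str, id_column: str, sequence_ids: Set[str]) -> List[str]:
    """Return a list of problems found; an empty list means the table looks consistent."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or id_column not in reader.fieldnames:
        return [f"missing identifier column '{id_column}'"]

    rows = list(reader)
    problems: List[str] = []

    # Every sequence identifier should have a climatic row.
    seen = {row[id_column] for row in rows}
    for missing in sorted(sequence_ids - seen):
        problems.append(f"no climatic row for '{missing}'")

    # Every climatic value should be numeric.
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue
            try:
                float(value)
            except (TypeError, ValueError):
                problems.append(f"non-numeric value in '{column}' for '{row[id_column]}'")
    return problems


# Hypothetical table with one unparsable value and one missing specimen.
csv_text = "id,temperature,precipitation\nspecimen_A,2.1,0.4\nspecimen_B,n/a,0.7\n"
problems = check_climatic_table(csv_text, "id", {"specimen_A", "specimen_B", "specimen_C"})
for problem in problems:
    print(problem)
```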
Output
The algorithm returns a CSV file that contains information from all relevant MSAs (multiple sequence alignments; see the Workflow section for more details). The sliding windows of interest are those with high bootstrap support (i.e., indicating the robustness of the tree) and high similarity to the climate condition in question (i.e., based on the RF, RFnorm, LS, and Euclidean values). For each such window, the file reports, among other things, the name of the gene, the start and end positions of the sliding window, the average bootstrap value, the LS value, and the climatic condition for which this genetic region may explain the adaptation of the species to a given environment.
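To make the notion of a sliding window concrete, the sketch below cuts an alignment into fixed-size windows and reports the start and end of each one. It assumes 0-based, half-open positions and uses hypothetical parameter names (`window_size`, `step_size`); the position convention and parameter names that aPhyloGeo itself uses may differ.

```python
from typing import Dict, Iterator, Tuple

Window = Tuple[int, int, Dict[str, str]]


def sliding_windows(alignment: Dict[str, str], window_size: int, step_size: int) -> Iterator[Window]:
    """Yield (start, end, sub_alignment) for each full window over the alignment."""
    if window_size <= 0 or step_size <= 0:
        raise ValueError("window_size and step_size must be positive")
    # Aligned sequences should share one length; use the shortest to stay safe.
    length = min(len(seq) for seq in alignment.values())
    for start in range(0, length - window_size + 1, step_size):
        end = start + window_size
        yield start, end, {name: seq[start:end] for name, seq in alignment.items()}


# Hypothetical alignment of two 10-column sequences.
alignment = {"specimen_A": "ACGTACGTTA", "specimen_B": "ACGTACGTTG"}
positions = [(start, end) for start, end, _ in sliding_windows(alignment, window_size=4, step_size=3)]
print(positions)
```

In an analysis of this kind, a tree would be inferred from each sub-alignment and compared with the tree built from a climatic variable; that comparison step is outside the scope of this sketch.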
To sum up, aPhyloGeo generates an output.csv file containing the analysis results. Additional visualizations (e.g., maps, plots) may be generated depending on your configuration.
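Once a results file exists, the windows of interest can be selected with a simple filter. The sketch below is only an illustration: the column names (`bootstrap_mean`, `ls_distance`, and so on), the thresholds, and the sample rows are all hypothetical, so check the header of your own output.csv before adapting it. It also assumes that a smaller LS distance means a closer match between the genetic and climatic trees.

```python
import csv
import io
from typing import Dict, List


def select_windows(csv_text: str, min_bootstrap: float, max_ls: float) -> List[Dict[str, str]]:
    """Keep rows with bootstrap support >= min_bootstrap and LS distance <= max_ls."""
    kept = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if float(row["bootstrap_mean"]) >= min_bootstrap and float(row["ls_distance"]) <= max_ls:
            kept.append(row)
    return kept


# Hypothetical results; the values are invented for illustration only.
results = """gene,window_start,window_end,bootstrap_mean,ls_distance,climatic_variable
gene_name,0,200,92.5,0.12,temperature
gene_name,200,400,41.0,0.10,temperature
gene_name,400,600,88.0,0.75,precipitation
"""

kept = select_windows(results, min_bootstrap=70.0, max_ls=0.5)
for row in kept:
    print(row["gene"], row["window_start"], row["window_end"], row["climatic_variable"])
```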
Prerequisites
System Requirements
Operating System: Windows, macOS, or Linux.
Python: Python 3.8 or higher.
Key Features of aPhyloGeo
Multi-Platform: Works seamlessly on Windows, macOS, and Linux.
Flexible Analysis: Supports various phylogeographic analyses, including:
  Identifying genetic lineages and their geographic origins
  Assessing the impact of climate on genetic diversity
  Visualizing genetic and geographic relationships
Customizable: Tailor analyses using a configuration file to fit your specific research questions.
Open Source: Freely available and encourages contributions from the research community.
Before you begin, install the required packages:
pip install pandas aphylogeo
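After installation, a quick check like the one below can confirm that the interpreter meets the stated minimum version and that the packages can be found. This is a minimal sketch: the function `environment_report` is illustrative, and the import name `aphylogeo` is assumed to match the name used in the pip command.

```python
import importlib.util
import sys
from typing import Dict, Iterable

MINIMUM_PYTHON = (3, 8)  # from the system requirements above


def environment_report(packages: Iterable[str] = ("pandas", "aphylogeo")) -> Dict[str, bool]:
    """Report whether Python is recent enough and whether each package is importable."""
    report = {"python_ok": sys.version_info[:2] >= MINIMUM_PYTHON}
    for name in packages:
        # find_spec returns None when a top-level package is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


print(environment_report())
```

A `False` entry for a package suggests re-running the pip command in the same environment that runs the script.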