References
State of the Field - Advancements in Genomic Analysis
The field of genomic analysis has progressed significantly in recent years, notably through new tools and algorithms for exploring the intricate relationship between genetic variation and environmental factors. Our algorithm for identifying sub-sequences within genes [#nadia_tahiri-proc-scipy-2022], and its subsequent application to SARS-CoV-2 data in 2023 [#nadia_tahiri-proc-scipy-2023], deepen our understanding of the genetic underpinnings of adaptation across species and environments.
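The cited algorithm is only named here. As a rough illustration, and assuming that candidate sub-sequences are contiguous windows of a multiple sequence alignment scanned with a fixed window size and step (a sliding-window scan), the sketch below enumerates such windows and computes a simple p-distance matrix for each one, the kind of per-window input a tree-building step could consume. The function names, the toy alignment, and the parameters `window_size` and `step` are hypothetical, and the criteria the published algorithm uses to retain a window are not reproduced.

```python
from itertools import combinations


def sliding_windows(alignment, window_size, step):
    """Yield (start, end, window) for each window of an alignment.

    `alignment` maps sequence names to aligned sequences of equal length
    (a hypothetical input format). `end` is exclusive.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must be aligned to the same length")
    length = lengths.pop()
    for start in range(0, length - window_size + 1, step):
        end = start + window_size
        yield start, end, {name: seq[start:end] for name, seq in alignment.items()}


def p_distances(window):
    """Proportion of differing sites for every pair of sequences in a window."""
    distances = {}
    for a, b in combinations(sorted(window), 2):
        seq_a, seq_b = window[a], window[b]
        mismatches = sum(x != y for x, y in zip(seq_a, seq_b))
        distances[(a, b)] = mismatches / len(seq_a)
    return distances


# Toy alignment (hypothetical data): three sequences of length 8.
toy = {"s1": "ACGTACGT", "s2": "ACGTTCGT", "s3": "TCGTACGA"}
for start, end, win in sliding_windows(toy, window_size=4, step=2):
    print(start, end, p_distances(win))
```

A smaller step gives overlapping windows and finer positional resolution at the cost of more windows to analyze.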
Phylogeography more broadly has also seen substantial methodological advances. Several Python packages provide functionality relevant to phylogeographic analysis, but it is scattered across libraries. Biopython, a cornerstone of bioinformatics, excels at handling genetic sequences and basic phylogenetic tasks, yet offers little support for integrating environmental data. DendroPy, a robust library for phylogenetic trees, aids in visualizing phylogeographic patterns but requires additional tools for a comprehensive analysis. SciPy's statistical utilities can be harnessed for custom analyses, but doing so demands a strong background in statistical programming. GeoPandas, adept at handling geospatial data, is useful for mapping genetic or environmental distributions, but does not integrate directly with genetic analysis tools. In short, powerful individual tools exist, but a comprehensive, user-friendly Python package designed specifically for phylogeographic analysis is still missing.
Statistical approaches, including generalized linear models (GLMs) and mixed models, are increasingly used to investigate the relationship between genetic variation and environmental variables. These methods enable researchers to quantify the relative influence of various factors, such as climate, geography, and demography, on observed patterns of genetic diversity.
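To make the GLM idea concrete, here is a minimal sketch, assuming a continuous genetic-diversity response and a Gaussian family with identity link. In that case, fitting a GLM reduces to ordinary least squares. The predictors (temperature, precipitation) and the data are hypothetical and synthetic, and `fit_gaussian_glm` is an illustrative helper, not part of aPhyloGeo or any library.

```python
def fit_gaussian_glm(X, y):
    """Fit y ~ b0 + b1*x1 + ... + bk*xk by ordinary least squares.

    This is a GLM with a Gaussian family and identity link. X is a list of
    rows (one list of predictor values per sample) and y the responses.
    Returns [b0, b1, ..., bk], solving the normal equations (A^T A) b = A^T y
    by Gauss-Jordan elimination with partial pivoting.
    """
    A = [[1.0] + [float(v) for v in row] for row in X]  # prepend intercept column
    k = len(A[0])
    # Augmented normal-equation matrix [A^T A | A^T y].
    M = [
        [sum(r[i] * r[j] for r in A) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(A, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        if abs(M[col][col]) < 1e-12:
            raise ValueError("predictors are collinear; normal equations are singular")
        # Eliminate this column from every other row.
        for r in range(k):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][k] / M[i][i] for i in range(k)]


# Hypothetical predictors per site: temperature (°C) and precipitation (mm),
# with a synthetic response standing in for a genetic-diversity measure.
X = [[10, 100], [15, 250], [20, 80], [25, 300], [30, 150]]
y = [0.5 + 0.02 * t - 0.001 * p for t, p in X]
coefficients = fit_gaussian_glm(X, y)
print([round(c, 4) for c in coefficients])  # prints [0.5, 0.02, -0.001]
```

The fitted coefficients are on each predictor's original scale. To compare the relative influence of factors, predictors are usually standardized first. Mixed models add random effects (for example, per-population intercepts) that this sketch does not handle. Libraries such as statsmodels provide both GLM and mixed-model fitting.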
The continuous refinement of these tools and methodologies, coupled with the growing availability of high-throughput sequencing technologies and environmental data, has opened up exciting new research avenues in evolutionary biology, ecology, and conservation. aPhyloGeo builds upon these advancements, providing a unified platform for integrating genetic and climatic data to address a wide array of phylogeographic questions. By bridging the gap between genomics and environmental science, aPhyloGeo aims to contribute to a more comprehensive understanding of the forces shaping biodiversity in a changing world.
Calculation of distance between phylogenetic trees: `Least Square metric`
Calculation of distance between phylogenetic trees: `Robinson-Foulds metric`
Dataset full description: `Analysis of genetic and climatic data of SARS-CoV-2`
Muscle5:
FastTree:
ClustalW:
MAFFT:
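The least-squares and Robinson-Foulds entries above name two tree-distance metrics without defining them. The sketch below illustrates one plausible reading of each. It makes the following assumptions:

- Trees are rooted and written as nested Python tuples of leaf names.
- The Robinson-Foulds distance is the unnormalized count of clusters present in one tree but not the other.
- The least-squares distance is the sum of squared differences between the two trees' pairwise patristic distances, supplied as precomputed dictionaries.

aPhyloGeo's own implementations may differ. For instance, they might work on unrooted trees, normalize the Robinson-Foulds score, or sum absolute rather than squared differences. All function names here are hypothetical.

```python
def clusters(tree):
    """Non-trivial clusters (leaf sets below internal nodes) of a rooted tree.

    The tree is a nested tuple of leaf names, e.g. ((("A", "B"), "C"), "D").
    The root cluster is dropped because every tree on the same leaves shares it.
    """
    found = set()

    def leaves_below(node):
        if isinstance(node, str):  # a leaf
            return frozenset([node])
        below = frozenset().union(*(leaves_below(child) for child in node))
        found.add(below)
        return below

    found.discard(leaves_below(tree))
    return {c for c in found if len(c) > 1}


def robinson_foulds(tree_1, tree_2):
    """Unnormalized rooted Robinson-Foulds distance: the number of clusters
    found in exactly one of the two trees."""
    return len(clusters(tree_1) ^ clusters(tree_2))


def least_squares(dist_1, dist_2):
    """Sum of squared differences between two trees' pairwise distances.

    Each argument maps frozenset({leaf_i, leaf_j}) to the path-length
    (patristic) distance between the two leaves in that tree.
    """
    if dist_1.keys() != dist_2.keys():
        raise ValueError("both trees must cover the same leaf pairs")
    return sum((dist_1[pair] - dist_2[pair]) ** 2 for pair in dist_1)


# Hypothetical trees, e.g. one inferred from genetic data, one from a climatic variable.
genetic = ((("A", "B"), "C"), "D")
climatic = ((("A", "C"), "B"), "D")
print(robinson_foulds(genetic, climatic))  # prints 2

# Toy pairwise distances for three leaves (hypothetical values).
genetic_d = {frozenset(("A", "B")): 2.0, frozenset(("A", "C")): 3.0, frozenset(("B", "C")): 3.0}
climatic_d = {frozenset(("A", "B")): 3.0, frozenset(("A", "C")): 2.0, frozenset(("B", "C")): 3.0}
print(least_squares(genetic_d, climatic_d))  # prints 2.0
```

A distance of 0 means the two trees agree on every cluster (or on every pairwise distance). Larger values indicate greater disagreement between the trees being compared.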