aphylogeo.utils
- aphylogeo.utils.getDissimilaritiesMatrix(df, columnWithSpecimenName, columnToSearch)[source]
Create a list containing the names of specimens and their corresponding minimum temperatures.
- Parameters:
df – The content of the CSV file.
columnWithSpecimenName (str) – The first column containing specimen names.
columnToSearch (str) – The column to compare with the first one.
- Returns:
The dissimilarities matrix.
- aphylogeo.utils.leastSquare(tree1, tree2)[source]
Calculate the least square distance between two trees.
Trees must have the same number of leaves.
Leaves must all have a twin in each tree.
A tree must not have duplicate leaves.
- Parameters:
tree1 (distanceTree (from Biopython)) – The first tree to compare.
tree2 (distanceTree (from Biopython)) – The second tree to compare.
- Returns:
The final distance between the two trees.
- Return type:
float
- aphylogeo.utils.robinsonFoulds(tree1, tree2)[source]
Method that calculates the robinson foulds distance between two trees.
Trees must have the same number of leaves.
Leaves must all have a twin in each tree.
A tree must not have duplicate leaves
- Parameters:
tree1 (distanceTree object from biopython converted to Newick) – The first tree to compare
tree2 (distanceTree object from biopython converted to Newick) – The second tree to compare
- Returns:
The final distance between the two
- Return type:
float
- aphylogeo.utils.euclideanDist(tree1, tree2)[source]
Method that calculate the Euclidean distance (a.k.a. Felsenstein’s 2004 “branch length distance”) between two trees based on
edge_weight_attr
.Trees need to share the same |TaxonNamespace| reference. The bipartition bitmasks of the trees must be correct for the current tree structures (by calling
Tree.encode_bipartitions()
method)- Parameters:
tree1 (distanceTree object from biopython converted to DendroPY format Newick) – The first tree to compare
tree2 (distanceTree object from biopython converted to DendroPY format Newick) – The second tree to compare
- Returns:
The final Euclidean distance between the two
- Return type:
float
- aphylogeo.utils.createTree(dm)[source]
Create a dna tree from content coming from a fasta file.
- Parameters:
dm – The content used to create the tree.
- Returns:
The tree created.
- aphylogeo.utils.climaticPipeline(df)[source]
Creates a dictionary containing the climatic Trees.
- Parameters:
df – DataFrame with the climatic data.
- Returns:
Dictionary with trees for each variable.
- Return type:
dict
- aphylogeo.utils.reverse_climatic_pipeline(trees, df)[source]
Converts a dictionary of climatic trees back into a DataFrame.
- Parameters:
trees – The climatic trees dictionary.
df – Original DataFrame to infer specimen column.
- Returns:
DataFrame containing the climatic data.
- Return type:
pd.DataFrame
- aphylogeo.utils.createBoostrap(msaSet: dict, bootstrapThreshold)[source]
Create a tree structure from sequences given by a dictionnary.
- Parameters:
msaSet (dict) – A dictionary containing the multiple sequence alignments.
bootstrapThreshold (int)
- Returns:
A dictionary with the trees for each sequence.
- Return type:
dict
- aphylogeo.utils.bootSingle(args)[source]
Perform a bootstrap consensus analysis on a multiple sequence alignment (MSA).
- Parameters:
args –
A list of arguments, which includes:
msaSet (dict): A dictionary of multiple sequence alignments.
constructor (function): A function used to construct trees from the MSA.
key (str): The key to access a specific MSA from the msaSet.
bootstrapThreshold (int): The number of bootstrap samples to generate.
- Returns:
result (Tree): The consensus tree obtained from the bootstrap analysis.
key (str): The key corresponding to the MSA used for this result.
- Return type:
list
- aphylogeo.utils.calculateAverageBootstrap(tree)[source]
Calculate the average bootstrap confidence of a tree.
- Parameters:
tree – The tree from which to calculate the average confidence.
- Returns:
The average bootstrap (confidence) value of the tree.
- Return type:
float
- aphylogeo.utils.fasttreeCMD(input_fasta, boot, nt)[source]
Create the command line executed to create a phylogenetic tree using FastTree application
- Parameters:
input_fasta (str) – The input FASTA file containing the multiple sequence alignment.
output (str) – The relative path of the output tree generated upon FastTree execution.
boot (int, optional) – The value of bootstrap to be used for tree generation (default is 100).
nt (bool, optional) – Whether the sequences are nucleotides (default is True for nucleotide sequences, set to False for protein sequences).
- Returns:
The FastTree command line object.
- Return type:
FastTreeCommandline
- aphylogeo.utils.createTmpFasta(msaset)[source]
Create fasta files from multiple sequences alignment
- Parameters:
msaset (dict(str, AlignIO)) – A dictionary where the key is the window name, and the value is the MSA object.
- aphylogeo.utils.fasttree(msaset, boot=1000, nt=True)[source]
Create phylogenetic trees from a set of multiple sequence alignments using the FastTree application.
This function creates temporary FASTA files to be used as input for FastTree. Since FastTree outputs .tree files, the function reads them back during execution and removes the output tree files afterward.
- Parameters:
msaset (dict(str, AlignIO)) – A dictionary containing the multiple sequence alignments (MSA).
boot (int, optional) – The number of bootstrap replicates to generate. Default is 100.
nt (bool, optional) – Whether the sequences are nucleotides (True) or amino acids (False). Default is True (nucleotide sequences).
- Returns:
A dictionary where the key is the window name and the value is a tree in Newick format.
- Return type:
dict(str, Tree)
- aphylogeo.utils.createGeneticList(geneticTrees, bootstrap_threshold)[source]
Create a list of Trees if the bootstrap Average is higher than the threshold
- Parameters:
geneticTrees (dict) – A dictionary containing the genetic trees.
bootstrap_threshold
- Returns:
geneticTrees: A sorted list of genetic trees.
bootstrapList: A list of the bootstrap average of each tree.
- aphylogeo.utils.createClimaticList(climaticTrees)[source]
Create a list of climaticTrees
- Parameters:
climaticTrees (dict) – A dictionary containing the climatic trees.
- Returns:
A list of the climatic trees.
- aphylogeo.utils.getData(leavesName, dist, index, climaticList, bootstrap, genetic, csv_data, reference_gene_filename, dist_norm, dist2, dist3)[source]
Get data from a csv file a various parameters to store into a list
- Parameters:
leavesName (list) – The list of the actual leaves.
dist (float) – The least square distance between two trees.
index (int) – The index of the climatic tree.
climaticList (list) – The list of climatic trees.
bootstrap – The bootstrap values.
genetic – The genetic tree.
csv_data – The pf containing the data.
reference_gene_filename – The name of the reference gene.
dist_norm
dist2
dist3
- Returns:
A list containing the data.
- Return type:
list
- aphylogeo.utils.writeOutputFile(data, output_file)[source]
Write the datas from data list into a new csv file
- Parameters:
data (list) – The list containing the final data.
output_file (String) – Output csv filepath for the results.
- aphylogeo.utils.filterResults(climaticTrees, geneticTrees, csv_data, create_file=True)[source]
Create the final datas from the Climatic Tree and the Genetic Tree
- Parameters:
climaticTrees (dict) – The dictionary containing the climatic trees.
geneticTrees (dict) – The dictionary containing the genetic trees.
csv_data (pd.DataFrame) – The dataframe containing the data from the CSV file.
create_file (bool) – Whether to create a file or not. Default is True.
- Returns:
The final data in a list format.
- Return type:
list
- aphylogeo.utils.format_to_csv(data)[source]
Format the data to a csv file
- Parameters:
data (list) – The data to format.
- Returns:
The formatted data in a dictionary format.
- Return type:
dict
- aphylogeo.utils.geneticPipeline(seq_alignement)[source]
Get the genetic Trees from the initial file datas so we can compare every valid tree with the climatic ones. In the end it calls a method that create a final csv file with all the data that we need for the comparison
- Parameters:
seq_alignement (dict) – The multiple sequence alignment.
- Returns:
The genetic trees.
- Return type:
dict
- aphylogeo.utils.loadSequenceFile(file)[source]
Reads the .fasta file. Extract sequence ID and sequences.
- Parameters:
file (str) – The file name of a .fasta file.
- Returns:
The sequences in a dictionary format.
- Return type:
dict
- aphylogeo.utils.save_climatic_trees(climatic_trees, file_path)[source]
Save the climatic trees to a file.
- Parameters:
climatic_trees (dict) – The climatic trees to save.
file_path (str) – The path to the file where the climatic trees will be saved.
- aphylogeo.utils.load_climatic_trees(file_path)[source]
Load climatic trees from a file.
- Parameters:
file_path (str) – Path to the target file where the climatic trees will be loaded
- Returns:
The loaded climatic trees
- Return type:
dict
- aphylogeo.utils.convert_alignment_to_simple_format(input_path: str, output_path: str)[source]
Transform a JSON file with windows in FASTA simple format to one with {window: {id: sequence}} format. To improve file readability
- Parameters:
input_path (str) – Path of the JSON input file.
output_path (str) – Path of the output JSON file.