aphylogeo.utils

aphylogeo.utils.header()[source]
aphylogeo.utils.getDissimilaritiesMatrix(df, columnWithSpecimenName, columnToSearch)[source]

Create a list containing the names of specimens and their corresponding minimum temperatures.

Parameters:
  • df – The content of the CSV file.

  • columnWithSpecimenName (str) – The first column containing specimen names.

  • columnToSearch (str) – The column to compare with the first one.

Returns:

The dissimilarities matrix.

aphylogeo.utils.leastSquare(tree1, tree2)[source]

Calculate the least square distance between two trees.

  • Trees must have the same number of leaves.

  • Leaves must all have a twin in each tree.

  • A tree must not have duplicate leaves.

Parameters:
  • tree1 (distanceTree (from Biopython)) – The first tree to compare.

  • tree2 (distanceTree (from Biopython)) – The second tree to compare.

Returns:

The final distance between the two trees.

Return type:

float

aphylogeo.utils.robinsonFoulds(tree1, tree2)[source]

Method that calculates the robinson foulds distance between two trees.

  • Trees must have the same number of leaves.

  • Leaves must all have a twin in each tree.

  • A tree must not have duplicate leaves

Parameters:
  • tree1 (distanceTree object from biopython converted to Newick) – The first tree to compare

  • tree2 (distanceTree object from biopython converted to Newick) – The second tree to compare

Returns:

The final distance between the two

Return type:

float

aphylogeo.utils.euclideanDist(tree1, tree2)[source]

Method that calculate the Euclidean distance (a.k.a. Felsenstein’s 2004 “branch length distance”) between two trees based on edge_weight_attr.

Trees need to share the same |TaxonNamespace| reference. The bipartition bitmasks of the trees must be correct for the current tree structures (by calling Tree.encode_bipartitions() method)

Parameters:
  • tree1 (distanceTree object from biopython converted to DendroPY format Newick) – The first tree to compare

  • tree2 (distanceTree object from biopython converted to DendroPY format Newick) – The second tree to compare

Returns:

The final Euclidean distance between the two

Return type:

float

aphylogeo.utils.createTree(dm)[source]

Create a dna tree from content coming from a fasta file.

Parameters:

dm – The content used to create the tree.

Returns:

The tree created.

aphylogeo.utils.climaticPipeline(df)[source]

Creates a dictionary containing the climatic Trees.

Parameters:

df – DataFrame with the climatic data.

Returns:

Dictionary with trees for each variable.

Return type:

dict

aphylogeo.utils.reverse_climatic_pipeline(trees, df)[source]

Converts a dictionary of climatic trees back into a DataFrame.

Parameters:
  • trees – The climatic trees dictionary.

  • df – Original DataFrame to infer specimen column.

Returns:

DataFrame containing the climatic data.

Return type:

pd.DataFrame

aphylogeo.utils.createBoostrap(msaSet: dict, bootstrapThreshold)[source]

Create a tree structure from sequences given by a dictionnary.

Parameters:
  • msaSet (dict) – A dictionary containing the multiple sequence alignments.

  • bootstrapThreshold (int)

Returns:

A dictionary with the trees for each sequence.

Return type:

dict

aphylogeo.utils.bootSingle(args)[source]

Perform a bootstrap consensus analysis on a multiple sequence alignment (MSA).

Parameters:

args

A list of arguments, which includes:

  • msaSet (dict): A dictionary of multiple sequence alignments.

  • constructor (function): A function used to construct trees from the MSA.

  • key (str): The key to access a specific MSA from the msaSet.

  • bootstrapThreshold (int): The number of bootstrap samples to generate.

Returns:

  • result (Tree): The consensus tree obtained from the bootstrap analysis.

  • key (str): The key corresponding to the MSA used for this result.

Return type:

list

aphylogeo.utils.calculateAverageBootstrap(tree)[source]

Calculate the average bootstrap confidence of a tree.

Parameters:

tree – The tree from which to calculate the average confidence.

Returns:

The average bootstrap (confidence) value of the tree.

Return type:

float

aphylogeo.utils.fasttreeCMD(input_fasta, boot, nt)[source]

Create the command line executed to create a phylogenetic tree using FastTree application

Parameters:
  • input_fasta (str) – The input FASTA file containing the multiple sequence alignment.

  • output (str) – The relative path of the output tree generated upon FastTree execution.

  • boot (int, optional) – The value of bootstrap to be used for tree generation (default is 100).

  • nt (bool, optional) – Whether the sequences are nucleotides (default is True for nucleotide sequences, set to False for protein sequences).

Returns:

The FastTree command line object.

Return type:

FastTreeCommandline

aphylogeo.utils.createTmpFasta(msaset)[source]

Create fasta files from multiple sequences alignment

Parameters:

msaset (dict(str, AlignIO)) – A dictionary where the key is the window name, and the value is the MSA object.

aphylogeo.utils.fasttree(msaset, boot=1000, nt=True)[source]

Create phylogenetic trees from a set of multiple sequence alignments using the FastTree application.

This function creates temporary FASTA files to be used as input for FastTree. Since FastTree outputs .tree files, the function reads them back during execution and removes the output tree files afterward.

Parameters:
  • msaset (dict(str, AlignIO)) – A dictionary containing the multiple sequence alignments (MSA).

  • boot (int, optional) – The number of bootstrap replicates to generate. Default is 100.

  • nt (bool, optional) – Whether the sequences are nucleotides (True) or amino acids (False). Default is True (nucleotide sequences).

Returns:

A dictionary where the key is the window name and the value is a tree in Newick format.

Return type:

dict(str, Tree)

aphylogeo.utils.createGeneticList(geneticTrees, bootstrap_threshold)[source]

Create a list of Trees if the bootstrap Average is higher than the threshold

Parameters:
  • geneticTrees (dict) – A dictionary containing the genetic trees.

  • bootstrap_threshold

Returns:

  • geneticTrees: A sorted list of genetic trees.

  • bootstrapList: A list of the bootstrap average of each tree.

aphylogeo.utils.createClimaticList(climaticTrees)[source]

Create a list of climaticTrees

Parameters:

climaticTrees (dict) – A dictionary containing the climatic trees.

Returns:

A list of the climatic trees.

aphylogeo.utils.getData(leavesName, dist, index, climaticList, bootstrap, genetic, csv_data, reference_gene_filename, dist_norm, dist2, dist3)[source]

Get data from a csv file a various parameters to store into a list

Parameters:
  • leavesName (list) – The list of the actual leaves.

  • dist (float) – The least square distance between two trees.

  • index (int) – The index of the climatic tree.

  • climaticList (list) – The list of climatic trees.

  • bootstrap – The bootstrap values.

  • genetic – The genetic tree.

  • csv_data – The pf containing the data.

  • reference_gene_filename – The name of the reference gene.

  • dist_norm

  • dist2

  • dist3

Returns:

A list containing the data.

Return type:

list

aphylogeo.utils.writeOutputFile(data, output_file)[source]

Write the datas from data list into a new csv file

Parameters:
  • data (list) – The list containing the final data.

  • output_file (String) – Output csv filepath for the results.

aphylogeo.utils.filterResults(climaticTrees, geneticTrees, csv_data, create_file=True)[source]

Create the final datas from the Climatic Tree and the Genetic Tree

Parameters:
  • climaticTrees (dict) – The dictionary containing the climatic trees.

  • geneticTrees (dict) – The dictionary containing the genetic trees.

  • csv_data (pd.DataFrame) – The dataframe containing the data from the CSV file.

  • create_file (bool) – Whether to create a file or not. Default is True.

Returns:

The final data in a list format.

Return type:

list

aphylogeo.utils.format_to_csv(data)[source]

Format the data to a csv file

Parameters:

data (list) – The data to format.

Returns:

The formatted data in a dictionary format.

Return type:

dict

aphylogeo.utils.geneticPipeline(seq_alignement)[source]

Get the genetic Trees from the initial file datas so we can compare every valid tree with the climatic ones. In the end it calls a method that create a final csv file with all the data that we need for the comparison

Parameters:

seq_alignement (dict) – The multiple sequence alignment.

Returns:

The genetic trees.

Return type:

dict

aphylogeo.utils.loadSequenceFile(file)[source]

Reads the .fasta file. Extract sequence ID and sequences.

Parameters:

file (str) – The file name of a .fasta file.

Returns:

The sequences in a dictionary format.

Return type:

dict

aphylogeo.utils.save_climatic_trees(climatic_trees, file_path)[source]

Save the climatic trees to a file.

Parameters:
  • climatic_trees (dict) – The climatic trees to save.

  • file_path (str) – The path to the file where the climatic trees will be saved.

aphylogeo.utils.load_climatic_trees(file_path)[source]

Load climatic trees from a file.

Parameters:

file_path (str) – Path to the target file where the climatic trees will be loaded

Returns:

The loaded climatic trees

Return type:

dict

aphylogeo.utils.convert_alignment_to_simple_format(input_path: str, output_path: str)[source]

Transform a JSON file with windows in FASTA simple format to one with {window: {id: sequence}} format. To improve file readability

Parameters:
  • input_path (str) – Path of the JSON input file.

  • output_path (str) – Path of the output JSON file.

aphylogeo.utils.dictToJson(data, filepath)[source]

Save a dictionary as a JSON file.

Parameters:
  • data – Dictionary to save.

  • filepath – Path to the output JSON file.