Extractor

The Extractor class is initialized with a dictionary mapping dataset names to CSV file paths. It loads these datasets into memory as DataFrames and provides methods for retrieving them by name and splitting them into X and y sets.

class qsar.utils.extractor.Extractor(paths: Dict[str, str])

Bases: object

Class for loading datasets from CSV files and retrieving them by name.

Parameters:

paths (Dict[str, str]) – Dictionary of {str: str} pairs where the key is the name of the dataframe and the value is the path to the CSV file.

Variables:

dfs (Dict[str, pd.DataFrame]) – Extracted DataFrames from the paths provided during initialization.

extract_dfs(paths: Dict[str, str] | None = None) → Dict[str, DataFrame]

Extracts DataFrames from a dictionary of {name: path} pairs.

Parameters:

paths (Dict[str, str], optional) – Dictionary of {str: str} pairs where the key is the name of the dataframe and the value is the path to the CSV file. If not provided, defaults to the paths provided at initialization.

Returns:

Dictionary of {str: pd.DataFrame} pairs where the key is the name of the dataframe and the value is the DataFrame itself.

Return type:

Dict[str, pd.DataFrame]

Raises:

FileNotFoundError – If a path in the dictionary does not exist.
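The loading step described above can be sketched with plain pandas. This is a minimal illustration under stated assumptions, not the library's implementation: `load_frames` is a hypothetical stand-in name, the column names and temporary files are invented for the demonstration, and the explicit existence check is only one possible way to raise FileNotFoundError.

```python
import tempfile
from pathlib import Path
from typing import Dict

import pandas as pd


def load_frames(paths: Dict[str, str]) -> Dict[str, pd.DataFrame]:
    """Hypothetical stand-in: read each CSV path into a DataFrame keyed by name."""
    frames: Dict[str, pd.DataFrame] = {}
    for name, path in paths.items():
        # Assumption: fail early with FileNotFoundError when a path is missing.
        if not Path(path).is_file():
            raise FileNotFoundError(f"No CSV file found for '{name}': {path}")
        frames[name] = pd.read_csv(path)
    return frames


# Write two small CSV files to a temporary directory for the demonstration.
tmp_dir = Path(tempfile.mkdtemp())
train_path = tmp_dir / "train.csv"
test_path = tmp_dir / "test.csv"
train_path.write_text("feat_a,feat_b,activity\n1.0,2.0,0\n3.0,4.0,1\n")
test_path.write_text("feat_a,feat_b,activity\n5.0,6.0,1\n")

# The result is keyed by the same names as the input dictionary.
frames = load_frames({"train": str(train_path), "test": str(test_path)})
```

Keying the result by the same names as the input keeps each dataset addressable later without tracking file paths.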

get_df(name: str) → DataFrame

Get a DataFrame by its name.

Parameters:

name (str) – Name of the DataFrame.

Returns:

The DataFrame associated with the given name.

Return type:

pd.DataFrame

Raises:

KeyError – If the name does not exist in the stored DataFrames.
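Name-based retrieval with a KeyError for unknown names might look like the following sketch. `lookup_frame` is a hypothetical stand-in, the in-memory DataFrame is invented for illustration, and the error message format is an assumption rather than the library's actual wording.

```python
from typing import Dict

import pandas as pd

# Illustrative data; in practice the frames would come from the loaded CSV files.
frames: Dict[str, pd.DataFrame] = {
    "train": pd.DataFrame({"feat_a": [1.0, 3.0], "activity": [0, 1]}),
}


def lookup_frame(frames: Dict[str, pd.DataFrame], name: str) -> pd.DataFrame:
    """Hypothetical stand-in: return the DataFrame stored under `name`."""
    if name not in frames:
        # Assumption: list the available names to make the error actionable.
        raise KeyError(f"No DataFrame named '{name}'; available: {sorted(frames)}")
    return frames[name]


train_df = lookup_frame(frames, "train")

# Requesting a name that was never loaded raises KeyError.
try:
    lookup_frame(frames, "validation")
except KeyError as err:
    missing_message = str(err)
```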

split_x_y(y_col: str) → Tuple[Dict[str, DataFrame], Dict[str, DataFrame]]

Splits each stored DataFrame into an X (features) DataFrame and a y (target) DataFrame based on the specified column.

Parameters:

y_col (str) – Name of the column to be used as the y values.

Returns:

A tuple containing two dictionaries. The first dictionary contains the X DataFrames and the second dictionary contains the y DataFrames, both keyed by the names of the original DataFrames.

Return type:

Tuple[Dict[str, pd.DataFrame], Dict[str, pd.DataFrame]]
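The X/y split can be illustrated with a short pandas sketch. It assumes that X holds every column except `y_col` and that y is kept as a single-column DataFrame (consistent with the documented return type). `split_frames` and the sample data are hypothetical and not taken from the library.

```python
from typing import Dict, Tuple

import pandas as pd


def split_frames(
    frames: Dict[str, pd.DataFrame], y_col: str
) -> Tuple[Dict[str, pd.DataFrame], Dict[str, pd.DataFrame]]:
    """Hypothetical stand-in: split each DataFrame into X and y by column name."""
    # Assumption: X is everything except the target column.
    x_frames = {name: df.drop(columns=[y_col]) for name, df in frames.items()}
    # Double brackets keep y as a one-column DataFrame rather than a Series.
    y_frames = {name: df[[y_col]] for name, df in frames.items()}
    return x_frames, y_frames


# Illustrative datasets sharing the same columns.
frames = {
    "train": pd.DataFrame(
        {"feat_a": [1.0, 3.0], "feat_b": [2.0, 4.0], "activity": [0, 1]}
    ),
    "test": pd.DataFrame({"feat_a": [5.0], "feat_b": [6.0], "activity": [1]}),
}

x_frames, y_frames = split_frames(frames, "activity")
```

Both returned dictionaries reuse the original dataset names, so `x_frames["train"]` and `y_frames["train"]` stay paired row for row.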