Extractor
The class is initialized with a dictionary mapping dataset names to file paths. It loads these datasets into memory automatically and provides methods for retrieving and manipulating them.
- class qsar.utils.extractor.Extractor(paths: Dict[str, str])
Bases:
object
Class for loading named CSV datasets into DataFrames and for retrieving or splitting them.
- Parameters:
paths (Dict[str, str]) – Dictionary of {str: str} pairs where the key is the name of the dataframe and the value is the path to the CSV file.
- Variables:
dfs (Dict[str, pd.DataFrame]) – Extracted DataFrames from the paths provided during initialization.
- extract_dfs(paths: Dict[str, str] | None = None) → Dict[str, DataFrame]
Extracts DataFrames from a dictionary of {name: path} pairs.
- Parameters:
paths (Dict[str, str], optional) – Dictionary of {str: str} pairs where the key is the name of the dataframe and the value is the path to the CSV file. If not provided, defaults to the paths provided at initialization.
- Returns:
Dictionary of {str: pd.DataFrame} pairs where the key is the name of the dataframe and the value is the DataFrame itself.
- Return type:
Dict[str, pd.DataFrame]
- Raises:
FileNotFoundError – If a path in the dictionary does not exist.
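The loading behavior described for extract_dfs can be illustrated with plain pandas. This is a minimal sketch, not the library's implementation: the helper name `load_frames`, the file name `train.csv`, and the column names are hypothetical, and the explicit path check is only one assumed way of producing the documented FileNotFoundError.

```python
import os
import tempfile
from typing import Dict

import pandas as pd


def load_frames(paths: Dict[str, str]) -> Dict[str, pd.DataFrame]:
    """Hypothetical stand-in for extract_dfs: read each CSV into a DataFrame."""
    frames: Dict[str, pd.DataFrame] = {}
    for name, path in paths.items():
        # Fail early with the documented exception type if a file is missing.
        if not os.path.isfile(path):
            raise FileNotFoundError(f"No CSV file found for '{name}': {path}")
        frames[name] = pd.read_csv(path)
    return frames


# Write a throwaway CSV so the example does not depend on external files.
tmp_dir = tempfile.mkdtemp()
train_path = os.path.join(tmp_dir, "train.csv")
pd.DataFrame({"feat_a": [0.1, 0.2], "target": [1.0, 2.0]}).to_csv(
    train_path, index=False
)

# Keys of the input dictionary become keys of the returned dictionary.
dfs = load_frames({"train": train_path})
```

The returned dictionary is keyed by the same names as the input, which matches the documented return type of Dict[str, pd.DataFrame].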
- get_df(name: str) → DataFrame
Get a DataFrame by its name.
- Parameters:
name (str) – Name of the DataFrame.
- Returns:
The DataFrame associated with the given name.
- Return type:
pd.DataFrame
- Raises:
KeyError – If the name does not exist in the stored DataFrames.
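The name-based lookup and its KeyError can be sketched as a dictionary access. The helper `get_frame` and the sample data below are hypothetical stand-ins for get_df and the dfs attribute; the real method may word its error differently.

```python
from typing import Dict

import pandas as pd

# Hypothetical in-memory stand-in for the `dfs` attribute.
dfs: Dict[str, pd.DataFrame] = {
    "train": pd.DataFrame({"feat_a": [0.1, 0.2], "target": [1.0, 2.0]}),
}


def get_frame(frames: Dict[str, pd.DataFrame], name: str) -> pd.DataFrame:
    """Hypothetical stand-in for get_df: look up a DataFrame by name."""
    if name not in frames:
        # Listing the known names makes a typo in `name` easier to spot.
        raise KeyError(f"No DataFrame named '{name}'; available: {sorted(frames)}")
    return frames[name]


train_df = get_frame(dfs, "train")

# Requesting a name that was never loaded raises KeyError.
try:
    get_frame(dfs, "validation")
    lookup_error = None
except KeyError as exc:
    lookup_error = exc
```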
- split_x_y(y_col: str) → Tuple[Dict[str, DataFrame], Dict[str, DataFrame]]
Splits the DataFrames into X and y DataFrames based on the specified column.
- Parameters:
y_col (str) – Name of the column to be used as the y values.
- Returns:
A tuple containing two dictionaries. The first dictionary contains the X DataFrames and the second dictionary contains the y DataFrames, both keyed by the names of the original DataFrames.
- Return type:
Tuple[Dict[str, pd.DataFrame], Dict[str, pd.DataFrame]]
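The split described for split_x_y can be sketched with plain pandas. The helper `split_frames` and the sample columns are hypothetical. The sketch assumes that X is every column except y_col and that y is kept as a single-column DataFrame, which is consistent with the documented return type but not confirmed by it.

```python
from typing import Dict, Tuple

import pandas as pd


def split_frames(
    frames: Dict[str, pd.DataFrame], y_col: str
) -> Tuple[Dict[str, pd.DataFrame], Dict[str, pd.DataFrame]]:
    """Hypothetical stand-in for split_x_y, applied to every stored DataFrame."""
    x_dfs: Dict[str, pd.DataFrame] = {}
    y_dfs: Dict[str, pd.DataFrame] = {}
    for name, df in frames.items():
        # Assumption: X is everything except the target column.
        x_dfs[name] = df.drop(columns=[y_col])
        # Double brackets keep y as a DataFrame rather than a Series.
        y_dfs[name] = df[[y_col]]
    return x_dfs, y_dfs


# Hypothetical datasets sharing the same target column name.
dfs = {
    "train": pd.DataFrame(
        {"feat_a": [0.1, 0.2], "feat_b": [3, 4], "target": [1.0, 2.0]}
    ),
    "test": pd.DataFrame({"feat_a": [0.3], "feat_b": [5], "target": [3.0]}),
}

x_dfs, y_dfs = split_frames(dfs, "target")
```

Both returned dictionaries use the original dataset names as keys, so x_dfs["train"] and y_dfs["train"] stay aligned row for row.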