GAN Featurizer
The class supports counting heavy atoms in molecules, filtering molecules by atom count, determining an appropriate atom count from a dataset's distribution, and converting SMILES strings into unique, feature-encoded molecular formats compatible with GAN inputs. It is designed to streamline the preparation of chemical datasets for QSAR modeling in a GAN framework, with a focus on molecular feature extraction and preprocessing.
- class qsar.gan.gan_featurizer.QsarGanFeaturizer(**kwargs)
Bases: MolGanFeaturizer
Featurizes molecules for a Generative Adversarial Network (GAN) model using the RDKit and DeepChem libraries.
The class is responsible for processing SMILES strings into a format suitable for GAN models in QSAR applications.
- determine_atom_count(smiles: DataFrame, quantile: float = 0.95) tuple[int, Series]
Determines the atom count for a DataFrame of SMILES strings, based on the given quantile of the per-molecule atom counts.
- Parameters:
smiles (pd.DataFrame) – A DataFrame of SMILES strings.
quantile (float) – The quantile to use when determining the atom count. Default is 0.95.
- Returns:
A tuple containing the atom count and a Series of per-molecule atom counts.
- Return type:
tuple[int, Series]
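The idea of choosing an atom count from a quantile of the dataset's distribution can be sketched in plain Python. This is a minimal illustration only, not the implementation used by QsarGanFeaturizer: the helper names (quantile_linear, choose_atom_count, filter_by_atom_count) are hypothetical, the input is assumed to be a list of precomputed atom counts rather than SMILES strings, and rounding the quantile up to the next integer is an assumption. The interpolation follows the linear convention that pandas uses by default for Series.quantile.

```python
import math


def quantile_linear(values, q):
    """Return the q-th quantile of values using linear interpolation."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    ordered = sorted(values)
    # Fractional position of the quantile within the sorted data.
    pos = q * (len(ordered) - 1)
    lower = math.floor(pos)
    upper = math.ceil(pos)
    frac = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


def choose_atom_count(counts, q=0.95):
    """Pick an integer atom-count cutoff from the q-th quantile.

    Rounding up is an assumption made for this sketch; the library
    may truncate or round differently.
    """
    return math.ceil(quantile_linear(counts, q))


def filter_by_atom_count(counts, max_atoms):
    """Keep only the counts that fit within the chosen cutoff."""
    return [c for c in counts if c <= max_atoms]


# Hypothetical atom counts for ten molecules, including one large outlier.
example_counts = [5, 9, 12, 14, 20, 21, 23, 24, 26, 60]
cutoff = choose_atom_count(example_counts, q=0.95)
kept = filter_by_atom_count(example_counts, cutoff)
```

Using a high quantile rather than the maximum keeps a few unusually large molecules from inflating the fixed input size that every molecule must then be padded to.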
- get_features(smiles: DataFrame) ndarray
Returns the features for a DataFrame of SMILES strings.
- Parameters:
smiles (pd.DataFrame) – A DataFrame of SMILES strings.
- Returns:
An array of features for the SMILES strings.
- Return type:
np.ndarray
- static get_unique_smiles(nmols: ndarray) list
Returns a list of unique SMILES strings.
- Parameters:
nmols (np.ndarray) – An array of molecules.
- Returns:
A list of unique SMILES strings.
- Return type:
list
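The deduplication step behind a method like get_unique_smiles can be illustrated with a short, standalone sketch. It is not the library's code: the function name unique_preserving_order is hypothetical, the input is assumed to be SMILES strings that have already been generated from the molecules, and whether the library preserves input order or skips failed conversions is not stated in this reference.

```python
def unique_preserving_order(smiles_list):
    """Return unique SMILES strings in first-seen order.

    Entries that are None or empty are skipped; in this sketch they
    stand in for molecules that could not be converted to SMILES.
    """
    seen = set()
    unique = []
    for smiles in smiles_list:
        if not smiles:
            continue
        if smiles not in seen:
            seen.add(smiles)
            unique.append(smiles)
    return unique


# Hypothetical input with duplicates and failed conversions.
raw = ["CCO", "CCO", None, "c1ccccc1", "", "CCO", "CC"]
deduplicated = unique_preserving_order(raw)
```

String comparison only detects duplicates when the SMILES are written in a consistent (canonical) form, since the same molecule can be expressed by more than one valid SMILES string.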