Cross Validator
The CrossValidator works with any model that follows the scikit-learn estimator interface, which makes it suitable for QSAR model development and validation. It supports both standard K-Fold and Stratified K-Fold cross-validation, so it can be applied to a wide range of QSAR datasets, including imbalanced ones.
The evaluation methods of the CrossValidator class assess QSAR models with performance metrics such as R squared, cross-validation score, and mean squared error, giving a fuller picture of how a model behaves on seen and unseen data.
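To illustrate why stratification matters for imbalanced data, the following minimal sketch compares scikit-learn's KFold and StratifiedKFold directly. It does not use CrossValidator itself; the toy labels and the helper name minority_counts are assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Toy imbalanced labels: 18 majority samples followed by 6 minority samples.
y = np.array([0] * 18 + [1] * 6)
X = np.zeros((len(y), 1))  # features are irrelevant for this comparison


def minority_counts(splitter):
    """Count minority-class samples in each test fold (hypothetical helper)."""
    return [int(y[test_idx].sum()) for _, test_idx in splitter.split(X, y)]


plain = minority_counts(KFold(n_splits=3))
stratified = minority_counts(StratifiedKFold(n_splits=3))

print(plain)       # [0, 0, 6]: unshuffled K-Fold puts every minority sample in one fold
print(stratified)  # [2, 2, 2]: stratification keeps the class ratio in each fold
```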
- class qsar.utils.cross_validator.CrossValidator(df: DataFrame)
Bases: object
Class for cross-validation related functionalities.
- Variables:
df (pd.DataFrame) – DataFrame containing the data.
- create_cv_folds(df: DataFrame | None = None, y: str = 'Log_MP_RATIO', n_folds: int = 3, n_groups: int = 5) tuple
Create cross-validation folds.
- Parameters:
df (pd.DataFrame, optional) – DataFrame to be used. If not provided, a default will be used.
y (str, optional) – Target column name. Defaults to 'Log_MP_RATIO'.
n_folds (int, optional) – Number of folds. Defaults to 3.
n_groups (int, optional) – Number of groups for stratified k-fold. Defaults to 5.
- Returns:
A tuple containing a list of feature sets, a list of targets, a DataFrame with fold information, the target column name, and the number of folds.
- Return type:
tuple
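One plausible way to build stratified folds for a continuous target is to bin the target into n_groups quantile groups and stratify on the bin label. The sketch below shows that idea with pandas and scikit-learn. It is an assumption about the technique, not the library's actual implementation, and the helper make_stratified_folds and the "group" and "fold" column names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold


def make_stratified_folds(df, y="Log_MP_RATIO", n_folds=3, n_groups=5, seed=0):
    """Assign a fold index to every row, stratifying on quantile bins of y."""
    out = df.reset_index(drop=True).copy()
    # Bin the continuous target so that StratifiedKFold has discrete labels.
    out["group"] = pd.qcut(out[y], q=n_groups, labels=False, duplicates="drop")
    out["fold"] = -1
    features = out.drop(columns=[y, "group", "fold"])
    skf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)
    for fold, (_, test_idx) in enumerate(skf.split(features, out["group"])):
        # After reset_index, positional indices equal index labels.
        out.loc[test_idx, "fold"] = fold
    return out


rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "x1": rng.normal(size=30),
    "x2": rng.normal(size=30),
    "Log_MP_RATIO": rng.normal(size=30),
})
folds = make_stratified_folds(demo)
print(folds.groupby("fold")["group"].nunique())
```

Each fold then contains rows from every target bin, so no fold is dominated by unusually high or low target values.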
- cross_value_score(model, df: DataFrame | None = None) float
Compute cross-validation score for the given model.
- Parameters:
model (Model) – The model to be evaluated.
df (pd.DataFrame, optional) – DataFrame to be used. If not provided, the default is used.
- Returns:
Mean cross-validation score.
- Return type:
float
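A mean cross-validation score of this kind can be reproduced with scikit-learn's cross_val_score. The sketch below is illustrative only: the synthetic data, the 3-fold KFold splitter, and the "r2" scoring choice are assumptions, and the scoring metric used by cross_value_score is not stated in this reference.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic regression data standing in for descriptors and a target.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=60)

# One score per fold, then the mean across folds.
cv = KFold(n_splits=3, shuffle=True, random_state=0)
scores = cross_val_score(LinearRegression(), X, y, cv=cv, scoring="r2")
mean_score = float(scores.mean())
print(scores, mean_score)
```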
- evaluate_model_performance(model, x_train, y_train, x_test, y_test) dict
Compute various scores for model evaluation.
- Parameters:
model (Model) – The model to be evaluated.
x_train (pd.DataFrame) – Training feature set.
y_train (pd.DataFrame) – Training target set.
x_test (pd.DataFrame) – Testing feature set.
y_test (pd.DataFrame) – Testing target set.
- Returns:
A tuple containing the R squared score, CV score, custom CV score, and Q squared score.
- Return type:
tuple
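As a rough illustration of two of the reported scores, the sketch below computes R squared on the training set and an external Q squared on the test set. The Q squared formula shown (prediction error on the test set relative to the spread of the test targets around the training mean, often written Q²_F1) is one common definition and is assumed here; this reference does not say which variant the class uses. The helper name q2_f1 is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score


def q2_f1(y_train, y_test, y_test_pred):
    """External Q squared: 1 - PRESS / SS, with SS taken around the training mean."""
    press = np.sum((y_test - y_test_pred) ** 2)
    ss = np.sum((y_test - np.mean(y_train)) ** 2)
    return 1.0 - press / ss


rng = np.random.default_rng(1)
X = rng.normal(size=(80, 2))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=80)
x_train, x_test, y_train, y_test = X[:60], X[60:], y[:60], y[60:]

model = LinearRegression().fit(x_train, y_train)
r2_train = r2_score(y_train, model.predict(x_train))      # fit quality on seen data
q2_test = q2_f1(y_train, y_test, model.predict(x_test))   # predictivity on unseen data
print(r2_train, q2_test)
```

A large gap between the training R squared and the test Q squared is a common sign of overfitting.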
- static get_predictions(model, x_train: DataFrame, y_train: DataFrame, x_test: DataFrame) tuple
Get predictions using the provided model.
- Parameters:
model (object or model instance) – The model to be used for prediction.
x_train (pd.DataFrame) – Training feature set.
y_train (pd.DataFrame) – Training target set.
x_test (pd.DataFrame) – Testing feature set.
- Returns:
A tuple containing predictions on the training set and predictions on the testing set.
- Return type:
tuple
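The fit-then-predict pattern behind a method like this can be sketched with any scikit-learn estimator. Whether get_predictions fits the model itself or expects a fitted one is not stated here, so the sketch assumes it fits first; the helper fit_and_predict, the Ridge model, and the column names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge


def fit_and_predict(model, x_train, y_train, x_test):
    """Fit on the training data, then predict on both training and test features."""
    model.fit(x_train, y_train)
    return model.predict(x_train), model.predict(x_test)


rng = np.random.default_rng(2)
x_train = pd.DataFrame(rng.normal(size=(20, 2)), columns=["d1", "d2"])
y_train = pd.DataFrame({"target": x_train["d1"] - x_train["d2"]})
x_test = pd.DataFrame(rng.normal(size=(5, 2)), columns=["d1", "d2"])

train_pred, test_pred = fit_and_predict(Ridge(alpha=1.0), x_train, y_train, x_test)
print(len(train_pred), len(test_pred))
```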