Custom Preprocessing
LowVarianceRemover and HighCorrelationRemover are sklearn-compatible transformers, so they can be used as steps in an sklearn Pipeline. They remove features automatically according to variance and correlation thresholds, which shortens the preprocessing needed before QSAR modeling.
The PreprocessingPipeline class chains the two transformers into a single pipeline: LowVarianceRemover is applied first, and HighCorrelationRemover is applied to its output. The resulting pipeline can be combined with other sklearn components for QSAR data preparation.
- class qsar.preprocessing.custom_preprocessing.HighCorrelationRemover(df_correlation, df_corr_y, threshold, verbose)
Bases: BaseEstimator, TransformerMixin
Custom transformer to remove features with high correlation from a dataset.
- fit(x, y=None)
Fit the transformer.
- Parameters:
x – input features.
y – target variable.
- Returns:
self.
- set_fit_request(*, x: bool | None | str = '$UNCHANGED$') -> HighCorrelationRemover
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
x (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for x parameter in fit.
- Returns:
self – The updated object.
- Return type:
object
- set_transform_request(*, x: bool | None | str = '$UNCHANGED$') -> HighCorrelationRemover
Request metadata passed to the transform method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
x (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for x parameter in transform.
- Returns:
self – The updated object.
- Return type:
object
- transform(x)
Transform the input features.
- Parameters:
x – input features.
- Returns:
transformed features (without highly correlated features).
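The fit/transform split for correlation-based removal can be illustrated with a minimal standard-library sketch. It is not the qsar implementation: the class name SketchHighCorrelationRemover, the dict-of-lists input format, and the tie-breaking rule (of two highly correlated features, keep the one more correlated with the target) are all assumptions made for illustration.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # undefined for a constant column; treated as 0 in this sketch
    return cov / sqrt(var_a * var_b)


class SketchHighCorrelationRemover:
    """Hypothetical stand-in with the same fit/transform shape (assumption)."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold

    def fit(self, x, y=None):
        # x: dict mapping feature name -> list of values; y: target values.
        names = list(x)
        # Relevance of each feature: |correlation with y|, or 0 without a target.
        relevance = {
            n: abs(pearson(x[n], y)) if y is not None else 0.0 for n in names
        }
        self.to_drop_ = set()
        for i, first in enumerate(names):
            for second in names[i + 1:]:
                if first in self.to_drop_:
                    break  # this feature was already removed
                if second in self.to_drop_:
                    continue
                if abs(pearson(x[first], x[second])) > self.threshold:
                    # Assumed rule: drop the member of the pair that tracks
                    # the target less well (ties keep the earlier column).
                    loser = second if relevance[first] >= relevance[second] else first
                    self.to_drop_.add(loser)
        return self

    def transform(self, x):
        # Return only the features that were not marked during fit.
        return {n: v for n, v in x.items() if n not in self.to_drop_}


features = {"a": [1, 2, 3, 4, 5], "b": [2, 4, 6, 8, 13], "c": [5, 1, 4, 2, 3]}
target = [0.5, 1.0, 1.5, 2.0, 2.5]
remover = SketchHighCorrelationRemover(threshold=0.9).fit(features, target)
reduced = remover.transform(features)
print(sorted(remover.to_drop_), list(reduced))  # ['b'] ['a', 'c']
```

Deciding which columns to drop in fit and only applying that decision in transform means the same columns are removed from training and test data.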
- class qsar.preprocessing.custom_preprocessing.LowVarianceRemover(y, variance_threshold, cols_to_ignore, verbose)
Bases: BaseEstimator, TransformerMixin
Custom transformer to remove features with low variance from a dataset.
- fit(x, y=None)
Fit the transformer.
- Parameters:
x – input features.
y – target variable.
- Returns:
self.
- set_fit_request(*, x: bool | None | str = '$UNCHANGED$') -> LowVarianceRemover
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
x (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for x parameter in fit.
- Returns:
self – The updated object.
- Return type:
object
- set_transform_request(*, x: bool | None | str = '$UNCHANGED$') -> LowVarianceRemover
Request metadata passed to the transform method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
x (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for x parameter in transform.
- Returns:
self – The updated object.
- Return type:
object
- transform(x)
Transform the input features.
- Parameters:
x – input features.
- Returns:
transformed features.
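As an illustration of variance-based removal, the sketch below uses only the standard library. It is not the qsar implementation: the class name SketchLowVarianceRemover and the dict-of-lists input are hypothetical, and two behaviours are assumed rather than documented here: a feature is dropped when its population variance is at or below variance_threshold, and columns listed in cols_to_ignore are exempt from the check.

```python
from statistics import pvariance


class SketchLowVarianceRemover:
    """Hypothetical stand-in with the same fit/transform shape (assumption)."""

    def __init__(self, variance_threshold=0.0, cols_to_ignore=None):
        self.variance_threshold = variance_threshold
        self.cols_to_ignore = list(cols_to_ignore or [])

    def fit(self, x, y=None):
        # x: dict mapping feature name -> list of numeric values.
        # Mark every non-ignored column whose variance does not exceed
        # the threshold (assumed comparison: variance <= threshold).
        self.to_drop_ = [
            name
            for name, values in x.items()
            if name not in self.cols_to_ignore
            and pvariance(values) <= self.variance_threshold
        ]
        return self

    def transform(self, x):
        # Return the input without the columns marked during fit.
        return {n: v for n, v in x.items() if n not in self.to_drop_}


data = {
    "id": [7, 7, 7],            # constant, but protected by cols_to_ignore
    "const": [1, 1, 1],         # constant, so variance is 0
    "mw": [180.2, 94.1, 310.4], # varies, so it is kept
}
remover = SketchLowVarianceRemover(variance_threshold=0.0, cols_to_ignore=["id"])
kept = remover.fit(data).transform(data)
print(remover.to_drop_, list(kept))  # ['const'] ['id', 'mw']
```

With the default threshold of 0, only constant columns are removed, which matches the variance_threshold=0 default shown for PreprocessingPipeline.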
- class qsar.preprocessing.custom_preprocessing.PreprocessingPipeline(target='Log_MP_RATIO', variance_threshold=0, cols_to_ignore=None, verbose=False, threshold=0.9)
Bases: object
Custom preprocessing pipeline composed of two custom transformers:
- LowVarianceRemover: removes features with low variance
- HighCorrelationRemover: removes features with high correlation
- get_pipeline()
Get the preprocessing pipeline composed of two custom transformers. Start with LowVarianceRemover and then apply HighCorrelationRemover to the output of the first transformer.
- Returns:
preprocessing pipeline.
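The ordering described for get_pipeline() (variance filter first, correlation filter on its output) can be sketched with plain functions. Everything in the block is hypothetical: the function names, the dict-of-lists input, and the greedy keep-first rule in the correlation step are assumptions for illustration, not the behaviour of the qsar classes.

```python
from math import sqrt
from statistics import pvariance


def pearson(a, b):
    """Pearson correlation; divides by zero if either column is constant."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / sqrt(var_a * var_b)


def drop_low_variance(columns, variance_threshold=0.0):
    """Step 1: keep columns whose population variance exceeds the threshold."""
    return {n: v for n, v in columns.items() if pvariance(v) > variance_threshold}


def drop_high_correlation(columns, threshold=0.9):
    """Step 2: keep a column only if it is not too correlated with one already kept."""
    kept = {}
    for name, values in columns.items():
        if all(abs(pearson(values, other)) <= threshold for other in kept.values()):
            kept[name] = values
    return kept


def run_steps(columns, steps):
    """Apply each step to the output of the previous one, in order."""
    for step in steps:
        columns = step(columns)
    return columns


columns = {
    "const": [1.0, 1.0, 1.0, 1.0],  # removed by step 1
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.1, 3.9, 6.2, 8.0],      # nearly 2 * a, removed by step 2
    "c": [4.0, 1.0, 3.0, 2.0],
}
result = run_steps(columns, [drop_low_variance, drop_high_correlation])
print(list(result))  # ['a', 'c']
```

One plausible reason for this order is that Pearson correlation is undefined for a constant column: in this sketch, running drop_high_correlation before drop_low_variance on the same data raises ZeroDivisionError.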