py_madaclim main package
API docs for both info and raster_manipulation modules under the main py-madaclim package.
You can find the API docs for the py-madaclim.utils sub-module here
py_madaclim.info module
- class py_madaclim.info.MadaclimLayers(clim_raster: Path | None = None, env_raster: Path | None = None)[source]
Bases:
object
A class that represents all of the information and data from the climate and environmental variable layers that can be found from the rasters of the Madaclim database.
The main metadata retrieval tool for the Madaclim database. Access the information for all layers with the all_layers attribute. The class also provides methods to filter layers and to generate unique labels from the all_layers attribute. The CRS and band numbers of the climate and environmental rasters are accessible when the rasters are provided in the constructor. Categorical data can be explored in detail with the categorical_layers attribute, and the value:category pairs with the get_categorical_combinations method.
- clim_raster
The path to the Madaclim climate raster GeoTiff file. Defaults to None if not specified.
- Type:
pathlib.Path
- env_raster
The path to the Madaclim environmental raster GeoTiff file. Defaults to None if not specified.
- Type:
pathlib.Path
- all_layers
A DataFrame containing a complete and formatted version of all Madaclim layers.
- Type:
pd.DataFrame
- categorical_layers
A DataFrame containing in-depth information about the layers with categorical data.
- Type:
pd.DataFrame
- property all_layers: DataFrame
Retrieves the ‘all_layers’ Dataframe using the private ‘_get_madaclim_layers’ method.
Contains all information about all the raster layers in the Madaclim db.
- Returns:
A DataFrame containing a complete and formatted version of all Madaclim layers.
- Return type:
pd.DataFrame
- property categorical_layers: DataFrame
Retrieves the ‘categorical_layers’ Dataframe using the private ‘_get_categorical_df’ method.
Contains detailed information about the categorical layers from the rasters in the Madaclim db.
- Returns:
A DataFrame containing information for each categorical value in each layer
- Return type:
pd.DataFrame
- property clim_crs: CRS
Retrieves the Coordinate Reference System (CRS) from the Madaclim climate raster.
This property first validates the clim_raster attribute, ensuring its integrity and existence. It then opens the raster file and retrieves the CRS in EPSG format. The EPSG code is used to create and return a pyproj CRS object.
- Returns:
The CRS object derived from the EPSG code of the climate raster.
- Return type:
pyproj.crs.crs.CRS
Example
The clim_raster attribute must be valid before accessing the clim_crs attribute.
>>> mada_info = MadaclimLayers()
>>> mada_info.clim_crs
Traceback (most recent call last):
    raise AttributeError(f"Undefined attribute: '{raster_attr_name}'. You need to assign a valid pathlib.Path to the related raster attribute first.")
AttributeError: Undefined attribute: 'clim_raster'. You need to assign a valid pathlib.Path to the related raster attribute first.
>>> mada_info.clim_raster = Path("./madaclim_current.tif")
>>> print(mada_info.clim_crs)
EPSG:32738
- property clim_raster: Path
Get or set the path to the climate raster file.
This property allows you to get the current path to the climate raster file, or set a new path. If setting a new path, the value must be a pathlib.Path object or a str. If the value is a str, it will be converted to a pathlib.Path object. The path must exist, otherwise a FileNotFoundError will be raised.
- Returns:
The current path to the climate raster file.
- Raises:
TypeError – If the new path is not a pathlib.Path object or str.
ValueError – If the new path cannot be converted to a pathlib.Path object.
FileNotFoundError – If the new path does not exist.
- Type:
pathlib.Path
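The get/set behaviour described for this property can be sketched with a plain Python property. This is a minimal illustration and not the library's implementation: the class name `RasterHolder` is hypothetical, and only the documented checks (accepted types, conversion of a str to a `pathlib.Path`, existence of the path) are reproduced.

```python
from pathlib import Path
from typing import Optional, Union


class RasterHolder:
    """Hypothetical stand-in for a class exposing a validated raster path."""

    def __init__(self, clim_raster: Optional[Union[str, Path]] = None) -> None:
        self._clim_raster: Optional[Path] = None
        if clim_raster is not None:
            self.clim_raster = clim_raster  # reuse the setter's validation

    @property
    def clim_raster(self) -> Optional[Path]:
        return self._clim_raster

    @clim_raster.setter
    def clim_raster(self, value: Union[str, Path]) -> None:
        # Reject anything that is neither a str nor a Path.
        if not isinstance(value, (str, Path)):
            raise TypeError("clim_raster must be a pathlib.Path or a str.")
        path = Path(value)  # a str is converted to a Path
        if not path.exists():
            raise FileNotFoundError(f"No such file: {path}")
        self._clim_raster = path
```

Routing the constructor argument through the setter keeps a single validation path, so a path passed at construction time is checked exactly like one assigned later.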
- download_data(save_dir: Path | None = None)[source]
Downloads climate and environment raster files from the Madaclim website.
This method downloads the climate and environment raster data from the Madaclim website and saves them to the specified directory. If no directory is specified, the data is saved to the current working directory.
- Parameters:
save_dir (Optional[pathlib.Path]) – The directory where the data should be saved. If not specified, the data is saved to the current working directory.
- Raises:
ValueError – If save_dir is not a directory.
- property env_crs: CRS
Retrieves the Coordinate Reference System (CRS) from the Madaclim environmental raster.
This property first validates the env_raster attribute, ensuring its integrity and existence. It then opens the raster file and retrieves the CRS in EPSG format. The EPSG code is used to create and return a pyproj CRS object.
- Returns:
The CRS object derived from the EPSG code of the environmental raster.
- Return type:
pyproj.crs.crs.CRS
Example
The env_raster attribute must be valid before accessing the env_crs attribute.
>>> mada_info = MadaclimLayers()
>>> mada_info.env_crs
Traceback (most recent call last):
    raise AttributeError(f"Undefined attribute: '{raster_attr_name}'. You need to assign a valid pathlib.Path to the related raster attribute first.")
AttributeError: Undefined attribute: 'env_raster'. You need to assign a valid pathlib.Path to the related raster attribute first.
>>> mada_info.env_raster = Path("./madaclim_current.tif")
>>> print(mada_info.env_crs)
EPSG:32738
- property env_raster: Path
Get or set the path to the environment raster file.
This property allows you to get the current path to the environment raster file, or set a new path. If setting a new path, the value must be a pathlib.Path object or a str. If the value is a str, it will be converted to a pathlib.Path object. The path must exist, otherwise a FileNotFoundError will be raised.
- Returns:
The current path to the environment raster file.
- Raises:
TypeError – If the new path is not a pathlib.Path object or str.
ValueError – If the new path cannot be converted to a pathlib.Path object.
FileNotFoundError – If the new path does not exist.
- Type:
pathlib.Path
- fetch_specific_layers(layers_labels: int | str | List[int | str], *args: str) dict | DataFrame [source]
Fetches specific layers from the all_layers DataFrame based on the given input and returns either the entire rows or certain columns as a dictionary.
- Parameters:
layers_labels (Union[int, str, List[Union[int, str]]]) – The layer labels to fetch. Can be a single int or str value, or a list of int or str values. The input can also be in the format “layer_{num}” or “{geotype}_{num}_{name}_({description})” (output from get_layers_labels(as_descriptive_labels=True) method).
*args (str) – Optional. One or more column names in the all_layers DataFrame. If specified, only these columns will be returned as a dictionary.
- Returns:
If args is specified, returns a nested dictionary with the format:
{
    layer_<num>: {
        <arg1>: <value>,
        <arg2>: <value>,
        …
    }
}
Otherwise, returns a DataFrame with the specified layers.
- Return type:
Union[dict, pd.DataFrame]
- Raises:
TypeError – If any value in layers_labels cannot be converted to an int or is not in the “layer_{num}” format.
ValueError – If any layer_number does not fall between the minimum and maximum layer numbers.
KeyError – If any value in args is not a column in all_layers DataFrame.
Examples
Using a list of layer numbers
>>> mada_info = MadaclimLayers()
>>> mada_info.fetch_specific_layers([1, 15, 55, 71])
   geoclim_type  layer_number layer_name                      layer_description  is_categorical         units
0          clim             1      tmin1  Monthly minimum temperature - January           False       °C x 10
14         clim            15      tmax3    Monthly maximum temperature - March           False       °C x 10
54         clim            55      bio19       Precipitation of coldest quarter           False  mm.3months-1
70          env            71        alt                               Altitude           False        meters
Using the output from get_layers_labels method
>>> bioclim_labels = [label for label in mada_info.get_layers_labels(as_descriptive_labels=True) if "bio" in label]
>>> bio1_bio2_labels = bioclim_labels[0:2]
>>> mada_info.fetch_specific_layers(bio1_bio2_labels)
   geoclim_type  layer_number layer_name        layer_description  is_categorical                                       units
36         clim            37       bio1  Annual mean temperature           False                                     degrees
37         clim            38       bio2       Mean diurnal range           False  mean of monthly max temp - monthy min temp
>>> # Or from descriptive_labels as well
>>> pet_layers = [label for label in mada_info.get_layers_labels(as_descriptive_labels=True) if "pet" in label]
>>> len(pet_layers)
13
>>> mada_info.fetch_specific_layers(pet_layers[-1])
   geoclim_type  layer_number layer_name                                  layer_description  is_categorical units
67         clim            68        pet  Annual potential evapotranspiration from the T...           False    mm
Fetch as a dict with layer_<num> keys and the columns of your choice by passing column names as additional arguments
>>> mada_info.fetch_specific_layers([55, 75], "geoclim_type", "layer_name", "is_categorical")
{
    'layer_55': {
        'geoclim_type': 'clim',
        'layer_name': 'bio19',
        'is_categorical': False
    },
    'layer_75': {
        'geoclim_type': 'env',
        'layer_name': 'geo',
        'is_categorical': True
    }
}
>>> # Only column names are accepted as additional args
>>> bio1 = next((layer for layer in mada_info.get_layers_labels(as_descriptive_labels=True) if "bio1" in layer), None)
>>> mada_info.fetch_specific_layers(bio1, "band_number")
Traceback (most recent call last):
    if not min_layer <= layer_number <= max_layer:
KeyError: "Invalid args: ['band_number']. Args must be one of a key of ['geoclim_type', 'layer_number', 'layer_name', 'layer_description', 'is_categorical', 'units'] or 'all'"
>>> # Get all keys with the "all" argument
>>> mada_info.fetch_specific_layers(bio1, "all")
{
    'layer_37': {
        'geoclim_type': 'clim',
        'layer_number': 37,
        'layer_name': 'bio1',
        'layer_description': 'Annual mean temperature',
        'is_categorical': False,
        'units': 'degrees'
    }
}
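The accepted label formats (an int, a numeric str, "layer_{num}", or a descriptive label starting with "{geotype}_{num}_") all reduce to a layer number before any lookup. The following is a rough sketch of that normalization, not the library's code: the function names and regular expressions are hypothetical, and the 1 to 79 range is taken from the 79 layers reported in the examples on this page.

```python
import re
from typing import List, Union

# "layer_37" style labels.
_BASIC = re.compile(r"^layer_(\d+)$")
# Descriptive labels start with "<geoclim_type>_<num>_", e.g. "clim_37_bio1_...".
_DESCRIPTIVE = re.compile(r"^(?:clim|env)_(\d+)_")


def to_layer_number(label: Union[int, str]) -> int:
    """Convert one label in any of the accepted formats to a layer number."""
    if isinstance(label, int):
        return label
    if isinstance(label, str):
        for pattern in (_BASIC, _DESCRIPTIVE):
            match = pattern.match(label)
            if match:
                return int(match.group(1))
        if label.isdigit():  # plain numeric string such as "15"
            return int(label)
    raise TypeError(f"Cannot interpret {label!r} as a layer label.")


def to_layer_numbers(
    labels: Union[int, str, List[Union[int, str]]],
    min_layer: int = 1,
    max_layer: int = 79,  # assumed from the 79 layers shown in the examples
) -> List[int]:
    """Normalize a single label or a list of labels, then range-check them."""
    if isinstance(labels, (int, str)):
        labels = [labels]
    numbers = [to_layer_number(label) for label in labels]
    for number in numbers:
        if not min_layer <= number <= max_layer:
            raise ValueError(f"layer_number {number} is out of range.")
    return numbers
```

With this sketch, `to_layer_numbers([1, "15", "layer_55", "env_71_alt_Altitude (meters)"])` yields `[1, 15, 55, 71]`, which mirrors the TypeError/ValueError split documented in the Raises section.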
- get_bandnums_from_layers(layers_labels: int | str | List[int | str]) List[int] [source]
Retrieves band numbers corresponding to the provided layers’ labels.
This method accepts labels for a subset of layers (specified as either layer numbers, “layer_<num>” format, or descriptive labels) and returns the corresponding band numbers from the all_layers dataframe. If the input is in the descriptive label format or “layer_<num>” format, it should match the output of the get_layers_labels method.
- Parameters:
layers_labels (Union[int, str, List[Union[int, str]]]) – A list of layer labels in various formats, or a single layer label.
- Raises:
TypeError – If elements of layers_labels cannot be converted to int or if they do not match the format produced by the get_layers_labels method.
ValueError – If the derived layer numbers do not fall within the valid range of layer numbers in the all_layers dataframe.
- Returns:
A list of band numbers corresponding to the provided layer labels.
- Return type:
List[int]
Example
>>> mada_info = MadaclimLayers(clim_raster="madaclim_current.tif", env_raster="madaclim_enviro.tif")
>>> last_20 = mada_info.get_layers_labels()[-20:]
>>> band_nums = mada_info.get_bandnums_from_layers(last_20)
>>> band_nums
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 1, 2, 3, 4, 5, 6, 7, 8, 9]
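The output in the example suggests a simple relationship between layer numbers and band numbers: layers 1 to 70 keep their number as bands of the climate raster, while layers 71 to 79 restart at band 1 in the environmental raster. The sketch below reproduces that mapping under this assumption only; `layer_to_band` and `N_CLIM_LAYERS` are hypothetical names, and the real method reads band numbers from the all_layers DataFrame.

```python
from typing import List

N_CLIM_LAYERS = 70  # assumption inferred from the example: layers 1-70 are climate layers
N_TOTAL_LAYERS = 79  # total number of layers reported by get_layers_labels()


def layer_to_band(layer_number: int) -> int:
    """Map a global layer number to the band number inside its own raster."""
    if not 1 <= layer_number <= N_TOTAL_LAYERS:
        raise ValueError(f"layer_number {layer_number} is out of range.")
    if layer_number <= N_CLIM_LAYERS:
        return layer_number  # climate raster: band == layer number
    return layer_number - N_CLIM_LAYERS  # environmental raster restarts at band 1


# Same selection as the example: the last 20 layers (60 to 79).
last_20: List[int] = list(range(60, 80))
band_nums = [layer_to_band(number) for number in last_20]
```

The band number alone does not say which raster to open, so a caller still needs the geoclim type of each layer to pair a band with the climate or the environmental file.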
- get_categorical_combinations(layers_labels: int | str | List[int | str] | None = None, as_descriptive_keys: bool = False) dict | Dict[str, Dict[int, str]] [source]
Returns a dictionary representation of the specified categorical layers, mapping each categorical value to its category.
- Parameters:
layers_labels (Optional[Union[int, str, List[Union[int, str]]]]) – The layer labels to fetch. Can be a single integer or string value, or a list of integer or string values. The input can also be in the format “layer_{num}” or “{geotype}_{num}_{name}_({unit})” (output from get_layers_labels(as_descriptive_labels=True) method). If layers_labels is None, all categorical layers are fetched.
as_descriptive_keys (bool) – If True, returns the descriptive layer labels. Otherwise, returns the “layer_<num>” format. Defaults to False
- Raises:
TypeError – If layers_labels is not a list of integers or strings, a single integer or a string that can be converted to an integer, or in the output format from the 'get_layers_labels' method.
ValueError – If a layer number in layers_labels is not a valid categorical layer number.
- Returns:
A dictionary of the specified categorical layers. If multiple layers were specified, the dictionary keys are ‘layer_{num}’, and the values are dictionaries with layer values as keys and their corresponding categories as values. If a single layer was specified, the dictionary keys are the categorical values, and the values are the categories themselves.
- Return type:
Union[dict, Dict[str, Dict[int, str]]]
Examples
If multiple layers are specified, it returns:
>>> madaclim_info = MadaclimLayers()
>>> madaclim_info.get_categorical_combinations([75, 76])
{
    'layer_75': {
        1: 'N-Bemarivo',
        2: 'S-Bemarivo,_N-Mangoro',
        ...
    },
    'layer_76': {
        1: 'Bare_Rocks',
        2: 'Raw_Lithic_Mineral_Soils',
        ...
    },
    ...
}
If a single layer is specified, it returns:
>>> madaclim_info.get_categorical_combinations("layer_76")
{
    'layer_76': {
        1: 'Bare_Rocks',
        2: 'Raw_Lithic_Mineral_Soils',
        ...
    }
}
For more descriptive keys (same output from as_descriptive_labels)
>>> madaclim_info.get_categorical_combinations("layer_76", as_descriptive_keys=True)
{
    'env_76_soi_Soil types (categ_vals: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23)': {
        1: 'Bare_Rocks',
        2: 'Raw_Lithic_Mineral_Soils',
        ...
    }
}
- get_layers_labels(layers_subset: str | List[int] | None = None, as_descriptive_labels: bool = False) list [source]
Retrieves unique layer labels based on the provided subset of layers.
This method fetches the unique labels from the all_layers dataframe, given a subset of layers (specified as either layer numbers, a geoclim_type, or a single layer number). The layer labels can be returned in a descriptive format if as_descriptive_labels is set to True.
- Parameters:
layers_subset (Optional[Union[str, List[int]]], optional) – A list of layer numbers or a geoclim_type string to subset the labels from, or a single layer number as a string or int. Defaults to None, which will select all layers (no subset).
as_descriptive_labels (bool, optional) – If True, returns the descriptive layer labels. Otherwise, returns the “layer_<num>” format. Defaults to False.
- Raises:
TypeError – If elements of layers_subset cannot be converted to int.
ValueError – If layers_subset is a string not in possible_geoclim_types, cannot be converted to int, or if the ‘layer_number’ and ‘layer_name’ columns in the all_layers dataframe have non-unique entries.
- Returns:
A list of unique layer labels. These labels are either in the “layer_<num>” format or the descriptive format, based on as_descriptive_labels.
- Return type:
list
Examples
Get labels for all layers
>>> mada_info = MadaclimLayers()
>>> all_layers = mada_info.get_layers_labels()
>>> len(all_layers)
79
>>> # Basic format 'layer_<num>'
>>> all_layers[:5]
['layer_1', 'layer_2', 'layer_3', 'layer_4', 'layer_5']
Specify a geoclim subset
>>> env_layers = mada_info.get_layers_labels(layers_subset="env")
>>> env_layers
['layer_71', 'layer_72', 'layer_73', 'layer_74', 'layer_75', 'layer_76', 'layer_77', 'layer_78', 'layer_79']
Extract more information
>>> informative_labels = mada_info.get_layers_labels(as_descriptive_labels=True)
>>> informative_labels[:2]
['clim_1_tmin1_Monthly minimum temperature - January (°C x 10)', 'clim_2_tmin2_Monthly minimum temperature - February (°C x 10)']
Specify a single layer or a subset of layers
>>> mada_info.get_layers_labels(37, as_descriptive_labels=True)
['clim_37_bio1_Annual mean temperature (degrees)']
>>> mada_info.get_layers_labels([68, 75], as_descriptive_labels=True)
['clim_68_pet_Annual potential evapotranspiration from the Thornthwaite equation (mm)', 'env_75_geo_Rock types (categ_vals: 1, 2, 4, 5, 6, 7, 9, 10, 11, 12, 13)']
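Judging from the outputs shown in the examples, a descriptive label concatenates the geoclim type, layer number, layer name and description, followed by the units in parentheses. The sketch below rebuilds both label formats from plain dictionaries. It is an illustration under that assumption: `make_label` and the `records` list are hypothetical, and the library builds its labels from the all_layers DataFrame.

```python
from typing import Dict, List, Union

# Two rows copied from the all_layers examples on this page.
records: List[Dict[str, Union[str, int]]] = [
    {
        "geoclim_type": "clim",
        "layer_number": 37,
        "layer_name": "bio1",
        "layer_description": "Annual mean temperature",
        "units": "degrees",
    },
    {
        "geoclim_type": "env",
        "layer_number": 71,
        "layer_name": "alt",
        "layer_description": "Altitude",
        "units": "meters",
    },
]


def make_label(record: Dict[str, Union[str, int]], as_descriptive_labels: bool = False) -> str:
    """Build either the basic 'layer_<num>' label or the descriptive one."""
    if not as_descriptive_labels:
        return f"layer_{record['layer_number']}"
    return (
        "{geoclim_type}_{layer_number}_{layer_name}_"
        "{layer_description} ({units})"
    ).format(**record)


basic_labels = [make_label(record) for record in records]
descriptive_labels = [make_label(record, as_descriptive_labels=True) for record in records]
```

Because the layer number is embedded in both formats, either label can be converted back to a layer number, which is what allows other methods to accept the output of get_layers_labels directly.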
- select_geoclim_type_layers(geoclim_type: str) DataFrame [source]
Method that selects the desired geoclimatic type layers as a dataframe.
- Parameters:
geoclim_type (str) – The desired geoclimatic layers type to extract.
- Returns:
A slice of the all_layers dataframe containing the desired geoclimatic type layers.
- Return type:
pd.DataFrame
- Raises:
TypeError – If geoclim_type is not a string.
ValueError – If geoclim_type does not correspond to a valid geoclim type.
py_madaclim.raster_manipulation module
- class py_madaclim.raster_manipulation.MadaclimCollection(madaclim_points: MadaclimPoint | List[MadaclimPoint] | None = None)[source]
Bases:
object
- add_points(madaclim_points: MadaclimPoint | List[MadaclimPoint]) None [source]
Adds one or more MadaclimPoint objects to the MadaclimCollection.
- Parameters:
madaclim_points (Union[MadaclimPoint, List[MadaclimPoint]]) – A single MadaclimPoint object or a list of MadaclimPoint objects to be added to the MadaclimCollection.
- Raises:
TypeError – If the input is not a MadaclimPoint object or a list of MadaclimPoint objects.
ValueError – If the input MadaclimPoint(s) is/are already in the MadaclimCollection or if their specimen_id(s) are not unique.
Examples
Add a single point
>>> from py_madaclim.raster_manipulation import MadaclimPoint, MadaclimCollection
>>> specimen_1 = MadaclimPoint(specimen_id="spe1", latitude=-23.574583, longitude=46.419806, source_crs="epsg:4326")
>>> collection = MadaclimCollection()
>>> collection
No MadaclimPoint inside the collection.
>>> collection.add_points(specimen_1)
>>> collection
MadaclimCollection = [
    MadaclimPoint(specimen_id=spe1, mada_geom_point=POINT (644890.8921103649 7392153.658976035), sampled=False)
]
Add multiple points
>>> specimen_2 = MadaclimPoint(specimen_id="spe2", latitude=-20.138470, longitude=46.054688, family="Rubiaceae", has_sequencing=True, num_samples=1)
>>> other_collection = MadaclimCollection()
>>> other_collection.add_points([specimen_1, specimen_2])
>>> print(other_collection)
MadaclimCollection = [
    MadaclimPoint(specimen_id=spe1, mada_geom_point=POINT (644890.8921103649 7392153.658976035), sampled=False),
    MadaclimPoint(specimen_id=spe2, mada_geom_point=POINT (610233.867750987 7772846.143786541), sampled=False)
]
No duplicates allowed
>>> other_collection.add_points(specimen_1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "…/src/py_madaclim/geoclim/raster_manipulation.py", line 1013, in add_points
ValueError: MadaclimPoint(
    specimen_id = spe1,
    source_crs = 4326,
    latitude = -23.574583,
    longitude = 46.419806,
    mada_geom_point = POINT (644890.8921103649 7392153.658976035),
    sampled_layers = None,
    nodata_layers = None
) is already in the current MadaclimCollection instance.
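The two uniqueness rules listed under Raises (no point added twice, no repeated specimen_id) can be sketched with a small stand-in. The `Point` dataclass and the free function `add_points` below are hypothetical simplifications, not the MadaclimCollection implementation.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Point:
    """Hypothetical, simplified stand-in for MadaclimPoint."""

    specimen_id: str
    latitude: float
    longitude: float


def add_points(collection: List[Point], new_points: Union[Point, List[Point]]) -> None:
    """Append points one at a time, enforcing the documented uniqueness rules."""
    if isinstance(new_points, Point):
        new_points = [new_points]
    for point in new_points:
        if not isinstance(point, Point):
            raise TypeError("Expected a Point or a list of Point objects.")
        if point in collection:
            raise ValueError(f"{point} is already in the collection.")
        if any(existing.specimen_id == point.specimen_id for existing in collection):
            raise ValueError(f"specimen_id {point.specimen_id!r} is not unique.")
        collection.append(point)


points: List[Point] = []
spe1 = Point("spe1", -23.574583, 46.419806)
spe2 = Point("spe2", -20.138470, 46.054688)
add_points(points, [spe1, spe2])
```

Checking the specimen_id separately from object equality matters because two points with different coordinates could otherwise share an identifier, which would make per-specimen dictionaries such as sampled_layers ambiguous.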
- property all_points: list
Get the all_points attribute.
Corresponds to a list of each object in the collection.
- Returns:
A list of all the MadaclimPoint objects in the MadaclimCollection.
- Return type:
list
Examples
>>> # MadaclimPoints are stored in a list in the .all_points attribute
>>> collection.all_points[0]
MadaclimPoint(
    specimen_id = sample_A,
    source_crs = 4326,
    latitude = -18.9333,
    longitude = 48.2,
    mada_geom_point = POINT (837072.9150244407 7903496.320897499),
    sampled_layers = None,
    nodata_layers = None
)
- binary_encode_categorical() None [source]
Binary encodes the categorical layers contained in the sampled_layers attribute.
This method performs binary encoding of the categorical layers found in the raster data for each point in the collection. After the encoding, it updates the collection's attributes for the categorical encoding status and the encoded layers, and updates the gdf by replacing the categorical columns with the binary encoded features.
- Raises:
ValueError – If the MadaclimCollection doesn’t contain any MadaclimPoints or if the raster data has not been sampled yet.
- Returns:
None
Notes
See the MadaclimPoint.binary_encode_categorical method for the logic and the exceptions that may be raised.
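As a rough illustration of what binary encoding a categorical layer means, the sketch below turns one sampled categorical value into a set of 0/1 features, one per category. It assumes a one-hot style encoding and a "<layer label>_<category>" key format. Both are assumptions made for this example, and `binary_encode` is a hypothetical helper rather than the library's method.

```python
from typing import Dict


def binary_encode(
    layer_label: str, sampled_value: int, categories: Dict[int, str]
) -> Dict[str, int]:
    """Return one 0/1 feature per category; 1 marks the sampled category."""
    return {
        f"{layer_label}_{name}": int(value == sampled_value)
        for value, name in categories.items()
    }


# Truncated subset of the soil categories shown for layer_76 in the
# get_categorical_combinations examples.
soil_categories = {1: "Bare_Rocks", 2: "Raw_Lithic_Mineral_Soils"}
encoded = binary_encode("layer_76", 2, soil_categories)
```

The value:category dictionary used here has the same shape as the inner dictionaries returned by get_categorical_combinations, so that method's output could feed an encoder of this kind.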
- property encoded_categ_labels: List[str] | None
Get the labels from the binary encoded categorical layers for the whole collection.
- Returns:
- A list containing the set of the labels from the binary encoding
of categorical features from the whole collection. None if the categorical layers have not been encoded yet.
- Return type:
Optional[List[str]]
- property encoded_categ_layers: Dict[str, Dict[str, int]] | None
Get the binary encoded categorical layers values.
- Returns:
- A nested dictionary containing the set of binary encoded categorical features.
The outer dictionary uses the MadaclimPoint.specimen_id as keys. The corresponding value for each key is another dictionary whose keys are the layer number (or a more descriptive label) combined with the categorical feature. Values correspond to the binary encoded value for that given category.
- Return type:
Optional[Dict[str, Dict[str, int]]]
- property gdf: GeoDataFrame
Get the GeoPandas DataFrame of the collection (concat of all points’ gdfs).
- Returns:
A Geopandas GeoDataFrame generated from instance attributes and Point geometry.
- Return type:
gpd.GeoDataFrame
- property is_categorical_encoded: bool
Get the state of the binary encoding of the categorical layers of the collection.
- Returns:
- The state of the binary_encode_categorical method. If True, the method has been called
and a new set of binary features has been generated for the whole collection. Otherwise, either the layers have not been sampled or the categorical features have not been encoded.
- Return type:
bool
- property nodata_layers: Dict[str, str | List[str]] | None
Get the nodata_layers attribute of the collection generated from the sample_from_rasters method.
This attribute is a dictionary that contains the MadaclimPoint.specimen_id as keys and the values as the ‘nodata_layers’ as str or list of str.
- Returns:
- A dictionary with MadaclimPoint.specimen_id as keys and values of str or list of str of the layers_name with nodata values.
None if Collection has not been sampled yet or all layers sampled contained valid data.
- Return type:
Optional[Dict[str, Union[str, List[str]]]]
- plot_on_layer(layer: str | int, **kwargs) None [source]
Plot a layer as a raster map and distribution plot with a focus on the MadaclimPoint object.
Based on the Madaclim CRS, the MadaclimPoint's geometry (mada_geom_point) is plotted on the raster map and the sampled value is displayed against a distribution of all possible values for that layer in the Madaclim db. Pass additional kwargs to customize each subplot (see **kwargs, _LayerPlotter and _LayerConfig for more details).
- Parameters:
layer (Union[str, int]) – Layer to plot. Accepts layer numbers as integers, or layer labels in descriptive or layer_<num> format.
**kwargs – Additional arguments to customize the subplots, imshow, colorbar, histplot and Point objects from matplotlib. Use the "subplots_<arg>", "imshow_<arg>", "cax_<arg>", and "barplot_<arg>" formats to pass matplotlib/sns arguments to the corresponding base raster and barplot plots. Use "point_<arg>" to customize the Point objects on the raster.
- Returns:
None
- Raises:
ValueError – If the specified layer to plot has not been sampled yet.
ValueError – If the layer label cannot be found within the sampled layers.
- classmethod populate_from_csv(csv_file: str | Path) MadaclimCollection [source]
Creates a new MadaclimCollection from a CSV file.
Each row of the CSV file should represent a MadaclimPoint. The CSV file must have columns that correspond to the arguments of the MadaclimPoint constructor. If a ‘source_crs’ column is not provided, the method uses the default CRS value.
- Parameters:
csv_file (Union[str, pathlib.Path]) – The path to the CSV file.
- Returns:
A new MadaclimCollection instance with MadaclimPoint objects created from the rows of the CSV file.
- Return type:
MadaclimCollection
- Raises:
TypeError – If ‘csv_file’ is not a str or pathlib.Path object.
FileNotFoundError – If the file specified by ‘csv_file’ does not exist.
ValueError – If the CSV file headers are missing required arguments for constructing MadaclimPoint objects.
Examples
CSV requirements for construction
>>> # header must contain req. positional args for MadaclimPoint
>>> # When no source_crs header is found, defaults to EPSG:4326
specimen_id,latitude,longitude
sample_A,-18.9333,48.2
sample_B,-16.295741,46.826763
sample_C,-21.223,47.5204
sample_D,-17.9869,49.2966
sample_E,-21.5166,47.4833
>>> collection = MadaclimCollection.populate_from_csv("some_samples.csv")
Warning! No source_crs column in the csv. Using the default value of EPSG:4326...
Created new MadaclimCollection with 5 samples.
>>> # Can accept other non-required data for MadaclimPoint instantiation
specimen_id,latitude,longitude,source_crs,has_sequencing,specie
sample_F,-19.9333,47.2,4326,True,bojeri
sample_G,-18.295741,45.826763,4326,False,periwinkle
sample_H,-21.223,44.5204,4326,False,spectabilis
>>> other_collection = MadaclimCollection.populate_from_csv("other_samples.csv")
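The header requirements can be illustrated with the standard library's `csv.DictReader`. This is a minimal sketch of the validation described above (required columns present, source_crs defaulting to EPSG:4326) and not the library's parser: `rows_from_csv`, `REQUIRED` and `DEFAULT_CRS` are hypothetical names.

```python
import csv
import io
from typing import Dict, List, TextIO

REQUIRED = ("specimen_id", "latitude", "longitude")
DEFAULT_CRS = "4326"  # EPSG:4326, the documented default


def rows_from_csv(handle: TextIO) -> List[Dict[str, str]]:
    """Read rows as dicts, checking headers and filling a default source_crs."""
    reader = csv.DictReader(handle)
    headers = reader.fieldnames or []
    missing = [name for name in REQUIRED if name not in headers]
    if missing:
        raise ValueError(f"CSV headers are missing required columns: {missing}")
    rows = []
    for row in reader:
        row.setdefault("source_crs", DEFAULT_CRS)  # only added when the column is absent
        rows.append(row)
    return rows


# In-memory stand-in for a file such as "some_samples.csv".
sample = io.StringIO(
    "specimen_id,latitude,longitude\n"
    "sample_A,-18.9333,48.2\n"
    "sample_B,-16.295741,46.826763\n"
)
rows = rows_from_csv(sample)
```

All values come back as strings, so a real constructor would still need to convert latitude and longitude to float. Any extra columns are kept in each row and could be forwarded as keyword arguments.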
- classmethod populate_from_df(df: DataFrame) MadaclimCollection [source]
Class method to populate a MadaclimCollection from a pandas DataFrame.
This method takes a DataFrame where each row represents a MadaclimPoint and its attributes. If the ‘source_crs’ column is not provided in the DataFrame, the default CRS will be used.
- Parameters:
df (pd.DataFrame) – DataFrame where each row represents a MadaclimPoint. Expected columns are the same as the required arguments for the MadaclimPoint constructor.
- Returns:
- A new MadaclimCollection instance populated with MadaclimPoints created from the DataFrame.
- Return type:
MadaclimCollection
- Raises:
TypeError – If ‘df’ is not a pd.DataFrame.
ValueError – If the DataFrame is missing any of the required arguments to construct a MadaclimPoint.
Example
Respect requirements for MadaclimPoint construction in df columns
>>> import pandas as pd
>>> sample_df
  specimen_id   latitude  longitude
0    sample_W -16.295741  46.826763
1    sample_X   -17.9869    49.2966
2    sample_Y   -18.9333    48.2166
3    sample_Z     -13.28      49.95
>>> collection = MadaclimCollection.populate_from_df(sample_df)
Warning! No source_crs column in the df. Using the default value of EPSG:4326...
Creating MadaclimPoint(specimen_id=sample_W...)
Creating MadaclimPoint(specimen_id=sample_X...)
Creating MadaclimPoint(specimen_id=sample_Y...)
Creating MadaclimPoint(specimen_id=sample_Z...)
Created new MadaclimCollection with 4 samples.
- remove_points(*, madaclim_points: MadaclimPoint | List[MadaclimPoint] | None = None, indices: int | List[int] | None = None, clear: bool = False) None [source]
Removes MadaclimPoint objects from the MadaclimCollection based on specified criteria.
This method allows removing MadaclimPoint objects from the collection by providing either MadaclimPoint instance(s), index/indices, or by clearing the whole collection.
- Parameters:
madaclim_points (Optional[Union[MadaclimPoint, List[MadaclimPoint]]], optional) – A single MadaclimPoint object or a list of MadaclimPoint objects to be removed from the collection. Defaults to None.
indices (Optional[Union[int, List[int]]], optional) – A single index or a list of indices of the MadaclimPoint objects to be removed from the collection. Defaults to None.
clear (bool, optional) – If set to True, removes all MadaclimPoint objects from the collection. When using this option, ‘madaclim_points’ and ‘indices’ must not be provided. Defaults to False.
- Raises:
ValueError – If the MadaclimCollection is empty or if none of the input options are provided.
ValueError – If ‘madaclim_points’ and ‘indices’ are both provided.
ValueError – If ‘clear’ is set to True and either ‘madaclim_points’ or ‘indices’ are provided.
TypeError – If an invalid type is provided for ‘madaclim_points’ or ‘indices’.
ValueError – If a provided MadaclimPoint object is not in the collection or if an index is out of bounds.
IndexError – If an index is out of range.
Examples
Remove points by passing in the 'MadaclimPoint' instances, the index or the 'specimen_id'
>>> sample_W = collection.all_points[0]
>>> collection.remove_points(madaclim_points=sample_W)
>>> # Using the position index of the instance
>>> collection.remove_points(indices=-1)    # Removes last point of the collection
>>> # Using the specimen_id attribute
>>> collection.remove_points(madaclim_points="sample_Y")
Remove multiple points
>>> # A list of str or MadaclimPoint or mixed types are accepted for the madaclim_points argument.
>>> sample_w = collection.all_points[0]
>>> to_remove = [sample_w, "sample_X"]
>>> collection.remove_points(madaclim_points=to_remove)
>>> # Or pass in a list of indices to the indices argument.
>>> collection.remove_points(indices=[0, -1])    # Remove first and last point
>>> # Finally we can clear the collection of all instances.
>>> collection.remove_points(clear=True)
No MadaclimPoint inside the collection.
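The three removal options are mutually exclusive, which is the source of most of the ValueError cases listed above. The following sketch shows one way to express that validation with keyword-only arguments. It works on a plain list of specimen IDs and returns a new list. `remove_items` and its `targets` parameter are hypothetical simplifications of remove_points and its madaclim_points argument.

```python
from typing import List, Optional, Union


def remove_items(
    items: List[str],
    *,
    targets: Optional[Union[str, List[str]]] = None,
    indices: Optional[Union[int, List[int]]] = None,
    clear: bool = False,
) -> List[str]:
    """Return a new list with the requested items removed."""
    if not items:
        raise ValueError("The collection is empty.")
    if clear:
        if targets is not None or indices is not None:
            raise ValueError("'clear' cannot be combined with other options.")
        return []
    if targets is not None and indices is not None:
        raise ValueError("Provide either 'targets' or 'indices', not both.")
    if targets is None and indices is None:
        raise ValueError("Provide 'targets', 'indices' or clear=True.")
    if indices is not None:
        index_list = [indices] if isinstance(indices, int) else list(indices)
        resolved = set()
        for index in index_list:
            if not -len(items) <= index < len(items):
                raise IndexError(f"Index {index} is out of range.")
            resolved.add(index % len(items))  # normalize negative indices
        return [item for pos, item in enumerate(items) if pos not in resolved]
    target_list = [targets] if isinstance(targets, str) else list(targets)
    for target in target_list:
        if target not in items:
            raise ValueError(f"{target!r} is not in the collection.")
    return [item for item in items if item not in target_list]


ids = ["sample_W", "sample_X", "sample_Y", "sample_Z"]
without_ends = remove_items(ids, indices=[0, -1])
```

Resolving all indices against the original list before removing anything avoids the shifting-index problem that appears when items are deleted one at a time.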
- sample_from_rasters(clim_raster: Path, env_raster: Path, layers_to_sample: int | str | List[int | str] = 'all', layer_info: bool = False) None [source]
Samples geoclimatic data from raster files for specified layers at the location of each point belonging to the MadaclimCollection’s instance.
Calling this method also updates the sampled_layers attribute with the data extracted from the layers_to_sample for every point in the collection. If the sampled data contains 'nodata' values, the nodata_layers attribute is updated with the names of the corresponding layers. The gdf attribute (GeoDataFrame) is also updated with the sampled_layers.
- Parameters:
clim_raster (pathlib.Path) – Path to the climate raster file.
env_raster (pathlib.Path) – Path to the environment raster file.
layers_to_sample (Union[int, str, List[Union[int, str]]], optional) – The layer number(s) to sample from the raster files. Can be a single int, a single string in the format ‘layer_<num>’, or a list of ints or such strings. Defaults to ‘all’.
layer_info (bool, optional) – Whether to use descriptive labels for the returned dictionary keys. Defaults to False.
- Returns:
None
- Raises:
ValueError – If the MadaclimCollection doesn’t contain any MadaclimPoints.
Notes
This method also updates the ‘sampled_layers’ and ‘nodata_layers’ attributes of the MadaclimCollection instance.
Examples
Sample the value for each point in the collection according to their location
>>> from py_madaclim.raster_manipulation import MadaclimPoint, MadaclimCollection
>>> specimen_1 = MadaclimPoint(specimen_id="spe1_aren", latitude=-18.9333, longitude=48.2, genus="Coffea", species="arenesiana", has_sequencing=True)
>>> specimen_2 = MadaclimPoint(specimen_id="spe2_humb", latitude=-12.716667, longitude=45.066667, source_crs=4326, genus="Coffea", species="humblotiana", has_sequencing=True)
>>> collection = MadaclimCollection()
>>> collection.add_points([specimen_1, specimen_2])
>>> from py_madaclim.info import MadaclimLayers
>>> madaclim_info = MadaclimLayers()
>>> bioclim_labels = [label for label in madaclim_info.get_layers_labels(as_descriptive_labels=True) if "bio" in label]
>>> # Validating the rasters
>>> mada_rasters = MadaclimRasters(clim_raster="madaclim_current.tif", env_raster="madaclim_enviro.tif")
>>> collection.sample_from_rasters(
...     mada_rasters.clim_raster,
...     mada_rasters.env_raster,
...     layers_to_sample=bioclim_labels
... )
Attribute state updating
>>> collection    # sampled status updated
MadaclimCollection = [
    MadaclimPoint(specimen_id=spe1_aren, mada_geom_point=POINT (837072.9150244407 7903496.320897499),sampled=True),
    MadaclimPoint(specimen_id=spe2_humb, mada_geom_point=POINT (507237.57495924993 8594195.741515966),sampled=True)
]
>>> # Results also stored in the `sampled_layers` attribute
>>> collection.sampled_layers["spe2_humb"]["layer_55"]
66
>>> # layers_to_sample also accepts a single layer, or multiple layers as the output from the `get_layers_labels` method in MadaclimLayers
>>> collection.sample_from_rasters(37)
{'spe1_aren': {'layer_37': 196}, 'spe2_humb': {'layer_37': 238}}
- property sampled_layers: Dict[str, Dict[str, int]] | None
Get the sampled_layers attribute of the collection generated from the sample_from_rasters method.
This attribute is a nested dictionary. The outer dictionary uses the MadaclimPoint.specimen_id as keys. The corresponding value for each key is another dictionary, which uses layer_names as keys and sampled values from rasters as values.
- Returns:
- A dictionary with MadaclimPoint.specimen_id as keys and a dictionary of layer_names (str) and sampled values (int) as values.
None if Collection has not been sampled yet.
- Return type:
Optional[Dict[str, Dict[str, int]]]
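The nested layout described above can be sketched with plain dictionaries. This is only an illustration of the data shape, not output produced by the library: the values are copied from the `sample_from_rasters(37)` example, and `flatten_sampled` is a hypothetical helper that is not part of py_madaclim.

```python
from typing import Dict, List, Optional, Tuple

# Shape of a collection's sampled_layers after sampling layer 37
# (values copied from the example output in this documentation).
sampled_layers: Optional[Dict[str, Dict[str, int]]] = {
    "spe1_aren": {"layer_37": 196},
    "spe2_humb": {"layer_37": 238},
}


def flatten_sampled(
    sampled: Optional[Dict[str, Dict[str, int]]]
) -> List[Tuple[str, str, int]]:
    """Hypothetical helper: (specimen_id, layer_name, value) rows from the nested dict."""
    if sampled is None:  # the collection has not been sampled yet
        return []
    return [
        (specimen_id, layer_name, value)
        for specimen_id, layers in sampled.items()
        for layer_name, value in layers.items()
    ]


rows = flatten_sampled(sampled_layers)
```

Handling `None` first mirrors the documented return value for a collection that has not been sampled.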
- class py_madaclim.raster_manipulation.MadaclimPoint(specimen_id: str, longitude: float, latitude: float, source_crs: pyproj.crs.crs.CRS = <Geographic 2D CRS: EPSG:4326>, **kwargs)[source]
Bases:
object
A class representing a specimen as a geographic point with a specific coordinate reference system (CRS) and additional attributes. The class provides methods for validating the point’s coordinates and CRS, as well as sampling values from climate and environmental rasters of the Madaclim database.
- specimen_id
An identifier for the point.
- Type:
str
- latitude
The latitude of the point.
- Type:
float
- longitude
The longitude of the point.
- Type:
float
- source_crs
The coordinate reference system of the point.
- Type:
pyproj.crs.crs.CRS
- mada_geom_point
A Shapely Point object representing the point projected in the Madaclim rasters’ CRS.
- Type:
shapely.geometry.point.Point
- sampled_layers
A dictionary containing the layers labels as keys and their values from the sampled raster at the Point’s position as int. None if data has not been sampled yet.
- Type:
Optional[Dict[str, int]]
- nodata_layers
A list containing the labels of the sampled layers that hold nodata values. None if data has not been sampled yet or if no sampled layer contained nodata values.
- Type:
Optional[List[str]]
- is_categorical_encoded
The state of the binary_encode_categorical method. If True, the method has been called and a new set of binary features has been generated. Otherwise, either the layers have not been sampled or the categorical features have not been encoded.
- Type:
bool
- encoded_categ
- encoded_categ_layers
A dictionary containing the set of binary encoded categorical features.
- Type:
Optional[Dict[str, int]]
- gdf
A Geopandas GeoDataFrame generated from instance attributes and mada_geom_point geometry. Updates along any changes to the instance’s attributes.
- Type:
gpd.GeoDataFrame
- property base_attr: dict
Get the base attributes when constructing the instance.
- Returns:
A dictionary containing the base attributes names as keys and their values as values.
- Return type:
dict
- binary_encode_categorical() None [source]
Binary encodes the categorical layers contained in the sampled_layers attribute.
This function performs binary encoding of categorical layers found in the raster data. It uses the MadaclimLayers object to get information about possible categorical layers. If no categorical layers are found in the data, a ValueError is raised. After the encoding, the function updates the respective instance attributes for the categorical encoding status, the encoded layers and the gdf replacing the categorical columns by the binary encoded features.
- Raises:
ValueError – If no categorical layers are found in the raster data or if the raster data has not been sampled yet.
- Returns:
None
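As a rough illustration of what binary encoding a categorical layer means, the sketch below turns one sampled raster value into one 0/1 feature per category. It is not the library's implementation: `binary_encode`, the `<layer>_<category>` key format, and the three-category subset (names borrowed from the geology layer) are all assumptions made for the example.

```python
from typing import Dict

# Hypothetical subset of value:category pairs for a categorical layer.
GEOLOGY_CATEGORIES: Dict[int, str] = {
    1: "Alluvial_&_Lake_deposits",
    2: "Unconsolidated_Sands",
    4: "Mangrove_Swamp",
}


def binary_encode(
    layer_label: str, value: int, categories: Dict[int, str]
) -> Dict[str, int]:
    """Return one 0/1 feature per category for a single sampled value."""
    if value not in categories:
        raise ValueError(f"{value} is not a known category for {layer_label}")
    return {
        # Assumed key format: "<layer label>_<category name>"
        f"{layer_label}_{name}": int(raster_value == value)
        for raster_value, name in categories.items()
    }


encoded = binary_encode("layer_75", 2, GEOLOGY_CATEGORIES)
```

Exactly one feature is set to 1 for a valid value, which is why the encoded features can replace the original categorical column.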
- property encoded_categ_labels: List[str] | None
Get the labels from the binary encoded categorical layers of the instance.
- Returns:
- A list containing the set of the labels from the binary encoding
of categorical features. None if the categorical layers have not been encoded yet.
- Return type:
Optional[List[str]]
- property encoded_categ_layers: Dict[str, int] | None
Get the binary encoded categorical layers values.
- Returns:
- A dictionary containing the set of binary encoded categorical features.
Keys contain the layer number (or a more descriptive label) and the categorical feature. Values are the binary encoded value for that category.
- Return type:
Optional[Dict[str, int]]
- property gdf: GeoDataFrame
Get the GeoPandas DataFrame using mada_geom_point as geometry.
- Returns:
A Geopandas GeoDataFrame generated from instance attributes and Point geometry.
- Return type:
gpd.GeoDataFrame
- static get_args_names() Tuple[list, list] [source]
Gets the names of the required and default arguments of the MadaclimPoint constructor.
This method uses the inspect module to introspect the MadaclimPoint constructor and extract the names of its arguments. It then separates these into required arguments (those that don’t have default values) and default arguments (those that do).
- Returns:
- A tuple containing two lists:
The first list contains the names of the required arguments.
The second list contains the names of the default arguments.
- Return type:
Tuple[list, list]
Note
‘self’ is excluded from the returned lists.
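The introspection described above can be sketched with the standard `inspect` module. `PointLike` is a stand-in class whose constructor only imitates the MadaclimPoint signature (with an int in place of the CRS default), and skipping `*args`/`**kwargs` is an assumption of this sketch rather than documented behavior.

```python
import inspect
from typing import List, Tuple


class PointLike:
    """Stand-in with a constructor shaped like MadaclimPoint's (illustration only)."""

    def __init__(self, specimen_id: str, longitude: float, latitude: float,
                 source_crs: int = 4326, **kwargs) -> None:
        self.specimen_id = specimen_id


def get_args_names(cls: type) -> Tuple[List[str], List[str]]:
    """Split constructor argument names into (required, default) lists."""
    required: List[str] = []
    default: List[str] = []
    for name, param in inspect.signature(cls.__init__).parameters.items():
        if name == "self":
            continue
        # Assumption: *args / **kwargs count as neither required nor default.
        if param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
            continue
        if param.default is param.empty:
            required.append(name)
        else:
            default.append(name)
    return required, default


required_args, default_args = get_args_names(PointLike)
```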
- static get_default_source_crs(as_epsg: bool = True) CRS | int [source]
Extracts the default value of the source_crs attribute. By default, it will return the crs as the EPSG code.
- Parameters:
as_epsg (bool, optional) – Whether to return the default source_crs as its EPSG code. Defaults to True.
- Returns:
The default value for the source_crs attribute. If as_epsg is True, it is returned as the EPSG code of the CRS.
- Return type:
Union[pyproj.crs.crs.CRS, int]
- property is_categorical_encoded: bool
Get the state of the binary encoding of the categorical layers.
- Returns:
- The state of the binary_encode_categorical method. If True, the method has been called and a new set of binary features has been generated.
Otherwise, either the layers have not been sampled or the categorical features have not been encoded.
- Return type:
bool
- property latitude: float
Gets or sets the latitude attribute.
- Parameters:
value (float) – The latitude value of the point.
- Returns:
The current latitude value of the point.
- Return type:
float
- property longitude: float
Gets or sets the longitude attribute.
- Parameters:
value (float) – The longitude value of the point.
- Returns:
The current longitude value of the point.
- Return type:
float
- property mada_geom_point: Point
- property nodata_layers: List[str] | None
Get the labels of the layers containing nodata values after calling the `sample_from_rasters` method.
- Returns:
- A list containing the layers labels (either layer_<num> or more descriptive).
Returns None if data has not been sampled yet or if no sampled layer contained nodata values.
- Return type:
Optional[List[str]]
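A minimal sketch of how nodata layers could be picked out of a point's sampled values is shown below. `find_nodata_layers` is hypothetical, and the sentinel `-32768` is assumed from the sampled value shown for `layer_75` in the examples; the real nodata value comes from the raster files.

```python
from typing import Dict, List, Optional

NODATA_VALUE = -32768  # assumption: sentinel seen in the layer_75 example output


def find_nodata_layers(
    sampled: Optional[Dict[str, int]], nodata: int = NODATA_VALUE
) -> Optional[List[str]]:
    """Return labels whose sampled value equals the nodata sentinel, else None."""
    if not sampled:  # nothing sampled yet
        return None
    flagged = [label for label, value in sampled.items() if value == nodata]
    return flagged or None  # None when no layer holds nodata


nodata_layers = find_nodata_layers({"layer_37": 238, "layer_75": -32768})
```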
- plot_on_layer(layer: str | int, **kwargs) None [source]
Plot a layer as a raster map and distribution plot with a focus on the MadaclimPoint object.
Based on the Madaclim’s CRS, the MadaclimPoint’s geometry (mada_geom_point) is plotted on the raster map and the sampled value is displayed against a distribution of all possible values for that layer in Madaclim db. Pass additional kwargs to customize each subplot (see **kwargs, _LayerPlotter and _LayerConfig for more details).
- Parameters:
layer (Union[str, int]) – Layer to plot. Accepts layer numbers as integers, or layer labels in descriptive or layer_<num> format.
**kwargs – Additional arguments to customize the subplots, imshow, colorbar, histplot and Point objects from matplotlib. Use the “subplots_<arg>”, “imshow_<arg>”, “cax_<arg>” and “histplot_<arg>” formats to pass matplotlib/sns arguments to the corresponding base raster and histogram plots. Use “point_<arg>” to customize the Point objects on the raster. Use “vline_<arg>” to customize the vertical line on the distribution plot.
- Returns:
None
- Raises:
ValueError – If the specified layer to plot has not been sampled yet.
ValueError – If the layer label cannot be found within the sampled layers.
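The prefixed keyword convention can be pictured as a routing step that strips the prefix and groups the remaining argument names per plot element. The sketch below is one possible way to do that; `split_prefixed_kwargs` is a hypothetical helper, and rejecting unknown prefixes is an assumption rather than documented behavior.

```python
from typing import Any, Dict

# Prefixes named in the **kwargs description above.
PREFIXES = ("subplots", "imshow", "cax", "histplot", "point", "vline")


def split_prefixed_kwargs(**kwargs: Any) -> Dict[str, Dict[str, Any]]:
    """Group "<prefix>_<arg>" keyword arguments by prefix, dropping the prefix."""
    groups: Dict[str, Dict[str, Any]] = {prefix: {} for prefix in PREFIXES}
    for key, value in kwargs.items():
        prefix, _, arg = key.partition("_")
        if prefix not in groups or not arg:
            # Assumption: unknown prefixes are treated as errors.
            raise ValueError(f"Unrecognized keyword argument: {key!r}")
        groups[prefix][arg] = value
    return groups


groups = split_prefixed_kwargs(
    imshow_cmap="terrain",
    histplot_binwidth=100,
    subplots_figsize=(12, 8),
    point_color="red",
)
```

Each group could then be unpacked into the matching plotting call, for example `groups["imshow"]` into an imshow call.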
- sample_from_rasters(clim_raster: Path, env_raster: Path, layers_to_sample: int | str | List[int | str] = 'all', layer_info: bool = False) None [source]
Samples geoclimatic data from raster files for specified layers at the location of the instance’s lat/lon coordinates from the mada_geom_point attribute.
Calling this method also updates the sampled_layers attribute with the data extracted from layers_to_sample. If the sampled data contains ‘nodata’ values, the nodata_layers attribute is updated with the names of the affected layers. The gdf attribute GeoDataFrame is also updated with the sampled_layers.
- Parameters:
clim_raster (pathlib.Path) – Path to the climate raster file.
env_raster (pathlib.Path) – Path to the environment raster file.
layers_to_sample (Union[int, str, List[Union[int, str]]], optional) – The layer(s) to sample from the raster files. Can be a single int, a single string in the ‘layer_<num>’ format or a descriptive label, or a list of such ints or strings. Defaults to ‘all’.
layer_info (bool, optional) – Whether to use descriptive labels for the returned dictionary keys. Defaults to False.
- Raises:
TypeError – If the layers_to_sample is not valid, or if the mada_geom_point attribute is not a Point object.
ValueError – If the layer_number is out of range or if the mada_geom_point object is empty.
- Returns:
None
Examples
Sample a set of layers
>>> from py_madaclim.info import MadaclimLayers
>>> madaclim_info = MadaclimLayers()
>>> bioclim_labels = [label for label in madaclim_info.get_layers_labels(as_descriptive_labels=True) if "bio" in label]
>>> specimen_1 = MadaclimPoint(specimen_id="spe1_aren", latitude=-18.9333, longitude=48.2, genus="Coffea", species="arenesiana", has_sequencing=True)
>>> spe1_bioclim = specimen_1.sample_from_rasters(
...     clim_raster="madaclim_current.tif",
...     env_raster="madaclim_enviro.tif",
...     layers_to_sample=bioclim_labels
... )
>>> spe1_bioclim["layer_37"]
196
>>> # With layer_info=True, the keys are more descriptive and informative
>>> spe1_bioclim = specimen_1.sample_from_rasters(
...     clim_raster="madaclim_current.tif",
...     env_raster="madaclim_enviro.tif",
...     layers_to_sample=bioclim_labels,
...     layer_info=True
... )
>>> bio1_label = bioclim_labels[0]
>>> bio1_label
'clim_37_bio1 (Annual mean temperature)'
>>> spe1_bioclim[bio1_label]
196
Warning message for NaN in the data extracted
>>> # We can easily access the nodata layers (still sampled with the method regardless)
>>> spe2_all_layers, spe2_nodata_layers = specimen_2.sample_from_rasters(
...     clim_raster="madaclim_current.tif",
...     env_raster="madaclim_enviro.tif",
...     layer_info=True,
...     return_nodata_layers=True
... )
>>> len(spe2_nodata_layers)
5
>>> spe2_nodata_layers[0]  # Example of a categorical feature description with raster-value/description associations
'env_75_geology (1=Alluvial_&_Lake_deposits, 2=Unconsolidated_Sands, 4=Mangrove_Swamp, 5=Tertiary_Limestones_+_Marls_&_Chalks, 6=Sandstones, 7=Mesozoic_Limestones_+_Marls_(inc._"Tsingy"), 9=Lavas_(including_Basalts_&_Gabbros), 10=Basement_Rocks_(Ign_&_Met), 11=Ultrabasics, 12=Quartzites, 13=Marble_(Cipolin))'
Updated attributes post-sampling
>>> specimen_2.sample_from_rasters(
...     clim_raster="madaclim_current.tif",
...     env_raster="madaclim_enviro.tif",
...     layers_to_sample=[37, 75]
... )
MadaclimPoint(
    specimen_id = spe2_humb,
    source_crs = 4326,
    latitude = -12.716667,
    longitude = 45.066667,
    mada_geom_point = POINT (507237.57495924993 8594195.741515966),
    len(sampled_layers) = 2 layer(s),
    len(nodata_layers) = 1 layer(s),
    is_categorical_encoded = False
    genus = Coffea,
    species = humblotiana,
    has_sequencing = True,
    gdf.shape = (1, 10)
)
>>> specimen_2.sampled_layers
{'layer_37': 238, 'layer_75': -32768}
>>> specimen_2.nodata_layers
['layer_75']
- property sampled_layers: Dict[str, int] | None
Get the instance’s data obtained from the sample_from_rasters method.
- Returns:
- A dictionary containing the layers labels as keys and their values as int.
Returns None if data has not been sampled yet.
- Return type:
Optional[Dict[str, int]]
- property source_crs: CRS
Gets or sets the source_crs attribute.
- Parameters:
value (pyproj.crs.CRS) – The coordinate reference system for the point.
- Returns:
The coordinate reference system of the point.
- Return type:
pyproj.crs.CRS
- property specimen_id: str
Gets or sets the specimen_id attribute.
- Parameters:
value (str) – The new identifier for the MadaclimPoint.
- Returns:
The identifier for the MadaclimPoint.
- Return type:
str
- class py_madaclim.raster_manipulation.MadaclimRasters(clim_raster: Path, env_raster: Path)[source]
Bases:
object
Handles operations on Madaclim climate and environmental raster files. Also provides a method to visualize the raster layers (map) and distribution of the raster values.
- clim_raster
Path to the climate raster file.
- Type:
pathlib.Path
- clim_crs
The CRS derived from the climate raster file.
- Type:
pyproj.crs.crs.CRS
- clim_nodata_val
The nodata value from the climate raster file
- Type:
float
- clim_bounds
The bounds of the climate raster in order of (left, bottom, right, top)
- Type:
tuple
- env_raster
Path to the environmental raster file.
- Type:
pathlib.Path
- env_crs
The CRS derived from the environmental raster file.
- Type:
pyproj.crs.crs.CRS
- env_nodata_val
The nodata value from the environmental raster file
- Type:
float
- env_bounds
The bounds of the environmental raster in order of (left, bottom, right, top)
- Type:
tuple
- property clim_bounds: Tuple[float]
Retrieves the bounds of the climate raster.
This property opens the raster file and retrieves bounds values (left, bottom, right, top) from the raster.
- Returns:
The bounds values in order (left, bottom, right, top)
- Return type:
Tuple[float]
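To make the (left, bottom, right, top) ordering concrete, the sketch below checks whether a projected point falls inside a bounds tuple. `point_in_bounds` is a hypothetical helper and `example_bounds` holds made-up numbers, not values read from a Madaclim raster; only the point coordinates are taken from the examples in this documentation.

```python
from typing import Tuple

Bounds = Tuple[float, float, float, float]  # (left, bottom, right, top)


def point_in_bounds(x: float, y: float, bounds: Bounds) -> bool:
    """Return True when (x, y) lies inside or on the edge of the bounds."""
    left, bottom, right, top = bounds
    return left <= x <= right and bottom <= y <= top


# Hypothetical bounds in the raster's projected units (not from a real file).
example_bounds: Bounds = (300_000.0, 7_100_000.0, 1_100_000.0, 8_700_000.0)

# Projected coordinates of spe1_aren from the examples in this documentation.
inside = point_in_bounds(837072.9150244407, 7903496.320897499, example_bounds)
outside = point_in_bounds(0.0, 0.0, example_bounds)
```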
- property clim_crs: CRS
Retrieves the Coordinate Reference System (CRS) from the Madaclim climate raster.
This property opens the raster file and retrieves the CRS in EPSG format. The EPSG code is used to create and return a pyproj CRS object.
- Returns:
The CRS derived from the climate raster file.
- Return type:
pyproj.crs.crs.CRS
- property clim_nodata_val: float
Retrieves the nodata value from the Madaclim climate raster.
This property opens the raster file and retrieves the nodata value from the raster.
- Returns:
The nodata value from the climate raster file
- Return type:
float
- property clim_raster: Path
Retrieves or sets the climate raster file path.
- Parameters:
value (pathlib.Path) – The new climate raster file path.
- Returns:
The climate raster file path.
- Return type:
pathlib.Path
- property env_bounds: Tuple[float]
Retrieves the bounds of the environmental raster.
This property opens the raster file and retrieves the bounds values (left, bottom, right, top) from the raster.
- Returns:
The bounds values in order (left, bottom, right, top)
- Return type:
Tuple[float]
- property env_crs: CRS
Retrieves the Coordinate Reference System (CRS) from the Madaclim environmental raster.
This property opens the raster file and retrieves the CRS in EPSG format. The EPSG code is used to create and return a pyproj CRS object.
- Returns:
The CRS derived from the environmental raster file.
- Return type:
pyproj.crs.crs.CRS
- property env_nodata_val: float
Retrieves the nodata value from the Madaclim environmental raster.
This property opens the raster file and retrieves the nodata value from the raster.
- Returns:
The nodata value from the environmental raster file
- Return type:
float
- property env_raster: Path
Retrieves or sets the environmental raster file path.
- Parameters:
value (pathlib.Path) – The new environmental raster file path.
- Returns:
The environmental raster file path.
- Return type:
pathlib.Path
- plot_layer(layer: str | int, **kwargs) Tuple[Figure, List[Axes]] [source]
Method to plot a specific layer from the Madagascan climate/environmental raster datasets. The layer is displayed as a raster map and its distribution is plotted in a histogram.
It accepts layer labels in the following formats: layer_<num> (e.g. “layer_1”) and <descriptive_layer_label> (e.g. “annual_mean_temperature”). Alternatively, the layer number can be supplied directly as an integer.
Depending on whether the layer is categorical or continuous, the visualization will be different. For categorical layers, it will display a map using different colors for each category and a legend mapping categories to colors. For continuous layers, it will display a color gradient map with a color bar.
- Parameters:
layer (Union[str, int]) – Layer to plot. Accepts layer numbers as integers, or layer labels in descriptive or layer_<num> format.
**kwargs – Additional arguments to customize the subplots, imshow, colorbar and histplot from matplotlib. Use “subplots_<arg>”, “imshow_<arg>”, “cax_<arg>”, and “histplot_<arg>” formats to customize corresponding matplotlib/sns arguments.
- Returns:
fig (matplotlib.figure.Figure): The top-level container for all plot elements.
axes (List[matplotlib.axes.Axes]): An array containing the Axes objects of the subplots.
- Return type:
Tuple[matplotlib.figure.Figure, List[matplotlib.axes.Axes]]
- Raises:
TypeError – If ‘layer’ is not a str or an int.
ValueError – If ‘layer’ is not found within the range of layers.
Note
This method returns the fig and axes objects for further customization when used by other classes. It uses the private _PlotConfig and _LayerPlotter utility classes for the checks and visualization.
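The accepted layer formats and the documented TypeError/ValueError cases can be sketched as a small validation step. `parse_layer_number` is a hypothetical helper, the total of 79 layers used below is an assumption, and descriptive labels are left out because resolving them requires the layer metadata.

```python
import re
from typing import Union


def parse_layer_number(layer: Union[str, int], n_layers: int) -> int:
    """Resolve an int or a "layer_<num>" label to a layer number in 1..n_layers."""
    if isinstance(layer, bool) or not isinstance(layer, (str, int)):
        raise TypeError("layer must be a str or an int")
    if isinstance(layer, str):
        match = re.fullmatch(r"layer_(\d+)", layer)
        if match is None:
            # Descriptive labels are not handled in this sketch.
            raise ValueError(f"{layer!r} is not in layer_<num> format")
        layer = int(match.group(1))
    if not 1 <= layer <= n_layers:
        raise ValueError(f"layer {layer} is outside 1..{n_layers}")
    return layer


N_LAYERS = 79  # assumption: total number of layers across both rasters
layer_from_label = parse_layer_number("layer_37", N_LAYERS)
layer_from_int = parse_layer_number(75, N_LAYERS)
```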
Example
Visualization of the raster maps
>>> from py_madaclim.info import MadaclimLayers
>>> # Extract environmental layers labels
>>> mada_info = MadaclimLayers(clim_raster="madaclim_current.tif", env_raster="madaclim_enviro.tif")
>>> env_labels = mada_info.get_layers_labels("env", as_descriptive_labels=True)
>>> # Default visualization of the raster map
>>> from py_madaclim.raster_manipulation import MadaclimRasters
>>> mada_rasters = MadaclimRasters(clim_raster=mada_info.clim_raster, env_raster=mada_info.env_raster)  # Using common attr btw the instances
>>> mada_rasters.plot_layer(env_labels[0])
>>> # Pass in any number of kwargs to the imshow or cax (raster + colorbarax) or histplot for customization
>>> mada_rasters.plot_layer(env_labels[0], imshow_cmap="terrain", histplot_binwidth=100, histplot_stat="count")
>>> # Some layers are categorical data so the figure formatting will change (no cbar)
>>> geo_rock_label = next(label for label in env_labels if "geo" in label)
>>> mada_rasters.plot_layer(geo_rock_label, subplots_figsize=(12, 8))
>>> # For numerical features with highly skewed distribution, specify vmin or vmax for the raster map
>>> mada_rasters.plot_layer(env_labels[3], imshow_vmin=6000)
>>> # To know which are the categorical data, use the MadaclimLayers utilities
>>> mada_info.categorical_layers  # as df
>>> mada_info.get_categorical_combinations()  # As dict, default selects all possibilities