py_madaclim.utils.gbif_api
- py_madaclim.utils.gbif_api.download_extract_read_occ(download_id: str, target_dir: str | Path | None = None, low_memory=False) → None | DataFrame
Downloads, extracts and reads the content of an occurrence data file from the GBIF API.
- Parameters:
download_id (str) – The ID of the download to fetch
target_dir (Optional[Union[str, pathlib.Path]], optional) – The directory path to save the download and extracted files to. Defaults to the current working directory
low_memory (bool) – If True, the file is read in chunks to limit memory use, which can lead to incorrectly guessed dtypes. If False, the entire file is loaded into memory at once, which avoids incorrect dtype guesses on read. Defaults to False
- Returns:
A pandas DataFrame containing all the occurrence data, with columns according to the requested format.
Returns None if the data could not be properly extracted from the download.
- Return type:
Union[None, pd.DataFrame]
- Raises:
TypeError – If the ‘download_id’ is not a string.
TypeError – If the ‘target_dir’ is not a string or a pathlib.Path object.
NotADirectoryError – If the provided argument for ‘target_dir’ is not a valid directory.
Exception – If an error occurred while making the request.
ValueError – If the JSON response could not be parsed.
ValueError – If the ‘format’ object from the response is not valid.
FileNotFoundError – If no ‘.csv’ file is found among the extracted files of a non-DWCA download.
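A minimal usage sketch for the function above, written so that it runs without network access: the library function is passed in as `downloader` rather than imported. `fetch_occurrences` and its error message are hypothetical names introduced for illustration and are not part of py_madaclim.

```python
from pathlib import Path
from typing import Any, Callable, Optional, Union


def fetch_occurrences(
    download_id: str,
    target_dir: Union[str, Path],
    downloader: Callable[..., Optional[Any]],
) -> Any:
    """Call `downloader` and turn a None result into an explicit error.

    `downloader` is expected to behave like `download_extract_read_occ`:
    it returns a DataFrame, or None when the data could not be extracted.
    """
    target = Path(target_dir)
    # The documented function raises NotADirectoryError when target_dir is
    # not a valid directory, so make sure it exists before calling.
    target.mkdir(parents=True, exist_ok=True)

    df = downloader(download_id, target_dir=target, low_memory=False)
    if df is None:
        raise RuntimeError(
            f"No occurrence data could be extracted for download {download_id!r}"
        )
    return df
```

With py_madaclim installed, passing `download_extract_read_occ` as `downloader`, together with the ID of a previously requested GBIF download, would run the real download.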
- py_madaclim.utils.gbif_api.get_taxon_key_by_species_match(name: str | None = None, return_full_on_match=False, **kwargs: str | bool) → dict | int
Fetches a taxon key by matching species name and other parameters.
- Parameters:
name (str, optional) – The scientific name of the species to match. If not provided, at least one other match parameter must be provided.
return_full_on_match (bool, optional) – If set to True, returns the full match data. If False, returns only the taxon key.
**kwargs (Union[str, bool]) – Additional match parameters. These could include ‘rank’, ‘strict’, ‘verbose’, ‘kingdom’, ‘phylum’, ‘class’, ‘order’, ‘family’, ‘genus’.
- Returns:
If return_full_on_match is True, returns a dictionary with full match data. Otherwise, returns the taxon key (integer).
- Return type:
dict or int
- Raises:
TypeError – If ‘name’ is provided and is not a str, or if ‘strict’ is provided and is not a bool, or if any other kwargs are provided and are not str.
ValueError – If an invalid argument is provided, or if ‘name’ is not provided and no other match parameters are provided.
Exception – If an error occurs while making the HTTP request.
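One practical detail when passing the extra match parameters: `class` is a reserved word in Python, so it cannot be written as a literal keyword argument and has to be supplied through dictionary unpacking. The sketch below shows that pattern. `match_taxon` and `example_filters` are hypothetical names, and the matcher is passed in as an argument so the block runs without py_madaclim or network access.

```python
from typing import Callable, Dict, Union

MatchResult = Union[dict, int]


def match_taxon(
    matcher: Callable[..., MatchResult],
    name: str,
    filters: Dict[str, Union[str, bool]],
) -> MatchResult:
    """Forward `name` and the extra match parameters to `matcher`.

    `matcher` is expected to accept the same arguments as
    `get_taxon_key_by_species_match`.
    """
    # Unpacking a dict is the only way to pass a parameter named "class",
    # because `matcher(class="...")` is a syntax error.
    return matcher(name=name, **filters)


# Illustrative filter values; the keys come from the documented kwargs.
example_filters: Dict[str, Union[str, bool]] = {
    "rank": "species",
    "kingdom": "Plantae",
    "class": "Magnoliopsida",
    "strict": True,
}
```

With the library installed, `match_taxon(get_taxon_key_by_species_match, "Coffea arabica", example_filters)` would perform the real lookup.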
- py_madaclim.utils.gbif_api.request_occ_download_mdg_valid_coordinates(taxon_key: int, email: str, dotenv_filepath: str | Path | None = None, reponse_format: str = 'DWCA', year_range: tuple[int] | None = None) → Response | str
Makes a POST request to the GBIF API to download occurrence data in Madagascar for a specific taxon key.
Uses a set of predetermined payload information such as Madagascar GADM geographic identifier and geographic coordinates requirements. Allows limited customization for taxon key and year range for the payload.
- Parameters:
taxon_key (int) – The taxon key to request occurrence data for.
email (str) – The email address to send the download link to.
dotenv_filepath (str or pathlib.Path, optional) – The path to the .env file containing GBIF credentials. Defaults to current directory.
reponse_format (str, optional) – The format of the download file. Choices are “DWCA”, “SIMPLE_CSV”, or “SPECIES_LIST”. Defaults to “DWCA”.
year_range (tuple, optional) – A tuple of two integers specifying the range of years to request data for. Defaults to None.
- Returns:
The response object or the error message if an error occurred.
- Return type:
requests.models.Response or str
- Raises:
TypeError – If ‘taxon_key’ is not an integer, or if ‘dotenv_filepath’ is not a valid type for pathlib.Path.
ValueError – If ‘email’ is not a valid email address.
ValueError – If required keys are missing in the ‘.env’ file.
ValueError – If ‘reponse_format’ is not one of the allowed choices.
FileNotFoundError – If the ‘.env’ file is not found.
Exception – If an error occurred while making the request.
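Because the function returns either a response object or an error string, callers need to branch on the type before using the result. The sketch below is one way to do that. `extract_download_id` is a hypothetical helper, and it assumes that a successful GBIF download request answers with status 201 and the download ID as the plain-text body; that assumption comes from the GBIF API rather than from this documentation.

```python
from typing import Any, Union


def extract_download_id(result: Union[str, Any]) -> str:
    """Return the download ID from the result of a download request.

    `result` is either an error message (str) or an object with
    `status_code` and `text` attributes, such as requests.models.Response.
    """
    if isinstance(result, str):
        # Documented behaviour: a str return value is an error message.
        raise RuntimeError(f"GBIF download request failed: {result}")

    # Assumption: GBIF replies "201 Created" with the download ID as the body.
    if result.status_code != 201:
        raise RuntimeError(
            f"Unexpected status {result.status_code} from GBIF download request"
        )
    return result.text.strip()
```

The returned ID could then be passed as `download_id` to `download_extract_read_occ` once GBIF has finished preparing the file.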
- py_madaclim.utils.gbif_api.search_occ_by_gbif_id(gbif_id: str | int) → dict | None
Gets details for a single occurrence using its gbifID (key of single record) from the GBIF API ‘/occurrence/search’ endpoint.
- Parameters:
gbif_id (Union[str, int]) – The unique identifier for an occurrence record in GBIF.
- Returns:
If the gbif_id is valid, returns a dictionary containing the details of the record.
Otherwise, returns None.
- Return type:
Union[dict, None]
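Since the lookup returns None for an invalid ID, downstream code should guard against it before reading fields. The sketch below illustrates this with a hypothetical helper, `occurrence_coordinates`. It assumes the returned dictionary uses the standard GBIF occurrence field names `decimalLatitude` and `decimalLongitude`, which this documentation does not confirm.

```python
from typing import Optional, Tuple


def occurrence_coordinates(record: Optional[dict]) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) for a record, or None if unavailable."""
    if record is None:
        # Documented behaviour: None means the gbif_id was not valid.
        return None

    # Assumed field names, taken from the GBIF occurrence schema.
    lat = record.get("decimalLatitude")
    lon = record.get("decimalLongitude")
    if lat is None or lon is None:
        return None
    return float(lat), float(lon)
```

With the library installed, the helper could be applied directly to the result of `search_occ_by_gbif_id(gbif_id)`.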
- py_madaclim.utils.gbif_api.search_occ_mdg_valid_coordinates(taxon_key: int, year_range: tuple[int] | None = None) → list
Searches for occurrences in Madagascar with valid coordinates for a given taxon key using the GBIF API ‘/occurrence/search’ endpoint.
Uses a set of predetermined search params for the Madagascar GADM geographic identifier and geographic coordinates requirements. Provides limited customization for taxon key and year range.
- Parameters:
taxon_key (int) – The GBIF taxon key to search for.
year_range (tuple[int], optional) – A tuple of two integers (first year, last year) bounding the years to search for occurrences in. The first year must not be larger than the second. Defaults to None.
- Returns:
A list of occurrences with valid coordinates for the given taxon key.
- Return type:
list
- Raises:
TypeError – If ‘taxon_key’ is not an integer, or if ‘year_range’ is provided and is not a tuple of integers.
ValueError – If ‘year_range’ is provided and does not contain exactly 2 elements, or if the first element of ‘year_range’ is larger than the second.
Exception – If an error occurs while making the HTTP request.
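The `year_range` rules listed under Raises can be checked before any request is sent. The sketch below mirrors those documented rules in a hypothetical helper, `check_year_range`. It is an illustration of the contract, not the library's own validation code.

```python
from typing import Optional, Tuple


def check_year_range(
    year_range: Optional[Tuple[int, ...]],
) -> Optional[Tuple[int, int]]:
    """Validate `year_range` against the documented rules.

    Returns the (first, last) pair unchanged, or None when no range is given.
    """
    if year_range is None:
        return None

    # TypeError: must be a tuple whose elements are all integers.
    if not isinstance(year_range, tuple) or not all(
        isinstance(year, int) for year in year_range
    ):
        raise TypeError("year_range must be a tuple of integers")

    # ValueError: exactly two elements, in non-decreasing order.
    if len(year_range) != 2:
        raise ValueError("year_range must contain exactly 2 elements")
    first, last = year_range
    if first > last:
        raise ValueError("the first year must not be larger than the second")
    return first, last
```

For example, `check_year_range((2000, 2010))` passes, while `check_year_range((2010, 2000))` raises `ValueError` and `check_year_range([2000, 2010])` raises `TypeError`.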