py_madaclim.utils.gbif_api

py_madaclim.utils.gbif_api.download_extract_read_occ(download_id: str, target_dir: str | Path | None = None, low_memory=False) → None | DataFrame

Downloads, extracts, and reads the contents of an occurrence data file from the GBIF API.

Parameters:
  • download_id (str) – The ID of the download to fetch

  • target_dir (Optional[Union[str, pathlib.Path]], optional) – The directory in which to save the downloaded archive and the extracted files. Defaults to the current working directory.

  • low_memory (bool) – If True, parses the file in chunks to reduce memory usage, which may result in mixed or incorrectly guessed dtypes. If False, loads the entire file into memory at once, avoiding incorrect dtype inference. Defaults to False.

Returns:

A pandas DataFrame containing all the occurrence records, with columns determined by the requested download format.

If the data could not be properly extracted from the download, returns None.

Return type:

Union[None, pd.DataFrame]

Raises:
  • TypeError – If the ‘download_id’ is not a string.

  • TypeError – If the ‘target_dir’ is not a string or a pathlib.Path object.

  • NotADirectoryError – If the provided argument for ‘target_dir’ is not a valid directory.

  • Exception – If an error occurred while making the request.

  • ValueError – If the JSON response could not be parsed.

  • ValueError – If the ‘format’ field in the response is not a valid download format.

  • FileNotFoundError – If no ‘.csv’ file is found among the extracted files of a non-DWCA download.

py_madaclim.utils.gbif_api.get_taxon_key_by_species_match(name: str | None = None, return_full_on_match=False, **kwargs: str | bool) → dict | int

Fetches a GBIF taxon key by matching a species name and/or other match parameters.

Parameters:
  • name (str, optional) – The scientific name of the species to match. If not provided, at least one other match parameter must be provided.

  • return_full_on_match (bool, optional) – If True, returns the full match data. If False, returns only the taxon key. Defaults to False.

  • **kwargs (Union[str, bool]) – Additional match parameters, which may include ‘rank’, ‘strict’, ‘verbose’, ‘kingdom’, ‘phylum’, ‘class’, ‘order’, ‘family’, ‘genus’.

Returns:

If return_full_on_match is True, returns a dictionary with full match data. Otherwise, returns the taxon key (integer).

Return type:

dict or int

Raises:
  • TypeError – If ‘name’ is provided and is not a str, if ‘strict’ is provided and is not a bool, or if any other keyword argument is not a str.

  • ValueError – If an invalid argument is provided, or if ‘name’ is not provided and no other match parameters are provided.

  • Exception – If an error occurs while making the HTTP request.

py_madaclim.utils.gbif_api.request_occ_download_mdg_valid_coordinates(taxon_key: int, email: str, dotenv_filepath: str | Path | None = None, reponse_format: str = 'DWCA', year_range: tuple[int] | None = None) → Response | str

Makes a POST request to the GBIF API to request a download of occurrence data in Madagascar for a specific taxon key.

Uses a predetermined payload that includes the Madagascar GADM geographic identifier and geographic-coordinate requirements. Within the payload's filters, only the taxon key and the year range can be customized.

Parameters:
  • taxon_key (int) – The taxon key to request occurrence data for.

  • email (str) – The email address to send the download link to.

  • dotenv_filepath (str or pathlib.Path, optional) – The path to the .env file containing GBIF credentials. Defaults to the current directory.

  • reponse_format (str, optional) – The format of the download file. Choices are “DWCA”, “SIMPLE_CSV”, or “SPECIES_LIST”. Defaults to “DWCA”.

  • year_range (tuple, optional) – A tuple of two integers specifying the range of years to request data for. Defaults to None.

Returns:

The response object or the error message if an error occurred.

Return type:

requests.models.Response or str

Raises:
  • TypeError – If ‘taxon_key’ is not an integer, or if ‘dotenv_filepath’ is not a valid type for pathlib.Path.

  • ValueError – If ‘email’ is not a valid email address.

  • ValueError – If required keys are missing from the ‘.env’ file.

  • ValueError – If ‘reponse_format’ is not one of the allowed choices.

  • FileNotFoundError – If the ‘.env’ file is not found.

  • Exception – If an error occurred while making the request.

py_madaclim.utils.gbif_api.search_occ_by_gbif_id(gbif_id: str | int) → dict | None

Gets the details of a single occurrence by its gbifID (the key of a single record) from the GBIF API ‘/occurrence/search’ endpoint.

Parameters:

gbif_id (Union[str, int]) – The unique identifier for an occurrence record in GBIF.

Returns:

If the gbif_id is valid, returns a dictionary containing the details of the record.

Otherwise, it returns None.

Return type:

Union[dict, None]

py_madaclim.utils.gbif_api.search_occ_mdg_valid_coordinates(taxon_key: int, year_range: tuple[int] | None = None) → list

Searches for occurrences in Madagascar with valid coordinates for a given taxon key using the GBIF API ‘/occurrence/search’ endpoint.

Uses predetermined search parameters for the Madagascar GADM geographic identifier and the geographic-coordinate requirements. Only the taxon key and the year range can be customized.

Parameters:
  • taxon_key (int) – The GBIF taxon key to search for.

  • year_range (tuple[int], optional) – A tuple of two integers (start year, end year) specifying the range of years to search for occurrences. Defaults to None.

Returns:

A list of occurrences with valid coordinates for the given taxon key.

Return type:

list

Raises:
  • TypeError – If ‘taxon_key’ is not an integer, or if ‘year_range’ is provided and is not a tuple of integers.

  • ValueError – If ‘year_range’ is provided and does not contain exactly 2 elements, or if the first element of ‘year_range’ is larger than the second.

  • Exception – If an error occurs while making the HTTP request.