`feedback_forensics.data.datasets`

Module with configurations for built-in datasets.

Module Contents

Classes

`BuiltinDataset`	Class to represent a built-in dataset.

Functions

`get_dataset_from_ap_json`	Get a dataset config from an AnnotatedPairs JSON file.
`get_datasets_from_dir`	Get all AnnotatedPairs datasets inside a directory.
`get_urlname_from_stringname`	Get the URL name from the string name.
`get_stringname_from_urlname`	Get the string name from the URL name.
`load_datasets_from_repo`	Load datasets from a git repo, e.g. on HuggingFace (HF).
`load_standard_web_datasets`	Load standard openly accessible datasets from HuggingFace.
`load_webapp_datasets`	Load special gated datasets for the webapp from HuggingFace.
`get_available_datasets`	Get the current list of all available datasets.
`get_available_datasets_names`	Get the names of all available datasets.
`get_default_dataset_names`	Get the names of the default datasets.
`add_dataset`	Add a dataset to the list of available datasets.

Data

_available_datasets

API

class feedback_forensics.data.datasets.BuiltinDataset

Class to represent a built-in dataset.

name: str (value: None)

path: str | None (value: None)

description: str | None (value: None)

filterable_columns: list[str] | None (value: None)

source: str | None (value: None)

property url_name: str

Get the name of the dataset for use in URLs.
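To show how the documented fields and the `url_name` property fit together, here is a minimal, self-contained sketch. `BuiltinDatasetSketch` is a hypothetical stand-in, not the library's class, and the slug rule in `url_name` (lowercase, spaces replaced by underscores) is an assumption: the actual transformation is not documented on this page.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class BuiltinDatasetSketch:
    """Hypothetical stand-in mirroring the documented BuiltinDataset fields."""

    name: str
    path: str | None = None
    description: str | None = None
    filterable_columns: list[str] | None = None
    source: str | None = None

    @property
    def url_name(self) -> str:
        # Assumed slug rule for illustration: lowercase, spaces -> underscores.
        # The real property's transformation is not documented here.
        return self.name.lower().replace(" ", "_")


ds = BuiltinDatasetSketch(name="My Example Data", path="data/example.json")
print(ds.url_name)  # my_example_data
```

Exposing `url_name` as a derived property, rather than a stored field, keeps the URL form in sync with `name` if the name changes.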

feedback_forensics.data.datasets._available_datasets = []

Module-level list of available datasets, initially empty.

feedback_forensics.data.datasets.get_dataset_from_ap_json(file_path: str | pathlib.Path) → feedback_forensics.data.datasets.BuiltinDataset

Get a dataset config from an AnnotatedPairs JSON file.

feedback_forensics.data.datasets.get_datasets_from_dir(dir_path: str | pathlib.Path) → list[feedback_forensics.data.datasets.BuiltinDataset]

Get all AnnotatedPairs datasets inside a directory.
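The following sketch illustrates the directory-scanning pattern behind `get_datasets_from_dir`: one config per JSON file. It is self-contained and does not call the library. `DatasetConfig` and `datasets_from_dir_sketch` are hypothetical names; the sketch assumes a non-recursive scan of `*.json` files and uses the file stem as the dataset name, whereas the real `get_dataset_from_ap_json` builds the config from the AnnotatedPairs file itself in a way not documented here.

```python
from __future__ import annotations

import json
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class DatasetConfig:
    """Hypothetical minimal stand-in for BuiltinDataset."""

    name: str
    path: str | None = None


def datasets_from_dir_sketch(dir_path: str | Path) -> list[DatasetConfig]:
    """Collect one config per top-level *.json file in dir_path (sketch)."""
    configs = []
    # Assumption: non-recursive scan; sorted for a deterministic order.
    for file in sorted(Path(dir_path).glob("*.json")):
        # Assumption: the file stem serves as the dataset name.
        configs.append(DatasetConfig(name=file.stem, path=str(file)))
    return configs


with tempfile.TemporaryDirectory() as tmp:
    for stem in ("pairs_a", "pairs_b"):
        (Path(tmp) / f"{stem}.json").write_text(json.dumps({}))
    (Path(tmp) / "notes.txt").write_text("ignored")  # not a JSON file
    found = datasets_from_dir_sketch(tmp)
    print([c.name for c in found])  # ['pairs_a', 'pairs_b']
```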

feedback_forensics.data.datasets.get_urlname_from_stringname(stringname: str, datasets: list[feedback_forensics.data.datasets.BuiltinDataset]) → str

Get the URL name that corresponds to a dataset's string name, looked up in `datasets`.

feedback_forensics.data.datasets.get_stringname_from_urlname(urlname: str, datasets: list[feedback_forensics.data.datasets.BuiltinDataset]) → str

Get the string name that corresponds to a dataset's URL name, looked up in `datasets`.
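The two lookups are inverses over the same list of datasets. A minimal sketch of that round trip follows. `NamedDataset`, `urlname_from_stringname`, and `stringname_from_urlname` are hypothetical stand-ins; the slug rule and the `ValueError` on an unknown name are assumptions, since the real behaviour in those cases is not documented here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class NamedDataset:
    """Hypothetical stand-in exposing only what the lookups need."""

    name: str

    @property
    def url_name(self) -> str:
        # Assumed slug rule, for illustration only.
        return self.name.lower().replace(" ", "_")


def urlname_from_stringname(stringname: str, datasets: list[NamedDataset]) -> str:
    # Linear search: return the URL name of the first dataset with this name.
    for ds in datasets:
        if ds.name == stringname:
            return ds.url_name
    # Assumption: unknown names raise; the real behaviour is undocumented.
    raise ValueError(f"No dataset named {stringname!r}")


def stringname_from_urlname(urlname: str, datasets: list[NamedDataset]) -> str:
    # Inverse lookup: match on the derived URL name instead.
    for ds in datasets:
        if ds.url_name == urlname:
            return ds.name
    raise ValueError(f"No dataset with URL name {urlname!r}")


datasets = [NamedDataset("Example Set A"), NamedDataset("Example Set B")]
print(urlname_from_stringname("Example Set B", datasets))  # example_set_b
print(stringname_from_urlname("example_set_a", datasets))  # Example Set A
```

Passing `datasets` explicitly means the same lookup works against any list, for instance the result of `get_available_datasets()`.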

feedback_forensics.data.datasets.load_datasets_from_repo(repo_username: str, repo_name: str, repo_provider: str, token: str | None = None, data_extraction_subdir: str | None = None, file_path: str | None = None, ref: str | None = None)

Load datasets from git repo, e.g. on HuggingFace (HF).

Args:
    repo_username (str): Username of the repo owner.
    repo_name (str): Name of the repo.
    repo_provider (str): Provider of the git repo (e.g. "huggingface.co/datasets").
    token (str | None): Token for the repo (optional; required for private repos).
    data_extraction_subdir (str | None): Subdirectory of the repo to extract datasets from (optional).
    file_path (str | None): Path to a file to extract from the repo (optional). If provided, only this file is extracted.
    ref (str | None): Git reference (e.g. a branch, tag, or commit) to use for the repo (optional).

Returns:
    int: Number of datasets successfully loaded.
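The example value of `repo_provider` ("huggingface.co/datasets") suggests that the provider, username, and repo name combine into an HTTPS git URL. The sketch below shows one plausible way to assemble such a URL. `build_repo_url` is a hypothetical helper, not part of the module, and embedding the token as URL credentials is an assumption about how private repos could be authenticated, not a description of what `load_datasets_from_repo` does.

```python
from __future__ import annotations


def build_repo_url(
    repo_username: str,
    repo_name: str,
    repo_provider: str,
    token: str | None = None,
) -> str:
    """Hypothetical helper: assemble an HTTPS git URL from the documented args."""
    # Assumption: the provider is a host plus optional path prefix,
    # as in the documented example "huggingface.co/datasets".
    base = f"{repo_provider}/{repo_username}/{repo_name}"
    if token:
        # Credentials in the URL authenticate an HTTPS clone of a private
        # repo; avoid logging URLs built this way.
        return f"https://{repo_username}:{token}@{base}"
    return f"https://{base}"


print(build_repo_url("some-user", "some-repo", "huggingface.co/datasets"))
# https://huggingface.co/datasets/some-user/some-repo
```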

feedback_forensics.data.datasets.load_standard_web_datasets()

Load standard openly accessible datasets from HuggingFace.

Triggered via a command-line flag or in webapp mode.

feedback_forensics.data.datasets.load_webapp_datasets()

Load special gated datasets for the webapp from HuggingFace.

Triggered in webapp mode only.

feedback_forensics.data.datasets.get_available_datasets()

Get the current list of all available datasets.

Always returns the most up-to-date list of loaded datasets, including both the standard datasets from HuggingFace and any locally added datasets.

Returns:
    list[BuiltinDataset]: List of all available datasets.

feedback_forensics.data.datasets.get_available_datasets_names()

Get the names of all available datasets.

feedback_forensics.data.datasets.get_default_dataset_names()

Get the names of the default datasets.

feedback_forensics.data.datasets.add_dataset(dataset)

Add a dataset to the list of available datasets.

Args:
    dataset (BuiltinDataset): Dataset to add.