feedback_forensics.data.datasets

Module with configurations for built-in datasets.

Module Contents

Classes

BuiltinDataset

Class to represent a built-in dataset.

Functions

get_dataset_from_ap_json

Get a dataset config from AnnotatedPairs json file.

get_datasets_from_dir

Get all AnnotatedPairs datasets inside dir.

get_urlname_from_stringname

Get the URL name from the string name.

get_stringname_from_urlname

Get the string name from the URL name.

load_datasets_from_repo

Load datasets from git repo, e.g. on HuggingFace (HF).

load_standard_web_datasets

Load standard openly accessible datasets from HuggingFace.

load_webapp_datasets

Load special gated datasets for the webapp from HuggingFace.

get_available_datasets

Get the current list of all available datasets.

get_available_datasets_names

Get the names of all available datasets.

get_default_dataset_names

Get the names of the default datasets.

add_dataset

Add a dataset to the list of available datasets.

Data

API

class feedback_forensics.data.datasets.BuiltinDataset

Class to represent a built-in dataset.

name: str

None

path: str | None

None

description: str | None

None

filterable_columns: list[str] | None

None

source: str | None

None

property url_name: str

Get the name of the dataset for the URL.
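To make the field layout concrete, here is a minimal sketch of a dataclass with the same attributes. `DatasetSketch` is a hypothetical stand-in, not the library class, and the slug rule inside `url_name` is an assumption: the reference above does not say how the URL name is derived.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetSketch:
    """Hypothetical stand-in mirroring the documented BuiltinDataset fields."""

    name: str
    path: Optional[str] = None
    description: Optional[str] = None
    filterable_columns: Optional[List[str]] = None
    source: Optional[str] = None

    @property
    def url_name(self) -> str:
        # Assumption: lowercase the name and collapse every run of
        # non-alphanumeric characters into a single underscore.
        return re.sub(r"[^a-z0-9]+", "_", self.name.lower()).strip("_")


example = DatasetSketch(name="Chatbot Arena (sample)", source="local")
print(example.url_name)  # chatbot_arena_sample
```

Deriving the URL name in a property keeps it in sync with `name` without storing a second field.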

feedback_forensics.data.datasets._available_datasets

[]

feedback_forensics.data.datasets.get_dataset_from_ap_json(file_path: str | pathlib.Path) → feedback_forensics.data.datasets.BuiltinDataset

Get a dataset config from AnnotatedPairs json file.

feedback_forensics.data.datasets.get_datasets_from_dir(dir_path: str | pathlib.Path) → list[feedback_forensics.data.datasets.BuiltinDataset]

Get all AnnotatedPairs datasets inside dir.
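The directory scan could look roughly like the sketch below. It is not the library's implementation. The helper names, the `*.json` glob, the `metadata.dataset_name` key and the fallback to the file stem are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List, Union


def config_from_json(file_path: Union[str, Path]) -> Dict[str, str]:
    """Build a minimal config dict from one JSON file (hypothetical layout)."""
    file_path = Path(file_path)
    with file_path.open(encoding="utf-8") as f:
        content = json.load(f)
    metadata = content.get("metadata", {})
    # Assumption: fall back to the file stem when no name is stored.
    name = metadata.get("dataset_name", file_path.stem)
    return {"name": name, "path": str(file_path)}


def configs_from_dir(dir_path: Union[str, Path]) -> List[Dict[str, str]]:
    """Collect one config per *.json file directly inside dir_path."""
    return [config_from_json(p) for p in sorted(Path(dir_path).glob("*.json"))]


# Demonstrate on a throwaway directory with two small JSON files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "b.json").write_text(
        json.dumps({"metadata": {"dataset_name": "Beta"}}), encoding="utf-8"
    )
    (Path(tmp) / "a.json").write_text(json.dumps({"comparisons": []}), encoding="utf-8")
    names = [cfg["name"] for cfg in configs_from_dir(tmp)]

print(names)  # ['a', 'Beta']
```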

feedback_forensics.data.datasets.get_urlname_from_stringname(stringname: str, datasets: list[feedback_forensics.data.datasets.BuiltinDataset]) → str

Get the URL name from the string name.

feedback_forensics.data.datasets.get_stringname_from_urlname(urlname: str, datasets: list[feedback_forensics.data.datasets.BuiltinDataset]) → str

Get the string name from the URL name.
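The two name lookups are inverses of each other over a list of datasets. Here is a minimal sketch of that idea, using a hypothetical `NamedDataset` stand-in and hypothetical function names. Raising `ValueError` for an unknown name is an assumption; the reference does not state what the real functions do in that case.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NamedDataset:
    """Hypothetical stand-in exposing only the two names the lookups need."""

    name: str
    url_name: str


def urlname_from_stringname(stringname: str, datasets: List[NamedDataset]) -> str:
    """Return the URL name of the dataset whose display name matches."""
    for dataset in datasets:
        if dataset.name == stringname:
            return dataset.url_name
    # Assumption: unknown names are treated as an error.
    raise ValueError(f"No dataset named {stringname!r}")


def stringname_from_urlname(urlname: str, datasets: List[NamedDataset]) -> str:
    """Return the display name of the dataset whose URL name matches."""
    for dataset in datasets:
        if dataset.url_name == urlname:
            return dataset.name
    raise ValueError(f"No dataset with URL name {urlname!r}")


datasets = [
    NamedDataset(name="Chatbot Arena", url_name="chatbot_arena"),
    NamedDataset(name="Local example", url_name="local_example"),
]
url = urlname_from_stringname("Local example", datasets)
print(url)                                     # local_example
print(stringname_from_urlname(url, datasets))  # Local example
```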

feedback_forensics.data.datasets.load_datasets_from_repo(repo_username: str, repo_name: str, repo_provider: str, token: str | None = None, data_extraction_subdir: str | None = None, file_path: str | None = None, ref: str | None = None)

Load datasets from git repo, e.g. on HuggingFace (HF).

Args:
    repo_username (str): Username of the repo.
    repo_name (str): Name of the repo.
    repo_provider (str): Provider of the git repo (e.g. “huggingface.co/datasets”).
    token (str | None): Token for the repo (optional; required for private repos).
    data_extraction_subdir (str | None): Subdirectory of the repo to extract datasets from (optional).
    file_path (str | None): Path to a file to extract from the repo (optional). If provided, only that file is extracted.
    ref (str | None): Reference to use for the repo (optional).

Returns:
    int: Number of datasets successfully loaded.
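A call using the documented parameters might be assembled as in the sketch below. The repo username and repo name are hypothetical placeholders. Reading the token from the `HF_TOKEN` environment variable is a choice made by this sketch, not something the function is documented to do. The import is guarded so the snippet still runs where the package is not installed.

```python
import os
from typing import Any, Dict, Optional

# Keyword arguments matching the documented signature.
repo_kwargs: Dict[str, Any] = {
    "repo_username": "example-user",      # hypothetical placeholder
    "repo_name": "example-datasets",      # hypothetical placeholder
    "repo_provider": "huggingface.co/datasets",
    "token": os.environ.get("HF_TOKEN"),  # None is fine for public repos
    "data_extraction_subdir": None,       # optional: restrict to a subdirectory
    "file_path": None,                    # optional: extract a single file only
    "ref": None,                          # optional: reference to use for the repo
}


def load_example_repo() -> Optional[int]:
    """Call load_datasets_from_repo if the package is importable, else return None."""
    try:
        from feedback_forensics.data.datasets import load_datasets_from_repo
    except ImportError:
        return None
    # Per the Returns section, this is the number of datasets loaded.
    return load_datasets_from_repo(**repo_kwargs)
```

Calling `load_example_repo()` contacts the repo provider, so it is left uninvoked here.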

feedback_forensics.data.datasets.load_standard_web_datasets()

Load standard openly accessible datasets from HuggingFace.

Triggered via command line flag or webapp mode.

feedback_forensics.data.datasets.load_webapp_datasets()

Load special gated datasets for the webapp from HuggingFace.

Triggered via webapp mode only.

feedback_forensics.data.datasets.get_available_datasets()

Get the current list of all available datasets.

Always returns the most up-to-date list of datasets that have been loaded, including both standard datasets from HuggingFace and any locally added datasets.

Returns:
    list[BuiltinDataset]: List of all available datasets.

feedback_forensics.data.datasets.get_available_datasets_names()

Get the names of all available datasets.

feedback_forensics.data.datasets.get_default_dataset_names()

Get the names of the default datasets.

feedback_forensics.data.datasets.add_dataset(dataset)

Add a dataset to the list of available datasets.

Args:
    dataset (BuiltinDataset): Dataset to add.