feedback_forensics.data.datasets#
Module with configurations for built-in datasets.
Module Contents#
Classes#
BuiltinDataset | Class to represent a built-in dataset.
Functions#
get_dataset_from_ap_json | Get a dataset config from AnnotatedPairs json file.
get_datasets_from_dir | Get all AnnotatedPairs datasets inside dir.
get_urlname_from_stringname | Get the URL name from the string name.
get_stringname_from_urlname | Get the string name from the URL name.
load_datasets_from_repo | Load datasets from git repo, e.g. on HuggingFace (HF).
load_standard_web_datasets | Load standard openly accessible datasets from HuggingFace.
load_webapp_datasets | Load special gated datasets for the webapp from HuggingFace.
get_available_datasets | Get the current list of all available datasets.
get_available_datasets_names | Get the names of all available datasets.
get_default_dataset_names | Get the names of the default datasets.
add_dataset | Add a dataset to the list of available datasets.
Data#
_available_datasets | Module-level list of available datasets (initially empty).
API#
- class feedback_forensics.data.datasets.BuiltinDataset#
Class to represent a built-in dataset.
- name: str#
None
- path: str | None#
None
- description: str | None#
None
- filterable_columns: list[str] | None#
None
- source: str | None#
None
- property url_name: str#
Get the name of the dataset for the URL.
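The fields above can be pictured as a small dataclass. The following is a minimal sketch, not the library's implementation: `BuiltinDatasetSketch` is a hypothetical stand-in, the `None` defaults for the optional fields are assumed, and the rule used to derive `url_name` (lowercase, with runs of non-alphanumeric characters collapsed to underscores) is an assumption chosen only for illustration.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class BuiltinDatasetSketch:
    """Hypothetical stand-in mirroring the documented fields; not the library class."""

    name: str
    path: str | None = None
    description: str | None = None
    filterable_columns: list[str] | None = None
    source: str | None = None

    @property
    def url_name(self) -> str:
        # Assumed rule: lowercase the name and collapse every run of
        # non-alphanumeric characters into a single underscore.
        return re.sub(r"[^a-z0-9]+", "_", self.name.lower()).strip("_")


# Hypothetical dataset name and path, used only to exercise the property.
example = BuiltinDatasetSketch(name="Chatbot Arena (sample)", path="data/arena.json")
print(example.url_name)
```

Exposing the URL-safe name as a read-only property keeps it derived from `name`, so the two cannot drift apart.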
- feedback_forensics.data.datasets._available_datasets#
[]
- feedback_forensics.data.datasets.get_dataset_from_ap_json(file_path: str | pathlib.Path) → feedback_forensics.data.datasets.BuiltinDataset#
Get a dataset config from AnnotatedPairs json file.
- feedback_forensics.data.datasets.get_datasets_from_dir(dir_path: str | pathlib.Path) → list[feedback_forensics.data.datasets.BuiltinDataset]#
Get all AnnotatedPairs datasets inside dir.
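As a rough illustration of how a per-file loader and a directory scan could fit together, here is a minimal sketch using only the standard library. Everything in it is an assumption: the helper names `dataset_config_from_json` and `dataset_configs_from_dir` are hypothetical, the configs are plain dicts rather than `BuiltinDataset` objects, and the `metadata.dataset_name` and `metadata.description` keys are assumed for the example rather than taken from the AnnotatedPairs specification.

```python
import json
import tempfile
from pathlib import Path


def dataset_config_from_json(file_path):
    """Hypothetical: build a plain-dict config from one JSON file."""
    file_path = Path(file_path)
    with file_path.open(encoding="utf-8") as f:
        content = json.load(f)
    # Assumed layout: an optional top-level "metadata" object.
    metadata = content.get("metadata", {}) if isinstance(content, dict) else {}
    return {
        # Fall back to the file stem when no dataset name is recorded.
        "name": metadata.get("dataset_name", file_path.stem),
        "path": str(file_path),
        "description": metadata.get("description"),
    }


def dataset_configs_from_dir(dir_path):
    """Hypothetical: collect configs for every *.json file directly inside dir_path."""
    return [dataset_config_from_json(p) for p in sorted(Path(dir_path).glob("*.json"))]


# Build two throwaway files so the sketch runs without any real data.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "b.json").write_text(
        json.dumps({"metadata": {"dataset_name": "Demo B"}}), encoding="utf-8"
    )
    (Path(tmp) / "a.json").write_text(json.dumps({"comparisons": []}), encoding="utf-8")
    configs = dataset_configs_from_dir(tmp)

names = [c["name"] for c in configs]
print(names)
```

Sorting the matched paths keeps the resulting order stable across platforms, which matters if the list is later shown in a UI.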
- feedback_forensics.data.datasets.get_urlname_from_stringname(stringname: str, datasets: list[feedback_forensics.data.datasets.BuiltinDataset]) → str#
Get the URL name from the string name.
- feedback_forensics.data.datasets.get_stringname_from_urlname(urlname: str, datasets: list[feedback_forensics.data.datasets.BuiltinDataset]) → str#
Get the string name from the URL name.
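The two lookup functions are inverses of each other over a list of datasets. The sketch below shows one way such a pair could behave; `DatasetStub` and both function names are hypothetical stand-ins, and raising `ValueError` when no dataset matches is an assumption, since the reference does not say what happens on a miss.

```python
from dataclasses import dataclass


@dataclass
class DatasetStub:
    """Hypothetical stand-in carrying only the two names needed here."""

    name: str
    url_name: str


def urlname_from_stringname(stringname, datasets):
    # Linear scan: return the URL name of the first dataset whose name matches.
    for ds in datasets:
        if ds.name == stringname:
            return ds.url_name
    raise ValueError(f"No dataset named {stringname!r}")  # assumed miss behavior


def stringname_from_urlname(urlname, datasets):
    # Reverse lookup: return the display name for a given URL name.
    for ds in datasets:
        if ds.url_name == urlname:
            return ds.name
    raise ValueError(f"No dataset with URL name {urlname!r}")  # assumed miss behavior


stubs = [DatasetStub("Demo A", "demo_a"), DatasetStub("Demo B", "demo_b")]
round_trip = stringname_from_urlname(urlname_from_stringname("Demo B", stubs), stubs)
print(round_trip)
```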
- feedback_forensics.data.datasets.load_datasets_from_repo(repo_username: str, repo_name: str, repo_provider: str, token: str | None = None, data_extraction_subdir: str | None = None, file_path: str | None = None, ref: str | None = None)#
Load datasets from git repo, e.g. on HuggingFace (HF).
Args:
- repo_username (str): Username of the repo owner.
- repo_name (str): Name of the repo.
- repo_provider (str): Provider of the git repo (e.g. “huggingface.co/datasets”).
- token (str): Token for the repo (optional; required for private repos).
- data_extraction_subdir (str): Subdirectory of the repo to extract datasets from (optional).
- file_path (str): Path to a file to extract from the repo (optional); if provided, only that file is extracted.
- ref (str): Reference to use for the repo (optional).
Returns: int: Number of datasets successfully loaded
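To make the roles of `repo_username`, `repo_name`, `repo_provider`, and `token` more concrete, the sketch below shows how they could plausibly be combined into an HTTPS git URL. This is an assumption about usage, not the function's actual implementation: `build_repo_url` is a hypothetical helper, and the user and repo names are placeholders.

```python
def build_repo_url(repo_username, repo_name, repo_provider, token=None):
    """Hypothetical helper: assemble an HTTPS git URL from the documented arguments."""
    # With a token, embed "user:token@" credentials, as git accepts over HTTPS.
    credentials = f"{repo_username}:{token}@" if token else ""
    return f"https://{credentials}{repo_provider}/{repo_username}/{repo_name}"


# Placeholder names; "huggingface.co/datasets" is the provider example from the docstring.
public_url = build_repo_url("example-user", "example-datasets", "huggingface.co/datasets")
print(public_url)
```

Because `repo_provider` carries the `/datasets` path segment, the same scheme would apply to other git hosts by swapping only that argument.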
- feedback_forensics.data.datasets.load_standard_web_datasets()#
Load standard openly accessible datasets from HuggingFace.
Triggered via command line flag or webapp mode.
- feedback_forensics.data.datasets.load_webapp_datasets()#
Load special gated datasets for the webapp from HuggingFace.
Triggered via webapp mode only.
- feedback_forensics.data.datasets.get_available_datasets()#
Get the current list of all available datasets.
Always returns the most up-to-date list of loaded datasets, including both standard datasets from HuggingFace and any locally added datasets.
Returns: list[BuiltinDataset]: List of all available datasets
- feedback_forensics.data.datasets.get_available_datasets_names()#
Get the names of all available datasets.
- feedback_forensics.data.datasets.get_default_dataset_names()#
Get the names of the default datasets.
- feedback_forensics.data.datasets.add_dataset(dataset)#
Add a dataset to the list of available datasets.
Args: dataset (BuiltinDataset): Dataset to add
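Taken together, `add_dataset`, `get_available_datasets`, and `get_available_datasets_names` describe a simple module-level registry. The following minimal sketch illustrates that pattern under stated assumptions: all names are hypothetical stand-ins, and plain appending without duplicate checks is assumed because the reference does not describe how repeated names are handled.

```python
from dataclasses import dataclass


@dataclass
class DatasetStub:
    """Hypothetical stand-in for a dataset config."""

    name: str


# Module-level registry, starting empty like the documented `_available_datasets`.
_registry = []


def add(dataset):
    # Assumed behavior: append without checking for duplicates.
    _registry.append(dataset)


def get_available():
    # Return the live list so callers always see later additions.
    return _registry


def get_available_names():
    return [ds.name for ds in get_available()]


add(DatasetStub("Local demo"))
add(DatasetStub("Another demo"))
available_names = get_available_names()
print(available_names)
```

Reading the registry through a function rather than importing the list directly is what lets callers see datasets added after import time.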