feedback_forensics.data.dataset_utils

Utilities for working with datasets.

Module Contents

Functions

get_available_models

Get all available model names from a dataset.

add_annotators_to_votes_dict

Add annotators to an existing votes_dict.

get_annotators_by_type

Extract all annotators grouped by their type from a votes_dict.

get_first_json_key_value

Get the first key and value from a JSON file.

API

feedback_forensics.data.dataset_utils.get_available_models(df: pandas.DataFrame) -> list

Get all available model names from a dataset.

Args: df: DataFrame containing the dataset

Returns: List of unique model names found in the dataset
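A minimal sketch of this lookup, assuming each row records the two compared models in columns named `model_a` and `model_b`. Those column names and the helper `get_available_models_sketch` are hypothetical; the real dataset schema is not documented here. The sketch also sorts its output for determinism, which the docstring does not promise.

```python
import pandas as pd

# Hypothetical column names; the real dataset schema may differ.
MODEL_COLUMNS = ("model_a", "model_b")


def get_available_models_sketch(df: pd.DataFrame, model_columns=MODEL_COLUMNS) -> list:
    """Return the unique model names found in any of the model columns."""
    present = [col for col in model_columns if col in df.columns]
    if not present:
        return []
    # Stack the model columns into one Series, drop missing entries, deduplicate.
    names = pd.concat([df[col] for col in present], ignore_index=True).dropna().unique()
    # Sort so the result is deterministic (an assumption, not documented behavior).
    return sorted(names)


df = pd.DataFrame(
    {
        "model_a": ["gpt-x", "llama-y", "gpt-x"],
        "model_b": ["llama-y", "mistral-z", None],
    }
)
models = get_available_models_sketch(df)
print(models)  # ['gpt-x', 'llama-y', 'mistral-z']
```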

feedback_forensics.data.dataset_utils.add_annotators_to_votes_dict(votes_dict: dict, annotator_metadata: dict, annotations_df: pandas.DataFrame) -> dict

Add annotators to an existing votes_dict.

Args:
  votes_dict: An existing votes_dict containing a dataframe and metadata
  annotator_metadata: Dictionary of annotator metadata to add
  annotations_df: DataFrame containing only the comparison_id and annotator columns

Returns: A new votes_dict with annotators added
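A sketch of how annotator columns could be joined onto an existing votes_dict. It assumes the votes_dict keeps its dataframe under a `"df"` key and its metadata under `"annotator_metadata"`, and that each metadata entry carries a `"variant"` field; all three names, and `add_annotators_sketch` itself, are hypothetical. Returning a fresh dict follows the docstring's "a new votes_dict".

```python
import pandas as pd


def add_annotators_sketch(
    votes_dict: dict, annotator_metadata: dict, annotations_df: pd.DataFrame
) -> dict:
    """Return a new votes_dict with extra annotator columns and metadata.

    Assumes hypothetical keys "df" and "annotator_metadata" in votes_dict.
    """
    existing_df = votes_dict["df"]
    new_columns = [col for col in annotations_df.columns if col != "comparison_id"]
    clashes = set(new_columns) & set(existing_df.columns)
    if clashes:
        raise ValueError(f"annotator columns already present: {sorted(clashes)}")
    # Left join keeps every existing comparison; unannotated rows get NaN.
    merged_df = existing_df.merge(annotations_df, on="comparison_id", how="left")
    merged_metadata = {**votes_dict.get("annotator_metadata", {}), **annotator_metadata}
    # Build a new dict so the caller's votes_dict is left unchanged.
    return {**votes_dict, "df": merged_df, "annotator_metadata": merged_metadata}


votes = {
    "df": pd.DataFrame({"comparison_id": [1, 2], "preferred_text": ["text_a", "text_b"]}),
    "annotator_metadata": {"preferred_text": {"variant": "human"}},
}
annotations = pd.DataFrame({"comparison_id": [1, 2], "judge_x": ["text_a", "text_a"]})
updated = add_annotators_sketch(votes, {"judge_x": {"variant": "llm"}}, annotations)
print(list(updated["df"].columns))  # ['comparison_id', 'preferred_text', 'judge_x']
```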

feedback_forensics.data.dataset_utils.get_annotators_by_type(votes_dict: Dict[str, Any]) -> Dict[str, Dict[str, List[str]]]

Extract all annotators grouped by their type from a votes_dict.

Args: votes_dict: Dictionary containing annotator metadata

Returns: Dictionary mapping variant types to dictionaries with “column_ids” and “visible_names” keys. Missing keys automatically resolve to empty lists via defaultdict.
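The grouping with `defaultdict` might look like the following. The metadata layout (an `"annotator_metadata"` mapping from column id to a dict with `"variant"` and `"annotator_visible_name"` fields) and the name `get_annotators_by_type_sketch` are assumptions for illustration only.

```python
from collections import defaultdict
from typing import Any, Dict, List


def get_annotators_by_type_sketch(votes_dict: Dict[str, Any]) -> Dict[str, Dict[str, List[str]]]:
    """Group annotator column ids and display names by variant type.

    Assumes hypothetical metadata fields "variant" and "annotator_visible_name".
    """
    # Nested defaultdicts: an unseen type or key yields an empty list.
    grouped: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for column_id, meta in votes_dict.get("annotator_metadata", {}).items():
        variant = meta.get("variant", "unknown")
        grouped[variant]["column_ids"].append(column_id)
        # Fall back to the column id when no display name is given.
        grouped[variant]["visible_names"].append(meta.get("annotator_visible_name", column_id))
    return grouped


votes = {
    "annotator_metadata": {
        "preferred_text": {"variant": "human", "annotator_visible_name": "Human"},
        "judge_gpt": {"variant": "llm", "annotator_visible_name": "GPT judge"},
        "judge_x": {"variant": "llm"},
    }
}
by_type = get_annotators_by_type_sketch(votes)
print(by_type["llm"]["column_ids"])        # ['judge_gpt', 'judge_x']
print(by_type["llm"]["visible_names"])     # ['GPT judge', 'judge_x']
print(by_type["principle"]["column_ids"])  # []
```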

feedback_forensics.data.dataset_utils.get_first_json_key_value(file_path)

Get the first key and value from a JSON file.

This is useful for extracting metadata from an AnnotatedPairs file without loading the full dataset.
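One way to read only the first pair, sketched with the standard library's `json.JSONDecoder.raw_decode`: the file is read in chunks and parsing stops as soon as the first key and its complete value are available. This is an assumed approach, not necessarily the library's (it may use a streaming parser instead); the sketch assumes the top-level JSON value is an object, and `get_first_json_key_value_sketch` plus the `"metadata"` key in the demo are hypothetical.

```python
import json
import os
import tempfile

_WHITESPACE = " \t\n\r"


class _NeedMoreData(Exception):
    """The buffer ends before the first key/value pair is complete."""


def _skip_whitespace(text: str, pos: int) -> int:
    while pos < len(text) and text[pos] in _WHITESPACE:
        pos += 1
    if pos >= len(text):
        raise _NeedMoreData
    return pos


def _parse_first_pair(text: str, decoder: json.JSONDecoder):
    pos = _skip_whitespace(text, 0)
    if text[pos] != "{":
        raise ValueError("top-level JSON value is not an object")
    pos = _skip_whitespace(text, pos + 1)
    try:
        key, end = decoder.raw_decode(text[pos:])
        pos = _skip_whitespace(text, pos + end)
        if not isinstance(key, str) or text[pos] != ":":
            raise ValueError("expected a string key followed by ':'")
        pos = _skip_whitespace(text, pos + 1)
        value, end = decoder.raw_decode(text[pos:])
    except json.JSONDecodeError as err:
        # Truncated string, object, or list: read another chunk and retry.
        raise _NeedMoreData from err
    # Require the next ',' or '}' so a number cut at a chunk boundary
    # (e.g. "1." from "1.5") is not accepted; malformed input fails at EOF.
    pos = _skip_whitespace(text, pos + end)
    if text[pos] not in ",}":
        raise _NeedMoreData
    return key, value


def get_first_json_key_value_sketch(file_path, chunk_size: int = 1 << 16):
    """Return (key, value) for the first entry of a top-level JSON object."""
    decoder = json.JSONDecoder()
    buffer = ""
    with open(file_path, "r", encoding="utf-8") as f:
        while True:
            chunk = f.read(chunk_size)
            buffer += chunk
            try:
                return _parse_first_pair(buffer, decoder)
            except _NeedMoreData:
                if not chunk:  # end of file reached without a complete pair
                    raise ValueError(f"no complete first key/value pair in {file_path}") from None


# Demo on a temporary file; the "metadata" first key is illustrative only.
sample = {"metadata": {"version": "2.0", "count": 12345}, "comparisons": [{"id": i} for i in range(1000)]}
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False, encoding="utf-8") as tmp:
    json.dump(sample, tmp)
    sample_path = tmp.name
try:
    first_key, first_value = get_first_json_key_value_sketch(sample_path, chunk_size=7)
finally:
    os.remove(sample_path)
print(first_key, first_value)  # metadata {'version': '2.0', 'count': 12345}
```

Re-parsing the growing buffer after each chunk is simple but repeats work; it only runs until the first value is complete, which suits a small metadata entry at the start of a large file.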