feedback_forensics.data.dataset_utils
Utilities for working with datasets.
Module Contents
Functions
get_available_models | Get all available model names from a dataset.
add_annotators_to_votes_dict | Add annotators to an existing votes_dict.
get_annotators_by_type | Extract all annotators grouped by their type from a votes_dict.
get_first_json_key_value | Get the first key and value from a JSON file.
API
- feedback_forensics.data.dataset_utils.get_available_models(df: pandas.DataFrame) -> list
Get all available model names from a dataset.
Args: df: DataFrame containing the dataset
Returns: List of unique model names found in the dataset
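The idea can be illustrated with a minimal sketch. This is not the library's implementation: the helper name `available_models_sketch` and the column names `model_a` and `model_b` are assumptions made for illustration, since the reference above does not say which columns hold model names.

```python
import pandas as pd


def available_models_sketch(df, model_columns=("model_a", "model_b")):
    """Hypothetical stand-in: collect unique model names from assumed columns."""
    names = set()
    for column in model_columns:
        if column in df.columns:
            # Skip missing values so they do not appear as a "model".
            names.update(df[column].dropna().unique())
    return sorted(names)


# Made-up example data; the column names are assumptions, not the real schema.
example_df = pd.DataFrame(
    {
        "model_a": ["model-1", "model-2", "model-1"],
        "model_b": ["model-2", "model-3", None],
    }
)
models = available_models_sketch(example_df)
```

Sorting the result is a choice of this sketch, made so the output is deterministic; the documented function only promises a list of unique names.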
- feedback_forensics.data.dataset_utils.add_annotators_to_votes_dict(votes_dict: dict, annotator_metadata: dict, annotations_df: pandas.DataFrame) -> dict
Add annotators to an existing votes_dict.
Args:
votes_dict: An existing votes_dict with dataframe and metadata
annotator_metadata: Dictionary of annotator metadata to add
annotations_df: DataFrame with only comparison_id and annotator columns
Returns: A new votes_dict with annotators added
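A rough sketch of the "return a new votes_dict" behaviour described above, under stated assumptions: the dictionary keys `"df"` and `"annotator_metadata"`, the helper name `add_annotators_sketch`, and the left join on `comparison_id` are all guesses for illustration and may differ from the real function.

```python
import pandas as pd


def add_annotators_sketch(votes_dict, annotator_metadata, annotations_df):
    """Hypothetical stand-in: build a new votes_dict without mutating the input."""
    # Assumed layout: votes_dict["df"] holds one row per comparison_id.
    merged_df = votes_dict["df"].merge(annotations_df, on="comparison_id", how="left")
    # Assumed layout: metadata is keyed by annotator column id.
    merged_metadata = {**votes_dict["annotator_metadata"], **annotator_metadata}
    return {**votes_dict, "df": merged_df, "annotator_metadata": merged_metadata}


# Made-up example data.
votes = {
    "df": pd.DataFrame({"comparison_id": ["c1", "c2"], "ann_0": ["text_a", "text_b"]}),
    "annotator_metadata": {"ann_0": {"variant": "human"}},
}
new_annotations = pd.DataFrame({"comparison_id": ["c1", "c2"], "ann_1": ["text_b", "text_b"]})
updated = add_annotators_sketch(votes, {"ann_1": {"variant": "model"}}, new_annotations)
```

Because `merge` and the dictionary unpacking both produce new objects, the original `votes` is left untouched, which matches the "new votes_dict" wording in the Returns line.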
- feedback_forensics.data.dataset_utils.get_annotators_by_type(votes_dict: Dict[str, Any]) -> Dict[str, Dict[str, List[str]]]
Extract all annotators grouped by their type from a votes_dict.
Args: votes_dict: Dictionary containing annotator metadata
Returns: Dictionary mapping variant types to dictionaries with “column_ids” and “visible_names” keys. Missing keys are handled automatically with empty lists via defaultdict.
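The grouping and the defaultdict behaviour can be sketched as follows. Only the output keys `column_ids` and `visible_names` come from the description above; the helper name `annotators_by_type_sketch` and the metadata keys `"annotator_metadata"`, `"variant"`, and `"annotator_visible_name"` are assumptions for illustration.

```python
from collections import defaultdict


def annotators_by_type_sketch(votes_dict):
    """Hypothetical stand-in: group annotator column ids and names by variant."""
    # Unknown variants yield empty lists instead of raising KeyError.
    grouped = defaultdict(lambda: {"column_ids": [], "visible_names": []})
    for column_id, meta in votes_dict.get("annotator_metadata", {}).items():
        variant = meta.get("variant", "unknown")  # assumed metadata key
        grouped[variant]["column_ids"].append(column_id)
        # Fall back to the column id when no visible name is present (assumed key).
        grouped[variant]["visible_names"].append(
            meta.get("annotator_visible_name", column_id)
        )
    return grouped


# Made-up example data.
example_votes = {
    "annotator_metadata": {
        "col_1": {"variant": "principle", "annotator_visible_name": "is more concise"},
        "col_2": {"variant": "principle", "annotator_visible_name": "is more polite"},
        "col_3": {"variant": "human", "annotator_visible_name": "Human vote"},
    }
}
by_type = annotators_by_type_sketch(example_votes)
```

Looking up a variant that does not occur, for example `by_type["model"]`, returns a dictionary with two empty lists in this sketch, so callers can iterate without checking for the key first.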
- feedback_forensics.data.dataset_utils.get_first_json_key_value(file_path)
Get the first key and value from a JSON file.
This is useful for extracting metadata from an AnnotatedPairs file without loading the full dataset.
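One way to read only the first entry of a large JSON object with the standard library is to decode incrementally from a growing buffer. The sketch below is an assumption about how such a helper could work, not the library's code; the name `first_json_key_value_sketch`, the `chunk_size` parameter, and the example file contents are hypothetical.

```python
import json
import os
import tempfile


def _skip_ws(text, pos):
    """Advance pos past JSON whitespace."""
    while pos < len(text) and text[pos] in " \t\r\n":
        pos += 1
    return pos


def first_json_key_value_sketch(file_path, chunk_size=4096):
    """Hypothetical stand-in: return (key, value) of the first entry of a JSON object."""
    decoder = json.JSONDecoder()
    buffer = ""
    with open(file_path, encoding="utf-8") as handle:
        while True:
            chunk = handle.read(chunk_size)
            at_eof = chunk == ""
            buffer += chunk
            try:
                start = buffer.index("{") + 1
                key, pos = decoder.raw_decode(buffer, _skip_ws(buffer, start))
                pos = _skip_ws(buffer, pos)
                if buffer[pos] != ":":
                    raise ValueError("expected ':' after first key")
                value, end = decoder.raw_decode(buffer, _skip_ws(buffer, pos + 1))
                # A number ending exactly at the buffer edge may be truncated,
                # so only accept it once more text follows or the file has ended.
                if end < len(buffer) or at_eof:
                    return key, value
            except (ValueError, IndexError):
                # Not enough text buffered yet; read more unless the file is done.
                if at_eof:
                    raise ValueError("no complete first key/value pair found")


# Made-up file contents: a small first entry followed by a larger second one.
payload = {"metadata": {"version": "2.0"}, "comparisons": list(range(1000))}
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False, encoding="utf-8") as tmp:
    json.dump(payload, tmp)
    example_path = tmp.name

first_pair = first_json_key_value_sketch(example_path, chunk_size=8)
os.remove(example_path)
```

The sketch re-parses the buffer after each chunk, which is simple but wasteful when the first value is itself large; a streaming JSON parser would be a better fit in that case.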