feedback_forensics.data.handler

Module with the main data handler for the app.

The data handler allows efficient loading of AnnotatedPairs datasets and computation of annotation metrics. It provides a unified interface for loading data from different sources and computing metrics.

Module Contents#

Classes#

ColumnHandler

Class to handle data operations (loading, computing metrics) of a single annotation column dataset.

DatasetHandler

Class to handle dataset operations of multi-column annotation datasets.

Functions#

_get_annotator_df_col_names

Get the column names of the annotators in votes_df from a list of visible names.

_split_votes_dict

Split votes data by split_col.

API#

feedback_forensics.data.handler._get_annotator_df_col_names(annotator_visible_names: list[str], votes_dicts: dict[str, dict]) -> list[str]#

Get the column names of the annotators in votes_df from a list of visible names.

Note that this function can be used for annotators shown in either the rows or the columns of the final output plot, because all annotators are columns in the original votes_df.

It also emits a warning if not all annotators are available across all datasets.
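The lookup and warning behaviour described above can be sketched roughly as follows. This is an illustrative stand-in, not the library's implementation: the function name `lookup_annotator_cols` is hypothetical, and it assumes that each votes_dict's "annotator_metadata" maps a column name to a dict with an "annotator_visible_name" entry (an assumed key name).

```python
import warnings


def lookup_annotator_cols(visible_names, votes_dicts):
    """Map visible annotator names to column names (hypothetical sketch).

    Assumption: votes_dicts[name]["annotator_metadata"] maps each column
    name to a dict holding an "annotator_visible_name" entry.
    """
    col_names = []
    for visible_name in visible_names:
        matches = []      # matching column names, in first-seen order
        missing_in = []   # datasets that lack this annotator
        for dataset_name, votes_dict in votes_dicts.items():
            found = [
                col
                for col, meta in votes_dict["annotator_metadata"].items()
                if meta.get("annotator_visible_name") == visible_name
            ]
            if not found:
                missing_in.append(dataset_name)
            for col in found:
                if col not in matches:
                    matches.append(col)
        if missing_in:
            warnings.warn(
                f"Annotator {visible_name!r} not available in: {missing_in}"
            )
        col_names.extend(matches)
    return col_names


# Toy data: "Model X" exists only in dataset_a, so one warning is expected.
votes_dicts = {
    "dataset_a": {
        "annotator_metadata": {
            "col_1": {"annotator_visible_name": "Human"},
            "col_2": {"annotator_visible_name": "Model X"},
        }
    },
    "dataset_b": {
        "annotator_metadata": {
            "col_1": {"annotator_visible_name": "Human"},
        }
    },
}

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    cols = lookup_annotator_cols(["Human", "Model X"], votes_dicts)
```

In this sketch a missing annotator produces a warning rather than an error, so the caller still receives the columns that could be resolved.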

feedback_forensics.data.handler._split_votes_dict(votes_dict: dict, split_col: str, selected_vals: list[str] | None = None) -> dict#

Split votes data by split_col.

First asserts that vote_dfs contains only one votes_df, then splits that votes_df into multiple DataFrames, one per unique value of split_col.

Args:
    votes_dict: Dictionary with keys “df” and “annotator_metadata”.
    split_col: Column to split on.
    selected_vals: Optional list of values to filter split_col by. If None, use all values.

Returns: Dictionary mapping split values to filtered DataFrames
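To make the splitting step concrete, here is a minimal sketch under stated assumptions. It is not the library's code: the name `split_rows_by_col` is hypothetical, and a plain list of row dicts stands in for the DataFrame stored under "df" so that the example runs with the standard library only.

```python
from collections import defaultdict


def split_rows_by_col(votes_dict, split_col, selected_vals=None):
    """Group rows by the value of split_col (hypothetical sketch).

    Assumption: votes_dict["df"] is a list of row dicts standing in
    for a DataFrame.
    """
    groups = defaultdict(list)
    for row in votes_dict["df"]:
        value = row[split_col]
        # Keep every value when no filter is given, else only selected ones.
        if selected_vals is None or value in selected_vals:
            groups[value].append(row)
    return dict(groups)


votes = {
    "df": [
        {"model": "a", "preferred_text": "text_a"},
        {"model": "b", "preferred_text": "text_b"},
        {"model": "a", "preferred_text": "text_b"},
    ],
    "annotator_metadata": {},
}

by_model = split_rows_by_col(votes, "model")
only_a = split_rows_by_col(votes, "model", selected_vals=["a"])
```

Each key of the result is one value of the split column, which matches the documented return shape of a dictionary mapping split values to filtered data.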

class feedback_forensics.data.handler.ColumnHandler(cache: dict | None = None, avail_datasets: dict | None = None, reference_models: list[str] | None = None)#

Class to handle data operations (loading, computing metrics) of a single annotation column dataset.

Initialization

property votes_dict#
property cache#
property avail_datasets#
property df#
property annotator_metadata#
property reference_annotator_col#
property shown_annotator_rows#
property data_path#
property default_annotator_rows#

Default annotator rows for the votes_dict.

property default_annotator_cols#

Default annotator cols for the votes_dict.

This is always the single reference annotator column (in a list).

get_annotator_visible_name(annotator_col: str)#

Get the visible name for an annotator column.

_get_data_path(dataset_name: str)#

Get the data path for a given dataset name.

The path is extracted from the available dataset configs.

Args: dataset_name (str): The name of the dataset to get the path for.

Returns: str: The path to the dataset.

load_data_from_name(name: str)#

Load data from a given dataset name.

load_data_from_path(dataset_path: str | pathlib.Path)#

Load data from a given path.

load_from_votes_dict(votes_dict: dict)#

Load data from a given votes_dict.

set_visible_annotator_rows(annotator_rows_keys: list[str])#

Change the visible annotator rows for the votes_dict.

compute_overall_metrics()#

Compute the overall metrics for the votes_dict.

compute_annotator_metrics()#

Compute the annotator metrics for the votes_dict.

class feedback_forensics.data.handler.DatasetHandler(cache: dict | None = None, avail_datasets: dict | None = None, reference_models: list[str] | None = None)#

Class to handle dataset operations of multi-column annotation datasets.

A dataset consists of one or multiple annotation columns, represented by ColumnHandler objects. Each column can either be a different annotator or the same annotator but on a different data(sub)set.

Initialization

Initialize the dataset handler.

Args:
    cache (dict | None): Cache dictionary to store and retrieve model annotators.
    avail_datasets (dict | None): Dictionary of available datasets.
    reference_models (list[str] | None): List of reference models to use for virtual annotators. Only relevant if using model annotators.

property cache#
property avail_datasets#
property col_handlers#

Get column handlers (each handler is a single dataset).

property first_handler#

Get the first column handler.

property first_handler_name#

Get the name of the first dataset handler.

property is_single_dataset#

Check if the dataset handler is a single dataset.

property votes_dicts#

Get the votes_dicts for all dataset handlers.

property num_cols#

Get the number of columns in the dataset.

add_col_handler(name: str, handler: feedback_forensics.data.handler.ColumnHandler)#

Add a column handler.

reset_handlers()#

Reset the dataset handlers.

add_data_from_name(name: str)#

Add data from a given dataset name.

load_data_from_names(names: list[str])#

Load data from a given list of dataset names.

add_data_from_path(path: str | pathlib.Path, name: str | None = None)#

Add data from a given path.

load_data_from_paths(paths: list[str | pathlib.Path])#

Load data from a given list of paths.

add_data_from_votes_dict(votes_dict: dict, name: str)#

Add data from a given votes_dict.

load_data_from_votes_dicts(votes_dicts: dict[str, dict])#

Load data from a given dictionary of votes_dicts, keyed by name.

get_col_handler(name: str)#

Get the column handler for a given dataset name.

get_available_annotators()#

Get annotators available on all dataset columns.

Some datasets may not have all annotators. This method provides access to the shared annotators.
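The notion of "shared annotators" can be illustrated with a set intersection. This is only a sketch of the idea, not the method's actual implementation: the helper `shared_annotators` and the input shape (dataset name mapped to a list of annotator column names) are assumptions made for illustration.

```python
def shared_annotators(annotators_per_dataset):
    """Return annotators present in every dataset (hypothetical sketch)."""
    per_dataset = [set(cols) for cols in annotators_per_dataset.values()]
    if not per_dataset:
        return []
    # Intersect all sets, then sort for a deterministic order.
    return sorted(set.intersection(*per_dataset))


annotators_per_dataset = {
    "dataset_a": ["human", "model_x", "model_y"],
    "dataset_b": ["human", "model_x"],
    "dataset_c": ["model_x", "human", "model_z"],
}

shared = shared_annotators(annotators_per_dataset)
```

Here `model_y` and `model_z` are dropped because each appears in only one dataset.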

get_available_annotator_visible_names()#

Get the visible names of the available annotators.

set_annotator_rows(annotator_visible_names: list[str] | None = None, annotator_keys: list[str] | None = None)#

Change the visible annotator rows for all dataset handlers.

split_by_col(col: str, selected_vals: list[str] | None = None)#

Split the dataset by a given column to create multiple datasets.

set_annotator_cols(annotator_visible_names: list[str] | None = None, annotator_keys: list[str] | None = None)#

Change the annotator columns for all dataset handlers.

get_overall_metrics()#

Get the overall metrics for all dataset handlers.

get_annotator_metrics()#

Get the annotator metrics for all dataset handlers.

get_annotator_metrics_df(metric_name: str, add_max_diff_col: bool = True, index_col_name: str = 'Annotator')#

Get the annotator metrics for all dataset handlers as a single dataframe.
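As a rough picture of what a combined metrics table could look like, the sketch below builds one row per annotator with one value per dataset. It rests on several assumptions: the function name `metrics_table` is hypothetical, the "max_diff" column name is invented for illustration, the maximum difference is interpreted as the largest value minus the smallest value across datasets, and a list of row dicts stands in for the returned dataframe.

```python
def metrics_table(metrics_by_dataset, add_max_diff_col=True,
                  index_col_name="Annotator"):
    """Combine per-dataset metric values into rows (hypothetical sketch).

    Assumption: metrics_by_dataset maps dataset name -> {annotator: value}.
    """
    dataset_names = list(metrics_by_dataset)

    # Collect annotators in first-seen order across all datasets.
    annotators = []
    for values in metrics_by_dataset.values():
        for annotator in values:
            if annotator not in annotators:
                annotators.append(annotator)

    rows = []
    for annotator in annotators:
        row = {index_col_name: annotator}
        for name in dataset_names:
            row[name] = metrics_by_dataset[name].get(annotator)
        if add_max_diff_col:
            present = [row[n] for n in dataset_names if row[n] is not None]
            # Assumed meaning: spread between largest and smallest value.
            row["max_diff"] = (
                max(present) - min(present) if len(present) > 1 else 0.0
            )
        rows.append(row)
    return rows


metrics_by_dataset = {
    "dataset_a": {"human": 0.5, "model_x": 0.75},
    "dataset_b": {"human": 0.25, "model_x": 0.75},
}

rows = metrics_table(metrics_by_dataset)
```

A difference column of this kind makes it easy to sort annotators by how much their metric varies between datasets.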