feedback_forensics.app.metrics#

Compute metrics

Module Contents#

Functions#

get_agreement

get_acc

Accuracy: proportion of non-irrelevant votes (‘agree’ or ‘disagree’) that agree with original preferences.

get_cohens_kappa

Cohen’s kappa: measures agreement beyond chance.

get_cohens_kappa_randomized

Cohen’s kappa: measures agreement beyond chance.

get_relevance

get_principle_strength

Relevance-weighted Cohen’s kappa: combines Cohen’s kappa with relevance.

get_num_votes

get_agreed

get_disagreed

get_not_applicable

get_metrics

compute_annotator_metrics

get_overall_metrics

Compute overall metrics

ensure_categories_identical

get_default_avail_metrics

Data#

API#

feedback_forensics.app.metrics.DEFAULT_METRIC_NAME#

‘strength’

feedback_forensics.app.metrics.get_agreement(value_counts: pandas.Series, *, annotation_a=None, annotation_b=None) float#
feedback_forensics.app.metrics.get_acc(value_counts: pandas.Series, *, annotation_a=None, annotation_b=None) float#

Accuracy: proportion of non-irrelevant votes (‘agree’ or ‘disagree’) that agree with original preferences.

If there are no non-irrelevant votes, return 0.5.
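The accuracy rule above can be sketched as follows. This is a minimal illustration, not the module's code: the function name `accuracy_sketch` and the index labels `"agree"`, `"disagree"` and `"not_applicable"` are assumptions, and the real `value_counts` index may use different labels.

```python
import pandas as pd


def accuracy_sketch(value_counts: pd.Series) -> float:
    """Hypothetical re-implementation of the accuracy rule described above."""
    # Assumed index labels; the actual module may name them differently.
    agree = int(value_counts.get("agree", 0))
    disagree = int(value_counts.get("disagree", 0))
    relevant = agree + disagree
    if relevant == 0:
        # No non-irrelevant votes: fall back to 0.5, as documented.
        return 0.5
    return agree / relevant


counts = pd.Series({"agree": 30, "disagree": 10, "not_applicable": 60})
print(accuracy_sketch(counts))                                # 0.75
print(accuracy_sketch(pd.Series({"not_applicable": 5})))      # 0.5
```

Votes marked as not applicable do not enter the denominator, so 30 agreeing votes out of 40 relevant ones give 0.75 regardless of the 60 irrelevant votes.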

feedback_forensics.app.metrics.get_cohens_kappa(value_counts: pandas.Series, *, annotation_a=None, annotation_b=None) float#

Cohen’s kappa: measures agreement beyond chance.

This takes into account agreement across categories (including non-text_a/text_b votes).
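For orientation, the textbook definition of Cohen's kappa over two label sequences can be sketched as below. This is the standard formula, not necessarily how `get_cohens_kappa` derives it from `value_counts`; the function name `cohens_kappa_sketch` and the example labels are assumptions.

```python
from collections import Counter


def cohens_kappa_sketch(labels_a: list, labels_b: list) -> float:
    """Textbook Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(labels_a)
    # Observed agreement: share of items where both annotators match.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the two marginal label frequencies,
    # summed over every category (any label counts, not only text_a/text_b).
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        # Degenerate case; returning 0.0 is a convention chosen for this sketch.
        return 0.0
    return (p_o - p_e) / (1 - p_e)


a = ["text_a", "text_b", "text_a", "text_b"]
b = ["text_a", "text_b", "text_b", "text_b"]
print(cohens_kappa_sketch(a, b))  # 0.5
```

Here the observed agreement is 0.75 and the chance agreement is 0.5, which gives a kappa of 0.5.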

feedback_forensics.app.metrics.get_cohens_kappa_randomized(value_counts: pandas.Series, *, annotation_a=None, annotation_b=None) float#

Cohen’s kappa: measures agreement beyond chance.

This version assumes that at least one of the annotators had no access to order information, which fixes the probability of chance agreement at 0.5.

In the regular version, the value is always 0 if one annotator is fully imbalanced, i.e., always selects the same option (e.g., by construction).
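The difference between the two variants can be illustrated with a small sketch. Both helper names are hypothetical, and the randomized variant is written directly from the description above (chance agreement fixed at 0.5) rather than taken from the module.

```python
from collections import Counter


def kappa_regular_sketch(labels_a: list, labels_b: list) -> float:
    """Textbook Cohen's kappa with chance agreement from the marginals."""
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)


def kappa_randomized_sketch(p_observed: float) -> float:
    """Kappa with the chance agreement fixed at 0.5 (assumed from the docs)."""
    return (p_observed - 0.5) / (1 - 0.5)


# Annotator A always selects text_a (for example, by construction).
a = ["text_a", "text_a", "text_a", "text_a"]
b = ["text_a", "text_a", "text_a", "text_b"]

print(kappa_regular_sketch(a, b))     # 0.0
print(kappa_randomized_sketch(0.75))  # 0.5
```

With a constant annotator, the marginal-based chance agreement equals the observed agreement (0.75 in both cases), so the regular kappa collapses to 0 even though the annotators agree on three of four items. Fixing the chance agreement at 0.5 avoids that collapse.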

feedback_forensics.app.metrics.get_relevance(value_counts: pandas.Series, *, annotation_a=None, annotation_b=None) float#
feedback_forensics.app.metrics.get_principle_strength(value_counts: pandas.Series, *, annotation_a=None, annotation_b=None) float#

Relevance-weighted Cohen’s kappa: combines Cohen’s kappa with relevance.

This is computed as (Cohen’s kappa) * relevance, which, with the probability of chance agreement fixed at 0.5, simplifies to 2 * (accuracy - 0.5) * relevance.
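A worked example of the formula above, as a rough sketch. The function name `strength_sketch` is hypothetical, and relevance is assumed here to be the share of votes that are not irrelevant; the documentation for `get_relevance` does not state its definition.

```python
def strength_sketch(agree: int, disagree: int, not_applicable: int) -> float:
    """Hypothetical sketch of 2 * (accuracy - 0.5) * relevance."""
    total = agree + disagree + not_applicable
    relevant = agree + disagree
    if total == 0:
        # No votes at all; returning 0.0 is a choice made for this sketch.
        return 0.0
    # ASSUMPTION: relevance = proportion of votes that are agree/disagree.
    relevance = relevant / total
    # Accuracy falls back to 0.5 when there are no relevant votes.
    accuracy = agree / relevant if relevant else 0.5
    # Kappa with chance agreement fixed at 0.5.
    kappa = 2 * (accuracy - 0.5)
    return kappa * relevance


print(strength_sketch(agree=30, disagree=10, not_applicable=60))  # 0.2
```

Under the assumed definition of relevance, the product reduces to (agree - disagree) / total, here (30 - 10) / 100 = 0.2, so the value ranges from -1 (every vote disagrees) to 1 (every vote agrees).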

feedback_forensics.app.metrics.get_num_votes(value_counts: pandas.Series, *, annotation_a=None, annotation_b=None) int#
feedback_forensics.app.metrics.get_agreed(value_counts: pandas.Series, *, annotation_a=None, annotation_b=None) int#
feedback_forensics.app.metrics.get_disagreed(value_counts: pandas.Series, *, annotation_a=None, annotation_b=None) int#
feedback_forensics.app.metrics.get_not_applicable(value_counts: pandas.Series, *, annotation_a=None, annotation_b=None) int#
feedback_forensics.app.metrics.get_metrics()#
feedback_forensics.app.metrics.compute_annotator_metrics(votes_df: pandas.DataFrame, annotator_metadata: dict, annotator_cols: list[str], ref_annotator_col: str) dict#
feedback_forensics.app.metrics.get_overall_metrics(votes_df: pandas.DataFrame, ref_annotator_col: str) dict#

Compute overall metrics

Includes

  • overall number of votes,

  • percentage of votes that are text_a,

  • percentage of votes that are text_b,

  • average length of the winning text,

  • average length of the losing text.

Args:
  votes_df: pd.DataFrame
  ref_annotator_col: name of the column that contains the reference annotator’s preference, usually “preferred_text”

Returns:
  dict: overall metrics
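The quantities listed above could be computed along these lines. This is a sketch under several assumptions, not the module's implementation: the function name, the `text_a` and `text_b` column names, the `"text_a"`/`"text_b"` preference values, the dictionary keys, and the 0 to 100 percentage scale are all hypothetical.

```python
import pandas as pd


def overall_metrics_sketch(votes_df: pd.DataFrame, ref_annotator_col: str) -> dict:
    """Hypothetical sketch of the overall metrics listed above."""
    pref = votes_df[ref_annotator_col]
    # ASSUMPTION: the two candidate texts live in columns text_a and text_b.
    len_a = votes_df["text_a"].str.len()
    len_b = votes_df["text_b"].str.len()
    a_wins = pref == "text_a"
    b_wins = pref == "text_b"
    decided = a_wins | b_wins  # ignore rows with any other vote value
    # Pick the winner's length per row, then keep only decided rows.
    win_len = len_a.where(a_wins, len_b)[decided]
    lose_len = len_b.where(a_wins, len_a)[decided]
    return {
        "num_votes": len(votes_df),
        "pct_text_a": 100 * a_wins.mean(),
        "pct_text_b": 100 * b_wins.mean(),
        "avg_len_winning": win_len.mean(),
        "avg_len_losing": lose_len.mean(),
    }


df = pd.DataFrame(
    {
        "text_a": ["aaaa", "bb", "cccccc", "d"],
        "text_b": ["xx", "yyyy", "zz", "www"],
        "preferred_text": ["text_a", "text_b", "text_a", "text_a"],
    }
)
print(overall_metrics_sketch(df, "preferred_text"))
```

In this toy frame, three of four votes prefer `text_a` (75%), the winning texts average 3.75 characters, and the losing texts average 2.25 characters.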

feedback_forensics.app.metrics.ensure_categories_identical(df: pandas.DataFrame, col_a: str, col_b: str) pandas.DataFrame#
feedback_forensics.app.metrics.get_default_avail_metrics()#