feedback_forensics.app.metrics


Compute metrics

`get_agreement`
`get_acc`	Accuracy: proportion of non-irrelevant votes (‘agree’ or ‘disagree’) that agree with the original preferences.
`get_cohens_kappa`	Cohen’s kappa: measures agreement beyond chance.
`get_cohens_kappa_randomized`	Cohen’s kappa with the probability of chance agreement fixed at 0.5.
`get_relevance`
`get_principle_strength`	Relevance-weighted Cohen’s kappa: combines Cohen’s kappa with relevance.
`get_num_votes`
`get_agreed`
`get_disagreed`
`get_not_applicable`
`get_metrics`
`compute_annotator_metrics`
`get_overall_metrics`	Compute overall metrics
`ensure_categories_identical`
`get_default_avail_metrics`

feedback_forensics.app.metrics.get_agreement(value_counts: pandas.Series, *, annotation_a=None, annotation_b=None) → float

feedback_forensics.app.metrics.get_acc(value_counts: pandas.Series, *, annotation_a=None, annotation_b=None) → float

Accuracy: proportion of non-irrelevant votes (‘agree’ or ‘disagree’) that agree with the original preferences.

If there are no non-irrelevant votes, returns 0.5 (chance level).
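A minimal sketch of the accuracy rule above, assuming (hypothetically) that `value_counts` is indexed by the labels `"agree"`, `"disagree"` and `"not_applicable"`; the real label strings used by the module may differ, and `acc_sketch` is an illustrative name, not part of the package.

```python
import pandas as pd


def acc_sketch(value_counts: pd.Series) -> float:
    """Share of 'agree'/'disagree' votes that are 'agree'; 0.5 if there are none.

    Assumption: value_counts is indexed by the hypothetical labels
    "agree", "disagree" and "not_applicable".
    """
    agreed = int(value_counts.get("agree", 0))
    disagreed = int(value_counts.get("disagree", 0))
    relevant = agreed + disagreed
    if relevant == 0:
        # No non-irrelevant votes: fall back to chance level.
        return 0.5
    return agreed / relevant


counts = pd.Series({"agree": 6, "disagree": 2, "not_applicable": 12})
print(acc_sketch(counts))  # 0.75
```

Votes outside the two relevant labels are ignored entirely, so 12 irrelevant votes do not dilute the 6-of-8 accuracy.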

feedback_forensics.app.metrics.get_cohens_kappa(value_counts: pandas.Series, *, annotation_a=None, annotation_b=None) → float

Cohen’s kappa: measures agreement beyond chance.

This takes into account agreement across all categories, including votes other than text_a/text_b.
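For reference, the textbook definition is kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected from each annotator's marginal label rates. The sketch below computes it from two label columns with `pandas.crosstab`; it is a generic illustration, not the module's implementation, which works from `value_counts`. The function name and the `text_a`/`text_b` example data are hypothetical.

```python
import pandas as pd


def cohens_kappa_sketch(labels_a: pd.Series, labels_b: pd.Series) -> float:
    """Cohen's kappa over every label that appears in either annotation."""
    # Joint label frequencies as proportions of all votes.
    table = pd.crosstab(labels_a, labels_b, normalize=True)
    # Use one shared label set for rows and columns; unseen pairs count as 0.
    labels = table.index.union(table.columns)
    table = table.reindex(index=labels, columns=labels, fill_value=0.0)
    # Observed agreement: probability mass on the diagonal.
    p_o = float(sum(table.loc[lab, lab] for lab in labels))
    # Chance agreement: products of the two annotators' marginal label rates.
    p_e = float((table.sum(axis=1) * table.sum(axis=0)).sum())
    if p_e == 1.0:
        return float("nan")  # kappa is undefined when chance agreement is 1
    return (p_o - p_e) / (1.0 - p_e)


votes = pd.DataFrame(
    {
        "annotator": ["text_a", "text_a", "text_b", "text_b"],
        "reference": ["text_a", "text_b", "text_b", "text_b"],
    }
)
print(cohens_kappa_sketch(votes["annotator"], votes["reference"]))  # 0.5
```

Here p_o = 0.75 and p_e = 0.5 × 0.25 + 0.5 × 0.75 = 0.5, giving kappa = 0.5. Any extra category (e.g. a non-text_a/text_b vote) simply adds a row and column to the table and contributes to both p_o and p_e.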

feedback_forensics.app.metrics.get_cohens_kappa_randomized(value_counts: pandas.Series, *, annotation_a=None, annotation_b=None) → float

Cohen’s kappa: measures agreement beyond chance, with the probability of chance agreement fixed at 0.5.

This version assumes that at least one of the annotators had no access to order information, so the probability of chance agreement is 0.5.

In the regular version, the value is always 0 if one annotator is completely imbalanced, i.e. always gives the same label (e.g., by construction).
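The contrast described above can be checked numerically. The sketch below is illustrative only: both function names are hypothetical, and it compares raw label columns rather than the module's `value_counts` input. An annotator who always answers `text_a` makes p_e equal p_o under the regular formula, so kappa collapses to 0, while fixing p_e = 0.5 still rewards the agreement.

```python
import pandas as pd


def randomized_kappa_sketch(labels_a: pd.Series, labels_b: pd.Series) -> float:
    """Cohen's kappa with chance agreement fixed at 0.5."""
    p_o = float((labels_a == labels_b).mean())  # observed agreement
    p_e = 0.5  # chance agreement when one annotator has no order information
    return (p_o - p_e) / (1.0 - p_e)


def regular_kappa_sketch(labels_a: pd.Series, labels_b: pd.Series) -> float:
    """Regular kappa with chance agreement from the marginal label rates."""
    p_o = float((labels_a == labels_b).mean())
    rates_a = labels_a.value_counts(normalize=True)
    rates_b = labels_b.value_counts(normalize=True)
    # Labels used by only one annotator contribute nothing to chance agreement.
    p_e = float((rates_a * rates_b).fillna(0.0).sum())
    if p_e == 1.0:
        return float("nan")  # undefined when chance agreement is 1
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical annotator that always prefers text_a (imbalanced by construction).
annotator = pd.Series(["text_a"] * 4)
reference = pd.Series(["text_a", "text_a", "text_a", "text_b"])
print(regular_kappa_sketch(annotator, reference))     # 0.0
print(randomized_kappa_sketch(annotator, reference))  # 0.5
```

Both versions see the same observed agreement of 0.75; only the chance baseline differs (0.75 versus 0.5).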

feedback_forensics.app.metrics.get_relevance(value_counts: pandas.Series, *, annotation_a=None, annotation_b=None) → float

feedback_forensics.app.metrics.get_principle_strength(value_counts: pandas.Series, *, annotation_a=None, annotation_b=None) → float

Relevance-weighted Cohen’s kappa: combines Cohen’s kappa with relevance.

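This is computed as (Cohen’s kappa) * relevance. With the probability of chance agreement fixed at 0.5 (as in get_cohens_kappa_randomized), Cohen’s kappa is (accuracy - 0.5) / (1 - 0.5), so the product simplifies to 2 * (accuracy - 0.5) * relevance.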
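A sketch of the simplification, working from raw vote counts. Two points are assumptions not stated in this reference: relevance is taken to be the share of all votes that are ‘agree’ or ‘disagree’, and relevance is treated as 0 when there are no votes. The function name is hypothetical.

```python
def principle_strength_sketch(agreed: int, disagreed: int, not_applicable: int) -> float:
    """Relevance-weighted kappa, with chance agreement fixed at 0.5."""
    total = agreed + disagreed + not_applicable
    relevant = agreed + disagreed
    # Assumption: relevance = share of votes that are 'agree' or 'disagree'.
    relevance = relevant / total if total else 0.0
    # Accuracy with the 0.5 fallback described for get_acc.
    accuracy = agreed / relevant if relevant else 0.5
    # Kappa with p_e = 0.5 reduces to 2 * (accuracy - 0.5).
    kappa = (accuracy - 0.5) / (1 - 0.5)
    return kappa * relevance


print(principle_strength_sketch(6, 2, 12))  # 0.2
```

With 6 agree, 2 disagree and 12 irrelevant votes, accuracy is 0.75 and relevance 0.4, so the strength is 2 × 0.25 × 0.4 = 0.2. A principle that rarely applies is damped toward 0 even if its accuracy is high.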

feedback_forensics.app.metrics.get_num_votes(value_counts: pandas.Series, *, annotation_a=None, annotation_b=None) → int

feedback_forensics.app.metrics.get_agreed(value_counts: pandas.Series, *, annotation_a=None, annotation_b=None) → int

feedback_forensics.app.metrics.get_disagreed(value_counts: pandas.Series, *, annotation_a=None, annotation_b=None) → int

feedback_forensics.app.metrics.get_not_applicable(value_counts: pandas.Series, *, annotation_a=None, annotation_b=None) → int
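All of the metric functions above take a `value_counts` Series. One plausible way to build such a Series is `pandas.Series.value_counts()` over per-comparison vote labels; the counting helpers then reduce to lookups. The label strings below are hypothetical, and this sketch does not reproduce the module's own code.

```python
import pandas as pd

# Hypothetical per-comparison votes of one annotator relative to the reference.
votes = pd.Series(["agree", "agree", "disagree", "not_applicable", "agree"])
value_counts = votes.value_counts()  # counts per label

num_votes = int(value_counts.sum())
agreed = int(value_counts.get("agree", 0))
disagreed = int(value_counts.get("disagree", 0))
not_applicable = int(value_counts.get("not_applicable", 0))
print(num_votes, agreed, disagreed, not_applicable)  # 5 3 1 1
```

Using `Series.get` with a default of 0 keeps the lookups safe when a label never occurs.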

feedback_forensics.app.metrics.compute_annotator_metrics(votes_df: pandas.DataFrame, annotator_metadata: dict, annotator_cols: list[str], ref_annotator_col: str) → dict

feedback_forensics.app.metrics.get_overall_metrics(votes_df: pandas.DataFrame, ref_annotator_col: str) → dict

Compute overall metrics

Includes

Args:
    votes_df: pd.DataFrame
    ref_annotator_col: name of the column that contains the reference annotator’s preference, usually “preferred_text”

Returns:
    dict: overall metrics

feedback_forensics.app.metrics.ensure_categories_identical(df: pandas.DataFrame, col_a: str, col_b: str) → pandas.DataFrame
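This function is undocumented here. Judging only by its name and signature, one plausible behaviour is to give `col_a` and `col_b` the same categorical dtype, so that per-label counts and contingency tables between the two columns line up even when one column never uses some label. The sketch below illustrates that reading under that assumption; the function name is hypothetical, and it is not the module's implementation.

```python
import pandas as pd


def ensure_categories_identical_sketch(
    df: pd.DataFrame, col_a: str, col_b: str
) -> pd.DataFrame:
    """Return a copy where col_a and col_b share one category set (assumed behaviour)."""
    out = df.copy()  # leave the caller's frame untouched
    # Union of the labels seen in either column (sorted by Index.union).
    categories = pd.Index(out[col_a].dropna().unique()).union(
        pd.Index(out[col_b].dropna().unique())
    )
    for col in (col_a, col_b):
        out[col] = pd.Categorical(out[col], categories=categories)
    return out


df = pd.DataFrame({"a": ["text_a", "text_b"], "b": ["text_a", "text_a"]})
out = ensure_categories_identical_sketch(df, "a", "b")
print(list(out["b"].cat.categories))  # ['text_a', 'text_b']
```

Column `b` now knows about `text_b` even though it never uses it, which is what keeps label-by-label comparisons between the two columns aligned.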