Getting started:
Measure personality encouraged by feedback 🗣️

In this tutorial, we show how to use Feedback Forensics to measure the personality traits encouraged by a pairwise feedback dataset. This analysis lets us check whether a given dataset encourages, for example, more confident or friendlier models. The tutorial is structured as follows:

Load data: Load illustrative example dataset
Annotate data: Annotate personality traits in the dataset
Personality analysis: Analyse your model’s personality with the app 🎉
Bonus - Python analysis: Analyse your model’s personality using Python API

Important

To run all cells, this tutorial requires the OPENROUTER_API_KEY environment variable to be set.
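
A quick check before running the annotation step can save a failed run. The snippet below is a minimal sketch: the helper name require_env_var is hypothetical and not part of Feedback Forensics; it only reads the process environment.

```python
import os


def require_env_var(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error.

    Hypothetical helper for this tutorial; not part of Feedback Forensics.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; set it before running the annotation step."
        )
    return value


# Usage (uncomment in your notebook):
# require_env_var("OPENROUTER_API_KEY")
```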

1. Setting up the data

To apply the Feedback Forensics analysis, we need a CSV dataset with the following columns: text_a, text_b, and preferred_text. The columns prompt, model_a, and model_b can optionally be included but are not required. Below, we create an illustrative mini dataset to run our analysis on. Replace it with your own data as needed.

import pandas as pd
import pathlib

tmp_dir = pathlib.Path("tmp")
tmp_dir.mkdir(exist_ok=True)

example_df = pd.DataFrame(
    [
        {"prompt": "How are you?", "text_a": "Ok", "text_b": "I'm good, thanks for asking.", "preferred_text": "text_b"},
        {"prompt": "What's a good name for a cat?", "text_a": "What a stupid question!", "text_b": "I'd suggest 'Whiskers'.", "preferred_text": "text_b"},
        {"prompt": "Is Feedback Forensics a great tool?", "text_a": "Yes, it is a great tool. Thank you for asking.", "text_b": "Yes.", "preferred_text": "text_a"},
    ]
)
example_df.to_csv(tmp_dir / "example_data.csv", index=False)
example_df.head()

	prompt	text_a	text_b	preferred_text
0	How are you?	Ok	I'm good, thanks for asking.	text_b
1	What's a good name for a cat?	What a stupid question!	I'd suggest 'Whiskers'.	text_b
2	Is Feedback Forensics a great tool?	Yes, it is a great tool. Thank you for asking.	Yes.	text_a
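
When swapping in your own data, it can help to confirm the required columns are present before annotating. The following is a minimal sketch, not a Feedback Forensics function: validate_pairwise_rows is a hypothetical helper, and the set of allowed preferred_text values ("text_a", "text_b") is an assumption inferred from the example rows above.

```python
REQUIRED_COLUMNS = ("text_a", "text_b", "preferred_text")
# Assumption: inferred from the example rows, not from the library's docs.
ALLOWED_PREFERENCES = {"text_a", "text_b"}


def validate_pairwise_rows(rows):
    """Return a list of human-readable problems; empty if the rows look usable.

    `rows` is an iterable of dicts, one per comparison.
    """
    problems = []
    for index, row in enumerate(rows):
        missing = [col for col in REQUIRED_COLUMNS if col not in row]
        if missing:
            problems.append(f"row {index}: missing columns {missing}")
            continue
        if row["preferred_text"] not in ALLOWED_PREFERENCES:
            problems.append(
                f"row {index}: unexpected preferred_text {row['preferred_text']!r}"
            )
    return problems
```

With the DataFrame from above, the rows can be passed as validate_pairwise_rows(example_df.to_dict("records")).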

2. Annotating the data

Next, we use the ff-annotate CLI to collect annotations for personality traits on our dataset. The annotated results will be saved to the tmp/annotated_data/ directory.

!ff-annotate --datapath tmp/example_data.csv --output-dir tmp/annotated_data
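
Once the command finishes, you may want to confirm that result files were written before moving on. This is a minimal sketch with a hypothetical helper, find_result_files; it assumes results land as JSON files in a results/ subfolder of the output directory, which is inferred from the path used in the next step rather than from documented behaviour.

```python
from pathlib import Path


def find_result_files(output_dir):
    """List JSON files under <output_dir>/results, sorted by name.

    Assumption: the results/ subfolder layout is inferred from the path
    used later in this tutorial. Returns an empty list if it is missing.
    """
    results_dir = Path(output_dir) / "results"
    if not results_dir.is_dir():
        return []
    return sorted(results_dir.glob("*.json"))


# Usage (uncomment in your notebook):
# print(find_result_files("tmp/annotated_data"))
```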

3. Visualising encouraged personality with app

Finally, we use the Feedback Forensics App to visualise the results.

feedback-forensics -d tmp/annotated_data/results/070_annotations_train_ap.json

Illustrative screenshot of the visualisation (an online example of a visualised result on Chatbot Arena data is also available).

Bonus: Computing metrics in Python

As an alternative to the app, we can also use the Feedback Forensics Python API to compute personality metrics.

import feedback_forensics as ff
import pandas as pd

# load data
dataset = ff.DatasetHandler()
dataset.add_data_from_path("tmp/annotated_data/results/070_annotations_train_ap.json")

# compute metrics
annotator_metrics = dataset.get_annotator_metrics()

📜  | INFO | AnnotatedPairs format version: 2.0

📜  | INFO | Removing 0 comparisons with empty responses. Fraction affected: 0.00%

📜  | INFO | No target models available, creating no model identity annotators.

📜  | INFO | Loaded data from path: tmp/annotated_data/results/070_annotations_train_ap.json

# Get the strength of each personality trait (keyed by trait description)
strength = pd.Series(annotator_metrics["070_annotations_train_ap"]["metrics"]["strength"])

# Print the 5 traits with the highest strength
print(f"\n## Top 5 encouraged personality traits (by strength):\n{strength.sort_values(ascending=False).head(5)}")

# Print the 5 traits with the lowest strength
print(f"\n## Bottom 5 personality traits (by strength):\n{strength.sort_values(ascending=True).head(5)}")

## Top 5 encouraged personality traits (by strength):
is more polite                               1.000000
has a friendlier tone                        1.000000
is more empathetic to the user               1.000000
uses more personal pronouns (I, we, you)     1.000000
compliments the user's question or prompt    0.666667
dtype: float64

## Bottom 5 personality traits (by strength):
is more concise                   -0.666667
has a more avoidant tone          -0.666667
includes inappropriate language   -0.333333
is more offensive                 -0.333333
refuses to answer the question    -0.333333
dtype: float64
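
To read the output above at a glance, the traits can be grouped by the sign of their strength. The sketch below assumes that a positive strength means the dataset encourages a trait and a negative one means it discourages it, which matches the example data but is an interpretation rather than a documented guarantee. The helper split_by_sign is hypothetical and works on a plain dict, which a pandas Series provides via its to_dict() method; the values are copied from the printed output.

```python
def split_by_sign(strength):
    """Split a {trait: strength} mapping into (encouraged, discouraged) dicts.

    Assumption: positive strength = encouraged, negative = discouraged.
    Traits with a strength of exactly zero are left out of both groups.
    """
    encouraged = {trait: value for trait, value in strength.items() if value > 0}
    discouraged = {trait: value for trait, value in strength.items() if value < 0}
    return encouraged, discouraged


# A few values copied from the output above.
example_strength = {
    "is more polite": 1.0,
    "has a friendlier tone": 1.0,
    "is more concise": -0.666667,
    "is more offensive": -0.333333,
}
encouraged, discouraged = split_by_sign(example_strength)
print("Encouraged:", sorted(encouraged))
print("Discouraged:", sorted(discouraged))
```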
