Getting started:
Measure your model’s personality 🤖

Feedback Forensics can be used to measure your model’s personality relative to other models. This tutorial will explain how in four steps (plus a bonus step):

Load data: Load other models’ responses (including prompts) from HuggingFace
Generate responses: Create responses to the same prompts with your model
Annotate responses: Combine your model’s and the other models’ responses into a single dataset and annotate it
Personality analysis: Analyse your model’s personality with the app 🎉
Bonus - Python analysis: Analyse your model’s personality using the Python API

Important

To run all cells, this tutorial requires the OPENROUTER_API_KEY variable to be set.
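
A minimal sketch of a fail-fast check for this requirement, so that a missing key is reported before any API calls are made. The helper name `require_env` is hypothetical and not part of Feedback Forensics.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.environ.get(name, "")
    if not value:
        raise RuntimeError(
            f"{name} is not set; export it in your shell or set os.environ[{name!r}] first."
        )
    return value


# Usage (uncomment in the notebook):
# api_key = require_env("OPENROUTER_API_KEY")
```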

1. Load data

import datasets

# Load the shared prompts; list(...) ensures a plain Python list for random.sample below
prompts = list(datasets.load_dataset("rdnfn/ff-model-personality", split="prompts_v1")["text"])

2. Generate your model’s responses

Next we generate your model’s responses on the same prompts. Overall there are 500 unique prompts in this dataset.

Note

The sample size in this tutorial is intentionally tiny by default to keep cost low. Change the NUM_PROMPTS variable below to increase the sample size.
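
The seeded subsampling used in the next cell can also be wrapped in a small function. This is a sketch only: `sample_prompts` is a hypothetical helper, and it uses a local `random.Random` instance instead of seeding the global generator, so other code that relies on `random` is unaffected. The toy prompts are placeholders, not the dataset's prompts.

```python
import random


def sample_prompts(prompts: list[str], num_prompts: int, seed: int = 42) -> list[str]:
    """Draw up to `num_prompts` prompts without replacement, reproducibly for a given seed."""
    rng = random.Random(seed)  # local RNG; the global random state is left untouched
    return rng.sample(prompts, min(num_prompts, len(prompts)))


toy_prompts = [f"prompt {i}" for i in range(500)]  # placeholder prompts for illustration
subset = sample_prompts(toy_prompts, num_prompts=10)
```

Calling the function again with the same seed returns the same subset, which keeps reruns of the tutorial comparable.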

from feedback_forensics.tools.ff_modelgen import run_model_on_prompts_async
import random
import pathlib
import pandas as pd

# If not set elsewhere, set openrouter api key
# import os
# os.environ["OPENROUTER_API_KEY"] = "..."

NUM_PROMPTS = 10 # to keep this example cheap, increase for representative results
random.seed(42)
prompts = random.sample(prompts, NUM_PROMPTS)

# This will generate a jsonl file with an API model's responses
# Replace the model name with another API model or generate your
# responses separately with your own endpoint.

model = "openrouter/openai/gpt-4o-mini"
output_path = pathlib.Path("tmp/model_responses/")
output_path.mkdir(parents=True, exist_ok=True)

await run_model_on_prompts_async(
    prompts=prompts,
    model_name=model,
    output_path=output_path,
    max_concurrent=30, # Adjust based on your needs
    max_tokens=4096,
)

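# Path where the responses are expected to be written
responses_file = output_path / "generations" / (model + ".jsonl")
if not responses_file.exists():
    # Without this check, some pandas versions parse the path string itself as JSON
    # and raise a confusing ValueError instead of reporting the missing file
    raise FileNotFoundError(
        f"Expected responses at {responses_file}; check where run_model_on_prompts_async wrote its output"
    )
generations = pd.read_json(responses_file, lines=True)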

Completed processing 10 prompts with model openrouter/openai/gpt-4o-mini
Total time: 28.40s, Average: 2.84s per prompt

generations.head(2)
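
If `pd.read_json` fails on the responses file, it can help to read the JSON Lines file directly with the standard library. Some pandas versions treat a non-existent `.jsonl` path as a literal JSON string, which produces a `ValueError` rather than a missing-file error. The sketch below is an assumption-laden alternative: `read_jsonl` is a hypothetical helper, and the `prompt` and `response` keys are taken from how `generations` is used later in this tutorial.

```python
import json
import pathlib


def read_jsonl(path: pathlib.Path) -> list[dict]:
    """Read a JSON Lines file into a list of dicts, failing clearly if the file is missing."""
    if not path.is_file():
        raise FileNotFoundError(f"No responses file at {path}")
    with path.open(encoding="utf-8") as f:
        # One JSON object per non-empty line
        return [json.loads(line) for line in f if line.strip()]


# Usage (uncomment in the notebook; the path is the one assumed in the cell above):
# rows = read_jsonl(pathlib.Path("tmp/model_responses/generations") / (model + ".jsonl"))
# generations = pd.DataFrame(rows)
```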

### Merge generations into pairwise dataset to annotate

data = []

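# NOTE: `df` is not defined earlier in this tutorial. It is assumed to be a pandas
# DataFrame of reference comparisons with columns "prompt", "response_a" and
# "response_b", where each response is a dict with "model" and "text" keys.

# get gpt-4o generations to compare against
gpt4o_name = "openai/gpt-4o-2024-08-06"
prompt_df = df[df["prompt"].isin(prompts)].copy()

def get_gpt4o_text(row):
    """Return the GPT-4o response text of a row, or None if neither response is from GPT-4o."""
    for response in (row["response_a"], row["response_b"]):
        if response["model"] == gpt4o_name:
            return response["text"]
    return None

prompt_df["gpt4o_response"] = prompt_df.apply(get_gpt4o_text, axis=1)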

# compile responses
for prompt in prompts:
    # get your model generations
    model_response = generations[generations["prompt"] == prompt].iloc[0]["response"]
    gpt4o_response = prompt_df[prompt_df["prompt"] == prompt]["gpt4o_response"].iloc[0]

    data.append({
        "prompt": prompt,
        "text_a": model_response,
        "text_b": gpt4o_response,
        "model_a": model,
        "model_b": gpt4o_name
    })

comparison_df = pd.DataFrame(data)
comparison_df.to_csv("tmp/model_comparison.csv", index=False)

comparison_df.head(2)
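
Before annotating, it may be worth checking the comparison file for gaps, since a prompt without a GPT-4o response ends up with an empty `text_b` field. The following is a sketch using only the standard library; `check_comparison_csv` is a hypothetical helper, and the column names are the ones written by the cell above.

```python
import csv
import io

REQUIRED_COLUMNS = {"prompt", "text_a", "text_b", "model_a", "model_b"}


def check_comparison_csv(csv_text: str) -> list[str]:
    """Return a list of problems found in a pairwise comparison CSV (empty if none)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    problems = []
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        problems.append(f"missing columns: {sorted(missing)}")
        return problems
    for i, row in enumerate(reader):
        for col in ("text_a", "text_b"):
            # Flag responses that are absent or whitespace-only
            if not (row[col] or "").strip():
                problems.append(f"row {i}: empty {col}")
    return problems


# Usage (uncomment in the notebook):
# import pathlib
# print(check_comparison_csv(pathlib.Path("tmp/model_comparison.csv").read_text(encoding="utf-8")))
```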

3. Annotating the data

Next, we use the ff-annotate CLI to collect personality annotations for our dataset. The annotated results will be saved to the tmp/annotated_model_data/ directory.

!ff-annotate --datapath tmp/model_comparison.csv --output-dir tmp/annotated_model_data
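
The `!` prefix runs the command in a shell from within the notebook. Outside a notebook, the same command could be launched from Python with `subprocess`. This is a sketch under assumptions: `build_annotate_command` and `run_annotate` are hypothetical helpers, and only the two flags shown above are used.

```python
import shutil
import subprocess


def build_annotate_command(datapath: str, output_dir: str) -> list[str]:
    """Build the ff-annotate command line using only the two flags shown above."""
    return ["ff-annotate", "--datapath", datapath, "--output-dir", output_dir]


def run_annotate(datapath: str, output_dir: str) -> None:
    """Run ff-annotate, raising if the CLI is not installed or exits with an error."""
    cmd = build_annotate_command(datapath, output_dir)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("ff-annotate not found on PATH; is feedback-forensics installed?")
    subprocess.run(cmd, check=True)  # check=True raises CalledProcessError on non-zero exit


# Usage (uncomment to run; this calls the annotation API and may incur cost):
# run_annotate("tmp/model_comparison.csv", "tmp/annotated_model_data")
```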

Illustrative screenshot of the visualisation (see here for an online example of a visualised result comparing models):

4. Visualising model personality with app

Finally, we use the Feedback Forensics App to visualise the results.

feedback-forensics -d tmp/annotated_model_data/results/070_annotations_train_ap.json

Bonus: Computing metrics in Python (Optional)

We now have personality annotations for both models. As an optional extra, we can use the Feedback Forensics Python API to compute personality metrics from them.

import feedback_forensics as ff
import pandas as pd

# load data
dataset = ff.DatasetHandler()
dataset.add_data_from_path("tmp/annotated_model_data/results/070_annotations_train_ap.json")

# set annotators to two models we consider here
model_annotators = [ann for ann in dataset.get_available_annotator_visible_names() if "Model" in ann]
dataset.set_annotator_cols(annotator_visible_names=model_annotators)

# compute metrics
annotator_metrics = dataset.get_annotator_metrics()

# Get strength of each personality trait for your model
# (`model` is the model name defined in step 2, without the "openrouter/" prefix)
strength = pd.Series(annotator_metrics[f"Model: {model.replace('openrouter/','')}"]["metrics"]["strength"])

# Get top 5 personality traits (by strength)
print(f"\n## Top 5 personality traits in {model} (by strength):\n{strength.sort_values(ascending=False).head(5)}")

# Get bottom 5 personality traits (by strength)
print(f"\n## Bottom 5 personality traits in {model} (by strength):\n{strength.sort_values(ascending=True).head(5)}")
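
The top and bottom rankings above can also be obtained with `Series.nlargest` and `Series.nsmallest`, which avoid sorting the full series. The sketch below uses made-up trait names and values purely for illustration; they are not results from Feedback Forensics.

```python
import pandas as pd

# Hypothetical trait names and strength values, for illustration only
strength = pd.Series(
    {
        "is more concise": 0.12,
        "uses more emojis": -0.30,
        "is more formal": 0.25,
        "is more verbose": -0.05,
    }
)

top = strength.nlargest(2)      # same ranking as sort_values(ascending=False).head(2)
bottom = strength.nsmallest(2)  # same ranking as sort_values(ascending=True).head(2)

print(top)
print(bottom)
```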
