Measure your model’s personality 🤖
Feedback Forensics can be used to measure your model’s personality relative to other models. This tutorial explains how in four steps (plus a bonus step):
1. Load data: Load other models’ responses (including prompts) from HuggingFace
2. Generate responses: Create responses to the same prompts with your model
3. Annotate responses: Combine your model’s and the other models’ responses in a single dataset and annotate that dataset
4. Personality analysis: Analyse your model’s personality with the app 🎉
Bonus - Python analysis: Analyse your model’s personality using the Python API
Important
To run all cells, this tutorial requires the OPENROUTER_API_KEY environment variable to be set.
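A quick way to confirm the key is available before running the cells below is to check the environment up front. This is a minimal sketch using only the standard library; the helper name `require_env` is hypothetical and not part of Feedback Forensics.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set or is empty.")
    return value


# Example usage (uncomment once the key is configured):
# api_key = require_env("OPENROUTER_API_KEY")
```

Failing early like this avoids discovering a missing key only after the generation step has started.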
1. Load data
import datasets
# Load the prompts as a plain Python list so they can be sampled later
prompts = list(datasets.load_dataset("rdnfn/ff-model-personality", split="prompts_v1")["text"])
2. Generate your model’s responses
Next we generate your model’s responses on the same prompts. Overall there are 500 unique prompts in this dataset.
Note
The sample size in this tutorial is intentionally tiny by default to keep cost low. Change the NUM_PROMPTS variable below to increase the sample size.
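The sampling step can be sketched in isolation to show how a fixed seed keeps the subset reproducible when NUM_PROMPTS changes. This is an illustrative sketch on stand-in data: the helper `sample_prompts` and the `toy_prompts` list are hypothetical and not part of the tutorial's dataset or of Feedback Forensics.

```python
import random


def sample_prompts(prompts, num_prompts, seed=42):
    """Draw a reproducible subset of prompts without replacement."""
    prompts = list(prompts)
    # random.sample raises ValueError if asked for more items than exist,
    # so cap the request at the number of available prompts.
    k = min(num_prompts, len(prompts))
    # A dedicated Random instance avoids touching the global random state.
    return random.Random(seed).sample(prompts, k)


# Stand-in for the real prompts (hypothetical data)
toy_prompts = [f"prompt {i}" for i in range(500)]
subset_a = sample_prompts(toy_prompts, 10)
subset_b = sample_prompts(toy_prompts, 10)
print(subset_a == subset_b)  # the same seed yields the same subset
```

Capping the sample size means a NUM_PROMPTS value larger than the dataset simply returns every prompt instead of raising an error.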
from feedback_forensics.tools.ff_modelgen import run_model_on_prompts_async
import random
import pathlib
import pandas as pd
# If not set elsewhere, set openrouter api key
# import os
# os.environ["OPENROUTER_API_KEY"] = "..."
NUM_PROMPTS = 10 # to keep this example cheap, increase for representative results
random.seed(42)
prompts = random.sample(prompts, NUM_PROMPTS)
# This will generate a jsonl file with an API model's responses
# Replace the model name with another API model or generate your
# responses separately with your own endpoint.
model = "openrouter/openai/gpt-4o-mini"
output_path = pathlib.Path("tmp/model_responses/")
output_path.mkdir(parents=True, exist_ok=True)
# Top-level `await` works in Jupyter; in a plain script, wrap the call in asyncio.run(...)
await run_model_on_prompts_async(
    prompts=prompts,
    model_name=model,
    output_path=output_path,
    max_concurrent=30,  # Adjust based on your needs
    max_tokens=4096,
)
responses_files = output_path / "generations" / (model + ".jsonl")
generations = pd.read_json(responses_files, lines=True)
generations.head(2)
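Before merging, it can help to confirm that every sampled prompt actually received a response, since the merge step below indexes the first matching row and would fail on a missing prompt. The following is a minimal sketch on toy data, assuming the generations table has `prompt` and `response` columns as used later in this tutorial; the helper `find_missing_prompts` is hypothetical.

```python
import pandas as pd


def find_missing_prompts(prompts, generations):
    """Return prompts with no non-empty response in the generations table."""
    # Keep only rows with a usable (non-null, non-blank) response
    has_text = generations["response"].fillna("").astype(str).str.strip() != ""
    answered = set(generations.loc[has_text, "prompt"])
    return [p for p in prompts if p not in answered]


# Toy data standing in for the real prompts and generations
toy_prompts = ["What is 2+2?", "Name a colour.", "Tell me a joke."]
toy_generations = pd.DataFrame(
    {
        "prompt": ["What is 2+2?", "Name a colour."],
        "response": ["4", "   "],
    }
)
print(find_missing_prompts(toy_prompts, toy_generations))
```

In the toy data, one prompt has only a blank response and another has no row at all, so both are reported as missing.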
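### Merge generations into pairwise dataset to annotate
data = []
# NOTE: `df` is not defined in the cells shown here. It is assumed to be a
# DataFrame of the other models' pairwise responses from the same dataset,
# with the columns "prompt", "response_a" and "response_b".
# get gpt-4o generations to compare against
gpt4o_name = "openai/gpt-4o-2024-08-06"
prompt_df = df[df["prompt"].isin(prompts)].copy()
prompt_df["responses"] = prompt_df.apply(lambda x: [x["response_a"], x["response_b"]], axis=1)
# pick the text of whichever response was written by gpt-4o (None if neither was)
prompt_df["gpt4o_response"] = prompt_df["responses"].apply(
    lambda pair: next((r["text"] for r in pair if r["model"] == gpt4o_name), None)
)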
# compile responses
for prompt in prompts:
    # get your model generations
    model_response = generations[generations["prompt"] == prompt].iloc[0]["response"]
    gpt4o_response = prompt_df[prompt_df["prompt"] == prompt]["gpt4o_response"].iloc[0]
    data.append({
        "prompt": prompt,
        "text_a": model_response,
        "text_b": gpt4o_response,
        "model_a": model,
        "model_b": gpt4o_name,
    })
comparison_df = pd.DataFrame(data)
comparison_df.to_csv("tmp/model_comparison.csv", index=False)
comparison_df.head(2)
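Because the reference response is set to None when neither response in a pair comes from the reference model, it is worth checking the merged table for gaps before annotating. The sketch below runs on toy data and checks only for the columns written by the code above; the helper `check_comparison_df` is hypothetical, and whether the annotation tool requires exactly these columns is an assumption.

```python
import pandas as pd

# Columns written by the merge step in this tutorial
REQUIRED_COLUMNS = ["prompt", "text_a", "text_b", "model_a", "model_b"]


def check_comparison_df(df):
    """Return a list of human-readable problems found in a pairwise dataset."""
    problems = []
    missing_cols = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing_cols:
        problems.append(f"missing columns: {missing_cols}")
        return problems
    # Count rows where either response text is absent
    for col in ("text_a", "text_b"):
        n_missing = int(df[col].isna().sum())
        if n_missing:
            problems.append(f"{n_missing} row(s) have no {col}")
    return problems


# Toy pairwise data (hypothetical); the second row lacks a reference response
toy_df = pd.DataFrame(
    {
        "prompt": ["p1", "p2"],
        "text_a": ["answer 1", "answer 2"],
        "text_b": ["reference 1", None],
        "model_a": ["my-model", "my-model"],
        "model_b": ["reference-model", "reference-model"],
    }
)
print(check_comparison_df(toy_df))
```

An empty list means no problems were found; rows flagged here could be dropped or regenerated before running the annotation step.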
3. Annotating the data
Then we use the ff-annotate CLI to collect personality annotations for our dataset. The annotated results will be saved to the tmp/annotated_model_data/ directory.
!ff-annotate --datapath tmp/model_comparison.csv --output-dir tmp/annotated_model_data
Illustrative screenshot of the visualisation (see here for an online example of a visualised result comparing models):
4. Visualising model personality with app
Finally, we use the Feedback Forensics App to visualise the results.
feedback-forensics -d tmp/annotated_model_data/results/070_annotations_train_ap.json
Bonus: Computing metrics in Python (Optional)
With the personality annotations in place, we can also use the Feedback Forensics Python API to compute personality metrics.
import feedback_forensics as ff
import pandas as pd
# load data
dataset = ff.DatasetHandler()
dataset.add_data_from_path("tmp/annotated_model_data/results/070_annotations_train_ap.json")
# set annotators to the two models we consider here
model_annotators = [ann for ann in dataset.get_available_annotator_visible_names() if "Model" in ann]
dataset.set_annotator_cols(annotator_visible_names=model_annotators)
# compute metrics
annotator_metrics = dataset.get_annotator_metrics()
# Get strength of each personality trait (`model` is the model name set in step 2)
strength = pd.Series(annotator_metrics[f"Model: {model.replace('openrouter/','')}"]["metrics"]["strength"])
# Get top and bottom 5 personality traits (by strength)
print(f"\n## Top 5 personality traits in {model} (by strength):\n{strength.sort_values(ascending=False).head(5)}")
print(f"\n## Bottom 5 personality traits in {model} (by strength):\n{strength.sort_values(ascending=True).head(5)}")
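The ranking step can be tried on its own with a small series. This is an illustrative sketch only: the trait names and strength values below are made up and do not come from any real annotation run. It uses pandas `nlargest` and `nsmallest`, which serve the same purpose here as sorting and taking the head.

```python
import pandas as pd

# Made-up strength values for hypothetical traits, for illustration only
toy_strength = pd.Series(
    {
        "is more verbose": 0.30,
        "is more formal": -0.10,
        "uses more emojis": 0.05,
        "is more concise": -0.25,
    }
)

# Highest and lowest values, each returned in ranked order
top = toy_strength.nlargest(2)
bottom = toy_strength.nsmallest(2)
print(top.index.tolist())
print(bottom.index.tolist())
```

Using `nlargest` and `nsmallest` avoids sorting the full series twice, which matters little at this size but keeps the intent explicit.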