feedback_forensics.tools.ff_model_comparison

Tools to compare models’ personalities.

Uses generations created by ff_modelgen.

Module Contents

Functions

create_pairwise_datasets

Create a pairwise dataset for each model against the reference models.

compare_models

Compare models’ personalities.

run

CLI entry point for model comparison.

API

feedback_forensics.tools.ff_model_comparison.create_pairwise_datasets(model_names: list[str], reference_models: list[str], generations: dict[str, pandas.DataFrame], output_path: str = 'output/tmp/pairwise_datasets')

Create a pairwise dataset for each model against the reference models.

Args:
    model_names (list[str]): List of model names to compare.
    reference_models (list[str]): List of reference model names to compare against.
    generations (dict[str, pd.DataFrame]): Dictionary of generations for each model.
    output_path (str): Path to save the pairwise datasets.
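The pairing step described above can be sketched as follows. This is a minimal illustration, not the package's implementation: plain lists of dicts stand in for the `pandas.DataFrame` values, and the field names (`prompt`, `response`, `text_a`, `text_b`, `model_a`, `model_b`), the helper name `build_pairwise_rows`, the model names, and the choice to skip self-comparisons are all assumptions made for the example.

```python
from itertools import product


def build_pairwise_rows(model_names, reference_models, generations):
    """Pair each model's responses with each reference model's responses, prompt by prompt.

    Hypothetical helper: `generations` maps a model name to a list of
    {"prompt": ..., "response": ...} dicts (a stand-in for a DataFrame).
    """
    datasets = {}
    for model, reference in product(model_names, reference_models):
        if model == reference:
            continue  # assumption: a model is not compared against itself
        # Index the reference responses by prompt so the join is a dict lookup.
        ref_by_prompt = {row["prompt"]: row["response"] for row in generations[reference]}
        rows = []
        for row in generations[model]:
            if row["prompt"] not in ref_by_prompt:
                continue  # skip prompts the reference model has no response for
            rows.append(
                {
                    "prompt": row["prompt"],
                    "text_a": row["response"],
                    "text_b": ref_by_prompt[row["prompt"]],
                    "model_a": model,
                    "model_b": reference,
                }
            )
        datasets[(model, reference)] = rows
    return datasets


# Toy data with hypothetical model names.
generations = {
    "model-a": [
        {"prompt": "Hi", "response": "Hello!"},
        {"prompt": "Bye", "response": "See you."},
    ],
    "ref-model": [{"prompt": "Hi", "response": "Good day."}],
}
pairs = build_pairwise_rows(["model-a"], ["ref-model"], generations)
```

Each (model, reference) key maps to one pairwise dataset, which mirrors the "one dataset per model against the reference models" behaviour the docstring describes.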

feedback_forensics.tools.ff_model_comparison.compare_models(prompts: list[str], model_names: list[str], reference_models: list[str], output_path: str = 'output/')

Compare models’ personalities.

First, creates generations for the given set of prompts. Next, creates a pairwise dataset for each model against the reference models. Finally, annotates the pairwise datasets using the inverse-cai package.

Args:
    prompts: List of prompts to generate responses for.
    model_names: List of model names to compare.
    reference_models: List of reference model names to compare against.
    output_path: Path to save the results.
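A call using the documented signature might look like the sketch below. The prompts and model names are hypothetical placeholders; the identifier format that the generation step actually expects is not stated here, so substitute names valid for your setup. The import is deferred into `main()` so the snippet can be loaded without the package installed.

```python
def build_comparison_args():
    """Collect the keyword arguments documented for compare_models."""
    return {
        "prompts": [
            "Explain recursion to a beginner.",
            "Write a short poem about autumn.",
        ],
        # Hypothetical model identifiers; replace with names your setup supports.
        "model_names": ["example/model-a", "example/model-b"],
        "reference_models": ["example/reference-model"],
        # Documented default value of output_path.
        "output_path": "output/",
    }


def main():
    # Deferred import: only needed when the comparison is actually run.
    from feedback_forensics.tools.ff_model_comparison import compare_models

    compare_models(**build_comparison_args())
```

Calling `main()` runs the full generate, pair, and annotate sequence, so it presumably needs whatever model-provider credentials the generation step relies on.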

feedback_forensics.tools.ff_model_comparison.run()

CLI entry point for model comparison.