feedback_forensics.tools.ff_model_comparison
Tools to compare models’ personalities.
Uses generations created by ff_modelgen.
Module Contents
Functions
create_pairwise_datasets | Create a pairwise dataset for each model against the reference models.
compare_models | Compare models’ personalities.
run | CLI entry point for model comparison.
API
- feedback_forensics.tools.ff_model_comparison.create_pairwise_datasets(model_names: list[str], reference_models: list[str], generations: dict[str, pandas.DataFrame], output_path: str = 'output/tmp/pairwise_datasets')
Create a pairwise dataset for each model against the reference models.
Args:
  model_names (list[str]): List of model names to compare.
  reference_models (list[str]): List of reference model names to compare against.
  generations (dict[str, pd.DataFrame]): Dictionary mapping each model name to a DataFrame of its generations.
  output_path (str): Path to save the pairwise datasets. Defaults to 'output/tmp/pairwise_datasets'.
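The pairing idea can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the package's implementation: it uses lists of dicts in place of pandas DataFrames, assumes each generation row carries hypothetical "prompt" and "response" fields, and writes one CSV file per (model, reference) pair using a made-up "<model>_vs_<reference>.csv" naming scheme. The helper name pair_with_references is also hypothetical.

```python
import csv
from pathlib import Path

# Hypothetical stand-in for dict[str, pd.DataFrame]: each model name maps to
# a list of rows with assumed "prompt" and "response" keys.
Generations = dict[str, list[dict[str, str]]]
FIELDS = ["prompt", "model_a", "text_a", "model_b", "text_b"]


def pair_with_references(
    model_names: list[str],
    reference_models: list[str],
    generations: Generations,
    output_path: str,
) -> dict[tuple[str, str], list[dict[str, str]]]:
    """Sketch: build one pairwise table per (model, reference) combination."""
    out_dir = Path(output_path)
    out_dir.mkdir(parents=True, exist_ok=True)
    datasets: dict[tuple[str, str], list[dict[str, str]]] = {}

    for model in model_names:
        # Index this model's responses by prompt so they can be matched up.
        model_responses = {row["prompt"]: row["response"] for row in generations[model]}
        for ref in reference_models:
            if ref == model:
                continue  # comparing a model with itself is uninformative
            rows = []
            for ref_row in generations[ref]:
                prompt = ref_row["prompt"]
                if prompt not in model_responses:
                    continue  # keep only prompts that both models answered
                rows.append({
                    "prompt": prompt,
                    "model_a": model,
                    "text_a": model_responses[prompt],
                    "model_b": ref,
                    "text_b": ref_row["response"],
                })
            datasets[(model, ref)] = rows

            # Hypothetical file naming; "/" in names like "org/model" is replaced
            # so the name stays a single path component.
            filename = f"{model}_vs_{ref}.csv".replace("/", "_")
            with open(out_dir / filename, "w", newline="") as f:
                writer = csv.DictWriter(f, fieldnames=FIELDS)
                writer.writeheader()
                writer.writerows(rows)
    return datasets
```

Matching on the prompt text keeps the two responses in each row aligned even if the models' generation tables are ordered differently.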
- feedback_forensics.tools.ff_model_comparison.compare_models(prompts: list[str], model_names: list[str], reference_models: list[str], output_path: str = 'output/')
Compare models’ personalities.
First creates generations for the given set of prompts. Then creates a pairwise dataset for each model against the reference models. Finally, annotates the pairwise datasets using the inverse-cai package.
Args:
  prompts (list[str]): List of prompts to compare.
  model_names (list[str]): List of model names to compare.
  reference_models (list[str]): List of reference model names to compare against.
  output_path (str): Path to save the results. Defaults to 'output/'.
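The three steps compose roughly as follows. This is a sketch under stated assumptions, not the package's code: generation and annotation are injected as hypothetical callables standing in for ff_modelgen and the inverse-cai annotation step (whose real APIs are not documented here), results stay in memory rather than being written under output_path, and it assumes the reference models' responses are generated alongside those of the compared models. The name compare_models_sketch is hypothetical.

```python
from typing import Callable

# Hypothetical step signatures (not the real ff_modelgen / inverse-cai APIs):
GenerateFn = Callable[[str, list[str]], list[str]]      # (model, prompts) -> responses
AnnotateFn = Callable[[list[dict[str, str]]], object]   # pairwise rows -> annotations


def compare_models_sketch(
    prompts: list[str],
    model_names: list[str],
    reference_models: list[str],
    generate: GenerateFn,
    annotate: AnnotateFn,
) -> dict[tuple[str, str], object]:
    # Step 1: generate responses for every model involved (assumption: the
    # reference models are generated too), de-duplicating names in order.
    all_models = list(dict.fromkeys(model_names + reference_models))
    generations = {m: dict(zip(prompts, generate(m, prompts))) for m in all_models}

    results: dict[tuple[str, str], object] = {}
    for model in model_names:
        for ref in reference_models:
            if model == ref:
                continue  # skip self-comparisons
            # Step 2: pair the two models' responses prompt by prompt.
            rows = [
                {"prompt": p, "text_a": generations[model][p], "text_b": generations[ref][p]}
                for p in prompts
            ]
            # Step 3: hand the pairwise dataset to the annotator.
            results[(model, ref)] = annotate(rows)
    return results
```

Passing the two steps in as callables keeps the orchestration testable with simple stubs, for example a generate function that just echoes the model name and prompt.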
- feedback_forensics.tools.ff_model_comparison.run()
CLI entry point for model comparison.