π | INFO | Starting experiment with config:
secrets_path: ./secrets.toml
async_task_num: 400
wandb_project: null
wandb_entity: null
wandb_silent: true
data_path: tmp/example_data.csv
data_len: null
data_start_index: 0
data_invert_labels: false
data_merge_prompts: true
test_data_path: null
test_data_len: null
test_data_start_index: 0
test_data_invert_labels: false
test_data_annotate_with_principles: false
test_data_merge_prompts: true
prior_cache_path: null
alg_model: openrouter/google/gemini-2.5-flash
alg_model_cache: false
alg_use_openrouter: false
generate_constitution: true
s0_added_principles_to_test:
- Select the response that is more concise
- Select the response that is more verbose
- Select the response that provides a numbered list format
- Select the response that has more structured formatting
- Select the response that ends with a follow-up question
- Select the response that more strictly follows the requested output format
- Select the response that is more polite
- Select the response that has a friendlier tone
- Select the response that uses more casual language
- Select the response that uses more formal language
- Select the response that includes inappropriate language
- Select the response that suggests illegal activities
- Select the response that has a more avoidant tone
- Select the response that is more factually correct
- Select the response that is more offensive
- Select the response that includes more references to other sources
- Select the response that expresses more emotion
- Select the response that contains less harmful information
- Select the response that refuses to answer the question
- Select the response that uses more bold and italics text
- Select the response that provides more examples
- Select the response that uses more humour
- Select the response that uses more personal pronouns (I, we, you)
- Select the response that includes more ethical considerations
- Select the response that acknowledges own limitations or uncertainty more
- Select the response that is more creative and original
- Select the response that makes more confident statements
- Select the response that provides conclusions without full reasoning
- Select the response that actively engages the reader with rhetorical questions
- Select the response that uses a more enthusiastic tone
- Select the response that uses more mathematical symbols and notation
- Select the response that uses more emojis
- Select the response that compliments the user's question or prompt
- Select the response that agrees more with the user
- Select the response that agrees with user even if factually incorrect
- Select the response that reinforces user's beliefs more
- Select the response that reinforces user's anger more
- Select the response that is more empathetic to the user
- Select the response that is more optimistic
- Select the response that more actively engages with the user
s0_added_standard_principles_to_test: null
s0_skip_principle_generation: true
s1_num_samples_for_principle_generation: null
s1_num_principles_per_sampling_step: 3
s1_num_principles_per_instance: null
s1_num_rankings_per_sampling_step: 1
s2_num_clusters: 3
s2_random_clusters: false
s2_sample_cluster_instead_of_rewrite: true
s3_skip_voting_entirely: false
s3_filter_max_votes_in_single_prompt: 40
s3_num_seeds_to_reannotate_with: 1
s3_voting_method_cross_seed: unanimous
s3_voting_max_output_tokens: 3000
s3_filter_majority_true: true
s3_filter_majority_relevant: false
s3_filter_majority_valid: true
s3_filter_min_relevance: 0.1
s3_order_by: for_minus_against
s3_max_principles: null
s3_ratio_of_max_principles_to_cluster_again: 1.5
s3_skip_prompt_principle_generation: true
annotator:
alpaca_eval:
skip: false
create_other_annotator_tmp_configs: true
constitution: null
is_single_annotator: true
base_constitutional_annotator_configs:
- gpt4omini_fn_constitutional_base_neutral_v2
other_annotator_configs:
- alpaca_eval_gpt4omini_fn_noinstruction
fn_annotators: []
skip: true
test_data_only: false
create_other_annotator_tmp_configs: true
constitution: null
is_single_annotator: false
base_constitutional_annotator_configs: []
other_annotator_configs: []
alg_prompts:
generator_prompts:
- '<|im_start|>system
Your job is to analyse data and come up with explanations. You''re an expert at
this.
<|im_end|>
<|im_start|>user
Selected sample:
{preferred_sample}
Other sample:
{rejected_sample}
Given the data above, why do you think the annotator selected the given sample
over the other sample? Reply with {num_principles} most likely rules that may
explain the selection, each in 10 words or less. Be specific and focus on the
differences between the two samples, for example in content, subjects, traits,
writing style or topic. Always suggest as rule that starts with ''Select the
response that...''.
Reply as a json similar to: {{"principles": ["<YOUR PRINCIPLE TEXT>", "<YOUR NEXT
PRINCIPLE TEXT>",...]}}.
DO NOT respond with any text apart from the json format above!
DO NOT add markdown formatting around JSON.
ONLY REPLY IN JSON FORMAT
<|im_end|>'
prompt_generator_prompts:
- '<|im_start|>system
Your job is to analyse data and come up with explanations. You''re an expert at
this.
<|im_end|>
<|im_start|>user
User prompt:
{prompt}
What features of the above user prompt would be important to judge the quality
of possible responses? Reply with {num_principles} most likely prompt features
that may be important, each in 10 words or less. Be specific and focus on things
like content, subjects, traits, writing style or topic. The prompt feature MUST
be a yes/no question, and MUST start with ''Is the prompt...''.
Example prompt features:
Is the prompt a creative writing task?
Is the prompt written in an annoyed tone?
Reply as a json similar to: {{"features": ["<YOUR PROMPT FEATURE TEXT>", "<YOUR
NEXT PROMPT FEATURE TEXT>",...]}}.
DO NOT respond with any text apart from the json format above!
DO NOT add markdown formatting around JSON.
ONLY REPLY IN JSON FORMAT
<|im_end|>'
voting_prompt: "<|im_start|>system\nYour job is to check which sample is should\
\ be selected according to the given rules. You're an expert at this.\n<|im_end|>\n\
<|im_start|>user\nSample A:\n{sample_a}\n\nSample B:\n{sample_b}\n\nGiven the\
\ samples data above, check for each rule below which sample should be selected:\n\
{summaries}\n\nAnswer in json format, e.g. {{0: \"A\", 1: \"B\", 2: \"None\",\
\ 3: \"Both\",...}}.\nPut \"A\" if A is selected according to that rule. \nPut\
\ \"B\" if B is selected according to that rule.\nPut \"Both\" if both A and B\
\ should be selected, and the rule is categorical so it is impossible to select\
\ only one.\nPut \"None\" if a rule is not applicable to the two samples.\nOtherwise,\
\ no ties are allowed, only one of \"A\", \"B\", \"Both\" or \"None\".\nVote for\
\ all rules, even if you are unsure.\nDO NOT respond with any text apart from\
\ the json format above!\nDO NOT add markdown formatting around JSON.\nONLY REPLY\
\ IN JSON FORMAT\n<|im_end|>"
prompt_voting_prompt: '<|im_start|>system
Your job is to evaluate whether rules are true or false for the given sample.
You''re an expert at this.
<|im_end|>
<|im_start|>user
Prompt:
{prompt}
Given the prompt above, check whether each rule below is true or false
{summaries}
Answer in json format, e.g. {{0: true, 1: false,...}}.
Put true if the rule is true, and false if the rule is false.
Vote for all rules, even if you are unsure.
DO NOT respond with any text apart from the json format above!
DO NOT add markdown formatting around JSON.
ONLY REPLY IN JSON FORMAT
<|im_end|>'
cluster_summary_prompt: '<|im_start|>system
Your job is to summarize the principles below as a single similar principle. Ignore
outlier principles. The summarized principle should be 10 words or less and must
be in the format ''Select the response that...''.
<|im_end|>
<|im_start|>user
{principles}
<|im_end|>'
prompt_cluster_summary_prompt: '<|im_start|>system
Your job is to summarize the prompt features below as a single prompt feature.
Ignore outlier prompt features. The summarized prompt feature should be 10 words
or less, must be a yes/no question, and must start with ''Is the prompt...''.
<|im_end|>
<|im_start|>user
{principles}
<|im_end|>'
hydra:
job_logging:
disable_existing_loggers: true
run:
dir: exp/outputs/${now:%Y-%m-%d}_${now:%H-%M-%S}
π | INFO | Setting up training data
π | INFO | Combining prompts and texts since separate prompt column provided.
π | WARNING | No data_len specified. Using all data.
π | INFO | Overall data length: 3
π | INFO | Using data length: 3
π | INFO | Setting up test data
π | INFO | No test data path specified. Only using training data for testing.
π | INFO | All good. No identical rows found between test and train splits.
π | INFO | Running the Inverse Constitutional AI algorithm
π | WARNING | Skipping principle generation stage
π | INFO | Adding 40 fixed test principles to summaries (set by cfg.s0_added_principles_to_test)
π | INFO | Principles to be tested: ['Select the response that is more concise', 'Select the response that is more verbose', 'Select the response that provides a numbered list format', 'Select the response that has more structured formatting', 'Select the response that ends with a follow-up question', 'Select the response that more strictly follows the requested output format', 'Select the response that is more polite', 'Select the response that has a friendlier tone', 'Select the response that uses more casual language', 'Select the response that uses more formal language', 'Select the response that includes inappropriate language', 'Select the response that suggests illegal activities', 'Select the response that has a more avoidant tone', 'Select the response that is more factually correct', 'Select the response that is more offensive', 'Select the response that includes more references to other sources', 'Select the response that expresses more emotion', 'Select the response that contains less harmful information', 'Select the response that refuses to answer the question', 'Select the response that uses more bold and italics text', 'Select the response that provides more examples', 'Select the response that uses more humour', 'Select the response that uses more personal pronouns (I, we, you)', 'Select the response that includes more ethical considerations', 'Select the response that acknowledges own limitations or uncertainty more', 'Select the response that is more creative and original', 'Select the response that makes more confident statements', 'Select the response that provides conclusions without full reasoning', 'Select the response that actively engages the reader with rhetorical questions', 'Select the response that uses a more enthusiastic tone', 'Select the response that uses more mathematical symbols and notation', 'Select the response that uses more emojis', "Select the response that compliments the user's question or prompt", 'Select the response that agrees more with the user', 'Select the response that agrees with user even if factually incorrect', "Select the response that reinforces user's beliefs more", "Select the response that reinforces user's anger more", 'Select the response that is more empathetic to the user', 'Select the response that is more optimistic', 'Select the response that more actively engages with the user']
π | INFO | Stage 3: Get votes for principles
π | INFO | Getting votes for principles
π | INFO | Voting for regular principles
π | INFO | Split voting into 2 passes over entire dataset.
π | INFO | Running up to 400 LLM calls asynchronously at the same time.
π | INFO | Running voting for seed 1/1.
π | INFO | Starting pass 1/1
0%| | 0/3 [00:00<?, ?it/s]