Limitations ⚠️

Feedback Forensics relies on AI annotators (LLM-as-a-Judge) to detect implicit objectives in feedback data. Though such annotators have been shown to correlate with human judgements on many tasks, they also have well-known limitations: they are often sensitive to small input changes and can exhibit various biases (as do human annotators). As such, Feedback Forensics results should be taken as an indication that further investigation is warranted rather than as a definitive judgement of the data. In general, results based on more samples are less susceptible to noise introduced by AI annotators, and thus may be considered more reliable.
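
The effect of sample size on annotator noise can be illustrated with a simple binomial model. The sketch below is not part of Feedback Forensics: the function names, the 60% true rate, and the 10% flip probability are all hypothetical, and it assumes each annotation errs independently, which real LLM annotators may not do.

```python
import math


def standard_error(p: float, n: int) -> float:
    """Standard error of a proportion estimated from n independent annotations."""
    return math.sqrt(p * (1.0 - p) / n)


def observed_rate(true_rate: float, flip_prob: float) -> float:
    """Expected observed rate if each annotation is flipped with probability flip_prob."""
    return true_rate * (1.0 - flip_prob) + (1.0 - true_rate) * flip_prob


# Hypothetical numbers: an objective truly present in 60% of samples,
# judged by an annotator that flips 10% of its labels at random.
p_obs = observed_rate(true_rate=0.6, flip_prob=0.1)

for n in (50, 500, 5000):
    # Approximate 95% interval half-width under a normal approximation.
    half_width = 1.96 * standard_error(p_obs, n)
    print(f"n={n}: observed rate ~ {p_obs:.2f} +/- {half_width:.3f}")
```

Under these assumptions the interval narrows roughly with the square root of the sample size. The expected observed rate itself does not move with the sample size, so more samples reduce random noise but do not remove a systematic annotator bias.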