I've been thinking a lot about LLM Evals recently, as in my post on [[What Makes LLM Evals Hard]]. The whole thing brings to mind Ted Chiang's short story "Catching Crumbs from the Table", published in Nature in 2000. I find the title and the story itself a really powerful metaphor, and one which the whole field of LLM Evaluation, whether aware of it or not, is striving to avoid.
In the story, superintelligent beings called "metahumans" have advanced far beyond ordinary human capabilities, producing scientific breakthroughs incomprehensible to human researchers.
Human scientists have essentially fallen out of the loop. Unable to understand or contribute to cutting-edge research, they are reduced to "catching crumbs" - interpreting metahuman scientific work and reverse-engineering artifacts built on principles none of them can fully grasp. Of course, Evals today are done so that we can reduce harm from LLM outputs, but under a fast take-off scenario Chiang draws a very clear outline of the potential problem: a world where humans struggle to evaluate and understand the outputs of artificial minds.
Although it's just an interesting thought experiment, the scenario showed me how critical it is to advance our evaluation methodologies in tandem with model development. If we fail to do so, we risk creating a reality where:
1. We can no longer effectively assess the capabilities and limitations of our most advanced models.
2. We struggle to identify potential risks or biases in model outputs.
3. We lose the ability to guide AI development in alignment with human values and needs.
In essence, we risk being left to "catch crumbs" from increasingly inscrutable AI systems, much like the scientists in Chiang's story.
I've never been a huge fan of singularity/AGI arguments, but I did feel a sense of this when I was having a conversation with ChatGPT on a subject I know extremely well, and it not only kept up but brought up examples I was not aware of. There is a decidedly strange mix of awe and unease in interfacing with a system that seems to possess knowledge beyond one's own. Perhaps it's something we'll all get used to.