

Why I think differently about LLMs

Jan 06, 2025


When I was in grad school I studied a protein called Fus2p. Fus2p exists in baker's yeast: the yeast responsible for bread, for beer, and for a surprising amount of our knowledge about how cells work. It is not related (in an obvious way) to any protein that exists in humans, and it participates in the biological process of fusing two yeast cells together. This process is called "mating" but it probably isn't related (by evolution) to the fusion of mammalian gametes.

There are only a few labs in the world interested in Fus2p and the direct concerns around it. Studying it might be broadly interesting someday for a deeper understanding of membrane dynamics or cell signalling or something, but few working biologists will ever have heard of it. This is, to a first approximation, kind of the point of a graduate degree: to know more and more about less and less until you know everything about nothing. Or, less cynically, by concentrating tremendous effort in a tiny realm of human knowledge, to push the front line of that knowledge ahead by a tiny amount.

I thought about this when I heard Sam Altman's claim that ChatGPT will soon be able to answer questions at the level of an advanced sciences PhD student. What does this even mean? How is it possible that enough training data exists in such micro-fields that an LLM would be able to answer at a level approaching someone who is an expert?

I have this constant disconnect when I talk to people about AI that I think comes down to the following: most people, even if they claim to know otherwise, are under the impression that what the machines are doing is thinking or reasoning. Even the name "artificial intelligence" implies this. There's nothing new here: chess engines can clearly "think", but that idea is old enough now that it doesn't surprise anyone.

The disconnect happens when it begins to appear that the engineers or the GPUs or even the model itself is the entity making the magic happen. Someone super-smart built some super-smart software, and all they needed was some training data. That wasn't a problem because the internet provides a huge source, for free.

I see things the other way around: the machines are able to "think" only because of the abundance of data. If an LLM gives you some great insight, it's because someone out there at some time wrote that same thing, and you could find the same answer if you just talked to them. The training data is the only thing that matters and the only thing that has ever mattered. The software is just some advanced statistics that operates over it.

This has several important implications:

  1. It is not a foregone conclusion that LLMs will continue to get better. The training well has been scraped dry. Yes, more and more writing will be released online in the future, but a lot of that writing is now done by LLMs, which brings its own problems. The total amount of content, particularly original content, does not increase exponentially the way that compute power does. This means that our past assumptions about computers getting more and more powerful don't necessarily apply in this domain.

  2. The question of whether content creators are being fairly compensated for their data is not even the right question. They are not just providing data. They are the thing that is making the model run. Their work being stolen in this way is not just a moral failure; it's going to lead to a fundamental break in the way that the culture of the web operates. More on this later.

  3. It's very easy to see why LLMs succeed when the prompt is sufficiently general and get worse the more specific the question being asked. For general questions, there's simply *a lot* for them to work with. They fail utterly at the very specific: questions about an individual graduate thesis are like LSD for a computer. They aren't going to magically get better at this over time because increases in processing power don't create more data for them to work with.

I think that this disconnect between success in general domains versus success in specific domains is where AI boomers are dangerously overpromising. A second-year undergrad who can answer general questions about cell biology and genetics stands a good chance of becoming a grad student who can research and write a thesis. A software job candidate who can complete basic tasks in an application or interview stands a good chance of being a valuable asset to the company. We *use* these general cases as domains to discuss common concerns among those working in the field, and to evaluate people for their aptitude. But they are not the same as actually doing the work, and people who confuse the two have the potential to do real damage.
