Tübingen (Delayed) Blog 2
Last night I couldn’t sleep and wanted to write, but I forgot Medium existed, so I didn’t get around to blogging until this morning. Here’s the subject that kept me up: trend lines in AI compute.
I was watching Dwarkesh Patel’s interview with a former classmate of mine at Columbia, Leopold Aschenbrenner, who was recently fired from OpenAI. I remember him as the smartest guy in the economics classes I attended, though I never interacted with him outside of them, so it came as no surprise, listening to him talk and skimming his blog posts, that his main thesis for achieving AGI rests on believing ‘not in science fiction, but in straight lines on a graph’ (taken directly from his essays on ‘situational awareness’). That sounds a lot like how pre-1950 economists explained human rationality. Look at that expected utility calculation, it’s just a straight line on a graph! Case closed, we’re homo economicus all the way down. I’d like to argue that the dynamics of human cognition, evolution, and natural resource depletion are far from linear.
What did shock me in this interview is that, for all this willfully giddy trend-line drawing and speculative extrapolation, not only is the power-law asymptote never seriously considered (which I’ll get into in just a second), but no concern whatsoever is shown for the environmental or human cost of building 100-gigawatt factories or an extra Hoover Dam to facilitate such trend-line scaling of AI pre-training and inference (projects Leopold endorses repeatedly on the podcast with a blasé air). Environmental risk comes up only briefly, raised pointedly by Dwarkesh and acknowledged grudgingly by Leopold, who says, ‘yeah, I am concerned about the environment…but ultimately I think national interest will have to win out.’ That sentence alone made me feel physically unwell, enough to stop the podcast. We CANNOT have people who think that scaling training data and power at all costs is the only way to ‘win’: not only is it a logically flawed idea (as I’ll get into in a second), but morally it feels like the attitude that will devour our earth’s resources at an exponential clip and facilitate gruesome faction wars even faster. And yet here’s my former classmate, going live for millions, gleefully speaking of AI-enhanced warfare and a China-US military race.
As for the logical flaw in the trend-line extrapolations on ‘effective compute’ (#tokens processed versus loss), the first thing to realize is that the relationship is a power law, not a straight line. That means that as you scale compute toward infinity, the loss only approaches its asymptote of zero; you never reach it (zero loss being what we might consider full human or super-human intelligence). We could cut down and burn all the natural resources in the world, spend a trillion-plus dollars, and fuel massive AI training factories until they melt, but, according to the very trend lines AI optimists use to extol their visions of the future, we won’t get a lossless model. We will probably still see great improvements as we descend the loss curve, but betting on the availability of several Earths’ worth of natural gas is a terrible move.
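To make those diminishing returns concrete, here’s a minimal numeric sketch (my own illustration, not from the podcast or from Leopold’s essays). It assumes a pure power law, loss(C) = a·C^(−b), in compute C, with made-up constants a = 10 and b = 0.05; the exponent is loosely in the range of published language-model scaling fits, but treat every number here as illustrative.

```python
# Illustrative only: a pure power-law loss curve, loss(C) = a * C**(-b).
# The constants a and b are made up for this sketch (b ~ 0.05 is roughly
# the order of magnitude seen in published LM scaling-law fits).
a, b = 10.0, 0.05

def loss(compute):
    """Loss as a function of total compute C under the assumed power law."""
    return a * compute ** (-b)

# Solving a * C2**(-b) = 0.5 * a * C1**(-b) gives C2 / C1 = 2**(1 / b):
# the compute multiplier required to halve the loss.
halving_cost = 2 ** (1 / b)
print(f"Compute needed per halving of loss: ~{halving_cost:,.0f}x")  # ~1,048,576x

# No finite compute budget ever drives the loss to zero:
for c in (1e6, 1e12, 1e24):
    print(f"C = {c:.0e}: loss = {loss(c):.3f}")
```

Under these assumptions, each halving of the loss costs roughly a million times more compute, and the loss at C = 10^24 is still comfortably above zero. Swap in whatever constants you like; the shape of the conclusion doesn’t change, which is exactly why ‘just extend the line’ is a strange plan for a curve with an asymptote.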
I believe in smart moves drawn from research in energy- and data-efficient machine learning: techniques that make more of the little data we humans process so effortlessly. Our cognitive evolution endowed us with 20-watt gigafactories inside our own skulls. We should look to our minds as the biggest inspiration for breakthroughs in AI, not to a breakneck rush toward a lossless gold mine that doesn’t exist, on an Earth that would likewise cease to exist.