Trying to understand how ChatGPT works

I finally got around to reading the Stephen Wolfram essay on What Is ChatGPT Doing … and Why Does It Work? Despite being written in relatively simple terms, the article still pushed the boundaries of my comprehension. Parts of it landed on my brain like an impressionist painting.

Things that stuck out for me:

  • In order to improve the output, a deliberate injection of randomness (controlled by a parameter called ‘temperature’) is used, which means that ‘lower-probability’ words sometimes get chosen as text is generated. Without this, the output seems to be “flatter”, “less interesting” and doesn’t “show any creativity”. (There’s a small sketch of how this works after the list.)
  • Neural networks are better at more complex problems than at simple ones. Doing arithmetic with a neural-network-based AI is very difficult, because there is no explicit sequence of operations as you would find in a traditional procedural computer program. Humans can do lots of complicated tasks, but we use computers for calculations because they are better at that type of work than we are. Now that plugins are available for ChatGPT, it can itself ‘use a computer’ much as we do, offloading this kind of traditional computational work (there’s a rough sketch of the idea after the list).
  • Many times, Wolfram says something along the lines of “we don’t know why this works, it just does”. The whole field of neural-network AI seems to be driven by trial and error, as the models are too complex for us to fathom and reason about.

Particularly over the past decade, there’ve been many advances in the art of training neural nets. And, yes, it is basically an art. Sometimes—especially in retrospect—one can see at least a glimmer of a “scientific explanation” for something that’s being done. But mostly things have been discovered by trial and error, adding ideas and tricks that have progressively built a significant lore about how to work with neural nets.

  • People do seem to be looking at the output from ChatGPT and then quickly drawing conclusions about where things are headed from a ‘general intelligence’ point of view. As Matt Ballantine puts it, this may be a kind of ‘halo effect’, where we are projecting our hopes and fears onto the technology. However, just because it is good at one type of task — generating text — doesn’t necessarily mean that it is good at other types of tasks. From Wolfram’s essay:

But there’s something potentially confusing about all of this. In the past there were plenty of tasks—including writing essays—that we’ve assumed were somehow “fundamentally too hard” for computers. And now that we see them done by the likes of ChatGPT we tend to suddenly think that computers must have become vastly more powerful—in particular surpassing things they were already basically able to do […]

But this isn’t the right conclusion to draw. Computationally irreducible processes are still computationally irreducible, and are still fundamentally hard for computers—even if computers can readily compute their individual steps. And instead what we should conclude is that tasks—like writing essays—that we humans could do, but we didn’t think computers could do, are actually in some sense computationally easier than we thought.

  • So my last big takeaway is that — maybe — human language is much less complex than we thought it was.
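
To make the ‘temperature’ idea a bit more concrete, here’s a minimal sketch of temperature-based sampling. It’s mine, not Wolfram’s, and the candidate words and scores are made up; the point is just that dividing the model’s raw scores by a temperature before turning them into probabilities controls how often lower-probability words get picked (the 0.8 default echoes the value Wolfram mentions in the essay).

```python
import numpy as np

def sample_next_word(logits, vocab, temperature=0.8, rng=None):
    """Pick the next word by sampling from a temperature-scaled distribution."""
    rng = rng or np.random.default_rng()
    # Dividing by the temperature sharpens (T < 1) or flattens (T > 1) the
    # distribution before softmax turns the scores into probabilities.
    scaled = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(scaled - scaled.max())  # subtract the max for numerical stability
    probs /= probs.sum()
    return rng.choice(vocab, p=probs)

# Made-up candidate words and scores, just to show the effect of temperature:
vocab = ["the", "a", "purple", "quantum"]
logits = [4.0, 3.5, 1.0, 0.5]

print(sample_next_word(logits, vocab, temperature=0.1))  # almost always "the"
print(sample_next_word(logits, vocab, temperature=1.5))  # lower-probability words appear more often
```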
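
And here’s a purely hypothetical sketch of what ‘offloading’ arithmetic to a traditional program might look like. To be clear, this is not how ChatGPT’s plugin mechanism actually works; it just imagines the model leaving a [calc: …] marker in its draft text, with a plain procedural program filling in the exact answer.

```python
import re
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def evaluate(expression):
    """Evaluate a simple 'number op number' expression the exact, procedural way."""
    left, op, right = expression.split()
    return OPS[op](float(left), float(right))

def fill_in_calculations(model_output):
    """Replace each [calc: ...] marker with the exact result from the 'computer'."""
    return re.sub(r"\[calc:\s*([^\]]+)\]",
                  lambda m: str(evaluate(m.group(1))),
                  model_output)

# Imagined model output, with the arithmetic left to a traditional program:
draft = "The invoice total is [calc: 1284.50 * 1.2] euros including VAT."
print(fill_in_calculations(draft))
# -> "The invoice total is 1541.4 euros including VAT."
```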
