Trying to understand how ChatGPT works

I finally got around to reading Stephen Wolfram’s essay, “What Is ChatGPT Doing … and Why Does It Work?”. Despite being written in relatively simple terms, the article still pushed the boundaries of my comprehension. Parts of it landed on my brain like an impressionist painting.

Things that stuck out for me:

To improve the output, randomness is deliberately injected, with the amount controlled by a setting called ‘temperature’. This means that ‘lower-probability’ words sometimes get chosen as the text is generated. Without this, the output seems to be “flatter”, “less interesting” and doesn’t “show any creativity”.
Neural networks are better at more complex problems than at simple ones. Doing arithmetic via a neural network-based AI is very difficult, as there is no sequence of operations like the one you would find in a traditional procedural computer program. Humans can do lots of complicated tasks, but we use computers for calculations because they are better at this type of work than we are. Now that plugins are available for ChatGPT, it can itself ‘use a computer’ in much the same way that we do, offloading this type of traditional computational work.
Many times, Wolfram says something along the lines of “we don’t know why this works, it just does”. The whole field of AI using neural networks seems to be trial and error, as the models are too complex for us to fathom and reason about.
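
To make the ‘temperature’ point a bit more concrete, here is a minimal Python sketch of one common way temperature sampling is done: divide each candidate word’s score by the temperature, turn the results into probabilities, and pick a word at random according to them. The function name, the candidate words and their scores are all made up for illustration; this is not ChatGPT’s actual code, and the numbers are not taken from any real model.

```python
import math
import random


def sample_with_temperature(scores, temperature, rng):
    """Pick one word from `scores` (word -> raw score) at the given temperature.

    Hypothetical helper for illustration only.
    """
    if temperature <= 0:
        # Treat zero temperature as "always pick the top-scoring word".
        return max(scores, key=scores.get)

    # Dividing by the temperature flattens the distribution when it is high
    # and sharpens it when it is low.
    scaled = {word: score / temperature for word, score in scores.items()}

    # Softmax: subtract the maximum first for numerical stability.
    top = max(scaled.values())
    weights = {word: math.exp(value - top) for word, value in scaled.items()}

    words = list(weights)
    return rng.choices(words, weights=[weights[w] for w in words], k=1)[0]


# Made-up scores for candidate next words (assumption, not real model output).
scores = {"learn": 4.5, "predict": 3.5, "make": 3.2, "understand": 3.1, "do": 2.9}

rng = random.Random(0)  # fixed seed so repeated runs give the same picks
for temperature in (0, 0.8, 1.5):
    picks = [sample_with_temperature(scores, temperature, rng) for _ in range(10)]
    print(temperature, picks)
```

At a temperature of zero the sketch always returns the highest-scoring word, which is the “flatter” behaviour described above. As the temperature rises, the lower-scoring words get picked more often.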

People do seem to be looking at the output from ChatGPT and then quickly drawing conclusions about where things are headed from a ‘general intelligence’ point of view. As Matt Ballantine puts it, this may be a kind of ‘halo effect’, where we are projecting our hopes and fears onto the technology. However, just because it is good at one type of task (generating text) doesn’t necessarily mean that it is good at other types of tasks. From Wolfram’s essay:

But there’s something potentially confusing about all of this. In the past there were plenty of tasks—including writing essays—that we’ve assumed were somehow “fundamentally too hard” for computers. And now that we see them done by the likes of ChatGPT we tend to suddenly think that computers must have become vastly more powerful—in particular surpassing things they were already basically able to do […]

But this isn’t the right conclusion to draw. Computationally irreducible processes are still computationally irreducible, and are still fundamentally hard for computers—even if computers can readily compute their individual steps. And instead what we should conclude is that tasks—like writing essays—that we humans could do, but we didn’t think computers could do, are actually in some sense computationally easier than we thought.

So my last big takeaway is that — maybe — human language is much less complex than we thought it was.

Published

2 June 2023

Andrew Doran in Science, Technology | 2 June 2023
