The AI GPT-4 has emergent abilities—but that’s not why it’s scary.
Kelso Harper: Since its release last year, you’ve probably heard a lot of chatter about ChatGPT.
Newscaster: The CEO of the company behind ChatGPT says artificial intelligence will reshape society.
Newscaster: ChatGPT, a model created by OpenAI, has the ability to respond to prompts in a humanlike manner.
Sophie Bushwick: But this chatbot is just the interface between users and a large language model called GPT-3.5. And last month, the model’s developer, tech research company OpenAI, announced its most advanced model yet. The new GPT-4 is really good at solving problems and writing original text and code—so good that it’s got some AI researchers asking everyone to press pause.
Newscaster: Now an open letter signed by Elon Musk, Steve Wozniak and a number of top AI scientists is calling for a six-month pause on AI development, arguing, quote, “powerful AI systems should be developed only once we know their effects will be positive and their risks will be manageable.”
Harper: So what’s the real difference between GPT-4 and its predecessor–and why does it scare AI experts so much?
Bushwick: I’m Sophie Bushwick
Harper: And I’m Kelso Harper
— and you’re listening to Tech, Quickly.
Harper: So what exactly is this artificial intelligence?
Bushwick: GPT stands for “generative pre-trained transformer,” which is a type of large language model, and that is a type of AI program that can analyze and generate text. Basically, researchers build a machine learning program called a neural network, because its structure is inspired by the human brain. And then they feed it a bunch of text so it can learn the patterns of which words generally follow which other words, and then eventually generate its own sentences. And then humans test it out so they can tweak its responses to be more coherent.
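The idea of learning "which words generally follow which other words" can be illustrated with a toy word-pair counter. This is only a minimal sketch under loud assumptions: the function names `train_bigrams` and `generate` and the sample corpus are hypothetical, and a model like GPT-4 uses a large neural network rather than a lookup table of counts. The sketch only shows the basic pattern of counting what follows what and then generating text from those counts.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def generate(follows, start, length=5):
    """Build a word sequence by repeatedly picking the most frequent next word."""
    out = [start]
    for _ in range(length - 1):
        candidates = follows.get(out[-1])
        if not candidates:
            break  # the last word was never followed by anything in training
        out.append(candidates.most_common(1)[0][0])
    return " ".join(out)


# Hypothetical toy corpus, chosen so that each choice has a clear winner.
corpus = "the cat sat on the mat and the cat sat on the sofa"
model = train_bigrams(corpus)
print(generate(model, "the", length=4))
```

Always picking the single most frequent next word is a simplification made here for clarity; real systems typically sample from a probability distribution over many possible next tokens.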
Harper: Yeah, I've played around a little bit with ChatGPT and it can hold a conversation and write pretty darn well. I mean, we even published a statement from ChatGPT about why it should be regulated, right? But you’re saying GPT-4 is even better?
Bushwick: GPT-4 blows GPT-3.5 out of the water in some areas, but it also has some of the same problems. So let’s start with the good stuff. GPT-4 can do something new, which is analyze images as well as text, even though it does not produce original images itself. For instance, this could be used to describe objects in the environment around someone who is visually impaired. Or, in one demo, it was fed a sketch of a simple website, and then it wrote the code to make that website.
Harper: Well, that is so cool and really impressive.
Bushwick: Yeah, it can also process and produce longer chunks of text than its predecessor without devolving into unrelated topics. And it's better at imitating the writing style of specific authors. And it's much better at problem solving. So the OpenAI team had both GPT-4 and GPT-3.5 take a bunch of exams, including the SATs, the GREs, some AP tests, even a couple of sommelier exams. GPT-4 got consistently high scores, better than its predecessors and even better than some humans. For example, it scored in the 90th percentile on a simulated bar exam.
Harper: Well, that's kind of rude, actually. This AI has never even had a sip of wine, and it's way more sophisticated than I would ever be.
Bushwick: I mean, luckily for us insecure humans, GPT-4 isn't good at everything. It did not score well on the AP English Language or Literature exams. So we've got that going for us.
Harper: Thank goodness. That really reassures my fragile human ego. But you mentioned that GPT-4 still has some of the same problems that were causing issues with GPT-3.5 and ChatGPT, right?
Bushwick: Yeah, it still sometimes gives biased responses, and it also produces hallucinations.
Harper: So a hallucination is when it just, like, confidently makes something up, right? Like claiming there is a population of humans on Mars?
Bushwick: Yes, exactly. And this happens because it doesn't actually know what it's saying. It's just good at putting out words that sound like they might be true.
Bushwick: And this has already caused problems. In one case, ChatGPT falsely stated that a law professor in California had sexually harassed a student. It even cited a made-up news article to support its claim.
Harper: Oh, gosh, that doesn’t sound good.
Bushwick: In Australia, a man plans to sue ChatGPT because ChatGPT falsely stated that he had been sentenced to prison for bribery and corruption. So these hallucinations can have real-world consequences.
Harper: Wow, what can we do about these kinds of problems?
Bushwick: Well, a group of AI researchers and AI developers published an open letter calling for a pause on AI development. But then other AI ethicists pointed out that those claims actually hype up the technology: they make it seem super powerful. So other researchers think we need to focus on addressing the current issues with the technology, like being more transparent about what training data is being used, communicating clearly about how hallucinations affect the trustworthiness of models like GPT-4, and making sure the models that we interact with directly, like ChatGPT or Google’s Bard, have guardrails in place before they’re published.
Harper: Okay, so what about concerns people have that this could develop into, like, a super-powerful artificial general intelligence, or AGI, with the ability to reason and to think on par with a human?
Bushwick: There has been some chatter about this possibility. One researcher published a paper claiming there were quote-unquote glimmers of AGI in GPT-4.
Bushwick: Although that paper has come under a lot of criticism. I don’t think this type of model is going to turn into AGI–there’s no mind behind it, just the illusion of one. But GPT-4 does show that it has emergent abilities.
Harper: Emergent abilities?
[Clip: 2001: A Space Odyssey]
Dave: Open the pod bay doors, HAL.
HAL: I'm sorry, Dave. I'm afraid I can't do that.
Bushwick: Right out of sci-fi, yeah.
Harper: Oh my gosh.
Bushwick: Well, essentially, what it means is it can do things it was not trained to do. Like, it was not taught directly how to translate text into different languages or how to perform arithmetic, but it can do those things. Other language models have shown emergent abilities too, once they're scaled up to a large enough size and trained on a large enough volume of data.
But we're not really sure why these abilities appear because, again, we can't really peek under the hood.
Harper: Well, spooky. Okay. But the training data itself can cause some problems, too, right? Like by teaching these models to be biased.
Bushwick: Well, because we humans are the ones who produce the training data. (Singing) “It’s us. Hi! We’re the problem, it’s us.”
Harper: It’s true.
Bushwick: But the problems actually go beyond just biased training data. We humans, we love projecting our thoughts and emotions onto other objects. Like, we assume if we're having a conversation, the entity on the other end is its own person. And that's not really the case.
Harper: Is it really so bad to treat a chatbot like a person or think of it kind of like a person?
Bushwick: When we empathize with machines and we assume they think like we do, we're more likely to trust them. And the better these models get at producing humanlike text, the better they'll get at influencing our behavior. I mean, remember the early days of GPS assistance, when people still knew the best route by heart or from a paper map, but then they would make a wrong turn because the GPS told them to?
Harper: Right, and they ended up, like, driving into lakes!
Bushwick: Totally! Because they trust the GPS! And now imagine they're following a machine's instructions in other areas.
A recent study showed that statements written by ChatGPT can influence someone's decision when they're making a moral judgment, even when the participants didn't believe that was happening. They didn't think those statements were influencing them, even though they were.
What about if someone is making a decision about who to vote for based on information that came out of a chatbot? Information that we know might include inaccurate hallucinations.
Harper: Right. That seems like it would be a problem.
Bushwick: And it doesn’t take a super-powerful AGI to cause that problem either.
Harper: Yeah. Well, shoot. Thanks for giving us a, uh, reassuring update on the state of text-generating A.I.
Bushwick: It's a little anxiety-inducing, but it's exciting too. I feel like the dog from Up, but instead of squirrels, I'm just constantly being distracted by new AI news.
Harper: Amazing. Great callback.
Bushwick: Thank you.
Harper: Well, thank you again. And thank you for listening to Tech, Quickly. I'm Kelso Harper.
Bushwick: I’m Sophie Bushwick.
Tech, Quickly is a part of Scientific American’s podcast Science, Quickly, which is produced by Jeff DelViscio, Tulika Bose, and Kelso Harper. Our theme music is by Dominic Smith.
Harper: And head to sciam.com for even more up-to-date and in-depth tech news.