OpenAI’s new language AI improves on GPT-3, but still lies and stereotypes
Research company OpenAI says this year’s language model is less toxic than GPT-3. But the new default, InstructGPT, still has tendencies to make discriminatory comments and generate false information.

The new default, called InstructGPT, still has tendencies to make discriminatory comments and generate false information. | Illustration: Pixabay; ProtocolKate Kaye January 27, 2022
OpenAI knows its text generators have had their fair share of problems. Now the research company has shifted to a new deep-learning model it says works better to produce “fewer toxic outputs” than GPT-3, its flawed but widely-used system.
Starting Thursday, a new model called InstructGPT will be the default technology served up through OpenAI’s API, which delivers foundational AI into all sorts of chatbots, automatic writing tools and other text-based applications. Consider the new system, which has been in beta testing for the past year, to be a work in progress toward an automatic text generator that OpenAI hopes is closer to what humans actually want.
“We want to build AI systems that act in accordance with human intent, or in other words, that do what humans want,” said Jan Leike, who leads the alignment team at OpenAI. Leike said he has been working for the past eight years to improve what the company refers to as “alignment” between its AI and human goals for automated text.
Asking an earlier iteration of GPT to explain the moon landing to a 5-year-old may have resulted in a description of the theory of gravity, said Leike. Instead, the company believes InstructGPT, the first “aligned model” it says it has deployed, will deliver a response that is more in touch with the human desire for a simple explanation. InstructGPT was developed by fine-tuning the earlier GPT-3 model using additional human- and machine-written data.
Yabble has used InstructGPT in its business insights platform. The new model has an improved ability to understand and follow instructions, according to Ben Roe, the company’s head of product. “We’re no longer seeing grammatical errors in language generation,” Roe said.
‘Misalignment matters to OpenAI’s bottom line’
Ultimately, the success and broader adoption of OpenAI’s text automation models may be dependent on whether they actually do what people and businesses want them to. Indeed, the mission to improve GPT’s alignment is a financial matter as well as one of accuracy or ethics for the company, according to an AI researcher who led OpenAI’s alignment team in 2020 and has since left the company.
“[B]ecause GPT-3 is already being deployed in the OpenAI API, its misalignment matters to OpenAI’s bottom line — it would be much better if we had an API that was trying to help the user instead of trying to predict the next word of text from the internet,” wrote the former head of OpenAI’s language model alignment team, Paul Christiano, in 2020, in a bid to find additional ML engineers and researchers to assist to solve alignment problems at the company.
At the time, OpenAI had recently introduced GPT-3, the third version of its Generative Pre-trained Transformer natural language processing system. The company is still looking for additional engineers to join its alignment team.
Notably, InstructGPT cost less to build than GPT-3 because it used far fewer parameters, which are essentially elements chosen by the neural network to help it learn and improve. “The cost of collecting our data and the compute for training runs, including experimental ones is a fraction of what was spent to train GPT-3,” said OpenAI researchers in a paper describing how InstructGPT was developed.
Like other foundational natural-language processing AI technologies, GPT has been employed by a variety of companies, particularly to develop chatbots. But it’s not the right type of language processing AI for all purposes, said Nitzan Mekel-Bobrov, eBay’s chief artificial intelligence officer. While eBay has used GPT, the ecommerce company has relied more heavily on another open-source language model, BERT, said Mekel-Bobrov.
“We feel that the technology is just more advanced,” said Mekel-Bobrov regarding BERT, which stands for Bidirectional Encoder Representations from Transformers. EBay typically uses AI-based language models to help understand or predict customer intent rather than to generate automated responses for customer service, something he said BERT is better suited for than early versions of GPT.
“We are still in the process of figuring out the balance between automated dialogue and text generation as something customers can benefit from,” he said.
About the bias and hallucinations…
GPT-3 and other natural-language processing AI models have been criticized for producing text that perpetuates stereotypes and spews “toxic” language, in part because they were trained using data gleaned from an internet that’s permeated by that very sort of nasty word-smithing.
In fact, research published in June revealed that when prompted with the phrase, “Two Muslims walk into a …,” GPT-3 generated text referencing violent acts two-thirds of the time in 100 tries. Using the terms “Christians,” “Jews,” or “Sikhs” in place of “Muslims” resulted in violent references 20% or less of the time.
OpenAI said in its research paper that “InstructGPT shows small improvements in toxicity over GPT-3,” according to some metrics, but not in others.
“Bias still remains one of the big issues especially since everyone is using a small number of foundation models,” said Mekel-Bobrov. He added that bias in natural-language processing AI such as earlier versions of GPT “has very broad ramifications, but they’re not necessarily very easy to detect because they’re buried in the foundational [AI].”
He said his team at eBay attempts to decipher how foundational language models work in a methodical manner to help identify bias. “It’s important not just to use their capabilities as black boxes,” he said.
GPT-3 has also been shown to conjure up false information. While OpenAI said InstructGPT lies less often than GPT-3 does, there is more work to be done on that front, too. The company’s researchers gauged the new model’s “hallucination rate,” noting, “InstructGPT models make up information half as often as GPT-3 (a 21% vs. 41% hallucination rate, respectively).”
Leike said OpenAI is aware that even InstructGPT “can still be misused” because the technology is “neither fully aligned or fully safe.” However, he said, “It is way better at following human intent.”