The AI startup says it spent a year using human workers to train its GPT-4o model to generate more realistic images, comprehensible text
OpenAI says it has over 400 million weekly users of ChatGPT. Photo: Gabby Jones/Bloomberg News
OpenAI unveiled an updated version of its AI system GPT-4o that can generate more realistic images. It’s the result of a year-long effort with human trainers.
GPT-4o replaces DALL-E 3 as the default image generation model behind OpenAI’s ChatGPT chatbot, and it is now available to ChatGPT Free, Plus, Team and Pro users, the company said.
Billed as a less expensive version of OpenAI’s most advanced AI model at the time, GPT-4o debuted last year as a multimodal model capable of creating and understanding text, video, audio and images.
Today’s refined GPT-4o model makes it easier for consumers and businesses to create more lifelike images, paragraphs of comprehensible text, and even company logos and slide decks, OpenAI said.
Behind the improvement to GPT-4o is a group of “human trainers” who labeled training data for the model, pointing out where typos and errant hands and faces had appeared in AI-generated images, said Gabriel Goh, the lead researcher on the project.
Through that technique, the AI model was trained to follow human directions more closely, thereby generating more accurately rendered and useful images, he said.
Today’s refined GPT-4o model
The process, usually referred to as “reinforcement learning from human feedback” or RLHF, is a common technique used by AI companies to improve their models after they are initially trained. Given the sheer reach of OpenAI’s AI systems—it says it has over 400 million weekly users of ChatGPT—the impact these human trainers can have is significant.
OpenAI said it worked with a little more than 100 human workers for the reinforcement learning process.
“The base model is already intelligent in its own way,” Goh said, “and then the [reinforcement learning from human feedback] process brings out the intelligence and refines it.”
Image generation is now a lot more useful
With the improvements in research made to GPT-4o, ChatGPT’s image generation is now a lot more useful for consumers and businesses, OpenAI said. Whereas prior iterations of its AI systems weren’t able to generate paragraphs of readable text with images, for instance, GPT-4o is capable of doing so, it said.
The model is also able to create transparent backgrounds, making it possible for businesses to create logos or other iconography, said Jackie Shannon, an OpenAI product lead for ChatGPT multimodal. Other uses the company suggested include asking ChatGPT to generate images based on a user-uploaded brand style guide.
GoDaddy Chief Data and Analytics Officer Travis Muhlestein said the technology and web-hosting company’s use of GPT-4o is “helping us embrace AI-driven content creation.” That includes things like using AI to create stock images and logos, the company said.
Still, the image generation in GPT-4o isn’t perfect, Goh said. In one example the company showed, a user uploaded a photo of their living room with two windows to ChatGPT. The AI system was only able to reproduce one window when recreating the image of the living room with new furniture.
The controversy
The use of AI image generators remains controversial. Some artists have said AI image generators plagiarize their work and threaten their livelihoods.
OpenAI said GPT-4o was trained on “publicly available data,” as well as proprietary data from its partnerships with companies like Shutterstock.
“We’re respecting of the artists’ rights in terms of how we do the output, and we have policies in place that prevent us from generating images that directly mimic any living artists’ work,” said Brad Lightcap, OpenAI’s chief operating officer.
Wall Street Journal owner News Corp has a content-licensing partnership with OpenAI.
Source: Wall Street Journal