OpenAI Merges Image Generation into GPT-4o and Powers Sora with Multimodal Capabilities

March 29, 2025 AI tools

OpenAI has expanded GPT-4o with built-in image generation and integrated these capabilities into its video model, Sora.

Introduction of Image Generation in GPT-4o

With its latest update, OpenAI introduced the "Images in ChatGPT" feature, which lets users generate images directly within ChatGPT from simple text prompts. The underlying model, GPT-4o, is an "omnimodal" system that can handle several data types, including text, images, audio, and video. Compared to earlier models such as DALL·E 3, GPT-4o offers significant improvements in attribute accuracy and in the rendering of text within images. While DALL·E 3 was based on a diffusion model, GPT-4o uses an autoregressive approach that generates an image sequentially, from left to right and top to bottom. Although this process takes slightly longer, it produces higher-quality and more precise visuals.
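To make the "left to right and top to bottom" idea concrete, here is a toy sketch of autoregressive generation in raster order. It is not OpenAI's implementation, whose details are not public. The function `toy_next_token_probs` is a hypothetical stand-in for a trained model, and the tiny four-symbol "image" vocabulary is an assumption made purely for illustration.

```python
import random


def toy_next_token_probs(context, vocab_size):
    """Hypothetical stand-in for a learned model (an assumption, not a real API).

    Returns a probability distribution over the next token given every
    token generated so far. Requires vocab_size >= 2.
    """
    if not context:
        return [1.0 / vocab_size] * vocab_size
    # Favor repeating the previous token so the output has visible structure.
    prev = context[-1]
    probs = [0.5 / (vocab_size - 1)] * vocab_size
    probs[prev] = 0.5
    return probs


def generate_raster(height, width, vocab_size=4, seed=0):
    """Sample a height x width grid one token at a time in raster order."""
    rng = random.Random(seed)
    tokens = []  # flat sequence: row 0 left to right, then row 1, and so on
    for _row in range(height):
        for _col in range(width):
            # Each new token is conditioned on all previously generated tokens.
            probs = toy_next_token_probs(tokens, vocab_size)
            token = rng.choices(range(vocab_size), weights=probs, k=1)[0]
            tokens.append(token)
    # Reshape the flat sequence back into rows.
    return [tokens[r * width:(r + 1) * width] for r in range(height)]


if __name__ == "__main__":
    # Render each token as a character to get a crude "image".
    for row in generate_raster(4, 8):
        print("".join(" .:#"[t] for t in row))
```

The sketch also hints at why such a process can be slower than producing the whole picture at once: every position has to wait for all the positions before it.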

Integration of GPT-4o into Sora

Sora, OpenAI’s advanced video generation model, has also benefited from the integration of GPT-4o. Thanks to this upgrade, Sora can now produce more realistic and detailed videos based on textual input. For instance, users can describe a scene, and Sora will create a corresponding video with high visual fidelity and accurate interpretation. This advancement opens up new possibilities in fields like filmmaking, education, and marketing by simplifying and accelerating video content creation.

Availability and Access

The image generation feature in ChatGPT is currently available to Plus, Pro, and Team subscribers. Due to high demand, access for free-tier users has been temporarily delayed. OpenAI is actively working to increase its capacity to make the feature more widely available.

Safety Measures and Ethical Considerations

OpenAI places strong emphasis on the safe and ethical use of its models. Robust safeguards have been implemented to prevent misuse of the image and video generation tools, including the blocking of harmful content and the integration of C2PA metadata to signal AI-generated content. However, challenges remain, especially concerning biases and stereotypes present in generated media. For example, reports indicate that Sora may produce stereotypical depictions in videos, reflecting existing societal biases. OpenAI is aware of this issue and is continuously working to improve its models and reduce such distortions.

Conclusion

The integration of image generation in GPT-4o and the evolution of Sora represent major advancements in AI-driven media creation. These technologies offer vast creative potential but also require thoughtful handling of their ethical and societal implications.
