Qwen-Image is a powerful new open-source AI image generator



After kicking off the summer with a blitz of powerful new open-source, freely available models that match or in some cases exceed their closed/proprietary American rivals, Alibaba's crack "Qwen Team" of AI researchers is back today with the release of a highly capable new AI image generator model, likewise open source.

Qwen-Image stands out in a crowded field of generative image models due to its emphasis on rendering text in visuals with precision, an area where many rivals still struggle.

Supporting both alphabetic and logographic scripts, the model is particularly capable of handling complex typography, multi-line layouts, paragraph-level semantics and bilingual content (for example, English-Chinese).

In practice, this lets users generate content such as movie posters, presentation slides, storefront scenes, handwritten poetry and stylized infographics, with crisp text that aligns with their prompts.




Qwen-Image's example use cases span a wide variety of real-world applications:

  • Marketing and branding: bilingual posters with brand logos, stylized calligraphy and consistent design motifs
  • Presentation design: layout-aware slide decks with title hierarchies and theme-appropriate visuals
  • Education: classroom worksheet generation with diagrams and precisely rendered instructional text
  • Retail and e-commerce: storefront scenes where product labels, signage and environmental context must all be legible
  • Creative content: handwritten poetry, scene narratives, anime-style illustrations with embedded story text

Users can interact with the model on the Qwen Chat website by selecting the "Image Generation" mode from the buttons beneath the prompt entry field.

However, my brief initial tests found that text rendering and prompt adherence were not noticeably better than Midjourney, the proprietary image generator from the popular American company of the same name. My session via Qwen Chat produced several errors in prompt understanding and text fidelity, to my great disappointment, even after repeated attempts and prompt rephrasing:

However, Midjourney offers only a limited number of free generations and requires subscriptions for more, in contrast to Qwen-Image, which, thanks to its open-source license and weights published on Hugging Face, can be adopted by any business or third-party provider.

License and availability

Qwen-Image is distributed under the Apache 2.0 license, allowing use, redistribution and modification for both commercial and non-commercial purposes, although attribution and inclusion of the license text are required for derivative works.

This may make it attractive to companies in search of an open-source image generation tool for producing internal or external collateral such as flyers, advertisements, notices, newsletters and other digital communications.

But the fact that the model's training data remains a closely guarded secret, as with most other leading AI image generators, may give some companies pause about using it.

Unlike Adobe Firefly or OpenAI's native GPT-4o image generation, for example, Qwen does not offer indemnification for commercial uses of its product (i.e., if a user is sued for copyright infringement, Adobe and OpenAI will help defend them in court).

The model and associated assets, including demo notebooks, evaluation tools and fine-tuning scripts, are available through multiple public channels.
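Since the weights are published openly, a common way to try such a release is Hugging Face's `diffusers` library. The sketch below is illustrative only: the repository id `Qwen/Qwen-Image`, the generation parameters and the `snap_to_grid` helper are assumptions based on typical `diffusers` usage, not verified against the official model card.

```python
def snap_to_grid(width: int, height: int, multiple: int = 16) -> tuple[int, int]:
    """Round requested dimensions down to a latent-grid multiple.

    Hypothetical helper: diffusion backbones generally require dimensions
    divisible by the VAE downsampling factor; 16 is an assumption here.
    """
    return (width // multiple) * multiple, (height // multiple) * multiple


def generate_poster() -> None:
    """Load the open weights and render one image (repo id and params assumed)."""
    import torch  # pip install torch
    from diffusers import DiffusionPipeline  # pip install diffusers

    pipe = DiffusionPipeline.from_pretrained(
        "Qwen/Qwen-Image", torch_dtype=torch.bfloat16  # repo id is an assumption
    ).to("cuda")

    # 1328 matches the top resolution cited later in this article (256p-1328p).
    width, height = snap_to_grid(1328, 1328)
    image = pipe(
        prompt='A storefront poster reading "Grand Opening" in English and Chinese',
        width=width,
        height=height,
        num_inference_steps=50,
    ).images[0]
    image.save("qwen_image_demo.png")
```

Note that calling `generate_poster()` downloads multi-gigabyte weights and requires a CUDA GPU; the `snap_to_grid` helper stands on its own.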

In addition, a live evaluation portal called AI Arena lets users compare image generations head-to-head, contributing to a public Elo-style leaderboard.
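The Elo-style ranking mentioned here is computed from pairwise human votes. As an illustration of the general mechanism (my own minimal sketch, not AI Arena's actual code):

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that contestant A beats contestant B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_update(r_a: float, r_b: float, a_won: bool, k: float = 32.0) -> tuple[float, float]:
    """Update both ratings after one pairwise comparison (K=32 is a common default)."""
    e_a = expected_score(r_a, r_b)
    s_a = 1.0 if a_won else 0.0
    delta = k * (s_a - e_a)
    return r_a + delta, r_b - delta


# Two models start at 1000; model A wins one human comparison.
a, b = elo_update(1000.0, 1000.0, a_won=True)  # → (1016.0, 984.0)
```

Aggregating thousands of such updates over random image pairs is what yields a stable leaderboard ordering.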

Training and Development

Behind Qwen-Image's performance is an in-depth training process based on progressive learning, multimodal task alignment and aggressive data curation, according to the technical report the research team published today.

The training corpus includes billions of image-text pairs drawn from four domains: natural imagery, human portraits, artistic and design content (such as posters and user-interface layouts), and synthetic text-rendering data. The Qwen team did not specify the overall size of the training corpus beyond "billions of image-text pairs," but it did provide a breakdown of the approximate share of each content category:

  • Nature: ~55%
  • Design (user interface, posters, art): ~27%
  • People (portraits, human activity): ~13%
  • Synthetic text-rendering data: ~5%
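The reported mix can be expressed directly as sampling weights for a data loader. A small sketch (the percentages come from the breakdown above; the category keys and sampling logic are mine):

```python
import random

# Approximate training-data mix reported by the Qwen team.
DATA_MIX = {"nature": 0.55, "design": 0.27, "people": 0.13, "synthetic_text": 0.05}


def sample_domains(n: int, seed: int = 0) -> list[str]:
    """Draw the source domain for n training examples according to the mix."""
    rng = random.Random(seed)  # seeded for reproducibility
    domains = list(DATA_MIX)
    weights = [DATA_MIX[d] for d in domains]
    return rng.choices(domains, weights=weights, k=n)


# Empirical check: domain frequencies converge to the target percentages.
counts = {d: 0 for d in DATA_MIX}
for d in sample_domains(100_000):
    counts[d] += 1
```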

Notably, Qwen stresses that all synthetic data was generated in-house and that no images created by other AI models were used. Despite the detailed curation and filtering steps described, the documentation does not clarify whether any of the data was licensed or drawn from public or proprietary datasets.

Unlike many generative models that exclude synthetic text because of noise risks, Qwen-Image uses tightly controlled synthetic rendering pipelines to improve character coverage, especially for low-frequency Chinese characters.

A curriculum-style strategy is used: the model begins with captioned images and non-text content, then progresses to layout-sensitive text scenarios, mixed-language rendering and dense paragraphs. This progressive exposure is shown to help the model generalize across scripts and formatting types.
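That curriculum can be pictured as a staged data schedule. A toy sketch (the three stages mirror the progression described above, but the stage names and the 0.4/0.8 cut points are illustrative guesses, not values from the paper):

```python
def curriculum_stage(progress: float) -> str:
    """Map training progress in [0, 1] to a data stage, easy to hard.

    Cut points (0.4, 0.8) are illustrative, not from the technical report.
    """
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must be in [0, 1]")
    if progress < 0.4:
        return "captioned_images_no_text"
    if progress < 0.8:
        return "simple_text_rendering"
    return "dense_paragraphs_mixed_language"


# Stage assignment across an 11-checkpoint training run.
schedule = [curriculum_stage(i / 10) for i in range(11)]
```

A real trainer would switch the sampled dataset (or reweight the mix) whenever the stage changes.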

Qwen-Image incorporates three key modules:

  • Qwen2.5-VL, the multimodal language model, extracts contextual meaning and guides generation through system prompts.
  • A VAE encoder/decoder, trained on high-resolution documents and real-world layouts, handles detailed visual representations, particularly small or dense text.
  • MMDiT, the diffusion-model backbone, coordinates joint learning across image and text modalities. A novel MSRoPE (Multimodal Scalable Rotary Position Embedding) system improves spatial alignment between tokens.

Together, these components allow Qwen-Image to operate effectively across tasks involving image understanding, generation and precise editing.
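For background on the positional-encoding piece: rotary position embeddings rotate consecutive feature pairs by a position-dependent angle, and for image tokens this is commonly extended to 2D by splitting channels between row and column axes. The sketch below shows that generic 2D scheme, not the paper's exact MSRoPE formulation:

```python
import math


def rope_rotate(vec: list[float], pos: int, base: float = 10000.0) -> list[float]:
    """Rotate consecutive feature pairs by position-dependent angles (1-D RoPE)."""
    out = []
    for i in range(len(vec) // 2):
        theta = pos * base ** (-2 * i / len(vec))
        x, y = vec[2 * i], vec[2 * i + 1]
        out += [x * math.cos(theta) - y * math.sin(theta),
                x * math.sin(theta) + y * math.cos(theta)]
    return out


def rope_2d(vec: list[float], row: int, col: int) -> list[float]:
    """2-D variant for image tokens: first half of the channels encodes the
    row position, second half the column (a common layout; an assumption here)."""
    half = len(vec) // 2
    return rope_rotate(vec[:half], row) + rope_rotate(vec[half:], col)


v = [1.0, 0.0, 0.0, 1.0, 0.5, 0.5, -1.0, 2.0]
u = rope_2d(v, row=3, col=7)
```

Because each step is a pure rotation, token norms are preserved and relative offsets between grid positions fall out of the dot products in attention.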

Performance benchmarks

Qwen-Image was evaluated against several public benchmarks:

  • GenEval and DPG for prompt following and object-attribute consistency
  • OneIG-Bench and TIIF for compositional reasoning and layout fidelity
  • CVTG-2K, ChineseWord and LongText-Bench for text rendering, especially in multilingual contexts

In almost all cases, Qwen-Image matches or exceeds existing closed-source models such as GPT Image 1 (High), Seedream 3.0 and FLUX.1 Kontext (Pro). In particular, its performance on Chinese text rendering was significantly better than all compared systems.

In the public AI Arena ranking, based on more than 10,000 human pairwise comparisons, Qwen-Image ranks third overall and is the top open-source model.

Implications for business technical decision -makers

For enterprise AI teams managing complex multimodal workflows, Qwen-Image introduces several functional advantages that align with the operational needs of different roles.

Those who manage the lifecycle of vision-language models, from training to deployment, will find value in Qwen-Image's consistent output quality and integration-ready components. Its open-source nature reduces licensing costs, while its modular architecture (Qwen2.5-VL + VAE + MMDiT) makes it easier to adapt to custom datasets or fine-tune for domain-specific outputs.

The curriculum-style training data and clear benchmark results help teams assess its fitness for purpose. Whether deploying marketing visuals, document renderings or e-commerce product graphics, Qwen-Image allows rapid experimentation without proprietary constraints.

Engineers responsible for building AI pipelines or deploying models across distributed systems will appreciate the detailed infrastructure documentation. The model was trained using a producer-consumer architecture, supports multi-resolution processing (256p up to 1328p) and is designed to work with Megatron-LM and tensor parallelism. This makes it a candidate for deployment in hybrid cloud environments where reliability and throughput matter.
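The producer-consumer pattern mentioned here decouples data preparation from training steps so neither side idles. A minimal generic sketch using Python's standard library (illustrative of the pattern only, not the Megatron-LM integration):

```python
import queue
import threading


def producer(q: "queue.Queue[int | None]", n_items: int) -> None:
    """Simulate data preprocessing: push prepared batches onto a bounded queue."""
    for i in range(n_items):
        q.put(i)      # blocks when the queue is full (natural backpressure)
    q.put(None)       # sentinel: no more batches


def consumer(q: "queue.Queue[int | None]", results: list[int]) -> None:
    """Simulate the training loop: pull batches as soon as they are ready."""
    while True:
        item = q.get()
        if item is None:
            break
        results.append(item * 2)  # stand-in for one training step


q: "queue.Queue[int | None]" = queue.Queue(maxsize=4)  # bounded buffer
results: list[int] = []
t_prod = threading.Thread(target=producer, args=(q, 10))
t_cons = threading.Thread(target=consumer, args=(q, results))
t_prod.start(); t_cons.start()
t_prod.join(); t_cons.join()
```

The bounded queue is the key design choice: it caps memory use and lets the slower side set the pace automatically.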

In addition, its support for text-to-image (T2I) and text-image-to-image (TI2I) editing workflows makes it well suited to real-time or interactive applications.

Professionals focused on data ingestion, validation and transformation can use Qwen-Image as a tool to generate synthetic datasets for training or augmenting computer vision models. Its ability to generate high-resolution images with embedded multilingual annotations can improve performance on downstream OCR, object detection and layout analysis tasks.

Since Qwen-Image was also trained to avoid artifacts such as QR codes, distorted text and watermarks, it offers higher-quality synthetic input than many public models, helping enterprise teams preserve the integrity of their training sets.
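For teams building synthetic OCR or layout datasets as described above, each rendered image is typically paired with a machine-readable annotation record. A sketch of one such record format (the schema is hypothetical; Qwen-Image does not emit this):

```python
import json


def make_annotation(image_path: str,
                    lines: list[tuple[str, tuple[int, int, int, int]]],
                    language: str) -> str:
    """Build a JSON annotation pairing rendered text lines with bounding boxes.

    Box format is (x, y, width, height) in pixels. The schema is an
    assumption for illustration, not a format defined by Qwen-Image.
    """
    record = {
        "image": image_path,
        "language": language,
        "lines": [{"text": text, "bbox": list(bbox)} for text, bbox in lines],
    }
    return json.dumps(record, ensure_ascii=False)


# One bilingual poster with two text lines (sample values, not real output).
ann = make_annotation(
    "poster_0001.png",
    [("Grand Opening", (40, 32, 520, 96)), ("盛大开业", (40, 140, 360, 96))],
    language="en+zh",
)
```

Keeping the ground-truth text and boxes alongside the generated image is what makes such data usable for training OCR and layout-analysis models downstream.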

Looking for comments and opportunities to collaborate

The Qwen team emphasizes openness and community collaboration in the model's release.

Developers are encouraged to test and fine-tune Qwen-Image, submit pull requests and participate in the evaluation leaderboard. Feedback on text rendering, editing fidelity and multilingual use cases will shape future iterations.

With a stated goal of "lowering the technical barriers to visual content creation," the team hopes Qwen-Image will serve not only as a model but as a foundation for further research and practical deployment across industries.

