Why GPT-5's most controversial feature – the model router – could also be the future of AI

The announcement of OpenAI's GPT-5 last week was supposed to be a triumph – proof that the company was still the undisputed leader in AI – until it wasn't. Over the weekend, a wave of customer backlash turned the rollout into a public relations storm: it became both a product crisis and a crisis of confidence. Users mourned the loss of their favorite models, which had doubled as makeshift therapists, friends, and romantic partners. Developers complained of degraded performance. Industry critic Gary Marcus predictably called GPT-5 "overdue, overhyped and underwhelming."
The culprit, many argued, was hiding in plain sight: a new real-time model "router" that automatically decides which of several GPT-5 variants should handle each job. Many users had assumed GPT-5 was a single model trained from scratch; in reality, it is a network of models – some smaller and cheaper, others stronger and more expensive – stitched together. Experts say this approach could be the future of AI as large language models keep advancing and becoming more resource-intensive. But with GPT-5's debut, OpenAI demonstrated some of the challenges inherent in the approach, and learned some hard lessons about how user expectations are evolving in the AI era.
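The basic mechanics of such a router can be illustrated with a toy example. This is purely a sketch: the model names, the complexity heuristic, and the threshold below are invented for illustration and say nothing about how OpenAI's actual router works.

```python
# Toy illustration of a model router: cheap queries go to a fast model,
# hard ones to a slower reasoning model. All names and rules are invented.

def estimate_complexity(prompt: str) -> float:
    """Crude proxy: longer prompts and reasoning-style keywords score higher."""
    keywords = ("prove", "step by step", "debug", "analyze", "compare")
    score = min(len(prompt) / 500, 1.0)
    score += 0.5 * sum(kw in prompt.lower() for kw in keywords)
    return min(score, 1.0)

def route(prompt: str) -> str:
    """Pick a (hypothetical) model variant based on estimated difficulty."""
    if estimate_complexity(prompt) > 0.6:
        return "big-reasoning-model"
    return "small-fast-model"

print(route("What year did the Berlin Wall fall?"))          # small-fast-model
print(route("Debug this race condition step by step: ..."))  # big-reasoning-model
```

A production router would replace the keyword heuristic with a learned classifier, and would also weigh load, latency targets, and per-query cost – which is part of why, as the researchers quoted below note, getting routing right is far from trivial.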
For all the advantages model routing promises, many GPT-5 users bristled at what they perceived as a loss of control; some even suggested OpenAI might be deliberately trying to pull the wool over their eyes.
In response to the GPT-5 tumult, OpenAI quickly moved to bring back its previous flagship model, GPT-4o, for paid users. It also said it had fixed the buggy routing, raised usage limits, and promised continued updates to win back users' trust and restore stability.
Anand Chowdhary, co-founder of AI sales platform FirstQuadrant, summed up the situation frankly: "When routing hits, it feels like magic. When it misses, it feels broken."
The promise – and inconsistency – of model routing
Jiaxuan You, an assistant professor of computer science at the University of Illinois Urbana-Champaign, told Fortune his lab has studied both the promise – and the inconsistency – of model routing. In GPT-5's case, he said, he believes (though he cannot confirm) that the model router sometimes sends parts of the same query to different models. A cheaper, faster model might give one answer while a slower, more deliberate model gives another, and when the system stitches those responses together, subtle contradictions creep in.
The idea of model routing is intuitive, he explained, but "making it really work is very non-trivial." Perfecting a router, he added, can be as hard as building Amazon-quality recommendation systems, which took years and many domain experts to refine. "GPT-5 is supposed to be built with maybe orders of magnitude more resources," he said, stressing that even if the router picks a smaller model, it should not produce incoherent responses.
Still, You believes routing is here to stay. "The community also believes that model routing is promising," he said, pointing to both technical and economic reasons. Technically, single-model performance appears to be hitting a plateau: You pointed to current scaling laws, which hold that a model improves as it is given more data and compute. "But we all know the model will not improve infinitely," he said. "Over the past year, we have all seen that the capability of a single model is really getting saturated."
Economically, routing lets AI providers keep using older models rather than discarding them whenever a new one launches. Questions about current events require frequently updated models, but static facts stay accurate for years. Directing those queries to older models avoids wasting the enormous time, compute, and money already spent training them.
There are also hard physical limits. GPU memory has become a bottleneck for training ever-larger models, and chip technology is approaching the maximum memory that can be packed onto a single die. In practice, You explained, those physical limits mean the next model can't simply be ten times bigger.
An older idea that is now in the spotlight
William Falcon, founder and CEO of AI platform Lightning AI, points out that the idea of using an ensemble of models is not new – it has been around since roughly 2018 – and since OpenAI's models are a black box, we don't actually know that GPT-4 didn't use a model-routing system as well.
"I think maybe they're just being more explicit about it now," he said. In any case, GPT-5's launch was heavily publicized, including its model-routing system. The blog post introducing the model called it OpenAI's "smartest, fastest, and most useful model yet, with built-in thinking." On the official ChatGPT blog, OpenAI confirmed that GPT-5 in ChatGPT runs on a system of models coordinated by a behind-the-scenes router that switches to deeper reasoning when needed. The GPT-5 system card went further, clearly describing several model variants – gpt-5-main, gpt-5-main-mini for speed, and gpt-5-thinking, gpt-5-thinking-mini, plus additional thinking variants – and explaining how the unified system routes between them automatically.
In a press briefing ahead of the launch, OpenAI CEO Sam Altman praised the model router as a way to tame what had been a hard-to-decipher list of models. Altman described the previous model-picker interface as a "very confusing mess."
But Falcon said the core problem was that GPT-5 simply didn't feel like a leap. "GPT-1 to 2 to 3 to 4 – each time, a massive jump. Four to five was not significantly better. That's what people are upset about."
Will multiple models add up to AGI?
The debate over model routing has led some to question the ongoing hype about the possibility of artificial general intelligence, or AGI, arriving soon. OpenAI officially defines AGI as "highly autonomous systems that outperform humans at most economically valuable work," but Altman notably said last week that it is "not a super useful term."
"What about the promised AGI?" wrote Aiden Chaoyang He, an AI researcher and co-founder of TensorOpera, on X, criticizing the GPT-5 rollout. "Even a powerful company like OpenAI doesn't have the capacity to train a super-large model, forcing them to use a real-time model router."
Robert Nishihara, CEO of AI production platform Anyscale, says scaling still drives progress in AI, but the idea of a single all-powerful AI model is elusive. "It's hard to build one model that's the best at everything," he said. That's why GPT-5 currently runs on a network of models tied together by a router, not a single monolith.
OpenAI has said it hopes to unify these into a single model in the future, but Nishihara points out that hybrid systems have real advantages: you can upgrade one part at a time without disrupting the rest, and you get most of the benefits without the cost and complexity of retraining an entire giant model. For that reason, Nishihara thinks routing is here to stay.
Aiden Chaoyang He agrees. In theory, scaling laws still hold – more data and compute make models better – but in practice, he thinks development will "spiral" between two approaches: routing specialized models together, then trying to consolidate them into one. The deciding factors will be engineering costs, compute and energy limits, and commercial pressures.
The hyped-up AGI narrative may also have to adapt. "If someone achieves something close to AGI, I don't know if it will literally be a single set of weights doing it," said Falcon, referring to the "brains" behind LLMs. "If it's a collection of models that acts like AGI, that's fine. No one's a purist here."