Hugging Face: 5 ways companies can reduce AI costs without sacrificing performance



Companies seem to accept it as a basic fact: AI models require massive amounts of compute, and they simply have to find ways to get more of it.

But it doesn't have to be that way, according to Sasha Luccioni, AI and climate lead at Hugging Face. What if there were a smarter way to use AI? What if, instead of chasing more (often unnecessary) compute and ways to power it, companies focused on improving model performance and accuracy?

Ultimately, model makers and enterprises are focusing on the wrong problem: they should be computing smarter, not harder or doing more, says Luccioni.

"There are smarter ways of doing things that we're currently under-exploring, because we're so blinded by: we need more FLOPS, we need more GPUs, we need more time," she said.



Here are five key takeaways from Hugging Face that can help companies of all sizes use AI more efficiently.

1. Right-size the model to the task

Avoid defaulting to giant, general-purpose models for every use case. Task-specific or distilled models can match, or even surpass, larger models in accuracy for targeted workloads, at lower cost and with reduced energy consumption.

Luccioni has, in fact, found in testing that a task-specific model uses 20 to 30 times less energy than a general-purpose one. "Because it's a model that can do that one task, as opposed to any task you throw at it, which is often the case with large language models," she said.

Distillation is key here; a full model could initially be trained from scratch and then refined for a specific task. DeepSeek R1, for example, is "so huge that most organizations can't afford to use it" because you need at least 8 GPUs, Luccioni noted. By contrast, distilled versions can be 10, 20 or even 30 times smaller and run on a single GPU.
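To make the trade-off concrete, here is a minimal, hypothetical sketch of routing requests to a small task-specific model by default and falling back to a large general model only when no specialist exists. The model names and relative energy figures below are illustrative assumptions, not measurements from the article:

```python
# Hypothetical model router: prefer a small task-specific model over a
# large general-purpose one. Names and cost ratios are illustrative only.

TASK_MODELS = {
    # task -> (model id, rough relative energy cost per request)
    "sentiment": ("distilbert-base-uncased-finetuned-sst-2-english", 1),
    "summarization": ("sshleifer/distilbart-cnn-12-6", 3),
}
GENERAL_MODEL = ("large-general-llm", 30)  # placeholder name and cost

def pick_model(task: str) -> tuple[str, int]:
    """Return (model_id, relative_cost), preferring a task-specific model."""
    return TASK_MODELS.get(task, GENERAL_MODEL)

# A sentiment request routed to the distilled model costs ~30x less
# (in this toy cost model) than sending everything to the general model.
model_id, cost = pick_model("sentiment")
relative_saving = GENERAL_MODEL[1] / cost
```

In practice the routing table would map to real endpoints or Hugging Face model IDs, and the cost column would come from your own energy or latency measurements.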

In general, open-source models help with efficiency, she noted, because they don't need to be trained from scratch. Compare that to just a few years ago, when companies were wasting resources because they couldn't find the model they needed; nowadays, they can start with a base model, then fine-tune and adapt it.

"It makes for incremental, shared innovation, as opposed to siloed efforts, with everyone training their own models on their own data sets and essentially wasting compute in the process," said Luccioni.

It is becoming clear that companies are quickly getting disillusioned with general-purpose AI, as costs are not yet proportionate to the benefits. Generic use cases, such as writing emails or transcribing meeting notes, are genuinely helpful. However, task-specific models still require "a lot of work" because out-of-the-box models don't cut it and are also more expensive, said Luccioni.

This is the next frontier of added value. "So many companies want a specific task done," Luccioni noted. "They don't want AGI, they want specific intelligence. And that's the gap that needs to be bridged."

2. Make efficiency the default

Adopt "nudge theory" in system design: set conservative reasoning budgets, limit always-on generative features and require opt-in for high-cost compute modes.

In cognitive science, nudge theory is a behavioral-change approach designed to subtly influence human behavior. The "canonical example," Luccioni noted, is adding cutlery to takeout orders: having people decide whether they want plastic utensils, rather than automatically including them with every order, can significantly reduce waste.

"Just getting people to opt into something, as opposed to opting out of it, is actually a very powerful mechanism for changing people's behavior," said Luccioni.

Defaults like these also add unnecessary usage and, therefore, costs, because models do more work than they need to. For instance, with popular search engines such as Google, a gen AI summary automatically appears at the top by default. Luccioni also noted that when she recently used OpenAI's GPT-5, the model automatically ran in full reasoning mode on "very simple questions."
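A sketch of what opt-in defaults might look like at the API layer, in the spirit of the nudge approach described above. The field names and budget numbers are assumptions for illustration, not any vendor's actual API:

```python
# Hypothetical "efficiency by default" request handler: every request gets a
# conservative token budget unless the caller explicitly opts in to more.
from dataclasses import dataclass

DEFAULT_MAX_TOKENS = 256   # conservative default budget (assumed value)
OPT_IN_MAX_TOKENS = 4096   # granted only when explicitly requested

@dataclass
class Request:
    prompt: str
    deep_reasoning: bool = False  # off by default: this is the "nudge"

def budget_for(req: Request) -> int:
    """Return the compute budget, defaulting to the cheap path."""
    return OPT_IN_MAX_TOKENS if req.deep_reasoning else DEFAULT_MAX_TOKENS
```

The design choice mirrors the cutlery example: the expensive path still exists, but the user has to ask for it, so simple queries like "what's the weather in Montreal?" never trigger full reasoning by accident.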

"For me, that should be the exception," she said. "Like, 'what's the meaning of life?' Then sure, I want a gen AI summary. But with 'what's the weather in Montreal?' or 'what are the opening hours of my local pharmacy?' I don't need a generative AI summary, yet it's the default."

3. Optimize hardware use

Use batching; adjust precision and fine-tune batch sizes for the specific hardware generation to minimize wasted memory and power draw.

For instance, enterprises should ask: Does the model need to be on all the time? Will people be pinging it in real time, 100 requests at once? In that case, always-on optimization is necessary, Luccioni noted. In many other cases, though, it isn't; the model can be run periodically and requests batched to make optimal use of memory.

"It's kind of an engineering challenge, but a very specific one, so it's hard to say, 'just distill all the models' or 'change the precision on all the models,'" said Luccioni.

In one of her recent studies, she found that optimal batch size depends on the hardware, even down to the specific type or version. Bumping the batch size up by even one can increase energy consumption, because the models then need more memory.
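The idea that "maximize the batch size" is not automatically optimal can be illustrated with a toy sweep. The cost and memory models below are invented for illustration; on real hardware you would measure energy directly, for example via NVML power readings:

```python
# Illustrative batch-size sweep: pick the batch that minimizes simulated
# energy per request under a memory cap. All constants are made up.

MEMORY_CAP_MB = 16000

def simulated_energy_per_item(batch: int) -> float:
    fixed = 100.0                    # per-batch overhead (weights, launch costs)
    per_item = 1.0 + 0.02 * batch    # larger batches need more activation memory
    return (fixed + per_item * batch) / batch

def memory_mb(batch: int) -> float:
    return 8000 + 40 * batch         # weights + per-sample activations (assumed)

candidates = [b for b in (1, 2, 4, 8, 16, 32, 64, 128)
              if memory_mb(b) <= MEMORY_CAP_MB]
best = min(candidates, key=simulated_energy_per_item)
# Under this toy cost model the optimum is an intermediate batch size,
# not the largest one that fits in memory.
```

The point is exactly Luccioni's: the sweet spot only emerges from tuning against your specific hardware and workload, so the result does not transfer from one setup to another.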

"It's something that people don't really look at; they just say, 'Oh, I'm going to maximize the batch size,' but it really comes down to tweaking all these different things, and all of a sudden it's super efficient, but it only works in your specific context," said Luccioni.

4. Incentivize energy transparency

It always helps when people are incentivized; to that end, Hugging Face earlier this year launched the AI Energy Score. It's a novel way to promote more energy efficiency, using a 1-to-5-star rating system, with the most efficient models earning "five-star" status.
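A toy sketch of how a 1-to-5-star energy rating could work: bin models by measured energy per query, with fewer watt-hours earning more stars. The thresholds here are invented; the actual AI Energy Score methodology is defined by Hugging Face and is not reproduced here:

```python
# Hypothetical star rating from energy per query (lower energy = more stars).
# Cutoff values are assumptions for illustration only.

def star_rating(wh_per_query: float) -> int:
    thresholds = [0.5, 1.0, 2.0, 4.0]  # assumed Wh cutoffs for 5, 4, 3, 2 stars
    stars = 5
    for cutoff in thresholds:
        if wh_per_query <= cutoff:
            return stars
        stars -= 1
    return 1  # anything above the last cutoff gets one star
```

A scheme like this makes efficiency comparable at a glance across models, which is the mechanism behind the "badge of honor" incentive the article describes.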

It could be considered the "Energy Star for AI," and was inspired by the potentially soon-to-be-defunct federal program, which set energy-efficiency specifications and branded qualifying devices with an Energy Star logo.

"For a couple of decades, it was a really positive motivator; people wanted that star rating, right?" said Luccioni. "Something similar with the Energy Score would be great."

Hugging Face now maintains a leaderboard, which it plans to update with new models (DeepSeek, GPT-OSS) in September, and on an ongoing basis every 6 months or sooner as new models become available. The goal is for model builders to consider the rating a "badge of honor," said Luccioni.

5. Rethink the "more compute is better" mindset

Instead of chasing the biggest GPU clusters, start by asking: "What is the smartest way to achieve the result?" For many workloads, smarter architectures and better-curated data outperform brute-force scaling.

"I think people probably don't need as many GPUs as they think they do," said Luccioni. Instead of simply going for the biggest clusters, she urged enterprises to rethink the tasks the GPUs will be completing and why they need them, how they performed those kinds of tasks before, and what adding extra GPUs will ultimately get them.

"It's kind of this race to the bottom where we need a bigger cluster," she said. "It's thinking about what you're using AI for, what technique do you need, what does that require?"

