October 6, 2025

Liquid AI's LFM2-VL gives smartphones small, fast AI vision models

Liquid AI has released LFM2-VL, a new generation of vision-language foundation models designed for efficient deployment across a wide range of hardware, from smartphones and laptops to wearables and embedded systems.

The models promise low-latency performance, strong accuracy, and flexibility for real-world applications.

LFM2-VL builds on the company's existing LFM2 architecture, introduced just over a month ago. Liquid AI says LFM2 delivers the "fastest foundation models on the market" thanks to an approach that generates model weights, or parameters, on the fly for each input (known as a linear input-varying, or LIV, system); LFM2-VL extends that architecture to multimodal processing, supporting text and image inputs at variable resolutions.

According to Liquid AI, the models deliver up to twice the GPU inference speed of comparable vision-language models while maintaining competitive performance on common benchmarks.



“Efficiency is our product,” said Ramin Hasani, co-founder and CEO of Liquid AI, in a post on X announcing the new model family.

Two variants for different needs

The release includes two model sizes:

  • LFM2-VL-450M – A highly efficient model with fewer than half a billion parameters, aimed at resource-constrained environments.
  • LFM2-VL-1.6B – A more capable model that remains light enough for single-GPU and on-device deployment.

Both variants process images at native resolution up to 512×512 pixels, avoiding distortion and unnecessary upscaling.

For larger images, the system applies non-overlapping patches and adds a thumbnail for global context, allowing the model to capture both fine details and the wider scene.
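As a rough illustration of that scheme (this is not Liquid AI's code; the 512-pixel tile size comes from the resolution figure above, and the helper function is hypothetical), an oversized image could be split into non-overlapping tiles plus a downscaled thumbnail like so:

    # Illustrative sketch of patch-plus-thumbnail preprocessing, assuming 512x512 tiles.
    from PIL import Image

    TILE = 512

    def tile_image(path: str):
        img = Image.open(path)
        w, h = img.size
        tiles = []
        # Non-overlapping 512x512 crops covering the full image (edge tiles may be smaller).
        for top in range(0, h, TILE):
            for left in range(0, w, TILE):
                tiles.append(img.crop((left, top, min(left + TILE, w), min(top + TILE, h))))
        # A small global thumbnail preserves overall scene context alongside the detail tiles.
        thumbnail = img.copy()
        thumbnail.thumbnail((TILE, TILE))
        return tiles, thumbnail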

Background on Liquid AI

Liquid AI was founded by former researchers from MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) to build AI architectures that go beyond the widely used transformer model.

The company's flagship innovation, Liquid Foundation Models (LFMs), is grounded in principles from dynamical systems, signal processing, and numerical linear algebra, producing general-purpose AI models capable of handling text, video, audio, time series, and other sequential data.

Unlike traditional architectures, Liquid's approach aims to deliver competitive or superior performance using far fewer compute resources, allowing real-time adaptability during inference while keeping memory requirements low. This makes LFMs well suited to both large-scale enterprise use and resource-constrained edge deployments.

In July, the company broadened its platform strategy with the launch of the Liquid Edge AI Platform (LEAP), a cross-platform SDK designed to make it easier for developers to run small language models directly on mobile and embedded devices.

LEAP offers OS-agnostic support for iOS and Android, integration with Liquid's own models as well as other open-source SLMs, and a built-in library with models as small as 300 MB – small enough for modern phones with minimal RAM.

Its companion app, Apollo, lets developers test models fully offline, in keeping with Liquid AI's emphasis on privacy-preserving, low-latency AI. Together, LEAP and Apollo reflect the company's commitment to decentralizing AI execution, reducing dependence on cloud infrastructure, and enabling developers to build optimized, task-specific models for real-world environments.

Speed/quality trade-offs and technical design

LFM2-VL uses a modular architecture combining a language model backbone, a SigLIP2 NaFlex vision encoder, and a multimodal projector.

The projector includes a two-layer MLP connector with pixel unshuffle, reducing the number of image tokens and improving throughput.
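To see why pixel unshuffle cuts token count, here is a minimal sketch in PyTorch; the tensor shapes are illustrative only and are not taken from the LFM2-VL release:

    # Minimal pixel-unshuffle demonstration; shapes are assumptions for illustration.
    import torch
    import torch.nn.functional as F

    # Suppose the vision encoder emits a 32x32 grid of 256-dim features for one image:
    # (batch, channels, height, width) -> 32*32 = 1024 image tokens.
    features = torch.randn(1, 256, 32, 32)

    # A factor-2 pixel unshuffle folds each 2x2 block of positions into the channel
    # dimension, quartering the number of spatial tokens.
    unshuffled = F.pixel_unshuffle(features, downscale_factor=2)
    print(unshuffled.shape)  # torch.Size([1, 1024, 16, 16]) -> 16*16 = 256 tokens

    # A two-layer MLP connector would then project these vectors into the
    # language model's embedding space.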

Users can adjust parameters such as the maximum number of image tokens or patches, letting them balance speed and quality for each deployment scenario. Training involved roughly 100 billion multimodal tokens drawn from open datasets and in-house synthetic data.

Performance and benchmarks

The models achieve competitive benchmark results across a range of vision-language evaluations. LFM2-VL-1.6B scores well on RealWorldQA (65.23), InfoVQA (58.68), and OCRBench (742), and maintains solid results on multimodal reasoning tasks.

In inference testing, LFM2-VL achieved the fastest GPU processing times in its class on a standard workload of a 1024×1024 image and a short prompt.

License and availability

The LFM2-VL models are available now on Hugging Face, along with example fine-tuning code in Colab. They are compatible with Hugging Face Transformers and TRL.
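A minimal inference sketch with Hugging Face Transformers might look like the following; the repository ID, chat-template behavior, and processor usage are assumptions based on the article, so the model card on Hugging Face should be treated as the authoritative reference:

    # Hedged example of standard Transformers vision-language inference;
    # the repo id "LiquidAI/LFM2-VL-1.6B" and the local image path are assumptions.
    from transformers import AutoModelForImageTextToText, AutoProcessor
    from PIL import Image

    model_id = "LiquidAI/LFM2-VL-1.6B"  # assumed Hugging Face repo id
    processor = AutoProcessor.from_pretrained(model_id)
    model = AutoModelForImageTextToText.from_pretrained(model_id, device_map="auto")

    image = Image.open("example.jpg")  # any local test image
    messages = [{"role": "user",
                 "content": [{"type": "image"},
                             {"type": "text", "text": "Describe this image."}]}]
    prompt = processor.apply_chat_template(messages, add_generation_prompt=True)

    inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=64)
    print(processor.decode(output[0], skip_special_tokens=True))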

The models are released under a custom "LFM1.0 license." Liquid AI has described the license as based on Apache 2.0 principles, but the full text has not yet been published.

The company has said commercial use will be permitted under certain conditions, with different terms for companies above and below $10 million in annual revenue.

With LFM2-VL, Liquid AI aims to make high-performance multimodal AI more accessible for on-device and resource-constrained deployments, without sacrificing capability.

