The study warns security risks while "operating system agents" take control of computers and phones -

Do you want smarter information in your reception box? Sign up for our weekly newsletters to obtain only what matters for business managers, data and security managers. Subscribe now

The researchers have published the most complete survey to date so -called “operating system agents” – artificial intelligence systems that can independently control computers, mobile phones and web browsers by interacting directly with their interfaces. The 30 -page academic journal, accepted for publication during the prestigious conference Association for computational linguistics, mapping a rapidly evolving field which attracted billions of investments to large technological companies.

“The dream of creating AI assistants as capable and versatile as Iron Man’s fictitious jarvis has long captured the imagination,” write researchers. “With the evolution of language models (multimodal) ((m) llms), this dream is closer to reality.”

The survey, led by researchers from the Zhejiang University and Oppo AI Center, is presenting themselves to large technological companies to deploy AI agents which can perform complex digital tasks. OPENAI recently launched “the operator”, published Anthropic “use of computers”, Apple introduced improved AI capabilities in “Apple Intelligence” and Google has unveiled “Project Mariner” – All systems designed to automate IT interactions.

Operating system agents work by observing computer screens and system data, then performing actions such as clicks and scales on mobile, desktop and web platforms. Systems must include interfaces, plan tasks in several stages and translate these plans into executable code. (Credit: github)

Technology giants rush to deploy the AI that controls your office

The speed at which university research has turned into ready -made products for consumers is unprecedented, even according to the standards of Silicon Valley. The survey reveals an explosion of research: more than 60 foundation models and 50 agent executives developed specifically for computer control, the publication rates accelerating considerably since 2023.

The AI scale reached its limits

Electricity ceilings, increase in token costs and inference delays restart the AI company. Join our exclusive fair to discover how best the teams are:

Transform energy into a strategic advantage

Effective inference architecting for real debit gains

Unlock a competitive return on investment with sustainable AI systems

Secure your place to stay in advance::

It is not only progressive progress. We are witnessing the emergence of AI systems that can really understand and manipulate the digital world like humans. Current systems work by taking screenshots of computer screens, using advanced computer vision to understand what is displayed, then performing specific actions such as clicking on buttons, fill out forms and navigate between applications.

“Operating system agents can do tasks independently and have the potential to considerably improve the lives of billions of users worldwide,” notes researchers. “Imagine a world where tasks such as online purchases, the reservation of travel and other daily activities could be carried out transparent by these agents.”

The most sophisticated systems can manage workflows in several complex steps that cover different applications – reserve a restaurant reservation, then automatically adding it to your calendar, then defining a reminder to go early for traffic. What has taken the human minutes of click and typing can now occur in seconds, without human intervention.

The development of AI agents requires a complex training pipeline which combines several approaches, from initial pre-training on screen data to strengthening learning which optimizes performance by trials and errors. (Credit: Arxiv.org)

Why security experts sound alarms on corporate systems controlled by AI

For business technology leaders, the promise of productivity gains is delivered with a reality that gives to think: these systems represent an entirely new attack surface that most organizations are not ready to defend.

The researchers devote considerable attention to what they diplomatically call the concerns of “security and confidentiality”, but the implications are more alarming than their academic language suggests. “Officer agents face these risks, in particular given its large applications on personal devices with user data,” they write.

The attack methods they document read as a cybersecurity nightmare. “Rapid indirect web injection” allows malicious actors to incorporate hidden instructions into web pages which can divert the behavior of an AI agent. Even more about “environmental injection attacks” where apparently harmless web content can encourage agents to steal user data or carry out unauthorized actions.

Consider the implications: an AI agent having access to your corporate emails, your financial systems and customer databases could be manipulated by a web page carefully designed to exfiltrate sensitive information. Traditional security models, built around human users who can identify obvious phishing attempts, decompose when “the user” is an AI system that processes information differently.

The investigation reveals a worrying gap in the preparation. Although general security managers exist for AI agents, “studies on defense specific to operating system agents remain limited.” It is not only an academic concern – it is an immediate challenge for any organization that plans to deploy these systems.

Verification of reality: current AI agents always have difficulties with complex digital tasks

Despite the media threshing surrounding these systems, the analysis of the performance survey reveals significant limitations that temper the expectations of immediate adoption.

The success rates vary considerably depending on the different tasks and platforms. Some commercial systems reach success rates over 50% on certain benchmarks – impressive for emerging technology – but fight with others. Researchers categorize the three -types evaluation tasks: the “basic graphical interface” (understanding of interface elements), “recovery of information” (research and extraction of data) and complex “agency tasks” (autonomous operations in several stages).

The model is revealing: the current systems excel in simple and well -defined tasks but weaken when confronted with the type of complex workflow dependent and dependent on the context which define a large part of the work of modern knowledge. They can reliably click on a specific button or fill out a standard form, but fight with tasks that require reasoning or adaptation sustained to unexpected interface changes.

This performance gap explains why early deployments are focused on narrow and high volume tasks rather than automation for general use. Technology is not yet ready to replace human judgment in complex scenarios, but it is more and more capable of managing routine digital routine work.

Officer agents are based on interconnected systems for perception, planning, memory and action. The complexity of the coordination of these components helps to explain why current systems still fight with sophisticated tasks. (Credit: Arxiv.org)

What happens when AI agents learn to personalize for each user

Perhaps the most intriguing – and potentially transformer – identified challenge in the survey implies what researchers call “personalization and self -evolution”. Unlike AI assistants without today’s state who deal with each interaction as independent agents, future operating system agents will have to learn user interactions and adapt to individual preferences over time.

“The development of personalized operating system agents has been a long -standing objective for AI research,” write the authors. “A personal assistant should adapt constantly and offer improved experiences based on preferences for individual users.”

This capacity could fundamentally change the way we interact with technology. Imagine an AI agent who learns your writing style by email, understands your calendar preferences, knows which restaurants you prefer and can make more and more sophisticated decisions on your behalf. Potential productivity gains are enormous, but the implications of confidentiality too.

The technical challenges are substantial. The survey indicates the need for better multimodal memory systems that can manage not only text but images and voice, with “important challenges” for current technology. How do you build a system that remembers your preferences without creating a complete monitoring of monitoring your digital life?

For technology leaders evaluating these systems, this personalization challenge represents both the greatest opportunity and the greatest risk. The organizations that resolve it first will gain significant competitive advantages, but the implications for confidentiality and security could be serious if they are poorly managed.

The race for the construction of AI assistants which can really work as human users intensifies quickly. While the fundamental challenges concerning security, reliability and personalization remain unresolved, the trajectory is clear. Researchers maintain follow -up developments in the open source benchmark, recognizing that “operating system agents are still in their early stages of development” with “rapid progress which continues to introduce new methodologies and applications”.

The question is not to know if the agents of the AI will transform the way in which we interact with the computers – it is if we will be ready for the consequences when they do. The window to obtain the security and essence executives shrinks as quickly as technology is progressing.

Daily information on business use cases with VB daily

If you want to impress your boss, VB Daily has covered you. We give you the interior scoop on what companies do with a generative AI, from regulatory changes to practical deployments, so that you can share information for a maximum return on investment.

Read our privacy policy

Thank you for subscribing. Discover more VB newsletters here.

An error occurred.

https://venturebeat.com/wp-content/uploads/2025/08/nuneybits_Vector_art_of_human_and_AI_sharing_keyboard_9a3adda9-66ab-4482-8716-8d7eda0c5b72.webp?w=892?w=1200&strip=all
About The Author

lawi23000

See author's posts

Post Navigation

Previous Ai talent has a salary bonus of 30%: “ If you try to catch up later, it will cost you even more ”
Next “ Kpop Demon Hunters ” at the head of the Billboard 100 with “Golden” from Huntr / X