OpenCUA's open-source computer-use agents rival OpenAI and Anthropic models

A new framework from researchers at the University of Hong Kong (HKU) and collaborating institutions provides an open-source foundation for building robust AI agents that can operate computers. The framework, called OpenCUA, includes the tools, data, and recipes needed to scale the development of computer-use agents (CUAs).
Models trained with this framework perform strongly on CUA benchmarks, outperforming existing open-source models and competing closely with closed agents from leading AI labs such as OpenAI and Anthropic.
The challenge of building computer-use agents
Computer-use agents are designed to perform tasks on a computer autonomously, from navigating websites to operating complex software. They can also help automate workflows in the enterprise. However, the most capable CUA systems are proprietary, with critical details about their training data, architectures, and development processes kept private.
"As the lack of transparency limits technical progress and raises safety concerns, the research community needs truly open CUA frameworks to study their capabilities, limitations, and risks," the researchers write in their paper.
At the same time, open-source efforts face their own obstacles. There has been no scalable infrastructure for collecting the diverse, large-scale data needed to train these agents. Existing open-source datasets for graphical user interfaces (GUIs) contain limited data, and many research projects provide insufficient detail about their methods, making it difficult for others to reproduce the work.
According to the paper, "these limitations collectively hinder progress in general-purpose CUAs and restrict meaningful exploration of their scalability, generalizability, and potential learning approaches."
Introducing OpenCUA

OpenCUA is an open-source framework designed to address these challenges by scaling both data collection and the models themselves. At its core is AgentNet, a tool for recording human demonstrations of computer tasks across different operating systems.
The tool streamlines data collection by running in the background on an annotator's personal computer, capturing screen video, mouse and keyboard input, and the underlying accessibility tree, which provides structured information about on-screen elements. This raw data is then distilled into "action trajectories," each pairing a screenshot of the computer (the state) with the user's corresponding action (a click, a keystroke, etc.). Annotators can then review, edit, and submit these demonstrations.
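To make the trajectory format concrete, here is a minimal sketch of how such state-action pairs might be represented. All class and field names are illustrative assumptions, not OpenCUA's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Action:
    kind: str          # e.g. "click", "type", "scroll" (hypothetical vocabulary)
    target: str        # accessibility-tree element the action applies to
    value: str = ""    # typed text, if any

@dataclass
class Step:
    screenshot: str    # path to the captured screen image (the state)
    action: Action     # the user action recorded at that state

@dataclass
class Trajectory:
    task: str
    os: str            # "windows" | "macos" | "ubuntu"
    steps: list[Step] = field(default_factory=list)

# Example: two recorded steps of a hypothetical demonstration
traj = Trajectory(task="Open a browser and visit a site", os="ubuntu")
traj.steps.append(Step("frame_000.png", Action("click", "address_bar")))
traj.steps.append(Step("frame_001.png", Action("type", "address_bar", "example.com")))
print(len(traj.steps))  # 2
```

Each step thus keeps the screen state and the action taken on it together, which is the pairing the training pipeline later consumes.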

Using this tool, the researchers collected the AgentNet dataset, which contains more than 22,600 task demonstrations across Windows, macOS, and Ubuntu, spanning more than 200 applications and websites. "This dataset authentically captures the complexity of human behavior and the dynamics of users' personal computing environments," the paper notes.
Recognizing that screen-recording tools raise important privacy concerns for enterprises, the researchers designed the AgentNet tool with security in mind. Xinyuan Wang, co-author of the paper and a doctoral student at HKU, explained that the team built a multi-layer privacy-protection framework. "First, the annotators themselves can fully observe the data they generate … before deciding to submit it," he told VentureBeat. The data then undergoes manual verification for privacy issues and automated scanning by a large model to detect any remaining sensitive content before release. "This layered process ensures enterprise-grade robustness for environments that handle sensitive customer or financial data," Wang added.
To speed up evaluation, the team also curated AgentNetBench, an offline benchmark that provides multiple correct actions for each step, offering a more efficient way to measure an agent's performance.
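The layered filtering described above can be sketched as a simple pipeline. This is a loose illustration under stated assumptions: the regex patterns stand in for the paper's large-model scan, and the function names are invented for this example:

```python
import re

# Stand-in patterns for sensitive content; a real system would use an
# LLM-based scan as the article describes, not just regexes.
SENSITIVE_PATTERNS = [
    re.compile(r"\b\d{16}\b"),                # card-like 16-digit numbers
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),   # email addresses
]

def automated_scan(text: str) -> bool:
    """Layer 3: automated check for remaining sensitive content."""
    return any(p.search(text) for p in SENSITIVE_PATTERNS)

def passes_privacy_review(text: str, annotator_ok: bool, reviewer_ok: bool) -> bool:
    # Layer 1: annotator reviewed and approved their own recording
    # Layer 2: manual verification passed
    # Layer 3: automated scan found nothing sensitive
    return annotator_ok and reviewer_ok and not automated_scan(text)

print(passes_privacy_review("click the Submit button", True, True))    # True
print(passes_privacy_review("login as jane@example.com", True, True))  # False
```

A demonstration is released only when every layer approves, mirroring the defense-in-depth approach Wang describes.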
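The "multiple correct actions per step" idea lends itself to simple offline scoring. Below is a minimal sketch in that spirit; the action-string format and sample data are assumptions, not AgentNetBench's actual schema:

```python
def step_correct(predicted: str, accepted: set[str]) -> bool:
    """A predicted action counts if it matches ANY accepted action for the step."""
    return predicted in accepted

def score_trajectory(predictions: list[str], gold_steps: list[set[str]]) -> float:
    """Fraction of steps where the agent chose an acceptable action."""
    hits = sum(step_correct(p, g) for p, g in zip(predictions, gold_steps))
    return hits / len(gold_steps)

# Hypothetical gold data: step 1 accepts either of two equivalent actions.
gold = [
    {"click:file_menu", "hotkey:ctrl+o"},  # both open the file dialog
    {"type:report.txt"},
]
preds = ["hotkey:ctrl+o", "type:report.txt"]
print(score_trajectory(preds, gold))  # 1.0
```

Accepting several valid actions per step avoids penalizing an agent for choosing a different but equally correct path, which is what makes offline evaluation practical.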
A new recipe for training agents
The OpenCUA framework introduces a novel pipeline for processing the data and training computer-use agents. The first step converts raw human demonstrations into clean state-action pairs suitable for training vision-language models (VLMs). However, the researchers found that simply training models on these pairs yields limited performance gains, even with large amounts of data.

The key insight was to augment these trajectories with chain-of-thought (CoT) reasoning. This process generates a detailed "inner monologue" for each action, covering planning, memory, and reflection. The structured reasoning is organized into three levels: a high-level observation of the screen, reflective thoughts that analyze the situation and plan the next steps, and finally the concise, executable action. This approach helps the agent develop a deeper understanding of its tasks.
"We find natural-language reasoning crucial for generalizable computer-use foundation models, helping CUAs internalize cognitive capabilities," the researchers write.
This data-synthesis pipeline is a general framework that companies can adapt to train agents on their own internal tools. According to Wang, a company can record demonstrations of its proprietary workflows and use the same "reflector" and "generator" pipeline to create the necessary training data. "This allows them to bootstrap a highly capable agent tailored to their internal tools without the need for hand-annotated reasoning traces," he said.
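The three-level structure can be sketched as a simple record that serializes into a training example. The field names and the serialized format are illustrative assumptions, not OpenCUA's exact annotation schema:

```python
from dataclasses import dataclass

@dataclass
class ReasonedStep:
    observation: str   # level 1: high-level description of the current screen
    thought: str       # level 2: reflective analysis and planning
    action: str        # level 3: the concise, executable action

def to_training_text(step: ReasonedStep) -> str:
    """Serialize one CoT-augmented step into a text training example."""
    return (
        f"Observation: {step.observation}\n"
        f"Thought: {step.thought}\n"
        f"Action: {step.action}"
    )

step = ReasonedStep(
    observation="A file-save dialog is open with an empty filename field.",
    thought="The task requires saving the report; type the filename, then confirm.",
    action='type(text="report.txt")',
)
print(to_training_text(step))
```

Training on text structured this way, rather than on bare state-action pairs, is what gives the model the "inner monologue" the researchers credit for the performance gains.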
Putting OpenCUA to the test
The researchers applied the OpenCUA framework to train a range of open-source VLMs, including variants of Qwen and Kimi-VL, with parameter counts ranging from 3 billion to 32 billion. The models were evaluated on a series of online and offline benchmarks that test their ability to perform tasks and understand GUIs.
The 32-billion-parameter model, OpenCUA-32B, established a new state-of-the-art success rate among open-source models on the OSWorld-Verified benchmark. It also surpassed OpenAI's GPT-4o-based CUA and significantly narrowed the performance gap with Anthropic's leading models.

For enterprise developers and product leaders, the research offers several key takeaways. The OpenCUA method is broadly applicable, improving performance on models with different architectures (dense and mixture-of-experts) and sizes. The trained agents also show strong generalization, performing well across a diverse range of tasks and operating systems.
According to Wang, the framework is especially well suited to automating repetitive, labor-intensive enterprise workflows. "For example, in the AgentNet dataset, we already capture some demonstrations of launching EC2 instances on Amazon AWS and configuring annotation parameters on MTurk," he told VentureBeat. "These tasks involve many sequential steps but follow reproducible patterns."
However, Wang noted that bridging the gap to live deployment still requires addressing key challenges around safety and reliability. "The biggest challenge in real deployment is safety and reliability: the agent must avoid mistakes that could inadvertently alter system settings or trigger harmful side effects beyond the intended task," he said.
The researchers have released the code, dataset, and weights for their models.
As open-source agents built on frameworks like OpenCUA become more capable, they could fundamentally change the relationship between knowledge workers and their computers. Wang envisions a future in which mastering complex software matters less than the ability to clearly articulate goals to an AI agent.
He described two main modes of working: "offline automation, where the agent leverages its broad software knowledge to carry out a task from start to finish," and "online collaboration, where the agent responds in real time and works side by side with humans, much like a colleague." In essence, humans will supply the strategic "what" while increasingly sophisticated AI agents handle the operational "how."