bytedance / UI-TARS-desktop
- Tuesday, May 13, 2025, 00:00:03
A GUI Agent application based on UI-TARS (Vision-Language Model) that allows you to control your computer using natural language.
Important
[2025-03-18] We released a technical preview of a new desktop app, Agent TARS: a multimodal AI agent that operates the browser by visually interpreting web pages and integrates with command lines and file systems.
UI-TARS Desktop is a GUI Agent application based on UI-TARS (Vision-Language Model) that allows you to control your computer using natural language.
📑 Paper | 🤗 Hugging Face Models | 🫨 Discord | 🤖 ModelScope
🖥️ Desktop Application | 👓 Midscene (use in browser)
| Instruction | Video |
|---|---|
| Please help me open the autosave feature of VS Code and delay AutoSave operations for 500 milliseconds in the VS Code setting. | computer-use-triple-speed.mp4 |
| Could you help me check the latest open issue of the UI-TARS-Desktop project on GitHub? | browser-use-triple-speed.mp4 |
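The first showcase instruction has an end state that can be checked by hand. The sketch below is not part of UI-TARS Desktop. It assumes the agent completes the task by setting VS Code's standard `files.autoSave` and `files.autoSaveDelay` options, and the helper name `is_task_done` is hypothetical.

```python
import json

# Target values for the showcase instruction. The keys are VS Code's
# auto-save settings; the delay is expressed in milliseconds.
DESIRED_SETTINGS = {
    "files.autoSave": "afterDelay",
    "files.autoSaveDelay": 500,
}


def is_task_done(settings: dict) -> bool:
    """Return True when every desired key has the expected value.

    `settings` is an already-parsed settings dictionary (hypothetical input);
    unrelated keys are ignored.
    """
    return all(settings.get(key) == value for key, value in DESIRED_SETTINGS.items())


if __name__ == "__main__":
    # Print the fragment one would expect to find in settings.json.
    print(json.dumps(DESIRED_SETTINGS, indent=2))
```

Note that a real `settings.json` may contain comments, which `json.loads` rejects, so the sketch takes a parsed dictionary rather than reading the file itself.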
See Quick Start.
See Deployment.
See CONTRIBUTING.md.
See @ui-tars/sdk.
UI-TARS Desktop is licensed under the Apache License 2.0.
If you find our paper and code useful in your research, please consider giving us a star ⭐ and a citation 📝:
@article{qin2025ui,
  title={UI-TARS: Pioneering Automated GUI Interaction with Native Agents},
  author={Qin, Yujia and Ye, Yining and Fang, Junjie and Wang, Haoming and Liang, Shihao and Tian, Shizuo and Zhang, Junda and Li, Jiahao and Li, Yunxin and Huang, Shijue and others},
  journal={arXiv preprint arXiv:2501.12326},
  year={2025}
}