OpenMind / OM1
Modular AI runtime for robots
Technical Paper | Documentation | X | Discord
OpenMind's OM1 is a modular AI runtime that empowers developers to create and deploy multimodal AI agents across digital environments and physical robots, including humanoids, phone apps, websites, quadrupeds, and educational robots such as the TurtleBot 4. OM1 agents can process diverse inputs like web data, social media, camera feeds, and LIDAR, while enabling physical actions including motion, autonomous navigation, and natural conversation. The goal of OM1 is to make it easy to create highly capable, human-focused robots that can be upgraded and (re)configured to accommodate different physical form factors.
OM1 communicates with robot hardware over ROS2, Zenoh, and CycloneDDS (we recommend Zenoh for all new development), and it supports multiple models, including gpt-4o, DeepSeek, and multiple Visual Language Models (VLMs), with pre-configured endpoints for each service.
To get started with OM1, let's run the Spot agent. Spot uses your webcam to capture and label objects. These text captions are then sent to OpenAI's gpt-4o, which returns movement, speech, and face action commands. These commands are displayed on WebSim along with basic timing and other debugging information.
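To make the loop concrete, here is a rough sketch of a single caption-to-command step. This is not OM1's internal code: the prompt, the direct openai client call, and the action vocabulary are simplified assumptions (OM1 routes requests through OpenMind's pre-configured endpoints and a richer action schema).

# Conceptual sketch of one Spot cycle: webcam caption in, action command out.
# Not OM1 internals; assumes the openai Python package and an OPENAI_API_KEY.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def caption_to_action(caption: str) -> str:
    """Map a scene caption to a single high-level action command."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "You control a small robot. Reply with exactly one "
                        "command: a movement, a short speech line, or a face."},
            {"role": "user", "content": f"The camera sees: {caption}"},
        ],
    )
    return response.choices[0].message.content.strip()

print(caption_to_action("a person waving next to a red chair"))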
You will need the uv package manager.
git clone https://github.com/openmind/OM1.git
cd OM1
git submodule update --init
uv venv
For macOS
brew install portaudio ffmpeg
For Linux
sudo apt-get update
sudo apt-get install portaudio19-dev python3-dev ffmpeg
Obtain your API key from the OpenMind Portal and copy it into config/spot.json5, replacing the openmind_free placeholder. Alternatively, run cp env.example .env and add your key to the .env file.
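If you take the .env route, you can quickly check that the key is picked up before launching the agent. A minimal sketch, assuming the python-dotenv package; OM_API_KEY is a placeholder name, so use whatever variable env.example actually defines.

# Verify the API key from .env is loadable (variable name is illustrative).
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current directory
key = os.getenv("OM_API_KEY")
print("API key found" if key else "API key missing - check your .env")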
Run
uv run src/run.py spot
After launching OM1, the Spot agent will interact with you and perform (simulated) actions. For more help connecting OM1 to your robot hardware, see getting started.
OM1 is built around modular inputs and actions. New agents are defined in .json5 config files (in /config/) with custom combinations of inputs and actions to create new behaviors.
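For example, you can peek at an agent's configuration to see which inputs and actions it combines. A minimal sketch, assuming the json5 Python package; the agent_inputs and agent_actions key names are illustrative, so check an existing file such as config/spot.json5 for the real schema.

# Inspect an agent config; key names below are illustrative only.
import json5  # pip install json5

with open("config/spot.json5") as f:
    config = json5.load(f)

print("Top-level keys:", sorted(config.keys()))
for section in ("agent_inputs", "agent_actions"):
    if section in config:
        print(section, "->", config[section])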
OM1 assumes that robot hardware provides a high-level SDK that accepts elemental movement and action commands such as backflip, run, gently pick up the red apple, move(0.37, 0, 0), and smile. An example is provided in actions/move_safe/connector/ros2.py:
...
# Map the abstract "shake paw" action onto the robot SDK's greeting gesture.
elif output_interface.action == "shake paw":
    if self.sport_client:
        self.sport_client.Hello()
...
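Extending a connector for your own hardware follows the same pattern: translate an abstract action string into a call on your robot's SDK. The helper below is a hypothetical sketch; the move(vx, vy, vyaw) parsing and the Move() method signature are assumptions to check against your SDK's documentation.

# Hypothetical dispatch helper: turn "move(0.37, 0, 0)" into an SDK velocity call.
# The Move(vx, vy, vyaw) signature is an assumption; adapt it to your SDK.
def dispatch_move(sport_client, action: str) -> None:
    if sport_client and action.startswith("move(") and action.endswith(")"):
        vx, vy, vyaw = (float(v) for v in action[5:-1].split(","))
        sport_client.Move(vx, vy, vyaw)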
If your robot hardware does not yet provide a suitable HAL (hardware abstraction layer), you will need to create one using traditional robotics approaches such as reinforcement learning (RL) in concert with suitable simulation environments (Unity, Gazebo), sensors (such as hand-mounted ZED depth cameras), and custom VLAs. It is further assumed that your HAL accepts motion trajectories, provides battery and thermal management and monitoring, and calibrates and tunes sensors such as IMUs, LIDARs, and magnetometers.
OM1 can interface with your HAL via USB, serial, ROS2, CycloneDDS, Zenoh, or websockets. For an example of an advanced humanoid HAL, please see Unitree's C++ SDK. Frequently, a HAL, especially ROS2 code, will be dockerized and can then interface with OM1 through DDS middleware or websockets.
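As a concrete example of the websocket route (Zenoh and DDS bridges follow the same publish-a-command idea), the sketch below sends one action to a HAL. The ws://localhost:8765 endpoint and the JSON message schema are placeholders, not part of OM1 or any particular SDK.

# Minimal sketch: push an action command to a dockerized HAL over websockets.
# Endpoint and message format are placeholders; match them to your HAL.
import asyncio
import json

import websockets  # pip install websockets

async def send_action(action: str, uri: str = "ws://localhost:8765") -> None:
    async with websockets.connect(uri) as ws:
        await ws.send(json.dumps({"action": action}))
        reply = await ws.recv()  # e.g. an acknowledgement from the HAL
        print("HAL replied:", reply)

asyncio.run(send_action("move(0.37, 0, 0)"))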
OM1 is developed and tested on macOS and Linux. It should also run on other platforms (such as Windows) and on single-board computers such as the Raspberry Pi 5 (16 GB).
We're excited to introduce full autonomy mode, where three services work together in a loop without manual intervention. From research to real-world autonomy, this is a platform that learns, moves, and builds with you. We'll shortly be releasing the BOM and DIY build details. Stay tuned!
Clone the following repos: OM1, unitree_go2_ros2_sdk, and OM1-avatar.
To start all services, run the following commands:
cd OM1
docker-compose up -d --no-build om1
cd unitree_go2_ros2_sdk
docker-compose up -d --no-build orchestrator
docker-compose up -d --no-build om1_sensor
docker-compose up -d --no-build watchdog
cd OM1-avatar
docker-compose up -d --no-build om1_avatar
More detailed documentation can be accessed at docs.openmind.org.
Please make sure to read the Contributing Guide before making a pull request.
This project is licensed under the terms of the MIT License, a permissive, widely used free software license that allows users to freely use, modify, and distribute the software. By adopting the MIT License, this project aims to encourage collaboration, modification, and distribution.