elfvingralf / macOSpilot-ai-assistant
- Friday, December 15, 2023 at 00:00:10
Voice + Vision powered AI assistant that answers questions about any application, in context and in audio.
macOSpilot answers your questions about anything, in any application. No need to reach for another window. Simply use a keyboard shortcut to trigger the assistant, speak your question, and it will give the answer in context and in audio within seconds. Behind the scenes, macOSpilot takes a screenshot of your active window when triggered and sends it to OpenAI GPT Vision along with a transcript of your question. Its answer is displayed as text and converted into audio using OpenAI TTS (text to speech).
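To make the flow above concrete, here is a minimal sketch of what the two OpenAI request bodies could look like: one carrying the question transcript plus the screenshot, one asking for speech. It only builds the payloads and does not send anything. The helper names (buildVisionRequest, buildSpeechRequest), the model names, the voice, and the max_tokens value are assumptions for illustration and are not taken from index.js.

```javascript
// Sketch only: builds request bodies, does not call the OpenAI API.
// Helper names, model names, voice and max_tokens are assumptions.

// System prompt as quoted in this README.
const SYSTEM_PROMPT =
  "You are helping users with questions about their macOS applications based on screenshots, always answer in at most one sentence.";

// Chat Completions body: the spoken question plus the screenshot as a base64 data URL.
function buildVisionRequest(questionTranscript, screenshotPngBuffer) {
  const base64Image = screenshotPngBuffer.toString("base64");
  return {
    model: "gpt-4-vision-preview", // assumed; use whichever vision-capable model you have
    max_tokens: 300, // assumed limit
    messages: [
      { role: "system", content: SYSTEM_PROMPT },
      {
        role: "user",
        content: [
          { type: "text", text: questionTranscript },
          {
            type: "image_url",
            image_url: { url: `data:image/png;base64,${base64Image}` },
          },
        ],
      },
    ],
  };
}

// Text-to-speech body: the answer text that should be read aloud.
function buildSpeechRequest(answerText) {
  return { model: "tts-1", voice: "alloy", input: answerText };
}

// Stand-in bytes instead of a real screenshot, just to show the shape.
const fakeScreenshot = Buffer.from("not-a-real-png");
const visionBody = buildVisionRequest("How do I export this as PDF?", fakeScreenshot);
const speechBody = buildSpeechRequest("Use File, then Export as PDF.");
console.log(JSON.stringify(visionBody, null, 2));
console.log(speechBody);
```

In a real run, the first body would be posted to the Chat Completions endpoint and the second to the speech endpoint, with the API key in the Authorization header.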
index.js. Then choose to run yarn start from the terminal, or package it with Electron using the instructions below, add your OpenAI API key, and let the application run in the background.
The most recent screenshot, audio recording, and TTS response are stored on your machine, in part for debugging purposes. The same filenames are used every time, so the files are overwritten on each run, but they are not automatically deleted when you close or delete the application.
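The overwrite behavior described above can be illustrated with a small Node.js sketch: writing to the same path twice leaves a single file containing only the latest data. The directory, the file name screenshot.png, and the saveLatest helper are hypothetical; the real paths are defined in the project itself.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hypothetical output location, created in the OS temp folder for this demo.
const outputDir = fs.mkdtempSync(path.join(os.tmpdir(), "pilot-demo-"));
const screenshotPath = path.join(outputDir, "screenshot.png"); // hypothetical name

// writeFileSync replaces the file contents when the path already exists.
function saveLatest(filePath, data) {
  fs.writeFileSync(filePath, data);
}

saveLatest(screenshotPath, "first capture");
saveLatest(screenshotPath, "second capture");

const filesInDir = fs.readdirSync(outputDir); // still a single file
const latest = fs.readFileSync(screenshotPath, "utf8"); // only the newest data
console.log(filesInDir, latest);

// Nothing removes the file automatically, so clean up explicitly.
fs.rmSync(outputDir, { recursive: true, force: true });
```

The last line is the manual step: if you want the stored files gone, you have to delete them yourself.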
Prefer a video? Head on over to YouTube to watch the walkthrough of how to get started, how to use the application, and a brief explanation of how it works under the hood.
Make sure you have NodeJS installed on your machine. Then clone the repo and follow the steps below.
git clone https://github.com/elfvingralf/macOSpilot-ai-assistant.git
Navigate to the folder and run yarn install or npm install. This should install all dependencies.
Run yarn start or npm start. Because the application needs access to your screen and microphone, to read and write files, and so on, you will need to go through the steps of granting it access, and possibly restart your terminal.
Make sure to add your OpenAI API key by clicking the settings icon in the top right-hand corner of the main window. Note that the key is not stored encrypted!
If you want to change the default values, here are a few things that might be worth changing, all in index.js:
Keyboard shortcut: The default keyboard shortcut keyboardShortcut is set to "CommandOrControl+Shift+'" (because it seemed like it was rarely used by other applications).
OpenAI Vision prompt: The OpenAI Vision API system prompt in conversationHistory, currently just set to "You are helping users with questions about their macOS applications based on screenshots, always answer in at most one sentence."
VisionAPI image size: Image resize params to save some money. I left an example of how in callVisionAPI() (I found that I had much poorer results when using it).
Application window sizes and settings: The size of the main window: mainWindowWidth and mainWindowHeight. The size of the notification window, which always remains on top: notificationWidth and notificationHeight.
More notification window settings: The level of opacity of the notification window: notificationOpacity. Where the notification window moves to on activation, relative to the active window: inside positionNotificationAtTopRight() (terrible naming, I know).
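As a rough illustration of the settings listed above, the sketch below declares them as constants and shows one way a top-right position could be computed from the active window's bounds. The variable names and the shortcut string come from this README; every numeric value, the topRightPosition helper, and the margin parameter are placeholders made up for the example, not the project's actual defaults or implementation.

```javascript
// Names come from the README; all numeric values are illustrative placeholders.
const keyboardShortcut = "CommandOrControl+Shift+'"; // default per the README
const mainWindowWidth = 600; // placeholder
const mainWindowHeight = 400; // placeholder
const notificationWidth = 300; // placeholder
const notificationHeight = 150; // placeholder
const notificationOpacity = 0.8; // placeholder, 0 = transparent, 1 = opaque

// Hypothetical pure helper: the top-left coordinates that place the
// notification window in the top-right corner of the active window,
// inset by a small margin.
function topRightPosition(activeBounds, notifWidth, margin = 15) {
  return {
    x: Math.round(activeBounds.x + activeBounds.width - notifWidth - margin),
    y: Math.round(activeBounds.y + margin),
  };
}

// Example: an active window at (100, 50) that is 1200 x 800 pixels.
const activeWindow = { x: 100, y: 50, width: 1200, height: 800 };
const position = topRightPosition(activeWindow, notificationWidth);
console.log(position); // { x: 985, y: 65 }
```

In an Electron app, the resulting coordinates could then be passed to the notification window's setPosition(x, y). Keeping the calculation in a pure function makes it easy to try other placements without launching the app.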
Want to create an .app executable instead of running this from your terminal?
First go to index.js and change const useElectronPackager from false to true.
Run one of these in your terminal, depending on which platform you're on.
npm run package-mac
npm run package-win
npm run package-linux
Note: I have only tested this on Mac (Apple silicon and Intel).
Go to /release-builds/ in your project folder and choose the folder for your platform. In there is an executable, an .app if you're on Mac. Double-click it to open the app. Note that it may take a few seconds the first time, so be patient.
Once the app is open, trigger your keyboard shortcut. You'll be asked to grant Privacy & Security permissions. You may need to repeat this once or twice more, and restart the app, before all permissions work properly.
Some improvements I'd like to make, in no particular order:
I'm self-taught and really like scrapping together fun projects. I write functional code that probably is neither beautiful nor efficient, and share it in the hope that someone else might find it useful.
You can find me as @ralfelfving on Twitter/X. If you liked this project, consider checking out the tutorials on my YouTube channel @ralfelfving.