OpenGVLab / InternGPT
- Friday, 26 May 2023, 00:00:11
InternGPT (iGPT) is an open source demo platform where you can easily showcase your AI models. Now it supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, etc. Try it at igpt.opengvlab.com
The project is still under construction. We will continue to update it, and we welcome contributions and pull requests from the community.
InternGPT (iGPT for short) / InternChat (iChat for short) is a pointing-language-driven visual interactive system that lets you interact with ChatGPT by clicking, dragging, and drawing with a pointing device. The name InternGPT stands for interaction, nonverbal, and ChatGPT. Unlike existing interactive systems that rely on pure language, iGPT incorporates pointing instructions, which significantly improves the efficiency of communication between users and chatbots, as well as the accuracy of chatbots in vision-centric tasks, especially in complicated visual scenarios. Additionally, iGPT uses an auxiliary control mechanism to improve the control capability of the LLM, and a large vision-language model termed Husky is fine-tuned for high-quality multi-modal dialogue (impressing ChatGPT-3.5-turbo with 93.89% GPT-4 Quality).
(2023.05.24) We now support DragGAN.
(2023.05.18) We now support ImageBind. Please see the video demo for usage.
(2023.05.15) The model_zoo including HuskyVQA has been released! Try it on your local machine!
(2023.05.15) Our code is also publicly available on Hugging Face! You can duplicate the repository and run it on your own GPUs.
InternGPT is online (see https://igpt.opengvlab.com). Let's try it!
[NOTE] You may have to wait in a lengthy queue. Alternatively, you can clone our repo and run it on your own GPU.
Update:
(2023.05.24) We now support DragGAN. You can try it as follows:
- Press the button New Image;
- Press the button Drag It.
(2023.05.18) We now support ImageBind. If you want to generate a new image conditioned on audio, you can upload an audio file in advance and then send a message such as:
- "generate a real image from this audio";
- "generate a real image from this audio and {your prompt}";
- "generate a new image from above image and audio".
Main features:
After uploading the image, you can have a multi-modal dialogue by sending messages like "what is it in the image?" or "what is the background color of image?".
You can also interactively operate on, edit, or generate the image as follows:
- Press the button Pick to visualize the segmented region, or press the button OCR to recognize the words at the chosen position;
- Send the message "remove the masked region";
- Send the message "replace the masked region with {your prompt}";
- Send the message "generate a new image based on its segmentation describing {your prompt}";
- Press the button Whiteboard and draw on the board. After drawing, press the button Save and send a message like "generate a new image based on this scribble describing {your prompt}".
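The editing messages listed above share one pattern, with `{your prompt}` standing in for free text. As a small sketch of filling them in programmatically, for example when scripting a session, the following may help. The `TEMPLATES` keys and the `build_message` helper are hypothetical names chosen for illustration and are not part of iGPT.

```python
# Message templates copied from the feature list; "{prompt}" replaces
# the "{your prompt}" placeholder. The dictionary keys are hypothetical.
TEMPLATES = {
    "remove": "remove the masked region",
    "replace": "replace the masked region with {prompt}",
    "segmentation": "generate a new image based on its segmentation describing {prompt}",
    "scribble": "generate a new image based on this scribble describing {prompt}",
}


def build_message(action: str, prompt: str = "") -> str:
    """Return the chat message for an action, filling in the user's prompt."""
    if action not in TEMPLATES:
        raise KeyError(f"unknown action: {action!r}")
    template = TEMPLATES[action]
    # Templates with a placeholder need a non-empty prompt.
    if "{prompt}" in template and not prompt.strip():
        raise ValueError(f"action {action!r} requires a prompt")
    return template.format(prompt=prompt.strip())


if __name__ == "__main__":
    print(build_message("remove"))
    print(build_message("replace", "a red sports car"))
```

The resulting strings are meant to be pasted or sent into the chat box after the corresponding region has been picked or the scribble has been saved.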
See INSTALL.md for installation instructions.
Run the following shell command to start a Gradio service:
python -u app.py --load "HuskyVQA_cuda:0,SegmentAnything_cuda:0,ImageOCRRecognition_cuda:0" --port 3456
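The `--load` value in the command above is a comma-separated list of entries, each naming a model and the device it should run on. The sketch below shows one way such a string could be split into a model-to-device mapping. It is an assumption about the format based only on the example value, not the project's actual parser, and `parse_load_spec` is a hypothetical name.

```python
from typing import Dict


def parse_load_spec(spec: str) -> Dict[str, str]:
    """Split "ModelA_cuda:0,ModelB_cpu" into {"ModelA": "cuda:0", "ModelB": "cpu"}.

    Hypothetical helper for illustration; the real app.py may differ.
    """
    mapping: Dict[str, str] = {}
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate stray or trailing commas
        # Split on the last underscore so a model name containing "_" still works.
        name, sep, device = entry.rpartition("_")
        if not sep or not name or not device:
            raise ValueError(f"expected 'Model_device', got {entry!r}")
        mapping[name] = device
    return mapping


if __name__ == "__main__":
    spec = "HuskyVQA_cuda:0,SegmentAnything_cuda:0,ImageOCRRecognition_cuda:0"
    for model, device in parse_load_spec(spec).items():
        print(f"{model} -> {device}")
```

Under this reading, placing a model on a different GPU would only require changing its suffix, for example `SegmentAnything_cuda:1`.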
If you want to enable the voice assistant, use openssl to generate the certificate:
mkdir certificate
openssl req -x509 -newkey rsa:4096 -keyout certificate/key.pem -out certificate/cert.pem -sha256 -days 365 -nodes
and then run:
python -u app.py --load "HuskyVQA_cuda:0,SegmentAnything_cuda:0,ImageOCRRecognition_cuda:0" --port 3456 --https
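The `--https` run depends on the key and certificate generated by the openssl command. As a rough sketch, a launcher could check that both files exist and collect the keyword arguments for Gradio's `launch()` method. `server_port`, `ssl_keyfile`, and `ssl_certfile` are `launch()` parameters in Gradio; how app.py actually wires up the flag is an assumption here, and `build_launch_kwargs` is a hypothetical helper.

```python
from pathlib import Path
from typing import Any, Dict


def build_launch_kwargs(port: int, https: bool, cert_dir: str = "certificate") -> Dict[str, Any]:
    """Assemble keyword arguments for a Gradio launch() call.

    Hypothetical helper; the file names match the openssl command in this README.
    """
    kwargs: Dict[str, Any] = {"server_port": port}
    if https:
        key = Path(cert_dir) / "key.pem"
        cert = Path(cert_dir) / "cert.pem"
        # Fail early with a clear message instead of a TLS error at startup.
        missing = [str(p) for p in (key, cert) if not p.is_file()]
        if missing:
            raise FileNotFoundError("missing certificate files: " + ", ".join(missing))
        kwargs["ssl_keyfile"] = str(key)
        kwargs["ssl_certfile"] = str(cert)
    return kwargs


if __name__ == "__main__":
    # Plain HTTP needs no certificate files.
    print(build_launch_kwargs(3456, https=False))
```

The returned dictionary could then be passed as `demo.launch(**kwargs)`, where `demo` is whatever Gradio app object the program builds.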
This project is released under the Apache 2.0 license.
If you find this project useful in your research, please consider citing:
@article{2023interngpt,
title={InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT Beyond Language},
author={Liu, Zhaoyang and He, Yinan and Wang, Wenhai and Wang, Weiyun and Wang, Yi and Chen, Shoufa and Zhang, Qinglong and Yang, Yang and Li, Qingyun and Yu, Jiashuo and others},
journal={arXiv preprint arXiv:2305.05662},
year={2023}
}
Thanks to the following open-source projects:
Hugging Face, LangChain, TaskMatrix, SAM, Stable Diffusion, ControlNet, InstructPix2Pix, BLIP, Latent Diffusion Models, EasyOCR, ImageBind, DragGAN
You are welcome to discuss with us and help continuously improve the user experience of InternGPT.