jina-ai / jina
An easier way to build neural search on the cloud
Cloud-Native Neural Search Framework for Any Kind of Data
Jina allows you to build deep learning-powered search-as-a-service in just minutes.
pip install --pre jina && jina hello fashion
pip install --pre "jina[chatbot]" && jina hello chatbot
pip install --pre "jina[multimodal]" && jina hello multimodal
jina hello fork fashion ../my-proj/
2.0 is in pre-release; add --pre to install it. Why 2.0?
$ pip install --pre jina
$ jina -v
2.0.0rcN
$ docker run jinaai/jina:master -v
2.0.0rcN
| | On Linux/macOS with Python 3.7/3.8/3.9 (x86_64, arm64, v6, v7, Apple M1) | Docker Users |
|---|---|---|
| Standard | pip install --pre jina | docker run jinaai/jina:master |
| Daemon | pip install --pre "jina[daemon]" | docker run --network=host jinaai/jina:master-daemon |
| With Extras | pip install --pre "jina[devel]" | docker run jinaai/jina:master-devel |
Version identifiers are explained here. Jina can run on Windows Subsystem for Linux. We welcome the community to help us with native Windows support.
Document, Executor, and Flow are the three fundamental concepts in Jina.
Copy-paste the minimum example below and run it:
import numpy as np
from jina import Document, DocumentArray, Executor, Flow, requests


class CharEmbed(Executor):  # a simple character embedding with mean-pooling
    offset = 32  # ASCII 32, the first printable character (space)
    dim = 127 - offset + 1  # last pos reserved for `UNK`
    char_embd = np.eye(dim) * 1  # one-hot embedding for all chars

    @requests
    def foo(self, docs: DocumentArray, **kwargs):
        for d in docs:
            r_emb = [ord(c) - self.offset if self.offset <= ord(c) <= 127 else (self.dim - 1) for c in d.text]
            d.embedding = self.char_embd[r_emb, :].mean(axis=0)  # average pooling


class Indexer(Executor):
    _docs = DocumentArray()  # for storing all documents in memory

    @requests(on='/index')
    def foo(self, docs: DocumentArray, **kwargs):
        self._docs.extend(docs)  # extend stored `docs`

    @requests(on='/search')
    def bar(self, docs: DocumentArray, **kwargs):
        q = np.stack(docs.get_attributes('embedding'))  # get all embeddings from query docs
        d = np.stack(self._docs.get_attributes('embedding'))  # get all embeddings from stored docs
        euclidean_dist = np.linalg.norm(q[:, None, :] - d[None, :, :], axis=-1)  # pairwise euclidean distance
        for dist, query in zip(euclidean_dist, docs):  # add & sort matches
            query.matches = [Document(self._docs[int(idx)], copy=True, score=d) for idx, d in enumerate(dist)]
            query.matches.sort(key=lambda m: m.score.value)  # sort matches by score (smaller distance first)


f = Flow(port_expose=12345).add(uses=CharEmbed, parallel=2).add(uses=Indexer)  # build a Flow with 2 parallel CharEmbed (parallelism is unnecessary here, shown for illustration)
with f:
    f.post('/index', (Document(text=t.strip()) for t in open(__file__) if t.strip()))  # index all non-empty lines of this file
    f.block()  # block and listen for requests
Keep the above running and start a simple client:
from jina import Client, Document


def print_matches(req):  # the callback function invoked when task is done
    for idx, d in enumerate(req.docs[0].matches[:3]):  # print top-3 matches
        print(f'[{idx}]{d.score.value:2f}: "{d.text}"')


c = Client(host='localhost', port_expose=12345)  # connect to localhost:12345
c.post('/search', Document(text='request(on=something)'), on_done=print_matches)
It finds the lines most similar to "request(on=something)" from the server code snippet and prints the following (the score is the Euclidean distance, so smaller means more similar):
Client@1608[S]:connected to the gateway at localhost:12345!
[0]0.168526: "@requests(on='/index')"
[1]0.181676: "@requests(on='/search')"
[2]0.192049: "query.matches = [Document(self._docs[int(idx)], copy=True, score=d) for idx, d in enumerate(dist)]"
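If you prefer to keep everything in one process instead of running a separate client, the same query can be sent directly through the Flow object. This is only a sketch: it assumes the 2.0 pre-release Flow.post accepts the same on_done callback as Client.post, and that CharEmbed, Indexer and print_matches are the definitions from the snippets above:

from jina import Document, Flow

# CharEmbed, Indexer and print_matches are assumed to be defined as in the snippets above
f = Flow(port_expose=12345).add(uses=CharEmbed, parallel=2).add(uses=Indexer)
with f:
    f.post('/index', (Document(text=t.strip()) for t in open(__file__) if t.strip()))  # index this file's lines
    # assumption: Flow.post takes the same on_done callback as Client.post
    f.post('/search', Document(text='request(on=something)'), on_done=print_matches)  # query in-process, no separate client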
- Document & DocumentArray: the basic data types in Jina.
- Executor: how Jina processes Documents.
- Flow: how Jina streamlines and distributes Executors.
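For a feel of the first two concepts, here is a tiny sketch that uses only the calls already seen in the example above (assuming the 2.0 pre-release API; the toy texts and 3-dimensional embeddings are made up for illustration):

import numpy as np
from jina import Document, DocumentArray

da = DocumentArray()  # an ordered, list-like container of Documents
da.extend([Document(text='hello'), Document(text='world')])  # wrap raw texts into Documents
for d in da:
    d.embedding = np.random.rand(3)  # attach a toy embedding to each Document
print(da.get_attributes('embedding'))  # bulk-read one attribute across the whole array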
Jina is backed by Jina AI. We are actively hiring full-stack developers and solution engineers to build the next neural search ecosystem in open source.
We welcome all kinds of contributions from the open-source community, individuals and partners. We owe our success to your active involvement.