Cloud-Native Neural Search Framework for Any Kind of Data
Jina is a neural search framework that empowers anyone to build state-of-the-art (SOTA), scalable deep learning search applications in minutes.
All data types - Scalable indexing, querying, and understanding of any data: video, image, long/short text, music, source code, PDF, etc.
Save time - The design pattern of neural search systems, from zero to a production-ready system in minutes.
Fast & cloud-native - Distributed architecture from day one, scalable and cloud-native by design: enjoy containerization, streaming, sharding, replication, async scheduling, and HTTP/gRPC/WebSocket protocols.
Own your stack - Keep end-to-end ownership of your solution and avoid the integration pitfalls that come with fragmented, multi-vendor, generic legacy tools.
- via PyPI:
pip install jina
- via Conda:
conda install jina -c conda-forge
- via Docker:
docker run jinaai/jina:latest
- More install options
- Fashion image search:
jina hello fashion
- QA chatbot:
pip install "jina[demo]" && jina hello chatbot
- Multimodal search:
pip install "jina[demo]" && jina hello multimodal
- Fork the source of a demo to your folder:
jina hello fork fashion ../my-proj/
Document, Executor, and Flow are three fundamental concepts in Jina.
- Document is the basic data type in Jina;
- Executor is how Jina processes Documents;
- Flow is how Jina streamlines and distributes Executors.
Leveraging these three components, let's build an app that finds the lines of a code snippet that are most similar to a query.
Preliminaries: character embedding, pooling, Euclidean distance. Read our docs for details.
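For readers new to these preliminaries, here is a minimal NumPy-only sketch of the three ideas, independent of Jina. The names `char_embed` and `euclidean` are hypothetical helpers introduced for illustration; the constants mirror the ones used in the example that follows.

```python
import numpy as np

OFFSET = 32                # ASCII code of the space character
DIM = 127 - OFFSET + 1     # 96 slots; the last one is used for unknown chars
ONE_HOT = np.eye(DIM)      # row i is the one-hot vector for slot i


def char_embed(text: str) -> np.ndarray:
    """Character embedding + pooling: average the one-hot vector of every char."""
    idx = [ord(c) - OFFSET if OFFSET <= ord(c) <= 127 else DIM - 1 for c in text]
    return ONE_HOT[idx, :].mean(axis=0)


def euclidean(a: np.ndarray, b: np.ndarray) -> float:
    """Euclidean distance between two embeddings; smaller means more similar."""
    return float(np.linalg.norm(a - b))


a = char_embed("@requests(on='/index')")
b = char_embed("@requests(on='/search')")
print(a.shape, euclidean(a, b))
```

Because the pooling step is a plain average, the embedding ignores character order: it is a normalized character histogram, so `"abc"` and `"cba"` map to the same vector. That is enough for a toy example, but a real application would use a learned encoder.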
1. Copy-paste the minimum example below and run it:
import numpy as np
from jina import Document, DocumentArray, Executor, Flow, requests


class CharEmbed(Executor):  # a simple character embedding with mean-pooling
    offset = 32  # ASCII code of the space character, the first printable char
    dim = 127 - offset + 1  # last pos reserved for `UNK`
    char_embd = np.eye(dim) * 1  # one-hot embedding for all chars

    @requests
    def foo(self, docs: DocumentArray, **kwargs):
        for d in docs:
            r_emb = [ord(c) - self.offset if self.offset <= ord(c) <= 127 else (self.dim - 1) for c in d.text]
            d.embedding = self.char_embd[r_emb, :].mean(axis=0)  # average pooling


class Indexer(Executor):
    _docs = DocumentArray()  # for storing all documents in memory

    @requests(on='/index')
    def foo(self, docs: DocumentArray, **kwargs):
        self._docs.extend(docs)  # extend stored `docs`

    @requests(on='/search')
    def bar(self, docs: DocumentArray, **kwargs):
        docs.match(self._docs, metric='euclidean')


# build a Flow with 2 replicas of CharEmbed (unnecessary here, just to show how)
f = Flow(port_expose=12345, protocol='http', cors=True).add(uses=CharEmbed, replicas=2).add(uses=Indexer)
with f:
    f.post('/index', (Document(text=t.strip()) for t in open(__file__) if t.strip()))  # index all lines of _this_ file
    f.block()  # block and listen for requests
2. Open http://localhost:12345/docs (an extended Swagger UI) in your browser, click the /search tab and input:
{"data": [{"text": "@requests(on=something)"}]}
That means we want to find the lines from the above code snippet that are most similar to @requests(on=something). Now click the Execute button!
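The same request can also be prepared outside the browser. The sketch below uses only the Python standard library and assumes that the HTTP gateway accepts, as a POST to `/search`, the same JSON body that the Swagger UI sends. The helper name `build_search_request` is hypothetical, and the response schema is not covered here.

```python
import json
import urllib.request


def build_search_request(text: str, host: str = "localhost", port: int = 12345) -> urllib.request.Request:
    """Build (but do not send) a POST request for the Flow's /search endpoint."""
    payload = {"data": [{"text": text}]}  # same JSON body as typed into the Swagger UI
    return urllib.request.Request(
        url=f"http://{host}:{port}/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_search_request("@requests(on=something)")
print(req.get_method(), req.full_url)

# To send it while the Flow from step 1 is running (assumed, not verified here):
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```

Building the request separately from sending it makes it easy to inspect the URL and body before anything touches the network.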
3. Not a GUI fan? Let's do it in Python then! Keep the above server running and start a simple client:
from jina import Client, Document
from jina.types.request import Response


def print_matches(resp: Response):  # the callback function invoked when task is done
    for idx, d in enumerate(resp.docs[0].matches[:3]):  # print top-3 matches
        print(f'[{idx}]{d.scores["euclidean"].value:2f}: "{d.text}"')


c = Client(protocol='http', port=12345)  # connect to localhost:12345
c.post('/search', Document(text='request(on=something)'), on_done=print_matches)
This prints the following results:
Client@1608[S]:connected to the gateway at localhost:12345!
[0]0.168526: "@requests(on='/index')"
[1]0.181676: "@requests(on='/search')"
[2]0.218218: "from jina import Document, DocumentArray, Executor, Flow, requests"
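To see where these scores come from, the ranking can be approximated without Jina by a brute-force nearest-neighbor search. This is a sketch of the idea, not Jina's actual `match` implementation; `embed` and `top_k` are hypothetical helpers, and only a handful of lines are indexed instead of the whole file.

```python
import numpy as np


def embed(text: str, offset: int = 32, dim: int = 96) -> np.ndarray:
    """Length-normalized character histogram (the same vector as mean-pooled one-hots)."""
    vec = np.zeros(dim)
    for c in text:
        vec[ord(c) - offset if offset <= ord(c) <= 127 else dim - 1] += 1
    return vec / max(len(text), 1)


def top_k(query: str, lines: list, k: int = 3) -> list:
    """Return the k lines closest to the query as (distance, line), nearest first."""
    q = embed(query)
    scored = [(float(np.linalg.norm(embed(line) - q)), line) for line in lines]
    return sorted(scored)[:k]


lines = [
    "import numpy as np",
    "@requests(on='/index')",
    "@requests(on='/search')",
    "with f:",
]
for dist, line in top_k("request(on=something)", lines):
    print(f'{dist:.6f}: "{line}"')
```

For the query `request(on=something)`, the distance to `@requests(on='/index')` computed this way is about 0.1685, in line with the first score printed above. A smaller distance means a closer match, which is why the results are sorted in ascending order.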
Doesn't work? Our bad! Please report it here.
- Join our Slack community to chat to our engineers about your use cases, questions, and support queries.
- Join our Engineering All Hands meet-up to discuss your use case and learn Jina's new features.
  - When? The second Tuesday of every month
  - Where? Zoom (see our public calendar/.ical/Meetup group) and live stream on YouTube
- Subscribe to the latest video tutorials on our YouTube channel
Jina is backed by Jina AI and licensed under Apache-2.0. We are actively hiring AI engineers and solution engineers to build the next neural search ecosystem in open source.
We welcome all kinds of contributions from the open-source community, individuals and partners. We owe our success to your active involvement.