Used as the Document Store in DocArray, Weaviate speeds up Document processing and retrieval in the cloud.
A closer look at DocArray and Weaviate
DocArray: Data structure for unstructured data
DocArray is an extensible data structure ideal for deep learning tasks. It is primarily used for the transfer of nested and unstructured data, including text, image, audio, video, 3D Mesh, and more.
Comparison with other data structures (legend: ✅ full support, ✔ partial support, ❌ no support)
With DocArray, deep learning engineers can efficiently process, embed, search, recommend, store, and transfer data with the help of the Pythonic API.
Weaviate: Open source vector search engine
Weaviate is an open source vector search engine that stores both objects and vectors. Weaviate combines vector search with structured filtering to create a robust, fault-tolerant search engine.
Weaviate also provides the Weaviate Cloud Service, an out-of-the-box cloud storage infrastructure.
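To make "vector search combined with structured filtering" concrete, here is a minimal pure-Python sketch. It does not use Weaviate's API: the `objects` list, the `category` property, and the `filtered_search` helper are all made up for illustration, and a real Weaviate instance relies on indexes rather than the linear scan shown here.

```python
import math

# Toy data: each object carries structured properties and a vector.
# All names and values are hypothetical, chosen only for illustration.
objects = [
    {"title": "red shirt", "category": "clothing", "vector": [0.9, 0.1, 0.0]},
    {"title": "blue jeans", "category": "clothing", "vector": [0.1, 0.9, 0.0]},
    {"title": "red apple", "category": "food", "vector": [0.8, 0.0, 0.2]},
]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def filtered_search(query_vector, category, limit=1):
    # Step 1: structured filter on a property.
    candidates = [o for o in objects if o["category"] == category]
    # Step 2: rank the remaining objects by vector similarity.
    candidates.sort(
        key=lambda o: cosine_similarity(query_vector, o["vector"]),
        reverse=True,
    )
    return candidates[:limit]


hits = filtered_search([1.0, 0.0, 0.1], category="food")
print(hits[0]["title"])  # prints "red apple"
```

The filter narrows the candidate set first, so "red shirt" is never returned for a food query even though its vector is close to the query vector.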
Jina + Weaviate =?
💥 What sparks fly when Jina and Weaviate come together?
There are two ways to create cloud storage instances using Weaviate:
- Start a Weaviate instance locally
- Create a Weaviate Cloud Service instance
1. Start the Weaviate instance locally
To use Weaviate as the storage back end, you need to start a new Weaviate instance. You can do this by creating a docker-compose.yml file as follows:
---
version: '3.4'
services:
  weaviate:
    command:
      - --host
      - 0.0.0.0
      - --port
      - '8080'
      - --scheme
      - http
    image: semitechnologies/weaviate:1.11.0
    ports:
      - "8080:8080"
    restart: on-failure:0
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'none'
      ENABLE_MODULES: ''
      CLUSTER_HOSTNAME: 'node1'
...
Once created, you can run Docker Compose to start the instance.
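After starting the container (for example with `docker-compose up -d`), it can take a few seconds before Weaviate accepts requests. The sketch below polls Weaviate's readiness endpoint, `/v1/.well-known/ready`, using only the standard library. The helper name `wait_until_ready` and its default values are assumptions made for this example; they are not part of DocArray or Weaviate.

```python
import time
import urllib.error
import urllib.request


def wait_until_ready(base_url="http://localhost:8080", timeout=30.0, interval=1.0):
    """Poll the readiness endpoint until it answers 200 or the timeout expires.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    deadline = time.monotonic() + timeout
    url = base_url.rstrip("/") + "/v1/.well-known/ready"
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # Not reachable or not ready yet; retry until the deadline.
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_until_ready()` before creating the DocumentArray avoids connection errors while the container is still booting.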
2. Create Weaviate cloud service instance
You can create Weaviate instances for free with the Weaviate Cloud Service.
To register and create a new instance, visit the Weaviate Cloud Service website.
This video walks you through creating a Weaviate instance.
Introductory tutorial demo
In this tutorial, you will learn how to:
- Create a local Weaviate instance to store Documents
- Build a simple text search system
1. Start the Weaviate service and create a DocumentArray instance
from docarray import Document, DocumentArray

# Connect to the Weaviate instance running on localhost:8080.
da = DocumentArray(
    storage="weaviate", config={"name": "Persisted", "host": "localhost", "port": 8080}
)
2. Index Documents
da.extend(
    [
        Document(text="Persist Documents with Weaviate."),
        Document(text="And enjoy fast nearest neighbor search."),
        Document(text="All while using DocArray API."),
    ]
)
3. Use a BERT model to generate vectors
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")


def collate_fn(da):
    # Tokenize the batch's texts into padded PyTorch tensors for the model.
    return tokenizer(da.texts, return_tensors="pt", truncation=True, padding=True)


da.embed(model, collate_fn=collate_fn)
4. Query the indexed Documents
# Embed the query with the same model, then fetch its single nearest neighbor.
results = da.find(
    DocumentArray([Document(text="How to persist Documents")]).embed(
        model, collate_fn=collate_fn
    ),
    limit=1,
)
print(results[0].text)
Output: Persist Documents with Weaviate.
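To see why the first sentence comes back, it can help to look at a brute-force version of the same search. The sketch below is an illustration only: it replaces BERT with a toy word-count embedding and scans every text, whereas Weaviate uses an approximate nearest neighbor index. The names `embed` and `brute_force_find` are hypothetical and are not DocArray functions.

```python
import math
import re
from collections import Counter

texts = [
    "Persist Documents with Weaviate.",
    "And enjoy fast nearest neighbor search.",
    "All while using DocArray API.",
]


def embed(text):
    # Stand-in for BERT (an assumption for this sketch): a sparse word-count vector.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    # Counter returns 0 for missing words, so only shared words contribute.
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def brute_force_find(query, limit=1):
    # Score every text against the query and keep the `limit` best matches.
    query_vector = embed(query)
    ranked = sorted(
        texts,
        key=lambda t: cosine_similarity(query_vector, embed(t)),
        reverse=True,
    )
    return ranked[:limit]


print(brute_force_find("How to persist Documents")[0])
# prints "Persist Documents with Weaviate."
```

The query shares the words "persist" and "documents" with the first sentence and nothing with the other two, so the first sentence ranks highest. A BERT embedding captures meaning rather than exact word overlap, but the ranking step follows the same idea.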
Building an H&M image search system with the two tools
Integrating DocArray and Weaviate makes it much easier to build a system that searches for images.
See the GitHub repo.
What sparks will fly between DocArray and the vector database Qdrant? Find out next time!
Related links:
GitHub Repo
DocArray Documentation
Jina’s Learning Bootcamp
Weaviate’s Documentation