Without worrying about complicated implementation details, you can create SOTA representation vectors for text and images with a simple API call.

From BERT to Bert-as-service

In September 2018, a Google paper on the BERT model set the Internet on fire: the language model broke records on 11 NLP tasks and surpassed human performance on both SQuAD 1.1 metrics.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

This not only opened a new era for NLP, but also marked the beginning of the "pre-training + fine-tuning" vision of transfer learning.

Bert-as-service came out in October 2018, just a month after BERT was launched. With a single line of code, users can connect to the server through its client/server architecture and quickly obtain sentence vectors.

Architecture of Bert-as-Service when it was released in 2018

As the first microservice framework built on BERT, Bert-as-service won wide attention in the NLP and machine learning community through its high-level encapsulation and deep optimization of BERT, together with a convenient, easy-to-use network API.

Its concise API design, documentation style, and even README layout have since become a template for many open source projects.

Bert-as-service can easily obtain a sentence vector with a few lines of code
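For reference, the classic client call looks roughly like this (a minimal sketch, assuming a bert-serving server is already running on its default local ports; see the bert-as-service README for the exact setup):

from bert_serving.client import BertClient

# connect to a running bert-serving server (localhost by default)
bc = BertClient()

# one fixed-length sentence vector per input string
vectors = bc.encode(['First do it', 'then do it right', 'then do it fast'])
print(vectors.shape)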

If BERT is a milestone for transfer learning, then the emergence of Bert-as-service can be called a milestone for turning transfer learning into an engineering service.

Many BERT contributors on GitHub also actively contributed code to bert-as-service. Bert-as-service in turn inspired the first edition of PyTorch-Transformers (originally released as pytorch-pretrained-bert), which Hugging Face launched in November 2018.

Although updates to bert-as-service gradually stopped after February 2019, the project has accumulated more than 10,000 stars, over 2,000 forks, and a mountain of issues on GitHub over the past three years, all of which show the community's great interest in and enthusiasm for Bert-as-service.

Many of these developers fork bert-as-service and build their own microservices on top of it for their own business needs.

Bert-as-service gets a major upgrade

Three years later, Bert-as-service has been updated again and upgraded to the new CLIP-as-service, which not only retains the original strengths of high concurrency, microservice architecture, and ease of use, but can also generate representation vectors for both text and images.

Behind CLIP-as-service is the Contrastive Language-Image Pre-training (CLIP) model released by OpenAI in January 2021, which classifies images from natural-language descriptions. It breaks down the "clear-cut" boundary between natural language processing and computer vision and enables multimodal AI systems.

CLIP-as-service has the following features:

  • Out of the box: With no extra learning curve, representation vectors for images and text can be generated in real time by simply calling the client or server API.

  • Fast: Tailored for large datasets and long-running tasks, with support for both ONNX and PyTorch runtimes to provide fast inference.

  • High scalability: Supports horizontal scaling of multiple CLIP model replicas on single- and multi-GPU machines with automatic load balancing. The server can expose its service over gRPC, WebSocket, or HTTP.

  • Neural search suite: Developers can quickly integrate CLIP-as-service with Jina and DocArray to build cross-modal and multimodal search solutions in a short period of time.

CLIP-as-service operation guide

Install CLIP-as-service

Like Bert-as-service, CLIP-as-service follows a client/server architecture and is split into two installation packages: server and client.

Using pip, developers can install the CLIP client and server separately, even on different machines.

Note: Be sure to use Python 3.7+

1. Install the CLIP server (usually a GPU server)

pip install clip-server

2. Install the CLIP client (e.g. on a local laptop)

pip install clip-client

Starting the CLIP Server

Starting the server downloads the pre-trained model, launches the microservice framework, and opens the external interface. All of this can be done with a single command.

Start the server:

python -m clip_server

After the server is started, the following output is displayed:


 🔗         Protocol                  GRPC   
 🏠     Local access         0.0.0.0:51000   
 🔒  Private network    192.168.3.62:51000   
 🌐   Public address  87.191.159.105:51000

This means that the server is ready and provides an interface in gRPC mode.

Connect from the client

Once the server is ready, you can connect to it and send requests through the gRPC client. Use whichever address fits the relative location of the client and the server.

For more details, see the CLIP-as-service documentation.

Run the Python script to verify the connection between the client and server:

from clip_client import Client

c = Client('grpc://0.0.0.0:51000')
c.profile()

If the connection is normal, you should see the time tree shown below:

 Roundtrip  16ms  100%
├──  Client-server network  12ms  75%
└──  server  4ms  25%
    ├──  Gateway-CLIP network  0ms  0%
    └──  CLIP model  4ms  100%
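If the client runs on a different machine, replace 0.0.0.0 with the private or public address printed in the server's startup output, for example:

from clip_client import Client

# use the private-network address shown when the server started
c = Client('grpc://192.168.3.62:51000')
c.profile()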

Build a cross-modal search system: text to image

In this example, we will use CLIP-as-service to set up a simple text-to-image search: the user enters text and gets matching images back.

This example uses the Totally Looks Like dataset and Jina AI's DocArray to download the data.

Note: DocArray is included in clip-client as an upstream dependency and does not need to be installed separately.

1. Load the images

from docarray import DocumentArray

da = DocumentArray.pull('ttl-original', show_progress=True, local_cache=True)

The Totally Looks Like dataset contains 12,032 images, so the download may take a while.

2. After loading, use DocArray's built-in da.plot_image_sprites() to visualize the images, as shown below:
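A one-liner renders a sprite sheet of the downloaded images:

da.plot_image_sprites()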

3. Run the python -m clip_server command to start the CLIP server and encode the images:

from clip_client import Client

c = Client(server='grpc://87.105.159.191:51000')
da = c.encode(da, show_progress=True)

4. Type “A happy potato” to view the search results

vec = c.encode(["a happy potato"])
r = da.find(query=vec, limit=9)
r.plot_image_sprites()

The following output is displayed:

Output for the query "a happy potato"

Try typing "professor cat is very serious"; the query code is the same as before, only the sentence changes, as shown below.
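The call mirrors the earlier snippet, with only the query sentence changed:

vec = c.encode(["professor cat is very serious"])
r = da.find(query=vec, limit=9)
r.plot_image_sprites()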

For more detailed documentation, visit the CLIP-as-service documentation.

Build a cross-modal search system: image to text

We can also swap the two modalities and search from image to text.

In the example below, we use the English text of the entire novel Pride and Prejudice as the search target. You then input an image and get back the sentences from Pride and Prejudice that best match it.

To start, run the CLIP-as-service server so that it is locally hosted and accessible to clients.


Once the server is up and running, you can use the client to send requests to it and get results.
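As a rough sketch of the indexing step (the file path and the naive sentence splitting below are illustrative assumptions, not the official example), the novel can be split into sentences, wrapped in a DocumentArray, and encoded by the running server:

from docarray import Document, DocumentArray
from clip_client import Client

# load a local plain-text copy of Pride and Prejudice (placeholder path)
with open('pride-and-prejudice.txt') as f:
    text = f.read()

# naive sentence split, just for illustration
sentences = [s.strip() for s in text.split('.') if s.strip()]
da = DocumentArray(Document(text=s) for s in sentences)

# encode every sentence with the locally running CLIP server
c = Client('grpc://0.0.0.0:51000')
da = c.encode(da, show_progress=True)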

The same CLIP-as-service client is then used to run the image-to-text query, as sketched below.
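A minimal query sketch, assuming a local query image (the path is a placeholder) and mirroring the result handling from the text-to-image example above:

# encode the query image and retrieve the closest sentences
vec = c.encode(['query-image.jpg'])
r = da.find(query=vec, limit=5)

for match in r:
    print(match.text)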

That's all for CLIP-as-service in this issue. For more exciting content, stay tuned to the Jina AI official account!


Related learning materials:

CLIP-as-service documentation

CLIP-as-service GitHub Repo

Participate in Jina learning and become an expert in neural search

DocArray documentation

Jina documentation