These are the blogger's study notes. The content comes from the Internet; if anything here infringes your rights, please contact me and it will be removed.

Personal Note: github.com/dbses/TechN…

01 | background introduction

1.1 What is an API Gateway?

An API gateway is a traffic gateway that sits between external requests and internal services. It handles the common concerns of external requests, such as protocol conversion, authentication, flow control, parameter verification, and monitoring.

To sum up, a gateway mainly solves two problems: first, unified API management; second, consolidating the functions that would otherwise be duplicated in every microservice.

1.2 Why Build the Shepherd API Gateway?

There are three main reasons.

1. Improve R&D efficiency: Before the Shepherd API gateway existed, Meituan business developers usually had to implement basic authentication, traffic limiting, log monitoring, parameter verification, protocol conversion, and similar work themselves before they could expose an HTTP API externally. They also had to maintain that code, which kept R&D efficiency relatively low.

2. Reduce communication cost: With the Shepherd API gateway, business developers configure an API and the gateway automatically generates the front-end/back-end interaction documents and a client SDK, which makes it easier for front-end and back-end developers to coordinate.

3. Improve resource utilization: Without a shared gateway, every web application has to maintain its own machines, configuration, and databases for basic tasks such as authentication, traffic limiting, log monitoring, parameter verification, and protocol conversion, which results in poor resource utilization.

Shepherd provides Meituan with a unified API gateway solution that is high-performance, highly available, and scalable, so that business developers can open functions and data to the outside world through configuration alone.

At present, more than 18,000 APIs are connected to the Shepherd API gateway, more than 90 clusters are running online, and the total number of daily calls exceeds 10 billion.

02 | overall architecture

2.1 Control Plane

The control plane consists of the management platform (which handles full API life-cycle management) and the monitoring center (which collects API request monitoring data and raises service alarms).

Business developers start by creating an API: they enter its parameters and the DSL script is generated. They can then test the API using the documentation and MOCK functions.

After testing is complete, the Shepherd management platform provides a series of safeguards to keep the rollout stable, such as release approval, gray (canary) release, and version rollback.

Once an API is live, invocation failures are monitored, request logs are recorded, and alarms are raised when exceptions are found.

Finally, when an API that is no longer used is taken offline, all resources it occupied are reclaimed and held until the API is enabled again.

2.2 Configuration Center

The configuration center carries information between the control plane and the data plane. It stores each API's configuration, which is described in a DSL (domain-specific language). An API described in the DSL consists of the following parts:

  • Filters: the functional components used by the API;
  • Request: the requested domain name, path, and parameters;
  • Response: the response result, such as exception handling, headers, and cookies;
  • Invokers: the request rules for back-end services (RPC/HTTP/Function);
  • FilterConfigs: the configuration of the functional components used by the API.
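
To make the five parts concrete, here is a minimal sketch in Python of how such a configuration might look as plain data. Only the five top-level keys come from the notes; every nested field name and value (the domain, the path, the filter names, the service name) is hypothetical, and the real Shepherd DSL format is not shown in these notes.

```python
# Hypothetical API configuration. Only the five top-level keys come from
# the notes above; all nested fields and values are illustrative.
REQUIRED_PARTS = ("filters", "request", "response", "invokers", "filterConfigs")

api_config = {
    "filters": ["auth", "rateLimit"],
    "request": {
        "domain": "api.example.com",
        "path": "/orders/detail",
        "params": [{"name": "orderId", "type": "long", "required": True}],
    },
    "response": {
        "headers": {"Content-Type": "application/json"},
        "cookies": [],
        "onError": {"status": 500, "body": "{}"},
    },
    "invokers": [{"type": "RPC", "service": "OrderService", "method": "getDetail"}],
    "filterConfigs": {"rateLimit": {"qps": 100}, "auth": {"mode": "token"}},
}


def missing_parts(config):
    """Return the required top-level parts that are absent from a config."""
    return [part for part in REQUIRED_PARTS if part not in config]


def unconfigured_filters(config):
    """Return filters that are listed but have no entry in filterConfigs."""
    return [name for name in config["filters"] if name not in config["filterConfigs"]]
```

A control plane could run checks like these before publishing a configuration, so that a data-plane node never loads an API whose components lack settings.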

2.3 Data Plane

The data plane handles request traffic: it makes generalized calls to back-end services and returns the results.

When request traffic matches an API request path and enters the Shepherd server, it is processed by the series of functional components configured in that API's DSL. The gateway provides API routing, functional component integration, protocol transformation, and service invocation.
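
The component pipeline can be sketched as a simple chain, assuming each functional component is a function that takes a request and returns either None (continue) or a response (stop early). The component names, the request shape, and the `handle` helper are all hypothetical; the notes do not describe Shepherd's actual component interface at this point.

```python
def auth_filter(request):
    """Reject requests without a token; return None to continue."""
    if not request.get("token"):
        return {"status": 401, "body": "unauthorized"}
    return None


def param_check_filter(request):
    """Reject requests missing the (hypothetical) required parameter."""
    if "orderId" not in request.get("params", {}):
        return {"status": 400, "body": "missing orderId"}
    return None


# Registry mapping the names used in an API's configuration to components.
COMPONENTS = {"auth": auth_filter, "paramCheck": param_check_filter}


def handle(request, filter_names, invoke_backend):
    """Run the configured components in order, then call the back end."""
    for name in filter_names:
        early_response = COMPONENTS[name](request)
        if early_response is not None:
            return early_response  # a component short-circuited the chain
    return invoke_backend(request)


def echo_backend(request):
    """Stand-in for the protocol transformation and service invocation step."""
    return {"status": 200, "body": f"order {request['params']['orderId']}"}
```

Because the list of component names comes from configuration, adding or removing a check for one API does not require changing the gateway code.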

API routing implementation:

After passing through these functional components, the request is eventually routed to the target microservice. The routing principle can be understood as a Map structure in which the key is the complete domain name and path, and the value is the specific API configuration.
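
A minimal sketch of that Map-based lookup, assuming exact matching on a (domain, path) pair. The `Router` class and its method names are hypothetical, and a real gateway may also need path variables or prefix matching, which this sketch does not cover.

```python
class Router:
    """Exact-match routing table: (domain, path) -> API configuration."""

    def __init__(self):
        self._routes = {}

    def register(self, domain, path, api_config):
        # Domain names are case-insensitive, so normalize them; paths are not.
        self._routes[(domain.lower(), path)] = api_config

    def lookup(self, domain, path):
        """Return the API configuration, or None if no API matches."""
        return self._routes.get((domain.lower(), path))
```

A dictionary lookup takes constant time on average, so the cost of routing does not grow with the number of registered APIs.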

The final step in API calls is protocol transformation and service invocation.

03 | high performance design

3.1 Performance Optimization

Shepherd processes API requests fully asynchronously.

QPS before optimization: 2000.

Performance optimization 1: Enable long-lived (keep-alive) connections between Nginx and the web application, which raises QPS to over 10000.

Performance optimization 2: Warm up the Shepherd service with API requests and reduce local log printing on the main request path, which raises QPS by another 30%, to 13000.

Performance optimization 3: Replace the Jetty container with the Netty network framework, which raises QPS by another 10%, to more than 15000.

04 | high-availability design

4.1 Service Isolation

There are two types of service isolation:

One is cluster isolation by business line. The other is isolation at the service-node level, using fast and slow thread pools. Fast/slow thread pool isolation mainly targets APIs that use synchronous blocking components, such as SSO authentication and custom authentication, which may block the shared business thread pool for a long time.

The principle of fast/slow isolation is to collect statistics on the processing time of API requests and move the APIs whose requests take longer than the tolerance threshold into the slow thread pool, so that they do not affect other, normal API calls.
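
The idea can be sketched with two thread pools and a per-API running average. This is a minimal illustration under assumptions: the `FastSlowDispatcher` class, the use of a plain average, and the pool sizes are all hypothetical, and the notes do not say which statistic or threshold Shepherd actually uses.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
import threading
import time


class FastSlowDispatcher:
    """Route each API to a fast or slow pool based on its average latency."""

    def __init__(self, threshold_seconds, fast_workers=8, slow_workers=2):
        self.threshold = threshold_seconds
        self.fast_pool = ThreadPoolExecutor(max_workers=fast_workers)
        self.slow_pool = ThreadPoolExecutor(max_workers=slow_workers)
        self._lock = threading.Lock()
        self._total = defaultdict(float)  # summed processing time per API
        self._count = defaultdict(int)    # number of samples per API

    def record(self, api, seconds):
        with self._lock:
            self._total[api] += seconds
            self._count[api] += 1

    def is_slow(self, api):
        with self._lock:
            count = self._count[api]
            return count > 0 and self._total[api] / count > self.threshold

    def pool_name(self, api):
        return "slow" if self.is_slow(api) else "fast"

    def submit(self, api, handler, *args):
        pool = self.slow_pool if self.is_slow(api) else self.fast_pool

        def timed():
            start = time.monotonic()
            try:
                return handler(*args)
            finally:
                # Feed the measured time back into the statistics.
                self.record(api, time.monotonic() - start)

        return pool.submit(timed)

    def shutdown(self):
        self.fast_pool.shutdown()
        self.slow_pool.shutdown()
```

An API with no samples yet starts in the fast pool; once its average exceeds the threshold, its later requests can only occupy slow-pool threads.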

4.2 Stability Guarantee

The safeguards include traffic control, request caching, timeout management, and circuit breaking with degradation.

  • Traffic control: Provides traffic protection along multiple dimensions, such as App traffic limiting, IP traffic limiting, and cluster traffic limiting.
  • Request caching: Can be enabled for requests that are queried frequently and are not sensitive to data freshness.
  • Timeout management: Each API sets a timeout for processing requests. Requests that time out fail fast so that they do not tie up resources.
  • Circuit breaking and degradation: When the failure threshold is reached, the circuit breaker opens automatically and a default value is returned.
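
The last item can be sketched as a small circuit breaker. The notes only state that the breaker opens at a failure threshold and that a default value is returned; the cool-down period, the trial call after it, and returning the default on every individual failure are assumptions added here to make the sketch complete. The `CircuitBreaker` class and its parameters are hypothetical.

```python
import time


class CircuitBreaker:
    """Open after consecutive failures and return a default value while open."""

    def __init__(self, failure_threshold, reset_seconds, default, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_seconds = reset_seconds  # assumed cool-down before a retry
        self.default = default
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the breaker is closed

    def call(self, func, *args):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_seconds:
                return self.default  # open: degrade without calling the back end
            # Cool-down elapsed: allow one trial call; one more failure re-opens.
            self.opened_at = None
            self.failures = self.failure_threshold - 1
        try:
            result = func(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            return self.default  # assumption: degrade on each failure too
        self.failures = 0  # a success resets the consecutive-failure count
        return result
```

The clock is injectable so that the cool-down can be exercised in tests without sleeping.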

4.3 Monitoring Alarms

Shepherd provides monitoring across machine indicators (1), business indicators (2, 3, 4), and service status indicators (5), as shown in the following table.

The following table lists the main alarm capabilities:

4.4 Ease-of-use Design

Shepherd generates the DSL automatically.

05 | extensible design

5.1 Customizing Components

Shepherd can load custom components, which lets business teams plug in their own logic.

A custom component implements an invoke method that holds its business logic. The outcome of that method can be to continue execution, jump to a page, return a result directly, throw an exception, and so on.
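
A minimal sketch of such a component in Python, covering the four outcomes named above. The `Outcome` type, the `BlocklistComponent` class, and the request shape are hypothetical and do not reflect Shepherd's real component interface, which these notes do not reproduce.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Outcome:
    """What the gateway should do after a component runs."""
    action: str  # "continue", "redirect", or "return"
    value: Optional[Any] = None


class BlocklistComponent:
    """Hypothetical custom component that screens requests by user ID."""

    def __init__(self, blocked_users, login_url):
        self.blocked_users = set(blocked_users)
        self.login_url = login_url

    def invoke(self, request):
        user = request.get("user")
        if user is None:
            # Jump to a page: send anonymous callers to the login page.
            return Outcome("redirect", self.login_url)
        if not isinstance(user, str):
            # Throw an exception: the request is malformed.
            raise ValueError("user must be a string")
        if user in self.blocked_users:
            # Return a result directly without reaching the back end.
            return Outcome("return", {"status": 403, "body": "forbidden"})
        # Continue execution with the next component.
        return Outcome("continue")
```

Returning an explicit outcome object keeps the component free of gateway internals: the gateway alone decides how to act on each outcome.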

5.2 Service Orchestration

Some business scenarios can only be satisfied by invoking multiple interfaces; service orchestration covers these cases.

Shepherd supports creating such service-orchestration APIs, which meets this extensibility requirement.
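
As an illustration of what an orchestrated API does, the sketch below combines three back-end calls into one response using asyncio: one call whose result is needed first, then two independent calls that run concurrently. The three `fetch_*` functions are hypothetical stand-ins for real RPC or HTTP calls, and the notes do not describe how Shepherd expresses orchestration.

```python
import asyncio


async def fetch_order(order_id):
    """Stand-in for a back-end call; a real gateway would do RPC or HTTP here."""
    await asyncio.sleep(0)
    return {"orderId": order_id, "userId": 42}


async def fetch_user(user_id):
    await asyncio.sleep(0)
    return {"userId": user_id, "name": "demo"}


async def fetch_coupons(user_id):
    await asyncio.sleep(0)
    return [{"code": "OFF10"}]


async def order_page(order_id):
    """Orchestrated API: one sequential call, then two concurrent ones."""
    order = await fetch_order(order_id)
    user, coupons = await asyncio.gather(
        fetch_user(order["userId"]), fetch_coupons(order["userId"])
    )
    return {"order": order, "user": user, "coupons": coupons}
```

The client makes a single request for `order_page` instead of three, and the two independent calls overlap rather than running back to back.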

References

  • Tech.meituan.com/2021/05/20/…
  • Tech.meituan.com/2018/07/26/…
  • www.infoq.cn/article/qxc…