Observability is a measure of how well you can understand how the infrastructure, system platform, or application is performing from the data it emits. A common approach is to collect metrics, logs, traces, and events to help development and operations teams detect, investigate, alert on, and correct system problems. This article shares solutions and practices for observability from four angles: Nginx observability, the relationship between Apache APISIX and Nginx, Apache APISIX observability, and combining Apache APISIX with Apache SkyWalking to improve observability further.

Observability of Nginx

1. Common Nginx monitoring methods

Nginx offers some observability of its own. The following are the common ways to monitor Nginx:

  • ngx_http_stub_status_module.
  • VTS module + [exporter] + Prometheus + Grafana. (With older Nginx versions, you also need to add an exporter.)
  • Nginx Amplify SaaS.

ngx_http_stub_status_module

ngx_http_stub_status_module collects only instance-level statistics, such as connection and request counts for the whole Nginx instance.
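As a rough illustration of how little this module exposes, the sketch below parses a status page in the plain-text format described in the Nginx documentation. The sample numbers and the parse_stub_status helper are illustrative and not part of Nginx; in practice the text would come from whatever location you have mapped to stub_status.

```python
import re

# Sample page in the plain-text format documented for stub_status.
# The numbers are illustrative.
SAMPLE = """Active connections: 291
server accepts handled requests
 16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
"""


def parse_stub_status(text: str) -> dict:
    """Parse a stub_status page into a dict of integer counters."""
    active = re.search(r"Active connections:\s+(\d+)", text)
    # The totals line holds three bare integers: accepts, handled, requests.
    totals = re.search(r"^\s*(\d+)\s+(\d+)\s+(\d+)\s*$", text, re.MULTILINE)
    states = re.search(
        r"Reading:\s+(\d+)\s+Writing:\s+(\d+)\s+Waiting:\s+(\d+)", text
    )
    if not (active and totals and states):
        raise ValueError("unexpected stub_status format")
    accepts, handled, requests = (int(v) for v in totals.groups())
    reading, writing, waiting = (int(v) for v in states.groups())
    return {
        "active": int(active.group(1)),
        "accepts": accepts,
        "handled": handled,
        "requests": requests,
        "reading": reading,
        "writing": writing,
        "waiting": waiting,
    }


stats = parse_stub_status(SAMPLE)
print(stats)
```

Every value here is a counter for the whole instance; there is no breakdown by route, upstream, or status code.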

VTS module

The VTS Module has three obvious disadvantages.

  • Complex installation: Although the VTS module can collect many kinds of metrics, it is complex to install. To use it, you must add the VTS module to the Nginx build, recompile Nginx, and then reinstall it.
  • Limited extensibility: VTS can be extended in two ways. One is to add extensions to VTS before compiling. The other is to add an extension after compilation by modifying the nginx.conf configuration file. Changing nginx.conf requires an Nginx reload, and reloading directly in a production environment affects the business to some degree.
  • No recent maintenance: The VTS module was last updated in 2018 and, at the time of writing, had gone three years without an update.

Nginx Amplify SaaS

Nginx Amplify is a SaaS service. It runs on the remote end and installs an agent outside the Nginx service. Because the collector sits outside Nginx, the metrics it can collect are limited: it can read only the information that Nginx exposes, not internal state that is not exposed. In addition, as a SaaS service it has to send the collected data to its servers over the public network, which introduces security risks and rules it out for some enterprise users. Nginx Amplify may be aimed at commercial users, such as Nginx Plus customers, rather than open source users. The Nginx Amplify community has also been inactive for two years.

2. Shortcomings of Nginx’s own Events

Nginx has two shortcomings when it comes to Events.

First, Nginx is configured through nginx.conf, and a configuration change takes effect only when nginx.conf is reloaded. There are no Events other than reload, so we cannot tell what changed in the file each time. For example, suppose we configured one route at first and then edited the file to add 10 routes. The reload event alone cannot tell us which 10 routes were added.

Second, the open source version of Nginx lacks active health checks. Nginx is a reverse proxy, and the real back-end service may be restarted, upgraded, or become abnormal. Without an active health check, a passive check learns that the service is in trouble only once live traffic starts to fail. As a result, many Events are lost and the upstream Event information is incomplete.
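To make the route example concrete, here is a minimal sketch of the change Events a gateway could emit if it compared the route configuration before and after a change. The diff_routes function, the route IDs, and the event field names are all hypothetical; a bare reload offers nothing comparable.

```python
def diff_routes(old: dict, new: dict) -> list:
    """Return one event per added, removed, or modified route.

    Both arguments map a route ID to its URI (a deliberately tiny model).
    """
    events = []
    for route_id in sorted(new.keys() - old.keys()):
        events.append({"type": "route_added", "id": route_id, "uri": new[route_id]})
    for route_id in sorted(old.keys() - new.keys()):
        events.append({"type": "route_removed", "id": route_id, "uri": old[route_id]})
    for route_id in sorted(old.keys() & new.keys()):
        if old[route_id] != new[route_id]:
            events.append(
                {"type": "route_modified", "id": route_id, "uri": new[route_id]}
            )
    return events


# One route at first, then ten more are added in a single change.
before = {"r00": "/orders"}
after = dict(before)
for i in range(1, 11):
    after[f"r{i:02d}"] = f"/service-{i}"

events = diff_routes(before, after)
for event in events:
    print(event)
```

With events like these, "which 10 routes were added" becomes a question the gateway can answer directly.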

3. Summary of Nginx observability

The open source version of Nginx does not provide very useful monitoring. Although Nginx offers some monitoring tools, they are complex to install and configure and are hard to extend. Perhaps these tools were not designed for observability at all, but simply to expose metrics or statistics that help locate problems. Many kinds of observability tooling exist, but they are difficult to integrate into Nginx. In addition, the Nginx community is stagnating, so Nginx itself iterates slowly.

Apache APISIX overview

1. Relationship between Apache APISIX and Nginx

Apache APISIX is built on Nginx, but relies only on the Nginx network library. On top of that foundation, Apache APISIX implements its own core code and reserves an extension mechanism. The following list compares Apache APISIX with Nginx. Apache APISIX can be used as a reverse proxy and also provides functions that Nginx does not support, such as active health checks, traffic management, and horizontal scaling. All of these functions are open source.

  • API design: It is easier to use Apache APISIX for API design.
  • Open source Dashboard: The reverse proxy can be configured on the interface.
  • Active health check: Apache APISIX supports active health checks, which improve observability when combined with Events.
  • Traffic management: Suitable for scenarios such as acting on monitoring data or releasing services online.
  • Scale-out: Apache APISIX supports horizontal scaling, thanks to its architecture (see figure below).
  • Plug-in extension mechanism
  • Plug-in orchestration: Multiple plug-ins are logically arranged and combined according to business requirements.
  • Dynamic certificate management

2. Introduction to Apache APISIX

Apache APISIX is a dynamic, real-time, high-performance API gateway that provides rich traffic management functions such as load balancing, dynamic upstream, canary release, circuit breaking, authentication, and observability. Apache APISIX is also the world’s most active open source API gateway project and is a production-ready, high-performance gateway. Hundreds of enterprises around the world use Apache APISIX to handle critical business traffic, covering finance, Internet, manufacturing, retail, telecom operators, and more. Users include NASA, the EU’s Digital Factory, China Airlines, China Mobile, Tencent, Huawei, Weibo, NetEase, Beike Zhaofang, 360, and Taikang.

Apache APISIX solution

On the left of the figure above, from top to bottom, is the evolution from monolithic services to SOA (Service-Oriented Architecture) to microservices. In SOA, the gateway is generally Nginx or HAProxy. In the microservices architecture, the gateway uses Nginx for load balancing. There are two common approaches to microservices: one is based on the Java technology stack, such as the Spring Cloud family; the other is the Service Mesh.

Where does Apache APISIX stand in this evolution, and what can it do? In short, the red parts on the left of the figure (Nginx/HAProxy/Kong/Spring Cloud Zuul/Spring Cloud Gateway/Traefik/Envoy/Ingress Nginx) can all be replaced by an Apache APISIX solution: Apache APISIX SLB under SOA, Apache APISIX Gateway under the microservices architecture, Apache APISIX Ingress in Kubernetes, and Apache APISIX Mesh in a Service Mesh.

In terms of request traffic, when a client initiates a request, the request passes through the LB, then the Gateway, and is distributed to the back-end business service. Each red part on this path (LB/Gateway/Spring Cloud Gateway/K8s Ingress/Sidecar) can be served by Apache APISIX. Apache APISIX supports plug-in development in multiple languages, so in a Java ecosystem plug-ins can be written in Java.

Apache APISIX is a data plane for all traffic, with solutions for LB, Gateway, Ingress, and Sidecar. Together they form a unified solution, including for observability. When the solution is unified, the management and control chain is also easier to implement.

Observability of Apache APISIX

What can Apache APISIX do in terms of observability, and what are its advantages?

1. Data types Apache APISIX can collect

Apache APISIX supports the following data types:

  • Tracing: integrates with SkyWalking
  • Metrics: integrates with SkyWalking or Prometheus
  • Logging: integrates with SkyWalking and other logging platforms

Apache APISIX is a gateway that can replace Nginx, and through these integrations it can feed multiple APM or observability systems.

2. Advantages of Apache APISIX in observability

2.1. High extensibility

Apache APISIX extends its capabilities through plug-ins, and the three data types mentioned above are all implemented through the plug-in mechanism. Apache APISIX is highly extensible because it supports custom plug-ins, and they can be written in multiple languages. Lua is the default, but plug-ins can also be written in Java, Go, and other programming languages.
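The idea of implementing each data type as a plug-in can be sketched conceptually as follows. This is an illustration in Python, not APISIX code (APISIX plug-ins are written in Lua by default); the class names, the ctx dictionary, and handle_request are hypothetical, and only two hook names are modeled.

```python
class Plugin:
    """Base class: a plug-in overrides only the hooks it needs."""

    def access(self, ctx: dict) -> None:
        pass

    def log(self, ctx: dict) -> None:
        pass


class RequestCounter(Plugin):
    """Metrics-style plug-in: counts finished requests per status code."""

    def __init__(self) -> None:
        self.count_by_status = {}

    def log(self, ctx: dict) -> None:
        status = ctx["status"]
        self.count_by_status[status] = self.count_by_status.get(status, 0) + 1


class AccessLogger(Plugin):
    """Logging-style plug-in: records one line per finished request."""

    def __init__(self) -> None:
        self.lines = []

    def log(self, ctx: dict) -> None:
        self.lines.append(f'{ctx["method"]} {ctx["uri"]} {ctx["status"]}')


def handle_request(plugins: list, ctx: dict) -> None:
    """Run the access hooks, pretend to proxy, then run the log hooks."""
    for plugin in plugins:
        plugin.access(ctx)
    ctx["status"] = 200  # stand-in for proxying to the upstream
    for plugin in plugins:
        plugin.log(ctx)


counter, logger = RequestCounter(), AccessLogger()
for uri in ("/orders", "/users"):
    handle_request([counter, logger], {"method": "GET", "uri": uri})

print(counter.count_by_status)
print(logger.lines)
```

Adding another kind of data collection means adding another class to the list, without touching the request-handling core.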

2.2. Flexible configuration

Three examples illustrate the flexible configuration capabilities of Apache APISIX.

The first is that Apache APISIX can modify logging configuration at run time, such as adding or modifying log fields. This is a common requirement: after a service has gone online with its log fields configured and has run for a while, you often need to modify or add several fields. With Nginx, you must edit the nginx.conf file and reload it for the change to take effect. With Apache APISIX, you only need to configure the fields through a script, and the change takes effect dynamically.

The second is the Prometheus integration. In Apache APISIX, to create or remove a metric, or to extend the labels of a metric, you simply add the metric in Prometheus or enter the corresponding information. The Apache APISIX hot reload mechanism applies the change without a restart.

The third comes from how Apache APISIX is implemented. Apache APISIX manages all routing objects and keeps a set of object management mechanisms in memory. When you add a plug-in to an API in Apache APISIX, its effect can be scoped down to that single API: a plug-in can be bound to each API or removed from it. Apache APISIX therefore gives fine-grained control over observability data collection for every API in every service. In other words, you can collect only the data you care about most, and these configurations are dynamic and can be adjusted on the fly.
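A minimal sketch of what per-API binding can look like, assuming the Admin API is reachable at 127.0.0.1:9180 (the port and the key depend on your APISIX version and configuration) and that a route is created or replaced with a PUT to /apisix/admin/routes/{id}. The route IDs, URIs, upstream node, and the build_route_request helper are placeholders. The code only builds the requests and prints them; nothing is sent.

```python
import json
from urllib import request

# Assumed values: adjust to your own deployment.
ADMIN_URL = "http://127.0.0.1:9180/apisix/admin"
ADMIN_KEY = "replace-with-your-admin-key"


def build_route_request(route_id: str, uri: str, plugins: dict) -> request.Request:
    """Build (but do not send) a PUT request that defines one route."""
    body = {
        "uri": uri,
        "plugins": plugins,
        # Placeholder upstream: one node with weight 1.
        "upstream": {"type": "roundrobin", "nodes": {"127.0.0.1:1980": 1}},
    }
    return request.Request(
        url=f"{ADMIN_URL}/routes/{route_id}",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"X-API-KEY": ADMIN_KEY, "Content-Type": "application/json"},
    )


# Collect Prometheus metrics for the order API only, not for static files.
with_metrics = build_route_request("1", "/orders/*", {"prometheus": {}})
without_metrics = build_route_request("2", "/static/*", {})

for req in (with_metrics, without_metrics):
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))

# To apply a change for real you would call request.urlopen(req).
```

Removing the plug-in later is the same call with an empty plugins object, which is why the scope of data collection can be adjusted route by route without a restart.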

2.3. Active community

One of Apache APISIX’s most important strengths is its active community, which lets the product iterate quickly and keep improving to meet users’ needs. The graph above shows the contributor growth curves of Apache APISIX (green), Kong (light blue), MOSN (yellow), and BFE (dark blue); Apache APISIX shows the fastest growth and the steepest curve. Apache APISIX has the most active community of its kind.

Further improving observability with Apache SkyWalking

What improvements can Apache APISIX and Apache SkyWalking make together? Beyond the SkyWalking tracing plug-in, tracing, metrics, logging, and events can all be aggregated into SkyWalking, so the data can be linked using SkyWalking’s aggregation capabilities.

1. SkyWalking Satellite

SkyWalking Satellite is developed jointly by the Apache APISIX community, the Apache SkyWalking community, and Baidu. As the figure above shows, SkyWalking Satellite can be deployed as a sidecar, close to where the data is generated. From the top down, a business request goes through the Apache APISIX proxy to the upstream, while Satellite, deployed next to Apache APISIX as a sidecar, collects the three data types of Apache APISIX (tracing, metrics, and logging) and sends them to SkyWalking over gRPC. Most importantly, with this deployment Apache APISIX can feed all three data types directly into SkyWalking without any changes.

2. ALS scheme

The Access Log Service (ALS) sends out the access logs of requests that have passed through Apache APISIX. Special fields are added to the ordinary access log, for example key fields used to generate topology graphs and aggregate metrics. The most important benefit of the ALS scheme is that topology, metrics, and logging data can all be analyzed and aggregated directly from the access log. With Prometheus, if metrics are configured at the URI level, the overall metric volume inflates dramatically. Because there may be dozens of services, each with many URIs, every metric ends up with a large number of label values, which reduces gateway performance and makes the metrics harder to retrieve. With the ALS scheme, you stream the data to SkyWalking and let SkyWalking do the computation; you can then query it easily, without having to pull a very large amount of data every few seconds.
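The inflation is easy to see with a little arithmetic. The sketch below estimates an upper bound on the number of time series behind a single metric as the product of the distinct values of each label. The label names and counts are made-up numbers for illustration, not measurements.

```python
from math import prod


def max_series(label_cardinalities: dict) -> int:
    """Upper bound on time series for one metric: product of label cardinalities."""
    return prod(label_cardinalities.values())


# Hypothetical gateway: 30 services, 5 status codes, about 40 URIs per label set.
service_level = {"service": 30, "status_code": 5}
uri_level = {"service": 30, "uri": 40, "status_code": 5}

print("service-level series:", max_series(service_level))  # 150
print("URI-level series:", max_series(uri_level))          # 6000
```

Under these assumptions, adding a single URI label multiplies the data scraped on every pull by 40, which is the cost the ALS scheme avoids by moving aggregation into SkyWalking.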

3. Integrate Events into SkyWalking

Common Events include configuration distribution, cluster changes, and health checks.

  • Configuration distribution: When API configuration is distributed, routes and plug-ins may be added, modified, or deleted.
  • Cluster change: When a cluster changes, you need to know how many service instances it contains. For example, IP addresses change during capacity expansion, and the gateway has to be notified of the new addresses. Each of these steps is an event, and these events need to be exposed.
  • Health check: Actively detects whether the system is healthy. For example, if the business request failure rate suddenly rises, the health check event shows that the business service is unhealthy, so you can quickly locate the problem.
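As a sketch of how a health check can turn into Events, the class below tracks active probe results for one upstream node and emits an event only when the node changes state. The thresholds, field names, node address, and the NodeHealth class are hypothetical and are not taken from APISIX or SkyWalking.

```python
from dataclasses import dataclass, field


@dataclass
class NodeHealth:
    """Tracks probe results for one node and records state-change events."""

    node: str
    unhealthy_after: int = 3  # consecutive failures before marking unhealthy
    healthy_after: int = 2    # consecutive successes before marking healthy
    healthy: bool = True
    failures: int = 0
    successes: int = 0
    events: list = field(default_factory=list)

    def record_probe(self, ok: bool) -> None:
        if ok:
            self.successes += 1
            self.failures = 0
            if not self.healthy and self.successes >= self.healthy_after:
                self.healthy = True
                self._emit("healthy")
        else:
            self.failures += 1
            self.successes = 0
            if self.healthy and self.failures >= self.unhealthy_after:
                self.healthy = False
                self._emit("unhealthy")

    def _emit(self, status: str) -> None:
        self.events.append(
            {"type": "health_check", "node": self.node, "status": status}
        )


checker = NodeHealth(node="10.0.0.5:8080")
for probe_ok in [True, False, False, False, True, True]:
    checker.record_probe(probe_ok)

print(checker.events)
```

Because the probes are active, the unhealthy event can be raised even when no business traffic is hitting the node, and the recovery is recorded as an event too.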

Q&A

Q: How is the Apache APISIX extension mechanism implemented, and does extending it affect the stability of Apache APISIX itself?

A: The Apache APISIX extension mechanism benefits from its architecture, which lets you add business logic to each phase (rewrite/access/header_filter/body_filter/preread_filter/log). In terms of stability, Apache APISIX has open-sourced nearly 50 plug-ins, each covered by end-to-end tests and proven stable and usable. Custom plug-ins, however, should follow a certain specification; it is very simple, but you cannot be too casual about it. The stability of custom plug-ins has to be guaranteed by the business side itself.

Q: An Nginx nginx.conf file may contain many rules, and a later rule may be shadowed by an earlier one, so it is not clear whether the later rules take effect. Does Apache APISIX have any way of knowing which rules are in effect?

A: The more configuration goes into nginx.conf, the more complex the configured services become and the harder the file is to maintain. The Apache APISIX configuration file is fixed, and the official configuration is the optimal one for most scenarios. Route configuration, by contrast, is done through the API and lives in memory. For management, you can organize your routes in several ways, for example with the Dashboard. Suppose there is a service named ABC: the route definitions under that service can be viewed as a list, and each route has a field called priority. Another way is to attach labels to the APIs in the Dashboard, which makes routes easy to filter and query by label and makes route management more user-friendly.
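The role of priority and labels can be sketched with a toy route table. The sketch assumes that when several routes match a request, the one with the larger priority value wins (check the route documentation for your version). The matcher here is a deliberate simplification that handles only exact URIs and a trailing "*", and the route IDs, URIs, and helper functions are hypothetical.

```python
def uri_matches(pattern: str, path: str) -> bool:
    """Exact match, or prefix match when the pattern ends with '*'."""
    if pattern.endswith("*"):
        return path.startswith(pattern[:-1])
    return path == pattern


def effective_route(routes: list, path: str):
    """Among matching routes, return the one with the largest priority."""
    candidates = [r for r in routes if uri_matches(r["uri"], path)]
    return max(candidates, key=lambda r: r.get("priority", 0), default=None)


def routes_with_label(routes: list, key: str, value: str) -> list:
    """Return the IDs of routes carrying the given label."""
    return [r["id"] for r in routes if r.get("labels", {}).get(key) == value]


ROUTES = [
    {"id": "abc-default", "uri": "/abc/*", "priority": 0, "labels": {"service": "ABC"}},
    {"id": "abc-orders", "uri": "/abc/orders/*", "priority": 10, "labels": {"service": "ABC"}},
    {"id": "xyz", "uri": "/xyz/*", "labels": {"service": "XYZ"}},
]

print(effective_route(ROUTES, "/abc/orders/42")["id"])  # abc-orders
print(effective_route(ROUTES, "/abc/users")["id"])      # abc-default
print(routes_with_label(ROUTES, "service", "ABC"))
```

Because the winner is decided by an explicit field rather than by position in a file, you can tell which rule is in effect by looking at the route list.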

About the author

Apache APISIX PMC and Apache SkyWalking Committer

About Apache APISIX

Apache APISIX is a dynamic, real-time, high-performance open source API gateway that provides rich traffic management functions such as load balancing, dynamic upstream, canary release, circuit breaking, authentication, and observability. Apache APISIX helps companies handle API and microservice traffic quickly and securely, including gateways, Kubernetes Ingress, and service mesh. Hundreds of companies around the world use Apache APISIX to handle critical business traffic, across finance, Internet, manufacturing, retail, telecom operators, and more. Examples include NASA, the European Union’s Digital Factory, China Airlines, China Mobile, Tencent, Huawei, Weibo, NetEase, Beike Zhaofang, 360, Taikang, and Nayuki (Naixue’s Tea). More than 200 contributors have built Apache APISIX into the world’s most active open source gateway project. Smart developers, come join this active and diverse community and bring more beautiful things to the world!

  • Apache APISIX GitHub: github.com/apache/apis…
  • Apache APISIX official website: apisix.apache.org/
  • Apache APISIX document: apisix.apache.org/zh/docs/api…