This article introduces Pixie, an open source, non-invasive Kubernetes observability and diagnosis platform, and walks through how to use it.

Pixie overview

Pixie is an open source Kubernetes observability and diagnosis platform from New Relic. It is based on eBPF and can diagnose, observe, and debug systems and applications through PxL scripts without modifying the applications. Pixie is slated to be donated to the CNCF, so you can use it without worrying about who controls the project.

Pixie’s key features include:

  • Automatic instrumentation: thanks to eBPF in the Linux kernel, Pixie automatically collects application requests across protocols (such as HTTP, DNS, and gRPC), system metrics, and network-layer data, without modifying application code.
  • Fully scriptable: Pixie provides PxL scripts for analyzing all of the collected data, along with a rich set of predefined scripts for reference.
  • In-cluster edge computing: Pixie collects and processes all data inside the Kubernetes cluster, so there is no need to ship massive amounts of data to a remote cloud for centralized processing. This saves network transmission costs while preserving data-processing performance and security isolation.

Installing and deploying Pixie

Pixie provides several installation and deployment methods, such as Helm, YAML, and the CLI. The CLI is recommended; the deployment steps are as follows:

# Install the Pixie CLI
bash -c "$(curl -fsSL https://withpixie.ai/install.sh)"
# Authenticate with Pixie Cloud
px auth login
# Deploy Pixie to the current Kubernetes cluster
px deploy
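
If you prefer Helm, the deployment looks roughly like the sketch below. The repository URL, chart name, and value names here are taken from the Pixie Helm documentation and may differ for your version, so treat them as assumptions and verify against the official docs:

# Create a deploy key for the cluster
px deploy-key create
# Add the Pixie operator chart repository (URL per the Pixie Helm docs)
helm repo add pixie-operator https://pixie-operator-charts.storage.googleapis.com
helm repo update
# Install Pixie, passing the deploy key and a cluster name
helm install pixie pixie-operator/pixie-operator-chart --set deployKey=<deploy-key> --set clusterName=<cluster-name> --namespace pl --create-namespace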

Note:

  • The second step opens a browser to work.withpixie.ai, where you register an account used for reverse-proxy authentication of the Pixie UI.
  • Because px checks the Linux kernel version of every node during deployment, the deployment fails if the cluster contains Windows nodes. You can check your nodes' kernel versions in advance, as shown below.
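
Before deploying, a quick way to inspect each node's operating system and kernel version is kubectl's wide node listing (see the OS-IMAGE and KERNEL-VERSION columns):

# Show each node's OS image and kernel version
kubectl get nodes -o wide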

By default, the Pixie UI accesses the newly deployed cluster through the work.withpixie.ai reverse proxy. If you do not want to route data through the reverse proxy managed by Pixie, you can switch the data transfer mode by turning passthrough off:

# List the clusters (Viziers) connected to your account and their IDs
px get viziers
# Turn off passthrough mode for the given cluster
px config update -c <YOUR_CLUSTER_ID> --passthrough=false

Getting started with Pixie

Open the UI site mentioned above, select the cluster you just deployed, choose px/cluster from the preset script list, and click RUN to get an overview of the entire cluster. As shown in the figure below, it lists all the Namespaces, Services, Nodes, and Pods in the cluster, along with the Service Graph.

Every resource listed in the interface can be clicked to drill into more detailed observation data for that resource. For example, clicking pl/vizier-cloud-connector in the Service list shows the request metrics for this Service and a list of its Pods:

Of course, all of the above UI operations can also be done through the CLI:

# List all scripts
px run -l
# Run px/cluster
px run px/cluster
# Run px/service and specify the service name
px run px/service -- -service pl/vizier-cloud-connector
# Run the script interactively (live view)
px live px/service -- -service pl/vizier-cloud-connector

px run also prints a link to the same script in the UI, so you can switch to the UI with one click. This is very useful when there is a large amount of data, because the UI usually displays it more intuitively.

PxL scripts

In practice, Pixie's predefined scripts may not cover every observation and diagnosis scenario, so you may need to write your own PxL scripts. PxL scripts can query and filter the data Pixie already collects, or extend the collection with new data sources. For example, here is a query for all network connections in the last 30 seconds:

# cat conn-stats.pxl
# Import Pixie's module for querying data
import px

# Load the last 30 seconds of Pixie's `conn_stats` table into a Dataframe.
df = px.DataFrame(table='conn_stats', start_time='-30s')

# Display the DataFrame with table formatting
px.display(df)

Then, execute the script with the px command:

px live -f conn-stats.pxl

PxL usage is similar to, and essentially a subset of, Python's well-known data-processing library Pandas. For example, the connection statistics obtained above can be further grouped by Pod and Service, and connections whose Service cannot be identified can be filtered out, yielding network connection statistics for every Service and Pod:

# cat service-conns.pxl
# Import Pixie's module for querying data
import px

# Load the last 30 seconds of Pixie's `conn_stats` table into a Dataframe.
df = px.DataFrame(table='conn_stats', start_time='-30s')

# Each record contains contextual information that can be accessed by reading ctx.
df.pod = df.ctx['pod']
df.service = df.ctx['service']

# Group data by unique values in the 'pod' and 'service' columns and calculate
# the sum of 'bytes_sent' and 'bytes_recv' for each unique grouping.
df = df.groupby(['pod', 'service']).agg(
    bytes_sent=('bytes_sent', px.sum),
    bytes_recv=('bytes_recv', px.sum))

# Force ordering of the columns (do not include _clusterID_, which is a product
# of the CLI and not the PxL script).
df = df[['service', 'pod', 'bytes_sent', 'bytes_recv']]

# Filter out connections that don't have their service identified.
df = df[df.service != '']

# Display the DataFrame with table formatting
px.display(df)

The PxL documentation also provides examples of bpftrace integration, dynamic Go tracing, DNS tracing, flame graphs, and Slack alerts. For more details on how to use them, see the official documentation at docs.pixielabs.ai.
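
As a taste of what those examples look like, here is a minimal sketch of a DNS tracing query. It assumes the dns_events table with req_body, resp_body, and latency columns as described in the Pixie data table documentation, so check the docs for the exact schema:

# cat dns-trace.pxl
# Import Pixie's module for querying data
import px

# Load the last 30 seconds of DNS requests from the `dns_events` table.
df = px.DataFrame(table='dns_events', start_time='-30s')

# Attach pod context to each record.
df.pod = df.ctx['pod']

# Keep only the columns of interest.
df = df[['pod', 'req_body', 'resp_body', 'latency']]

# Display the DataFrame with table formatting
px.display(df)

Run it the same way as the earlier scripts, for example with px live -f dns-trace.pxl.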

Conclusion

Pixie makes a pretty impressive first appearance. Although the Kubernetes open source community already has a number of eBPF-based observability and troubleshooting tools, many of them depend on specific prerequisites. For example, Cilium's open-source Hubble requires the cluster to use the Cilium network plug-in. Pixie, on the other hand, does not rely on any particular network plug-in or cloud platform and can be used across public clouds and on-premises deployments.


Follow the "Chat Cloud Native" official account to learn more about cloud native.