Sentry is an open-source application monitoring and error-tracking solution. It consists of SDKs for various languages plus a large set of backend data services. An application uses the token bound to it to configure the Sentry SDK for data reporting. The SDK can also be configured to report the release version and environment associated with each error, and it automatically captures the operations that happened before an exception occurred, which helps with later tracing. After exception data is reported to the data service, it is filtered, has key features extracted, and is displayed on the web pages of the admin backend.

Once the integration is complete, we can check application exceptions in real time from the management system, proactively monitoring how the application is running on the client side. By configuring alerts and analyzing trends in exception occurrence, we can catch anomalies early and affect fewer users. By analyzing exception details and tracing the operations that led up to them, we avoid debugging client-side problems blindly and can solve them more efficiently.

This article starts from the one-click deployment of the service and walks through the whole process of setting up front-end application monitoring and using the exception data, including solutions to the problems encountered along the way. Hopefully it will help you with the problems you run into during deployment and use.

Rapidly deploy the Sentry service

Sentry’s admin backend is developed with Django (Python) and is backed by a Postgres database (the admin backend’s default database). Postgres, ClickHouse, Relay, Kafka, Redis, and the other basic services add up to 23 services officially maintained by Sentry. As you can imagine, deploying and maintaining these 23 services independently would be extremely complex and difficult. Fortunately, getsentry/onpremise provides a one-click deployment based on Docker images.

This deployment relies on Docker 19.03.6+ and Compose 1.24.1+.

Preparation

Docker is an open-source containerization technology that can be used to build and run applications in containers.

Compose is a tool for defining and running multi-container Docker applications: you configure all of your app’s services in a single file and create and start them with one command.
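Before proceeding, it’s worth confirming that the installed versions meet the requirements above; a quick check (the output format varies slightly across versions):

docker -v            # expect 19.03.6 or later
docker-compose -v    # expect 1.24.1 or later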

After you have your Linux server ready and have installed Docker and Compose per the official documentation, clone the onpremise source code into your working directory:

git clone https://github.com/getsentry/onpremise.git
# Switch to version 20.10.1; the rest of this article is based on this version
git checkout release/20.10.1

Docker image acceleration

The subsequent deployment pulls a large number of images, and the official registry can be slow. You can switch to a Docker registry mirror by modifying (or creating) the /etc/docker/daemon.json file:

{
  "registry-mirrors": ["[mirror address]"]
}

Then reload the configuration and restart the Docker service:

sudo systemctl daemon-reload
sudo systemctl restart docker
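You can verify that the mirror has taken effect (the output layout differs across Docker versions):

docker info | grep -A 1 'Registry Mirrors'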

One-click deployment

There is an install.sh script in the root of onpremise; executing it is all that’s needed to complete the rapid deployment. The script goes through the following steps:

  1. Environment check
  2. Generate the service configuration
  3. Create the Docker volumes
  4. Pull and update the base images
  5. Build the images
  6. Initialize the services
  7. Set up an administrator account (if you skip this step, you can create one manually later)

At the end of the run, the script prompts you to start the services with the docker-compose up -d command.
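Putting it together, a minimal run from the onpremise root looks like this (docker-compose ps is a standard way to confirm the containers are up):

./install.sh           # one-click deployment
docker-compose up -d   # start all services in the background
docker-compose ps      # confirm that each service is in the Up state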

If you run docker-compose up without the -d flag, you can watch the services’ startup logs. The deployment is fully up only after the internal web, relay, snuba, kafka, and other services have started and finished initializing, after which the admin backend is reachable at the default service address and port. At this point you can configure a domain name and resolve port 80 to the default service port.

When you access the admin backend for the first time, a welcome page is displayed; you can start using it properly after completing the required configuration:

  • Root URL: the public root address of the endpoint that exceptions are reported to. (When configuring DNS, the backend service can be given two domain names, internal and public, with only the reporting endpoint /api/[id]/store/ resolvable on the public one, which prevents data leaks.)
  • Admin Email: the administrator account created during the install.sh phase.
  • Outbound Email: the mail service configuration; this can be skipped for now.
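To illustrate the Root URL note above: a public-facing nginx could expose only the store endpoint while keeping everything else internal. This is only a sketch with hypothetical names (the domain and the 127.0.0.1:9000 upstream address), not part of the onpremise defaults:

upstream sentry_web {
    server 127.0.0.1:9000;  # hypothetical address of the Sentry web service
}
server {
    listen 80;
    server_name sentry-report.example.com;  # hypothetical public domain
    # Only the event-reporting endpoint is reachable from the public network
    location ~ ^/api/[0-9]+/store/ {
        proxy_pass http://sentry_web;
    }
    # Everything else (admin UI, other APIs) stays internal-only
    location / {
        return 403;
    }
}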

Once this is done, readers with no customization requirements for the service can jump straight to the front-end access and use section.

Changing the Docker data storage location

Docker volumes are mounted under the /var directory by default. If your /var partition has little capacity, it will soon fill up as the service runs, so you need to change the Docker volume mount directory.

# Create a folder on the partition with the most capacity
mkdir -p /data/var/lib/
# Stop the docker service
systemctl stop docker
# Copy the default docker data to the new path, delete the old data, and create a symlink,
# so that the storage actually occupies the disk at the new path
/bin/cp -a /var/lib/docker /data/var/lib/docker && rm -rf /var/lib/docker && ln -s /data/var/lib/docker /var/lib/docker
# Restart the docker service
systemctl start docker
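Alternatively, instead of symlinking, Docker’s documented data-root option in /etc/docker/daemon.json can point the data directory at the new location (restart the Docker service afterwards). We used the symlink approach above, so treat this as an untested alternative:

{
  "data-root": "/data/var/lib/docker"
}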

Customizing the service

A one-click deployment of the Sentry service will never exactly match our usage and maintenance needs; in such cases we need to modify the deployment configuration.

Service composition and operation mechanism

After the rapid docker-compose deployment, let’s first see which services were started and what they do, to prepare for the adaptations that follow. Run the docker command to view all containers:

docker ps

You can see that all services are now running, and that some services are started from the same image with different startup parameters. Based on the images and the author’s research, the functions of the services are as follows:

  • nginx:1.16
    • sentry_onpremise_nginx_1: handles routing between the services
  • sentry-onpremise-local: the following services all use this same image, and therefore the same set of environment variables
    • sentry_onpremise_worker_1
      • Probably background tasks: emails, alerts, etc.
    • sentry_onpremise_cron_1
      • Scheduled tasks; exactly which is unclear, possibly scheduled cleanup
    • sentry_onpremise_web_1
      • Web service (UI + web API)
    • sentry_onpremise_post-process-forwarder_1
    • sentry_onpremise_ingest-consumer_1
      • Processes Kafka messages
  • sentry-cleanup-onpremise-local
    • sentry_onpremise_sentry-cleanup_1
      • Data cleanup; not important for now, but it should share some configuration with the other Sentry services
    • sentry_onpremise_snuba-cleanup_1
      • Data cleanup; not important for now
  • getsentry/relay:20.10.1
    • sentry_onpremise_relay_1
      • Data reported from applications reaches relay first
      • Relay returns the response status directly
      • Data processing then continues in background tasks
      • Parses events, formats data, applies filtering rules, and discards data accordingly
      • Writes data to Kafka
  • symbolicator-cleanup-onpremise-local
    • sentry_onpremise_symbolicator-cleanup_1
      • Data cleanup; not important for now
  • getsentry/snuba:20.10.1
    • Appears to consume Kafka messages and write to ClickHouse; also uses Redis, purpose unknown
    • sentry_onpremise_snuba-api_1
      • Snuba’s API service; its exact role is unclear
    • sentry_onpremise_snuba-consumer_1
      • Consumes Kafka and provides ClickHouse with events
    • sentry_onpremise_snuba-outcomes-consumer_1
      • Consumes Kafka and writes outcomes to ClickHouse
    • sentry_onpremise_snuba-sessions-consumer_1
      • Consumes Kafka and writes sessions to ClickHouse
    • sentry_onpremise_snuba-replacer_1
      • Appears to read old (or otherwise converted) data from Kafka, transform it, and write it back to Kafka
  • tianon/exim4
    • sentry_onpremise_smtp_1
      • The mail service
  • memcached:1.5-alpine
    • sentry_onpremise_memcached_1
      • Perhaps used to reduce the frequency of data-store access and collisions
  • getsentry/symbolicator:bc041908c8259a0fd28d84f3f0b12daa066b49f6
    • sentry_onpremise_symbolicator_1
      • Basic facility: parses (native) error information
  • postgres:9.6
    • sentry_onpremise_postgres_1
      • Basic facility; the admin backend’s default database, stores exception data
  • confluentinc/cp-kafka:5.5.0
    • sentry_onpremise_kafka_1
      • Basic facility; the ClickHouse and Postgres data clearly comes through Kafka
  • redis:5.0-alpine
    • sentry_onpremise_redis_1
      • Basic facility; some interception rules are configured here
  • confluentinc/cp-zookeeper:5.5.0
    • sentry_onpremise_zookeeper_1
      • Basic facility
  • yandex/clickhouse-server:19.17
    • sentry_onpremise_clickhouse_1
      • Unlike the Postgres storage, it stores the key features of exceptions, used for fast retrieval

In addition, judging from the logs recorded after exceptions are reported to the service, the operating mechanism is as follows:

  • Exception data is routed by nginx to the relay service.
  • Relay obtains from Postgres the latest mapping between applications and tokens, verifies the token in the data, returns 403 or 200 accordingly, and intercepts and filters the data.
  • Relay sends the data to different topics in Kafka.
  • Sentry subscribes to some of these topics and stores the parsed data in Postgres, for later viewing of error details.
  • Snuba subscribes to other topics, tags the data, extracts key features, and stores them in ClickHouse, for fast retrieval of data by key features.

File structure and function

To modify the deployment and its operation, you need to find the corresponding configuration files. Let’s look at the main file structure and functions of the onpremise deployment:

  • clickhouse/config.xml: ClickHouse configuration file
  • cron/: image build configuration and startup scripts for scheduled tasks
  • nginx/nginx.conf: nginx configuration
  • relay/config.example.yml: the relay service configuration file
  • sentry/: the build of the sentry-onpremise-local image and the configuration of the main services started from that image live in this folder
    • Dockerfile: build configuration of the sentry-onpremise-local image, which many services start from
    • requirements.example.txt: requirements.txt is generated from this; additional Django plugins go here
    • .dockerignore: Docker ignore configuration; initially it ignores all files except requirements.txt, so it must be modified if you need to COPY anything new when building the image
    • config.example.yml: config.yml is generated from this file
    • sentry.conf.example.py: sentry.conf.py is generated from this file; it is Python code that is overridden or merged into the Sentry service to affect how Sentry runs
  • .env: image versions, data retention days, and port number
  • docker-compose.yml: Compose tool configuration; batch configuration and startup settings for the containers
  • install.sh: the Sentry one-click deployment script

Note also that after deployment, the install.sh script generates the actual files from the xx.example.xx templates, and checks for the existence of those files when executed again. So when configuration needs to change before a redeployment, it is best to delete the generated files and modify the corresponding xx.example.xx templates instead.
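For example, to apply template changes for the Sentry service, something like this should work (the list of generated files may differ between versions; check which ones exist in your checkout):

# Remove the generated files so install.sh regenerates them from the *.example.* templates
rm -f sentry/config.yml sentry/sentry.conf.py sentry/requirements.txt
./install.sh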

The main services are started from the sentry-onpremise-local image, and the Sentry configuration in that image incorporates sentry.conf.py, which is generated from sentry.conf.example.py. Therefore, most service customization happens in the sentry.conf.example.py configuration template.

Use a separate database to ensure data stability

With a single-machine database deployment, data is damaged or lost if the machine fails. Onpremise’s one-click deployment runs the database as a single Docker service, with the data stored on the local machine.

You can see that Sentry has two databases, Postgres and ClickHouse.

Postgres stores the IDs and tokens of the business applications connected to Sentry. If that data is lost, every business application has to modify its code to change the token and release again. To avoid this impact, and because the company already had a Postgres database with disaster recovery and regular backups, we switched to that external database.

Modify the DATABASES variable in sentry.conf.example.py:

DATABASES = {
  'default': {
    'ENGINE': 'sentry.db.postgres',
    'NAME': '[database name]',
    'USER': '[database username]',
    'PASSWORD': '[database password]',
    'HOST': '[database host]',
    'PORT': '[database port]',
  }
}

Since it is no longer necessary to start a Postgres database service with Docker, remove the Postgres configuration from the docker-compose.yml file:

depends_on:
    - redis
    - postgres # remove
#...
services:
#...
# delete start
  postgres:
    << : *restart_policy
    image: 'postgres:9.6'
    environment:
      POSTGRES_HOST_AUTH_METHOD: 'trust'
    volumes:
      - 'sentry-postgres:/var/lib/postgresql/data'
# Delete end
#...
volumes:
  sentry-data:
    external: true
  sentry-postgres: # remove
    external: true # remove

In addition, before Sentry starts, it initializes the database structure and creates functions using the Postgres citext extension. The database user therefore needs sufficient privileges, and the extension must be enabled in advance; otherwise install.sh will fail.
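For example, enabling the extension with psql might look like this (a sketch; the superuser account and database name depend on your environment):

# Run as a Postgres superuser against the database Sentry will use
psql -U postgres -d [database name] -c 'CREATE EXTENSION IF NOT EXISTS citext;'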

Controlling disk usage

As data is reported, the server’s local disk usage and database size keep growing. At a traffic of 3 million events per day, total disk usage grows by 1.4–2 GB per day. With data retained for 90 days, per Sentry’s default scheduled-task configuration, disk usage stabilizes at a fairly large value once fully ramped up, and querying such a large dataset is also a burden. To ease the burden, action is needed on both the server side and the business-application side. After consideration, we changed the data retention to 7 days in the .env file:

SENTRY_EVENT_RETENTION_DAYS=7

You can also modify sentry.conf.example.py directly:

SENTRY_OPTIONS["system.event-retention-days"] = int(
    env("SENTRY_EVENT_RETENTION_DAYS"."90"))# is changed to
SENTRY_OPTIONS["system.event-retention-days"] = 7
Copy the code

Note that the scheduled task uses DELETE statements to remove expired data, which does not release disk space. If the database has no scheduled reclamation mechanism, you need to reclaim it manually.

# Reclaim statement, for reference
vacuumdb -U [username] -d [database name] -v -f --analyze

Single sign-on (SSO) CAS login access

Sentry itself supports SAML2, Auth0, and other single sign-on methods, but we need CAS 3.0 support. Neither Sentry nor Django has a well-maintained plugin for it, so I put together a basically working plugin, sentry_cas_ng.

Next, install, register, and configure the plugin. The plugin is installed from its GitHub address and needs some tools pre-installed, so instead of adding it to the requirements.txt file, modify the sentry/Dockerfile to install it, adding the following content:

# Set the apt mirror source for acceleration
RUN echo 'deb http://mirrors.aliyun.com/debian/ buster main non-free contrib \n\
deb http://mirrors.aliyun.com/debian/ buster-updates main non-free contrib \n\
deb http://mirrors.aliyun.com/debian/ buster-backports main non-free contrib \n\
deb http://mirrors.aliyun.com/debian-security/ buster/updates main non-free contrib \n\
deb-src http://mirrors.aliyun.com/debian/ buster main non-free contrib \n\
deb-src http://mirrors.aliyun.com/debian/ buster-updates main non-free contrib \n\
deb-src http://mirrors.aliyun.com/debian/ buster-backports main non-free contrib \n\
deb-src http://mirrors.aliyun.com/debian-security/ buster/updates main non-free contrib' > /etc/apt/sources.list
# Update and install the pre-install tools
RUN apt-get update && apt-get -y build-dep gcc \
 && apt-get install -y -q libxslt1-dev libxml2-dev libpq-dev libldap2-dev libsasl2-dev libssl-dev sysvinit-utils procps
RUN apt-get install -y git
# Install the CAS login plugin
RUN pip install git+https://github.com/toBeTheLight/sentry_cas_ng.git

Also modify the sentry.conf.example.py file to register the plugin and set its configuration items:

# Change the session store to solve the problem with long sessions
SESSION_ENGINE = 'django.contrib.sessions.backends.db'
# Install the plugin into Django
INSTALLED_APPS = INSTALLED_APPS + (
    'sentry_cas_ng',
)
# Register the plugin middleware
MIDDLEWARE_CLASSES = MIDDLEWARE_CLASSES + (
    'sentry_cas_ng.middleware.CASMiddleware',
)
# Register the plugin authentication backend
AUTHENTICATION_BACKENDS = (
    'sentry_cas_ng.backends.CASBackend',
) + AUTHENTICATION_BACKENDS

# Configure the login address of the CAS 3.0 SSO server
CAS_SERVER_URL = 'https://xxx.xxx.com/cas/'
# Configure the CAS version
CAS_VERSION = '3'
# The plugin works by intercepting the login page and forcing a redirect to the SSO page,
# so login interception must be configured to redirect login requests to SSO.
# Configure pathReg as a regex matching your project's login URL.
# Additionally, when the page carries the parameter ?admin=true, SSO is skipped.
def CAS_LOGIN_REQUEST_JUDGE(request):
  import re
  pathReg = r'.*/auth/login/.*'
  return not request.GET.get('admin', None) and re.match(pathReg, request.path) is not None
# Configure logout interception to perform the logout,
# letting the plugin recognize the logout operation and destroy the current user session.
# This is fixed content; do not change it.
def CAS_LOGOUT_REQUEST_JUDGE(request):
  import re
  pathReg = r'.*/api/0/auth/.*'
  return re.match(pathReg, request.path) is not None and request.method == 'DELETE'
# Whether to automatically associate SSO CAS attributes with the Sentry user
CAS_APPLY_ATTRIBUTES_TO_USER = True
# The default organization assigned after login; must match the organization name set in the admin UI
AUTH_CAS_DEFAULT_SENTRY_ORGANIZATION = '[organization name]'
# The default role after login
AUTH_CAS_SENTRY_ORGANIZATION_ROLE_TYPE = 'member'
# The default email domain suffix after login, e.g. 163.com in @163.com
AUTH_CAS_DEFAULT_EMAIL_DOMAIN = '[email domain]'

After the configuration is complete, visit xxx/auth/login/sentry?admin=true using Sentry’s default organization name, sentry. Log in as the administrator and change the organization name to the value of AUTH_CAS_DEFAULT_SENTRY_ORGANIZATION configured for the CAS plugin. Otherwise, when a new user logs in through SSO, the organization to be assigned will not match the organization name set in the service and an error will occur.

Changing the Default Time Zone

After logging in to Sentry, you will notice that exception times are shown in UTC. Each user can change the time zone to their local one in Settings:

For user friendliness, you can change the default time zone of the service directly by adding the configuration in the sentry.conf.example.py file:

# http://en.wikipedia.org/wiki/List_of_tz_zones_by_name
SENTRY_DEFAULT_TIME_ZONE = 'Asia/Shanghai'

Obtaining the real IP address

The first IP address in the X-Forwarded-For request header is the real user’s IP. Within the Sentry deployment, the first service to receive a report is its own nginx, whose default configuration contains the directive proxy_set_header X-Forwarded-For $remote_addr;, which overwrites the header with $remote_addr, the “client” IP as seen by that nginx. If there are other proxy servers in front of it, this captures a proxy’s IP instead of the user’s. In our deployment environment, X-Forwarded-For is set by a front-facing nginx service and already arrives in the desired format, so we delete this line.
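Concretely, the change is to remove this one directive from nginx/nginx.conf (shown as it appears in the 20.10.1 configuration; verify against your copy):

proxy_set_header X-Forwarded-For $remote_addr;  # delete this line so the incoming header is preserved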

Modifying Role Permissions

Sentry’s default role-permission system has the following layers in its information structure, in terms of inclusion relationships: organization, team, project, event.

At the role level, it has:

  • Superuser: the system administrator (an unconventional role), who can delete user accounts. The account created when the install.sh script runs is the system administrator.
  • Owner: the organization administrator. In a private deployment there is only one organization, so the owner can modify everything except the service configuration, controlling configuration and deletion of the organization and all layers below it.
  • Manager: the team manager, who can remove users from teams, create and delete all projects, and create and delete all teams.
  • Admin: can configure projects (such as alerts and inbound rules), approve users joining a team, create and delete teams, and adjust the configuration of the projects of the teams they belong to.
  • Member: can handle issues.

Roles follow accounts, meaning an admin is an admin on every team they join.

In our permission design, we want the owner to create teams and the projects under them, and then assign admins to the teams; that is, the admin role manages a team’s permissions but cannot create or delete teams or projects. Given Sentry’s current capabilities, the closest approximation of this design is to remove the admin’s permission to add and delete teams and projects, rather than restricting an admin to a single team.

Sentry manages role permissions through configuration like this:

SENTRY_ROLES = (
  # other roles
  # ...
  {
    'id': 'admin',
    'name': 'Admin',
    'desc': '...',
    'scopes': set([
      "org:read", "org:integrations",
      "team:read", "team:write", "team:admin",
      "project:read", "project:write", "project:admin", "project:releases",
      "member:read",
      "event:read", "event:write", "event:admin",
    ]),
  },
)

“read” and “write” are the read and write permissions on configuration, while “admin” covers creation and deletion. So remove “team:admin” and “project:admin” from a copy of this configuration, and override the SENTRY_ROLES variable in the sentry.conf.example.py file with it, as sketched below. Permissions for other roles can be adjusted the same way as needed.
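A sketch of that override in sentry.conf.example.py, assuming you copy the full SENTRY_ROLES tuple from src/sentry/conf/server.py and keep everything else unchanged (only the admin entry is shown here, with its description elided):

SENTRY_ROLES = (
  # ... the other roles, copied unchanged from src/sentry/conf/server.py ...
  {
    'id': 'admin',
    'name': 'Admin',
    'desc': '...',  # keep the original description text
    # Default admin scopes minus team:admin and project:admin,
    # so admins can no longer create or delete teams and projects
    'scopes': set([
      "org:read", "org:integrations",
      "team:read", "team:write",
      "project:read", "project:write", "project:releases",
      "member:read",
      "event:read", "event:write", "event:admin",
    ]),
  },
)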

Other Configuration Modifications

At this point, our custom configuration is complete.

Basically all configuration can be adjusted through the sentry.conf.example.py file, either by reassigning a variable or by modifying one of its fields. The available configuration items can be looked up in src/sentry/conf/server.py in the Sentry source code; if you have other requirements, try modifying them yourself.

Front-end access and use

The following access and usage steps are demonstrated with a Vue project.

SDK access

First, create the corresponding team and project:

After selecting the platform, language, and other information, you can create the team and project. Then install the SDK:

npm i @sentry/browser @sentry/integrations

Here @sentry/browser is the browser-side SDK. Note that it only supports error reporting for IE11 and later; earlier versions require raven.js, which we won’t cover here.

The @sentry/integrations package contains official enhancements for the various front-end frameworks, which will be used below.

For the integration, the key thing to know is the DSN (client key) bound to your current project, which can be viewed in the specific project’s configuration under Settings on the management side.

import * as Sentry from '@sentry/browser'
import { Vue as VueIntegration } from '@sentry/integrations'
import Vue from 'vue'

Sentry.init({
  // High-traffic applications can control the reporting sample rate
  tracesSampleRate: 0.3,
  // Different environments report under different environment categories
  environment: process.env.ENVIRONMENT,
  // DSN configuration of the current project
  dsn: 'https://[clientKey]@sentry.xxx.com/[id]',
  // Track Vue errors, report props, keep console error output
  integrations: [new VueIntegration({ Vue, attachProps: true, logErrors: true })]
})

As configured, VueIntegration enhances reporting with the props of Vue components; the release of each build can also be reported additionally. From this point, Sentry starts reporting console.error, Ajax errors, unhandled Promise rejections, and so on. We can also report actively and associate users:

Sentry.captureException(err)
Sentry.setUser({ id: user.id })
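As a usage example, a caught business error can be reported with extra context. Sentry.withScope and scope.setTag are part of the @sentry/browser API; riskyOperation and the tag values are hypothetical:

try {
  riskyOperation() // hypothetical business function
} catch (err) {
  Sentry.withScope(scope => {
    scope.setTag('module', 'checkout') // hypothetical tag, searchable in the issue list
    Sentry.captureException(err)
  })
  throw err
}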

Sentry also provides a webpack plugin, webpack-sentry-plugin, to help with the integration; it is not covered here.

How to use monitoring data

After entering a specific project, you can see the list of issues that Sentry has aggregated by error message, stack, and occurrence location:

On the right, you can see each error’s trend, number of occurrences, and number of users affected, plus buttons for assigning and resolving the issue. We can use these metrics to prioritize and assign error handling.

The trend also shows whether an error is related to a particular release. Through Discover on the left, we can also create custom trend dashboards for more targeted observation.

After clicking on each issue, you can see the details:

From top to bottom, the page shows the error name, the main environment information at the time, the characteristics Sentry extracted from the error, and the error stack. At the bottom, BREADCRUMBS shows the operations performed before the exception occurred, which helps reconstruct the user’s steps and troubleshoot the fault.

So much for getting started with Sentry. Other features, such as alert configuration and performance monitoring, are left for you to explore on your own.

Recruitment

As the front-end architecture team of Zhaopin, we are always looking for like-minded front-end architects and senior engineers. If you share our passion for technology, learning, and exploration, please join us! Send your resume to [email protected], or search WindieChai on WeChat.