There are a lot of things to back up in our online production environment: various service logs, database data, user-uploaded data, code, and so on. Backing up with JuiceFS can save you a lot of time, and we’ll write a series of tutorials on this topic to put together a set of best practices for your convenience.

Today’s first article is about the most commonly used Nginx log backups.

How to back up Nginx logs with JuiceFS

In production, Nginx is often configured as a reverse proxy connecting the various application services. It produces two main types of logs: access logs and error logs.

The logs are scattered across the disks of the individual Nginx nodes. A single machine's disk is not a safe place for them, and scattered logs are hard to maintain and use. We therefore collect the logs into a more reliable storage system: one that is safe for long-term retention and convenient for analysis.

For log storage, the requirements are easily scalable capacity, stability and safety, simple operation and maintenance, and low cost, ideally paid by actual usage; the performance requirements, by contrast, are modest. NFS, HDFS, and object storage are the common choices today. Let's compare them with JuiceFS.

JuiceFS stores file data in your own object store, so it naturally inherits the benefits of object storage. On top of that, it provides a high-performance metadata service and full POSIX compatibility, which makes it much easier to use than raw object storage.

There are two ways to collect logs, periodic collection and real-time collection, and we cover them in turn below.

Periodic collection

Logs are copied to a single storage location on a schedule, usually hourly or daily. There are many tools for this; as an example we use logrotate, which is installed by default on Linux.

First, create a file system on JuiceFS, say super-backup.

The first step is to install the JuiceFS client on each machine and mount it at /jfs.

Download the JuiceFS client

curl -L juicefs.io/static/juicefs -o juicefs && chmod +x juicefs

Mount a file system

sudo ./juicefs mount super-backup /jfs
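
Once mounted, a quick sanity check (a sketch; output details vary by version and platform):

df -h /jfs    # the super-backup file system should show up here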

It is also convenient to use JuiceFS with automated configuration management. For details, see the how-to guides on command-line authentication and automatic mounting at boot. Mounting in Docker and Kubernetes is supported as well.
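
As one hedged sketch (the @reboot approach and the client path are assumptions here, not the official recommendation from the how-to guide), a root crontab entry can remount at boot:

@reboot /usr/local/bin/juicefs mount super-backup /jfs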

The second step is to configure the log rotation policy with logrotate on each machine by editing /etc/logrotate.d/nginx. The essential parts are the postrotate script, which reloads Nginx, and the lastaction script, which syncs the compressed logs to JuiceFS; the surrounding rotation options below are typical values, adjust to taste:

/var/log/nginx/*.log {
    daily
    rotate 7
    compress
    dateext
    sharedscripts
    postrotate
        [ -f /var/run/nginx.pid ] && kill -USR1 `cat /var/run/nginx.pid`   # Reload Nginx
    endscript
    lastaction
        rsync -au /var/log/nginx/*.gz /jfs/nginx-logs/`hostname -s`/   # Sync compressed logs to JuiceFS
    endscript
}

At this point, Nginx logs are rotated daily and saved to JuiceFS. When you add an Nginx node, simply repeat the same configuration on it.
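
A quick way to confirm that a node's logs have landed (the directory layout follows the rsync destination above):

ls -lh /jfs/nginx-logs/`hostname -s`/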

If you use NFS, the configuration in Logrotate is basically the same. However, NFS has several disadvantages:

  • Most NFS setups have a single point of failure, while JuiceFS is highly available (the Professional edition promises a 99.95% SLA).
  • The NFS protocol does not encrypt traffic, so you must keep NFS and Nginx in the same VPC; if other services need to be backed up as well, deployment becomes awkward. JuiceFS traffic is SSL-encrypted and not restricted to a VPC.
  • NFS requires capacity planning in advance, while JuiceFS is elastic and pay-per-capacity, which is less stressful and cheaper.
  • With HDFS or object storage, accessing the backup data later is troublesome. JuiceFS is much simpler: you can, for example, query the compressed logs directly with zgrep (see the example after this list).
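
For instance, a sketch of searching every node's rotated logs in place (the file-name pattern follows the logrotate config above, and the request string is made up):

zgrep "GET /login" /jfs/nginx-logs/*/access.log-*.gz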

A few more tips:

  1. Run logrotate -f /etc/logrotate.d/nginx to force a rotation and validate the configuration immediately; you can also use -d to debug with a dry run.
  2. Rotation frequency, whether hourly, daily, or weekly, is driven by cron; logrotate is invoked from cron, so adjust the schedule in /etc/crontab (see the sketch after this list).
  3. If you end up with too many log files, we also provide the juicefs merge command to quickly merge gzipped log files.
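
A hedged sketch of an hourly schedule in /etc/crontab (the logrotate path varies by distribution, and many distros invoke it from /etc/cron.daily instead):

0 * * * * root /usr/sbin/logrotate /etc/logrotate.d/nginx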

In the next section, we’ll talk about log collection in real time.

Real-time collection

There are many open source tools for real-time log collection, including Logstash, Flume, Scribe, Kafka, etc.

When the cluster is not very large, the all-in-one ELK stack is a common choice, using Logstash to collect and analyze the logs.

The following deployment is required:

  1. Deploy a Logstash agent on each machine (the same applies to other tools such as Flume);
  2. Deploy a Logstash Central for log aggregation;
  3. Deploy a Redis as the broker for the whole service, buffering between log collection and writing so that logs are not lost if Central goes down;
  4. Configure Central's output to disk, storing the logs to JuiceFS/NFS/object storage/HDFS.

The architecture, in brief: each agent ships logs to the Redis broker, and Central consumes them from Redis and writes them out to storage.
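
A hedged sketch of just the broker wiring (not the parsing or filter rules; the host name and key are placeholders, and the option names follow the Logstash Redis plugins):

# On each agent:
output {
    redis {
        host => "broker-host"
        data_type => "list"
        key => "nginx-logs"
    }
}

# On Central:
input {
    redis {
        host => "broker-host"
        data_type => "list"
        key => "nginx-logs"
    }
}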

We won't cover the Logstash configuration for collection, parsing, and filtering here; there are plenty of articles about it on the web. Let's talk about the output.

To save the logs to JuiceFS, set the output section of the Logstash configuration:

output {
   file {
       path => "/jfs/nginx-logs/%{host}-%{+yyyy/MM/dd/HH}.log.gz"
       message_format => "%{message}"
       gzip => true
   }
}

The same configuration also works for storing to NFS, with the disadvantages described in the periodic-collection section above.

If you want to save to object storage or HDFS, you will need third-party Logstash plugins. Most of them are unofficial, so expect some tinkering as Logstash versions evolve.

The simplest real-time collection scheme

There is an even simpler way to collect logs in real time: have Nginx write its logs directly to JuiceFS, eliminating the need to deploy and maintain a log collection system. You might worry about whether Nginx can keep running properly if JuiceFS fails; two points address that:

  • JuiceFS itself is a highly available service; the Professional edition promises 99.95% availability, which should put it in the same availability tier as your database and other services.
  • Nginx log output uses asynchronous I/O, so even if JuiceFS jitters temporarily, Nginx keeps running normally (although restart or reload may be affected).
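
Concretely, this amounts to nothing more than log paths; a minimal nginx.conf sketch, assuming the /jfs mount from earlier (the host-1 directory name is a placeholder):

access_log /jfs/nginx-logs/host-1/access.log;   # write access logs straight to JuiceFS
error_log  /jfs/nginx-logs/host-1/error.log;    # same for error logs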

If you don’t like the complexity of running and maintaining a log collection system, this solution is worth a try.

Add a remote backup to Nginx logs

The Nginx logs stored in super-backup can also be given a remote, off-site replica.

Just two steps:

  1. Go to the JuiceFS web console, open the Settings menu of your file system, check "Start Replication", select the object store to replicate to, and save.
  2. Re-mount super-backup on all the machines where it is mounted. Newly written data is then quickly synchronized to the replica bucket, and old data is synchronized as well during the client's periodic scans (weekly by default).

In this way, the data is automatically synchronized to another object store, guarding against the failure of a single object store or a regional disaster.

You may well ask: what if JuiceFS itself dies? Without access to the metadata, data sitting in the object store alone cannot be used.

We also have an important feature for exactly this: JuiceFS's compatible mode, in which all files are stored in the object store as-is, so they remain accessible even without the JuiceFS metadata service. It suits write-once, never-modified scenarios such as backup.
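
As a hedged illustration (the bucket name, key layout, and use of the AWS CLI are all assumptions; the point is only that compatible mode keeps file paths as plain object keys):

# List backups straight from the object store, no JuiceFS client involved
aws s3 ls s3://your-backup-bucket/nginx-logs/host-1/

# Stream one compressed log and peek at it
aws s3 cp s3://your-backup-bucket/nginx-logs/host-1/access.log-20190101.gz - | zcat | head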

If this article helped you, please follow our project Juicedata/JuiceFS! (0 ᴗ 0 ✿)