Source: mp.weixin.qq.com/s/zaspcRKep…

Preface

Today I would like to introduce FastDFS, an open-source distributed file system, and a technology I first encountered after joining my company. One of our projects needs to store hundreds of millions of files on the server side, so we used FastDFS to store them, and that is how I came to learn it. In this article I will walk you through it from zero to one.

FastDFS introduction

FastDFS is an open-source, lightweight distributed file system written in C by Yu Qing, a former Alibaba engineer. It manages files: file storage, file synchronization, and file access (upload and download). It solves the problems of large-capacity storage and load balancing, and is especially suitable for online services built around files, such as photo-album and video sites.

FastDFS is tailor-made for the Internet: it takes redundant backup, load balancing, and linear scaling into account, and emphasizes high availability and high performance. With it, it is easy to set up a high-performance file-server cluster that provides file upload and download services.

Starting from zero, some questions of your own: is FastDFS obsolete?

I believe many students have the same questions. I had them too before I understood this technology.

First of all, many teams choose cloud services such as Qiniu Cloud or Alibaba Cloud OSS for file storage, so why build your own file server and take on the maintenance cost?

Secondly, it is not a hot interview topic. I had never used it, or even heard of it, before joining the company; by contrast, even without actively studying them, you inevitably come across the trendy technologies and know roughly what they are for.

First of all, this technology is certainly not outdated. For some kinds of files, information-security concerns and other reasons rule out public cloud storage, and for cost reasons many medium-sized Internet companies still build their own file servers on FastDFS. In addition, as a distributed file system, FastDFS takes lightweight design, horizontal scaling, disaster recovery, high availability, high performance, and load balancing fully into account. It remains an excellent choice for a self-hosted file server.

So why is such a technology so little known today?

First, I think it is a matter of demand: businesses that need to store a huge number of files are relatively rare. If the volume of files is small, the traditional file-storage approach works fine.

Second, companies such as Qiniu Cloud and Alibaba Cloud now offer object storage, and with the domestic push to "move to the cloud", few people are willing to run their own file-storage service.

Of course, as technical people we should learn and adapt to all kinds of technology, so I hope this article helps students who are interested, or who face heavy file-storage workloads at work. FastDFS is a good choice.

Traditional file storage

![Need to build a high-performance file system? I recommend you try it](https://p3-tt.byteimg.com/origin/pgc-image/8afe2a33205648cdbdfa0eeb1ec6cea0?from=pc)

This is the traditional way of storing files: the server needs no special application installed, just an SFTP service, and we can write code against it to perform CRUD operations on files.

The advantage of this approach is convenience: one machine and a few lines of code handle file storage. But its bottlenecks and drawbacks are just as obvious.

First, even setting downtime aside, a single file server has limited bandwidth and disk capacity. When files fill the entire disk we can only expand, and a single server is not friendly to expansion. Think about it: would we copy the data from the old disk onto a bigger one and then swap the drives?

Besides expansion, we also face the problem of file lookup. If we put all files together, then once their number reaches a certain level we hit a disk I/O speed bottleneck. You may have encountered the following scenario:

![Need to build a high-performance file system? I recommend you try it](https://p6-tt.byteimg.com/origin/pgc-image/616fd5b96c124eb8891310923aa4f5ad?from=pc)

Slow disk query

If we need to find a file on disk without a path, among many files and directories, the system has to scan the disk. As we know from computer architecture, in terms of speed CPU > memory > disk. In a production environment that really stores huge numbers of files, say user avatars, if users had to wait more than ten seconds for their avatar to appear every time they opened the app, presumably nobody would use it.

You might think of Redis: its String type can store binary data, and since each value is found by its key, query efficiency is very high. The query problem would indeed be solved, but figure roughly 1 MB per picture: how many pictures can a cache hold? This is obviously a very expensive approach.

So far we have assumed the server stays up. If the server goes down, we can no longer provide the data-storage service at all; if the hard drive fails, all data is lost.

Distributed file system

The drawbacks of the traditional file-storage approach are the drawbacks of any single point. Whether it is a single-point database, cache, gateway, or registry, everything is moving toward distributed clusters.

To summarize, single-point file systems have these general weaknesses:

1. Disk capacity has a bottleneck

2. I/O speed has a bottleneck

3. Downtime or disk damage risks data loss

So for file systems, how can we use a distributed approach to address the above shortcomings?

Solve the disk capacity bottleneck

As mentioned in the previous section, the disk capacity of a single-server file system is limited and cannot be easily expanded; expanding it at the hardware level means, for example, replacing the disk with a larger one.

This method is clearly unrealistic, because replacing the hard disk means shutting the server down, and even 30 seconds of downtime in production is a serious incident. So our only option is horizontal scaling: if we cannot replace the disk, we add servers.

![Need to build a high-performance file system? I recommend you try it](https://p3-tt.byteimg.com/origin/pgc-image/9353dd9c1e324e1586955cbbe643399d?from=pc)

Multiple servers

This lets multiple servers make up our file system. Each file server is an independent node that stores different files, and specific logic (which, in this scenario, we would have to write ourselves) decides which server a file goes to. Even when a server is full we can continue to scale horizontally, so in theory our file system's capacity has no upper limit.

Solve the I/O speed bottleneck

We have just solved the single-point file server's capacity bottleneck, but if there are too many files on one or a few servers (making queries inefficient), or a large number of users access the same server, an I/O speed bottleneck can still occur. So how do we solve this problem?

Consider a similar case: the MySQL database.

As we all know, MySQL data is also stored on disk, and when we write SQL we usually avoid full table scans to keep queries efficient, letting an index locate the data instead.

So we can avoid full scans in the same way. The operating system gives files a natural index: the multilevel directory. FastDFS uses exactly this to improve file I/O efficiency, as we will see below.

Resolve the risks of downtime and disk damage

Solving this problem is the fundamental difference between a distributed file system and a single-machine one. Capacity and I/O bottlenecks can both be relieved by adding hardware, inconvenient and costly as that is, but single-machine mode cannot survive a server going down or a disk being damaged, because only one copy of the data exists.

So let’s think about how a distributed file system solves these problems.

![Need to build a high-performance file system? I recommend you try it](https://p3-tt.byteimg.com/origin/pgc-image/d123fc5c5cff449ab98712a7a511fc3e?from=pc)

First we need to solve the downtime problem. As pictured above, we have multiple file-server nodes. But if we write our own logic to decide which server a file should go to, and that server happens to be down, the file still cannot be stored. We could keep writing logic to handle what to do after a failure, but FastDFS does this for us: the Tracker node chooses which server a file is uploaded to, and if a node goes down it selects a backup node for the upload, so downtime does not prevent us from operating on files.

If the hard disk of one server is damaged, the data still has a backup; even if the backup server's disk is damaged too, only part of the data is lost, not all of it.

FastDFS

With all this talk about the practical problems that distributed file systems can solve, it’s time to jump right into today’s topic, FastDFS.

The overall architecture

The FastDFS file system consists of two parts: the client and the server.

A client is usually a program we write ourselves (FastDFS also provides a client test program). For example, if we use Java to connect to FastDFS and manipulate files, our Java program is a client. FastDFS provides proprietary APIs for access; APIs are available for programming languages such as C, Java, and PHP.

The server consists of two parts: a Tracker and a Storage.

Tracker: it records the status information of Storage nodes in the cluster in memory and acts as the hub between clients and Storage nodes. Because all the relevant information is kept in memory (each Storage connects to the Tracker after startup to report which group it belongs to), the Tracker's performance is very high; even with hundreds of Storage nodes, only about 3 Trackers are needed.

Storage: a Storage node stores files on the server's disk, including the files themselves and file attributes, and implements all file-management functions: storage, synchronization, and access. Storages are organized into groups; a group can contain multiple Storages whose data are backups of each other, and a group's usable capacity is determined by the Storage with the smallest capacity in it (the shortest stave of a bucket). When a Storage starts for the first time, it creates two levels of subdirectories under each storage path, 256 × 256 of them in total, and uploaded files are hash-routed into one of these subdirectories.

![Need to build a high-performance file system? I recommend you try it](https://p1-tt.byteimg.com/origin/pgc-image/f39154552eeb4c3f8bf7b292ae01b813?from=pc)

FastDFS overall architecture

The working process

Upload

![Need to build a high-performance file system? I recommend you try it](https://p6-tt.byteimg.com/origin/pgc-image/a45456c5ce9b4736993bb14317672a01?from=pc)

FastDFS upload

1. When a client initiates an upload request, it first contacts the Tracker. Because every Storage periodically sends its status to the Tracker, the Tracker holds information about all Storage groups.

2. Based on its local group information, the Tracker assigns a Storage group to the file being uploaded and returns that group's information to the client.

3. Having obtained the Storage group's address and port, the client uploads the file to the assigned Storage group.

4. The Storage returns the path and file name of the stored file.

5. The client stores the file information locally.

Download

![Need to build a high-performance file system? I recommend you try it](https://p3-tt.byteimg.com/origin/pgc-image/432da4f337dc409d95e06c1e0b29132f?from=pc)

FastDFS download

1. The client sends a download request, and the Tracker returns a Storage address and port based on the file information. (A client that has kept the file's location can also access the Storage directly.)

2. The client accesses the Storage, which locates the file by its file_id (group name, virtual disk, secondary directory, and file name) and returns the file data.
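The file_id just mentioned, using the example path that appears in the upload test later in this article, decomposes roughly as follows (annotation mine):

```
group1 / M00 / ED/49 / wKiJA19kwRqAI_g6AAAAEbcXlKw7921454
group    |     |       file name generated by the storage server
name     |     two-level (secondary) directory chosen by hash
         virtual disk path (M00 corresponds to store_path0)
```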

Single machine installation

Preparations before installation

Before installing, make sure the build dependencies gcc, libevent, and libevent-devel are available on the machine (a check is shown further below).

Installation

To install FastDFS, you need two source packages: libfastcommon-1.0.43.tar.gz and fastdfs-6.06.tar.gz.

Both are available from the author's GitHub (fastdfs and libfastcommon); you can download the corresponding packages there.

Once the download is complete, upload the packages to our Linux server.

![Need to build a high-performance file system? I recommend you try it](https://p3-tt.byteimg.com/origin/pgc-image/20b7f67c891f4ae4bcd7c79c34a39247?from=pc)

Run tar -zxvf libfastcommon-1.0.43.tar.gz and tar -zxvf fastdfs-6.06.tar.gz to unpack the two packages. In the libfastcommon-1.0.43 directory run sh make.sh, and when the build completes run sh make.sh install; then enter the fastdfs-6.06 directory and perform the same two steps.
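As a consolidated sketch (versions as above; run as root or with sudo):

```shell
tar -zxvf libfastcommon-1.0.43.tar.gz
cd libfastcommon-1.0.43
sh make.sh           # compile
sh make.sh install   # install
cd ..
tar -zxvf fastdfs-6.06.tar.gz
cd fastdfs-6.06
sh make.sh
sh make.sh install
```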

If multiple fdfs commands exist in the /usr/bin directory, the installation was successful.

![Need to build a high-performance file system? I recommend you try it](https://p1-tt.byteimg.com/origin/pgc-image/55b47edc12b74a6daec0ba77092a2fc0?from=pc)

Then go to /etc/fdfs, where all the FastDFS configuration files are stored:

![Need to build a high-performance file system? I recommend you try it](https://p6-tt.byteimg.com/origin/pgc-image/156f27542f304dd682c323e21002b00e?from=pc)

As a final step, we need to go into the conf directory of the unpacked FastDFS source, find http.conf and mime.types, and copy them to /etc/fdfs.

![Need to build a high-performance file system? I recommend you try it](https://p1-tt.byteimg.com/origin/pgc-image/29e2d0f58ca2473984ea18482be0da8e?from=pc)

This completes the FastDFS installation.

Configuration file details

The files that matter are tracker.conf, storage.conf, and client.conf. All the configuration items are pasted here as a template; you can copy them directly.

The first is tracker.conf:

![Need to build a high-performance file system? I recommend you try it](https://p6-tt.byteimg.com/origin/pgc-image/fc52fb13646d4015be915c9f96e158f8?from=pc)
![Need to build a high-performance file system? I recommend you try it](https://p6-tt.byteimg.com/origin/dfic-imagehandler/291c8e3f-fa23-4415-a4e4-f046920014c5?from=pc)

Next, storage.conf:

![Need to build a high-performance file system? I recommend you try it](https://p6-tt.byteimg.com/origin/dfic-imagehandler/39d29432-4ccd-4c84-b545-1932a9229fbd?from=pc)

The items that need to be configured in tracker.conf and storage.conf:

![Need to build a high-performance file system? I recommend you try it](https://p1-tt.byteimg.com/origin/pgc-image/b6f3bab3889d471793ab789a647fb3e7?from=pc)
![Need to build a high-performance file system? I recommend you try it](https://p1-tt.byteimg.com/origin/pgc-image/34858bf6e0eb4ba5b9e7ed7dedc18613?from=pc)

Startup

We need some minimal configuration to support FastDFS startup.

First, tracker.conf:

![Need to build a high-performance file system? I recommend you try it](https://p3-tt.byteimg.com/origin/pgc-image/7fdc5bd244974f27af29476d984a0b3b?from=pc)
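A minimal tracker.conf sketch (the base_path is an example directory that must already exist; 22122 is the default tracker port):

```
# directory for tracker data and log files (must exist)
base_path = /home/fastdfs/tracker
# tracker service port
port = 22122
```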

Then storage.conf:

![Need to build a high-performance file system? I recommend you try it](https://p6-tt.byteimg.com/origin/pgc-image/828e60bfe9464ecd8d7c730608d76955?from=pc)
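A minimal storage.conf sketch (all paths and the tracker address are examples to adapt):

```
# group this storage belongs to
group_name = group1
# directory for storage data and log files (must exist)
base_path = /home/fastdfs/storage
# where uploaded files are actually kept (must exist)
store_path0 = /home/fastdfs/storage
# address and port of the tracker
tracker_server = 192.168.137.3:22122
# storage service port
port = 23000
```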

After you have configured everything and checked that all the directories referenced in the configuration files exist, copy the configuration files to /etc/fdfs and start the tracker and storage.

![Need to build a high-performance file system? I recommend you try it](https://p1-tt.byteimg.com/origin/pgc-image/eb47995243a9400cab98e4834e5c6821?from=pc)
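Both daemons take the configuration file path as their argument, for example:

```shell
fdfs_trackerd /etc/fdfs/tracker.conf
fdfs_storaged /etc/fdfs/storage.conf
```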

FastDFS file storage method

After FastDFS starts, go into the storage_path directory we just configured in storage.conf. You can see that FastDFS has created a data directory there, and inside it 256 × 256 folders that store data separately. This improves the efficiency of locating files and is how FastDFS addresses the I/O-efficiency problem: it spreads files across directories much like a HashMap in Java, quickly determining a file's location from its hash code.

![Need to build a high-performance file system? I recommend you try it](https://p6-tt.byteimg.com/origin/pgc-image/b2bc87f4c39d469da73ebb04550dfa47?from=pc)
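The HashMap analogy can be sketched in a few lines of shell. This is illustrative only: the real FastDFS routing algorithm differs, but the idea is the same, a hash of the file id picks one of the 256 × 256 two-level subdirectories:

```shell
file_id="wKiJA19kwRqAI_g6AAAAEbcXlKw7921454"   # example id from the upload test
hash=$(printf '%s' "$file_id" | cksum | cut -d' ' -f1)  # any stable hash works
dir1=$(printf '%02X' $(( hash / 256 % 256 )))  # first-level directory, 00..FF
dir2=$(printf '%02X' $(( hash % 256 )))        # second-level directory, 00..FF
echo "data/$dir1/$dir2"
```

The same file id always lands in the same subdirectory, so lookups never scan the whole disk.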

So far our FastDFS has started successfully.

A note on prerequisites: FastDFS requires gcc, libevent, and libevent-devel; check whether they are installed on Linux.

![Need to build a high-performance file system? I recommend you try it](https://p6-tt.byteimg.com/origin/pgc-image/27b95c99e5544d1f870305250e5467ab?from=pc)

If not, install them:

```shell
yum install gcc libevent libevent-devel -y
```

A functional test

You’ve successfully started FastDFS above, and you can see the changes to the data directory since it started. Now you can test the FastDFS functionality using the client.

First we need to configure the client. Find the client.conf configuration file in /etc/fdfs and do the following minimal configuration:

![Need to build a high-performance file system? I recommend you try it](https://p1-tt.byteimg.com/origin/pgc-image/af533264050f4fe19ea8a7c1ae81b9a2?from=pc)
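A minimal client.conf sketch (example values, matching the single-machine setup above):

```
# directory for client log files (must exist)
base_path = /home/fastdfs/client
# address and port of the tracker
tracker_server = 192.168.137.3:22122
```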

After the configuration is complete, create a test file in any directory.

![Need to build a high-performance file system? I recommend you try it](https://p1-tt.byteimg.com/origin/pgc-image/bd305d05c9264897a2abac9a91f887ef?from=pc)

Once the files are created and written, you can test your deployed FDFS in a variety of ways.

The FastDFS client uploads a file with fdfs_test <configuration file path> upload <local file path>, as shown in the following example:

![Need to build a high-performance file system? I recommend you try it](https://p1-tt.byteimg.com/origin/dfic-imagehandler/cdd72ee0-3204-453b-80ef-73a4d3316da0?from=pc)
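For example, assuming the test file is test.txt in the current directory:

```shell
fdfs_test /etc/fdfs/client.conf upload test.txt
```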

After executing this command, you can see information about the storage server the file was uploaded to, such as group_name, IP address, and port number, as well as the group the file was uploaded to and the path where it is stored within that group.

The file we uploaded went to group1 (because we have only one storage and configured only one group), at the path M00/ED/49/wKiJA19kwRqAI_g6AAAAEbcXlKw7921454. We can go to this directory to check whether the upload succeeded.

![Need to build a high-performance file system? I recommend you try it](https://p3-tt.byteimg.com/origin/pgc-image/509eb9924b8f4e258f5b9b6280605b76?from=pc)

The file has been uploaded successfully, but the directory contains more than just the file we uploaded. The other files are explained below:

filename: the file itself.

filename-m: the file's metadata, such as its type and size, and its width and height if it is a picture.

filename_big: a backup of the file; if active and standby servers exist, it is stored on the standby server.

filename_big-m: metadata of the backup file; if active and standby servers exist, it is stored on the standby server.

Downloading with the FastDFS test client is similar to uploading: fdfs_test <configuration file path> download <group name> <remote file name>.

The following is an example:

![Need to build a high-performance file system? I recommend you try it](https://p6-tt.byteimg.com/origin/pgc-image/33ddb560b1a7493781b513cca9551646?from=pc)
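For example, using the group and path returned by the upload above:

```shell
fdfs_test /etc/fdfs/client.conf download group1 M00/ED/49/wKiJA19kwRqAI_g6AAAAEbcXlKw7921454
```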

The file is downloaded successfully.

For the file deletion test, run fdfs_test <configuration file path> delete <group name> <remote file name>. The following is an example:

![Need to build a high-performance file system? I recommend you try it](https://p6-tt.byteimg.com/origin/pgc-image/a385718ecbea4d42ae0f5848f984f10f?from=pc)
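For example:

```shell
fdfs_test /etc/fdfs/client.conf delete group1 M00/ED/49/wKiJA19kwRqAI_g6AAAAEbcXlKw7921454
```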

Afterwards only the backup files remain; the file itself has been deleted successfully.

FastDFS HTTP access

So far we have only tested file upload, download, and deletion with the FastDFS client test tool, but in practice this is not how files are accessed; we need HTTP access from a browser, and that requires the cooperation of Nginx.

Nginx installation is not covered here; we assume everyone has Nginx installed and go straight to its configuration.

First, download the fastdfs-nginx-module extension module (it is also available from the author's GitHub).

Once the download is complete, upload it to the server and unzip it:

![Need to build a high-performance file system? I recommend you try it](https://p1-tt.byteimg.com/origin/pgc-image/40f229e2993343bbb254eeee16e135c7?from=pc)

Then copy the mod_fastdfs.conf file to /etc/fdfs and modify it as follows:

![Need to build a high-performance file system? I recommend you try it](https://p3-tt.byteimg.com/origin/pgc-image/eb34cf0d484d4498bafe94d725d593ff?from=pc)
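The key items in mod_fastdfs.conf are roughly these (a sketch; the values must match your tracker.conf and storage.conf):

```
# address and port of the tracker
tracker_server = 192.168.137.3:22122
# include the group name in returned file URLs
url_have_group_name = true
# must be identical to store_path0 in storage.conf
store_path0 = /home/fastdfs/storage
```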

After configuration, we need to add this extension module to the existing nginx by recompiling it:

![Need to build a high-performance file system? I recommend you try it](https://p1-tt.byteimg.com/origin/pgc-image/d9c4f20c4710478e9d0795f37f7521af?from=pc)
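A sketch of the rebuild, assuming hypothetical paths for the nginx source and the unpacked module:

```shell
cd /usr/local/nginx-src            # hypothetical nginx source directory
./configure --add-module=/usr/local/fastdfs-nginx-module/src
make && make install
```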

Add a server to the nginx.conf file:

![Need to build a high-performance file system? I recommend you try it](https://p1-tt.byteimg.com/origin/pgc-image/1e970aff4265494986fdc1ec18636c10?from=pc)
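A sketch of the server block (the port and regex are typical examples; ngx_fastdfs_module is the directive the extension registers):

```
server {
    listen 80;
    server_name localhost;

    # hand group file requests to the fastdfs-nginx-module
    location ~ /group[0-9]+/M00 {
        ngx_fastdfs_module;
    }
}
```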

Then restart nginx:

![Need to build a high-performance file system? I recommend you try it](https://p1-tt.byteimg.com/origin/pgc-image/3c170f021abe488bbc6d5f11551f0533?from=pc)

If your Nginx and FastDFS are both running, the file can now be accessed.

![Need to build a high-performance file system? I recommend you try it](https://p3-tt.byteimg.com/origin/pgc-image/72f7a25a238a4fee9125174bf3d71b5c?from=pc)

Access succeeded

fastdfs-nginx-module execution principle

Now that file access works, let's review what we just did: we installed nginx, added the fastdfs-nginx-module extension to it, and then configured the module and nginx. But when configuring the nginx proxy, instead of writing a proxy address into the configuration as usual, we put the fastdfs extension module where the proxy address would normally go. So how does this module work?

In a traditional nginx reverse-proxy configuration, we intercept the request and point it directly at the target server; here it points at the FastDFS module instead, which evidently does that work for us.

Remember that we configured the address of the Tracker Server in the config file for the extension module?

When we request any group, the request is intercepted by Nginx and handed to the extension module. The module forwards it to the Tracker at the tracker_server address we configured; the Tracker consults its local group mapping table and returns, for example, 192.168.137.3:23000 to nginx. The extension module then uses this address to access the storage, obtains the file stream, and returns it to the browser.

![Need to build a high-performance file system? I recommend you try it](https://p6-tt.byteimg.com/origin/pgc-image/4af0df2b36cd4e9b9d1b9d5734a303f9?from=pc)

The extension module executes the process

FastDFS distributed cluster setup

With single-node FastDFS running, some students may be puzzled: so far it does not look very different from our previous file system, and we have seen none of the horizontal scaling and disaster backup mentioned above.

No rush, no rush. It’s coming.

Just now we deployed FastDFS on one machine, tested upload, download, and delete, and finally integrated nginx so that files can be accessed from a browser, learning how the extension module operates along the way. That was to give you a better understanding of FastDFS, but the focus of this article is distributed file systems, whose hallmarks are disaster recovery, backup, scalability, and high availability. So now for the highlight: building a FastDFS distributed cluster.

Architecture diagram

We need to prepare seven Linux VMs to build the cluster: one for Nginx, two Tracker servers, and four Storages. The Storages are divided into two groups of two, in which one machine backs up the other.

The Linux server information prepared here is as follows:

![Need to build a high-performance file system? I recommend you try it](https://p1-tt.byteimg.com/origin/pgc-image/c4b8faf78ace4da39984900b2636e560?from=pc)

Two storages in Group1 back up each other, and two storages in Group2 back up each other.

![Need to build a high-performance file system? I recommend you try it](https://p1-tt.byteimg.com/origin/pgc-image/bbf89eb365b74587bb4519611a6e30cd?from=pc)

Setup

Install Nginx and FastDFS on each server as described above.

You are advised to install the dependency packages with yum before installation:

```shell
yum -y install gcc perl openssl openssl-devel pcre pcre-devel zlib zlib-devel libevent libevent-devel wget net-tools
```

Configure the cluster

The cluster configuration differs slightly from the single-machine configuration above. Since the Tracker and Storage roles are now separated, storage.conf does not need to be configured on the servers running a Tracker, and tracker.conf does not need to be configured on the machines running a Storage.

Items the Trackers (servers 101 and 102) need configured:

![Need to build a high-performance file system? I recommend you try it](https://p1-tt.byteimg.com/origin/pgc-image/4a369a717a5f4a148edcf6f75cfa167c?from=pc)

Items the Storages (servers 103, 104, 105, and 106) need configured:

![Need to build a high-performance file system? I recommend you try it](https://p1-tt.byteimg.com/origin/pgc-image/153e1e5b46c84682a09542c9c1d9914e?from=pc)

Storage configuration

![Need to build a high-performance file system? I recommend you try it](https://p6-tt.byteimg.com/origin/pgc-image/e0d980fe31124c34b1eded538faea09a?from=pc)

Starting the cluster

Start the trackers with fdfs_trackerd <configuration file path>:

![Need to build a high-performance file system? I recommend you try it](https://p1-tt.byteimg.com/origin/pgc-image/657487c08950462b9e9a78df2f86bc51?from=pc)

Tracker startup

Start the storages with fdfs_storaged <configuration file path>:

![Need to build a high-performance file system? I recommend you try it](https://p1-tt.byteimg.com/origin/pgc-image/8e2bc3cf6f4d4422beeb196e80a3d588?from=pc)

You can run fdfs_monitor /etc/fdfs/storage.conf on any storage server to check the status of the entire cluster.

![Need to build a high-performance file system? I recommend you try it](https://p1-tt.byteimg.com/origin/dfic-imagehandler/941fa2de-a07a-443b-a303-46ef4a9d2a45?from=pc)

We can see that the cluster has been set up successfully, along with the status of each storage: its group, IP address, storage space, HTTP port, whether it has started, which tracker it is connected to, and so on.

Cluster testing

Configure the client.conf file on any of the six machines. The configuration items are as follows:

![Need to build a high-performance file system? I recommend you try it](https://p3-tt.byteimg.com/origin/pgc-image/74078b7079664471b371dd083ecf8d59?from=pc)
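A sketch of the cluster client.conf (addresses are examples following the 101/102 server numbering above; both trackers are listed):

```
# directory for client log files (must exist)
base_path = /home/fastdfs/client
# both trackers of the cluster
tracker_server = 192.168.137.101:22122
tracker_server = 192.168.137.102:22122
```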

Then create a file to test the upload function and upload it with fdfs_upload_file. Since we set the upload mode to round-robin, remember to upload several times to see the effect.

![Need to build a high-performance file system? I recommend you try it](https://p1-tt.byteimg.com/origin/pgc-image/bd569d0386174885aaa10a3a2ee86fbe?from=pc)

Uploading files to a Cluster

From the upload results you can see that the two servers within group1 back each other up, and likewise the two within group2.

![Need to build a high-performance file system? I recommend you try it](https://p1-tt.byteimg.com/origin/pgc-image/d6d64255ed274f6891dbd0872818fbf7?from=pc)

Load Balancing Policy

The upload policy we just used is round-robin, so each upload switches to a different group than the last. FastDFS supports three load-balancing policies for choosing a group: round-robin, uploading to a specified group, and uploading to the group with the most free space.

![Need to build a high-performance file system? I recommend you try it](https://p6-tt.byteimg.com/origin/pgc-image/1fa0922ca601460c8c25e20b1f84468c?from=pc)
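These three policies correspond to the store_lookup item in tracker.conf (a sketch; 0 is the round-robin mode used above):

```
# 0: round robin; 1: the group named by store_group; 2: most free space
store_lookup = 0
# only takes effect when store_lookup = 1
store_group = group1
```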

Due to limited space, the other policies are not tested here; interested students can try them on their own.

Access files in the cluster

To recap: when we configured single-node FastDFS above, how did we access files over HTTP?

We compiled the fastdfs extension module into Nginx and made a reverse proxy in nginx point at the module. The module asks our tracker server for the IP and port of the storage server that corresponds to the group, then fetches the file stream from that Storage server and returns it to the browser.

So the FastDFS cluster is the same, we also need nginx to access files, but the configuration is slightly different here.

We configure nginx in three places: on the Trackers, on the Storages, and on the portal server.

Nginx configuration for the Tracker servers:

![Need to build a high-performance file system? I recommend you try it](https://p1-tt.byteimg.com/origin/pgc-image/9e659125c8604bd78f9f742d77c699ef?from=pc)
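On a Tracker, nginx only needs to load-balance across the storages' nginx instances; a sketch with example addresses following the server numbering above:

```
upstream fdfs_storages {
    # the four storage servers' nginx ports (example addresses)
    server 192.168.137.103:80;
    server 192.168.137.104:80;
    server 192.168.137.105:80;
    server 192.168.137.106:80;
}

server {
    listen 80;
    server_name localhost;

    # forward any group file request to a storage's nginx
    location ~ /group[0-9]+/M00 {
        proxy_pass http://fdfs_storages;
    }
}
```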

If nginx's worker processes do not start properly, copy mod_fastdfs.conf, together with the mime.types and http.conf files from the fastdfs source directory, to /etc/fdfs.

Storage Server nginx configuration:

First you need to configure mod_fastdfs.conf

![Need to build a high-performance file system? I recommend you try it](https://p6-tt.byteimg.com/origin/dfic-imagehandler/04b16cce-7b5b-49f8-8ddb-5e2d04c94931?from=pc)
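In the cluster, mod_fastdfs.conf on each storage needs the tracker addresses and its own group; a sketch for a group1 storage (example values):

```
# both trackers of the cluster
tracker_server = 192.168.137.101:22122
tracker_server = 192.168.137.102:22122
# the group this storage belongs to
group_name = group1
url_have_group_name = true
# must match store_path0 in storage.conf
store_path0 = /home/fastdfs/storage
```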

Nginx configuration:

![Need to build a high-performance file system? I recommend you try it](https://p3-tt.byteimg.com/origin/pgc-image/2cc8a9d39c734dba809a1be32146f688?from=pc)

Then start nginx for Storage.

Test the access:

![Need to build a high-performance file system? I recommend you try it](https://p1-tt.byteimg.com/origin/pgc-image/982d391f3e2642fe84801a5581055b87?from=pc)

Test access

Cluster Access Process

In fact, no matter which of the servers we just visited, the file can be accessed normally.

That is because we configured storage load balancing on the trackers and the fastdfs extension module as the reverse proxy on the storages.

Suppose we access a tracker. Its nginx is configured with load balancing, which routes the request to an arbitrary storage. That storage's extension module takes the group from the request, asks a tracker, and the tracker returns the IP and port of a storage in that group.

If we access a storage directly, its extension module carries the group in our URL to the tracker in the same way, likewise obtaining a storage's IP and port.

So as long as the group is correct, no matter which machine you access, you can access the file.

Unified Gateway Configuration

Remember that before building the cluster we said we needed seven machines, yet so far we have used only six; this is where the seventh comes in.

We have assembled the cluster, but as it stands we must remember six IP addresses. Let's go back to the original architecture diagram:

![Need to build a high-performance file system? I recommend you try it](https://p3-tt.byteimg.com/origin/pgc-image/67fdec3985d0447b9083163c2d9888b8?from=pc)

We provide one more Nginx that load-balances across the two trackers; our applications then only need to access this single entry point to use the whole cluster.

Nginx configuration is as follows:

![Need to build a high-performance file system? I recommend you try it](https://p1-tt.byteimg.com/origin/pgc-image/aac6b68eed0f47d28b3a0f29449d5a55?from=pc)
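A sketch of the gateway's nginx configuration (example addresses for the two trackers):

```
upstream fastdfs_trackers {
    # the two trackers' nginx ports (example addresses)
    server 192.168.137.101:80;
    server 192.168.137.102:80;
}

server {
    listen 80;
    server_name localhost;

    # single entry point for the whole cluster
    location ~ /group[0-9]+/M00 {
        proxy_pass http://fastdfs_trackers;
    }
}
```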

Testing:

![Need to build a high-performance file system? I recommend you try it](https://p1-tt.byteimg.com/origin/pgc-image/48c92696bd934cbfb36e8897a1cf0260?from=pc)

Access through the entry point succeeded.

The cluster is set up.

Conclusion

Compared with a traditional file system, a distributed file system offers disaster recovery backup and horizontal expansion, addressing the traditional system's shortcomings. FastDFS, integrated with Nginx, is one such solution and handles the massive file-storage needs of many production environments well.

FastDFS also makes it easy to upload and download files from Java programs, which space does not permit covering here; if you are interested, I will explain how to use FastDFS in real projects in a follow-up post.