An overview of distributed file systems
A distributed file system (DFS), also known as a network file system, is a file system that allows files to be shared across a network among multiple hosts, so that users on multiple machines can share files and storage space.
FastDFS is an open source distributed file system written in C. It fully considers redundant backup, load balancing, and linear scaling, and pays attention to high availability, high performance, and similar metrics. Its features include file storage, file synchronization, and file access (such as file upload and download), solving the problems of large-capacity storage and load balancing. It is especially suitable for small and medium files (recommended range: 4KB < file_size < 500MB) and for online file-based services such as photo album websites and video websites.
FastDFS architecture
The FastDFS architecture includes the Tracker Server and the Storage Server. A client requests the Tracker Server for file uploads and downloads; through the tracker's scheduling, a Storage Server ultimately completes the upload or download.
The Tracker Server
The Tracker Server mainly performs scheduling and plays a load-balancing role. It is responsible for managing all storage servers and groups. After each storage server starts, it connects to the tracker, reports its group and other information, and maintains a periodic heartbeat. Based on the storage heartbeat information, the tracker builds a mapping table of group ==> [storage server list].
A tracker manages very little meta information, all of which is kept in memory. Moreover, the meta information on the tracker is generated from the information reported by the storage servers, so the tracker itself does not need to persist any data, which makes it very easy to scale: simply adding tracker machines expands the service into a tracker cluster. Every tracker in the cluster is completely equivalent; all trackers accept the storage servers' heartbeat information, generate metadata, and provide read and write services.
Storage Server
The Storage Server mainly provides capacity and backup services, organized in units of groups. Each group can contain multiple storage servers that back each other up. Group-based storage facilitates application isolation, load balancing, and customization of the number of replicas (the number of storage servers in a group equals the number of replicas in that group). For example, data from different applications can be isolated by storing it in different groups, and applications can be assigned to different groups for load balancing based on their access characteristics. The drawback is that a group's capacity is limited by the storage capacity of a single machine, and when a machine in a group fails, data recovery can only rely on the other machines in the group, which can take a long time.
Each storage server in a group relies on the local file system and can be configured with multiple data storage directories. For example, if 10 disks are mounted at /data/disk1 through /data/disk10, all 10 directories can be configured as the storage server's data directories. When a storage server receives a file write request, it selects one of the storage directories according to the configured rules. To avoid putting too many files in a single directory, when the storage server starts for the first time it creates two levels of subdirectories in each data directory, 256 at each level, 65536 subdirectories in total. A newly written file is routed by hash to one of these subdirectories, and the file data is then stored as a local file in that directory.
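The hash-based routing described above can be sketched as follows. This is an illustration, not FastDFS's actual implementation: the two-hex-digit directory names follow the file paths shown later in this article (e.g. M00/00/00/...), and the choice of CRC32 as the hash is an assumption for the sketch.

```java
import java.util.zip.CRC32;

public class SubdirRouter {
    // Map a file identifier onto one of the 256 * 256 two-level
    // subdirectories, mimicking (not reproducing) FastDFS's routing.
    public static String route(String fileId) {
        CRC32 crc = new CRC32();
        crc.update(fileId.getBytes());
        long h = crc.getValue();
        int level1 = (int) (h % 256);          // first-level directory
        int level2 = (int) ((h / 256) % 256);  // second-level directory
        return String.format("%02X/%02X", level1, level2);
    }

    public static void main(String[] args) {
        // The same id always routes to the same subdirectory.
        System.out.println(route("rBD8EFqVACuAI9mcAAC_ornlYSU088"));
    }
}
```

Because the mapping is deterministic, a file can later be located from its name alone, without any lookup table.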
FastDFS storage policy
To support large capacity, storage nodes (servers) are organized into volumes (or groups). A storage system consists of one or more volumes whose files are independent of each other. The file capacity of all volumes is the total file capacity of the entire storage system. A volume can be composed of one or more storage servers. All files on the storage servers under a volume are the same. Multiple storage servers in a volume provide redundant backup and load balancing.
When a server is added to a volume, the system automatically synchronizes existing files. After the synchronization is complete, the system automatically switches the new server to online services. When the storage space is insufficient or about to be used up, you can dynamically add volumes. You only need to add one or more servers and configure them as a new volume, thus increasing the capacity of the storage system.
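The capacity model described in this section can be made concrete with a small sketch: every server in a group holds a full copy of the group's files, so a group's usable capacity is bounded by its smallest member, while groups are independent and their capacities add up. The numbers below are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class CapacityModel {
    // A group's capacity is limited by its smallest server, because
    // every server in the group holds a full copy of the group's files.
    public static long groupCapacity(long[] serverCapacitiesGB) {
        return Arrays.stream(serverCapacitiesGB).min().orElse(0);
    }

    // The system's capacity is the sum of its groups' capacities,
    // since files in different groups are independent of each other.
    public static long systemCapacity(List<long[]> groups) {
        return groups.stream().mapToLong(CapacityModel::groupCapacity).sum();
    }

    public static void main(String[] args) {
        List<long[]> groups = Arrays.asList(
                new long[]{500, 500},   // group1: two 500 GB servers -> 500 GB
                new long[]{1000, 800}); // group2: limited by the 800 GB server
        System.out.println(systemCapacity(groups)); // prints 1300
    }
}
```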
FastDFS upload process
FastDFS provides users with basic file access interfaces, such as Upload, Download, Append, and Delete, in the form of client libraries.
Each Storage Server periodically sends its status information to the Tracker Server. If there is more than one tracker in the tracker cluster, the relationship between the trackers is fully peer-to-peer, so the client can select any tracker when uploading.
When the tracker receives a client's request to upload a file, it first assigns a group for the file. After the group is selected, the tracker decides which storage server in the group to assign to the client. After a storage server is allocated, the client sends a file write request to it. The storage server allocates a data storage directory for the file, then assigns a fileid, and finally generates a file name based on the above information to store the file.
Select the tracker server
When there is more than one tracker server in the cluster, the client can choose any tracker when uploading a file, because the relationship between trackers is completely peer-to-peer.
Select the group for the storage
When the tracker receives an upload request, it assigns the file a group that can store it. The following group-selection rules are supported: 1. Round robin: take turns among all groups. 2. Specified group: always use a designated group. 3. Load balance: prefer the group with the most free storage space.
Choose the storage server
After the group is selected, the tracker selects a storage server within the group for the client. The following storage-selection rules are supported: 1. Round robin: take turns among the servers in the group. 2. First server ordered by IP: sort by IP address and choose the first. 3. First server ordered by priority: sort by priority (the priority is configured on the storage server).
Choose the storage path
After a storage server is allocated, the client sends a file write request to it, and the storage server allocates a data storage directory for the file. The following rules are supported: Round robin: take turns among the multiple storage directories. Most free space first: the directory with the most free storage space takes precedence.
Generate a fileid
After the storage directory is selected, the storage server generates a fileid for the file by concatenating the storage server's IP address, the file creation time, the file size, the file's CRC32 checksum, and a random number. The resulting binary string is then base64-encoded into a printable string.
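The composition of the fileid can be sketched as follows. Note that real FastDFS packs these fields in a binary layout before encoding; the textual concatenation and the URL-safe base64 variant here are simplifications for illustration.

```java
import java.util.Base64;
import java.util.zip.CRC32;

public class FileIdSketch {
    // Illustrative only: join the ingredients named in the article
    // (IP, creation time, size, CRC32, random number) and base64-encode
    // them into a printable, filesystem-safe string.
    public static String makeFileId(String storageIp, long createTimeSec,
                                    long fileSize, byte[] content, int random) {
        CRC32 crc = new CRC32();
        crc.update(content);
        String raw = storageIp + "|" + createTimeSec + "|" + fileSize
                + "|" + crc.getValue() + "|" + random;
        // URL-safe base64 avoids '/' characters, which would break paths.
        return Base64.getUrlEncoder().withoutPadding()
                .encodeToString(raw.getBytes());
    }

    public static void main(String[] args) {
        byte[] data = "hello".getBytes();
        System.out.println(makeFileId("192.168.1.190", 1519000000L,
                data.length, data, 42));
    }
}
```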
Select a two-level directory
After selecting a storage directory, the storage assigns a fileID to the file. Each storage directory has two levels of 256 x 256 subdirectories. The storage hashes the file twice based on the fileID, routes the file to one of the subdirectories, and stores the file to the subdirectory with the fileID as the file name.
Generate file name
After a file is stored in a subdirectory, it is considered that the file is successfully stored. Then, a file name is generated for the file. The file name is a combination of group, storage directory, two-level subdirectories, FileID, and file name extension (specified by the client to distinguish file types).
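Putting the pieces together, the final file name can be sketched like this; the M00 store-path marker follows the upload output shown later in this article, and the helper itself is hypothetical.

```java
public class FileNameSketch {
    // Compose the access path the client gets back after an upload:
    // group / store-path marker / level-1 dir / level-2 dir / fileid.ext
    public static String fileName(String group, int storePathIndex,
                                  String level1, String level2,
                                  String fileId, String ext) {
        return String.format("%s/M%02d/%s/%s/%s.%s",
                group, storePathIndex, level1, level2, fileId, ext);
    }

    public static void main(String[] args) {
        System.out.println(fileName("group1", 0, "00", "00",
                "rBD8EFqVACuAI9mcAAC_ornlYSU088", "jpg"));
        // prints group1/M00/00/00/rBD8EFqVACuAI9mcAAC_ornlYSU088.jpg
    }
}
```

Every component of the name is derivable on the storage server itself, which is why FastDFS needs no central index of file locations.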
FastDFS file synchronization
When a file is written to a storage server in a group, the client considers that the file is successfully written. After the storage Server writes the file, the background thread synchronizes the file to other storage servers in the same group.
After each storage writes a file, it also writes a binlog. The binlog does not contain file data, but only file name and other meta information. This binlog is used for background synchronization. Progress is recorded as a timestamp, so it is best to keep the clocks of all servers in the cluster in sync.
The synchronization progress of each storage server is reported to the tracker as part of its metadata, and the tracker uses this progress as a reference when selecting a storage server to serve reads.
For example, suppose a group has three storage servers A, B, and C. A reports that it has synchronized to C all files written before T1, and B reports that it has synchronized to C all files written before T2 (T2 > T1). When the tracker receives this progress information, it takes the smallest value as C's synchronization timestamp, in this case T1 (meaning all data written before T1 has been synchronized to C). Following the same rule, the tracker also derives synchronization timestamps for A and B.
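The rule in this example, where the tracker takes the minimum of the progress values reported by a server's group peers, can be sketched as:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class SyncTimestamp {
    // For a target server, take the minimum timestamp among the progress
    // values its group peers report for it: everything written before
    // that instant is guaranteed to have reached the target.
    public static long syncTimestampFor(Map<String, Long> peerProgress) {
        return Collections.min(peerProgress.values());
    }

    public static void main(String[] args) {
        Map<String, Long> progressToC = new HashMap<>();
        progressToC.put("A", 100L); // A synced files written before t=100 to C
        progressToC.put("B", 150L); // B synced files written before t=150 to C
        System.out.println(syncTimestampFor(progressToC)); // prints 100
    }
}
```

Taking the minimum is the conservative choice: the tracker can only promise a file is on C if every peer has pushed its writes up to that point.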
FastDFS file download
After a client uploads a file successfully, it receives the file name generated by the storage server, and can then access the file by that name.
As with uploads, the client can select any tracker server when downloading. The client sends a download request to a tracker, specifying the file name. From the file name the tracker parses information such as the file's group, size, and creation time, and then selects a storage server to serve the read request.
FastDFS installation
| Package | Version |
| --- | --- |
| FastDFS | v5.05 |
| libfastcommon | v1.0.7 |
Download and install libfastcommon

- Download:

wget https://github.com/happyfish100/libfastcommon/archive/V1.0.7.tar.gz

- Unpack:

tar -xvf V1.0.7.tar.gz
cd libfastcommon-1.0.7

- Compile and install:

./make.sh
./make.sh install

- Create soft links:

ln -s /usr/lib64/libfastcommon.so /usr/local/lib/libfastcommon.so
ln -s /usr/lib64/libfastcommon.so /usr/lib/libfastcommon.so
ln -s /usr/lib64/libfdfsclient.so /usr/local/lib/libfdfsclient.so
ln -s /usr/lib64/libfdfsclient.so /usr/lib/libfdfsclient.so
Download and install FastDFS
- Download FastDFS:

wget https://github.com/happyfish100/fastdfs/archive/V5.05.tar.gz

- Unpack:

tar -xvf V5.05.tar.gz
cd fastdfs-5.05

- Compile and install:

./make.sh
./make.sh install
Configuring the Tracker Service
After the above installation succeeds, there will be an fdfs directory under /etc containing sample configuration files. Copy the tracker.conf.sample file to tracker.conf and modify it:
cp tracker.conf.sample tracker.conf
vi tracker.conf
Edit tracker.conf:

# Whether this config file is disabled; false means it takes effect
disabled=false
# Port on which the tracker service listens
port=22122
# Tracker data and log directory
base_path=/home/data/fastdfs
# HTTP service port
http.server_port=80
Create the base data directory for tracker, the directory corresponding to base_path
mkdir -p /home/data/fastdfs
Use ln -s to establish a soft link
ln -s /usr/bin/fdfs_trackerd /usr/local/bin
ln -s /usr/bin/stop.sh /usr/local/bin
ln -s /usr/bin/restart.sh /usr/local/bin
Start the service
service fdfs_trackerd start
Check the listening ports:

netstat -unltp | grep fdfs

If port 22122 is being listened on, the Tracker service started successfully.
Tracker server directory and file structure: after the Tracker service starts successfully, the data and logs directories are created under base_path. The directory structure is as follows:
${base_path}
|__data
|   |__storage_groups.dat: storage group information
|   |__storage_servers.dat: storage server list
|__logs
|   |__trackerd.log: tracker server log file
Configuring Storage Services
Go to the /etc/fdfs directory, copy the FastDFS storage sample configuration file storage.conf.sample, and name it storage.conf
# cd /etc/fdfs
# cp storage.conf.sample storage.conf
# vi storage.conf
Edit storage.conf:

# Whether this config file is disabled; false means it takes effect
disabled=false
# Storage server group (volume) name
group_name=group1
# Storage server service port
port=23000
# Heartbeat interval, in seconds (the storage actively sends heartbeats to the tracker server)
heart_beat_interval=30
# Base data directory; it must exist, and subdirectories are created automatically
base_path=/home/data/fastdfs/storage
# Number of base paths for storing files; usually only one directory is configured
store_path_count=1
# Configure store_path_count paths one by one, with 0-based index numbers.
# If store_path0 is not configured, it defaults to base_path.
store_path0=/home/data/fastdfs/storage
# FastDFS uses two levels of directories to store files.
# If set to N (e.g. 256), the storage server automatically creates N * N
# subdirectories under store_path for storing files when it first runs.
subdir_count_per_path=256
# tracker_server list; the storage actively connects to each tracker server.
# If there are multiple tracker servers, write one line per server.
tracker_server=192.168.1.190:22122
# Time window during which synchronization is allowed (default: all day),
# used to avoid synchronization during peak hours.
sync_start_time=00:00
sync_end_time=23:59
Use ln -s to establish a soft link
ln -s /usr/bin/fdfs_storaged /usr/local/bin
Start the service
service fdfs_storaged start
Check the listening ports:

netstat -unltp | grep fdfs

Make sure the Tracker is running before starting Storage. On the first successful start, the data and logs directories are created under /home/data/fastdfs/storage. If port 23000 is being listened on, the Storage service started successfully.
Check whether the Storage and Tracker are communicating
/usr/bin/fdfs_monitor /etc/fdfs/storage.conf
FastDFS configures the Nginx module
| Package | Version |
| --- | --- |
| openresty | v1.13.6.1 |
| fastdfs-nginx-module | v1.1.6 |
FastDFS uses the Tracker server to store files on the Storage server. However, files need to be replicated between Storage servers in the same group, resulting in synchronization delay.
Suppose the tracker directs an upload to the storage server 192.168.1.190 and the file ID is returned to the client. FastDFS's storage cluster mechanism then synchronizes the file to the other storage servers in the group in the background. If a client uses this file ID to fetch the file from a server that has not yet finished synchronizing, the file cannot be accessed. fastdfs-nginx-module redirects such a request to the source server to retrieve the file, avoiding file-access failures on the client caused by replication delay.
Install nginx and fastdfs-nginx-module:
It is recommended that you install the following development libraries using yum:
yum install readline-devel pcre-devel openssl-devel -y
Download the latest versions and unpack them:
wget https://openresty.org/download/openresty-1.13.6.1.tar.gz
tar -xvf openresty-1.13.6.1.tar.gz
wget https://github.com/happyfish100/fastdfs-nginx-module/archive/master.zip
unzip master.zip
Install nginx and add the fastdfs-nginx-module module:
./configure --add-module=../fastdfs-nginx-module-master/src/
Compile, install:
make && make install
View Nginx modules:
/usr/local/openresty/nginx/sbin/nginx -V
If the fastdfs-nginx-module path appears in the configure arguments of the output, the module was added successfully.
Copy the fastdfs-nginx-module configuration file to /etc/fdfs and modify it:
cp /fastdfs-nginx-module/src/mod_fastdfs.conf /etc/fdfs/
# Connection timeout
connect_timeout=10
# Tracker server address
tracker_server=192.168.1.190:22122
# Storage server default port
storage_server_port=23000
# Set to true if the URI of the file ID contains /group**
url_have_group_name = true
# Must be the same as the store_path0 configured in storage.conf
store_path0=/home/data/fastdfs/storage
Copy some FastDFS configuration files to /etc/fdfs directory:
cp /fastdfs-nginx-module/src/http.conf /etc/fdfs/
cp /fastdfs-nginx-module/src/mime.types /etc/fdfs/
Configure nginx, modify nginx.conf:
location ~/group([0-9])/M00 {
ngx_fastdfs_module;
}
Nginx start:
[root@iz2ze7tgu9zb2gr6av1tysz sbin]# ./nginx
ngx_http_fastdfs_set pid=9236
Test upload:
[root@iz2ze7tgu9zb2gr6av1tysz fdfs]# /usr/bin/fdfs_upload_file /etc/fdfs/client.conf /etc/fdfs/4.jpg
group1/M00/00/00/rBD8EFqVACuAI9mcAAC_ornlYSU088.jpg
Deployment structure diagram:
Java client integration
Add the dependency to pom.xml:
<!-- fastdfs -->
<dependency>
    <groupId>org.csource</groupId>
    <artifactId>fastdfs-client-java</artifactId>
    <version>1.27</version>
</dependency>
fdfs_client.conf configuration:

# Timeout for connecting to the tracker server, in seconds
connect_timeout = 2
# Socket (network) timeout, in seconds
network_timeout = 30
# File content charset
charset = UTF-8
# Tracker server HTTP port
http.tracker_http_port = 8080
http.anti_steal_token = no
http.secret_key = FastDFS1234567890
# Tracker server IP address and port
tracker_server = 192.168.1.190:22122
FastDFSClient upload class:
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

import org.apache.commons.io.IOUtils;
import org.csource.common.MyException;
import org.csource.common.NameValuePair;
import org.csource.fastdfs.ClientGlobal;
import org.csource.fastdfs.StorageClient;
import org.csource.fastdfs.StorageServer;
import org.csource.fastdfs.TrackerClient;
import org.csource.fastdfs.TrackerServer;

public class FastDFSClient {
private static final String CONFIG_FILENAME = "D:\\itstyle\\src\\main\\resources\\fdfs_client.conf";
private static final String GROUP_NAME = "market1";
private TrackerClient trackerClient = null;
private TrackerServer trackerServer = null;
private StorageServer storageServer = null;
private StorageClient storageClient = null;
static{
try {
ClientGlobal.init(CONFIG_FILENAME);
} catch (IOException e) {
e.printStackTrace();
} catch (MyException e) {
e.printStackTrace();
}
}
public FastDFSClient() throws Exception {
trackerClient = new TrackerClient(ClientGlobal.g_tracker_group);
trackerServer = trackerClient.getConnection();
storageServer = trackerClient.getStoreStorage(trackerServer);
storageClient = new StorageClient(trackerServer, storageServer);
}
/**
* Upload a file
* @param file file object
* @param fileName the file name
* @return
*/
public String[] uploadFile(File file, String fileName) {
return uploadFile(file,fileName,null);
}
/**
* Upload a file
* @param file file object
* @param fileName the file name
* @param metaList file metadata
* @return
*/
public String[] uploadFile(File file, String fileName, Map<String,String> metaList) {
try {
byte[] buff = IOUtils.toByteArray(new FileInputStream(file));
NameValuePair[] nameValuePairs = null;
if (metaList != null) {
nameValuePairs = new NameValuePair[metaList.size()];
int index = 0;
for (Iterator<Map.Entry<String,String>> iterator = metaList.entrySet().iterator(); iterator.hasNext();) {
Map.Entry<String,String> entry = iterator.next();
String name = entry.getKey();
String value = entry.getValue();
nameValuePairs[index++] = new NameValuePair(name,value);
}
}
return storageClient.upload_file(GROUP_NAME,buff,fileName,nameValuePairs);
} catch (Exception e) {
e.printStackTrace();
}
return null;
}
/**
* Get file metadata
* @param fileId the file ID
* @return
*/
public Map<String,String> getFileMetadata(String groupname,String fileId) {
try {
NameValuePair[] metaList = storageClient.get_metadata(groupname,fileId);
if (metaList != null) {
HashMap<String,String> map = new HashMap<String, String>();
for (NameValuePair metaItem : metaList) {
map.put(metaItem.getName(),metaItem.getValue());
}
return map;
}
} catch (Exception e) {
e.printStackTrace();
}
return null;
}
/**
* Delete a file
* @param fileId the file ID
* @return -1 on delete failure, 0 otherwise
*/
public int deleteFile(String groupname,String fileId) {
try {
return storageClient.delete_file(groupname,fileId);
} catch (Exception e) {
e.printStackTrace();
}
return -1;
}
/**
* Download a file
* @param fileId the file ID (returned after a successful upload)
* @param outFile the download destination
* @return
*/
public int downloadFile(String groupName,String fileId, File outFile) {
FileOutputStream fos = null;
try {
byte[] content = storageClient.download_file(groupName,fileId);
fos = new FileOutputStream(outFile);
InputStream ips = new ByteArrayInputStream(content);
IOUtils.copy(ips,fos);
return 0;
} catch (Exception e) {
e.printStackTrace();
} finally {
if (fos != null) {
try {
fos.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
return -1;
}
public static void main(String[] args) throws Exception {
FastDFSClient client = new FastDFSClient();
File file = new File("D:\\23456.png");
String[] result = client.uploadFile(file, "png");
System.out.println(result.length);
System.out.println(result[0]);
System.out.println(result[1]);
}
}
Executing the main method returns:
2
group1
M00/00/00/rBD8EFqTrNyAWyAkAAKCRJfpzAQ227.png
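The two strings returned by the upload (the group name and the remote file name) are typically joined with the host running the fastdfs-nginx-module to form a download URL. A minimal sketch, assuming the nginx host from this article's configuration:

```java
public class DownloadUrl {
    // upload_file returns {groupName, remoteFileName}; join them with the
    // nginx host serving the fastdfs-nginx-module to get a download URL.
    public static String toUrl(String nginxHost, String[] uploadResult) {
        return "http://" + nginxHost + "/" + uploadResult[0] + "/" + uploadResult[1];
    }

    public static void main(String[] args) {
        String[] result = {"group1", "M00/00/00/rBD8EFqTrNyAWyAkAAKCRJfpzAQ227.png"};
        System.out.println(toUrl("192.168.1.190", result));
        // prints http://192.168.1.190/group1/M00/00/00/rBD8EFqTrNyAWyAkAAKCRJfpzAQ227.png
    }
}
```

This matches the nginx location block configured earlier, which routes /group([0-9])/M00 requests to ngx_fastdfs_module.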
Source:
https://gitee.com/52itstyle/spring-boot-fastdfs