An overview of FastDFS

A distributed file system (DFS), also called a network file system, is a file system that allows files to be shared across multiple hosts over a network, so that multiple users on multiple machines can share files and storage space.

FastDFS is an open-source distributed file system written in C. It takes full account of redundant backup, load balancing, and linear scaling, and focuses on indicators such as high availability and high performance. Its features include file storage, file synchronization, and file access (file upload and download), and it solves the problems of large-capacity storage and load balancing. It is especially suitable for small and medium-sized files (recommended range: 4KB < file_size < 500MB) and for online services built around files, such as photo album websites and video websites.

FastDFS architecture

The FastDFS architecture consists of Tracker Servers and Storage Servers. The client requests the Tracker Server for file uploads and downloads; scheduled by the Tracker Server, the Storage Server actually completes the upload or download.

The Tracker Server

The Tracker Server mainly performs scheduling and plays a load-balancing role. It is responsible for managing all storage servers and groups. After each storage server starts, it connects to the Tracker, reports its group and other information, and maintains a periodic heartbeat. Based on the storage heartbeat information, the Tracker builds a mapping table of group ==> [Storage Server list].

The Tracker manages very little meta information, all of which is kept in memory. Moreover, the meta information on the Tracker is generated from what the storage servers report, so the Tracker itself does not need to persist any data. This makes the Tracker very easy to scale out: simply adding Tracker machines expands the service into a Tracker cluster. Every Tracker in the cluster is completely equivalent; all Trackers accept heartbeat information from the storage servers, generate metadata, and provide read and write services.
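
As a minimal illustration (hypothetical Java types; the real tracker is written in C), the in-memory group ==> [Storage Server list] table built from heartbeats could look like this:

import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Sketch of a tracker-style in-memory group ==> [Storage Server list] table,
// rebuilt purely from the heartbeats that storage servers report.
public class GroupTable {

    private final Map<String, CopyOnWriteArrayList<String>> groupToStorages = new ConcurrentHashMap<>();

    // Called for every heartbeat: remember which group the storage server belongs to.
    public void onHeartbeat(String groupName, String storageAddr) {
        groupToStorages
                .computeIfAbsent(groupName, g -> new CopyOnWriteArrayList<>())
                .addIfAbsent(storageAddr);
    }

    public List<String> storagesOf(String groupName) {
        return groupToStorages.getOrDefault(groupName, new CopyOnWriteArrayList<>());
    }
}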

Storage Server

The Storage Server mainly provides capacity and backup services, organized in units of groups. Each group can contain multiple storage servers that back each other up. Grouped storage makes it easy to isolate applications, balance load, and customize the number of replicas (the number of storage servers in a group is the number of replicas in that group). For example, data from different applications can be isolated by storing it in different groups, and applications can be assigned to different groups according to their access characteristics to balance the load. The disadvantages are that a group's capacity is limited by the storage capacity of a single machine, and that when a machine in a group fails, data recovery can only rely on the other machines in the group, so recovery takes a long time.

Each storage server in a group stores files on its local file system, and a storage server can be configured with multiple data storage directories. For example, if 10 disks are mounted at /data/disk1 through /data/disk10, all 10 directories can be configured as the storage server's data storage directories. When the storage server receives a file write request, it selects one of the storage directories to store the file according to the configured rules. To avoid having too many files in a single directory, when the storage server starts for the first time it creates two levels of subdirectories under each data storage directory, 256 at each level, 65,536 in total. A newly written file is routed to one of these subdirectories by a hash, and the file data is then stored as a local file in that directory.

FastDFS storage policy

To support large capacity, storage nodes (servers) are organized into volumes (also called groups). A storage system consists of one or more volumes, and the files in different volumes are independent of each other; the file capacity of the whole storage system is the sum of the file capacities of all volumes. A volume can consist of one or more storage servers, and the files on all storage servers in a volume are identical; the multiple storage servers in a volume provide redundant backup and load balancing for each other.

When a server is added to a volume, the system automatically synchronizes the existing files to it; after synchronization completes, the system automatically switches the new server to online service. When the storage space is insufficient or about to run out, volumes can be added dynamically: simply add one or more servers and configure them as a new volume, which increases the capacity of the storage system.

FastDFS upload process

FastDFS provides users with basic file access interfaces, such as Upload, Download, Append, and Delete, in the form of client libraries.

The Storage Server periodically sends its storage information to the Tracker Server. If there is more than one Tracker Server in the Tracker cluster, the Trackers are peers of each other, so the client can select any Tracker when uploading.

When the Tracker receives an upload request from the client, it assigns a group for the file. Once the group is selected, the Tracker decides which storage server in that group to assign to the client. After a storage server has been allocated, the client sends a file write request to it. The storage server allocates a data storage directory for the file, then assigns a fileid, and finally generates a file name from all of the above information to store the file.
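
These steps map directly onto the client library used in the "JAVA client integration" section at the end of this article. A minimal upload sketch with fastdfs-client-java (the configuration file path is a placeholder and error handling is omitted):

import org.csource.common.NameValuePair;
import org.csource.fastdfs.ClientGlobal;
import org.csource.fastdfs.StorageClient;
import org.csource.fastdfs.StorageServer;
import org.csource.fastdfs.TrackerClient;
import org.csource.fastdfs.TrackerServer;

public class UploadSketch {
    public static void main(String[] args) throws Exception {
        // Tracker addresses and timeouts come from fdfs_client.conf (shown later in this article).
        ClientGlobal.init("fdfs_client.conf");

        TrackerClient trackerClient = new TrackerClient(ClientGlobal.g_tracker_group); // any tracker will do
        TrackerServer trackerServer = trackerClient.getConnection();
        StorageServer storageServer = trackerClient.getStoreStorage(trackerServer);    // tracker assigns group + storage
        StorageClient storageClient = new StorageClient(trackerServer, storageServer);

        // The storage server chooses a storage path, generates a fileid and
        // returns {group name, remote file name}.
        byte[] content = "hello fastdfs".getBytes("UTF-8");
        NameValuePair[] meta = null; // no metadata
        String[] result = storageClient.upload_file(content, "txt", meta);
        System.out.println(result[0] + "/" + result[1]); // e.g. group1/M00/00/00/xxxx.txt
    }
}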

Select the tracker server

When there is more than one Tracker server in the cluster, the client can choose any Tracker when uploading a file, because the relationship between Trackers is completely peer-to-peer.

Select the group for the storage

When the tracker receives an upload request, it assigns a group that can store the file. The following group-selection rules are supported: 1. Round robin: rotate among all groups. 2. Specified group: a particular group is designated. 3. Load balance: the group with the most free storage space takes precedence.

Choose the storage server

After the group is selected, the tracker selects a storage server within the group for the client. The following storage-selection rules are supported: 1. Round robin: rotate among all storage servers in the group. 2. First server ordered by IP: sort by IP address. 3. First server ordered by priority: sort by priority (the priority is configured on the storage server).

Choose the storage path

After a storage server is allocated, the client sends a file write request to it. The storage server allocates a data storage directory for the file. The following rules are supported: 1. Round robin: rotate among the multiple storage directories. 2. The directory with the most free storage space takes precedence.

Generate a fileid

After the storage directory is selected, the storage server generates a fileid for the file, which is a concatenation of the storage server's IP address, the file creation time, the file size, the file CRC32, and a random number. This binary string is then base64-encoded into a printable string.
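
Purely as an illustration of the idea (an assumption-laden sketch: the exact byte layout, field encoding, and base64 alphabet FastDFS actually uses are different), a fileid-style string could be produced like this:

import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.concurrent.ThreadLocalRandom;
import java.util.zip.CRC32;

public class FileIdSketch {
    // Concatenate ip, create time, size, crc32 and a random number, then
    // base64-encode the bytes into a printable string.
    static String makeFileId(int ipAsInt, byte[] content) {
        CRC32 crc = new CRC32();
        crc.update(content);
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 + 8 + 4 + 4);
        buf.putInt(ipAsInt);
        buf.putInt((int) (System.currentTimeMillis() / 1000)); // create time (seconds)
        buf.putLong(content.length);                           // file size
        buf.putInt((int) crc.getValue());                      // crc32 of the content
        buf.putInt(ThreadLocalRandom.current().nextInt());     // random number
        return Base64.getUrlEncoder().withoutPadding().encodeToString(buf.array());
    }

    public static void main(String[] args) {
        System.out.println(makeFileId(0xC0A801BE, "hello".getBytes())); // 0xC0A801BE = 192.168.1.190
    }
}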

Select a two-level directory

After the storage directory is selected, the storage server assigns the fileid to the file. Each storage directory contains two levels of 256 x 256 subdirectories. The storage server hashes the fileid twice, routes the file to one of the subdirectories, and then stores the file in that subdirectory with the fileid as the file name.
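
As a rough sketch (FastDFS uses its own hash function; this only illustrates the idea of mapping a fileid onto one of the 256 x 256 subdirectories):

public class SubdirRouting {
    // Map a fileid to a two-level subdirectory such as "3F/A7".
    static String subdirFor(String fileid) {
        int h = fileid.hashCode();
        int level1 = (h >>> 16) & 0xFF; // first-level directory, 0..255
        int level2 = h & 0xFF;          // second-level directory, 0..255
        return String.format("%02X/%02X", level1, level2);
    }

    public static void main(String[] args) {
        System.out.println(subdirFor("rBD8EFqVACuAI9mcAAC_ornlYSU088"));
    }
}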

Generate file name

Once the file is stored in a subdirectory, the file is considered successfully stored, and a file name is then generated for it. The file name is composed of the group, the storage directory, the two-level subdirectories, the fileid, and the file name extension (specified by the client, used to distinguish file types). For example, in the path group1/M00/00/00/rBD8EFqVACuAI9mcAAC_ornlYSU088.jpg returned by the upload test later in this article, group1 is the group, M00 is the storage directory (store_path0), 00/00 are the two-level subdirectories, and the rest is the fileid plus the .jpg extension.

FastDFS file synchronization

When a file is written to one storage server in a group, the client considers the file to have been written successfully. After the storage server writes the file, a background thread synchronizes it to the other storage servers in the same group.

After each storage writes a file, it also writes a binlog. The binlog does not contain file data, but only file name and other meta information. This binlog is used for background synchronization. Progress is recorded as a timestamp, so it is best to keep the clocks of all servers in the cluster in sync.

Each storage server reports its synchronization progress to the tracker as part of its metadata. The tracker uses this synchronization progress as a reference when selecting a storage server to serve a read.

For example, suppose a group has three storage servers A, B, and C. A has synchronized to C up to timestamp T1 (all files A wrote before T1 have been synchronized to C), and B has synchronized to C up to timestamp T2 (T2 > T1). When the tracker receives this synchronization progress information, it takes the smallest value as C's synchronization timestamp; in this example that is T1 (meaning all data written before T1 has been synchronized to C). Following the same rule, the tracker also derives synchronization timestamps for A and B.
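
A minimal sketch of this rule (hypothetical types and values; the real tracker is written in C), taking the minimum progress reported by the peers in the group:

import java.util.Collections;
import java.util.Map;

public class SyncTimestamp {
    // reportedProgress maps "peer server -> timestamp up to which that peer has synced to the target server"
    static long syncTimestampFor(Map<String, Long> reportedProgress) {
        return Collections.min(reportedProgress.values());
    }

    public static void main(String[] args) {
        // A has synced to C up to T1=1000, B has synced to C up to T2=2000.
        Map<String, Long> progressToC = Map.of("A", 1000L, "B", 2000L);
        System.out.println(syncTimestampFor(progressToC)); // 1000: C's synchronization timestamp is T1
    }
}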

FastDFS file download

After the client uploads a file successfully, it receives a file name generated by the storage server, and it can then access the file by that name.

As with uploads, the client can select any Tracker server when downloading a file. When the client sends a download request to a tracker, it must carry the file name. The tracker parses information such as the file's group, size, and creation time out of the file name and then selects a storage server to serve the read request.
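
As an example, reusing the FastDFSClient wrapper class shown in the "JAVA client integration" section below (the group name and file id are the two values returned by a successful upload; the local output path is a placeholder):

import java.io.File;

public class DownloadSketch {
    public static void main(String[] args) throws Exception {
        FastDFSClient client = new FastDFSClient();
        // Group name + remote file name as returned by the upload example later in this article.
        int ret = client.downloadFile("group1",
                "M00/00/00/rBD8EFqTrNyAWyAkAAKCRJfpzAQ227.png",
                new File("D:\\download\\23456.png"));
        System.out.println(ret == 0 ? "download ok" : "download failed");
    }
}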

FastDFS performance solution

FastDFS installation

The software package version
FastDFS v5.05
libfastcommon v1.0.7

Download and install libfastcommon

  • Download
wget https://github.com/happyfish100/libfastcommon/archive/V1.0.7.tar.gz
  • Unpack
tar -xvf V1.0.7.tar.gz
cd libfastcommon-1.0.7
  • Compile and install
./make.sh
./make.sh install
  • Create soft links
ln -s /usr/lib64/libfastcommon.so /usr/local/lib/libfastcommon.so
ln -s /usr/lib64/libfastcommon.so /usr/lib/libfastcommon.so
ln -s /usr/lib64/libfdfsclient.so /usr/local/lib/libfdfsclient.so
ln -s /usr/lib64/libfdfsclient.so /usr/lib/libfdfsclient.so

Download and install FastDFS

  • Download FastDFS
wget https://github.com/happyfish100/fastdfs/archive/V5.05.tar.gz
  • Unpack
tar -xvf V5.05.tar.gz
cd fastdfs-5.05
  • Compile and install
./make.sh
./make.sh install

Configuring the Tracker Service

After the installation above succeeds, there will be an fdfs directory under /etc; go into it. It contains the sample configuration files provided by the author. We need to copy tracker.conf.sample to tracker.conf and modify it:

cp tracker.conf.sample tracker.conf
vi tracker.conf

Edit tracker.conf

# Whether this configuration file is disabled
disabled=false
# Service port
port=22122
# Tracker data and log directory
base_path=/home/data/fastdfs
# HTTP service port
http.server_port=80

Create the base data directory for tracker, the directory corresponding to base_path

mkdir -p /home/data/fastdfs

Use ln -s to establish a soft link

ln -s /usr/bin/fdfs_trackerd /usr/local/bin
ln -s /usr/bin/stop.sh /usr/local/bin
ln -s /usr/bin/restart.sh /usr/local/bin

Start the service

service fdfs_trackerd start

Check the listening port

netstat -unltp|grep fdfs

If port 22122 is being listened on, the Tracker service has started successfully.

Tracker server directory and file structure

After the Tracker service starts successfully, the data and logs directories are created under base_path. The directory structure is as follows:

${base_path}
  |__data
  |   |__storage_groups.dat: storage group information
  |   |__storage_servers.dat: storage server list
  |__logs
      |__trackerd.log: tracker server log file

Configuring Storage Services

Go to the /etc/fdfs directory, copy the FastDFS storage sample configuration file storage.conf.sample, and name it storage.conf

# cd /etc/fdfs
# cp storage.conf.sample storage.conf
# vi storage.conf

Edit storage.conf

# Whether this configuration file is disabled
disabled=false
# Group name
group_name=group1
# Storage service port
port=23000
# Heartbeat interval to the tracker, in seconds
heart_beat_interval=30
# Storage data and log directory (the root directory must already exist; subdirectories are generated automatically)
base_path=/home/data/fastdfs/storage
# The storage server supports multiple paths for storing files. Set the number of base paths here; usually only one directory is configured.
store_path_count=1
# Configure the store_path_count paths one by one; the index starts at 0. If store_path0 is not configured, it defaults to the same path as base_path.
store_path0=/home/data/fastdfs/storage
# FastDFS stores files using two-level directories. Set the number of directories per level here.
# If set to N (e.g. 256), the storage server automatically creates N * N subdirectories under store_path to hold files when it first runs.
subdir_count_per_path=256
# tracker_server list; the storage server actively connects to each tracker_server. Write one line per tracker server.
tracker_server=192.168.1.190:22122
# Daily time window during which synchronization is allowed, used to avoid problems caused by peak-hour synchronization.
sync_start_time=00:00
sync_end_time=23:59

Use ln -s to establish a soft link

ln -s /usr/bin/fdfs_storaged /usr/local/bin

Start the service

service fdfs_storaged start

Check the listening port

netstat -unltp|grep fdfs

Make sure the Tracker is running before starting Storage. On the first successful start, the data and logs directories are created under /home/data/fastdfs/storage. If port 23000 is being listened on, the Storage service has started successfully.

Check whether the Storage and Tracker are communicating

/usr/bin/fdfs_monitor /etc/fdfs/storage.conf

FastDFS configures the Nginx module

The software package version
openresty v1.13.6.1
fastdfs-nginx-module v1.1.6

FastDFS stores files on Storage servers via the Tracker server, but files need to be replicated between the Storage servers in the same group, which introduces a synchronization delay.

Suppose the Tracker server uploads a file to 192.168.1.190 and the file ID has already been returned to the client. The FastDFS storage cluster mechanism then synchronizes this file to the other Storage servers in the same group. If a client uses this file ID to fetch the file from a server that has not finished replicating it yet, the file cannot be accessed. fastdfs-nginx-module redirects such a request to the source server to retrieve the file, avoiding the access failures on the client caused by replication delay.

Nginx and fastdfs-nginx-module:

It is recommended that you install the following development libraries using yum:

yum install readline-devel pcre-devel openssl-devel -y

Download the latest version and unzip:

wget https://openresty.org/download/openresty-1.13.6.1.tar.gz
tar -xvf openresty-1.13.6.1.tar.gz
wget https://github.com/happyfish100/fastdfs-nginx-module/archive/master.zip
unzip master.zip

Install nginx and add the fastdfs-nginx-module module:

cd openresty-1.13.6.1
./configure --add-module=../fastdfs-nginx-module-master/src/

Compile, install:

make && make install

View Nginx modules:

/usr/local/openresty/nginx/sbin/nginx -V

If the configure arguments in the output include fastdfs-nginx-module, the module was added successfully.

Copy the fastdfs-nginx-module configuration file to /etc/fdfs and modify it:

cp /fastdfs-nginx-module/src/mod_fastdfs.conf /etc/fdfs/
# Connection timeout
connect_timeout=10
# Tracker server
tracker_server=192.168.1.190:22122
# Storage server default port
storage_server_port=23000
# If the uri of the file ID contains /group**, set this to true
url_have_group_name = true
# Storage path; must be the same as store_path0 in storage.conf
store_path0=/home/data/fastdfs/storage

Copy some FastDFS configuration files to /etc/fdfs directory:

cp /fastdfs-nginx-module/src/http.conf /etc/fdfs/
cp /fastdfs-nginx-module/src/mime.types /etc/fdfs/

Configure nginx, modify nginx.conf:

location ~/group([0-9])/M00 {
    ngx_fastdfs_module;
}

Start Nginx:

[root@iz2ze7tgu9zb2gr6av1tysz sbin]# ./nginx
ngx_http_fastdfs_set pid=9236

Test upload:

[root@iz2ze7tgu9zb2gr6av1tysz fdfs]# /usr/bin/fdfs_upload_file /etc/fdfs/client.conf /etc/fdfs/4.jpg
group1/M00/00/00/rBD8EFqVACuAI9mcAAC_ornlYSU088.jpg

Deployment structure diagram:

JAVA client integration

Add the dependency to pom.xml:

<!-- fastdfs -->
<dependency>
    <groupId>org.csource</groupId>
    <artifactId>fastdfs-client-java</artifactId>
    <version>1.27</version>
</dependency>

fdfs_client.conf configuration:

# Connection timeout
connect_timeout = 2
# Socket timeout
network_timeout = 30
# File content encoding
charset = UTF-8
# Tracker server HTTP port
http.tracker_http_port = 8080
http.anti_steal_token = no
http.secret_key = FastDFS1234567890
# Tracker server address
tracker_server = 192.168.1.190:22122

FastDFSClient upload class:

import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

import org.apache.commons.io.IOUtils;
import org.csource.common.MyException;
import org.csource.common.NameValuePair;
import org.csource.fastdfs.ClientGlobal;
import org.csource.fastdfs.StorageClient;
import org.csource.fastdfs.StorageServer;
import org.csource.fastdfs.TrackerClient;
import org.csource.fastdfs.TrackerServer;

public class FastDFSClient {

    private static final String CONFIG_FILENAME = "D:\\itstyle\\src\\main\\resources\\fdfs_client.conf";
    private static final String GROUP_NAME = "market1";

    private TrackerClient trackerClient = null;
    private TrackerServer trackerServer = null;
    private StorageServer storageServer = null;
    private StorageClient storageClient = null;

    static {
        try {
            ClientGlobal.init(CONFIG_FILENAME);
        } catch (IOException e) {
            e.printStackTrace();
        } catch (MyException e) {
            e.printStackTrace();
        }
    }

    public FastDFSClient() throws Exception {
        trackerClient = new TrackerClient(ClientGlobal.g_tracker_group);
        trackerServer = trackerClient.getConnection();
        storageServer = trackerClient.getStoreStorage(trackerServer);
        storageClient = new StorageClient(trackerServer, storageServer);
    }

    /**
     * Upload a file.
     * @param file     file object
     * @param fileName file name
     * @return the group name and remote file name, or null on failure
     */
    public String[] uploadFile(File file, String fileName) {
        return uploadFile(file, fileName, null);
    }

    /**
     * Upload a file.
     * @param file     file object
     * @param fileName file name
     * @param metaList file metadata
     * @return the group name and remote file name, or null on failure
     */
    public String[] uploadFile(File file, String fileName, Map<String, String> metaList) {
        try {
            byte[] buff = IOUtils.toByteArray(new FileInputStream(file));
            NameValuePair[] nameValuePairs = null;
            if (metaList != null) {
                nameValuePairs = new NameValuePair[metaList.size()];
                int index = 0;
                for (Iterator<Map.Entry<String, String>> iterator = metaList.entrySet().iterator(); iterator.hasNext();) {
                    Map.Entry<String, String> entry = iterator.next();
                    String name = entry.getKey();
                    String value = entry.getValue();
                    nameValuePairs[index++] = new NameValuePair(name, value);
                }
            }
            return storageClient.upload_file(GROUP_NAME, buff, fileName, nameValuePairs);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return null;
    }

    /**
     * Get file metadata.
     * @param groupName group name
     * @param fileId    file id
     * @return the metadata map, or null on failure
     */
    public Map<String, String> getFileMetadata(String groupName, String fileId) {
        try {
            NameValuePair[] metaList = storageClient.get_metadata(groupName, fileId);
            if (metaList != null) {
                HashMap<String, String> map = new HashMap<String, String>();
                for (NameValuePair metaItem : metaList) {
                    map.put(metaItem.getName(), metaItem.getValue());
                }
                return map;
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return null;
    }

    /**
     * Delete a file.
     * @param groupName group name
     * @param fileId    file id
     * @return 0 on success, otherwise an error code; -1 if an exception occurs
     */
    public int deleteFile(String groupName, String fileId) {
        try {
            return storageClient.delete_file(groupName, fileId);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return -1;
    }

    /**
     * Download a file.
     * @param groupName group name
     * @param fileId    file id (the ID returned after the file is successfully uploaded)
     * @param outFile   local file to save the download to
     * @return 0 on success, -1 on failure
     */
    public int downloadFile(String groupName, String fileId, File outFile) {
        FileOutputStream fos = null;
        try {
            byte[] content = storageClient.download_file(groupName, fileId);
            fos = new FileOutputStream(outFile);
            InputStream ips = new ByteArrayInputStream(content);
            IOUtils.copy(ips, fos);
            return 0;
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (fos != null) {
                try {
                    fos.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) throws Exception {
        FastDFSClient client = new FastDFSClient();
        File file = new File("D:\\23456.png");
        String[] result = client.uploadFile(file, "png");
        System.out.println(result.length);
        System.out.println(result[0]);
        System.out.println(result[1]);
    }
}

Executing the main method returns:

2
group1
M00/00/00/rBD8EFqTrNyAWyAkAAKCRJfpzAQ227.png

Source: gitee.com/52itstyle/s…


Author: Xiao Qi

Reference: blog.52itstyle.com

Sharing is a joy, and it is also a record of personal growth. Most of these articles are summaries of work experience and day-to-day learning. Given the limits of my own knowledge, corrections are welcome so that we can make progress together.

The copyright of this article belongs to the author. You are welcome to reprint it, but without the author's consent you must retain this statement and display it prominently on the article page. If you have any questions, please contact me by email ([email protected]).