FastDFS distributed file system details

What is a file system

A file system is the method and data structure an operating system uses to organize files on a disk or partition. Users never see the raw layout of disk space; the file system presents it as files that can be created, deleted, modified, and copied. The operating system software that manages and stores file information is called the file management system, or file system for short.

As a core part of the operating system, the file system abstracts the storage space the operating system manages and gives users a uniform, object-like interface for accessing it, hiding direct operations on physical devices and the details of resource management. In other words, the file system solves the problem of disk storage for ordinary users.

The history of file systems

File systems can be divided into the following types according to their computing environment and the functions they provide.

Stand-alone file system

Features: Local storage for operating systems and applications.

Disadvantages: Data cannot be shared across multiple machines.

Examples: EXT2, EXT3, EXT4, NTFS, FAT, FAT32, XFS, JFS, and so on.

Network file system

Features: Built on the existing Ethernet architecture, it shares the data of a traditional file system between different servers.

Disadvantages: Two servers cannot access and modify the same data at the same time, and performance is limited.

Examples: NFS, CIFS, and so on. As the following figure shows, network file sharing between Windows hosts is implemented by Microsoft's own CIFS service.

Distributed file system

As the amount of data grows, it can no longer fit on the disks managed by a single operating system, so it is spread across disks managed by several operating systems. That arrangement is inconvenient to manage and maintain, which creates an urgent need for a system that manages the files on multiple machines: a distributed file management system.

A distributed file system (DFS) is a file system that allows files to be shared among multiple hosts over a network, so that multiple users on multiple machines can share and store files. In such a file system, the client does not access the underlying data storage blocks directly; it communicates with the server over the network using a specific protocol. A DFS provides a logical tree-shaped file system structure for resources distributed anywhere on the network, which makes shared files easier to reach. Every such high-level file system is built on a low-level traditional file system and adds higher-level functions on top.

Features: Additional modules on top of a traditional file system distribute data across servers and integrate RAID protection, so that multiple servers can access and modify the same file system at the same time. Performance, scalability, and reliability are excellent.

Disadvantages: Some designs carry the risk of a single point of failure.

Representatives: HDFS (ASF), MogileFS (LiveJournal), FastDFS (Yu Qing), Lustre (Oracle), GlusterFS (Red Hat), and so on.

General-purpose

A general-purpose distributed file system is the counterpart of a traditional local file system such as EXT4 or NTFS. Typical examples: Lustre, MooseFS.

Advantages: It is operated like a traditional file system, so the barrier to entry for developers is low.

Disadvantages: The system is complex because it must support standard file operations such as directory structures, file read and write permissions, and file locks. Supporting the POSIX standard (Portable Operating System Interface of UNIX) lowers overall system performance.

POSIX: Portable Operating System Interface. After Unix was born, various vendors implemented their own Unix systems, so the interfaces were not unified and development across operating systems became chaotic. The POSIX standard was created to solve this problem.

Conclusion: The POSIX standard was created to unify operating system interfaces so that developers can write portable programs. Library functions based on POSIX are portable across operating systems that conform to the standard.

Dedicated

A dedicated distributed file system follows the design ideas of the Google File System: a file cannot be modified after it is uploaded, and files are accessed through a proprietary API. Such systems are also called distributed file storage services. Typical examples are HDFS, MogileFS, and FastDFS.

Advantages: Complexity is low because the system does not need to support standard file operations such as directory structures, file read and write permissions, and file locks. Without the overhead of POSIX support, overall performance is higher.

Disadvantages: Files are accessed through a proprietary API, so the barrier to entry for developers is higher. The API is usually wrapped in a utility class.

The history of file servers

With the arrival of the era of images and video on the Internet, file handling has become a major challenge for every service system, and a dedicated file server is urgently needed to solve the file sharing problem.

Local file server

Features: A local file server stores file data on the local node. For example, you create folders directly under the project directory to hold the project's file resources, and add subdirectories to separate files by type.

Advantages: Simple and convenient; the project can reference the files directly, and access is easy.

Disadvantages: Mixing files with code makes them hard to manage, and as the number of files grows, project releases and go-live cycles slow down.

Standalone file server

Features: An independent server is set up for file storage. When the project uploads a file, it sends the file to a directory on that server over FTP or SSH. Nginx or Apache HTTP Server then serves that directory as a reverse proxy and returns a file URL under its own domain name, and the front end accesses the file directly through this URL.

Advantages: Independent storage makes capacity expansion, disaster recovery, and data migration easier. It also simplifies load balancing of image requests, applying cache policies (HTTP headers, proxy cache, and so on), and migrating to a CDN. And because image access consumes a lot of server resources (it involves operating system context switches and disk I/O), moving it off the Web/App server lets that server focus on dynamic processing.

Disadvantages: A single machine is a performance bottleneck, and disaster recovery and vertical scalability are poor.

Distributed file server

Features: A distributed file system generally includes access arbitration, file storage, and file disaster recovery modules. The arbitration module is the brain of the file server and decides where each file is stored according to a given algorithm. The file storage module saves the files. The disaster recovery module backs up file data.

Advantages: elastic expansion, excellent performance, strong scalability, high reliability.

Disadvantages: Higher system complexity, more servers required.

FastDFS profile

FastDFS is the dedicated distributed file system mentioned above. Let’s take a closer look at its core concepts, architecture and environment.

FastDFS is a lightweight, high-performance, open source distributed file system written in C. Its main functions are file storage, file synchronization, and file access (upload and download). It addresses large-capacity file storage and highly concurrent access, and it load-balances file access. FastDFS is especially suitable for file-based online services on medium and large websites, such as photo and video sharing sites that store pictures, documents, audio, and video, and it suits files ranging from 4 KB to 500 MB.

FastDFS is open source software developed in China and written by Yu Qing. Project address on GitHub: github.com/happyfish10… Official forum: bbs.chinaunix.net/forum-240-1…

FastDFS architecture

Client

The client is the server that uploads and downloads files, that is, the server where our own project is deployed. It exchanges data with the tracker server or storage server through a proprietary interface over TCP/IP. FastDFS provides users with basic file access interfaces, such as upload, download, append, and delete, in the form of client libraries.

Tracker Server

The tracker server is responsible for scheduling file access and load balancing, and it manages all storage servers and groups (volumes).

Storage Server

The storage server is responsible for file storage, file synchronization and backup, the file access interface, and file metadata management. The storage servers in a group back up each other's data to provide disaster recovery. After startup, each storage server actively connects to the tracker, reports its group and other storage-related information, and maintains a periodic heartbeat.

Group

A group is also known as a volume. The files on all servers in the same group are identical, and the storage servers in a group are peers. Files can be uploaded or deleted on any storage server in the group.

Metadata

Data stored in a file system is divided into data and metadata. Data refers to the actual data in a file, that is, the actual content of a file. Metadata is system data used to describe the characteristics of a file, such as access permissions, file owners, and the distribution of file data blocks, etc. If the file is an image, the metadata is the image’s width, height, and so on.

FastDFS Storage policy

To support large capacity, storage servers are organized into groups (or volumes). A storage system consists of one or more groups, and the files in different groups are independent of each other. The total file capacity of the storage system is the sum of the capacities of all groups. A group consists of one or more storage servers, all of which hold the same files. The storage servers in a group provide redundant backup and load balancing.

When a new server is added to a group, the system automatically synchronizes the existing files to it. After synchronization finishes, the system automatically brings the new server online.

When storage space runs short, you can add groups dynamically: add one or more servers and configure them as a new group to expand the capacity of the storage system. When the concurrency of an application or module (and its corresponding group) is too high, add more storage servers to that group to spread the load.

To avoid too many files in a single directory, when the Storage is started for the first time, it creates two levels of subdirectories in each data Storage directory, with 256 subdirectories in each level. A total of 65536 subdirectories are created. The uploaded files are hash routed to one of the subdirectories. The file data is then stored directly to that directory as a local file.
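The two-level layout described above can be sketched in a few lines of shell. This is only an illustration: the helper name subdir_for_hash and the way the hash value is split into two bytes are assumptions, not FastDFS's actual routing code. The two-digit uppercase hexadecimal directory names are assumed from paths such as 00/00 and 02/44 that appear later in this article.

```shell
# Hypothetical helper: map an integer hash to one of 256 x 256 subdirectories.
subdir_for_hash() {
    local hash=$1
    local high=$(( (hash / 256) % 256 ))   # first-level directory index (0..255)
    local low=$(( hash % 256 ))            # second-level directory index (0..255)
    printf '%02X/%02X\n' "$high" "$low"
}

# Number of leaf directories under one data storage directory.
echo $(( 256 * 256 ))

# 580 is 0x0244, so it lands in subdirectory 02/44.
subdir_for_hash 580
```

Spreading files over 65536 leaf directories keeps each directory small, which is the point of the scheme: lookups inside a directory stay cheap even when the storage path holds millions of files.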

FastDFS installation

Download resources

Download the packages for three projects, libfastcommon, fastdfs, and fastdfs-nginx-module, directly from GitHub at github.com/happyfish10… (as archives or with the git command), or from sourceforge.net/projects/fa…

libfastcommon: A library of common C functions extracted from the fastdfs and fastdht projects.
fastdfs: The FastDFS core project.
fastdfs-nginx-module: The module Nginx needs when integrating with FastDFS.

Install dependencies

FastDFS is written in C, so you must install the build environment it depends on before installing it.

yum install -y make cmake gcc gcc-c++

Install the public libraries

Upload libfastcommon-master.zip to the /usr/local/src directory on the server and decompress it.

# Install unzip for decompression
yum install -y unzip
# Unzip libfastcommon into the current directory
unzip libfastcommon-master.zip

Compile and install.

# Enter the extracted libfastcommon-master directory
cd libfastcommon-master
# Compile and install
./make.sh && ./make.sh install

libfastcommon is installed in /usr/lib64 and /usr/include/fastcommon by default, and symbolic links are created in /usr/lib.

Install FastDFS

Upload fastdfs-master.zip to the /usr/local/src directory on the server and decompress it.

# Unzip fastdfs into the current directory
unzip fastdfs-master.zip

Compile and install.

# Enter the extracted fastdfs-master directory
cd fastdfs-master
# Compile and install
./make.sh && ./make.sh install

FastDFS is installed in the following locations by default:

/usr/bin: Executable files
/etc/fdfs: Configuration files
/etc/init.d: Service startup scripts
/usr/include/fastdfs: Header files

Start the Tracker

The tracker and the storage are both FastDFS; they play different roles because they are started with different configuration files. In other words, installing a tracker or a storage means installing FastDFS and then starting it with the configuration file for that role.

View all configuration files in the /etc/fdfs directory.

[root@localhost ~]# ls /etc/fdfs/
client.conf.sample  storage.conf.sample  storage_ids.conf.sample  tracker.conf.sample

client.conf.sample: Client configuration file for testing
storage.conf.sample: configuration file for storage
tracker.conf.sample: Configuration file of the tracker

Edit the tracker.conf configuration file.

# Copy tracker.conf.sample to tracker.conf
cp /etc/fdfs/tracker.conf.sample /etc/fdfs/tracker.conf
# Edit the tracker.conf configuration file
vi /etc/fdfs/tracker.conf

There are a lot of configuration items in the configuration file. Focus on the following items and adjust other configuration items based on the actual situation.

# IP address this tracker server binds to (empty binds all addresses of the host)
bind_addr =
# Port the tracker service listens on
port = 22122
# Parent path for the tracker server's runtime data and logs (must be created in advance)
base_path = /fastdfs/tracker
# HTTP port exposed by the tracker server
http.server_port = 8080
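After editing, it can help to read a setting back from the file instead of scrolling through it. The sketch below assumes the plain "key = value" layout shown above, with comment lines starting with #. The helper conf_get is hypothetical and is not part of FastDFS.

```shell
# Hypothetical helper: print the value of a "key = value" setting.
# Comment lines do not match because they start with '#'.
conf_get() {
    local key=$1 file=$2
    sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$file" | head -n 1
}

# Usage on the tracker server:
#   conf_get port /etc/fdfs/tracker.conf
#   conf_get base_path /etc/fdfs/tracker.conf
```

Only the first match is printed, so a key that is repeated in the file needs a different approach.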

Start the Tracker service.

# Create the parent path for the tracker server's runtime data and logs
mkdir -p /fastdfs/tracker
# Start the tracker service
service fdfs_trackerd start
# View the tracker service status
service fdfs_trackerd status
# Restart the tracker service
service fdfs_trackerd restart
# Stop the tracker service
service fdfs_trackerd stop

Start the Storage

Edit the storage.conf configuration file.

# Copy storage.conf.sample to storage.conf
cp /etc/fdfs/storage.conf.sample /etc/fdfs/storage.conf
# Edit the storage.conf configuration file
vi /etc/fdfs/storage.conf

There are a lot of configuration items in the configuration file. Focus on the following items and adjust other configuration items based on the actual situation.

# Storage group (volume) name; the default is group1
group_name = group1
# IP address this storage server binds to (empty binds all addresses of the host)
bind_addr =
# Parent path for the storage server's runtime data and logs (must be created in advance)
base_path = /fastdfs/storage/base
# Parent path on the storage server for files uploaded by clients (must be created in advance)
store_path0 = /fastdfs/storage/store
# HTTP port exposed by the storage server
http.server_port = 8888
# IP address and port of the tracker server
tracker_server = 192.168.10.101:22122

Start the storage service.

# Create the parent path for the storage server's runtime data and logs
mkdir -p /fastdfs/storage/base
# Create the parent path on the storage server for files uploaded by clients
mkdir -p /fastdfs/storage/store
# Start the storage service
service fdfs_storaged start
# Check the storage service status
service fdfs_storaged status
# Restart the storage service
service fdfs_storaged restart
# Stop the storage service
service fdfs_storaged stop

View the /fastdfs/storage/store directory and you can see that the storage server has created 65536 folders to store the files uploaded by the client.
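One way to confirm the directory count without listing everything by hand is to count the second-level directories. This is a small sketch; the helper name count_leaf_dirs is made up, and the data subdirectory under store_path0 is assumed from the paths used elsewhere in this article.

```shell
# Hypothetical helper: count the second-level subdirectories under a data directory.
count_leaf_dirs() {
    find "$1" -mindepth 2 -maxdepth 2 -type d | wc -l | tr -d ' '
}

# Usage on the storage server (store_path0 = /fastdfs/storage/store):
#   count_leaf_dirs /fastdfs/storage/store/data
# A result of 65536 corresponds to 256 first-level x 256 second-level directories.
```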

The Client operation

FastDFS provides users with basic file access interfaces, such as Upload, Download, Append, and Delete, in the form of client libraries.

Edit the client.conf configuration file on the tracker server's machine.

# Copy client.conf.sample to client.conf
cp /etc/fdfs/client.conf.sample /etc/fdfs/client.conf
# Edit the client.conf configuration file
vi /etc/fdfs/client.conf

Modify the following two contents in the configuration file.

# Parent path for the client's runtime data and logs (must be created in advance)
base_path = /fastdfs/client
# IP address and port of the tracker server
tracker_server = 192.168.10.101:22122

Remember to create the client directory with mkdir -p /fastdfs/client.

Upload

Select the Tracker Server

As shown in the figure above, each Storage Server periodically reports its storage information, such as the group it belongs to, to the Tracker Server. If the Tracker Server is clustered, the client can pick any tracker when uploading, because the trackers are peers.

Select group

When the tracker receives an upload request from the client, it assigns the file a group that can store it. Once the group is chosen, the tracker must decide which Storage Server in that group to assign to the client.

As shown in the figure above, the optional rules for group in the tracker.conf configuration file are:

round robin: Rotates through all groups
specify group: Always uses one specified group
load balance: Prefers the group with the most free storage space
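The round robin and load balance rules above can be sketched as two tiny shell functions. Everything here is illustrative: the input format (one "group free_space_mb" pair per line), the function names, and the sample numbers are made up, and the real tracker makes this decision internally. In the sample tracker.conf the rule is, to my knowledge, chosen with the store_lookup setting, but check your own file rather than relying on that name.

```shell
# Load balance: read "group_name free_space_mb" lines on stdin
# and print the group with the most free space.
pick_group_load_balance() {
    sort -k2,2nr | head -n 1 | cut -d ' ' -f 1
}

# Round robin: given the previous group index and the number of groups,
# print the index of the next group.
next_round_robin() {
    echo $(( ($1 + 1) % $2 ))
}

printf '%s\n' 'group1 5120' 'group2 20480' 'group3 10240' | pick_group_load_balance
next_round_robin 2 3
```

With the sample data, the first call picks group2 because it has the most free space, and the second wraps around from the last group (index 2) to the first (index 0).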

Choose the Storage Server

After a Storage Server is allocated, the client sends a file upload request to the Storage Server. The Storage Server allocates a specific data Storage directory for the file.

As shown in the figure above, the optional rules for file distribution in the storage.conf configuration file are:

round robin: Rotates through the data storage directories
random: Distributes files according to a hash code

Generate the file_id

After the storage directory is selected, the Storage generates a file_id for the file. The file_id is built from the Storage Server IP address, the file creation time, the file size, the file's CRC32, and a random number; this binary string is then Base64-encoded into a printable string.
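Because the file_id is Base64 text, part of it can be decoded by hand. The sketch below assumes that the IP address occupies the first four bytes of the encoded data, which is consistent with the field order given above but is not confirmed by this article, and that '-' and '_' stand in for the standard Base64 characters '+' and '/'. The helper name file_id_source_ip is hypothetical.

```shell
# Hypothetical helper: recover the storage server IP from a file_id.
# The first 8 Base64 characters decode to 6 bytes; the first 4 are assumed to be the IP.
file_id_source_ip() {
    local prefix
    prefix=$(printf '%.8s' "$1")
    # Map the URL-safe characters back to standard Base64, decode,
    # and print each byte as an unsigned decimal number.
    set -- $(printf '%s' "$prefix" | tr '_-' '/+' | base64 -d | od -An -tu1)
    printf '%s.%s.%s.%s\n' "$1" "$2" "$3" "$4"
}

# file_id taken from the fdfs_test upload example later in this article.
file_id_source_ip wKgKZl9smB-AVBRKAADhaCZ_RF0518
```

For that sample the function prints 192.168.10.102, which matches the storage server ip_addr reported in the fdfs_test output later in this article.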

Generate file name

Once the file has been stored in a subdirectory, it is considered saved, and a file name is generated for it. The file name is assembled as group name/storage directory/two-level subdirectory/file_id, followed by the file suffix.

Interpreting the FastDFS upload result

group1: Group name (volume name). After a successful upload, the Storage server returns the name of the group that stores the file.
M00: Virtual disk path. It corresponds to the store_path* option in the Storage configuration file: store_path0 maps to M00, store_path1 to M01, and so on. For example, with store_path0 = /fastdfs/storage/store, M00 stands for /fastdfs/storage/store/data.
/02/44: Two-level data directory. The Storage server creates these two levels of directories under each virtual disk path to store data files.
wKgDrE34E8wAAAAAAAAGkEIYJK42378: The file_id, built from the Storage Server IP, file creation time, file size, file CRC32, and a random number, then Base64-encoded into a string.
group1/M00/02/44/wKgDrE34E8wAAAAAAAAGkEIYJK42378.sh: The full file name.
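The breakdown above can be reproduced with standard text tools. In this sketch the helper names split_file_name and real_path are hypothetical, and real_path only handles M00, that is, files stored under store_path0.

```shell
# Hypothetical helper: split "group/Mxx/aa/bb/name.ext" into labeled parts.
split_file_name() {
    echo "$1" | awk -F/ '{
        printf "group=%s\nstore_path=%s\nsubdir=%s/%s\nfile=%s\n", $1, $2, $3, $4, $5
    }'
}

# Hypothetical helper: map a file name to its real path on the storage server.
# $1 = file name returned by the upload, $2 = value of store_path0
real_path() {
    echo "$2/data/$(echo "$1" | cut -d/ -f3-)"
}

split_file_name group1/M00/02/44/wKgDrE34E8wAAAAAAAAGkEIYJK42378.sh
real_path group1/M00/00/00/wKgKZl9smB-AVBRKAADhaCZ_RF0518.jpg /fastdfs/storage/store
```

The second call prints /fastdfs/storage/store/data/00/00/wKgKZl9smB-AVBRKAADhaCZ_RF0518.jpg, which is where the Storage server keeps the file on disk.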

Method 1

The upload command has the format fdfs_upload_file /etc/fdfs/client.conf <file to upload>.

[root@localhost ~]# fdfs_upload_file /etc/fdfs/client.conf /usr/local/src/china.jpg
group1/M00/00/00/wKgKZl9skn6AHZKUAADhaCZ_RF0650.jpg

After a successful upload, the command prints the file's storage location on the Storage server and the generated file name. Here group1 is the storage group (volume) name, and M00 is a virtual directory that maps to the real path /fastdfs/storage/store/data.
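In a script, the returned file name is usually what you want to keep. A minimal wrapper might look like the sketch below. The function name upload_file and the FDFS_CLIENT_CONF environment variable are made up for illustration; the only FastDFS command used is fdfs_upload_file, called with the same two arguments as above, and the wrapper assumes the command exits with a non-zero status when the upload fails.

```shell
# Hypothetical wrapper: upload one file and print the file name FastDFS returns.
upload_file() {
    local conf file_name
    conf=${FDFS_CLIENT_CONF:-/etc/fdfs/client.conf}   # made-up override variable
    if file_name=$(fdfs_upload_file "$conf" "$1"); then
        echo "$file_name"
    else
        echo "upload failed: $1" >&2
        return 1
    fi
}

# Usage:
#   id=$(upload_file /usr/local/src/china.jpg) && echo "stored as $id"
```

Saving the printed name matters because it is the only handle for downloading or deleting the file later.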

As shown in the following figure, you can see on the Storage server that the file was uploaded successfully.

Method 2

Or run the fdfs_test /etc/fdfs/client.conf upload command to upload the file.

[root@localhost ~]# fdfs_test /etc/fdfs/client.conf upload /usr/local/src/china.jpg
This is FastDFS client test program V6.07
Copyright (C) 2008, Happy Fish / YuQing
FastDFS may be copied only under the terms of the GNU General Public License V3, which may be found in the FastDFS source kit.
Please visit the FastDFS Home Page http://www.fastken.com/ for more detail.
[2020-09-24 20:59:11] DEBUG - base_path=/fastdfs/client, connect_timeout=5, network_timeout=60, tracker_server_count=1, anti_steal_token=0, anti_steal_secret_key length=0, use_connection_pool=0, g_connection_pool_max_idle_time=3600s, use_storage_id=0, storage server id count: 0
tracker_query_storage_store_list_without_group:
group_name=group1, ip_addr=192.168.10.102, port=23000
group_name=group1, ip_addr=192.168.10.102, port=23000
storage_upload_by_filename
group_name=group1, remote_filename=M00/00/00/wKgKZl9smB-AVBRKAADhaCZ_RF0518.jpg
source ip address:
file timestamp=2020-09-24 20:59:11
file size=57704
file crc32=645874781
example file url: http://192.168.10.102/group1/M00/00/00/wKgKZl9smB-AVBRKAADhaCZ_RF0518.jpg
storage_upload_slave_by_filename
group_name=group1, remote_filename=M00/00/00/wKgKZl9smB-AVBRKAADhaCZ_RF0518_big.jpg
source ip address:
file timestamp=2020-09-24 20:59:11
file size=57704
file crc32=645874781
example file url: http://192.168.10.102/group1/M00/00/00/wKgKZl9smB-AVBRKAADhaCZ_RF0518_big.jpg

After a file is uploaded with fdfs_test, detailed information about the uploaded file is displayed.

group_name: Storage group (volume) name
remote_filename: Storage path and name of the uploaded file
source ip address: IP address of the Storage server that holds the uploaded file
file timestamp: Time at which the file was uploaded
file size: Size of the uploaded file
example file url: URL of the uploaded file, which can be accessed directly once Nginx is set up
storage_upload_slave_by_filename: The FastDFS master/slave file feature, which generates slave files from the master file
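The example file url in the output is simply the host, the group name, and the remote file name joined with slashes. The sketch below rebuilds it; file_url is a hypothetical helper, and the host is whatever web server ends up in front of the Storage.

```shell
# Hypothetical helper: build the HTTP URL for a stored file.
# $1 = host (optionally host:port), $2 = group_name, $3 = remote_filename
file_url() {
    printf 'http://%s/%s/%s\n' "$1" "$2" "$3"
}

file_url 192.168.10.102 group1 M00/00/00/wKgKZl9smB-AVBRKAADhaCZ_RF0518.jpg
```

The printed URL is identical to the example file url shown in the fdfs_test output for the master file.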

As shown in the following figure, you can see on the Storage server that the file was uploaded successfully. The file ending in .jpg-m contains the metadata of the uploaded file.

The metadata information is as follows:

[root@localhost ~]# more /fastdfs/storage/store/data/00/00/wKgKZl9smB-AVBRKAADhaCZ_RF0518.jpg-m 
ext_namejpgfile_size115120height80width160
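The keys and values in the output above look fused together because, as far as I know, the metadata file separates records with the control byte 0x01 and keys from values with 0x02, neither of which is visible in a terminal. Treat those separator values as an assumption. Under that assumption, a small hypothetical helper makes the file readable:

```shell
# Hypothetical helper: print a metadata file as "key=value" lines.
# Assumes records are separated by byte 0x01 and key/value pairs by byte 0x02.
show_metadata() {
    tr '\001\002' '\n=' < "$1"
    echo
}

# Usage on the storage server:
#   show_metadata /fastdfs/storage/store/data/00/00/wKgKZl9smB-AVBRKAADhaCZ_RF0518.jpg-m
```

If the assumption holds, the sample file prints one pair per line, starting with ext_name=jpg.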

Download

After a successful upload, the client receives the file name generated by the Storage and can then access the file by that name. As with uploads, the client can pick any Tracker Server when downloading. The download request sent to the tracker must include the file name. The tracker parses the file name to obtain information such as the file's group, size, and creation time, and then selects a Storage to serve the request.

Method 1

The command format is fdfs_download_file /etc/fdfs/client.conf group_name/remote_filename.

fdfs_download_file /etc/fdfs/client.conf group1/M00/00/00/wKgKZl9smB-AVBRKAADhaCZ_RF0518.jpg

Method 2

Alternatively, run the fdfs_test /etc/fdfs/client.conf download group_name remote_filename command.

fdfs_test /etc/fdfs/client.conf download group1 M00/00/00/wKgKZl9smB-AVBRKAADhaCZ_RF0518.jpg

Delete

Method 1

The delete command has the format fdfs_delete_file /etc/fdfs/client.conf <file to delete>.

fdfs_delete_file /etc/fdfs/client.conf group1/M00/00/00/wKgKZl9smB-AVBRKAADhaCZ_RF0518.jpg

Tip: Deleting a file also deletes its metadata file.

Method 2

Alternatively, run the fdfs_test /etc/fdfs/client.conf delete group_name remote_filename command.

fdfs_test /etc/fdfs/client.conf delete group1 M00/00/00/wKgKZl9smB-AVBRKAADhaCZ_RF0518_big.jpg

That covers the core concepts and architecture of FastDFS, along with setting up and using the environment. The ultimate goal of a file server is to serve files over HTTP, which is not yet possible with this setup, so another tool is needed. Nginx, a high-performance HTTP and reverse proxy web server, is a good choice. In the next article, we will integrate Nginx with FastDFS to complete the file server.

This article is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license.

