In a typical project, structured data is usually stored in a relational database, while unstructured data (files) can be stored in many ways: on the server's local disk, on a mounted NAS, on an FTP server, and so on. This article surveys distributed file storage systems.

Introduction to distributed storage

1. What is distributed storage

Before we introduce distributed storage, let’s look at non-distributed storage solutions.

In the standalone era, files were stored directly on the server where the service was deployed:

  • Direct-attached storage (DAS): Storage devices are attached directly to the server that uses the data, which gives poor scalability and flexibility.

To scale out, file storage was separated from the services and accessed over a network connection:

  • Centralized storage (NAS and SAN): Diverse devices are interconnected over a network. However, expansion is limited by the capacity of the controller. In addition, equipment has to be replaced at the end of its life cycle, and the resulting data migration takes a great deal of time and effort.

Distributed storage: Pools the disk space of many machines in an enterprise over the network into a single virtual storage device, so that data is spread across machines throughout the enterprise.
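To make the idea of spreading data across machines concrete, here is a minimal sketch of one possible placement rule. It uses rendezvous (highest-random-weight) hashing, which is an assumption chosen for illustration and not the method of any particular system discussed in this article. The function name `place` and the node names are hypothetical.

```python
import hashlib


def place(file_name: str, nodes: list[str], replicas: int = 2) -> list[str]:
    """Pick `replicas` distinct nodes for a file using rendezvous hashing.

    Every node gets a pseudo-random score for the file; the highest-scoring
    nodes hold the copies. Illustrative only, not any real system's algorithm.
    """
    def score(node: str) -> int:
        digest = hashlib.sha256(f"{node}:{file_name}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    # Sort nodes from highest to lowest score and keep the first `replicas`.
    return sorted(nodes, key=score, reverse=True)[:replicas]


if __name__ == "__main__":
    cluster = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical machines
    print(place("photo-001.jpg", cluster))
```

Because each node's score depends only on the node and the file name, removing a node that does not hold a given file leaves that file's placement unchanged, which is one reason this style of hashing suits clusters that grow and shrink.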

2. Advantages of distributed storage

Scalable: A distributed storage system can scale to clusters of hundreds or even thousands of nodes, and overall system performance can grow roughly linearly with cluster size.

High availability: In a distributed file system, high availability has two layers: the availability of the file system as a whole, and the integrity and consistency of the data.

Low cost: Automatic fault tolerance and automatic load balancing allow a distributed storage system to be built on lower-cost servers. In addition, linear scalability means servers can be added or removed as needed, so cost tracks demand.

Elastic storage: Resources can be flexibly added to or removed from data stores and storage pools without interrupting the running system.

Mainstream distributed file storage systems

The mainstream distributed file systems include GFS, HDFS, Ceph, Lustre, MogileFS, MooseFS, FastDFS, TFS, GlusterFS, and GridFS.

1. GFS (Google File System)

GFS is a proprietary, Linux-based distributed file system developed by Google to meet its own needs. Although Google has published some technical details of the system, it has not released the software as open source.

2. Hadoop Distributed File System (HDFS)

The Hadoop Distributed File System (HDFS) is a subproject of Hadoop and one of its core components. It is well suited to storing large data sets (at the TB and PB scale), and Hadoop uses it as its storage system. HDFS stores files across multiple computers and provides a unified access interface, so applications can use the distributed file system much as they would an ordinary file system.

3. TFS (Taobao FileSystem)

TFS is a highly scalable, highly available, high-performance distributed file system designed for Internet services. It is mainly used for massive amounts of unstructured data, is built on clusters of ordinary Linux machines, and provides highly reliable, highly concurrent storage access to external clients. TFS gives Taobao massive storage for small files, usually no larger than 1 MB, which meets Taobao's need for small-file storage and is widely used across Taobao applications. It adopts an HA architecture and smooth expansion to ensure the availability and scalability of the whole file system. In addition, its flat data organization allows a file name to be mapped directly to the file's physical address, which simplifies the file access path and contributes to TFS's good read and write performance.
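The idea of mapping a file name to a physical address can be sketched as follows. This is a hypothetical encoding written for illustration and is not TFS's actual naming format: the names `encode_name` and `decode_name`, the `T` prefix, and the field widths are all assumptions. The point is only that the name itself carries the location, so no directory lookup is needed.

```python
import base64
import struct


def encode_name(block_id: int, file_id: int) -> str:
    """Pack a (block_id, file_id) location into a flat file name.

    Hypothetical layout: 4-byte block id + 8-byte file id, Base64-encoded.
    """
    raw = struct.pack(">IQ", block_id, file_id)  # 12 bytes, big-endian
    return "T" + base64.urlsafe_b64encode(raw).decode("ascii")


def decode_name(name: str) -> tuple[int, int]:
    """Recover (block_id, file_id) from a name produced by encode_name."""
    raw = base64.urlsafe_b64decode(name[1:])
    block_id, file_id = struct.unpack(">IQ", raw)
    return block_id, file_id


if __name__ == "__main__":
    name = encode_name(block_id=1234, file_id=5678)
    print(name, decode_name(name))
```

With a scheme like this, a client that holds the file name can ask which server holds the block and then read the file at the recorded position, without walking a directory tree.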

4. Lustre

Lustre is a large-scale, secure, and highly available cluster file system developed and maintained by Sun. The main objective of the project is to build a next-generation cluster file system that supports more than 10,000 nodes and petabytes of storage. Lustre is already used in several products, such as HP SFS.

5. MooseFS

MooseFS is a relatively niche distributed file system that can be used directly, without modifying the interfaces of upper-layer applications. It supports FUSE, is easy to deploy, and provides a web interface for management and monitoring. Like other distributed file systems, MooseFS supports online and horizontal expansion. It can also recover files deleted by mistake, acting as a recycle bin that can be tailored to business needs. MooseFS is also much more efficient at reading and writing large numbers of small files than large files.

However, the disadvantages of MooseFS are also obvious. Its master/slave architecture resembles MySQL master/slave replication: the slaves can be scaled out, but the master cannot easily be extended. A short-term workaround is to shard by business line. As the total number of files stored in MFS grows, so do the memory requirements of the Master Server. For the single point of failure, the official approach is to synchronize metadata from the Master Server to a Metalogger Server; if the Master Server fails, the Metalogger Server can be restored and promoted to Master Server, but this takes recovery time. The single point of failure can also be addressed with a third-party high-availability solution (Heartbeat + DRBD + MooseFS).

6. MogileFS

MogileFS is written in Perl and was developed by Danga Interactive, the company behind Memcached. In China, the photo-hosting site Yupoo uses MogileFS. It is a set of efficient components for automatic file backup, developed under Six Apart and widely used on Web 2.0 sites, including LiveJournal.

7. FastDFS

FastDFS is an open-source, lightweight distributed file system similar to Google FS, written in pure C. It manages files, including file storage, file synchronization, and file access (upload and download), and addresses large-capacity storage and load balancing. It is especially suitable for online services built around files, such as photo album sites and video sites.

8. GlusterFS

GlusterFS is an open-source, scale-out distributed file system. It can quickly provision storage as requirements change, includes rich automatic failover features, and does away with the centralized metadata server. It is a scalable network file system for data-intensive workloads, offering scalability, high performance, and high availability. Gluster was acquired by Red Hat on October 7, 2011.

9. GridFS

MongoDB is a well-known NoSQL database, and GridFS is a built-in MongoDB feature that provides a set of file-operation APIs for storing files in MongoDB. The basic principle of GridFS is to store files in two collections: one for the file index and the other for the file content. The content of a file is split into chunks of a fixed size, and each chunk is stored in its own document. This approach provides not only file storage but also storage for additional properties of the file (such as its MD5 value and file name). By default, GridFS stores files in 255 KB chunks.
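The two-collection layout can be sketched without a database. The following is a simplified in-memory model, not the MongoDB driver API: the functions `put` and `get` are hypothetical, a plain dict and list stand in for the file-index and chunk collections, and the field names loosely follow those GridFS uses. The chunk size is a parameter; the 255 KiB default here is an assumption based on the default of current MongoDB drivers.

```python
import hashlib

CHUNK_SIZE = 255 * 1024  # assumed default chunk size, adjustable per call


def put(files: dict, chunks: list, file_id: str, filename: str,
        data: bytes, chunk_size: int = CHUNK_SIZE) -> None:
    """Store one file: a single index entry plus one record per chunk."""
    files[file_id] = {
        "filename": filename,
        "length": len(data),
        "chunkSize": chunk_size,
        "md5": hashlib.md5(data).hexdigest(),
    }
    # Split the content into fixed-size pieces, numbered from 0.
    for n, offset in enumerate(range(0, len(data), chunk_size)):
        chunks.append({
            "files_id": file_id,
            "n": n,
            "data": data[offset:offset + chunk_size],
        })


def get(chunks: list, file_id: str) -> bytes:
    """Reassemble a file by joining its chunks in sequence order."""
    parts = sorted((c for c in chunks if c["files_id"] == file_id),
                   key=lambda c: c["n"])
    return b"".join(c["data"] for c in parts)
```

Because each chunk is a separate record, a reader can fetch only the chunks covering the byte range it needs instead of loading the whole file into memory.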

Comparison of distributed file systems

1. Overall comparison

| File system | Developer | Language | Open-source license | Ease of use | Applicable scenarios | Features | Disadvantages |
|---|---|---|---|---|---|---|---|
| GFS | Google | | Not open source | | | | |
| HDFS | Apache | Java | Apache | Simple installation; professional official documentation | Storing very large files | Reads and writes data in batches with high throughput; write once, read many times; sequential reads and writes | Hard to meet millisecond-level low-latency data access; multiple users cannot write the same file concurrently; not suitable for large numbers of small files |
| Ceph | Sage Weil, University of California, Santa Cruz | C++ | LGPL | Simple installation; professional official documentation | Large, medium, and small files in a single cluster | Distributed, with no single point of dependency; written in C++; good performance | Originally built on the then-immature Btrfs, and at the time not considered mature enough for production environments |
| TFS | Alibaba | C++ | GPL V2 | Complicated installation; little official documentation | Small files across clusters | Tailored for small files, with relatively high random I/O performance; implements soft RAID to improve concurrent processing, fault tolerance, and recovery; supports active/standby hot failover to improve availability; supports active/standby cluster deployment, with the standby cluster serving reads and acting as a backup | Not suitable for large files; no POSIX support, so low versatility; no custom directory structure or file permission control; downloads go through the API, which is a single-point performance bottleneck; little official documentation and a high learning cost |
| Lustre | Sun | C | GPL | Complex and heavily kernel-dependent; requires recompiling the kernel | Reading and writing large files | Enterprise-class product; very large; deep dependencies on the kernel and ext3 | |
| MooseFS | Core Sp. z o.o. | C | GPL V3 | Easy to install; plenty of official documentation; web interface for management and monitoring | Reading and writing large numbers of small files | Relatively lightweight; written in C; widely used in China | Single point of dependency on the Master server; relatively poor performance |
| MogileFS | Danga Interactive | Perl | GPL | | Mainly used on the web to handle massive numbers of small images | Key-value metadata file system; much more efficient than MooseFS | No FUSE support |
| FastDFS | Yu Qing (Chinese developer) | C | GPL V3 | Easy to install; relatively active community | Small and medium files in a single cluster | No need to support POSIX, which reduces complexity and improves processing efficiency; implements soft RAID to improve concurrent processing, fault tolerance, and recovery; supports master/slave files and custom file extensions; active/standby Tracker services improve availability | No resumable transfers, so not suitable for large files; no POSIX support, so low versatility; file synchronization across the public network has high latency and needs a matching fault-tolerance policy; the synchronization mechanism does not verify file correctness; downloads go through the API, which is a single-point performance bottleneck |
| GlusterFS | Z RESEARCH | C | GPL V3 | Simple installation; professional official documentation | Suitable for large files; small-file performance leaves a lot of room for optimization | No metadata server; stackable architecture (basic modules can be stacked to build powerful features); linear horizontal scaling; much larger than MooseFS | With no metadata server, the client carries more load and consumes considerable CPU and memory; directory traversal is complex and inefficient because every storage node must be searched, so deep paths are not recommended |
| GridFS | MongoDB | C++ | | Simple installation | Usually used for large files (over 16 MB) | Can access part of a file without loading the whole file into memory, which keeps performance high; files and metadata are synchronized automatically | |

2. Feature comparison

| File system | Data storage mode | Cluster node communication protocol | Dedicated metadata storage point | Online expansion | Redundant backup | Single point of failure | Cross-cluster synchronization | FUSE mount | Access interface |
|---|---|---|---|---|---|---|---|---|---|
| HDFS | File | Private protocol (TCP) | Yes (MDS) | Supported | Supported | Exists | Not supported | Supported | Non-POSIX |
| Ceph | Object/file/block | Private protocol (TCP) | Yes (MDS) | Supported | Supported | Exists | Not supported | Supported | POSIX |
| Lustre | Object | Private protocol (TCP) / RDMA (Remote Direct Memory Access) | Dual MDS | Supported | Not supported | Exists | Unknown | Supported | POSIX/MPI |
| MooseFS | Block | Private protocol (TCP) | Yes (MFS) | Supported | Supported | Exists | Not supported | Supported | POSIX |
| MogileFS | File | HTTP | Yes (DB) | Supported | Not supported | Exists | Not supported | Not supported | Non-POSIX |
| FastDFS | File/block | Private protocol (TCP) | None | Supported | Supported | None | Partially supported | Not supported | Non-POSIX |
| GlusterFS | File/block | Private protocol (TCP) / RDMA (Remote Direct Memory Access) | None | Supported | Supported | None | Supported | Supported | POSIX |
| TFS | File | Private protocol (TCP) | Yes (NS) | Supported | Supported | Exists | Supported | Unknown | Non-POSIX |

What is POSIX?

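POSIX stands for Portable Operating System Interface, a family of standards that defines a common interface for UNIX-like operating systems. Applications written against POSIX are portable across those systems. For a file system, a POSIX access interface means it can be used through ordinary file operations rather than a dedicated API.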
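In practice, POSIX access means that the standard open/write/read/close calls work on the mounted file system. The sketch below uses Python's `os` module, which wraps those calls. The function names `roundtrip` and `demo` and the file name `posix_demo.bin` are hypothetical, and the directory passed in is assumed to be any writable path, which could equally be a FUSE mount point of one of the POSIX-capable systems above.

```python
import os
import tempfile


def roundtrip(directory: str, payload: bytes) -> bytes:
    """Write payload with POSIX-style calls, then read it back."""
    path = os.path.join(directory, "posix_demo.bin")

    fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_TRUNC, 0o644)
    try:
        view = memoryview(payload)
        while view:  # os.write may write fewer bytes than requested
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)  # ask the file system to flush the data
    finally:
        os.close(fd)

    fd = os.open(path, os.O_RDONLY)
    try:
        parts = []
        while True:  # os.read returns b"" at end of file
            part = os.read(fd, 65536)
            if not part:
                break
            parts.append(part)
        return b"".join(parts)
    finally:
        os.close(fd)


def demo() -> bool:
    """Run the round trip in a temporary local directory."""
    with tempfile.TemporaryDirectory() as directory:
        return roundtrip(directory, b"hello, POSIX") == b"hello, POSIX"
```

Systems listed as non-POSIX in the feature table cannot be used this way; the application has to call their client API or HTTP interface instead.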

Selection reference

  • General-purpose file systems: Ceph, Lustre, MooseFS, GlusterFS;

  • File systems suitable for small file storage: Ceph, MooseFS, MogileFS, FastDFS, TFS;

  • File systems suitable for large file storage: HDFS, Ceph, Lustre, GlusterFS, GridFS;

  • Lightweight file systems: MooseFS, FastDFS;

  • Easy-to-use file systems with active communities: MooseFS, MogileFS, FastDFS, GlusterFS;

  • File systems that support FUSE mounting: HDFS, Ceph, Lustre, MooseFS, GlusterFS.
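The lists above can be combined mechanically when several requirements apply at once. The following sketch simply encodes the six lists as sets and intersects them; the category keys and the function name `candidates` are labels chosen here for illustration, and the result is only as good as the lists themselves.

```python
# Category keys are illustrative labels for the six lists above.
SELECTION = {
    "general-purpose": {"Ceph", "Lustre", "MooseFS", "GlusterFS"},
    "small files": {"Ceph", "MooseFS", "MogileFS", "FastDFS", "TFS"},
    "large files": {"HDFS", "Ceph", "Lustre", "GlusterFS", "GridFS"},
    "lightweight": {"MooseFS", "FastDFS"},
    "easy to use": {"MooseFS", "MogileFS", "FastDFS", "GlusterFS"},
    "fuse mount": {"HDFS", "Ceph", "Lustre", "MooseFS", "GlusterFS"},
}


def candidates(*requirements: str) -> list[str]:
    """Return the file systems that appear in every requested category."""
    if not requirements:
        raise ValueError("give at least one requirement")
    matches = set.intersection(*(SELECTION[r] for r in requirements))
    return sorted(matches)


if __name__ == "__main__":
    print(candidates("small files", "lightweight"))
    print(candidates("large files", "fuse mount"))
```

An empty result means no system in the lists satisfies every requirement, in which case the requirements have to be prioritized.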
