1. Overall design of Ceph
- Basic storage system: RADOS (Reliable, Autonomic, Distributed Object Store)
This is a complete object storage system; all user data stored in Ceph is ultimately stored in this layer. Ceph's high reliability, high scalability, high performance, and high degree of automation are essentially provided by this layer, so understanding RADOS is the foundation and key to understanding Ceph. Physically, RADOS consists of a large number of storage device nodes, each with its own hardware resources (CPU, memory, hard disk, network) and each running an operating system and file system.
- The librados library
This layer abstracts and encapsulates RADOS and provides APIs to the upper layers, so that applications can be developed directly against RADOS rather than against the whole of Ceph. Note that RADOS is an object storage system, so librados implements APIs for object storage only. RADOS is developed in C++ and provides native librados APIs in both C and C++. librados physically resides on the same machine as the applications built on top of it, which is why its APIs are also known as native APIs. An application calls the local librados API, and librados in turn communicates with the nodes of the RADOS cluster over sockets to complete the various operations.
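The call pattern of that native API can be sketched as follows. This is a minimal in-memory stand-in whose method names (`open_ioctx`, `write_full`, `read`) are modeled on the real librados Python bindings; the real client instead connects to a live RADOS cluster over sockets, so the classes here are purely illustrative.

```python
# Illustrative stand-in for the librados object API.
# The class and method names mirror the shape of the real Python
# bindings, but everything here is an in-memory simulation.
class FakeIoctx:
    """Mimics an I/O context bound to a single pool."""
    def __init__(self, pool):
        self.pool = pool
        self._objects = {}              # object name -> bytes

    def write_full(self, name, data):
        self._objects[name] = bytes(data)   # replace the whole object

    def read(self, name):
        return self._objects[name]

    def remove_object(self, name):
        del self._objects[name]

class FakeRados:
    """Mimics a cluster handle: open one I/O context per pool."""
    def __init__(self):
        self._pools = {}

    def open_ioctx(self, pool):
        return self._pools.setdefault(pool, FakeIoctx(pool))

cluster = FakeRados()
ioctx = cluster.open_ioctx("data")
ioctx.write_full("greeting", b"hello rados")
print(ioctx.read("greeting"))   # -> b'hello rados'
```

In the real bindings, a `Rados` handle is created from a Ceph configuration file and connected before any I/O context is opened; the object-level operations then look much like the calls above.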
- High-level application interfaces
This layer consists of three parts: RADOS GW (RADOS Gateway), RBD (RADOS Block Device), and Ceph FS (Ceph File System). Its purpose is to provide higher-level abstractions on top of librados that are easier for applications or clients to use. RADOS GW is a gateway that provides RESTful APIs compatible with Amazon S3 and Swift for developing object storage applications. RADOS GW offers a higher level of abstraction than librados but is not as powerful, so developers should choose between them according to their needs. RBD provides a standard block device interface and is commonly used to create volumes for VMs in virtualization scenarios; Red Hat has integrated the RBD driver with KVM/QEMU to improve VM access performance. Ceph FS is a POSIX-compatible distributed file system (POSIX, the Portable Operating System Interface, defines the interface standard for interaction between operating systems and applications; Linux and Windows implement the basic POSIX standard, making programs portable at the source level). As it is still under development, the Ceph website does not recommend its use in production environments.
- The application layer
This layer comprises the various ways Ceph's application interfaces are used in different scenarios, such as object storage applications developed directly on librados, object storage applications developed on RADOS GW, and cloud disks implemented on top of RBD. One point of confusion may arise from the introduction above: RADOS is already an object storage system and provides the librados API, so why develop a separate RADOS GW?
Understanding this question helps in understanding the nature of RADOS, so it is worth analyzing here. On the surface, the difference between librados and RADOS GW is that librados provides a native API while RADOS GW provides a RESTful API, giving them different programming models and actual performance. At a deeper level, the difference relates to the target application scenarios of the two levels of abstraction. In other words, although RADOS and systems such as S3 and Swift all belong to the family of distributed object storage, RADOS provides more basic, lower-level functionality and a richer set of operation interfaces. This can be seen by comparison.
Since Swift and S3 support similar API functions, Swift is used as an example. The API functions provided by Swift mainly include:
- User management operations: user authentication, obtaining account information, listing containers, etc.
- Container management operations: creating/deleting containers, reading container information, listing the objects in a container, etc.
- Object management operations: writing, reading, copying, updating, and deleting objects; setting access permissions; reading or updating metadata.
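These operations map almost one-to-one onto plain HTTP calls. The table below follows Swift's documented `/v1/<account>/<container>/<object>` URL scheme; the account name `AUTH_demo`, container `photos`, and object `cat.jpg` are made up for illustration.

```python
# How the Swift operations above map onto HTTP verbs and paths.
# Paths follow Swift's /v1/<account>/<container>/<object> scheme;
# the account/container/object names are illustrative only.
SWIFT_CALLS = {
    "list containers":  ("GET",    "/v1/AUTH_demo"),
    "create container": ("PUT",    "/v1/AUTH_demo/photos"),
    "delete container": ("DELETE", "/v1/AUTH_demo/photos"),
    "list objects":     ("GET",    "/v1/AUTH_demo/photos"),
    "write object":     ("PUT",    "/v1/AUTH_demo/photos/cat.jpg"),
    "read object":      ("GET",    "/v1/AUTH_demo/photos/cat.jpg"),
    "delete object":    ("DELETE", "/v1/AUTH_demo/photos/cat.jpg"),
    "read metadata":    ("HEAD",   "/v1/AUTH_demo/photos/cat.jpg"),
}

for op, (verb, path) in SWIFT_CALLS.items():
    print(f"{op:16s} -> {verb:6s} {path}")
```

The point of the comparison is that this RESTful surface is deliberately narrow (accounts, containers, objects), whereas librados exposes the richer, lower-level operations of RADOS itself.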
2. Logical architecture
The Ceph client is the user of the Ceph file system. The Ceph metadata daemon provides the metadata server. The Ceph object storage daemon provides the physical storage for both data and metadata. The Ceph monitor provides cluster management.
3. Ceph terminology
- OSD: Ceph object storage device. The OSD daemon stores data, handles replication, recovery, backfilling, and rebalancing, and supplies monitoring information to Ceph Monitors by checking the heartbeats of other OSD daemons. Typically one hard disk corresponds to one OSD, which manages that disk's storage; a single disk partition can also serve as an OSD.
- Monitors: Monitors maintain maps of cluster state, including the monitor map, the OSD map, the placement group (PG) map, and the CRUSH map. Ceph keeps a history (called an epoch) of each state change of the Monitors, OSDs, and PGs.
- PG: placement group. Because the number of objects is very large, Ceph introduces PGs as a layer of indirection for managing objects; each object is mapped to a PG by the CRUSH calculation.
- MDSs: Ceph metadata servers (MDS), which store the metadata of the Ceph file system. Metadata servers make it possible for POSIX file system users to execute basic commands such as ls and find without placing a heavy load on the Ceph storage cluster.
- CephFS: the Ceph file system. CephFS provides a POSIX-compliant distributed file system of arbitrary size. CephFS relies on the MDS to keep track of the file hierarchy, i.e., the metadata.
- RADOS: Reliable Autonomic Distributed Object Store. Everything in Ceph is stored as objects, and RADOS is responsible for storing them. The RADOS layer ensures data consistency and reliability: it performs data replication, fault detection and recovery, and the migration and rebalancing of data between cluster nodes.
- Librados: the librados library is a way to simplify access to RADOS, with bindings currently available for PHP, Python, Ruby, Java, C, and C++. librados is the foundation of RBD and RGW and underlies the POSIX interface exposed by CephFS.
- RBD: Ceph's block device, which provides block storage that can be mapped, formatted, and mounted on a server like any other disk.
- RGW/RADOSGW: the Ceph object gateway, which provides a RESTful API compatible with S3 and Swift.
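The PG and CRUSH entries above can be made concrete with a toy calculation. Ceph first hashes an object's name into a placement group (in reality using the rjenkins hash with a "stable mod" so PGs survive pool resizing), and CRUSH then maps the PG to an ordered set of OSDs. The sketch below substitutes crc32 for rjenkins and a simple deterministic ranking for the real CRUSH algorithm (which also respects failure domains), so only the two-step shape is faithful.

```python
import zlib

PG_NUM = 8                 # placement groups in the pool (toy value)
OSDS = [0, 1, 2, 3, 4]     # toy cluster of five OSDs
REPLICAS = 3               # replicas per placement group

def object_to_pg(name: str) -> int:
    # Step 1: hash the object name into a PG.
    # (Ceph uses rjenkins + stable mod; crc32 keeps the idea visible.)
    return zlib.crc32(name.encode()) % PG_NUM

def pg_to_osds(pg: int) -> list:
    # Step 2: deterministically map the PG to REPLICAS distinct OSDs.
    # (Stands in for CRUSH, which also honors the cluster topology.)
    ranked = sorted(OSDS, key=lambda osd: zlib.crc32(f"{pg}:{osd}".encode()))
    return ranked[:REPLICAS]

for obj in ["cat.jpg", "dog.jpg"]:
    pg = object_to_pg(obj)
    print(obj, "-> pg", pg, "-> osds", pg_to_osds(pg))
```

Because both steps are pure functions of their inputs, any client can compute an object's location independently, with no central lookup table: that is the property the real CRUSH algorithm provides.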
This article was created and shared by Mirson. For further communication, please add to QQ group 19310171 or visit www.softart.cn