For local storage, the conventional approaches are hostPath, local PV (GA since Kubernetes 1.14), and self-developed CSI drivers.
How they work

- hostPath: As the name implies, this uses the host's own disks and directories: a specified directory on the host is mounted directly into a container running on that same host (see the sketch after this list).
- local PV: Like hostPath, this also uses the host's disks and directories, but since going GA in 1.14 the feature follows the standard PV/PVC lifecycle and management model. The basic principle is pre-planning and pre-creation: PVs are created for each node in advance, the user declares a PVC, and the PVC selects a suitable PV to bind to (see the sketch after this list). The administrator is responsible for creating, declaring, and deleting PVs.
- Local volume CSI (omitted)
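
To make this concrete, here is a minimal sketch of a Pod mounting a host directory via hostPath; the Pod name, image, and paths are illustrative assumptions:

```yaml
# Minimal hostPath example: /var/local/app-data on the host is mounted
# into the container at /data. All names and paths are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: hostpath-demo
spec:
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "sleep 3600"]
    volumeMounts:
    - name: local-dir
      mountPath: /data              # path inside the container
  volumes:
  - name: local-dir
    hostPath:
      path: /var/local/app-data     # directory on the host node
      type: DirectoryOrCreate       # create the directory if it is missing
```

And a minimal sketch of the static local PV flow, assuming the administrator has already formatted and mounted a disk at /mnt/disks/vol1 on a node named node1 (node name, path, and sizes are illustrative). The no-provisioner StorageClass with `WaitForFirstConsumer` delays binding until a Pod is scheduled, so the scheduler can take the volume's node into account:

```yaml
# StorageClass with no dynamic provisioner: PVs are created statically.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
---
# PV pre-created by the administrator, pinned to node1 via nodeAffinity.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv-node1
spec:
  capacity:
    storage: 100Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain   # the admin cleans up manually
  storageClassName: local-storage
  local:
    path: /mnt/disks/vol1                 # pre-mounted directory on node1
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - node1
---
# PVC declared by the user; it binds to a matching local PV.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: local-claim
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: local-storage
  resources:
    requests:
      storage: 100Gi
```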
Comparison
| | hostPath | Local PV | Local volume CSI | Cloud disk CSI |
|---|---|---|---|---|
| Storage resource | Local storage | Local storage | Local storage | CBS |
| Usage | volumes[N].hostPath | Create a PVC | Create a PVC | Create a PVC |
| PV provisioning | ✗ | Static, created manually in advance | Dynamic, created on demand | Dynamic, created on demand |
| Capacity limit | Not enforceable (would depend on logical volumes) | Not enforceable (would depend on logical volumes) | ✓ | ✓ |
| Volume expansion | A hostPath cannot be expanded independently | A local PV cannot be expanded independently | ✓ | ✓ |
| Volume drift | ✗ | ✗ | ✗ | ✓ |
| Volume behavior when the Pod drifts | A directory is re-created on the new node; if a same-named directory already exists there, it is reused, so data may be dirty, and stale data on the old node is left to clean up | The PV-PVC binding must be broken first so the PVC can select a new PV (triggered by deleting and re-creating the PVC); stale data is left to clean up | A recycle-bin function can be built with an Operator so that both the old and new PV/PVC are retained | The volume drifts with the Pod |
| Deleting the application and its data | Manually or via script, clear the same-named directories on all nodes | The administrator deletes the PV; the rest is the same as for hostPath | The user deletes the PVC, which deletes the PV (subject to the reclaim policy); an administrator can also delete the PV directly | The user deletes the PVC, which deletes the PV (subject to the reclaim policy); an administrator can also delete the PV directly |
| Cleaning up stale failover data while the application still runs | Manually or via script, clear the same-named directories on every node except the one where the Pod currently runs | Same as hostPath | Empty the recycle bin via the Operator | No stale data is generated |
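
To illustrate the reclaim-policy behavior in the last two columns, here is a hedged sketch of a dynamically provisioned claim; the StorageClass name and the CSI driver name are hypothetical placeholders, not a real driver:

```yaml
# With reclaimPolicy: Delete, deleting the PVC deletes the PV and the
# backing disk; Retain would leave the PV for an administrator instead.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cloud-ssd                    # hypothetical name
provisioner: cbs.csi.example.com     # hypothetical CSI driver name
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: cloud-ssd
  resources:
    requests:
      storage: 50Gi
```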
Conclusion

hostPath is the most primitive way to use a local volume, and its capabilities are at most equal to those of local PV, so hostPath is not recommended in production.
The main problems with local PV are high operations cost and the fact that volumes cannot drift:
- Nodes must have abundant local storage resources; otherwise storage becomes a bottleneck.
- PVs are created in advance and cannot be provisioned dynamically in response to requests.
- There is no capacity enforcement, and same-named paths can get mixed together, so isolation is poor.
- Data cleanup is frequent and laborious.
- Capacity expansion relies on manual operations and cannot be done precisely.
- Volumes cannot drift, and finding old data requires tracking down the node where the Pod was running.
Community solutions such as the Local Provisioner can work around issues 3 and 5 by dedicating a separate logical volume to each local PV, but this does not remove the obstacles to using local PVs at scale in production. One such configuration is sketched below.
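
This workaround typically relies on the sig-storage-local-static-provisioner, which discovers pre-mounted logical volumes under a directory and turns each one into its own local PV. Here is a hedged sketch of its ConfigMap; the class name and paths are assumptions, and the keys are written from memory of that project's example configuration:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: local-provisioner-config   # hypothetical name
  namespace: kube-system
data:
  storageClassMap: |
    local-lvm:                     # hypothetical StorageClass name
      hostDir: /mnt/local-pvs      # each logical volume is mounted under this directory
      mountDir: /mnt/local-pvs     # the same path as seen inside the provisioner Pod
      volumeMode: Filesystem
      fsType: ext4
```

Because each local PV is backed by its own logical volume, capacity is enforced by the volume itself (issue 3) and can be grown by extending the logical volume (issue 5).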
A typical scenario: in a cluster with N worker nodes, if a Pod drifts N-1 times, every node ends up storing a copy of the Pod's persistent data, and all copies except the one on the current node are stale. If these are not cleaned up promptly, they consume a large amount of node storage, and the directories on a node become too numerous to map back to their owning Pods; if Pods use same-named paths, data may become dirty or even conflict.
Another scenario: because volume size cannot be precisely limited, you must plan in advance which nodes Pods will be placed on, and this planning can conflict with other resource scheduling, creating a huge workload.
Local volumes are still required for stateful applications with heavy IOPS requirements, such as MySQL and Kafka. The industry mainstream, however, recommends separating storage from compute, so that the scheduler no longer has to reason about complex storage constraints. Even if local volumes are kept for a few such scenarios, most lightweight, stateless applications should use back-end storage for their persistent volumes, which makes application deployment much more flexible.