Local storage in Kubernetes is conventionally consumed in one of three ways: hostPath, local PV (GA since 1.14), and a self-developed CSI driver.

Working principles

  1. hostPath

    As the name implies, hostPath uses a disk or directory on the host: a specified host directory is mounted directly into a specified container running on that host.

  2. local PV

    Like hostPath, local PV uses a disk or directory on the host. The feature has been GA since 1.14 and follows the PV and PVC lifecycle and management logic. The basic principle is pre-planning and pre-creation: PVs are created in advance for each node, the user declares a PVC when storage is needed, and the PVC binds to a suitable PV. Administrators are responsible for creating, declaring, and deleting PVs.

  3. Local volume CSI (omitted)
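
To make the difference between the first two approaches concrete, here is a minimal sketch that builds the corresponding manifests as Python dictionaries. The object names, node names, paths, sizes, image, and the "local-storage" class name are all hypothetical placeholders; only the field layout follows the Kubernetes API.

```python
import json

def host_path_pod(name, image, host_dir, mount_path):
    """Pod that mounts a host directory straight into its container."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "name": "app",
                "image": image,
                "volumeMounts": [{"name": "data", "mountPath": mount_path}],
            }],
            # hostPath: no PV or PVC is involved, the directory is mounted as-is.
            "volumes": [{
                "name": "data",
                "hostPath": {"path": host_dir, "type": "DirectoryOrCreate"},
            }],
        },
    }

def local_pv(name, node, path, size, storage_class):
    """Statically created PV pinned to one node through nodeAffinity."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            "capacity": {"storage": size},
            "accessModes": ["ReadWriteOnce"],
            "persistentVolumeReclaimPolicy": "Retain",
            "storageClassName": storage_class,
            "local": {"path": path},
            "nodeAffinity": {"required": {"nodeSelectorTerms": [{
                "matchExpressions": [{
                    "key": "kubernetes.io/hostname",
                    "operator": "In",
                    "values": [node],
                }],
            }]}},
        },
    }

def claim(name, size, storage_class):
    """PVC declared by the user; it binds to a matching pre-created PV."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }

# Hypothetical values: the administrator pre-creates one PV per node,
# while the user only declares the PVC.
nodes = ["node-1", "node-2"]
pvs = [local_pv(f"local-pv-{n}", n, "/mnt/disks/vol1", "10Gi", "local-storage")
       for n in nodes]
pod = host_path_pod("demo", "nginx", "/data/demo", "/var/lib/demo")
pvc = claim("demo-data", "10Gi", "local-storage")
print(json.dumps(pvs[0], indent=2))
```

Each dictionary, serialized with json.dumps, is a manifest that kubectl can apply. Note how the hostPath Pod names the host directory itself, whereas the local PV path and its node are fixed by the administrator before any PVC exists.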

Comparison

| | hostPath | Local PV | Local volume CSI | Cloud disk CSI |
|---|---|---|---|---|
| Storage resource | Local storage | Local storage | Local storage | CBS |
| Usage | volumes[N].hostPath | Create a PVC | Create a PVC | Create a PVC |
| PV provisioning | N/A | Static, created manually in advance | Dynamic, created on demand | Dynamic, created on demand |
| Capacity limit | Cannot be limited (depends on logical volumes) | Cannot be limited (depends on logical volumes) | Yes | Yes |
| Volume expansion | A hostPath cannot be expanded independently | A local PV cannot be expanded independently | Yes | Yes |
| Volume drift | No | No | No | Yes |
| Volume behavior when the Pod drifts | A directory is created on the new node. If a directory with the same name already exists there, it is reused, so data may be corrupted, and the old data is left behind to be cleaned up | The PV/PVC binding must be released first so that the PVC can select a new PV (triggered by deleting and recreating the PVC); the old data is left behind to be cleaned up | Works with an Operator to provide a recycle-bin function, so that both old and new PVs and PVCs can be retained | The volume drifts with the Pod |
| Deleting the application and its data | Directories with the same name must be cleared on all nodes, manually or by script | The administrator deletes the PV; follow-up operations are the same as for hostPath | The user deletes the PV by deleting the PVC (according to the deletion policy); an administrator can also delete the PV | The user deletes the PV by deleting the PVC (according to the deletion policy); an administrator can also delete the PV |
| Cleaning up stale data left by failover while the application still exists | Directories with the same name must be cleared, manually or by script, on every node except the one where the Pod currently runs | Same as hostPath | Works with an Operator to empty the recycle bin | No stale data is generated |

Conclusion

hostPath is the most primitive way to use a local volume, and its functionality is at most equal to that of local PV. Using hostPath in production is therefore not recommended.

The main problems of local PV are high operation and maintenance cost and the fact that volumes cannot drift:

  1. Each node must have abundant local storage; otherwise storage becomes a bottleneck.
  2. PVs are created in advance and cannot respond dynamically to creation requests.
  3. There is no size control, and paths with the same name may end up shared, resulting in poor isolation.
  4. Data cleanup is frequent and laborious.
  5. Capacity expansion relies on manual operations and cannot be sized precisely.
  6. Volumes cannot drift. Locating old data requires finding the node on which the Pod previously ran.

Community solutions such as Local Provisioner can work around issues 3 and 5 by backing each local PV with its own logical volume, but this does not remove the problems of using local PVs at scale in production.

A typical scenario: in a cluster with N worker nodes, if a Pod drifts N-1 times, every node may end up storing a copy of the Pod's persistent data, and apart from the latest copy at least N-2 of them are stale. If this storage is not cleared promptly, it consumes a large amount of space on the nodes, and the directories on each node become too numerous to match to their owning Pods. If another Pod then uses a path with the same name, data may be corrupted or conflict.
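
The accumulation of stale copies can be illustrated with a small simulation. This is only a sketch of the worst case, in which every drift lands on a node the Pod has not used before; the node names and the stale_copies helper are hypothetical.

```python
def stale_copies(placements):
    """Return (current_node, nodes_holding_stale_data) for a placement history.

    Each placement leaves a directory behind on that node, because neither
    hostPath nor local PV removes the data when the Pod moves away.
    """
    nodes_with_data = set()
    current = None
    for node in placements:
        nodes_with_data.add(node)  # directory is created (or reused) here
        current = node
    return current, sorted(nodes_with_data - {current})

# N = 5 worker nodes; the Pod starts on node-1 and drifts N-1 = 4 times.
history = ["node-1", "node-2", "node-3", "node-4", "node-5"]
current, stale = stale_copies(history)
print(f"live copy on {current}, stale copies on {stale}")
```

In this worst case only the copy on the current node is live, and every other node the Pod has visited holds leftover data that someone has to find and remove.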

Another scenario: because the volume size cannot be limited precisely, administrators must plan in advance which nodes each Pod will be placed on. That plan may conflict with other resource scheduling decisions and creates a large amount of work.
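
As a rough illustration of that planning burden, the following sketch assigns Pods to nodes by hand with a first-fit rule. The plan_placement helper, the node capacities, and the per-Pod disk demands are all hypothetical figures, not measurements.

```python
def plan_placement(free_gib, demands_gib):
    """First-fit: put each Pod on the first node with enough free local disk.

    Returns a dict of Pod name to node name, or None if no node fits.
    """
    remaining = dict(free_gib)
    plan = {}
    for pod, need in demands_gib.items():
        target = next((n for n, free in remaining.items() if free >= need), None)
        plan[pod] = target
        if target is not None:
            remaining[target] -= need
    return plan

# Hypothetical free local disk per node and expected usage per Pod, in GiB.
free = {"node-1": 100, "node-2": 60}
demands = {"mysql-0": 80, "kafka-0": 40, "kafka-1": 40}
plan = plan_placement(free, demands)
print(plan)
```

Here the total free space equals the total demand, yet kafka-1 cannot be placed because the remaining space is split across nodes. A plan like this also ignores CPU and memory, which is where it can conflict with the scheduler's other decisions.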

Local volumes remain necessary for stateful applications with heavy IOPS requirements, such as MySQL and Kafka. The mainstream industry recommendation, however, is to separate storage from compute, so that the scheduler no longer has to consider complex storage issues. Even where local volumes are kept for certain scenarios, most lightweight and stateless applications will use back-end storage for their persistent volumes, which makes applications and deployments more flexible.