Docker recommends volumes as the preferred mechanism for persisting data.
A volume is a single accessible storage area (a logical drive) with a single file system, typically residing on a hard disk or other storage device. The Docker documentation lists the following advantages of volumes over bind mounts:
- Volumes are easier to back up or migrate.
- You can manage volumes using Docker CLI commands or the Docker API.
- Volumes work on both Linux and Windows containers.
- Volumes can be more safely shared among multiple containers.
- Volume drivers allow you to store volumes on remote hosts or cloud providers, to encrypt the contents of volumes, or to add other functionality.
- A new volume’s contents can be pre-populated by a container.
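As a minimal sketch of the CLI-based management mentioned in the list above, the following Python snippet assembles the `docker volume create` and `docker run --mount` commands for a named volume. The volume name `app-data`, the container name `web`, the mount path `/data`, and the `nginx` image are assumptions chosen for illustration. The snippet only builds and prints the commands, so it runs without a Docker daemon.

```python
import shlex
from typing import List


def volume_create_cmd(volume: str) -> List[str]:
    """Command that creates a named volume managed by Docker."""
    return ["docker", "volume", "create", volume]


def run_with_volume_cmd(image: str, volume: str, target: str, name: str) -> List[str]:
    """Command that starts a detached container with the volume mounted at `target`."""
    mount = f"type=volume,source={volume},target={target}"
    return ["docker", "run", "-d", "--name", name, "--mount", mount, image]


# Hypothetical names, used only for illustration.
create_cmd = volume_create_cmd("app-data")
run_cmd = run_with_volume_cmd("nginx", "app-data", "/data", "web")

# Print the commands instead of executing them.
print(shlex.join(create_cmd))
print(shlex.join(run_cmd))
```

Because the volume is created independently of the container, removing the `web` container would leave `app-data` and its contents in place.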
By default, Docker and Kubernetes write data such as log files, application data, and temporary data to the container's own file system. Because containers are transient, this data is lost when the container crashes, exits, or is migrated to another host. A container is ephemeral.
A Kubernetes Pod (a group of containers) is also transient. If the Pod is deleted or recreated, any data stored in the volume shared by its containers is lost.
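To illustrate the kind of shared volume that disappears with its Pod, here is a hedged sketch that builds a Pod manifest with two containers sharing an `emptyDir` volume. An `emptyDir` volume exists only for the lifetime of the Pod. The Pod name, container names, `busybox` image, and shell commands are assumptions for illustration. The manifest is printed as JSON, which `kubectl apply -f` accepts alongside YAML.

```python
import json


def shared_scratch_pod(name: str) -> dict:
    """Pod whose two containers share an emptyDir volume (deleted with the Pod)."""
    volume_name = "scratch"
    mount = {"name": volume_name, "mountPath": "/scratch"}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # emptyDir is created when the Pod starts and removed when it goes away.
            "volumes": [{"name": volume_name, "emptyDir": {}}],
            "containers": [
                {
                    "name": "writer",
                    "image": "busybox",
                    "command": ["sh", "-c", "while true; do date >> /scratch/out.log; sleep 5; done"],
                    "volumeMounts": [mount],
                },
                {
                    "name": "reader",
                    "image": "busybox",
                    "command": ["sh", "-c", "touch /scratch/out.log; tail -f /scratch/out.log"],
                    "volumeMounts": [mount],
                },
            ],
        },
    }


# Hypothetical Pod name, used only for illustration.
pod = shared_scratch_pod("scratch-demo")
print(json.dumps(pod, indent=2))
```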
Kubernetes supports a variety of persistent volumes.
Adding a persistent volume entry directly to the configuration file of a deployed component tightly couples that component to the infrastructure. It is important to abstract the infrastructure that provides volumes and file systems.
Kubernetes provides the following mechanisms to enable the abstraction layer:
- Persistent Volume Claim (PVC): A claim decouples the workload from the underlying storage. Kubernetes binds the PVC to a matching Persistent Volume, so the Kubernetes administrator can simply enter the size of the volume in the configuration file.
- Dynamic Provisioning of Persistent Volumes: Volumes are created on demand when a claim is made instead of being allocated in advance, which can be cost efficient in public cloud environments.
- StatefulSet: It creates a bond between each Pod and its Persistent Volume, which helps the administrator scale Pods and their Persistent Volumes together.
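The mechanisms listed above can be sketched as two manifests: a standalone PVC that requests only a size and a storage class, and a StatefulSet whose `volumeClaimTemplates` entry produces one claim per replica. This is a minimal sketch, not a production configuration. The names `data`, `web`, the `nginx` image, the `10Gi` size, and the storage class `standard` are assumptions, and the StatefulSet assumes a headless Service called `web` already exists.

```python
import json


def claim_spec(size: str, storage_class: str) -> dict:
    """Claim spec: the workload states what it needs, not where the disk lives."""
    return {
        "accessModes": ["ReadWriteOnce"],
        # With a dynamic provisioner behind this class, the volume is created on demand.
        "storageClassName": storage_class,
        "resources": {"requests": {"storage": size}},
    }


def persistent_volume_claim(name: str, size: str, storage_class: str) -> dict:
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": claim_spec(size, storage_class),
    }


def stateful_set(name: str, replicas: int, size: str, storage_class: str) -> dict:
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": name,  # assumed headless Service with the same name
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": "app",
                            "image": "nginx",
                            "volumeMounts": [{"name": "data", "mountPath": "/data"}],
                        }
                    ]
                },
            },
            # One claim per replica is generated from this template.
            "volumeClaimTemplates": [
                {"metadata": {"name": "data"}, "spec": claim_spec(size, storage_class)}
            ],
        },
    }


# Hypothetical values, used only for illustration.
pvc = persistent_volume_claim("data", "10Gi", "standard")
sts = stateful_set("web", 3, "10Gi", "standard")
print(json.dumps(pvc, indent=2))
print(json.dumps(sts, indent=2))
```

Neither manifest names a disk, a cloud provider, or a host, which is the decoupling the list describes.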
The following section describes the common volume types from the leading Public Cloud providers such as AWS, GCP and Azure.
Amazon Web Services
AWS Elastic Block Store (EBS)
The limitations of AWS EBS are as follows:
- A single EBS volume can be attached to only a single EC2 instance.
- Pods must run on AWS EC2 instances as nodes.
- Pods can access an EBS volume only from within the volume's availability zone.
Restricting a volume to a single availability zone limits disaster recovery (DR): the file system cannot be replicated and synchronized to a different Region for DR purposes.
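The EBS constraints above can be sketched as a Persistent Volume manifest. This is a hedged example that uses the in-tree `awsElasticBlockStore` volume source; newer clusters typically use the EBS CSI driver instead. The volume ID, zone, size, and PV name are placeholders, not real resources. The `ReadWriteOnce` access mode reflects the single-instance attachment, and the node affinity term pins the volume to nodes in one zone.

```python
import json


def ebs_persistent_volume(name: str, volume_id: str, zone: str, size: str) -> dict:
    """PV backed by an existing EBS volume, usable only from nodes in `zone`."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            "capacity": {"storage": size},
            # One EBS volume attaches to one node at a time.
            "accessModes": ["ReadWriteOnce"],
            "awsElasticBlockStore": {"volumeID": volume_id, "fsType": "ext4"},
            # Only nodes in the volume's availability zone may use it.
            "nodeAffinity": {
                "required": {
                    "nodeSelectorTerms": [
                        {
                            "matchExpressions": [
                                {
                                    "key": "topology.kubernetes.io/zone",
                                    "operator": "In",
                                    "values": [zone],
                                }
                            ]
                        }
                    ]
                }
            },
        },
    }


# Placeholder identifiers, used only for illustration.
ebs_pv = ebs_persistent_volume("ebs-pv", "vol-0123456789abcdef0", "us-east-1a", "20Gi")
print(json.dumps(ebs_pv, indent=2))
```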
AWS Elastic File System (EFS)
This managed NFS service provides the following advantages over AWS EBS:
- Storage scales up and down automatically.
- More than one EC2 instance can access the same files, which are distributed across multiple availability zones (strictly within the same region).
- On-premises servers can integrate with EFS over a VPN.
- A single Persistent Volume can be mounted by multiple Kubernetes Pods.
- EFS is SSD-backed storage that is automatically replicated across availability zones.
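Because EFS is exposed over NFS, one way to sketch the multi-Pod access described above is a Persistent Volume that uses the `nfs` volume source with the `ReadWriteMany` access mode. The file system ID, region, PV name, and size below are placeholders. A PV requires a `capacity` value even though EFS itself is elastic, so the size here is nominal.

```python
import json


def efs_persistent_volume(name: str, file_system_id: str, region: str, size: str) -> dict:
    """PV that mounts an EFS file system over NFS and allows many writers."""
    server = f"{file_system_id}.efs.{region}.amazonaws.com"
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            "capacity": {"storage": size},  # nominal; required by the PV schema
            # Many Pods on many nodes may mount the volume read-write.
            "accessModes": ["ReadWriteMany"],
            "nfs": {"server": server, "path": "/"},
        },
    }


# Placeholder identifiers, used only for illustration.
efs_pv = efs_persistent_volume("efs-pv", "fs-12345678", "us-east-1", "5Gi")
print(json.dumps(efs_pv, indent=2))
```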
Google Compute Engine
The GCE persistent disk is very similar to AWS EBS, with one difference: the same volume can be mounted read-only on multiple instances. Therefore, a GCE persistent disk can be used to share read-only data between multiple Pods in the same zone.
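As a hedged sketch of that read-only sharing, the snippet below builds several Pod manifests that all mount the same persistent disk with `readOnly` set. It uses the in-tree `gcePersistentDisk` volume source; newer clusters typically use the Compute Engine persistent disk CSI driver instead. The disk name `shared-content`, the Pod names, and the `nginx` image are assumptions, and the disk is assumed to already exist and contain data.

```python
import json
from typing import List


def read_only_pd_pod(name: str, pd_name: str) -> dict:
    """Pod that mounts an existing GCE persistent disk in read-only mode."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "volumes": [
                {
                    "name": "content",
                    "gcePersistentDisk": {
                        "pdName": pd_name,
                        "fsType": "ext4",
                        "readOnly": True,  # required for sharing across instances
                    },
                }
            ],
            "containers": [
                {
                    "name": "web",
                    "image": "nginx",
                    "volumeMounts": [
                        {"name": "content", "mountPath": "/usr/share/nginx/html", "readOnly": True}
                    ],
                }
            ],
        },
    }


def read_only_pd_pods(prefix: str, count: int, pd_name: str) -> List[dict]:
    """Several Pods sharing one disk, each with its own name."""
    return [read_only_pd_pod(f"{prefix}-{i}", pd_name) for i in range(count)]


# Hypothetical names, used only for illustration.
pd_pods = read_only_pd_pods("reader", 3, "shared-content")
print(json.dumps(pd_pods, indent=2))
```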
Microsoft Azure
The Azure data disk is a virtual hard disk stored in Azure storage. It is similar to AWS EBS.
In addition, Azure offers shared file storage that uses the SMB/CIFS protocol. It requires the cifs-utils package to be installed on each client VM.
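A Pod can mount such an SMB share through the in-tree `azureFile` volume source, as in the minimal sketch below; newer clusters typically use the Azure Files CSI driver instead. The share name `app-share`, the Secret name `azure-storage-secret`, the Pod name, and the `nginx` image are assumptions. The Secret is assumed to already exist and to hold the storage account credentials.

```python
import json


def azure_file_pod(name: str, share_name: str, secret_name: str) -> dict:
    """Pod that mounts an Azure file share (SMB) at /mnt/share."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "volumes": [
                {
                    "name": "share",
                    "azureFile": {
                        # Secret with the storage account credentials (assumed to exist).
                        "secretName": secret_name,
                        "shareName": share_name,
                        "readOnly": False,
                    },
                }
            ],
            "containers": [
                {
                    "name": "app",
                    "image": "nginx",
                    "volumeMounts": [{"name": "share", "mountPath": "/mnt/share"}],
                }
            ],
        },
    }


# Hypothetical names, used only for illustration.
azure_pod = azure_file_pod("share-demo", "app-share", "azure-storage-secret")
print(json.dumps(azure_pod, indent=2))
```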
Several customer implementations use GlusterFS and Ceph as an abstraction layer over the underlying storage systems. GlusterFS is a network file system, and Ceph is an object store.
There is no perfect solution for provisioning storage for containers.
Rook may be the answer.
From the Rook website:
Rook (www.rook.io) is an open source orchestrator for distributed storage systems running in Kubernetes. Rook is currently in alpha state and has focused initially on orchestrating Ceph on top of Kubernetes.
Rook turns distributed storage software into a self-managing, self-scaling, and self-healing storage services. It does this by automating deployment, bootstrapping, configuration, provisioning, scaling, upgrading, migration, disaster recovery, monitoring, and resource management. Rook uses the facilities provided by the underlying cloud-native container management, scheduling and orchestration platform to perform its duties.
Rook integrates deeply into cloud native environments leveraging extension points and providing a seamless experience for scheduling, lifecycle management, resource management, security, monitoring, and user experience.
Kuberiter is a SaaS start-up that focuses on Multi-Cloud DevOps deployment. Please do contact me if you need any assistance to provision your storage systems for Docker and Kubernetes.