How to Compress and Deduplicate Storage with VDO in CentOS/RHEL 8

Describing Virtual Data Optimizer

CentOS/RHEL 8 includes the Virtual Data Optimizer (VDO) driver, which optimizes the data footprint on block devices. VDO is a Linux device-mapper driver that reduces disk space usage on block devices and minimizes data duplication, which saves disk space and can even increase data throughput. VDO includes two kernel modules: the kvdo module, which transparently controls data compression, and the uds module, which handles deduplication.

The VDO layer is placed on top of an existing block storage device, such as a RAID device or a local disk. Those block devices can also be encrypted devices. The storage layers, such as LVM logical volumes and file systems, are placed on top of a VDO device. The following diagram shows the placement of VDO in an infrastructure consisting of KVM virtual machines that are using optimized storage devices.

Figure: VDO-based virtual machines in RHEL 8

VDO applies three phases to data in the following order to reduce the footprint on storage devices:

  1. Zero-Block Elimination filters out data blocks that contain only zeroes (0) and records the information of those blocks only in the metadata. The nonzero data blocks are then passed to the next phase of processing. This phase enables the thin provisioning feature in the VDO devices.
  2. Deduplication eliminates redundant data blocks. When you create multiple copies of the same data, VDO detects the duplicate data blocks and updates the metadata so that each duplicate references the original data block instead of being stored again. The universal deduplication service (UDS) kernel module checks the redundancy of the data through the metadata it maintains. This kernel module ships as part of VDO.
  3. Compression is the last phase. The kvdo kernel module compresses the data blocks with LZ4 and packs the compressed data into 4 KB blocks.
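The first two phases can be illustrated without VDO at all. The following is a minimal sketch, not how VDO is implemented: a hypothetical helper, classify_blocks, splits a small sample file into 4 KB blocks with coreutils and counts how many are all zeroes, how many are unique, and how many are repeats of a block already seen. The function name and the use of SHA-256 as the block fingerprint are assumptions made for illustration.

```shell
# Hypothetical helper: classify the 4 KB blocks of a small sample file.
# It does not call VDO; it only mimics the idea of the zero-block and
# deduplication phases. Intended for small files (one temp file per block).
classify_blocks() (
    file=$1
    tmp=$(mktemp -d) || exit 1
    # Cut the file into 4096-byte pieces named blk.aaaaaa, blk.aaaaab, ...
    split -b 4096 -a 6 "$file" "$tmp/blk."
    # Fingerprint of a block that contains only zeroes
    zero_hash=$(head -c 4096 /dev/zero | sha256sum | cut -d' ' -f1)
    # One fingerprint per block
    sha256sum "$tmp"/blk.* | cut -d' ' -f1 > "$tmp/hashes"
    total=$(( $(wc -l < "$tmp/hashes") ))
    zero=$(( $(grep -c -x "$zero_hash" "$tmp/hashes") ))
    unique=$(( $(grep -v -x "$zero_hash" "$tmp/hashes" | sort -u | wc -l) ))
    rm -rf "$tmp"
    echo "total=$total zero=$zero unique=$unique duplicate=$((total - zero - unique))"
)
# Usage: classify_blocks /path/to/sample.img
```

Only the blocks counted as unique would need to be stored; zero blocks and duplicates would be recorded in metadata alone.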

Implementing Virtual Data Optimizer

The logical devices that you create using VDO are called VDO volumes. VDO volumes are similar to disk partitions: you can format them with the desired file-system type and mount them like regular file systems. You can also use a VDO volume as an LVM physical volume. To create a VDO volume, specify a block device and the name of the logical device that VDO presents to the user. You can optionally specify the logical size of the VDO volume. The logical size of the VDO volume can be larger than the physical size of the actual block device.

Because VDO volumes are thinly provisioned, users see only the logical space in use and are unaware of the actual physical space available. If you do not specify the logical size while creating the volume, VDO uses the physical size as the logical size of the volume. This 1:1 mapping of logical size to physical size gives better performance but uses storage space less efficiently. Based on your infrastructure requirements, you should prioritize either performance or space efficiency.
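As a worked example of the logical-to-physical ratio, the sketch below uses a hypothetical helper, vdo_ratio, that prints the over-provisioning factor for two sizes given in GiB. The 5 GiB physical size used in the usage line is an assumption for illustration only.

```shell
# Hypothetical helper: express logical:physical over-provisioning as N:1.
# Both sizes are given in GiB; the result is rounded to one decimal place.
vdo_ratio() {
    awk -v l="$1" -v p="$2" 'BEGIN { printf "%.1f:1\n", l / p }'
}
# Usage (assumed sizes): vdo_ratio 50 5    # 50 GiB logical on 5 GiB physical
```

A result of 1.0:1 corresponds to the default behavior when no logical size is specified.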

When the logical size of a VDO volume is larger than the actual physical size, you should proactively monitor the volume statistics to view the actual usage using the vdostats --verbose command.
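Monitoring of this kind can be scripted. The following is a minimal sketch, assuming the default (non-verbose) vdostats output has one header row followed by one row per device with the columns Device, 1K-blocks, Used, Available, Use%, and Space saving%. The helper name check_vdo_usage and the default threshold of 80 are hypothetical choices.

```shell
# Hypothetical helper: read `vdostats` output on stdin and print every
# device whose Use% column is at or above the threshold (default 80).
# Assumed column layout:
#   Device  1K-blocks  Used  Available  Use%  Space saving%
check_vdo_usage() {
    awk -v limit="${1:-80}" '
        NR == 1 { next }              # skip the header row
        {
            use = $5
            sub(/%$/, "", use)        # "85%" -> "85"
            if (use + 0 >= limit) print $1, $5
        }'
}
# Usage: vdostats | check_vdo_usage 80
```

Run from a scheduled job, a non-empty result could serve as a cue to add physical storage before the volume fills.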

Enabling VDO

Install the vdo and kmod-kvdo packages to enable VDO in the system.

[root@host ~]# yum install vdo kmod-kvdo
...output omitted...
Is this ok [y/N]: y
...output omitted...
Complete!

Creating a VDO Volume

To create a VDO volume, run the vdo create command.

[root@host ~]# vdo create --name=vdo1 --device=/dev/vdd --vdoLogicalSize=50G
...output omitted...

If you omit the logical size, the resulting VDO volume gets the same size as its physical device. When the VDO volume is in place, you can format it with the file-system type of your choice and mount it under the file-system hierarchy on your system.
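The format-and-mount step might look like the sketch below. It assumes XFS as the file system, that the volume appears as /dev/mapper/vdo1, and a mount point of /mnt/vdo1; the function names and the DRY_RUN switch are hypothetical conveniences, not part of VDO.

```shell
# Hypothetical wrapper: format a VDO volume with XFS and mount it.
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

format_and_mount_vdo() {
    name=$1          # VDO volume name, for example vdo1
    mountpoint=$2    # assumed mount point, for example /mnt/vdo1
    # -K tells mkfs.xfs not to discard blocks at format time,
    # which avoids a long wait on a thinly provisioned device
    run mkfs.xfs -K "/dev/mapper/$name" &&
    run udevadm settle &&
    run mkdir -p "$mountpoint" &&
    run mount "/dev/mapper/$name" "$mountpoint"
}
# Usage (as root): format_and_mount_vdo vdo1 /mnt/vdo1
```

Setting DRY_RUN=1 first is a cheap way to review the exact commands before they touch a real device.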

Analyzing a VDO Volume

To analyze a VDO volume, run the vdo status command. This command displays a report on the VDO system and the status of the VDO volume in YAML format. It also displays attributes of the VDO volume. Use the --name= option to specify the name of a particular volume. If you omit the name of the specific volume, the output of the vdo status command displays the status of all the VDO volumes.

[root@host ~]# vdo status --name=vdo1
...output omitted...
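Because the report is long, it is often filtered. The following is a minimal sketch of a hypothetical filter, vdo_features, that keeps only the deduplication and compression lines. The key names Deduplication and Compression are an assumption about the report layout; adjust the pattern if your output differs.

```shell
# Hypothetical filter: keep only the deduplication and compression lines
# from `vdo status` output read on stdin, with leading indentation removed.
# The key names are assumed, not taken from this article.
vdo_features() {
    grep -E '^[[:space:]]*(Deduplication|Compression):' | sed 's/^[[:space:]]*//'
}
# Usage: vdo status --name=vdo1 | vdo_features
```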

The vdo list command displays the list of VDO volumes that are currently started. You can start and stop a VDO volume using the vdo start and vdo stop commands, respectively.