Trim is commonly used as a way to notify an SSD that part of the data it holds is garbage and can be discarded; it extends the SSD's lifetime and releases otherwise reserved space on the disk. It turns out that's not the only scenario where trim comes in handy. Compression is a popular way to save space when deploying virtual machines, and it seems a good option, especially as the performance penalty is marginal.
But there's one compression-specific issue some admins aren't aware of or misunderstand. Left unaddressed, it can cause serious problems affecting all KVM guests on such storage. I will describe it in the example scenario below; note that it's not a ZFS-specific issue.
Let’s say we:
- defined a ZFS pool ‘t1’:
```
hv:~# zpool create t1 mirror sda9 sdb9
hv:~# zfs get compression t1
NAME  PROPERTY     VALUE  SOURCE
t1    compression  off    default
```
- created some sparse (thin-provisioned) datasets in that pool, one per KVM guest:
```
hv:~# zfs create -o compression=on -V 100G -s t1/guest1
hv:~# zfs get compression t1/guest1
NAME       PROPERTY     VALUE  SOURCE
t1/guest1  compression  on     local
```
- created a KVM guest which will use the compressed dataset we just made:
```
hv:~# virt-install --virt-type kvm --name C6-guest1 --ram 8096 --disk /dev/zvol/t1/guest1 \
       --network bridge=bridge4,model=virtio --graphics vnc,password=trivialpw,listen=0.0.0.0 \
       --noautoconsole --os-type=linux --os-variant=centos6.10 \
       --cdrom=/opt/ISOs/CentOS-6.9-x86_64-minimal.iso --vcpus=8
```
Initial space usage on the pool and inside the KVM guest:
```
hv:~# zfs list -r t1 ; zpool list t1
NAME       USED  AVAIL  REFER  MOUNTPOINT
t1         584M  27.5G    96K  /t1
t1/guest1  583M  27.5G   583M  -
NAME  SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
t1     29G   584M  28.4G        -         -    0%   1%  1.00x  ONLINE  -

# Inside the KVM guest:
[root@C6-guest1 ~]# df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1        99G  719M   93G   1% /
```
Now let's write some data inside the KVM guest (say, 10GB of random data) and delete it. From the guest's point of view, the freed space is immediately available again:
```
[root@C6-guest1 ~]# touch 10GB.dat ; shred -n1 -s10G 10GB.dat ; df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1        99G   11G   83G  12% /
[root@C6-guest1 ~]# rm -f 10GB.dat ; df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1        99G  719M   93G   1% /
```
Outside the guest, however, the dataset still shows the same space usage as before the deletion (or releases far less space than the amount of data deleted):
```
hv:~# zfs list -r t1 ; zpool list t1
NAME        USED  AVAIL  REFER  MOUNTPOINT
t1         10.7G  17.4G    96K  /t1
t1/guest1  10.7G  17.4G  10.7G  -
NAME  SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
t1     29G  10.7G  18.3G        -         -    0%  36%  1.00x  ONLINE  -
```
This is because the data wasn't really deleted, only marked as available for overwriting whenever the OS needs to write again. As a result, there is now 10GB of garbage data on the VM's storage. From the VM's perspective there is no problem, since it sees that marked garbage as 10GB of available space. Outside the VM, however, it is a problem: the ZFS pool holding the compressed dataset has no idea it now stores 10GB of garbage, because from outside the KVM guest there is no way to tell which data is garbage and which isn't. That's 10GB less space available on the ZFS pool, affecting the remaining KVM guests.
That's where TRIM would come in handy.
```
[root@C6-guest1 ~]# fstrim -v /
fstrim: /: FITRIM ioctl failed: Operation not supported
```
Bummer. In many cases the virtual controller defaults to virtio-blk (for example, the current version of Softaculous's Virtualizor), which doesn't support the discard feature (virtio-blk actually supports discard since kernel 5.0, but that version may not be an option for many reasons), so a workaround is needed.
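For completeness: on a new enough stack the rest of this article is optional, because QEMU 4.0+ together with a guest kernel 5.0+ can pass discards through virtio-blk itself. A sketch of such a disk definition, reusing the zvol path from the example above, would look like this (verify against your libvirt/QEMU versions before relying on it):

```xml
<disk type='block' device='disk'>
  <driver name='qemu' type='raw' cache='none' discard='unmap'/>
  <source dev='/dev/zvol/t1/guest1'/>
  <target dev='vda' bus='virtio'/>
</disk>
```

If your hypervisor or guest kernel is older, as in the CentOS 6 example here, the virtio-scsi route below is the way to go.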
In order to release the 10GB of garbage from the ZFS dataset without TRIM, the guest's OS needs to overwrite the garbage with zeros (or any other stream with a high compression ratio). The dataset can then compress 10GB of data made from a single repeated byte, reducing it to, as a wild guess, a few bytes or kilobytes:
```
[root@C6-guest1 ~]# dd if=/dev/zero of=zero bs=4M ; rm -f zero
dd: writing `zero': No space left on device
24910+0 records in
24909+0 records out
104477106176 bytes (104 GB) copied, 801.62 s, 130 MB/s
```
After that the 10GB on the dataset has been reclaimed:
```
hv:~# zfs list -r t1 ; zpool list t1
NAME       USED  AVAIL  REFER  MOUNTPOINT
t1         559M  27.5G    96K  /t1
t1/guest1  558M  27.5G   558M  -
NAME  SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
t1     29G   561M  28.5G        -         -    0%   1%  1.00x  ONLINE  -
```
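The reclaim works because an all-zero stream is about as compressible as data gets. Any general-purpose compressor demonstrates the same effect; here gzip stands in for the dataset's compression:

```shell
# 16 MiB of zeros shrinks to a tiny fraction of its size under gzip,
# which is why zero-filling lets a compressed zvol reclaim the space
dd if=/dev/zero bs=1M count=16 2>/dev/null | gzip -c | wc -c
```

The printed byte count is orders of magnitude smaller than the 16 MiB input.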
However, TRIM is the preferred way.
For TRIM to work inside a KVM guest, the virtual disk must support the ‘discard’ feature. Here's how to set it up:
1. Create a new file named new-virtio-scsi-ctl.xml with the SCSI controller definition (SCSI drives will be attached to it), add the content below and save the file:
```xml
<controller type='scsi' model='virtio-scsi' index='5'/>
```
2. Define a new SCSI drive: create a new file new-virtio-scsi-drive.xml and fill it as below. Note that the ‘controller’ number must match the index we just defined above:
```xml
<disk type='block' device='disk'>
  <driver name='qemu' type='raw' cache='none' discard='unmap'/>
  <source dev='/dev/sdb8'/>
  <target dev='sdb' bus='scsi'/>
  <address type='drive' controller='5' bus='0' target='0' unit='1'/>
</disk>
```
3. Import both the controller and the new disk definition into our example KVM guest, named C6-guest-13:
```
# virsh list
 Id   Name          State
--------------------------------
 37   C6-guest-13   running
```
The controller:
```
# virsh attach-device 37 --config --live new-virtio-scsi-ctl.xml
Device attached successfully
```
Inside the KVM guest, dmesg will show something like:
```
scsi host2: Virtio SCSI HBA
```
The drive:
```
# virsh attach-device 37 --config --live new-virtio-scsi-drive.xml
Device attached successfully
```
And the guest’s dmesg will show:
```
sd 2:0:0:1: [sdb] Attached SCSI disk
```
Let's quickly test the drive for fstrim support. Execute the following inside the KVM guest (bc and lsscsi are needed):
```
# new virtio-scsi drive name as seen on the guest
drv=sdb
# send commands to fdisk
p=$(echo -e "o\nn\np\n1\n\n$(bc<<<5*1024^3/$(cat /sys/block/$drv/queue/hw_sector_size))\np\nw" | \
    fdisk /dev/${drv} | grep ^/ | cut -d' ' -f1) && \
mkfs.ext4 1>/dev/null ${p}
mount -v ${p} /mnt
lsscsi
fstrim -v /mnt
```
The output should look as below:
```
Building a new DOS disklabel with disk identifier 0xc58806bb.
mke2fs 1.42.9 (28-Dec-2013)
mount: /dev/sdb1 mounted on /mnt.
[0:0:0:0]    cd/dvd  QEMU     QEMU DVD-ROM     2.5+  /dev/sr0
[6:0:0:0]    disk    QEMU     QEMU HARDDISK    2.5+  /dev/sdb
/mnt: 4.8 GiB (5128200192 bytes) trimmed
```
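For reference, the bc expression in the test above just converts 5 GiB into a sector count to use as fdisk's last-sector argument. With the common 512-byte sector size it evaluates to:

```shell
# 5 GiB expressed as a count of 512-byte sectors, matching
# bc<<<5*1024^3/512 from the fdisk one-liner above
echo $(( 5 * 1024 * 1024 * 1024 / 512 ))   # prints 10485760
```

On a device reporting a different hw_sector_size (e.g. 4096), the sector count changes accordingly, which is why the one-liner reads it from sysfs instead of hardcoding 512.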
That’s it.
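Once discard works, it's worth trimming periodically rather than once. A sketch of a crontab entry inside the guest (path and schedule are assumptions, adjust to taste):

```
# trim all mounted filesystems that support discard, Sundays at 03:00
0 3 * * 0 root /sbin/fstrim -a
```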
In order to convert existing ‘virtio’ storage to ‘virtio-scsi’, just add a SCSI controller and adjust the existing storage definition to match the new controller, as in the example above.
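In practice that conversion means editing the disk element (e.g. via `virsh edit`): switch the bus, rename the target, add discard, and point the address at the SCSI controller. A sketch based on the guest1 zvol from the beginning of the article, assuming the controller index ‘5’ added earlier:

```xml
<!-- was: <target dev='vda' bus='virtio'/> with no address element -->
<disk type='block' device='disk'>
  <driver name='qemu' type='raw' cache='none' discard='unmap'/>
  <source dev='/dev/zvol/t1/guest1'/>
  <target dev='sda' bus='scsi'/>
  <address type='drive' controller='5' bus='0' target='0' unit='0'/>
</disk>
```

Note the guest will then see the disk as /dev/sda instead of /dev/vda, so /etc/fstab and bootloader entries referencing device names need updating (UUID- or label-based mounts are unaffected).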