Using Data Deduplication and Compression with VDO on RHEL 7 and 8

Storage deduplication technology has been on the market for quite some time now. Unfortunately all of the implementations have been vendor specific proprietary software. With VDO, there is now an open source Linux native solution available.

Red hat has introduced VDO (Virtual Data Optimizer) in RHEL 7.5, a storage deduplication technology bough with Permabit in 2017. Of course it has been open sourced since then.

In contrast to ZFS which provides the same functionality on the file system level, VDO is an inline data reduction which works on block device level, it is file system agnostic.

Use cases

There are basically two major use cases: VM Storage and Object Storage Backends.

VM Storage

The main use case is storage for virtual machines where a lot of data is redundant, i.e. the base operating system of the VMs. This allows to deduplicate the data on disk on a large scale, think about 100 VMs where the operating system takes about 5Gbyte each will be reduced to approx. 5 Gbyte instead of 500 Gbyte.

Typically VM storage can be over committed by factor 10.

Object- and Block storage backends

As a backend for CEPH and Glusterfs, it is recommended to not over commit more than factor 3. The reason for the lower over commitment is that the storage administrator usually does not know what kind of data will be stored on it.

Availability

VDO is available since RHEL 7.5, it is included in the base subscription. At the moment it is not available for Fedora (yet).

The source code is available on github:

At the moment the Kernel code is not yet in the upstream Mainline Kernel, it is ongoing work to get it into the Mainstream Kernel.

Typical setup

Physical disk -> VDO -> Volumegroup -> Logical volume -> file system.

Block device can be a physical disk (or a partition on it), multi path device, LUKS disk, or a software RAID device (md or LVM RAID).

Restrictions

You can not use LVM cache, LVM snapshots and thin provisioned logical volumes on top of VDO. Theoretically you can use LUKS on top of VDO, but it makes no sense because there is nothing to deduplicate. Needless to say that VDO on top of a VDO device does not make any sense as well. Be aware that you can not make use of partitioning or (LVM) Raid on top of VDO devices, all that things should be done in the underlying layer of VDO.

When using SAN, check if your storage box already does deduplication. In this case VDO is useless for you.

Installation

Its straight forward:

[root@vdotest ~]# yum -y install vdo kmod-kvdo

Create the VDO volume

In this test case, I attached a 110Gbyte disk, created a 100 GByte partition and will over commit it by factor 10.

Warning! As of writing this article, never use a whole physical disk, use a partition instead and leave some spare space in the disk to avoid data loss! (see further below)

[root@vdotest ~]# vdo create --name=vdo1 --device=/dev/vdb1 --vdoLogicalSize=1T

Creating volume group, logical volume and file system on top of the VDO volume

[root@vdotest ~]# pvcreate /dev/mapper/vdo1
[root@vdotest ~]# vgcreate vg_vdo /dev/mapper/vdo1
[root@vdotest ~]# lvcreate -n lv_vdo vg_vdo -L 900G
[root@vdotest ~]# mkfs.xfs -K /dev/vg_vdo/lv_vdo
[root@vdotest ~]# echo "/dev/mapper/vg_vdo-lv_vdo       /mnt    xfs     defaults,x-systemd.requires=vdo.service 0 0" >> /etc/fstab

Display the whole stack

[root@vdotest ~]# lsblk /dev/vdb
NAME                MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
vdb                 252:16   0  110G  0 disk 
└─vdb1              252:17   0  100G  0 part 
  └─vdo1            253:7    0    1T  0 vdo  
    └─vg_vdo-lv_vdo 253:8    0  900G  0 lvm  /mnt
[root@vdotest ~]#

Populate the disk with data

The ideal test for VDO is to put some real-life VM-Images to the file system on top of it. In this case I scp’ed three IPA server and some instances to that file system. This kind of systems are all quite similar, the disk space saved is tremendous. The total size of the vm images is 105G

Lets have a look:

[root@vdotest ~]# df -h /mnt
Filesystem                 Size  Used Avail Use% Mounted on
/dev/mapper/vg_vdo-lv_vdo  900G  105G  800G  12% /mnt
[root@vdotest ~]# 
[root@vdotest ~]# ll -h /mnt
total 101G
-rw-------. 1 root root 21G Dec 17 09:41 ipa1.lab.delouw.ch.qcow2
-rw-------. 1 root root 21G Dec 17 09:47 ipa1.ldelouw.ch
-rw-------. 1 root root 21G Dec 17 09:54 ipa2.ldelouw.ch
-rw-r--r--. 1 root root 21G Dec 17 09:58 ipaclient-rhel6.home.delouw.ch
-rw-------. 1 root root 21G Dec 17 10:03 ipatest.delouw.ch.qcow2
[root@vdotest ~]#

Lets use the vdostats utility to display the actually used storage on disk:

[root@vdotest ~]# vdostats --si
Device                    Size      Used Available Use% Space saving%
/dev/mapper/vdo1        107.4G     15.2G     92.2G  14%           89%
[root@vdotest ~]#

Performance Tuning

There are a lot of parameters that can be changed. Unfortunately the documentation available at the moment is rudimentary, thus its more a guesswork than facts.

  • Number of worker threads of different kind
  • Enable or Disable compression

On machines with a lot of CPUs using more threads than the defaults can dramatically boost performance. man 8 vdo gives a glimpse of the different parameters related to threads.

Compression is a quite expensive operation. On top of that, depending on the kind of data you are storing, it does not make much sense to use compression (Well, deduplication is kind of compression as well).

Pitfalls

Be aware! With every storage deduplication solution there comes a big pitfall: The logical volume on top of VDO shows free disk space while the actual disk space on the physical disk can be (almost) exhausted. You need to carefully monitor the actual disk usage.

The fill grade can rapidly change if the data to be stored contains a lot of non-deduplicatable and/or compressible data. A good example is virtual machine images containing a LUKS encrypted disk, In such a case, use LUKS on the storage, not on the VM level.

Even if you update one virtual machine, the delta to other machine images will grow and less physical space is available.

VDO comes with a few Nagios plugins which are very useful for alerting administrators in the cause the available physical disk is filling up. They are located in /usr/share/doc/vdo/examples/nagios

According to df -h, on my test system there is still 800 Gbyte available. What happens if I store my 700 Gbyte Satellite 6 image? The data is mostly RPMs which are already compressed quite well. Lets see….

After a transfer of approx 155 Gbyte, the physical disk got full and the file system is inaccessible. I was hitting the worst case that can happen: Complete and unrecoverable data loss.

The df command shows some 241 Gbyte free.

[root@vdotest ~]# df -h |grep mnt
/dev/mapper/vg_vdo-lv_vdo                900G  241G  660G  27% /mnt
[root@vdotest ~]#

The vdostat command tells a different story, like expected.

[root@vdotest ~]# vdostats --si
Device                    Size      Used Available Use% Space saving%
/dev/mapper/vdo1        107.4G    107.4G      0.0B 100%           59%
[root@vdotest ~]# 

When attempting to access the data, there will be an I/O error.

[root@vdotest ~]# ll -h /mnt
ls: cannot access /mnt: Input/output error
[root@vdotest ~]# 

Thats bad. I mean really bad. The device is not accessible anymore.

xfs_repair does not work. Do not attempt to make use of the -L option! Your file system will be gone.

Recovering from a full physical disk

Lets resize the partition instead. First unmount the file system

[root@vdotest ~]# umount /mnt

Delete and recreate the partition using fdisk

[root@vdotest ~]# fdisk /dev/vdb
Welcome to fdisk (util-linux 2.23.2).

Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.


Command (m for help): d
Selected partition 1
Partition 1 is deleted

Command (m for help): n
Partition type:
   p   primary (0 primary, 0 extended, 4 free)
   e   extended
Select (default p): 
Using default response p
Partition number (1-4, default 1): 
First sector (2048-41943039, default 2048): 
Using default value 2048
Last sector, +sectors or +size{K,M,G} (2048-41943039, default 41943039): 
Using default value 41943039
Partition 1 of type Linux and of size 20 GiB is set

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.

WARNING: Re-reading the partition table failed with error 16: Device or resource busy.
The kernel still uses the old table. The new table will be used at
the next reboot or after you run partprobe(8) or kpartx(8)
Syncing disks.
[root@vdotest ~]# partprobe
[root@vdotest ~]#
[root@vdotest ~]# vdo growPhysical -n vdo1

Run a file system check.

Now you are able to mount the file system again and your data is available again.

Documentation

Red Hat maintains a nice documentation about storage administration, VDO is covered by an own chapter. https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html-single/storage_administration_guide/#vdo

Conclusion

The technology is very interesting and will kick some ass. Storage deduplication will be more and more important, with VDO there is now a Linux native solution for that.

At the moment it is quite dangerous to use VDO in production. Filling up a physical disk without spare space is an unrecoverable error, a complete data loss. That means: Always create the VDO device on top of a partition that is not using the whole disk or another device that can grow in size to prevent data loss.

If you plan to use VDO in production make sure you have a proper monitoring in place that alerts quite ahead of time to be able to take corrective action.

Nevertheless: Its cool stuff and I’m sure the current situation will be fixed soon.

Audit your systems for security compliance with OpenSCAP

OpenSCAP logoIntroduction to (Open)SCAP

SCAP stands for Security Content Automation Protocol. It is an open standard which defines methods for security policy compliance, vulnerability management and measurement etc. This article focuses on the operating system compliance part of SCAP.

It comes originally from the US National Institute of Standards and Technology (NIST) to provide a way for US government agencies to audit its systems for regulatory compliance.

OpenSCAP is a NIST validated open source implementation of SCAP.

Why should I make use of OpenSCAP anyway?

Lot of people will ask this question to them self, in particular System Administrators and Engineers since they are not IT Security Officers.

The simple answer is that you just sit down with the IT Security Officer once and define which systems need to be compliant to what regulatory, With OpenSCAP you can always ensure the systems are configured according the the policy (or policies).

Organizations that need to be compliant according to a official policy will sooner or later facing an external security audit. I experienced that several times, its a nightmare. If you can proof that your systems are scanned regularly with the SCAP standard, you will be very well prepared, an external auditor will not bug you for a long time.

Abbreviations, abbreviations, abbreviations

Its obvious, government agencies love abbreviations 😉 Lets explain the two most important ones.

XCCDF

Extensible Configuration Checklist Description Format. This files, i.e. /usr/share/xml/scap/ssg/content/ssg-rhel7-xccdf.xml contain descriptions used for auditing a system against compliance to a policy.

This files are usually included in your distribution and are updated if needed.

OVAL

Open Vulnerability and Assessment Language. Its used to detect vulnerabilities and patches.

Since vulnerabilities and patches are popping up very quickly they need to be downloaded and distributed to all systems to be audited on a regular base (i.e. daily).

OVAL files can be downloaded as listed below:

Organizations using System Management Tools such as Red Hat Satellite or SUSE Magager will not profit from OVAL patch scans as those products will report which patches have been applied or not by themself. Nevertheless, additional OVAL scans add the benefit of vulnerability scanning regardless of installed patches.

More Abbreviations

More abbreviations and a short description of them can be found here: https://www.open-scap.org/resources/acronyms/

OpenSCAP Scap Security Guide (SSG)

There are a lot of regulations out there. Government of some countries releases policies and sometimes SCAP content for some Operating Systems, mostly RHEL and Windows. The SSG Project works on collecting and implementing content for this policies for the operating systems as well as for some other software such as JBoss. Included in the scap-security-guide are the most important US Government and PCI-DSS for RHEL. Only available for Debian at the moment is the content for the French ANSSI DAT-NT28.

The only Linux distributions I’m aware of that provides packages for scap-security-guide are RHEL and Fedora. However, upstream there is some content for more distributions available. I really hope that all important and fine distributions such as SLES, Debian and Ubuntu will jump on the bandwagon.

Regulations covered by OpenSCAP SSG

Here a list of what is available for the most important Linux distributions.

Red Hat Enterprise Linux 7

  • PCI-DSS (Payment Card Industry – Data Security Standard), Commercial – USA
  • C2S (Commercial Cloud Services), Government – USA
  • USGCB/STIG (United States Government Configuration Baseline/Security Technical Implementation Guide), Government – USA
  • CNSSI 1253 (Committee on National Security Systems), Government – USA
  • CJIS (Criminal Justice Information Services), Government – USA

Debian and Ubuntu

Officially there is nothing available. Its is currently under development, see https://github.com/OpenSCAP/scap-security-guide/tree/master/Ubuntu/16.04 and https://github.com/OpenSCAP/scap-security-guide/tree/master/Debian/8.

As of 2017-03-04 compiling fails.

  • ANSSI DAT-NT28 (Agence nationale de la sécurité des systèmes d’information), Government – France

Suse Linux Entrprise Server

Suse does not provide the scap-security-guide package and there is no XCCDF content for regulatory compliance checks delivered by Suse. However, some basic tests are available. It is not clear if Suse has some plans to join the scap-security-guide community, would be nice to see that. SLES customers can open a support case at https://scc.suse.com/login and ask for enhancement.

Using SCAP content without scap-security-guide

You can make use of SCAP content without the OpenSCAP security guide. Its rather complex and not covered in this article.

Installing the required packages

RHEL 7

[root@server ~]# yum -y install scap-security-guide

All required dependencies will be installed as well

Debian and Ubuntu

root@ubuntu:~# aptitude install python-openscap

All required dependencies will be installed as well

SLES12sp2

sles12sp2:~ # zypper install openscap openscap-content openscap-extra-probes openscap-utils

All required dependencies will be installed as well

Tailoring profiles

For most users it is probably too much to secure its systems according to military standards which includes turning off USB support and the like.

The most important civil regulatory by far is PCI-DSS. Each company handling kind of Credit- or Debitcard data must obey the current standard. As of writing this article this is version 3.2.

PCI-DSS is a de-facto standard in Enterprise Linux environments.

Of course it makes sense for all kind of companies to secure its systems. On systems which are not exposed, security policies can be more relaxed.

Also good to know is that some tests simply do not apply to your system. I.e. if you are using a centralized identity management software such as Redhat IdM with IPA or Microsoft Active Directory then the central instance will take care about the password policies, not the particular system to be audited.

Installation of the SCAP Workbench

The Scap Workbench is available in RHEL to be installed by yum, a binary for Windows and Mac OS is available as well. Needless to say that the source code is available.

Downloads: https://github.com/OpenSCAP/scap-workbench/releases

Usage

In the following examples, we disable the check for AIDE.

SCAP-Workbench Screencast

SCAP-Workbench Screencast

You can save the tailoring file as a single XML file or even better safe it as an RPM for easy distribution to all your systems.

Scanning

The usage is the same on all tested Linux distributions. Be aware, XCCDF scanning makes no sense w/o any SCAP content. If your distribution does not provide you the necessary data, 3rd party providers may.

RHEL 7 comes with the scap-workbench which is GUI that allows you to scan the local or remote systems via SSH. The scap-workbench is a nice tool to scan a handful of servers manually but not to scan a whole zoo of servers.

You also can scan your systems with the CLI on the host itself. Kind of automation can be done with i.e with Ansible.

Manual Scan

The oscap info command gives you an overview which profiles are available.

[root@server ~]# oscap info /usr/share/xml/scap/ssg/content/ssg-rhel7-xccdf.xml
Document type: XCCDF Checklist
Checklist version: 1.1
Imported: 2017-02-14T13:33:08
Status: draft
Generated: 2017-02-14
Resolved: true
Profiles:
        standard
        pci-dss
        C2S
        rht-ccp
        common
        stig-rhel7-workstation-upstream
        stig-rhel7-server-gui-upstream
        stig-rhel7-server-upstream
        ospp-rhel7-server
        nist-cl-il-al
        cjis-rhel7-server
Referenced check files:
        ssg-rhel7-oval.xml
                system: http://oval.mitre.org/XMLSchema/oval-definitions-5
        ssg-rhel7-ocil.xml
                system: http://scap.nist.gov/schema/ocil/2
        http://www.redhat.com/security/data/oval/Red_Hat_Enterprise_Linux_7.xml
                system: http://oval.mitre.org/XMLSchema/oval-definitions-5
[root@server ~]# 

Lets choose pci-dss and start a scan:

[root@server ~]# oscap xccdf eval --profile pci-dss --results scan.xml --report scan.html /usr/share/xml/scap/ssg/content/ssg-rhel7-xccdf.xml
Title   Ensure Red Hat GPG Key Installed
Rule    ensure_redhat_gpgkey_installed
Ident   CCE-26957-1
Result  pass

Title   Ensure gpgcheck Enabled In Main Yum Configuration
Rule    ensure_gpgcheck_globally_activated
Ident   CCE-26989-4
Result  pass
[Lot of Output immited]

The parameter –results saves the result in a HTML file.

Automated scanning with Redhat Satellite 6

Users of Redhat Satellite 6 can schedule scans of large server farms. The screenshots shows you how compliance tests can be presented to a IT Security Officer.

Compliance Report

Compliance Overview

The Compliance report shows a overview of hosts and a brief look at how many test have been failed.

Compliance Report Detail view

Compliance Report Detail view

The Compliance report detail shows which test have been failed. It also provides a description of each topic.

Host details

Host details

The detail view of a host shows that this host is not compliant. In this case, security errata must be applied and the host must be reconfigured to get compliant to the security policy.

Alternatives to OpenSCAP

There are a few alternatives to OpenSCAP as listed by the NIST’s Security Content Automation Protocol Validated Products.

Further reading

Configure SSSD to work on IPv6-only Hosts

SSSD is used for the client side of IPA and other centralized Identity Management Services. Unfortunately it does not behave as it should. The default is to look up first IPv4 addresses and if that fails IPv6 should be used. Well, if IPv4 fails, the whole request fails and you got weird error messages when joining an IPA domain.

As the pool for IPv4 addresses is depleted, IPv6 is getting more and more important. Thus, IPv6-only hosts are on the rise.

Here is an example error message from the IPA client.

[root@ipv6host ~]# ipa-client-install
[output ommited] 
SSSD enabled
Configured /etc/openldap/ldap.conf
Unable to find 'admin' user with 'getent passwd admin@example.com'!
Unable to reliably detect configuration. Check NSS setup manually.
[output ommited]

The host itself gets properly joined to the IPA domain and authentication works with Kerberos but you can not log in because SSSD fails.

Workaround

Configure SSSD to only use IPv6. This is done in /etc/sssd/sssd.conf

[domain/example.com]
lookup_family_order = ipv6_only
cache_credentials = True
krb5_store_password_if_offline = True
ipa_domain = example.com
id_provider = ipa
auth_provider = ipa
access_provider = ipa
ipa_hostname = ipv6host.example.com
chpass_provider = ipa
ipa_server = _srv_, ipa1.example.com
ldap_tls_cacert = /etc/ipa/ca.crt
[sssd]
services = nss, sudo, pam, ssh

domains = example.com
[nss]
homedir_substring = /home

[pam]

[sudo]

[autofs]

[ssh]

[pac]

[ifp]

Solution

At the moment there is no solution yet (just the workaround described), but its addressed at the SSSD project team, as you can see in https://pagure.io/SSSD/sssd/issue/2128 and https://bugzilla.redhat.com/show_bug.cgi?id=1021435

Happy IPv6-ing 🙂

Secure your system with SELinux

SELinux Logo

SELinux Logo

Introduction to SELinux

SELinux is well known as the most sophisticated Linux Mandatory Access Control (MAC) System. If you install any Fedora or Redhat operating System it is enabled by default and running in enforcing mode. So far so good.

Its available for many years and its not rocket science to use it. This article is supposed to give you some hints how to make your system even more secure and how to solve some troubles SELinux may have on your system.

DAC vs. MAC

Linux and traditional Unix systems are using DAC (Discretionary Access Control). Every user can change access rights to its own files. SELinux is a MAC (Mandatory Access Control) System where access rights are ruled by system wide policies. This can cause confusion when access is denied to a resource. Be aware that DAC will kick in before SELinux policies do. So if access to a resource is denied, please check access rights first. In such a case you will not see any AVC denials in your logs. The return code (EACCES) is the same.

RTFM

There is plenty of information available in the man pages. Some of the configuration file examples also contains additional information.

server:~# man -k selinux

Gives a good overview

Stick to Standards

Sofware installed from a RHEL or Fedora repository is usually not a problem at all, as long as you are using standards for config files, data, ports etc. Stick to the standards wherever possible. I.e. It does not make any sense to store websites in /opt instead of /var/www/html

Standards do not work for you?

If you can not stick to the standards for whatever reason, you can adjust a lot of settings with semanage.

Adding an allowed TCP Port for Apache

If you want to run your Apache httpd on port 8010, Apache will not start and a SELinux AVC denial is filed. To check which ports are allowed for Apache run:

server:~# semanage port -l|grep http_port_t
http_port_t                    tcp      80, 81, 443, 488, 8008, 8009, 8443, 9000
server:~# 

There is nothing like 8010

You can simply add port 8010 to the allowed ports by running

server:~# semanage port -a -t http_port_t 8010 -p tcp

Check again:

server:~# semanage port -l|grep http_port_t
http_port_t                    tcp      8010, 80, 81, 443, 488, 8008, 8009, 8443, 9000

Voilà!

Using a non-standard location for HTML files

Lets assume you want to store your HTML files in /opt/srv. To do so, you need to change the file context of that path and restore the file context afterwards.

server:~# semanage fcontext -a -t httpd_sys_rw_content_t '/opt/srv(/.*)?'
server:~# restorecon -R -v /opt/srv

Make use of Boolean variables

There are plenty of bool variables which simple allows to turn on or off a particular protection.

To get a list of defined bools, run

server:~# getsebool -a

You may want to pipe it to less or grep for a search pattern.

As an example, the default behavior is that a web application running in the httpd_t context will not be allowed to send emails. That helps greatly to prevent a vulnerable web application to send out SPAM. Well, if you want to operate a web mail service Apache must be able to send emails. No big deal:

server:~# setsebool -P httpd_can_sendmail on

Troubleshooting

The are some CLI (and GUI) tools available to troubleshoot AVC denials. The most important is sealert. Here is an example of an AVC because of a mislabled file in /var/www/html

sealert -a /var/log/audit/audit.log
SELinux is preventing /usr/sbin/httpd from getattr access on the file /var/www/html/test.html
*****  Plugin restorecon (99.5 confidence) suggests   ************************
If you want to fix the label. 
/var/www/html/test.html default label should be
httpd_sys_content_t.
Then you can run restorecon.
Do
# /sbin/restorecon -v /var/www/html/test.html

As you can see, sealert already provides you a hint how to fix the problem. In more complex cases, audit2why and audit2allow will help you. You simply grep for the misbehaving process:

server:~# grep httpd /var/log/audit/audit.log |audit2allow -m my_local_module

Review the result to check if it makes sense (ensure your grep statement does not catch too much). If you’re confident its okay, run the same command again with a capital M as parameter. It will create you a Local Policy Module which can be inserted:

server:~# grep httpd /var/log/audit/audit.log |audit2allow -M my_local_module
server:~# semodule -i my_local_module.pp

Temporary mitigation of SELinux troubles

If sealert and audit2allow can not immediately solve your problems and you quickly need to get your service up and running again, temporary put your system in permissive mode.

server:~# setenforce permissive

It will stay in pemissive mode until you reboot your system.

Permissive mode does not enforce the SELinux policies, it just logs AVC denials and help you to solve the problems without any service interruption. Be aware: This is a temporary quick fix, not a solution.

Put the affected domain only into permissive mode

If all your investigation did not help, all answers from support did not helped (very unlikely) you can put a particular domain into permissive mode. The rest of the policies are still in enforcing mode, your system still have some protection.

As an example, you can put the Apache module into permissive mode:

server:~# semanage permissive -a http_t

Hardening your System

Most people are not aware of the fact that when a system is in enforcing mode a malicious user with root access can manipulate policies or put SELinux into permissive mode.

There is a method to prevent this: Lock down your system

server:~# setsebool -P secure_mode_policyload on

Be aware: Once active nothing can not be changed during runtime, you need to reboot your system and provide selinux=1 enforcing=0 as grub boot parameter to be able to change any SELinux settings.

Have some fun!

Download “The SELinux Coloring Book” and learn more 🙂

Further reading

Have fun 🙂