Force OpenSSL AES-NI usage on a VPS without the AES CPU flag passthrough for Tor and other apps

Initially published on 2017-08-22

Turns out it is possible to force OpenSSL to use CPU AES acceleration even if it doesn't detect the “aes” CPU flag.

Many VPS hosts configure their hypervisors in a way that does not pass the flag through to VPSes, even though all their host nodes surely have CPUs with AES-NI support. In my experience, hosts have not been forthcoming about reconfiguring their systems to add that flag passthrough.

But it turns out we can force OpenSSL to believe AES is supported, even if the CPU does not report it. This can be done with the “OPENSSL_ia32cap” environment variable. Searching around, all I found were scenarios using it to disable AES-NI (for testing), e.g.:

As far as I can tell, the syntax used there clears (masks out) the given bits from the real flag values, e.g. OPENSSL_ia32cap="~0x200000200000000" disables AES-NI (together with PCLMULQDQ). But what if you need to force-enable it? It turns out the syntax that works for that is simply:

  • OPENSSL_ia32cap="+0x200000200000000"
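To see what that mask actually contains, here is a small bash sketch that lists the bit positions set in an OPENSSL_ia32cap value. The function name `ia32cap_bits` is just an illustration. According to the OPENSSL_ia32cap documentation, the upper 32 bits of the first 64-bit word mirror the ECX register of CPUID leaf 1. That makes bit 57 AES-NI (ECX bit 25) and bit 33 PCLMULQDQ (ECX bit 1), which GCM uses for its GHASH part.

```shell
# List the bit positions that are set in a 64-bit OPENSSL_ia32cap value.
# A leading "~" or "+" is stripped first: it only affects how OpenSSL
# applies the value, not which bits the number names.
ia32cap_bits() {
    local v=${1#'~'}
    v=${v#+}
    local i=0 out=""
    while [ "$i" -lt 64 ]; do
        if [ $(( ($v >> i) & 1 )) -eq 1 ]; then
            out="$out $i"
        fi
        i=$((i + 1))
    done
    echo "${out# }"
}

ia32cap_bits "~0x200000200000000"   # prints: 33 57
```

So 0x200000200000000 is exactly 2^57 + 2^33: the AES-NI and PCLMULQDQ bits and nothing else.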

2023-02 update

A reader of this article has sent in the following update:

“The OPENSSL_ia32cap="+0x200000200000000" environment variable no longer works on recent OpenSSL versions. The plus sign has to be removed.

In fact, it never worked as you probably intended. OpenSSL only ever supported the tilde (~) to remove or mask some bits, never the plus sign to add or enable feature bits. It happened to work in previous versions because they used strtoul(), which accepts a leading '+' sign. The value after the plus sign overwrites the cpuid value instead of adding to it. However, they changed to a custom parser in commit and thus '+' is no longer supported.

I am not sure about the full consequences of setting cpuid to 0x200000200000000, because it apparently removes MMX, SSE(2) and many other very basic features, although AES-NI is indeed enabled. More recent OpenSSL versions (like 3.0) print the full ia32cap value when running “openssl speed”. Maybe it is a better idea to override it with a value taken from a realistic CPU model.”
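Following the reader's suggestion, here is a bash sketch that starts from a full ia32cap value and ORs in only the AES-NI (57) and PCLMULQDQ (33) bits, keeping every other feature bit. The input could be the value that, per the reader, OpenSSL 3.x prints in `openssl speed` output; check what your version actually prints. The helper name `ia32cap_with_aes` is hypothetical. The result has no "+" prefix, so it is a plain replacement value of the kind the reader says recent parsers accept.

```shell
# Add the AES-NI (bit 57) and PCLMULQDQ (bit 33) bits to an existing
# OPENSSL_ia32cap value, keeping all other feature bits as they are.
# Accepts "0xWORD1" or "0xWORD1:0xWORD2"; only the first word is changed.
ia32cap_with_aes() {
    local first=${1%%:*} rest=""
    case $1 in
        *:*) rest=":${1#*:}" ;;
    esac
    # Bit 57 = CPUID leaf 1 ECX bit 25 (AES), bit 33 = ECX bit 1 (PCLMULQDQ).
    printf '0x%x%s\n' "$(( $first | (1 << 57) | (1 << 33) ))" "$rest"
}
```

For example, `ia32cap_with_aes 0xff:0x0` prints `0x2000002000000ff:0x0`. Because the detected bits are carried over, MMX, SSE and the rest are not wiped the way they are with a bare 0x200000200000000.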

Original text

Let's take one VPS box with the aforementioned problem.

# cat /proc/cpuinfo 
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 13
model name	: QEMU Virtual CPU version 1.5.3
stepping	: 3
microcode	: 0x1
cpu MHz		: 1695.729
cache size	: 4096 KB
physical id	: 0
siblings	: 1
core id		: 0
cpu cores	: 1
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 4
wp		: yes
flags		: fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm rep_good nopl eagerfpu pni cx16 hypervisor lahf_lm
bugs		:
bogomips	: 3391.45
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:

As you can see, “aes” is not present in the flags list.
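The same check can be scripted with a small bash sketch. `has_cpu_flag` is a hypothetical helper. Note that it only sees the flags the kernel reports, which is exactly what the hypervisor hides here. It says nothing about what the physical CPU can actually do.

```shell
# Succeed if the given CPU flag appears as a whole word on a "flags" line.
# The optional second argument (default /proc/cpuinfo) makes it easy to
# try the function against a saved copy of the file.
has_cpu_flag() {
    grep '^flags' "${2:-/proc/cpuinfo}" | grep -qw -- "$1"
}

if has_cpu_flag aes; then
    echo "aes flag present"
else
    echo "aes flag missing"
fi
```

The `-w` match keeps, for example, "lm" from matching inside "lahf_lm", or "aes" inside "vaes".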

Testing OpenSSL speed in the default configuration.

# openssl speed -elapsed -evp aes-128-gcm
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-gcm      32332.17k    40365.80k    42265.77k    42376.19k  43196.42k    43401.22k

And now we force AES hardware acceleration usage:

# OPENSSL_ia32cap="+0x200000200000000" openssl speed -elapsed -evp aes-128-gcm
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-gcm      59047.07k   104467.39k   141712.21k   151093.93k   158321.32k   156827.65k

Roughly 3.6 times faster for 1 KB and larger blocks (about 1.8 times for 16-byte blocks)! For Tor this leads to higher bandwidth utilization and lower CPU usage (which will actually make your VPS host happier :)
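To put a number on the difference for each block size, here is a small sketch in bash and awk. It divides each column of the second `openssl speed` result line by the first. `speedup` is a hypothetical helper. awk reads "32332.17k" as 32332.17 because it converts only the leading numeric part of a string.

```shell
# Print the per-column ratio AFTER/BEFORE for two "openssl speed" result
# lines. Field 1 is the cipher name; the remaining fields are throughputs
# with a "k" suffix, which awk ignores when converting them to numbers.
speedup() {
    awk -v before="$1" -v after="$2" 'BEGIN {
        n = split(before, b)
        split(after, a)
        out = ""
        for (i = 2; i <= n; i++)
            out = out (i > 2 ? " " : "") sprintf("%.2f", a[i] / b[i])
        print out
    }'
}

speedup "aes-128-gcm      32332.17k    40365.80k    42265.77k    42376.19k  43196.42k    43401.22k" \
        "aes-128-gcm      59047.07k   104467.39k   141712.21k   151093.93k   158321.32k   156827.65k"
# prints: 1.83 2.59 3.35 3.57 3.67 3.61
```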

But how to apply that to Tor? Well, for some initial testing I simply added the following line to /etc/init.d/tor (not using systemd here), right after the "#! /bin/bash" line:

  • export OPENSSL_ia32cap="+0x200000200000000"

(but this change will be overwritten by the next Tor package upgrade).
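To confirm that the running daemon actually picked up the variable after a restart, you can read its environment from /proc/PID/environ, where entries are NUL-separated. Here is a bash sketch. `show_ia32cap` is a hypothetical helper. Reading another user's environ file normally requires root, and the `pidof tor` example assumes there is exactly one tor process.

```shell
# Print the OPENSSL_ia32cap entry (if any) from a /proc/PID/environ file.
# Entries in that file are separated by NUL bytes, so turn them into lines.
show_ia32cap() {
    tr '\0' '\n' < "$1" | grep '^OPENSSL_ia32cap='
}

# Example (as root, assuming a single tor process):
#   show_ia32cap "/proc/$(pidof tor)/environ"
```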

It seems to work just fine: my CPU usage is half of what it was before at similar bandwidth levels, so now there is headroom to ramp up further.

A word of caution: first, always test as shown above to verify that it works. If the underlying CPU actually doesn't support AES, all programs trying to use it (including `openssl speed`) will crash outright with an “Illegal instruction” error. So there is no risk of silently weakening the encryption strength or security of Tor or other apps you try this on.
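That check can also be scripted. When a process is killed by a signal, bash reports its exit status as 128 plus the signal number. An "Illegal instruction" crash (SIGILL, signal 4 on Linux) therefore shows up as 132. The bash sketch below uses the hypothetical helper name `ia32cap_try`. It returns 0 if the command ran fine, 1 if it died from SIGILL, and 2 for any other failure.

```shell
# Run a command with a candidate OPENSSL_ia32cap value and classify the result:
#   0 = command succeeded, 1 = killed by SIGILL ("Illegal instruction"),
#   2 = failed for some other reason.
ia32cap_try() {
    local value=$1 status
    shift
    OPENSSL_ia32cap=$value "$@" > /dev/null 2>&1
    status=$?
    case $status in
        0)   return 0 ;;
        132) return 1 ;;  # 128 + SIGILL (signal 4 on Linux)
        *)   return 2 ;;
    esac
}

# Example: check a value before putting it into a service script.
#   ia32cap_try "+0x200000200000000" openssl speed -elapsed -evp aes-128-gcm
```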

Also, even if it works now, it may stop working down the line if the host migrates your VPS to a node with an older CPU that doesn't support AES. But migrations of customers between nodes do not happen very often, and in fact all CPUs used in hosting environments today should support AES, as it was introduced in server (and even desktop) CPUs long ago: with Intel's Westmere generation in 2010 and AMD's Bulldozer in 2011.

Force AES-NI usage in the Linux kernel

It is also possible to force usage of the AES-NI extensions within the kernel, to help with things like full disk encryption. The feature detection flag can be overridden with:

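#include <linux/bitops.h>
#include <asm/processor.h>   /* declares boot_cpu_data */
/* bit 153 = 4*32 + 25: X86_FEATURE_AES (word 4 = CPUID leaf 1 ECX, bit 25) */
set_bit(153, (unsigned long *)(boot_cpu_data.x86_capability));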

See this post for details:

force-enable-openssl-aes-ni-usage.txt · Last modified: 2023-02-27 16:57 UTC by rm