Tuesday, June 13, 2017

NVMe: Officially faster for emulated controllers!

The Doorbell Buffer Config command

When I last wrote about NVMe, the feature to improve NVMe performance in emulated environments was just an ongoing discussion and a work-in-progress patch. It has now been officially released in the NVMe Specification Revision 1.3 under the name "Doorbell Buffer Config command", along with an implementation that is already in the mainline Linux Kernel! \o/

You can already feel the difference in performance if you compile Kernel 4.12-rc1 (or later) and run it on a virtual machine hosted on Google Compute Engine. Google actually updated their hypervisor as soon as the feature was ratified by the NVMe working group, even before it was publicly released.

There were very few changes from the original proposal, namely the opcodes, the return values and some fancier names: the buffers (as described in my last post) are now called the Shadow Doorbell and EventIdx buffers.

In short, the Shadow Doorbell buffer mirrors the Doorbell registers in memory, allowing the emulated controller to fetch the Doorbell value when convenient instead of waiting for the Doorbell register to be written. The EventIdx buffer, for its part, carries a hint from the emulated controller that tells the host whether the Doorbell register still needs to be written (in case the emulated controller is not fetching the Doorbell value from the Shadow Doorbell buffer). You can check section 7.13 of the specification for an example of usage.
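
To make the interaction between the two buffers more concrete, here is a minimal Python sketch of the host-side decision. It is only an illustration under stated assumptions, not the kernel's code: the class and function names are hypothetical, indices are treated as plain 16-bit wrapping counters, and the comparison is the virtio-style "did the new value pass the event index?" check, which I am assuming is a fair model of the idea described above.

```python
MASK16 = 0xFFFF  # assumption: indices behave as 16-bit wrapping counters


def need_event(event_idx: int, new_idx: int, old_idx: int) -> bool:
    """True if moving the doorbell from old_idx to new_idx stepped past event_idx.

    Virtio-style comparison, done modulo 2**16 so it survives wraparound.
    """
    return ((new_idx - event_idx - 1) & MASK16) < ((new_idx - old_idx) & MASK16)


class ShadowDoorbell:
    """Hypothetical model of one queue's Shadow Doorbell / EventIdx pair."""

    def __init__(self) -> None:
        self.shadow = 0       # Shadow Doorbell entry: written by host, read by controller
        self.event_idx = 0    # EventIdx entry: written by controller, read by host
        self.mmio_writes = 0  # counts the expensive Doorbell register writes

    def ring(self, new_tail: int) -> bool:
        """Publish a new tail; return True if the real register had to be written."""
        old = self.shadow
        self.shadow = new_tail & MASK16  # always update the in-memory copy
        if need_event(self.event_idx, self.shadow, old):
            self.mmio_writes += 1        # controller asked to be notified here
            return True
        return False                     # controller is polling the shadow buffer


db = ShadowDoorbell()
first = db.ring(1)    # controller not polling yet: register write needed
second = db.ring(2)   # controller polling, EventIdx left behind: no write
third = db.ring(3)
db.event_idx = 3      # controller goes idle and asks to be notified past index 3
fourth = db.ring(4)   # register write needed again
print(first, second, third, fourth, db.mmio_writes)
```

In this model only two of the four submissions touch the Doorbell register; the other two are picked up from memory. Skipping those register writes is what avoids the costly traps into the hypervisor.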


The following test results were obtained on a machine of type n1-standard-4 (4 vCPUs, 15 GB memory) on Google Compute Engine, running Kernel 4.12.0-rc5, using the following command:

$ sudo fio --time_based --name=benchmark --runtime=30 \
--filename=/dev/nvme0n1 --nrfiles=1 --ioengine=libaio --iodepth=32 \
--direct=1 --invalidate=1 --verify=0 --verify_fatal=0 --numjobs=1 \
--rw=randread --blocksize=4k --randrepeat=0

Results (in Input/Output Operations per Second):
Without Shadow Doorbell and EventIdx buffers: 43.9K IOPS
With Shadow Doorbell and EventIdx buffers: 184K IOPS
Gain: roughly 4.2x (184K / 43.9K)

Screenshot - Without Shadow Doorbell and EventIdx buffers

Screenshot - With Shadow Doorbell and EventIdx buffers

Enjoy your enhanced numbers of IOPS! :D

Wednesday, May 3, 2017

Collabora Contributions to Linux Kernel 4.11

Linux Kernel v4.11 was released, and 9 different Collabora developers contributed a total of 44 patches, an increase of 5 patches from version 4.10. The majority of Collabora's work this time was around fixes and cleanups in DRM. In addition to our contributions as authors, Collabora also added 22 Reviewed-by tags for patches reviewed by our engineers. You can learn more about the v4.11 merge window in's extensive coverage: part 1, part 2 and part 3.

Now here is a look at the specific changes made by Collaborans. To begin with, Enric Balletbo fixed an issue and improved documentation of IIO regarding sensors and also added support for several buses and peripherals for the Toby-Churchill SL50 board. Romain Perier added an ASoC machine driver for Rockchip rk3288-based boards that have an HDMI and analog audio output, and also added support for slave mode in the Everest Semi ES8328 audio codec, while including ES8388 as a compatible device in the ES8328's codec driver.

For his part, Tomeu Vizoso fixed the sink display error in DRM EDID when no deep color is available for Rotel RSX-1058, fixed several issues and cleaned up code regarding CRC in DRM, and integrated the new CRC debugfs API in i915. Gabriel Krisman Bertazi, who recently wrote a great article on tracing the interactions between user space and the operating system, made several improvements to DRM by adding documentation, cleaning up the code and fixing several issues, including allowing QXL to build when FBDEV_EMULATION is disabled.

Lastly, Daniel Stone, Collabora's Graphics Lead, fixed an important issue in DRM regarding the use of Atomic State in legacy ioctls, while Fabien Lahoudere cleaned up the code for the Epson RTC, removing an unnecessary spinlock, and Robert Foss fixed a copy of uninitialized memory in the qed Ethernet code.

