SOLVED: Performance Issues With FreeBSD ZFS Backed ESXi Storage Over NFS

Update November 14th, 2016: This is an old article, but my recommendation to hack the NFS file still stands, even given how inexpensive small SSDs are. Unfortunately, an SSD ZIL still delivers low performance with ESXi/NFS. We have a dual SSD ZIL setup on this file server, and without the NFS hack we still only see 50 MiB/sec writes. We now have 10G fiber, so this is in contrast to 650 MiB/sec reads, too.

There is a special issue when using ZFS-backed NFS for a Datastore under ESXi.

The problem is that the ESXi NFS client forces a commit/cache flush after every write. This makes sense in the context of what ESXi does, as it wants to be able to reliably inform the guest OS that a particular block was actually written to the underlying physical disk. However, with ZFS, these writes and cache flushes trigger ZIL log entries.

The end result is that the ZFS array will end up doing a massively disproportionate amount of writing to the ZIL, and throughput will suffer (I was seeing under 1 MiB/sec on Gigabit Ethernet!).

Performance Benchmarking

Here are the results of testing the various workarounds. As you can see, modifying the kernel is the clear winner. It also has minimal side effects compared to the other options.

Method	Read Speed	Read Latency	Write Speed	Write Latency
NFS Kernel Mod	67 MiB/sec	341 ms	110 MiB/sec	153 ms
zfs set sync=disabled	69 MiB/sec	198 ms	69 MiB/sec	1628 ms
cache_flush_disable="1"	67 MiB/sec	760 ms	16 MiB/sec	1543 ms

* Tested with dedicated 1 Gbit Ethernet interconnect.

Here are the four solutions:

IDEAL: Hack the NFS Subsystem

This makes the kernel ignore NFS clients’ requests to commit to disk, so a request from ESXi (or any other NFS client) to commit/flush the cache is never passed along to the file system.

This, in my view, is the ideal. If you have UPS power there is very little risk here.

Per this article we’re going to modify nfs_nfsdport.c: http://christopher-technicalmusings.blogspot.com/2011/06/speeding-up-freebsds-nfs-on-zfs-for-esx.html

vi /usr/src/sys/fs/nfsserver/nfs_nfsdport.c

Search for NFSWRITE_UNSTABLE and find this block:

if (stable == NFSWRITE_UNSTABLE)
  ioflags = IO_NODELOCKED;
else
  ioflags = (IO_SYNC | IO_NODELOCKED);
uiop->uio_resid = retlen;
uiop->uio_rw = UIO_WRITE;

And change it to:

// if (stable == NFSWRITE_UNSTABLE)
ioflags = IO_NODELOCKED;
// else
// ioflags = (IO_SYNC | IO_NODELOCKED);
uiop->uio_resid = retlen;
uiop->uio_rw = UIO_WRITE;

Then recompile the kernel. Remember that this needs to be redone after running freebsd-update or whenever you update /usr/src.
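A minimal sketch of the rebuild step, assuming sources in /usr/src and the stock GENERIC kernel configuration (both are assumptions; substitute your own KERNCONF). The helper names patch_applied, run and rebuild_kernel are made up for illustration. patch_applied only looks for the commented-out line exactly as shown above, so it is a reminder after a source update rather than proof that the running kernel was built from the patched file.

```shell
#!/bin/sh
# Sketch only: check that the NFS patch is still in the source tree, then rebuild.
# SRC_DIR, KERNCONF and the helper names are illustrative assumptions.
SRC_DIR="${SRC_DIR:-/usr/src}"
KERNCONF="${KERNCONF:-GENERIC}"
NFS_SRC="${NFS_SRC:-$SRC_DIR/sys/fs/nfsserver/nfs_nfsdport.c}"

# Succeeds if the file still contains the commented-out "if" from the patch.
patch_applied() {
    grep -q '// if (stable == NFSWRITE_UNSTABLE)' "$1"
}

# Run a command, or just print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Build and install the kernel; a reboot is needed afterwards.
rebuild_kernel() {
    run make -C "$SRC_DIR" buildkernel KERNCONF="$KERNCONF" &&
    run make -C "$SRC_DIR" installkernel KERNCONF="$KERNCONF"
}
```

Typical use would be `patch_applied "$NFS_SRC" && rebuild_kernel`, followed by a reboot. Running `DRY_RUN=1 rebuild_kernel` first prints the two make commands without executing them.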

The Other Options

There are other solutions, and for completeness’ sake here they are (and why I think the above solution is better):

SSD ZIL Disks

For this you optimally want two SSDs (mirrored for redundancy) to hold your ZIL instead of the array disks themselves.
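For reference, attaching a mirrored log device might look like the sketch below. The pool name defaults to zroot (the name used elsewhere in this article), and the device names in the usage line are placeholders, not the disks from my server. add_mirrored_log is a made-up helper name; the underlying command is the standard `zpool add <pool> log mirror <dev1> <dev2>`.

```shell
#!/bin/sh
# Sketch only: attach two SSDs as a mirrored dedicated log (ZIL) device.
# POOL and the device names passed in are placeholders.
POOL="${POOL:-zroot}"
ZPOOL_CMD="${ZPOOL_CMD:-zpool}"

add_mirrored_log() {
    # usage: add_mirrored_log <ssd1> <ssd2>
    if [ "$#" -ne 2 ]; then
        echo "usage: add_mirrored_log <ssd1> <ssd2>" >&2
        return 2
    fi
    "$ZPOOL_CMD" add "$POOL" log mirror "$1" "$2"
}
```

For example, `add_mirrored_log ada4 ada5`. Afterwards `zpool status` should list the two devices under a separate "logs" section for the pool.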

Especially when you consider that writing is what wears out SSDs, I think this is a poor solution: there will still be many excessive writes, they will just be faster.

Disable the ZIL Entirely

This is a pretty blunt solution, but a quick and easy temporary fix. Run this against the pool:

zfs set sync=disabled zroot

This turns off sync forcing/cache flushing for the entire file system. Some say this can lead to underlying ZFS corruption and cry wolf, but per this article I do not believe that is the case: https://blogs.oracle.com/roch/entry/nfs_and_zfs_a_fine

What it does say, though, is that you can end up with NFS client corruption (in the form of inconsistency). This may be so, but remember that the guest file system itself (e.g. NTFS or UFS) also has protections built in, which can help mitigate these things.

And of course if everything is UPS backed (and nothing panics) this is even less of an issue.

I used this method temporarily until I made the NFS change and experienced no problems, but I dislike how this affects “everything” including native writes, Samba, etc.
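One way to narrow the blast radius is to set the property on just the dataset exported to ESXi rather than on the whole pool, since sync is an ordinary per-dataset ZFS property. The sketch below assumes a dataset called zroot/esxi, which is a hypothetical name and not something from my setup; set_sync and show_sync are made-up helper names wrapping `zfs set` and `zfs get`.

```shell
#!/bin/sh
# Sketch only: apply sync=disabled to a single dataset instead of the pool.
# "zroot/esxi" in the usage note is a hypothetical dataset name.
ZFS_CMD="${ZFS_CMD:-zfs}"

set_sync() {
    # usage: set_sync <standard|always|disabled> <dataset>
    case "$1" in
        standard|always|disabled) ;;
        *) echo "invalid sync value: $1" >&2; return 2 ;;
    esac
    "$ZFS_CMD" set "sync=$1" "$2"
}

show_sync() {
    # -r also lists child datasets, so inherited values are visible
    "$ZFS_CMD" get -r sync "$1"
}
```

For example, `set_sync disabled zroot/esxi` followed by `show_sync zroot` to confirm that other datasets remain at the default, and `set_sync standard zroot/esxi` to revert.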

Setting vfs.zfs.cache_flush_disable="1" in /boot/loader.conf

I think this is an older “solution” from the 8.x days, and the sync=disabled option supersedes it. I found that while it did improve performance by a factor of 15, that only meant 15 MiB/sec writes, which I still consider unacceptable. The “risks” are similar to those of sync=disabled above, which performs much better.
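If you want to try it anyway, the sketch below adds the tunable to loader.conf only when it is not already there. ensure_tunable is a made-up helper name, and LOADER_CONF can be pointed at a scratch file to test it first.

```shell
#!/bin/sh
# Sketch only: append a loader tunable once, without duplicating it.
# ensure_tunable is an illustrative helper, not a FreeBSD tool.
LOADER_CONF="${LOADER_CONF:-/boot/loader.conf}"

ensure_tunable() {
    # usage: ensure_tunable <name> <value>
    if grep -q "^$1=" "$LOADER_CONF" 2>/dev/null; then
        echo "$1 is already set in $LOADER_CONF" >&2
        return 0
    fi
    printf '%s="%s"\n' "$1" "$2" >> "$LOADER_CONF"
}
```

For example, `ensure_tunable vfs.zfs.cache_flush_disable 1`. Loader tunables are read at boot, so the change takes effect after a reboot.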

5 Responses to “SOLVED: Performance Issues With FreeBSD ZFS Backed ESXi Storage Over NFS”

Crouse October 22nd, 2014

For ZIL, if you’re worried about SSD failures (although that’s why you use two…) you can use these: http://www.hgst.com/solid-state-storage/enterprise-ssd/sas-ssd/zeusram-sas-ssd Rather expensive, but they work very well.

- Adam Strohl November 3rd, 2014
  
  Very slick, thanks!
  
Jimmy Koerting July 8th, 2015

Adam, I wonder if I understood this at all 🙂

I guess my setup is the other way around: a linux storage (nfs4 server), a xen dom0 server (centOS) and a freebsd VM. This freebsd VM has a ufs2 root and a zfs mount where the active jail is hosted.
Am I right that it should be no problem to disable zil (sync), as the nfs is not the layer writing into the zfs, so there is no risk of data loss from this point?

Would be great to get your view about this!

- Adam Strohl July 9th, 2015
  
  Hey Jimmy,
  
  My experience with the issue is specific to FreeBSD as the NFS server with ZFS, but as you may gather the underlying issue is caused by ESXi triggering the “flush” action when writing to the NFS server.
  
  Xen likely does the same, however Linux’s (in your case CentOS) ext3/ext4 file system doesn’t react to this as severely as ZFS does. Nor does FreeBSD’s UFS (the ‘native’ file system of FreeBSD), which is ultimately what you’re writing to, correct?
  
  That being said, how are you doing ZFS inside Xen? Virtual disks on the NFS server, or pass-through directly to devices? Can you show me the ‘zpool status’ output from the FreeBSD server?
  
Christian P September 5th, 2016

Adam:

Thanks a lot for your article. We changed the file and recompiled FreeNAS 9.10. We’re now getting 90+ MB/s transfer speeds with ESXi.
