KAFKA-3725; Update documentation with regards to XFS

I've updated the ops documentation with information on using the XFS filesystem, based on LinkedIn's testing (and subsequent switch from EXT4).

I've also added some information to clarify the potential risks of the suggested EXT4 options (again, based on my experience with a multiple-broker failure).

Author: Todd Palino <tpalino@linkedin.com>

Reviewers: Sriharsha Chintalapani <harsha@hortonworks.com>, Dana Powers <dana.powers@gmail.com>, Ismael Juma <ismael@juma.me.uk>

Closes #1605 from toddpalino/trunk

(cherry picked from commit e0eaa7f12e)
Signed-off-by: Ismael Juma <ismael@juma.me.uk>
Todd Palino 2016-07-11 08:51:55 +01:00 committed by Ismael Juma
parent 4c502ed83d
commit bc805bf2a6
1 changed files with 16 additions and 4 deletions

@ -516,10 +516,22 @@ Using pagecache has several advantages over an in-process cache for storing data
<li>It automatically uses all the free memory on the machine
</ul>
<h4><a id="ext4" href="#ext4">Ext4 Notes</a></h4>
Ext4 may or may not be the best filesystem for Kafka. Filesystems like XFS supposedly handle locking during fsync better. We have only tried Ext4, though.
<p>
It is not necessary to tune these settings; however, those wanting to optimize performance have a few knobs that will help:
<h4><a id="filesystems" href="#filesystems">Filesystem Selection</a></h4>
<p>Kafka uses regular files on disk, and as such it has no hard dependency on a specific filesystem. The two most widely used filesystems, however, are EXT4 and XFS. Historically, EXT4 has been more common, but recent improvements to the XFS filesystem have shown it to have better performance characteristics for Kafka's workload with no compromise in stability.</p>
<p>Comparison testing was performed on a cluster with significant message loads, using a variety of filesystem creation and mount options. The primary Kafka metric monitored was "Request Local Time", which indicates how long append operations were taking. XFS resulted in much better local times (160ms vs. 250ms+ for the best EXT4 configuration), as well as lower average wait times. XFS also showed less variability in disk performance.</p>
<h5><a id="generalfs" href="#generalfs">General Filesystem Notes</a></h5>
For any filesystem used for data directories on Linux systems, the following mount options are recommended:
<ul>
<li>noatime: This option disables updating of a file's atime (last access time) attribute when the file is read. This can eliminate a significant number of filesystem writes, especially in the case of bootstrapping consumers. Kafka does not rely on the atime attributes at all, so it is safe to disable this.</li>
</ul>
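One way to confirm that a data directory is actually mounted with noatime is to inspect the mount table. The following is a minimal sketch, not part of Kafka: the function names, the sample device names, and the <code>/kafka-data</code> mount point are assumptions made for illustration. It parses text in the <code>/proc/mounts</code> format (device, mount point, filesystem type, options, dump, pass).

```python
def mount_options(mounts_text, mount_point):
    """Return the set of mount options for mount_point, or None if it is not listed.

    mounts_text is expected to be in /proc/mounts format:
    device mount_point fstype options dump pass
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            # If a mount point appears more than once, the last entry wins.
            found = set(fields[3].split(","))
    return found


def has_noatime(mounts_text, mount_point):
    """True only if mount_point is listed and its options include noatime."""
    opts = mount_options(mounts_text, mount_point)
    return opts is not None and "noatime" in opts


# Hypothetical mount table used only for illustration.
SAMPLE_MOUNTS = (
    "/dev/sda1 / ext4 rw,relatime 0 0\n"
    "/dev/sdb1 /kafka-data xfs rw,noatime,attr2,inode64,noquota 0 0\n"
)
```

On a real Linux host, the text could be read with <code>open("/proc/mounts").read()</code> and checked against each directory's mount point.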
<h5><a id="xfs" href="#xfs">XFS Notes</a></h5>
The XFS filesystem has a significant amount of auto-tuning in place, so it does not require any change in the default settings, either at filesystem creation time or at mount. The only tuning parameters worth considering are:
<ul>
<li>largeio: This affects the preferred I/O size reported by the stat call. While this can allow for higher performance on larger disk writes, in practice it had minimal or no effect on performance.</li>
<li>nobarrier: For underlying devices that have battery-backed cache, this option can provide a little more performance by disabling periodic write flushes. However, if the underlying device is well-behaved, it will report to the filesystem that it does not require flushes, and this option will have no effect.</li>
</ul>
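Since XFS needs no tuning beyond the general recommendation, an <code>/etc/fstab</code> entry for a data directory can stay very short. The sketch below only formats such an entry as a string; the helper name, the device <code>/dev/sdb1</code>, and the mount point <code>/kafka-data</code> are hypothetical, and the option list is an assumption based on the notes above (<code>defaults</code> plus noatime).

```python
def fstab_entry(device, mount_point, fstype, options, dump=0, fsck_pass=0):
    """Format one /etc/fstab line: device, mount point, type, options, dump, pass."""
    if not options:
        raise ValueError("at least one mount option is required (e.g. 'defaults')")
    return "{} {} {} {} {} {}".format(
        device, mount_point, fstype, ",".join(options), dump, fsck_pass
    )


# Hypothetical device and mount point; XFS defaults plus noatime.
XFS_ENTRY = fstab_entry("/dev/sdb1", "/kafka-data", "xfs", ["defaults", "noatime"])
```

The optional XFS parameters discussed above would simply be appended to the options list if testing on a particular system showed a benefit.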
<h5><a id="ext4" href="#ext4">EXT4 Notes</a></h5>
EXT4 is a serviceable choice of filesystem for the Kafka data directories; however, getting the most performance out of it requires adjusting several mount options. In addition, these options are generally unsafe in a failure scenario and will result in much more data loss and corruption. For a single broker failure, this is not much of a concern, as the disk can be wiped and the replicas rebuilt from the cluster. In a multiple-failure scenario, such as a power outage, this can mean underlying filesystem (and therefore data) corruption that is not easily recoverable. The following options can be adjusted:
<ul>
<li>data=writeback: Ext4 defaults to data=ordered, which puts a strong order on some writes. Kafka does not require this ordering, as it does very paranoid data recovery on all unflushed log data. This setting removes the ordering constraint and seems to significantly reduce latency.</li>
<li>Disabling journaling: Journaling is a tradeoff: it makes reboots faster after server crashes, but it introduces a great deal of additional locking, which adds variance to write performance. Those who don't care about reboot time and want to reduce a major source of write latency spikes can turn off journaling entirely.</li>