Tips and Recommendations for Storage Server Tuning
Here are some tips and recommendations on how to improve the performance
of your storage servers. As usual, the optimal settings depend on your
particular hardware and usage scenarios, so you should use these
settings only as a starting point for your tuning efforts.
Note: Some of the settings suggested here are non-persistent and will be reverted after the next reboot. To keep them permanently, you could add the corresponding commands to /etc/rc.local, use /etc/sysctl.conf, or create udev rules to reapply them automatically when the machine boots.
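For example, a minimal /etc/rc.local sketch that reapplies one of the non-persistent settings from the IO scheduler section below could look like this (sdX is a placeholder for your actual device):
$ cat /etc/rc.local
#!/bin/bash
# Reapply non-persistent tuning settings after boot
echo deadline > /sys/block/sdX/queue/scheduler
exit 0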
Partition Alignment & RAID Settings of Local File System
To get the maximum performance out of your RAID arrays and SSDs, it is important to set the partition offset according to the
native alignment. See the following guide for a walk-through of partition alignment and the creation of a
RAID-optimized local file system:
Partition Alignment Guide
A simple and commonly used way to avoid alignment problems altogether is to skip partitioning
and create the file system directly on the device, e.g.:
$ mkfs.xfs /dev/sdX
Storage Server Throughput Tuning
In general, BeeGFS can be used with any of the standard Linux file systems.
Using
XFS for your storage server data partition is
generally recommended, because it scales very well for RAID arrays and
typically delivers a higher sustained write throughput on fast storage,
compared to alternative file systems. (There have also been significant
improvements to ext4 streaming performance in recent Linux kernel
versions.)
However, the default Linux kernel settings are optimized for
single-disk scenarios with low IO concurrency, so various
settings need to be tuned to get the maximum performance out of
your storage servers.
Formatting Options
Make sure to enable the
RAID optimizations of the underlying file system, as described in the last section of the following guide:
Create RAID-optimized File System
While BeeGFS uses dedicated metadata servers to manage global metadata, the
metadata performance
of the underlying file system on storage servers still matters for
operations like file creates, deletes, small reads/writes, etc. Recent
versions of XFS (similar work in progress for ext4) allow inlining of
data into inodes to avoid the need for additional blocks and the
corresponding expensive extra disk seeks for directories. In order to
use this efficiently, the inode size should be increased to 512 bytes or
larger.
Example: mkfs for XFS with larger inodes on 8
disks (where the number 8 does not include RAID-5 or
RAID-6 parity disks) and a 128KB chunk size:
$ mkfs.xfs -d su=128k,sw=8 -l version=2,su=128k -i size=512 /dev/sdX
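After formatting and mounting, you can verify that the stripe geometry (sunit/swidth) and the inode size (isize=512) were applied as intended; this is just a sanity check, and the exact output format depends on your xfsprogs version:
$ xfs_info <mountpoint>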
Mount Options
Enabling last file access times is inefficient, because
it forces the file system to write an updated timestamp to disk
even though the user only read file contents, and even when the
contents were already cached in memory and no disk access would
have been necessary at all. (Note: Recent Linux kernels have
switched to a "relative atime" mode by default, so setting
noatime might not be strictly necessary in these cases.)
If your users don't need last access times, you should disable them by adding
"noatime" to your mount options.
Increasing the number and size of the log buffers by adding the
logbufs and
logbsize mount options allows XFS to handle and enqueue pending file and directory operations more efficiently.
There are also several mount options for XFS that are intended to further
optimize streaming performance on RAID storage, such as
largeio,
inode64, and
swalloc.
If you are using XFS and want to go for optimal
streaming write throughput, you might also want to add the mount option
allocsize=131072k to reduce the risk of fragmentation for large files.
If your RAID controller has a battery-backup-unit (
BBU), adding the mount option
nobarrier for XFS or ext4 can significantly increase throughput.
Example: Typical XFS mount options for a BeeGFS storage server with a battery-backed RAID controller:
$ mount -o noatime,nodiratime,logbufs=8,logbsize=256k,largeio,inode64,swalloc,allocsize=131072k,nobarrier /dev/sdX <mountpoint>
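To keep these mount options across reboots, the same options could be added to /etc/fstab, for example as follows (the mount point /data/beegfs_storage is only an illustrative placeholder; use your actual storage target path):
$ cat /etc/fstab
/dev/sdX  /data/beegfs_storage  xfs  noatime,nodiratime,logbufs=8,logbsize=256k,largeio,inode64,swalloc,allocsize=131072k,nobarrier  0  0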
IO Scheduler
First, set an
appropriate IO scheduler for file servers:
$ echo deadline > /sys/block/sdX/queue/scheduler
Now give the IO scheduler more flexibility by increasing the
number of schedulable requests:
$ echo 4096 > /sys/block/sdX/queue/nr_requests
To improve
throughput for sequential reads, increase the maximum amount of read-ahead data. The actual amount of read-ahead is
adaptive, so using a high value here won't harm performance for small random access.
$ echo 4096 > /sys/block/sdX/queue/read_ahead_kb
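As a sketch of the udev rule approach mentioned in the note at the top, the scheduler, request queue depth and read-ahead settings could be reapplied automatically at boot; the rule file name is arbitrary, and you should restrict the KERNEL match if not all sd devices are storage targets:
$ cat /etc/udev/rules.d/99-beegfs-storage-tuning.rules
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/scheduler}="deadline", ATTR{queue/nr_requests}="4096", ATTR{queue/read_ahead_kb}="4096"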
Virtual Memory Settings
To
avoid long IO stalls (latencies) for write cache flushing in a production environment with very
different workloads, you will typically want to limit the kernel dirty (write) cache size:
$ echo 5 > /proc/sys/vm/dirty_background_ratio
$ echo 10 > /proc/sys/vm/dirty_ratio
Only for special use-cases: If you are going for
optimal sustained streaming performance, you may instead want to use
different settings that start asynchronous writes of data very early and
allow most of the RAM to be used for write caching. (For
generic use-cases, use the settings described above instead.)
$ echo 1 > /proc/sys/vm/dirty_background_ratio
$ echo 75 > /proc/sys/vm/dirty_ratio
Assigning slightly
higher priority to inode caching helps to avoid disk seeks for inode loading:
$ echo 50 > /proc/sys/vm/vfs_cache_pressure
Buffering of file system data requires frequent memory allocation. Raising the amount of
reserved kernel memory
will enable faster and more reliable memory allocation in critical
situations. Raise the corresponding value to 64MB if you have less than
8GB of memory, otherwise raise it to at least 256MB:
$ echo 262144 > /proc/sys/vm/min_free_kbytes
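The virtual memory settings above can be made persistent via /etc/sysctl.conf, shown here with the generic production values (adjust them if you chose the streaming-optimized variant):
$ cat /etc/sysctl.conf
vm.dirty_background_ratio = 5
vm.dirty_ratio = 10
vm.vfs_cache_pressure = 50
vm.min_free_kbytes = 262144
The values can then be applied without a reboot via:
$ sysctl -p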
Transparent huge pages can cause performance degradation
under high load, due to the frequent change of file system cache memory
areas. For RedHat 6.x and derivatives, it is recommended to
disable transparent huge pages support, unless huge pages are explicitly requested by an application:
$ echo madvise > /sys/kernel/mm/redhat_transparent_hugepage/enabled
$ echo madvise > /sys/kernel/mm/redhat_transparent_hugepage/defrag
With recent mainline kernel versions:
$ echo madvise > /sys/kernel/mm/transparent_hugepage/enabled
$ echo madvise > /sys/kernel/mm/transparent_hugepage/defrag
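You can check which mode is currently active; the value in square brackets is the one in effect, e.g.:
$ cat /sys/kernel/mm/transparent_hugepage/enabled
always [madvise] never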
Controller Settings
Optimal performance for hardware RAID systems often depends on
large IOs being sent to the device in a
single large operation. Please refer to your hardware storage vendor for the corresponding optimal size of
/sys/block/sdX/queue/max_sectors_kb.
It is typically good if this size can be increased to at least match
your RAID stripe set size (i.e. chunk_size x number_of_disks):
$ echo 1024 > /sys/block/sdX/queue/max_sectors_kb
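Note that max_sectors_kb cannot be raised above the hardware limit reported by the driver, which can be checked beforehand:
$ cat /sys/block/sdX/queue/max_hw_sectors_kb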
Furthermore,
high values of sg_tablesize (
/sys/class/scsi_host/*/sg_tablesize) are recommended to allow
large IOs. Those values depend on controller firmware versions, kernel versions and driver settings.
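To inspect the current values on your hosts:
$ grep . /sys/class/scsi_host/*/sg_tablesize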
System BIOS & Power Saving
To allow the Linux kernel to correctly detect the system properties and
enable corresponding optimizations (e.g. for NUMA systems), it is very
important to
keep your system BIOS updated.
The
dynamic CPU clock frequency scaling feature for
power saving, which is often enabled by default, has a high impact on
latency. Thus, it is recommended to turn off dynamic CPU frequency
scaling. Ideally, this is done in the machine BIOS (see
Intel SpeedStep or
AMD PowerNow), but it can also be done at runtime, e.g. via:
$ echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor >/dev/null
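Afterwards, you can verify that the governor is active on all cores:
$ cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor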