Bug 1437243 (Closed) Opened 7 years ago, Closed 5 years ago

Further investigate ext4 formatting options

Categories

(Taskcluster :: Workers, defect, P5)

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: gps, Unassigned)

References

(Blocks 1 open bug)

Details

When you run `mkfs.ext4` with the default options, zeroing of the inode tables is deferred until first mount, at which point the kernel spawns an ext4lazyinit thread to do it. If we format a 120 GB EBS volume with default inode settings (like we do in docker-worker today), we get 7,864,320 inodes of size 256 (as reported by `tune2fs -l`). e.g.:

$ sudo tune2fs -l /dev/nvme1n1
tune2fs 1.42.9 (4-Feb-2014)
Filesystem volume name:   <none>
Last mounted on:          /mnt
Filesystem UUID:          2642881f-72b9-4aff-829d-21a0c1586795
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              7864320
Block count:              31457280
Reserved block count:     1572864
Free blocks:              30915691
Free inodes:              7864309
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      1016
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
Flex block group size:    16
Filesystem created:       Sat Feb 10 01:32:33 2018
Last mount time:          Sat Feb 10 01:33:19 2018
Last write time:          Sat Feb 10 01:33:19 2018
Mount count:              1
Maximum mount count:      -1
Last checked:             Sat Feb 10 01:32:33 2018
Check interval:           0 (<none>)
Lifetime writes:          132 MB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      ba09703a-c49b-42ec-953e-ac8d9b0eccdd
Journal backup:           inode blocks

Assuming that 256 is bytes, this comes out to 2,013,265,920 bytes of write I/O on the first mount of the EBS volume. I haven't measured exactly, but I do notice with dstat that upon initial mount our I/O rates are several dozen MB/s for several seconds. So I believe we are writing ~2 GB on volume mount.

This block zeroing is *possibly* hurting our performance during early instance startup. I think we still have plenty of I/O credits to absorb this heavy write I/O. But it may be stealing throughput and preventing workers from reaching a ready state sooner.

The mitigation is to reduce the size of the inode table. That limits the number of inodes we can track on the volume, but I'm pretty sure we come nowhere close to inode exhaustion on docker-worker instances. The thing requiring the most inodes is likely VCS clones and checkouts, and you need >100 clones or checkouts of mozilla-central with default provisioning ratios before you're in territory where inode exhaustion is a worry. So I think reducing the inode density (at least on larger volumes) is worth considering.
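For illustration only (not a tested recommendation), these are the kinds of `mkfs.ext4` invocations that would reduce or front-load the zeroing; the bytes-per-inode value and the device name are examples, not what this bug prescribes:

# Example: raise bytes-per-inode from the 16 KiB default to 1 MiB,
# shrinking the inode table (and the first-mount zeroing) by roughly 64x.
$ sudo mkfs.ext4 -i 1048576 /dev/nvme1n1

# Example: zero the inode table and journal at format time instead of
# deferring that work to first mount.
$ sudo mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 /dev/nvme1n1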
gps: how much startup time are we likely to gain from experimenting with this change?
Flags: needinfo?(gps)
Priority: -- → P5
I'm not sure. A P5 feels like a good triage in the absence of concrete numbers. I think things like tuning the mount options to throw away filesystem consistency/durability protections and moving to c5d instances with non-EBS NVMe storage are much better time investments.
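For reference, a sketch of the kind of mount options that trade durability for speed; the specific flags, device, and mount point here are illustrative and not something this bug proposes:

# Illustrative only: skip atime updates and relax journaling/durability
# guarantees. nobarrier and data=writeback risk data loss on a crash,
# which may be acceptable for throwaway worker scratch volumes.
$ sudo mount -o noatime,nobarrier,data=writeback,commit=60 /dev/nvme1n1 /mnt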
Flags: needinfo?(gps)
Component: Docker-Worker → Workers
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WONTFIX