Bugzilla – Bug 893428
cpu soft lockup in ext4_es_lru_del
Last modified: 2015-04-22 11:35:05 UTC
Saw a bunch of these - enough that it made a 48 core system completely unresponsive to anything but keystrokes on the console. This seems to be the same bug as I found on the LKML: https://lkml.org/lkml/2014/5/13/440

Stack trace as follows:

[322676.159200] BUG: soft lockup - CPU#22 stuck for 22s! [cc1plus:33182]
[322676.159720] Modules linked in: veth xt_REDIRECT xt_tcpudp binfmt_misc xt_addrtype xt_conntrack iptable_filter ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables bridge stp llc dm_thin_pool dm_bio_prison dm_persistent_data dm_bufio libcrc32c loop bonding x86_pkg_temp_thermal coretemp kvm_intel iTCO_wdt kvm joydev crc32_pclmul crc32c_intel iTCO_vendor_support gpio_ich tg3 ghash_clmulni_intel aesni_intel libphy igb ablk_helper cryptd lpc_ich lrw ptp gf128mul ioatdma glue_helper mei_me hid_generic sr_mod dcdbas usb_storage pcspkr usbhid dca cdrom mei mfd_core pps_core aes_x86_64 shpchp wmi acpi_pad acpi_power_meter button sg mperf ipmi_devintf ipmi_si ipmi_msghandler dm_mod autofs4 mgag200 ttm drm_kms_helper ehci_pci drm ehci_hcd i2c_algo_bit sysimgblt sysfillrect usbcore syscopyarea usb_common megaraid_sas processor thermal_sys scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua scsi_dh_rdac scsi_dh
[322676.159766] CPU: 22 PID: 33182 Comm: cc1plus Not tainted 3.11.10-17-default #1
[322676.159768] Hardware name: Dell Inc. PowerEdge R720/061P35, BIOS 2.2.3 05/20/2014
[322676.159769] task: ffff880009a7c180 ti: ffff88000ca8a000 task.ti: ffff88000ca8a000
[322676.159770] RIP: 0010:[<ffffffff8155daea>] [<ffffffff8155daea>] _raw_spin_lock+0x1a/0x30
[322676.159775] RSP: 0000:ffff88000ca8b970 EFLAGS: 00000283
[322676.159776] RAX: 00000000000096fd RBX: ffff880624dcab08 RCX: 0000000000009709
[322676.159777] RDX: 0000000000009709 RSI: ffff880037088290 RDI: ffff880801301480
[322676.159778] RBP: ffff880801301000 R08: 0000000000000000 R09: 0000000000000000
[322676.159779] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88080561cc00
[322676.159779] R13: ffff880037088290 R14: 0000000000000000 R15: 0000000000000000
[322676.159781] FS: 00002b9b0d4cb200(0000) GS:ffff88082fb60000(0000) knlGS:0000000000000000
[322676.159782] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[322676.159783] CR2: 00002b9b1dcef5a0 CR3: 0000000d9c0ad000 CR4: 00000000001407e0
[322676.159784] Stack:
[322676.159784]  ffffffff8122c83c ffff880624dca8a0 ffff880624dca9a8 ffffffff8120e498
[322676.159788]  ffff880624dca8a0 ffffffff81193da3 ffff88000ca8b9e0 ffff880dbc0ed530
[322676.159791]  ffff88080561cd08 ffffffff81193ec1 ffff880dbc0ed5a0 ffffffff81194ccc
[322676.159794] Call Trace:
[322676.159802]  [<ffffffff8122c83c>] ext4_es_lru_del+0x1c/0x60
[322676.159807]  [<ffffffff8120e498>] ext4_clear_inode+0x38/0x80
[322676.159811]  [<ffffffff81193da3>] evict+0xa3/0x190
[322676.159813]  [<ffffffff81193ec1>] dispose_list+0x31/0x40
[322676.159816]  [<ffffffff81194ccc>] prune_icache_sb+0x16c/0x310
[322676.159820]  [<ffffffff8117ee9b>] prune_super+0x15b/0x190
[322676.159825]  [<ffffffff81128953>] shrink_slab+0x153/0x2d0
[322676.159828]  [<ffffffff8112b6ef>] do_try_to_free_pages+0x39f/0x4c0
[322676.159831]  [<ffffffff8112b8e8>] try_to_free_pages+0xd8/0x160
[322676.159836]  [<ffffffff81121959>] __alloc_pages_nodemask+0x5c9/0x980
[322676.159840]  [<ffffffff8115cc85>] alloc_pages_vma+0x95/0x140
[322676.159844]  [<ffffffff8113d58a>] do_wp_page+0x43a/0x810
[322676.159849]  [<ffffffff8113e86b>] handle_pte_fault+0x29b/0xa60
[322676.159853]  [<ffffffff815612f4>] __do_page_fault+0x124/0x4d0
[322676.159856]  [<ffffffff8155e048>] page_fault+0x28/0x30
[322676.159860]  [<0000000000d81d60>] 0xd81d5f
[322676.159861] Code: ec b8 01 00 00 00 c3 66 2e 0f 1f 84 00 00 00 00 00 b8 00 00 01 00 f0 0f c1 07 89 c1 c1 e9 10 66 39 c1 89 ca 74 0d 0f 1f 00 f3 90 <0f> b7 07 66 39 d0 75 f6 c3 66 66 66 66 2e 0f 1f 84 00 00 00 00

This server runs Coverity, and is constantly creating and destroying docker instances. This seems to be exercising the system in a wholly unusual way - this is not the first kernel fault it has exposed. I'll file a bug for the other one too.

Reproducible: Sometimes

Steps to Reproduce:
1. Use the server as normal.

Actual Results: CPU soft lockup

Expected Results: No CPU soft lockup
Could you check with kernel-debug? This might give more information. Or, it'd be helpful if you can get sysrq-t output when this (or a similar stall) happens. At a quick glance, this seems like a locking issue...
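For reference, a sysrq-t task dump can be captured even when most of userspace is wedged, provided the magic SysRq facility is enabled. This is a generic sketch (run as root), not something taken from this report:

```shell
# Enable all SysRq functions for this boot (distributions often default
# to a restrictive bitmask rather than 1).
echo 1 > /proc/sys/kernel/sysrq

# SysRq-t: dump the state and kernel stack of every task into the
# kernel ring buffer. Equivalent to pressing Alt-SysRq-t on the console.
echo t > /proc/sysrq-trigger

# The dump ends up in dmesg / syslog.
dmesg | tail -n 200
```

If even the console is unresponsive, the keyboard Alt-SysRq-t chord is the fallback, since it is handled directly in the keyboard interrupt path.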
Yes, AFAICT from the report it is a known bug with ext4 extent cache where under memory pressure scanning the cache takes too long (especially on big machines) and thus softlockups are triggered. There were patches posted for this recently. I have to find some time to review them to speed up their merging upstream and then we can backport them to our kernels as well.
Jan, thank you. Do you have an LKML reference or patchset I can point my management to? Takashi, given Jan's comment, I'm not sure the request for more info is still germane, but if it happens again we'll reboot into kernel-debug. A sysrq-t dump was not possible because all cores were locked. Jan, this machine is a 48 core Dell server with 128G RAM. I'd say that qualifies as big, especially under the kind of IO and memory pressure docker causes.
If the problem has already been addressed upstream, there is no need to collect more logs for now. Let's see whether we can get some patches to test with.
I have reviewed the patches and they still need some work, so please stay tuned (this is a non-trivial problem in the implementation of the ext4 extent cache, so it may take a while to get sorted out).
Thanks for the update. Do you know of any workarounds presently?
Are you aware of similar issues with different filesystems or is this unique to ext4? Also, you mentioned memory pressure, would adding additional memory be helpful?
The problem is unique to ext4. Regarding additional memory - no, I don't think that would be a practical solution; the kernel should be able to reclaim unused structures, and it's a fault in the ext4 driver that this isn't efficient. I don't know of an easy workaround for this. Generally, the more inodes you have cached, the bigger the problem is. So you could purge the inode cache occasionally via "echo 3 >/proc/sys/vm/drop_caches", but that is likely going to ruin your performance, so I don't think it's a useful workaround.
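A minimal sketch of the purge workaround mentioned above, for anyone who wants to try it anyway (must be run as root; the sync beforehand is so that only clean, already-written-back data is discarded):

```shell
#!/bin/sh
# Flush dirty pages to disk first; drop_caches only discards clean objects.
sync
# 3 = free the page cache plus dentries and inodes
# (see Documentation/sysctl/vm.txt in the kernel tree).
# Expect a performance hit afterwards while the caches repopulate.
echo 3 > /proc/sys/vm/drop_caches
```

One could run this from cron during quiet periods to keep the inode cache small, but as noted above the cost of refilling the caches probably outweighs the benefit.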
So I have submitted patches for this upstream (http://marc.info/?l=linux-ext4&m=141596196627969&w=2). Russel, would you be willing to test them? I could build a test kernel for you (probably based on recent upstream kernel but I could try basing it on 13.1 kernel if it would help you significantly).
I probably can't. This bug was found on a production server that had not been stable, and we found a workaround - this system has been stable for months (basically, we reduced the memory pressure and that seems to have helped). Getting downtime on a production server to test a bug that has not been an issue for months is probably going to be a losing proposition here. Sorry.
Fixes have been pushed upstream. I've backported them to openSUSE 13.2 and SLE12 kernels. Do you still need the fixes in 13.1 or is 13.2 enough for you?
13.2 should be fine. Thank you for your attention to this. I'll let the affected users know a fix is available.
OK, just be aware that there's a 13.2 kernel maintenance update running which still won't have the fixes. So it will take a while before you'll get them through the update channel...
SUSE-SU-2015:0178-1: An update that solves 5 vulnerabilities and has 59 fixes is now available. Category: security (important) Bug References: 800255,809493,829110,856659,862374,873252,875220,884407,887108,887597,889192,891086,891277,893428,895387,895814,902232,902346,902349,903279,903640,904053,904177,904659,904969,905087,905100,906027,906140,906545,907069,907325,907536,907593,907714,907818,907969,907970,907971,907973,908057,908163,908198,908803,908825,908904,909077,909092,909095,909829,910249,910697,911181,911325,912129,912278,912281,912290,912514,912705,912946,913233,913387,913466 CVE References: CVE-2014-3687,CVE-2014-3690,CVE-2014-8559,CVE-2014-9420,CVE-2014-9585 Sources used: SUSE Linux Enterprise Software Development Kit 12 (src): kernel-docs-3.12.36-38.3, kernel-obs-build-3.12.36-38.2 SUSE Linux Enterprise Server 12 (src): kernel-source-3.12.36-38.1, kernel-syms-3.12.36-38.1 SUSE Linux Enterprise Desktop 12 (src): kernel-source-3.12.36-38.1, kernel-syms-3.12.36-38.1
openSUSE-SU-2015:0713-1: An update that solves 13 vulnerabilities and has 52 fixes is now available. Category: security (important) Bug References: 867199,893428,895797,900811,901925,903589,903640,904899,905681,907039,907818,907988,908582,908588,908589,908592,908593,908594,908596,908598,908603,908604,908605,908606,908608,908610,908612,909077,909078,909477,909634,910150,910322,910440,911311,911325,911326,911356,911438,911578,911835,912061,912202,912429,912705,913059,913466,913695,914175,915425,915454,915456,915577,915858,916608,917830,917839,918954,918970,919463,920581,920604,921313,922542,922944 CVE References: CVE-2014-8134,CVE-2014-8160,CVE-2014-8559,CVE-2014-9419,CVE-2014-9420,CVE-2014-9428,CVE-2014-9529,CVE-2014-9584,CVE-2014-9585,CVE-2015-0777,CVE-2015-1421,CVE-2015-1593,CVE-2015-2150 Sources used: openSUSE 13.2 (src): bbswitch-0.8-3.6.6, cloop-2.639-14.6.6, crash-7.0.8-6.6, hdjmod-1.28-18.7.6, ipset-6.23-6.6, kernel-docs-3.16.7-13.2, kernel-obs-build-3.16.7-13.7, kernel-obs-qa-3.16.7-13.1, kernel-obs-qa-xen-3.16.7-13.1, kernel-source-3.16.7-13.1, kernel-syms-3.16.7-13.1, pcfclock-0.44-260.6.2, vhba-kmp-20140629-2.6.2, virtualbox-4.3.20-10.2, xen-4.4.1_08-12.2, xtables-addons-2.6-6.2