Bug 893428 - cpu soft lockup in ext4_es_lru_del
Classification: openSUSE
Product: openSUSE 13.1
Component: Kernel
Hardware: x86-64 openSUSE 13.1
Priority: P3 - Medium  Severity: Critical
Target Milestone: ---
Assigned To: Jan Kara
Reported: 2014-08-25 19:43 UTC by Russell Miller
Modified: 2015-04-22 11:35 UTC (History)
See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
jack: needinfo? (russellx.j.miller)
Description Russell Miller 2014-08-25 19:43:20 UTC
User-Agent:       Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.143 Safari/537.36

Saw a bunch of these - enough that it made a 48 core system completely unresponsive to anything but keystrokes on the console.

This seems to be the same bug as I found on the LKML: https://lkml.org/lkml/2014/5/13/440

stack trace as follows:

2014-08-25T08:49:31.356797-07:00 cov kernel: [322676.159200] BUG: soft lockup - CPU#22 stuck for 22s! [cc1plus:33182]
2014-08-25T08:49:31.356798-07:00 cov kernel: [322676.159720] Modules linked in: veth xt_REDIRECT xt_tcpudp binfmt_misc xt_addrtype xt_conntrack iptable_filter ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables bridge stp llc dm_thin_pool dm_bio_prison dm_persistent_data dm_bufio libcrc32c loop bonding x86_pkg_temp_thermal coretemp kvm_intel iTCO_wdt kvm joydev crc32_pclmul crc32c_intel iTCO_vendor_support gpio_ich tg3 ghash_clmulni_intel aesni_intel libphy igb ablk_helper cryptd lpc_ich lrw ptp gf128mul ioatdma glue_helper mei_me hid_generic sr_mod dcdbas usb_storage pcspkr usbhid dca cdrom mei mfd_core pps_core aes_x86_64 shpchp wmi acpi_pad acpi_power_meter button sg mperf ipmi_devintf ipmi_si ipmi_msghandler dm_mod autofs4 mgag200 ttm drm_kms_helper ehci_pci drm ehci_hcd i2c_algo_bit sysimgblt sysfillrect usbcore syscopyarea usb_common megaraid_sas processor thermal_sys scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua scsi_dh_rdac scsi_dh
2014-08-25T08:49:31.356798-07:00 cov kernel: [322676.159766] CPU: 22 PID: 33182 Comm: cc1plus Not tainted 3.11.10-17-default #1
2014-08-25T08:49:31.356800-07:00 cov kernel: [322676.159768] Hardware name: Dell Inc. PowerEdge R720/061P35, BIOS 2.2.3 05/20/2014
2014-08-25T08:49:31.356801-07:00 cov kernel: [322676.159769] task: ffff880009a7c180 ti: ffff88000ca8a000 task.ti: ffff88000ca8a000
2014-08-25T08:49:31.356801-07:00 cov kernel: [322676.159770] RIP: 0010:[<ffffffff8155daea>]  [<ffffffff8155daea>] _raw_spin_lock+0x1a/0x30
2014-08-25T08:49:31.356802-07:00 cov kernel: [322676.159775] RSP: 0000:ffff88000ca8b970  EFLAGS: 00000283
2014-08-25T08:49:31.356802-07:00 cov kernel: [322676.159776] RAX: 00000000000096fd RBX: ffff880624dcab08 RCX: 0000000000009709
2014-08-25T08:49:31.356802-07:00 cov kernel: [322676.159777] RDX: 0000000000009709 RSI: ffff880037088290 RDI: ffff880801301480
2014-08-25T08:49:31.356803-07:00 cov kernel: [322676.159778] RBP: ffff880801301000 R08: 0000000000000000 R09: 0000000000000000
2014-08-25T08:49:31.356805-07:00 cov kernel: [322676.159779] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88080561cc00
2014-08-25T08:49:31.356805-07:00 cov kernel: [322676.159779] R13: ffff880037088290 R14: 0000000000000000 R15: 0000000000000000
2014-08-25T08:49:31.356805-07:00 cov kernel: [322676.159781] FS:  00002b9b0d4cb200(0000) GS:ffff88082fb60000(0000) knlGS:0000000000000000
2014-08-25T08:49:31.356806-07:00 cov kernel: [322676.159782] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2014-08-25T08:49:31.356806-07:00 cov kernel: [322676.159783] CR2: 00002b9b1dcef5a0 CR3: 0000000d9c0ad000 CR4: 00000000001407e0
2014-08-25T08:49:31.356807-07:00 cov kernel: [322676.159784] Stack:
2014-08-25T08:49:31.356807-07:00 cov kernel: [322676.159784]  ffffffff8122c83c ffff880624dca8a0 ffff880624dca9a8 ffffffff8120e498
2014-08-25T08:49:31.356809-07:00 cov kernel: [322676.159788]  ffff880624dca8a0 ffffffff81193da3 ffff88000ca8b9e0 ffff880dbc0ed530
2014-08-25T08:49:31.356810-07:00 cov kernel: [322676.159791]  ffff88080561cd08 ffffffff81193ec1 ffff880dbc0ed5a0 ffffffff81194ccc
2014-08-25T08:49:31.356810-07:00 cov kernel: [322676.159794] Call Trace:
2014-08-25T08:49:31.356810-07:00 cov kernel: [322676.159802]  [<ffffffff8122c83c>] ext4_es_lru_del+0x1c/0x60
2014-08-25T08:49:31.356811-07:00 cov kernel: [322676.159807]  [<ffffffff8120e498>] ext4_clear_inode+0x38/0x80
2014-08-25T08:49:31.356811-07:00 cov kernel: [322676.159811]  [<ffffffff81193da3>] evict+0xa3/0x190
2014-08-25T08:49:31.356812-07:00 cov kernel: [322676.159813]  [<ffffffff81193ec1>] dispose_list+0x31/0x40
2014-08-25T08:49:31.356814-07:00 cov kernel: [322676.159816]  [<ffffffff81194ccc>] prune_icache_sb+0x16c/0x310
2014-08-25T08:49:31.356814-07:00 cov kernel: [322676.159820]  [<ffffffff8117ee9b>] prune_super+0x15b/0x190
2014-08-25T08:49:31.356814-07:00 cov kernel: [322676.159825]  [<ffffffff81128953>] shrink_slab+0x153/0x2d0
2014-08-25T08:49:31.356815-07:00 cov kernel: [322676.159828]  [<ffffffff8112b6ef>] do_try_to_free_pages+0x39f/0x4c0
2014-08-25T08:49:31.356815-07:00 cov kernel: [322676.159831]  [<ffffffff8112b8e8>] try_to_free_pages+0xd8/0x160
2014-08-25T08:49:31.356816-07:00 cov kernel: [322676.159836]  [<ffffffff81121959>] __alloc_pages_nodemask+0x5c9/0x980
2014-08-25T08:49:31.356818-07:00 cov kernel: [322676.159840]  [<ffffffff8115cc85>] alloc_pages_vma+0x95/0x140
2014-08-25T08:49:31.356818-07:00 cov kernel: [322676.159844]  [<ffffffff8113d58a>] do_wp_page+0x43a/0x810
2014-08-25T08:49:31.356818-07:00 cov kernel: [322676.159849]  [<ffffffff8113e86b>] handle_pte_fault+0x29b/0xa60
2014-08-25T08:49:31.356819-07:00 cov kernel: [322676.159853]  [<ffffffff815612f4>] __do_page_fault+0x124/0x4d0
2014-08-25T08:49:31.356819-07:00 cov kernel: [322676.159856]  [<ffffffff8155e048>] page_fault+0x28/0x30
2014-08-25T08:49:31.356820-07:00 cov kernel: [322676.159860]  [<0000000000d81d60>] 0xd81d5f
2014-08-25T08:49:31.356820-07:00 cov kernel: [322676.159861] Code: ec b8 01 00 00 00 c3 66 2e 0f 1f 84 00 00 00 00 00 b8 00 00 01 00 f0 0f c1 07 89 c1 c1 e9 10 66 39 c1 89 ca 74 0d 0f 1f 00 f3 90 <0f> b7 07 66 39 d0 75 f6 c3 66 66 66 66 2e 0f 1f 84 00 00 00 00
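
When many of these reports pile up in the log, it can help to reduce each one to its chain of function names. The following is a minimal sketch of that, assuming the syslog format shown above; the helper name `call_trace_symbols` and the regular expression are illustrative and not part of any kernel tooling.

```python
import re

# Matches kernel stack-trace entries such as
# "[<ffffffff8122c83c>] ext4_es_lru_del+0x1c/0x60".
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def call_trace_symbols(log_text):
    """Return the function names listed after 'Call Trace:', innermost frame first."""
    symbols = []
    in_trace = False
    for line in log_text.splitlines():
        if "Call Trace:" in line:
            in_trace = True
            continue
        if not in_trace:
            continue
        if "Code:" in line:
            # The "Code:" line ends the report in this log format.
            break
        match = FRAME_RE.search(line)
        if match:
            symbols.append(match.group(1))
    return symbols


# A shortened excerpt of the trace above (hostname and timestamps trimmed).
sample = """\
kernel: [322676.159794] Call Trace:
kernel: [322676.159802]  [<ffffffff8122c83c>] ext4_es_lru_del+0x1c/0x60
kernel: [322676.159807]  [<ffffffff8120e498>] ext4_clear_inode+0x38/0x80
kernel: [322676.159811]  [<ffffffff81193da3>] evict+0xa3/0x190
kernel: [322676.159861] Code: ec b8 01
"""

print(" <- ".join(call_trace_symbols(sample)))
```

Frames without a symbol name, such as the final user-space address in the trace, are skipped by this sketch.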

This server runs Coverity, and is constantly creating and destroying docker instances.  This seems to be exercising the system in a wholly unusual way - this is not the first kernel fault it exposed.  I'll file a bug for the other one too.

Reproducible: Sometimes

Steps to Reproduce:
1. use the server as normal.
Actual Results:  

Expected Results:  
Comment 1 Takashi Iwai 2014-08-26 16:05:18 UTC
Could you check with kernel-debug?  This might give more information.
Or, it'd be helpful if you can get sysrq-t outputs when this (or similar stall) happens.  At a quick glance, this seems like a locking issue...
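
A task dump of this kind is normally requested through the magic SysRq 't' command, either from the console keyboard or by writing to /proc/sysrq-trigger as root. The sketch below shows the second route; the function name `request_task_dump` is hypothetical, and the path is a parameter only so the helper can be exercised against an ordinary file. It assumes the machine is still responsive enough to run a process, which comment 3 suggests was not the case during this lockup.

```python
def request_task_dump(trigger_path="/proc/sysrq-trigger"):
    """Ask the kernel to log the state of all tasks (SysRq 't').

    Must run as root when pointed at the real trigger file. The dump
    itself appears in the kernel log (dmesg), not in this process.
    """
    with open(trigger_path, "w") as trigger:
        trigger.write("t")
```

After calling `request_task_dump()`, the per-task stack traces can be collected from the kernel log and attached to the bug.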
Comment 2 Jan Kara 2014-08-26 16:23:29 UTC
Yes, AFAICT from the report it is a known bug with ext4 extent cache where under memory pressure scanning the cache takes too long (especially on big machines) and thus softlockups are triggered. There were patches posted for this recently. I have to find some time to review them to speed up their merging upstream and then we can backport them to our kernels as well.
Comment 3 Russell Miller 2014-08-26 16:27:10 UTC
Jan, thank you.  Do you have a lkml reference or patchset I can point my management to?

Takashi, given Jan's comment, I'm not sure the request for more info is still germane, but if it happens again we'll reboot into kernel-debug.  sysrq-t was not possible because all cores were locked.

Jan, this machine is a 48 core Dell server with 128G RAM.  I'd say that qualifies as big, especially under the kind of IO and memory pressure docker causes.
Comment 4 Takashi Iwai 2014-08-26 17:56:49 UTC
If the problem was already addressed in the upstream, no need for collecting more logs, so far.  Let's see whether we can get some patches to test with.
Comment 5 Jan Kara 2014-08-27 15:51:45 UTC
So I have reviewed the patches and they still need some work. So please stay tuned (this is a non-trivial problem in the implementation of ext4 extent cache so it may take a while to get sorted out).
Comment 6 Russell Miller 2014-08-27 16:16:31 UTC
Thanks for the update.  Do you know of any workarounds presently?
Comment 7 Russell Miller 2014-08-27 18:56:47 UTC
Are you aware of similar issues with different filesystems or is this unique to ext4?

Also, you mentioned memory pressure, would adding additional memory be helpful?
Comment 8 Jan Kara 2014-09-02 20:03:24 UTC
The problem is unique to ext4. Regarding additional memory - no, I don't think that would be a practical solution - kernel should be able to reclaim unused stuff and it's a fault in ext4 driver that this isn't efficient.

I don't know about an easy workaround for this. Generally the more inodes you have cached, the bigger the problem is. So you could purge inode cache occasionally via "echo 3 >/proc/sys/vm/drop_caches" but this is likely going to ruin your performance so I don't think it's a useful workaround.
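
For anyone who wants to try the cache purge described above, here is a minimal sketch, assuming a Linux system and root privileges. The helper names `read_inode_counts` and `drop_caches` are hypothetical; /proc/sys/fs/inode-nr and /proc/sys/vm/drop_caches are the standard kernel interfaces, and the paths are parameters only so the helpers can be tried on ordinary files.

```python
import os


def read_inode_counts(path="/proc/sys/fs/inode-nr"):
    """Return (allocated, free) in-memory inode counts reported by the kernel."""
    with open(path) as f:
        allocated, free = f.read().split()[:2]
    return int(allocated), int(free)


def drop_caches(mode=3, path="/proc/sys/vm/drop_caches"):
    """Write mode to drop_caches: 1 = page cache, 2 = dentries and inodes, 3 = both."""
    if mode not in (1, 2, 3):
        raise ValueError("mode must be 1, 2 or 3")
    # Dirty objects cannot be freed, so flush them to disk first.
    os.sync()
    with open(path, "w") as f:
        f.write(f"{mode}\n")
```

Comparing `read_inode_counts()` before and after `drop_caches()` gives a rough idea of how many cached inodes were released. As noted above, the purge also discards useful cache contents, so it trades throughput for a smaller inode cache.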
Comment 9 Jan Kara 2014-11-18 15:14:35 UTC
So I have submitted patches for this upstream (http://marc.info/?l=linux-ext4&m=141596196627969&w=2). Russell, would you be willing to test them? I could build a test kernel for you (probably based on a recent upstream kernel, but I could try basing it on the 13.1 kernel if it would help you significantly).
Comment 10 Russell Miller 2014-11-18 17:50:21 UTC
I probably can't.  This bug was found on a production server that was not stable, and we found a workaround - this system has been stable for months (basically, we reduced the memory pressure and that seems to have helped.)  Getting downtime on a production server to test a bug that has not been an issue for months is probably going to be a losing proposition here.  Sorry.
Comment 11 Jan Kara 2014-12-17 19:00:30 UTC
Fixes have been pushed upstream. I've backported them to openSUSE 13.2 and SLE12 kernels. Do you still need the fixes in 13.1 or is 13.2 enough for you?
Comment 12 Russell Miller 2014-12-17 19:16:11 UTC
13.2 should be fine.  Thank you for your attention to this.  I'll let the affected users know a fix is available.
Comment 13 Jan Kara 2014-12-17 19:59:31 UTC
OK, just be aware that there's a 13.2 kernel maintenance update running which still won't have the fixes. So it will take a while before you'll get them through the update channel...
Comment 14 Swamp Workflow Management 2015-01-30 10:08:11 UTC
SUSE-SU-2015:0178-1: An update that solves 5 vulnerabilities and has 59 fixes is now available.

Category: security (important)
Bug References: 800255,809493,829110,856659,862374,873252,875220,884407,887108,887597,889192,891086,891277,893428,895387,895814,902232,902346,902349,903279,903640,904053,904177,904659,904969,905087,905100,906027,906140,906545,907069,907325,907536,907593,907714,907818,907969,907970,907971,907973,908057,908163,908198,908803,908825,908904,909077,909092,909095,909829,910249,910697,911181,911325,912129,912278,912281,912290,912514,912705,912946,913233,913387,913466
CVE References: CVE-2014-3687,CVE-2014-3690,CVE-2014-8559,CVE-2014-9420,CVE-2014-9585
Sources used:
SUSE Linux Enterprise Software Development Kit 12 (src):    kernel-docs-3.12.36-38.3, kernel-obs-build-3.12.36-38.2
SUSE Linux Enterprise Server 12 (src):    kernel-source-3.12.36-38.1, kernel-syms-3.12.36-38.1
SUSE Linux Enterprise Desktop 12 (src):    kernel-source-3.12.36-38.1, kernel-syms-3.12.36-38.1
Comment 15 Swamp Workflow Management 2015-04-13 12:05:13 UTC
openSUSE-SU-2015:0713-1: An update that solves 13 vulnerabilities and has 52 fixes is now available.

Category: security (important)
Bug References: 867199,893428,895797,900811,901925,903589,903640,904899,905681,907039,907818,907988,908582,908588,908589,908592,908593,908594,908596,908598,908603,908604,908605,908606,908608,908610,908612,909077,909078,909477,909634,910150,910322,910440,911311,911325,911326,911356,911438,911578,911835,912061,912202,912429,912705,913059,913466,913695,914175,915425,915454,915456,915577,915858,916608,917830,917839,918954,918970,919463,920581,920604,921313,922542,922944
CVE References: CVE-2014-8134,CVE-2014-8160,CVE-2014-8559,CVE-2014-9419,CVE-2014-9420,CVE-2014-9428,CVE-2014-9529,CVE-2014-9584,CVE-2014-9585,CVE-2015-0777,CVE-2015-1421,CVE-2015-1593,CVE-2015-2150
Sources used:
openSUSE 13.2 (src):    bbswitch-0.8-3.6.6, cloop-2.639-14.6.6, crash-7.0.8-6.6, hdjmod-1.28-18.7.6, ipset-6.23-6.6, kernel-docs-3.16.7-13.2, kernel-obs-build-3.16.7-13.7, kernel-obs-qa-3.16.7-13.1, kernel-obs-qa-xen-3.16.7-13.1, kernel-source-3.16.7-13.1, kernel-syms-3.16.7-13.1, pcfclock-0.44-260.6.2, vhba-kmp-20140629-2.6.2, virtualbox-4.3.20-10.2, xen-4.4.1_08-12.2, xtables-addons-2.6-6.2