Bugzilla – Bug 1175245
kernel BUG in cachefiles for 5.8.0-1-default
Last modified: 2020-09-17 19:04:33 UTC
Yet another kernel bug related to cachefiles. My previous report, for 5.6: https://bugzilla.opensuse.org/show_bug.cgi?id=1171307

The issue can be triggered just by doing some I/O over NFS. Under heavy read/write load it can be triggered repeatedly.

Mitigation: disable cachefilesd or roll back to 5.7.

[ 51.591731] CacheFiles:
[ 51.591743] CacheFiles: Assertion failed
[ 51.591820] ------------[ cut here ]------------
[ 51.591827] kernel BUG at fs/cachefiles/rdwr.c:715!
[ 51.591895] invalid opcode: 0000 [#1] SMP NOPTI
[ 51.591905] CPU: 2 PID: 6687 Comm: php Kdump: loaded Not tainted 5.8.0-1-default #1 openSUSE Tumbleweed (unreleased)
[ 51.591912] Hardware name: HP HP t630 Thin Client/8158, BIOS M40 v01.11 06/25/2019
[ 51.591986] RIP: 0010:cachefiles_read_or_alloc_pages.cold+0x32/0x4e [cachefiles]
[ 51.591995] Code: 8b d7 de dc 48 c7 c7 18 58 72 c0 e8 7f d7 de dc 0f 0b 48 c7 c7 40 4b 72 c0 e8 71 d7 de dc 48 c7 c7 18 58 72 c0 e8 65 d7 de dc <0f> 0b 48 c7 c7 40 4b 72 c0 e8 57 d7 de dc 48 c7 c7 18 58 72 c0 e8
[ 51.592006] RSP: 0018:ffffa24d82587a50 EFLAGS: 00010246
[ 51.592013] RAX: 000000000000001c RBX: ffffa24d82587d28 RCX: ffff950fced1ae18
[ 51.592018] RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff950fced1ae10
[ 51.592023] RBP: ffff950eec1ae540 R08: 0720072007200720 R09: 00000000000004e5
[ 51.592028] R10: 0720072007200720 R11: 0720072007200720 R12: ffff950f7a1b80e8
[ 51.592033] R13: ffffa24d82587d28 R14: ffffa24d82587bcc R15: ffff950eea2d0de8
[ 51.592040] FS: 00007ff7f3d9ad48(0000) GS:ffff950fced00000(0000) knlGS:0000000000000000
[ 51.592046] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 51.592051] CR2: 00005579429990f0 CR3: 000000002a436000 CR4: 00000000001406e0
[ 51.592056] Call Trace:
[ 51.592073] ? check_preempt_curr+0x67/0x90
[ 51.592080] ? ttwu_do_wakeup+0x19/0x150
[ 51.592088] ? _cond_resched+0x16/0x40
[ 51.592095] ? kmem_cache_alloc_trace+0x18c/0x280
[ 51.592117] ? fscache_alloc_retrieval+0x2f/0xe0 [fscache]
[ 51.592129] ? fscache_attach_object+0x139/0x1a0 [fscache]
[ 51.592144] ? fscache_run_op+0x56/0xb0 [fscache]
[ 51.592159] __fscache_read_or_alloc_pages+0x235/0x2e0 [fscache]
[ 51.592203] __nfs_readpages_from_fscache+0x60/0x160 [nfs]
[ 51.592231] nfs_readpages+0xb5/0x1e0 [nfs]
[ 51.592241] ? get_page_from_freelist+0x286/0x2e0
[ 51.592249] read_pages+0x185/0x270
[ 51.592257] page_cache_readahead_unbounded+0x153/0x210
[ 51.592264] generic_file_buffered_read+0x54a/0x970
[ 51.592288] nfs_file_read+0x6d/0xa0 [nfs]
[ 51.592297] new_sync_read+0x112/0x1a0
[ 51.592305] vfs_read+0x14f/0x180
[ 51.592311] ksys_read+0x5f/0xe0
[ 51.592319] do_syscall_64+0x52/0xd0
[ 51.592326] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 51.592333] RIP: 0033:0x7ff7f3d59923
[ 51.592339] Code: c3 8b 07 85 c0 75 24 49 89 fb 48 89 f0 48 89 d7 48 89 ce 4c 89 c2 4d 89 ca 4c 8b 44 24 08 4c 8b 4c 24 10 4c 89 5c 24 08 0f 05 <c3> e9 3a d3 ff ff 41 54 b8 02 00 00 00 49 89 f4 be 00 08 08 00 55
[ 51.592349] RSP: 002b:00007ffc1f27fb78 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 51.592356] RAX: ffffffffffffffda RBX: 00007ff7f3d9ad48 RCX: 00007ff7f3d59923
[ 51.592361] RDX: 000000000000016a RSI: 00007ff7f327c000 RDI: 0000000000000003
[ 51.592366] RBP: 00007ff7f327a620 R08: 0000000000000000 R09: 0000000000000000
[ 51.592371] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 51.592375] R13: 0000000000000003 R14: 00007ff7f327c000 R15: 00007ffc1f27fff0
[ 51.592382] Modules linked in: macvlan veth xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_tables x_tables bpfilter rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc iscsi_ibft iscsi_boot_sysfs cachefiles fscache edac_mce_amd kvm_amd snd_hda_codec_realtek ccp snd_hda_codec_generic kvm ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core irqbypass snd_hwdep crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_pcm aesni_intel r8169 snd_timer crypto_simd realtek cryptd glue_helper efi_pstore pcspkr libphy snd hp_wmi sp5100_tco sparse_keymap wmi_bmof fam15h_power k10temp i2c_piix4 soundcore tiny_power_button button rfkill_gpio rfkill acpi_cpufreq nls_iso8859_1 nls_cp437 vfat fat amdgpu btrfs blake2b_generic libcrc32c xor iommu_v2 gpu_sched i2c_algo_bit ttm drm_kms_helper xhci_pci xhci_pci_renesas
[ 51.592437] xhci_hcd ehci_pci ehci_hcd syscopyarea sysfillrect raid6_pq sysimgblt fb_sys_fops cec rc_core usbcore drm crc32c_intel serio_raw wmi video overlay msr sg br_netfilter bridge stp llc dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua efivarfs
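For anyone comparing this oops with other reports, a rough Python sketch like the one below can pull out the BUG location and the reliable call-trace frames (those not prefixed with "?"). It assumes the dmesg output is available with one log entry per line, as dmesg prints it; the function name parse_oops and both regular expressions are made up for illustration and are not part of any kernel tooling.

```python
import re

# Hypothetical patterns, written for the log shape seen in this report.
BUG_RE = re.compile(r"kernel BUG at (?P<file>[^\s:]+):(?P<line>\d+)!")
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(?P<guess>\? )?(?P<sym>[\w.]+)"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(?P<mod>\w+)\])?\s*$"
)

def parse_oops(text):
    """Return ((file, line) or None, [(symbol, module or None), ...])."""
    bug = None
    frames = []
    in_trace = False
    for raw in text.splitlines():
        line = raw.strip()
        m = BUG_RE.search(line)
        if m and bug is None:
            bug = (m.group("file"), int(m.group("line")))
        if line.endswith("Call Trace:"):
            in_trace = True
            continue
        if in_trace:
            f = FRAME_RE.match(line)
            if f is None:
                # First non-frame line (e.g. the user-space RIP) ends the trace.
                in_trace = False
                continue
            # Frames marked "?" are stack leftovers the unwinder is unsure about.
            if f.group("guess") is None:
                frames.append((f.group("sym"), f.group("mod")))
    return bug, frames
```

Run against the log in this report, the reliable frames start at __fscache_read_or_alloc_pages and go down through the NFS read path to the read syscall entry.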
It sounds like the same bug as the upstream report at https://bugzilla.kernel.org/show_bug.cgi?id=208883

Adding experts to Cc.
> It sounds like the same bug as an upstream report
> https://bugzilla.kernel.org/show_bug.cgi?id=208883

I have read the description of the submitted rewrite of the fscache and cachefilesd code. It states that all the network filesystems (nfs, cifs, afs, etc.) will need a patch too, so I doubt this will be included in the kernel anytime soon.
Hrm, there is little change between 5.7 and 5.8 in fs/cachefiles/*, and those changes are likely irrelevant to this bug. Looking at the oops closely, this seems to be just a false-positive ASSERT(). The readpages aops was recently converted to readahead for many filesystems, hence the NULL check of readpages is probably simply bogus and should be dropped.
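To illustrate the suspicion, here is a toy user-space model in Python, not the kernel code: every name in it (AddressSpaceOps, old_assertion_holds, can_read) is hypothetical. It only shows why a check that insists on a readpages hook would trip on a backing filesystem that provides readahead instead, even though that filesystem can still serve reads.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class AddressSpaceOps:
    """Toy stand-in for a filesystem's table of page-cache callbacks."""
    readpages: Optional[Callable] = None   # older batch-read hook
    readahead: Optional[Callable] = None   # newer hook that replaces it

def old_assertion_holds(ops: AddressSpaceOps) -> bool:
    # Models a check that requires the older hook to be non-NULL.
    return ops.readpages is not None

def can_read(ops: AddressSpaceOps) -> bool:
    # In this model, either hook is enough to read pages in.
    return ops.readpages is not None or ops.readahead is not None

# A filesystem already converted to readahead, and one that is not.
converted_fs = AddressSpaceOps(readahead=lambda pages: None)
legacy_fs = AddressSpaceOps(readpages=lambda pages: None)

for name, ops in (("converted", converted_fs), ("legacy", legacy_fs)):
    print(name, "assertion ok:", old_assertion_holds(ops), "can read:", can_read(ops))
```

In this model the converted filesystem fails the old check while still being readable, which is the sense in which the assertion would be a false positive.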
A test kernel is being built in OBS home:tiwai:bsc1175245 repo. It'll appear at http://download.opensuse.org/repositories/home:/tiwai:/bsc1175245/standard/ Please give it a try later.
Created attachment 841015: A test fix patch
# uname -ar
Linux dock01 5.8.3-1.ga51ffb9-default #1 SMP Tue Aug 25 10:52:44 UTC 2020 (a51ffb9) x86_64 x86_64 x86_64 GNU/Linux

I have been testing and stressing NFS I/O with this kernel for some 12 hours now. No issues with regular processes combined with iozone or bonnie++ on the NFS mounts backed by fscache/cachefilesd. Looks like you fixed it.
OK, I'll report it on the upstream Bugzilla.
I had the same problem and your updated kernel package fixes the problem for me as well. Thanks!
I submitted the fix upstream and backported it to the stable/master git branches of the openSUSE kernel. It'll be included in the upcoming (or the next) kernel update.
I can confirm this issue as resolved. Will mark VERIFIED. Thank you!