Bug 1175245 - kernel BUG in cachefiles for 5.8.0-1-default
Status: VERIFIED FIXED
Classification: openSUSE
Product: openSUSE Tumbleweed
Component: Kernel
Version: Current
Hardware: x86-64 All
Priority: P5 - None
Severity: Major
Target Milestone: ---
Assigned To: openSUSE Kernel Bugs
QA Contact: E-mail List
Reported: 2020-08-13 18:37 UTC by Arjen Runsink
Modified: 2020-09-17 19:04 UTC

Attachments
A test fix patch (1.41 KB, patch)
2020-08-25 10:57 UTC, Takashi Iwai

Description Arjen Runsink 2020-08-13 18:37:39 UTC
Yet another kernel bug related to cachefiles.

My previous report for 5.6: https://bugzilla.opensuse.org/show_bug.cgi?id=1171307

The issue can be triggered just by doing some I/O over NFS. Under heavy read/write load it can be triggered repeatedly. Mitigation: disable cachefilesd or roll back to 5.7.
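
For anyone who wants to generate comparable load, here is a minimal sketch of a write-then-read loop. It is not the workload from this report: the function name `stress_io`, the file name and the default sizes are hypothetical, and the demo runs against a temporary directory. To exercise fscache the directory would have to sit on an NFS mount that uses the `fsc` mount option with cachefilesd running, and reads of freshly written data may be served from the page cache rather than from the cache backend.

```python
import os
import tempfile


def stress_io(directory, rounds=20, size=1 << 20):
    """Write a file of random bytes and read it back, `rounds` times.

    Returns the total number of bytes read back.
    """
    # Hypothetical scratch file name; any writable path on the mount works.
    path = os.path.join(directory, "fscache_stress.bin")
    total_read = 0
    try:
        for _ in range(rounds):
            payload = os.urandom(size)
            with open(path, "wb") as f:
                f.write(payload)
                f.flush()
                os.fsync(f.fileno())  # push the data out to the server
            with open(path, "rb") as f:
                data = f.read()
            if data != payload:
                raise RuntimeError("read-back mismatch")
            total_read += len(data)
    finally:
        if os.path.exists(path):
            os.remove(path)
    return total_read


if __name__ == "__main__":
    # Assumption: swap the temporary directory for a directory on the
    # NFS mount under test.
    with tempfile.TemporaryDirectory() as scratch:
        print(stress_io(scratch, rounds=5, size=4096))
```

On an affected kernel, any crash would show up in dmesg rather than as a Python exception.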


[   51.591731] CacheFiles: 
[   51.591743] CacheFiles: Assertion failed
[   51.591820] ------------[ cut here ]------------
[   51.591827] kernel BUG at fs/cachefiles/rdwr.c:715!
[   51.591895] invalid opcode: 0000 [#1] SMP NOPTI
[   51.591905] CPU: 2 PID: 6687 Comm: php Kdump: loaded Not tainted 5.8.0-1-default #1 openSUSE Tumbleweed (unreleased)
[   51.591912] Hardware name: HP HP t630 Thin Client/8158, BIOS M40 v01.11 06/25/2019
[   51.591986] RIP: 0010:cachefiles_read_or_alloc_pages.cold+0x32/0x4e [cachefiles]
[   51.591995] Code: 8b d7 de dc 48 c7 c7 18 58 72 c0 e8 7f d7 de dc 0f 0b 48 c7 c7 40 4b 72 c0 e8 71 d7 de dc 48 c7 c7 18 58 72 c0 e8 65 d7 de dc <0f> 0b 48 c7 c7 40 4b 72 c0 e8 57 d7 de dc 48 c7 c7 18 58 72 c0 e8
[   51.592006] RSP: 0018:ffffa24d82587a50 EFLAGS: 00010246
[   51.592013] RAX: 000000000000001c RBX: ffffa24d82587d28 RCX: ffff950fced1ae18
[   51.592018] RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff950fced1ae10
[   51.592023] RBP: ffff950eec1ae540 R08: 0720072007200720 R09: 00000000000004e5
[   51.592028] R10: 0720072007200720 R11: 0720072007200720 R12: ffff950f7a1b80e8
[   51.592033] R13: ffffa24d82587d28 R14: ffffa24d82587bcc R15: ffff950eea2d0de8
[   51.592040] FS:  00007ff7f3d9ad48(0000) GS:ffff950fced00000(0000) knlGS:0000000000000000
[   51.592046] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   51.592051] CR2: 00005579429990f0 CR3: 000000002a436000 CR4: 00000000001406e0
[   51.592056] Call Trace:
[   51.592073]  ? check_preempt_curr+0x67/0x90
[   51.592080]  ? ttwu_do_wakeup+0x19/0x150
[   51.592088]  ? _cond_resched+0x16/0x40
[   51.592095]  ? kmem_cache_alloc_trace+0x18c/0x280
[   51.592117]  ? fscache_alloc_retrieval+0x2f/0xe0 [fscache]
[   51.592129]  ? fscache_attach_object+0x139/0x1a0 [fscache]
[   51.592144]  ? fscache_run_op+0x56/0xb0 [fscache]
[   51.592159]  __fscache_read_or_alloc_pages+0x235/0x2e0 [fscache]
[   51.592203]  __nfs_readpages_from_fscache+0x60/0x160 [nfs]
[   51.592231]  nfs_readpages+0xb5/0x1e0 [nfs]
[   51.592241]  ? get_page_from_freelist+0x286/0x2e0
[   51.592249]  read_pages+0x185/0x270
[   51.592257]  page_cache_readahead_unbounded+0x153/0x210
[   51.592264]  generic_file_buffered_read+0x54a/0x970
[   51.592288]  nfs_file_read+0x6d/0xa0 [nfs]
[   51.592297]  new_sync_read+0x112/0x1a0
[   51.592305]  vfs_read+0x14f/0x180
[   51.592311]  ksys_read+0x5f/0xe0
[   51.592319]  do_syscall_64+0x52/0xd0
[   51.592326]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   51.592333] RIP: 0033:0x7ff7f3d59923
[   51.592339] Code: c3 8b 07 85 c0 75 24 49 89 fb 48 89 f0 48 89 d7 48 89 ce 4c 89 c2 4d 89 ca 4c 8b 44 24 08 4c 8b 4c 24 10 4c 89 5c 24 08 0f 05 <c3> e9 3a d3 ff ff 41 54 b8 02 00 00 00 49 89 f4 be 00 08 08 00 55
[   51.592349] RSP: 002b:00007ffc1f27fb78 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[   51.592356] RAX: ffffffffffffffda RBX: 00007ff7f3d9ad48 RCX: 00007ff7f3d59923
[   51.592361] RDX: 000000000000016a RSI: 00007ff7f327c000 RDI: 0000000000000003
[   51.592366] RBP: 00007ff7f327a620 R08: 0000000000000000 R09: 0000000000000000
[   51.592371] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[   51.592375] R13: 0000000000000003 R14: 00007ff7f327c000 R15: 00007ffc1f27fff0
[   51.592382] Modules linked in: macvlan veth xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_tables x_tables bpfilter rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc iscsi_ibft iscsi_boot_sysfs cachefiles fscache edac_mce_amd kvm_amd snd_hda_codec_realtek ccp snd_hda_codec_generic kvm ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core irqbypass snd_hwdep crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_pcm aesni_intel r8169 snd_timer crypto_simd realtek cryptd glue_helper efi_pstore pcspkr libphy snd hp_wmi sp5100_tco sparse_keymap wmi_bmof fam15h_power k10temp i2c_piix4 soundcore tiny_power_button button rfkill_gpio rfkill acpi_cpufreq nls_iso8859_1 nls_cp437 vfat fat amdgpu btrfs blake2b_generic libcrc32c xor iommu_v2 gpu_sched i2c_algo_bit ttm drm_kms_helper xhci_pci xhci_pci_renesas
[   51.592437]  xhci_hcd ehci_pci ehci_hcd syscopyarea sysfillrect raid6_pq sysimgblt fb_sys_fops cec rc_core usbcore drm crc32c_intel serio_raw wmi video overlay msr sg br_netfilter bridge stp llc dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua efivarfs
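
When comparing traces like the one above across reports, it can help to pull out just the BUG location and the faulting kernel symbol. The following is a small sketch with hypothetical names (`summarize_oops`, `BUG_RE`, `RIP_RE`); the regular expressions are written against the line formats seen in this log only and are not a general oops parser.

```python
import re

# Matches e.g. "kernel BUG at fs/cachefiles/rdwr.c:715!"
BUG_RE = re.compile(r"kernel BUG at (\S+):(\d+)!")
# Matches a kernel-side RIP line with symbol+offset/size and optional [module].
RIP_RE = re.compile(
    r"RIP: [0-9a-f]{4}:([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?"
)


def summarize_oops(text):
    """Return the BUG file/line and the faulting symbol/module, if present."""
    summary = {}
    bug = BUG_RE.search(text)
    if bug:
        summary["file"] = bug.group(1)
        summary["line"] = int(bug.group(2))
    rip = RIP_RE.search(text)
    if rip:
        summary["symbol"] = rip.group(1)
        summary["module"] = rip.group(2)
    return summary


# Three lines copied from the log in this report.
SAMPLE = """\
[   51.591827] kernel BUG at fs/cachefiles/rdwr.c:715!
[   51.591986] RIP: 0010:cachefiles_read_or_alloc_pages.cold+0x32/0x4e [cachefiles]
[   51.592333] RIP: 0033:0x7ff7f3d59923
"""

if __name__ == "__main__":
    print(summarize_oops(SAMPLE))
```

The user-space RIP line carries no symbol, so it is skipped and only the kernel-side entry is reported.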
Comment 1 Takashi Iwai 2020-08-13 19:09:03 UTC
It sounds like the same bug as an upstream report
  https://bugzilla.kernel.org/show_bug.cgi?id=208883

Adding experts to Cc.
Comment 2 Arjen Runsink 2020-08-25 10:11:06 UTC
>It sounds like the same bug as an upstream report
>  https://bugzilla.kernel.org/show_bug.cgi?id=208883

I have read the description of the submitted rewrite of the fscache and cachefilesd code.
It states that all the network filesystems (NFS, CIFS, AFS, etc.) will need a patch too, so I doubt it will be included in the kernel anytime soon.
Comment 3 Takashi Iwai 2020-08-25 10:47:14 UTC
Hrm, there is little change between 5.7 and 5.8 in fs/cachefiles/*, and those changes are likely unrelated to the bug.

Looking at the oops closely, this seems to be just a false-positive ASSERT().  The readpages aops was recently converted to readahead for many filesystems, so the NULL check of readpages might simply be bogus and should be dropped.
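
To illustrate the reasoning, here is a toy user-space model, not kernel code. `AopsModel`, `strict_check` and `has_read_path` are hypothetical names, and the model only assumes what is stated above: a converted filesystem provides readahead and leaves readpages unset.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AopsModel:
    """Hypothetical stand-in for a filesystem's address-space operations."""
    readpages: Optional[Callable] = None  # older hook
    readahead: Optional[Callable] = None  # newer hook that replaces it


def strict_check(aops):
    """Models an assertion that insists on a readpages hook."""
    return aops.readpages is not None


def has_read_path(aops):
    """True if the filesystem offers either hook for reading pages."""
    return aops.readpages is not None or aops.readahead is not None


def _noop(*args):
    return 0


legacy_fs = AopsModel(readpages=_noop)     # not yet converted
converted_fs = AopsModel(readahead=_noop)  # converted to readahead

if __name__ == "__main__":
    for name, fs in (("legacy", legacy_fs), ("converted", converted_fs)):
        print(name, strict_check(fs), has_read_path(fs))
```

In this model the converted filesystem fails the strict check even though it can still read pages, which is the sense in which the assertion would be a false positive.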
Comment 4 Takashi Iwai 2020-08-25 10:55:40 UTC
A test kernel is being built in OBS home:tiwai:bsc1175245 repo.  It'll appear at
  http://download.opensuse.org/repositories/home:/tiwai:/bsc1175245/standard/

Please give it a try later.
Comment 5 Takashi Iwai 2020-08-25 10:57:03 UTC
Created attachment 841015
A test fix patch
Comment 6 Arjen Runsink 2020-08-26 09:07:18 UTC
# uname -ar
Linux dock01 5.8.3-1.ga51ffb9-default #1 SMP Tue Aug 25 10:52:44 UTC 2020 (a51ffb9) x86_64 x86_64 x86_64 GNU/Linux

I have been testing and stressing NFS I/O with this kernel for some 12 hours now. No issues with the regular processes combined with iozone or bonnie++ on the NFS mounts backed by fscache/cachefilesd.

Looks like you fixed it.
Comment 7 Takashi Iwai 2020-08-26 09:31:12 UTC
OK, I'll report it on the upstream Bugzilla.
Comment 8 Philipp Wagner 2020-08-27 12:15:35 UTC
I had the same problem and your updated kernel package fixes the problem for me as well. Thanks!
Comment 9 Takashi Iwai 2020-08-28 09:09:08 UTC
I submitted the fix upstream and backported it to the stable/master git branches of the openSUSE kernel.  It'll be included in the upcoming (or the next) kernel update.
Comment 10 Arjen Runsink 2020-09-17 19:04:33 UTC
I can confirm this issue as resolved. Will mark VERIFIED. Thank you!