Bug 1141998 - nvme_timeout fix not forward ported to 5.X ?
Status: RESOLVED NORESPONSE
Classification: openSUSE
Product: openSUSE Tumbleweed
Component: Kernel
Version: Current
Hardware: Other
OS: Other
Priority: P5 - None
Severity: Normal
Target Milestone: ---
Assigned To: E-mail List
Reported: 2019-07-18 09:58 UTC by Ruediger Oertel
Modified: 2019-11-04 13:21 UTC (History)
CC List: 7 users



Description Ruediger Oertel 2019-07-18 09:58:48 UTC
We hit a kernel bug with nvme timeouts on the Cavium/ARM machines some months
ago. Now we are testing the Factory/Tumbleweed kernel for another reason and
are running into machine stalls like this daily:

[33190.546057] Internal error: synchronous external abort: 96000210 [#1] SMP
[33190.552850] Modules linked in: nls_iso8859_1 nls_cp437 vfat fat cavium_rng_vf joydev thunderx_mmc i2c_thunderx mmc_core thunderx_edac cavium_rng ipmi_ssif uio_pdrv_genirq ipmi_devintf uio ipmi_msghandler efivarfs fuse squashfs loop brd af_packet hid_generic usbhid aes_ce_blk aes_ce_cipher nicvf cavium_ptp crct10dif_ce ast i2c_algo_bit ttm ghash_ce nicpf drm_kms_helper sha2_ce syscopyarea sysfillrect sysimgblt fb_sys_fops xhci_pci drm sha256_arm64 nvme xhci_hcd nvme_core sha1_ce gpio_keys thunder_bgx thunder_xcv usbcore mdio_thunder mdio_cavium sg nbd dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua aes_neon_bs aes_neon_blk crypto_simd cryptd aes_arm64
[33190.611351] Process kworker/12:1H (pid: 2505, stack limit = 0x00000000bb67bcdf)
[33190.618666] CPU: 12 PID: 2505 Comm: kworker/12:1H Not tainted 5.1.16-1-default #1 openSUSE Tumbleweed (unreleased)
[33190.629015] Hardware name: GIGABYTE R120-T32/MT30-GS1, BIOS T49 02/02/2018
[33190.635901] Workqueue: kblockd blk_mq_timeout_work
[33190.640696] pstate: 20000005 (nzCv daif -PAN -UAO)
[33190.645500] pc : nvme_timeout+0x48/0x310 [nvme]
[33190.650037] lr : blk_mq_check_expired+0x13c/0x160
[34661.689461] sp : ffff000022cb3bc0
[34661.692764] x29: ffff000022cb3bc0 x28: ffff803f87418400
[34661.698065] x27: 0000000000000004 x26: ffff803f82bd3810
[34661.703367] x25: ffff803f86578000 x24: ffff803f83d6e10000000000 x16: 0000000000000001
[34661.729871] x15: 0000000000000001 x14: 0000000000000000
[34661.735172] x13: 0000000000000040 x12: 0000000000000010
[34661.740472] x11: 0000ffffab298dc8 x10: 00000000000019e0
[34661.745773] x9 : ffff803fc01b1268 x8 : fefefefefefefeff
[34661.751075] x7 : 0000000000000018 x6 : ffff803fc01b1268
[34661.756375] x5 : 0000000000000003 x4 : ffff0000104f64a0
[34661.761676] x3 : ffff803f83d6e0d4 x2 : ffff000008de0740
[34661.766977] x1 : 0000000000000000 x0 : ffff0000104f65dc
[34661.772279] Call trace:
[34661.774723]  nvme_timeout+0x48/0x310 [nvme]
[34661.778898]  blk_mq_check_expired+0x13c/0x160
[34661.783244]  bt_for_each+0x148/0x158
[34661.786808]  blk_mq_queue_tag_busy_iter+0xa8/0x140
[34661.791588]  blk_mq_timeout_work+0x60/0x120
[34661.795763]  process_one_work+0x1d4/0x400
[34661.799761]  worker_thread+0x54/0x478
[34661.803414]  kthread+0x104/0x130
[34661.806633]  ret_from_fork+0x10/0x18
[34661.810199] Code: f9401319 f9400734 f940c693 91007273 (b9400273)
[34661.816283] ---[ end trace 785150af62deadb8 ]---

kernel version is 5.1.16-1-default
hardware is aarch64
System Information
        Manufacturer: GIGABYTE
        Product Name: R120-T32
BIOS Information
        Vendor: GIGABYTE
        Version: T49
        Release Date: 02/02/2018

obs-arm-1:~ # nvme list-subsys
nvme-subsys0 - NQN=nqn.2014.08.org.nvmexpress:80868086CVFT6075003W800CGN  INTEL SSDPEDMD800G4                     
\
 +- nvme0 pcie 0005:90:00.0 live
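
For reference, the call trace points at the blk-mq timeout path
(blk_mq_timeout_work -> blk_mq_check_expired) handing the expired request to
the NVMe driver's .timeout callback, with the faulting PC early in
nvme_timeout. A plausible reading of the synchronous external abort is that
the handler's first MMIO access to the controller (the CSTS status register
read through the mapped BAR) hits a device that has become unreachable on the
PCIe bus, which on arm64 faults instead of returning all-ones as it would on
x86. The sketch below loosely follows the shape of upstream
drivers/nvme/host/pci.c around v5.1; the structure layouts, the nvme_*
helpers, and the claim about which load faults are assumptions for
illustration, not the openSUSE kernel source, and the pci.c-internal types
are not redeclared here, so it is not standalone buildable.

/* Sketch only: loosely mirrors drivers/nvme/host/pci.c (~v5.1). Types such as
 * struct nvme_dev / nvme_queue / nvme_iod and the nvme_* helpers live inside
 * that file and are assumed here, not redeclared. */
#include <linux/blk-mq.h>
#include <linux/io.h>
#include <linux/nvme.h>

static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
{
        struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
        struct nvme_dev *dev = iod->nvmeq->dev;

        /*
         * First touch of the controller: an MMIO load of the status register
         * through the mapped BAR. If the device is no longer reachable on
         * the PCIe bus, this is the kind of load that can come back as a
         * synchronous external abort on arm64 -- consistent with the
         * nvme_timeout+0x48 PC in the oops above (an assumption, not a
         * disassembly of the openSUSE build).
         */
        u32 csts = readl(dev->bar + NVME_REG_CSTS);

        /* Controller looks failed: tear it down and schedule a reset. */
        if (nvme_should_reset(dev, csts)) {
                nvme_warn_reset(dev, csts);
                nvme_dev_disable(dev, false);
                nvme_reset_ctrl(&dev->ctrl);
                return BLK_EH_DONE;
        }

        /* ... command abort / retry handling elided ... */
        return BLK_EH_RESET_TIMER;
}

/* The block layer reaches the handler through the driver's blk_mq_ops;
 * blk_mq_check_expired() in the trace is what invokes this .timeout op. */
static const struct blk_mq_ops nvme_mq_ops = {
        /* ... queue_rq, complete, init_hctx, ... */
        .timeout = nvme_timeout,
};

Whether the fix referred to in the subject touched this handler or the
surrounding reset logic is exactly what comment 1 asks for a reference to.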
Comment 1 Michal Kubeček 2019-07-25 04:55:47 UTC
The subject seems to suggest this is a known bug which was already fixed in an
older kernel. Do you have a reference to the previous bug or the fix?
Comment 2 Jiri Slaby 2019-11-04 12:35:41 UTC
Closing due to lack of response.
Comment 3 Ruediger Oertel 2019-11-04 13:21:45 UTC
Fine with me, 5.3.7 looks solid again:
obs-arm-1:~ # uptime
 13:20:55  up 4 days  0:36,  1 user,  load average: 47.53, 49.25, 51.11
obs-arm-7:~ # uptime
 13:21:06  up 4 days  0:46,  1 user,  load average: 68.58, 56.14, 52.83
obs-arm-9:~ # uptime
 13:21:30  up 4 days  0:35,  1 user,  load average: 61.20, 61.01, 58.19