Bugzilla – Bug 961263
NCQ Timeout with SMR drives (e.g. Seagate 8 TB HDD)
Last modified: 2018-07-03 21:01:34 UTC
Running openSUSE 42.1 (and also Tumbleweed), I experienced random hard disk drive crashes. After some investigation I found this bug report, which describes the problem better than I can: https://bugzilla.kernel.org/show_bug.cgi?id=93581
It would be great to see this fixed in the kernel packages provided by openSUSE. A fix can be found in comment 67: https://bugzilla.kernel.org/show_bug.cgi?id=93581#c67
The bug is fixed in mainline kernel version 4.4: http://git.kernel.org/cgit/linux/kernel/git/mkp/linux.git/commit/?h=bugzilla-93581&id=7c4fbd50bfece00abf529bc96ac989dd2bb83ca4
So at the moment I have to use the kernel provided by the Kernel:HEAD project to work around this bug.
The commit in the master branch is: http://git.kernel.org/cgit/linux/kernel/git/mkp/linux.git/commit/?id=ca369d51b3e1649be4a72addd6d6a168cfb3f537
It looks like this should have been backported via stable trees, as the original regression commit was also merged to stable trees...
I have checked that, but I cannot find it in the changelog of e.g. 4.1.5. I have now patched 4.3.3 (from the Kernel:STABLE project) and can confirm that this fixes the bug. After writing 500 GiB of data there was no crash, whereas with an unpatched kernel the drives crash after 10-100 GiB of data.
The commit 4f258a4634 ('sd: Fix maximum I/O size for BLOCK_PC requests') was backported to 4.1.7 kernel.
Hmm, the fix commit (7c4fbd50bf) requires other earlier changes as well: one is the revert of change 34b48db66 (commit 30e2bc08b2), and another bumps BLK_DEF_MAX_SECTORS (commit d2be537c3ba). So it cannot be applied alone as a stable fix to 4.1.x. Worse, this fix changes the kABI by adding a new field to the queue_limits struct, which is embedded in other structs. Fortunately, there is a hole at the tail of the struct, and the new field should fit there, at least on x86-64. In any case, I set up a test kernel including these fix patches in the OBS home:tiwai:bnc961263 repo. The package is being built now. Could you try the kernel from that OBS repo later?
(In reply to Takashi Iwai from comment #5)
> In anyways, I set up a test kernel including these fix patches in OBS
> home:tiwai:bnc961263 repo. The package is being built now. Could you try
> the kernel later from that OBS repo?
Yes, of course I can and will test it. But at the moment I cannot reboot, so it will have to wait until Friday.
OK, I have started testing the kernel. I will write about 3 TiB of data, read it back, and then start a long-running test (multiple reads/writes at the same time, letting the system go to sleep and waking it up again, which is when it often crashed). I'll be back in a few days.
Ok, looks good! No issues found, system runs stable.
Good to hear. I merged the fix patches to openSUSE-42.1 branch now. The next update kernel will include the fix. Let's close the bug. Thanks for reporting and testing.
openSUSE-SU-2016:0280-1: An update that solves 10 vulnerabilities and has 18 fixes is now available. Category: security (important) Bug References: 865096,865259,913996,950178,950998,952621,954324,954532,954647,955422,956708,957152,957988,957990,958439,958463,958504,958510,958886,958951,959190,959399,960021,960710,961263,961509,962075,962597 CVE References: CVE-2015-7550,CVE-2015-8539,CVE-2015-8543,CVE-2015-8550,CVE-2015-8551,CVE-2015-8552,CVE-2015-8569,CVE-2015-8575,CVE-2015-8767,CVE-2016-0728 Sources used: openSUSE Leap 42.1 (src): kernel-debug-4.1.15-8.1, kernel-default-4.1.15-8.1, kernel-docs-4.1.15-8.3, kernel-ec2-4.1.15-8.1, kernel-obs-build-4.1.15-8.2, kernel-obs-qa-4.1.15-8.1, kernel-obs-qa-xen-4.1.15-8.1, kernel-pae-4.1.15-8.1, kernel-pv-4.1.15-8.1, kernel-source-4.1.15-8.1, kernel-syms-4.1.15-8.1, kernel-vanilla-4.1.15-8.1, kernel-xen-4.1.15-8.1
Patch ca369d51b3e1 ("block/sd: Fix device-imposed transfer length limits") introduced a regression.

To reproduce:
$ modprobe -r scsi_debug
$ modprobe scsi_debug sector_size=512
$ udevadm settle
$ devname=$(grep --with-filename scsi_debug /sys/block/*/device/model | awk -F '/' '{print $4}')
$ cat /sys/block/$devname/queue/{minimum_io_size,optimal_io_size}
512
64

Should be:
512
>=512

I've sent a GitHub pull request to fix that: https://github.com/openSUSE/kernel-source/pull/2
Note: This regression also happens in kernel 4.4 (current Tumbleweed) but should be fixed in 4.5.
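The comparison in the reproduction (optimal_io_size reported as 64 while minimum_io_size is 512) can be scripted. What follows is a minimal sketch, assuming a POSIX shell and the sysfs layout used in the reproduction steps; the helper name `io_opt_status` and the rule "a nonzero optimal_io_size below minimum_io_size is suspicious" are illustrative assumptions, not part of the report. A value of 0 is treated as "not reported", which is how the kernel's sysfs block ABI documentation describes optimal_io_size.

```shell
#!/bin/sh
# io_opt_status MIN OPT (hypothetical helper): classify the pair
# (minimum_io_size, optimal_io_size), both in bytes as exported by sysfs.
#   unreported  - OPT is 0, i.e. the device reports no optimal I/O size
#   suspicious  - OPT is nonzero but below MIN (e.g. 512 / 64 in the report)
#   ok          - otherwise
io_opt_status() {
    min=$1
    opt=$2
    if [ "$opt" -eq 0 ]; then
        echo "unreported"
    elif [ "$opt" -lt "$min" ]; then
        echo "suspicious"
    else
        echo "ok"
    fi
}

# Optional: pass a block device name (e.g. the scsi_debug disk found by
# the reproduction steps) to check its live sysfs values.
if [ $# -ge 1 ]; then
    q="/sys/block/$1/queue"
    m=$(cat "$q/minimum_io_size")
    o=$(cat "$q/optimal_io_size")
    echo "$1: minimum_io_size=$m optimal_io_size=$o -> $(io_opt_status "$m" "$o")"
fi
```

After running the reproduction steps, something like `sh check_io_opt.sh "$devname"` (script name hypothetical) would print the classification; the values 512 and 64 from the report come out as "suspicious".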
We don't handle github pull requests at all. Please give just the upstream commit ids to cherry-pick.
It's

commit 9c1d9c207bb800498347a2716da298043ee280c5
Author: Martin K. Petersen <martin.petersen@oracle.com>
Date: Wed Dec 16 17:53:52 2015 -0500
Subject: sd: Reject optimal transfer length smaller than page size

and

commit d0eb20a863ba7dc1d3f4b841639671f134560be2
Author: Martin K. Petersen <martin.petersen@oracle.com>
Date: Wed Jan 20 11:01:23 2016 -0500
Subject: sd: Optimal I/O size is in bytes, not sectors

The first one may not be strictly needed, but it is related and is also required for the second one to apply without conflicts.
Thanks, I've now applied both to the openSUSE-42.1 branch. The second patch will also be merged to the stable branch for Tumbleweed (TW) soon.
openSUSE-SU-2016:1008-1: An update that solves 15 vulnerabilities and has 26 fixes is now available. Category: security (important) Bug References: 814440,884701,949936,951440,951542,951626,951638,953527,954018,954404,954405,954876,958439,958463,958504,959709,960561,960563,960710,961263,961500,961509,962257,962866,962977,963746,963765,963767,963931,965125,966137,966179,966259,966437,966684,966693,968018,969356,969582,970845,971125 CVE References: CVE-2015-1339,CVE-2015-7799,CVE-2015-7872,CVE-2015-7884,CVE-2015-8104,CVE-2015-8709,CVE-2015-8767,CVE-2015-8785,CVE-2015-8787,CVE-2015-8812,CVE-2016-0723,CVE-2016-2069,CVE-2016-2184,CVE-2016-2383,CVE-2016-2384 Sources used: openSUSE Leap 42.1 (src): kernel-debug-4.1.20-11.1, kernel-default-4.1.20-11.1, kernel-docs-4.1.20-11.3, kernel-ec2-4.1.20-11.1, kernel-obs-build-4.1.20-11.2, kernel-obs-qa-4.1.20-11.1, kernel-obs-qa-xen-4.1.20-11.1, kernel-pae-4.1.20-11.1, kernel-pv-4.1.20-11.1, kernel-source-4.1.20-11.1, kernel-syms-4.1.20-11.1, kernel-vanilla-4.1.20-11.1, kernel-xen-4.1.20-11.1
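To check whether an installed Leap 42.1 kernel is at least as new as the package from this second advisory (which lists bug 961263 again, after both follow-up patches were applied), the version strings can be compared. A minimal sketch, assuming GNU coreutils `sort -V` as an approximation of rpm's version ordering (the two can disagree on unusual version strings); the `version_ge` helper and the 4.1.20-11.1 threshold are illustrative choices, not something the advisories define.

```shell
#!/bin/sh
# version_ge A B: exit status 0 if version A >= version B.
# Uses GNU sort's version ordering (-V) and a quiet sortedness check (-C):
# the input "B\nA" is already sorted exactly when B <= A.
# Approximation only: this is not rpm's own version comparison.
version_ge() {
    printf '%s\n' "$2" "$1" | sort -V -C
}

# Illustrative threshold (assumption): the 4.1.20-11.1 update from
# openSUSE-SU-2016:1008-1, which lists bug 961263 again.
fixed="4.1.20-11.1"

# Optional: pass a VERSION-RELEASE string as the first argument.
if [ $# -ge 1 ]; then
    if version_ge "$1" "$fixed"; then
        echo "$1 >= $fixed: expected to contain the bug 961263 fixes"
    else
        echo "$1 < $fixed: consider updating the kernel"
    fi
fi
```

For example, `sh check_kernel.sh "$(rpm -q --qf '%{VERSION}-%{RELEASE}\n' kernel-default | sort -V | tail -n 1)"` (script name hypothetical) checks the newest installed kernel-default package; that is not necessarily the running kernel until the system has been rebooted into it.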