Bug 1168661 - [Build 20200404] openQA test fails in await_install of installer, fails to reboot on SMP systems
Status: RESOLVED FIXED
Classification: openSUSE
Product: openSUSE Tumbleweed
Component: Other
Version: Current
Hardware: Other, OS: Other
Importance: P5 - None : Normal
Target Milestone: ---
Assigned To: dracut maintainers
QA Contact: E-mail List
URL: https://openqa.opensuse.org/tests/122...
Depends on:
Blocks:
 
Reported: 2020-04-05 18:04 UTC by Oliver Kurz
Modified: 2020-11-04 21:15 UTC
CC List: 6 users

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Description Oliver Kurz 2020-04-05 18:04:12 UTC
## Observation

openQA test in scenario opensuse-Tumbleweed-DVD-x86_64-install_only@smp_64 fails in
[await_install](https://openqa.opensuse.org/tests/1224945/modules/await_install/steps/40);
it fails to reboot on SMP systems.


## Reproducible

Fails since Build [20200404](https://openqa.opensuse.org/tests/1224485), this build, reproducibly


## Expected result

Last good: [20200402](https://openqa.opensuse.org/tests/1223880), previous build


## Further details

Always latest result in this scenario: [latest](https://openqa.opensuse.org/tests/latest?arch=x86_64&distri=opensuse&flavor=DVD&machine=smp_64&test=install_only&version=Tumbleweed)

See https://openqa.opensuse.org/snapshot-changes/opensuse/Tumbleweed/diff/20200404 for changes of this Tumbleweed snapshot.
Comment 1 Dominique Leuenberger 2020-04-06 07:56:41 UTC
Judging from the serial log, this looks like a dracut crash:

https://openqa.opensuse.org/tests/1225239/file/serial0.txt

begin 644 dracut-install.core.pid_13445.sig_11.time_1586122805
Comment 2 Dominique Leuenberger 2020-04-06 08:02:12 UTC
I reverted dracut to version 049.1+suse.138.g9068a629 for the time being
Comment 3 Dominique Leuenberger 2020-04-06 10:31:41 UTC
CC fbui: I was told systemd 245 would require the dracut update
Comment 4 Franck Bui 2020-04-06 12:05:16 UTC
No, v245 doesn't depend on dracut 050. We just thought it would be a good idea to test both updates in the same staging.
Comment 5 Daniel Molkentin 2020-04-07 12:26:34 UTC
I'll check dracut on factory. Ignore the latest submission.
Comment 7 Daniel Molkentin 2020-04-13 20:04:21 UTC
I can reproduce a segfault in dracut-install, which is used during initramfs creation.

According to git bisect, this upstream commit is the cause:

commit a01204202b3014c0c761c93bc7de8bf35e6dc5ef
Author: Böszörményi Zoltán <zboszor@pr.hu>
Date:   Thu Oct 24 11:28:55 2019 +0200

    Allow running on a cross-compiled rootfs
Comment 8 Daniel Molkentin 2020-04-14 12:35:44 UTC
Stack exhaustion through recursion.

uas.ko -> usb_storage.ko

#116266 0x0000000000405ddc in install_dependent_modules (modlist=0x643fb0) at install/dracut-install.c:1469
#116267 0x0000000000405e40 in install_dependent_modules (modlist=modlist@entry=0x616ab0) at install/dracut-install.c:1473
#116268 0x0000000000406297 in install_module (mod=mod@entry=0x65aa50) at install/dracut-install.c:1531
#116269 0x00000000004068ba in install_modules (argc=argc@entry=1, argv=argv@entry=0x7fffffffdd30) at install/dracut-install.c:1827
#116270 0x0000000000402ef9 in main (argc=<optimized out>, argv=0x7fffffffdce8) at install/dracut-install.c:2017

caused by these lines in /etc/modprobe.d/00-system.conf:

# uas devices can be unpredictably a fallback for both drivers must be present
softdep usb_storage pre: uas
softdep uas pre: usb_storage

(These lines create a questionable circular dependency, but dracut has handled them so far and should continue to do so.) Can anyone make sense of the comment, though?
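
The backtrace above shows `install_dependent_modules` calling itself more than 116,000 frames deep, which is what an unguarded depth-first walk does when two modules list each other as `pre:` softdeps. As a minimal sketch only (Python, not dracut-install's actual C code), the usual guard is to mark a module as visited before recursing into its dependencies. The helper names `parse_pre_softdeps` and `install_with_deps` are hypothetical.

```python
# Sketch: a visited-set guard so a softdep cycle terminates.
# Assumption: simplified model; not dracut-install's real data structures.

# The two lines from /etc/modprobe.d/00-system.conf quoted above.
CONF = """\
softdep usb_storage pre: uas
softdep uas pre: usb_storage
"""


def parse_pre_softdeps(text):
    """Map each module to the modules listed after 'pre:' on its softdep line."""
    deps = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0] != "softdep":
            continue
        module, section = fields[1], None
        for token in fields[2:]:
            if token in ("pre:", "post:"):
                section = token
            elif section == "pre:":
                deps.setdefault(module, []).append(token)
    return deps


def install_with_deps(module, deps, visited=None, order=None):
    """Depth-first install; 'visited' is checked before recursing (the cycle guard)."""
    if visited is None:
        visited, order = set(), []
    if module in visited:
        return order  # already handled: stop instead of recursing forever
    visited.add(module)
    for dep in deps.get(module, []):
        install_with_deps(dep, deps, visited, order)
    order.append(module)
    return order


deps = parse_pre_softdeps(CONF)
print(install_with_deps("uas", deps))  # ['usb_storage', 'uas']
```

Without the `if module in visited` check, the same walk would alternate between `uas` and `usb_storage` until the stack is exhausted, matching the crash pattern in the backtrace.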
Comment 9 Martin Wilck 2020-04-17 14:36:10 UTC
The generic modaliases 

usb:v*p*d*dc*dsc*dp*ic08isc06ip62in*
usb:v*p*d*dc*dsc*dp*ic08isc06ip50in*

are supported by both uas and usb-storage. Theoretically "ip62" is UAS (https://superuser.com/questions/928741/how-can-i-check-whether-usb3-0-uasp-usb-attached-scsi-protocol-mode-is-enabled), but I guess there are lots of broken disks around which pretend to speak UAS but can't, so there must be a fallback to usb-storage. There's also the "usb-storage.quirks" module parameter.
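
To illustrate why both drivers become candidates for the same disk, here is a minimal Python sketch that matches a device modalias against the two generic patterns using shell-style wildcards, similar in spirit to how modprobe resolves wildcard aliases. The `ALIASES` table is a simplified stand-in for `modules.alias` (an assumption), listing both drivers for both patterns as stated above; the device modalias string, including its vendor and product IDs, is invented, and `candidate_drivers` is a hypothetical helper.

```python
# Sketch: both drivers claim the same generic USB mass-storage modaliases.
# Assumptions: simplified alias table; made-up device modalias strings.
from fnmatch import fnmatchcase

# Per the comment above, both patterns are supported by uas and usb-storage.
ALIASES = {
    "usb:v*p*d*dc*dsc*dp*ic08isc06ip62in*": ["uas", "usb_storage"],
    "usb:v*p*d*dc*dsc*dp*ic08isc06ip50in*": ["uas", "usb_storage"],
}


def candidate_drivers(modalias):
    """Return every module whose alias pattern matches the device's modalias."""
    found = []
    for pattern, modules in ALIASES.items():
        if fnmatchcase(modalias, pattern):
            found.extend(m for m in modules if m not in found)
    return found


# Hypothetical device advertising interface protocol 0x62 (UAS).
print(candidate_drivers("usb:v1234p5678d0100dc00dsc00dp00ic08isc06ip62in00"))
```

Because both modules appear for either protocol value, a device that claims UAS can still end up handled by usb-storage, which is the fallback described above.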

The softdeps were introduced in bug 862397 to make sure both drivers are loaded. They don't seem to be necessary any more on modern SUSE systems. uas has a hard dependency on usb-storage, so it's indeed non-obvious why the softdep uas->usb-storage was ever needed.

Wrt the reverse dependency, the worst thing that can happen when we remove it is that some disks perform sub-optimally with the usb-storage driver, AFAICS.

I'll remove the uas->usb-storage dependency in suse-module-tools, and convert the usb-storage->uas dep into a "softdep post". That should eliminate the circular dependency in this case (there may be other similar cases though).
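
A quick way to see that the planned change removes the cycle is to run a standard three-colour depth-first search over the `pre:` edges of the old and new configuration. This is a minimal Python sketch under stated assumptions: the `OLD` and `NEW` dicts model the configuration as described in this comment, not the actual suse-module-tools file, and `has_pre_cycle` is a hypothetical helper. Only `pre:` edges are checked, since dracut ignores `softdep post`.

```python
# Sketch: detect a cycle among softdep "pre:" edges, before and after the change.
# Assumption: OLD/NEW mirror the described configuration, not the real file.

OLD = {
    "usb_storage": {"pre": ["uas"], "post": []},
    "uas": {"pre": ["usb_storage"], "post": []},
}
NEW = {
    "usb_storage": {"pre": [], "post": ["uas"]},
}


def has_pre_cycle(softdeps):
    """Three-colour DFS over 'pre' edges; meeting an in-progress node means a cycle."""
    IN_PROGRESS, DONE = 1, 2
    state = {}

    def visit(mod):
        state[mod] = IN_PROGRESS
        for dep in softdeps.get(mod, {}).get("pre", []):
            if state.get(dep) == IN_PROGRESS:
                return True  # back edge: dep is still on the current DFS path
            if dep not in state and visit(dep):
                return True
        state[mod] = DONE
        return False

    return any(mod not in state and visit(mod) for mod in softdeps)


print(has_pre_cycle(OLD))  # True
print(has_pre_cycle(NEW))  # False
```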

dracut currently ignores "softdep post" dependencies. So if a customer generates an initrd with a USB storage device connected via usb-storage, uas will not be packaged in the initrd. If the system is later booted with this initrd, and another, UAS-capable disk is attached without explicitly loading uas, it *might* happen that uas is not auto-loaded and the new disk performs worse than expected. But this is a corner case, so this potential regression is justified, as the change eliminates an inconsistent configuration with a circular dependency.
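
The corner case above can be sketched by computing which modules get pulled in when only `pre:` softdeps are followed versus when `post:` ones are followed too. This is a minimal Python model, assuming the `SOFTDEPS` dict reflects the new configuration as described (not the actual file) and ignoring hard dependencies; `modules_for_initrd` is a hypothetical helper, not a dracut function.

```python
# Sketch: modules collected for an initrd when only "pre:" softdeps are followed.
# Assumptions: simplified model of dracut ignoring "post:"; hard deps not modelled.

SOFTDEPS = {
    "usb_storage": {"pre": [], "post": ["uas"]},
}


def modules_for_initrd(root, softdeps, follow_post=False):
    """Collect root plus its softdeps transitively, using a visited set."""
    seen, stack = set(), [root]
    while stack:
        mod = stack.pop()
        if mod in seen:
            continue
        seen.add(mod)
        entry = softdeps.get(mod, {})
        stack.extend(entry.get("pre", []))
        if follow_post:
            stack.extend(entry.get("post", []))
    return sorted(seen)


print(modules_for_initrd("usb_storage", SOFTDEPS))                    # ['usb_storage']
print(modules_for_initrd("usb_storage", SOFTDEPS, follow_post=True))  # ['uas', 'usb_storage']
```

With `post:` ignored, `uas` is left out of the collected set, which is exactly the potential regression described above.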
Comment 10 Swamp Workflow Management 2020-04-17 17:10:06 UTC
This is an autogenerated message for OBS integration:
This bug (1168661) was mentioned in
https://build.opensuse.org/request/show/794962 Factory / suse-module-tools
Comment 11 Daniel Molkentin 2020-05-27 14:13:44 UTC
Fixed in suse-module-tools. Closing.
Comment 12 Daniel Molkentin 2020-05-27 14:16:16 UTC
Can we reinstate the test? I'll reopen for now.
Comment 13 Daniel Molkentin 2020-05-29 15:12:34 UTC
Test is succeeding again. Closing.
Comment 14 Daniel Molkentin 2020-05-29 15:18:15 UTC
Why was the test disabled even though the last run succeeded?
Comment 15 Oliver Kurz 2020-06-01 08:29:31 UTC
Hi Daniel,

(In reply to Daniel Molkentin from comment #14)
> Why was the test disabled even though the last run succeeded?

I do not understand why you think the test would be disabled. The description mentions the "Always latest result in this scenario" which currently links to https://openqa.opensuse.org/tests/1282620 from 3 days ago. The test scenario is enabled on Tumbleweed and triggered on every new Tumbleweed snapshot. Please keep in mind that old job results are deleted after some time.