Bugzilla – Bug 1204267
Plasmashell crashes on armv7
Last modified: 2024-08-02 13:51:49 UTC
Created attachment 862142 [details]
journal.log

Plasmashell crashes on armv7 since snapshot 20221006.

Oct 13 04:54:05 localhost.localdomain plasmashell[2046]: QFont::setPointSizeF: Point size <= 0 (0.000000), must be greater than 0
Oct 13 04:54:06 localhost.localdomain plasmashell[2046]: LLVM ERROR: Cannot select: 0x3003d38: v4i32 = ARMISD::VCMPZ 0x2f64940, Constant:i32<2>
Oct 13 04:54:06 localhost.localdomain plasmashell[2046]: 0x2f64940: v4i32,ch = ARMISD::VLD1DUP<(load (s32) from %ir.326)> 0x2fcdea8, 0x2fb0848, Constant:i32<4>
Oct 13 04:54:06 localhost.localdomain plasmashell[2046]: 0x2fb0848: i32 = add 0x3006648, Constant:i32<64>
Oct 13 04:54:06 localhost.localdomain plasmashell[2046]: 0x3006648: i32,ch = CopyFromReg 0x2e4bf5c, Register:i32 %35
Oct 13 04:54:06 localhost.localdomain plasmashell[2046]: 0x2fed080: i32 = Register %35
Oct 13 04:54:06 localhost.localdomain plasmashell[2046]: 0x2e65f08: i32 = Constant<64>
Oct 13 04:54:06 localhost.localdomain plasmashell[2046]: 0x2fd93c0: i32 = Constant<4>
Oct 13 04:54:06 localhost.localdomain plasmashell[2046]: 0x2fd9690: i32 = Constant<2>
Oct 13 04:54:06 localhost.localdomain plasmashell[2046]: In function: fs_variant_partial
Oct 13 04:54:06 localhost.localdomain plasmashell[2046]: KCrash: Application 'plasmashell' crashing...
Oct 13 04:54:06 localhost.localdomain plasmashell[2046]: KCrash: Attempting to start /usr/libexec/drkonqi
Oct 13 04:54:06 localhost.localdomain plasmashell[2077]: libEGL warning: DRI2: failed to authenticate
Oct 13 04:54:06 localhost.localdomain kded5[1600]: Service "org.kde.StatusNotifierHost-2046" unregistered
Oct 13 04:54:06 localhost.localdomain plasmashell[2046]: Unable to start Dr. Konqi
Oct 13 04:54:06 localhost.localdomain plasmashell[2046]: Re-raising signal for core dump handling.
Oct 13 04:54:06 localhost.localdomain systemd[1284]: plasma-plasmashell.service: Main process exited, code=dumped, status=6/ABRT
It was probably introduced in 20221003, which didn't reach openQA for ARM. Unfortunately that snapshot contained both Mesa and llvm updates, so it is hard to tell which of them caused the crash. Reassigning.
Honestly. No clue ... We switched to Mesa 22.2 and llvm15 lately.
The issue can be reproduced with one pyside test on armv7l:

[ 3286s] 460/478 Test #461: QtDataVisualization_datavisualization_test ......Subprocess aborted***Exception: 0.82 sec
[ 3286s] .LLVM ERROR: Cannot select: 0x1e6a2c0: v4i32 = ARMISD::VCMPZ 0x1d68a08, Constant:i32<2>
[ 3286s] 0x1d68a08: v4i32,ch = ARMISD::VLD1DUP<(load (s32) from %ir.235)> 0x11ebd5c, 0x1d68660:1, Constant:i32<4>
[ 3286s] 0x1d68660: i32,i32,ch = load<(load (s32) from %ir.232, align 8), <post-inc>> 0x11ebd5c, 0x1d52518, Constant:i32<64>
[ 3286s] 0x1d52518: i32,ch = CopyFromReg 0x11ebd5c, Register:i32 %30
[ 3286s] 0x1d68858: i32 = Register %30
[ 3286s] 0x1d67100: i32 = Constant<64>
[ 3286s] 0x1afd1a8: i32 = Constant<4>
[ 3286s] 0x1d66ec0: i32 = Constant<2>
[ 3286s] In function: fs_variant_partial
In my understanding, "Cannot select" is always an LLVM bug, specifically in the backend. Early stages of the backend should "legalize" data types and instructions, sending to instruction selection only what the target supports.

So I can have a look, but it would be appreciated if someone could extract the IR that Mesa sends to LLVM. Otherwise I'll have to reverse-engineer a reproducer.

Nevertheless, some initial remarks: ARMISD::VCMPZ is a "Vector compare to zero." [1] It should correspond to "vcmpe" in assembly [2]. The first argument being a v4i32 is slightly suspicious. I would have expected a v4f32, but since they live in the same registers maybe the backend doesn't care. The second is a Constant:i32<2> = ARMCC::CondCodes::HS, corresponding to conditional execution only if the carry flag is set, if I understand this correctly. [3,4]

Inside we have ARMISD::VLD1DUP, which is a "Vector load N-element structure to all lanes" (same file as [1], different line), and seems to correspond to "vld1.N" in assembly. [5] The Constant:i32<4> could be an alignment, but I'm not sure.

[1] https://github.com/llvm/llvm-project/blob/llvmorg-15.0.2/llvm/lib/Target/ARM/ARMISelLowering.h#L148
[2] https://developer.arm.com/documentation/ddi0406/c/Application-Level-Architecture/Instruction-Details/Alphabetical-list-of-instructions/VCMP--VCMPE
[3] https://developer.arm.com/documentation/ddi0406/c/Application-Level-Architecture/Instruction-Details/Conditional-execution
[4] https://github.com/llvm/llvm-project/blob/llvmorg-15.0.2/llvm/lib/Target/ARM/Utils/ARMBaseInfo.h#L33
[5] https://developer.arm.com/documentation/ddi0406/c/Application-Level-Architecture/Instruction-Details/Alphabetical-list-of-instructions/VLD1--single-element-to-all-lanes-
Created attachment 862276 [details]
Various IR dumps of LLVM failure

It can also be reproduced by running /usr/lib/qt6/examples/datavisualization/bars/bars from the qt6-datavis3d-examples package. That has a slightly more complex shader though.

I attached the full output of running it inside xvfb-run with GALLIVM_DEBUG=tgsi,ir,asm LP_DEBUG=fs, which dumps all kinds of info, including LLVM IR.
Thanks, that should help. This isn't my area of expertise, but at least we can use this to file a bug upstream.

(In reply to Aaron Puchert from comment #4)
> Nevertheless, some initial remarks: ARMISD::VCMPZ is a "Vector compare to
> zero." [1] It should correspond to "vcmpe" in assembly [2]. The first
> argument being a v4i32 is slightly suspicious. I would have expected a
> v4f32, but since they live in the same registers maybe the backend doesn't
> care. The second is a Constant:i32<2> = ARMCC::CondCodes::HS, corresponding
> to conditional execution only if the carry flag is set, if I understand this
> correctly. [3,4]

Seems I was misreading that, the condition code is for the comparison itself. For floating-point ARMCC::CondCodes::HS means ">, ==, or unordered", so we're doing a !(... < 0.0f) comparison. Likely corresponds to one of the fcmp ..., zeroinitializer in the IR.
(In reply to Aaron Puchert from comment #6)
> Thanks, that should help. This isn't my area of expertise, but at least we
> can use this to file a bug upstream.

Will you do that or should one of us take care of that?
Since I'm not sure what the precise target machine is, I've used flags similar to how we build LLVM itself (see the specfile):

llc -march=arm --float-abi=hard -mattr=+armv7-a,+vfp3d16

This reproduces the crash, just with a slightly different message:

LLVM ERROR: Cannot select: t933: v4i32 = ARMISD::VCMPZ t1307, Constant:i32<2>
t1307: v4i32,ch = ARMISD::VLD1DUP<(load (s32) from %ir.584)> t0, t1429:1, Constant:i32<4>
t1429: i32,i32,ch = load<(load (s32) from %ir."&context.constants_ptr[]5618", align 8), <post-inc>> t0, t2, Constant:i32<64>
t2: i32,ch = CopyFromReg t0, Register:i32 %45
t1: i32 = Register %45
t212: i32 = Constant<64>
t49: i32 = Constant<4>
t28: i32 = Constant<2>

What's different is the added IR names, but they're not immediately helpful: there is a "&context.constants_ptr[]56" in the source, maybe there was disambiguation.

The crash is reproducible on the current main branch, so it's still not fixed.

With bugpoint --run-llc <input-file> --tool-args <options as above> we can reduce it to this:

define void @fs_variant_partial() {
entry:
  %output = alloca <4 x float>, align 16
  br label %loop_begin

loop_begin:                                       ; preds = %skip, %entry
  br i1 undef, label %skip, label %0

0:                                                ; preds = %loop_begin
  %1 = icmp uge <4 x i32> zeroinitializer, undef
  %2 = sext <4 x i1> %1 to <4 x i32>
  %3 = load i32, i32* undef, align 4
  %4 = insertelement <4 x i32> undef, i32 %3, i32 3
  %5 = trunc <4 x i32> %2 to <4 x i1>
  %6 = select <4 x i1> %5, <4 x i32> zeroinitializer, <4 x i32> %4
  %7 = insertvalue [4 x <4 x i32>] undef, <4 x i32> %6, 0
  %8 = insertvalue [4 x <4 x i32>] %7, <4 x i32> undef, 1
  %9 = insertvalue [4 x <4 x i32>] %8, <4 x i32> undef, 2
  %10 = insertvalue [4 x <4 x i32>] %9, <4 x i32> undef, 3
  %11 = extractvalue [4 x <4 x i32>] %10, 0
  %12 = bitcast <4 x i32> %11 to <4 x float>
  %13 = fmul <4 x float> zeroinitializer, %12
  %14 = fadd <4 x float> %13, zeroinitializer
  %15 = fadd <4 x float> %14, zeroinitializer
  %16 = bitcast <4 x float> %15 to <4 x i32>
  %17 = insertvalue [4 x <4 x i32>] undef, <4 x i32> %16, 0
  %18 = insertvalue [4 x <4 x i32>] %17, <4 x i32> undef, 1
  %19 = insertvalue [4 x <4 x i32>] %18, <4 x i32> undef, 2
  %20 = insertvalue [4 x <4 x i32>] %19, <4 x i32> undef, 3
  %21 = extractvalue [4 x <4 x i32>] %20, 0
  %22 = bitcast <4 x i32> %21 to <4 x float>
  store <4 x float> %22, <4 x float>* %output, align 16
  br label %skip

skip:                                             ; preds = %0, %loop_begin
  br label %loop_begin
}

Crash is slightly different now:

LLVM ERROR: Cannot select: t48: v4i32 = ARMISD::VCMPZ undef:v4i32, Constant:i32<2>
t3: v4i32 = undef
t47: i32 = Constant<2>

This obviously corresponds to the

  %1 = icmp uge <4 x i32> zeroinitializer, undef

With that knowledge we can reduce further:

define <4 x i32> @fs_variant_partial() {
  %1 = icmp uge <4 x i32> zeroinitializer, undef
  %2 = sext <4 x i1> %1 to <4 x i32>
  ret <4 x i32> %2
}

or

define <4 x i32> @fs_variant_partial(<4 x i32> %0) {
  %2 = icmp uge <4 x i32> zeroinitializer, %0
  %3 = sext <4 x i1> %2 to <4 x i32>
  ret <4 x i32> %3
}

I'll see if I can spot where we're missing something, but likely I'll just file a bug and let the ARM people figure out where this should be fixed. From the looks of it we're simply not able to lower "icmp uge <4 x i32> zeroinitializer, ...", and the nested instructions have nothing to do with it.
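For readers less used to LLVM IR, the second reduced function above can be modelled lane by lane in plain C. This is only an illustrative sketch: the function names uge_zero_mask and eq_zero_mask are made up for this note and appear nowhere in Mesa or LLVM. It shows that an unsigned "0 >= x" holds exactly when x is 0, so the comparison that fails to select is semantically a compare-equal-to-zero that yields an all-ones or all-zeros mask per lane.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Lane-wise model of:
 *   %m = icmp uge <4 x i32> zeroinitializer, %x
 *   %r = sext <4 x i1> %m to <4 x i32>
 * Each lane becomes -1 (all bits set) if the unsigned compare holds, else 0. */
void uge_zero_mask(const uint32_t x[4], int32_t out[4])
{
    assert(x != NULL && out != NULL);
    for (size_t i = 0; i < 4; i++)
        out[i] = (0u >= x[i]) ? -1 : 0;
}

/* The same mask written as an equality test: for unsigned values,
 * 0 >= x is true only when x == 0. */
void eq_zero_mask(const uint32_t x[4], int32_t out[4])
{
    assert(x != NULL && out != NULL);
    for (size_t i = 0; i < 4; i++)
        out[i] = (x[i] == 0u) ? -1 : 0;
}
```

Both helpers produce identical masks for every input, which is consistent with the observation that the nested instructions in the larger reproducer are irrelevant to the failure.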
The rabbit hole is deep!

I noticed two oddities:
The bitcode triggers the error in llc-{13,14,15}, so it's not a change in LLVM.
I built Mesa 22.2.1 with LLVM14 (building old Mesa with LLVM15 does not work) and the bitcode produced also triggers the error in llc-{13, 14, 15}. I built Mesa 22.1.7 with LLVM14 as well and the bitcode also triggers the error!

So the difference has to be somewhere in how Mesa invokes LLVM. I added LLVMPassBuilderOptionsSetDebugLogging(opts, true); to print the passes. Output with Mesa 22.2.1 + LLVM 15:

ir_fs322_variant0.bc written
Invoke as "opt -sroa -early-cse -simplifycfg -reassociate -mem2reg -constprop -instcombine -gvn ir_fs322_variant0.bc | llc -O2 [-mcpu=<-mcpu option>] [-mattr=<-mattr option(s)>]"
Running pass: AlwaysInlinerPass on [module]
Running analysis: InnerAnalysisManagerProxy<llvm::FunctionAnalysisManager, llvm::Module> on [module]
Running analysis: ProfileSummaryAnalysis on [module]
Running pass: CoroConditionalWrapper on [module]
Running pass: AnnotationRemarksPass on fs_variant_partial (1387 instructions)
Running analysis: TargetLibraryAnalysis on fs_variant_partial
LLVM ERROR: Cannot select: 0x10505b0: v4i32 = ARMISD::VCMPZ 0x1287c98, Constant:i32<2>

Mesa with LLVM 14 did not output that at all, which was caused by this conditional:

#if LLVM_VERSION_MAJOR >= 15
#define GALLIVM_HAVE_CORO 0
#define GALLIVM_USE_NEW_PASS 1
#elif LLVM_VERSION_MAJOR >= 8
#define GALLIVM_HAVE_CORO 1
#define GALLIVM_USE_NEW_PASS 0
#else
#define GALLIVM_HAVE_CORO 0
#define GALLIVM_USE_NEW_PASS 0
#endif

So with LLVM >= 15 it uses the new pass manager and everything is different.

Some experiments with opt + llc proved to be very helpful:

opt -passes=always-inline,instcombine ir_fs322_variant0.bc | llc -mcpu=generic -> works
opt -passes=always-inline ir_fs322_variant0.bc | llc -mcpu=generic -> fails!

So the "instcombine" pass makes all the difference here to avoid the "Cannot select" error.

Question is, why is the instcombine pass not used? Mesa hardcodes it in the list of passes after all:

   if (!(gallivm_perf & GALLIVM_PERF_NO_OPT))
      strcpy(passes, "sroa,early-cse,simplifycfg,reassociate,mem2reg,constprop,instcombine,");
   else
      strcpy(passes, "mem2reg");

   LLVMRunPasses(gallivm->module, passes, LLVMGetExecutionEngineTargetMachine(gallivm->engine), opts);

opt can actually answer that quickly:

e06e5d2ccf7e:~/mesa/build # opt-15.0.2 -passes=sroa,early-cse,simplifycfg,reassociate,mem2reg,constprop,instcombine, ir_fs322_variant0.bc | llc-14.0.6 -mcpu=generic
opt-15.0.2: unknown function pass 'constprop'
(failure)

Next try:

e06e5d2ccf7e:~/mesa/build # opt-15.0.2 -passes=sroa,early-cse,simplifycfg,reassociate,mem2reg,instcombine, ir_fs322_variant0.bc | llc-14.0.6 -mcpu=generic
opt-15.0.2: unknown function pass ''
(failure)

Next try:

e06e5d2ccf7e:~/mesa/build # opt-15.0.2 -passes=sroa,early-cse,simplifycfg,reassociate,mem2reg,instcombine ir_fs322_variant0.bc | llc-14.0.6 -mcpu=generic
.text
.syntax unified
(success!)

So the missing "instcombine" pass causes the "Cannot select" error and the pass is missing because Mesa passes an invalid list of passes to LLVMRunPasses and ignores the error.

This means that if Mesa was built with LLVM >= 15, only the "default<O2>" passes were actually run, so the code was not really optimized...

With this patch, the "Cannot select" error is gone:

   if (!(gallivm_perf & GALLIVM_PERF_NO_OPT))
-     strcpy(passes, "sroa,early-cse,simplifycfg,reassociate,mem2reg,constprop,instcombine,");
+     strcpy(passes, "sroa,early-cse,simplifycfg,reassociate,mem2reg,instsimplify,instcombine");
   else
      strcpy(passes, "mem2reg");

I'll send that to mesa upstream.
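One of the two problems found above, the trailing comma that opt reports as an unknown function pass '', is purely syntactic and could in principle be caught without LLVM. The following is a minimal sketch under that assumption; first_empty_pass_entry is a hypothetical helper written for this note, not a Mesa or LLVM function. It cannot detect an unknown pass name such as 'constprop', since only LLVM's own pipeline parser knows which names are valid.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the zero-based index of the first empty element in a
 * comma-separated pass list, or -1 if every element is non-empty.
 * An empty string counts as one empty element at index 0. */
int first_empty_pass_entry(const char *passes)
{
    assert(passes != NULL);
    int index = 0;   /* index of the element currently being scanned */
    size_t len = 0;  /* number of characters seen in that element */
    for (const char *p = passes; ; p++) {
        if (*p == ',' || *p == '\0') {
            if (len == 0)
                return index;   /* e.g. a trailing or doubled comma */
            if (*p == '\0')
                return -1;      /* reached the end, all elements non-empty */
            index++;
            len = 0;
        } else {
            len++;
        }
    }
}
```

Applied to the original string from the Mesa snippet, the helper flags the empty element after "instcombine"; applied to the string from the patch, it reports no empty element.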
(In reply to Fabian Vogt from comment #9)
> So the missing "instcombine" pass causes the "Cannot select" error and the
> pass is missing
> because Mesa passes an invalid list of passes to LLVMRunPasses and ignores
> the error.

Would it be possible to improve error handling here? At least some tracing would be nice. From your analysis it looks like this might affect more platforms and not just armv7, and we wouldn't have noticed anything were it not for the backend bug.
(In reply to Aaron Puchert from comment #10)
> (In reply to Fabian Vogt from comment #9)
> > So the missing "instcombine" pass causes the "Cannot select" error and the
> > pass is missing
> > because Mesa passes an invalid list of passes to LLVMRunPasses and ignores
> > the error.
>
> Would it be possible to improve error handling here? At least some tracing
> would be nice. From your analysis it looks like this might affect more
> platforms and not just armv7, and we wouldn't have noticed anything were it
> not for the backend bug.

Correct. The code is unfortunately not ready for handling errors (even logging isn't really possible FWICT), so all I could do is ask for some ideas on the MR.
This is an autogenerated message for OBS integration: This bug (1204267) was mentioned in https://build.opensuse.org/request/show/1031948 Factory / llvm15
(In reply to OBSbugzilla Bot from comment #12)
> This is an autogenerated message for OBS integration:
> This bug (1204267) was mentioned in
> https://build.opensuse.org/request/show/1031948 Factory / llvm15

This made it into snapshot 20221029 released today, and it fixes the compilation failure for the reproducer from attachment 862276 [details]. Factory:ARM hasn't released a new snapshot yet. Feel free to close once you're able to confirm that this fixes the original issue.
openQA test passes.
This is an autogenerated message for OBS integration: This bug (1204267) was mentioned in https://build.opensuse.org/request/show/1088949 Backports:SLE-15-SP4 / llvm15
This is an autogenerated message for OBS integration: This bug (1204267) was mentioned in https://build.opensuse.org/request/show/1157115 Backports:SLE-15-SP5 / llvm17