Bug 1202275 - Kernel OOPS on filesystem operations
Status: RESOLVED WORKSFORME
Classification: openSUSE
Product: openSUSE Distribution
Component: Kernel
Version: Leap 15.4
Hardware: Other Other
Priority: P5 - None
Severity: Critical
Target Milestone: ---
Assigned To: Goldwyn Rodrigues
Reported: 2022-08-09 14:36 UTC by Fabian Vogt
Modified: 2022-10-27 08:16 UTC (History)

Attachments
Journal (2.63 MB, text/plain)
2022-08-10 11:51 UTC, Fabian Vogt
Description Fabian Vogt 2022-08-09 14:36:58 UTC
This OOPS shows up on the openQA worker "power8" after some uptime. After it occurs for the first time, other similar OOPSes show up frequently and other weird issues appear, like zypper and openQA workers getting stuck in syscalls such as "poll".


Aug 08 19:23:06 power8 openqa-continuous-update[37713]: Loading repository data...
Aug 08 19:23:06 power8 openqa-continuous-update[37713]: Reading installed packages...
Aug 08 19:23:07 power8 kernel: Kernel attempted to read user page (16) - exploit attempt? (uid: 0)
Aug 08 19:23:07 power8 kernel: BUG: Kernel NULL pointer dereference on read at 0x00000016
Aug 08 19:23:07 power8 kernel: Faulting instruction address: 0xc0000000005561dc
Aug 08 19:23:07 power8 kernel: Oops: Kernel access of bad area, sig: 11 [#1]
Aug 08 19:23:07 power8 kernel: LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
Aug 08 19:23:07 power8 kernel: Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc fscache netfs kvm_hv kvm af_packet rfkill crct10dif_vpmsum ipmi_powernv(X) ipmi_devintf ipmi_msghandler leds_powernv(X) r>
Aug 08 19:23:07 power8 kernel: Supported: No, Unsupported modules are loaded
Aug 08 19:23:07 power8 kernel: CPU: 72 PID: 37713 Comm: Zypp-main Tainted: G        W      X  N 5.14.21-150400.24.11-default #1 SLE15-SP4 5031505b0a65e234cdf253965338bef90a38442d
Aug 08 19:23:07 power8 kernel: NIP:  c0000000005561dc LR: c00000000053c9a8 CTR: 0000000000000001
Aug 08 19:23:07 power8 kernel: REGS: c0000010208c3730 TRAP: 0300   Tainted: G        W      X  N  (5.14.21-150400.24.11-default)
Aug 08 19:23:07 power8 kernel: MSR:  9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 84442222  XER: 00000000
Aug 08 19:23:07 power8 kernel: CFAR: c00000000000cb8c DAR: 0000000000000016 DSISR: 40000000 IRQMASK: 0 
                               GPR00: c00000000053c9a8 c0000010208c39d0 c000000002833a00 c000000804bde100 
                               GPR04: c0000010208c3be8 c0000010208c3ae4 0000000000000000 0000000000000000 
                               GPR08: c00000100a97002c 00000000018e0000 c008000000000000 ffffffffffff0000 
                               GPR12: 0000000000002200 c0000017fffdae80 00007fffe0bae620 00007fffe0bae600 
                               GPR16: 00007fffe0bae5d0 00007fffe0bae5f0 c00000100a970020 00007fffe0bae610 
                               GPR20: 0000000000000000 c0000010208c3be8 0000000000000043 c0000010208c3af8 
                               GPR24: ffffffffffffffff 0000000000000000 c0000010208c3be8 0000000000000000 
                               GPR28: c000000804bde100 c000000804bde100 0000000063800036 fffffffffffffffe 
Aug 08 19:23:07 power8 kernel: NIP [c0000000005561dc] __d_lookup+0x8c/0x290
Aug 08 19:23:07 power8 kernel: LR [c00000000053c9a8] lookup_fast+0x108/0x240
Aug 08 19:23:07 power8 kernel: Call Trace:
Aug 08 19:23:07 power8 kernel: [c0000010208c39d0] [c0000010208c3a00] 0xc0000010208c3a00 (unreliable)
Aug 08 19:23:07 power8 kernel: [c0000010208c3a40] [c00000000053c9a8] lookup_fast+0x108/0x240
Aug 08 19:23:07 power8 kernel: [c0000010208c3aa0] [c00000000054267c] path_openat+0x25c/0x1330
Aug 08 19:23:07 power8 kernel: [c0000010208c3ba0] [c0000000005456e4] do_filp_open+0xa4/0x130
Aug 08 19:23:07 power8 kernel: [c0000010208c3ce0] [c000000000524418] do_sys_openat2+0x2e8/0x440
Aug 08 19:23:07 power8 kernel: [c0000010208c3d50] [c000000000526198] do_sys_open+0x78/0xc0
Aug 08 19:23:07 power8 kernel: [c0000010208c3db0] [c00000000003269c] system_call_exception+0x15c/0x330
Aug 08 19:23:07 power8 kernel: [c0000010208c3e10] [c00000000000c74c] system_call_common+0xec/0x250
Aug 08 19:23:07 power8 kernel: --- interrupt: c00 at 0x7fffb5429e14
Aug 08 19:23:07 power8 kernel: NIP:  00007fffb5429e14 LR: 00007fffb53ed7ec CTR: 0000000000000000
Aug 08 19:23:07 power8 kernel: REGS: c0000010208c3e80 TRAP: 0c00   Tainted: G        W      X  N  (5.14.21-150400.24.11-default)
Aug 08 19:23:07 power8 kernel: MSR:  900000000280f033 <SF,HV,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 28444888  XER: 00000000
Aug 08 19:23:07 power8 kernel: IRQMASK: 0 
                               GPR00: 000000000000011e 00007fffe0bae1a0 00007fffb5517200 ffffffffffffff9c 
                               GPR04: 000001001e2b5a40 0000000000084800 0000000000000000 000000006332692f 
                               GPR08: 0000000000004000 0000000000000000 0000000000000000 0000000000000000 
                               GPR12: 0000000000000000 00007fffb3c32450 00007fffe0bae620 00007fffe0bae600 
                               GPR16: 00007fffe0bae5d0 00007fffe0bae5f0 00007fffb62c75a0 00007fffe0bae610 
                               GPR20: 00007fffb62d62e0 00007fffe0bae630 00007fffb62d62d8 0000000000000000 
                               GPR24: 0000000000000000 00007fffe0bae5e0 00007fffe0bae640 00007fffe0bae5d0 
                               GPR28: 000001001e41e85b 000001001e2b5de0 000001001e41e848 000001001e2b5e50 
Aug 08 19:23:07 power8 kernel: NIP [00007fffb5429e14] 0x7fffb5429e14
Aug 08 19:23:07 power8 kernel: LR [00007fffb53ed7ec] 0x7fffb53ed7ec
Aug 08 19:23:07 power8 kernel: --- interrupt: c00
Aug 08 19:23:07 power8 kernel: Instruction dump:
Aug 08 19:23:07 power8 kernel: fb610048 fb810050 fba10058 7c9a2378 7c7c1b78 3b600000 3b200000 3b00ffff 
Aug 08 19:23:07 power8 kernel: 48000010 ebff0000 2fbf0000 419e0060 <813f0018> 7f89f000 409effec 3bbf0050 
Aug 08 19:23:07 power8 kernel: ---[ end trace f3fc0069d5e8e587 ]---
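The register dump above is enough to see where the bad address 0x16 comes from. The following is a minimal sketch, assuming the standard PowerPC D-form instruction layout; the helper names `decode_dform` and `effective_address` are made up for this illustration, and the input values are copied from the oops.

```python
# Sketch: decode the faulting PowerPC instruction from the "Instruction dump"
# and recompute the address it tried to read. Values are copied from the oops.

MASK64 = (1 << 64) - 1


def decode_dform(word: int):
    """Split a PowerPC D-form instruction into (opcode, rt, ra, displacement)."""
    opcode = (word >> 26) & 0x3F
    rt = (word >> 21) & 0x1F
    ra = (word >> 16) & 0x1F
    disp = word & 0xFFFF
    if disp & 0x8000:  # the 16-bit displacement is sign-extended
        disp -= 0x10000
    return opcode, rt, ra, disp


def effective_address(base: int, disp: int) -> int:
    """Effective address of a D-form load: (RA) + displacement, modulo 2**64."""
    return (base + disp) & MASK64


faulting_word = 0x813F0018  # the word marked <...> in the instruction dump
gpr31 = 0xFFFFFFFFFFFFFFFE  # last value on the GPR28 line of the first dump

opcode, rt, ra, disp = decode_dform(faulting_word)
assert opcode == 32  # primary opcode 32 is lwz (load word and zero)
print(f"lwz r{rt}, {disp}(r{ra})")
print(hex(effective_address(gpr31, disp)))
```

Under these assumptions the faulting load is `lwz r9, 24(r31)` with r31 = 0xfffffffffffffffe, which wraps around to the reported DAR of 0x16. In other words, the pointer being dereferenced in `__d_lookup` was already -2 rather than NULL or a valid kernel address. That would be consistent with a corrupted entry in the dentry hash chain, but it does not by itself say what corrupted it.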
Comment 1 Takashi Iwai 2022-08-10 06:18:47 UTC
Not sure whether this is an arch-specific issue. The stack trace doesn't suggest so; it looks more like a generic fs issue.

Let's pass this to the filesystem people for a first look.
Comment 2 Fabian Vogt 2022-08-10 11:51:33 UTC
Created attachment 860730 [details]
Journal

Apparently the full journal I attached was silently dropped, probably because of its size. I have now attached the first part, up to the third OOPS message.
Comment 3 Marius Kittler 2022-08-10 12:14:48 UTC
After upgrading to Leap 15.4 we've also seen other problems on power machines, see https://progress.opensuse.org/issues/114565, https://bugzilla.opensuse.org/show_bug.cgi?id=1202138 and https://bugzilla.suse.com/show_bug.cgi?id=1201796. Of course these are all different symptoms and likely also have different causes. However, it still strikes me that we see so many problems on power machines (with Leap 15.4).
Comment 4 Goldwyn Rodrigues 2022-08-10 20:20:46 UTC
What are ipmi_powernv and leds_powernv? How were they loaded and why are they not supported? Can this be reproduced without these modules?

I am trying to eliminate memory corruption issues because of unsupported modules before we focus on VFS issues.
Comment 5 Fabian Vogt 2022-08-11 10:55:55 UTC
(In reply to Goldwyn Rodrigues from comment #4)
> What is ipmi_powernv and leds_powernv? How were they loaded and why are they
> not supported? Can this be reproduced without these modules?
> 
> I am trying to eliminate memory corruption issues because of unsupported
> modules before we focus on VFS issues.

They're part of kernel-default, but marked as "supported:      external".
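For reference, the taint string in the oops ("Tainted: G W X N") and the "(X)" markers in the module list tell the same story. Below is a small sketch that spells the flags out. The meanings of `G` and `W` are the upstream ones; the meanings given for `X` and `N` are an assumption based on how SUSE kernels use these flags (matching the "Unsupported modules are loaded" line in the log), and the function name `describe_taint` is made up for this illustration.

```python
# Sketch: expand the taint flags seen in the oops header.
# The 'X' and 'N' descriptions are SUSE-specific assumptions, see the text above.

TAINT_FLAGS = {
    "G": "no proprietary module loaded ('P' would appear otherwise)",
    "W": "a kernel warning (WARN) was issued earlier",
    "X": "SUSE: a module with external support is loaded",
    "N": "SUSE: an unsupported module is loaded",
}


def describe_taint(taint: str):
    """Return (flag, meaning) pairs for every non-blank character in the taint string."""
    return [
        (flag, TAINT_FLAGS.get(flag, "not covered by this sketch"))
        for flag in taint
        if not flag.isspace()
    ]


for flag, meaning in describe_taint("G        W      X  N"):
    print(f"{flag}: {meaning}")
```

This only restates what the log already says; it does not show whether the externally supported modules have anything to do with the crash.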
Comment 6 Fabian Vogt 2022-08-17 13:18:03 UTC
After the machine got stuck and had to be rebooted a couple of times again, I downgraded to the latest kernel from 15.3. Let's see how that goes.
Comment 7 Fabian Vogt 2022-08-22 15:48:36 UTC
(In reply to Fabian Vogt from comment #6)
> After the machine got stuck and had to be rebooted a couple of times again,
> I downgraded to the latest kernel from 15.3. Let's see how that goes.

The machine has an uptime of >5 days now without any issues. So it appears that this is an issue with the 15.4 kernel.
Comment 8 Goldwyn Rodrigues 2022-08-22 19:32:46 UTC
Could you please take a kdump? It would help analyze the issue better.
Comment 9 Fabian Vogt 2022-08-23 06:50:52 UTC
(In reply to Goldwyn Rodrigues from comment #8)
> Could you please take a kdump? It would help analyze the issue better.

I configured kdump and the service was running properly after a reboot. I triggered a crash to test it, but that didn't even print the "Starting crashkernel" message and even worse, even IPMI is dead now. Fun.
Comment 10 Fabian Vogt 2022-08-23 09:02:00 UTC
(In reply to Fabian Vogt from comment #9)
> I configured kdump and the service was running properly after a reboot. I
> triggered a crash to test it, but that didn't even print the "Starting
> crashkernel" message and even worse, even IPMI is dead now. Fun.

IPMI came back after a few minutes, so I could "mc reset cold" and try again. Same result though, so kdump just doesn't want to work. Anything else I can do?
Comment 11 David Disseldorp 2022-08-23 12:56:54 UTC
(In reply to Fabian Vogt from comment #10)
> IPMI came back after a few minutes, so I could "mc reset cold" and try
> again. Same result though, so kdump just doesn't want to work. Anything else
> I can do?

Does kdump work if you trigger it manually? e.g.

    echo 1 > /proc/sys/kernel/sysrq
    echo c > /proc/sysrq-trigger

If kdump isn't an option then we could try to cobble something together with kprobes / ftrace or a custom vfs-debug kernel, but it'd be nice to get confirmation first.

(In reply to Fabian Vogt from comment #2)
> Created attachment 860730 [details]
> Journal

  Aug 08 15:18:10 localhost kernel: opal: OPAL_CONSOLE_FLUSH missing.
  Aug 08 15:18:10 localhost kernel: WARNING: CPU: 11 PID: 1475 at ../arch/powerpc/platforms/powernv/opal.c:528 __opal_flush_console+0xfc/0x110

 523                 /*                                                                
 524                  * If OPAL_CONSOLE_FLUSH is not implemented in the firmware,      
 525                  * the console can still be flushed by calling the polling        
 526                  * function while it has OPAL_EVENT_CONSOLE_OUTPUT events.        
 527                  */                                                               
 528                 WARN_ONCE(1, "opal: OPAL_CONSOLE_FLUSH missing.\n"); 

Old firmware? Please try to upgrade to the latest IBM firmware if possible.
Comment 12 Fabian Vogt 2022-08-23 13:20:31 UTC
(In reply to David Disseldorp from comment #11)
> Does kdump work if you trigger it manually? e.g.
> 
>     echo 1 > /proc/sys/kernel/sysrq
>     echo c > /proc/sysrq-trigger

That's what I tried:

power8:~ # echo c > /proc/sysrq-trigger
[  233.397216][ T6857] sysrq: Trigger a crash
[  233.397251][ T6857] Kernel panic - not syncing: sysrq triggered crash
[  233.397270][ T6857] CPU: 104 PID: 6857 Comm: bash Tainted: G        W      X  N 5.14.21-150400.24.18-default #1 SLE15-SP4 a5d3db5b7f5fbb29c4ee73b4cdefcad058b71f7f
[  233.397294][ T6857] Call Trace:
[  233.397310][ T6857] [c00000181de83ab0] [c00000000086755c] dump_stack_lvl+0x70/0xa4 (unreliable)
[  233.397337][ T6857] [c00000181de83af0] [c000000000159724] panic+0x164/0x400
[  233.397357][ T6857] [c00000181de83b80] [c00000000095f230] sysrq_handle_crash+0x30/0x40
[  233.397378][ T6857] [c00000181de83be0] [c00000000095fb50] __handle_sysrq+0xf0/0x2a0
[  233.397397][ T6857] [c00000181de83c90] [c000000000960428] write_sysrq_trigger+0xd8/0x190
[  233.397416][ T6857] [c00000181de83cd0] [c000000000624af8] proc_reg_write+0x108/0x1b0
[  233.397436][ T6857] [c00000181de83d00] [c000000000529e10] vfs_write+0xf0/0x340
[  233.397456][ T6857] [c00000181de83d60] [c00000000052a28c] ksys_write+0xdc/0x130
[  233.397474][ T6857] [c00000181de83db0] [c00000000003269c] system_call_exception+0x15c/0x330
[  233.397494][ T6857] [c00000181de83e10] [c00000000000c74c] system_call_common+0xec/0x250
[  233.397514][ T6857] --- interrupt: c00 at 0x7fffb7a03114
[  233.397532][ T6857] NIP:  00007fffb7a03114 LR: 00007fffb7978fe4 CTR: 0000000000000000
[  233.397550][ T6857] REGS: c00000181de83e80 TRAP: 0c00   Tainted: G        W      X  N  (5.14.21-150400.24.18-default)
[  233.397568][ T6857] MSR:  900000000280f033 <SF,HV,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 22242222  XER: 00000000
[  233.397595][ T6857] IRQMASK: 0 
[  233.397595][ T6857] GPR00: 0000000000000004 00007fffe857f0a0 00007fffb7af7200 0000000000000001 
[  233.397595][ T6857] GPR04: 0000010028618830 0000000000000002 0000000000000010 0000000000000000 
[  233.397595][ T6857] GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
[  233.397595][ T6857] GPR12: 0000000000000000 00007fffb7c5b600 0000000000000000 000000011de337c0 
[  233.397595][ T6857] GPR16: 000000011dd97ed0 0000000000000000 000000011ddddd10 0000000000000000 
[  233.397595][ T6857] GPR20: 000000011ddedb28 0000010028773460 000000011de37828 000000011de36c70 
[  233.397595][ T6857] GPR24: 0000010028518030 0000000000000000 0000000000000002 0000010028618830 
[  233.397595][ T6857] GPR28: 0000000000000002 00007fffb7af16d8 0000010028618830 0000000000000002 
[  233.397670][ T6857] NIP [00007fffb7a03114] 0x7fffb7a03114
[  233.397686][ T6857] LR [00007fffb7978fe4] 0x7fffb7978fe4
[  233.397702][ T6857] --- interrupt: c00
(stuck here)
~. [terminated ipmitool]
(IPMI broke)
^C
SIGN INT: Close Interface IPMI v2.0 RMCP+ LAN Interface
^C^C^C^C
Close Session command failed

As the crash is apparently so severe that it upsets the BMC, I'm not hopeful here.

> If kdump isn't an option then we could try to cobble something together with
> kprobes / ftrace or a custom vfs-debug kernel, but it'd be nice to get
> confirmation first.
> 
> (In reply to Fabian Vogt from comment #2)
> > Created attachment 860730 [details]
> > Journal
> 
>   Aug 08 15:18:10 localhost kernel: opal: OPAL_CONSOLE_FLUSH missing.
>   Aug 08 15:18:10 localhost kernel: WARNING: CPU: 11 PID: 1475 at ../arch/powerpc/platforms/powernv/opal.c:528 __opal_flush_console+0xfc/0x110
> 
>  523                 /*
>  524                  * If OPAL_CONSOLE_FLUSH is not implemented in the firmware,
>  525                  * the console can still be flushed by calling the polling
>  526                  * function while it has OPAL_EVENT_CONSOLE_OUTPUT events.
>  527                  */
>  528                 WARN_ONCE(1, "opal: OPAL_CONSOLE_FLUSH missing.\n");
> 
> Old firmware? Please try to upgrade to the latest IBM firmware if possible.

I don't really know anything about this system, so I'll leave that to someone
else. @Marius: Could you attempt a FW update (if a newer version is available)
or know someone who could?
Comment 13 David Disseldorp 2022-08-23 18:31:18 UTC
(In reply to Fabian Vogt from comment #12)
> As the crash is apparently so severe that it upsets the BMC, I'm not hopeful
> here.

Hmm, it might be worth increasing the memory reservation for the dump-capture kernel, although I wouldn't expect those limits to affect the BMC. I'd be happy to take a look at 15.4 kdump if someone can provide access to some ppc64le (preferably power8) hardware.
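Before triggering another test crash, it may help to confirm that a dump-capture kernel is actually loaded and how much memory is reserved for it. The following is a rough sketch that reads the standard kernel interfaces `/sys/kernel/kexec_crash_loaded`, `/sys/kernel/kexec_crash_size` and `/proc/cmdline`; the function name `kdump_status` and its `root` parameter (only there to make the sketch testable against a fake tree) are made up for this illustration.

```python
# Sketch: report whether a crash kernel is loaded and how much memory is reserved.
import re
from pathlib import Path


def kdump_status(root: str = "/") -> dict:
    """Collect basic kdump state from sysfs and the kernel command line."""
    base = Path(root)

    def read(rel: str):
        # Missing or unreadable files are reported as None instead of failing.
        try:
            return (base / rel).read_text().strip()
        except OSError:
            return None

    loaded = read("sys/kernel/kexec_crash_loaded")
    size = read("sys/kernel/kexec_crash_size")
    cmdline = read("proc/cmdline") or ""
    match = re.search(r"(?:^|\s)crashkernel=(\S+)", cmdline)

    return {
        "crash_kernel_loaded": loaded == "1",
        "reserved_bytes": int(size) if size and size.isdigit() else None,
        "crashkernel_param": match.group(1) if match else None,
    }


if __name__ == "__main__":
    for key, value in kdump_status().items():
        print(f"{key}: {value}")
```

If `crash_kernel_loaded` is false or `reserved_bytes` is zero, the sysrq-triggered panic has nothing to jump into. That alone would not explain the BMC becoming unresponsive, though.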
Comment 16 Marius Kittler 2022-08-25 10:04:48 UTC
I was on vacation. I can attempt a firmware update today.

If you're SUSE employees, you can ping me to get access to the hardware. I'm not sure whether I can give you access to the o3 worker (as it requires access to the o3 network), but I could give you access to qa-power8-4-kvm.qa.suse.de, which is reachable from the engineering network. On that machine I've also experienced crashes but haven't got any dumps despite kdump being enabled (see https://bugzilla.suse.com/show_bug.cgi?id=1202138#c12 for details).
Comment 17 Marius Kittler 2022-08-25 11:19:37 UTC
I've asked on our internal testing channel whether any PowerPC experts can help with upgrading the firmware.

Note that the BMC is reachable via https://openqaworker-power8-ipmi.suse.de (see https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/432/diffs for credentials). I figured I need the firmware for 8247-22L from https://www.ibm.com/support/fixcentral/main/selectFixes?parent=ibm~power&product=ibm~power~824722L&release=All&platform=All, but I cannot download anything from there without an IBM account. It is also not 100% clear to me which download is suitable. We currently have SV840_043 installed, and I suppose upgrading to the latest version, SV860_243, would make the most sense. I'm also not quite sure how I'd actually carry out the firmware upgrade after the download. It doesn't seem to be possible directly via the BMC interface.
Comment 19 Marius Kittler 2022-08-25 14:03:04 UTC
Thanks. I'll try flashing. It seems the tool comes from powerpc-utils which is already installed.
Comment 20 Marius Kittler 2022-08-25 14:30:42 UTC
The BMC now shows the new version, but the machine hasn't come up again and IPMI access seems broken. Setting the IPMI password within the BMC doesn't help, and rebooting the machine within the BMC also hasn't helped so far.

That was the output of the firmware update:

```
power8:/tmp/fwupdate # update_flash -f 01SV860_243_165.img 
info: Temporary side will be updated with a newer or identical image.

Projected Flash Update Results:
Current T Image: SV840_043
Current P Image: SV840_043
New T Image:     SV860_243
New P Image:     SV840_043

FLASH: Image ready...rebooting the system...
FLASH: This will take several minutes.
FLASH: Do not power off!
Connection to power8 closed by remote host.
Connection to power8 closed.
```
Comment 21 Marius Kittler 2022-08-25 14:38:35 UTC
At least the host is back again. I suppose it really just took a while, or did you do something? Maybe I can restore IPMI access now using ipmitool locally.
Comment 24 Marius Kittler 2022-09-15 13:32:59 UTC
I haven't seen this issue on any of our other PowerPC hosts. So unfortunately we can likely not investigate the issue any further right now.
Comment 25 Fabian Vogt 2022-10-27 08:16:22 UTC
(In reply to Marius Kittler from comment #24)
> I haven't seen this issue on any of our other PowerPC hosts. So
> unfortunately we can likely not investigate the issue any further right now.

So let's just be optimistic and close this until it occurs somewhere again.