Bug 1169099 - [Build 20200405][aarch64] crash: cannot determine VA_BITS_ACTUAL: please use /proc/kcore
Status: RESOLVED FIXED
Classification: openSUSE
Product: openSUSE Tumbleweed
Component: Kernel
Version: Current
Hardware: aarch64 Other
Priority: P2 - High, Severity: Normal
Assigned To: openSUSE Kernel Bugs
QA Contact: E-mail List
URL: https://openqa.opensuse.org/tests/122...
Depends on:
Blocks: 1179863
Reported: 2020-04-09 12:25 UTC by Petr Cervinka
Modified: 2021-06-23 16:48 UTC (History)

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: Yes
Marketing QA Status: ---
IT Deployment: ---


Description Petr Cervinka 2020-04-09 12:25:59 UTC
## Observation

openQA test in scenario opensuse-Tumbleweed-JeOS-for-AArch64-aarch64-jeos-extra@aarch64-HD24G fails in
[kdump_and_crash](https://openqa.opensuse.org/tests/1226674/modules/kdump_and_crash/steps/66)

## Test suite description
Same as jeos, plus some more tests.


## Reproducible

Fails since (at least) Build [20190724](https://openqa.opensuse.org/tests/992130)


## Expected result

Last good: [20190712](https://openqa.opensuse.org/tests/982186) (or more recent)


## Further details

Always latest result in this scenario: [latest](https://openqa.opensuse.org/tests/latest?arch=aarch64&distri=opensuse&flavor=JeOS-for-AArch64&machine=aarch64-HD24G&test=jeos-extra&version=Tumbleweed)

Issue can be manually reproduced on aarch64:

echo exit | crash /boot/vmlinux-5.6.0-1-default.xz

crash 7.2.8
Copyright (C) 2002-2020  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.
 
NOTE: stdin: not a tty                                          


crash: cannot determine VA_BITS_ACTUAL: please use /proc/kcore
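For anyone scripting this reproduction, the failure can be detected from the captured output. The following is only a minimal sketch: it assumes the output of crash has already been captured as text, and the function name `has_vabits_error` is hypothetical. The error string is the one shown above.

```python
# Prefix of the error message printed by crash in this report.
VABITS_ERROR = "crash: cannot determine VA_BITS_ACTUAL"


def has_vabits_error(output: str) -> bool:
    """Return True if any line of the captured output starts with the error prefix."""
    return any(
        line.strip().startswith(VABITS_ERROR) for line in output.splitlines()
    )


# Sample text copied from the tail of the output in this report.
sample = (
    "NOTE: stdin: not a tty\n"
    "\n"
    "crash: cannot determine VA_BITS_ACTUAL: please use /proc/kcore\n"
)
print(has_vabits_error(sample))
```

Matching on the message prefix rather than the whole line keeps the check independent of the trailing hint about /proc/kcore.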

Could it be similar to https://bugs.launchpad.net/ubuntu/+source/crash/+bug/1858958 ?
Comment 1 Petr Cervinka 2020-04-09 12:49:50 UTC
Or is it just an expected change, and we only need to update the test to use /proc/kcore on aarch64? Could you please confirm?
Comment 2 Petr Cervinka 2020-04-09 13:15:09 UTC
I tried to use /proc/kcore, but got another error:

crash 7.2.8
Copyright (C) 2002-2020  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.
 
GNU gdb (GDB) 7.6                                               
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "aarch64-unknown-linux-gnu"...

WARNING: kernel relocated [43414201MB]: patching 131102 gdb minimal_symbol values

WARNING: ikconfig overflow.                                            
crash: seek error: kernel virtual address: ffff449c3ffff000  type: "pud page"
Comment 3 Michal Suchanek 2020-04-09 13:26:15 UTC
There are some updates for arm64 in upstream crash.

We will need to cherry-pick those for crash to work with recent kernels.
Comment 4 Petr Cervinka 2020-04-09 14:12:28 UTC
There is also an upstream issue: https://github.com/crash-utility/crash/issues/52
Comment 7 Miroslav Beneš 2020-09-02 11:48:16 UTC
(In reply to Michal Suchanek from comment #3)
> There are some updates for arm64 in upstream crash.
> 
> We will need to cherry-pick those for crash to work with recent kernels.

Have we already?
Comment 9 Guillaume GARDET 2020-11-16 13:08:44 UTC
This still happens:
https://openqa.opensuse.org/tests/1475498#step/kdump_and_crash/59
Comment 11 Michal Suchanek 2020-12-10 13:00:05 UTC
*** Bug 1179863 has been marked as a duplicate of this bug. ***
Comment 13 Marita Werner 2021-01-18 09:24:24 UTC
Do we have any update here?
Comment 14 Michal Suchanek 2021-01-18 11:26:09 UTC
Cherry-picked the arm64 patches from upstream here: https://build.opensuse.org/request/show/864052
Comment 20 Michal Suchanek 2021-02-10 21:47:18 UTC
Update to upstream arm64 support should be in Factory now.
Comment 21 Oliver Kurz 2021-02-27 07:09:08 UTC
This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: toolchain_zypper
https://openqa.suse.de/tests/5553017

To prevent further reminder comments one of the following options should be followed:
1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
2. The openQA job group is moved to "Released"
3. The label in the openQA scenario is removed
Comment 23 openQA Review 2021-03-15 07:22:49 UTC
This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: toolchain_zypper
https://openqa.suse.de/tests/5661050

Comment 24 Michal Suchanek 2021-03-16 09:34:57 UTC
Is this fixed in Factory?
Comment 26 Michal Suchanek 2021-03-16 10:30:52 UTC
Then this is an upstream problem and we don't have a fix.
Comment 27 Oliver Kurz 2021-03-31 06:06:46 UTC
This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: jeos-extratest
https://openqa.suse.de/tests/5672127

Comment 31 openQA Review 2021-04-14 07:18:42 UTC
This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: toolchain_zypper
https://openqa.suse.de/tests/5814266

Comment 32 Jeffrey Cheung 2021-04-15 04:44:21 UTC
(In reply to Michal Suchanek from comment #26)
> Then this is upstream problem and we don't have a fix.

Well, it seems the fix is here -> 

https://github.com/crash-utility/crash/commit/d379b47f04dc77ea1989609aca9bfd8d37b7b639  Introduce a new ARM64 "--machdep vabits_actual=<value>" command
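As a rough illustration of how a test could pass the value by hand with this option, here is a minimal sketch that only builds the command line. The helper name `crash_argv`, the dumpfile path, and the value 48 are assumptions for illustration; the right value depends on the kernel that produced the dump.

```python
from typing import List


def crash_argv(vmlinux: str, dumpfile: str, vabits_actual: int) -> List[str]:
    """Build a crash command line that passes vabits_actual explicitly."""
    if not 0 < vabits_actual <= 64:
        raise ValueError("vabits_actual must be between 1 and 64")
    return [
        "crash",
        "--machdep",
        f"vabits_actual={vabits_actual}",
        vmlinux,
        dumpfile,
    ]


# Hypothetical dumpfile path and value, for illustration only.
argv = crash_argv(
    "/boot/vmlinux-5.6.0-1-default.xz", "/var/crash/example/vmcore", 48
)
print(" ".join(argv))
```

The list form can be handed to `subprocess.run` without shell quoting concerns.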
Comment 33 Michal Suchanek 2021-04-15 09:48:04 UTC
(In reply to Jeffrey Cheung from comment #32)
> (In reply to Michal Suchanek from comment #26)
> > Then this is upstream problem and we don't have a fix.
> 
> Well, it seems the fix is here -> 
> 
> https://github.com/crash-utility/crash/commit/d379b47f04dc77ea1989609aca9bfd8d37b7b639
> Introduce a new ARM64 "--machdep vabits_actual=<value>" command

This is not automated, though.

The automation is supposed to come from the following crash commits:

d379b47f04dc77ea1989609aca9bfd8d37b7b639 Introduce a new ARM64 "--machdep vabits_actual=<value>" command line option for Linux 5.4 and later dumpfiles, which require the kernel's dynamically-determined "vabits_actual" value for virtual address translation.  Without the patch, the crash session fails during initialization with the error message "crash: cannot determine VA_BITS_ACTUAL".  This option will become unnecessary when the proposed TCR_EL1.T1SZ vmcoreinfo entry is incorporated into the kernel. (anderson@redhat.com)

5e975dd8c817ea6aea35e1e15b83c378aee9c136 When determining the ARM64 kernel's "vabits_actual" value by reading the new TCR_EL1.T1SZ vmcoreinfo entry, display its value during session initialization only when invoking crash with "-d1" or larger -d debug value. (anderson@redhat.com)

bfd9a651f9426d86250295ac875d7e33d8de2a97 Determine the ARM64 kernel's "vabits_actual" value by reading the new TCR_EL1.T1SZ vmcoreinfo entry. (bhsharma@redhat.com)

1c45cea02df7f947b4296c1dcaefa1024235ef10 arm64: Change tcr_el1_t1sz variable name to TCR_EL1_T1SZ

Kernel patch

bbdbc11804ff ("arm64/crash_core: Export TCR_EL1.T1SZ in vmcoreinfo")
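The commits above describe deriving "vabits_actual" from the exported TCR_EL1.T1SZ value. On ARMv8, T1SZ gives the size offset of the TTBR1 (kernel) address region, so the number of virtual address bits is 64 minus T1SZ. A minimal sketch of that calculation follows; it assumes the vmcoreinfo text contains a line of the form `NUMBER(TCR_EL1_T1SZ)=<value>`, and both the function name and the sample text are hypothetical.

```python
import re
from typing import Optional

# Assumed line format of the vmcoreinfo entry; the value may be hex or decimal.
T1SZ_RE = re.compile(r"^NUMBER\(TCR_EL1_T1SZ\)=(\S+)$", re.MULTILINE)


def vabits_actual_from_vmcoreinfo(vmcoreinfo: str) -> Optional[int]:
    """Return 64 - T1SZ if the entry is present, else None."""
    match = T1SZ_RE.search(vmcoreinfo)
    if match is None:
        return None
    t1sz = int(match.group(1), 0)  # base 0 accepts "0x10" as well as "16"
    return 64 - t1sz


# Hypothetical vmcoreinfo excerpt, for illustration only.
sample_vmcoreinfo = "OSRELEASE=5.11.0-example\nNUMBER(TCR_EL1_T1SZ)=0x10\n"
print(vabits_actual_from_vmcoreinfo(sample_vmcoreinfo))
```

A `None` result corresponds to the case where the kernel does not export the entry, which is when the manual "--machdep vabits_actual=<value>" option would still be needed.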

We have this in Factory, but openQA still reports the error.
Comment 34 Petr Cervinka 2021-04-15 16:52:20 UTC
It still fails in 15-SP3: https://openqa.suse.de/tests/5814118#step/kdump_and_crash/71 , but the crash version there is 7.2.8 and the fix is in version 7.2.9. I suppose we need to wait for the next snapshot.
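A test could use the version line from the crash banner to decide whether the fix is expected to be present. This is a minimal sketch under stated assumptions: the banner starts a line with `crash <version>` as in the output quoted in this bug, the threshold 7.2.9 comes from the comment above, and the function names are hypothetical. A distribution package may carry backported patches, so the version number alone is only a heuristic.

```python
import re
from typing import Optional, Tuple

# Version that contains the fix, according to the comment above.
FIXED_IN = (7, 2, 9)


def crash_version(banner: str) -> Optional[Tuple[int, ...]]:
    """Extract the version tuple from a banner line such as 'crash 7.2.8'."""
    match = re.search(r"^crash (\d+(?:\.\d+)*)", banner, re.MULTILINE)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def has_vabits_fix(banner: str) -> bool:
    """Return True if the banner reports a version at or above FIXED_IN."""
    version = crash_version(banner)
    return version is not None and version >= FIXED_IN


print(has_vabits_fix("crash 7.2.8\nCopyright (C) 2002-2020  Red Hat, Inc.\n"))
```

Comparing integer tuples avoids the string-comparison pitfall where "7.2.10" would sort before "7.2.9".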
Comment 35 Stefan Weiberg 2021-04-20 11:26:34 UTC
Fixed with Snapshot15 Candidate 2 build 176.5.
Comment 36 Martin Loviska 2021-04-27 11:38:30 UTC
The test case is passing on the sle-15-SP3-JeOS-for-kvm-and-xen arm image.
https://openqa.suse.de/tests/5900281#step/kdump_and_crash/71