Bug 1169774 - Slow down in OBS since kernel 5.6.0 on 32bit
Status: RESOLVED FIXED
Classification: openSUSE
Product: openSUSE Tumbleweed
Component: Kernel
Version: Current
Hardware: Other
OS: Other
Priority: P3 - Medium
Severity: Normal
Target Milestone: ---
Assigned To: Jan Kara
Depends on:
Blocks: 1178762
Reported: 2020-04-17 13:06 UTC by Adrian Schröter
Modified: 2020-11-13 09:16 UTC
CC List: 9 users

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
dmesg of goat01 build worker (host) (472.25 KB, application/gzip)
2020-04-21 08:38 UTC, Adrian Schröter
Details
dmidecode goat01 (4.47 KB, application/gzip)
2020-04-21 08:53 UTC, Adrian Schröter
Details
dmesg from guest with 5.5 (37.83 KB, text/plain)
2020-04-22 09:44 UTC, Jiri Slaby
Details
dmesg from guest with 5.6 (40.71 KB, text/plain)
2020-04-22 09:44 UTC, Jiri Slaby
Details
remote build log on sheep83 (slow) (738.09 KB, text/plain)
2020-04-29 11:37 UTC, Jiri Slaby
Details
remote build log on lamb63 (fast) (738.41 KB, text/plain)
2020-04-29 11:37 UTC, Jiri Slaby
Details
git bisect log (2.78 KB, text/plain)
2020-04-30 11:11 UTC, Jiri Slaby
Details
[PATCH] jbd2: Avoid leaking transaction credits when unreserving (2.64 KB, patch)
2020-05-18 08:12 UTC, Jan Kara
Details | Diff
build log + sysrq output from goat09 + btrfs (1.17 MB, text/plain)
2020-11-12 15:20 UTC, Fabian Vogt
Details

Description Adrian Schröter 2020-04-17 13:06:03 UTC
We noticed a fivefold increase in build time on some OBS workers.

Example:
 # eosc jobhistory openSUSE:Factory installation-images -M openSUSE standard i586


The goat & sheep systems suddenly need 2.5h instead of 0.5h of build time for the same package. This is 32bit/i586 only. These systems used to be fine before; it seems the guest kernel update to 5.6.0 triggered it (the timing matches).

The slowed-down goat and sheep systems are AMD EPYC 3.01/3.02.

The unaffected lamb systems are AMD Opteron systems.

The host systems stayed on openSUSE Leap 15.1.
Comment 1 Borislav Petkov 2020-04-21 07:52:11 UTC
Got any host and guest logs somewhere?
Comment 2 Adrian Schröter 2020-04-21 08:14:19 UTC
The guest logs are the build logs.

dmesg on the host just lists all the usual EXT4-fs mount messages.
Comment 3 Adrian Schröter 2020-04-21 08:14:56 UTC
Should I monitor for anything in particular?
Comment 4 Borislav Petkov 2020-04-21 08:31:35 UTC
Well, you'd have to give me something in order to try to debug this.

As in, host kernel is <version>, guest kernel is 5.6 and the workload is <workload>. It worked fine with guest+host kernel <version> and it is broken with 5.6 32-bit guest kernel and host kernel <version>. And anything else that you think is relevant.

And of course I need the host dmesg to know what your host is running (BIOS, microcode, etc.), and the guest dmesg so that I know how you start your guest. Also, how exactly do you start your guest? qemu cmdline, etc.
Comment 5 Adrian Schröter 2020-04-21 08:38:34 UTC
Created attachment 836248 [details]
dmesg of goat01 build worker (host)
Comment 6 Adrian Schröter 2020-04-21 08:40:22 UTC
The host was always openSUSE Leap 15.1 based and did not change.

The guest became bad with 5.6; the workload is the usual build process.

You can run it locally in the usual way via:

 # osc build --vm-type=kvm ....

You will also see the qemu command line there (and in the build logs).
Comment 7 Adrian Schröter 2020-04-21 08:53:39 UTC
Created attachment 836252 [details]
dmidecode goat01
Comment 8 Adrian Schröter 2020-04-21 08:55:43 UTC
Microcode: patch_level=0x08301025
Comment 9 Borislav Petkov 2020-04-21 09:05:15 UTC
Thanks.

Please also give a full cmdline example here which causes it:

# osc build --vm-type=kvm ....

You'd need to be more verbose because my build service knowledge is almost non-existent. First question: what puts the 5.6 kernel on the guest? Is that part of the command above, or has it been started before?

Thx.
Comment 10 Adrian Schröter 2020-04-21 09:22:09 UTC
The guest uses the kernel (and initrd) provided by the kernel-obs-build package, which is maintained as part of the kernel sources (the same vmlinux as kernel-default, just repackaged with its own initrd).

To run a local build of the mentioned example package:

# osc co openSUSE:Factory installation-images
# cd openSUSE:Factory/installation-images
# osc build --vm-type=kvm -M openSUSE standard i586
Comment 11 Adrian Schröter 2020-04-21 09:23:39 UTC
The build script does the preinstall step, detects the extracted kernel inside the prepared root, and runs qemu-kvm for the rest of the build.
Comment 12 Jiri Slaby 2020-04-22 08:35:44 UTC
(In reply to Adrian Schröter from comment #10)
> # osc build --vm-type=kvm -M openSUSE standard i586

and add --userootforbuild --vm-disk-size=30000. On an EPYC machine (remus), I had to add -j 32, as -j 33 (or more) results in:
> smpboot: Total of 33 processors activated (148221.48 BogoMIPS)
> BUG: kernel NULL pointer dereference, address: 00000d24
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> *pdpt = 0000000000000000 *pde = f000ff53f000ff53
> Oops: 0000 [#1] SMP NOPTI
> CPU: 0 PID: 1 Comm: swapper/0 Tainted: G S      W         5.6.4-1-pae #1 openSUSE Tumbleweed (unreleased)
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba527-rebuilt.suse.com 04/01/2014
> EIP: __alloc_pages_nodemask+0xd6/0x2b0
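Putting comments 10 and 12 together, the full local reproduction roughly looks like this (all flags are taken from this thread; -j 32 only serves to stay below the 33-CPU oops quoted above):

 # osc co openSUSE:Factory installation-images
 # cd openSUSE:Factory/installation-images
 # osc build --vm-type=kvm --userootforbuild --vm-disk-size=30000 -j 32 -M openSUSE standard i586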
Comment 13 Jiri Slaby 2020-04-22 09:22:04 UTC
(In reply to Jiri Slaby from comment #12)
> > EIP: __alloc_pages_nodemask+0xd6/0x2b0

It dies somewhere in __alloc_pages_nodemask, in next_zones_zonelist, perhaps via for_each_zone_zonelist_nodemask. But that is not new...
Comment 14 Jiri Slaby 2020-04-22 09:43:59 UTC
Grabbing this into rpms/:
http://download.opensuse.org/history/20200331/tumbleweed/repo/oss/i586/kernel-obs-build-5.5.13-1.2.i586.rpm

and adding -p rpms to osc build indeed makes a difference...
Comment 15 Jiri Slaby 2020-04-22 09:44:25 UTC
Created attachment 836392 [details]
dmesg from guest with 5.5
Comment 16 Jiri Slaby 2020-04-22 09:44:42 UTC
Created attachment 836393 [details]
dmesg from guest with 5.6
Comment 18 Jiri Slaby 2020-04-23 11:48:37 UTC
NP.

Still bad with 5.6-rc1 (sort of expected).
Comment 19 Jiri Slaby 2020-04-29 11:36:00 UTC
It seems that I managed to reproduce it with a custom-built kernel, so I am starting a bisection.

It seems that the problem is I/O.
> Device             tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd
> vda             436.66       167.78      3114.85         0.00     935805   17372932          0
> vda             270.82       822.20      9404.21         0.00     760877    8702840          0


The first is 5.6 (bad), the second is 5.5 (good): 3 times slower writes, 5 times slower reads. But it could be due to accumulation of I/O buffers. Let's see if I can bisect it at last, as I have already failed twice (the bisection led to a merge commit or so...).
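The table above looks like interval output from iostat; a minimal way to collect such numbers inside the guest (assuming the sysstat package is available there) is:

 # iostat -d 60 2 vda

where the first report shows totals since boot and the second one covers only the last 60-second interval.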
Comment 20 Jiri Slaby 2020-04-29 11:37:07 UTC
Created attachment 837072 [details]
remote build log on sheep83 (slow)

I have a project in OBS:
https://build.opensuse.org/project/monitor/home:jirislaby:instim-slow
Comment 21 Jiri Slaby 2020-04-29 11:37:40 UTC
Created attachment 837073 [details]
remote build log on lamb63 (fast)
Comment 22 Jiri Slaby 2020-04-30 11:11:06 UTC
Created attachment 837180 [details]
git bisect log

It took only 72 kernels to build and the winner is:
commit 244adf6426ee31a83f397b700d964cff12a247d3
Author: Theodore Ts'o <tytso@mit.edu>
Date:   Thu Jan 23 12:23:17 2020 -0500

    ext4: make dioread_nolock the default

Reverted that on top of 5.6.6 and the performance is back.
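For reference, the bisection follows the usual git workflow; a rough sketch (the endpoints are just an example based on the good/bad kernels mentioned in this bug, and the test step is rebuilding kernel-obs-build from the bisected tree and re-running the i586 build):

 # git bisect start
 # git bisect bad v5.6-rc1
 # git bisect good v5.5
 # (build, package as kernel-obs-build, run the build, then mark the result)
 # git bisect good        # or: git bisect bad
 # git bisect log > bisect.log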
Comment 23 Jiri Slaby 2020-04-30 11:12:31 UTC
Now perhaps Honza will advise what to do next...
Comment 24 Jan Kara 2020-04-30 11:30:32 UTC
Yeah, as I wrote to Jiri on IRC, this commit changes how page writeback on ext4 is done, so it makes sense that it has an impact. I'll probably need to investigate what's going on in the guest while it is building... I'll talk to Jiri about how he reproduces the issue and how I can get access to the machine...
Comment 25 Jiri Slaby 2020-04-30 12:10:05 UTC
FYI I reverted the commit in stable and pushed to TW as:
https://build.opensuse.org/request/show/799215
Comment 26 Jiri Slaby 2020-05-03 07:01:47 UTC
The SR was accepted and the build times on goat look improved:
> 2020-05-01 10:14:21  installation-images:openSUSE                       source change    succeeded               33m 13s   lamb67:2
> 2020-05-01 14:12:51  installation-images:openSUSE                       new build        succeeded            3h  1m  3s   goat01:4      
> 2020-05-01 16:50:14  installation-images:openSUSE                       new build        succeeded               36m 13s   lamb63:4      
> 2020-05-01 19:23:50  installation-images:openSUSE                       rebuild counter  succeeded               25m 10s   lamb71:3      
> 2020-05-03 00:52:49  installation-images:openSUSE                       new build        succeeded               33m 20s   goat17:4      
> 2020-05-03 03:39:12  installation-images:openSUSE                       rebuild counter  succeeded               24m 29s   lamb69:3

Now, we will work on fixing it upstream.
Comment 27 Jan Kara 2020-05-11 18:05:35 UTC
Our performance testing grid actually bisected a regression in the io-fsmark-small-file-stream-ext4 workload down to the very same commit. It is not as pronounced (~15%), but with some luck the culprit is the same, and this is much easier to reproduce and analyze than digging in build VMs.

For reference the fsmark run looks like:

fs_mark -D 10000 -S0 -n 548668 -s 4096 -L 8 -t 1 -d dir1 -d dir2 ... -d dir16

So fsmark creates tons of 4k files. The -n and -L arguments are automatically
sized so that the total amount of data is about 2x the amount of machine
memory (which is 64GB in this case). I've seen the regression both on SATA
SSDs and rotational disks.

The comparison on marvin8 with SSD looks like:

                                      fsmark                 fsmark
                                        lock                 nolock
Min       1-files/sec    46974.80 (   0.00%)    34250.40 ( -27.09%)
1st-qrtle 1-files/sec    49614.60 (   0.00%)    42788.80 ( -13.76%)
2nd-qrtle 1-files/sec    48644.50 (   0.00%)    41957.10 ( -13.75%)
3rd-qrtle 1-files/sec    47583.90 (   0.00%)    41690.80 ( -12.38%)
Max-1     1-files/sec    50754.30 (   0.00%)    43109.80 ( -15.06%)
Max-5     1-files/sec    50754.30 (   0.00%)    43109.80 ( -15.06%)
Max-10    1-files/sec    50754.30 (   0.00%)    43109.80 ( -15.06%)
Max-90    1-files/sec    47356.50 (   0.00%)    39640.90 ( -16.29%)
Max-95    1-files/sec    47356.50 (   0.00%)    39640.90 ( -16.29%)
Max-99    1-files/sec    47356.50 (   0.00%)    39640.90 ( -16.29%)
Max       1-files/sec    50754.30 (   0.00%)    43109.80 ( -15.06%)
Hmean     1-files/sec    48540.95 (   0.00%)    40764.38 * -16.02%*
Stddev    1-files/sec     1259.44 (   0.00%)     2913.44 (-131.33%)

Also when looking at other stats it is obvious that 'nolock' kernel writes more:

Ops Sector Reads                     1591684.00     1594780.00
Ops Sector Writes                   60556136.00    74153772.00

When tracing the jbd2 machinery, differences are also obvious. The average transaction size is 62281.188235 blocks with the 'lock' kernel and 4893.776325 blocks with the 'nolock' kernel. The reason the 'nolock' kernel has much smaller transactions seems to be that the transaction reservation machinery (used only in the 'nolock' variant) forces transaction commits more frequently, as a lot of transaction space just sits there idly in the reserved state... I'm investigating this further, trying to confirm some details and figure out whether we can improve it.
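The jbd2 tracing mentioned above can be done, for example, with the in-kernel tracepoints; a minimal sketch, assuming tracefs is mounted at the usual place:

 # echo 1 > /sys/kernel/debug/tracing/events/jbd2/jbd2_run_stats/enable
 # cat /sys/kernel/debug/tracing/trace_pipe

jbd2_run_stats prints per-commit statistics (including the number of blocks and logged blocks per transaction), from which the average transaction size can be derived.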
Comment 28 Jan Kara 2020-05-13 16:58:04 UTC
Some more data: one fsmark loop dirties ~812000 unique metadata blocks (regardless of which writeback strategy is used). Now, 'lock' runs generate some 110 *million* metadata block dirty events, while 'nolock' runs generate ~125 million. So redirtying really happens a *lot* - one block is redirtied over 100 times on average. Thanks to transaction batching, we actually journal only ~5.5 million metadata blocks for 'lock' runs and ~7.5 million for 'nolock' runs - the transactions are smaller, so batching is less efficient there. These differences match the difference in the amount of writes seen in comment 27 very well and explain the performance difference. So I now have a good understanding of why the regression happens.

Now I have to figure out if we can somehow improve this so that 'nolock' mode isn't hit that badly.
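A cheaper cross-check for the average transaction size is the per-journal jbd2 stats file, e.g.:

 # cat /proc/fs/jbd2/vda2-8/info

which reports, among other things, the average number of blocks and logged blocks per transaction (the exact directory name depends on the device holding the journal; vda2-8 is just an example).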
Comment 29 Jan Kara 2020-05-18 08:09:41 UTC
So I was thinking more about this. In the end I decided to verify that I really understand what's going on, and added some more tracepoints reporting why transaction starts are being blocked, how big the transaction handles are, and why the resulting commits are so small in dioread_nolock mode. That revealed that there are usually only ~1500 reserved credits (out of 64k total credits in a transaction), which made it even clearer that the theory about reserved credits causing premature transaction commits was not correct and something else must be going on - this amount of reserved credits could cause a regression of a few percent, but not 20%.

After some more debugging I found out that when we reserve a transaction handle but then don't use it, we do not properly return the reserved credits: we remove them from the reserved amount, but we forget to remove them also from the total number of credits tracked in the transaction. As a result the transaction leaks lots of credits, which then forces an early transaction commit because we think the transaction is full (although in the end it is not). Fixing this leak also fixes the fsmark performance for me:

                                      fsmark                 fsmark
                                        lock        nolock-fixunrsv
Min       1-files/sec    46974.80 (   0.00%)    47322.70 (   0.74%)
1st-qrtle 1-files/sec    49614.60 (   0.00%)    49663.50 (   0.10%)
2nd-qrtle 1-files/sec    48644.50 (   0.00%)    49259.20 (   1.26%)
3rd-qrtle 1-files/sec    47583.90 (   0.00%)    47966.80 (   0.80%)
Max-1     1-files/sec    50754.30 (   0.00%)    51919.40 (   2.30%)
Max-5     1-files/sec    50754.30 (   0.00%)    51919.40 (   2.30%)
Max-10    1-files/sec    50754.30 (   0.00%)    51919.40 (   2.30%)
Max-90    1-files/sec    47356.50 (   0.00%)    47473.90 (   0.25%)
Max-95    1-files/sec    47356.50 (   0.00%)    47473.90 (   0.25%)
Max-99    1-files/sec    47356.50 (   0.00%)    47473.90 (   0.25%)
Max       1-files/sec    50754.30 (   0.00%)    51919.40 (   2.30%)
Hmean     1-files/sec    48540.95 (   0.00%)    48884.32 (   0.71%)

Another revelation for me was that in this workload ext4 actually starts lots of reserved transaction handles that are never used. This is due to the way the ext4 writepages code works: it starts a transaction, then inspects the page cache and writes one extent if it finds one; then it starts a transaction again and checks whether there's more to write. So for single-extent files we always start a transaction twice, the second time only to find there's nothing more to write. This probably also deserves to be fixed, but a simple fix I made seems to break page writeback, so I need to dig into it more; it doesn't seem to be a pressing issue.

I'll push the jbd2 fix upstream.
Comment 30 Jan Kara 2020-05-18 08:12:09 UTC
Created attachment 837903 [details]
[PATCH] jbd2: Avoid leaking transaction credits when unreserving

This patch fixes the fsmark performance for me. Once upstream accepts it, I'll replace the mount-option switch in master branch with it.
Comment 31 Jiri Slaby 2020-06-08 11:12:04 UTC
In upstream as:
commit 14ff6286309e2853aed50083c9a83328423fdd8c
Author: Jan Kara <jack@suse.cz>
Date:   Wed May 20 15:31:19 2020 +0200

    jbd2: avoid leaking transaction credits when unreserving handle
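To check which released kernels already contain that commit in a local tree, something like the following can be used:

 # git tag --contains 14ff6286309e2853aed50083c9a83328423fdd8c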
Comment 32 Jan Kara 2020-07-09 10:16:22 UTC
Patch is upstream and in our master branch as well.
Comment 33 Fabian Vogt 2020-11-11 09:15:55 UTC
This doesn't appear to be entirely resolved.

osc jobhist openSUSE:Factory installation-images standard i586 -M openSUSE:

2020-11-08 22:59:32 1h 48m 18s   goat19:6        
2020-11-09 16:29:39 2h 11m  5s   sheep81:6       
2020-11-09 19:18:34 2h 26m 42s   sheep84:2       
2020-11-09 20:00:56    40m 26s   lamb70:1        
2020-11-09 22:29:17 2h 14m  8s   sheep84:1       
2020-11-10 02:37:42 2h 23m 47s   goat05:4        
2020-11-10 14:03:25    45m 56s   lamb68:2        
2020-11-11 01:27:06 2h 15m 42s   goat07:2        
2020-11-11 05:32:00 1h 52m  0s   goat19:6        
2020-11-11 06:48:39    36m 46s   lamb68:2

For live image builds in openSUSE:Factory:Live i586 this is even more pronounced: rsync takes so long that the job is killed most of the time.
osc jobhist openSUSE:Factory:Live livecd-tumbleweed-x11 images i586:

2020-11-03 01:42:00  failed      2h 12m 37s   sheep82:4       
2020-11-03 04:18:11  failed      2h 36m  7s   sheep82:3       
2020-11-03 06:43:19  succeeded   2h 25m  0s   goat17:3        
2020-11-04 00:31:26  succeeded      13m 49s   build70:2       
2020-11-04 03:31:11  failed      1h 52m 48s   goat09:4        
2020-11-04 05:20:19  succeeded      12m 57s   lamb72:3        
2020-11-04 08:47:22  succeeded      28m 18s   cloud128:2      
2020-11-04 09:31:29  succeeded      16m 31s   build78:2       
2020-11-04 11:08:26  succeeded      24m 13s   cloud107:3      
2020-11-04 21:23:31  failed      2h 16m 44s   sheep84:4       
2020-11-05 09:55:05  succeeded      16m 58s   build76:3       
2020-11-05 10:25:50  succeeded      25m 34s   cloud106:1      
2020-11-05 22:03:18  succeeded      38m 20s   cloud109:3      
2020-11-06 04:42:56  succeeded   2h 35m 17s   goat15:5        
2020-11-07 01:20:37  succeeded      35m 59s   cloud108:1      
2020-11-07 01:48:12  succeeded      27m 12s   cloud116:4      
2020-11-07 02:19:01  succeeded      30m 35s   cloud119:4      
2020-11-07 06:36:30  succeeded      27m 44s   cloud129:1      
2020-11-07 21:50:08  succeeded      30m 36s   cloud113:3      
2020-11-08 00:43:54  succeeded      33m 10s   cloud127:2      
2020-11-08 20:33:00  succeeded      26m 36s   cloud137:1      
2020-11-08 23:17:17  failed      2h  8m 46s   goat15:3        
2020-11-09 08:44:36  succeeded      22m 43s   cloud116:3      
2020-11-09 19:10:09  succeeded      16m 19s   lamb64:3        
2020-11-10 00:33:36  succeeded      23m  9s   cloud102:3      
2020-11-10 13:18:33  succeeded      21m 53s   build70:3       
2020-11-10 23:30:10  succeeded      30m 28s   cloud114:3      
2020-11-11 04:01:43  succeeded      20m 58s   build71:4
Comment 34 Jan Kara 2020-11-11 15:44:02 UTC
Fabian, are you sure this is the same problem? I mean, has it been consistently failing like this for the past 6 months since the change was introduced? AFAIU you're speaking about the 'failed' builds on 'sheep' and 'goat' hosts, aren't you? 'lamb', 'cloud', and 'build' hosts seem to work fine.

If the start of the problem indeed dates back 6 months, I'd like to check whether mounting the filesystem with the 'dioread_lock' mount option indeed fixes the issue. What would be the easiest way to try that with OBS? The option needs to be passed either to mount, or to mke2fs (as part of the default mount options), or I can provide a patched kernel with modified defaults...
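On a plain system the check is just a mount option; a minimal sketch (the device path is a placeholder):

 # mount -o dioread_lock /dev/vdXn /mnt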
Comment 35 Fabian Vogt 2020-11-11 16:31:49 UTC
(In reply to Jan Kara from comment #34)
> Fabian, are you sure this is the same problem? I mean was this consistently
> failing like this for past 6 months when the change was introduced? AFAIU
> you're speaking about the 'failed' builds on 'sheep' and 'goat' hosts,
> aren't you? 'lamb', 'cloud', and 'build' hosts seem to work fine.

The osc jobhist only goes back to the beginning of October. In that timeframe, the i586 builds on goat/sheep have been consistently slow or failing.

I did some tests with different filesystems inside the VM via "Buildflags: vmfstype:foo" in the prjconf.
Using btrfs made no difference (compared to ext4), while XFS appears to work much better.
So the issue apparently impacts ext4 and btrfs the most.

Kiwi is still using ext4 for the filesystem on the livecd though, so I could only compare the time
until the final rsync.

XFS on sheep81, 23s for kernel-firmware:

[   94s] [ DEBUG   ]: 16:02:24 | system: (  47/1065) Installing: kbd-legacy-2.3.0-1.1.noarch [............done]
[   94s] [ DEBUG   ]: 16:02:24 | system: Additional rpm output:
[   94s] [ DEBUG   ]: 16:02:24 | system: warning: /var/cache/kiwi/packages/f796d7d2bc4daf38063ff386ebbc072d/kbd-legacy.rpm: Header V3 RSA/SHA256 Signature, key ID 3dbdc284: NOKEY
[  117s] [ DEBUG   ]: 16:02:47 | system: (  48/1065) Installing: kernel-firmware-20201023-1.1.noarch [.......................done]
[  117s] [ INFO    ]: Processing: [#                                       ] 2%[ DEBUG   ]: 16:02:47 | system: Additional rpm output:
[  117s] [ DEBUG   ]: 16:02:47 | system: warning: /var/cache/kiwi/packages/f796d7d2bc4daf38063ff386ebbc072d/kernel-firmware.rpm: Header V3 RSA/SHA256 Signature, key ID 3dbdc284: NOKEY

btrfs on sheep82, 141s for kernel-firmware:

[  108s] [ DEBUG   ]: 16:27:22 | system: (  47/1065) Installing: kbd-legacy-2.3.0-1.1.noarch [............done]
[  108s] [ DEBUG   ]: 16:27:22 | system: Additional rpm output:
[  108s] [ DEBUG   ]: 16:27:22 | system: warning: /var/cache/kiwi/packages/f796d7d2bc4daf38063ff386ebbc072d/kbd-legacy.rpm: Header V3 RSA/SHA256 Signature, key ID 3dbdc284: NOKEY
[  249s] [ DEBUG   ]: 16:29:43 | system: (  48/1065) Installing: kernel-firmware-20201023-1.1.noarch [...................................................................done]
[  249s] [ INFO    ]: Processing: [#                                       ] 2%[ DEBUG   ]: 16:29:43 | system: Additional rpm output:
[  249s] [ DEBUG   ]: 16:29:43 | system: warning: /var/cache/kiwi/packages/f796d7d2bc4daf38063ff386ebbc072d/kernel-firmware.rpm: Header V3 RSA/SHA256 Signature, key ID 3dbdc284: NOKEY

> If the start of the problem indeed dates back 6 months, I'd like to check
> whether mounting the filesystem with 'dioread_lock' mount option indeed
> fixes the issue. What would be the easiest way to try that with OBS? The
> option needs to be passed either to mount, to mke2fs (as part of default
> mount options), or I can provide patched kernel with modified defaults...

I don't think it's possible for users to influence mount options directly, so if you think that test makes sense, a kernel-obs-build package for i586 would be ideal.

FTR, my test projects are at
https://build.opensuse.org/project/show/home:favogt:boo1169774 (xfs)
https://build.opensuse.org/project/show/home:favogt:boo1169774-btrfs (btrfs)
Comment 36 Jan Kara 2020-11-12 12:26:53 UTC
If this is happening on both ext4 and btrfs, it seems unlikely to be caused by an ext4-specific problem (provided btrfs ever worked before). But anyway, let's debug this and we'll see where the problem is. The difference between XFS and ext4 indeed shouldn't be that big.

Now I'm mostly ignorant of OBS (well, I use it to build simple packages, but that's all). So if I understand correctly, 'vmfstype' influences what is used as the root filesystem of the VM that is building the package, i.e. in your case the live CD images?

What exactly happens in the build system between the installation of kbd-legacy-2.3.0-1.1.noarch and kernel-firmware-20201023-1.1.noarch? This is the difference you care about, right? But even in the "fast" case it takes 23 seconds, so I'd like to understand what the system is doing during this time... Also, where can I see the full logs you've pasted from?
Comment 37 Fabian Vogt 2020-11-12 12:57:05 UTC
(In reply to Jan Kara from comment #36)
> If this is happening on both ext4 and btrfs it seems unlikely to be caused
> by ext4 specific problem (provided btrfs ever worked before). But anyway,
> let's debug this and we'll see where the problem is. The difference between
> XFS and ext4 shouldn't indeed be that big.

The weird part is IMO that this only affects i586 builds on goat and
sheep workers. x86_64 on goat/sheep and i586 on other workers are just fine.



> Now I'm mostly ignorant of OBS (well, I use it to build simple packages but
> that's all). So if I understand right 'vmfstype' influences what is used as
> a root filesystem of the VM that is doing the build of the package? I.e. in
> your case of live CD images?

Yep!

> What exactly happens between installation of kbd-legacy-2.3.0-1.1.noarch and
> kernel-firmware-20201023-1.1.noarch in the build system? This is the
> difference you care about right?

The main issue is "general slowness" of builds, with the rsync difference being
the most obvious. That always uses ext4 (not configurable) and takes too long
for quick testing, so I picked some other part of the build for comparison.

>> But even in the "fast" case it takes 23
> seconds so I'd like to understand what the system is doing during this
> time...

Probably just installation of the kernel-firmware rpm. It's quite massive
(zypper says "219.3 MiB (563.5 MiB unpacked)") and contains 2554 files.

Other package installations are affected as well, but most are just smaller
library packages and a sub-second difference isn't that visible in the log.

> Also where can I see the full logs you've pasted from?

btrfs: https://build.opensuse.org/package/live_build_log/home:favogt:boo1169774-btrfs/livecd-tumbleweed-x11/images/i586
ext4: https://build.opensuse.org/package/live_build_log/home:favogt:boo1169774-ext4/livecd-tumbleweed-x11/images/i586
xfs: https://build.opensuse.org/package/live_build_log/home:favogt:boo1169774/livecd-tumbleweed-x11/images/i586
Comment 38 Adrian Schröter 2020-11-12 13:03:12 UTC
JFYI, you can trigger builds yourself (I suppose local builds won't help you?).

Also, the sysrq sending feature should finally work reliably; see "osc sendsysrq --help". You would be able to watch builds and get kernel traces into your build log file that way.
Comment 39 Ruediger Oertel 2020-11-12 13:19:02 UTC
>The weird part is IMO that this only affects i586 builds on goat and
>sheep workers. x86_64 on goat/sheep and i586 on other workers are just fine.

That basically covers 98% of all workers; IMHO this only leaves the
cloud??? (101-138) and build?? (70-85) machines, all older Intel CPUs,
while all sheep/lamb machines are Opteron and the goats are EPYC.

Is anything known about 32-bit on AMD?
Comment 40 Dominique Leuenberger 2020-11-12 13:34:53 UTC
(In reply to Ruediger Oertel from comment #39)
> >The weird part is IMO that this only affects i586 builds on goat and
> >sheep workers. x86_64 on goat/sheep and i586 on other workers are just fine.
> 
> That basically covers 98% of all workers; IMHO this only leaves the
> cloud??? (101-138) and build?? (70-85) machines, all older Intel CPUs,
> while all sheep/lamb machines are Opteron and the goats are EPYC.

lamb is fine as far as I know. cloud is significantly slower than lamb, but IIUC the older hardware is responsible for that difference.

A good example of timing differences is:
> osc jobhist openSUSE:Factory installation-images:openSUSE standard i586
 (unlike the live images, this at least succeeds, but 2.5 hours instead of 0.5 is a big difference; it is also the package initially used to report this bug)
Comment 41 Fabian Vogt 2020-11-12 15:13:18 UTC
I did some more tests. Looking just at the time it takes to install kernel-firmware doesn't really work out; sometimes it's fast even on a build which times out later. So instead I'll write down the time until the root filesystem contents are done, i.e. when "mkfs.ext4" gets called.

goat03 + ext4 + TW kernel: 1739s
goat19 + xfs + TW kernel: 613s
lamb72 + btrfs + TW kernel: 384s
lamb66 + ext4 + TW kernel: 303s
goat19 + ext4 + 15.1 (GA) kernel: 189s
sheep81 + ext4 + 15.2 (GA) kernel: 224s

So with kernel-obs-build-4.12.14-lp151.27.3 from openSUSE:Leap:15.1 the build on goat was considerably faster than others, which matches the x86_64 build times.
Same for kernel-obs-build-5.3.18-lp152.19.2 from openSUSE:Leap:15.2 on sheep81.
Comment 42 Fabian Vogt 2020-11-12 15:20:23 UTC
Created attachment 843540 [details]
build log + sysrq output from goat09 + btrfs

Something odd happened with the goat07 and goat09 builds with btrfs.
Both take *really* long inside rpmkeys:

[ 5308s] [ DEBUG   ]: 14:13:05 | + rpm --import /usr/lib/rpm/gnupg/keys/gpg-pubkey-39db7c82-5f68629b.asc
[ 5308s] [ DEBUG   ]: 14:13:05 | warning: Rebuilding outdated index databases
[ 7531s] [ DEBUG   ]: 14:50:08 | warning: Generating 18 missing index(es), please wait...

On goat09 I triggered sysrq and attached the output.
rpmkeys is currently inside balance_dirty_pages:

[ 5092s] [ 5085.900187] task:rpmkeys         state:D stack:    0 pid:10877 ppid: 10674 flags:0x00004000
[ 5092s] [ 5085.900187] Call Trace:
[ 5092s] [ 5085.900187]  __schedule+0x1c1/0x6b0
[ 5092s] [ 5085.900187]  ? __mod_timer+0x1b2/0x3c0
[ 5092s] [ 5085.900187]  schedule+0x46/0xb0
[ 5092s] [ 5085.900187]  schedule_timeout+0x70/0x130
[ 5092s] [ 5085.900187]  ? __next_timer_interrupt+0xd0/0xd0
[ 5092s] [ 5085.900187]  io_schedule_timeout+0x48/0x70
[ 5092s] [ 5085.900187]  balance_dirty_pages+0x2a2/0xe00
[ 5092s] [ 5085.900187]  balance_dirty_pages_ratelimited+0x2cd/0x370
[ 5092s] [ 5085.900187]  fault_dirty_shared_page+0xc7/0xd0
[ 5092s] [ 5085.900187]  wp_page_shared+0x84/0x170
[ 5092s] [ 5085.900187]  do_wp_page+0x1e7/0x210
[ 5092s] [ 5085.900187]  handle_pte_fault+0x327/0x330
[ 5092s] [ 5085.900187]  __handle_mm_fault+0x119/0x330
[ 5092s] [ 5085.900187]  handle_mm_fault+0xa4/0x250
[ 5092s] [ 5085.900187]  do_user_addr_fault+0x16e/0x330
[ 5092s] [ 5085.900187]  exc_page_fault+0x47/0x160
[ 5092s] [ 5085.900187]  ? irqentry_exit+0x35/0x40
[ 5092s] [ 5085.900187]  ? sysvec_kvm_asyncpf_interrupt+0x40/0x40
[ 5092s] [ 5085.900187]  handle_exception+0x13d/0x140

No idea whether that's the same issue or something else entirely.
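For reference, a dump of blocked (state D) tasks like the one above can be produced with sysrq 'w'; with shell access to a guest that is, for example:

 # echo w > /proc/sysrq-trigger
 # dmesg | tail -n 100

In OBS the same can be triggered remotely via the "osc sendsysrq" command mentioned in comment 38.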
Comment 43 Jan Kara 2020-11-13 09:08:42 UTC
Thanks for all the tests! This looks like a very different issue from the one originally reported in this bug. The original issue was specific to the ext4 filesystem, while what you report now seems like a general slowdown, just affecting different filesystems to a different extent - see how the XFS build on goat19 takes over 600s with the TW kernel while it takes just 189s with the 15.1 kernel on ext4 (it would be good to get a comparison of a 15.1 build on XFS, but I'd expect it to be close to the ext4 numbers on that kernel). So I'll clone this bug so that we don't mix the two issues up.
Comment 44 Jan Kara 2020-11-13 09:15:15 UTC
Let's solve this in bug 1178762, which was cloned from this bug. I'll copy the relevant comments there.
Comment 45 Jan Kara 2020-11-13 09:16:01 UTC
Resolving again since the original ext4 problem is still fixed.