Bug 1111414 - snapper's background comparison makes the system unusable
Status: RESOLVED FIXED
Classification: openSUSE
Product: openSUSE Tumbleweed
Component: Kernel
Version: Current
Hardware: Other
OS: Other
Priority: P2 - High
Severity: Normal
Target Milestone: ---
Assigned To: YaST Team
QA Contact: E-mail List
URL: https://trello.com/c/LzTx1c2d/2609-tw...
Depends on: 1049574
Blocks:
Reported: 2018-10-10 14:48 UTC by Fabian Vogt
Modified: 2019-02-15 14:10 UTC
CC List: 8 users

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
aschnell: needinfo? (dsterba)


Attachments

Description Fabian Vogt 2018-10-10 14:48:23 UTC
Several times the system slowed down because snapper used up most of the hard drive's IO capacity, which made the system unusable.

strace showed that it was busy diffing kernel modules and sources.

This could be improved massively by using the output of btrfs send to compare snapshots.
Comment 1 Steffen Winterfeldt 2018-10-11 09:21:35 UTC
Arvin, could it?
Comment 2 Arvin Schnell 2018-10-11 09:27:57 UTC
Snapper does already use btrfs send.
Comment 3 Fabian Vogt 2018-10-11 09:32:04 UTC
(In reply to Arvin Schnell from comment #2)
> Snapper does already use btrfs send.

Then I wonder why snapper read the files from the snapshot and compared them manually.

I'm not sure why snapper does this. Is it just to determine whether a pre-post pair is empty and should be cleaned up?
Comment 4 Stefan Hundhammer 2018-10-11 09:33:28 UTC
I am pretty sure that it's the Btrfs subsystem in the kernel that causes this load; there is nothing we can fix on the Snapper side (which is NOT the YaST bug component to begin with).
Comment 5 Jeff Mahoney 2018-10-12 13:40:32 UTC
It's premature to call this a btrfs issue.  The reporter claims that it was doing a manual comparison.  Comment #2 just dismisses this.  This needs to be reconciled and the workload needs to be described before we'll look at this as a file system issue.
Comment 6 Arvin Schnell 2018-10-12 14:12:19 UTC
If btrfs send fails then snapper uses manual comparison. The code in snapper
was added more than five years ago - maybe it does not work any longer due to
btrfs API changes (there is already conditional code depending on the version
of libbtrfs).

/var/log/snapper.log shows what is going on.

Anyway, even manual comparison should not make the system unusable.
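The fallback described above (try the btrfs-send-based comparison first, otherwise compare the snapshots file by file) can be sketched as follows. This is a minimal illustration only, not snapper's actual code (snapper is written in C++ and talks to libbtrfs). The `send_diff` callback is a hypothetical stand-in for the send-based comparison; failure is modeled as raising `OSError` or returning `None`. The sketch shows why the manual path is IO-heavy: every file present in both snapshots with matching sizes is read and compared byte by byte.

```python
import filecmp
import os
from typing import Callable, Optional, Set

# Hypothetical fast-path signature: returns the set of changed relative paths,
# or returns None / raises OSError when the stream-based comparison fails.
SendDiff = Callable[[str, str], Optional[Set[str]]]


def list_files(root: str) -> Set[str]:
    """Return the relative paths of all non-directory entries below root."""
    paths: Set[str] = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            paths.add(os.path.relpath(os.path.join(dirpath, name), root))
    return paths


def manual_diff(old_root: str, new_root: str) -> Set[str]:
    """Compare two snapshot trees by reading file contents (the slow path)."""
    old_files = list_files(old_root)
    new_files = list_files(new_root)
    # Entries that exist on only one side were added or removed.
    changed = old_files ^ new_files
    for rel in old_files & new_files:
        # shallow=False: files of equal size are compared byte by byte,
        # so both copies are read from disk.
        if not filecmp.cmp(os.path.join(old_root, rel),
                           os.path.join(new_root, rel), shallow=False):
            changed.add(rel)
    return changed


def compare_snapshots(old_root: str, new_root: str, send_diff: SendDiff) -> Set[str]:
    """Try the cheap comparison first; fall back to the manual walk on failure."""
    try:
        result = send_diff(old_root, new_root)
    except OSError:
        result = None
    if result is not None:
        return result
    return manual_diff(old_root, new_root)
```

If the fast path silently fails every time, as suspected in this bug, every comparison takes the expensive `manual_diff` route, which matches the strace observation of snapper reading kernel modules and sources.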
Comment 7 Arvin Schnell 2018-10-12 14:55:05 UTC
Funny, just now someone reports that the btrfs send used in snapper does
not work: https://github.com/openSUSE/snapper/pull/438
Comment 8 Jeff Mahoney 2018-10-12 18:10:47 UTC
That looks like a pretty good candidate.  If this issue still exists with that patch applied, I'll dig a little deeper.

CC'ing Dave as a heads-up that this ultimately was a user-visible change that broke userspace.
Comment 9 Arvin Schnell 2018-10-15 07:23:56 UTC
I added a card to the YaST trello task board so that the issue is
prioritised with the other tasks.
Comment 10 Arvin Schnell 2018-11-29 12:47:57 UTC
I had the chance to look at the issue now.

First thing to notice is that SLE15 is also affected while SLE12 SP4 is not
affected.

AFAIS the change leading to the problem is in libbtrfs (4.11), but not even the
patchlevel of libbtrfs was increased. On the other hand the patch for snapper
should also work with the older libbtrfs.

David, can you please confirm this?
Comment 11 Fabian Vogt 2018-12-06 14:26:58 UTC
(In reply to Arvin Schnell from comment #10)
> I had the chance to look at the issue now.
> 
> First thing to notice is that SLE15 is also affected while SLE12 SP4 is not
> affected.
> 
> AFAIS the change leading to the problem is in libbtrfs (4.11), but not even the
> patchlevel of libbtrfs was increased. On the other hand the patch for snapper
> should also work with the older libbtrfs.

AFAICT that's also what the comment on the PR (https://github.com/openSUSE/snapper/pull/438#issuecomment-429358392) says.

So is there still a reason to not merge it?

> David, can you please confirm this?
Comment 12 Arvin Schnell 2018-12-06 17:41:09 UTC
Well, for one, a confirmation would be nice. Also, as I already wrote on
github, enabling it now could cause regressions. After all, this was never
tested in the SLE15 code base (and MUs are requested for SLE15), so real
testing is needed. Bug #1049574 shows that this is not just an abstract
concern; from my point of view it must be fixed first.
Comment 13 Arvin Schnell 2019-01-22 11:22:40 UTC
PR: https://github.com/openSUSE/snapper/pull/472
Comment 14 Arvin Schnell 2019-01-23 13:38:14 UTC
SR for openSUSE-Factory: https://build.opensuse.org/request/show/667835

Closing as fixed.
Comment 15 Arvin Schnell 2019-01-23 13:39:05 UTC
SR for SLE15-SP1: https://build.suse.de/request/show/182327

MU for SLE15: https://build.suse.de/request/show/182435
Comment 16 Swamp Workflow Management 2019-02-06 14:11:00 UTC
SUSE-RU-2019:0255-1: An update that has two recommended fixes can now be installed.

Category: recommended (moderate)
Bug References: 1049574,1111414
CVE References: 
Sources used:
SUSE Linux Enterprise Module for Basesystem 15 (src):    snapper-0.5.6-5.7.1
Comment 17 Swamp Workflow Management 2019-02-15 14:10:37 UTC
openSUSE-RU-2019:0187-1: An update that has two recommended fixes can now be installed.

Category: recommended (moderate)
Bug References: 1049574,1111414
CVE References: 
Sources used:
openSUSE Leap 15.0 (src):    snapper-0.5.6-lp150.3.9.1