Bugzilla – Bug 950178
Can't rename files (on btrfs)
Last modified: 2018-04-13 14:59:07 UTC
There is something really fishy on my test machine - and it happened all of a sudden when I tried to zypper dup:

tortuga:/ # df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        7,9G  8,0K  7,9G   1% /dev
tmpfs           7,9G     0  7,9G   0% /dev/shm
tmpfs           7,9G   11M  7,8G   1% /run
tmpfs           7,9G     0  7,9G   0% /sys/fs/cgroup
/dev/sda3        41G  4,5G   36G  12% /
/dev/sda3        41G  4,5G   36G  12% /var/opt
/dev/sda3        41G  4,5G   36G  12% /var/lib/mailman
/dev/sda3        41G  4,5G   36G  12% /var/tmp
/dev/sda3        41G  4,5G   36G  12% /var/spool
/dev/sda3        41G  4,5G   36G  12% /var/log
/dev/sda3        41G  4,5G   36G  12% /var/lib/pgsql
/dev/sda3        41G  4,5G   36G  12% /var/lib/named
/dev/sda3        41G  4,5G   36G  12% /var/lib/mariadb
/dev/sda3        41G  4,5G   36G  12% /var/lib/libvirt/images
/dev/sda3        41G  4,5G   36G  12% /boot/grub2/x86_64-efi
/dev/sda3        41G  4,5G   36G  12% /var/crash
/dev/sda3        41G  4,5G   36G  12% /usr/local
/dev/sda3        41G  4,5G   36G  12% /tmp
/dev/sda3        41G  4,5G   36G  12% /opt
/dev/sda3        41G  4,5G   36G  12% /srv
/dev/sda3        41G  4,5G   36G  12% /boot/grub2/i386-pc
/dev/sda1       156M  4,5M  152M   3% /boot/efi
/dev/sda5       1,0T   15G 1010G   2% /home
tortuga:/ # LC_ALL=C mv bar baz
mv: cannot move 'bar' to 'baz': No space left on device
tortuga:/ # touch baz

I simply can't move files, but I can create new ones - also by copy.
Running 4.1.8-1-default from 42.1-Current
Not being able to update packages on Leap is a show stopper
after a reboot it's working now
When this happens try running "sync" or "btrfs filesystem sync <mountpoint>" and retry the rename/move. Also, for this sort of issues, the report from df is pretty much useless. The output of "btrfs filesystem usage <mountpoint>" gives useful information, how much space is allocated for data and metadata and how much of that space is used/free. A rename in btrfs requires at least 2883584 bytes of free metadata space (quite large, but it's because it's very pessimistic). If an issue persists across unmount/mount, an image of the fs is useful for debugging, made with "btrfs-image -c 9 /dev/sda image_file" (the fs must not be mounted).
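For reference, a minimal diagnostic sequence along those lines might look like the following. This is only a sketch: the mountpoint / and device /dev/sda3 are taken from the df output above, and /mnt/rescue/image_file is a hypothetical target path for the image.

  # flush pending delayed work, then retry the failing rename
  sync
  btrfs filesystem sync /
  mv bar baz

  # show how much space is allocated vs. used for data and metadata
  btrfs filesystem usage /

  # only if the problem survives umount/mount: capture a metadata image
  # for debugging (run from a rescue system, the fs must not be mounted)
  btrfs-image -c 9 /dev/sda3 /mnt/rescue/image_file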
tortuga:/ # mv bar baz
mv: cannot move 'bar' to 'baz': No space left on device
tortuga:/ # sync /
sync: ignoring all arguments
tortuga:/ # mv bar baz
mv: cannot move 'bar' to 'baz': No space left on device
tortuga:/ # btrfs filesystem sync /
FSSync '/'
tortuga:/ # mv bar baz
mv: cannot move 'bar' to 'baz': No space left on device
tortuga:/ # btrfs filesystem usage /
Overall:
    Device size:                  40.00GiB
    Device allocated:             40.00GiB
    Device unallocated:            1.00MiB
    Device missing:                  0.00B
    Used:                          4.69GiB
    Free (estimated):             35.12GiB      (min: 35.12GiB)
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:               64.00MiB      (used: 0.00B)

Data,single: Size:39.44GiB, Used:4.32GiB
   /dev/sda3      39.44GiB

Metadata,DUP: Size:256.00MiB, Used:189.27MiB
   /dev/sda3     512.00MiB

System,DUP: Size:32.00MiB, Used:16.00KiB
   /dev/sda3      64.00MiB

Unallocated:
   /dev/sda3       1.00MiB
extracting an image of / is a bit challenging ;(
Thanks to the rescue system: /suse/coolo/Export/BUG950178.btrfs

Note however that after a reboot it works for a short while, but zypper up will stop at the 3rd to 5th package with the same problem.
When this happens, try running balance against data block groups only to see if it helps:

btrfs balance -dusage=0 <mountpoint>
btrfs balance -dusage=5 <mountpoint>
btrfs balance -dusage=10 <mountpoint>

where dusage means data block groups with a usage of N% or below. It seems there's way too much space allocated for data, which is preventing new metadata block groups from being allocated, even though there's still considerably more than enough free metadata space to allow the rename operation to succeed (there's probably a metadata accounting bug somewhere).
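A concrete run of that suggestion on the affected box could look roughly like this (illustrative only; the mountpoint / is taken from the report above, and newer btrfs-progs spell the subcommand "btrfs balance start"):

  # drop completely empty data block groups, then progressively fuller ones
  btrfs balance start -dusage=0 /
  btrfs balance start -dusage=5 /
  btrfs balance start -dusage=10 /

  # afterwards, "Unallocated" should have grown well beyond 1.00MiB
  btrfs filesystem usage /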
For renames, there's only 3 MB available for the transaction. What we should be doing here is allocating a new metadata block group since there is ~ 36GB free.
Nevermind. "Free" isn't unallocated in this context. Yeah, the balance should help.
So if that's what helps, why is btrfs not doing it itself?
How can btrfs run into this situation and how to prevent it? GM is just a few days away ...
(In reply to Ludwig Nussel from comment #11)
> How can btrfs run into this situation and how to prevent it? GM is just a
> few days away ...

While "auto balance" might have a serious performance impact (and thus is not implemented upstream in btrfs yet, as far as I know), the "btrfsmaintenance" package contains the right scripts to keep the btrfs filesystem in reasonable shape. Make sure that "btrfsmaintenance" is installed by default and that the scripts run.
(In reply to Matthias Eckermann from comment #12)
> (In reply to Ludwig Nussel from comment #11)
> > How can btrfs run into this situation and how to prevent it? GM is just a
> > few days away ...
>
> While "auto balance" might have serious performance impact (and thus
> is not implemented upstream in btrfs yet, as far as I know), the
> "btrfsmaintenance" package contains the right scripts to keep the
> btrfs filesystem in a reasonable shape. Make sure that "btrfsmaintenance"
> is installed by default, and the scripts run.

btrfsmaintenance is indeed installed in a default setup, BUT:

> systemctl status btrfsmaintenance-refresh
● btrfsmaintenance-refresh.service - Update cron periods from /etc/sysconfig/btrfsmaintenance
   Loaded: loaded (/usr/lib/systemd/system/btrfsmaintenance-refresh.service; disabled; vendor preset: disabled)
   Active: inactive (dead)

Being disabled (like any default service), it is not going to do a lot of work - if this should be changed, a diff to systemd-presets-branding-openSUSE is needed. After I started the service, it indeed created links in /etc/cron.{weekly,monthly} as expected.
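For anyone hitting this, a minimal sequence to turn the maintenance scripts on manually (assuming the stock btrfsmaintenance package and the unit name shown above) would be something like:

  systemctl enable btrfsmaintenance-refresh
  systemctl start btrfsmaintenance-refresh

  # verify the periodic jobs were hooked up
  ls -l /etc/cron.weekly /etc/cron.monthly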
That doesn't seem to be the problem though; tortuga has these cron jobs. But I doubt a weekly cron job is the solution to the problem we saw.
So, I looked at the image and restored it into a test device on my machine, with an upstream 4.2 kernel. I couldn't reproduce the issue.

Some analysis details: there are over 20 data block groups that are empty, i.e. not used:

$ btrfs-image -r BUG950178.btrfs image
$ btrfs-debug-tree image 2>&1 | egrep 'BLOCK_GROUP_ITEM' -A 1
(...)
	item 194 key (26705133568 BLOCK_GROUP_ITEM 1073741824) itemoff 5977 itemsize 24
		block group used 0 chunk_objectid 256 flags DATA
	item 195 key (27778875392 BLOCK_GROUP_ITEM 1073741824) itemoff 5953 itemsize 24
		block group used 0 chunk_objectid 256 flags DATA
	item 196 key (28852617216 BLOCK_GROUP_ITEM 1073741824) itemoff 5929 itemsize 24
		block group used 0 chunk_objectid 256 flags DATA
	item 197 key (29926359040 BLOCK_GROUP_ITEM 1073741824) itemoff 5905 itemsize 24
		block group used 0 chunk_objectid 256 flags DATA
	item 198 key (31000100864 BLOCK_GROUP_ITEM 1073741824) itemoff 5881 itemsize 24
		block group used 0 chunk_objectid 256 flags DATA
	item 199 key (32073842688 BLOCK_GROUP_ITEM 1073741824) itemoff 5857 itemsize 24
		block group used 0 chunk_objectid 256 flags DATA
	item 200 key (33147584512 BLOCK_GROUP_ITEM 1073741824) itemoff 5833 itemsize 24
		block group used 0 chunk_objectid 256 flags DATA
	item 201 key (34221326336 BLOCK_GROUP_ITEM 1073741824) itemoff 5809 itemsize 24
		block group used 0 chunk_objectid 256 flags DATA
	item 202 key (35295068160 BLOCK_GROUP_ITEM 1073741824) itemoff 5785 itemsize 24
		block group used 0 chunk_objectid 256 flags DATA
	item 203 key (36368809984 BLOCK_GROUP_ITEM 1073741824) itemoff 5761 itemsize 24
		block group used 0 chunk_objectid 256 flags DATA
	item 204 key (37442551808 BLOCK_GROUP_ITEM 1073741824) itemoff 5737 itemsize 24
		block group used 0 chunk_objectid 256 flags DATA
	item 205 key (38516293632 BLOCK_GROUP_ITEM 1073741824) itemoff 5713 itemsize 24
		block group used 0 chunk_objectid 256 flags DATA
	item 206 key (39590035456 BLOCK_GROUP_ITEM 1073741824) itemoff 5689 itemsize 24
		block group used 0 chunk_objectid 256 flags DATA
	item 207 key (40663777280 BLOCK_GROUP_ITEM 1073741824) itemoff 5665 itemsize 24
		block group used 0 chunk_objectid 256 flags DATA
	item 208 key (41737519104 BLOCK_GROUP_ITEM 1073741824) itemoff 5641 itemsize 24
		block group used 0 chunk_objectid 256 flags DATA
	item 209 key (42811260928 BLOCK_GROUP_ITEM 1073741824) itemoff 5617 itemsize 24
		block group used 0 chunk_objectid 256 flags DATA
	item 210 key (43885002752 BLOCK_GROUP_ITEM 1073741824) itemoff 5593 itemsize 24
		block group used 0 chunk_objectid 256 flags DATA
	item 211 key (44958744576 BLOCK_GROUP_ITEM 1073741824) itemoff 5569 itemsize 24
		block group used 0 chunk_objectid 256 flags DATA
	item 212 key (46032486400 BLOCK_GROUP_ITEM 687865856) itemoff 5545 itemsize 24
		block group used 0 chunk_objectid 256 flags DATA
	item 213 key (46720352256 BLOCK_GROUP_ITEM 635437056) itemoff 5521 itemsize 24
		block group used 0 chunk_objectid 256 flags DATA
	item 214 key (47355789312 BLOCK_GROUP_ITEM 134217728) itemoff 5497 itemsize 24
		block group used 0 chunk_objectid 256 flags DATA
	item 215 key (47490007040 BLOCK_GROUP_ITEM 67108864) itemoff 5473 itemsize 24
		block group used 0 chunk_objectid 256 flags DATA
	item 216 key (47557115904 BLOCK_GROUP_ITEM 11534336) itemoff 5449 itemsize 24
		block group used 0 chunk_objectid 256 flags DATA
(...)

As of the upstream kernel 3.19, we have a background kernel thread that automatically deletes unused block groups.
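For anyone who wants to repeat that check against another image, a rough way to count the empty data block groups (same commands as above, just piped through a second grep) could be:

  btrfs-image -r BUG950178.btrfs image
  btrfs-debug-tree image 2>&1 | grep -A 1 'BLOCK_GROUP_ITEM' \
    | grep -c 'block group used 0 .*flags DATA'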
When I mount the image, these unused block groups get deallocated shortly afterwards (about 30s after the fs is mounted, which is the default transaction commit period), releasing over 20G of space. Yes, for a fs with no more unallocated space this means there is a 30s window after mount where applications can get ENOSPC errors - far from ideal, but the problem you are having on that machine seems to be that the deallocation isn't happening at all.

We have had bugs with this automatic cleanup of unused block groups since 3.19, but none of them were related to unused block groups not getting deleted (we had BUG_ONs, crashes, etc.), as far as I remember and can see through git history. The only thing I can think of that prevents them from getting deleted is if at some point the fs turned read-only due to some serious issue - if that's the case you would see a message and a trace about this in dmesg/syslog.

Also, did you end up running a manual balance operation as suggested in a previous comment?

btrfs balance -dusage=0 <mountpoint>

I know it's not ideal and that people should not have to worry about this, but it's just to confirm whether, even after doing that, the unused block group deallocations still aren't happening.
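A quick way to check the read-only hypothesis mentioned above (purely illustrative; the exact log wording varies between kernel versions) would be something like:

  # look for signs that the fs was ever forced read-only due to an error
  dmesg | grep -i btrfs | grep -iE 'forced readonly|abort|error'
  grep -i 'forced readonly' /var/log/messages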
Hi, I don't need the machine, the image was enough. The fs got into a state where it can't even reserve/allocate metadata to delete unused block groups (deleting them requires COWing the extent, device and chunk trees to delete some items and update another one). Not sure at this point if it's really possible to guarantee 100% that we never reach this scenario (though it can be minimized a bit) - I haven't seen this happen before. Thanks.
Created attachment 655735 [details]
btrfs kernel patch

Are you able to test kernel patches on that box, btw? If so, please try the attached patch and see if it helps (at least after a sync).
So I was able to reproduce by hacking btrfs-image's restore behaviour so that it sets the target fs total size to match the used size:

diff --git a/btrfs-image.c b/btrfs-image.c
index 82eed05..1d44ea1 100644
--- a/btrfs-image.c
+++ b/btrfs-image.c
@@ -2386,6 +2386,7 @@ static int fixup_devices(struct btrfs_fs_info *fs_info,
 		devid = btrfs_stack_device_id(dev_item);
+		dev_size = mdres->alloced_chunks;
 		btrfs_set_stack_device_total_bytes(dev_item, dev_size);
 		btrfs_set_stack_device_bytes_used(dev_item, mdres->alloced_chunks);

With that applied to btrfs-progs, I was able to reproduce your enospc issue. After mounting the restored fs, the unused block groups were never deleted, because the cleaner kthread always got a -ENOSPC error when trying to start a transaction to delete unused block groups - the cleaner does not log such errors nor take any special action, which is why it was silent (nothing in syslog/dmesg).

With the patch I attached earlier, soon after mounting the fs the unused block groups were deleted, and about 33GB of space was released.
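For completeness, the reproduction with the hacked btrfs-progs looks roughly like this (a sketch; /dev/sdX stands for a scratch device, and the ~30s wait corresponds to the default commit interval mentioned earlier):

  # restore the metadata image onto a scratch device with the modified
  # btrfs-image, so that total size == allocated size
  ./btrfs-image -r BUG950178.btrfs /dev/sdX

  mount /dev/sdX /mnt/test
  # without the kernel fix, the cleaner thread silently hits ENOSPC and the
  # empty data block groups are never deleted; with the fix, unallocated
  # space should grow within about one commit interval
  sleep 35
  btrfs filesystem usage /mnt/test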
(In reply to Ludwig Nussel from comment #16)
> I just tried zypper dup on an older tumbleweed installation with kernel
> 4.1.6-3-desktop and it ran into the same situation. You can log into
> jkeil.suse.de to check.
>
> jkeil:~ # btrfs balance start -dusage=0 /
> ERROR: error during balancing '/' - No space left on device
> There may be more info in syslog - try dmesg | tail

There are a few issues here, but they have the same root cause in different places in the code. The removal of a block group will use the global reservation, but starting the transaction won't without the fix. So we fail to start the transaction and don't even touch the global reservation. Filipe's fix (and mine, which I'm testing overnight and in the morning) addresses these cases.

But even with those fixes, you'd still get the ENOSPC from balance start. The balance will report ENOSPC but will kick the cleaner thread while doing so, and the locking means that the cleaner thread will usually run before btrfs balance start returns. So that's now a UI issue, and one I'm working on a fix for too.
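In other words, with the two fixes applied the observable behaviour on an affected fs is expected to be roughly the following (illustrative commands only):

  btrfs filesystem usage / | grep -i unallocated   # e.g. 1.00MiB
  btrfs balance start -dusage=0 /                  # may still print ENOSPC
  # the balance ioctl wakes the cleaner thread, which deletes the empty
  # block groups; shortly afterwards the space shows up as unallocated
  btrfs filesystem usage / | grep -i unallocated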
I should be able to test on the test machine that triggered the bug report. I left the file system as is and reinstalled on another partition. But I don't understand the kernel patching system well enough to build such a kernel. So if you have some rpm in OBS, let me know.
We can give you an RPM with the fixes we have already. Those will just go into git pretty much immediately. The false ENOSPC reporting issue w/ balance is one I'm still tracking down.
For the non-balance related issues, automatic removal of unused block group not working and a BUG_ON/crash when deleting a block group, I've sent the changes upstream today: http://thread.gmane.org/gmane.comp.file-systems.btrfs/50150 We'll get them to sle12/leap/opensuse soon.
Will Santa bring the fixes to our users finally?
(In reply to Ludwig Nussel from comment #25)
> Will Santa bring the fixes to our users finally?

Only to SLE12 and SLE12-SP1 users... And upstream 4.4 kernel (now at rc6) users.
What about Leap?
https://build.opensuse.org/request/show/355284 references this bug for leap, does it mean it contains a fix or is that just partial?
(In reply to Ludwig Nussel from comment #29)
> https://build.opensuse.org/request/show/355284 references this bug for leap,
> does it mean it contains a fix or is that just partial?

It means it contains the 2 patches I mentioned before (comment 24): 2 fixes for 2 different scenarios leading to ENOSPC. Jeff was/is still investigating the other (probably less common) case.
openSUSE-SU-2016:0280-1: An update that solves 10 vulnerabilities and has 18 fixes is now available.

Category: security (important)
Bug References: 865096,865259,913996,950178,950998,952621,954324,954532,954647,955422,956708,957152,957988,957990,958439,958463,958504,958510,958886,958951,959190,959399,960021,960710,961263,961509,962075,962597
CVE References: CVE-2015-7550,CVE-2015-8539,CVE-2015-8543,CVE-2015-8550,CVE-2015-8551,CVE-2015-8552,CVE-2015-8569,CVE-2015-8575,CVE-2015-8767,CVE-2016-0728
Sources used:
openSUSE Leap 42.1 (src): kernel-debug-4.1.15-8.1, kernel-default-4.1.15-8.1, kernel-docs-4.1.15-8.3, kernel-ec2-4.1.15-8.1, kernel-obs-build-4.1.15-8.2, kernel-obs-qa-4.1.15-8.1, kernel-obs-qa-xen-4.1.15-8.1, kernel-pae-4.1.15-8.1, kernel-pv-4.1.15-8.1, kernel-source-4.1.15-8.1, kernel-syms-4.1.15-8.1, kernel-vanilla-4.1.15-8.1, kernel-xen-4.1.15-8.1
Thanks, reducing severity for the less common case mentioned in comment #30.
This is automated batch bugzilla cleanup.

openSUSE 42.1 has reached end-of-life (EOL [1]) status. As such it is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of openSUSE, or you can still observe it under openSUSE Leap 15.0, please feel free to reopen this bug against that version (see the "Version" field of the bug), or alternatively open a new ticket. Thank you for reporting this bug and we are sorry it could not be fixed during the lifetime of the release.

[1] https://en.opensuse.org/Lifetime