Bugzilla – Bug 1080761
avahi-daemon consuming 100% CPU for hours
Last modified: 2021-06-17 12:47:15 UTC
The avahi-daemon process consumes 100% CPU on one CPU core for hours and hours on my machine. When I kill it as root with "kill -9", it doesn't take long until that happens again (not immediately, though).
When I attach gdb to that process, I see this backtrace:
#0 0x00007fc6a6dad2a7 in munmap () from /lib64/libc.so.6
#1 0x00007fc6a6d3991b in new_heap () from /lib64/libc.so.6
#2 0x00007fc6a6d3ac5f in arena_get2.part () from /lib64/libc.so.6
#3 0x00007fc6a6d3f6f3 in calloc () from /lib64/libc.so.6
#4 0x00007fc6a72b9bb4 in ?? () from /usr/lib64/libdbus-1.so.3
#5 0x00007fc6a72b5eb3 in dbus_message_get_path_decomposed () from /usr/lib64/libdbus-1.so.3
#6 0x00007fc6a72b95b3 in ?? () from /usr/lib64/libdbus-1.so.3
#7 0x00007fc6a72aa4aa in dbus_connection_dispatch () from /usr/lib64/libdbus-1.so.3
#8 0x000055f7ed1aa076 in ?? ()
#9 0x00007fc6a7d5ef18 in avahi_simple_poll_dispatch () from /usr/lib64/libavahi-common.so.3
#10 0x000055f7ed19c4ef in ?? ()
#11 0x00007fc6a6cdaf4a in __libc_start_main () from /lib64/libc.so.6
#12 0x000055f7ed19cc1a in ?? ()
This has been a constant problem for Tumbleweed over many months. Right now I observe it with avahi-0.6.32-5.1.x86_64, but it has been no different for other versions for a long time.
It might be important to know that this happens in the SUSE R&D network, where a large number of printers (45, as my browser's print dialog tells me) are available on the network.
Created attachment 759963 [details]
Output of "sudo journalctl -u avahi-daemon -S today" (gzipped)
The system journal contains some "out of memory" messages from avahi-daemon with subsequent SIGABRT and restart.
But restarting does not seem to be the problem here; on the contrary, maybe it SHOULD restart itself more often to avoid hanging in that calloc() loop.
The problem might be one level deeper, in libdbus, but so far I have observed this kind of problem only with avahi-daemon.
I see something similar today:
Feb 14 11:04:29 piscolita.suse.de avahi-daemon: Registering new address record for fe80::42:68ff:fef0:1956 on docker_gwbridge.*.
Feb 14 11:04:29 piscolita.suse.de avahi-daemon: iface.c: avahi_server_add_address() failed: Not permitted
... ton of similar lines
Feb 14 11:04:29 piscolita.suse.de avahi-daemon: Server startup complete. Host name is piscolita.local. Local service cookie is 3350574112.
Feb 14 11:04:29 piscolita.suse.de avahi-daemon: Failed to add service 'piscolita' of type '_ssh._tcp', ignoring service group (/etc/avahi/services/ssh.service): Not permitted
Feb 14 11:04:29 piscolita.suse.de avahi-daemon: Failed to add service 'piscolita' of type '_sftp-ssh._tcp', ignoring service group (/etc/avahi/services/sftp-ssh.service): Not permitted
Feb 14 11:04:29 piscolita.suse.de avahi-daemon: Failed to parse address 'fe80::20d:b9ff:fe01:ea8%enp0s25', ignoring.
Feb 14 11:04:31 piscolita.suse.de avahi-daemon: Out of memory, aborting ...
Feb 14 11:04:31 piscolita.suse.de systemd: avahi-daemon.service: Main process exited, code=killed, status=6/ABRT
Feb 14 11:04:31 piscolita.suse.de systemd: avahi-daemon.service: Unit entered failed state.
Feb 14 11:04:31 piscolita.suse.de systemd: avahi-daemon.service: Failed with result 'signal'.
Feb 14 11:04:31 piscolita.suse.de systemd: Starting Avahi mDNS/DNS-SD Stack...
Feb 14 11:04:31 piscolita.suse.de avahi-daemon: Process 17057 died: No such process; trying to remove PID file. (/run/avahi-daemon//pid)
Feb 14 11:04:31 piscolita.suse.de avahi-daemon: Found user 'avahi' (UID 486) and group 'avahi' (GID 485).
Feb 14 11:04:31 piscolita.suse.de avahi-daemon: Successfully dropped root privileges.
Feb 14 11:04:31 piscolita.suse.de avahi-daemon: avahi-daemon 0.6.32 starting up.
I noticed this when moving my laptop to my dock at home, but I am not sure for how long avahi has been eating my CPU.
There are similar reports out there for 0.6.32.
The Debian report claims that version 0.7 fixes this problem. Could it be this change?
* New upstream version 0.7
- All default rlimits have been removed from avahi-daemon.conf.
Those were causing crashes due to OOM and failure to start in LXC
containers. (Closes: #841926, #856311)
The Ubuntu report claims:
"Workaround is to edit /etc/avahi/avahi-daemon.conf and comment out the entire [limits] section or at least the rlimit-data and rlimit-stack sections."
The rlimits I have:
Updating to 0.7 from GNOME:Next gives me
So, my question is: why was it updated in GNOME:Next but not submitted to Factory/Tumbleweed?
Also on my laptop I see that the avahi daemon uses way too much CPU time:
# rpm -q avahi
11:10:54 up 3:17, 7 users, load average: 0.39, 0.26, 0.22
# ps auxwf | grep avahi-daemon
avahi 833 2.4 0.0 68828 7912 ? Ss 07:53 4:53 avahi-daemon: running [thinkpad-bart.local]
What I see is that the avahi-daemon uses about 2.6% of CPU time, all the time.
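The 2.6% figure can be cross-checked from the ps output above: cumulative CPU time divided by elapsed wall-clock time. A minimal sketch (the `etimes`/`times` field names are procps-ng ps; the helper name `cpu_share` is mine):

```shell
#!/bin/sh
# Long-term CPU share of a process = cumulative CPU seconds / elapsed seconds.
# Both numbers are available directly from procps-ng ps, e.g.:
#   ps -o etimes=,times= -C avahi-daemon
cpu_share() {  # usage: cpu_share CPU_SECONDS ELAPSED_SECONDS
    awk -v c="$1" -v e="$2" 'BEGIN { printf "%.1f%%\n", 100 * c / e }'
}

# With the numbers above: 4:53 of CPU time (293 s) since 07:53 (~11820 s elapsed):
cpu_share 293 11820    # prints 2.5%
```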
It's been 3 years now, and I don't see ANY reaction from the maintainers of this package.
No feedback, no progress, no questions, no acknowledgement that the problem even exists; nothing.
I assume you're now using the 0.8 package from TW. I'm guessing you're no longer seeing the out of memory messages (the rlimit values should be gone now), but you're still seeing the high CPU usage?
It might be useful to run dbus-monitor --system when you see it happening; I'm wondering if it is processing a flood of dbus messages. Also, I'm curious if the backtrace is similar now that the rlimits are gone.
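To make such a capture easier to read, something along these lines could summarize it per sending connection and reveal a flood from a single peer (`summarize_dbus_log` is just an illustrative helper name; the `sender=` field is standard dbus-monitor output):

```shell
#!/bin/sh
# First capture system-bus traffic (as root) while avahi-daemon is spinning:
#   dbus-monitor --system > /tmp/dbus.log
# Then count method calls and signals per sender, highest first:
summarize_dbus_log() {
    grep -E '^(method call|signal)' "$1" \
        | sed -E 's/.*sender=([^ ;]+).*/\1/' \
        | sort | uniq -c | sort -rn
}
```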
[shundhammer @ g12] ~ % rpm -qf `which avahi-daemon`
I am not updating that machine very much now that I am in Corona home office; it is accessible to me only via VPN to the office. Any "zypper dup" that requires a reboot (i.e. at least once a week) can easily cause a problem that prevents the machine from coming back up again, and then I can't log in anymore until I travel there to fix it manually.
Could you check that, in your avahi-daemon.conf, the items under [rlimits] are commented out? This should be the case with a new installation, but, since you upgraded from 0.6.32, I think that it is worth checking.
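A quick way to check that from a shell (a sketch; `active_rlimits` is just an illustrative helper name):

```shell
#!/bin/sh
# Print any uncommented rlimit-* settings in an avahi-daemon.conf;
# no matches means the [rlimits] entries are all commented out or absent.
active_rlimits() {
    grep -E '^[[:space:]]*rlimit-' "$1" || echo "no active rlimits"
}

# e.g.: active_rlimits /etc/avahi/avahi-daemon.conf
```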