Bug 1080761 - avahi-daemon consuming 100% CPU for hours
Status: NEW
Classification: openSUSE
Product: openSUSE Tumbleweed
Component: Network
Version: Current
Hardware: Other
OS: Other
Priority: P5 - None
Severity: Normal
Target Milestone: ---
Assigned To: E-mail List
Reported: 2018-02-13 10:17 UTC by Stefan Hundhammer
Modified: 2021-06-17 12:47 UTC
CC: 4 users



Attachments
Output of "sudo journalctl -u avahi-daemon -S today" (gzipped) (3.84 KB, application/gzip)
2018-02-13 10:59 UTC, Stefan Hundhammer

Description Stefan Hundhammer 2018-02-13 10:17:09 UTC
The avahi-daemon process consumes 100% CPU on one CPU core for hours and hours on my machine. When I kill it as root with "kill -9", it doesn't take long until that happens again (not immediately, though).

When I attach gdb to that process, I see this backtrace:

0x00007fc6a6dad2a7 in munmap () from /lib64/libc.so.6
(gdb) bt
#0  0x00007fc6a6dad2a7 in munmap () from /lib64/libc.so.6
#1  0x00007fc6a6d3991b in new_heap () from /lib64/libc.so.6
#2  0x00007fc6a6d3ac5f in arena_get2.part () from /lib64/libc.so.6
#3  0x00007fc6a6d3f6f3 in calloc () from /lib64/libc.so.6
#4  0x00007fc6a72b9bb4 in ?? () from /usr/lib64/libdbus-1.so.3
#5  0x00007fc6a72b5eb3 in dbus_message_get_path_decomposed () from /usr/lib64/libdbus-1.so.3
#6  0x00007fc6a72b95b3 in ?? () from /usr/lib64/libdbus-1.so.3
#7  0x00007fc6a72aa4aa in dbus_connection_dispatch () from /usr/lib64/libdbus-1.so.3
#8  0x000055f7ed1aa076 in ?? ()
#9  0x00007fc6a7d5ef18 in avahi_simple_poll_dispatch () from /usr/lib64/libavahi-common.so.3
#10 0x000055f7ed19c4ef in ?? ()
#11 0x00007fc6a6cdaf4a in __libc_start_main () from /lib64/libc.so.6
#12 0x000055f7ed19cc1a in ?? ()
(gdb) 


This has been a constant problem for Tumbleweed over many months. Right now I observe it with avahi-0.6.32-5.1.x86_64, but it has been no different for other versions for a long time.
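
A backtrace like the one above can be captured with something like the following minimal sketch; using pidof to locate the process is an assumption, not taken from the report:

# Attach gdb to the running avahi-daemon, print the backtrace, and detach again
sudo gdb -p "$(pidof avahi-daemon)" --batch -ex bt -ex detach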
Comment 1 Stefan Hundhammer 2018-02-13 10:41:01 UTC
It might be important to know that this happens in the SUSE R&D network, where a large number of printers (45, as my browser's print dialog tells me) are available.
Comment 2 Stefan Hundhammer 2018-02-13 10:59:29 UTC
Created attachment 759963 [details]
Output of "sudo journalctl -u avahi-daemon -S today" (gzipped)

The system journal contains some "out of memory" messages from avahi-daemon with subsequent SIGABRT and restart. 

But restarting does not seem to be the problem here, to the contrary; maybe it SHOULD restart itself more often to avoid hanging in that calloc() loop.

The problem might be one level deeper in libdbus, but so far I observed this kind of problem only with avahi-daemon.
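
One way to pull the relevant lines out of the journal is sketched below; the grep filter is an assumption, not part of the report:

# Show today's avahi-daemon journal and keep only the OOM/abort messages
sudo journalctl -u avahi-daemon -S today | grep -Ei 'out of memory|abrt'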
Comment 3 Duncan Mac-Vicar 2018-02-14 20:58:48 UTC
I see something similar today:

Feb 14 11:04:29 piscolita.suse.de avahi-daemon[17057]: Registering new address record for fe80::42:68ff:fef0:1956 on docker_gwbridge.*.
Feb 14 11:04:29 piscolita.suse.de avahi-daemon[17057]: iface.c: avahi_server_add_address() failed: Not permitted

... ton of similar lines

Feb 14 11:04:29 piscolita.suse.de avahi-daemon[17057]: Server startup complete. Host name is piscolita.local. Local service cookie is 3350574112.
Feb 14 11:04:29 piscolita.suse.de avahi-daemon[17057]: Failed to add service 'piscolita' of type '_ssh._tcp', ignoring service group (/etc/avahi/services/ssh.service): Not permitted
Feb 14 11:04:29 piscolita.suse.de avahi-daemon[17057]: Failed to add service 'piscolita' of type '_sftp-ssh._tcp', ignoring service group (/etc/avahi/services/sftp-ssh.service): Not permitted
Feb 14 11:04:29 piscolita.suse.de avahi-daemon[17057]: Failed to parse address 'fe80::20d:b9ff:fe01:ea8%enp0s25', ignoring.
Feb 14 11:04:31 piscolita.suse.de avahi-daemon[17057]: Out of memory, aborting ...
Feb 14 11:04:31 piscolita.suse.de systemd[1]: avahi-daemon.service: Main process exited, code=killed, status=6/ABRT
Feb 14 11:04:31 piscolita.suse.de systemd[1]: avahi-daemon.service: Unit entered failed state.
Feb 14 11:04:31 piscolita.suse.de systemd[1]: avahi-daemon.service: Failed with result 'signal'.
Feb 14 11:04:31 piscolita.suse.de systemd[1]: Starting Avahi mDNS/DNS-SD Stack...
Feb 14 11:04:31 piscolita.suse.de avahi-daemon[17060]: Process 17057 died: No such process; trying to remove PID file. (/run/avahi-daemon//pid)
Feb 14 11:04:31 piscolita.suse.de avahi-daemon[17060]: Found user 'avahi' (UID 486) and group 'avahi' (GID 485).
Feb 14 11:04:31 piscolita.suse.de avahi-daemon[17060]: Successfully dropped root privileges.
Feb 14 11:04:31 piscolita.suse.de avahi-daemon[17060]: avahi-daemon 0.6.32 starting up.

I noticed this when moving my laptop to my dock at home, but I am not sure how long avahi has been eating my CPU.

There are similar reports out there for 0.6.32:
https://bugzilla.redhat.com/show_bug.cgi?id=1388249
https://lists.freedesktop.org/archives/avahi/2016-October/002439.html
https://bugs.archlinux.org/task/50897
https://askubuntu.com/questions/848257/avahi-out-of-memory-error
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=841926
https://bugs.launchpad.net/ubuntu/+source/avahi/+bug/1638345

The Debian report claims version 0.7 fixes this problem. Could it be this change?

 * New upstream version 0.7
     - All default rlimits have been removed from avahi-daemon.conf.
       Those were causing crashes due to OOM and failure to start in LXC
       containers. (Closes: #841926, #856311)

The Ubuntu report claims:

"Workaround is to edit /etc/avahi/avahi-daemon.conf and comment out the entire [limits] section or at least the rlimit-data and rlimit-stack sections

The rlimits I have:

[rlimits]
#rlimit-as=
rlimit-core=0
rlimit-data=4194304
rlimit-fsize=0
rlimit-nofile=768
rlimit-stack=4194304
rlimit-nproc=3

Updating to 0.7 from GNOME:Next gives me

[rlimits]
#rlimit-as=
#rlimit-core=0
#rlimit-data=8388608
#rlimit-fsize=0
#rlimit-nofile=768
#rlimit-stack=8388608
#rlimit-nproc=3

So, my question is: why was it updated in GNOME:Next but not submitted to Factory/Tumbleweed?



https://download.opensuse.org/repositories/GNOME:/Next/openSUSE_Factory/x86_64/avahi-0.7-196.10.x86_64.rpm
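
The workaround quoted above could be applied roughly as sketched below, assuming the stock /etc/avahi/avahi-daemon.conf layout; the sed invocation and the restart step are assumptions, not taken from any of the linked reports:

# Back up the config, comment out every rlimit-* line, then restart the daemon
sudo cp /etc/avahi/avahi-daemon.conf /etc/avahi/avahi-daemon.conf.bak
sudo sed -i 's/^\(rlimit-\)/#\1/' /etc/avahi/avahi-daemon.conf
sudo systemctl restart avahi-daemon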
Comment 4 Bart Van Assche 2018-08-13 18:14:00 UTC
Also on my laptop I see that the avahi daemon uses way too much CPU time:
# rpm -q avahi
avahi-0.7-4.3.x86_64
# uptime
 11:10:54  up   3:17,  7 users,  load average: 0.39, 0.26, 0.22
# ps auxwf | grep avahi-daemon
avahi      833  2.4  0.0  68828  7912 ?        Ss   07:53   4:53 avahi-daemon: running [thinkpad-bart.local]

What I see is that the avahi-daemon uses about 2.6% of CPU time, all the time.
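
One possible way to track that usage over time is sketched below; pidstat from the sysstat package is an assumption, and top -p would work similarly:

# Sample avahi-daemon's CPU usage every 5 seconds
pidstat -p "$(pidof avahi-daemon)" 5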
Comment 5 Stefan Hundhammer 2021-06-16 12:02:09 UTC
It's been 3 years now, and I don't see ANY reaction from the maintainers of this package.

No feedback, no progress, no questions, no acknowledgement that the problem even exists; nothing.

SERIOUSLY?
Comment 6 Michael Gorse 2021-06-16 14:37:37 UTC
I assume you're now using the 0.8 package from TW. I'm guessing you're no longer seeing the out of memory messages (the rlimit values should be gone now), but you're still seeing the high CPU usage?

It might be useful to run dbus-monitor --system when you see it happening; I'm wondering if it is processing a flood of dbus messages. Also, I'm curious if the backtrace is similar now that the rlimits are gone.
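
A sketch of such a dbus-monitor run, narrowed to Avahi traffic, might look as follows; the match rule on the org.freedesktop.Avahi bus name is an assumption, and a plain "dbus-monitor --system" works as well:

# Watch system-bus messages sent by the Avahi daemon
sudo dbus-monitor --system "sender='org.freedesktop.Avahi'"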
Comment 7 Stefan Hundhammer 2021-06-17 08:34:01 UTC
[shundhammer @ g12] ~ % rpm -qf `which avahi-daemon`

avahi-0.7-15.2.x86_64


I am not updating that machine very much now that I am in the Corona home office and that machine is accessible to me only via VPN to the office; any "zypper dup" that requires a reboot (i.e. at least once a week) can easily cause a problem that prevents the machine from coming back up again, and then I can't log in anymore until I travel there to fix it manually.
Comment 8 Michael Gorse 2021-06-17 12:47:15 UTC
Could you check that, in your avahi-daemon.conf, the items under [rlimits] are commented out? This should be the case with a new installation, but, since you upgraded from 0.6.32, I think that it is worth checking.
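
A quick way to verify that is sketched below; the grep pattern is an assumption, and no output means every rlimit setting is commented out:

# List any still-active (uncommented) rlimit settings
grep -E '^[[:space:]]*rlimit' /etc/avahi/avahi-daemon.conf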