Bugzilla – Bug 1152762
User stop jobs delayed/hanging due NFS shared HOME
Last modified: 2019-11-05 11:20:34 UTC
If I do a shudown -r as user I seevery often a stop job running into timeout. A few lines before the network is down before the NFS shared HOME becomes unmounted. Can we *please* do this i correct order: first stop all jobs holding the NFS shared HOME busy, then umount the NFS shafred HOME, adn at very last action stop the network as well.
Reassign to systemd-maintainers and add maintainer of autofs as well to Carbon Copy list
Hi Werner, Can you share /etc/fstab assuming that it's used to setup your NFS mountpoints ?
(In reply to Franck Bui from comment #2) > Hi Werner, > > Can you share /etc/fstab assuming that it's used to setup your NFS > mountpoints ? noether /suse/werner> grep nfs /etc/fstab boole.suse.de:/usr/src/werner /usr/src/werner nfs noauto,user,exec,rsize=32768,wsize=32768,namlen=255,retrans=6,sec=sys,udp 0 0 noether /suse/werner> ypcat auto.suse | grep /werner -rw,nosuid loki:/real-home/werner the touble some seems to be the last one, that is from autofs
(In reply to Dr. Werner Fink from comment #3) > the touble some seems to be the last one, that is from autofs can you provide more details about that one and its config please ? Also can you attach the result of "systemctl show xxx" after replacing xxx with the relevant mount unit ?
Created attachment 820786 [details] suse-werner.mount.txt (In reply to Franck Bui from comment #4) > (In reply to Dr. Werner Fink from comment #3) > > the touble some seems to be the last one, that is from autofs > > can you provide more details about that one and its config please ? > > Also can you attach the result of "systemctl show xxx" after replacing xxx > with the relevant mount unit ? Here we are
Config is that what automount does nothing more and nothing less
Can you provide the journal logs with the issue reproduced and with debug logs enabled ? Thanks.
Created attachment 821627 [details] /var/log/boot.old (In reply to Franck Bui from comment #7) > Can you provide the journal logs with the issue reproduced and with debug > logs enabled ? > > Thanks. Hmmm ... AFAICS the default is MaxLevelStore=debug but at this point there is no parition anymore to store the logs(?) Nevertheless the messages seems to be from manager_print_jobs_in_progress() in src/core/manager.c ... manager_status_printf(m, STATUS_TYPE_EPHEMERAL, cylon, "%sA %s job is running for %s (%s / %s)", strempty(job_of_n), job_type_to_string(j->type), unit_description(j->unit), time, limit); as this looks like the message I had seen (filled with values) ... I've running a blogd and this one shows the console log messages up to last final umount of /
Stopping Session 2 of user werner. ... [** ] A stop job is running for Session 2 of user werner (1min 29s / 1min 30s) [ OK ] Stopped Session 2 of user werner. ... [ OK ] Stopped target Remote File Systems. Unmounting /suse/werner... Unmounting /usr/src/werner... ... [*** ] (1 of 2) A stop job is running for /usr/src/werner (1min 36s / 3min 1s) [ *** ] (2 of 2) A stop job is running for /suse/werner (1min 37s / 3min 1s) [ OK ] Unmounted /suse/werner. [ OK ] Unmounted /usr/src/werner. ... [ OK ] Stopped target Network is Online. [ OK ] Stopped target Network. That is your user session is stopped before /suse/werner is unmounted and /suse/werner is unmounted before the network is shutdown. However I'm not sure for the NFS client services as one can see that rpc-statd.service is stopped before the nfs mounts are but maybe that is expected. Also in the status messages I can't see any other NFS related services. Neil, can you please check that the NFS client stuff is still up until the last remote fs is stopped ? Thanks.
I doubt that rpc-statd.service is causing the proble, but I cannot completely rule it out. It is not straight forward to get statd shutdown working properly because it only needs to remain until the last nfsv3 filesystem is unmounted, and there is no way to tell systemd that. It gets started by mount.nfs requesting it when it mounts an nfsv3 filesystem. You can use systemctl to add a 'Wants' or 'Requires' dependency dynamically but I would need to add a 'Before' dependency (rpc-statd.sevice is Before suse/werner.mount), and I don't think I can. Anyway, I'm not certain that is the problem. Let's experiment first. Werner: could you please add a Before = remote-fs.target dependency to each of autofs.service and rpc-statd.service and see if that fixes the problem. If it does, then remove one or the other and see which one makes the difference.
(In reply to Neil Brown from comment #10) > I doubt that rpc-statd.service is causing the proble, but I cannot > completely rule it out. > It is not straight forward to get statd shutdown working properly because it > only needs to remain until the last nfsv3 filesystem is unmounted, and there > is no way to tell systemd that. > It gets started by mount.nfs requesting it when it mounts an nfsv3 > filesystem. > You can use systemctl to add a 'Wants' or 'Requires' dependency dynamically > but I would need to add a 'Before' dependency (rpc-statd.sevice is Before > suse/werner.mount), and I don't think I can. > > Anyway, I'm not certain that is the problem. Let's experiment first. > > Werner: could you please add a > Before = remote-fs.target > dependency to each of autofs.service and rpc-statd.service and see if that > fixes the problem. > If it does, then remove one or the other and see which one makes the > difference. Done ... copied both services to /etc/systemd/system/, then removed the remote-fs.target from the After= line of autofs.service, and added in both services the Before = remote-fs.target now running daemon-reload
(In reply to Neil Brown from comment #10) > You can use systemctl to add a 'Wants' or 'Requires' dependency dynamically > but I would need to add a 'Before' dependency (rpc-statd.sevice is Before > suse/werner.mount), and I don't think I can. > At least creating a drop-in and reloading the daemon should work. But ordering rpc-statd.service against remote-fs.target seems a better idea except... > Anyway, I'm not certain that is the problem. Let's experiment first. > > Werner: could you please add a > Before = remote-fs.target ... that I would order it against "remote-fs-pre.target" instead since you want rpc-statd.service to be scheduled before any remote fs during boot and stopped after all remote fs are.
(In reply to Franck Bui from comment #12) > (In reply to Neil Brown from comment #10) > > You can use systemctl to add a 'Wants' or 'Requires' dependency dynamically > > but I would need to add a 'Before' dependency (rpc-statd.sevice is Before > > suse/werner.mount), and I don't think I can. > > > > At least creating a drop-in and reloading the daemon should work. > > But ordering rpc-statd.service against remote-fs.target seems a better idea > except... > > > Anyway, I'm not certain that is the problem. Let's experiment first. > > > > Werner: could you please add a > > Before = remote-fs.target > > ... that I would order it against "remote-fs-pre.target" instead since you > want rpc-statd.service to be scheduled before any remote fs during boot and > stopped after all remote fs are. This seems to work, indeed
Werner, could you find out which change of autofs.service or rpc-statd.service made a different (maybe both are needed) ?
(In reply to Franck Bui from comment #14) > Werner, could you find out which change of autofs.service or > rpc-statd.service made a different (maybe both are needed) ? IMHO it is both as autofs.service will be started after rpc-statd.service: /suse/werner> foreach s ( /etc/systemd/system/autofs.service /etc/systemd/system/rpc-statd.service ) foreach? diff -up /usr/lib/systemd/system/${s:t} $s foreach? end --- /usr/lib/systemd/system/autofs.service 2019-07-10 09:05:42.000000000 +0200 +++ /etc/systemd/system/autofs.service 2019-10-18 11:47:57.146252598 +0200 @@ -1,7 +1,8 @@ [Unit] Description=Automounts filesystems on demand -After=network.target ypbind.service sssd.service network-online.target remote-fs.target rpc-statd.service rpcbind.service +After=network.target ypbind.service sssd.service network-online.target rpc-statd.service rpcbind.service Wants=network-online.target rpc-statd.service rpcbind.service +Before=remote-fs-pre.target [Service] Type=notify --- /usr/lib/systemd/system/rpc-statd.service 2019-09-25 01:40:53.000000000 +0200 +++ /etc/systemd/system/rpc-statd.service 2019-10-18 11:46:25.065107398 +0200 @@ -6,6 +6,7 @@ Requires=nss-lookup.target rpcbind.socke Wants=network-online.target Wants=rpc-statd-notify.service After=network-online.target nss-lookup.target rpcbind.socket +Before=remote-fs-pre.target PartOf=nfs-utils.service
Neil, can you handle this bug since Werner's testing seem to confirm that some NFS services are incorrectly ordered ? Thanks.