Bug 1152762 - User stop jobs delayed/hanging due NFS shared HOME
User stop jobs delayed/hanging due NFS shared HOME
Status: IN_PROGRESS
Classification: openSUSE
Product: openSUSE Tumbleweed
Classification: openSUSE
Component: Basesystem
Current
x86-64 openSUSE Factory
: P5 - None : Normal (vote)
: ---
Assigned To: Neil Brown
E-mail List
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2019-10-02 09:54 UTC by Dr. Werner Fink
Modified: 2019-11-05 11:20 UTC (History)
3 users (show)

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
suse-werner.mount.txt (4.88 KB, text/plain)
2019-10-08 12:02 UTC, Dr. Werner Fink
Details
/var/log/boot.old (20.05 KB, text/plain)
2019-10-16 06:43 UTC, Dr. Werner Fink
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Dr. Werner Fink 2019-10-02 09:54:48 UTC
If I do a

  shudown -r

as user I seevery often a stop job running into timeout.  A few lines before the network is down before the NFS shared HOME becomes unmounted.

Can we *please* do this i correct order: first stop all jobs holding the NFS shared HOME busy, then umount the NFS shafred HOME, adn at very last action stop the network as well.
Comment 1 Dr. Werner Fink 2019-10-02 09:59:24 UTC
Reassign to systemd-maintainers and add maintainer of autofs as well to Carbon Copy list
Comment 2 Franck Bui 2019-10-04 14:14:51 UTC
Hi Werner,

Can you share /etc/fstab assuming that it's used to setup your NFS mountpoints ?
Comment 3 Dr. Werner Fink 2019-10-05 11:59:10 UTC
(In reply to Franck Bui from comment #2)
> Hi Werner,
> 
> Can you share /etc/fstab assuming that it's used to setup your NFS
> mountpoints ?

 noether /suse/werner> grep nfs /etc/fstab
 boole.suse.de:/usr/src/werner    /usr/src/werner nfs noauto,user,exec,rsize=32768,wsize=32768,namlen=255,retrans=6,sec=sys,udp 0 0
 noether /suse/werner> ypcat auto.suse | grep /werner
 -rw,nosuid     loki:/real-home/werner

the touble some seems to be the last one, that is from autofs
Comment 4 Franck Bui 2019-10-08 09:40:22 UTC
(In reply to Dr. Werner Fink from comment #3)
> the touble some seems to be the last one, that is from autofs

can you provide more details about that one and its config please ?

Also can you attach the result of "systemctl show xxx" after replacing xxx with the relevant mount unit ?
Comment 5 Dr. Werner Fink 2019-10-08 12:02:51 UTC
Created attachment 820786 [details]
suse-werner.mount.txt

(In reply to Franck Bui from comment #4)
> (In reply to Dr. Werner Fink from comment #3)
> > the touble some seems to be the last one, that is from autofs
> 
> can you provide more details about that one and its config please ?
> 
> Also can you attach the result of "systemctl show xxx" after replacing xxx
> with the relevant mount unit ?

Here we are
Comment 6 Dr. Werner Fink 2019-10-08 12:04:24 UTC
Config is that what automount does nothing more and nothing less
Comment 7 Franck Bui 2019-10-08 13:31:46 UTC
Can you provide the journal logs with the issue reproduced and with debug logs enabled ?

Thanks.
Comment 8 Dr. Werner Fink 2019-10-16 06:43:26 UTC
Created attachment 821627 [details]
/var/log/boot.old

(In reply to Franck Bui from comment #7)
> Can you provide the journal logs with the issue reproduced and with debug
> logs enabled ?
> 
> Thanks.

Hmmm ... AFAICS the default is MaxLevelStore=debug but at this point there is no parition anymore to store the logs(?)

Nevertheless the messages seems to be from manager_print_jobs_in_progress() in src/core/manager.c ...

        manager_status_printf(m, STATUS_TYPE_EPHEMERAL, cylon,
                              "%sA %s job is running for %s (%s / %s)",
                              strempty(job_of_n),
                              job_type_to_string(j->type),
                              unit_description(j->unit),
                              time, limit);

as this looks like the message I had seen (filled with values) ... I've running a blogd and this one shows the console log messages up to last final umount of /
Comment 9 Franck Bui 2019-10-17 05:53:37 UTC
           Stopping Session 2 of user werner.
  ...
  [**    ] A stop job is running for Session 2 of user werner (1min 29s / 1min 30s)
  [  OK  ] Stopped Session 2 of user werner.
  ...
  [  OK  ] Stopped target Remote File Systems.
           Unmounting /suse/werner...
           Unmounting /usr/src/werner...
  ...
  [***   ] (1 of 2) A stop job is running for /usr/src/werner (1min 36s / 3min 1s)
  [ ***  ] (2 of 2) A stop job is running for /suse/werner (1min 37s / 3min 1s)
  [  OK  ] Unmounted /suse/werner.
  [  OK  ] Unmounted /usr/src/werner.
  ...
  [  OK  ] Stopped target Network is Online.
  [  OK  ] Stopped target Network.

That is your user session is stopped before /suse/werner is unmounted and /suse/werner is unmounted before the network is shutdown.

However I'm not sure for the NFS client services as one can see that rpc-statd.service is stopped before the nfs mounts are but maybe that is expected. Also in the status messages I can't see any other NFS related services.

Neil, can you please check that the NFS client stuff is still up until the last remote fs is stopped ?

Thanks.
Comment 10 Neil Brown 2019-10-17 23:31:54 UTC
I doubt that rpc-statd.service is causing the proble, but I cannot completely rule it out.
It is not straight forward to get statd shutdown working properly because it only needs to remain until the last nfsv3 filesystem is unmounted, and there is no way to tell systemd that.
It gets started by mount.nfs requesting it when it mounts an nfsv3 filesystem. 
You can use systemctl to add a 'Wants' or 'Requires' dependency dynamically but I would need to add a 'Before' dependency (rpc-statd.sevice is Before suse/werner.mount), and I don't think I can.

Anyway, I'm not certain that is the problem.  Let's experiment first.

Werner: could you please add a 
  Before = remote-fs.target
dependency to each of autofs.service and rpc-statd.service and see if that fixes the problem.
If it does, then remove one or the other and see which one makes the difference.
Comment 11 Dr. Werner Fink 2019-10-18 05:54:42 UTC
(In reply to Neil Brown from comment #10)
> I doubt that rpc-statd.service is causing the proble, but I cannot
> completely rule it out.
> It is not straight forward to get statd shutdown working properly because it
> only needs to remain until the last nfsv3 filesystem is unmounted, and there
> is no way to tell systemd that.
> It gets started by mount.nfs requesting it when it mounts an nfsv3
> filesystem. 
> You can use systemctl to add a 'Wants' or 'Requires' dependency dynamically
> but I would need to add a 'Before' dependency (rpc-statd.sevice is Before
> suse/werner.mount), and I don't think I can.
> 
> Anyway, I'm not certain that is the problem.  Let's experiment first.
> 
> Werner: could you please add a 
>   Before = remote-fs.target
> dependency to each of autofs.service and rpc-statd.service and see if that
> fixes the problem.
> If it does, then remove one or the other and see which one makes the
> difference.

Done ... copied both services to /etc/systemd/system/, then removed the remote-fs.target from the After= line of autofs.service, and added in both services the
 Before = remote-fs.target
now running daemon-reload
Comment 12 Franck Bui 2019-10-18 09:25:19 UTC
(In reply to Neil Brown from comment #10)
> You can use systemctl to add a 'Wants' or 'Requires' dependency dynamically
> but I would need to add a 'Before' dependency (rpc-statd.sevice is Before
> suse/werner.mount), and I don't think I can.
> 

At least creating a drop-in and reloading the daemon should work.

But ordering rpc-statd.service against remote-fs.target seems a better idea except...

> Anyway, I'm not certain that is the problem.  Let's experiment first.
> 
> Werner: could you please add a 
>   Before = remote-fs.target

... that I would order it against "remote-fs-pre.target" instead since you want rpc-statd.service to be scheduled before any remote fs during boot and stopped after all remote fs are.
Comment 13 Dr. Werner Fink 2019-10-23 10:39:07 UTC
(In reply to Franck Bui from comment #12)
> (In reply to Neil Brown from comment #10)
> > You can use systemctl to add a 'Wants' or 'Requires' dependency dynamically
> > but I would need to add a 'Before' dependency (rpc-statd.sevice is Before
> > suse/werner.mount), and I don't think I can.
> > 
> 
> At least creating a drop-in and reloading the daemon should work.
> 
> But ordering rpc-statd.service against remote-fs.target seems a better idea
> except...
> 
> > Anyway, I'm not certain that is the problem.  Let's experiment first.
> > 
> > Werner: could you please add a 
> >   Before = remote-fs.target
> 
> ... that I would order it against "remote-fs-pre.target" instead since you
> want rpc-statd.service to be scheduled before any remote fs during boot and
> stopped after all remote fs are.

This seems to work, indeed
Comment 14 Franck Bui 2019-10-23 12:32:51 UTC
Werner, could you find out which change of autofs.service or rpc-statd.service made a different (maybe both are needed) ?
Comment 15 Dr. Werner Fink 2019-10-23 12:38:38 UTC
(In reply to Franck Bui from comment #14)
> Werner, could you find out which change of autofs.service or
> rpc-statd.service made a different (maybe both are needed) ?

IMHO it is both as autofs.service will be started after rpc-statd.service:

/suse/werner> foreach s ( /etc/systemd/system/autofs.service /etc/systemd/system/rpc-statd.service )
foreach? diff -up /usr/lib/systemd/system/${s:t} $s
foreach? end
--- /usr/lib/systemd/system/autofs.service      2019-07-10 09:05:42.000000000 +0200
+++ /etc/systemd/system/autofs.service  2019-10-18 11:47:57.146252598 +0200
@@ -1,7 +1,8 @@
 [Unit]
 Description=Automounts filesystems on demand
-After=network.target ypbind.service sssd.service network-online.target remote-fs.target rpc-statd.service rpcbind.service
+After=network.target ypbind.service sssd.service network-online.target rpc-statd.service rpcbind.service
 Wants=network-online.target rpc-statd.service rpcbind.service
+Before=remote-fs-pre.target
 
 [Service]
 Type=notify
--- /usr/lib/systemd/system/rpc-statd.service   2019-09-25 01:40:53.000000000 +0200
+++ /etc/systemd/system/rpc-statd.service       2019-10-18 11:46:25.065107398 +0200
@@ -6,6 +6,7 @@ Requires=nss-lookup.target rpcbind.socke
 Wants=network-online.target
 Wants=rpc-statd-notify.service
 After=network-online.target nss-lookup.target rpcbind.socket
+Before=remote-fs-pre.target
 
 PartOf=nfs-utils.service
Comment 16 Franck Bui 2019-11-05 11:20:34 UTC
Neil, can you handle this bug since Werner's testing seem to confirm that some NFS services are incorrectly ordered ?

Thanks.