Bug 1068916 - [opena][20171118] nfs_client autofs test fails
[opena][20171118] nfs_client autofs test fails
Status: RESOLVED INVALID
Classification: openSUSE
Product: openSUSE Tumbleweed
Classification: openSUSE
Component: Kernel
Current
Other Other
: P5 - None : Normal (vote)
: ---
Assigned To: E-mail List
E-mail List
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2017-11-20 11:24 UTC by Dominique Leuenberger
Modified: 2017-11-29 00:36 UTC (History)
3 users (show)

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Dominique Leuenberger 2017-11-20 11:24:13 UTC
Identified by openQA:

https://openqa.opensuse.org/tests/536307#step/nfs_client/7

the nfs_client tests fail, when using autofs (manual nfs mount seems to be fine)
Doing some manual debug, autofs is starting up correctly and does supposedly not give any error.

There is no difference in the test run attempting 'nfs' or 'nfs4' mounts
Comment 1 Dominique Leuenberger 2017-11-20 11:25:10 UTC
The big change in snapshot 1118 was the upgrade of linux kernel to version 4.14; this might be related
Comment 2 Dominique Leuenberger 2017-11-20 12:34:40 UTC
Moving from Kubic to Kernel: kubic is just the product that showed it in a test, but it is definitively not the component at fault
Comment 3 Jeff Mahoney 2017-11-20 13:54:08 UTC
Can you give me access to an instance showing this issue?  This looks more like a connectivity (or firewall) issue than an NFS/autofs issue.
Comment 4 Dominique Leuenberger 2017-11-20 13:58:38 UTC
(In reply to Jeff Mahoney from comment #3)
> Can you give me access to an instance showing this issue?  This looks more
> like a connectivity (or firewall) issue than an NFS/autofs issue.

The problem with the openQA worker is that they are on an isolated network. an ssh jump host is the only chance to access them. But we can arrange for something (let's discuss explicits on IRC, quicker turnarounds)

mounting the NFS share manually, or by means of fstab, works.. which implies it not to be a network/firewall issue (unless autofs does something more). Also, the same test ran on the same network config until snapshot 1117; 1118 was the one receiving kernel 4.14
Comment 5 Jeff Mahoney 2017-11-20 15:03:05 UTC
Thanks to Dominique, I was able to get remote access to the worker.  It looks like this is fallout from:

commit 42f46148217865a545e129612075f3d828a2c4e4
Author: Ian Kent <raven@themaw.net>
Date:   Fri Sep 8 16:16:24 2017 -0700

    autofs: fix AT_NO_AUTOMOUNT not being honored

    The fstatat(2) and statx() calls can pass the flag AT_NO_AUTOMOUNT which
    is meant to clear the LOOKUP_AUTOMOUNT flag and prevent triggering of an
    automount by the call.  But this flag is unconditionally cleared for all
    stat family system calls except statx().

    stat family system calls have always triggered mount requests for the
    negative dentry case in follow_automount() which is intended but prevents
    the fstatat(2) and statx() AT_NO_AUTOMOUNT case from being handled.

    In order to handle the AT_NO_AUTOMOUNT for both system calls the negative
    dentry case in follow_automount() needs to be changed to return ENOENT
    when the LOOKUP_AUTOMOUNT flag is clear (and the other required flags are
    clear).

    AFAICT this change doesn't have any noticable side effects and may, in
    some use cases (although I didn't see it in testing) prevent unnecessary
    callbacks to the automount daemon.

    It's also possible that a stat family call has been made with a path that
    is in the process of being mounted by some other process.  But stat family
    calls should return the automount state of the path as it is "now" so it
    shouldn't wait for mount completion.

    This is the same semantic as the positive dentry case already handled.

    Link: http://lkml.kernel.org/r/150216641255.11652.4204561328197919771.stgit@pluto.themaw.net
    Fixes: deccf497d804a4c5fca ("Make stat/lstat/fstatat pass AT_NO_AUTOMOUNT to vfs_statx()")
    Signed-off-by: Ian Kent <raven@themaw.net>
    Cc: David Howells <dhowells@redhat.com>
    Cc: Colin Walters <walters@redhat.com>
    Cc: Ondrej Holy <oholy@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


... which is intentional.  The test case will either need a trailing slash on the directory or to "cd" into it before listing the files.
Comment 6 Jeff Mahoney 2017-11-20 15:08:58 UTC
https://marc.info/?l=linux-kernel&m=150223957421270&w=2 also covers some of the history as to why stat() on an automount dir doesn't result in the mount action occurring.  There's no way to tell whether it's a single stat call or one as part of an "ls -l" on the automount directory.  The result if we *did* perform a stat on every entry would be that each entry would have to be mounted.  One simple maps like the one in the test case, this doesn't seem like a big deal.  We have customer reports of sites using 30k mounts in their maps.  Attempting to mount each one of those would involve substantial (wasted) effort.

I've discussed the test case with Dominique on IRC and we've agreed that's where the problem lies.  I'll close this report as INVALID.
Comment 7 Neil Brown 2017-11-21 01:34:33 UTC
This is a behavioural change that I was in favour of when it went it, but since I have started to wonder.

Various programs that want to act on a directory will 'stat' it first.
If they get ENOENT, they don't bother to open().  In the case of autofs (after that change), the open would succeed, but is never tried.

Now that we have "O_PATH" for open(), it would be safe to change the programs to use open(..., O_PATH) && fstat()  rather than stat(), but that hasn't been done.


It would be really nice if the 'stat' would check with autofs and create an empty directory.  Only if that directory was then opened (or chdir or whatever) would the mount need to happen.  That would allow current applications to work the same way, but without triggering mounts when they aren't wanted.

However it would mean two round-trips to automount daemon for each indirect mount.  One to create the directory and another to perform the mount.

Doing this would mean reverting the kernel patch and patching the automount daemon instead.  I wonder if that is possible...
Comment 8 Jeff Mahoney 2017-11-21 02:20:49 UTC
(In reply to Neil Brown from comment #7)
> It would be really nice if the 'stat' would check with autofs and create an
> empty directory.  Only if that directory was then opened (or chdir or
> whatever) would the mount need to happen.  That would allow current
> applications to work the same way, but without triggering mounts when they
> aren't wanted.

This doesn't really work either.  The stat will succeed - but the results will be meaningless.
Comment 9 Neil Brown 2017-11-21 02:54:17 UTC
The results will be "This is a directory" which is all that these applications care about.
This is exactly how a direct automount has always worked, and how indirect automounts work if "browse_mode" is enabled in autofs.conf and if the map type supports browsing.
If you 'stat()' the auto mountpoint, you see the autofs directory.  It has a small size and root (probably) ownership etc, but it is a directory.
open() etc will trigger the mount and make the "correct" attributes available.
Comment 10 Neil Brown 2017-11-29 00:36:56 UTC
> It looks like this is fallout from:
> 
> commit 42f46148217865a545e129612075f3d828a2c4e4

Just FYI, akpm has just queued a patch which reverts this patch together with the appropriate Fixes tag.  Once that filter through, the test should stop failing.