Bugzilla – Bug 1068916
[opena] nfs_client autofs test fails
Last modified: 2017-11-29 00:36:56 UTC
Identified by openQA:
the nfs_client tests fail, when using autofs (manual nfs mount seems to be fine)
Doing some manual debug, autofs is starting up correctly and does supposedly not give any error.
There is no difference in the test run attempting 'nfs' or 'nfs4' mounts
The big change in snapshot 1118 was the upgrade of linux kernel to version 4.14; this might be related
Moving from Kubic to Kernel: kubic is just the product that showed it in a test, but it is definitively not the component at fault
Can you give me access to an instance showing this issue? This looks more like a connectivity (or firewall) issue than an NFS/autofs issue.
(In reply to Jeff Mahoney from comment #3)
> Can you give me access to an instance showing this issue? This looks more
> like a connectivity (or firewall) issue than an NFS/autofs issue.
The problem with the openQA worker is that they are on an isolated network. an ssh jump host is the only chance to access them. But we can arrange for something (let's discuss explicits on IRC, quicker turnarounds)
mounting the NFS share manually, or by means of fstab, works.. which implies it not to be a network/firewall issue (unless autofs does something more). Also, the same test ran on the same network config until snapshot 1117; 1118 was the one receiving kernel 4.14
Thanks to Dominique, I was able to get remote access to the worker. It looks like this is fallout from:
Author: Ian Kent <email@example.com>
Date: Fri Sep 8 16:16:24 2017 -0700
autofs: fix AT_NO_AUTOMOUNT not being honored
The fstatat(2) and statx() calls can pass the flag AT_NO_AUTOMOUNT which
is meant to clear the LOOKUP_AUTOMOUNT flag and prevent triggering of an
automount by the call. But this flag is unconditionally cleared for all
stat family system calls except statx().
stat family system calls have always triggered mount requests for the
negative dentry case in follow_automount() which is intended but prevents
the fstatat(2) and statx() AT_NO_AUTOMOUNT case from being handled.
In order to handle the AT_NO_AUTOMOUNT for both system calls the negative
dentry case in follow_automount() needs to be changed to return ENOENT
when the LOOKUP_AUTOMOUNT flag is clear (and the other required flags are
AFAICT this change doesn't have any noticable side effects and may, in
some use cases (although I didn't see it in testing) prevent unnecessary
callbacks to the automount daemon.
It's also possible that a stat family call has been made with a path that
is in the process of being mounted by some other process. But stat family
calls should return the automount state of the path as it is "now" so it
shouldn't wait for mount completion.
This is the same semantic as the positive dentry case already handled.
Fixes: deccf497d804a4c5fca ("Make stat/lstat/fstatat pass AT_NO_AUTOMOUNT to vfs_statx()")
Signed-off-by: Ian Kent <firstname.lastname@example.org>
Cc: David Howells <email@example.com>
Cc: Colin Walters <firstname.lastname@example.org>
Cc: Ondrej Holy <email@example.com>
Signed-off-by: Andrew Morton <firstname.lastname@example.org>
Signed-off-by: Linus Torvalds <email@example.com>
... which is intentional. The test case will either need a trailing slash on the directory or to "cd" into it before listing the files.
https://marc.info/?l=linux-kernel&m=150223957421270&w=2 also covers some of the history as to why stat() on an automount dir doesn't result in the mount action occurring. There's no way to tell whether it's a single stat call or one as part of an "ls -l" on the automount directory. The result if we *did* perform a stat on every entry would be that each entry would have to be mounted. One simple maps like the one in the test case, this doesn't seem like a big deal. We have customer reports of sites using 30k mounts in their maps. Attempting to mount each one of those would involve substantial (wasted) effort.
I've discussed the test case with Dominique on IRC and we've agreed that's where the problem lies. I'll close this report as INVALID.
This is a behavioural change that I was in favour of when it went it, but since I have started to wonder.
Various programs that want to act on a directory will 'stat' it first.
If they get ENOENT, they don't bother to open(). In the case of autofs (after that change), the open would succeed, but is never tried.
Now that we have "O_PATH" for open(), it would be safe to change the programs to use open(..., O_PATH) && fstat() rather than stat(), but that hasn't been done.
It would be really nice if the 'stat' would check with autofs and create an empty directory. Only if that directory was then opened (or chdir or whatever) would the mount need to happen. That would allow current applications to work the same way, but without triggering mounts when they aren't wanted.
However it would mean two round-trips to automount daemon for each indirect mount. One to create the directory and another to perform the mount.
Doing this would mean reverting the kernel patch and patching the automount daemon instead. I wonder if that is possible...
(In reply to Neil Brown from comment #7)
> It would be really nice if the 'stat' would check with autofs and create an
> empty directory. Only if that directory was then opened (or chdir or
> whatever) would the mount need to happen. That would allow current
> applications to work the same way, but without triggering mounts when they
> aren't wanted.
This doesn't really work either. The stat will succeed - but the results will be meaningless.
The results will be "This is a directory" which is all that these applications care about.
This is exactly how a direct automount has always worked, and how indirect automounts work if "browse_mode" is enabled in autofs.conf and if the map type supports browsing.
If you 'stat()' the auto mountpoint, you see the autofs directory. It has a small size and root (probably) ownership etc, but it is a directory.
open() etc will trigger the mount and make the "correct" attributes available.
> It looks like this is fallout from:
> commit 42f46148217865a545e129612075f3d828a2c4e4
Just FYI, akpm has just queued a patch which reverts this patch together with the appropriate Fixes tag. Once that filter through, the test should stop failing.