Problem in NFS client code in RHEL 3 (kernel 2.4.21-20.ELsmp)

Post by Tor Lillqv » Wed, 20 Oct 2004 22:10:30

We have had a support case for this open for close to a month now, but no
real reply from Red Hat... Are there any kernel NFS gurus here? (If not,
what would be a better forum?) Could you please comment on this issue, and
especially the proposed (quite small) patch below. Thanks in advance...

My initial service request text:

====cut here====
Summary: unlink doesn't work when multiple clients access same files on nfs

This problem requires a relatively good understanding of kernel internals
(dcache and NFS), so please assign to somebody who actually knows that.

We have a big problem with how the NFS client code in RHEL3 works. (No, this
isn't the same problem as in our two earlier requests, using UDP has cured
that. So read on.) The problem led to a huge unexpected increase in disk
usage for us after we moved an application from Solaris to RHEL3. We have now been
forced to move it partially back to Solaris servers.

The problem occurs when using a distributed application (ClearCase) where
applications on several client machines (RHEL or Solaris) talk to each other
using RPC and access the same files that are located on an NFS server. It can
easily be reproduced without having ClearCase, though.

The scenario is as follows: Let's say we have two NFS client machines, A and
B. A runs RHEL3, B runs any Unix. They both mount a file system from an NFS
server (doesn't matter what kind, we have reproduced the problem with EMC
Celerra, NetApp and Solaris NFS servers).

What happens is this:

0) Let F be a filename on the NFS file system. Initially this file does not
exist.

1) The application on the RHEL3 machine A does a stat() on F. The NFS client
in the kernel sends a LOOKUP request to the NFS server, which obviously
returns failure. The stat() fails with ENOENT. OK so far.

2) Immediately afterwards (a few seconds max), the application on machine B
creates the file F. No problems so far.

3) When B is done with F, a few seconds later the application on machine A
does an unlink() on F. Because of the negative dentry caching in the Linux
kernel, it doesn't even bother to send an NFS REMOVE request to the NFS
server, as (it thinks) it knows for sure the file doesn't exist. It lets the
unlink() fail with ENOENT. But the file definitely exists.
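Machine A's side of steps 1) to 3) can be sketched as a small script. This is only an illustration of the sequence described above, not the ClearCase code: the function name stat_then_unlink and the wait_seconds parameter are made up for the example, and machine B is assumed to create the file by hand (or from another script) during the pause.

```python
import errno
import os
import time


def stat_then_unlink(path, wait_seconds=5.0):
    """Run machine A's side of the scenario: stat(), wait, then unlink().

    Returns a pair of strings, one per call: "OK" on success, otherwise
    the symbolic errno name (for example "ENOENT").
    """
    def attempt(call):
        try:
            call(path)
            return "OK"
        except OSError as exc:
            return errno.errorcode.get(exc.errno, str(exc.errno))

    # Step 1: stat() a name that does not exist yet; expected to fail
    # with ENOENT.
    stat_result = attempt(os.stat)

    # Step 2 happens elsewhere: machine B is assumed to create the file
    # during this pause.
    time.sleep(wait_seconds)

    # Step 3: unlink() the name. If B created the file, "OK" is the
    # correct outcome; "ENOENT" at this point is the behaviour reported
    # in this thread.
    unlink_result = attempt(os.unlink)
    return stat_result, unlink_result
```

To try it, call stat_then_unlink() on machine A with a not-yet-existing path on the NFS mount, and create that file from machine B while A is sleeping. A second element of "ENOENT" while the file is still visible from B would match the report.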

The application now thinks that the file F doesn't exist any longer, and loses
track of it. This means ever increasing disk usage as the above scenario
happens all the time when we run ClearCase builds for our (large) software.

After we moved our view servers from Solaris to RHEL3, the disk usage of our
ClearCase view storage doubled in a few months from 150 GB to close to 300 GB.
This was a mystery to us until we found that the view storage file system was
full of stranded files that weren't supposed to be left there, and that the
application didn't know of and thus couldn't clean out itself.

(In case you wonder why the applications work like that, well, that's how
ClearCase works. A is a view server, B is a build server where clearmake jobs
(compilations) are run, and the file F is a view-private file created and
removed during the clearmake run. The view server first checks if F exists,
then B actually creates it, writes to it, then when it isn't needed any more
the view server is supposed to remove it.)

It is very easy to demonstrate the problem without ClearCase: Just mount a
file system from a NFS server on two R

Problem in NFS client code in RHEL 3 (kernel 2.4.21-20.ELsmp)

Post by ptb » Wed, 20 Oct 2004 22:31:02

Well, I think that's right, isn't it? NFS has only weak unix semantics.
It's not required to get things "right" all the time. If you want to be
sure that nobody fails to delete a file they think isn't there, you
will have to get them to check that it is not there once again before
deleting/not deleting it, or make the server broadcast an invalidate
for the negative dentry cache entry when it makes a file, or make
the clients use a real locking mechanism instead of trying to
communicate via files, as you are doing now.
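The "check once again before deleting" idea might look roughly like the sketch below. It rests on an assumption that has not been verified here: that a directory listing of the parent reflects what the server has even when the cached name lookup does not. Whether that holds depends on the client and its mount options, and a retried unlink() may still fail on an affected client. The names unlink_with_recheck and list_dir are invented for the example.

```python
import os


def unlink_with_recheck(path, list_dir=os.listdir):
    """Try to unlink path; on ENOENT, double-check via a directory listing.

    Returns "removed" if the file was deleted, "absent" if neither the
    unlink nor the listing found it, and "stranded" if the listing shows
    the name but unlink still reports that it does not exist.
    """
    try:
        os.unlink(path)
        return "removed"
    except FileNotFoundError:
        pass

    # Assumption: listing the parent directory gives a second opinion
    # that is independent of the cached lookup result for this name.
    parent, name = os.path.split(os.path.abspath(path))
    if name not in list_dir(parent):
        return "absent"

    # The listing disagrees with the failed unlink, so retry once.
    try:
        os.unlink(path)
        return "removed"
    except FileNotFoundError:
        return "stranded"
```

A "stranded" result at least lets the application log the name so the file can be cleaned up later, instead of silently losing track of it.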

You can fix your application by telling machine A not to delete files
it doesn't think exist, but instead to tell B to do it.

I agree that this work probably should be handled by NFS, and it looks
like an unfortunate NFS bug. But you really want to talk to the
maintainer about it ... not me!