Commands hang on an NFS mounted filesystem – OSNEXUS Customer Support

Subject:

On a hard-mounted file system, NFS operations are retried until they are acknowledged by the server. A side effect of hard-mounting NFS file systems is that processes block (or "hang") in a high-priority disk wait state until their NFS RPC calls complete.

If an NFS server goes down, the clients using its file systems hang if they reference these file systems before the server recovers. Using -intr in conjunction with the -hard mount option allows users to interrupt system calls that are blocked waiting on a crashed server. The system call is interrupted when the process making the call receives a signal, usually sent by the user typing Ctrl-C or using the kill command.

Detail:

** Difference Between NFS Soft And Hard Mount With Example

Using the NFS protocol, an NFS client can mount a filesystem exported by an NFS server as if it were a local filesystem. For example, you can mount the “/home” directory of host.nfs_server.com on your client machine as follows:

# mount host.nfs_server.com:/home /techhome

The directory “/techhome” must already exist on your machine to serve as the mount point. An NFS filesystem can be mounted as either a “soft mount” or a “hard mount”. These mount options define how the NFS client handles a crash or failure of the NFS server. This article describes the difference between soft and hard mounts.

1. Soft Mount
Suppose you have mounted an NFS filesystem using “soft mount”. When a program or application requests a file from the NFS filesystem, the NFS client tries to retrieve the data from the NFS server. If it gets no response (because the NFS server has crashed or failed), the NFS client reports an error to the process that requested the file access. The advantage of this mechanism is “fast responsiveness”, because the client does not wait indefinitely for the NFS server to respond. The main disadvantage is the risk of data corruption or data loss, since a request that the client gives up on may leave a write incomplete. For this reason, soft mounts are not recommended.

# mount -o rw,soft host.nfs_server.com:/home /techhome

2. Hard Mount
If you have mounted the NFS filesystem using “hard mount”, the NFS client keeps retrying the request until the server responds. Once the server is back online, the program continues to execute undisturbed from the state it was in when the server crashed. The mount option “intr” allows NFS requests to be interrupted if the server goes down or cannot be reached. Hence the recommended settings are the hard and intr options.

# mount -o rw,hard,intr host.nfs_server.com:/home /techhome
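
To check which of the two behaviors an existing mount has, you can look at its option string. The following is a minimal sketch, not part of the original article: the function name nfs_mount_mode is hypothetical, and it assumes that a mount with neither keyword in its options behaves as a hard mount, which is the documented default in 'man nfs'.

```shell
# Report whether a comma-separated NFS option string describes a soft or hard mount.
# Assumption: when neither keyword is present, the mount is treated as hard.
nfs_mount_mode() {
    case ",$1," in
        *,soft,*) echo soft ;;
        *)        echo hard ;;
    esac
}

# Example use against the mount point from this article (left commented out,
# because it only works where /techhome is actually mounted):
# opts=$(findmnt -n -o OPTIONS /techhome) && nfs_mount_mode "$opts"
```

The wrapping commas in the case pattern make sure that only the whole word soft matches, not a longer option that merely contains it.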

** NFS shares hang with the following error(s):

kernel: nfs: server <servername> not responding, still trying

kernel: nfs: server <servername> not responding, timed out
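
When several NFS servers are in use, it can help to see which server the messages refer to and how often. The sketch below is an illustration only: the function name count_nfs_timeouts is hypothetical, and it assumes the log lines contain the message text exactly as shown above.

```shell
# Count "not responding" kernel messages per NFS server from log text on stdin.
# Output: one "<count> <server>" line per server, sorted by server name.
count_nfs_timeouts() {
    sed -n 's/.*nfs: server \([^ ]*\) not responding.*/\1/p' |
        sort | uniq -c | awk '{print $1, $2}'
}
```

Depending on the distribution, the input might come from something like `journalctl -k | count_nfs_timeouts` or `count_nfs_timeouts < /var/log/messages`; where kernel messages are stored varies, so treat both as assumptions.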

Explanation of the Message
If the NFS client does not receive a response from the NFS server, the "server ... not responding, still trying" message may appear in syslog.
Each message indicates that one NFS/RPC request (for example, one NFS WRITE) has been sent retrans times and has timed out each time. With the default values of retrans and timeo (timeo is specified in tenths of a second), this message is printed after 180 seconds. For more information, see the retrans and timeo options in the NFS manual page ('man nfs').
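
The 180-second figure can be reproduced with simple arithmetic. This is a rough sketch under stated assumptions: it uses timeo=600 and retrans=3, values chosen because they are consistent with the 180 seconds quoted above (check 'man nfs' for the defaults on your system), and it assumes every try waits the same fixed timeout with no backoff. The function name is hypothetical.

```shell
# Seconds until the "not responding" message, assuming a fixed per-try timeout.
# timeo is in tenths of a second; retrans is the number of times the request is sent.
nfs_message_delay_secs() {
    timeo=$1
    retrans=$2
    echo $(( timeo * retrans / 10 ))
}

nfs_message_delay_secs 600 3   # prints 180
```

Mounting with a smaller timeo or retrans makes the message appear sooner; it does not change whether a hard mount keeps retrying afterwards.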

Categories of Root Causes:

There are 3 possible categories of root causes:

Problem between the NFS Client and Server
Problem on the NFS Server
Problem on the NFS Client

Problem between the NFS Client and NFS Server
For example, overloaded, mis-configured, or malfunctioning switches, firewalls, or networks may cause NFS requests to get dropped or mangled between the NFS Client and NFS Server.

A problem on the NFS Server
For example, the NFS server is overloaded, or it has a hardware fault or software bug that causes it to drop NFS requests.

A problem on the NFS Client
For example, a networking misconfiguration on the NFS Client, a NIC driver or firmware bug that causes NFS requests to be dropped, or an NFS Client firewall that does not allow NFS traffic in or out.
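
As a starting point for narrowing down the category, two quick probes from the client can be combined. The sketch below is a heuristic first guess, not a diagnosis: the function names are hypothetical, the mapping from probe results to categories is an assumption (for instance, a failed ping can also mean the server itself is down), and it assumes ping and rpcinfo are installed and that the server registers its NFS service with rpcbind.

```shell
# Map two probe results to one of the three categories in this article.
# Arguments are "yes" or "no": is the server reachable, and does its NFS service answer.
classify_nfs_fault() {
    reachable=$1
    service_ok=$2
    if [ "$reachable" != yes ]; then
        echo "between client and server"
    elif [ "$service_ok" != yes ]; then
        echo "server"
    else
        echo "client"
    fi
}

# Probe a server from the client, then classify the result.
nfs_triage() {
    server=$1
    reachable=no
    service_ok=no
    # One ICMP echo with a 2-second wait (Linux ping options).
    ping -c 1 -W 2 "$server" >/dev/null 2>&1 && reachable=yes
    # Ask the server's NFS program to answer over TCP.
    rpcinfo -t "$server" nfs >/dev/null 2>&1 && service_ok=yes
    classify_nfs_fault "$reachable" "$service_ok"
}
```

For example, `nfs_triage host.nfs_server.com` prints the category to look at first. Keeping the classification separate from the probes makes it easy to substitute other checks.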
