What does the "Network Problem Detected" message mean – OSNEXUS Customer Support

Subject:

The QuantaStor system periodically checks the "ifconfig" output for dropped packets and overruns, and generates an alert like the ones below when a problem is detected:

"alert_manager:100} Alert 'Network Problem Detected' : 'Dropped packets were detected on network port bond1"

"alert_manager:100} Alert 'Network Problem Detected' :"Overruns were detected on network port eth1"

Details:

The QuantaStor system checks for dropped packets every 10 minutes and raises an alert if, between discovery cycles, the dropped-packet count has increased by 5000 or more on receive or transmit. Once an alert has been raised, the system will not alert again until another 5000 dropped packets are detected.
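The check described above can be sketched roughly as follows. This is only an illustration of one reading of the description (comparing each 10-minute reading with the previous one); the function name and the sample counter values are hypothetical and are not taken from QuantaStor itself.

```python
# Hypothetical sketch of the dropped-packet check; not QuantaStor source code.
DEFAULT_THRESHOLD = 5000  # threshold stated in the article


def alerting_cycles(readings, threshold=DEFAULT_THRESHOLD):
    """Return the indices of the discovery cycles that would raise an alert.

    `readings` is a list of cumulative dropped-packet counters, one per
    10-minute discovery cycle. A cycle alerts when the counter grew by
    `threshold` or more since the previous cycle.
    """
    alerts = []
    for i in range(1, len(readings)):
        increase = readings[i] - readings[i - 1]
        if increase >= threshold:
            alerts.append(i)
    return alerts


# Made-up counter values: increases are 1200, 5800, 100 and 5000.
sample = [0, 1200, 7000, 7100, 12100]
print(alerting_cycles(sample))
```

With these made-up readings, only the cycles whose increase reaches the threshold are reported; the small increases in between do not alert.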

The alert behavior can be modified by editing the "/etc/quantastor.conf" file, adding the entries below, and changing the value of the "dropped_throttling" variable:

[alerts]
dropped_throttling=5000
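As a quick way to confirm what value such a fragment sets, the entries can be read with Python's standard configparser module. This is a sketch that assumes the fragment is INI-style, as it appears above; whether the complete "/etc/quantastor.conf" file parses the same way is not confirmed here, and the helper name and the 5000 fallback are assumptions for illustration.

```python
import configparser

# The fragment shown in the article; the rest of the file is not known here.
FRAGMENT = """\
[alerts]
dropped_throttling=5000
"""


def read_dropped_throttling(text, default=5000):
    """Return the dropped_throttling value from INI-style text.

    Falls back to `default` (an assumed value) when the section or the
    option is missing.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser.getint("alerts", "dropped_throttling", fallback=default)


print(read_dropped_throttling(FRAGMENT))
```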

Possible cause:

You can run the command 'netstat -ni' to get an overview of errors on all of your network interfaces.

The fields to focus on are 'RX-DRP' and 'RX-OVR'; see below for more detail:

RX-DRP is the number of received packets that the appliance dropped at the network interface. If the ratio of RX-DRP to RX-OK is greater than 0.1%, attention is required.

RX-OVR is the number of times the receiver hardware was unable to hand received data to a hardware buffer: the internal FIFO buffer of the chip is full, but it still tries to handle incoming traffic. Most likely, the input rate of traffic exceeded the ability of the receiver to handle the data.
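The 0.1% rule for RX-DRP can be checked with a short script. The sketch below parses text in the column layout that 'netstat -ni' prints, looking fields up by header name because the exact set of columns varies between net-tools versions. The sample output and its interface counters are made up for illustration, and the function names are hypothetical.

```python
# Made-up sample in the layout printed by 'netstat -ni'.
SAMPLE = """\
Kernel Interface table
Iface   MTU Met   RX-OK RX-ERR RX-DRP RX-OVR   TX-OK TX-ERR TX-DRP TX-OVR Flg
bond1  1500 0   2000000      0   5000      0 1500000      0      0      0 BMmRU
eth1   1500 0   1000000      0    100     12  900000      0      0      0 BMRU
lo    65536 0     50000      0      0      0   50000      0      0      0 LRU
"""


def rx_drop_ratios(netstat_output):
    """Map each interface name to its RX-DRP / RX-OK ratio."""
    lines = [ln for ln in netstat_output.splitlines() if ln.strip()]
    # The column header is the line whose first field is "Iface".
    header_idx = next(i for i, ln in enumerate(lines) if ln.split()[0] == "Iface")
    columns = lines[header_idx].split()
    ratios = {}
    for ln in lines[header_idx + 1:]:
        fields = dict(zip(columns, ln.split()))
        rx_ok = int(fields["RX-OK"])
        rx_drp = int(fields["RX-DRP"])
        # Avoid dividing by zero on interfaces that have received nothing.
        ratios[fields["Iface"]] = rx_drp / rx_ok if rx_ok else 0.0
    return ratios


def needs_attention(ratios, limit=0.001):
    """Return the interfaces whose drop ratio exceeds `limit` (0.1%)."""
    return sorted(name for name, ratio in ratios.items() if ratio > limit)


print(needs_attention(rx_drop_ratios(SAMPLE)))
```

In the made-up sample, bond1 has 5000 drops against 2000000 good packets (0.25%), so it is the only interface flagged; eth1 at 0.01% stays under the limit.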

The "dropped" field counts things like unintended VLAN tags or IPv6 frames received when the interface is not configured for IPv6.
Dropped packets reported by 'ifconfig' or 'netstat' can have many causes.
Cable, hardware, and duplex issues are among the first things to check.

Below are some general reasons for "dropped" packets:

- NIC ring buffers filling up and unable to cope with incoming bursts of traffic
- The CPU that receives NIC interrupts is very busy and unable to process them
- Cable, hardware, or duplex issues
- A bug in the NIC driver

Note:

These errors can also be caused by overloading the network with replication jobs, in which case they can be resolved by throttling the replication rate.

This can be done with the "Replication Rate Limit" throttling mechanism using the steps below:

• The "qs-util rratelimitset NN" command can be used to impose an artificial limit on the replication tasks for the system overall. This is intended to keep disk and network usage below the limit on both the Primary and Secondary (Destination) replication units.

The default Replication Throttle rate is 30MB/s.

Use the "qs-util rratelimitget" command to verify the current setting.

• Please note that this value applies to all replication tasks on a system and is divided in real time among the replication tasks that are currently running.

If you have a rate limit of 200MB/s and two replication tasks are running, each task will be throttled to a maximum of 100MB/s of throughput.

When one of the replication tasks finishes, the remaining replication task will have a maximum throughput limit of 200MB/s. The same applies when more replication tasks are started: if one replication task is running with a 200MB/s limit and three other tasks are started, all four tasks will each have a maximum throughput of 50MB/s.
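The arithmetic behind these examples is an even split of the system-wide limit across the running tasks. A minimal sketch, assuming the split is exactly even as the examples suggest (the function name is hypothetical and this is not how qs-util is invoked):

```python
def per_task_limit(system_limit_mb_s, running_tasks):
    """Return the maximum throughput per replication task in MB/s.

    Assumes the system-wide limit is divided evenly among the tasks
    that are currently running.
    """
    if running_tasks < 1:
        raise ValueError("at least one replication task must be running")
    return system_limit_mb_s / running_tasks


# The cases from the text: a 200MB/s limit with 1, 2 and 4 running tasks.
for tasks in (1, 2, 4):
    print(tasks, per_task_limit(200, tasks))
```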

Please review the link below for additional information:

QuantaStor_Admin_Guide#Remote_Replication_Bandwidth_Throttling
