
Tuxgraphics.org ethernet host watchdog

The software of routers and servers unfortunately fails occasionally. In many cases a reboot remedies the problem for a while.

At regular intervals the host watchdog sends a ping (ICMP echo request) to a host and waits for the reply. It can also just monitor the pings that come from a host. This can even be used to add some application-level checks using scripts.

After 10 intervals with no received ping or ping reply the watchdog resets the host. The watchdog then goes into a "passive" state to avoid rebooting during startup of the host (e.g. interrupting a filesystem check). In this passive state it will not issue a second reset even if the monitored host appears not to respond to ping or not to send pings. Once the host answers, the watchdog goes back to the active state. In the active state it would reset the host again if it suddenly stopped responding.
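The state logic described above can be sketched as a small shell function. This is only an illustration of the behaviour described in the text, not the watchdog firmware. The names `wd_step` and `LIMIT` are hypothetical. Resetting the miss counter to zero after a reset is an assumption, and the "stopped" state is not modelled.

```shell
#!/bin/sh
# Sketch of the active/passive logic described in the text (not the firmware).
# wd_step and LIMIT are hypothetical names chosen for this illustration.
LIMIT=10   # intervals without ping before a reset, as stated in the text

# wd_step STATE MISSED EVENT
#   STATE  : active | passive   (the "stopped" state is not modelled here)
#   MISSED : consecutive intervals without a ping so far
#   EVENT  : ping (a ping or ping reply was seen in this interval) | miss
# Prints "NEWSTATE NEWMISSED ACTION", where ACTION is "reset" or "none".
wd_step() {
    state=$1
    missed=$2
    event=$3
    if [ "$event" = ping ]; then
        # Any sign of life puts the watchdog (back) into the active state.
        echo "active 0 none"
        return 0
    fi
    missed=$((missed + 1))
    if [ "$state" = active ] && [ "$missed" -ge "$LIMIT" ]; then
        # Reset the host once, then stay passive until it answers again.
        # Assumption: the miss counter restarts at zero after a reset.
        echo "passive 0 reset"
    else
        # Either still below the limit, or passive: never reset a second time.
        echo "$state $missed none"
    fi
}
```

For example, `wd_step active 9 miss` prints `passive 0 reset`, while `wd_step passive 20 miss` prints `passive 21 none`: in the passive state no further reset is issued, however long the host stays silent.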

Configuring the watchdog's own IP address

The default settings for the watchdog are:
password: secret
IP address: 10.0.0.27
You can either change these in the code before you compile it or you can change them at run-time.

To change the settings at run-time you must first connect to the watchdog. You can use a straight Ethernet cable and connect the watchdog directly to your PC. Then reconfigure the network interface of your PC to be a host in the same network (e.g. 10.0.0.1).

You can also connect the watchdog somewhere in the local LAN and add a route on your PC. Under Linux that is done with the command:
route add -host 10.0.0.27 dev eth0
Once you are connected, open http://10.0.0.27/, go to the config menu and enter "secret" as "old pw". Change "new pw" and "own IP" as needed.
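On current Linux systems the two ways of connecting described above can also be expressed with the `ip` tool. The following is a minimal sketch under stated assumptions: the interface name is passed as an argument, the /24 prefix is an assumption (the text does not state a netmask), and the helper names `run`, `direct_cable` and `via_lan` are hypothetical. By default the commands are only printed, because running them needs root privileges.

```shell
#!/bin/sh
# Sketch: print (or run) the commands for reaching a watchdog that still
# uses its default address. All function names here are hypothetical.
WD_IP=10.0.0.27   # compile-time default address from the text
PC_IP=10.0.0.1    # example PC address from the text
PREFIX=24         # assumption: the text does not state a netmask

# Print the command; execute it only when DRY_RUN=0 (this needs root).
run() {
    echo "$*"
    if [ "${DRY_RUN:-1}" = 0 ]; then
        "$@"
    fi
}

# Variant 1: direct cable, give the PC an address in the watchdog's network.
direct_cable() {
    run ip addr add "$PC_IP/$PREFIX" dev "$1"
}

# Variant 2: watchdog somewhere in the local LAN, add a host route to it.
via_lan() {
    run ip route add "$WD_IP/32" dev "$1"
}
```

Calling `via_lan eth0` prints `ip route add 10.0.0.27/32 dev eth0`. Setting `DRY_RUN=0` and running as root applies the command instead of only printing it.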

The main configuration page of the watchdog


The status line:
Status: OK or number of missing pings [reset cnt: how often a reset was initiated, stopped|active|passive]

The reset counter goes back to zero when the watchdog is powered down.

The state "stopped" is shown if the watchdog was stopped via the "actions" menu. "Active" means the watchdog is ready to reset the host if needed. "Passive" means the host has not yet been reachable since the last reset (or since the watchdog was powered up again).

Monitored IP is the IP address of the host to watch. Pings from this host are counted as "host alive", and if the box "Send ping" is ticked then pings are also sent to this IP address. It must be an address in the local LAN and cannot be behind a gateway.

The main page updates itself (HTTP refresh) after 60 seconds. An immediate refresh can be done by clicking on "refresh page". Note that requests for web pages take precedence over pings. Clicking "refresh page" too often can actually cause the sending of pings to fail.

Actions page

On the actions page you can trigger an immediate reset of the system with the "reboot host now" button or stop the watchdog with the "stop watchdog now" button. Stopping the watchdog is useful when performing maintenance on the monitored system. In the stopped state the watchdog will not reboot the monitored host.

To start the watchdog again after it was stopped, just go back to the actions page; it will now say "start watchdog now".

Config page

The config page is used to change the watchdog's own password and IP address settings.

Factory default

The watchdog can be reset to its compile-time default settings in the event that you forgot which IP address or password you configured. To do this, power down the watchdog and connect pin PD6 of the microcontroller to GND while powering it up again. The watchdog will then be reachable at http://10.0.0.27/ as described above. Note that you have to actually go to the config page and submit the changes; otherwise the watchdog will fall back (after power down) to the settings it had previously stored.


Inserting a temporary wire bridge to reset to factory defaults.

Thoughts on reliability and DOS attacks

The watchdog is only useful if a failure of the monitored host really causes it to not respond to ping. You can add some more sophisticated checks by only pinging the watchdog from the host (the "Send ping" box not checked). This way you can write a script which does some additional checks on the host:
#!/bin/sh
while true; do

# put your additional checks here (example check webserver is responding):
if w3m -dump_head http://localhost | grep Content-Type > /dev/null; then
    ping -c 1 -q -w 2 10.0.0.27
fi
# end of additional checks

sleep 25
done
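When several checks are needed, the same idea can be organised as a list of check functions, with the ping sent only if all of them pass. The following is a sketch, not part of the original software: the functions `run_checks`, `check_tmp_writable`, `check_disk_not_full` and `feed_watchdog` are hypothetical, and the two example checks and the 99% threshold are assumptions to be replaced with checks that suit your host.

```shell
#!/bin/sh
# Sketch: ping the watchdog only if every health check passes.
# All function names and the example checks are assumptions.
WATCHDOG_IP=10.0.0.27   # default watchdog address from the text

# Example check: a temporary file can still be created and removed.
check_tmp_writable() {
    tmp=$(mktemp) || return 1
    rm -f "$tmp"
}

# Example check: the root filesystem is less than 99% full.
check_disk_not_full() {
    df -P / | awk 'NR == 2 { exit ($5 + 0 >= 99) }'
}

# run_checks CHECK...: succeed only if every named check function succeeds.
run_checks() {
    for check in "$@"; do
        "$check" || { echo "check failed: $check" >&2; return 1; }
    done
}

# Main loop, kept in a function so that sourcing this file has no side effects.
feed_watchdog() {
    while true; do
        if run_checks check_tmp_writable check_disk_not_full; then
            ping -c 1 -q -w 2 "$WATCHDOG_IP" > /dev/null
        fi
        sleep 25
    done
}
```

Calling `feed_watchdog` at the end of the script starts the loop. If any check fails, no ping is sent in that round, so a persistent failure lets the watchdog count up to its limit and reset the host.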
A big problem for servers on the internet are DOS attacks, where usually virus-infected Windows PCs are used to attack a server by overloading it with requests. In such a case the host might not respond to the watchdog either. The chance of this happening is somewhat reduced because the watchdog only triggers after 10 response failures in a row. So make sure that your hosts do not lock up totally when they are attacked. A second target could be the watchdog itself. The best protection is to not allow any external traffic towards the watchdog. This can be done, for example, by using only private IP addresses between host and watchdog or by using a firewall.
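As an illustration of the firewall option, the sketch below assumes a Linux gateway with iptables sitting between the internet and the LAN that holds the watchdog. The function name `watchdog_fw_rules` is hypothetical, and the rules are only printed so they can be reviewed before being applied. They block forwarded traffic to and from the watchdog; hosts in the same LAN can still reach it directly.

```shell
#!/bin/sh
# Sketch: firewall rules for a Linux gateway (assumption: iptables is used).
WATCHDOG_IP=10.0.0.27   # default watchdog address from the text

# Print the rules instead of applying them, so they can be reviewed first.
watchdog_fw_rules() {
    # Drop anything the gateway would forward towards the watchdog ...
    echo "iptables -I FORWARD -d $WATCHDOG_IP -j DROP"
    # ... and anything the watchdog would send out through the gateway.
    echo "iptables -I FORWARD -s $WATCHDOG_IP -j DROP"
}

# After reviewing the output, apply the rules as root on the gateway with:
#   watchdog_fw_rules | sh
```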



© tuxgraphics.org, K. Socher