Watchdog on Linux/Ubuntu
Background
Watchdog timers are commonly found in embedded systems and other computer-controlled equipment where humans cannot easily access the equipment or would be unable to react to faults in a timely manner. In such systems, the computer cannot depend on a human to reboot it if it hangs; it must be self-reliant.
The ODROID-XU3/XU4 kernel supports the s3c2410_wdt module to control the Power Management Unit (PMU).
The s3c2410_wdt driver is built as a loadable module so that the watchdog daemon can configure the driver.
s3c2410_wdt can be loaded with the following configurable parameters:
- tmr_margin - Watchdog tmr_margin in seconds.
- tmr_atboot - Watchdog is started at boot time if set to 1
- nowayout - Watchdog cannot be stopped once started
- soft_noboot - Watchdog action, set to 1 to ignore reboots
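As a minimal sketch of how the parameters above might be set, the helper below prints an options line in the modprobe.d format. The function name wdt_options_line, the example values, and the file name s3c2410_wdt.conf are assumptions made for illustration and are not part of the original setup.

```shell
#!/bin/sh
# Sketch only: print a modprobe.d "options" line for s3c2410_wdt.
# wdt_options_line and all values below are illustrative assumptions.
wdt_options_line() {
    # $1=tmr_margin (seconds) $2=tmr_atboot $3=nowayout $4=soft_noboot
    printf 'options s3c2410_wdt tmr_margin=%s tmr_atboot=%s nowayout=%s soft_noboot=%s\n' \
        "$1" "$2" "$3" "$4"
}

# Preview the line; nothing is written to the system here.
wdt_options_line 10 0 0 0

# To persist it (as root, hypothetical file name):
#   wdt_options_line 10 0 0 0 > /etc/modprobe.d/s3c2410_wdt.conf
# Or pass a parameter once on the command line:
#   sudo modprobe s3c2410_wdt tmr_margin=10
```

Printing the line first makes it easy to review the values before anything is written under /etc/modprobe.d.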
Note that the watchdog driver is available in the Kernel update 3.10.82-52 or higher.
odroid@odroid:~$ uname -a
Linux odroid 3.10.82-52 #1 SMP PREEMPT Thu Aug 27 11:45:33 BRT 2015 armv7l armv7l armv7l GNU/Linux
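The kernel version requirement can also be checked from a script. The following is a rough sketch, assuming GNU sort with the -V option is available (as it is on Ubuntu). The function name kernel_at_least is hypothetical, and only the upstream version is compared, so the -52 build suffix is ignored.

```shell
#!/bin/sh
# Sketch only: check that a kernel release is at least a required version.
# Only the part before the first "-" is compared, so the "-52" build
# suffix is ignored; that simplification is an assumption.
kernel_at_least() {
    have=${1%%-*}
    want=${2%%-*}
    # sort -V (GNU coreutils) orders version strings; the smaller comes first.
    lowest=$(printf '%s\n%s\n' "$have" "$want" | sort -V | head -n 1)
    [ "$lowest" = "$want" ]
}

if kernel_at_least "$(uname -r)" "3.10.82"; then
    echo "kernel is new enough for the watchdog driver"
else
    echo "kernel is older than 3.10.82; update before continuing"
fi
```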
Test Watchdog module
The s3c2410_wdt watchdog driver is configurable on the ODROID-XU3/XU4. Load it with modprobe:
# sudo modprobe s3c2410_wdt
You should be able to see /dev/watchdog and /dev/watchdog0 device files being created.
# ls -l /dev/watchdog*
crw------- 1 root root  10, 130 Aug 28 09:57 /dev/watchdog
crw------- 1 root root 253,   0 Aug 28 09:57 /dev/watchdog0
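The presence of the device files can be verified without opening them, which matters because opening a watchdog device starts its timer. The sketch below only uses the -c file test. The function name check_watchdog_nodes is an assumption for illustration.

```shell
#!/bin/sh
# Sketch only: report whether each path is a character device.
# The -c test does a stat, it does not open the device, so the
# watchdog timer is not started by this check.
check_watchdog_nodes() {
    missing=0
    for node in "$@"; do
        if [ -c "$node" ]; then
            echo "ok: $node"
        else
            echo "missing: $node"
            missing=1
        fi
    done
    return "$missing"
}

check_watchdog_nodes /dev/watchdog /dev/watchdog0 ||
    echo "driver not loaded? try: sudo modprobe s3c2410_wdt"
```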
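The watchdog will trigger and reboot the system if we access the device file manually, because opening the device starts the timer and nothing keeps refreshing it.
# cat /dev/watchdog
[ 7639.726211] watchdog watchdog0: watchdog did not stop!
To stop the watchdog manually and prevent the reboot, write the character V to the device: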
# echo V > /dev/watchdog
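The behaviour described above (opening the device arms the timer, writes refresh it, and a final V disarms it on close) can be sketched as a small shell function. This is an illustration under assumptions: the name pet_watchdog and its arguments are hypothetical, and the demo runs against a scratch file rather than the real device so that it cannot reboot the machine. If the driver is loaded with nowayout=1, the final V will not stop the watchdog.

```shell
#!/bin/sh
# Sketch only: keep a watchdog device alive for a while, then disarm it.
# pet_watchdog DEVICE COUNT INTERVAL is a hypothetical helper.
pet_watchdog() {
    dev=$1; count=$2; interval=$3
    {
        i=0
        while [ "$i" -lt "$count" ]; do
            printf '.' >&3          # any write refreshes the timer
            sleep "$interval"
            i=$((i + 1))
        done
        printf 'V' >&3              # last write contains V: disarm on close
    } 3>"$dev"                      # device stays open for the whole group
}

# Safe demo on a scratch file; this prints "...V".
scratch=$(mktemp)
pet_watchdog "$scratch" 3 0
cat "$scratch"; echo
rm -f "$scratch"

# On real hardware (as root), with an interval shorter than tmr_margin:
#   pet_watchdog /dev/watchdog 10 5
```

Keeping a single file descriptor open for the whole loop is the important design point: closing the device without the V is what produces the "watchdog did not stop!" message.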
Install Watchdog daemon
To install the watchdog daemon:
sudo apt-get install watchdog
Create a directory for the watchdog log files:
sudo mkdir -p /var/log/watchdog
Watchdog daemon configuration files
Note that watchdog drivers should not be loaded automatically, but only if a watchdog daemon is installed. Otherwise the watchdog will activate during boot and the system will restart.
To activate the watchdog driver, remove or comment out the blacklist line for the driver. For ODROID-XU3 or ODROID-XU4, the driver name is s3c2410_wdt.
$ cat /etc/modprobe.d/blacklist-watchdog.conf
blacklist pc87413_wdt
blacklist pcwd
blacklist pcwd_pci
blacklist pcwd_usb
# blacklist s3c2410_wdt
blacklist sa1100_wdt
blacklist sbc60xxwdt
blacklist sbc7240_wdt
blacklist sb8360
blacklist sc1200wdt
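Commenting out the blacklist entry can be scripted instead of edited by hand. The following is a minimal sketch, assuming the entries have the form "blacklist NAME" on a line of their own. The function name unblacklist_module is hypothetical, and the demo works on a scratch copy rather than the real file.

```shell
#!/bin/sh
# Sketch only: comment out "blacklist MODULE" in a modprobe.d file.
# unblacklist_module FILE MODULE is a hypothetical helper. Lines that are
# already commented out do not match, so running it twice is harmless.
unblacklist_module() {
    file=$1; module=$2
    tmp=$(mktemp) || return 1
    sed "s/^[[:space:]]*blacklist[[:space:]][[:space:]]*${module}[[:space:]]*\$/# &/" \
        "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Demo on a scratch copy so the real configuration is untouched.
scratch=$(mktemp)
printf 'blacklist pcwd_usb\nblacklist s3c2410_wdt\nblacklist sa1100_wdt\n' > "$scratch"
unblacklist_module "$scratch" s3c2410_wdt
cat "$scratch"
rm -f "$scratch"

# On the real system (as root):
#   unblacklist_module /etc/modprobe.d/blacklist-watchdog.conf s3c2410_wdt
```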
To configure the watchdog service, enable the module in /etc/default/watchdog:
$ cat /etc/default/watchdog
# Start watchdog at boot time? 0 or 1
run_watchdog=1
# Start wd_keepalive after stopping watchdog? 0 or 1
run_wd_keepalive=1
# Load module before starting watchdog
watchdog_module="s3c2410_wdt"
# Specify additional watchdog options here (see manpage).
watchdog_options="-s -v -c /etc/watchdog.conf"
You need to edit /etc/watchdog.conf and uncomment the watchdog-device line so that the daemon actually uses the /dev/watchdog device provided by the module. Otherwise the watchdog daemon will not use the hardware and will rely only on its internal code to soft-reboot a broken machine.
$ cat /etc/watchdog.conf
#ping = 172.31.14.1
#ping = 172.26.1.255
#interface = eth0
file = /var/log/syslog
#change = 1407

# Uncomment to enable test. Setting one of these values to '0' disables it.
# These values will hopefully never reboot your machine during normal use
# (if your machine is really hung, the loadavg will go much higher than 25)
#max-load-1 = 24
#max-load-5 = 18
#max-load-15 = 12

# Note that this is the number of pages!
# To get the real size, check how large the pagesize is on your machine.
#min-memory = 1
#allocatable-memory = 1

#repair-binary = /usr/sbin/repair
#repair-timeout =
#test-binary =
#test-timeout =

watchdog-device = /dev/watchdog

# Defaults compiled into the binary
#temperature-device =
#max-temperature = 120

# Defaults compiled into the binary
admin = root
interval = 1
logtick = 1
log-dir = /var/log/watchdog

# This greatly decreases the chance that watchdog won't be scheduled before
# your machine is really loaded
realtime = yes
priority = 1

# Check if rsyslogd is still running by enabling the following line
#pidfile = /var/run/rsyslogd.pid

# set watchdog timer
watchdog-timeout = 15

# set heartbeat setting
heartbeat-file = /var/log/watchdog/heartbeat.log
heartbeat-stamps = 300
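A quick way to confirm that the relevant settings are actually active (not commented out) is to read them back from the file. The sketch below assumes the simple "key = value" layout shown in this configuration. The function name conf_value is hypothetical, and the demo uses a small scratch file in place of /etc/watchdog.conf.

```shell
#!/bin/sh
# Sketch only: print the value of an uncommented "key = value" setting.
# conf_value FILE KEY is a hypothetical helper, not part of the watchdog package.
conf_value() {
    awk -v k="$2" '
        /^[ \t]*#/ { next }                     # skip commented-out settings
        {
            pos = index($0, "=")
            if (pos == 0) next
            key = substr($0, 1, pos - 1)
            val = substr($0, pos + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", key)    # trim surrounding whitespace
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            if (key == k) print val
        }' "$1"
}

# Demo on a scratch file; point it at /etc/watchdog.conf on a real system.
sample=$(mktemp)
printf '%s\n' '#watchdog-device = /dev/watchdog' 'interval = 1' 'watchdog-timeout = 15' > "$sample"

if [ -z "$(conf_value "$sample" watchdog-device)" ]; then
    echo "watchdog-device is not set: the hardware watchdog will not be used"
fi
echo "interval=$(conf_value "$sample" interval) timeout=$(conf_value "$sample" watchdog-timeout)"
rm -f "$sample"
```

In the demo the device line is deliberately left commented out, so the warning is printed. With the configuration shown above, the function would return /dev/watchdog.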
For more configuration options, see http://www.sat.dundee.ac.uk/psc/watchdog/watchdog-configure.html
Start Watchdog Service and Verify
For some reason the watchdog service does not start automatically. For now, if the service doesn't start, it can be started with a small hack in /etc/rc.local:
root@odroidxu4m:~# cat /etc/rc.local
#!/bin/sh -e
#
# rc.local
#
# This script is executed at the end of each multiuser runlevel.
# Make sure that the script will "exit 0" on success or any other
# value on error.
#
# In order to enable or disable this script just change the execution
# bits.
#
# By default this script does nothing.

service watchdog restart
exit 0
Verify that the watchdog service is running correctly:
root@odroidxu4m:~# service watchdog status
● watchdog.service - watchdog daemon
   Loaded: loaded (/lib/systemd/system/watchdog.service; static; vendor preset: enabled)
   Active: active (running) since Fri 2015-08-28 10:48:41 UTC; 2s ago
  Process: 4736 ExecStart=/bin/sh -c [ $run_watchdog != 1 ] || exec /usr/sbin/watchdog $watchdog_options (code=exited, status=0/SUCCESS)
 Main PID: 4738 (watchdog)
   CGroup: /system.slice/watchdog.service
           └─4738 /usr/sbin/watchdog -s -v -c /etc/watchdog.conf

Aug 28 10:48:41 odroidxu4m watchdog[4738]: hardware watchdog identity: S3C2410 Watchdog
Aug 28 10:48:41 odroidxu4m systemd[1]: Started watchdog daemon.
Aug 28 10:48:41 odroidxu4m watchdog[4738]: current load is 0 0 0
Aug 28 10:48:41 odroidxu4m watchdog[4738]: was able to ping process 2033 (/var/run/rsyslogd.pid).
Aug 28 10:48:42 odroidxu4m watchdog[4738]: still alive after 1 interval(s)
Aug 28 10:48:42 odroidxu4m watchdog[4738]: current load is 0 0 0
Aug 28 10:48:42 odroidxu4m watchdog[4738]: was able to ping process 2033 (/var/run/rsyslogd.pid).
Aug 28 10:48:43 odroidxu4m watchdog[4738]: still alive after 2 interval(s)
Aug 28 10:48:43 odroidxu4m watchdog[4738]: current load is 0 0 0
Aug 28 10:48:43 odroidxu4m watchdog[4738]: was able to ping process 2033 (/var/run/rsyslogd.pid).