Watchdog on Linux/Ubuntu

Background

Watchdog timers are commonly found in embedded systems and other computer-controlled equipment where humans cannot easily access the equipment or would be unable to react to faults in a timely manner. In such systems, the computer cannot depend on a human to reboot it if it hangs; it must be self-reliant.

The ODROID-XU3/XU4 kernel supports the s3c2410_wdt module to control the Power Management Unit (PMU).

The s3c2410_wdt driver is built as a loadable module so that the watchdog daemon can configure it.

s3c2410_wdt can be loaded with the following configurable parameters:

  • tmr_margin - Watchdog timeout margin in seconds
  • tmr_atboot - Watchdog is started at boot time if set to 1
  • nowayout - Watchdog cannot be stopped once started
  • soft_noboot - Watchdog action, set to 1 to ignore reboots
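
One way to make such parameters persistent is an options line in a modprobe.d file, which modprobe reads whenever it loads the module. The sketch below is illustrative only: the helper name write_wdt_options, the file name /etc/modprobe.d/s3c2410_wdt.conf and the parameter values are assumptions, not taken from this page.

```shell
# Hypothetical helper: write an "options" line for s3c2410_wdt to a
# modprobe.d-style file. The key=value pairs passed in are examples only.
write_wdt_options() {
    conf_file=$1
    shift
    # "$*" joins the remaining arguments (key=value pairs) with spaces.
    printf 'options s3c2410_wdt %s\n' "$*" > "$conf_file"
}

# Example on the board (as root; the file name is an assumption):
#   write_wdt_options /etc/modprobe.d/s3c2410_wdt.conf tmr_margin=15 nowayout=0
# One-off alternative without a file:
#   modprobe s3c2410_wdt tmr_margin=15 nowayout=0
# List the parameters the installed module actually accepts:
#   modinfo -p s3c2410_wdt
```

Whether a given tmr_margin value is accepted depends on the hardware, so check the kernel log after loading the module.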

Note that the watchdog driver is available in kernel update 3.10.82-52 or later.

odroid@odroid:~$ uname -a
Linux odroid 3.10.82-52 #1 SMP PREEMPT Thu Aug 27 11:45:33 BRT 2015 armv7l armv7l armv7l GNU/Linux
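
The version requirement above can be checked in a script. This is a minimal sketch that assumes GNU sort with version ordering (sort -V) is available; the function name version_at_least is made up for this example. It only compares version strings, so modinfo s3c2410_wdt remains the more direct test of whether the module is actually installed.

```shell
# Return success if version $1 is at least version $2.
# Relies on "sort -V" (version sort) from GNU coreutils.
version_at_least() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

if version_at_least "$(uname -r)" "3.10.82-52"; then
    echo "kernel $(uname -r): new enough for the watchdog driver"
else
    echo "kernel $(uname -r): older than 3.10.82-52, update the kernel first"
fi
```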

Test Watchdog module

Load the s3c2410_wdt watchdog driver on the ODROID-XU3/XU4:

# sudo modprobe s3c2410_wdt

The /dev/watchdog and /dev/watchdog0 device files should now exist.

# ls -l /dev/watchdog*
crw------- 1 root root  10, 130 Aug 28 09:57 /dev/watchdog
crw------- 1 root root 253,   0 Aug 28 09:57 /dev/watchdog0
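
The same check can be scripted, for example before starting anything that depends on the watchdog. A minimal sketch; the function name check_char_devices is hypothetical.

```shell
# Report whether each given path is a character device.
# Returns non-zero if any of them is missing or of another type.
check_char_devices() {
    status=0
    for node in "$@"; do
        if [ -c "$node" ]; then
            echo "ok: $node"
        else
            echo "missing: $node"
            status=1
        fi
    done
    return "$status"
}

# Example on the board:
#   check_char_devices /dev/watchdog /dev/watchdog0
```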

Opening the device file manually starts the watchdog timer. If nothing keeps feeding it afterwards, the watchdog triggers and the board reboots.

# cat /dev/watchdog
[ 7639.726211] watchdog watchdog0: watchdog did not stop!

To stop the watchdog manually and prevent the reboot, write the character V to the device:

# echo V > /dev/watchdog
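
The open, feed and stop sequence described above can be sketched as a small shell function. The function name feed_watchdog and its arguments are made up for this example, and it assumes the driver honours the V character on close, as the echo V command above suggests. Run it as root, and only on a board that you can afford to have reboot.

```shell
# Hypothetical helper: keep a watchdog device open, feed it COUNT times at
# INTERVAL-second spacing, then write 'V' before closing so that the timer
# is stopped instead of left running.
feed_watchdog() {
    device=$1
    count=$2
    interval=$3
    {
        i=0
        while [ "$i" -lt "$count" ]; do
            printf '.'          # a write to the device acts as a keepalive
            sleep "$interval"
            i=$((i + 1))
        done
        printf 'V'              # ask the driver to stop the timer on close
    } > "$device"               # the device stays open for the whole block
}

# Example on the board (as root): feed once per second for 10 seconds.
#   feed_watchdog /dev/watchdog 10 1
```

If the module was loaded with nowayout set, the final V should have no effect and the timer keeps running after the function returns.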

Install Watchdog daemon

Install the watchdog daemon:

sudo apt-get install watchdog

Create a directory for the watchdog log files:

sudo mkdir -p /var/log/watchdog

Remove the watchdog module from the blacklist by commenting out its entry in /etc/modprobe.d/blacklist-watchdog.conf:

#blacklist s3c2410_wdt
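
The same edit can be made non-interactively. This is a sketch that assumes GNU sed (for the in-place option) and a blacklist entry written on a line of its own; the function name unblacklist_module is hypothetical.

```shell
# Hypothetical helper: comment out "blacklist MODULE" in a modprobe.d file.
# Uses GNU sed's in-place mode and keeps a backup with a .bak suffix.
unblacklist_module() {
    file=$1
    module=$2
    sed -i.bak "s/^[[:space:]]*blacklist[[:space:]][[:space:]]*${module}[[:space:]]*\$/#&/" "$file"
}

# Example on the board (as root):
#   unblacklist_module /etc/modprobe.d/blacklist-watchdog.conf s3c2410_wdt
```

Running it a second time leaves the file unchanged, because a line that already starts with # no longer matches.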

Add the following default watchdog configuration to /etc/default/watchdog:

# Start watchdog at boot time? 0 or 1
run_watchdog=1
# Start wd_keepalive after stopping watchdog? 0 or 1
run_wd_keepalive=1
# Load module before starting watchdog
watchdog_module=s3c2410_wdt
# Specify additional watchdog options here (see manpage).
watchdog_options="-s -v -c /etc/watchdog.conf"
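
Because this file consists of plain shell variable assignments, the values that will take effect can be inspected by sourcing it in a subshell. A minimal sketch; the function name show_watchdog_defaults is hypothetical, and the variable names are the ones listed above.

```shell
# Source a defaults file in a subshell (so the current shell is not
# modified) and print the settings used on this page.
show_watchdog_defaults() {
    (
        . "$1" || exit 1
        printf 'run_watchdog=%s\n' "${run_watchdog:-unset}"
        printf 'run_wd_keepalive=%s\n' "${run_wd_keepalive:-unset}"
        printf 'watchdog_module=%s\n' "${watchdog_module:-unset}"
        printf 'watchdog_options=%s\n' "${watchdog_options:-unset}"
    )
}

# Example on the board:
#   show_watchdog_defaults /etc/default/watchdog
```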

Watchdog daemon configuration files

Note: The watchdog driver starts automatically because it is built in, but only if a watchdog daemon is present to configure the timers.

You need to edit /etc/watchdog.conf and uncomment the watchdog-device line so that the daemon actually uses the /dev/watchdog device to access the module. Otherwise the daemon will not use the hardware and will rely only on its internal code to soft-reboot a broken machine.

$ cat /etc/watchdog.conf
#ping                   = 172.31.14.1
#ping                   = 172.26.1.255
#interface              = eth0
file                    = /var/log/syslog
#change                 = 1407

# Uncomment to enable test. Setting one of these values to '0' disables it.
# These values will hopefully never reboot your machine during normal use
# (if your machine is really hung, the loadavg will go much higher than 25)
#max-load-1             = 24
#max-load-5             = 18
#max-load-15            = 12

# Note that this is the number of pages!
# To get the real size, check how large the pagesize is on your machine.
#min-memory             = 1
#allocatable-memory     = 1

#repair-binary          = /usr/sbin/repair
#repair-timeout         =
#test-binary            =
#test-timeout           =

watchdog-device = /dev/watchdog

# Defaults compiled into the binary
#temperature-device     =
#max-temperature        = 120

# Defaults compiled into the binary
admin                   = root
interval                = 1
logtick                = 1
log-dir                = /var/log/watchdog

# This greatly decreases the chance that watchdog won't be scheduled before
# your machine is really loaded
#realtime                = yes
#priority                = 1

# Check if rsyslogd is still running by enabling the following line
#pidfile                = /var/run/rsyslogd.pid

# set watchdog timer
watchdog-timeout        = 15

# set heartbeat setting 
heartbeat-file = /var/log/watchdog/heartbeat.log
heartbeat-stamps = 300
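
A quick sanity check of this file can catch the two mistakes that matter most here: a watchdog-device line that is still commented out, and an interval that is not shorter than watchdog-timeout (in which case the hardware would expire before the daemon writes to it). The sketch below is an assumption-laden helper, not part of the watchdog package; the function names conf_get and check_watchdog_conf are hypothetical, and it expects all three keys to be set explicitly, as they are in the listing above.

```shell
# Print the last active (uncommented) value of KEY ($2) in a
# watchdog.conf-style file ($1), with surrounding whitespace removed.
conf_get() {
    sed -n "/^[[:space:]]*$2[[:space:]]*=/{s/^[^=]*=[[:space:]]*//;s/[[:space:]]*\$//;p;}" "$1" | tail -n 1
}

# Hypothetical sanity check for the three settings this page relies on.
check_watchdog_conf() {
    conf=$1
    device=$(conf_get "$conf" watchdog-device)
    interval=$(conf_get "$conf" interval)
    timeout=$(conf_get "$conf" watchdog-timeout)

    if [ -z "$device" ] || [ -z "$interval" ] || [ -z "$timeout" ]; then
        echo "watchdog-device, interval or watchdog-timeout is not set"
        return 1
    fi
    if [ "$interval" -ge "$timeout" ]; then
        echo "interval ($interval) must be smaller than watchdog-timeout ($timeout)"
        return 1
    fi
    echo "device=$device interval=$interval timeout=$timeout"
}

# Example on the board:
#   check_watchdog_conf /etc/watchdog.conf
```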

For more configuration options, see http://www.sat.dundee.ac.uk/psc/watchdog/watchdog-configure.html

Start Watchdog Service and Verify

For some reason, the watchdog service does not start automatically. Until that is resolved, it can be started with a small hack in /etc/rc.local:

root@odroidxu4m:~# cat /etc/rc.local
#!/bin/sh -e
#
# rc.local
#
# This script is executed at the end of each multiuser runlevel.
# Make sure that the script will "exit 0" on success or any other
# value on error.
#
# In order to enable or disable this script just change the execution
# bits.
#
# By default this script does nothing.

service watchdog restart

exit 0
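
A slightly more careful variant restarts the service only when it is not already running. This is a sketch, not the wiki author's method; the function name ensure_service_running is hypothetical, and it assumes that service NAME status returns a non-zero exit code when the service is stopped.

```shell
# Restart the named service only if its status check fails.
# Written as "if ...; then" so it is safe in a script run with "sh -e".
ensure_service_running() {
    if service "$1" status >/dev/null 2>&1; then
        echo "$1 already running"
    else
        service "$1" restart
    fi
}

# Example line for /etc/rc.local, placed before "exit 0":
#   ensure_service_running watchdog
```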

Verify that the watchdog service is running correctly:

root@odroidxu4m:~# service watchdog status
● watchdog.service - watchdog daemon
    Loaded: loaded (/lib/systemd/system/watchdog.service; static; vendor preset: enabled)
    Active: active (running) since Fri 2015-08-28 10:48:41 UTC; 2s ago
    Process: 4736 ExecStart=/bin/sh -c [ $run_watchdog != 1 ] || exec /usr/sbin/watchdog $watchdog_options (code=exited, status=0/SUCCESS)
    Main PID: 4738 (watchdog)
    CGroup: /system.slice/watchdog.service
               └─4738 /usr/sbin/watchdog -s -v -c /etc/watchdog.conf

  Aug 28 10:48:41 odroidxu4m watchdog[4738]: hardware watchdog identity: S3C2410 Watchdog
  Aug 28 10:48:41 odroidxu4m systemd[1]: Started watchdog daemon.
  Aug 28 10:48:41 odroidxu4m watchdog[4738]: current load is 0 0 0
  Aug 28 10:48:41 odroidxu4m watchdog[4738]: was able to ping process 2033 (/var/run/rsyslogd.pid).
  Aug 28 10:48:42 odroidxu4m watchdog[4738]: still alive after 1 interval(s)
  Aug 28 10:48:42 odroidxu4m watchdog[4738]: current load is 0 0 0
  Aug 28 10:48:42 odroidxu4m watchdog[4738]: was able to ping process 2033 (/var/run/rsyslogd.pid).
  Aug 28 10:48:43 odroidxu4m watchdog[4738]: still alive after 2 interval(s)
  Aug 28 10:48:43 odroidxu4m watchdog[4738]: current load is 0 0 0
  Aug 28 10:48:43 odroidxu4m watchdog[4738]: was able to ping process 2033 (/var/run/rsyslogd.pid).
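
The "hardware watchdog identity" line in this output is the useful part: it shows that the daemon opened the hardware device rather than falling back to software-only checks. A minimal sketch for pulling it out of the log text; the function name watchdog_identity is hypothetical, and it assumes the log wording shown above.

```shell
# Read log text on stdin and print the first reported hardware watchdog
# identity, or nothing if no such line is present.
watchdog_identity() {
    sed -n 's/.*hardware watchdog identity: //p' | head -n 1
}

# Examples on the board:
#   service watchdog status | watchdog_identity
#   journalctl -u watchdog | watchdog_identity
```

An empty result may mean that the daemon is not using /dev/watchdog, or simply that the line has scrolled out of the displayed log.
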
en/odroid_linux_watchdog.txt · Last modified: 2022/01/02 22:39 (external edit)
CC Attribution-Share Alike 3.0 Unported