Differences
This shows you the differences between two versions of the page.
en:odroid_linux_watchdog [2016/05/02 16:53] moon.linux [Install Watchdog daemon] |
en:odroid_linux_watchdog [2022/01/02 22:39] |
||
---|---|---|---|
Line 1: | Line 1: | ||
- | ====== Watchdog on Linux/Ubuntu ====== | ||
- | ===== Background ===== | ||
- | Watchdog timers are commonly found in embedded systems and other computer-controlled equipment where humans cannot easily access the equipment or would be unable to react to faults in a timely manner. In such systems, the computer cannot depend on a human to reboot it if it hangs; it must be self-reliant. | ||
- | |||
- | The ODROID-XU3/XU4 kernel supports the s3c2410_wdt module to control the Power Management Unit (PMU). | ||
- | |||
- | The s3c2410_wdt driver is built as a loadable module so that the watchdog daemon can configure the driver. | ||
- | |||
- | s3c2410_wdt can be loaded with the following configurable parameters: | ||
- | * tmr_margin - Watchdog timeout margin in seconds. | ||
- | * tmr_atboot - Watchdog is started at boot time if set to 1 | ||
- | * nowayout - Watchdog cannot be stopped once started | ||
- | * soft_noboot - Watchdog action, set to 1 to ignore reboots | ||
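The parameters listed above can be given on the `modprobe` command line or kept in a modprobe options file. The following is a minimal sketch, not a tested recipe for this board: the helper name `wdt_options_line`, the file name `/etc/modprobe.d/s3c2410_wdt.conf`, and the example values are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original page): print a modprobe
# "options" line for s3c2410_wdt from name=value arguments.
wdt_options_line() {
    printf 'options s3c2410_wdt'
    for kv in "$@"; do
        printf ' %s' "$kv"
    done
    printf '\n'
}

# Illustrative values only: 15-second margin, do not start at boot.
wdt_options_line tmr_margin=15 tmr_atboot=0

# One-off equivalent on the command line (needs root and the real hardware):
#   sudo modprobe s3c2410_wdt tmr_margin=15 tmr_atboot=0
```

To make the settings persistent, the printed line could be written as root to a file under `/etc/modprobe.d/` (the file name above is only a suggestion).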
- | |||
- | |||
- | **Note that the watchdog driver is available in kernel update 3.10.82-52 or later.**\\ | ||
- | <code> | ||
- | odroid@odroid:~$ uname -a | ||
- | Linux odroid 3.10.82-52 #1 SMP PREEMPT Thu Aug 27 11:45:33 BRT 2015 armv7l armv7l armv7l GNU/Linux | ||
- | </code> | ||
- | |||
- | ===== Test Watchdog module ===== | ||
- | <WRAP center round important 100%> | ||
- | The s3c2410_wdt watchdog driver is configurable on the ODROID-XU3/XU4. | ||
- | </WRAP> | ||
- | <code> | ||
- | # sudo modprobe s3c2410_wdt | ||
- | </code> | ||
- | |||
- | After loading the module, the /dev/watchdog and /dev/watchdog0 device files should exist. | ||
- | |||
- | <code> | ||
- | # ls -l /dev/watchdog* | ||
- | crw------- 1 root root 10, 130 Aug 28 09:57 /dev/watchdog | ||
- | crw------- 1 root root 253, 0 Aug 28 09:57 /dev/watchdog0 | ||
- | </code> | ||
- | Opening the device file manually starts the watchdog timer. If the timer is then neither kept alive nor stopped, the watchdog triggers and reboots the board. | ||
- | |||
- | <code> | ||
- | # cat /dev/watchdog | ||
- | [ 7639.726211] watchdog watchdog0: watchdog did not stop! | ||
- | </code> | ||
- | |||
- | To stop the watchdog manually and prevent the reboot, write the character V to the device: | ||
- | |||
- | <code> | ||
- | # echo V > /dev/watchdog | ||
- | </code> | ||
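The manual test above can be extended into a small keepalive loop: open the device once, write to it periodically, then write `V` before closing. This is a minimal sketch under stated assumptions. `pet_watchdog` is a hypothetical helper, not part of the watchdog package, and it relies on the generic Linux watchdog behaviour that any write counts as a keepalive. If the module was loaded with `nowayout` set, the final `V` is not expected to stop the timer.

```shell
#!/bin/sh
# Hypothetical helper (not from the original page): keep a watchdog device
# alive for COUNT intervals, then disarm it with the magic character "V".
# Usage: pet_watchdog DEVICE COUNT INTERVAL_SECONDS
pet_watchdog() {
    dev=$1 count=$2 interval=$3
    if [ ! -w "$dev" ]; then
        echo "cannot write to $dev" >&2
        return 1
    fi
    # Open the device once; on a real watchdog this starts the timer.
    exec 3>"$dev"
    i=0
    while [ "$i" -lt "$count" ]; do
        printf '.' >&3      # any write counts as a keepalive
        sleep "$interval"
        i=$((i + 1))
    done
    printf 'V' >&3          # magic close: ask the driver to stop the timer
    exec 3>&-               # close the device
}

# Example (needs root and the real hardware; keep the interval well below
# the driver's tmr_margin):
#   pet_watchdog /dev/watchdog 10 5
```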
- | ===== Install Watchdog daemon ===== | ||
- | Install the watchdog daemon: | ||
- | <code> | ||
- | sudo apt-get install watchdog | ||
- | </code> | ||
- | |||
- | Create a directory for the watchdog log files: | ||
- | |||
- | <code> | ||
- | sudo mkdir -p /var/log/watchdog | ||
- | </code> | ||
- | |||
- | Remove the watchdog module from the blacklist by commenting out its entry: | ||
- | **/etc/modprobe.d/blacklist-watchdog.conf** | ||
- | <code> | ||
- | #blacklist s3c2410_wdt | ||
- | </code> | ||
- | |||
- | Append the following settings to the default watchdog configuration file: | ||
- | **/etc/default/watchdog** | ||
- | <code> | ||
- | # Start watchdog at boot time? 0 or 1 | ||
- | run_watchdog=1 | ||
- | # Start wd_keepalive after stopping watchdog? 0 or 1 | ||
- | run_wd_keepalive=1 | ||
- | # Load module before starting watchdog | ||
- | watchdog_module="s3c2410_wdt" | ||
- | # Specify additional watchdog options here (see manpage). | ||
- | watchdog_options="-s -v -c /etc/watchdog.conf" | ||
- | |||
- | </code> | ||
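After editing the defaults file, a quick check can confirm that the two settings the service start-up relies on are in place. This is a minimal sketch: `check_watchdog_defaults` is a hypothetical helper introduced here for illustration, and it assumes the file contains only plain shell variable assignments like those shown above.

```shell
#!/bin/sh
# Hypothetical helper (not from the original page): source a defaults file in
# a subshell and check run_watchdog and watchdog_module.
check_watchdog_defaults() {
    (
        run_watchdog='' watchdog_module=''
        . "$1" || exit 2
        if [ "$run_watchdog" != 1 ]; then
            echo "run_watchdog is '$run_watchdog', expected 1"
            exit 1
        fi
        if [ "$watchdog_module" != "s3c2410_wdt" ]; then
            echo "watchdog_module is '$watchdog_module', expected s3c2410_wdt"
            exit 1
        fi
        echo "ok"
    )
}

# Example:
#   check_watchdog_defaults /etc/default/watchdog
```

The subshell keeps the sourced variables from leaking into the calling shell.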
- | |||
- | ===== Watchdog daemon configuration file ===== | ||
- | **Note: The watchdog driver starts automatically because it is built in, but only if a watchdog daemon is present to configure the timer.** | ||
- | |||
- | Edit the **/etc/watchdog.conf** file and uncomment the **watchdog-device = /dev/watchdog** line so that the daemon actually uses the hardware watchdog through the module. Otherwise the daemon will not use the hardware and will rely only on its internal code to soft-reboot a broken machine. | ||
- | |||
- | <code> | ||
- | $ cat /etc/watchdog.conf | ||
- | #ping = 172.31.14.1 | ||
- | #ping = 172.26.1.255 | ||
- | #interface = eth0 | ||
- | file = /var/log/syslog | ||
- | #change = 1407 | ||
- | |||
- | # Uncomment to enable test. Setting one of these values to '0' disables it. | ||
- | # These values will hopefully never reboot your machine during normal use | ||
- | # (if your machine is really hung, the loadavg will go much higher than 25) | ||
- | #max-load-1 = 24 | ||
- | #max-load-5 = 18 | ||
- | #max-load-15 = 12 | ||
- | |||
- | # Note that this is the number of pages! | ||
- | # To get the real size, check how large the pagesize is on your machine. | ||
- | #min-memory = 1 | ||
- | #allocatable-memory = 1 | ||
- | |||
- | #repair-binary = /usr/sbin/repair | ||
- | #repair-timeout = | ||
- | #test-binary = | ||
- | #test-timeout = | ||
- | |||
- | watchdog-device = /dev/watchdog | ||
- | |||
- | # Defaults compiled into the binary | ||
- | #temperature-device = | ||
- | #max-temperature = 120 | ||
- | |||
- | # Defaults compiled into the binary | ||
- | admin = root | ||
- | interval = 1 | ||
- | logtick = 1 | ||
- | log-dir = /var/log/watchdog | ||
- | |||
- | # This greatly decreases the chance that watchdog won't be scheduled before | ||
- | # your machine is really loaded | ||
- | #realtime = yes | ||
- | #priority = 1 | ||
- | |||
- | # Check if rsyslogd is still running by enabling the following line | ||
- | #pidfile = /var/run/rsyslogd.pid | ||
- | |||
- | # set watchdog timer | ||
- | watchdog-timeout = 15 | ||
- | |||
- | # set heartbeat setting | ||
- | heartbeat-file = /var/log/watchdog/heartbeat.log | ||
- | heartbeat-stamps = 300 | ||
- | |||
- | </code> | ||
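In the configuration above, the daemon writes to the device every `interval` seconds, while the hardware timer is set to `watchdog-timeout` seconds, so the interval should be shorter than the timeout. The sketch below checks that relationship in a configuration file. The helper names `conf_value` and `check_watchdog_timing` are hypothetical, and the fallback values (1 and 60 seconds) are assumed daemon defaults that should be checked against the local man page.

```shell
#!/bin/sh
# Hypothetical helper: print the value of KEY from a watchdog.conf-style file
# (last uncommented "key = value" line wins; prints nothing if absent).
conf_value() {
    sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*\(.*[^[:space:]]\)[[:space:]]*\$/\1/p" "$2" | tail -n 1
}

# Hypothetical helper: check that interval is shorter than watchdog-timeout.
check_watchdog_timing() {
    interval=$(conf_value interval "$1")
    timeout=$(conf_value watchdog-timeout "$1")
    # Assumed defaults when a key is absent (verify against your man page).
    interval=${interval:-1}
    timeout=${timeout:-60}
    if [ "$interval" -lt "$timeout" ]; then
        echo "ok: interval=${interval}s < watchdog-timeout=${timeout}s"
    else
        echo "warning: interval=${interval}s >= watchdog-timeout=${timeout}s"
        return 1
    fi
}

# Example:
#   check_watchdog_timing /etc/watchdog.conf
```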
- | |||
- | For more configuration options, see the link below. | ||
- | [[http://www.sat.dundee.ac.uk/psc/watchdog/watchdog-configure.html]] | ||
- | |||
- | ===== Start Watchdog Service and Verify ===== | ||
- | The watchdog service does not always start automatically, for reasons not yet identified. Until that is resolved, it can be started with a small hack in /etc/rc.local: | ||
- | |||
- | <code> | ||
- | root@odroidxu4m:~# cat /etc/rc.local | ||
- | #!/bin/sh -e | ||
- | # | ||
- | # rc.local | ||
- | # | ||
- | # This script is executed at the end of each multiuser runlevel. | ||
- | # Make sure that the script will "exit 0" on success or any other | ||
- | # value on error. | ||
- | # | ||
- | # In order to enable or disable this script just change the execution | ||
- | # bits. | ||
- | # | ||
- | # By default this script does nothing. | ||
- | |||
- | service watchdog restart | ||
- | |||
- | exit 0 | ||
- | </code> | ||
- | |||
- | Verify that the watchdog service is running correctly: | ||
- | |||
- | <code> | ||
- | root@odroidxu4m:~# service watchdog status | ||
- | ● watchdog.service - watchdog daemon | ||
- | Loaded: loaded (/lib/systemd/system/watchdog.service; static; vendor preset: enabled) | ||
- | Active: active (running) since Fri 2015-08-28 10:48:41 UTC; 2s ago | ||
- | Process: 4736 ExecStart=/bin/sh -c [ $run_watchdog != 1 ] || exec /usr/sbin/watchdog $watchdog_options (code=exited, status=0/SUCCESS) | ||
- | Main PID: 4738 (watchdog) | ||
- | CGroup: /system.slice/watchdog.service | ||
- | └─4738 /usr/sbin/watchdog -s -v -c /etc/watchdog.conf | ||
- | |||
- | Aug 28 10:48:41 odroidxu4m watchdog[4738]: hardware watchdog identity: S3C2410 Watchdog | ||
- | Aug 28 10:48:41 odroidxu4m systemd[1]: Started watchdog daemon. | ||
- | Aug 28 10:48:41 odroidxu4m watchdog[4738]: current load is 0 0 0 | ||
- | Aug 28 10:48:41 odroidxu4m watchdog[4738]: was able to ping process 2033 (/var/run/rsyslogd.pid). | ||
- | Aug 28 10:48:42 odroidxu4m watchdog[4738]: still alive after 1 interval(s) | ||
- | Aug 28 10:48:42 odroidxu4m watchdog[4738]: current load is 0 0 0 | ||
- | Aug 28 10:48:42 odroidxu4m watchdog[4738]: was able to ping process 2033 (/var/run/rsyslogd.pid). | ||
- | Aug 28 10:48:43 odroidxu4m watchdog[4738]: still alive after 2 interval(s) | ||
- | Aug 28 10:48:43 odroidxu4m watchdog[4738]: current load is 0 0 0 | ||
- | Aug 28 10:48:43 odroidxu4m watchdog[4738]: was able to ping process 2033 (/var/run/rsyslogd.pid). | ||
- | </code> |