Calling sd_notify(0, "WATCHDOG=1") in a service

I have a systemd service and I want to implement a watchdog for it.
It looks something like this:
[Unit]
Description=Watchdog example service
[Service]
Type=notify
Environment=NOTIFY_SOCKET=/run/%p.sock
ExecStartPre=-/usr/bin/docker kill %p
ExecStartPre=-/usr/bin/docker rm %p
ExecStart=/usr/libexec/sdnotify-proxy /run/%p.sock /usr/bin/docker run \
--env=NOTIFY_SOCKET=/run/%p.sock \
--name %p pranav93/test_watchdogged python hello.py
ExecStop=/usr/bin/docker stop %p
Restart=on-success
WatchdogSec=30s
RestartSec=30s
[Install]
WantedBy=multi-user.target
According to the docs, I have to call sd_notify(0, "WATCHDOG=1") every half of the specified interval (15 s in this case), but I have no idea how to call that function from a service. Help would be highly appreciated.

I had to install the systemd development library:
sudo apt-get install libsystemd-dev
and compile the program, passing the library to the linker:
gcc testWatchDogProcess.c -o testWatchDogProcess -lsystemd
I've made some changes to #rameshrgtvl's code so that it runs directly without any warnings or errors.
#include <systemd/sd-daemon.h>
#include <stdlib.h>
#include <unistd.h>

/* This should be sent once you are done with your initialization. */
/* Until you call this, systemd keeps your service in the "activating" state; */
/* once you call it, systemd changes the status of your service to "active". */
int main(void)
{
    sd_notify(0, "READY=1");

    /* Way to get the WatchdogSec value from the service file */
    char *env;
    int interval = 0;
    int isRun = 1;

    env = getenv("WATCHDOG_USEC");
    if (env)
    {
        /* WATCHDOG_USEC is in microseconds; ping at half the interval */
        interval = atoi(env) / (2 * 1000000);
    }
    if (interval <= 0)
        interval = 1; /* avoid a busy loop if the variable is missing or very small */

    /* Ping systemd once you are done with init */
    sd_notify(0, "WATCHDOG=1");

    /* Now go for periodic notification */
    while (isRun)
    {
        sleep(interval);
        /* Way to ping systemd */
        sd_notify(0, "WATCHDOG=1");
    }
    return 0;
}

Sample Service File
[Unit]
Description=Test watchdog Demo process
DefaultDependencies=false
Requires=basic.target
[Service]
Type=notify
WatchdogSec=10s
ExecStart=/usr/bin/TestWatchDogProcess
StartLimitInterval=5min
StartLimitBurst=5
StartLimitAction=reboot
Restart=always
Sample Code For testWatchDogProcess.c:
#include "systemd/sd-daemon.h"
#include <fcntl.h>
#include <time.h>
/* This should be sent once you are done with your initialization */
/* Until you call this systemd will keep your service as activating status */
/* Once you called, systemd will change the status of ur service to active */
sd_notify (0, "READY=1");
/* Way to get the WatchdogSec value from service file */
env = getenv("WATCHDOG_USEC");
if(env != NULL)
int interval = atoi(env)/(2*1000000);
/* Ping systsemd once you are done with Init */
sd_notify (0, "WATCHDOG=1");
/* Now go for periodic notification */
while(isRun == true)
{
sleep(interval);
/* Way to ping systemd */
sd_notify (0, "WATCHDOG=1");
}
return 0;
}
Note: Depending on your systemd version, take care to include the proper header and link against the proper library during compilation.

sd_notify(0,"WATCHDOG=1") is a API for notifying systemd that your process is working fine.
As Type=notify has been used, sd_notify(0,"WATCHDOG=1") should be called in your application not in service and this must be called at regular interval(before 30 sec as WatchdogSec=30s is mentioned in your service file) so that systemd get notified else,
systemd will think of this as failed service and hence systemd will kill your service and restart it.
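As an additional minimal sketch (assuming libsystemd is installed and the program is linked with -lsystemd), the interval can also be read with sd_watchdog_enabled(), which reports the effective WatchdogSec= value in microseconds; pinging at half that interval satisfies the rule above:
#include <systemd/sd-daemon.h>
#include <stdint.h>
#include <unistd.h>

int main(void)
{
    uint64_t usec = 0;
    /* > 0 means the watchdog is enabled; usec receives the timeout in microseconds. */
    int watchdog = sd_watchdog_enabled(0, &usec);

    sd_notify(0, "READY=1"); /* initialization finished */

    for (;;)
    {
        if (watchdog > 0)
            sd_notify(0, "WATCHDOG=1"); /* keep-alive ping */

        /* ... the service's real work would go here ... */

        /* Sleep for half the watchdog interval (assumes WatchdogSec is at least a few seconds). */
        sleep(watchdog > 0 ? (unsigned int)(usec / 2 / 1000000) : 1);
    }
    return 0;
}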

Related

Pktgen failed to load Lua script - User State for CLI not set for Lua

Running Pktgen with Lua scripts fails with User State for CLI not set for Lua.
The command I'm running is:
sudo -E ./usr/local/bin/pktgen --no-telemetry -l 4,6,8,10 -n 4 -a 0000:03:02.0 -m 1024 -- -T -P -m [6:8].0 -f test/hello-world.lua
Which gives the following result:
I have set the environment variables RTE_SDK=/root/Program/dpdk and
RTE_TARGET=x86_64-native-linux-gcc.
The failure comes from cli.c (link to code):
/**
 * Load and execute a command file or Lua script file.
 */
int
cli_execute_cmdfile(const char *filename)
{
    if (filename == NULL)
        return 0;

    gb_reset_buf(this_cli->gb);

    if (strstr(filename, ".lua") || strstr(filename, ".LUA")) {
        if (!this_cli->user_state) {
            cli_printf(">>> User State for CLI not set for Lua\n");
            return -1;
        }
        if (lua_dofile) {
            /* Execute the Lua script file. */
            if (lua_dofile(this_cli->user_state, filename) != 0)
                return -1;
        } else
            cli_printf(">>> Lua is not enabled in configuration!\n");
    } else {
        FILE *fd;
        char buff[256];

        fd = fopen(filename, "r");
        if (fd == NULL)
            return -1;

        /* Read and feed the lines to the cmdline parser. */
        while (fgets(buff, sizeof(buff), fd))
            cli_input(buff, strlen(buff));

        fclose(fd);
    }
    return 0;
}
So it would seem this_cli->user_state is not set, but how do you set it?
I have looked through the documentation for CLI, but it doesn't mention setting any user state from what I can see.
Update:
After having installed Lua and cmake, I ran meson -Denable_lua=true build which worked fine. But when I run ninja -C build I receive this error:
user_state is set in pktgen-main.c, but within an #ifdef LUA_ENABLED section.
Looking at meson_options.txt, which is in the base directory of an extracted pktgen archive, it contains:
option('enable_lua', type: 'boolean', value: false, description: 'Enable Lua support')
That means that you must enable Lua support when building (after installing dependencies, as shown below):
meson -Denable_lua=true build
This is going to require a few more dependencies, so you may have to install them, e.g. in Ubuntu:
sudo apt-get install cmake
Then install Lua. I had to install it from source as per the instructions, because installing it with Ubuntu's package manager apt didn't provide the library that was needed:
curl -R -O http://www.lua.org/ftp/lua-5.4.4.tar.gz
tar zxf lua-5.4.4.tar.gz
cd lua-5.4.4
make all test
sudo make install

What does Jenkins use to capture stdout and stderr of a shell command?

In Jenkins, you can use the sh step to run Unix shell scripts.
I was experimenting, and I found that stdout is not a tty, at least inside a Docker image.
What does Jenkins use for capturing stdout and stderr of programs running via the sh step? Is the same thing used for running the sh step on a Jenkins node versus on a Docker container?
I ask for my own edification and for some possible practical applications of this knowledge.
To reproduce my experimentation
If you already know an answer, you don't need to read these details for reproducing. I am just adding this here for reproducibility.
I have the following Jenkins/Groovy code:
docker.image('gcc').inside {
    sh '''
        gcc -O2 -Wall -Wextra -Wunused -Wpedantic \
            -Werror write_to_tty.c -o write_to_tty
        ./write_to_tty
    '''
}
The Jenkins log snippet for the sh step code above is
+ gcc -O2 -Wall -Wextra -Wunused -Wpedantic -Werror write_to_tty.c -o write_to_tty
+ ./write_to_tty
stdout is not a tty.
This compiles and runs the following C code:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main() {
    int stdout_fd = fileno(stdout);
    if (!isatty(stdout_fd)) {
        fprintf(stderr, "stdout is not a tty.\n");
        exit(1);
    }

    char* stdout_tty_name = ttyname(stdout_fd);
    if (stdout_tty_name == NULL) {
        fprintf(stderr, "Failed to get tty name of stdout.\n");
        exit(1);
    }

    FILE* tty = fopen(stdout_tty_name, "w");
    if (tty == NULL) {
        fprintf(stderr, "Failed to open tty %s.\n", stdout_tty_name);
        exit(1);
    }

    fprintf(tty, "Written directly to tty.\n");
    fclose(tty);

    printf("Written to stdout.\n");
    fprintf(stderr, "Written to stderr.\n");
    exit(0);
}
I briefly looked at the source here, and it seems stdout is written to a file and then read back from it, hence it's not a tty. Also, stderr output, if any, is written to the log.
Here is the Javadoc.
/**
 * Runs a durable task, such as a shell script, typically on an agent.
 * “Durable” in this context means that Jenkins makes an attempt to keep the external process running
 * even if either the Jenkins controller or an agent JVM is restarted.
 * Process standard output is directed to a file near the workspace, rather than holding a file handle open.
 * Whenever a Remoting connection between the two can be reëstablished,
 * Jenkins again looks for any output sent since the last time it checked.
 * When the process exits, the status code is also written to a file and ultimately results in the step passing or failing.
 * Tasks can also be run on the built-in node, which differs only in that there is no possibility of a network failure.
 */
I found the source for the sh step, which appears to be implemented using the BourneShellScript class.
When not capturing stdout, the command is generated like this:
cmdString = String.format("{ while [ -d '%s' -a \\! -f '%s' ]; do touch '%s'; sleep 3; done } & jsc=%s; %s=$jsc %s '%s' > '%s' 2>&1; echo $? > '%s.tmp'; mv '%s.tmp' '%s'; wait",
        controlDir,
        resultFile,
        logFile,
        cookieValue,
        cookieVariable,
        interpreter,
        scriptPath,
        logFile,
        resultFile, resultFile, resultFile);
If I correctly matched the format specifiers to the variables, then the %s '%s' > '%s' 2>&1 part roughly corresponds to
${interpreter} ${scriptPath} > ${logFile} 2>&1
So it seems that the stdout and stderr of the script are written to a file.
When capturing output, it is slightly different, and is instead
${interpreter} ${scriptPath} > ${c.getOutputFile(ws)} 2> ${logFile}
In this case, stdout and stderr are still written to files, just not to the same file.
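As a small illustration (a standalone sketch, not Jenkins code), the same isatty() check used above reports "not a tty" for any process whose stdout has been redirected to a regular file, which is exactly what the generated command does with > ${logFile}:
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Run as `./check` on a terminal vs. `./check > out.log`:
       when stdout is redirected to a file, isatty() returns 0. */
    if (isatty(fileno(stdout)))
        fprintf(stderr, "stdout is a tty\n");
    else
        fprintf(stderr, "stdout is not a tty (redirected)\n");
    return 0;
}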
Aside
For anyone interested in how I found the source code:
First I used my bookmark for the sh step docs, which can be found with a search engine
I scrolled to the top and clicked on the "View this plugin on the Plugins site" link
On that page, I clicked the GitHub link
On the GitHub repo, I navigated to the ShellStep.java by drilling down from the src directory
I saw that the class was implemented using the BourneShellScript class, and based on the import for this class, I knew it was part of the durable task plugin
I searched the durable task plugin, and followed the same "View this plugin on the Plugins site" link and then GitHub link to view the GitHub repo for the durable task plugin
Next, I used the "Go to file" button and searched for the class name to jump to its .java file
Finally, I inspected this file to find what I was interested in

Why do two threads in NPTL have different PIDs in Ubuntu 12.04?

I tested some code on Ubuntu 12.04 LTS server x64 (3.2 kernel), which I think uses NPTL.
When I run
$ getconf GNU_LIBPTHREAD_VERSION
I get
NPTL 2.15
The following is the test code. I compiled it with gcc -g -Wall -pthread
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>
#include <pthread.h>

void *func1(void *arg)
{
    printf("func1:[%d]%lu\n", getpid(), pthread_self());
    for (;;);
    return (void *)0;
}

int main(void)
{
    printf("main:[%d]%lu\n", getpid(), pthread_self());
    pthread_t tid1;
    pthread_create(&tid1, NULL, func1, NULL);
    void *tret = NULL;
    pthread_join(tid1, &tret);
    return 0;
}
When I run the program, everything seems as expected: both threads print the same PID
$ ./a.out
main:[2107]139745753233152
func1:[2107]139745744897792
But in htop (a top-like tool; you can get it via apt-get) I see this: the two threads have different PIDs
PID Command
2108 ./a.out
2107 ./a.out
If I kill PID 2108, the whole process is killed:
$ kill -9 2108
$./a.out
main:[2107]139745753233152
func1:[2107]139745744897792
Killed
And if I run the program under gdb, I can see the LWP:
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
main:[2183]140737354069760
[New Thread 0x7ffff77fd700 (LWP 2186)]
func1:[2183]140737345738496
I thought NPTL threads share one PID and that LWPs were only for LinuxThreads before kernel 2.6, but the above suggests that NPTL still uses LWPs underneath.
Am I right? I want to know the truth about NPTL and LWPs.
Thanks.
The two threads do share one PID, as shown by your first example.
htop is showing you TIDs (Thread IDs) in the field marked as PID.
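A minimal sketch of what htop is reporting (assuming Linux; gettid() has no glibc wrapper on older systems, so it is called through syscall() here): every thread shares the process PID but has its own kernel thread ID, and htop lists those TIDs in its PID column.
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>
#include <sys/syscall.h>

static void *worker(void *arg)
{
    (void)arg;
    /* Same PID as main, but a distinct kernel TID (the "LWP"). */
    printf("worker: pid=%d tid=%ld\n", getpid(), (long)syscall(SYS_gettid));
    return NULL;
}

int main(void)
{
    printf("main:   pid=%d tid=%ld\n", getpid(), (long)syscall(SYS_gettid));
    pthread_t t;
    pthread_create(&t, NULL, worker, NULL);
    pthread_join(t, NULL);
    return 0;
}
Compile with gcc -pthread; the main thread's TID equals the PID (it is the thread group leader), while each additional thread gets its own TID, which matches the 2107/2108 pair above.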

open: Invalid argument in direct I/O

I was trying to practice AIO via the kernel AIO API. Here is some code:
#define _GNU_SOURCE /* syscall() not POSIX */
#define ALIGN_SIZE 4096
#define RD_WR_SIZE 1024
/* ... */
/* Align the buffer to the page size, as required for direct I/O */
posix_memalign(&buf, ALIGN_SIZE, RD_WR_SIZE);

fd = open("aio_test_file", O_RDWR | O_CREAT | O_DIRECT, 0644);
if (fd == -1) {
    perror("open");
    return -1;
}
/* ... */
But my open() call still fails:
open: Invalid argument
I have searched for that error. Some results say that you must align your buffers when using direct I/O. Using the command:
$ sudo dumpe2fs /dev/sda1 | grep -i "block size"
I found that the block size is 4096. So why does the open() call still fail?
From the man page:
O_DIRECT support was added under Linux in kernel version 2.4.10. Older
Linux kernels simply ignore this flag. Some file systems may not
implement the flag and open() will fail with EINVAL if it is used.
That would be my first guess.
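If the goal is just to keep the program working on filesystems that reject the flag, a small sketch is to detect EINVAL and retry without O_DIRECT (the open_maybe_direct() helper name and file name are made up for illustration):
#define _GNU_SOURCE     /* for O_DIRECT */
#include <fcntl.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: try direct I/O, fall back to buffered I/O on EINVAL. */
static int open_maybe_direct(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT | O_DIRECT, 0644);
    if (fd == -1 && errno == EINVAL) {
        /* Filesystem (or kernel) does not support O_DIRECT here; retry buffered. */
        fd = open(path, O_RDWR | O_CREAT, 0644);
    }
    return fd;
}

int main(void)
{
    int fd = open_maybe_direct("aio_test_file");
    if (fd == -1) {
        perror("open");
        return 1;
    }
    close(fd);
    return 0;
}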

RabbitMQ install issue on CentOS 5.5

I've been trying to get rabbitmq-server-2.4.0 up and running on Centos
5.5 on an Amazon AWS instance.
My instance uses the following kernel: 2.6.18-xenU-ec2-v1.2
I've tried installing Erlang and rabbitmq-server using:
1) yum repos
2) direct rpm installation
3) compiling from source.
In every case, I get the following message when attempting to start the
RabbitMQ-Server process:
pthread/ethr_event.c:98: Fatal error in wait__(): Function not
implemented (38)
Any help would be appreciated.
The problem
When starting erlang, the message pthread/ethr_event.c:98: Fatal error in wait__(): Function not implemented (38) is, on modern distros, most likely the result of a precompiled Erlang binary interacting with a kernel that doesn't implement FUTEX_WAIT_PRIVATE and FUTEX_WAKE_PRIVATE. The kernels Amazon provides for EC2 do not implement these FUTEX_PRIVATE_ macros.
Attempting to build Erlang from source on an ec2 box may fail in the same way if the distro installs kernel headers into /usr/include/linux as a requirement of other packages. (E.g., Centos requires the kernel-headers package as a prerequisite for gcc, gcc-c++, glibc-devel and glibc-headers, among others). Since the headers installed by the package do not match the kernel installed by the EC2 image creation scripts, Erlang incorrectly assumes FUTEX_WAIT_PRIVATE and FUTEX_WAKE_PRIVATE are available.
The fix
To fix it, the fastest way is to manually patch erts/include/internal/pthread/ethr_event.h to use the non-_PRIVATE futex implementation:
#if defined(FUTEX_WAIT_PRIVATE) && defined(FUTEX_WAKE_PRIVATE)
# define ETHR_FUTEX_WAIT__ FUTEX_WAIT_PRIVATE
# define ETHR_FUTEX_WAKE__ FUTEX_WAKE_PRIVATE
#else
# define ETHR_FUTEX_WAIT__ FUTEX_WAIT
# define ETHR_FUTEX_WAKE__ FUTEX_WAKE
#endif
should become
//#if defined(FUTEX_WAIT_PRIVATE) && defined(FUTEX_WAKE_PRIVATE)
//# define ETHR_FUTEX_WAIT__ FUTEX_WAIT_PRIVATE
//# define ETHR_FUTEX_WAKE__ FUTEX_WAKE_PRIVATE
//#else
# define ETHR_FUTEX_WAIT__ FUTEX_WAIT
# define ETHR_FUTEX_WAKE__ FUTEX_WAKE
//#endif
Quick test
If you suspect the private futex issue is your problem, but want to verify it before you recompile all of Erlang, the following program can pin it down:
#include <sys/syscall.h>
#include <unistd.h>
#include <sys/time.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <stdint.h>

typedef uint32_t u32; /* required on older kernel headers to fix a bug in futex.h; delete this line if it causes problems */
#include <linux/futex.h>

int main(int argc, char *argv[])
{
#if defined(FUTEX_WAIT) && defined(FUTEX_WAKE)
    uint32_t i = 1;
    int res = 0;

    res = syscall(__NR_futex, (void *) &i, FUTEX_WAKE, 1,
                  (void *) 0, (void *) 0, 0);
    if (res != 0)
    {
        printf("FUTEX_WAKE HAD ERR %i: %s\n", errno, strerror(errno));
    } else {
        printf("FUTEX_WAKE SUCCESS\n");
    }

    res = syscall(__NR_futex, (void *) &i, FUTEX_WAIT, 0,
                  (void *) 0, (void *) 0, 0);
    if (res != 0)
    {
        printf("FUTEX_WAIT HAD ERR %i: %s\n", errno, strerror(errno));
    } else {
        printf("FUTEX_WAIT SUCCESS\n");
    }
#else
    printf("FUTEX_WAKE and FUTEX_WAIT are not defined.\n");
#endif

#if defined(FUTEX_WAIT_PRIVATE) && defined(FUTEX_WAKE_PRIVATE)
    uint32_t j = 1;
    int res_priv = 0;

    res_priv = syscall(__NR_futex, (void *) &j, FUTEX_WAKE_PRIVATE, 1,
                       (void *) 0, (void *) 0, 0);
    if (res_priv != 0)
    {
        printf("FUTEX_WAKE_PRIVATE HAD ERR %i: %s\n", errno, strerror(errno));
    } else {
        printf("FUTEX_WAKE_PRIVATE SUCCESS\n");
    }

    res_priv = syscall(__NR_futex, (void *) &j, FUTEX_WAIT_PRIVATE, 0,
                       (void *) 0, (void *) 0, 0);
    if (res_priv != 0)
    {
        printf("FUTEX_WAIT_PRIVATE HAD ERR %i: %s\n", errno, strerror(errno));
    } else {
        printf("FUTEX_WAIT_PRIVATE SUCCESS\n");
    }
#else
    printf("FUTEX_WAKE_PRIVATE and FUTEX_WAIT_PRIVATE are not defined.\n");
#endif

    return 0;
}
Paste it into futextest.c, then gcc futextest.c and ./a.out.
If your kernel implements private futexes, you'll see
FUTEX_WAKE SUCCESS
FUTEX_WAIT SUCCESS
FUTEX_WAKE_PRIVATE SUCCESS
FUTEX_WAIT_PRIVATE SUCCESS
If you have a kernel without the _PRIVATE futex functions, you'll see
FUTEX_WAKE SUCCESS
FUTEX_WAIT SUCCESS
FUTEX_WAKE_PRIVATE HAD ERR 38: Function not implemented
FUTEX_WAIT_PRIVATE HAD ERR 38: Function not implemented
This fix should allow Erlang to compile, and will yield an environment you can install rabbitmq against using the --nodeps method discussed here.
I installed it by first building Erlang from source:
sudo yum -y install make gcc gcc-c++ kernel-devel m4 ncurses-devel openssl-devel
wget http://www.erlang.org/download/otp_src_R13B04.tar.gz
tar xfvz otp_src_R13B04.tar.gz
cd otp_src_R13B04/
./configure
sudo make install
After that, create a symbolic link to also make erl available for the root user:
sudo ln -s /usr/local/bin/erl /bin/erl
Install the rabbitmq RPM (this may be outdated; check for the latest release yourself):
wget http://www.rabbitmq.com/releases/rabbitmq-server/v2.4.1/rabbitmq-server-2.4.1-1.noarch.rpm
rpm -Uvh rabbitmq-server-2.4.1-1.noarch.rpm
If Erlang is installed from source, the rpm install of rabbitmq fails to recognize it, stating that erlang R12B-3 is required.
In that case, use:
rpm --nodeps -Uvh rabbitmq-server-2.6.1-1.noarch.rpm
I was able to install and use RabbitMQ 2.6.1 successfully on CentOS 5.6 with Erlang R14B04
It seems that this kernel is not compatible with Erlang R14B, R14B01, or R14B02.
Compiling Erlang R13B04 led to a successful install of rabbitmq-server.
For people in the future finding this answer, the RabbitMQ site itself has a potential answer for you:
Installing on RPM-based Linux (CentOS, Fedora, OpenSuse, RedHat)
Erlang on RHEL 5 (and CentOS 5)
Due to the EPEL package update policy, EPEL 5 contains Erlang version
R12B-5, which is relatively old. rabbitmq-server supports R12B-5, but
performance may be lower than for more recent Erlang versions, and
certain non-core features are not supported (SSL support, HTTP-based
plugins including the management plugin). Therefore, we recommend that
you install the most recent stable version of Erlang. The easiest way
to do this is to use a package repository provided for this purpose by
the owner of the EPEL Erlang package. Enable it by invoking (as root):
wget -O /etc/yum.repos.d/epel-erlang.repo http://repos.fedorapeople.org/repos/peter/erlang/epel-erlang.repo
and then install or update erlang with yum install erlang.
If you go down the route of building Erlang manually on a minimal OS install, you may also find that you need to install wxGTK & wxGTK-devel in order for all of the tests to build and run correctly.