Why two threads in NPTL have different pid in Ubuntu12.04 - pthreads

I tested some code on Ubuntu 12.04 LTS server x64 (3.2 kernel), which I think is using NPTL.
When I run
$ getconf GNU_LIBPTHREAD_VERSION
I get
NPTL 2.15
The following is the test code. I compiled it with gcc -g -Wall -pthread
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>
#include <pthread.h>
void *func1(void *arg)
{
printf("func1:[%d]%lu\n", getpid(), pthread_self());
for(;;);
return (void *)0;
}
int main(void)
{
printf("main:[%d]%lu\n", getpid(), pthread_self());
pthread_t tid1;
pthread_create(&tid1, NULL, func1, NULL);
void *tret = NULL;
pthread_join(tid1, &tret);
return 0;
}
When I run the program, everything looks as expected: the two threads have the same PID
$ ./a.out
main:[2107]139745753233152
func1:[2107]139745744897792
But in htop (a top-like tool, available via apt-get) I see this: the two threads have different PIDs
PID Command
2108 ./a.out
2107 ./a.out
If I kill PID 2108, the whole process is killed
$ kill -9 2108
$./a.out
main:[2107]139745753233152
func1:[2107]139745744897792
Killed
And if I run the program through gdb, I can see LWP
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
main:[2183]140737354069760
[New Thread 0x7ffff77fd700 (LWP 2186)]
func1:[2183]140737345738496
I thought NPTL threads share one PID and that LWPs belonged to LinuxThreads, before kernel 2.6.
The above suggests that NPTL still uses LWPs underneath.
Am I right? I want to understand how NPTL and LWPs relate.
Thanks.

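The two threads do share one PID, as your first example shows: getpid() returns the same value in both.
What htop shows in the field marked PID is the TID (thread ID) of each thread.
NPTL is a 1:1 threading library, so every pthread is still a kernel task (what gdb calls an LWP) with its own TID. The tasks of one process share a thread group ID, and that is the value getpid() reports. The main thread's TID equals it, which is why 2107 appears in both listings. kill -9 2108 ends the whole program because SIGKILL always terminates the entire process, whichever thread it is addressed to.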
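To see the PID/TID distinction from inside a program, here is a minimal sketch. It assumes Linux with glibc and reads the kernel thread ID through the raw gettid system call; the helper names current_tid and tid_of_new_thread are made up for this illustration. Build with gcc -pthread.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Kernel thread ID (the LWP number gdb prints) of the calling thread. */
long current_tid(void)
{
    return (long)syscall(SYS_gettid);
}

/* Thread body: store this thread's TID where the creator can read it. */
static void *record_tid(void *arg)
{
    *(long *)arg = current_tid();
    return NULL;
}

/* Starts one thread, waits for it, and returns the TID it observed,
 * or -1 if the thread could not be created. */
long tid_of_new_thread(void)
{
    pthread_t thread;
    long tid = -1;

    if (pthread_create(&thread, NULL, record_tid, &tid) != 0)
        return -1;
    pthread_join(thread, NULL);
    return tid;
}
```

Called from the main thread, current_tid() should equal getpid(), while tid_of_new_thread() should return a different number. That second number is the kind of value htop lists in its PID column.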

What does Jenkins use to capture stdout and stderr of a shell command?

In Jenkins, you can use the sh step to run Unix shell scripts.
I was experimenting, and I found that the stdout is not a tty, at least on a Docker image.
What does Jenkins use for capturing stdout and stderr of programs running via the sh step? Is the same thing used for running the sh step on a Jenkins node versus on a Docker container?
I ask for my own edification and for some possible practical applications of this knowledge.
To reproduce my experimentation
If you already know an answer, you don't need to read these details for reproducing. I am just adding this here for reproducibility.
I have the following Jenkins/Groovy code:
docker.image('gcc').inside {
sh '''
gcc -O2 -Wall -Wextra -Wunused -Wpedantic \
-Werror write_to_tty.c -o write_to_tty
./write_to_tty
'''
}
The Jenkins log snippet for the sh step code above is
+ gcc -O2 -Wall -Wextra -Wunused -Wpedantic -Werror write_to_tty.c -o write_to_tty
+ ./write_to_tty
stdout is not a tty.
This compiles and runs the following C code:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int main() {
int stdout_fd = fileno(stdout);
if (!isatty(stdout_fd)) {
fprintf(stderr, "stdout is not a tty.\n");
exit(1);
}
char* stdout_tty_name = ttyname(stdout_fd);
if (stdout_tty_name == NULL) {
fprintf(stderr, "Failed to get tty name of stdout.\n");
exit(1);
}
FILE* tty = fopen(stdout_tty_name, "w");
if (tty == NULL) {
fprintf(stderr, "Failed to open tty %s.\n", stdout_tty_name);
exit(1);
}
fprintf(tty, "Written directly to tty.\n");
fclose(tty);
printf("Written to stdout.\n");
fprintf(stderr, "Written to stderr.\n");
exit(0);
}
I briefly looked at the source here, and it seems stdout is written to a file and then read back from it; hence it's not a tty. Also, stderr, if any, will be written to the log.
Here is the Javadoc.
/**
 * Runs a durable task, such as a shell script, typically on an agent.
 * “Durable” in this context means that Jenkins makes an attempt to keep the external process running
 * even if either the Jenkins controller or an agent JVM is restarted.
 * Process standard output is directed to a file near the workspace, rather than holding a file handle open.
 * Whenever a Remoting connection between the two can be reëstablished,
 * Jenkins again looks for any output sent since the last time it checked.
 * When the process exits, the status code is also written to a file and ultimately results in the step passing or failing.
 * Tasks can also be run on the built-in node, which differs only in that there is no possibility of a network failure.
 */
I found the source for the sh step, which appears to be implemented using the BourneShellScript class.
When not capturing stdout, the command is generated like this:
cmdString = String.format("{ while [ -d '%s' -a \\! -f '%s' ]; do touch '%s'; sleep 3; done } & jsc=%s; %s=$jsc %s '%s' > '%s' 2>&1; echo $? > '%s.tmp'; mv '%s.tmp' '%s'; wait",
controlDir,
resultFile,
logFile,
cookieValue,
cookieVariable,
interpreter,
scriptPath,
logFile,
resultFile, resultFile, resultFile);
If I correctly matched the format specifiers to the variables, then the %s '%s' > '%s' 2>&1 part roughly corresponds to
${interpreter} ${scriptPath} > ${logFile} 2>&1
So it seems that the stdout and stderr of the script is written to a file.
When capturing output, it is slightly different, and is instead
${interpreter} ${scriptPath} > ${c.getOutputFile(ws)} 2> ${logFile}
In this case, stdout and stderr are still written to files, just not to the same file.
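To illustrate why the test program reports that stdout is not a tty, here is a small C sketch of what a redirection like > logFile 2>&1 does to a child process. This is not Jenkins code; the function name child_stdout_is_tty and the log path you pass in are placeholders for this example.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child whose stdout and stderr are redirected to log_path,
 * the way "> log_path 2>&1" would do it in a shell.
 * Returns 1 if the child sees a tty on stdout, 0 if not, -1 on error. */
int child_stdout_is_tty(const char *log_path)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        int fd = open(log_path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
        if (fd < 0)
            _exit(2);
        /* Point both standard streams at the log file. */
        if (dup2(fd, STDOUT_FILENO) < 0 || dup2(fd, STDERR_FILENO) < 0)
            _exit(2);
        if (fd > STDERR_FILENO)
            close(fd);
        /* Report through the exit status what isatty() says now. */
        _exit(isatty(STDOUT_FILENO) ? 1 : 0);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    int code = WEXITSTATUS(status);
    return code == 2 ? -1 : code;
}
```

With a regular file as the target, the child's isatty(STDOUT_FILENO) is false no matter what terminal the parent had, which matches the "stdout is not a tty." line in the Jenkins log.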
Aside
For anyone interested in how I found the source code:
First I used my bookmark for the sh step docs, which can be found with a search engine
I scrolled to the top and clicked on the "View this plugin on the Plugins site" link
On that page, I clicked the GitHub link
On the GitHub repo, I navigated to the ShellStep.java by drilling down from the src directory
I saw that the class was implemented using the BourneShellScript class, and based on the import for this class, I knew it was part of the durable task plugin
I searched the durable task plugin, and followed the same "View this plugin on the Plugins site" link and then GitHub link to view the GitHub repo for the durable task plugin
Next, I used the "Go to file" button and searched for the class name to jump to its .java file
Finally, I inspected this file to find what I was interested in

Debug printf performance when compiled with clang cfi

Setup
I have a simple helloworld program:
// content of main.c
#include <stdio.h>
#include <limits.h>
int main() {
for (int i = 0; i < INT_MAX; ++i) {
printf("simply helloworld!\n");
}
return 0;
}
I compile a baseline version with clang 13.0.0 using clang -flto=thin -fvisibility=hidden -fuse-ld=lld main.c
To experiment with CFI, I compile another version using clang -flto=thin -fsanitize=cfi -fsanitize-cfi-cross-dso -fno-sanitize-cfi-canonical-jump-tables -fsanitize-trap=cfi -fvisibility=hidden -fuse-ld=lld main.c
Expectation
I expect negligible performance overhead, since I am only calling into a shared library that should run the same code for both builds. The disassembly of the main function looks the same in both binaries.
Reality
The baseline version completes execution in ~27s, while the CFI version takes ~32s. Using perf stat -e instructions <binary> I can see that the CFI version runs ~100,000,000,000 more instructions. With perf record followed by perf diff, I can see that the difference is primarily in two functions, _pthread_cleanup_push_defer and _pthread_cleanup_pop_restore, which only the CFI version runs. In gdb, I can see these functions being called deeper in the printf call stack.
Question
How do I begin to explain the performance difference between these two binaries? What makes a simple call to printf call two different versions of itself for two different binaries?

Why doesn't frida-trace find functions in Ubuntu/GCC binaries that it finds on MacOS/Clang compiles?

Here is a simple program:
int
fx(int a)
{
a += 20;
return a;
}
int
main(int argc, char *argv[])
{
return fx(fx(10));
}
I compile this on macOS (Big Sur) with Clang, and trace it:
0 ✓ [11:21:19 Fri Aug 27] ~/nobackup/frida/02
% gcc -g -O0 test.c
0 ✓ [11:21:24 Fri Aug 27] ~/nobackup/frida/02
% frida-trace ./a.out -i 'a.out!*'
Instrumenting...
fx: Auto-generated handler at "/Users/pt/nobackup/frida/02/__handlers__/a.out/fx.js"
main: Auto-generated handler at "/Users/pt/nobackup/frida/02/__handlers__/a.out/main.js"
Started tracing 2 functions. Press Ctrl+C to stop.
/* TID 0x103 */
100 ms main()
100 ms | fx()
100 ms | fx()
Process terminated
1 ✗ [11:21:31 Fri Aug 27] ~/nobackup/frida/02
Perfect. It created the __handlers__ JavaScript and everything.
However, I do the same thing on Ubuntu with gcc, and Frida doesn't find the functions:
pt@serval:~/frida$ gcc -g -O0 test.c
pt@serval:~/frida$ frida-trace ./a.out -i 'a.out!*'
Started tracing 0 functions. Press Ctrl+C to stop.
Process terminated
...but they are in the symbol table with objdump -t, and I can find them by walking the modules in the Frida JavaScript API.
What is the magic compiler switch I am missing? I tried visibility and export symbols with no luck.

Calling a sd_notify(0, "WATCHDOG=1") in a service

I have a systemd service, and I want to implement a watchdog for it.
The unit file looks something like this:
[Unit]
Description=Watchdog example service
[Service]
Type=notify
Environment=NOTIFY_SOCKET=/run/%p.sock
ExecStartPre=-/usr/bin/docker kill %p
ExecStartPre=-/usr/bin/docker rm %p
ExecStart=/usr/libexec/sdnotify-proxy /run/%p.sock /usr/bin/docker run \
--env=NOTIFY_SOCKET=/run/%p.sock \
--name %p pranav93/test_watchdogged python hello.py
ExecStop=/usr/bin/docker stop %p
Restart=on-success
WatchdogSec=30s
RestartSec=30s
[Install]
WantedBy=multi-user.target
According to the docs, I have to call sd_notify(0, "WATCHDOG=1") at intervals of half the configured timeout (in this case, every 15s). But I have no idea how to call that function in a service. Help would be highly appreciated.
I had to install systemd lib:
sudo apt-get install libsystemd-dev
And compile the program, linking against it:
gcc testWatchDogProcess.c -o testWatchDogProcess -lsystemd
I've made some changes to @rameshrgtvl's code so that it compiles and runs directly, without warnings or errors.
#include <systemd/sd-daemon.h>
#include <stdlib.h> /* getenv, strtoull */
#include <unistd.h> /* sleep */
int main(void)
{
    /* Send this once initialization is done. Until then systemd keeps the */
    /* service in "activating" state; afterwards it changes to "active". */
    sd_notify(0, "READY=1");
    /* systemd passes the WatchdogSec value to the service as WATCHDOG_USEC (microseconds) */
    unsigned int interval = 0;
    const char *env = getenv("WATCHDOG_USEC");
    if (env)
    {
        /* Ping at half the watchdog timeout, in whole seconds */
        interval = (unsigned int)(strtoull(env, NULL, 10) / (2 * 1000000ULL));
    }
    if (interval == 0)
    {
        /* Variable unset or timeout under 2s: avoid a busy loop */
        interval = 1;
    }
    /* Ping systemd once initialization is done */
    sd_notify(0, "WATCHDOG=1");
    /* Now go for periodic notification */
    for (;;)
    {
        sleep(interval);
        /* Way to ping systemd */
        sd_notify(0, "WATCHDOG=1");
    }
    return 0;
}
Sample Service File
[Unit]
Description=Test watchdog Demo process
DefaultDependencies=false
Requires=basic.target
[Service]
Type=notify
WatchdogSec=10s
ExecStart=/usr/bin/TestWatchDogProcess
StartLimitInterval=5min
StartLimitBurst=5
StartLimitAction=reboot
Restart=always
Sample Code For testWatchDogProcess.c:
#include "systemd/sd-daemon.h"
#include <stdlib.h> /* getenv, atoi */
#include <unistd.h> /* sleep */
int main(void)
{
    int interval = 1; /* fallback when WATCHDOG_USEC is not set */
    char *env;
    /* This should be sent once you are done with your initialization. */
    /* Until you call this, systemd keeps your service in "activating" status. */
    /* Once you have called it, systemd changes the status of your service to "active". */
    sd_notify(0, "READY=1");
    /* Way to get the WatchdogSec value from the service file */
    env = getenv("WATCHDOG_USEC");
    if (env != NULL)
        interval = atoi(env) / (2 * 1000000);
    if (interval < 1)
        interval = 1; /* never sleep(0) in the loop below */
    /* Ping systemd once you are done with init */
    sd_notify(0, "WATCHDOG=1");
    /* Now go for periodic notification */
    while (1)
    {
        sleep(interval);
        /* Way to ping systemd */
        sd_notify(0, "WATCHDOG=1");
    }
    return 0;
}
Note: Depending on your systemd version, take care to include the proper header and library during compilation.
sd_notify(0, "WATCHDOG=1") is an API for notifying systemd that your process is working fine.
Since Type=notify is used, sd_notify(0, "WATCHDOG=1") must be called from your application, not from the service file, and it must be called at regular intervals (more often than every 30 seconds, because WatchdogSec=30s is set in your service file).
Otherwise systemd will treat the service as failed, kill it, and restart it.
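The samples above round the half interval down to whole seconds. As a rough sketch of the "ping at half the timeout" rule with microsecond precision, the helper below parses a WATCHDOG_USEC-style string. The function name watchdog_ping_interval_usec is invented for this example and is not part of libsystemd.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parses a WATCHDOG_USEC-style string (microseconds) and returns half of it,
 * i.e. how long to wait between WATCHDOG=1 pings.
 * Returns 0 when the value is missing or is not a plain decimal number. */
uint64_t watchdog_ping_interval_usec(const char *watchdog_usec)
{
    if (watchdog_usec == NULL || !isdigit((unsigned char)watchdog_usec[0]))
        return 0;

    char *end = NULL;
    errno = 0;
    unsigned long long timeout = strtoull(watchdog_usec, &end, 10);
    if (errno != 0 || *end != '\0')
        return 0;

    return (uint64_t)(timeout / 2);
}
```

In a service you would pass getenv("WATCHDOG_USEC") and sleep for the returned number of microseconds between pings, treating 0 as "no watchdog configured". libsystemd also offers sd_watchdog_enabled(), which reports the configured timeout for you, if you would rather not parse the variable yourself.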

RabbitMQ install issue on Centos 5.5

I've been trying to get rabbitmq-server-2.4.0 up and running on Centos
5.5 on an Amazon AWS instance.
My instance uses the following kernel: 2.6.18-xenU-ec2-v1.2
I've tried installation of erlang and rabbitmq-server using:
1) yum repos
2) direct rpm installation
3) compiling from source.
In every case, I get the following message when attempting to start the
RabbitMQ-Server process:
pthread/ethr_event.c:98: Fatal error in wait__(): Function not
implemented (38)
Any help would be appreciated.
The problem
When starting Erlang, the message pthread/ethr_event.c:98: Fatal error in wait__(): Function not implemented (38) is, on modern distros, most likely the result of a precompiled Erlang binary running on a kernel that doesn't implement the FUTEX_WAIT_PRIVATE and FUTEX_WAKE_PRIVATE operations. The kernels Amazon provides for EC2 do not implement these _PRIVATE futex operations.
Attempting to build Erlang from source on an ec2 box may fail in the same way if the distro installs kernel headers into /usr/include/linux as a requirement of other packages. (E.g., Centos requires the kernel-headers package as a prerequisite for gcc, gcc-c++, glibc-devel and glibc-headers, among others). Since the headers installed by the package do not match the kernel installed by the EC2 image creation scripts, Erlang incorrectly assumes FUTEX_WAIT_PRIVATE and FUTEX_WAKE_PRIVATE are available.
The fix
The fastest fix is to manually patch erts/include/internal/pthread/ethr_event.h to use the non-_PRIVATE futex operations:
#if defined(FUTEX_WAIT_PRIVATE) && defined(FUTEX_WAKE_PRIVATE)
# define ETHR_FUTEX_WAIT__ FUTEX_WAIT_PRIVATE
# define ETHR_FUTEX_WAKE__ FUTEX_WAKE_PRIVATE
#else
# define ETHR_FUTEX_WAIT__ FUTEX_WAIT
# define ETHR_FUTEX_WAKE__ FUTEX_WAKE
#endif
should become
//#if defined(FUTEX_WAIT_PRIVATE) && defined(FUTEX_WAKE_PRIVATE)
//# define ETHR_FUTEX_WAIT__ FUTEX_WAIT_PRIVATE
//# define ETHR_FUTEX_WAKE__ FUTEX_WAKE_PRIVATE
//#else
# define ETHR_FUTEX_WAIT__ FUTEX_WAIT
# define ETHR_FUTEX_WAKE__ FUTEX_WAKE
//#endif
Quick test
If you suspect the private futex issue is your problem, but want to verify it before you recompile all of Erlang, the following program can pin it down:
#include <sys/syscall.h>
#include <unistd.h>
#include <sys/time.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <stdint.h>
typedef uint32_t u32; /* Required on older kernel headers to work around a bug in futex.h. Delete this line if it causes problems. */
#include <linux/futex.h>
int main(int argc, char *argv[])
{
#if defined(FUTEX_WAIT) && defined(FUTEX_WAKE)
uint32_t i = 1;
int res = 0;
res = syscall(__NR_futex, (void *) &i, FUTEX_WAKE, 1,
(void*)0,(void*)0, 0);
if (res != 0)
{
printf("FUTEX_WAKE HAD ERR %i: %s\n", errno, strerror(errno));
} else {
printf("FUTEX_WAKE SUCCESS\n");
}
res = syscall(__NR_futex, (void *) &i, FUTEX_WAIT, 0,
(void*)0,(void*)0, 0);
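/* i is 1 but the expected value passed is 0, so an implemented FUTEX_WAIT returns EAGAIN at once instead of blocking; only other errors count as failures */
if (res != 0 && errno != EAGAIN)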
{
printf("FUTEX_WAIT HAD ERR %i: %s\n", errno, strerror(errno));
} else {
printf("FUTEX_WAIT SUCCESS\n");
}
#else
printf("FUTEX_WAKE and FUTEX_WAIT are not defined.\n");
#endif
#if defined(FUTEX_WAIT_PRIVATE) && defined(FUTEX_WAKE_PRIVATE)
uint32_t j = 1;
int res_priv = 0;
res_priv = syscall(__NR_futex, (void *) &j, FUTEX_WAKE_PRIVATE, 1,
(void*)0,(void*)0, 0);
if (res_priv != 0)
{
printf("FUTEX_WAKE_PRIVATE HAD ERR %i: %s\n", errno, strerror(errno));
} else {
printf("FUTEX_WAKE_PRIVATE SUCCESS\n");
}
res_priv = syscall(__NR_futex, (void *) &j, FUTEX_WAIT_PRIVATE, 0,
(void*)0,(void*)0, 0);
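/* Same as above: EAGAIN means FUTEX_WAIT_PRIVATE is implemented and simply refused to block */
if (res_priv != 0 && errno != EAGAIN)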
{
printf("FUTEX_WAIT_PRIVATE HAD ERR %i: %s\n", errno, strerror(errno));
} else {
printf("FUTEX_WAIT_PRIVATE SUCCESS\n");
}
#else
printf("FUTEX_WAKE_PRIVATE and FUTEX_WAIT_PRIVATE are not defined.\n");
#endif
return 0;
}
Paste it into futextest.c, then run gcc futextest.c followed by ./a.out.
If your kernel implements private futexes, you'll see
FUTEX_WAKE SUCCESS
FUTEX_WAIT SUCCESS
FUTEX_WAKE_PRIVATE SUCCESS
FUTEX_WAIT_PRIVATE SUCCESS
If you have a kernel without the _PRIVATE futex functions, you'll see
FUTEX_WAKE SUCCESS
FUTEX_WAIT SUCCESS
FUTEX_WAKE_PRIVATE HAD ERR 38: Function not implemented
FUTEX_WAIT_PRIVATE HAD ERR 38: Function not implemented
This fix should allow Erlang to compile, and will yield an environment you can install rabbitmq against using the --nodeps method discussed here.
I installed it by first installing erlang by source:
sudo yum -y install make gcc gcc-c++ kernel-devel m4 ncurses-devel openssl-devel
wget http://www.erlang.org/download/otp_src_R13B04.tar.gz
tar xfvz otp_src_R13B04.tar.gz
cd otp_src_R13B04/
./configure
make
sudo make install
After that create a symbolic link to also make erl available for root user:
sudo ln -s /usr/local/bin/erl /bin/erl
Install the rabbitmq rpm (this may be outdated; check for the latest release yourself):
wget http://www.rabbitmq.com/releases/rabbitmq-server/v2.4.1/rabbitmq-server-2.4.1-1.noarch.rpm
rpm -Uvh rabbitmq-server-2.4.1-1.noarch.rpm
If erlang is installed from source, rpm install of rabbitmq fails to recognize erlang stating that erlang R12B-3 is required.
Use:
rpm --nodeps -Uvh rabbitmq-server-2.6.1-1.noarch.rpm
I was able to install and use RabbitMQ 2.6.1 successfully on CentOS 5.6 with Erlang R14B04
It seems that this kernel is not compatible with Erlang R14B, R14B01, or R14B02.
Compiling Erlang R13B04 led to a successful install of rabbitmq-server.
For people in the future finding this answer, the RabbitMQ site itself has a potential answer for you:
Installing on RPM-based Linux (CentOS, Fedora, OpenSuse, RedHat)
Erlang on RHEL 5 (and CentOS 5)
Due to the EPEL package update policy, EPEL 5 contains Erlang version
R12B-5, which is relatively old. rabbitmq-server supports R12B-5, but
performance may be lower than for more recent Erlang versions, and
certain non-core features are not supported (SSL support, HTTP-based
plugins including the management plugin). Therefore, we recommend that
you install the most recent stable version of Erlang. The easiest way
to do this is to use a package repository provided for this purpose by
the owner of the EPEL Erlang package. Enable it by invoking (as root):
wget -O /etc/yum.repos.d/epel-erlang.repo http://repos.fedorapeople.org/repos/peter/erlang/epel-erlang.repo
and then install or update erlang with yum install erlang.
If you go down the route of building Erlang manually on a minimal OS install, you may also find that you need to install wxGTK & wxGTK-devel in order for all of the tests to build and run correctly.
