Bad file descriptor on pthread_detach - pthreads

My pthread_detach calls fail with a "Bad file descriptor" error. The calls are in the destructor for my class and look like this -
if(pthread_detach(get_sensors) != 0)
printf("\ndetach on get_sensors failed with error %m", errno);
if(pthread_detach(get_real_velocity) != 0)
printf("\ndetach on get_real_velocity failed with error %m", errno);
I have only ever dealt with this error when using sockets. What could be causing this to happen in a pthread_detach call that I should look for? Or is it likely something in the thread callback that could be causing it? Just in case, the callbacks look like this -
void* Robot::get_real_velocity_thread(void* threadid) {
Robot* r = (Robot*)threadid;
r->get_real_velocity_thread_i();
}
inline void Robot::get_real_velocity_thread_i() {
while(1) {
usleep(14500);
sensor_packet temp = get_sensor_value(REQUESTED_VELOCITY);
real_velocity = temp.values[0];
if(temp.values[1] != -1)
real_velocity += temp.values[1];
} //end while
}
/*Callback for get sensors thread*/
void* Robot::get_sensors_thread(void* threadid) {
Robot* r = (Robot*)threadid;
r->get_sensors_thread_i();
} //END GETSENSORS_THREAD
inline void Robot::get_sensors_thread_i() {
while(1) {
usleep(14500);
if(sensorsstreaming) {
unsigned char receive;
int read = 0;
read = connection.PollComport(port, &receive, sizeof(unsigned char));
if((int)receive == 19) {
read = connection.PollComport(port, &receive, sizeof(unsigned char));
unsigned char rest[54];
read = connection.PollComport(port, rest, 54);
/* ***SET SENSOR VALUES*** */
//bump + wheel drop
sensor_values[0] = (int)rest[1];
sensor_values[1] = -1;
//wall
sensor_values[2] = (int)rest[2];
sensor_values[3] = -1;
...
...
lots more setting just like the two above
} //end if header == 19
} //end if sensors streaming
} //end while
} //END GET_SENSORS_THREAD_I
Thank you for any help.

The pthread_* functions return an error code; they do not set errno. (Well, they may of course, but not in any way that is documented.)
Your code should print the value returned by pthread_detach and print that.
Single Unix Spec documents two return values for this function: ESRCH (no thread by that ID was found) and EINVAL (the thread is not joinable).
Detaching threads in the destructor of an object seems silly. Firstly, if they are going to be detached eventually, why not just create them that way?
If there is any risk that the threads can use the object that is being destroyed, they need to be stopped, not detached. I.e. you somehow indicate to the threads that they should shut down, and then wait for them to reach some safe place after which they will not touch the object any more. pthread_join is useful for this.
Also, it is a little late to be doing that from the destructor. A destructor should only be run when the thread executing it is the only thread with a reference to that object. If threads are still using the object, then you're destroying it from under them.

Related

Debugging Fatal Error - alloc: invalid block: 0000000001F00AEF0: 0 0

I have a GUI written in R that utilizes Tcl/TK package as well a C .dll that also uses Tcl library. I have done some research on this issue, and it seems to be memory related. I am an inexperienced programmer, so I am not sure where I should be looking for this memory issue. Each call of malloc() has a matching free(), and same with the analogous Tcl_Alloc() and Tcl_Free(). This error is very hard to reproduce as well, thus I am afraid I cannot provide a reproducible example as it is seemingly random in nature. One pattern is however that it seems to only happen upon closure of the program, though this is very inconsistent.
By making this post, I am hoping to gain a logical process that one should take in an attempt to debug this problem in a general context under Tcl/Tk - C - R applications. I am not looking for a solution specific to my code, but rather what an individual should think about when encountering this problem.
The message comes from the function Ptr2Block() in tclThreadAlloc.c (or there's something else about which produces the same error message; possible but unlikely) which is Tcl's thread-specific memory allocator (which is used widely inside Tcl to reduce the number of times global locks are hit). Specifically, it's this bit:
if (blockPtr->magicNum1 != MAGIC || blockPtr->magicNum2 != MAGIC) {
Tcl_Panic("alloc: invalid block: %p: %x %x",
blockPtr, blockPtr->magicNum1, blockPtr->magicNum2);
}
The problem? Those zeroes should be MAGIC (which is equal to 0xEF). This indicates that something has overwritten the memory block's metadata — which also should include the size of the block, but that is now likely hot garbage — and program memory integrity can no longer be trusted. Alas, at this point we're now dealing with a program in a broken state where the breakage happened some time previously; the place where the panic happened is merely where detection of the bug happened, not the actual location of the bug.
Debugging further is usually done by building a version of everything with fancy memory allocators turned off (in Tcl's code, this is done by defining the PURIFY symbol when building) and then running the resulting code — which hopefully still has the bug — with a tool like electricfence or purify (hence the special symbol name) to see what sort of out-of-bounds errors are found; they're very good at hunting down this sort of issue.
I would advise you to start by having a closer look to the sizeof() values provided to your Tcl_Alloc() calls in this C .dll.
I'm writing myself a Tcl binding for a C library and I faced recently exactly the same problem and therefore I'm assuming you may have the same error than me in your code.
Here below a minimal example that reproduces the problem:
#include <tcl.h>
#include <stdlib.h> // malloc
static unsigned int dataCtr;
struct tDataWrapper {
const char *str; // Tcl_GetCommandName(interp, cmd)
unsigned int n; // dataCtr value
void *data; // pointer to wrapped object
};
static void wrapDelCmd(ClientData clientData)
{
struct tDataWrapper *wrap = (struct tDataWrapper *) clientData;
if (wrap != NULL) {
/* with false sizeof value provided while creating the wrapper
* (see above), this data pointer would overwrite the
* overhead section of the allocated tcl memory block
* from what I understood and this is what can be causing
* the panic with message like following one when the
* memory is freed with ckfree (here after calling unload)
* alloc: invalid block: 0000018F2624E760: 0 0 */
printf("DEBUG: #%s(%s) &wrap->data #%p\n",
__func__, wrap->str, &wrap->data);
if (wrap->data != NULL) {
// call your wrapped API to deinstantiate the object
}
ckfree(wrap);
}
}
static int wrapCmd(ClientData clientData, Tcl_Interp *interp,
int objc, Tcl_Obj *const objv[])
{
struct tDataWrapper *wrap = (struct tDataWrapper *) clientData;
if (wrap == NULL)
return TCL_ERROR;
else if (wrap->data != NULL) {
// call your wrapped API to do something with instantiated object
return TCL_OK;
} else {
Tcl_Obj *obj = Tcl_ObjPrintf("wrap: {str=\"%s\", n=%u, data=%llx}",
wrap->str, wrap->n, (unsigned long long) wrap->data);
if (obj != NULL) {
Tcl_SetObjResult(interp, obj);
return TCL_OK;
} else
return TCL_ERROR;
}
}
static int newCmd(ClientData clientData, Tcl_Interp *interp,
int objc, Tcl_Obj *const objv[])
{
struct tDataWrapper *wrap;
Tcl_Obj *obj;
Tcl_Command cmd;
// 3) this is correct
// if ((wrap = attemptckalloc(sizeof(struct tDataWrapper))) == NULL)
// 2) still incorrect but GCC gives more warning regarding the inconsistent pointer handling
// if ((wrap = malloc(sizeof(struct tDataWrapper *))) == NULL)
// 1) this is incorrect
if ((wrap = attemptckalloc(sizeof(struct tDataWrapper *))) == NULL)
Tcl_Panic("%s:%u: attemptckalloc failed\n", __func__, __LINE__);
else if ((obj = Tcl_ObjPrintf("data%u", dataCtr+1)) == NULL)
Tcl_Panic("%s:%u: Tcl_ObjPrintf failed\n", __func__, __LINE__);
else if ((cmd = Tcl_CreateObjCommand(interp, Tcl_GetString(obj),
wrapCmd, (ClientData) wrap, wrapDelCmd)) == NULL)
Tcl_Panic("%s:%u: Tcl_CreateObjCommand failed\n", __func__, __LINE__);
else {
wrap->str = Tcl_GetCommandName(interp, cmd);
wrap->n = dataCtr;
wrap->data = NULL; // call your wrapped API to instantiate an object
dataCtr++;
Tcl_SetObjResult(interp, obj);
}
return TCL_OK;
}
int Allocinvalidblock_Init(Tcl_Interp *interp)
{
dataCtr = 0;
return (Tcl_CreateObjCommand(interp, "new",
newCmd, (ClientData) NULL, NULL)
== NULL) ? TCL_ERROR : TCL_OK;
}
int Allocinvalidblock_Unload(Tcl_Interp *interp, int flags)
{
Tcl_Namespace *ns = Tcl_GetGlobalNamespace(interp);
Tcl_Obj *obj;
Tcl_Command cmd;
unsigned int i;
for(i=0; i<dataCtr; i++) {
if ((obj = Tcl_ObjPrintf("data%u", i+1)) != NULL) {
if ((cmd = Tcl_FindCommand(interp,
Tcl_GetString(obj), ns, TCL_GLOBAL_ONLY)) != NULL)
Tcl_DeleteCommandFromToken(interp, cmd);
Tcl_DecrRefCount(obj);
}
}
return TCL_OK;
}
Once built (for example with Code::Blocks as shared library project linking against C:/msys64/mingw64/lib/libtcl.dll.a), the error can be triggered when more than a data object is created and the library immediately unloaded:
load bin/Release/libAllocInvalidBlock.dll
new
new
unload bin/Release/libAllocInvalidBlock.dll
If used otherwise the crash may even be not triggered... Anyway, such an error in the C code is not particularly obvious to identify (although easy to fix) because the compilation is running without any warning (although -Wall compiler flag is set).

Thread handling in TCP server in C

This is my first post here, so I'd like to say hello to everyone.
I am facing some problems with writing a TCP, I want to have a separate thread that allows user to type quit instruction to terminate process. The problem is that it doesn't seem to be running. The programme pauses on accept and I am unable to pass anything to the thread function.
The thread function:
void *loop_stop()
{
while(progr_control != 'q')
progr_control = getchar();
return NULL;
}
And the main (I ommited some code I think is not causing problems):
pthread_t thread_id;
pthread_create(&thread_id, NULL, loop_stop, NULL);
do
{
listen(sock_fd, 5);
int addr_len = sizeof(client_addr);
if( (new_sock_fd = accept(sock_fd, (struct sockaddr *) &client_addr, &addr_len)) < 0)
{
perror("Problems with incoming connection");
return -1;
}
else
//here is the ommited part-as I've said this loop stops at accept
}while(progr_control != 'q');
If anyone could please find the bug, or suggest other way of handling with the task, I'd be grateful
As Brian mentioned:
'accept' will make the thread wait for incomming connection, so no "quit check" is possible. The alternative is to use ctrl+c to quit instead.

MemFault in GC when calling back a closure from C

I working with Keil, MDK-ARM Pro 4.71 for a Cortex-M3 target(STM32F107).
I have compiled the Lua interpreter and a Lua "timer" module that interfaces the chip's timers. I'd like to call a lua function when the timer is elapsed.
Here is a sample use :
t = timer.open()
t.event = function() print("Bing !") end
t:start()
Until here, everything works fine :-) ! I see the "Bing !" message being printed each time the timer elapses.
Now if I use a closure :
t = timer.open()
i = 0
t.event = function() i = i + 1; print(i); end
t:start()
I'm having a bad memory access in the GC after some amount of timer's updates. Since it's an embedded context with very few memory, I may be running out of memory quite fast if there is a leak.
Here is the "t.event" setter (ELIB_TIMER is a C structure representing my timer) :
static int ElibTimerSetEvent(lua_State* L)
{
ELIB_TIMER* pTimer_X = ElibCheckTimer(L, 1, TRUE);
if (pTimer_X->LuaFuncKey_i != LUA_REFNIL)
{
luaL_unref(L, LUA_REGISTRYINDEX, pTimer_X->LuaFuncKey_i);
pTimer_X->LuaFuncKey_i = LUA_REFNIL;
}
if (!lua_isnil(L, 2))
{
pTimer_X->LuaFuncKey_i = luaL_ref(L, LUA_REGISTRYINDEX);
}
return 0;
}
And here is the native callback implementation :
static void ElibTimerEventHandler(SYSEVT_HANDLE Event_H)
{
ELIB_TIMER* pTimer_X = (ELIB_TIMER*)SWLIB_SYSEVT_GetSideData(Event_H);
lua_State* L = pTimer_X->L;
int i = lua_gettop(L);
if (pTimer_X->LuaFuncKey_i != LUA_REFNIL)
{
lua_rawgeti(L, LUA_REGISTRYINDEX, pTimer_X->LuaFuncKey_i);
lua_call(L, 0, 0);
lua_settop(L, i);
}
}
This is synchronized externally, so this isn't a synchronization issue.
Am I doing something wrong ?
EDIT
Here is the callstack (with a lua_pcall instead of lua_call, but it is the same). The first line is my hard fault handler.
I have found the problem ! I ran out of stack (native stack, not Lua) space :p.
I guess this specific script was causing a particularly long call stack. After increasing the allocated memory for my native stack, the problem is gone. On the opposite, if I reduce it, I can't even initialize the interpreter.
Many thanks to those who tried to help here.
Found a bug in your C code. You broke the lua stack in static int ElibTimerSetEvent(lua_State* L)
luaL_ref will pop the value on the top of Lua stack: http://www.lua.org/manual/5.2/manual.html#luaL_ref
So you need to copy the value to be refed, before the call to luaL_ref:
lua_pushvalue(L, 2); // push the callback to stack top, and then it will be consumed by luaL_ref()
Please fix this and try again.

a call to GetMessage causes a thread to stop

I am writing a multi-threaded application using Borland C++ (Delphi Forms). I have recently learned that I can use Windows' Messaging Service within these classes when I call the PostThreadMessage() function:
System = new STSystem(SystemName,1000,1,NULL);
while (PostThreadMessage(System->ThreadID,ST_MSG_SYSTEM_INIT,0,0) == 0)
{
Sleep(0);
};
The above seems to work just fine. The issue lies on the retrieval end of this process inside of the Thread Execution function:
void __fastcall STSystem::Execute()
{
ST_Message STMSG;
while(FStatus != Destroyed)
{
FHeartBeat++;
if(GetMessage(MSG,NULL,ST_MSG_SYSTEM_START,ST_MSG_SYSTEM_END))
{
STMSG.Value = MSG->wParam;
if((STMSG.dSYS + (8*STMSG.dSEC) + (64*STMSG.dDEP)) == FSystemID)
{
RXMessages[RxQueueIn++] = STMSG.MSG; // Message
RXMessages[RxQueueIn++] = MSG->lParam; // Data
}
}
if(TaskList->Count>0)
ProcessTask();
if(RxQueueIn!=RxQueueOut)
ProcessRxMessage();
if(TxQueueIn!=TxQueueOut)
ProcessTxMessage();
Sleep(0);
};
}
The above works for about two thread cycles and then stops; the thread stops, not the program. I have tried using the PeekMessage() function instead of the GetMessage() function in the IF clause following the FHeartbeat++ counter. This prevents the thread from stopping however, the INIT message sent in the first block of code is still not found.
I hope this example is not too specific. I have tried to leave in anything that was pertinent. Basically, this is a message pump for a class that has no window.
GetMessage() blocks the calling thread when there are no messages to retrieve. Like Luis said, you need to make sure the thread has a message queue before you start posting messages to it, and you need to check the return value of PostThreadMessage() for failures. A message queue is not created in a thread until any user32.dll function is called within the thread for the first time. For example:
System = new STSystem(SystemName,1000,1,NULL);
while (!System->Ready)
Sleep(100);
if (!PostThreadMessage(System->ThreadID,ST_MSG_SYSTEM_INIT,0,0))
{
DWORD err = GetLastError();
//...
}
void __fastcall STSystem::Execute()
{
// create a message queue
PeekMessage(MSG, NULL, 0, 0, PM_NOREMOVE);
Ready = true;
ST_Message STMSG;
while(FStatus != Destroyed)
{
FHeartBeat++;
if(GetMessage(MSG,NULL,ST_MSG_SYSTEM_START,ST_MSG_SYSTEM_END)) // or PeekMessage()
{
STMSG.Value = MSG->wParam;
if((STMSG.dSYS + (8*STMSG.dSEC) + (64*STMSG.dDEP)) == FSystemID)
{
RXMessages[RxQueueIn++] = STMSG.MSG; // Message
RXMessages[RxQueueIn++] = MSG->lParam; // Data
}
}
if(TaskList->Count>0)
ProcessTask();
if(RxQueueIn!=RxQueueOut)
ProcessRxMessage();
if(TxQueueIn!=TxQueueOut)
ProcessTxMessage();
Sleep(0);
};
}
To send messages to a tread it must have a message queue, in the remarks section of this link:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms644946(v=vs.85).aspx
you can fine the steps reqired to create a message queue for a thread.
By the way, if PostThreadMessage returns 0 (FALSE), there is an error and you must check the value returned by GetLastError.

pthread_Join can't return after call pthread_cancel?

I used Eclispse Indigo + CDT 8.0.2 + cygwin to develope a multi-thread systerm, the code is below:
pthread_mutex_t mutexCmd = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t signalCmd = PTHREAD_COND_INITIALIZER;
void * Func(void * arg)
{
int iStatus;
while (1)
{
int a = 1;
pthread_cleanup_push(pthread_mutex_unlock, &mutexCmd);
pthread_mutex_lock(&mutexCmd);
iStatus = pthread_cond_wait(&signalCmd, &mutexCmd);
if (iStatus) {
err_abort(iStatus, "signalCmd status error");
}
if(arg->Cmd != navi_Go) //Just a command tag;
{
pthread_mutex_unlock(&(pNaviCtrl->mutexCmd));
continue;
}
//do some work
//.....
pthread_mutex_unlock(&mutexCmd);
pthread_cleanup_pop(1);
}
//pthread_detach(pthread_self());
return NULL;
}
int main()
{
int iStatus = 0;
pthread = tid;
iStatus = pthread_create(&tid;NULL, Func, NULL);
if(iStatus)
{
err_abort(iStatus, "Start pthread error");
}
// do some work
...
//Cancel thread
void * retval;
iStatus = pthread_cancel(tid)
iStatus = pthread_join(tid; &retval);
if(iStatus){
err_abort(iStatus,"Stop thread error");
}
return iStatus;
}
where program run, it stop at "iStatus = pthread_join(tid1; &retval);" couldn't go forward anymore, I think the thread could be happed to deadlock, but can't find the reason. I supposed after call pthread_cancel(), the thread will exit and return to the pthread_join(),
who can tell me what's wrong with my code?
Don't put cleanup_push and _pop inside the while loop. Don't call them more than once. If you look at them, they are macros that wrap the code between them in { }. They setup a longjump that is used when you call pthread_cancel.
pthread_cleanup_pop(1) tells the pthread library to not only pop the cleanup entry off the stack, but to also execute it. So that call will also implicitly call:
pthread_mutex_unlock(&mutexCmd);
Since you've already unlocked the mutex, that call has undefined behavior (assuming the mutex type is PTHREAD_MUTEX_NORMAL). I imagine that call is just never returning or something.
Note that your code has other problems handing the cleanup - if you execute the continue for the loop, you'll call pthread_cleanup_push() a second time (or more), which will add another cleanup context.
There may be other problems (I'm not very familiar with pthread_cancel()).

Resources