Multithreading in Ubuntu 11.10 VMware - pthreads

I'm writing a multithreaded program in C++ using the pthread library, but when I run it on an Ubuntu virtual machine, my threads don't seem to run in parallel, although I have a multicore processor (i7-2630QM). The real code is too long, so I'll illustrate the problem with this simple example:
#include <iostream>
#include <pthread.h>
using namespace std;
void* myfunction(void* arg); //function that the thread will execute
int main()
{
pthread_t thread;
pthread_create(&thread, NULL, myfunction, NULL); //thread created
for (int i=0; i<10; i++) //show "1" 10 times
cout << "1";
pthread_join(thread, NULL); //wait for the thread to finish executing
return 0;
}
void* myfunction(void* arg)
{
for (int j=0; j<10; j++) //show "2" 10 times
cout << "2";
return NULL;
}
When I run this code on my host OS (Windows 7 with VC++ 2010), I get output like 12212121211121212..., which is what a multithreaded app is supposed to do. But when I run the same code on the guest OS (Ubuntu on VMware with Code::Blocks), I always get 11111111112222222222!
AFAIK the thread should run in parallel with the main() function, not sequentially. My VM's core count is set to 4, but it seems that the program uses only one core. I don't know what is wrong. Is it the code or something else? Am I missing something here?
I appreciate any help. Thanks in advance, and please forgive my English.

Use a semaphore (global or passed as a parameter to myfunction) to synchronise the threads properly. You're probably running into timing issues due to the different scheduling characteristics of the two OSes (it's not really an issue per se). Have the first thread wait on the semaphore after the call to pthread_create, and have the new thread signal it as soon as myfunction is entered.
Also, your loops are quite short; make them run longer. With only 10 iterations, the main thread can finish its loop before the new thread has even started.
example here
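A minimal sketch of the handshake described above, written in C and assuming POSIX unnamed semaphores as found on Linux. The helper name run_handshake_demo and the semaphore name started are hypothetical and not taken from the question; only myfunction comes from the original code.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

static sem_t started;   /* posted by the new thread as soon as it runs */

static void *myfunction(void *arg)
{
    (void)arg;
    sem_post(&started);             /* signal main: this thread has been entered */
    for (int j = 0; j < 10; j++)
        fputc('2', stdout);
    return NULL;
}

/* Hypothetical helper: returns 0 on success, 1 if setup failed. */
int run_handshake_demo(void)
{
    pthread_t thread;

    if (sem_init(&started, 0, 0) != 0)
        return 1;
    if (pthread_create(&thread, NULL, myfunction, NULL) != 0) {
        sem_destroy(&started);
        return 1;
    }

    /* Wait until the new thread is running; retry if a signal interrupts. */
    while (sem_wait(&started) == -1 && errno == EINTR)
        continue;

    for (int i = 0; i < 10; i++)
        fputc('1', stdout);

    /* Joining a valid, joinable thread is not expected to fail. */
    int rc = pthread_join(thread, NULL);
    assert(rc == 0);
    (void)rc;

    fputc('\n', stdout);
    sem_destroy(&started);
    return 0;
}
```

Call run_handshake_demo() from main() and build with -pthread. The handshake only guarantees that main's loop does not begin before the new thread has started; the exact interleaving of 1s and 2s is still up to the scheduler and the stdio buffering.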

Windows and Linux CRT/STDC++ libs have different synchronization behaviors. You can't learn much of anything about parallel execution with calls to cout. Write some actual parallel computation and measure elapsed time to tell what's going on.
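One way to follow this advice is sketched below in C, under the assumption that a POSIX system with clock_gettime() is available. The names spin, now_seconds and time_workers, and the constants NUM_THREADS and ITERATIONS, are hypothetical choices for illustration. Each thread does CPU-bound work with no I/O, and the helper reports the wall-clock time for the whole batch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <time.h>

#define NUM_THREADS 4             /* assumption: matches the 4 cores given to the VM */
#define ITERATIONS  20000000ULL   /* arbitrary amount of work per thread */

/* CPU-bound busy work: no I/O, no locks, no shared state. */
static void *spin(void *arg)
{
    uint64_t *result = arg;
    uint64_t acc = 0;
    for (uint64_t i = 0; i < ITERATIONS; i++)
        acc += i ^ (acc >> 3);
    *result = acc;                /* store the result so the loop is not discarded */
    return NULL;
}

static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Runs nthreads workers at once; returns elapsed seconds, or -1.0 on error. */
double time_workers(int nthreads)
{
    pthread_t threads[NUM_THREADS];
    uint64_t results[NUM_THREADS];
    int created = 0;

    if (nthreads < 1 || nthreads > NUM_THREADS)
        return -1.0;

    double start = now_seconds();
    while (created < nthreads &&
           pthread_create(&threads[created], NULL, spin, &results[created]) == 0)
        created++;

    for (int i = 0; i < created; i++) {
        int rc = pthread_join(threads[i], NULL);
        assert(rc == 0);
        (void)rc;
    }
    double elapsed = now_seconds() - start;

    return created == nthreads ? elapsed : -1.0;
}
```

Compare time_workers(1) with time_workers(4). If the threads really run on separate cores, the four-thread run should take roughly as long as the single-thread run; if they are serialized on one core, it should take about four times as long.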

Queues in FreeRTOS

I'm using a Freescale FRDM-KL25Z board with CodeWarrior 10.6. My goal is to write a small FreeRTOS program that reads the voltage from a thermistor through the analog-to-digital converter (0-3.3 V) and, depending on this voltage, turns LEDs on or off. It worked until I added a second task and queues. I think the problem might be the stack size, but I have no idea how to configure it.
Code is below:
xQueueHandle queue_led;
void TaskLed (void *p)
{
uint16_t temp_val;
xQueueReceive(queue_led, &temp_val, 1);
if (temp_val<60000)
{
LED_1_Neg();
}
}
void TaskTemp (void *p)
{
uint16_t temp_val;
(void)AD1_Measure(TRUE);
(void)AD1_GetValue16(&temp_val);
xQueueSendToBack(queue_led, &temp_val, 1000);
FRTOS1_vTaskDelay(1000);
}
Code in main():
xTaskCreate(TaskLed, (signed char *)"tl", 200, NULL, 1, NULL);
xTaskCreate(TaskTemp, (signed char *)"tt", 200, NULL, 1, NULL);
vTaskStartScheduler();
return(0);
A task is normally a continuous thread of execution - that is - it is implemented as an infinite loop that runs forever. It is very rare for a task to exit its loop - and in FreeRTOS you cannot run off the bottom of a function that implements a task without deleting the task (in more recent versions of FreeRTOS you will trigger an assert if you try). Therefore the functions that implement your tasks are not valid.
FreeRTOS has excellent documentation (and an excellent support forum, for that matter, which would be a more appropriate place to post this question). You can see how a task should be written here: http://www.freertos.org/implementing-a-FreeRTOS-task.html
In the code you posted, I can't see where you create the queue that you are trying to use. That is also documented on the FreeRTOS.org website, and the download has hundreds of examples of how to do it.
If it were a stack issue then Google would show you to go here:
http://www.freertos.org/Stacks-and-stack-overflow-checking.html
You should create the queue and then check that the returned handle is not zero (NULL), which means the queue was created successfully.

Why does the nif function block the Erlang VM from scheduling other processes?

When the Erlang VM (BEAM) runs some code written in C, the other processes written in Erlang are not scheduled.
For example:
static ERL_NIF_TERM
nifsleep(ErlNifEnv* env, int argc, const ERL_NIF_TERM argv[])
{
sleep(10);
return enif_make_atom(env, "ok");
}
When you call this C function from Erlang, the other processes are not scheduled normally.
I want to know why.
Is this a feature, or is it a limitation of the implementation (that is, a bug)?
The code above comes from https://github.com/davisp/sleepy
BEAM processes are not mapped to OS threads directly. There is normally one scheduler per core. Your call to
sleep(10);
is blocking the scheduler that executed it (as expected, otherwise it would have to intercept that call somehow to make it non-blocking), and so the scheduler can't execute any other erlang process until the call returns.
Long-running NIFs are strongly discouraged. A quick Google search is enough to find many references; see for example
http://www.erlang.org/doc/man/erl_nif.html#lengthy_work
http://osdir.com/ml/erlang-questions-programming/2013-02/msg00275.html
http://ninenines.eu/articles/erlang-scalability
For comprehensive information about how the scheduler works, see
http://jlouisramblings.blogspot.com.ar/2013/01/how-erlang-does-scheduling.html

Why does the child thread block in my code?

I was trying to test pthreads using a mutex:
#include <stdio.h>
#include <pthread.h>
#include <sys/types.h>
#include <stdlib.h>
#include <unistd.h>
pthread_mutex_t mutex1 = PTHREAD_MUTEX_INITIALIZER;
int global = 0;
void thread_set();
void thread_read();
int main(void){
pthread_t thread1, thread2;
int re_value1, re_value2;
int i;
for(i = 0; i < 5; i++){
re_value1 = pthread_create(&thread1,NULL, (void*)&thread_set,NULL);
re_value2 = pthread_create(&thread2,NULL,(void*)&thread_read,NULL);
}
pthread_join(thread1,NULL);
pthread_join(thread2,NULL);
/* sleep(2); */ // without it the 5 iteration couldn't finish
printf("exsiting\n");
exit(0);
}
void thread_set(){
pthread_mutex_lock(&mutex1);
printf("Setting data\t");
global = rand();
pthread_mutex_unlock(&mutex1);
}
void thread_read(){
int data;
pthread_mutex_lock(&mutex1);
data = global;
printf("the value is: %d\n",data);
pthread_mutex_unlock(&mutex1);
}
Without the sleep(), the code won't finish the 5 iterations:
Setting data the value is: 1804289383
the value is: 1804289383
Setting data the value is: 846930886
exsiting
Setting data the value is: 1804289383
Setting data the value is: 846930886
the value is: 846930886
exsiting
It works only if I add the sleep() to the main thread. I think it should work without the sleep(), because pthread_join() waits for each child thread to terminate.
Can anyone tell me why this happens?
Your use of mutex objects looks fine, but this loop
for(i = 0; i < 5; i++) {
re_value1 = pthread_create(&thread1,NULL, (void*)&thread_set,NULL);
re_value2 = pthread_create(&thread2,NULL,(void*)&thread_read,NULL);
}
is asking for trouble, as you are reusing the same thread variables thread1 and thread2 on each iteration of your loop. Each call to pthread_create() overwrites the handle stored by the previous iteration, so you lose the ability to join the earlier threads. You should really use a separate thread variable for each thread you create.
Also, you are not checking the return values from pthread_create(), which would be a good idea. In summary, I would use a separate thread variable for each thread, or move the pthread_join() calls inside your loop so that you are certain the threads have finished before the next call to pthread_create().
Finally, functions passed to pthread_create() must have the signature
void* thread_function(void*);
and not
void thread_function()
like you have in your code.
You are creating 10 threads (5 iterations of two threads each), but only joining the last two you create (as mathematician1975 notes, you're re-using the thread handle variables, so the values from the last iteration are the only ones you have available to join on). Without the sleep(), it's quite possible that the scheduler has not started executing the first 8 threads before you hit exit(), which automatically terminates all threads, whether they have had a chance to run yet or not.
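The points made in both answers can be combined in a short C sketch: one handle per thread, checked return values, the correct void *(*)(void *) signature, and a join for every thread that was created. The names PAIRS, reads_done and run_all_pairs are hypothetical additions for illustration; mutex1, global, thread_set and thread_read come from the question.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define PAIRS 5                    /* 5 setter/reader pairs, as in the question */

static pthread_mutex_t mutex1 = PTHREAD_MUTEX_INITIALIZER;
static int global = 0;
static int reads_done = 0;         /* hypothetical counter, protected by mutex1 */

static void *thread_set(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&mutex1);
    global = rand();
    pthread_mutex_unlock(&mutex1);
    return NULL;
}

static void *thread_read(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&mutex1);
    printf("the value is: %d\n", global);
    reads_done++;
    pthread_mutex_unlock(&mutex1);
    return NULL;
}

/* Returns the number of readers that ran to completion, or -1 on error. */
int run_all_pairs(void)
{
    pthread_t threads[2 * PAIRS];  /* one handle per thread, never reused */
    int created = 0;
    int failed = 0;

    reads_done = 0;
    for (int i = 0; i < PAIRS; i++) {
        if (pthread_create(&threads[created], NULL, thread_set, NULL) != 0) {
            failed = 1;
            break;
        }
        created++;
        if (pthread_create(&threads[created], NULL, thread_read, NULL) != 0) {
            failed = 1;
            break;
        }
        created++;
    }

    /* Join every thread that was actually created, not just the last two. */
    for (int i = 0; i < created; i++) {
        int rc = pthread_join(threads[i], NULL);
        assert(rc == 0);
        (void)rc;
    }

    return failed ? -1 : reads_done;
}
```

Because every thread is joined before the function returns, no sleep() is needed. The relative order of setters and readers is still decided by the scheduler, so a reader may print a value before the matching setter has run.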

Odd behavior when creating and cancelling a thread in close succession

I'm using g++ version 4.4.3 (Ubuntu 4.4.3-4ubuntu5) and libpthread v. 2-11-1. The following code simply creates a thread running Foo(), and immediately cancels it:
void* Foo(void*){
printf("Foo\n");
/* wait 1 second, e.g. using nanosleep() */
return NULL;
}
int main(){
pthread_t thread;
int res_create, res_cancel;
printf("creating thread\n);
res_create = pthread_create(&thread, NULL, &Foo, NULL);
res_cancel = pthread_cancel(thread);
printf("cancelled thread\n);
printf("create: %d, cancel: %d\n", res_create, res_cancel);
return 0;
}
The output I get is:
creating thread
Foo
Foo
cancelled thread
create: 0, cancel: 0
Why the second Foo output? Am I abusing the pthread API by calling pthread_cancel right after pthread_create? If so, how can I know when it's safe to touch the thread? If I so much as stick a printf() between the two, I don't have this problem.
I cannot reproduce this on a slightly newer Ubuntu. Sometimes I get one Foo and sometimes none. I had to fix a few things to get your code to compile (missing headers, a missing call to the sleep function implied by the comment, and unterminated string literals), which indicates that you did not paste the actual code that reproduced the problem.
If the problem is indeed real, it might indicate a thread cancellation problem in glibc's I/O library. It looks a lot like two threads doing an fflush(stdout) on the same buffer contents. That should never happen normally, because the I/O library is thread-safe. But what if there is a cancellation scenario like this: the thread holds the mutex on stdout and has just done a flush, but has not yet updated the buffer to clear the output. Then it is cancelled before it can do that, and the main thread flushes the same data again.
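Separately from the hypothesis above, pthread_cancel() only requests cancellation; it does not wait for the thread to finish. The question's main() returns without joining. A minimal C sketch of cancelling and then joining follows, with the hypothetical helper name cancel_and_join and an assumed 100 ms nanosleep() loop standing in for the elided wait. It neither confirms nor rules out the glibc explanation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <time.h>

static void *Foo(void *arg)
{
    (void)arg;
    struct timespec pause = { .tv_sec = 0, .tv_nsec = 100 * 1000 * 1000 };

    /* Loop until cancelled; nanosleep() is a cancellation point. */
    for (;;)
        nanosleep(&pause, NULL);
}

/*
 * Hypothetical helper: returns 1 if the thread ended because of the
 * cancellation request, 0 if it ended some other way, -1 on error.
 */
int cancel_and_join(void)
{
    pthread_t thread;
    void *exit_status = NULL;

    if (pthread_create(&thread, NULL, Foo, NULL) != 0)
        return -1;
    if (pthread_cancel(thread) != 0)   /* a request only; returns immediately */
        return -1;

    /* Wait until the thread has actually terminated. */
    int rc = pthread_join(thread, &exit_status);
    assert(rc == 0);
    (void)rc;

    return exit_status == PTHREAD_CANCELED ? 1 : 0;
}
```

After pthread_join() returns, the thread no longer exists, so main() can print or exit without racing against it.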

trouble reading from __global memory after atom_inc in OpenCL

OpenCL doesn't have a global barrier that will stop all threads, so I'm trying to create a workaround with the following code:
void barrier(__global uint* scratch) {
uint nThreads = get_global_size(0);
atom_inc(scratch);
/* this loop never terminates */
while(scratch[0] < nThreads) {
continue;
}
}
The idea is that each thread loops until all of them increment that one piece of memory.
However, the value read from scratch[0] never changes for the threads once it's been read, and it loops forever. I know it's being incremented because it's the correct value when I read it back to the host.
Is the global memory being locally cached? What's going on here?
Found the problem: the order in which work groups are executed is implementation defined. This means that some threads might start only after others have finished.
In the code I gave, the work groups that are started first will loop forever waiting for the others to hit the 'barrier'. And the work groups that would be started later never start, because they're waiting for the first ones to finish.
If the implementation (I'm on a Radeon 5750, using Stream SDK 2.2) executes all work groups concurrently, then it probably wouldn't be an issue. But that's not the case for my setup.
