I'm trying to implement my own version of the libgomp library. Now I'm implementing the GOMP_task() function, but there are some parameters that I don't understand.
void
GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) (void *, void *),
long arg_size, long arg_align, bool if_clause, unsigned flags,
void **depend)
I have problems with these parameters:
unsigned flags
void **depend
When I compile this code with my own libgomp library:
long result = 0;

#pragma omp parallel
{
    #pragma omp for schedule(dynamic,3)
    for (long i = 0; i < 10; i++) {
        #pragma omp task shared(result) depend(in:result) depend(out:result)
        {
            result++;
        }
        #pragma omp task depend(in:result) depend(out:result)
        result++;
    }
    #pragma omp taskwait
    printf("result = %ld\n", result);
}
I print those two parameters: flags is always 8, and depend is always the same value even if the different tasks depend on different variables.
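For reference, this is the debug dump I added inside my own GOMP_task(). The interpretation is only my assumption from reading the GCC libgomp sources, not anything I found documented: flags seems to be a bit mask (the value 8 apparently marks a task that has depend clauses), and depend seems to point to a small array where depend[0] is the number of dependences, depend[1] is the number of out/inout dependences (which are listed first), and the remaining entries are the addresses of the variables named in the depend clauses. In my test both tasks only depend on result, so the addresses stored in the array are the same; I think the depend pointer itself repeats because the compiler builds that array on the caller's stack.

/* Debug dump inside my GOMP_task(); needs <stdio.h> and <stdint.h>.
   The layout of depend below is my assumption from the GCC libgomp sources. */
if (depend != NULL)
  {
    uintptr_t ndepend = (uintptr_t) depend[0];   /* number of dependences        */
    uintptr_t nout    = (uintptr_t) depend[1];   /* number of out/inout entries  */
    uintptr_t i;

    printf ("flags = %u, ndepend = %lu, nout = %lu\n",
            flags, (unsigned long) ndepend, (unsigned long) nout);
    for (i = 0; i < ndepend; i++)
      printf ("  depend[%lu] = %p\n", (unsigned long) i, depend[2 + i]);
  }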
Is there any document with information about that? I haven't found anything.
Or does anybody know about that function?
Thanks
Related
We know that we call pthread_create() like this:
int pthread_create(pthread_t *thread, const pthread_attr_t *attr,
void *(*start_routine) (void *), void* arg);
Hi guys, I want to know why the return type of the third parameter is void*. Why not void?
Because there is no way for a start function to know what kind of data a developer wants to return from it, it uses a void* that can point to any type. It is up to the developer of the start function to cast the void* back to the appropriate type before using whatever it points to. So the start function can return a pointer that may actually point to anything. If the start function were declared to return void, it would return nothing, and then what if the developer wants the start function to return an int, or a struct? For example:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <pthread.h>

struct test {
    char str[32];
    int x;
};

void *func(void *arg)
{
    struct test *eg = (struct test *)malloc(sizeof(struct test));
    strcpy(eg->str, "hello world");
    eg->x = 42;
    pthread_exit(eg);
}

int main(void)
{
    pthread_t id;
    struct test *resp;

    pthread_create(&id, NULL, func, NULL);
    pthread_join(id, (void **)&resp);
    printf("%s %d\n", resp->str, resp->x);
    free(resp);
    return 0;
}
More details on this post: What does void* mean and how to use it?
I want to write a sample program in which 16 threads have access to a shared object with a huge size, like 10 GB. I know that I can use pthread_mutex_t to get a lock on the object, but how can I make it efficient so that two or more threads can modify disjoint parts of the shared object simultaneously?
Maybe you can create an array of 10 pthread_mutex_t's, one for each 1 GB range, and lock the appropriate mutex for the range you'll be modifying?
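Something like the following rough sketch, where the sizes, names and the fixed count of 10 stripes are purely illustrative (a 64-bit build is assumed):

#include <pthread.h>
#include <stdlib.h>

/* Illustrative numbers and names only. */
#define OBJECT_SIZE   (10UL * 1024 * 1024 * 1024)   /* the 10 GB shared object */
#define STRIPE_COUNT  10                            /* one mutex per 1 GB      */
#define STRIPE_SIZE   (OBJECT_SIZE / STRIPE_COUNT)

static char *big_object;
static pthread_mutex_t stripe_lock[STRIPE_COUNT];

void shared_object_init(void)
{
    big_object = malloc(OBJECT_SIZE);
    for (int i = 0; i < STRIPE_COUNT; i++)
        pthread_mutex_init(&stripe_lock[i], NULL);
}

/* Modify [offset, offset + len); in this simple version the caller must not
   let a single call cross a stripe boundary. */
void modify_range(size_t offset, size_t len, char value)
{
    size_t stripe = offset / STRIPE_SIZE;

    pthread_mutex_lock(&stripe_lock[stripe]);
    for (size_t i = 0; i < len; i++)
        big_object[offset + i] = value;   /* calls that hit different stripes
                                             proceed in parallel             */
    pthread_mutex_unlock(&stripe_lock[stripe]);
}

If a single modification can span more than one stripe, lock all affected stripes, always in increasing index order, so two threads cannot deadlock on each other.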
What about using a semaphore? You can initialize the semaphore with the number of threads that share the resource.
/* Includes */
#include <unistd.h>     /* Symbolic Constants */
#include <sys/types.h>  /* Primitive System Data Types */
#include <errno.h>      /* Errors */
#include <stdio.h>      /* Input/Output */
#include <stdlib.h>     /* General Utilities */
#include <pthread.h>    /* POSIX Threads */
#include <string.h>     /* String handling */
#include <semaphore.h>  /* Semaphore */

void *semhandler(void *ptr);

sem_t mutex;
int cntr = 0; /* shared variable */

int main()
{
    int arg[2];
    pthread_t thread1;
    pthread_t thread2;

    arg[0] = 0;
    arg[1] = 1;

    /* Initialize the semaphore to 2 to share the resource with two threads. */
    /* The second argument "0" makes the semaphore local to the process.     */
    sem_init(&mutex, 0, 2);

    pthread_create(&thread1, NULL, semhandler, &arg[0]);
    pthread_create(&thread2, NULL, semhandler, &arg[1]);

    pthread_join(thread1, NULL);
    pthread_join(thread2, NULL);

    sem_destroy(&mutex);

    exit(0);
} /* main() */

void *semhandler(void *ptr)
{
    int x;
    x = *((int *) ptr);
    printf("Thrd %d: Waiting to enter critical region...\n", x);

    sem_wait(&mutex); /* down semaphore */

    if (x == 1)
        cntr++;

    /* START CRITICAL REGION */
    printf("Thrd %d: Now in critical region...\n", x);
    printf("Thrd %d: New Counter Value: %d\n", x, cntr);
    printf("Thrd %d: Exiting critical region...\n", x);
    /* END CRITICAL REGION */

    sem_post(&mutex); /* up semaphore */

    pthread_exit(0);  /* exit thread */
}
I'd like to build an app to open specific apps that don't have a URL scheme. I have heard about a framework called SpringBoardServices, but there is always a linker error.
So far I am using this code with the SpringBoardServices.h file.
SpringBoardServices.h:
#ifndef SPRINGBOARDSERVICES_H
#define SPRINGBOARDSERVICES_H
#if __OBJC__
#if __cplusplus
extern "C" {
#endif
#include <CoreFoundation/CoreFoundation.h>
#include <Availability.h>
mach_port_t SBSSpringBoardServerPort();
#pragma mark -
#pragma mark Application launching
/// Launch an application given the display ID.
/// Equivalent to -[UIApplication launchApplicationWithIdentifier:suspended:].
/// @return 0 on success, nonzero on failure. Feed the result to SBSApplicationLaunchingErrorString() to get the error string.
int SBSLaunchApplicationWithIdentifier(CFStringRef displayIdentifier, Boolean suspended) __OSX_AVAILABLE_STARTING(__MAC_NA, __IPHONE_3_0);
/// Launch an application for debugging.
/// The parameters are not known yet...
int SBSLaunchApplicationForDebugging(void* unknown, ...) __OSX_AVAILABLE_STARTING(__MAC_NA, __IPHONE_3_0);
/// Get the error string from error code returned by SBSLaunchApplicationWithIdentifier().
CFStringRef SBSApplicationLaunchingErrorString(int error);
#pragma mark -
#pragma mark Watchdog assertion
typedef struct __SBSWatchdogAssertion* SBSWatchdogAssertionRef;
CFTypeID SBSWatchdogAssertionGetTypeID();
void SBSWatchdogAssertionCancel(SBSWatchdogAssertionRef assertion);
SBSWatchdogAssertionRef SBSWatchdogAssertionCreateForPID(CFAllocatorRef allocator, pid_t pid);
int SBSWatchdogAssertionRenew(SBSWatchdogAssertionRef assertion);
CFTimeInterval SBSWatchdogAssertionGetRenewalInterval(SBSWatchdogAssertionRef assertion);
#pragma mark -
CFArrayRef SBSCopyApplicationDisplayIdentifiers(Boolean onlyActive, Boolean unknown);
CFStringRef SBSCopyIconImagePathForDisplayIdentifier(CFStringRef dispIden);
CFStringRef SBSCopyLocalizedApplicationNameForDisplayIdentifier(CFStringRef dispIden);
/*
SB functions should be generated by MIG!
#pragma mark -
#pragma mark SB functions - Media
void SBSetMediaVolume(mach_port_t port, int volume) __OSX_AVAILABLE_STARTING(__MAC_NA, __IPHONE_3_0);
void SBSetDisableNowPlayingHUD(mach_port_t port, Boolean disable) __OSX_AVAILABLE_STARTING(__MAC_NA, __IPHONE_3_0);
void SBSetNowPlayingInformation(mach_port_t port, void* info);
#pragma mark -
#pragma mark SB functions - Accessibility
void SBSetZoomTouchEnabled(mach_port_t port, Boolean enable) __OSX_AVAILABLE_STARTING(__MAC_NA, __IPHONE_3_0);
void SBSetDisplayColorsInverted(mach_port_t port, Boolean inverted) __OSX_AVAILABLE_STARTING(__MAC_NA, __IPHONE_3_0);
#pragma mark -
#pragma mark SB functions - Remote
void SBApplicationSetSimpleRemoteRoutingPriority(mach_port_t port, int priority) __OSX_AVAILABLE_STARTING(__MAC_NA, __IPHONE_3_0);
#pragma mark -
#pragma mark SB functions - Watchdog
void SBCancelWatchdogAssertionForProcess(mach_port_t port, pid_t pid, void* unknown);
void SBReloadApplication(mach_port_t port) __OSX_AVAILABLE_STARTING(__MAC_NA, __IPHONE_3_0);
*/
#if __cplusplus
}
#endif
#endif /* __OBJC__ */
#endif /* SPRINGBOARDSERVICES_H */
and here is the code I am using in my ViewController.m file:
#import "SpringBoardServices.h"
- (IBAction)AdSheed {
    SBSLaunchApplicationWithIdentifier(CFSTR("com.apple.preferences"), false);
}
Does anyone have an idea how I can solve this problem?
It's a private framework; you're not supposed to use it. You cannot do this without using URL schemes.
Node *head = &node1;
while (head)
{
    #pragma omp task
    cout << head->value << endl;
    head = head->next;
}
#pragma omp parallel
{
    #pragma omp single
    {
        Node *head = &node1;
        while (head)
        {
            #pragma omp task
            cout << head->value << endl;
            head = head->next;
        }
    }
}
In the first block, I just created tasks without a parallel directive, while in the second block I used the parallel directive and the single directive, which is the common way I have seen in papers.
I wonder what the difference between them is. BTW, I know the basic meaning of these directives.
The code in my comment:
void traverse(node *root)
{
    if (root->left)
    {
        #pragma omp task
        traverse(root->left);
    }
    if (root->right)
    {
        #pragma omp task
        traverse(root->right);
    }
    process(root);
}
The difference is that in the first block you are not really creating any tasks since the block itself is not nested (neither syntactically nor dynamically) inside an active parallel region. In the second block the task construct is syntactically nested inside the parallel region and would queue explicit tasks if the region happens to be active at run-time (an active parallel region is one that executes with a team of more than one thread). Dynamic nesting is less obvious. Observe the following example:
void foo(void)
{
    int i;
    for (i = 0; i < 10; i++)
        #pragma omp task
        bar();
}

int main(void)
{
    foo();

    #pragma omp parallel num_threads(4)
    {
        #pragma omp single
        foo();
    }

    return 0;
}
The first call to foo() happens outside of any parallel regions. Hence the task directive does (almost) nothing and all calls to bar() happen serially. The second call to foo() comes from inside the parallel region and hence new tasks would be generated inside foo(). The parallel region is active since the number of threads was fixed to 4 by the num_threads(4) clause.
This different behaviour of the OpenMP directives is a design feature. The main idea is to be able to write code that can execute both serially and in parallel.
Still, the presence of the task construct in foo() triggers some code transformation, e.g. foo() is transformed into something like:
void foo_omp_fn_1(void *omp_data)
{
    bar();
}

void foo(void)
{
    int i;
    for (i = 0; i < 10; i++)
        OMP_make_task(foo_omp_fn_1, NULL);
}
Here OMP_make_task() is a hypothetical (not publicly available) function from the OpenMP support library that queues a call to the function supplied as its first argument. If OMP_make_task() detects that it is working outside an active parallel region, it simply calls foo_omp_fn_1() instead. This adds some overhead to the call to bar() in the serial case. Instead of main -> foo -> bar, the call goes like main -> foo -> OMP_make_task -> foo_omp_fn_1 -> bar. The implication of this is slower serial code execution.
This is even more obviously illustrated with the worksharing directive:
void foo(void)
{
    int i;

    #pragma omp for
    for (i = 0; i < 12; i++)
        bar();
}

int main(void)
{
    foo();

    #pragma omp parallel num_threads(4)
    {
        foo();
    }

    return 0;
}
The first call to foo() would run the loop in serial. The second call would distribute the 12 iterations among the 4 threads, i.e. each thread would only execute 3 iterations. Once again, some code transformation magic is used to achieve this and the serial loop would run slower than if no #pragma omp for was present in foo().
The lesson here is to never add OpenMP constructs where they are not really necessary.
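If the serial overhead really matters and you cannot remove the directives, one possible workaround (just a sketch using the standard omp_in_parallel() query, not a transformation the compiler does for you) is to create tasks only when the call actually happens inside an active parallel region:

#include <omp.h>

void bar(void);   /* defined elsewhere, as in the examples above */

void foo(void)
{
    int i;

    if (omp_in_parallel())
    {
        /* Inside an active parallel region: create explicit tasks. */
        for (i = 0; i < 10; i++)
        {
            #pragma omp task
            bar();
        }
    }
    else
    {
        /* Serial case: plain loop, no task bookkeeping at all. */
        for (i = 0; i < 10; i++)
            bar();
    }
}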
Considering:
void saxpy_worksharing(float* x, float* y, float a, int N) {
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        y[i] = y[i] + a*x[i];
    }
}
And
void saxpy_tasks(float* x, float* y, float a, int N) {
    #pragma omp parallel
    {
        for (int i = 0; i < N; i++) {
            #pragma omp task
            {
                y[i] = y[i] + a*x[i];
            }
        }
    }
What is the difference between using tasks and the omp parallel directive? Why can we write recursive algorithms such as merge sort with tasks, but not with worksharing?
I would suggest that you have a look at the OpenMP tutorial from Lawrence Livermore National Laboratory, available here.
Your particular example is one that should not be implemented using OpenMP tasks. The second code creates N times the number of threads tasks (because there is an error in the code besides the missing }; I will come back to it later), and each task only performs a very simple computation. The overhead of the tasks would be gigantic, as you can see in my answer to this question. Besides, the second code is conceptually wrong. Since there is no worksharing directive, all threads execute all iterations of the loop, and instead of N tasks, N times the number of threads tasks get created. It should be rewritten in one of the following ways:
Single task producer - common pattern, NUMA unfriendly:
void saxpy_tasks(float* x, float* y, float a, int N) {
    #pragma omp parallel
    {
        #pragma omp single
        {
            for (int i = 0; i < N; i++)
                #pragma omp task
                {
                    y[i] = y[i] + a*x[i];
                }
        }
    }
}
The single directive would make the loop run inside a single thread only. All other threads would skip it and hit the implicit barrier at the end of the single construct. As barriers contain implicit task scheduling points, the waiting threads will start processing tasks immediately as they become available.
Parallel task producer - more NUMA friendly:
void saxpy_tasks(float* x, float* y, float a, int N) {
    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < N; i++)
            #pragma omp task
            {
                y[i] = y[i] + a*x[i];
            }
    }
}
In this case the task creation loop would be shared among the threads.
If you do not know what NUMA is, ignore the comments about NUMA friendliness.
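For those who do care about NUMA: the friendliness mostly comes from combining the parallel producer with first-touch page placement, i.e. initializing the data with the same loop distribution. A minimal sketch (the function name and the explicit static schedule are my own illustrative choices, not part of the original code):

/* First-touch initialization: each thread writes the slice of x and y that
   corresponds to its chunk of iterations, so the backing memory pages end up
   on that thread's NUMA node (on first-touch systems such as Linux). */
void saxpy_init(float* x, float* y, int N) {
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < N; i++) {
        x[i] = 1.0f;
        y[i] = 0.0f;
    }
}

With the same static distribution in the task-creating loop, the tasks are at least created by the threads whose local memory holds the corresponding data.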