I have written sequential code for a simulation process. Basically, the flow goes like the following:
int count = 0;
int i, j;
for (i = 0; i < n; i++)
{
    for (j = 0; j < n; j++)
    {
label:
        if (j == i) continue;
        // Some vector calculations and finding out X value, example: (X = i + j)
    }
    // Some calculations with respect to i value, example: (K = i * 0.8 + 0.2)
    // Example: Z = X + K;
    if (Z == 0.5)
    {
        count++;
        if (count < 3)
        {
            goto label;
        }
    }
}
I am new to parallel programming. How can I implement the above code using OpenMP and Pthreads?
I tried putting #pragma omp parallel for collapse(2) before the first for loop, and I am stuck figuring out the next steps to follow.
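Not a full answer, but a minimal OpenMP sketch of one possible restructuring (the simulate wrapper, the per-iteration retry counter, and the reduction are my assumptions, not the original algorithm). collapse(2) is not valid here because the goto jumps back into the inner loop, so the two loop levels are not independent. If the global count can be reinterpreted as a per-iteration retry limit plus a reduction, the outer loop parallelizes cleanly; a Pthreads version would need the same restructuring, with the i range split manually among the threads.
#include <omp.h>

// n, X, K, Z follow the placeholders from the question.
void simulate(int n)
{
    int count = 0;
    #pragma omp parallel for reduction(+:count)
    for (int i = 0; i < n; i++)
    {
        int retries = 0;      // per-iteration retry counter replacing the goto
        double Z = 0.0;
        do {
            double X = 0.0;
            for (int j = 0; j < n; j++)
            {
                if (j == i) continue;
                X = i + j;                    // placeholder vector calculation
            }
            double K = i * 0.8 + 0.2;         // placeholder per-i calculation
            Z = X + K;
        } while (Z == 0.5 && ++retries < 3);  // exact compare kept from the example
        if (Z == 0.5) count++;                // reduction keeps the counter thread-safe
    }
}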
// Online C++ compiler to run C++ program online
#include <bits/stdc++.h>
using namespace std;

struct Node {
    int data;
    Node *next;
    Node(int val)
    {
        data = val;
        next = NULL;
    }
};

Node* findIntersection(Node* head1, Node* head2)
{
    // Your Code Here
    Node* h = nullptr;
    Node* temp = h;
    while (head2 != nullptr) {
        while (head1 != nullptr && head1->data < head2->data) {
            head1 = head1->next;
        }
        cout << head1->data << endl;
        if (head1->data == head2->data) {
            if (h == nullptr) {
                h = new Node(head1->data);
            }
            else {
                temp->next = new Node(head1->data);
                temp = temp->next;
            }
        }
        head2 = head2->next;
    }
    return h;
}

int main() {
    // Write C++ code here
    Node* l1 = new Node(1);
    l1->next = new Node(2);
    l1->next->next = new Node(3);
    l1->next->next->next = new Node(4);
    l1->next->next->next->next = new Node(6);
    Node* l2 = new Node(2);
    l1->next = new Node(4);
    l1->next->next = new Node(6);
    l1->next->next->next = new Node(8);
    findIntersection(l1, l2);
    return 0;
}
I tried to find the intersection of two linked lists whose elements are in sorted order.
This is a question on GFG, which can be found at this link:
https://practice.geeksforgeeks.org/problems/intersection-of-two-sorted-linked-lists/1?page=1&category[]=Linked%20List&sortBy=difficulty
Why is this code giving a segmentation fault in the linked list intersection?
Because there are bugs in it. You'll benefit from learning how to debug small programs.
#include <bits/stdc++.h>
Unrelated to your crash, but you should never #include anything from the bits directory.
while(head2!=nullptr){
while(head1!=nullptr && head1->data<head2->data){
head1=head1->next;
}
// We could get here if head1==nullptr. The next line will crash.
cout<<head1->data<<endl;
...
else {
// temp was assigned NULL in the start, and now is being dereferenced.
temp->next=new Node(head1->data);
temp=temp->next;
}
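Putting those fixes together, one possible corrected version might look like this (a sketch, not the only valid solution): it advances whichever list has the smaller value, so head1 is never dereferenced when null, and it keeps temp in sync with the tail of the result list.
Node* findIntersection(Node* head1, Node* head2)
{
    Node* h = nullptr;
    Node* temp = nullptr;
    while (head1 != nullptr && head2 != nullptr) {
        if (head1->data < head2->data) {
            head1 = head1->next;
        } else if (head2->data < head1->data) {
            head2 = head2->next;
        } else {
            // Equal values: append a copy to the result list.
            if (h == nullptr) {
                h = new Node(head1->data);
                temp = h;                       // keep the tail pointer in sync
            } else {
                temp->next = new Node(head1->data);
                temp = temp->next;
            }
            head1 = head1->next;
            head2 = head2->next;
        }
    }
    return h;
}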
Consider the following code which prints out the even numbers up to 20:
import std.stdio;
class count_to_ten{
static int opApply()(int delegate(ref int) dg) {
int i = 1;
int ret;
while(i <= 10){
ret = dg(i);
if(ret != 0) {
break;
}
i++;
}
return ret;
}
}
void main() {
int y = 2;
foreach(int x; count_to_ten) {
writeln(x * y);
}
}
The syntax of opApply requires that it take a delegate or function as a normal argument. Even if we relaxed that and allowed opApply to take a function as a template argument, we would still have no recourse for delegates, because D doesn't provide any way to separate the stack-frame pointer from the function pointer. Yet this seems like it should be possible, since the function-pointer part of the delegate is commonly a compile-time constant. If we could do that, and the body of the loop was short, then it could actually be inlined, which might speed this code up quite a bit.
Is there any way to do this? Does the D compiler have some trick by which it happens automagically?
Node *head = &node1;
while (head)
{
    #pragma omp task
    cout << head->value << endl;
    head = head->next;
}
#pragma omp parallel
{
    #pragma omp single
    {
        Node *head = &node1;
        while (head)
        {
            #pragma omp task
            cout << head->value << endl;
            head = head->next;
        }
    }
}
In the first block I just created tasks without a parallel directive, while in the second block I used the parallel directive together with the single directive, which is a common pattern I have seen in papers.
I wonder what the difference between them is? BTW, I know the basic meaning of these directives.
The code in my comment:
void traverse(node *root)
{
    if (root->left)
    {
        #pragma omp task
        traverse(root->left);
    }
    if (root->right)
    {
        #pragma omp task
        traverse(root->right);
    }
    process(root);
}
The difference is that in the first block you are not really creating any tasks since the block itself is not nested (neither lexically nor dynamically) inside an active parallel region. In the second block the task construct is lexically nested inside the parallel region and would queue explicit tasks if the region happens to be active at run-time (an active parallel region is one that executes with a team of more than one thread). Dynamic nesting is less obvious. Observe the following example:
void foo(void)
{
    int i;
    for (i = 0; i < 10; i++)
        #pragma omp task
        bar();
}

int main(void)
{
    foo();

    #pragma omp parallel num_threads(4)
    {
        #pragma omp single
        foo();
    }

    return 0;
}
The first call to foo() happens outside of any parallel region. Hence the task directive does (almost) nothing and all calls to bar() happen serially. The second call to foo() comes from inside the parallel region, and hence new tasks are generated inside foo(). The parallel region is active since the number of threads was fixed to 4 by the num_threads(4) clause.
This different behaviour of the OpenMP directives is a design feature. The main idea is to be able to write code that can execute both serially and in parallel.
Still, the presence of the task construct in foo() triggers some code transformation, e.g. foo() is transformed to something like:
void foo_omp_fn_1(void *omp_data)
{
    bar();
}

void foo(void)
{
    int i;
    for (i = 0; i < 10; i++)
        OMP_make_task(foo_omp_fn_1, NULL);
}
Here OMP_make_task() is a hypothetical (not publicly available) function from the OpenMP support library that queues a call to the function supplied as its first argument. If OMP_make_task() detects that it is running outside an active parallel region, it simply calls foo_omp_fn_1() instead. This adds some overhead to the call to bar() in the serial case: instead of main -> foo -> bar, the call goes main -> foo -> OMP_make_task -> foo_omp_fn_1 -> bar. The implication is slower serial code execution.
This is even more obviously illustrated with the worksharing directive:
void foo(void)
{
    int i;
    #pragma omp for
    for (i = 0; i < 12; i++)
        bar();
}

int main(void)
{
    foo();

    #pragma omp parallel num_threads(4)
    {
        foo();
    }

    return 0;
}
The first call to foo() would run the loop serially. The second call would distribute the 12 iterations among the 4 threads, i.e. each thread would execute only 3 iterations. Once again, some code transformation magic is used to achieve this, and the serial loop would run slower than if no #pragma omp for were present in foo().
The lesson here is to never add OpenMP constructs where they are not really necessary.
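As a small illustration of that lesson (my own sketch, not part of the original answer; bar() stands for some unit of work): keep the plain serial version free of directives and provide a separate, explicitly parallel entry point, so serial callers pay none of the transformation overhead.
void bar(void);   // assumed work function from the example above

// Serial version: no OpenMP constructs, so no outlining overhead.
void foo_serial(void)
{
    for (int i = 0; i < 12; i++)
        bar();
}

// Parallel version: callers opt in to the parallel region explicitly.
void foo_parallel(void)
{
    #pragma omp parallel for num_threads(4)
    for (int i = 0; i < 12; i++)
        bar();
}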
Considering:
void saxpy_worksharing(float* x, float* y, float a, int N) {
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        y[i] = y[i] + a * x[i];
    }
}
And
void saxpy_tasks(float* x, float* y, float a, int N) {
    #pragma omp parallel
    {
        for (int i = 0; i < N; i++) {
            #pragma omp task
            {
                y[i] = y[i] + a * x[i];
            }
        }
    }
What is the difference between using tasks and the omp parallel directive? Why can we write recursive algorithms such as merge sort with tasks, but not with worksharing?
I would suggest that you have a look at the OpenMP tutorial from Lawrence Livermore National Laboratory.
Your particular example is one that should not be implemented using OpenMP tasks. Besides the missing }, the second code is conceptually wrong: since there is no worksharing directive, all threads execute all iterations of the loop, so instead of N tasks, N times the number of threads tasks get created. And because each task performs only a very simple computation, the overhead of tasking would be gigantic, as you can see in my answer to a related question. The code should be rewritten in one of the following ways:
Single task producer - common pattern, NUMA unfriendly:
void saxpy_tasks(float* x, float* y, float a, int N) {
    #pragma omp parallel
    {
        #pragma omp single
        {
            for (int i = 0; i < N; i++)
                #pragma omp task
                {
                    y[i] = y[i] + a * x[i];
                }
        }
    }
}
The single directive would make the loop run inside a single thread only. All other threads would skip it and hit the implicit barrier at the end of the single construct. As barriers contain implicit task scheduling points, the waiting threads will start processing tasks immediately as they become available.
Parallel task producer - more NUMA friendly:
void saxpy_tasks(float* x, float* y, float a, int N) {
    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < N; i++)
            #pragma omp task
            {
                y[i] = y[i] + a * x[i];
            }
    }
}
In this case the task creation loop would be shared among the threads.
If you do not know what NUMA is, ignore the comments about NUMA friendliness.
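As for the second part of the question: worksharing constructs need a loop whose iteration count is known on entry to the construct, while tasks can be spawned at arbitrary points, which is exactly what recursive divide-and-conquer algorithms need. A hedged sketch of the usual merge sort pattern follows (the cutoff value and the serial_sort/merge helpers are placeholders of my own, not a complete implementation):
// Hypothetical helpers, declared only to keep the sketch self-contained:
void serial_sort(int* a, int lo, int hi);               // serial fallback
void merge(int* a, int* tmp, int lo, int mid, int hi);  // merge step

void merge_sort(int* a, int* tmp, int lo, int hi)
{
    if (hi - lo < 1000) {           // placeholder cutoff for task granularity
        serial_sort(a, lo, hi);
        return;
    }
    int mid = lo + (hi - lo) / 2;
    #pragma omp task shared(a, tmp)
    merge_sort(a, tmp, lo, mid);
    #pragma omp task shared(a, tmp)
    merge_sort(a, tmp, mid, hi);
    #pragma omp taskwait            // both halves must be sorted before merging
    merge(a, tmp, lo, mid, hi);
}

// Launched from a single producer, as in the patterns above:
// #pragma omp parallel
// #pragma omp single
// merge_sort(a, tmp, 0, n);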
I'm new to C++. I have written some code, but when I run it, there's always this:
raised exception class EAccessViolation with message 'Access violation at address'
I don't understand this. Could you help me solve it? It's important to me. Thank you very much!
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <math.h>
#include <conio.h>

#define k 2
#define minoffset 0.5

using namespace std;

struct Point
{
    double X;
    double Y;
};

vector<Point> dataprocess();
void k_means(vector<Point> points, int N);

double getdistance(Point p1, Point p2)
{
    double distance;
    distance = sqrt((p1.X-p2.X)*(p1.X-p2.X) + (p1.Y-p2.Y)*(p1.Y-p2.Y));
    return distance;
}

int getmindis(Point p, Point means[])
{
    int i;
    int c;
    double dis = getdistance(p, means[0]);
    for (i = 1; i < k; i++)
    {
        double term = getdistance(p, means[i]);
        if (term < dis)
        {
            c = i;
            dis = term;
        }
    }
    return c;
}

Point getmeans(vector<Point> points)
{
    int i;
    double sumX, sumY;
    Point p;
    int M = points.size();
    for (i = 0; i < M; i++)
    {
        sumX = points[i].X;
        sumY = points[i].Y;
    }
    p.X = sumX/M;
    p.Y = sumY/M;
    return p;
}

int main()
{
    int N;
    vector<Point> stars;
    stars = dataprocess();
    N = stars.size();
    cout << "the size is:" << N << endl;
    k_means(stars, N);
    getch();
}

vector<Point> dataprocess()
{
    int i;
    int N;
    double x, y;
    vector<Point> points;
    Point p;
    string import_file;
    cout << "input the filename:" << endl;
    cin >> import_file;
    ifstream infile(import_file.c_str());
    if (!infile)
    {
        cout << "read error!" << endl;
    }
    else
    {
        while (infile >> x >> y)
        {
            p.X = x;
            p.Y = y;
            points.push_back(p);
        }
    }
    N = points.size();
    cout << "output the file data:" << endl;
    for (i = 0; i < N; i++)
    {
        cout << "the point" << i+1 << "is:X=" << points[i].X << " Y=" << points[i].Y << endl;
    }
    return points;
}

void k_means(vector<Point> points, int N)
{
    int i;
    int j;
    int index;
    vector<Point> clusters[k];
    Point means[k];
    Point newmeans[k];
    double d, offset = 0;
    bool flag = 1;
    cout << "there will be" << k << "clusters,input the original means:" << endl;
    for (i = 0; i < k; i++)
    {
        cout << "k" << i+1 << ":" << endl;
        cin >> means[i].X >> means[i].Y;
    }
    while (flag)
    {
        for (i = 0; i < N; i++)
        {
            index = getmindis(points[i], means);
            clusters[index].push_back(points[i]);
        }
        for (j = 0; j < k; j++)
        {
            newmeans[j] = getmeans(clusters[j]);
            offset = getdistance(newmeans[j], means[j]);
        }
        if (offset > d)
        {
            d = offset;
        }
        flag = (minoffset < d) ? true : false;
        for (i = 0; i < k; i++)
        {
            means[i] = newmeans[i];
            clusters[i].clear();
        }
    }
    for (i = 0; i < k; i++)
    {
        cout << "N" << i+1 << "=" << clusters[i].size() << endl;
        cout << "the center of k" << i+1 << "is:" << means[i].X << " " << means[i].Y << endl;
    }
}
You surely have some algorithmic errors in your code. It is difficult to deal with code without the input data that caused the error, but let's try:
First, let's look at the function Point getmeans(vector<Point> points).
It is supposed to compute the mean coordinates of a cluster of points; if you pass an empty cluster to this function, it will misbehave:
look here -
int M=points.size()
and here -
for(i=0;i<M;i++)
{
sumX=points[i].X;
sumY=points[i].Y;
}
if your cluster is empty then M will be zero: the loop body never runs, sumX and sumY are left uninitialized, and sumX/M and sumY/M divide by zero, so the returned "mean" is garbage (note also that the loop assigns with = instead of +=, so even a non-empty cluster only keeps its last point).
So you have to test that the vector is not empty before running the main loop of the function, and you have to decide which mean values should be assigned to an empty cluster (maybe you need an additional flag for an empty cluster, checked before dealing with the cluster's mean values).
Then let's examine the function int getmindis(Point p, Point means[]) and the place where we call it:
index=getmindis(points[i],means); clusters[index].push_back(points[i]);
This function assigns points to clusters; the chosen cluster number is carried by the variable c. But c is only assigned inside the if, so whenever the nearest mean is means[0] the function returns an uninitialized variable (holding any possible value), which is then used as the index of a nonexistent vector element: a possible access violation error.
You probably have to initialize c to zero in its declaration.
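For illustration, here is a sketch of both fixes, reusing Point, k, and getdistance from your code (the empty-cluster policy of returning a caller-supplied fallback mean is my assumption, not something fixed by the problem):
// getmindis with c initialized, so cluster 0 is returned when means[0] is nearest.
int getmindis(Point p, Point means[])
{
    int c = 0;
    double dis = getdistance(p, means[0]);
    for (int i = 1; i < k; i++)
    {
        double term = getdistance(p, means[i]);
        if (term < dis)
        {
            c = i;
            dis = term;
        }
    }
    return c;
}

// getmeans guarded against empty clusters, accumulating with += instead of =.
Point getmeans(vector<Point> points, Point fallback)
{
    int M = points.size();
    if (M == 0)
        return fallback;          // assumed policy: keep the previous mean
    double sumX = 0, sumY = 0;
    for (int i = 0; i < M; i++)
    {
        sumX += points[i].X;
        sumY += points[i].Y;
    }
    Point p;
    p.X = sumX / M;
    p.Y = sumY / M;
    return p;
}
The call in k_means would then pass the current mean, e.g. newmeans[j]=getmeans(clusters[j],means[j]);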
Tell us when you are ready with the errors described above, and also show us a sample input file (one which causes the errors; if all datasets cause errors, show us the smallest one).