Here is my code:
struct S {
int a, b;
float c, d;
};
class A {
private:
S* d;
S h[3];
public:
A() {
cutilSafeCall(cudaMalloc((void**)&d, sizeof(S)*3));
}
void Init();
};
void A::Init() {
for (int i=0;i<3;i++) {
h[i].a = 0;
h[i].b = 1;
h[i].c = 2;
h[i].d = 3;
}
cutilSafeCall(cudaMemcpy(d, h, 3*sizeof(S), cudaMemcpyHostToDevice));
}
A a;
In fact, this is part of a complex program that combines CUDA and OpenGL. When I debug the program, it fails at cudaMemcpy with the error message
cudaSafeCall() Runtime API error 11: invalid argument.
Actually, this program was adapted from another one that runs correctly. In that one, however, the two variables S* d and S h[3] lived in the main function instead of in a class. What is even weirder is that when I implement this class A in a small program, it works fine.
I've also updated my driver, but the error still exists.
Could anyone give me a hint on why this happens and how to solve it? Thanks.
Because the memory operations in CUDA are blocking, they act as a synchronization point. So errors from earlier calls, if not checked with cudaThreadSynchronize (cudaDeviceSynchronize in newer CUDA versions), will show up as errors on the memory calls.
So if you receive an error on a memory operation, try placing a cudaThreadSynchronize before it and check the result.
Be sure that the first malloc statement is actually being executed. If this were a problem with CUDA initialization, as #Harrism indicates, wouldn't it fail at that statement? Try placing printf statements to see whether the proper initialization is performed. I think invalid argument errors are generally caused by using uninitialized memory areas.
Add a printf to your constructor that shows the address of the cudaMalloc'ed memory area:
A()
{
d = NULL;
cutilSafeCall(cudaMalloc((void**)&d, sizeof(S)*3));
printf("D: %p\n", d);
}
Try copying to an area that is allocated locally, i.e. move the cudaMalloc to just above the cudaMemcpy (just for testing):
void A::Init()
{
for (int i=0;i<3;i++)
{
h[i].a = 0;
h[i].b = 1;
h[i].c = 2;
h[i].d = 3;
}
cutilSafeCall(cudaMalloc((void**)&d, sizeof(S)*3)); // here!..
cutilSafeCall(cudaMemcpy(d, h, 3*sizeof(S), cudaMemcpyHostToDevice));
}
Good luck.
See update 1 below for my guess as to why the error is happening
I'm trying to develop an application with some C#/WPF and C++. I am having a problem on the C++ side on a part of the code that involves optimizing an object using GNU Scientific Library (GSL) optimization functions. I will avoid including any of the C#/WPF/GSL code in order to keep this question more generic and because the problem is within my C++ code.
For the minimal, complete and verifiable example below, here is what I have. I have a class Foo. And a class Optimizer. An object of class Optimizer is a member of class Foo, so that objects of Foo can optimize themselves when it is required.
The way GSL optimization functions take in external parameters is through a void pointer. I first define a struct Params to hold all the required parameters. Then I define an object of Params and convert it into a void pointer. A copy of this data is made with memcpy_s and a member void pointer optimParamsPtr of Optimizer class points to it so it can access the parameters when the optimizer is called to run later in time. When optimParamsPtr is accessed by CostFn(), I get the following error.
Managed Debugging Assistant 'FatalExecutionEngineError' : 'The runtime
has encountered a fatal error. The address of the error was at
0x6f25e01e, on thread 0x431c. The error code is 0xc0000005. This error
may be a bug in the CLR or in the unsafe or non-verifiable portions of
user code. Common sources of this bug include user marshaling errors
for COM-interop or PInvoke, which may corrupt the stack.'
Just to check that the void pointer I made is valid, I call CostFn() at line 81 (the ptrID == 1 branch) with the void * pointer that was passed as an argument to InitOptimizer(), and everything works. But at line 85 (the else branch), when the same CostFn() is called with optimParamsPtr pointing to the data copied by memcpy_s, I get the error. So I am guessing something goes wrong in the memcpy_s step. Does anyone have any idea what?
#include "pch.h"
#include <iostream>
using namespace System;
using namespace System::Runtime::InteropServices;
using namespace std;
// An optimizer for various kinds of objects
class Optimizer // GSL requires this to be an unmanaged class
{
public:
double InitOptimizer(int ptrID, void *optimParams, size_t optimParamsSize);
void FreeOptimizer();
void * optimParamsPtr;
private:
double cost = 0;
};
ref class Foo // A class whose objects can be optimized
{
private:
int a; // An internal variable that can be changed to optimize the object
Optimizer *fooOptimizer; // Optimizer for a Foo object
public:
Foo(int val) // Constructor
{
a = val;
fooOptimizer = new Optimizer;
}
~Foo()
{
if (fooOptimizer != NULL)
{
delete fooOptimizer;
}
}
void SetA(int val) // Mutator
{
a = val;
}
int GetA() // Accessor
{
return a;
}
double Optimize(int ptrID); // Optimize object
// ptrID is a variable just to change behavior of Optimize() and show what works and what doesn't
};
ref struct Params // Parameters required by the cost function
{
int cost_scaling;
Foo ^ FooObj;
};
double CostFn(void *params) // GSL requires cost function to be of this type and cannot be a member of a class
{
// Cast void * to Params type
GCHandle h = GCHandle::FromIntPtr(IntPtr(params));
Params ^ paramsArg = safe_cast<Params^>(h.Target);
h.Free(); // Deallocate
// Return the cost
int val = paramsArg->FooObj->GetA();
return (double)(paramsArg->cost_scaling * val);
}
double Optimizer::InitOptimizer(int ptrID, void *optimParamsArg, size_t optimParamsSizeArg)
{
optimParamsPtr = ::operator new(optimParamsSizeArg);
memcpy_s(optimParamsPtr, optimParamsSizeArg, optimParamsArg, optimParamsSizeArg);
double ret_val;
// Here is where the GSL stuff would be. But I replace that with a call to CostFn to show the error
if (ptrID == 1)
{
ret_val = CostFn(optimParamsArg); // Works
}
else
{
ret_val = CostFn(optimParamsPtr); // Doesn't work
}
return ret_val;
}
// Release memory used by unmanaged variables in Optimizer
void Optimizer::FreeOptimizer()
{
if (optimParamsPtr != NULL)
{
delete optimParamsPtr;
}
}
double Foo::Optimize(int ptrID)
{
// Create and initialize params object
Params^ paramsArg = gcnew Params;
paramsArg->cost_scaling = 11;
paramsArg->FooObj = this;
// Convert Params type object to void *
void * paramsArgVPtr = GCHandle::ToIntPtr(GCHandle::Alloc(paramsArg)).ToPointer();
size_t paramsArgSize = sizeof(paramsArg); // size of memory block in bytes pointed to by void pointer
double result = 0;
// Initialize optimizer
result = fooOptimizer->InitOptimizer(ptrID, paramsArgVPtr, paramsArgSize);
// Here is where the loop that does the optimization will be. Removed from this example for simplicity.
return result;
}
int main()
{
Foo Foo1(2);
std::cout << Foo1.Optimize(1) << endl; // Use orig void * arg in line 81 and it works
std::cout << Foo1.Optimize(2) << endl; // Use memcpy_s-ed new void * public member of Optimizer in line 85 and it doesn't work
}
Just to reiterate, I need to copy the params to a member of the optimizer because the optimizer will run throughout the lifetime of the Foo object. So the copy needs to exist as long as the Optimizer object exists, and not just within the scope of Foo::Optimize().
/clr support needs to be selected in the project properties for the code to compile. I am running on an x64 solution platform.
Update 1: While trying to debug this, I got suspicious of the way I get the size of paramsArg at line 109 (size_t paramsArgSize = sizeof(paramsArg);). It looks like I am getting the size of int cost_scaling plus the size of the memory storing the address of FooObj, instead of the size of the memory storing FooObj itself. I realized this after stumbling across this answer to another post. I confirmed it by checking the value of paramsArgSize after adding some new dummy double members to the Foo class. As expected, the value of paramsArgSize doesn't change. I suppose this explains why I get the error. A solution would be to write code that correctly calculates the size of a Foo class object and assign that to paramsArgSize instead of using sizeof. But that is turning out to be too complicated and is probably another question in itself. For example, how do you get the size of a ref class object? Anyway, hopefully someone will find this helpful.
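To make the size observation in Update 1 concrete, here is a rough sketch in plain native C++. It is only an analogy, since managed handles in C++/CLI behave differently from raw pointers, and NativeFoo, NativeParams and the helper functions are made-up names rather than anything from the question. It shows that sizeof applied to a pointer yields the pointer's size, that sizeof applied to the struct does not grow when the pointed-to type grows, and that a byte-wise copy duplicates the pointer, not the object behind it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical native stand-ins for the question's Foo and Params.
struct NativeFoo { int a; double extra[4]; };
struct NativeParams { int cost_scaling; NativeFoo *foo; };

// sizeof on a pointer variable: the size of the pointer itself.
std::size_t sizeOfPointer(NativeParams *p) { return sizeof(p); }

// sizeof on the pointed-to object: the int, the pointer member and padding.
std::size_t sizeOfPointee(NativeParams *p) { return sizeof(*p); }

// Shallow byte copy: the copy refers to the same NativeFoo as the source.
NativeParams copyParams(const NativeParams &src) {
    NativeParams dst;
    std::memcpy(&dst, &src, sizeof(src));
    return dst;
}

// Small self-check of the claims above.
void checkSizes() {
    NativeFoo foo{2, {}};
    NativeParams params{11, &foo};
    assert(sizeOfPointer(&params) == sizeof(NativeParams *));
    assert(sizeOfPointee(&params) == sizeof(NativeParams));
    // Growing NativeFoo would not change sizeof(NativeParams).
    assert(sizeof(NativeParams) < sizeof(int) + sizeof(NativeFoo) + sizeof(void *));
    NativeParams copy = copyParams(params);
    assert(copy.foo == &foo);  // same object, not a deep copy
}
```

Calling checkSizes() runs the checks. The point is that a correct byte count alone does not give the copy its own Foo; the copy still depends on the original object staying alive.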
I'm using the EEPROM on an Arduino to store a large constant array. I noticed that both EEPROM.read(address) and EEPROM[address] work for reading, but there is little documentation on the EEPROM[address] form. I also experienced occasional memory crashes with it.
I have not fully tested EEPROM.read(address) over long runs, and it takes more storage space when compiled. Is its behavior behind the scenes safer?
EEPROM.read(address) -> reads the EEPROM cell (addresses start from 0) and returns its value as an unsigned char.
EEPROM[address] -> references the EEPROM cell at that address.
To reduce the size of the compiled sketch, you can use the avr/eeprom library, which provides various functions and macros for EEPROM access. It is a reliable, well-tested library.
avr/eeprom.h
Sample Code
#include <EEPROM.h>
#include <avr/eeprom.h>
void Eepromclr();
void setup() {
Serial.begin(9600);
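eeprom_write_byte((uint8_t*)0, 12); // write 12 to EEPROM address 0
int x = eeprom_read_byte((uint8_t*)0); // read the value back
Serial.println(x);
Eepromclr(); // zero every cell
eeprom_update_byte((uint8_t*)0, 6); // writes only if the stored value differs
int y = eeprom_read_byte((uint8_t*)0);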
Serial.println(y);
}
void loop() {
}
void Eepromclr() {
for (int i = 0 ; i < EEPROM.length() ; i++) {
EEPROM.write(i, 0);
}
Serial.println("Eeprom is cleared");
}
EEPROM[address] gives you a reference to the EEPROM cell, while EEPROM.read(address) gives you the unsigned char value stored in that cell.
In both cases you should ensure that your address is valid:
make sure address is >= 0 and < EEPROM.length().
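The range check above could be wrapped in a small helper. The following is a minimal sketch in plain C++ so it can be tried off-device; isValidEepromAddress and guardedRead are hypothetical names, and the byte array only stands in for the real EEPROM. On an Arduino, the length argument would come from EEPROM.length().

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: true when `address` is a usable index into an
// EEPROM that has `length` cells.
bool isValidEepromAddress(long address, std::size_t length) {
    return address >= 0 && static_cast<std::size_t>(address) < length;
}

// Hypothetical guarded read over a plain byte array that stands in for the
// EEPROM. Returns `fallback` instead of touching an out-of-range cell.
unsigned char guardedRead(const unsigned char *cells, std::size_t length,
                          long address, unsigned char fallback) {
    assert(cells != nullptr);
    if (!isValidEepromAddress(address, length)) {
        return fallback;
    }
    return cells[address];
}
```

Doing the check in one place keeps the array-reading code free of repeated comparisons, whichever of the two access forms is used underneath.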
I need to force the Metal compiler to unroll a loop in my kernel compute function. So far I've tried to put #pragma unroll(num_times) before a for loop, but the compiler ignores that statement.
It seems that the compiler doesn't unroll loops automatically: I compared execution times for (1) the code with a for loop and (2) the same code with the loop unrolled by hand. The hand-unrolled version was 3 times faster.
E.g.: I want to go from this:
for (int i=0; i<3; i++) {
do_stuff();
}
to this:
do_stuff();
do_stuff();
do_stuff();
Is there even something like loop unrolling in the Metal C++ language? If yes, how can I possibly let the compiler know I want to unroll a loop?
Metal is a subset of C++11, and you can try using template metaprogramming to unroll loops. The following compiled in Metal, though I don't have time to properly test it:
template <unsigned N> struct unroll {
template<class F>
static void call(F f) {
f();
unroll<N-1>::call(f);
}
};
template <> struct unroll<0u> {
template<class F>
static void call(F f) {}
};
kernel void test() {
unroll<3>::call(do_stuff);
}
Please let me know if it works! You'll probably have to add some arguments to call to pass arguments to do_stuff.
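As one possible way to pass arguments, here is a sketch in standard C++ (not verified in Metal) where the recursion hands the iteration index to the loop body. Unroll, Accumulate and sumUnrolled are names invented for this example, and whether the calls actually get inlined is still up to the compiler's optimizer.

```cpp
#include <cassert>

// Compile-time recursion: Unroll<N>::call(f) expands to f(0); f(1); ... f(N-1);
template <unsigned N>
struct Unroll {
    template <class F>
    static void call(F &f) {
        Unroll<N - 1>::call(f);  // iterations 0 .. N-2 first
        f(N - 1);                // then iteration N-1
    }
};

// Base case: zero iterations, nothing to do.
template <>
struct Unroll<0u> {
    template <class F>
    static void call(F &) {}
};

// Hypothetical loop body that carries its own state and receives the index.
struct Accumulate {
    int sum;
    void operator()(unsigned i) { sum += static_cast<int>(i) * 10; }
};

int sumUnrolled() {
    Accumulate body{0};
    Unroll<3>::call(body);  // body(0); body(1); body(2);
    assert(body.sum == 0 + 10 + 20);
    return body.sum;
}
```

Passing the body by reference lets it accumulate state across iterations; a body that needs more inputs can hold them as members of the function object.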
See also: Self-unrolling macro loop in C/C++
Consider the following code which prints out the even numbers up to 20:
import std.stdio;
class count_to_ten{
static int opApply()(int delegate(ref int) dg) {
int i = 1;
int ret;
while(i <= 10){
ret = dg(i);
if(ret != 0) {
break;
}
i++;
}
return ret;
}
}
void main() {
int y = 2;
foreach(int x; count_to_ten) {
writeln(x * y);
}
}
The syntax of opApply requires that it take a delegate or function as a normal argument. However, even if we relaxed that and allowed opApply to take a function as a template argument, we still would have no recourse for delegates because D doesn't provide any way to separate the stack-frame pointer from the function pointer. However, this seems like it should be possible since the function-pointer part of the delegate is commonly a compile-time constant. And if we could do that and the body of the loop was short, then it could actually be inlined which might speed this code up quite a bit.
Is there any way to do this? Does the D compiler have some trick by which it happens automagically?
I am not able to run a pthreads program in C. Please tell me what is wrong with the following program. I get neither an error nor the expected output.
void *worker(void * arg)
{
int i;
int *id=(int *)arg;
printf("Thread %d starts\n", *id );
}
void main(int argc, char **argv)
{
int thrd_no,i,*thrd_id,rank=0;
void *exit_status;
pthread_t *threads;
thrd_no=atoi(argv[1]-1);
thrd_id= malloc(sizeof(int)*(thrd_no));
threads=malloc(sizeof(pthread_t)*(thrd_no));
for(i=0;i<thrd_no;i++)
{
rank=i+1;
thrd_id[i]=pthread_create(&threads[i], NULL, worker, &rank);
}
for(i=0;i<thrd_no;i++)
{
pthread_join(threads[i], &exit_status);
}
}
thrd_no = atoi(argv[1] - 1); likely doesn't do what you intended; the way argv is normally passed into a new process and parsed into a C array, argv[1] - 1 is probably pointing at \0 (specifically, the \0 at the end of argv[0]). (More generally, indexing backwards off the start of a string is rarely correct.) The result is that atoi() will return 0 and no threads will be created. What did you actually intend to do there?
You are passing the same address &rank to each thread, so id (and therefore *id) is the same for all your workers.
It would be better to allocate on the heap the address you pass to each worker routine.
You might also include <stdint.h> and use intptr_t, e.g.
void *worker (void* p)
{
intptr_t rk = (intptr_t) p;
/// etc
return NULL;
}
and call
intptr_t rank = i + 1;
thrd_id[i]=pthread_create(&threads[i], NULL, worker, (void*)rank);
You should learn to use a debugger and compile with all warnings and debug information, i.e. gcc -Wall -g (and improve your code till it gets no warnings, then use gdb)
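The code segment
rank=i+1;
thrd_id[i]=pthread_create(&threads[i], NULL, worker, &rank);
produces a race condition: every thread gets a pointer to the same variable rank, which the loop in main keeps modifying while the threads read it.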
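One way to avoid both problems is sketched below, written as C++ around the same pthread calls. WorkerArg, parseThreadCount and runWorkers are names made up for this sketch, and the upper bound of 1024 threads is an arbitrary assumption. Each thread gets its own argument slot, so nothing is shared with the loop variable, and the thread count is parsed from the whole string instead of from argv[1] - 1.

```cpp
#include <cassert>
#include <cstdlib>
#include <pthread.h>
#include <vector>

// Hypothetical per-thread record: one slot per thread.
struct WorkerArg {
    int id;    // rank assigned by the creating thread
    int seen;  // written by the worker: the id it observed
};

extern "C" void *worker(void *arg) {
    WorkerArg *wa = static_cast<WorkerArg *>(arg);
    wa->seen = wa->id;
    return nullptr;
}

// Parses a thread count such as argv[1]; returns 0 for missing or invalid
// input. The limit of 1024 is an assumption made for this sketch.
int parseThreadCount(const char *text) {
    if (text == nullptr) return 0;
    char *end = nullptr;
    long n = std::strtol(text, &end, 10);
    if (end == text || *end != '\0' || n < 1 || n > 1024) return 0;
    return static_cast<int>(n);
}

// Starts `count` threads, joins them, and returns the id each worker saw.
std::vector<int> runWorkers(int count) {
    if (count < 0) count = 0;
    std::vector<WorkerArg> args(count);
    std::vector<pthread_t> threads(count);
    for (int i = 0; i < count; ++i) {
        args[i].id = i + 1;
        args[i].seen = 0;
        int rc = pthread_create(&threads[i], nullptr, worker, &args[i]);
        assert(rc == 0);
        (void)rc;
    }
    std::vector<int> seen;
    for (int i = 0; i < count; ++i) {
        pthread_join(threads[i], nullptr);  // join makes the write visible
        seen.push_back(args[i].seen);
    }
    return seen;
}
```

For example, runWorkers(parseThreadCount(argv[1])) would start one thread per requested rank. The args vector is sized once before any thread starts, so the pointers handed to pthread_create stay valid until the joins finish.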