Dart - Bitshift for Uint

I am trying to bit-shift a Uint32 in Dart. How can I handle this if dart:ffi does not support it?
import 'dart:ffi';
Uint32 len = 0 as Uint32;
len >>= 1; // will not be compiled
I could try it with a "normal" int instead, but I am afraid the result will not always be the same as with Uint32. Will it?

The Uint32 type represents native values. There is no Dart value with that type, so your as Uint32 cast will fail.
If you have a Pointer<Uint32>, then you can use that to refer to the native value.
Pointer<Uint32> p = ...;
p.value >>= 1;
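On the follow-up question of whether a plain int gives the same result: assuming the native Dart VM, where int is a 64-bit signed integer, a right shift of a value that is already in the range 0..2^32-1 matches the unsigned 32-bit result, while a left shift needs masking with 0xFFFFFFFF. Here is a minimal sketch of that idea in C, with int64_t standing in for Dart's int. The names U32_MASK, u32_shr and u32_shl are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Keeps only the low 32 bits of a 64-bit value. */
#define U32_MASK INT64_C(0xFFFFFFFF)

/* Right shift of an unsigned 32-bit value stored in a 64-bit signed integer. */
static int64_t u32_shr(int64_t value, unsigned n)
{
    assert(n < 32u);
    return (value & U32_MASK) >> n;
}

/* Left shift; bits pushed above bit 31 are discarded by the mask. */
static int64_t u32_shl(int64_t value, unsigned n)
{
    assert(n < 32u);
    return ((value & U32_MASK) << n) & U32_MASK;
}
```

The same two expressions carry over to Dart: (value & 0xFFFFFFFF) >> n and ((value & 0xFFFFFFFF) << n) & 0xFFFFFFFF.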


Can Lua send extended function keys, e.g. F13-F24?

I tried sending F13 with kb.stroke("F13");
It doesn't work, although it works fine with F12 and below.
I'm trying to use this in a custom remote in the Unified Remote app, so my only workaround for now is using os.start to run an AHK script that sends the key, but that is a very slow approach.
Any help will be appreciated.
local ffi = require"ffi"
ffi.cdef[[
typedef struct {
   uintptr_t type;
   uint16_t wVk;
   uint16_t wScan;
   uint32_t dwFlags;
   uint32_t time;
   uintptr_t dwExtraInfo;
   uint32_t x[2];
} INP;
int SendInput(int, void*, int);
]]
local inp_t = ffi.typeof"INP[2]"
local function PressAndReleaseKey(vkey)
   local inp = inp_t()
   for j = 0, 1 do
      inp[j].type = 1          -- keyboard input
      inp[j].wVk = vkey        -- virtual-key code
      inp[j].dwFlags = j * 2   -- 0 = key down, 2 = key up
   end
   ffi.C.SendInput(2, inp, ffi.sizeof"INP")
end
PressAndReleaseKey(0x57) -- W
PressAndReleaseKey(0x7C) -- F13
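The value 0x7C used above is the Windows virtual-key code for F13. The function keys are numbered consecutively from F1 = 0x70, so F24 is 0x87. Here is a small sketch in C of that mapping. The names VK_F1_CODE and vk_for_function_key are hypothetical helpers, not part of any API.

```c
#include <assert.h>

/* Virtual-key code of F1 in the Windows API (VK_F1). */
#define VK_F1_CODE 0x70

/* Returns the virtual-key code for F<n>, where n is in 1..24. */
static int vk_for_function_key(int n)
{
    assert(n >= 1 && n <= 24);
    return VK_F1_CODE + (n - 1);
}
```

The result for n = 13..24 can be passed to PressAndReleaseKey in the Lua code to send the remaining extended keys.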

vala: quotient of two integers is always an integer. Why?

Newbie question:
void main () {
    int A = 1;
    int B = 2;
    double C = A / B;
    stdout.printf("C value is: %g\n", C);
}
This prints: "C value is: 0"
void main () {
    int A = 1;
    double B = 2;
    double C = A / B;
    stdout.printf("C value is: %g\n", C);
}
This prints: "C value is: 0.5"
I don't understand the reason why the result is not 0.5 in both cases.
The division operation is performed on two integers, so the result is an integer. The fact that you assign it to a double afterwards doesn't change that.
What you're doing in your question, with the implicit conversions made explicit, is
int A = 1;
int B = 2;
double C = (double) (A / B);
However, if you want to perform the division operation using doubles you have to explicitly cast at least one of the operands to double:
int A = 1;
int B = 2;
double C = ((double) A) / B;
For the rules concerning arithmetic operations, see the arithmetic expressions section of the Vala Manual. The relevant bit:
If both operands are of integer types, then the result will be the quotient only of the calculation (equivalent to the precise answer rounded down to an integer value.) If either operand is of a floating point type, then the result will be as precise as possible within the boundaries of the result type (which is worked out from the basic arithmetic type rules.)
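Vala compiles to C, and C shows the same behaviour for int operands, so the two cases can be sketched there as well. The function names below are made up for illustration. Note that C truncates toward zero, which is the same as rounding down for the positive values used here.

```c
#include <assert.h>

/* Both operands are int, so the division is done in integer arithmetic;
   only the already truncated quotient is converted to double. */
static double divide_as_int(int a, int b)
{
    assert(b != 0);
    return a / b;
}

/* Casting one operand first makes the whole division happen in double. */
static double divide_as_double(int a, int b)
{
    assert(b != 0);
    return (double) a / b;
}
```

With a = 1 and b = 2 the first function returns 0 and the second returns 0.5, matching the two programs in the question.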

C to Delphi method conversion of OpenCV sample code

I have a code snippet from an OpenCV example as follows:
CvScalar sum_line_pixels( IplImage* image, CvPoint pt1, CvPoint pt2 )
{
    CvLineIterator iterator;
    int blue_sum = 0, green_sum = 0, red_sum = 0;
    int count = cvInitLineIterator( image, pt1, pt2, &iterator, 8, 0 );
    for( int i = 0; i < count; i++ ){
        blue_sum += iterator.ptr[0];
        green_sum += iterator.ptr[1];
        red_sum += iterator.ptr[2];
        /* print the pixel coordinates: demonstrates how to calculate the
           coordinates */
        int offset, x, y;
        /* assume that ROI is not set, otherwise need to take it
           into account. */
        offset = iterator.ptr - (uchar*)(image->imageData);
        y = offset/image->widthStep;
        x = (offset - y*image->widthStep)/(3*sizeof(uchar) /* size of pixel */);
        printf("(%d,%d)\n", x, y );
        /* advance the iterator to the next pixel on the line */
        CV_NEXT_LINE_POINT(iterator);
    }
    return cvScalar( blue_sum, green_sum, red_sum );
}
I got stuck on the line:
offset = iterator.ptr - (uchar*)(image->imageData);
Iterator structure is:
PCvLineIterator = ^TCvLineIterator;
TCvLineIterator = packed record
  ptr: ^UCHAR;
  err: Integer;
  plus_delta: Integer;
  minus_delta: Integer;
  plus_step: Integer;
  minus_step: Integer;
end;
image->imageData is
imageData: PByte;
Could someone help me convert the offset line to Delphi?
The line that calculates offset is simply calculating the number of bytes between the pointers iterator.ptr and image->imageData. Assuming you are using the same variable names, a Delphi version of that code would look like this:
offset := PByte(iterator.ptr) - image.ImageData;
However, since you are using an older version of Delphi, the above code will not compile. Older Delphi versions (pre Delphi 2009) don't permit pointer arithmetic on types other than PAnsiChar. So you will need to write it like this:
offset := PAnsiChar(iterator.ptr) - PAnsiChar(image.ImageData);
I suspect that what is confusing you in the C code is (uchar*). That is the C syntax for a type cast.
As an aside, it is a mistake to use packed records for OpenCV structs. If you take a look at the C header files you will see that these structs are not packed. This is benign in the case of CvLineIterator since it has no padding, but you will get caught out somewhere down the line if you get into the bad habit of packing structs that should not be packed.
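To make the offset arithmetic concrete, here is a minimal standalone sketch in C that does not depend on OpenCV. The function pixel_coords and its parameters are hypothetical: width_step plays the role of image->widthStep and channels the role of the 3 bytes per pixel in the sample.

```c
#include <assert.h>
#include <stddef.h>

/* Converts a pointer into an interleaved image buffer to pixel coordinates.
   width_step is the number of bytes per image row, channels the number of
   bytes per pixel. */
static void pixel_coords(const unsigned char *image_data,
                         const unsigned char *ptr,
                         int width_step, int channels,
                         int *x, int *y)
{
    assert(width_step > 0 && channels > 0 && ptr >= image_data);
    ptrdiff_t offset = ptr - image_data;      /* distance in bytes */
    ptrdiff_t row = offset / width_step;      /* full rows before ptr */
    *y = (int) row;
    *x = (int) ((offset - row * width_step) / channels);
}
```

The pointer subtraction on the first line of the body is exactly what the Delphi expression PAnsiChar(iterator.ptr) - PAnsiChar(image.ImageData) computes.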

Why is calling my C code from F# very slow (compared to native)?

So I wrote some numerical code in C but wanted to call it from F#. However it runs incredibly slowly.
gcc -O3 : 4 seconds
gcc -O0 : 30 seconds
fsharp code which calls the optimised gcc code: 2 minutes 30 seconds.
For reference, the c code is
int main(int argc, char** argv)
{
    float* dmats = malloc(sizeof(float) * factor*factor);
    MakeDmat(1.4,-1.92,dmats); //dmat appears to be correct
    float* arr1 = malloc(sizeof(float)*xsize*ysize);
    float* arr2 = malloc(sizeof(float)*xsize*ysize);
    for (int i = 0;i < 10000;i++)
    {
        if (i==9999) {print(arr1,xsize,ysize);};
    }
    return 0;
}
I left out the implementation of the functions. The F# code I am using is
open System.Runtime.InteropServices
open Microsoft.FSharp.NativeInterop
[<DllImport("a.dll")>] extern void main (int argc, char* argv)
[<DllImport("a.dll")>] extern void setvals (int _xsize, int _ysize, int _distlimit,float _tau,float _Iex)
[<DllImport("a.dll")>] extern void MakeDmat(float We,float Wi, float*arr)
[<DllImport("a.dll")>] extern void randinit(float* arr)
[<DllImport("a.dll")>] extern void print(float* arr)
[<DllImport("a.dll")>] extern void evolve (float* input, float* output,float* connections)
let dlimit,xsize,ysize = 15,100,100
let factor = (2*dlimit)+1
let dmat = Array.zeroCreate (factor*factor)
let arr1 = Array.zeroCreate (xsize*ysize)
let arr2 = Array.zeroCreate (xsize*ysize)
let addr1 = &&arr1.[0]
let addr2 = &&arr2.[0]
let dmataddr = &&dmat.[0]
[0..10000] |> List.iter (fun _ ->
The F# code is compiled with optimisations on.
Is the Mono interface for calling C code really that slow (almost 8 ms of overhead per function call), or am I just doing something stupid?
It looks like part of the problem is that you are using float on both the F# and C side of the PInvoke signature. In F# float is really System.Double and hence is 8 bytes. In C a float is generally 4 bytes.
If this were running under the CLR I would expect you to see a PInvoke stack unbalanced error during debugging. I'm not sure if Mono has similar checks or not. But it's possible this is related to the problem you're seeing.
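To illustrate the size mismatch, here is a small sketch in C of what a callee expecting float* sees when it is handed a buffer that actually holds doubles. The function name is made up for illustration. On the F# side, the 4-byte type that matches a C float is float32 (System.Single).

```c
#include <assert.h>
#include <string.h>

/* Reinterprets the 8 bytes of one double as two 4-byte floats, which is
   what C code reading through a float* sees in a buffer of doubles. */
static void double_bytes_as_floats(double value, float out[2])
{
    assert(sizeof(double) == 2 * sizeof(float));
    memcpy(out, &value, sizeof value);
}
```

Passing 1.0 does not give 1.0f in either half, so the C code would be computing on values that have nothing to do with what the F# side stored.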

How to solve CUDA Thrust library - for_each synchronization error?

I'm trying to modify a simple dynamic vector in CUDA using the Thrust library. But I'm getting a "launch_closure_by_value" error on the screen, indicating that the error is related to some synchronization process.
Even a simple 1D dynamic array modification is not possible because of this error.
The code segment that causes the error is as follows.
From a .cpp file I call setIndexedGrid, which is defined in System.cu:
float* a= (float*)(malloc(8*sizeof(float)));
a[0]= 0; a[1]= 1; a[2]= 2; a[3]= 3; a[4]= 4; a[5]= 5; a[6]= 6; a[7]= 7;
float* b = (float*)(malloc(8*sizeof(float)));
The code segment at System.cu:
void setIndexedGridInfo(float* a, float*b)
{
    thrust::device_ptr<float> d_oldData(a);
    thrust::device_ptr<float> d_newData(b);
    float c = 0.0;
}
grid_functor is defined in _kernel.cu
struct grid_functor
{
    float a;
    __host__ __device__
    grid_functor(float grid_Info) : a(grid_Info) {}
    template <typename Tuple>
    void operator()(Tuple t)
    {
        volatile float data = thrust::get<0>(t);
        float pos = data + 0.1;
        thrust::get<1>(t) = pos;
    }
};
I also get these on the Output window (I use Visual Studio):
First-chance exception at 0x000007fefdc7cacd in Particles.exe:
Microsoft C++ exception: cudaError_enum at memory location
0x0029eb60.. First-chance exception at 0x000007fefdc7cacd in
smokeParticles.exe: Microsoft C++ exception:
thrust::system::system_error at memory location 0x0029ecf0.. Unhandled
exception at 0x000007fefdc7cacd in Particles.exe: Microsoft C++
exception: thrust::system::system_error at memory location
What is causing the problem?
You are trying to use host memory pointers in functions expecting pointers in device memory. This code is the problem:
float* a= (float*)(malloc(8*sizeof(float)));
a[0]= 0; a[1]= 1; a[2]= 2; a[3]= 3; a[4]= 4; a[5]= 5; a[6]= 6; a[7]= 7;
float* b = (float*)(malloc(8*sizeof(float)));
thrust::device_ptr<float> d_oldData(a);
thrust::device_ptr<float> d_newData(b);
The thrust::device_ptr is intended for "wrapping" a device memory pointer allocated with the CUDA API so that thrust can use it. You are trying to treat a host pointer directly as a device pointer. That is illegal. You could modify your setIndexedGridInfo function like this:
void setIndexedGridInfo(float* a, float*b, const int n)
{
    thrust::device_vector<float> d_oldData(a,a+n);
    thrust::device_vector<float> d_newData(b,b+n);
    float c = 0.0;
}
The device_vector constructor will allocate device memory and then copy the contents of your host memory to the device. That should fix the error you are seeing, although I am not sure what you are trying to do with the for_each iterator and whether the functor you have written is correct.
Here is a complete, compilable, runnable version of your code:
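#include <cstdlib>
#include <cstdio>
#include <thrust/device_vector.h>
#include <thrust/for_each.h>
#include <thrust/copy.h>
#include <thrust/tuple.h>
#include <thrust/iterator/zip_iterator.h>

struct grid_functor
{
    float a;
    __host__ __device__
    grid_functor(float grid_Info) : a(grid_Info) {}
    template <typename Tuple>
    __host__ __device__
    void operator()(Tuple t)
    {
        volatile float data = thrust::get<0>(t);
        float pos = data + 0.1f;
        thrust::get<1>(t) = pos;
    }
};

void setIndexedGridInfo(float* a, float*b, const int n)
{
    // Allocate device memory and copy the host data into it
    thrust::device_vector<float> d_oldData(a,a+n);
    thrust::device_vector<float> d_newData(b,b+n);
    float c = 0.0;
    // Apply the functor to every (old, new) pair
    thrust::for_each(
        thrust::make_zip_iterator(thrust::make_tuple(d_oldData.begin(), d_newData.begin())),
        thrust::make_zip_iterator(thrust::make_tuple(d_oldData.end(), d_newData.end())),
        grid_functor(c));
    // Copy the result back to the host
    thrust::copy(d_newData.begin(), d_newData.end(), b);
}

int main(void)
{
    const int n = 8;
    float* a= (float*)(malloc(n*sizeof(float)));
    a[0]= 0; a[1]= 1; a[2]= 2; a[3]= 3; a[4]= 4; a[5]= 5; a[6]= 6; a[7]= 7;
    float* b = (float*)(malloc(n*sizeof(float)));
    setIndexedGridInfo(a, b, n);
    for(int i=0; i<n; i++) {
        fprintf(stdout, "%d (%f,%f)\n", i, a[i], b[i]);
    }
    return 0;
}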
I can compile and run this code on an OS X 10.6.8 host with CUDA 4.1 like this:
$ nvcc -Xptxas="-v" -arch=sm_12 -g -G thrustforeach.cu
./thrustforeach.cu(18): Warning: Cannot tell what pointer points to, assuming global memory space
./thrustforeach.cu(20): Warning: Cannot tell what pointer points to, assuming global memory space
./thrustforeach.cu(18): Warning: Cannot tell what pointer points to, assuming global memory space
./thrustforeach.cu(20): Warning: Cannot tell what pointer points to, assuming global memory space
ptxas info : Compiling entry function '_ZN6thrust6detail7backend4cuda6detail23launch_closure_by_valueINS2_18for_each_n_closureINS_12zip_iteratorINS_5tupleINS0_15normal_iteratorINS_10device_ptrIfEEEESB_NS_9null_typeESC_SC_SC_SC_SC_SC_SC_EEEEi12grid_functorEEEEvT_' for 'sm_12'
ptxas info : Used 14 registers, 160+0 bytes lmem, 16+16 bytes smem, 4 bytes cmem[1]
ptxas info : Compiling entry function '_ZN6thrust6detail7backend4cuda6detail23launch_closure_by_valueINS2_18for_each_n_closureINS_12zip_iteratorINS_5tupleINS0_15normal_iteratorINS_10device_ptrIfEEEESB_NS_9null_typeESC_SC_SC_SC_SC_SC_SC_EEEEj12grid_functorEEEEvT_' for 'sm_12'
ptxas info : Used 14 registers, 160+0 bytes lmem, 16+16 bytes smem, 4 bytes cmem[1]
$ ./a.out
0 (0.000000,0.100000)
1 (1.000000,1.100000)
2 (2.000000,2.100000)
3 (3.000000,3.100000)
4 (4.000000,4.100000)
5 (5.000000,5.100000)
6 (6.000000,6.100000)
7 (7.000000,7.100000)
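The second column of the output is simply the first column plus 0.1. If you want to check the device result on the host, a plain loop like the following sketch gives the reference values. The function name add_offset_host is hypothetical, and this runs entirely on the CPU without Thrust.

```c
#include <assert.h>
#include <stddef.h>

/* Host-side equivalent of applying grid_functor to each (input, output)
   pair: every output element is the input element plus 0.1f. */
static void add_offset_host(const float *in, float *out, size_t n)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] + 0.1f;
}
```

Comparing its output with the array copied back by thrust::copy, within a small tolerance, confirms that the functor ran on every element.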
