Pointers and dynamic memory allocation for bidimensional arrays: a sample

Well, here is a full sample that works, but the console vanishes right after the last print and I can't make it stay. There are also a few queries that I have included as comments on some lines.
// bidimensional array dynamic memory allocation
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
    int **p; // pointer to pointer
    int n, m, i, j, k; // n is rows, m is cols, i and j are the indexes of the array, k is like j, but used to print out
    do {
        printf("\n how many rows?");
        scanf("%d", &n);
    } while (n <= 0);
    // booking memory for an array of n elements, each element is a pointer to an int (int *)
    // Query: a pointer to an int? Wouldn't it be a pointer to a pointer? It uses **
    p = (int **) malloc(n * sizeof(int *));
    if (p == NULL) {
        printf("Insufficient memory space");
        exit(-1);
    }
    for (i = 0; i < n; i++) { // now let's tell each row how many cols it is going to have
        printf("\n\nNumber of cols of the row%d :", i+1); // for each row it can be different
        scanf("%d", &m); // tell how many cols
        p[i] = (int *) malloc(m * sizeof(int)); // we allocate a number of bytes equal to the datatype size times the number of cols per row
        // Query: I can't grasp p[i]: if p is a pointer to a pointer, what does the array notation (the square brackets) mean?
        if (p[i] == NULL) {
            printf("Insufficient memory space");
            exit(-1);
        }
        for (j = 0; j < m; j++) {
            printf("Element[%d][%d]:", i+1, j+1);
            scanf("%d", &p[i][j]); // reading through array notation
        }
        printf("\n elements of row %d:\n", i+1);
        for (k = 0; k < m; k++)
            // printing out array elements through pointer notation
            printf("%d ", *(*(p+i)+k));
    }
    // freeing up memory assigned for each row
    for (i = 0; i < n; i++)
        free(p[i]);
    free(p); // freeing up memory for the pointers matrix
    getchar(); // it cannot stop the console from vanishing
    fflush(stdin); // neither does this
    return 0;
}
// ********thanks a lot******

It's easy to understand pointers in the context of arrays.
If int *p can be used as a one-dimensional array of int, then int **p can be used as a two-dimensional array of int. In other words, it is an array that contains pointers to one-dimensional arrays.
So p = (int **) malloc (n * sizeof(int *)); makes p a pointer to a pointer: it points to the first of n elements of type int *,
and p[i] is the i-th of those elements, that is, a pointer to the ints of row i.
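To make the p[i] question concrete, here is a minimal C sketch of the same row-by-row allocation. The helper names alloc_rows and free_rows are hypothetical and not part of the original program, and the rows are zero-filled with calloc purely for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: allocate n rows, row i holding cols[i] ints (zero-filled).
   Returns NULL if any allocation fails. */
int **alloc_rows(size_t n, const size_t *cols)
{
    assert(n > 0 && cols != NULL);
    int **p = malloc(n * sizeof *p);            /* n elements of type int * */
    if (p == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++) {
        assert(cols[i] > 0);
        p[i] = calloc(cols[i], sizeof **p);     /* cols[i] elements of type int */
        if (p[i] == NULL) {                     /* undo the rows allocated so far */
            while (i > 0)
                free(p[--i]);
            free(p);
            return NULL;
        }
    }
    return p;
}

/* Hypothetical helper: release every row, then the array of row pointers. */
void free_rows(int **p, size_t n)
{
    for (size_t i = 0; i < n; i++)
        free(p[i]);
    free(p);
}
```

In C, p[i] is defined as *(p + i), so p[1][2] and *(*(p + 1) + 2) name the same int. The square brackets work on any pointer, not only on declared arrays.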


Two coloring Breadth-First Search

In a standard BFS implementation, a node can be one of three colors to represent if it is undiscovered, discovered but incomplete, or discovered and completed. Is there a way to implement BFS using only two colors instead of three?
Yes, you can represent it with only two colors. In fact, in probably 99% of problems you don't need a third color. You only need to answer one question: has node X been put in the queue or not?
To answer that question we need an array. Let's call it visited.
The values in this array are either 0 or 1.
visited[X] = 1 if node X is in the queue (waiting to be processed) or has been in the queue (it is currently being processed, or it has already been processed and we are done with it)
visited[X] = 0 if node X has not yet been in the queue
Here is the code:
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <queue>
using namespace std;
const int N = 10003;
char visited[N];
queue<int> q;
vector<int> ls[N];
int main() {
    int n, m; scanf("%d%d", &n, &m); // number of vertices and number of edges
    for(int i = 0; i < m; ++i) {
        int x, y; scanf("%d%d", &x, &y);
        ls[x].push_back(y); // store the edge in both adjacency lists (undirected graph assumed)
        ls[y].push_back(x);
    }
    int num_of_components = 0;
    for(int i = 1; i <= n; ++i) // iterating through all nodes
        if(!visited[i]) { // if we didn't visit node i, then this is a new component
            visited[i] = 1; // mark node as visited
            q.push(i); // push node to queue
            ++num_of_components; // new component was found, so add one
            while(!q.empty()) {
                int x = q.front();
                q.pop(); // remove the node we are about to process
                int sz = ls[x].size();
                for(int j = 0; j < sz; ++j) {
                    int y = ls[x][j];
                    if(!visited[y]) {
                        visited[y] = 1;
                        q.push(y); // each node enters the queue at most once
                    }
                }
            }
        }
    printf("%d\n", num_of_components);
    return 0;
}

Storing functions in an array and applying them to an array of numbers

I've prototyped an algorithm for my iOS game in Python, and I need to rewrite it in ObjC. Basically, I have a board of 16 numbers, and I want to loop through every number three times, combined with the four functions I'm using (add, subtract, multiply, exponentiate): 1+2+3, 2*3-4, 3^4-5, 9-4^3, etc., but without order of operations (the first operation is always done first).
What I would like is an overview of how this might be implemented in Objective-C. Specifically, what is the equivalent of an array of functions in Objective-C? Is there an easy way to implement it with selectors? What's the best structure to use for loops with numbers? Array of NSIntegers, array of ints, NSArray/NSMutableArray of NSNumbers?
import random as rand
min = 0
max = 9
max_target = 20
maximum_to_calculate = 100
def multiply(x, y):
    return x * y
def exponate(x, y):
    return x ** y
def add(x, y):
    return x + y
def subtract(x, y):
    return x - y
function_array = [multiply, exponate, add, subtract]
board = [rand.randint(min, max) for i in range(0, 16)]
dict_of_frequencies = {}
for a in board:
    for b in board:
        for first_fun in function_array:
            first_result = first_fun(a, b)
            for c in board:
                for second_fun in function_array:
                    final_result = second_fun(first_result, c)
                    if final_result not in dict_of_frequencies:
                        dict_of_frequencies[final_result] = 0
                    dict_of_frequencies[final_result] += 1
The most convenient way in Objective-C to construct an array of functions would be to use Blocks:
typedef NSInteger (^ArithmeticBlock)(NSInteger, NSInteger);
ArithmeticBlock add = ^NSInteger (NSInteger x, NSInteger y){
    return x + y;
};
ArithmeticBlock sub = ^NSInteger (NSInteger x, NSInteger y){
    return x - y;
};
NSArray * operations = @[add, sub];
Since there's no great way to perform arithmetic on NSNumbers, it would probably be best to create and store the board's values as primitives, such as NSIntegers, in a plain C array. You can box them up later easily enough, if necessary: @(boardValue) gives you an NSNumber.
If you want to do it with straight C function pointers, something like this will do it:
#include <stdio.h>
#include <math.h>
long add(int a, int b) {
    return a + b;
}
long subtract(int a, int b) {
    return a - b;
}
long multiply(int a, int b) {
    return a * b;
}
long exponate(int a, int b) {
    return pow(a, b);
}
int main(void) {
    long (*mfunc[4])(int, int) = {add, subtract, multiply, exponate};
    char ops[4] = {'+', '-', '*', '^'};
    for ( int i = 0; i < 4; ++i ) {
        printf("5 %c 9 = %ld\n", ops[i], mfunc[i](5, 9));
    }
    return 0;
}
and gives the output:
paul@MacBook:~/Documents/src$ ./rndfnc
5 + 9 = 14
5 - 9 = -4
5 * 9 = 45
5 ^ 9 = 1953125
Function pointer syntax can be slightly convoluted. long (*mfunc[4])(int, int) basically translates to defining a four-element array, called mfunc, of pointers to functions returning long and taking two arguments of type int.
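A typedef can make the declaration easier to read. The following is a minimal C sketch, assuming long operands throughout; the names binop, op_add, op_sub, op_mul, op_pow, ops_table and apply_chain are hypothetical and only illustrate the "first operation is always done first" rule from the question.

```c
#include <assert.h>
#include <stddef.h>

/* binop: pointer to a function taking two longs and returning a long. */
typedef long (*binop)(long, long);

long op_add(long a, long b) { return a + b; }
long op_sub(long a, long b) { return a - b; }
long op_mul(long a, long b) { return a * b; }

/* Integer power by repeated multiplication; b must not be negative. */
long op_pow(long a, long b)
{
    assert(b >= 0);
    long r = 1;
    while (b-- > 0)
        r *= a;
    return r;
}

/* Same idea as mfunc, but the element type is spelled once in the typedef. */
const binop ops_table[] = { op_add, op_sub, op_mul, op_pow };

/* Evaluate (a first b) second c, ignoring the usual operator precedence. */
long apply_chain(binop first, binop second, long a, long b, long c)
{
    assert(first != NULL && second != NULL);
    return second(first(a, b), c);
}
```

With this, apply_chain(op_pow, op_sub, 3, 4, 5) evaluates 3^4 first and then subtracts 5. Looping over ops_table twice and over the board three times would reproduce the nesting of the Python prototype.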
Maddy is right. Anyway, I'll give it a try just for the fun of it.
This has never seen a compiler, so please forgive any typos and minor syntax errors in advance.
#import <Foundation/Foundation.h>
#include <math.h>
#include <stdlib.h>
const int MIN = 0;
const int MAX = 9;
const int MAX_TARGET = 20;
const int MAX_TO_CALCULATE = 100;
// performSelector: passes and returns objects, so the methods work on NSNumbers
- (NSNumber *) multiply:(NSNumber *)x with:(NSNumber *)y { return @([x intValue] * [y intValue]); }
- (NSNumber *) exponate:(NSNumber *)x with:(NSNumber *)y { return @((int)pow([x intValue], [y intValue])); } // ^ would be XOR in C
- (NSNumber *) add:(NSNumber *)x to:(NSNumber *)y { return @([x intValue] + [y intValue]); }
- (NSNumber *) substract:(NSNumber *)x by:(NSNumber *)y { return @([x intValue] - [y intValue]); }
// some method should start here, probably with
-(void) someMethod {
    // A SEL is not an object, so the selectors are stored by name. There are other ways of generating an array of objects.
    NSArray *functionArray = @[NSStringFromSelector(@selector(multiply:with:)), NSStringFromSelector(@selector(exponate:with:)), NSStringFromSelector(@selector(add:to:)), NSStringFromSelector(@selector(substract:by:))];
    NSMutableArray *board = [NSMutableArray arrayWithCapacity:16]; // Again, there are other ways available.
    for (int i = 0; i < 16; i++) {
        [board addObject:@(arc4random_uniform(MAX - MIN + 1) + MIN)]; // MIN..MAX inclusive, like randint
    }
    NSMutableDictionary *dictOfFrequencies = [[NSMutableDictionary alloc] init];
    for (NSNumber *a in board)
        for (NSNumber *b in board)
            for (NSString *name in functionArray) {
                SEL firstFun = NSSelectorFromString(name);
                NSNumber *firstResult = [self performSelector:firstFun withObject:a withObject:b];
                NSNumber *countedResults = [dictOfFrequencies objectForKey:firstResult];
                if (countedResults) {
                    [dictOfFrequencies removeObjectForKey:firstResult];
                    countedResults = @(1 + [countedResults intValue]);
                } else {
                    countedResults = @1; // BTW, @ followed by a numeric expression creates an NSNumber object; here it has the value 1.
                }
                [dictOfFrequencies setObject:countedResults forKey:firstResult];
            }
}
Well, let me add some comments before others do. :-)
There is no need for Objective-C. Your Python code is iterative, so you can implement it in plain C. Plain C is available wherever Objective-C is.
If you really want to go for Objective-C here, then you should forget your Python code and implement the same logic (aiming for the same result) in Objective-C in an OOP style. My code tries to translate your code as closely as possible. Therefore it is far from being good style, maintainable, or proper OOP. Just keep that in mind before you conclude that ObjC is complicated compared to Python :-)

cudaFree is not freeing memory

The code below calculates the dot product of two vectors a and b. The correct result is 8192. When I run it for the first time the result is correct. Then when I run it for the second time the result is the previous result + 8192 and so on:
1st iteration: result = 8192
2nd iteration: result = 8192 + 8192
3rd iteration: result = 8192 + 8192 + 8192
and so on.
I checked by printing it on screen, and the device variable dev_c is not freed. What's more, writing to it behaves like a sum: the result is the previous value plus the new one being written to it. I guess that could have something to do with the atomicAdd() operation, but nonetheless cudaFree(dev_c) should erase it after all.
#define N 8192
#define THREADS_PER_BLOCK 512 // definition not shown in the post; 512 is an assumed value
#include <stdio.h>
#include <stdlib.h>
__global__ void dot( int *a, int *b, int *c ) {
    __shared__ int temp[THREADS_PER_BLOCK];
    int index = threadIdx.x + blockIdx.x * blockDim.x;
    temp[threadIdx.x] = a[index] * b[index];
    __syncthreads(); // every product must be stored before thread 0 sums them
    if( 0 == threadIdx.x ) {
        int sum = 0;
        for( int i= 0; i< THREADS_PER_BLOCK; i++ ){
            sum += temp[i];
        }
        atomicAdd( c, sum ); // adds to whatever value *c already holds
    }
}
int main( void ) {
    int *a, *b, *c;
    int *dev_a, *dev_b, *dev_c;
    int size = N * sizeof( int);
    cudaMalloc( (void**)&dev_a, size );
    cudaMalloc( (void**)&dev_b, size );
    cudaMalloc( (void**)&dev_c, sizeof(int));
    a = (int*)malloc(size);
    b = (int*)malloc(size);
    c = (int*)malloc(sizeof(int));
    for(int i = 0 ; i < N ; i++){
        a[i] = 1;
        b[i] = 1;
    }
    cudaMemcpy( dev_a, a, size, cudaMemcpyHostToDevice);
    cudaMemcpy( dev_b, b, size, cudaMemcpyHostToDevice);
    dot<<< N/THREADS_PER_BLOCK,THREADS_PER_BLOCK>>>( dev_a, dev_b, dev_c);
    cudaMemcpy( c, dev_c, sizeof(int) , cudaMemcpyDeviceToHost);
    printf("Dot product = %d\n", *c);
    free(a); free(b); free(c);
    cudaFree(dev_a);
    cudaFree(dev_b);
    cudaFree(dev_c);
    return 0;
}
cudaFree doesn't erase anything, it simply returns memory to a pool to be re-allocated. cudaMalloc doesn't guarantee the value of memory that has been allocated. You need to initialize memory (both global and shared) that your program uses, in order to have consistent results. The same is true for malloc and free, by the way.
From the documentation of cudaMalloc():
The memory is not cleared.
That means that dev_c is not initialized, and your atomicAdd(c,sum); will add to whatever value happens to be stored in memory at the returned position. Zero it before the kernel launch, for example with cudaMemset(dev_c, 0, sizeof(int)).
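The remark that the same holds for malloc and free can be illustrated on the host side. This is a minimal C sketch, not the CUDA fix itself: dot_host and its out parameter are hypothetical names, and the accumulator is heap-allocated only to mirror the role of dev_c.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical host-side analogue of accumulating into dev_c.
   Returns 0 on success, -1 if the accumulator cannot be allocated. */
int dot_host(const int *a, const int *b, size_t n, long *out)
{
    assert(a != NULL && b != NULL && out != NULL);
    long *acc = malloc(sizeof *acc);   /* malloc leaves the contents indeterminate */
    if (acc == NULL)
        return -1;
    *acc = 0;                          /* without this line the sum starts from garbage */
    for (size_t i = 0; i < n; i++)
        *acc += (long)a[i] * b[i];
    *out = *acc;
    free(acc);                         /* hands the block back to the allocator, erases nothing */
    return 0;
}
```

The explicit *acc = 0 plays the same role here that initializing dev_c plays on the device: an accumulator has to start from a known value, because neither allocation nor deallocation sets one.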

Cuda-memcheck not reporting out of bounds shared memory access

I am running the following code using shared memory:
#include <stdio.h>
#include <stdlib.h>
__global__ void computeAddShared(int *in , int *out, int sizeInput){
    //not made parameters gidata and godata to emphasize that parameters get copy of address and are different from pointers in host code
    extern __shared__ float temp[];
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    int ltid = threadIdx.x;
    temp[ltid] = 0;
    while(tid < sizeInput){
        temp[ltid] += in[tid];
        tid+=gridDim.x * blockDim.x; // to handle array of any size
    }
    __syncthreads();
    int offset = 1;
    while(offset < blockDim.x){
        if(ltid % (offset * 2) == 0){
            temp[ltid] = temp[ltid] + temp[ltid + offset];
        }
        __syncthreads();
        offset*=2;
    }
    if(ltid == 0){
        out[blockIdx.x] = temp[0];
    }
}
int main(){
    int size = 16; // size of present input array. Changes after every loop iteration
    int cidata[] = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16};
    /*FILE *f;
    f = fopen("invertedList.txt" , "w");
    a[0] = 1 + (rand() % 8);
    fprintf(f, "%d,",a[0]);
    for( int i = 1 ; i< N; i++){
        a[i] = a[i-1] + (rand() % 8) + 1;
        fprintf(f, "%d,",a[i]);
    }*/
    int* gidata;
    int* godata;
    cudaMalloc((void**)&gidata, size* sizeof(int));
    cudaMemcpy(gidata,cidata, size * sizeof(int), cudaMemcpyHostToDevice);
    int TPB = 4;
    int blocks = 10; //to get things kicked off
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start, 0);
    while(blocks != 1 ){
        if(size < TPB){
            TPB = size; // size is 2^sth
        }
        blocks = (size+ TPB -1 ) / TPB;
        cudaMalloc((void**)&godata, blocks * sizeof(int));
        computeAddShared<<<blocks, TPB,TPB>>>(gidata, godata,size); // third argument: TPB bytes of shared memory
        gidata = godata;
        size = blocks;
    }
    //printf("The error by cuda is %s",cudaGetErrorString(cudaGetLastError()));
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);
    float elapsedTime;
    cudaEventElapsedTime(&elapsedTime , start, stop);
    printf("time is %f ms", elapsedTime);
    int *output = (int*)malloc(sizeof(int));
    cudaMemcpy(output, gidata, sizeof(int), cudaMemcpyDeviceToHost);
    //Cant free either earlier as both point to same location
    cudaError_t chk = cudaFree(godata);
    if(chk != cudaSuccess){
        printf("First chk also printed error. Maybe error in my logic\n");
    }
    printf("The error by threadsyn is %s", cudaGetErrorString(cudaGetLastError()));
    printf("The sum of the array is %d\n", output[0]);
    return 0;
}
Clearly, the first while loop in computeAddShared causes an out-of-bounds access, because I am allocating only 4 bytes of shared memory. Why does cuda-memcheck not catch this? Below is the output of cuda-memcheck:
time is 12.334816 msThe error by threadsyn is no errorThe sum of the array is 13
========= ERROR SUMMARY: 0 errors
Shared memory allocation granularity. The hardware undoubtedly has a page size for allocations (probably the same as the L1 cache line size). With only 4 threads per block, there will "accidentally" be enough shared memory in a single page to let your code work. If you used a sensible number of threads per block (i.e. a round multiple of the warp size), the error would be detected because there would not be enough allocated memory. Note that the third launch parameter is a size in bytes, so the kernel needs TPB * sizeof(float) there rather than TPB.

Accessing value at row,col in a Matrix

I'm trying to access a specific row in a matrix but am having a hard time doing so.
I want to get the value at row j, column i but I don't think my algorithm is correct. I'm using OpenCV's Mat for my matrix and accessing it through the data member.
Here is how I am attempting to access values:
plane.data[i + j*plane.rows]
Where i = the column, j = the row. Is this correct? The Matrix is 1 plane from a YUV matrix.
Any help would be appreciated! Thanks.
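No, this is not correct.
plane.data[i + j*plane.rows] is not a good way to access a pixel. The row index has to be multiplied by the length of a row (plane.cols, or plane.step in bytes when the rows are padded), not by plane.rows, and the pointer type must match the type and depth of the matrix.
You should use the at() method of the matrix. For an 8-bit single-channel plane that is plane.at<uchar>(j, i), with the row first and the column second.
To make it simple, here is a code sample that accesses each pixel of a matrix and prints it. It works for almost every matrix type and for any number of channels: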
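// Requires <cstdio> and the OpenCV core header; assumes "using namespace cv;".
template <typename T>
void printMatTemplate(const Mat& M, bool isInt = true){
    if (M.empty()){
        printf("Empty Matrix\n");
        return;
    }
    if ((M.elemSize()/M.channels()) != sizeof(T)){
        printf("Wrong matrix type. Cannot print\n");
        return;
    }
    int cols = M.cols;
    int rows = M.rows;
    int chan = M.channels();
    const char* printf_fmt = isInt ? "%d," : "%0.5f,"; // output format is a guess, the original lines were lost
    if (chan > 1){
        // Print multi channel array
        for (int i = 0; i < rows; i++){
            for (int j = 0; j < cols; j++){
                const T* Pix = M.ptr<T>(i) + j * chan; // first channel of pixel (i, j)
                for (int c = 0; c < chan; c++){
                    printf(printf_fmt, Pix[c]);
                }
                printf(" ");
            }
            printf("\n");
        }
    }
    else {
        // Single channel
        for (int i = 0; i < rows; i++){
            const T* Mi = M.ptr<T>(i); // pointer to the start of row i
            for (int j = 0; j < cols; j++){
                printf(printf_fmt, Mi[j]);
            }
            printf("\n");
        }
    }
}
// Dispatch on the size in bytes of one channel element.
void printMat(const Mat& M){
    switch ( (M.dataend-M.datastart) / (M.cols*M.rows*M.channels())){
    case sizeof(char):
        printMatTemplate<unsigned char>(M,true);
        break;
    case sizeof(float):
        printMatTemplate<float>(M,false);
        break;
    case sizeof(double):
        printMatTemplate<double>(M,false);
        break;
    }
}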
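For the raw indexing from the question, the row-major arithmetic can be sketched in plain C. This is a minimal sketch under stated assumptions: struct plane8 and plane8_at are hypothetical stand-ins for a single-channel 8-bit plane, not OpenCV types, and step is assumed to be the number of bytes from the start of one row to the start of the next.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a single-channel 8-bit plane (not cv::Mat). */
struct plane8 {
    int rows;
    int cols;
    size_t step;                 /* bytes per row, including any padding */
    const unsigned char *data;
};

/* Row-major lookup: skip `row` whole rows, then move `col` bytes into that row. */
unsigned char plane8_at(const struct plane8 *p, int row, int col)
{
    assert(p != NULL && p->data != NULL);
    assert(row >= 0 && row < p->rows);
    assert(col >= 0 && col < p->cols);
    return p->data[(size_t)row * p->step + (size_t)col];
}
```

The multiplier for the row index is the row length in bytes, never the number of rows. Multiplying by rows only happens to work when the plane is square and unpadded.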
I do not think there is any difference between accessing an RGB Mat and a YUV Mat. Only the color space differs.
Please refer to http://opencv.willowgarage.com/wiki/faq#Howtoaccessmatrixelements.3F on how to access each pixel.
