Quickly dumping large tables passed from Lua to C

To quickly save Lua tables containing large one-dimensional arrays (the number of arrays is known, but the number of elements is not fixed; each array holds approximately 800,000 elements), I planned to use a Lua C binding in the following way:
#include "lua.h"
#include "lauxlib.h"
#include <stdio.h>
#include <assert.h>
static int save_table(lua_State *L) {
    assert(L && lua_type(L, -1) == LUA_TTABLE);
    int len, r;
    void *ptr;
    FILE *f;
    lua_pushstring(L, "p");
    lua_gettable(L, -2);
    len = lua_objlen(L, -1);
    ptr = lua_topointer(L, -1);
    f = fopen("p.bin", "wb");
    r = fwrite(ptr, sizeof(int), len, f);
    printf("[p] wrote %d elements out of %d requested\n", r, len);
    lua_pop(L, 1);
    lua_pushstring(L, "q");
    lua_gettable(L, -2);
    len = lua_objlen(L, -1);
    ptr = lua_topointer(L, -1);
    f = fopen("q.bin", "wb");
    r = fwrite(ptr, sizeof(float), len, f);
    printf("[q] wrote %d elements out of %d requested\n", r, len);
    lua_pop(L, 1);
    return 1;
}
int luaopen_savetable(lua_State *L) {
    static const luaL_reg Map[] = {{"save_table", save_table}, {NULL, NULL}};
    luaL_register(L, "mytask", Map);
    return 1;
}
The Lua code is shown below:
-- sample table containing two 1-d arrays
my_table = {p = {11, 22, 33, 44}, q = {0.12, 0.23, 0.34, 0.45, 0.56}}
require "savetable"
mytask.save_table(my_table)
The above code produces two binary files with the wrong content. What is wrong here?
PS: I am using Lua 5.1. I am not sure if this is the fastest way of dumping large Lua tables. Suggestions are always welcome.
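One likely culprit, offered as a hedged sketch rather than a definitive diagnosis: lua_topointer returns an opaque address that is only meant to identify the table (typically for debugging), not a pointer to a contiguous array of its elements, so passing it to fwrite writes unrelated bytes. A common alternative is to copy the elements into a C buffer first (for instance with lua_rawgeti plus lua_tointeger or lua_tonumber, looping from 1 to lua_objlen) and then write that buffer with a single fwrite. The sketch below shows only the buffer-writing half in plain C++ so that it runs without Lua; dump_array and load_array are hypothetical helper names, not part of the Lua API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper: writes `values` to `path` as raw binary with one
// fwrite call and returns the number of elements actually written.
template <typename T>
std::size_t dump_array(const char* path, const std::vector<T>& values) {
    assert(path != nullptr);
    std::FILE* f = std::fopen(path, "wb");
    if (!f) return 0;
    std::size_t written = 0;
    if (!values.empty())
        written = std::fwrite(values.data(), sizeof(T), values.size(), f);
    std::fclose(f);
    return written;
}

// Hypothetical helper: reads back up to `count` elements, e.g. to verify a dump.
template <typename T>
std::vector<T> load_array(const char* path, std::size_t count) {
    assert(path != nullptr);
    std::vector<T> values(count);
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return std::vector<T>();
    std::size_t got = 0;
    if (count != 0)
        got = std::fread(values.data(), sizeof(T), count, f);
    std::fclose(f);
    values.resize(got);  // keep only what was really read
    return values;
}
```

In the binding, the idea would be to fill a std::vector<int> (or std::vector<float>) from the table and then call something like dump_array("p.bin", values); whether this is the fastest option for 800,000 elements is not something this sketch measures.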


Clang memory allocation

Could anyone please help me understand why Clang reallocates the same memory address for different variables while their lifetimes intersect?
I am using a sample program (below) to show the problem.
When I compile the program with clang -O0, variable j in function ok has the same memory address as variable solutions in function nqueens.
Function ok is called inside function nqueens, which should mean that the lifetimes of the variables overlap, so the same stack space should not be reused for both functions.
When I compile the program with gcc, or with clang at -O1, however, the variables are assigned different memory addresses.
Any help is appreciated!
#include <stdlib.h>
#include <stdio.h>
#include <memory.h>
#include <alloca.h>
/* Checking information */
static int solutions[] = {
    10,  /* 5 */
    724, /* 10 */
};
#define MAX_SOLUTIONS sizeof(solutions)/sizeof(int)
int total_count;
int sharedVar = 0;
int ok(int n, char *a)
{
    int i, j;
    char p, q;
    printf("jjjjjjjjj: %d, %p\n", n,&j);
    for (i = 0; i < n; i++) {
        p = a[i];
        for (j = i + 1; j < n; j++) {
            q = a[j];
            if (q == p || q == p - (j - i) || q == p + (j - i))
                return 0;
        }
    }
    return 1;
}
void nqueens (int n, int j, char *a, int *solutions)
{
    int i,res;
    sharedVar = sharedVar * j - n;
    if (n == j) {
        /* good solution, count it */
        *solutions = 1;
        printf("solutions: %d, %p\n", j, &solutions);
        return;
    }
    *solutions = 0;
    /* try each possible position for queen <j> */
    for (i = 0; i < n; i++) {
        a[j] = (char) i;
        if (ok(j + 1, a)) {
            nqueens(n, j + 1, a,&res);
            *solutions += res;
        }
    }
}
int main()
{
    int size = 3;
    char *a;
    // printf("total_count: %p\n", &total_count);
    a = (char *)alloca(size * sizeof(char));
    printf("Computing N-Queens algorithm (n=%d) ", size);
    sharedVar = -5;
    nqueens(size, 0, a, &total_count);
    printf("sharedVar: %d\n", sharedVar);
}

Extracting Hough lines intersection coordinate and getting the data to Notepad or Excel

I need help getting the coordinates of the lines that HoughLines produces and writing them to an output file (Notepad, Excel, or any other format).
I managed to obtain the lines, and while searching this site I found a post that explains how to obtain the coordinates. However, due to my limited understanding, I could not get that code to run alongside my original Hough code and write the coordinates of the intersection points to an output file.
Here is my original Hough code:
#pragma once
#include <C:\OpenCV2.2\include\opencv\cv.h>
#include <C:\OpenCV2.2\include\opencv\highgui.h>
#include <C:\OpenCV2.2\include\opencv2\core\core.hpp>
#include <C:\OpenCV2.2\include\opencv2\imgproc\imgproc.hpp>
#include <C:\OpenCV2.2\include\opencv2\highgui\highgui.hpp>
#include <stdio.h>
#include <math.h>
#include <opencv2/opencv.hpp>
#include <iostream>
using namespace std;
using namespace cv;
int main(int argc, char* argv[])
{
    cv::Mat dst_img, gray_img, contour_img, contrast_img;
    cv::Mat src_img = cv::imread("C:\\Frame-1.bmp"); //Source image path
    dst_img = src_img.clone();
    dst_img.convertTo(contrast_img, -1, 1.5, 0);
    cv::cvtColor(contrast_img, gray_img, CV_BGR2GRAY);
    cv::Canny(gray_img, contour_img, 75, 225, 3);
    vector<Vec2f> lines_;
    HoughLines(contour_img, lines_, 1, CV_PI/180, 200);
    for( size_t i = 0; i < lines_.size(); i++ )
    {
        float rho = lines_[i][0];
        float theta = lines_[i][1];
        double a = cos(theta), b = sin(theta);
        double x0 = a*rho, y0 = b*rho;
        Point pt1(cvRound(x0 + 1000*(-b)),
                  cvRound(y0 + 1000*(a)));
        Point pt2(cvRound(x0 - 1000*(-b)),
                  cvRound(y0 - 1000*(a)));
        cv::clipLine(gray_img.size(), pt1, pt2);
        line( dst_img, pt1, pt2, Scalar(0, 0, 255), 1, CV_AA);
    }
    cv::imwrite("result.bmp", dst_img);
    namedWindow("My Image");
    imshow("My Image", dst_img);
    return 0;
}
And here is the link to the code that I wanted to put into my original code:
I am struck at finding the point of intersection of most lines in an image
Right now my original code draws the Hough lines, exports the image (as result.bmp), and at the same time displays the image in a new window.
I just need to figure out how and where to put the new code, plus some additional code to write the raw coordinate data to an output file that Notepad can open, preferably in the same folder as result.bmp (the name of the output file could be anything; it just needs to be there).
Sorry if this sounds like a beginner's question (I really am one); any help is much appreciated. Many thanks in advance.
Additional information: I am using OpenCV 2.2 and Microsoft Visual Studio Academic 2010.
EDIT: Here are all three pieces of code (Hough, coordinate extraction, and exporting data to Notepad), but as a complete beginner I don't know how to make them all work together in a single program.
#pragma once
#include <C:\OpenCV2.2\include\opencv\cv.h>
#include <C:\OpenCV2.2\include\opencv\highgui.h>
#include <C:\OpenCV2.2\include\opencv2\core\core.hpp>
#include <C:\OpenCV2.2\include\opencv2\imgproc\imgproc.hpp>
#include <C:\OpenCV2.2\include\opencv2\highgui\highgui.hpp>
#include <stdio.h>
#include <math.h>
#include <opencv2/opencv.hpp>
#include <iostream>
#define PointMinusPoint(P,Q,R) {(P).x = (Q).x - (R).x; (P).y = (Q).y - (R).y;}
#define PointCross(P,Q) (((P).x*(Q).y)-((P).y*(Q).x))
#define SIGN(X) (((X)>=0)? 1:-1 )
#define ABS(a) ((a) >= 0 ? (a) : (-(a)))
#define ROUND(a) ((SIGN(a)) * ( ( int )( ABS(a) + 0.5 ) ) )
typedef struct{
    int x,y;
} MYintPOINT;
typedef struct {
    MYintPOINT pStart;
    MYintPOINT pEnd;
} MyLine;
using namespace std;
using namespace cv;
int main(int argc, char* argv[])
{
    cv::Mat dst_img, gray_img, contour_img, contrast_img;
    cv::Mat src_img = cv::imread("C:\\Frame-1.bmp"); //Source image path
    dst_img = src_img.clone();
    dst_img.convertTo(contrast_img, -1, 1.5, 0);
    cv::cvtColor(contrast_img, gray_img, CV_BGR2GRAY);
    cv::Canny(gray_img, contour_img, 75, 225, 3);
    vector<Vec2f> lines_;
    HoughLines(contour_img, lines_, 1, CV_PI/180, 200);
    for( size_t i = 0; i < lines_.size(); i++ )
    {
        float rho = lines_[i][0];
        float theta = lines_[i][1];
        double a = cos(theta), b = sin(theta);
        double x0 = a*rho, y0 = b*rho;
        Point pt1(cvRound(x0 + 1000*(-b)),
                  cvRound(y0 + 1000*(a)));
        Point pt2(cvRound(x0 - 1000*(-b)),
                  cvRound(y0 - 1000*(a)));
        cv::clipLine(gray_img.size(), pt1, pt2);
        line( dst_img, pt1, pt2, Scalar(0, 0, 255), 1, CV_AA);
    }
    cv::imwrite("result.bmp", dst_img);
    int findLinesIntersectionPoint(const MyLine*l1, const MyLine*l2, MYintPOINT *res){
        MYintPOINT p = l1->pStart;
        MYintPOINT dp;
        MYintPOINT q = l2->pStart;
        MYintPOINT dq;
        MYintPOINT qmp; // q-p
        int dpdq_cross; // 2 cross products
        int qpdq_cross; // dp with dq, q-p with dq
        float a;
        PointMinusPoint(dp, l1->pEnd, l1->pStart);
        PointMinusPoint(dq, l2->pEnd, l2->pStart);
        PointMinusPoint(qmp, q, p);
        dpdq_cross = PointCross(dp,dq);
        if (!dpdq_cross){
            // Parallel lines (zero cross product): no single intersection point
            return 0;
        }
        qpdq_cross = PointCross(qmp,dq);
        a = (qpdq_cross*1.0f/dpdq_cross);
        res->x = ROUND(p.x+a*dp.x);
        res->y = ROUND(p.y+a*dp.y);
        return 1;
    }
    string FileName= FileName_S.c_str();
    string::size_type Extension = FileName_S.find_last_of('.'); // Find extension point
    Mat mInputImg;
    mInputImg= imread(FileName_S,1);
    Size szInput= mInputImg.size();
    const string DestinationFileName = FileName_S.substr(0, Extension) + "_ImageData.csv"; // Form the new name with container
    ofstream myfile (DestinationFileName.c_str());
    if (!myfile.is_open())
    {
        MessageBox(L"Unable to Open File");
    }
    string Text= format("Row, Col , Pixel Data,\n");
    myfile << Text;
    for (int Row = 0; Row < szInput.height; Row++)
    {
        for (int Col = 0; Col < szInput.width; Col++)
        {
            string Text= format("%d , %d , %d",Row,Col,mInputImg.at<uchar>(Row,Col));
            myfile << Text;
            myfile << "\n";
        }
    }
    namedWindow("My Image");
    imshow("My Image", dst_img);
    return 0;
}
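One possible way to get the intersection coordinates is to work directly with the (rho, theta) pairs that HoughLines returns: each line satisfies x*cos(theta) + y*sin(theta) = rho, so two lines give a 2x2 linear system. The following is a minimal sketch in plain C++ (no OpenCV, so it can be tried in isolation); PolarLine, Pt, intersect, allIntersections and writeCsv are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <ostream>
#include <vector>

// One Hough line in normal form: x*cos(theta) + y*sin(theta) = rho.
struct PolarLine { double rho; double theta; };
struct Pt { double x; double y; };

// Solves the 2x2 system for two lines with Cramer's rule.
// Returns false when the lines are (nearly) parallel.
bool intersect(const PolarLine& l1, const PolarLine& l2, Pt& out) {
    const double c1 = std::cos(l1.theta), s1 = std::sin(l1.theta);
    const double c2 = std::cos(l2.theta), s2 = std::sin(l2.theta);
    const double det = c1 * s2 - s1 * c2;  // equals sin(theta2 - theta1)
    if (std::fabs(det) < 1e-9) return false;
    out.x = (l1.rho * s2 - l2.rho * s1) / det;
    out.y = (l2.rho * c1 - l1.rho * c2) / det;
    return true;
}

// Collects the intersection of every pair of lines that is not parallel.
std::vector<Pt> allIntersections(const std::vector<PolarLine>& lines) {
    std::vector<Pt> pts;
    for (std::size_t i = 0; i < lines.size(); ++i) {
        for (std::size_t j = i + 1; j < lines.size(); ++j) {
            Pt p;
            if (intersect(lines[i], lines[j], p)) pts.push_back(p);
        }
    }
    return pts;
}

// Writes one "x,y" row per point; a .csv file opens in Notepad and Excel.
void writeCsv(std::ostream& os, const std::vector<Pt>& pts) {
    os << "x,y\n";
    for (std::size_t i = 0; i < pts.size(); ++i)
        os << pts[i].x << ',' << pts[i].y << '\n';
}
```

In the Hough program, each lines_[i] would become a PolarLine with rho = lines_[i][0] and theta = lines_[i][1], and the stream passed to writeCsv could be a std::ofstream opened next to result.bmp. Points that fall outside the image would still need to be filtered out; that step is not shown here.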

Cannot add 4 numbers to a Mat in OpenCV

I created a Mat object in OpenCV, of dimension Nx4, in which I want to put N coordinates, each in the following form:
[Px Py 1 0]
[Py -Px 0 1]
For this I wrote the following code:
vector<Point2f> features1 , features2;
Mat features_1;
for(int i = 0 , j = 0; i < feature1.size() ; ++i , j+=2)
{
    features_1.at<Vec3d>(j) = {feature1[i].x , feature1[i].y , 1 , 0};
    features_1.at<Vec3d>(j) = {feature1[i].y , -feature1[i].x , 0 , 1};
}
But at the first line of the loop I get the following error:
cv::Matx<_Tp, m, n>::Matx(_Tp, _Tp, _Tp, _Tp) [with _Tp = double; int m = 3; int n = 1]: Assertion `channels >= 4' failed.
How can I solve this?
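It's probably easier to push_back() single elements to a Mat and do a reshape() afterwards.
This will make a 2*N x 4 Mat:
vector<Point2f> features1 , features2;
Mat features_1;
for(size_t i=0; i<feature1.size() ; ++i)
{
    // first row for this point: [Px Py 1 0]
    features_1.push_back(feature1[i].x);
    features_1.push_back(feature1[i].y);
    features_1.push_back(1.0f);
    features_1.push_back(0.0f);
    // second row for this point: [Py -Px 0 1]
    features_1.push_back(feature1[i].y);
    features_1.push_back(-feature1[i].x);
    features_1.push_back(0.0f);
    features_1.push_back(1.0f);
}
features_1 = features_1.reshape(1, (int)feature1.size()*2); // 2*N rows x 4 columns; reshape(1,4) would give 4 rows x 2*N columns instead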
Where you wrote <Vec3d>, did you mean <Vec4d>?
Your code is not close to correct, but here is a minimal example of how your desired result can be achieved (if you don't have C++11, just use an old style iterator in the loop)
#include <iostream>
#include <vector>
using namespace std;
#include <opencv2/core/core.hpp>
using namespace cv;
int main()
{
    vector<Point2f> feature1;
    feature1.push_back(Point2f(1.0f, 2.0f));
    feature1.push_back(Point2f(3.0f, 4.0f));
    Mat features_1;
    for(auto p : feature1)
    {
        features_1.push_back(Vec4d(p.x, p.y , 1 , 0));
        features_1.push_back(Vec4d(p.y, -p.x , 0, 1));
    }
    // At this point the mat is (2N, 1, CV_64FC4)
    // Reshape if you want (2N, 4, CV_64FC1);
    features_1 = features_1.reshape(1); // 1 channel, row count unchanged
    cout << features_1;
}
The output of this is:
[1, 2, 1, 0;
2, -1, 0, 1;
3, 4, 1, 0;
4, -3, 0, 1]
However, if the features_1 is already allocated and you can't just use push_back, apart from releasing the mat, you could do this (assuming features_1.isContinuous() is true):
#include <iostream>
#include <vector>
using namespace std;
#include <opencv2/core/core.hpp>
using namespace cv;
int main()
{
    vector<Point2f> feature1;
    feature1.push_back(Point2f(1.0f, 2.0f));
    feature1.push_back(Point2f(3.0f, 4.0f));
    Mat features_1(2 * feature1.size(), 4, CV_64FC1);
    Vec4d* ptr = features_1.ptr<Vec4d>(0);
    for(auto p : feature1)
    {
        *ptr++ = Vec4d(p.x, p.y , 1 , 0);
        *ptr++ = Vec4d(p.y, -p.x , 0, 1);
    }
    cout << features_1 << endl;
}
The above produces the same result as the previous version.
If there is any possibility in the above example that features_1.isContinuous() is false, you can use an iterator to scan features_1:
auto ptr = features_1.begin<double>();
for(auto p : feature1)
{
    *ptr++ = p.x;
    *ptr++ = p.y;
    *ptr++ = 1;
    *ptr++ = 0;
    *ptr++ = p.y;
    *ptr++ = -p.x;
    *ptr++ = 0;
    *ptr++ = 1;
}

OpenCL matrix multiplication

I'm a beginner in OpenCL, and I've been trying to write matrix multiplication code.
It runs without crashing, but the output in the C array is garbage, and I have been unable to find the error.
Any help will be much appreciated.
Here is the host and kernel code.
#include <CL/cl.h>
#include <iostream>
#include <cstdio>
#include <fstream>
#include <stdlib.h>
#include <assert.h>
#include <string.h>
using namespace std;
#define SUCCESS 0
#define FAILURE 1
// Function to convert file name into a string
int convertToString(const char *filename, std::string &s)
{
    size_t size;
    char *str;
    std::fstream f(filename, (std::fstream::in | std::fstream::binary));
    if (f.is_open())
    {
        size_t fileSize;
        f.seekg(0, std::fstream::end);
        size = fileSize = (size_t)f.tellg();
        f.seekg(0, std::fstream::beg);
        str = new char[size + 1];
        if (!str)
        {
            return 0;
        }
        f.read(str, fileSize);
        str[size] = '\0';
        s = str;
        delete[] str;
        return 0;
    }
    cout << "Error: failed to open file\n:" << filename << endl;
    return FAILURE;
}
int main()
{
    cl_uint status;
    cl_int *error;
    int A[9] = {1, 1, 1, 1, 1, 1, 1, 1, 1};
    int B[9] = {2, 2, 2, 2, 2, 2, 2, 2, 2};
    int C[9] = {0, 0, 0, 0, 0, 0, 0, 0, 0};
    // Setting up platforms
    cl_platform_id platform = NULL;
    cl_uint numPlatforms = 0;
    // Getting no of platforms
    status = clGetPlatformIDs(0, NULL, &numPlatforms);
    if (status != CL_SUCCESS)
    {
        cout << "\nUnable to query platforms";
        return 0;
    }
    // Get the platform
    if (numPlatforms > 0)
    {
        cl_platform_id *platforms =
            (cl_platform_id *)malloc(numPlatforms * sizeof(cl_platform_id));
        status = clGetPlatformIDs(numPlatforms, platforms, NULL);
        platform = platforms[0];
        free(platforms);
    }
    cl_uint numDevices = 0;
    cl_device_id *devices = NULL;
    status =
        clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 0, devices, &numDevices);
    if (numDevices == 0)
    {
        cout << "No GPU device available! Choosing CPU.\n";
        status = clGetDeviceIDs(platform, CL_DEVICE_TYPE_CPU, 0, devices,
                                &numDevices);
        devices = (cl_device_id *)malloc(numDevices * sizeof(cl_device_id));
        status = clGetDeviceIDs(platform, CL_DEVICE_TYPE_CPU, numDevices,
                                devices, NULL);
    }
    else
    {
        devices = (cl_device_id *)malloc(numDevices * sizeof(cl_device_id));
        status = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, numDevices,
                                devices, NULL);
    }
    if (status == 0)
    {
        cout << "Device error!";
        return 0;
    }
    // Creating contexts
    cl_context context =
        clCreateContext(NULL, 1, devices, NULL, NULL, (cl_int *)status);
    if (status != CL_SUCCESS)
        cout << status;
    // Creating command queues
    cl_command_queue command =
        clCreateCommandQueue(context, devices[0], 0, NULL);
    // if(error!=CL_SUCCESS)
    // cout<<error;
    // Creating buffers
    cl_mem bufferA = clCreateBuffer(context, CL_MEM_READ_ONLY,
                                    3 * 3 * sizeof(int), NULL, NULL);
    cl_mem bufferB = clCreateBuffer(context, CL_MEM_READ_ONLY,
                                    3 * 3 * sizeof(int), NULL, NULL);
    cl_mem bufferC = clCreateBuffer(context, CL_MEM_WRITE_ONLY,
                                    3 * 3 * sizeof(int), NULL, NULL);
    status = clEnqueueWriteBuffer(command, bufferA, CL_TRUE, 0, 9 * sizeof(int),
                                  (void *)A, 0, NULL, NULL);
    status = clEnqueueWriteBuffer(command, bufferB, CL_TRUE, 0, 9 * sizeof(int),
                                  (void *)B, 0, NULL, NULL);
    // status=clEnqueueReadBuffer(command,bufferA,CL_TRUE,0,9*sizeof(int),(void*)C,0,NULL,NULL);
    const char *filename = "kernel.cl";
    string sourceStr;
    status = convertToString(filename, sourceStr);
    const char *source = sourceStr.c_str();
    size_t sourceSize[] = {strlen(source)};
    cl_program program =
        clCreateProgramWithSource(context, 1, &source, sourceSize, NULL);
    status = clBuildProgram(program, numDevices, 0, NULL, NULL, NULL);
    cl_kernel myKernel = clCreateKernel(program, "multiply", NULL);
    // Setting kernel arguments
    clSetKernelArg(myKernel, 0, sizeof(cl_mem), &bufferC);
    clSetKernelArg(myKernel, 1, sizeof(cl_mem), &bufferA);
    clSetKernelArg(myKernel, 2, sizeof(cl_mem), &bufferB);
    size_t localws[2] = {9, 9};
    size_t globalws[2] = {3, 3};
    status = clEnqueueNDRangeKernel(command, myKernel, 2, NULL, globalws,
                                    localws, 0, NULL, NULL);
    status = clEnqueueReadBuffer(command, bufferC, CL_TRUE, 0, 9 * sizeof(int),
                                 (void *)C, 0, NULL, NULL);
    for (int i = 0; i < 9; i++) cout << C[i] << " ";
    status = clReleaseKernel(myKernel); // Release kernel.
    status = clReleaseProgram(program); // Release program object.
    status = clReleaseMemObject(bufferA); // Release mem object.
    status = clReleaseMemObject(bufferB);
    status = clReleaseMemObject(bufferC);
    status = clReleaseCommandQueue(command); // Release Command queue.
    status = clReleaseContext(context); // Release context.
}
Kernel code:
__kernel void multiply(_global int outputC, _global int inputA,
                       _global int inputB)
{
    int row = get_global_id(0);
    int col = get_global_id(1);
    int sum = 0;
    for (int i = 0; i < 3; i++)
        sum += inputA[row * 3 + 1] * inputB[i * 3 + col];
    outputC[row + 3 + col] = sum;
}
As already pointed out by @Marco13, the kernel suffers from quite a few issues.
When running this kernel through a tool like clcc you can see that there are a number of compilation errors to begin with:
> clcc matmul.cl
"/tmp/OCLu7FyFF.cl", line 1: error: identifier "_global" is undefined
__kernel void multiply(_global int outputC, _global int inputA,
"/tmp/OCLu7FyFF.cl", line 1: error: invalid combination of type specifiers
__kernel void multiply(_global int outputC, _global int inputA,
"/tmp/OCLu7FyFF.cl", line 1: error: identifier "_global" is undefined
__kernel void multiply(_global int outputC, _global int inputA,
"/tmp/OCLu7FyFF.cl", line 1: error: invalid combination of type specifiers
__kernel void multiply(_global int outputC, _global int inputA,
"/tmp/OCLu7FyFF.cl", line 2: error: identifier "_global" is undefined
_global int inputB)
"/tmp/OCLu7FyFF.cl", line 2: error: invalid combination of type specifiers
_global int inputB)
6 errors detected in the compilation of "/tmp/OCLu7FyFF.cl".
A tool like clcc is very useful for catching errors early on. Most vendors also have their own version of a standalone kernel compiler/checker: e.g. Intel has its Kernel Builder, AMD's CodeXL contains a static kernel analyzer. Another option is to retrieve kernel compilation errors right from your host code, by calling clGetProgramBuildInfo to retrieve the compiler output, after clBuildProgram returned CL_BUILD_PROGRAM_FAILURE.
Once these compilation errors are fixed (the address space qualifier is spelled __global, with two leading underscores), it looks like your kernel is still not doing what you expect. As noted, the inputs and outputs should be pointers (__global int *), because you will be passing buffers to the kernel. The indexing of your input and output arrays is also incorrect: in the for-loop, inputA[row * 3 + 1] should be inputA[row * 3 + i] (i instead of 1), and when saving the result to outputC I would expect outputC[row * 3 + col] (row * 3 instead of row + 3).
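To check the corrected indexing independently of OpenCL, the same arithmetic can be run on the host. This is a minimal sketch in plain C++, not the kernel itself: multiplyReference is a hypothetical helper name, the matrices are assumed to be square and stored row-major as in the question, and each (row, col) pair of the two outer loops plays the role of one work-item.

```cpp
#include <cassert>
#include <vector>

// Hypothetical host-side reference: C = A * B for row-major n x n matrices,
// using the indexing the corrected kernel is expected to use.
std::vector<int> multiplyReference(const std::vector<int>& a,
                                   const std::vector<int>& b, int n) {
    assert(static_cast<int>(a.size()) == n * n);
    assert(static_cast<int>(b.size()) == n * n);
    std::vector<int> c(n * n, 0);
    for (int row = 0; row < n; ++row) {
        for (int col = 0; col < n; ++col) {
            int sum = 0;
            for (int i = 0; i < n; ++i)
                sum += a[row * n + i] * b[i * n + col];  // i, not 1
            c[row * n + col] = sum;                      // row * n, not row + n
        }
    }
    return c;
}
```

With the inputs from the question (A all ones, B all twos, n = 3), every entry of the result is 6, which gives a known value to compare against the buffer read back from the device.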
I haven't looked in detail at the host code, but I would at least make sure, especially when just starting out with OpenCL, to always check every return code and error. This will save you a lot of time and frustration.
Finally, if you want a quick jump-start to learning OpenCL with a hands-on approach, I would strongly recommend going through the open source Hands-on OpenCL training by Simon McIntosh-Smith and Tom Deakin. It doesn't take very long, is quite pragmatic and provides lots of useful insights. Optimizing matrix multiplication is one of the use cases that is shown step-by-step.

Lua 5.2 error: multiple Lua VMs detected

I have recently been learning Lua 5.2, and this is what I am trying to do:
Step 1: build a C module for Lua:
#include "lua.h"
#include "lauxlib.h"
#include "lualib.h"
#include <stdlib.h>
static int add(lua_State *L) {
    int x = luaL_checkint(L, -2);
    int y = luaL_checkint(L, -1);
    lua_pushinteger(L, x + y);
    return 1;
}
static const struct luaL_Reg reg_lib[] = {
    {"add", add},
    {NULL, NULL} /* sentinel */
};
int luaopen_tool(lua_State *L) {
    luaL_newlib(L, reg_lib);
    lua_setglobal(L, "tool");
    return 0;
}
I compile and link it with liblua.a, and I'm sure it works well in a Lua script like "require("tool") tool.add(1, 2)".
Step 2: I wrote another C program that requires the C module from step 1, like this:
#include "lua.h"
#include "lauxlib.h"
#include "lualib.h"
#include <stdlib.h>
int main(int argc, char* const argv[]) {
    lua_State *L = luaL_newstate();
    luaL_requiref(L, "base", luaopen_base, 1);
    luaL_requiref(L, "package", luaopen_package, 1);
    lua_getglobal(L, "require");
    if (!lua_isfunction(L, -1)) {
        printf("require not found\n");
        return 2;
    }
    lua_pushstring(L, "tool");
    if (lua_pcall(L, 1, 1, 0) != LUA_OK) {
        printf("require_fail=%s\n", lua_tostring(L, -1));
        return 3;
    }
    lua_getfield(L, -1, "add");
    lua_pushinteger(L, 2);
    lua_pushinteger(L, 3);
    lua_pcall(L, 2, 1, 0);
    int n = luaL_checkint(L, -1);
    printf("n=%d\n", n);
    return 0;
}
I also compile and link it with liblua.a, but an error occurs when I run it:
"require_fail=multiple Lua VMs detected"
A blog post said that in Lua 5.2 you should link both the C module and the C host program dynamically, not statically.
Has anyone else run into the same problem, or is there something wrong in my code? Thanks.
The problem has been solved by compiling the main program with -Wl,-E; thanks a lot for all your help ^^.
Don't link your C module with liblua.a when you create a .so from it. For examples, see my page of Lua libraries: http://www.tecgraf.puc-rio.br/~lhf/ftp/lua/ . You can link liblua.a statically into your main program but you have to export its symbols by adding -Wl,-E at link time. That's how the Lua interpreter is built in Linux.
