Why lua_pushcclosure does not work ?
static int my_callback(lua_State* L)
{
std::cout << "my_callback" << std::endl;
std::cout << "PARAM1:" << lua_tostring(L, lua_upvalueindex(1)) << std::endl; //empty
std::cout << "PARAM2:" << lua_tostring(L, lua_upvalueindex(2)) << std::endl; //empty
return 0;
}
lua_pushstring(L, "param1");
lua_pushstring(L, "param2");
lua_pushcclosure(L, my_callback, 2);
lua_getfield(L, index, "Set_Callback");
lua_pushcfunction(L, my_callback);
int status_lua_pcall = lua_pcall(L, 1, 0, 0);
Why lua_pushcclosure does not work ?
After all, I set parameters 1 and 2 for the my_callback function. The my_callback function is called, but parameters 1 and 2 are empty.
Please explain me - how I can make a closure from - param1, param2, Set_callback, my_callback and lua_pushcclosure?
Or how do I then pass additional parameters to the my_callback function, which is called by the Set_callback method??
Here's how it works on pure Lua:
function my_callbac_func(param1, param2)
--some code...
end
param_1 = "Hello1"
param_2 = "Hello2"
my_table = func_Lua_app() -- The function returns a table in which there is a Set_callback method that needs to be called and passed as a parameter to the my_callback function itself.
my_table:Set_callback(function()my_callbac(param1, param2) end)
That is, I need to make an analog of this Lua code, only on the Lua C api.
Good, I try like this:
static int my_callback(lua_State* L)
{
std::cout << "my_callback" << std::endl;
std::cout << "PARAM1:" << lua_tostring(L, lua_upvalueindex(1)) << std::endl; //empty
std::cout << "PARAM2:" << lua_tostring(L, lua_upvalueindex(2)) << std::endl; //empty
return 0;
}
lua_getfield(L, index, "Set_Callback");
lua_pushstring(L, "param1");
lua_pushstring(L, "param2");
lua_pushcclosure(L, my_callback, 2);
int status_lua_pcall = lua_pcall(L, 3, 0, 0);
It doesn't work that way either.
Here is how each call affects the stack:
// stack: empty
lua_pushstring(L, "param1");
// stack: "param1"
lua_pushstring(L, "param2");
// stack: "param1", "param2"
// ^^^^^^^^^^^^^^^^^^
// upvalues
lua_pushcclosure(L, my_callback, 2);
// stack: the closure with upvalues
lua_getfield(L, index, "Set_Callback");
// stack: the closure with upvalues, the Set_Callback function
lua_pushcfunction(L, my_callback);
// stack: the closure with upvalues, the Set_Callback function, the cfunction without upvalues
// ^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
// what gets called parameter
int status_lua_pcall = lua_pcall(L, 1, 0, 0);
// stack: the closure with upvalues
You made the closure with upvalues and didn't use it. And then you called Set_Callback and the parameter you gave to Set_Callback was a different cfunction which had no upvalues. The one with upvalues is still on the stack, waiting to be used for something.
You need the closure with upvalues to be on the top of the stack when you get to lua_pcall, so that the parameter is the closure with upvalues.
Related
I'm trying to run a parallel for loop with triSYCL. This is my code:
#define TRISYCL_OPENCL
#define OMP_NUM_THREADS 8
#define BOOST_COMPUTE_USE_CPP11
//standart libraries
#include <iostream>
#include <functional>
//deps
#include "CL/sycl.hpp"
struct Color
{
float r, g, b, a;
friend std::ostream& operator<<(std::ostream& os, const Color& c)
{
os << "(" << c.r << ", " << c.g << ", " << c.b << ", " << c.a << ")";
return os;
}
};
struct Vertex
{
float x, y;
Color color;
friend std::ostream& operator<<(std::ostream& os, const Vertex& v)
{
os << "x: " << v.x << ", y: " << v.y << ", color: " << v.color;
return os;
}
};
template<typename T>
T mapNumber(T x, T a, T b, T c, T d)
{
return (x - a) / (b - a) * (d - c) + c;
}
int windowWidth = 640;
int windowHeight = 720;
int main()
{
auto exception_handler = [](cl::sycl::exception_list exceptions) {
for (std::exception_ptr const& e : exceptions)
{
try
{
std::rethrow_exception(e);
} catch (cl::sycl::exception const& e)
{
std::cout << "Caught asynchronous SYCL exception: " << e.what() << std::endl;
}
}
};
cl::sycl::default_selector defaultSelector;
cl::sycl::context context(defaultSelector, exception_handler);
cl::sycl::queue queue(context, defaultSelector, exception_handler);
auto* pixelColors = new Color[windowWidth * windowHeight];
{
cl::sycl::buffer<Color, 2> color_buffer(pixelColors, cl::sycl::range < 2 > {(unsigned long) windowWidth,
(unsigned long) windowHeight});
cl::sycl::buffer<int, 1> b_windowWidth(&windowWidth, cl::sycl::range < 1 > {1});
cl::sycl::buffer<int, 1> b_windowHeight(&windowHeight, cl::sycl::range < 1 > {1});
queue.submit([&](cl::sycl::handler& cgh) {
auto color_buffer_acc = color_buffer.get_access<cl::sycl::access::mode::write>(cgh);
auto width_buffer_acc = b_windowWidth.get_access<cl::sycl::access::mode::read>(cgh);
auto height_buffer_acc = b_windowHeight.get_access<cl::sycl::access::mode::read>(cgh);
cgh.parallel_for<class init_pixelColors>(
cl::sycl::range<2>((unsigned long) width_buffer_acc[0], (unsigned long) height_buffer_acc[0]),
[=](cl::sycl::id<2> index) {
color_buffer_acc[index[0]][index[1]] = {
mapNumber<float>(index[0], 0.f, width_buffer_acc[0], 0.f, 1.f),
mapNumber<float>(index[1], 0.f, height_buffer_acc[0], 0.f, 1.f),
0.f,
1.f};
});
});
std::cout << "cl::sycl::queue check - selected device: "
<< queue.get_device().get_info<cl::sycl::info::device::name>() << std::endl;
}//here the error appears
delete[] pixelColors;
return 0;
}
I'm building it with this CMakeLists.txt file:
cmake_minimum_required(VERSION 3.16.2)
project(acMandelbrotSet_stackoverflow)
set(CMAKE_CXX_STANDARD 17)
set(SRC_FILES
path/to/main.cpp
)
find_package(OpenCL REQUIRED)
set(Boost_INCLUDE_DIR path/to/boost)
include_directories(${Boost_INCLUDE_DIR})
include_directories(path/to/SYCL/include)
set(LIBS PRIVATE ${Boost_LIBRARIES} OpenCL::OpenCL)
add_executable(${PROJECT_NAME} ${SRC_FILES})
set_target_properties(${PROJECT_NAME} PROPERTIES DEBUG_POSTFIX _d)
target_link_libraries(${PROJECT_NAME} ${LIBS})
When I try to run it, I get this message: libc++abi.dylib: terminating with uncaught exception of type trisycl::non_cl_error from path/to/SYCL/include/triSYCL/command_group/detail/task.hpp line: 278 function: trisycl::detail::task::get_kernel, the message was: "Cannot use an OpenCL kernel in this context".
I've tried to create a lambda of mapNumber in the kernel but that didn't make any difference. I've also tried to use this before the end of the scope to catch errors:
try
{
queue.wait_and_throw();
} catch (cl::sycl::exception const& e)
{
std::cout << "Caught synchronous SYCL exception: " << e.what() << std::endl;
}
but nothing was printed to the console except the error from before. And I've also tried to make an event of the queue.submit call and then call event.wait() before the end of the scope but again the exact same output.
Does any body have an idea what else I could try?
The problem is that triSYCL is a research project looking deeper at some aspects of SYCL while not providing a global generic SYCL support for an end-user. I have just clarified this on the README of the project. :-(
Probably the problem here is that the OpenCL SPIR kernel has not been generated.
So you need to first compile the specific (old) Clang & LLVM from triSYCL https://github.com/triSYCL/triSYCL/blob/master/doc/architecture.rst#trisycl-architecture-for-accelerator. But unfortunately there is no simple Clang driver to use all the specific Clang & LLVM to generate the kernels from the SYCL source. Right know it is done with some ad-hoc awful Makefiles (look around https://github.com/triSYCL/triSYCL/blob/master/tests/Makefile#L360) and, even if you can survive to this, you might encounter some bugs...
The good news is now there are several other implementations of SYCL which are quite easier to use, quite more complete and quite less buggy! :-) Look at ComputeCpp, DPC++ and hipSYCL for example.
I got the FunctionDecl for the definition of a function. There is no declaration for this function.
For example:
int foo(char c, double d)
{
...
}
How do I get the signature (qualifier, return type, function name, parametrs) as a valid signature I could use to make a declaration?
I found that the easiest way is to use the lexer to get the signature of the function. Since I wanted to make a declaration out of a definition, I wanted the declaration to look exactly like the definition.
Therefore I defined a SourceRange from the start of the function to the beginning of the body of the function (minus the opening "{") and let the lexer give me this range as a string.
static std::string getDeclaration(const clang::FunctionDecl* D)
{
clang::ASTContext& ctx = D->getASTContext();
clang::SourceManager& mgr = ctx.getSourceManager();
clang::SourceRange range = clang::SourceRange(D->getSourceRange().getBegin(), D->getBody()->getSourceRange().getBegin());
StringRef s = clang::Lexer::getSourceText(clang::CharSourceRange::getTokenRange(range), mgr, ctx.getLangOpts());
return s.substr(0, s.size() - 2).str().append(";");
}
This solution assums that the FunctionDecl is a definition (has a body).
Maybe this is what you were looking for...
bool VisitDecl(Decl* D) {
auto k = D->getDeclKindName();
auto r = D->getSourceRange();
auto b = r.getBegin();
auto e = r.getEnd();
auto& srcMgr = Context->getSourceManager();
if (srcMgr.isInMainFile(b)) {
auto d = depth - 2u;
auto fname = srcMgr.getFilename(b);
auto bOff = srcMgr.getFileOffset(b);
auto eOff = srcMgr.getFileOffset(e);
llvm::outs() << std::string(2*d,' ') << k << "Decl ";
llvm::outs() << "<" << fname << ", " << bOff << ", " << eOff << "> ";
if (D->getKind() == Decl::Kind::Function) {
auto fnDecl = reinterpret_cast<FunctionDecl*>(D);
llvm::outs() << fnDecl->getNameAsString() << " ";
llvm::outs() << "'" << fnDecl->getType().getAsString() << "' ";
} else if (D->getKind() == Decl::Kind::ParmVar) {
auto pvDecl = reinterpret_cast<ParmVarDecl*>(D);
llvm::outs() << pvDecl->getNameAsString() << " ";
llvm::outs() << "'" << pvDecl->getType().getAsString() << "' ";
}
llvm::outs() << "\n";
}
return true;
}
Sample output:
FunctionDecl <foo.c, 48, 94> foo 'int (unsigned int)'
ParmVarDecl <foo.c, 56, 69> x 'unsigned int'
CompoundStmt <foo.c, 72, 94>
ReturnStmt <foo.c, 76, 91>
ParenExpr <foo.c, 83, 91>
BinaryOperator <foo.c, 84, 17>
ImplicitCastExpr <foo.c, 84, 84>
DeclRefExpr <foo.c, 84, 84>
ParenExpr <foo.c, 28, 45>
BinaryOperator <foo.c, 29, 43>
ParenExpr <foo.c, 29, 39>
BinaryOperator <foo.c, 30, 12>
IntegerLiteral <foo.c, 30, 30>
IntegerLiteral <foo.c, 12, 12>
IntegerLiteral <foo.c, 43, 43>
You will notice the reinterpret_cast<OtherDecl*>(D) function calls. Decl is the base class for all AST OtherDecl classes like FunctionDecl or ParmVarDecl. So reinterpreting the pointer is allowed and gets you access to that particular AST node's attributes. Since these more-specific AST Nodes inherit the NamedDecl and ValueDecl classes, obtaining the function name and the function type (signature) is simple. The same can be applied to the base class Stmt and other inherited classes like the OtherExpr classes.
Given
template<typename ...Types>
void print(Types&& ...args) {
(cout << ... << args);
}
// ....
print(1, 2, 3, 4); // prints 1234
How to add spaces so we get 1 2 3 4?
Update:
Correct answer:
((std::cout << args << ' ') , ...);
The usual workaround is to fold over the comma operator instead, though the simplistic approach will leave a trailing space:
((std::cout << args << ' '), ...);
Changing it to avoid the trailing space is left as an exercise for the reader.
SSE intrinsics includes _mm_shuffle_ps xmm1 xmm2 immx which allows one to pick 2 elements from xmm1 concatenated with 2 elements from xmm2. However this is for floats, (implied by the _ps , packed single). However if you cast your packed integers __m128i, then you can use _mm_shuffle_ps as well:
#include <iostream>
#include <immintrin.h>
#include <sstream>
using namespace std;
template <typename T>
std::string __m128i_toString(const __m128i var) {
std::stringstream sstr;
const T* values = (const T*) &var;
if (sizeof(T) == 1) {
for (unsigned int i = 0; i < sizeof(__m128i); i++) {
sstr << (int) values[i] << " ";
}
} else {
for (unsigned int i = 0; i < sizeof(__m128i) / sizeof(T); i++) {
sstr << values[i] << " ";
}
}
return sstr.str();
}
int main(){
cout << "Starting SSE test" << endl;
cout << "integer shuffle" << endl;
int A[] = {1, -2147483648, 3, 5};
int B[] = {4, 6, 7, 8};
__m128i pC;
__m128i* pA = (__m128i*) A;
__m128i* pB = (__m128i*) B;
*pA = (__m128i)_mm_shuffle_ps((__m128)*pA, (__m128)*pB, _MM_SHUFFLE(3, 2, 1 ,0));
pC = _mm_add_epi32(*pA,*pB);
cout << "A[0] = " << A[0] << endl;
cout << "A[1] = " << A[1] << endl;
cout << "A[2] = " << A[2] << endl;
cout << "A[3] = " << A[3] << endl;
cout << "B[0] = " << B[0] << endl;
cout << "B[1] = " << B[1] << endl;
cout << "B[2] = " << B[2] << endl;
cout << "B[3] = " << B[3] << endl;
cout << "pA = " << __m128i_toString<int>(*pA) << endl;
cout << "pC = " << __m128i_toString<int>(pC) << endl;
}
Snippet of relevant corresponding assembly (mac osx, macports gcc 4.8, -march=native on an ivybridge CPU):
vshufps $228, 16(%rsp), %xmm1, %xmm0
vpaddd 16(%rsp), %xmm0, %xmm2
vmovdqa %xmm0, 32(%rsp)
vmovaps %xmm0, (%rsp)
vmovdqa %xmm2, 16(%rsp)
call __ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
....
Thus it seemingly works fine on integers, which I expected as the registers are agnostic to types, however there must be a reason why the docs say that this instruction is only for floats. Does someone know any downsides, or implications I have missed?
There is no equivalent to _mm_shuffle_ps for integers. To achieve the same effect in this case you can do
SSE2
*pA = _mm_shuffle_epi32(_mm_unpacklo_epi32(*pA, _mm_shuffle_epi32(*pB, 0xe)),0xd8);
SSE4.1
*pA = _mm_blend_epi16(*pA, *pB, 0xf0);
or change to the floating point domain like this
*pA = _mm_castps_si128(
_mm_shuffle_ps(_mm_castsi128_ps(*pA),
_mm_castsi128_ps(*pB), _MM_SHUFFLE(3, 2, 1 ,0)));
But changing domains may incur bypass latency delays on some CPUs. Keep in mind that according to Agner
The bypass delay is important in long dependency chains where latency is a bottleneck, but
not where it is throughput rather than latency that matters.
You have to test your code and see which method above is more efficient.
Fortunately, on most Intel/AMD CPUs, there is usually no penalty for using shufps between most integer-vector instructions. Agner says:
For example, I found no delay when mixing PADDD and SHUFPS [on Sandybridge].
Nehalem does have 2 bypass-delay latency to/from SHUFPS, but even then a single SHUFPS is often still faster than multiple other instructions. Extra instructions have latency, too, as well as costing throughput.
The reverse (integer shuffles between FP math instructions) is not as safe:
In Agner Fog's microarchitecture on page 112 in Example 8.3a, he shows that using PSHUFD (_mm_shuffle_epi32) instead of SHUFPS (_mm_shuffle_ps) when in the floating point domain causes a bypass delay of four clock cycles. In Example 8.3b he uses SHUFPS to remove the delay (which works in his example).
On Nehalem there are actually five domains. Nahalem seems to be the most effected (the bypass delays did not exist before Nahalem). On Sandy Bridge the delays are less significant. This is even more true on Haswell. In fact on Haswell Agner said he found no delays between SHUFPS or PSHUFD (see page 140).
my problem is just astonishing. This is the code
#define NCHANNEL 3
#define NFRAME 100
Mat RR = Mat::zeros(NCHANNEL, NFRAME-1, CV_64FC1);
double *p_0 = RR.ptr<double>(0);
double *p_1 = RR.ptr<double>(1);
double *p_2 = RR.ptr<double>(2);
cout<< p_0[NFRAME-1] << endl << p_1[NFRAME-1] << endl << p_2[NFRAME-1] << endl;
And the output is: 0 0 -6.27744e+066 .
Where is that awful number come from? it seems I'm printing a pointer or something rough in memory. (uh, 0 is the value of all other elements, of course).
You are accessing after the last element of Mat. If you use NFRAME-1 for initialization then the last element has NFRAME-2 index.