I am investigating a strange problem: on Windows, lua_rawgeti() does not return the value I created a reference to, but nil. Code:
lua_State *L = luaL_newstate();
luaL_requiref(L, "_G", luaopen_base, 1);
lua_pop(L, 1);
lua_getglobal(L, toStringz("_G"));
int t1 = lua_type(L, -1);
auto r = luaL_ref(L, LUA_REGISTRYINDEX);
lua_rawgeti(L, LUA_REGISTRYINDEX, r);
int t2 = lua_type(L, -1);
lua_close(L);
writefln("Ref: %d, types: %d, %d", r, t1, t2);
assert(r != LUA_REFNIL);
assert((t1 != LUA_TNIL) && (t1 == t2));
Full source and build bat: https://github.com/mkoskim/games/tree/master/tests/luaref
Compile & run:
rdmd -I<path>/DerelictLua/source/ -I<path>/DerelictUtil/source/ testref.d
64-bit Linux (_G is a table, and rawgeti pushes a table onto the stack):
$ build.bat
Ref: 3, types: 5, 5
32-bit Windows (_G is a table, but rawgeti pushes nil onto the stack):
$ build.bat
Ref: 3, types: 5, 0
<assertion fail>
So either luaL_ref() fails to store the reference to _G correctly, or lua_rawgeti() fails to retrieve it correctly.
Update: I compiled the Lua library from source and added a printf() to lua_rawgeti() (lapi.c:660) to print out the reference:
printf("lua_rawgeti(%d)\n", n);
I also added a writeln() to test.d to show at which point we call lua_rawgeti(). The output shows that D sends the reference number correctly:
lua_rawgeti(2)
lua_rawgeti(0)
Dereferencing:
lua_rawgeti(3)
Ref: 3, types: 5, 0
On Windows, I use:
DMD 2.086.0 (32-bit Windows)
lua53.dll (32-bit Windows, I have tried both lua-5.3.4 and lua-5.3.5), from here: http://luabinaries.sourceforge.net/download.html
DerelictLua newest version (commit 5549c1a)
DerelictUtil newest version (commit 8dda339)
Questions:
Is there a bug in the code that I am just not seeing? Are there any known quirks when using 32-bit D and Lua on Windows? I doubt there is a major problem with my compiler and libraries, because they compile and link together without errors, and most Lua calls work (e.g. opening a Lua state and pushing _G onto the stack).
I could not find anything related by googling, so I am fairly sure something in my setup is mismatched. I find it hard to suspect the Lua libraries themselves, because they have been stable for a long time (including the 32-bit versions).
I would like to know whether people have used 64-bit Windows DMD + Lua successfully. Of course, I would also appreciate hearing whether people use 32-bit Windows DMD + Lua successfully.
I am a bit out of ideas about where to look for a solution. Any ideas what to try next?
Thanks in advance!
I got an answer from the Lua mailing list: http://lua-users.org/lists/lua-l/2019-05/msg00076.html
I suspect this is a bug in DerelictLua.
Lua defines lua_rawgeti thus:
int lua_rawgeti (lua_State *L, int index, lua_Integer n);
While DerelictLua defines its binding thus:
alias da_lua_rawgeti = int function(lua_State*, int, int);
I fixed that and created a pull request for DerelictLua.
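A rough sketch of why the mismatch matters, written as a Python model and not as the real call: it assumes a 32-bit x86 cdecl call where arguments sit on the stack in little-endian order, and that lua_Integer is 64 bits wide in the Lua build in use. The pointer value, the leftover stack word and the value used for LUA_REGISTRYINDEX are illustrative assumptions. The caller writes three 4-byte arguments, while the callee reads the third one as 8 bytes, so the upper half of n comes from whatever happens to follow on the stack.

```python
import struct

# Assumed value of LUA_REGISTRYINDEX in a stock Lua 5.3 build.
LUA_REGISTRYINDEX = -1001000


def call_through_binding(state_ptr, index, n, next_stack_word):
    """Model a 32-bit cdecl call made through the mismatched binding.

    The caller (the D side) pushes three 4-byte arguments. The 4-byte
    word that follows them is not an argument at all, but the callee
    reads it as the upper half of its 64-bit third parameter.
    """
    # Caller's view: pointer, int index, int n, then unrelated stack data.
    stack = struct.pack("<IiiI", state_ptr, index, n, next_stack_word)
    # Callee's view: pointer, int index, 64-bit integer n.
    _, seen_index, seen_n = struct.unpack("<Iiq", stack)
    return seen_index, seen_n


# Hypothetical pointer and leftover stack word, chosen only for illustration.
seen_index, seen_n = call_through_binding(
    0x00A41F30, LUA_REGISTRYINDEX, 3, 0x0012FF40
)
print(seen_index)           # the index argument still arrives intact
print(seen_n)               # no longer 3: the upper 32 bits are leftovers
print(seen_n & 0xFFFFFFFF)  # the low 32 bits are still 3
```

In this model the low 32 bits of n are still 3, which would be consistent with the debug printf("%d") showing 3 even though the key actually looked up in the registry is a different, much larger integer. If the leftover word happens to be zero, the call works by accident.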
I am trying to speed up a loop by using foreach, but the loop body calls a simple Rcpp function that I defined myself. I saved the Rcpp function as mproduct.cpp, and I load it simply using
sourceCpp("mproduct.cpp")
and the Rcpp function is a simple one, which is to perform matrix product in C++:
// [[Rcpp::depends(RcppArmadillo, RcppEigen)]]
#include <RcppArmadillo.h>
#include <RcppEigen.h>
// [[Rcpp::export]]
SEXP MP(const Eigen::Map<Eigen::MatrixXd> A, Eigen::Map<Eigen::MatrixXd> B){
Eigen::MatrixXd C = A * B;
return Rcpp::wrap(C);
}
So, the function in the Rcpp file is MP, referring to matrix product. I need to perform the following foreach loop (I have simplified the code for illustration):
foreach(j=1:n, .package='Rcpp',.noexport= c("mproduct.cpp"),.combine=rbind)%dopar%{
n=1000000
A<-matrix(rnorm(n,1000,1000))
B<-matrix(rnorm(n,1000,1000))
S<-MP(A,B)
return(S)
}
Since matrices A and B are large, I want to use foreach to reduce the computational cost.
However, the above code does not work; it gives me this error message:
task 1 failed - "NULL value passed as symbol address"
The reason I added .noexport= c("mproduct.cpp") is to follow some suggestions from people who solved similar issues (Can't run Rcpp function in foreach - "NULL value passed as symbol address"). But somehow this does not solve my issue.
So I tried to install my Rcpp function as a library. I used the following code:
Rcpp.package.skeleton('mp',cpp_files = "<my working directory>")
but it gives me a warning message:
The following packages are referenced using Rcpp::depends attributes however are not listed in the Depends, Imports or LinkingTo fields of the package DESCRIPTION file: RcppArmadillo, RcppEigen
so when I tried to install my package using
install.packages("<my working directory>",repos = NULL,type='source')
I got this error:
Error in untar2(tarfile, files, list, exdir, restore_times) :
incomplete block on file
In R CMD INSTALL
Warning in install.packages :
installation of package ‘C:/Users/Lenovo/Documents/mproduct.cpp’ had non-zero exit status
So can someone help me either 1) use foreach with the Rcpp function MP, or 2) install the Rcpp file as a package?
Thank you all very much.
The first step would be making sure that you are optimizing the right thing. For me, this would not be the case as this simple benchmark shows:
set.seed(42)
n <- 1000
A<-matrix(rnorm(n*n), n, n)
B<-matrix(rnorm(n*n), n, n)
MP <- Rcpp::cppFunction("SEXP MP(const Eigen::Map<Eigen::MatrixXd> A, Eigen::Map<Eigen::MatrixXd> B){
Eigen::MatrixXd C = A * B;
return Rcpp::wrap(C);
}", depends = "RcppEigen")
bench::mark(MP(A, B), A %*% B)[1:5]
#> # A tibble: 2 x 5
#>   expression      min   median `itr/sec` mem_alloc
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>
#> 1 MP(A, B)    277.8ms    278ms      3.60    7.63MB
#> 2 A %*% B      37.4ms     39ms     22.8     7.63MB
So for me the matrix product via %*% is several times faster than the one via RcppEigen. However, I am using Linux with OpenBLAS for matrix operations while you are on Windows, which often means reference BLAS for matrix operations. It might be that RcppEigen is faster on your system. I am not sure how difficult it is for Windows users to get a faster BLAS implementation (https://csgillespie.github.io/efficientR/set-up.html#blas-and-alternative-r-interpreters might contain some pointers), but I would suggest spending some time investigating this.
Now if you come to the conclusion that you do need RcppEigen or RcppArmadillo in your code and want to put that code into a package, you can do the following. Instead of Rcpp::Rcpp.package.skeleton() use RcppEigen::RcppEigen.package.skeleton() or RcppArmadillo::RcppArmadillo.package.skeleton() to create a starting point for a package based on RcppEigen or RcppArmadillo, respectively.
There is a performance problem with my code that I do not understand.
In the following code:
GainDetailMatI is a Mat holding a 9792*2448 matrix
ContrastGainBound4096x and ContrastGainLayerI are ints
Platform: Android 4.4, NDK gcc 4.9
A:
Mat plus = ContrastGainLayerI * min(ContrastGainBound4096x, max(0, GainDetailMatI - 4096.0));
B:
Mat t=max(0, GainDetailMatI - 4096.0);
Mat plus = ContrastGainLayerI * min(ContrastGainBound4096x, t);
A takes 13 milliseconds more than B.
I disabled gcc optimization by setting APP_OPTIM := debug in Application.mk.
Does anyone know the reason?
I think max(0, GainDetailMatI - 4096.0) may return a MatExpr,
and t = max(0, GainDetailMatI - 4096.0); converts the MatExpr to a Mat.
Could this be the reason?
Thanks a lot!
In example B you first store the object in t and then retrieve it for use in the second part of your code. In example A you skip the storing and retrieving, making the code more efficient. While this shows that putting all your code on one line often makes it more efficient, keep in mind that readability has a lot of value. More info on Java performance can be found on the wiki. https://en.wikipedia.org/wiki/Java_performance#Compressed_Oops
If someone reading this question has a minute or two, could you test building the following code:
#include <cstdint>
#include <x86intrin.h>
// some compiler feature tests used by makefile
typedef uint8_t vector_8_16 __attribute__ ((vector_size(16)));
static const vector_8_16 key16 = { 7, 6, 5, 4, 3, 2, 1, 0,
15, 14, 13, 12, 11, 10, 9, 8};
int main() {
vector_8_16 a = key16;
vector_8_16 b, c;
b = reinterpret_cast<vector_8_16>(_mm_shuffle_pd(a, a, 1));
c = _mm_xor_si128(b, a);
c = _mm_cmpeq_epi8(b, a);
c = _mm_andnot_si128(c, a);
return c[2] & 0;
}
with the following invocation:
gcc -std=c++11 -march=corei7-avx -flax-vector-conversions test.cc
So far I have tried gcc5 from this site: http://hpc.sourceforge.net/ but it just doesn't work:
/var/folders/d8/m9xrbkrs2tj3x6xw_h0nmkn40000gn/T//ccXbpcH7.s:10:no such instruction: `vmovdqa LC0(%rip), %xmm0'
/var/folders/d8/m9xrbkrs2tj3x6xw_h0nmkn40000gn/T//ccXbpcH7.s:11:no such instruction: `vmovaps %xmm0, -16(%rbp)'
/var/folders/d8/m9xrbkrs2tj3x6xw_h0nmkn40000gn/T//ccXbpcH7.s:12:no such instruction: `vmovapd -16(%rbp), %xmm1'
/var/folders/d8/m9xrbkrs2tj3x6xw_h0nmkn40000gn/T//ccXbpcH7.s:13:no such instruction: `vmovapd -16(%rbp), %xmm0'
/var/folders/d8/m9xrbkrs2tj3x6xw_h0nmkn40000gn/T//ccXbpcH7.s:14:no such instruction: `vshufpd $1, %xmm1,%xmm0,%xmm0'
A few years ago, I managed to get gcc 4.7 working after building it from source and replacing the assembler in /usr/bin/as with the one from gcc. But that endeavour took some days, and I'm not sure whether it works with the current OS X and Xcode tools versions. I suspect this build has similar problems: either it tries to use the assembler that came with Xcode and the two do not understand each other, or it tries to use its own assembler, which doesn't know about AVX. I'm not sure yet what exactly the problem is; my next hope (before spending a few days hacking it to use a useful assembler) is to try the brew GCC package.
Or, if anyone knows an easy way to bring GCC with AVX to life on Mac OS X, I'm happy to hear about it.
Note: clang I can already use, this question is specifically about GCC
==EDIT==
After a lot of searching, I found the same issue answered here, with a solution that works for me:
How to use AVX/pclmulqdq on Mac OS X
Sorry for another duplicate question.
Nice talking to myself.
Out
I'm having a very weird problem that occurs only in one project. I'm using XCode 4.3.6 and trying to add the Accelerate framework to my project. So in my file I just use a simple import statement:
#import <Accelerate/Accelerate.h>
Then I build my project and get 4 errors in the clapack.h file of vecLib.framework, pointing to these lines:
int claswp_(__CLPK_integer *n, __CLPK_complex *a, __CLPK_integer *lda, __CLPK_integer *
k1, __CLPK_integer *k2, __CLPK_integer *ipiv, __CLPK_integer *incx) __OSX_AVAILABLE_STARTING(__MAC_10_2,__IPHONE_4_0);
int dlaswp_(__CLPK_integer *n, __CLPK_doublereal *a, __CLPK_integer *lda, __CLPK_integer
*k1, __CLPK_integer *k2, __CLPK_integer *ipiv, __CLPK_integer *incx) __OSX_AVAILABLE_STARTING(__MAC_10_2,__IPHONE_4_0);
int slaswp_(__CLPK_integer *n, __CLPK_real *a, __CLPK_integer *lda, __CLPK_integer *k1,
__CLPK_integer *k2, __CLPK_integer *ipiv, __CLPK_integer *incx) __OSX_AVAILABLE_STARTING(__MAC_10_2,__IPHONE_4_0);
int zlaswp_(__CLPK_integer *n, __CLPK_doublecomplex *a, __CLPK_integer *lda,
__CLPK_integer *k1, __CLPK_integer *k2, __CLPK_integer *ipiv, __CLPK_integer *incx) __OSX_AVAILABLE_STARTING(__MAC_10_2,__IPHONE_4_0);
All of these errors say that an expected closing bracket ')' is missing at k1. It's weird that I don't get these errors in any other project. What could be the reason for this error? I'd really appreciate it if someone could suggest a solution.
Your code (or one of the headers that you include before <Accelerate/Accelerate.h>) defines a macro with the name k1. Something like:
#define k1 *some expression*
It’s a bug for a system library to use “common” parameter names like this for exactly this reason, but it’s also bad style for you to use them as macro names for the same reason.
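As a toy illustration of the collision (a Python model of textual substitution, not the real C preprocessor), the sketch below replaces the identifier k1 in a shortened version of one prototype. The macro body 0.5f is hypothetical, since the actual definition in the project is unknown.

```python
import re

# One prototype from clapack.h, shortened to the relevant parameters.
PROTOTYPE = "int dlaswp_(__CLPK_integer *n, __CLPK_integer *k1, __CLPK_integer *k2);"


def expand_macro(source: str, name: str, body: str) -> str:
    """Replace whole-identifier occurrences of `name` with `body`,
    roughly as an object-like #define would."""
    pattern = rf"\b{re.escape(name)}\b"
    return re.sub(pattern, lambda match: body, source)


# Hypothetical macro body; the real one in the project is unknown.
expanded = expand_macro(PROTOTYPE, "k1", "0.5f")
print(expanded)
```

After the substitution the parameter list contains a literal where the compiler expects a parameter name, which is the kind of input that would produce an "expected ')'" diagnostic at k1. The parameter k2 is untouched because only the whole identifier k1 matches.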
There are a few ways that you can resolve the issue:
Change the name of your macro.
Move the definition of your macro so that it comes after the inclusion of the Accelerate header.
If you’re not using the LAPACK functions, but instead some other part of Accelerate, you can prevent the compiler from seeing the clapack.h prototypes via include-guard abuse:
#define __CLAPACK_H // hide clapack.h prototypes
#import <Accelerate/Accelerate.h>
Please refer to this link: https://github.com/aosm/xnu/blob/master/EXTERNAL_HEADERS/Availability.h
The desktop Mac OS X and iOS each have different version numbers.
The __OSX_AVAILABLE_STARTING() macro allows you to specify both the desktop
and iOS version numbers. For instance:
__OSX_AVAILABLE_STARTING(__MAC_10_2,__IPHONE_2_0)
means the function/method was first available on Mac OS X 10.2 on the desktop
and first available in iOS 2.0 on the iPhone.
If a function is available on one platform, but not the other a _NA (not
applicable) parameter is used. For instance:
__OSX_AVAILABLE_STARTING(__MAC_10_3,__IPHONE_NA)
means that the function/method was first available on Mac OS X 10.3, and it
currently not implemented on the iPhone.
At some point, a function/method may be deprecated. That means Apple
recommends applications stop using the function, either because there is a
better replacement or the functionality is being phased out. Deprecated
functions/methods can be tagged with a __OSX_AVAILABLE_BUT_DEPRECATED()
macro which specifies the OS version where the function became available
as well as the OS version in which it became deprecated. For instance:
__OSX_AVAILABLE_BUT_DEPRECATED(__MAC_10_0,__MAC_10_5,__IPHONE_NA,__IPHONE_NA)
means that the function/method was introduced in Mac OS X 10.0, then
deprecated beginning in Mac OS X 10.5. On iOS the function
has never been available.
I'm doing ZigZag encoding on 32-bit integers with Dart. This is the source code that I'm using:
int _encodeZigZag(int instance) => (instance << 1) ^ (instance >> 31);
int _decodeZigZag(int instance) => (instance >> 1) ^ (-(instance & 1));
The code works as expected in the DartVM.
But in dart2js the _decodeZigZag function returns invalid results when the original input was a negative number, for example -10. -10 is encoded to 19 and should be decoded back to -10, but it is decoded to 4294967286. If I run (instance >> 1) ^ (-(instance & 1)) in the JavaScript console of Chrome, I get the expected result of -10. To me that means JavaScript should be able to perform this operation properly with its number model.
But dart2js generates the following JavaScript, which looks different from the code I tested in the console:
return ($.JSNumber_methods.$shr(instance, 1) ^ -(instance & 1)) >>> 0;
Why does dart2js add an unsigned right shift by 0 to the function? Without the shift, the result would be as expected.
Now I'm wondering: is this a bug in the dart2js compiler or the expected result? Is there a way to force dart2js to output the right JavaScript code?
Or is my Dart code wrong?
PS: I also tested splitting the XOR into other operations, but dart2js still adds the right shift:
final a = -(instance & 1);
final b = (instance >> 1);
return (a & -b) | (-a & b);
Results in:
a = -(instance & 1);
b = $.JSNumber_methods.$shr(instance, 1);
return (a & -b | -a & b) >>> 0;
For efficiency reasons dart2js compiles Dart numbers to JS numbers. JS, however, only provides one number type: doubles. Furthermore bit-operations in JS are always truncated to 32 bits.
In many cases (like cryptography) it is easier to deal with unsigned 32 bits, so dart2js compiles bit-operations so that their result is an unsigned 32 bit number.
Neither choice (signed or unsigned) is perfect. Initially dart2js compiled to signed 32 bits, and this was only changed when we tripped over it too frequently. As your code demonstrates, this doesn't remove the problem; it just shifts it to different (hopefully less frequent) use cases.
Non-compliant number semantics have been a long-standing bug in dart2js, but fixing it will take time and potentially slow down the resulting code. In the short-term future Dart developers (compiling to JS) need to know about this restriction and work around it.
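To make the difference concrete, here is a small Python model of the two conventions. It is a sketch and not dart2js itself: to_uint32 stands in for the `>>> 0` that dart2js appends, and to_int32 stands in for the signed 32-bit result that plain JavaScript bitwise operators produce. The helper names are made up for this example.

```python
def to_uint32(value: int) -> int:
    """Model JavaScript's `x >>> 0`: reinterpret as unsigned 32-bit."""
    return value & 0xFFFFFFFF


def to_int32(value: int) -> int:
    """Model the signed 32-bit result of JavaScript bitwise operators."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def encode_zigzag(value: int) -> int:
    return to_int32((value << 1) ^ (value >> 31))


def decode_signed(encoded: int) -> int:
    # What the Chrome console computes for the raw expression.
    return to_int32((encoded >> 1) ^ -(encoded & 1))


def decode_unsigned(encoded: int) -> int:
    # What the generated code computes once `>>> 0` is applied.
    return to_uint32((encoded >> 1) ^ -(encoded & 1))


encoded = encode_zigzag(-10)
print(encoded)                             # 19
print(decode_signed(encoded))              # -10
print(decode_unsigned(encoded))            # 4294967286
print(to_int32(decode_unsigned(encoded)))  # -10 again
```

The last line shows that the unsigned result carries the same 32 bits as the expected one, so reinterpreting it as signed recovers -10. In Dart, calling toSigned(32) on the result may serve the same purpose, though that has not been verified against dart2js here.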
It looks like I found equivalent code that outputs the right result. The unit tests pass for both the Dart VM and dart2js, and I will use it for now.
int _decodeZigZag(int instance) => ((instance & 1) == 1 ? -(instance >> 1) - 1 : (instance >> 1));
dart2js does not add a shift this time. I would still be interested in the reason for this behavior.
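One possible explanation, offered as a guess and not as a confirmed account of dart2js: in the rewritten version the final result comes from arithmetic (negation and subtraction) and not from a bitwise operator, so there is no bitwise result left to coerce to unsigned. The rewrite itself relies on the identity x ^ -1 == -x - 1. The Python check below compares the two decode formulas using ordinary signed integers; the function names are made up for this example.

```python
def decode_xor(encoded: int) -> int:
    """Original formula: XOR with 0 (even input) or -1 (odd input)."""
    return (encoded >> 1) ^ -(encoded & 1)


def decode_branch(encoded: int) -> int:
    """Rewritten formula: the XOR with -1 is replaced by -x - 1."""
    half = encoded >> 1
    return -half - 1 if (encoded & 1) == 1 else half


# Compare both formulas over a range of encoded values.
mismatches = [e for e in range(1 << 16) if decode_xor(e) != decode_branch(e)]
print(len(mismatches))    # 0
print(decode_branch(19))  # -10
```

With signed arithmetic the two formulas agree on every tested input, so the branch version is a drop-in replacement as far as the math goes.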