Creating a simple Rcpp package with a dependency on another Rcpp package - foreach

I am trying to improve my loop computation speed by using foreach, but inside this loop I call a simple Rcpp function that I defined. I saved the Rcpp function as mproduct.cpp, and I load the function simply with
sourceCpp("mproduct.cpp")
The Rcpp function is a simple one that performs a matrix product in C++:
// [[Rcpp::depends(RcppArmadillo, RcppEigen)]]
#include <RcppArmadillo.h>
#include <RcppEigen.h>

// [[Rcpp::export]]
SEXP MP(const Eigen::Map<Eigen::MatrixXd> A, Eigen::Map<Eigen::MatrixXd> B){
    Eigen::MatrixXd C = A * B;
    return Rcpp::wrap(C);
}
So, the function in the Rcpp file is MP, referring to matrix product. I need to perform the following foreach loop (I have simplified the code for illustration):
foreach(j = 1:n, .packages = 'Rcpp', .noexport = c("mproduct.cpp"), .combine = rbind) %dopar% {
    n <- 1000000
    A <- matrix(rnorm(n), 1000, 1000)
    B <- matrix(rnorm(n), 1000, 1000)
    S <- MP(A, B)
    return(S)
}
Since matrices A and B are large, I want to use foreach to alleviate the computational cost.
However, the above code does not work; it gives me the error message:
task 1 failed - "NULL value passed as symbol address"
The reason I added .noexport= c("mproduct.cpp") is to follow some suggestions from people who solved similar issues (Can't run Rcpp function in foreach - "NULL value passed as symbol address"). But somehow this does not solve my issue.
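For reference, the workaround suggested in that linked thread is to compile the C++ source on each worker rather than exporting the compiled function from the master process, since the compiled pointer cannot be serialized. A minimal sketch, assuming a doParallel backend and that mproduct.cpp is visible from the workers' working directory:
library(doParallel)

cl <- makeCluster(2)
registerDoParallel(cl)

# Compile MP() once per worker; exporting the master's compiled pointer
# is what triggers "NULL value passed as symbol address".
clusterEvalQ(cl, Rcpp::sourceCpp("mproduct.cpp"))

res <- foreach(j = 1:4, .combine = rbind, .noexport = "MP") %dopar% {
    A <- matrix(rnorm(1e6), 1000, 1000)
    B <- matrix(rnorm(1e6), 1000, 1000)
    MP(A, B)
}

stopCluster(cl)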
So I tried to install my Rcpp function as a package. I used the following code:
Rcpp.package.skeleton('mp',cpp_files = "<my working directory>")
but it returns a warning message:
The following packages are referenced using Rcpp::depends attributes however are not listed in the Depends, Imports or LinkingTo fields of the package DESCRIPTION file: RcppArmadillo, RcppEigen
so when I tried to install my package using
install.packages("<my working directory>",repos = NULL,type='source')
I got the warning message:
Error in untar2(tarfile, files, list, exdir, restore_times) :
incomplete block on file
In R CMD INSTALL
Warning in install.packages :
installation of package ‘C:/Users/Lenovo/Documents/mproduct.cpp’ had non-zero exit status
So can someone help me out with either 1) using foreach with the Rcpp function MP, or 2) installing the Rcpp file as a package?
Thank you all very much.

The first step would be making sure that you are optimizing the right thing. For me, this would not be the case as this simple benchmark shows:
set.seed(42)
n <- 1000
A <- matrix(rnorm(n * n), n, n)
B <- matrix(rnorm(n * n), n, n)
MP <- Rcpp::cppFunction("SEXP MP(const Eigen::Map<Eigen::MatrixXd> A, Eigen::Map<Eigen::MatrixXd> B){
    Eigen::MatrixXd C = A * B;
    return Rcpp::wrap(C);
}", depends = "RcppEigen")
bench::mark(MP(A, B), A %*% B)[1:5]
#> # A tibble: 2 x 5
#>   expression      min   median `itr/sec` mem_alloc
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>
#> 1 MP(A, B)   277.8ms    278ms      3.60    7.63MB
#> 2 A %*% B     37.4ms     39ms     22.8     7.63MB
So for me the matrix product via %*% is several times faster than the one via RcppEigen. However, I am using Linux with OpenBLAS for matrix operations, while you are on Windows, which often means the reference BLAS. It might be that RcppEigen is faster on your system. I am not sure how difficult it is for Windows users to get a faster BLAS implementation (https://csgillespie.github.io/efficientR/set-up.html#blas-and-alternative-r-interpreters might contain some pointers), but I would suggest spending some time investigating this.
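If you want to check which BLAS and LAPACK your R session is actually using, a quick sketch (the library paths appear near the bottom of the sessionInfo() output on R >= 3.4):
# Show the BLAS/LAPACK libraries linked into this R session
sessionInfo()
# LAPACK path directly (R >= 3.4)
La_library()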
Now if you come to the conclusion that you do need RcppEigen or RcppArmadillo in your code and want to put that code into a package, you can do the following. Instead of Rcpp::Rcpp.package.skeleton() use RcppEigen::RcppEigen.package.skeleton() or RcppArmadillo::RcppArmadillo.package.skeleton() to create a starting point for a package based on RcppEigen or RcppArmadillo, respectively.
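For example, a minimal sketch (the package name 'mp' and the file locations are placeholders):
# Create a package skeleton that already links against RcppEigen
RcppEigen::RcppEigen.package.skeleton("mp")
# Copy the source file into the package and regenerate the RcppExports glue
file.copy("mproduct.cpp", "mp/src/mproduct.cpp")
Rcpp::compileAttributes("mp")
# Install from source; afterwards library(mp) provides MP()
install.packages("mp", repos = NULL, type = "source")
Note that inside a package the // [[Rcpp::depends(...)]] attribute is not used; the LinkingTo field in DESCRIPTION plays that role, which is what the warning from Rcpp.package.skeleton() was pointing at.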


lua static analysis: detecting uninitialized table field

I'm using luacheck (within the Atom editor), but I'm open to other static analysis tools.
Is there a way to check that I'm using an uninitialized table field? I read the docs (http://luacheck.readthedocs.io/en/stable/index.html), but maybe I missed how to do this?
In all three cases in the code below I'm trying to detect the (erroneous) use of field 'y1'; none of them is flagged. (At run time it is detected, but I'm trying to catch it before run time.)
local a = {}
a.x = 10
a.y = 20
print(a.x + a.y1) -- no warning about uninitialized field y1 !?
-- luacheck: globals b
b = {}
b.x = 10
b.y = 20
print(b.x + b.y1) -- no warning about uninitialized field y1 !?
-- No inline option for luacheck re: 'c', so plenty of complaints
-- about "non-standard global variable 'c'."
c = {} -- warning about setting
c.x = 10 -- warning about mutating
c.y = 20 -- " " "
print(c.x + c.y1) -- more warnings (but NOT about field y1)
The point is this: as projects grow (files grow, and the number & size of modules grow), it would be nice to prevent simple errors like this from creeping in.
Thanks.
lua-inspect should be able to detect and report these instances. I have it integrated into ZeroBrane Studio IDE and when running with the deep analysis it reports the following on this fragment:
unknown-field.lua:4: first use of unknown field 'y1' in 'a'
unknown-field.lua:7: first assignment to global variable 'b'
unknown-field.lua:10: first use of unknown field 'y1' in 'b'
unknown-field.lua:14: first assignment to global variable 'c'
unknown-field.lua:17: first use of unknown field 'y1' in 'c'
(Note that the integration code only reports the first instance of each of these errors to minimize the number reported; I also fixed an issue that only reported the first unknown instance of a field, so you may want to use the latest code from the repository.)
People who look into questions related to "Lua static analysis" may also be interested in the various dialects of typed Lua, for example:
Typed Lua
Titan
Pallene
Ravi
But you may not have heard of "Teal" (early in its life it was called "tl").
I'm taking the liberty of answering my original question using Teal, since I find it intriguing.
-- 'record' (like a 'struct')
local Point = record
    x : number
    y : number
end

local a : Point = {}
a.x = 10
a.y = 20
print(a.x + a.y1) -- will trigger an error
                  -- (in VS Code using teal extension & at command line)
From command line:
> tl check myfile.tl
========================================
1 error:
myfile.tl:44:13: invalid key 'y1' in record 'a'
By the way...
> tl gen myfile.tl
creates a pure Lua file, 'myfile.lua', that has no type information in it. Note: running this Lua file will still trigger the 'nil' error at run time: lua: myfile.lua:42: attempt to index a nil value (local 'a').
So, Teal gives you a chance to catch 'type' errors, but it doesn't require you to fix them before generating Lua files.

Modulo Power function in J

If I want to compute a^b mod c then there is an efficient way of doing this, without computing a^b in full.
However, when programming, if I write f g x then g(x) is computed regardless of f.
J provides for composing f and g in special cases, and the modulo power function is one of them. For instance, the following runs extremely quickly.
1000&| @ (2&^) 10000000x
This is because the 'atop' conjunction @ tells the language to compose the functions if possible. If I remove it, it goes unbearably slowly.
If however I want to work with x^x, then ^~ no longer works and I get limit errors for large values. Binding those large values does work, though.
So
999&| @ (100333454&^) 100333454x
executes nice and fast but
999&| @ ^~ 100333454x
gives me a limit error - the RHS is too big.
Am I right in thinking that in this case, J is not using an efficient power-modulo algorithm?
Per the special code page:
m&|@^ dyad avoids exponentiation for integer arguments
m&|@(n&^) monad avoids exponentiation for integer arguments
The case for ^~ is not supported by special code.
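One possible workaround (my own sketch, not from the special code page) is to build the m&|@(n&^) form at call time by binding the right argument inside an explicit verb; whether the interpreter's special code still fires through this indirection is an assumption worth verifying with a timing run:
NB. dyad: x is the modulus, y is both base and exponent
mpow =: 4 : 'x&|@(y&^) y'
999 mpow 100333454x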
https://github.com/Pascal-J/BN-openssl-bindings-for-J
includes bindings for OpenSSL's BN library, including modexp (OpenSSL is distributed with J on Windows and pre-installed on all other compatible systems). The x ^ x case will be especially efficient there due to fewer parameters being converted from J to "C" (BN).

Arrays and Datatypes in Z3py

I'm using Z3py for scheduling problems and I'm trying to represent a 2-processor system where 4 processes of different execution times must be completed.
My actual data are :
Process 1 : Arrival at 0 and execution time of 4
Process 2 : Arrival at 1 and execution time of 3
Process 3 : Arrival at 3 and execution time of 5
Process 4 : Arrival at 1 and execution time of 2
I'm trying to represent each process by decomposing it into subprocesses of equal time, so my datatypes look like this:
Pn = Datatype('Pn')
Pn.declare('1')
Pn.declare('2')
Pn.declare('3')
Pn.declare('4')
Pt = Datatype('Pt')
Pt.declare('1')
Pt.declare('2')
Pt.declare('3')
Pt.declare('4')
Pt.declare('5')
Process = Datatype('Process')
Process.declare('cons' , ('name',Pn) , ('time', Pt))
Process.declare('idle')
where Pn and Pt are the process name and the part of the process (process 1 is in 4 parts, ...).
But now I don't know how I can represent my processors so as to add the 3 rules I need: unicity (each subprocess must be done exactly once, by only one processor), arrival checking (the first part of a process can't be processed before it arrives), and ordering (each part of a process must be processed after the preceding one).
So I was thinking of using arrays to represent my 2 processors with this kind of declaration :
P = Array('P', IntSort() , Process)
But when I tried to execute it I got an error message saying :
Traceback (most recent call last):
File "C:\Users\Alexis\Desktop\test.py", line 16, in <module>
P = Array('P', IntSort() , Process)
File "src/api/python\z3.py", line 3887, in Array
File "src/api/python\z3.py", line 3873, in ArraySort
File "src/api/python\z3.py", line 56, in _z3_assert
Z3Exception: 'Z3 sort expected'
And now I don't know how to handle it... Must I create a new datatype and figure out a way to add my rules? Or is there a way to add datatypes to arrays, which would let me create rules like this:
unicity = ForAll([x,y] , (Implies(x!=y,P[x]!=P[y])))
Thanks in advance
There is a tutorial on using Datatypes from the Python API. A link to the tutorial is:
http://rise4fun.com/Z3Py/tutorialcontent/advanced#h22
It shows how to create a list data-type and use the "create()" method to instantiate a Sort object from the object used when declaring the data-type. For your example, it suffices to add calls to "create()" in the places where you want to use the declared type as a sort.
See: http://rise4fun.com/Z3Py/rQ7t
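A minimal sketch of that fix, reusing the question's declarations (only the create() calls and the loops are new):
from z3 import *

# Declare the datatypes, then finalize each with create() so it
# becomes a proper sort that Array() will accept.
Pn = Datatype('Pn')
for name in ('1', '2', '3', '4'):
    Pn.declare(name)
Pn = Pn.create()

Pt = Datatype('Pt')
for name in ('1', '2', '3', '4', '5'):
    Pt.declare(name)
Pt = Pt.create()

Process = Datatype('Process')
Process.declare('cons', ('name', Pn), ('time', Pt))
Process.declare('idle')
Process = Process.create()

# Process is now a sort, so this no longer raises 'Z3 sort expected':
P = Array('P', IntSort(), Process)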
Regarding the rest of the case study you are looking at: it is certainly possible to express the constraints you describe using quantifiers and arrays. You could also consider somewhat more efficient encodings:
Instead of using an array, use a function declaration. So P would be declared as a unary function:
P = Function('P', IntSort(), Process.create())
Using quantifiers for small finite domain problems may be more of an overhead than a benefit. Writing down the constraints directly as a finite conjunction saves the overhead of instantiating quantifiers, possibly repeatedly. That said, some quantified axioms can also be optimized. Z3 automatically compiles axioms of the form ForAll([x,y], Implies(x != y, P(x) != P(y))) into an axiom of the form ForAll([x], Pinv(P(x)) == x), where "Pinv" is a fresh function. The new axiom still enforces that P is injective, but requires only a linear number of instantiations: linear in the number of occurrences of P(t) for some term 't'.
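For illustration, here is that injective-function encoding written out by hand (a sketch; the stand-in sort and the name Pinv are made up for the example):
from z3 import *

Process = DeclareSort('Process')  # stand-in for the created datatype sort
P = Function('P', IntSort(), Process)
Pinv = Function('Pinv', Process, IntSort())  # the fresh inverse function
x = Int('x')

s = Solver()
# Enforces injectivity of P with only linearly many instantiations:
s.add(ForAll([x], Pinv(P(x)) == x))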
Have fun!

Matlab, Econometrics toolbox - Simulate ARIMA with deterministic time-varying variance

DISCLAIMER: This question is only for those who have access to the econometrics toolbox in Matlab.
The Situation: I would like to use Matlab to simulate N observations from an ARIMA(p, d, q) model using the econometrics toolbox. What's the difficulty? I would like the innovations to be simulated with deterministic, time-varying variance.
Question 1) Can I do this using the in-built matlab simulate function without altering it myself? As near as I can tell, this is not possible. From my reading of the docs, the innovations can either be specified to have a constant variance (ie same variance for each innovation), or be specified to be stochastically time-varying (eg a GARCH model), but they cannot be deterministically time-varying, where I, the user, choose their values (except in the trivial constant case).
Question 2) If the answer to question 1 is "No", then does anyone see any reason why I can't edit the simulate function from the econometrics toolbox as follows:
a) Alter the preamble such that the function won't throw an error if the Variance field in the input model is set to a numeric vector instead of a numeric scalar.
b) Alter line 310 of simulate from:
E(:,(maxPQ + 1:end)) = Z * sqrt(variance);
to
E(:,(maxPQ + 1:end)) = (ones(NumPath, 1) * sqrt(variance)) .* Z;
where NumPath is the number of paths to be simulated, and it can be assumed that I've included an error trap to ensure that the (input) deterministic variance path stored in variance is of the right length (ie equal to the number of observations to be simulated per path).
Any help would be most appreciated. Apologies if the question seems basic; I just haven't ever edited one of MathWorks' own functions before and didn't want to do something foolish.
UPDATE (2012-10-18): I'm confident that the code edit I've suggested above is valid, and I'm mostly confident that it won't break anything else. However it turns out that implementing the solution is not trivial due to file permissions. I'm currently talking with Mathworks about the best way to achieve my goal. I'll post the results here once I have them.
It's been a week and a half with no answer, so I think I'm probably okay to post my own answer at this point.
In response to my question 1): no, I have not found any way to do this with the built-in Matlab functions.
In response to my question 2), yes, what I have posted will work. However, it was a little more involved than I imagined due to matlab file permissions. Here is a step-by-step guide:
i) Somewhere in your matlab path, create the directory #arima_Custom.
ii) In the command window, type edit arima. Copy the text of this file into a new m file and save it in the directory #arima_Custom with the filename arima_Custom.m.
iii) Locate the econometrics toolbox on your machine. Once found, look for the directory #arima in the toolbox. This directory will probably be located (on a Linux machine) at something like $MATLAB_ROOT/toolbox/econ/econ/#arima (on my machine, $MATLAB_ROOT is at /usr/local/Matlab/R2012b). Copy the contents of #arima to #arima_Custom, except do NOT copy the file arima.m.
iv) Open arima_Custom for editing, ie edit arima_Custom. In this file change line 1 from:
classdef (Sealed) arima < internal.econ.LagIndexableTimeSeries
to
classdef (Sealed) arima_Custom < internal.econ.LagIndexableTimeSeries
Next, change line 406 from:
function OBJ = arima(varargin)
to
function OBJ = arima_Custom(varargin)
Now, change line 993 from:
if isa(OBJ.Variance, 'double') && (OBJ.Variance <= 0)
to
if isa(OBJ.Variance, 'double') && (sum(OBJ.Variance <= 0) > 0)
v) Open the simulate.m located in #arima_Custom for editing (we copied it there in step iii). It is probably best to open this file by navigating to it manually in the Current Folder window, to ensure the correct simulate.m is opened. In this file, alter line 310 from:
E(:,(maxPQ + 1:end)) = Z * sqrt(variance);
to
% Check that the input variance is of the right length (if it isn't scalar)
if isscalar(variance) == 0
    if size(variance, 2) ~= 1
        error('Deterministic variance must be a column vector');
    end
    if size(variance, 1) ~= numObs
        error('Deterministic variance vector is incorrect length relative to number of observations');
    end
else
    variance = variance(ones(numObs, 1));
end

% Scale innovations using deterministic variance
E(:,(maxPQ + 1:end)) = sqrt(ones(numPaths, 1) * variance') .* Z;
And we're done!
You should now be able to simulate with deterministically time-varying variance using the arima_Custom class, for example (for an ARIMA(0,1,0)):
ARIMAModel = arima_Custom('D', 1, 'Variance', ScalarVariance, 'Constant', 0);
ARIMAModel.Variance = TimeVaryingVarianceVector;
[X, e, VarianceVector] = simulate(ARIMAModel, NumObs, 'numPaths', NumPaths);
Further, you should also still be able to use matlab's original arima class, since we didn't alter it.

How does the Erlang compiler handle pattern matching? What does it output?

I just asked a question about how the Erlang compiler implements pattern matching, and I got some great responses, one of which is the compiled bytecode (obtained with a parameter passed to the c() directive):
{function, match, 1, 2}.
{label,1}.
{func_info,{atom,match},{atom,match},1}.
{label,2}.
{test,is_tuple,{f,3},[{x,0}]}.
{test,test_arity,{f,3},[{x,0},2]}.
{get_tuple_element,{x,0},0,{x,1}}.
{test,is_eq_exact,{f,3},[{x,1},{atom,a}]}.
return.
{label,3}.
{badmatch,{x,0}}
It's all just plain Erlang tuples. I was expecting some cryptic binary thingy, guess not. I am asking this on impulse here (I could look at the compiler source, but asking questions always ends up better with the extra insight): how is this output translated at the binary level?
Take {test,is_tuple,{f,3},[{x,0}]}, for example. I am assuming this is one instruction, called 'test'... so this output would essentially be the AST of the bytecode-level language, from which the binary encoding is just a 1-to-1 translation?
This is all so exciting; I had no idea that I could so easily see what the Erlang compiler breaks things into.
OK, so I dug into the compiler source code to find the answer, and to my surprise the asm file produced with the 'S' parameter to the compile:file() function is actually consulted in as-is (file:consult()) and then the tuples are checked one by one for further action (line 661, beam_consult_asm(St) ->, in compile.erl). Further on, there is a generated mapping table (in the compile folder of the Erlang source) that shows the serial number of each bytecode label, and I'm guessing this is used to generate the actual binary signature of the bytecode.
Great stuff. But you just gotta love the consult() function: you can almost have a Lispy syntax for a random language, avoid the need for a parser/lexer entirely, and just consult source code into the compiler and do stuff with it... code as data, data as code...
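For example, a small shell sketch (using the match module from the question):
1> compile:file(match, ['S']).  % writes match.S, the Erlang-term assembly
{ok,match}
2> {ok, Forms} = file:consult("match.S"), hd(Forms).
{module,match}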
The compiler has a so-called pattern match compiler which will take a pattern and compile it down to what is essentially a series of branches, switches and such. The code for Erlang is in v3_kernel.erl in the compiler. It uses the approach from Simon Peyton Jones, "The Implementation of Functional Programming Languages", available online at
http://research.microsoft.com/en-us/um/people/simonpj/papers/slpj-book-1987/
Another worthy paper is the one by Peter Sestoft,
http://www.itu.dk/~sestoft/papers/match.ps.gz
which derives a pattern match compiler by inspecting partial evaluation of a simpler system. It may be an easier read, especially if you know ML.
The basic idea is that if you have, say:
% 1
f(a, b) -> 1;
% 2
f(a, c) -> 2;
% 3
f(b, b) -> 3;
% 4
f(b, c) -> 4.
Suppose now we have a call f(X, Y). Say X = a. Then only 1 and 2 are applicable. So we check Y = b and then Y = c. If on the other hand X /= a then we know that we can skip 1 and 2 and begin testing 3 and 4. The key is that if something does not match it tells us something about where the match can continue as well as when we do match. It is a set of constraints which we can solve by testing.
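To make that concrete, here is a hand-written sketch (not actual compiler output) of the kind of decision tree the pattern match compiler derives for f/2 above:
f(X, Y) ->
    case X of
        a ->
            case Y of
                b -> 1;   % clause 1
                c -> 2    % clause 2
            end;
        b ->
            case Y of
                b -> 3;   % clause 3
                c -> 4    % clause 4
            end
    end.
X is tested only once; if it is not a, clauses 1 and 2 are never considered again.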
Pattern match compilers seek to optimize the number of tests so there are as few as possible before we reach a conclusion. Statically typed languages have some advantages here since they may know that:
-type foo() :: a | b | c.
and then if we have
-spec f(foo()) -> any().
f(a) -> 1;
f(b) -> 2;
f(c) -> 3.
and if we did not match f(a) or f(b), then f(c) must match. Erlang has to check, and then fail if it doesn't match.
