Performing an "online" linear interpolation - time-series

I have a problem where I need to do a linear interpolation on some data as it is acquired from a sensor (it's technically position data, but the nature of the data doesn't really matter). I'm doing this now in matlab, but since I will eventually migrate this code to other languages, I want to keep the code as simple as possible and not use any complicated matlab-specific/built-in functions.
My implementation initially seems OK, but when checking my work against matlab's built-in interp1 function, it seems my implementation isn't perfect, and I have no idea why. Below is the code I'm using on a dataset already fully collected, but as I loop through the data, I act as if I only have the current sample and the previous sample, which mirrors the problem I will eventually face.
%make some dummy data
np = 109; %number of data points for x and y
x_data = linspace(3,98,np) + (normrnd(0.4,0.2,[1,np]));
y_data = normrnd(2.5, 1.5, [1,np]);
%define the query points the data will be interpolated over
qp = [1:100];
kk=2; %indexes through the data
cc = 1; %indexes through the query points
qpi = qp(cc); %qpi is the current query point in the loop
y_interp = qp*nan; %this will hold our solution
while kk<=length(x_data)
kk = kk+1; %update the data counter
%perform online interpolation
if cc<length(qp)-1
if qpi>=y_data(kk-1) %the query point, of course, has to be in-between the current value and the next value of x_data
y_interp(cc) = myInterp(x_data(kk-1), x_data(kk), y_data(kk-1), y_data(kk), qpi);
end
if qpi>x_data(kk), %if the current query point is already larger than the current sample, update the sample
kk = kk+1;
else %otherwise, update the query point to ensure its in between the samples for the next iteration
cc = cc + 1;
qpi = qp(cc);
%It is possible that if the change in x_data is greater than the resolution of the query
%points, an update like the above wont work. In this case, we must lag the data
if qpi<x_data(kk),
kk=kk-1;
end
end
end
end
%get the correct interpolation
y_interp_correct = interp1(x_data, y_data, qp);
%plot both solutions to show the difference
figure;
plot(y_interp,'displayname','manual-solution'); hold on;
plot(y_interp_correct,'k--','displayname','matlab solution');
leg1 = legend('show');
set(leg1,'Location','Best');
ylabel('interpolated points');
xlabel('query points');
Note that the "myInterp" function is as follows:
function yi = myInterp(x1, x2, y1, y2, qp)
%linearly interpolate the function value y(x) over the query point qp
yi = y1 + (qp-x1) * ( (y2-y1)/(x2-x1) );
end
And here is the plot showing that my implementation isn't correct :-(
Can anyone help me find where the mistake is? And why? I suspect it has something to do with ensuring that the query point is in-between the previous and current x-samples, but I'm not sure.

The problem in your code is that you at times call myInterp with a value of qpi that is outside of the bounds x_data(kk-1) and x_data(kk). This leads to invalid extrapolation results.
Your logic of looping over kk rather than cc is very confusing to me. I would write a simple for loop over cc, which are the points at which you want to interpolate. For each of these points, advance kk, if necessary, such that qp(cc) is in between x_data(kk) and x_data(kk+1) (you can use kk-1 and kk instead if you prefer, just initialize kk=2 to ensure that kk-1 exists, I just find starting at kk=1 more intuitive).
To simplify the logic here, I'm limiting the values in qp to be inside the limits of x_data, so that we don't need to test to ensure that x_data(kk+1) exists, nor that x_data(1)<pq(cc). You can add those tests in if you wish.
Here's my code:
qp = [ceil(x_data(1)+0.1):floor(x_data(end)-0.1)];
y_interp = qp*nan; % this will hold our solution
kk=1; % indexes through the data
for cc=1:numel(qp)
% advance kk to where we can interpolate
% (this loop is guaranteed to not index out of bounds because x_data(end)>qp(end),
% but needs to be adjusted if this is not ensured prior to the loop)
while x_data(kk+1) < qp(cc)
kk = kk + 1;
end
% perform online interpolation
y_interp(cc) = myInterp(x_data(kk), x_data(kk+1), y_data(kk), y_data(kk+1), qp(cc));
end
As you can see, the logic is a lot simpler this way. The result is identical to y_interp_correct. The inner while x_data... loop serves the same purpose as your outer while loop, and would be the place where you read your data from wherever it's coming from.

Related

How to randomly get a value from a table [duplicate]

I am working on programming a Markov chain in Lua, and one element of this requires me to uniformly generate random numbers. Here is a simplified example to illustrate my question:
example = function(x)
local r = math.random(1,10)
print(r)
return x[r]
end
exampleArray = {"a","b","c","d","e","f","g","h","i","j"}
print(example(exampleArray))
My issue is that when I re-run this program multiple times (mash F5) the exact same random number is generated resulting in the example function selecting the exact same array element. However, if I include many calls to the example function within the single program by repeating the print line at the end many times I get suitable random results.
This is not my intention as a proper Markov pseudo-random text generator should be able to run the same program with the same inputs multiple times and output different pseudo-random text every time. I have tried resetting the seed using math.randomseed(os.time()) and this makes it so the random number distribution is no longer uniform. My goal is to be able to re-run the above program and receive a randomly selected number every time.
You need to run math.randomseed() once before using math.random(), like this:
math.randomseed(os.time())
From your comment that you saw the first number is still the same. This is caused by the implementation of random generator in some platforms.
The solution is to pop some random numbers before using them for real:
math.randomseed(os.time())
math.random(); math.random(); math.random()
Note that the standard C library random() is usually not so uniformly random, a better solution is to use a better random generator if your platform provides one.
Reference: Lua Math Library
Standard C random numbers generator used in Lua isn't guananteed to be good for simulation. The words "Markov chain" suggest that you may need a better one. Here's a generator widely used for Monte-Carlo calculations:
local A1, A2 = 727595, 798405 -- 5^17=D20*A1+A2
local D20, D40 = 1048576, 1099511627776 -- 2^20, 2^40
local X1, X2 = 0, 1
function rand()
local U = X2*A2
local V = (X1*A2 + X2*A1) % D20
V = (V*D20 + U) % D40
X1 = math.floor(V/D20)
X2 = V - X1*D20
return V/D40
end
It generates a number between 0 and 1, so r = math.floor(rand()*10) + 1 would go into your example.
(That's multiplicative random number generator with period 2^38, multiplier 5^17 and modulo 2^40, original Pascal code by http://osmf.sscc.ru/~smp/)
math.randomseed(os.clock()*100000000000)
for i=1,3 do
math.random(10000, 65000)
end
Always results in new random numbers. Changing the seed value will ensure randomness. Don't follow os.time() because it is the epoch time and changes after one second but os.clock() won't have the same value at any close instance.
There's the Luaossl library solution: (https://github.com/wahern/luaossl)
local rand = require "openssl.rand"
local randominteger
if rand.ready() then -- rand has been properly seeded
-- Returns a cryptographically strong uniform random integer in the interval [0, n−1].
randominteger = rand.uniform(99) + 1 -- randomizes an integer from range 1 to 100
end
http://25thandclement.com/~william/projects/luaossl.pdf

In Lua Torch, the product of two zero matrices has nan entries

I have encountered a strange behavior of the torch.mm function in Lua/Torch. Here is a simple program that demonstrates the problem.
iteration = 0;
a = torch.Tensor(2, 2);
b = torch.Tensor(2, 2);
prod = torch.Tensor(2,2);
a:zero();
b:zero();
repeat
prod = torch.mm(a,b);
ent = prod[{2,1}];
iteration = iteration + 1;
until ent ~= ent
print ("error at iteration " .. iteration);
print (prod);
The program consists of one loop, in which the program multiplies two zero 2x2 matrices and tests if entry ent of the product matrix is equal to nan. It seems that the program should run forever since the product should always be equal to 0, and hence ent should be 0. However, the program prints:
error at iteration 548
0.000000 0.000000
nan nan
[torch.DoubleTensor of size 2x2]
Why is this happening?
Update:
The problem disappears if I replace prod = torch.mm(a,b) with torch.mm(prod,a,b), which suggests that something is wrong with the memory allocation.
My version of Torch was compiled without BLAS & LAPACK libraries. After I recompiled torch with OpenBLAS, the problem disappeared. However, I am still interested in its cause.
The part of code that auto-generates the Lua wrapper for torch.mm can be found here.
When you write prod = torch.mm(a,b) within your loop it corresponds to the following C code behind the scenes (generated by this wrapper thanks to cwrap):
/* this is the tensor that will hold the results */
arg1 = THDoubleTensor_new();
THDoubleTensor_resize2d(arg1, arg5->size[0], arg6->size[1]);
arg3 = arg1;
/* .... */
luaT_pushudata(L, arg1, "torch.DoubleTensor");
/* effective matrix multiplication operation that will fill arg1 */
THDoubleTensor_addmm(arg1,arg2,arg3,arg4,arg5,arg6);
So:
a new result tensor is created and resized with the proper dimensions,
but this new tensor is NOT initialized, i.e. there is no calloc or explicit fill here so it points to junk memory and could contain NaN-s,
this tensor is pushed on the stack so as to be available on the Lua side as the return value.
The last point means that this returned tensor is different from the initial prod one (i.e. within the loop, prod shadows the initial value).
On the other hand calling torch.mm(prod,a,b) does use your initial prod tensor to store the results (behind the scenes there is no need to create a dedicated tensor in that case). Since in your code snippet you do not initialize / fill it with given values it could also contain junk.
In both cases the core operation is a gemm multiplication like C = beta * C + alpha * A * B, with beta=0 and alpha=1. The naive implementation looks like that:
real *a_ = a;
for(i = 0; i < m; i++)
{
real *b_ = b;
for(j = 0; j < n; j++)
{
real sum = 0;
for(l = 0; l < k; l++)
sum += a_[l*lda]*b_[l];
b_ += ldb;
/*
* WARNING: beta*c[j*ldc+i] could give NaN even if beta=0
* if the other operand c[j*ldc+i] is NaN!
*/
c[j*ldc+i] = beta*c[j*ldc+i]+alpha*sum;
}
a_++;
}
Comments are mine.
So:
with torch.mm(a,b): at each iteration, a new result tensor is created without being initialized (it could contain NaN-s). So every iteration presents a risk of returning NaN-s (see above warning),
with torch.mm(prod,a,b): there is the same risk since you do not initialized the prod tensor. BUT: this risk only exists at the first iteration of the repeat / until loop since right after prod is filled with 0-s and re-used for the subsequent iterations.
So this is why you do not observe a problem here (it is less frequent).
In case 1: this should be improved at the Torch level, i.e. make sure the wrapper initializes the output (e.g. with THDoubleTensor_fill(arg1, 0);).
In case 2: you should initialize prod initially and use the torch.mm(prod,a,b) construct to avoid any NaN problem.
--
EDIT: this problem is now fixed (see this pull request).

Moving Average across Variables in Stata

I have a panel data set for which I would like to calculate moving averages across years.
Each year is a variable for which there is an observation for each state, and I would like to create a new variable for the average of every three year period.
For example:
P1947=rmean(v1943 v1944 v1945), P1947=rmean(v1944 v1945 v1946)
I figured I should use a foreach loop with the egen command, but I'm not sure about how I should refer to the different variables within the loop.
I'd appreciate any guidance!
This data structure is quite unfit for purpose. Assuming an identifier id you need to reshape, e.g.
reshape long v, i(id) j(year)
tsset id year
Then a moving average is easy. Use tssmooth or just generate, e.g.
gen mave = (L.v + v + F.v)/3
or (better)
gen mave = 0.25 * L.v + 0.5 * v + 0.25 * F.v
More on why your data structure is quite unfit: Not only would calculation of a moving average need a loop (not necessarily involving egen), but you would be creating several new extra variables. Using those in any subsequent analysis would be somewhere between awkward and impossible.
EDIT I'll give a sample loop, while not moving from my stance that it is poor technique. I don't see a reason behind your naming convention whereby P1947 is a mean for 1943-1945; I assume that's just a typo. Let's suppose that we have data for 1913-2012. For means of 3 years, we lose one year at each end.
forval j = 1914/2011 {
local i = `j' - 1
local k = `j' + 1
gen P`j' = (v`i' + v`j' + v`k') / 3
}
That could be written more concisely, at the expense of a flurry of macros within macros. Using unequal weights is easy, as above. The only reason to use egen is that it doesn't give up if there are missings, which the above will do.
FURTHER EDIT
As a matter of completeness, note that it is easy to handle missings without resorting to egen.
The numerator
(v`i' + v`j' + v`k')
generalises to
(cond(missing(v`i'), 0, v`i') + cond(missing(v`j'), 0, v`j') + cond(missing(v`k'), 0, v`k')
and the denominator
3
generalises to
!missing(v`i') + !missing(v`j') + !missing(v`k')
If all values are missing, this reduces to 0/0, or missing. Otherwise, if any value is missing, we add 0 to the numerator and 0 to the denominator, which is the same as ignoring it. Naturally the code is tolerable as above for averages of 3 years, but either for that case or for averaging over more years, we would replace the lines above by a loop, which is what egen does.
There is a user written program that can do that very easily for you. It is called mvsumm and can be found through findit mvsumm
xtset id time
mvsumm observations, stat(mean) win(t) gen(new_variable) end

Randomly scan through every coordinate

TL;DR Randomly access every tile in a tilemap
I have a way of generating random positions of tiles by just filling an entire layer of them (its only 10x10) then running a forloop like for (int x = 0; x < 13; x++)
{
for (int y = 0; y < 11; y++)}}
Where I randomly delete tiles. I also have a cap to this, which is at about 30. The problem is that when the loop runs over, it uses up the cap to the left (Because it starts at x=0 y=0 then does x0 x1 x2 x3 ...). I tried randomly generating coordinates, but this didn't work because it doesnt go over all the coordinates.
Does anyone know a better practice for scanning over every coordinate in a map in random order?
Number your tiles 0 to n anyway you want. Create a NSMutableIndexSet, and add all to it. Use a random number generator scaled to the number of items still in the index set (actually the range of first to last), grab one, then remove it from the set. If the random number is not in the set generate a new be, etc, until you find one in the set.
I think the best practice to accomplish this is via double hashing. To find out more read this link: Double Hashing. I will try to explain it simply.
You have two auxiliary hash functions which need to be pre-computed. And a main hash function which will run in a for loop. Lets test this out (this will be pseudo code):
key = random_number() //lets get a random number and call it "key"
module = map_size_x // map size for our module
//form of hash function 1 is: h1(key) = key % module, lets compute the hash 1 for our main hash function
aux1 = key % module
//form of hash function 2 is: h2(key) = 1 + (key % module'), where module' is module smaller for a small number (lets use 1), lets compute it:
aux2 = 1 + (key % (module - 1))
//the main hash function which will generate a random permutation is in the form of: h(key, index) = (h1(key) + index*h2(key)) % module. we already have h1 and h2 so lets loop this through:
for (i = 0; i < map_size_x; i++)
{
randomElement = (aux1 + i*aux2) % module //here we have index of the random element
//DO STUFF HERE
}
To get another permutation simply change the value of key. For more info check the link.
Hope this helps. Cheers.

Implementing post / pre increment / decrement when translating to Lua

I'm writing a LSL to Lua translator, and I'm having all sorts of trouble implementing incrementing and decrementing operators. LSL has such things using the usual C like syntax (x++, x--, ++x, --x), but Lua does not. Just to avoid massive amounts of typing, I refer to these sorts of operators as "crements". In the below code, I'll use "..." to represent other parts of the expression.
... x += 1 ...
Wont work, coz Lua only has simple assignment.
... x = x + 1 ...
Wont work coz that's a statement, and Lua can't use statements in expressions. LSL can use crements in expressions.
function preIncrement(x) x = x + 1; return x; end
... preIncrement(x) ...
While it does provide the correct value in the expression, Lua is pass by value for numbers, so the original variable is not changed. If I could get this to actually change the variable, then all is good. Messing with the environment might not be such a good idea, dunno what scope x is. I think I'll investigate that next. The translator could output scope details.
Assuming the above function exists -
... x = preIncrement(x) ...
Wont work for the "it's a statement" reason.
Other solutions start to get really messy.
x = preIncrement(x)
... x ...
Works fine, except when the original LSL code is something like this -
while (doOneThing(x++))
{
doOtherThing(x);
}
Which becomes a whole can of worms. Using tables in the function -
function preIncrement(x) x[1] = x[1] + 1; return x[1]; end
temp = {x}
... preincrement(temp) ...
x = temp[1]
Is even messier, and has the same problems.
Starting to look like I might have to actually analyse the surrounding code instead of just doing simple translations to sort out what the correct way to implement any given crement will be. Anybody got any simple ideas?
I think to really do this properly you're going to have to do some more detailed analysis, and splitting of some expressions into multiple statements, although many can probably be translated pretty straight-forwardly.
Note that at least in C, you can delay post-increments/decrements to the next "sequence point", and put pre-increments/decrements before the previous sequence point; sequence points are only located in a few places: between statements, at "short-circuit operators" (&& and ||), etc. (more info here)
So it's fine to replace x = *y++ + z * f (); with { x = *y + z * f(); y = y + 1; }—the user isn't allowed to assume that y will be incremented before anything else in the statement, only that the value used in *y will be y before it's incremented. Similarly, x = *--y + z * f(); can be replaced with { y = y - 1; x = *y + z * f (); }
Lua is designed to be pretty much impervious to implementations of this sort of thing. It may be done as kind of a compiler/interpreter issue, since the interpreter can know that variables only change when a statement is executed.
There's no way to implement this kind of thing in Lua. Not in the general case. You could do it for global variables by passing a string to the increment function. But obviously it wouldn't work for locals, or for variables that are in a table that is itself global.
Lua doesn't want you to do it; it's best to find a way to work within the restriction. And that means code analysis.
Your proposed solution only will work when your Lua variables are all global. Unless this is something LSL also does, you will get trouble translating LSL programs that use variables called the same way in different places.
Lua is only able of modifying one lvalue per statement - tables being passed to functions are the only exception to this rule. You could use a local table to store all locals, and that would help you out with the pre-...-crements; they can be evaluated before the expression they are contained in is evauated. But the post-...-crements have to be evaluated later on, which is simply not possible in lua - at least not without some ugly code involving anonymous functions.
So you have one options: you must accept that some LSL statements will get translated to several Lua statements.
Say you have a LSL statement with increments like this:
f(integer x) {
integer y = x + x++;
return (y + ++y)
}
You can translate this to a Lua statement like this:
function f(x) {
local post_incremented_x = x + 1 -- extra statement 1 for post increment
local y = x + post_incremented_x
x = post_incremented_x -- extra statement 2 for post increment
local pre_incremented_y = y + 1
return y + pre_incremented_y
y = pre_incremented_y -- this line will never be executed
}
So you basically will have to add two statements per ..-crement used in your statements. For complex structures, that will mean calculating the order in which the expressions are evaluated.
For what is worth, I like with having post decrements and predecrements as individual statements in languages. But I consider it a flaw of the language when they can also be used as expressions. The syntactic sugar quickly becomes semantic diabetes.
After some research and thinking I've come up with an idea that might work.
For globals -
function preIncrement(x)
_G[x] = _G[x] + 1
return _G[x]
end
... preIncrement("x") ...
For locals and function parameters (which are locals to) I know at the time I'm parsing the crement that it is local, I can store four flags to tell me which of the four crements is being used in the variables AST structure. Then when it comes time to output the variables definition, I can output something like this -
local x;
function preIncrement_x() x = x + 1; return x; end
function postDecrement_x() local y = x; x = x - 1; return y; end
... preIncrement_x() ...
In most of your assessment of configurability to the code. You are trying to hard pass data types from one into another. And call it a 'translator'. And in all of this you miss regex and other pattern match capacities. Which are far more present in LUA than LSL. And since the LSL code is being passed into LUA. Try using them, along with other functions. Which would define the work more as a translator, than a hard pass.
Yes I know this was asked a while ago. Though, for other viewers of this topic. Never forget the environments you are working in. EVER. Use what they give you to the best ability you can.

Resources