Interpolating numeric values in Stata without creating new variables - time-series

I have a longitudinal data set with recurring observations (id 1,2,3...) per year. I have thousands of variables of all types. Some rows (indicated by a variable to_interpolate == 1) need to have their numeric variables linearly interpolated (they are empty) based on values of the same id from previous and next years.
Since I can't name all variables, I created a varlist of numeric variables. Also, I do not want to recreate thousands of extra variables, so I need to replace the existing missing values.
What I did so far:
quietly ds, has(type numeric)
local varlist `r(varlist)'
sort id year
foreach var of local varlist {
by id: ipolate `var' year replace(`var') if to_interpolate==1
}
No matter what I do, I get an error message:
factor variables and time-series operators not allowed
r(101);
My questions:
Is replace() even proper syntax here? If not, how do I replace the existing variable values instead of creating new variables?
If the error means that factors exist in my varlist - how to detect them?
If not, how to get around this?
Thanks!

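As @William Lisowski underlines, there is no replace() option to ipolate. Whatever is not allowed by its syntax diagram is forbidden. The error presumably arises because, with no comma before it, replace(`var') is read as part of the variable list rather than as an option. In any case, keeping a copy of the original is surely to be commended as part of an audit trail.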
sort id
quietly ds, has(type numeric)
foreach var in `r(varlist)' {
by id: ipolate `var' year, gen(`var'2)
}

Ok, this is a workaround since I can't find a way to replace values with ipolate that is feasible for thousands of variables:
quietly ds, has(type double float long int)
local varlist `r(varlist)'
sort id year
foreach var of local varlist {
quietly by id: replace `var' = (`var'[_n-1] + `var'[_n+1])/2 if to_interpolate==1
}
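This is linear interpolation for single-year gaps only: each missing value becomes the mean of its two neighbours. It does not work when two consecutive years are missing, because one neighbour is then missing too, but for my purposes it is enough. I would be very happy to see a better solution :)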
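To make the limitation concrete, here is a minimal sketch of gap-aware linear interpolation for one id, written in Python purely for illustration (it is not Stata code). The function name interpolate_gaps and the sample data are hypothetical. It weights by the distance in years between the nearest known values on each side, so gaps of any length are filled, and it leaves leading or trailing gaps untouched rather than extrapolating.

```python
def interpolate_gaps(years, values):
    """Linearly interpolate None entries that lie between two known points.

    years and values are parallel lists for a single id, sorted by year.
    Leading and trailing gaps stay None (no extrapolation).
    """
    result = list(values)
    # Positions of the observations that are not missing.
    known = [i for i, v in enumerate(values) if v is not None]
    # Walk over each pair of consecutive known points and fill what lies between.
    for left, right in zip(known, known[1:]):
        span = years[right] - years[left]
        rise = values[right] - values[left]
        for i in range(left + 1, right):
            result[i] = values[left] + rise * (years[i] - years[left]) / span
    return result


# Hypothetical data: one id observed 2000..2004 with a two-year gap.
years = [2000, 2001, 2002, 2003, 2004]
values = [10.0, None, None, 40.0, None]
filled = interpolate_gaps(years, values)
print(filled)
```

With a single missing year between two known values this reduces to the neighbour mean used in the workaround above.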


Lua: Sort table of numbers with multiple dots

I have a table of strings like this:
{
"1",
"1.5",
"3.13",
"1.2.5.7",
"2.5",
"1.3.5",
"2.2.5.7.10",
"1.17",
"1.10.5",
"2.3.14.9",
"3.5.21.9.3",
"4"
}
And would like to sort that like this:
{
"1",
"1.2.5.7",
"1.3.5",
"1.5",
"1.10.5",
"1.17",
"2.2.5.7.10",
"2.3.14.9",
"2.5",
"3.5.21.9.3",
"3.13",
"4"
}
How do I sort this in Lua? I know that table.sort() will be used, I just don't know the function (second parameter) to use for comparison.
Given your requirements, you probably want something like natural sort order. I described several possible solutions, as well as their impact on the results, in a blog post.
The simplest solution may look like the one below, but the post lists 5 different solutions of varying complexity, along with their results:
function alphanumsort(o)
  local function padnum(d) return ("%03d%s"):format(#d, d) end
  table.sort(o, function(a,b)
    return tostring(a):gsub("%d+",padnum) < tostring(b):gsub("%d+",padnum) end)
  return o
end
table.sort sorts in ascending order by default, so no second parameter is needed for that. However, because your elements are strings, Lua compares them character by character, so "1.10.5" would come before "1.2.5.7". Hence you must implement a comparison function that tells Lua which element comes first.
I just don't know the function (second parameter) to use for
comparison.
That's why people wrote the Lua Reference Manual:
table.sort (list [, comp])
Sorts the list elements in a given order, in-place, from list[1] to
list[#list]. If comp is given, then it must be a function that
receives two list elements and returns true when the first element
must come before the second in the final order, so that, after the
sort, i <= j implies not comp(list[j],list[i]). If comp is not given,
then the standard Lua operator < is used instead.
The comp function must define a consistent order; more formally, the
function must define a strict weak order. (A weak order is similar to
a total order, but it can equate different elements for comparison
purposes.)
The sort algorithm is not stable: Different elements considered equal
by the given order may have their relative positions changed by the
sort.
Think about how you would do it with pen and paper. You would compare the strings one number segment at a time. As soon as a segment is smaller than its counterpart, you know that string comes first.
So a solution would probably require you to get those segments for the strings, convert them to numbers so you can compare their values...
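The segment-by-segment idea can be sketched as follows. This is an illustration in Python rather than Lua, and the helper name segment_key is hypothetical. It splits each string on dots and converts the parts to integers; the resulting tuples compare element by element, and a shorter tuple that is a prefix of a longer one sorts first. A Lua comparator passed to table.sort would need to apply the same logic to two strings at a time.

```python
def segment_key(s):
    """Turn "1.10.5" into (1, 10, 5) so that comparison is numeric per segment."""
    return tuple(int(part) for part in s.split("."))


# The table of strings from the question.
items = [
    "1", "1.5", "3.13", "1.2.5.7", "2.5", "1.3.5",
    "2.2.5.7.10", "1.17", "1.10.5", "2.3.14.9", "3.5.21.9.3", "4",
]

ordered = sorted(items, key=segment_key)
for s in ordered:
    print(s)
```

This sketch assumes every segment is a plain non-negative integer; strings with empty or non-numeric segments would need extra handling.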

How do I remove rows based on a comma-separated list of values in a Power BI parameter in Power Query?

I have a list of data with a title column (among many other columns) and I have a Power BI parameter that has, for example, a value of "a,b,c". What I want to do is loop through the parameter's values and remove any rows that begin with those characters.
For example:
Title
a
b
c
d
Should become
Title
d
This comma separated list could have one value or it could have twenty. I know that I can turn the parameter into a list by using
parameterList = Text.Split(<parameter-name>,",")
but then I am unsure how to continue to use that to filter on. For one value I would just use
#"Filtered Rows" = Table.SelectRows(#"Table", each Text.StartsWith([key], <value-to-filter-on>))
but that only allows one value.
EDIT: I may have worded my original question poorly. The comma separated values in the parameterList can be any number of characters (e.g.: a,abcd,foo,bar) and I want to see if the value in [key] starts with that string of characters.
Try using List.Contains to check whether the starting character is in the parameter list.
each List.Contains(parameterList, Text.Start([key], 1))
Edit: Since you've changed the requirement, try this:
Table.SelectRows(
    #"Table",
    (C) => not List.AnyTrue(
        List.Transform(
            parameterList,
            each Text.StartsWith(C[key], _)
        )
    )
)
For each row, this transforms the parameterList into a list of true/false values by checking if the current key starts with each text string in the list. If any are true, then List.AnyTrue returns true and we choose not to select that row.
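The same row-by-row logic, shown as a small Python sketch for illustration only (it is not Power Query M). The function name remove_prefixed and the sample values are hypothetical. Trimming whitespace and dropping empty entries are assumptions added here: an empty prefix would match every row and remove everything.

```python
def remove_prefixed(rows, parameter):
    """Drop every row that starts with any prefix in a comma-separated parameter."""
    # Assumption: surrounding whitespace is not significant, and empty
    # entries (e.g. from a trailing comma) are ignored.
    prefixes = [p.strip() for p in parameter.split(",") if p.strip()]
    # Keep a row only if none of the prefixes matches its start.
    return [row for row in rows if not any(row.startswith(p) for p in prefixes)]


titles = ["a", "b", "c", "d", "foobar", "barn", "done"]
kept = remove_prefixed(titles, "a,foo,bar")
print(kept)
```

The any(...) call plays the role of List.AnyTrue over the transformed list, and the surrounding not mirrors the decision not to select a matching row.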
Since you want to filter out all the values from the parameter, you can use something like:
= Table.SelectRows(#"Changed Type", each List.Contains(Parameter1,Text.Start([Title],1))=false)
Another way to do this would be to create a custom column in the table, which has the first character of title:
= Table.AddColumn(#"Changed Type", "FirstChar", each Text.Start([Title],1))
and then use this field in the filter step:
= Table.SelectRows(#"Added Custom", each List.Contains(Parameter1,[FirstChar])=false)
I tested this with a small sample set and it seems to be running fine. You can test both and see if it helps with the performance. If you are still facing performance issues, it would probably be easier if you can share the pbix file.
This seems to work fairly well:
= List.Select(Source[Title], each Text.Contains(Parameter1,Text.Start(_,1))=false)
Replace Source with the name of your table and Parameter1 with the name of your Parameter.

Is there a way to tell `next` to start at specific key?

My understanding is that pairs(t) simply returns next, t, nil.
If I change that to next, t, someKey (where someKey is a valid key in my table) will next start at/after that key?
I tried this on the Lua Demo page:
t = { foo = "foo", bar = "bar", goo = "goo" }
for k,v in next, t, t.bar do
print(k);
end
And got varying results each time I ran the code. So specifying a starting key has an effect, unfortunately the effect seems somewhat random. Any suggestions?
Each run of a program that traverses a Lua table may produce a different order, because Lua internally uses a random salt in its hash tables.
This was introduced in Lua 5.2. See luai_makeseed.
From the Lua documentation:
The order in which the indices are enumerated is not specified, even
for numeric indices. (To traverse a table in numeric order, use a
numerical for.)
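If a predictable starting point is needed, one option is to impose an order yourself instead of relying on the table's internal one. The sketch below illustrates that idea in Python, not Lua; the function name items_after is hypothetical, and sorting the keys alphabetically is an assumption about which order is wanted. It collects the keys, sorts them, and resumes just after a given key.

```python
from bisect import bisect_right


def items_after(table, start_key):
    """Return (key, value) pairs whose keys sort strictly after start_key."""
    keys = sorted(table)                    # impose a deterministic order
    first = bisect_right(keys, start_key)   # index just past start_key
    return [(k, table[k]) for k in keys[first:]]


t = {"foo": "foo", "bar": "bar", "goo": "goo"}
rest = items_after(t, "bar")
print(rest)
```

In Lua the equivalent would be to copy the keys into an array, call table.sort on it, and iterate over that array with a numerical for.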

How to get observations as a list in Stata?

Stata has r() macro for values that some commands return (return list after the command).
I need similar access to x after list x if y == 1, but list returns only r(N), not values themselves.
Is it possible to get the observations as a local or global macro to refer to it in the code?
Try levelsof command to get distinct values. It's the cat's pajamas.
One way to save values of all observations (i.e. including repeated) is with a loop:
clear
set more off
*----- example data -----
sysuse auto
keep rep78
list
*----- what you want -----
forvalues i = 1/`=_N' {
local myvals `myvals' `=rep78[`i']'
}
display "`myvals'"
But more importantly, why do you think you need such a thing?

Lua table C api

I know of:
http://lua-users.org/wiki/SimpleLuaApiExample
It shows me how to build up a table (key, value) pair entry by entry.
Suppose instead I want to build a gigantic table (say, a 1000-entry table where both keys and values are strings). Is there a faster way to do this in Lua than the following 4 function calls per entry?
push
key
value
rawset
What you have written is the fast way to solve this problem. Lua tables are brilliantly engineered, and fast enough that there is no need for some kind of bogus "hint" to say "I expect this table to grow to contain 1000 elements."
For string keys, you can use lua_setfield.
Unfortunately, for associative tables (string keys, non-consecutive-integer keys), no, there is not.
For array-type tables (where the regular 1...N integer indexing is being used), there are some performance-optimized functions, lua_rawgeti and lua_rawseti: http://www.lua.org/pil/27.1.html
You can use lua_createtable to create a table that already has the required number of slots preallocated. However, after that, there is no faster way to fill it than
for(int i = 0; i < 1000; i++) {
lua_push... // key
lua_push... // value
lua_rawset(L, tableindex);
}
