Title might be misleading.
I have a longitudinal dataset with a dummy (dummy1) variable indicating if a condition is met in a certain year, for given category. I want this event to be taken into account for the next twenty years as well. Hence, I want to create a new dummy (dummy2), which takes the value 1 for the 19 observations following the observation where dummy1 was 1, as well as that same observation (example below).
I was trying to create a loop with lag operators, but failed to get it to work so far.
Even code that failed might be close to a good solution. Not giving code that failed means that we can't explain your mistakes. Furthermore, questions focusing on how to use software to do something are widely considered marginal or off-topic on SO.
One approach is
bysort category (year) : gen previous = year if dummy1
by category : replace previous = previous[_n-1] if missing(previous)
gen byte dummy2 = (year - previous) < 20
The trick here is to create a variable holding the last year that the dummy (indicator) was 1, and the trick in that is spelled out in How can I replace missing values with previous or following nonmissing values or within sequences?
Note that this works independently of
whether the panel identifier is numeric (it could be string here, on the evidence given)
whether you have tsset or xtset the data
what happens before the first event; for such years, previous is born missing and remains missing (however, in general, watch for problems with code at the ends of time series).
Related
This is actually part of my thesis research, where I have to run a time series analysis on pollution and economic growth of a single country.
I have data of over 144 years of the two variables with each value representing a single year. I imported, set the values as numeric and attached the dataset through the console and ran:
ts_gdp= (data=`GDP per capita, start=1871,end=2014,frequency=1, names=gdp)
I get to see all the values for the first variable and then follow up with the stl() but I get this error. Any clues why this shows up, although I have set the frequency=1, which is the number of observations for the unit of time, in this case a year? Thank you in advance!
Error in stl(GDP, s.window = "periodic") :
series is not periodic or has less than two periods
In the Internet, across many boards one can read that it is impossible to make use of MarketInfo() function in the Strategy Tester. It's the limitation of the platform.
I haven't found on the web any workaround for this. However, since the need is the mother of invention, and my need was to make USDJPY market decisions with EA that depends on the state of EURUSD market I've found the workaround ( which is good enough for me ). I use iMA() with a period of one, and M1 resolution.
iMA( "EURUSD", PERIOD_M1, 1, 0, MODE_SMA, PRICE_MEDIAN, i )
The question is: since MetaTrader is able to calculate Moving Average for another currency ( which is surely based on the actual price of the pair! ),Q1: why can't one has access to the current value : directly?
And a follow-up question:Q2: Is there any other ( more accurate ) workaround for this limitation?
REASON: The reason to that question is because of "Tick". Because a "tick" in one currency happens independently from the "tick" of another currency, it is not possible to accurately determine the Price of another currency basing on the current "Tick" of one currency. The iMA is calculated using the OHLC of the M1 candle, and not the actual "Tick" (which is not the same as "Tick" data).
Rephrase: Let's say we are on USDJPY, and this "tick" happens at 12:00:00.210 (12mid-night at the 210th millisecond). When that "tick" happens, the start() event gets triggered. In that function, we look for Bid of EURUSD. However, there is no "Tick" for EURUSD as of that time (the USDJPY and EURUSD do not "tick" at the same time), thus it is not possible to determine the exact price of EURUSD at-that-point-in-time.
There is no workaround because it is impossible to determine the price on the "Tick" level because MQL4's datetime variable is an integer and accurate only to the seconds, and the HistoryCenter>Export are OHLC only.
Your iMA() is as good as it gets.
Q1: was well served by #JosephLee + there is one more option (ref. below)
Q2: deserves a word:
Yes, there is a workaround.
While MetaTrader Terminal 4 has many weaknesses, not worth one's time to spend time here, there are also nice things one can do with it.
Some five years ago, there was a project-need to integrate a distributed-processing for MT4, so as to circumvent it's given weaknesses.
That happened. This way you can benefit from a distributed-processing framework and can have down to a few nanoseconds ( latency-wise ) exact prices of all instruments at will ( based on a remote QUOTE-stream processing ) independently of your localhost MT4.graph _Symbol
Do not hesitate to ask more
&
welcome to the
Worlds of MQL4
New-MQL4.56789 has an iClose() functionality for multi-currency
Returns Close price value for the bar of specified symbol with timeframe and shift.
double iClose( string symbol, // symbol
int timeframe, // timeframe
int shift // shift
);
Parameterssymbol[in] Symbol name. NULL means the current symbol.timeframe[in] Timeframe. It can be any of ENUM_TIMEFRAMES enumeration values. 0 means the current chart timeframe.shift[in] Index of the value taken from the indicator buffer (shift relative to the current bar the given amount of periods ago).Returned valueClose price value for the bar of specified symbol with timeframe and shift. If local history is empty (not loaded), function returns 0. To check errors, one has to call the GetLastError() function.
Be cautious on it's use in StrategyTester with a due care taken on error-handling the cases the history data is not in local database and a need to provide a remedy-handling procedures for remote-retrieval from server appears.
Print( "A first date in the history for the EURUSD on the [MT4SERVER] = ",
(datetime) SeriesInfoInteger( "EURUSD", 0, SERIES_SERVER_FIRSTDATE )
);
Similarly need to provide some measures for cases the above indicated need ERR_HISTORY_WILL_UPDATED appears while the remote-server is not online / market is closed ERR_MARKET_CLOSED / the requested date is already before the SERIES_SERVER_FIRSTDATE etc.
In a corner-case, there is always a possibility to create a special-setup with step-wise updating both local CCY_PAIR and a REMOTE_CCY_PAIR totally independent of a Broker-side equipment status.
All these are important aspects of this new MQL4 feature.
This question already has answers here:
How dangerous is it to compare floating point values?
(12 answers)
Error subtracting floating point numbers when passing through 0.0
(4 answers)
Closed 9 years ago.
I tested this on a empty project and does not happen.
As you can see the newValue becomes 2.98023e-08 when I subtract the bossPercentage value.
This happens only when bossPercentage is 0.2f and the previous value is 0.2f.
The difference should be 0.0f but I don't understand why I get 2.98023e-08 instead.
For reference, remainingBossPercentage is a property in [GameController] class defined as following:
//header
#property (readwrite, nonatomic) float remainingBossPercentage;
//.m
#synthetize remainingBossPercentage;
//init
remainingBossPercentage=1.0f;
I'd like to ask you inisght on what I may be doing that causes this error.
EDIT: I subtract 0.2f to remainingBossPercentage (for each boss enemy) and everything works fine until I reach the last enemy object that has again 0.2f and I get to the crucial point of doing 0.2f - 0.2f (screenshot below)
EDIT 2: I am greatful for all comments and answers, also the closing votes. What induced me to ask this question is the fact that newValue is 2.98023e-08. I now see that there are also comparison issues (thanks to the extremely useful QA linked by the people that voted to close the answer). What I wonder is.. why in my new test project with only 2 test variables this does not happen? (I created a HelloWorld project that substracts two floats).
I am asking this because, as one of the user suggests, is important to understand floating points without taking shourtcuts. YES, I am taking a shortcut by asking this question because I don't have time tonight to study it properly but I would like to try understanding and learning at the best I can. I will read the answers properly and dedicate my time to understand but if in the meanwhile I can I would like to add a doubt:
could it be that for memory management reasons the two projects (the test one and my actual game) beheave differently? Could the different beheaviour of the two projects somehow linked with memory being swapped in dirty areas? (e.g. the game having bigger memory usage gets swapped more and hence there may be a loss of precision?)
PS: I found out a question with exactly the same 2.98023e-08 value. What I still wonder is why this doesn't happen in the same test project (I am doing some more testing now).
Simply, floating point numbers should not be expected to be completely accurate.
Floating point numbers (as used in our usual computers) are natively in base 2, out usual number is base 10. Not all numbers in one number base can be expressed with full accuracy in another number base.
As an empale 1/3 can not be expressed with complete accurate in the base 10 number system (0.333333...) but can be in the base 3 number system.
The upshot, one needs to compare floating point numbers with a specified error range. Take the absolute value of the difference and compare that to the allowable range.
Because of this financial amounts are generally not (should not be) expressed as floating point numbers. This give rise to classes such as NSDecimalNumber.
Imagine we have some reference text on hand
Four score and seven years ago our fathers brought forth on this
continent a new nation, conceived in liberty, and dedicated to the
proposition that all men are created equal. Now we are engaged in a
great civil war, testing whether that nation, or any nation, so
conceived and so dedicated, can long endure. We are met on a great
battle-field of that war. We have come to dedicate a portion of that
field, as a final resting place for those who here gave their lives
that that nation might live. It is altogether fitting and proper that
we should do this. But, in a larger sense, we can not dedicate, we can
not consecrate, we can not hallow this ground. The brave men, living
and dead, who struggled here, have consecrated it, far above our poor
power to add or detract. The world will little note, nor long remember
what we say here, but it can never forget what they did here. It is
for us the living, rather, to be dedicated here to the unfinished work
which they who fought here have thus far so nobly advanced. It is
rather for us to be here dedicated to the great task remaining before
us—that from these honored dead we take increased devotion to that
cause for which they gave the last full measure of devotion—that we
here highly resolve that these dead shall not have died in vain—that
this nation, under God, shall have a new birth of freedom—and that
government of the people, by the people, for the people, shall not
perish from the earth.
and we receive snippets of that text back to us with no spaces or punctuation, and some characters deleted, inserted, and substituted
ieldasafinalrTstingplaceforwhofoughtheregavetheirliZesthatthatn
Using the reference text what are some tools (in any programming language) we can use to try properly space the words
ield as a final rTsting place for who fought here gave their liZes that that n
correcting errors is not necessary, just spacing
Weird problem you've got there :)
If you can't rely on capitalization for hints, just lowercase everything to start with.
Then get a dictionary of words. Maybe just a wordlist, or you could try Wordnet.
And a corpus of similar, correctly spaced, material. If suitable, download the Wikipedia dump. You'll need to clean it up and break into ngrams. 3 grams will probably suit the task. Or save yourself the time and use Google's ngram data. Either the web ngrams (paid) or the book ngrams (free-ish).
Set a max word length cap. Let's say 20chars.
Take the first char of your mystery string and look it up in the dictionary. Then take the first 2 chars and look them up. Keep doing this until you get to 20. Store all matches you get, but the longest one is probably the best. Move the starting point 1 char at a time, through your string.
You'll end up with an array of valid word matches.
Loop through this new array and pair each value up with the following value, comparing it to the original string, so that you identify all possible valid word combinations that don't overlap. You might end up with 1 output string, or several.
If you've got several, break each output string into 3-grams. Then lookup in your new ngram database to see which combinations are most frequent.
There might also be some time-saving techniques like starting with stop words, checking them in a dictionary combined with incremental letter either side, and adding spaces there first.
... or I'm over-thinging the whole issue and there's an awk one liner that someone will humble me with :)
You can do this using edit distance and finding the minimum edit distance substring of the reference. Check out my answer (PHP implementation) to a similar question here:
Longest Common Substring with wrong character tolerance
Using the shortest_edit_substring() function from the above link, you can add this to do the search after stripping out everything but letters (or whatever you want to keep in: letters, numbers, etc.) and then correctly map the result back to the original version.
// map a stripped down substring back to the original version
function map_substring($haystack_letters,$start,$length,$haystack, $regexp)
{
$r_haystack = str_split($haystack);
$r_haystack_letters = $r_haystack;
foreach($r_haystack as $k => $l)
{
if (preg_match($regexp,$l))
{
unset($r_haystack_letters[$k]);
}
}
$key_map = array_keys($r_haystack_letters);
$real_start = $key_map[$start];
$real_end = $key_map[$start+$length-1];
$real_length = $real_end - $real_start + 1;
return array($real_start,$real_length);
}
$haystack = 'Four score and seven years ago our fathers brought forth on this continent a new nation, conceived in liberty, and dedicated to the proposition that all men are created equal. Now we are engaged in a great civil war, testing whether that nation, or any nation, so conceived and so dedicated, can long endure. We are met on a great battle-field of that war. We have come to dedicate a portion of that field, as a final resting place for those who here gave their lives that that nation might live. It is altogether fitting and proper that we should do this. But, in a larger sense, we can not dedicate, we can not consecrate, we can not hallow this ground. The brave men, living and dead, who struggled here, have consecrated it, far above our poor power to add or detract. The world will little note, nor long remember what we say here, but it can never forget what they did here. It is for us the living, rather, to be dedicated here to the unfinished work which they who fought here have thus far so nobly advanced. It is rather for us to be here dedicated to the great task remaining before us—that from these honored dead we take increased devotion to that cause for which they gave the last full measure of devotion—that we here highly resolve that these dead shall not have died in vain—that this nation, under God, shall have a new birth of freedom—and that government of the people, by the people, for the people, shall not perish from the earth.';
$needle = 'ieldasafinalrTstingplaceforwhofoughtheregavetheirliZesthatthatn';
// strip out all non-letters
$regexp_to_strip_out = '/[^A-Za-z]/';
$haystack_letters = preg_replace($regexp_to_strip_out,'',$haystack);
list($start,$length) = shortest_edit_substring($needle,$haystack_letters);
list($real_start,$real_length) = map_substring($haystack_letters,$start,$length,$haystack,$regexp_to_strip_out);
printf("Found |%s| in |%s|, matching |%s|\n",substr($haystack,$real_start,$real_length),$haystack,$needle);
This will do the error correction as well; it's actually easier to do it than to not do it. The minimum edit distance search is pretty straightforward to implement in other languages if you want something faster than PHP.
In Lua, one would usually generate random values, and/or strings by using math.random & math.randomseed, where os.time is used for math.randomseed.
This method however has one major weakness; The returned number is always just as random as the current time, AND the interval for each random number is one second, which is way too long if one needs many random values in a very short time.
This issue is even pointed out by the Lua Users wiki: http://lua-users.org/wiki/MathLibraryTutorial, and the corresponding RandomStringS receipe: http://lua-users.org/wiki/RandomStrings.
So I've sat down and wrote a different algorithm (if it even can be called that), that generates random numbers by (mis-)using the memory addresses of tables:
math.randomseed(os.time())
function realrandom(maxlen)
local tbl = {}
local num = tonumber(string.sub(tostring(tbl), 8))
if maxlen ~= nil then
num = num % maxlen
end
return num
end
function string.random(length,pattern)
local length = length or 11
local pattern = pattern or '%a%d'
local rand = ""
local allchars = ""
for loop=0, 255 do
allchars = allchars .. string.char(loop)
end
local str=string.gsub(allchars, '[^'..pattern..']','')
while string.len(rand) ~= length do
local randidx = realrandom(string.len(str))
local randbyte = string.byte(str, randidx)
rand = rand .. string.char(randbyte)
end
return rand
end
At first, everything seems perfectly random, and I'm sure they are... at least for the current program.
So my question is, how random are these numbers returned by realrandom really?
Or is there an even better way to generate random numbers in a shorter interval than one second (which kind of implies that os.time shouldn't be used, as explaind above), without relying on external libraries, AND, if possible, in an entirely crossplatform manner?
EDIT:
There seems to be a major misunderstanding regarding the way the RNG is seeded; In production code, the call to math.randomseed() happens just once, this was just a badly chosen example here.
What I mean by the random value is only random once per second, is easily demonstrated by this paste: http://codepad.org/4cDsTpcD
As this question will get downvoted regardless my edits, I also cancelled my previously accepted answer - In hope for a better one, even if just better opinions. I understand that issues regarding random values/numbers has been discussed many times before, but I have not found such a question that could be relevant to Lua - Please keep that in mind!
You should not call seed each time you call random, you ought to call it only once, on the program initialization (unless you get the seed from somewhere, for example, to replicate some previous "random" behaviour).
Standard Lua random generator is of poor quality in the statistical sense (as it is, in fact, standard C random generator), do not use it if you care for that. Use, for example, lrandom module (available in LuaRocks).
If you need more secure random, read from /dev/random on Linux. (I think that Windows should have something along the same lines — but you may need to code something in C to use it.)
Relying on table pointer values is a bad idea. Think about alternate Lua implementations, in Java, for example — there is no telling what they would return. (Also, the pointer values may be predictable, and they may be, under certain circumstances the same each time the program is invoked.)
If you want finer precision for the seed (and you will want this only if you're launching the program more often than once per second), you should use a timer with better resolution. For example, socket.gettime() from LuaSocket. Multiply it by some value, since math.randomseed is working with integer part only, and socket.gettime() returns time in (floating point) seconds.
require 'socket'
math.randomseed(socket.gettime() * 1e6)
for i = 1, 1e3 do
print(math.random())
end
This method however has one major
weakness; The returned number is
always just as random as the current
time, AND the interval for each random
number is one second, which is way too
long if one needs many random values
in a very short time.
It has those weaknesses only if you implement it incorrectly.
math.randomseed is supposed to be called sparingly - usually just once at the beginning of your program, and it usually seeds using os.time. Once the seed is set, you can use math.random many times, and it will yield random values.
See what happens on this sample:
> math.randomseed(1)
> return math.random(), math.random(), math.random()
0.84018771715471 0.39438292681909 0.78309922375861
> math.randomseed(2)
> return math.random(), math.random(), math.random()
0.70097636929759 0.80967634907443 0.088795455214007
> math.randomseed(1)
> return math.random(), math.random(), math.random()
0.84018771715471 0.39438292681909 0.78309922375861
When I change the seed from 1 to 2, I get different random results. But when I go back to 1, the "random sequence" is reset. I obtain the same values as before.
os.time() returns an ever-increasing number. Using it as a seed is appropriate; then you can invoke math.random forever and have different random numbers every time you invoke it.
The only scenario you have to be a bit worried about non-randomness is when your program is supposed to be executed more than once per second. In that case, as the others are saying, the simplest solution is using a clock with higher definition.
In other words:
Invoke math.randomseed with an appropiate seed (os.time() is ok 99% of the cases) at the beginning of your program
Invoke math.random every time you need a random number.
Regards!
Some thoughts on the first part of your question:
So my question is, how random are these numbers returned by realrandom really?
Your function is attempting to discover the address of a table by using a quirk of its default implementation of tostring(). I don't believe that the string returned by tostring{} has a specified format, or that the value included in that string has any documented meaning. In practice, it is derived from the address of something related to the specific table, and so distinct tables convert to distinct strings. However, the next version of Lua is free to change that to anything that is convenient. Worse, the format it takes will be highly platform dependent because it appears to use the %p format specifier to sprintf() which is only specified as being a sensible representation of a pointer.
There's also a much bigger issue. While the address of the nth table created in a process might seem random on your platform, tt might not be random at all. Or it might vary in only a few bits. For example, on my win7 box only a few bits vary, and not very randomly:
C:...>for /L %i in (1,1,20) do # lua -e "print{}"
table: 0042E5D8
table: 0061E5D8
table: 0024E5D8
table: 0049E5D8
table: 0042E5D8
table: 0042E5D8
table: 0042E5D8
table: 0064E5D8
table: 0042E5D8
table: 002FE5D8
table: 0042E5D8
table: 0049E5D8
table: 0042E5D8
table: 0042E5D8
table: 0042E5D8
table: 0024E5D8
table: 0042E5D8
table: 0042E5D8
table: 0061E5D8
table: 0042E5D8
Other platforms will vary, of course. I'd even expect there to be platforms where the address of the first allocated table is completely deterministic, and hence identical on every run of the program.
In short, the address of an arbitrary object in your process image is not a very good source of randomness.
Edit: For completeness, I'd like to add a couple of other thoughts that came to mind over night.
The stock tostring() function is supplied by the base library and implemented by the function luaB_tostring(). The relevant bit is this fragment:
switch (lua_type(L, 1)) {
...
default:
lua_pushfstring(L, "%s: %p", luaL_typename(L, 1), lua_topointer(L, 1));
break;
If you really are calling this function, then the end of the string will be an address, represented by standard C sprintf() format %p, strongly related to the specific table. One observation is that I've seen several distinct implementations for %p. Windows MSVCR80.DLL (the version of the C library used by the current release of Lua for Windows) makes it equivalent to %08X. My Ubuntu Karmic Koala box appears to make it equivalent to %#x which notably drops leading zeros. If you are going to parse out that part of the string, then you should do it in a way that is more flexible in the face of variation of the meaning of %p.
Note, also, that doing anything like this in library code may expose you to a couple of surprises.
First, if the table passed to tostring() has a metatable that provides the function __tostring(), then that function will be called, and the fragment quoted above will never be executed at all. In your case, that issue cannot arise because tables have individual metatables, and you didn't accidentally apply a metatable to your local table.
Second, by the time your module loads, some other module or user-supplied code might have replaced the stock tostring() with something else. If the replacement is benign, (such as a memoization wrapper) then it likely doesn't matter to the code as written. However, this would be a source of attack, and is entirely outside the control of your module. That doesn't strike me as a good idea if the goal is some kind of improved security for your random seed material.
Third, you might not be loaded in a stock Lua interpreter at all, and the larger application (Lightroom, WoW, Wireshark, ...) may choose to replace the base library functions with their own implementations. This is a much less likely issue for tostring(), but note that the base library's print() is a frequent target for replacement or removal in alternate implementations and there are modules (Lua Lanes, for one) that break if print is not the implementation in the base library.
A few important things come to mind:
In most other languages you typically only call the random 'seed' function once at the beginning of the program or perhaps at limited times throughout its execution. You generally do not want to call it each time you generate a random number/sequence. If you call it once when the program starts you get around the "once per second" limitation. By calling it each time you may actually end up with less randomness in your results.
Your realrandom() function seems to rely on a private implementation detail of Lua. What happens in the next major release if this detail changes to always return the same number, or only even numbers, etc.... Just because it works for now is not a strong enough guarantee, especially in the case of wanting a secure RNG.
When you say "everything seems perfectly random" how are you measuring this performance? We humans are terrible at determining if a sequence is random or not and just looking at a sequence of numbers would be virtually impossible to truly tell if they were random or not. There are many ways to quantify the "randomness" of a series including frequency distribution, autocorrelation, compression, and many more far beyond my understanding.
If you are writing a true "secure PRNG" for production do not write your own! Investigate and use a library or algorithm by experts who has spent years/decades studying, designing and trying to break it. True secure random number generation is hard.
If you need more info start on the PRNG article on Wikipedia and use the references/links there as needed.