Finding trending topics from a stream of data - keyword

Finding a single-word trend is simple: you can split the data stream into words, count each word, and limit the count to the last 24 or 48 hours. I'm not sure how to find trends for 2-word or 3-word combinations.
Any help is appreciated.

So for the single-word case you've got something along the lines of:
while (true)
    word = readNextWord()
    register(word, now)
    discardWordsOlderThan(now - windowSize)
For two-word combinations, just keep track of the previous word:
prev = null
while (true)
    word = readNextWord()
    if (prev != null)
        register(prev + " " + word, now)
    prev = word
    discardWordsOlderThan(now - windowSize)
For three-word combinations, keep the previous two words in the same way.
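As a rough illustration of the same idea for phrases of any length, here is a minimal Python sketch. The class name `NgramTrends`, its methods, and the assumption that timestamps arrive in non-decreasing order are all made up for this example; they are not part of the pseudocode above.

```python
from collections import Counter, deque


class NgramTrends:
    """Count n-word phrases seen within a sliding time window (hypothetical helper)."""

    def __init__(self, n=2, window_seconds=24 * 3600):
        self.n = n
        self.window = window_seconds
        self.recent = deque(maxlen=n)   # the last n words read from the stream
        self.events = deque()           # (timestamp, phrase) pairs, oldest first
        self.counts = Counter()

    def add(self, word, now):
        """Register one word; assumes `now` never decreases between calls."""
        self.recent.append(word)
        if len(self.recent) == self.n:
            phrase = " ".join(self.recent)
            self.events.append((now, phrase))
            self.counts[phrase] += 1
        self._discard_older_than(now - self.window)

    def _discard_older_than(self, cutoff):
        # Drop phrases that have fallen out of the window and fix their counts.
        while self.events and self.events[0][0] < cutoff:
            _, phrase = self.events.popleft()
            self.counts[phrase] -= 1
            if self.counts[phrase] == 0:
                del self.counts[phrase]

    def top(self, k=10):
        """Return the k most frequent phrases currently inside the window."""
        return self.counts.most_common(k)
```

Using `n=3` gives three-word combinations. Whether a phrase is "trending" rather than merely frequent would need a comparison against an earlier window, which this sketch does not attempt.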

Google Sheets: Get Daily Yield From Compounded Annual Percentage Yield (APY)

I'm looking for a function that is almost the opposite of FV().
In cryptocurrency tokens, returns are sometimes quoted as a compounded Annual Percentage Yield (APY). These tokens can make payments in periods which are daily, or even each hour, or each 8 hours, etc.
So I'd like to work out the yield per period, from the compounded APY.
I've looked through the financial functions at Google Sheets > Financial but most of these are way over my head.
Any suggestions would be most welcome!
[Edit] I've tried using FV() with 365 periods per year and (say) $100 as the current value, then checking what APY comes out, but I have to keep adjusting the daily rate until I get close to the quoted APY. In other words, I'm doing it backwards. There must be a function that can do this, though?
After mulling over this for some time, a moment of clarity yielded the surprisingly simple answer.
Given:
p (periods) = 365*3 (ie, each 8 hours, for a year)
= 1095
r (rate) = 1.8%
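Then the growth over the year is:
1 + APY = (1 + r) ^ p
= (1 + 1.8%) ^ 1095
= 30,466,103,336.2661%
(the APY itself is that figure minus 100%). So to get the rate from the APY it becomes:
r = (1 + APY) ^ (1/p) - 1
= 30,466,103,336.2661% ^ (1/1095) - 1
= 1.8%
Note the parentheses around 1/p: without them the exponent is applied before the division.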
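The same conversion can be sketched in Python. The function names are made up for this example, and the APY is assumed to be given as a fraction (0.10 for 10%) rather than as a growth factor.

```python
def apy_from_rate(rate, periods):
    """Compounded annual yield from a per-period rate."""
    return (1 + rate) ** periods - 1


def rate_per_period(apy, periods):
    """Per-period rate r such that (1 + r) ** periods == 1 + apy."""
    return (1 + apy) ** (1 / periods) - 1


# Round trip for the example above: 1.8% every 8 hours, 1095 periods a year.
apy = apy_from_rate(0.018, 1095)
print(rate_per_period(apy, 1095))   # close to 0.018, up to floating-point error
```

In a spreadsheet the equivalent would be a formula along the lines of `=(1+A1)^(1/365)-1`, assuming the quoted APY sits in cell A1 as a fraction and payments are daily.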

iMA not displaying correctly in backtesting

I developed and started testing an EA in MQL4 that uses the iMA function. Basically, the program compares the iMA value of the current candle with the iMA value of the previous candle. When I test the EA using the Strategy Tester (Every Tick), it does not open and close trades correctly: the trade does not open on the correct candle. On further investigation I noticed that, for the current candle, the iMA values in the data window and on the chart are the same, but they differ from the 'Print' value. The value for the previous candle is correct. A Google search turned up someone reporting this exact issue in 2008, and at that time there didn't appear to be a solution.
Now that we are in a new decade, I'm wondering if there is a solution?
Does anyone know if iMA works in MQL5 Strategy Tester?
double MAEMACurrent = iMA(NULL,0,3,0,MODE_EMA,PRICE_CLOSE,0);
double MAEMAPrevious = iMA(NULL,0,3,0,MODE_EMA,PRICE_CLOSE,1);
double MASlowEMACurrent = iMA(NULL,0,10,0,MODE_EMA,PRICE_CLOSE,0);
double MASlowEMAPrevious = iMA(NULL,0,10,0,MODE_EMA,PRICE_CLOSE,1);
Print("MAEMACurrent " + MAEMACurrent + " MAEMAPrevious " + MAEMAPrevious + " MASlowEMACurrent " + MASlowEMACurrent + " MASlowEMAPrevious " + MASlowEMAPrevious);
Chart & Data Window:
MAEMACurrent: 1.95552
MAEMAPrevious: 1.95572
MASlowEMACurrent: 1.95201
MASlowEMAPrevious: 1.95097
Print Value:
MAEMACurrent: 1.95538
MAEMAPrevious: 1.95572
MASlowEMACurrent: 1.951086
MASlowEMAPrevious: 1.950972
As you can see from the above example the 'Chart & Data Window' values for MAEMACurrent and the MASlowEMACurrent do not match 'Print Value'.
This is the first time that I'm asking a question, so if I've missed something or I am not following the correct protocol for asking a question please let me know.
First, always use the NormalizeDouble function to round values to the proper number of decimal places. In your case, if prices have 5 digits after the decimal point, round MASlowEMACurrent and MASlowEMAPrevious to 5 digits like this:
double dNormalizedValue = NormalizeDouble(MASlowEMACurrent, 5);
In addition, never compare the value of the in-progress candle on the chart with the values returned by indicator or price functions (iMA, iClose, etc.). The current candle is still changing, so even a very slight time difference between the two readings can produce different values. The other candles (in your case the previous candle) have all closed and no longer change, so for them you can compare the chart values with the values returned by the functions. So iMA is working as expected.
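To see why the shift-0 value keeps moving, here is a small Python sketch. It uses the textbook EMA recurrence with alpha = 2 / (period + 1), seeded with the first close, which is an assumption for illustration and not MetaTrader's internal code; the prices are invented.

```python
def ema(closes, period):
    """Textbook EMA recurrence, seeded with the first close."""
    alpha = 2 / (period + 1)
    value = closes[0]
    for price in closes[1:]:
        value = alpha * price + (1 - alpha) * value
    return value


closed_bars = [1.9550, 1.9556, 1.9558]   # hypothetical closed candles
ticks = [1.9554, 1.9551, 1.9557]         # hypothetical ticks inside the open candle

previous = ema(closed_bars, 3)           # bar 1: fixed once the candle has closed
for tick in ticks:
    current = ema(closed_bars + [tick], 3)   # bar 0: recomputed on every tick
    print(f"previous={previous:.5f} current={current:.5f}")
```

The `previous` column stays constant while `current` changes with each tick, so a value printed at one tick need not match what the chart shows a moment later.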

Lua Line Wrapping excluding certain characters

I've found some code that I want to use when writing notes on a MUD I play. Each line of a note can only be 79 characters long, so writing a note is a hassle unless you count characters. The code is below:
function wrap(str, limit, indent, indent1)
    indent = indent or ""
    indent1 = indent1 or indent
    limit = limit or 79
    local here = 1-#indent1
    return indent1..str:gsub("(%s+)()(%S+)()",
        function(sp, st, word, fi)
            if fi-here > limit then
                here = st - #indent
                return "\n"..indent..word
            end
        end)
end
This works great; I can type a 300-character line and it will wrap it at 79 characters, respecting full words.
The problem I can't figure out how to solve is that I sometimes want to add colour codes to the line, and colour codes should not count towards the line length. For example:
#GThis is a colour-coded #Yline that should #Bbreak off at 79 #Mcharacters, but ignore #Rthe colour codes (#G, #Y, #B, #M, #R, etc) when doing so.
Essentially, it would strip the colour codes away and break the line appropriately, but without losing the colour codes.
Edited to include what it should check, and what the final output should be.
The function would only check the string below for line breaks:
This is a colour-coded line that should break off at 79 characters, but ignore the colour codes (, , , , , etc) when doing so.
but would actually return:
#GThis is a colour-coded #Yline that should #Bbreak off at 79 #Mcharacters, but ignore
the colour codes (#G, #Y, #B, #M, #R, etc) when doing so.
To complicate things, we also have xterm colour codes, which are similar, but look like this:
#x123
It is always #x followed by a 3-digit number. And lastly, to complicate things further, I don't want it to strip out colour codes that were escaped on purpose (which would be ##R, ##x123, etc.).
Is there a clean way of doing this that I'm missing?
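Replace the callback passed to gsub with one that does not count the colour-code characters:
function(sp, st, word, fi)
    local delta = 0   -- characters in this word that take no width on screen
    word:gsub('#([#%a])',
        function(c)
            if c == '#' then delta = delta + 1       -- "##" shows as one "#"
            elseif c == 'x' then delta = delta + 5   -- "#x123"
            else delta = delta + 2                   -- "#G", "#Y", ...
            end
        end)
    here = here + delta
    if fi-here > limit then
        here = st - #indent + delta
        return "\n"..indent..word
    end
end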
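For comparison, here is a minimal Python sketch of the same idea: measure each line by its displayed width and leave the codes in place. The helper names `visible_len` and `wrap` are made up for this example, and unlike the Lua version it requires three digits after `#x` and collapses runs of whitespace into single spaces.

```python
import re

# "##" is an escaped literal "#"; "#x123" and "#G" are colour codes.
TOKEN = re.compile(r"##|#x\d{3}|#[A-Za-z]")


def visible_len(text):
    """Length of text as displayed: colour codes take no width, "##" takes one."""
    return len(TOKEN.sub(lambda m: "#" if m.group() == "##" else "", text))


def wrap(text, limit=79):
    """Greedy word wrap that measures lines by their displayed width."""
    lines, current = [], ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if current and visible_len(candidate) > limit:
            lines.append(current)   # the word does not fit, start a new line
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return "\n".join(lines)
```

A single word wider than the limit is left on a line of its own rather than split.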

Moving Average across Variables in Stata

I have a panel data set for which I would like to calculate moving averages across years.
Each year is a variable for which there is an observation for each state, and I would like to create a new variable for the average of every three year period.
For example:
P1947=rmean(v1943 v1944 v1945), P1947=rmean(v1944 v1945 v1946)
I figured I should use a foreach loop with the egen command, but I'm not sure about how I should refer to the different variables within the loop.
I'd appreciate any guidance!
This data structure is quite unfit for purpose. Assuming an identifier id you need to reshape, e.g.
reshape long v, i(id) j(year)
tsset id year
Then a moving average is easy. Use tssmooth or just generate, e.g.
gen mave = (L.v + v + F.v)/3
or (better)
gen mave = 0.25 * L.v + 0.5 * v + 0.25 * F.v
More on why your data structure is quite unfit: Not only would calculation of a moving average need a loop (not necessarily involving egen), but you would be creating several new extra variables. Using those in any subsequent analysis would be somewhere between awkward and impossible.
EDIT I'll give a sample loop, while not moving from my stance that it is poor technique. I don't see a reason behind your naming convention whereby P1947 is a mean for 1943-1945; I assume that's just a typo. Let's suppose that we have data for 1913-2012. For means of 3 years, we lose one year at each end.
forval j = 1914/2011 {
    local i = `j' - 1
    local k = `j' + 1
    gen P`j' = (v`i' + v`j' + v`k') / 3
}
That could be written more concisely, at the expense of a flurry of macros within macros. Using unequal weights is easy, as above. The only reason to use egen is that it doesn't give up if there are missings, which the above will do.
FURTHER EDIT
As a matter of completeness, note that it is easy to handle missings without resorting to egen.
The numerator
(v`i' + v`j' + v`k')
generalises to
(cond(missing(v`i'), 0, v`i') + cond(missing(v`j'), 0, v`j') + cond(missing(v`k'), 0, v`k'))
and the denominator
3
generalises to
!missing(v`i') + !missing(v`j') + !missing(v`k')
If all values are missing, this reduces to 0/0, or missing. Otherwise, if any value is missing, we add 0 to the numerator and 0 to the denominator, which is the same as ignoring it. Naturally the code is tolerable as above for averages of 3 years, but either for that case or for averaging over more years, we would replace the lines above by a loop, which is what egen does.
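The numerator and denominator logic can be illustrated outside Stata with a short Python sketch. The function name is made up for this example, and `None` stands in for a Stata missing value.

```python
def moving_average(values, width=3):
    """Centred moving average over `width` values; None marks a missing value.

    The ends, where a full window is not available, are left as None,
    just as the loop above loses one year at each end.
    """
    half = width // 2
    out = [None] * len(values)
    for j in range(half, len(values) - half):
        # Missing values add nothing to the sum and nothing to the count.
        present = [v for v in values[j - half:j + half + 1] if v is not None]
        if present:                      # all missing stays missing (the 0/0 case)
            out[j] = sum(present) / len(present)
    return out
```

Because the window is built with a slice, the same function covers averages over more than three years by changing `width`.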
There is a user-written program that can do this very easily for you. It is called mvsumm and can be found with findit mvsumm:
xtset id time
mvsumm observations, stat(mean) win(t) gen(new_variable) end

How can I generate a half second pause in TwiML?

I'm trying to use Twilio's <Say> verb to pronounce a sequence of digits clearly. I'm finding it is hard to generate a natural (half-second) pause between each digit. How do I do this correctly?
The <Pause> xml command only takes integer values for seconds, so it's too long to use.
From here: Link
When saying numbers, '12345' will be spoken as "twelve thousand three
hundred forty five." Whereas '1 2 3 4 5' will be spoken as "one two
three four five."
Punctuation such as commas and periods will be interpreted as natural
pauses by the speech engine.
If you want to insert a long pause try using the <Pause> verb. <Pause> should be placed outside <Say> tags, not nested inside them.
For a pause of less than one second:
<Say language="en-US" voice="alice">
Your verification code is 1,,,,2,,,,3,,,,4,,,,5
</Say>
You can increase or decrease the number of commas to lengthen or shorten the pause.
This is tangentially related, but I figured people looking for something similar would end up on this question (like I did).
I wanted the Say verb to read a US phone number in a natural 3-3-4 cadence. Here is some C# that does just that. I'm sure you can figure out how to translate it to other languages:
private static string SayNaturalNumber(string digits)
{
    var newNumber = "";
    for (int i = 0; i < digits.Length; i++)
    {
        if (i == 0)
            newNumber += digits[i];
        else
            newNumber += " " + digits[i];
        if (i == 2) //after third digit
            newNumber += ",,,,";
        if (i == 5) //after sixth digit
            newNumber += ",,,,";
    }
    return newNumber;
}
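As one possible translation, here is a Python sketch of the same grouping. The function name and the `groups` and `pause` parameters are made up for this example; the resulting string is meant to be placed inside the Say element.

```python
def say_natural_number(digits, groups=(3, 3, 4), pause=",,,,"):
    """Space out the digits and put a comma pause between groups."""
    parts, start = [], 0
    for size in groups:
        chunk = digits[start:start + size]
        if chunk:
            parts.append(" ".join(chunk))   # "555" -> "5 5 5"
        start += size
    if digits[start:]:                      # digits beyond the last group
        parts.append(" ".join(digits[start:]))
    return (pause + " ").join(parts)


print(say_natural_number("5551234567"))
```

Unlike the C# version, this one adds no trailing pause when the input stops exactly at a group boundary. Passing `groups=(1, 1, 1, 1, 1)` gives a pause after every digit, as in the verification-code example.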
