I am building a social analytics app. I need to find the number of tweets containing a certain word in the past hour (the time range may vary).
How do I do this?
I tried these:
1. Using the until param - but it doesn't support a time of day, and I've heard it's unreliable.
2. Fetching the maximum possible number of tweets and filtering by date and time - computationally intensive.
How should I proceed?
Assuming you are using an SQL database and that the title of the tweet should contain a certain word, this should do the trick (the trailing .count makes the database return just the number):
Tweet.where('created_at >= ? AND title LIKE ?', 1.hour.ago, '%some_word%').count
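Since the question says the time range may vary, the window can be passed in instead of hard-coded. A minimal sketch, assuming the same Tweet model and title column as above:
word = 'some_word'
window = 1.hour  # any ActiveSupport duration, e.g. 24.hours
Tweet.where('created_at >= ? AND title LIKE ?', window.ago, "%#{word}%").count
The count runs in the database, so only a single integer comes back to the app.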
My original data looks like this:
[screenshot: Original Data]
I'm looking for a way for my query to return information only for the latest date associated with each Test. For that date, I want the count of customers and the total $ Paid. What complicates my effort is that multiple people can take a given Test on the same date and across dates.
The ideal results should look something like this:
[screenshot: Ideal Results]
Information is submitted into this table via Google Forms in real time, so the row range is dynamic, and I need a solution that can give me this info at any given time.
Here is the attempt that came the closest for me (although still far off, as it does not show the Count or the Total $):
=ARRAYFORMULA(VLOOKUP(QUERY({ROW(A2:A),SORT(A2:D)}, "SELECT MAX(Col1) WHERE Col3 IS NOT NULL GROUP BY Col3 LABEL MAX(Col1)''",0),{ROW(A2:A), SORT(A2:D)},{2,3,4,5},0))
Spreadsheet link for Original data and the results of the above query:
Google Spreadsheet with Original Data
I would really appreciate any insights or help from anybody.
Possible solution (based on your document)
https://docs.google.com/spreadsheets/d/1tADqNS4YtdDrMjToCNy2auKB_eNuMdMEA4ahmYJi_3M/edit?usp=sharing
Details:
The latest date for a test can be found using the FILTER and MAX functions
The count for a test (at its latest date) can be found using COUNTIFS
The sum for a test (at its latest date) can be found using SUMIFS
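For concreteness, a rough sketch of those three formulas, assuming dates in column A, test names in column B, $ Paid in column D, and the test being summarized in F2 (adjust the ranges to the actual layout):
=MAX(FILTER(A2:A, B2:B = F2))
=COUNTIFS(B2:B, F2, A2:A, MAX(FILTER(A2:A, B2:B = F2)))
=SUMIFS(D2:D, B2:B, F2, A2:A, MAX(FILTER(A2:A, B2:B = F2)))
The first returns the latest date for the test, the second counts the entries on that date, and the third totals the $ Paid for them.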
So my main goal is to build a graph in Grafana that charts how much data I've gone through in a month (I'm on a Comcast line).
However, a month is not a time period that InfluxDB's GROUP BY time() function supports. From the documentation I looked at, the longest supported interval is a week, likely because a week's length doesn't change the way a month's does.
I did notice that all my timestamps use the same format (it would be weird if they didn't, I guess). I know that InfluxDB supports regex in FROM and WHERE clauses, but does it support regex in GROUP BY? If it did, I could use something like /-([^-]+)-/ against timestamps like 2016-12-18T08:25:50Z and group by that. Or does InfluxDB support nested queries?
Edit: it looks like I was reading the 0.9 docs. I switched to the 1.1 docs, but nothing about my question changes.
GROUP BY time() queries group query results by a user-specified time interval
I'm not really sure how that would work for a month, considering that its length changes.
No version of InfluxDB supports GROUP BY <regular expression>, but as of 1.2, subqueries are supported.
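Until then, one workaround is to issue one query per calendar month with explicit time boundaries in the WHERE clause, since InfluxQL accepts RFC3339 time strings there. A sketch, where the measurement bandwidth and field bytes are placeholder names:
SELECT SUM(bytes) FROM bandwidth WHERE time >= '2016-12-01T00:00:00Z' AND time < '2017-01-01T00:00:00Z'
The client has to compute the month boundaries itself, but the totals then line up with real calendar months instead of fixed-width intervals.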
I am trying to create a Yahoo Pipe that ideally takes all tweets tweeted at any point in time and filters them down by a number of attributes to then display a filtered feed.
Basically in order this is what I want to happen:
Get a feed of all tweets at any one time.
Filter tweets by geolocation origin, i.e. UK,
Filter by a number of different combinations of keywords.
Output as an RSS feed (though this isn't really the crucial stage as Yahoo Pipes takes care of this anyway)
Disclaimer: of course I understand that there are limits to the number of tweets that can come through, etc., but I would like to cast the input net as wide as possible.
I have managed to get stages 3 & 4 working correctly, and for the time being I am not really worrying about stage 2 (although if you have any suggestions, I am all ears). Stage 1 is where I am struggling. What I have attempted is using a Fetch Feed module with the URL http://search.twitter.com/search.atom?q=lang:en - however, it seems that this only pulls 15 tweets. Is there any way I can pull more than 15 tweets each time the pipe is run? Otherwise I think this may all be in vain.
FYI, here is the link to the pipe as it stands - http://pipes.yahoo.com/ludus247/182ef4a83885698428d57865da5cf85b
Thanks in advance!
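One avenue worth checking, though this is from memory of the old Search API and the parameter names should be verified against its docs: search.twitter.com accepted an rpp parameter (results per page, up to 100) and a page parameter, so a fetch URL along these lines should pull more than the default 15 tweets per run:
http://search.twitter.com/search.atom?q=lang:en&rpp=100&page=1
Paging through several values of page in the pipe would widen the net further.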
I would like to get the adjusted price (adjusting for splits and dividends) for a group of stock symbols using Yahoo! Finance. It looks like the historical prices call is limited to one symbol at a time. Could you please let me know if there is a way to get multiple symbols in one call?
I would like this data so I can do some backtesting on it. Since I may require quite a few symbols (say 500-1000), it will be easier if I can make just a few batch calls to Yahoo!'s servers instead of making one call per symbol every day.
Another way of getting the adjusted price is to use their daily stock price API and adjust it manually using dividend and split information (they allow multiple symbols for their daily stock quotes). Unfortunately, I cannot find any way to get split information from the HTTP call (guessing based on 50% or 200% moves is one option, but if you deal with penny stocks this can be dangerous, and it cannot detect uneven splits). Also, the dividend information it returns is not easy to decode: it seems to be the total over 4 quarters, and the dividend date doesn't really correspond with the actual dividend date based on the historical price. The various options for the call can be found here: http://www.gummy-stuff.org/Yahoo-data.htm
Any suggestions on getting the adjusted price for multiple symbols? Or am I unnecessarily worrying about making hundreds of calls to Yahoo! every day? Ideally I would like to download all the required data within a couple of hours each day - that would be 10-20 calls per minute. Is that too much? I couldn't find any documentation on the permissible number of requests per second.
I am open to other places where I can get similar data. However, since I am just trying to learn the basics of quant trading and not trade, I would prefer free downloads.
Thanks
-e
This is an old question, but I did find a source where split data is available. Not sure how comprehensive these announcements are though:
http://biz.yahoo.com/c/09/s1.html
In the url, the "09" part is the year (2009), and the "s1" part is the month (s1 = Jan, s2 = Feb., s3 = Mar., etc.)
It isn't a nice clean CSV, but the format of the page is consistent and should be parseable. Just make a query each day for the current month, parse the page, and process any splits that you didn't see the day before.
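As a small illustration of that query-each-day idea, the month URL can be built from the current date. A sketch in C#, using the URL pattern described above:
// Splits page for the current month: two-digit year,
// plus "s" + month number (s1 = Jan ... s12 = Dec).
string url = string.Format("http://biz.yahoo.com/c/{0:yy}/s{1}.html",
    DateTime.Now, DateTime.Now.Month);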
ETA: And another source (probably less reliable than Yahoo, but can be queried by ticker):
http://getsplithistory.com/
I am not sure which language you are using, but I have a sample in C#. I think it will at least give you the idea, or maybe help someone else.
// YQL query that selects quotes for every symbol in one call;
// {0} is replaced with the URL-encoded symbol list built below.
private string BASE_URL = "http://query.yahooapis.com/v1/public/yql?q=" + "select%20*%20from%20yahoo.finance.quotes%20where%20symbol%20in%20({0})" + "&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys";
Collection<Quote> quotes; // assumed already populated with the symbols to fetch
// %22 is an encoded double quote and %2C an encoded comma, producing
// "MSFT"%2C"AAPL"%2C... for the IN (...) clause.
string symbolList = String.Join("%2C", quotes.Select(w => "%22" + w.Symbol + "%22").ToArray());
string url = string.Format(BASE_URL, symbolList);
XDocument doc = XDocument.Load(url); // one HTTP request for all symbols
Parse(quotes, doc); // fill the Quote objects from the XML
What we are doing here is building a URL-encoded, quoted, comma-separated symbol list and passing it to Yahoo in a single request. I have successfully fetched prices for 700 symbols in each call. Hitting Yahoo's servers for each ticker is a pain; I fetch stock prices for all 6,500+ tickers every day. It used to take 3 hours; now it is less than 2 minutes... sweet.
Source link for that code is here - http://www.jarloo.com/get-yahoo-finance-api-data-via-yql/
P.S. Please get an API key to work smoothly. The above URL is the public endpoint, where the tables are timed out most of the time. Once you get an API key, your URL will be the same minus "public":
http://query.yahooapis.com/v1/yql
I have a lot of logfile data that I want to display dynamic graphs from, for basically arbitrary time periods, optionally filtered or aggregated by different columns (that I could pregenerate). I'm wondering about the best way to store the data in a database and access it for displaying charts, when:
the time resolution should be variable from one second to a year
there are entries that span several 'time buckets', e.g. a connection might have been open for a few days and I want to count and display the user for every hour she was connected, not just in the hour 'slot' the connection was created or finished
Are there best practices, or tools/plugins for rails that help handle this kind and amount of data? Are there maybe database engines specifically tailored towards this, or having helpful functions (e.g. CouchDB indexes)?
EDIT: I'm looking for a scalable way to handle this data and access pattern. Things we considered:
Run a query for each bucket, merge in app - probably way too slow.
GROUP BY timestamp/granularity - does not count connections correctly.
Preprocessing data into rows by smallest granularity and downsampling on query - probably the best way.
I think you can use MySQL timestamps for this.
The way I solved it in the end was to pre-process the data into per-minute buckets, so there's one row for every event and minute. That makes it easy and fast enough to select, and it yields correct results. To get coarser granularity, you can do integer arithmetic on the timestamp column: select (timestamp div factor)*factor and group by (timestamp div factor)*factor.
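A sketch of that downsampling query in MySQL, assuming an events table with an integer unix-seconds column ts (both names are placeholders). DIV is integer division, so each timestamp is snapped to the start of its bucket:
SELECT (ts DIV 300) * 300 AS bucket_start, -- 300 s = 5-minute buckets
       COUNT(*) AS events
FROM events
GROUP BY bucket_start
ORDER BY bucket_start;
Changing the factor (300 here) changes the granularity, which matches the variable-resolution requirement from the question.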