Please forgive me, as I have limited knowledge of scraperwiki and twitter mining.
I have the following code to scrape twitter data. However, I want to edit the code to only give me results that are geotagged for New York on a particular date (let's say, April 1, 2013). Do you know how I should do this?
###############################################################################
# Twitter srcaper for the term 'hello'.
###############################################################################
import scraperwiki
import simplejson
# retrieve a page
base_url = 'http://search.twitter.com/search.json?q='
q = 'hello'
options = '&rpp=10&page='
page = 1
while 1:
try:
url = base_url + q + options + str(page)
html = scraperwiki.scrape(url)
#print html
soup = simplejson.loads(html)
for result in soup['results']:
data = {}
data['id'] = result['id']
data['text'] = result['text']
data['from_user'] = result['from_user']
data['created_at'] = result['created_at']
# save records to the datastore
scraperwiki.datastore.save(["id"], data)
page = page + 1
except:
print str(page) + ' pages scraped'
break
In addition to q, use the query parameters geocode and until. See this page of the Twitter API documentation. Please note that you cannot use the Search API to find Tweets older than about a week.
Besides, it's easier to use urllib.urlencode() to construct your query, like for example
query_dict = {'q':'search term(s)', 'geocode':'37.781157,-122.398720,25mi', 'until':'2013-05-10'}
query = urllib.urlencode(query_dict)
response = urllib.urlopen(basic_url + query).read()
Update: Please see this example scraper that you can copy and adapt to your needs.
Related
I am collecting Data from twitter for data analysis.i need a collection of tweet contains "#depression" tag for make a dataset . it is very difficult to go to search then copy and paste.
if there are any existing code/plugin/api to get all post with username and post date? i will use it to store post,username,date on my excel dataset.
I would recommend a bit of python for this. Incidently, here seems to be a script that does something like what you want. It will get all tweets from a specified date, with a specified #tag - and print them to a CSV file. I guess you could import the file in Excel.
The script:
https://gist.github.com/vickyqian/f70e9ab3910c7c290d9d715491cde44c
I have not read it thoroughly - so read through it before invkoking. And of course replace the #tag parameter:
...
for tweet in tweepy.Cursor(api.search,q="#depression",count=100,
...
You also need to to set up the parameters:
consumer_key = ''
consumer_secret = ''
access_token = ''
access_token_secret = ''
Here's the instructions on how to get those:
https://developer.twitter.com/en/docs/basics/authentication/guides/access-tokens.html
And you need to specify the fields that you are requiring in this line
print (tweet.created_at, tweet.text)
These fields are available:
text = tweet.text
language = tweet.lang
date = tweet.created_at
username = tweet.user
retweets = tweet.retweet_count
likes = tweet.favorite_count
So you could change it to this:
print (tweet.created_at, tweet.user, tweet.text)
I am building a data mining app to collect tweets using the Twitter streaming API (via tweepy) and run a suite of NLP algorithms on it. So far all I have been able to do is get the tweets to be written into an external file. Due to the volume of tweets I am going to collect is a 100 at a time (pretty small) and deployment concerns, I wish to collect these tweets to a dictionary or list for further analysis. However, I have failed in doing this. The code I have so far is given below:
import tweepy
class MyStreamListener(tweepy.StreamListener):
def __init__(self, api=None):
super(MyStreamListener, self).__init__()
self.num_tweets = 0
self.tweets = []
def on_status(self, status):
#print(status.text)
self.num_tweets += 1
self.tweets.append(status.text)
if self.num_tweets > 100:
return False
def getstreams(keyword):
CONSUMER_KEY = ''
CONSUMER_SECRET = ''
ACCESS_TOKEN = ''
ACCESS_SECRET = ''
auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_SECRET)
api = tweepy.API(auth, wait_on_rate_limit=True)
myStreamListener = MyStreamListener()
myStream = tweepy.Stream(auth = api.auth,listener=myStreamListener)
tweet_list = myStream.filter(track=[keyword])
return tweet_list.tweets
getstreams('Starbucks')
However when I run this, all I get is:
AttributeError: 'NoneType' object has no attribute 'tweets'
pointing to the line:
return tweet_list.tweets
I'd be grateful if anyone could answer how to overcome this issue and shed insight on how to collect n number of tweets into a list.
You can use the on_data function in your class.
def on_data(self, data):
# Converting data , which is an object, into JSON
tweet = json.loads(data)
# my_tweet is our list declared globally
my_tweet.append(tweet)
For months I've been using a url like this, from perl:
http://finance.yahoo.com/d/quotes.csv?s=$s&f=ynl1 #returns yield, name, price;
Today, 11/1/17, it suddenly returns a 999 error.
Is this a glitch, or has Yahoo terminated the service?
I get the error even if I enter the URL directly into a browser as, eg:
http://finance.yahoo.com/d/quotes.csv?s=INTC&f=ynl1
so it doesn't seem to be a 'crumb' problem.
Note: This is NOT a question which has been answered in the past!
It was working yesterday.That it happened on the first of the month is suspicious.
As noted in the other answers and elsewhere (e.g. https://stackoverflow.com/questions/47076404/currency-helper-of-yahoo-sorry-unable-to-process-request-at-this-time-erro/47096766#47096766), Yahoo has indeed ceased operation of the Yahoo Finance API. However, as a workaround, you can access a trove of financial information, in JSON format, for a given ticker symbol, by doing a HTTPS GET request to: https://finance.yahoo.com/quote/SYMBOL (e.g. https://finance.yahoo.com/quote/MSFT). If you do a GET request to the above URL, you'll see that the financial data is contained within the response in JSON format. The following python3 script shows how you can parse individual values that you may be interested in:
import requests
import json
symbol = 'MSFT'
url ='https://finance.yahoo.com/quote/' + symbol
resp = requests.get(url)
# parse the section from the html document containing the raw json data that we need
# you can write jsonstr to a file, then open the file in a web browser to browse the structure of the json data
r = str(resp.content, 'utf-8')
i1 = 0
i1 = r.find('root.App.main', i1)
i1 = r.find('{', i1)
i2 = r.find("\n", i1)
i2 = r.rfind(';', i1, i2)
jsonstr = r[i1:i2]
# load the raw json data into a python data object
data = json.loads(jsonstr)
# pull the values that we are interested in
name = data['context']['dispatcher']['stores']['QuoteSummaryStore']['price']['shortName']
price = data['context']['dispatcher']['stores']['QuoteSummaryStore']['price']['regularMarketPrice']['raw']
change = data['context']['dispatcher']['stores']['QuoteSummaryStore']['price']['regularMarketChange']['raw']
shares_outstanding = data['context']['dispatcher']['stores']['QuoteSummaryStore']['defaultKeyStatistics']['sharesOutstanding']['raw']
market_cap = data['context']['dispatcher']['stores']['QuoteSummaryStore']['summaryDetail']['marketCap']['raw']
trailing_pe = data['context']['dispatcher']['stores']['QuoteSummaryStore']['summaryDetail']['trailingPE']['raw']
earnings_per_share = data['context']['dispatcher']['stores']['QuoteSummaryStore']['defaultKeyStatistics']['trailingEps']['raw']
forward_annual_dividend_rate = data['context']['dispatcher']['stores']['QuoteSummaryStore']['summaryDetail']['dividendRate']['raw']
forward_annual_dividend_yield = data['context']['dispatcher']['stores']['QuoteSummaryStore']['summaryDetail']['dividendYield']['raw']
# print the values
print('Symbol:', symbol)
print('Name:', name)
print('Price:', price)
print('Change:', change)
print('Shares Outstanding:', shares_outstanding)
print('Market Cap:', market_cap)
print('Trailing PE:', trailing_pe)
print('Earnings Per Share:', earnings_per_share)
print('Forward Annual Dividend Rate:', forward_annual_dividend_rate)
print('Forward_annual_dividend_yield:', forward_annual_dividend_yield)
Yahoo confirmed that they terminated the service:
It has come to our attention that this service is being used in violation of the Yahoo Terms of Service. As such, the service is being discontinued. For all future markets and equities data research, please refer to finance.yahoo.com .
There is still a way to get this data by querying some APIs used by the finance.yahoo.com page. Not sure if Yahoo will be supporting it long term as the previous API was (hopefully they will).
I adapted the method used by https://github.com/pstadler/ticker.sh into the following python hack that takes a list of symbols from the command line and outputs some of the variables as a csv:
#!/usr/bin/env python
import sys
import time
import requests
if len(sys.argv) < 2:
print("missing parameters: <symbol> ...")
exit()
apiEndpoint = "https://query1.finance.yahoo.com/v7/finance/quote"
fields = [
'symbol',
'regularMarketVolume',
'regularMarketPrice',
'regularMarketDayHigh',
'regularMarketDayLow',
'regularMarketTime',
'regularMarketChangePercent']
fields = ','.join(fields)
symbols = sys.argv[1:]
symbols = ','.join(symbols)
payload = {
'lang': 'en-US',
'region': 'US',
'corsDomain': 'finance.yahoo.com',
'fields': fields,
'symbols': symbols}
r = requests.get(apiEndpoint, params=payload)
for i in r.json()['quoteResponse']['result']:
if 'regularMarketPrice' in i:
a = []
a.append(i['symbol'])
a.append(i['regularMarketPrice'])
a.append(time.strftime(
'%Y-%m-%d %H:%M:%S', time.localtime(i['regularMarketTime'])))
a.append(i['regularMarketChangePercent'])
a.append(i['regularMarketVolume'])
a.append("{0:.2f} - {1:.2f}".format(
i['regularMarketDayLow'], i['regularMarketDayHigh']))
print(",".join([str(e) for e in a]))
Sample Run:
$ ./getquotePy.py AAPL GOOGL
AAPL,174.5342,2017-11-07 17:21:28,0.1630961,19905458,173.60 - 173.60
GOOGL,1048.6753,2017-11-07 17:21:22,0.5749836,840447,1043.00 - 1043.00
var API = "https://query1.finance.yahoo.com/v7/finance/quote?symbols=AAPL";
$.getJSON(API, function (json) {...});call throws this error: No 'Access-Control-Allow-Origin' header is present on the requested resource. Origin 'http://www.microplan.at/sar' is therefore not allowed access.
For months I've been using a url like this, from perl:
http://finance.yahoo.com/d/quotes.csv?s=$s&f=ynl1 #returns yield, name, price;
Today, 11/1/17, it suddenly returns a 999 error.
Is this a glitch, or has Yahoo terminated the service?
I get the error even if I enter the URL directly into a browser as, eg:
http://finance.yahoo.com/d/quotes.csv?s=INTC&f=ynl1
so it doesn't seem to be a 'crumb' problem.
Note: This is NOT a question which has been answered in the past!
It was working yesterday.That it happened on the first of the month is suspicious.
As noted in the other answers and elsewhere (e.g. https://stackoverflow.com/questions/47076404/currency-helper-of-yahoo-sorry-unable-to-process-request-at-this-time-erro/47096766#47096766), Yahoo has indeed ceased operation of the Yahoo Finance API. However, as a workaround, you can access a trove of financial information, in JSON format, for a given ticker symbol, by doing a HTTPS GET request to: https://finance.yahoo.com/quote/SYMBOL (e.g. https://finance.yahoo.com/quote/MSFT). If you do a GET request to the above URL, you'll see that the financial data is contained within the response in JSON format. The following python3 script shows how you can parse individual values that you may be interested in:
import requests
import json
symbol = 'MSFT'
url ='https://finance.yahoo.com/quote/' + symbol
resp = requests.get(url)
# parse the section from the html document containing the raw json data that we need
# you can write jsonstr to a file, then open the file in a web browser to browse the structure of the json data
r = str(resp.content, 'utf-8')
i1 = 0
i1 = r.find('root.App.main', i1)
i1 = r.find('{', i1)
i2 = r.find("\n", i1)
i2 = r.rfind(';', i1, i2)
jsonstr = r[i1:i2]
# load the raw json data into a python data object
data = json.loads(jsonstr)
# pull the values that we are interested in
name = data['context']['dispatcher']['stores']['QuoteSummaryStore']['price']['shortName']
price = data['context']['dispatcher']['stores']['QuoteSummaryStore']['price']['regularMarketPrice']['raw']
change = data['context']['dispatcher']['stores']['QuoteSummaryStore']['price']['regularMarketChange']['raw']
shares_outstanding = data['context']['dispatcher']['stores']['QuoteSummaryStore']['defaultKeyStatistics']['sharesOutstanding']['raw']
market_cap = data['context']['dispatcher']['stores']['QuoteSummaryStore']['summaryDetail']['marketCap']['raw']
trailing_pe = data['context']['dispatcher']['stores']['QuoteSummaryStore']['summaryDetail']['trailingPE']['raw']
earnings_per_share = data['context']['dispatcher']['stores']['QuoteSummaryStore']['defaultKeyStatistics']['trailingEps']['raw']
forward_annual_dividend_rate = data['context']['dispatcher']['stores']['QuoteSummaryStore']['summaryDetail']['dividendRate']['raw']
forward_annual_dividend_yield = data['context']['dispatcher']['stores']['QuoteSummaryStore']['summaryDetail']['dividendYield']['raw']
# print the values
print('Symbol:', symbol)
print('Name:', name)
print('Price:', price)
print('Change:', change)
print('Shares Outstanding:', shares_outstanding)
print('Market Cap:', market_cap)
print('Trailing PE:', trailing_pe)
print('Earnings Per Share:', earnings_per_share)
print('Forward Annual Dividend Rate:', forward_annual_dividend_rate)
print('Forward_annual_dividend_yield:', forward_annual_dividend_yield)
Yahoo confirmed that they terminated the service:
It has come to our attention that this service is being used in violation of the Yahoo Terms of Service. As such, the service is being discontinued. For all future markets and equities data research, please refer to finance.yahoo.com .
There is still a way to get this data by querying some APIs used by the finance.yahoo.com page. Not sure if Yahoo will be supporting it long term as the previous API was (hopefully they will).
I adapted the method used by https://github.com/pstadler/ticker.sh into the following python hack that takes a list of symbols from the command line and outputs some of the variables as a csv:
#!/usr/bin/env python
import sys
import time
import requests
if len(sys.argv) < 2:
print("missing parameters: <symbol> ...")
exit()
apiEndpoint = "https://query1.finance.yahoo.com/v7/finance/quote"
fields = [
'symbol',
'regularMarketVolume',
'regularMarketPrice',
'regularMarketDayHigh',
'regularMarketDayLow',
'regularMarketTime',
'regularMarketChangePercent']
fields = ','.join(fields)
symbols = sys.argv[1:]
symbols = ','.join(symbols)
payload = {
'lang': 'en-US',
'region': 'US',
'corsDomain': 'finance.yahoo.com',
'fields': fields,
'symbols': symbols}
r = requests.get(apiEndpoint, params=payload)
for i in r.json()['quoteResponse']['result']:
if 'regularMarketPrice' in i:
a = []
a.append(i['symbol'])
a.append(i['regularMarketPrice'])
a.append(time.strftime(
'%Y-%m-%d %H:%M:%S', time.localtime(i['regularMarketTime'])))
a.append(i['regularMarketChangePercent'])
a.append(i['regularMarketVolume'])
a.append("{0:.2f} - {1:.2f}".format(
i['regularMarketDayLow'], i['regularMarketDayHigh']))
print(",".join([str(e) for e in a]))
Sample Run:
$ ./getquotePy.py AAPL GOOGL
AAPL,174.5342,2017-11-07 17:21:28,0.1630961,19905458,173.60 - 173.60
GOOGL,1048.6753,2017-11-07 17:21:22,0.5749836,840447,1043.00 - 1043.00
var API = "https://query1.finance.yahoo.com/v7/finance/quote?symbols=AAPL";
$.getJSON(API, function (json) {...});call throws this error: No 'Access-Control-Allow-Origin' header is present on the requested resource. Origin 'http://www.microplan.at/sar' is therefore not allowed access.
How do I query the contents of a specific collection using the Python client for Google Docs API?
This is how far I've come:
client = gdata.docs.service.DocsService()
client.ClientLogin('myuser', 'mypassword')
FOLDER_FEED1 = "/feeds/documents/private/full/-/folder"
FOLDER_FEED2 = "/feeds/default/private/full/folder%3A"
feed = client.Query(uri=FOLDER_FEED1 + "?title=MyFolder&title-exact=true")
full_id = feed.entry[0].resourceId.text
(res_type, res_id) = full_id.split(":")
feed = client.Query(uri=FOLDER_FEED2 + res_id + "/contents")
for entry in feed.entry:.
print entry.title.text
The first call to Client.Query succeeds and seems to provide a valid resource ID. The second call, however, returns:
{'status': 400, 'body': 'Invalid request URI', 'reason': 'Bad Request'}
How can I correct this to get it working?
It is much easier once you have a folder entry, to call client.GetResources(entry.content.src) rather than generating the URI by yourself and using a Query.
In your case, client.GetResources(feed.entry[0].content.src).