Who can comment the following:
import json, urllib
url = "funnyfurniture.net/p/10/oops-chair/"
url2 = "http://funnyfurniture.net/p/10/oops-chair/"
tw_url = "http://urls.api.twitter.com/1/urls/count.json?url=%s" %url
tw_url2 = "http://urls.api.twitter.com/1/urls/count.json?url=%s" %url2
js2 = json.load(urllib.urlopen(tw_url))
js = json.load(urllib.urlopen(tw_url2))
print js2, js
Gives
{u'count': 0, u'url': u'http://funnyfurniture.net/p/10/oops-chair/'} {u'count': 1, u'url': u'http://funnyfurniture.net/p/10/oops-chair/'}
What is the difference??
The Twitter API normalizes urls, so when you pass in cnn.com it converts it to http://cnn.com automatically:
% curl 'http://urls.api.twitter.com/1/urls/count.json?url=foo'
{"count":0,"url":"http://foo/"}
The discrepancy in the counts you saw may have been a temporary bug on Twitter's side, e.g. calculating the counts before normalizing the urls.
Since Twitter infrastructure in cloud, it might be replication data "error". Not being synchronized well in all nodes.
Related
We got to build our own reporting database for our Youtube channel to measure the channel and video performance.
To support this, we implemented an ETL job to extract using Youtube Analytics API and used below python code to get the data.
def GetAnalyticsData(extractDate,accessToken, channelId):
channelId = 'channel%3D%3D{0}'.format(channelId)
headers = {'Authorization': 'Bearer {}'.format(accessToken),
'accept': 'application/json'}
url = 'https://youtubeanalytics.googleapis.com/v2/reports?dimensions={dimensions}&endDate={enddate}&ids={ids}&maxResults={maxresults}&metrics={metrics}&startDate={startdate}&alt={alt}&sort={sort}'.format(
dimensions='video',
ids=channelId,
enddate= extractDate,
startdate=extractDate,
metrics = 'views%2Ccomments%2Clikes%2Cdislikes%2Cshares%2CestimatedMinutesWatched%2CsubscribersGained%2CsubscribersLost%2CannotationClicks%2CannotationClickThroughRate%2CaverageViewDuration%2CaverageViewPercentage%2CannotationCloseRate%2CannotationImpressions%2CannotationClickableImpressions%2CannotationClosableImpressions%2CannotationCloses',
maxresults = 200,
alt ='json',
sort='-views'
)
return requests.get(url,headers=headers)
We hit this API everyday and get all the video metric and sorted by views in descending order.
This solved our need partially and it returns only 200 videos, if we specify maxResults more than 200, its return 400 error code.
The challenge is, how to get all videos for the given date and given channel?
Thanks in advance.
Regards,
Guna
I am not keen on YouTube analytics API, but it seems that you are looking for startIndex.
startIndex
integer
The 1-based index of the first entity to retrieve. (The default value is 1.) Use this parameter as a pagination mechanism along with the max-results parameter.
I'm working on a website to load multiple youtube channels live streams. At first i was trying to figure out a way to do this without utilizing youtube's api but have decided to give in.
To find whether a channel is live streaming and to get the live stream links I've been using:
https://www.googleapis.com/youtube/v3/search?part=snippet&channelId={CHANNEL_ID}&eventType=live&maxResults=10&type=video&key={API_KEY}
However with the minimum quota being 10000 and each search being worth 100, Im only able to do about 100 searches before I exceed my quota limit which doesn't help at all. I ended up exceeding the quota limit in about 10 minutes. :(
Does anyone know of a better way to figure out if a channel is currently live streaming and what the live stream links are, using as minimal quota points as possible?
I want to reload youtube data for each user every 3 minutes, save it into a database, and display the information using my own api to save server resources as well as quota points.
Hopefully someone has a good solution to this problem!
If nothing can be done about links just determining if the user is live without using 100 quota points each time would be a big help.
Since the question only specified that Search API quotas should not be used in finding out if the channel is streaming, I thought I would share a sort of work-around method. It might require a bit more work than a simple API call, but it reduces API quota use to practically nothing:
I used a simple Perl GET request to retrieve a Youtube channel's main page. Several unique elements are found in the HTML of a channel page that is streaming live:
The number of live viewers tag, e.g. <li>753 watching</li>. The LIVE NOW
badge tag: <span class="yt-badge yt-badge-live" >Live now</span>.
To ascertain whether a channel is currently streaming live requires a simple match to see if the unique HTML tag is contained in the GET request results. Something like: if ($get_results =~ /$unique_html/) (Perl). Then, an API call can be made only to a channel ID that is actually streaming, in order to obtain the video ID of the stream.
The advantage of this is that you already know the channel is streaming, instead of using thousands of quota points to find out. My test script successfully identifies whether a channel is streaming, by looking in the HTML code for: <span class="yt-badge yt-badge-live" > (note the weird extra spaces in the code from Youtube).
I don't know what language OP is using, or I would help with a basic GET request in that language. I used Perl, and included browser headers, User Agent and cookies, to look like a normal computer visit.
Youtube's robots.txt doesn't seem to forbid crawling a channel's main page, only the community page of a channel.
Let me know what you think about the pros and cons of this method, and please comment with what might be improved rather than disliking if you find a flaw. Thanks, happy coding!
2020 UPDATE
The yt-badge-live seems to have been deprecated, it no longer reliably shows whether the channel is streaming. Instead, I now check the HTML for this string:
{"text":" watching"}
If I get a match, it means the page is streaming. (Non-streaming channels don't contain this string.) Again, note the weird extra whitespace. I also escape all the quotation marks since I'm using Perl.
Here are my two suggestions:
Check my answer where I explain how you can check how retrieve videos from channels who are livesrteaming.
Another option could be use the following URL and somehow make request(s) each time for check if there's a livestreaming.
https://www.youtube.com/channel/<CHANNEL_ID>/live
Where CHANNEL_ID is the channel id you want check if that channel is livestreaming1.
1 Just notice that maybe the URL wont work in all channels (and that depends of the channel itself).
For example, if you check the channel_id UC7_YxT-KID8kRbqZo7MyscQ - link to this channel livestreaming - https://www.youtube.com/channel/UC4nprx9Vd84-ly7N-1Ce6Og/live, this channel will show if he is livestreaming, but, with his channel id UC4nprx9Vd84-ly7N-1Ce6Og - link to this channel livestreaming -, it will show his main page instead.
Adding to the answer by Bman70, I tried eliminating the need of making a costly search request after knowing that the channel is streaming live. I did this using two indicators in the HTML response from channels page who are streaming live.
function findLiveStreamVideoId(channelId, cb){
$.ajax({
url: 'https://www.youtube.com/channel/'+channelId,
type: "GET",
headers: {
'Access-Control-Allow-Origin': '*',
'Accept-Language': 'en-US, en;q=0.5'
}}).done(function(resp) {
//one method to find live video
let n = resp.search(/\{"videoId[\sA-Za-z0-9:"\{\}\]\[,\-_]+BADGE_STYLE_TYPE_LIVE_NOW/i);
//If found
if(n>=0){
let videoId = resp.slice(n+1, resp.indexOf("}",n)-1).split("\":\"")[1]
return cb(videoId);
}
//If not found, then try another method to find live video
n = resp.search(/https:\/\/i.ytimg.com\/vi\/[A-Za-z0-9\-_]+\/hqdefault_live.jpg/i);
if (n >= 0){
let videoId = resp.slice(n,resp.indexOf(".jpg",n)-1).split("/")[4]
return cb(videoId);
}
//No streams found
return cb(null, "No live streams found");
}).fail(function() {
return cb(null, "CORS Request blocked");
});
}
However, there's a tradeoff. This method confuses a recently ended stream with currently live streams. A workaround for this issue is to get status of the videoId returned from Youtube API (costs a single unit from your quota).
I found youtube API to be very restrictive given the cost of search operation. Apparently the accepted answer did not work for me as I found the string on non live streams as well. Web scraping with aiohttp and beautifulsoup was not an option since the better indicators required javascript support. Hence I turned to selenium. I looked for the css selector
#info-text
and then search for the string Started streaming or with watching now in it.
To reduce load on my tiny server that would have otherwise required lot more resources, I moved this test of functionality to a heroku dyno with a small flask app.
# import flask dependencies
import os
from flask import Flask, request, make_response, jsonify
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
base = "https://www.youtube.com/watch?v={0}"
delay = 3
# initialize the flask app
app = Flask(__name__)
# default route
#app.route("/")
def index():
return "Hello World!"
# create a route for webhook
#app.route("/islive", methods=["GET", "POST"])
def is_live():
chrome_options = Options()
chrome_options.binary_location = os.environ.get('GOOGLE_CHROME_BIN')
chrome_options.add_argument('--disable-gpu')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
chrome_options.add_argument('--headless')
chrome_options.add_argument('--remote-debugging-port=9222')
driver = webdriver.Chrome(executable_path=os.environ.get('CHROMEDRIVER_PATH'), chrome_options=chrome_options)
url = request.args.get("url")
if "youtube.com" in url:
video_id = url.split("?v=")[-1]
else:
video_id = url
url = base.format(url)
print(url)
response = { "url": url, "is_live": False, "ok": False, "video_id": video_id }
driver.get(url)
try:
element = WebDriverWait(driver, delay).until(EC.presence_of_element_located((By.CSS_SELECTOR, "#info-text")))
result = element.text.lower().find("Started streaming".lower())
if result != -1:
response["is_live"] = True
else:
result = element.text.lower().find("watching now".lower())
if result != -1:
response["is_live"] = True
response["ok"] = True
return jsonify(response)
except Exception as e:
print(e)
return jsonify(response)
finally:
driver.close()
# run the app
if __name__ == "__main__":
app.run()
You'll however need to add the following buildpacks in settings
https://github.com/heroku/heroku-buildpack-google-chrome
https://github.com/heroku/heroku-buildpack-chromedriver
https://github.com/heroku/heroku-buildpack-python
Set the following Config Vars in settings
CHROMEDRIVER_PATH=/app/.chromedriver/bin/chromedriver
GOOGLE_CHROME_BIN=/app/.apt/usr/bin/google-chrome
You can find supported python runtime here but anything below python 3.9 should be good since selenium had problems with improper use of is operator
I hope youtube will provide better alternatives than workarounds.
I know this is a old thread, but i thought i share my way of checking to for example grab the status code to use in an app.
This is for a single Channel, but you could easly do a foreach with it.
<?php
#####
$ytchannelID = "UCd0BTXriKLvOs1ANx3puZ3Q";
#####
$ytliveurl = "https://www.youtube.com/channel/".$ytchannelID."/live";
$ytchannelLIVE = '{"text":" watching now"}';
$contents = file_get_contents($ytliveurl);
if ( strpos($contents, $ytchannelLIVE) !== false ){http_response_code(200);} else {http_response_code(201);}
unset($ytliveurl);
?>
Adding onto the other answers here, I use a GET request to https://www.youtube.com/c/<CHANNEL_NAME>/live and then search for "isLive":true (rather than {"text":" watching"})
For months I've been using a url like this, from perl:
http://finance.yahoo.com/d/quotes.csv?s=$s&f=ynl1 #returns yield, name, price;
Today, 11/1/17, it suddenly returns a 999 error.
Is this a glitch, or has Yahoo terminated the service?
I get the error even if I enter the URL directly into a browser as, eg:
http://finance.yahoo.com/d/quotes.csv?s=INTC&f=ynl1
so it doesn't seem to be a 'crumb' problem.
Note: This is NOT a question which has been answered in the past!
It was working yesterday.That it happened on the first of the month is suspicious.
As noted in the other answers and elsewhere (e.g. https://stackoverflow.com/questions/47076404/currency-helper-of-yahoo-sorry-unable-to-process-request-at-this-time-erro/47096766#47096766), Yahoo has indeed ceased operation of the Yahoo Finance API. However, as a workaround, you can access a trove of financial information, in JSON format, for a given ticker symbol, by doing a HTTPS GET request to: https://finance.yahoo.com/quote/SYMBOL (e.g. https://finance.yahoo.com/quote/MSFT). If you do a GET request to the above URL, you'll see that the financial data is contained within the response in JSON format. The following python3 script shows how you can parse individual values that you may be interested in:
import requests
import json
symbol = 'MSFT'
url ='https://finance.yahoo.com/quote/' + symbol
resp = requests.get(url)
# parse the section from the html document containing the raw json data that we need
# you can write jsonstr to a file, then open the file in a web browser to browse the structure of the json data
r = str(resp.content, 'utf-8')
i1 = 0
i1 = r.find('root.App.main', i1)
i1 = r.find('{', i1)
i2 = r.find("\n", i1)
i2 = r.rfind(';', i1, i2)
jsonstr = r[i1:i2]
# load the raw json data into a python data object
data = json.loads(jsonstr)
# pull the values that we are interested in
name = data['context']['dispatcher']['stores']['QuoteSummaryStore']['price']['shortName']
price = data['context']['dispatcher']['stores']['QuoteSummaryStore']['price']['regularMarketPrice']['raw']
change = data['context']['dispatcher']['stores']['QuoteSummaryStore']['price']['regularMarketChange']['raw']
shares_outstanding = data['context']['dispatcher']['stores']['QuoteSummaryStore']['defaultKeyStatistics']['sharesOutstanding']['raw']
market_cap = data['context']['dispatcher']['stores']['QuoteSummaryStore']['summaryDetail']['marketCap']['raw']
trailing_pe = data['context']['dispatcher']['stores']['QuoteSummaryStore']['summaryDetail']['trailingPE']['raw']
earnings_per_share = data['context']['dispatcher']['stores']['QuoteSummaryStore']['defaultKeyStatistics']['trailingEps']['raw']
forward_annual_dividend_rate = data['context']['dispatcher']['stores']['QuoteSummaryStore']['summaryDetail']['dividendRate']['raw']
forward_annual_dividend_yield = data['context']['dispatcher']['stores']['QuoteSummaryStore']['summaryDetail']['dividendYield']['raw']
# print the values
print('Symbol:', symbol)
print('Name:', name)
print('Price:', price)
print('Change:', change)
print('Shares Outstanding:', shares_outstanding)
print('Market Cap:', market_cap)
print('Trailing PE:', trailing_pe)
print('Earnings Per Share:', earnings_per_share)
print('Forward Annual Dividend Rate:', forward_annual_dividend_rate)
print('Forward_annual_dividend_yield:', forward_annual_dividend_yield)
Yahoo confirmed that they terminated the service:
It has come to our attention that this service is being used in violation of the Yahoo Terms of Service. As such, the service is being discontinued. For all future markets and equities data research, please refer to finance.yahoo.com .
There is still a way to get this data by querying some APIs used by the finance.yahoo.com page. Not sure if Yahoo will be supporting it long term as the previous API was (hopefully they will).
I adapted the method used by https://github.com/pstadler/ticker.sh into the following python hack that takes a list of symbols from the command line and outputs some of the variables as a csv:
#!/usr/bin/env python
import sys
import time
import requests
if len(sys.argv) < 2:
print("missing parameters: <symbol> ...")
exit()
apiEndpoint = "https://query1.finance.yahoo.com/v7/finance/quote"
fields = [
'symbol',
'regularMarketVolume',
'regularMarketPrice',
'regularMarketDayHigh',
'regularMarketDayLow',
'regularMarketTime',
'regularMarketChangePercent']
fields = ','.join(fields)
symbols = sys.argv[1:]
symbols = ','.join(symbols)
payload = {
'lang': 'en-US',
'region': 'US',
'corsDomain': 'finance.yahoo.com',
'fields': fields,
'symbols': symbols}
r = requests.get(apiEndpoint, params=payload)
for i in r.json()['quoteResponse']['result']:
if 'regularMarketPrice' in i:
a = []
a.append(i['symbol'])
a.append(i['regularMarketPrice'])
a.append(time.strftime(
'%Y-%m-%d %H:%M:%S', time.localtime(i['regularMarketTime'])))
a.append(i['regularMarketChangePercent'])
a.append(i['regularMarketVolume'])
a.append("{0:.2f} - {1:.2f}".format(
i['regularMarketDayLow'], i['regularMarketDayHigh']))
print(",".join([str(e) for e in a]))
Sample Run:
$ ./getquotePy.py AAPL GOOGL
AAPL,174.5342,2017-11-07 17:21:28,0.1630961,19905458,173.60 - 173.60
GOOGL,1048.6753,2017-11-07 17:21:22,0.5749836,840447,1043.00 - 1043.00
var API = "https://query1.finance.yahoo.com/v7/finance/quote?symbols=AAPL";
$.getJSON(API, function (json) {...});call throws this error: No 'Access-Control-Allow-Origin' header is present on the requested resource. Origin 'http://www.microplan.at/sar' is therefore not allowed access.
I've been trying to get this to work for probably 6 hours now to no avail, read every stackoverflow question I could find on the topic.
I'm trying to get 100, 200, or maybe 500 photos from a single tag:
func hashtags(hashtag: String, nextMaxTagId: String?) -> RequestParamters {
var params = "/tags/\(hashtag)/media/recent|access_token=\(accessToken)"
var parameters = Dictionary<String, AnyObject>()
parameters["access_token"] = accessToken
let urlString = "https://api.instagram.com/v1/tags/\(hashtag)/media/recent"
if let nextMaxTagId = nextMaxTagId {
params += "|max_tag_id=\(nextMaxTagId)"
parameters["max_tag_id"] = nextMaxTagId
}
let sig = HMAC.signWithKey(C.InstagramClientSecret(), usingData: params)
parameters["sig"] = sig
return (urlString: urlString, parameters: parameters)
}
This is what I use to construct my urls and parameters for my request. My first request does not have a nextMaxTagId, and that request goes through, returns 20 images and a pagination json.
Then, when I extract the next_max_tag_id from the pagination block, and create a request using that parameter, I get another 20 images, but they are the same images as before and now I do not get a pagination block.
I am signing my requests correctly (as all my other API requests throughout the app go through no problem) and I am not in Sandbox mode.
Edit: I've also tried using min_tag_id=\(nextMinTagId), still do not receive pagination in the next request.
Seems like:
1) You are using the Instagram Developer API with what seems like an authorized APIKey, and you mentioned you are NOT in Sandbox, so you're in a the Production environment for that api.
I'm trying to get 100, 200, or maybe 500 photos from a single tag
2) This means, combined with returns 20 images and a pagination json, that for 100, you need to make 5 calls minimum (100/20 == 5), 200 == 10, 500 = 25.
3) According to the developer documentation rate limits, the overall cap on Production is 5000 req/hour, with several APIs restricted to a much smaller limit (some are 30/60 req/hour). I'm not sure I see the exact tag rate limit you are hitting, but since the question mentions:
for probably 6 hours now to no avail
it's also possible you've just been hitting the overall hourly request limit each hour.
I definitely know that this is not an answer that I enjoy giving, because it's essentially saying: you're stuck. I've actually played with the rate limits myself before, and I find them extremely limiting (pun fully intended). The only other option, albeit not as "above board", is to scrape Instagram itself for the information you need. I say it's not as "above board" because if you needed info not found on a web scrape, you could theoretically scrape the mobile API through some minor reverse engineering (ie using an HTTP proxy to spoof mobile traffic systematically).
In the end, the API Instagram publishes is definitely very limited, and will face rate limits for the foreseeable future (unless you can get those somehow lifted in a specific partnership they somehow deem worthy, although I'm not sure how this could be approached).
The text here: https://developers.google.com/fusiontables/docs/v2/migration_guide implies that the 10MB limit not in effect for API v2, or that an alternative service "Media download" could be used for large responses.
The API Reference here: https://developers.google.com/fusiontables/docs/v2/reference/ does not have any information regarding the 10MB limit, or how you use "media download" to recieve your request.
How do I work around the 10MB limit for Fusion Tables API v2? I can't seem to find documentation that explains it.
To use media-download simply add the parameter alt=media to the URL
For those who use Google's API Client Libraries, the 'media download' is specified by using a specific method. For the Python library, there are two versions of the SQL query methods: sql*() and sql*_media() (and this is very likely true for the other client libraries as well).
Example usage:
# Build the googleapiclient service
FusionTables = build('fusiontables', 'v2', credentials=credentials);
query = 'select * from <table id>';
# "standard" query, returning fusiontables#sqlresponse JSON:
jsonRequest = FusionTables.query().sqlGet(sql = query);
jsonResponse = jsonRequest.execute();
# alt=media query, returning a CSV-formatted bytestring (in Python, at least):
bytestrRequest = FusionTables.query().sqlGet_media(sql = query);
byteResponse = bytestrRequest.execute();
As Kerry mentions here, media format queries that are too large to be sent as a GET request will fail (while regular format queries of the same length succeed provided the query result is less than 10 MB). In the python client, this failure appears as a HTTP 502: Bad Gateway error.
Also note that ROWIDs are currently not returned in the media response format.