Parse list of Short URLs


I have a nice long list of shortened URLs with identifiers, like:
43242 http://t.co/12352
61431 http://t.co/16192
98718 http://t.co/95351
I'm wondering whether a utility exists that takes a list of shortened links and returns their long versions.

The full URL is available as an entity.
You can get it with the streaming API.
Here is an example for you: Twitter Full URL - Shortened URL
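When you have the tweet data itself (for example from the streaming API), mapping short links to long ones is a walk over the entities. A minimal sketch, assuming v1.1-style tweet JSON in which each entry of entities["urls"] carries a "url" field (the t.co link) and an "expanded_url" field (the original link); expanded_urls is a hypothetical helper name.

```python
from typing import Dict, Iterable


def expanded_urls(tweets: Iterable[dict]) -> Dict[str, str]:
    """Map each t.co link to the long URL reported in the tweet's entities.

    Assumes v1.1-style tweet JSON: entities["urls"] is a list of dicts with
    "url" (the t.co link) and "expanded_url" (the original link).
    """
    mapping: Dict[str, str] = {}
    for tweet in tweets:
        for entity in tweet.get("entities", {}).get("urls", []):
            short, long_url = entity.get("url"), entity.get("expanded_url")
            if short and long_url:
                mapping[short] = long_url
    return mapping
```

With the resulting dict, each t.co link in your identifier list can be looked up directly; links that never appeared in the tweets you collected will simply be missing.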

Related

Google Indexing and Url

On my website, brand URLs like this one
.../shopbybrands.aspx?auth=430&brand=AQUATALIA by MARVIN K
contain the brand name as-is, with spaces and no hyphens or underscores: "AQUATALIA by MARVIN K". But when we open the URL in a browser, it looks like:
.../shopbybrands.aspx?auth=430&brand=AQUATALIA%20by%20MARVIN%20K
The %20 appears in the URLs automatically. The issue is that if I search for the first URL in Google, I get no results, but if I search for the second URL, I do.
Also, the XML sitemap contains the URLs in the first form, and Google Webmaster Tools shows that only 35 of 105 URLs are indexed. When I search Google with the %20 URLs, I get results, but when I search with the plain URLs, like the first one, I don't.
What do I need to do to fix this?
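One thing worth checking (a suggestion, not a confirmed fix for the indexing count): a URL cannot contain a literal space, so the browser actually requests brand=AQUATALIA%20by%20MARVIN%20K. As far as I know, the sitemaps.org protocol asks for URLs that conform to RFC 3986 (so no raw spaces) and entity-escaped inside the XML, where & must be written as &amp;. A minimal sketch of producing such an entry in Python; www.example.com stands in for the real domain, and brand_url and sitemap_loc are hypothetical helpers.

```python
from urllib.parse import quote, urlencode
from xml.sax.saxutils import escape


def brand_url(base: str, auth: int, brand: str) -> str:
    """Build the brand page URL with a percent-encoded query string.

    quote_via=quote encodes spaces as %20; urlencode's default
    (quote_plus) would encode them as '+' instead.
    """
    query = urlencode({"auth": auth, "brand": brand}, quote_via=quote)
    return f"{base}?{query}"


def sitemap_loc(url: str) -> str:
    """Wrap a URL in a sitemap <loc> element, escaping '&' as '&amp;' for XML."""
    return f"<loc>{escape(url)}</loc>"
```

For example, brand_url("https://www.example.com/shopbybrands.aspx", 430, "AQUATALIA by MARVIN K") yields the same %20 form the browser shows, so the sitemap and the live URL match.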

Decode url_encoded_fmt_stream_map to valid Url

I'm using url_encoded_fmt_stream_map to get a list of YouTube video stream URLs.
I want to use these URLs to show YouTube videos in my VideoView.
My method returns a String array containing strings like this:
sig=3E0D90E459ADEF9F88553D716B9275930A8AA418.AD0319F9287244E34CCA97F7DE245C0606DD46C5&itag=45&fallback_host=tc.v22.cache5.c.youtube.com&url=http%3A%2F%2Fr4---sn-uigxx50n-8pxl.c.youtube.com%2Fvideoplayback%3Fsparams%3Dcp%252Cid%252Cip%252Cipbits%252Citag%252Cratebypass%252Csource%252Cupn%252Cexpire%26id%3D4e6580527662c67d%26cp%3DU0hVSVlQV19FUENONV9RSkFIOnZ3SkZXb3hfdUdp%26source%3Dyoutube%26fexp%3D919110%252C913564%252C916624%252C932000%252C906383%252C902000%252C919512%252C929903%252C931202%252C900821%252C900823%252C931203%252C931401%252C909419%252C913566%252C908529%252C930807%252C919373%252C930803%252C906836%252C920201%252C929602%252C930101%252C930603%252C900824%252C910223%26ms%3Dau%26mv%3Dm%26mt%3D1364933359%26sver%3D3%26itag%3D45%26key%3Dyt1%26ip%3D178.115.248.80%26newshard%3Dyes%26upn%3DNLMBgU-0oUc%26expire%3D1364959706%26ipbits%3D8%26ratebypass%3Dyes&quality=hd720&type=video%2Fwebm%3B+codecs%3D%22vp8.0%2C+vorbis%22
If you have a close look at it, you can see that this string actually contains valid information. Unfortunately, I have no idea how to extract a valid URL from it.
How would I do that?
If I create a URI from the string above and pass it to my VideoView, the message "Can't play this video." pops up.

As you probably know, a YouTube link looks like this:
http://r6---sn-nhpax-ua8l.c.youtube.com/videoplayback?algorithm=throttle-factor&burst=40&cp=U0hVTFlTUV9HUUNONV9RTVVCOkk4ams1XzRlZUpq&cpn=qAthIxAZB16Q6_qB&expire=1367983127&factor=1.25&fexp=927900%2C919357%2C921716%2C916623%2C922911%2C931009%2C932000%2C932004%2C906383%2C904479%2C901208%2C925714%2C929119%2C931202%2C900821%2C900823%2C912518%2C911416%2C930807%2C919373%2C906836%2C926403%2C900824%2C912711%2C929606%2C910075&id=932a6c200e40b791&ip=84.228.249.95&ipbits=8&itag=34&key=yt1&ms=au&mt=1367957244&mv=m&newshard=yes&ratebypass=yes&signature=7E86D1813AE13CEB544CFF3749FBD042FF50EE91.AF7D6D9DCF286D291825630E733B3407683C46C9&source=youtube&sparams=algorithm%2Cburst%2Ccp%2Cfactor%2Cid%2Cip%2Cipbits%2Citag%2Csource%2Cupn%2Cexpire&sver=3&upn=t0jDC5Yk7Go
And as you said, all the parameters we need are present in your string.
There are two important parameters to extract from your string: url and the signature (named sig in your string).
The link should eventually look like this: url + "&signature=" + sig.
Take a look at my code, where I extract all the available links from a YouTube link. It's written in Java for an Android app: link
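The url + "&signature=" + sig recipe can be sketched in Python (the linked code is Java) for a single stream-map entry, i.e. one element of the asker's String array. parse_qs percent-decodes each value once, which turns the nested url parameter back into an ordinary URL while leaving that URL's own inner parameters (such as %2C) encoded. stream_url is a hypothetical name, and YouTube's stream-map format has changed over time, so treat this as illustrating the recipe rather than a working downloader.

```python
from urllib.parse import parse_qs


def stream_url(entry: str) -> str:
    """Rebuild a playable URL from one url_encoded_fmt_stream_map entry.

    parse_qs decodes each value once, so params["url"] is a plain
    http(s) URL; the signature is then appended as &signature=<sig>.
    """
    params = parse_qs(entry)
    url = params["url"][0]
    sig = params.get("sig", [None])[0]
    return url + "&signature=" + sig if sig else url
```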

Thank you for your effort.
I solved this problem by sticking to the official YouTube API. Also see this question:
Getting YouTube Video ID including parameters

How do I reverse a t.co URL to the originating Tweet?

I'm going through our site analytics and have a load of t.co URLs that were referrers to a promotion we were running. I'm trying to figure out whether there is a way to trace those back to the original tweets they came from, through the Twitter API or other means. I can't seem to find a good way to do this, though. Is there one?

This is not possible with the public APIs that Twitter provides.
If I understand correctly, you want to find a tweet that originally had a particular t.co link embedded, i.e. the t.co link, when followed, resolves to your site, not to the tweet.

When a t.co forward points to a tweet, it goes to the web page for that tweet, and the HTML for that page includes the canonical URL.
The ugly way to get this information is to use wget or curl to fetch the destination's HTML, which will include the URL of your original tweet.
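If the fetched HTML contains a <link rel="canonical"> tag, pulling its href out needs only the standard library. A minimal sketch; the exact markup Twitter serves is an assumption (it may change, and some pages are assembled by JavaScript), the HTML would come from curl, wget or requests, and CanonicalFinder and canonical_url are hypothetical names.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> tags are also routed here by HTMLParser.
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        rels = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rels:
            self.canonical = attr_map.get("href")


def canonical_url(html: str) -> Optional[str]:
    """Return the page's canonical URL, or None if it declares none."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonical
```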
A better way to do it is with the Python module, Requests (you will need to install this module first). Here's a quick command line script that will do it:
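#!/usr/bin/env python
import requests

# raw_input only exists on Python 2; on Python 3, input() does the same job
try:
    input = raw_input
except NameError:
    pass

shorturl = input("Enter the shortened URL in its entirety: ")
# requests follows redirects by default, so r.url is the final destination
r = requests.get(shorturl)
print("""
The shortened URL forwards to:
%s
""" % r.url)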
That code will work on any of those URL shortening services, not just Twitter's t.co site.
I did my testing with Python 2.7, but chances are that the above code will work with Python 3.x. Either way, Requests is your friend; see the documentation for details:
http://docs.python-requests.org/en/latest/index.html
The redirection and history section covers this example.
I don't know of a way to do it through the Twitter API, and it may not be possible if all URL shortening is automatic. Still, an API-based solution would only work with t.co addresses, whereas the code above works with any other shortened URL or any URL that redirects (e.g. with an HTTP 301 or 302 response code) to another location.
Edit (better a bit later than never): after using the above to find where a t.co forward actually points, there are three or four possible types of result. The most common is the one the OP assumes they all are: a shortened form of a URL pasted into a tweet, and, to be fair, that is what most of them are.
The other possibilities are: it links back to the tweet itself, which usually only happens with some rather long tweets (I'm not sure how much more frequent that has become since the character limit increase); it forwards to the URL of a status independent of the tweet author's status URL, which is often the case with embedded media (images and video); or it forwards to the URL of a tweet that is being quote-tweeted or retweeted.
Given the OP's scenario, none of those internal Twitter usages should ever appear, and only the "normal" forwarding matters here. Searching for the t.co address itself on twitter.com gets us nothing, regardless of which combinations are used.
Searching for the target address, however (the one revealed by scripts like the one at the start of this answer), is quite another matter. That returns every publicly accessible tweet that posted that link. There are, however, some drawbacks, including:
The search results will include tweets where other forwarding services were used as well.
There is no way to tell whether all the tweets which linked to that URL generated the same t.co address or not.
If not, there is no way to see which t.co forward was utilised by which tweet.
Nevertheless, in conjunction with complete referrer logs on a web server, it may be possible to narrow that down further, assuming the referrer reports the URL of the tweet and not simply twitter.com. That, however, is more likely to depend on how the person clicked the link (i.e. whether they were just seeing the tweet in a stream or had expanded it enough to display its full URL).
I suspect the effectiveness of referrer logs will be sporadic, and likely reduced on smartphones and tablets, where the apps in use are less likely to have expanded tweets in that way and then pass that data on to third-party websites.
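How referrer values are pulled out of the server logs depends on the log format, which isn't shown here, so this sketch starts from a plain list of referrer URLs. It assumes tweet permalinks look like https://twitter.com/<user>/status/<id> (optionally on www. or mobile.), which is an assumption about Twitter's URL scheme; bare t.co or twitter.com referrers are ignored. tweets_from_referrers is a hypothetical helper.

```python
import re
from typing import Iterable, List

# Assumed shape of a tweet permalink; other hosts and paths are ignored.
TWEET_URL = re.compile(
    r"https?://(?:www\.|mobile\.)?twitter\.com/(?P<user>\w+)/status(?:es)?/(?P<id>\d+)"
)


def tweets_from_referrers(referrers: Iterable[str]) -> List[str]:
    """Return distinct, normalised tweet permalinks found among referrer URLs."""
    seen: List[str] = []
    for ref in referrers:
        m = TWEET_URL.match(ref)
        if m:
            url = f"https://twitter.com/{m.group('user')}/status/{m.group('id')}"
            if url not in seen:
                seen.append(url)
    return seen
```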
#!/usr/bin/env python3
import requests
import urllib.parse
shorturl = input("Enter the shortened URL in its entirety: ")
r0 = requests.get(shorturl, verify=True)
t0 = "https://twitter.com/search?f=tweets&q="
t1 = urllib.parse.quote_plus(r0.url)
r1 = requests.get("{0}{1}".format(t0, t1), verify=True)
# the results will be in r1.content
# there may be some benefit from cutting the http:// or
# https:// from r0.url before creating the quoted string in t1.
That, however, is as good as it gets ... without paying Twitter for enhanced data access.

Find out which original URL the shortened URL points to, e.g. by using a service like http://www.getlinkinfo.com
Paste that original URL into Google's search box
If you are specifically looking for references from Twitter, search like this: site:twitter.com "https://example.com"
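If you need that search for many target URLs, the query string can be built programmatically. A minimal sketch; google_site_search is a hypothetical helper, and it only builds the search URL, it does not fetch or scrape the results.

```python
from urllib.parse import urlencode


def google_site_search(target_url: str, site: str = "twitter.com") -> str:
    """Build a Google search URL for an exact-match URL restricted to one site."""
    query = f'site:{site} "{target_url}"'
    # urlencode percent-encodes ':', '"' and '/', and turns spaces into '+'.
    return "https://www.google.com/search?" + urlencode({"q": query})
```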

If you use the Twitter search APIs, you can find tweets that mention the t.co URL (if they're visible to you) and find the link that way.
Here’s some Python for doing that, taken from a longer blog post I wrote:
from requests_oauthlib import OAuth1Session

sess = OAuth1Session(
    client_key=TWITTER_CONSUMER_KEY,
    client_secret=TWITTER_CONSUMER_SECRET,
    resource_owner_key=TWITTER_ACCESS_TOKEN,
    resource_owner_secret=TWITTER_ACCESS_TOKEN_SECRET
)


def find_tweets_using_tco(tco_url):
    """
    Given a shortened t.co URL, return a set of URLs for tweets that use this URL.
    """
    # See https://developer.twitter.com/en/docs/tweets/search/api-reference/get-search-tweets.html
    resp = sess.get(
        "https://api.twitter.com/1.1/search/tweets.json",
        params={
            "q": tco_url,
            "count": 100,
            "include_entities": True
        }
    )
    statuses = resp.json()["statuses"]

    tweet_urls = set()
    for status in statuses:
        # A retweet shows up as a new status in the Twitter API, but we're only
        # interested in the original tweet. If this is a retweet, look through
        # to the original.
        try:
            tweet = status["retweeted_status"]
        except KeyError:
            tweet = status

        # If this tweet shows up in the search results for a reason other than
        # "it has this t.co URL as a short link", it's not interesting.
        if not any(u["url"] == tco_url for u in tweet["entities"]["urls"]):
            continue

        url = "https://twitter.com/%s/status/%s" % (
            tweet["user"]["screen_name"], tweet["id_str"]
        )
        tweet_urls.add(url)

    return tweet_urls

Twitter's t.co URL shortener simply redirects to another URL in the HTTP response. To find that other URL, you only need to fetch the t.co URL and look at the Location header in the response. curl can do this:
curl -v <t.co URL>
To extract only the URL from all that information, you can use:
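curl -s -o /dev/null -w "%{redirect_url}" <t.co URL>
The -w option tells curl to write out the redirect_url variable after the transfer; -s silences the progress meter and -o /dev/null discards the response body, so only the URL is printed.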
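The same single-hop lookup can be scripted for a whole list of links, such as the "<id> <short URL>" list in the first question on this page. A minimal standard-library sketch, assuming each shortener answers a HEAD request with a 3xx status and a Location header (some services may only redirect on GET, and chains of shorteners would need several hops). Like curl without -L, it does not follow the redirect. location_header and expand_list are hypothetical names; the resolver is a parameter so it can be swapped out.

```python
import http.client
from typing import Callable, Dict, Iterable, Optional
from urllib.parse import urlsplit


def location_header(url: str, timeout: float = 10.0) -> Optional[str]:
    """Send one HEAD request, without following redirects, and return Location."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("HEAD", target)
        resp = conn.getresponse()
        # Only 3xx responses carry a redirect target in Location.
        return resp.getheader("Location") if 300 <= resp.status < 400 else None
    finally:
        conn.close()


def expand_list(lines: Iterable[str],
                resolve: Callable[[str], Optional[str]] = location_header
                ) -> Dict[str, Optional[str]]:
    """Map each identifier in '<id> <short url>' lines to its redirect target."""
    result: Dict[str, Optional[str]] = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        ident, short_url = fields
        result[ident] = resolve(short_url)
    return result
```

For example, passing the lines of a file such as links.txt (a hypothetical name) holding "43242 http://t.co/12352" style entries returns a dict keyed by identifier, with None where no redirect was found.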

A list of tweets that referred to your pages is available directly in Google Analytics, under Social networks and then the Trackbacks menu.

This is how you find the original tweet:
Click the t.co link to find the original URL
Go to https://twitter.com/explore
Copy and paste the link into the "Search Twitter" search box
You will see the tweet(s) with the link

Google reader public RSS get more than 9 items

We need to parse the data from a Google Reader public RSS feed; the problem is that the URL parameter n=numberofitemstoretrieve only works up to n=9.
For example, with our test URL:
http://www.google.com/reader/shared/user%2F15926769355350523044%2Flabel%2FPublicas%20RSS?n=2
Retrieves 2 news items
http://www.google.com/reader/shared/user%2F15926769355350523044%2Flabel%2FPublicas%20RSS?n=20
Retrieves only 9 news items
How can we overcome this limitation? Is there another parameter for this case? Or another method?

We found that using this alternative url the n parameter works fine:
https://www.google.com/reader/api/0/stream/contents/feed/http://www.google.com/reader/public/atom/user%2F15926769355350523044%2Flabel%2FPublicas%20RSS?n=20
The only problem is that the output format is different this way, so if someone finds a better solution, we will accept their answer.
It seems the results are cropped only when the URL is viewed in the browser. If you get the web contents from code, it returns the correct item count. (In contrast, the alternative URL returns the right contents both ways: when fetched from code and when viewed in the browser.)

In Atom format (link at the top right of the two URLs in the OP):
http://www.google.com/reader/public/atom/user%2F15926769355350523044%2Flabel%2FPublicas%20RSS?n=20
The content with /api/ in the URL in the second post is in JSON format, slightly harder to parse than the Atom XML.
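For the Atom variant, Python's standard ElementTree is enough. A minimal sketch, assuming the feed uses the standard Atom namespace (http://www.w3.org/2005/Atom); counting the returned pairs is also a quick way to check whether n=20 really produced 20 entries. atom_entries is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

ATOM = {"atom": "http://www.w3.org/2005/Atom"}


def atom_entries(xml_text: str) -> List[Tuple[str, str]]:
    """Return (title, link href) for every <entry> in an Atom feed."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.findall("atom:entry", ATOM):
        title = entry.findtext("atom:title", default="", namespaces=ATOM)
        link = entry.find("atom:link", ATOM)
        href = link.get("href", "") if link is not None else ""
        entries.append((title, href))
    return entries
```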
https://webapps.stackexchange.com/questions/26567/how-to-raise-google-reader-rss-feed-entry-limit

Using Yahoo Pipes to makes tweets link to url they contain rather than Twitter

I'm wondering how you can use Yahoo Pipes to make any tweet that contains a URL link to that URL when clicked, instead of linking to Twitter.

I made some assumptions.
You want the first URL if there are multiple links in the message
If there is no URL, you will skip the tweet
You only care about http and https links
The flow ends up:
1 Fetch Feed - Use twitter RSS
2 Filter - item.description Matches regex https?://
3 Rename - item.description Copy As link
4 Regex - In item.link replace ^.*?(https?://[\w:##%/;$()~?+-=\.&]+).*$ with $1 (s)
If you want to see all tweets, then the simplest thing is to split the feed at the top and filter the ones with and without URLs and only process the URL ones. Finally you can remerge the feeds before outputting.
If you want more URL types, change https?:// to (https?|ftp|etc):// in steps 2 and 4.
I made a sample here.
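Outside of Pipes, the same flow (skip tweets without a URL, then point the item's link at the first URL in the text) can be sketched in Python. This assumes the feed has already been parsed into dicts with "description" and "link" keys, and the regular expression approximates the one in step 4 rather than copying it exactly; first_url and link_items are hypothetical names.

```python
import re
from typing import Iterable, List, Optional

# Approximation of the step 4 pattern: http or https followed by URL-ish characters.
URL_RE = re.compile(r"https?://[\w:#%/;$()~?+=.&-]+")


def first_url(text: str) -> Optional[str]:
    """Return the first http(s) URL in a tweet's text, or None."""
    m = URL_RE.search(text)
    return m.group(0) if m else None


def link_items(items: Iterable[dict]) -> List[dict]:
    """Keep only items whose description contains a URL, and link them to it."""
    out = []
    for item in items:
        url = first_url(item.get("description", ""))
        if url is None:
            continue  # like step 2: drop tweets without a URL
        out.append({**item, "link": url})  # like steps 3 and 4
    return out
```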
