Getting the original link on Wikipedia? - hyperlink

I have some links to Wikipedia articles, for example https://en.wikipedia.org/wiki/Steve_jobs. When you visit that link, you will see right under the article's title: (Redirected from Steve jobs). If you follow that link, you eventually reach a page with the same URL except that Steve_jobs has a capital J in Jobs: https://en.wikipedia.org/wiki/Steve_Jobs
Is there a way I can retrieve the latter link using the first one?

You can find out where a certain title redirects to with an API query like:
http://en.wikipedia.org/w/api.php?action=query&titles=Steve%20jobs&redirects
If you want the result in XML, add &format=xml to the URL, or &format=json for JSON.

Just use the query API. With http://en.wikipedia.org/w/api.php?action=query&titles=Steve+jobs|steve+jobs|steve+Jobs|Steve+Jobs&redirects you will get your titles normalized and redirected. See the API documentation for further options, especially the formats.
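To use the redirect information from a script, the following is a minimal Python sketch. It builds the same kind of query URL and then walks the "normalized" and "redirects" lists that the API includes in its JSON output. The function names and SAMPLE_RESPONSE are assumptions made for illustration; the sample was written by hand to mirror the shape of a response and is not captured output.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def build_query_url(titles):
    """Build an action=query URL that asks the API to resolve redirects."""
    params = {
        "action": "query",
        "titles": "|".join(titles),  # several titles are separated by "|"
        "redirects": 1,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def resolve_title(response, title):
    """Follow the normalization step, then the redirect step, for one title."""
    query = response.get("query", {})
    for step in ("normalized", "redirects"):
        mapping = {entry["from"]: entry["to"] for entry in query.get(step, [])}
        title = mapping.get(title, title)
    return title


# Hand-written sample in the shape of a JSON response (illustrative only).
SAMPLE_RESPONSE = {
    "query": {
        "normalized": [{"from": "steve jobs", "to": "Steve jobs"}],
        "redirects": [{"from": "Steve jobs", "to": "Steve Jobs"}],
    }
}

if __name__ == "__main__":
    print(build_query_url(["Steve jobs"]))
    print(resolve_title(SAMPLE_RESPONSE, "steve jobs"))
```

Fetching the URL and decoding the body with json.loads gives a dictionary that can be passed to resolve_title in place of the sample.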

Related

Wikipedia API request sometimes not returning results

I want to make a request to the Wikipedia API to see if a given name has a Wikipedia page.
For example, let's say I make an API request to get the page for Justin Bieber:
require "open-uri"
require "json"
source = "https://en.wikipedia.org/w/api.php?action=query&titles=justin%20bieber&prop=revisions&rvprop=content&format=json"
data = URI.open(source).read
json = JSON.parse(data)
Then I get back a JSON response with this info. But why does it not return any result for some less well-known names, even though they have wiki pages? For example, this Brent Bolthouse page: https://en.wikipedia.org/wiki/Brent_Bolthouse. If I check the JSON, there's no indication that it's an actual page.
I basically just want to implement a simple check to see if there's a wiki page that matches the exact name.
Try capitalizing all parts of the name, e.g.:
"brent bolthouse".titleize
=> "Brent Bolthouse"
I suggest this because the titles of Wikipedia's pages on persons always have that format. While your URL with the lowercase name as the query doesn't work, the URL with the capitalized name does.
Ah, I found out MediaWiki is case sensitive for page titles (apart from the first letter, which Wikipedia capitalizes automatically).
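For the existence check itself, a small Python sketch follows. It assumes the default JSON output of action=query, where "pages" is a dictionary and an entry for a nonexistent title carries a "missing" key. The function name and the sample response (including the page ID) are hypothetical and written by hand for illustration.

```python
def existing_titles(response):
    """Return the titles of pages that exist, given a decoded API response."""
    pages = response.get("query", {}).get("pages", {})
    return {
        page["title"]
        for page in pages.values()
        # Nonexistent pages are flagged "missing"; malformed titles "invalid".
        if "missing" not in page and "invalid" not in page
    }


# Hand-written sample in the shape of a JSON response (illustrative only;
# the page ID is made up).
SAMPLE_RESPONSE = {
    "query": {
        "pages": {
            "-1": {"ns": 0, "title": "Brent bolthouse", "missing": ""},
            "12345": {"pageid": 12345, "ns": 0, "title": "Brent Bolthouse"},
        }
    }
}

if __name__ == "__main__":
    print(existing_titles(SAMPLE_RESPONSE))
```

Adding the redirects parameter to the request may also help, since a differently capitalized title that exists as a redirect is then resolved to its target.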

How do I reverse a t.co URL to the originating Tweet?

I'm going through our site analytics, and have a load of t.co URLs which were referrers to a promotion we were doing. I'm trying to figure out if there is a way to reverse those back to the original tweet where they originated, through the Twitter API or other means. I can't seem to find a good means to do this though, is there one?
This is not possible with the public APIs that Twitter provides.
If I understand correctly, you want to find a tweet that originally had a particular t.co link embedded, i.e. the t.co link, when followed, resolves to your site and not to a tweet.
When a t.co forward points to a tweet, it goes to the web page for that tweet and the HTML for the page will include the canonical URL.
The ugly way to get this information is to use wget or curl to grab the HTML destination which will include the URL for your initial tweet.
A better way to do it is with the Python module, Requests (you will need to install this module first). Here's a quick command line script that will do it:
#!/usr/bin/env python
import requests
shorturl = raw_input("Enter the shortened URL in its entirety: ")
r = requests.get(shorturl)
print("""
The shortened URL forwards to:
%s
""" % r.url)
That code will work on any of those URL shortening services, not just Twitter's t.co site.
I did my testing with Python 2.7. On Python 3.x, replace raw_input with input; the rest of the code works unchanged. Either way, Requests is your friend, see the documentation for details:
http://docs.python-requests.org/en/latest/index.html
The redirection and history section covers this example.
I don't know of a way to do it through the Twitter API and it may not be possible if all URL shortening is automatic. Still an API based solution would only work with the t.co addresses, whereas the code above will work on any other shortened URL or any URL which redirects (e.g. HTTP 301 or 302 response codes) to another location.
Edit (better a bit later than never): After using the above to find where the t.co forward actually points to, there will be three or four types of possible results. The most common being that it is what the OP believes they all are, a shortening to a URL pasted into a tweet and, to be fair, that is what most of them are.
The other possibilities are that it links back to the tweet itself, which usually only appears with some rather long tweets (not sure how much that changes in frequency with the character limit increase); that it forwards to the URL of a status independent of the tweet author's status URL, which is often the case with embedded media (images and video); or that it forwards to the URL of a tweet which is being quote tweeted or retweeted.
Given the OP's original scenario, none of those internal Twitter usages should ever be seen and only the "normal" forwarding is of concern here. Now searching for the t.co address at twitter.com avails us nothing, regardless of what combinations are used.
Searching for the target address, that which is revealed by scripts like the one at the start of this answer, is quite another matter. That will produce the results of every tweet which is publicly accessible and which posted that link. There are, however, some drawbacks, including:
The search results will include tweets where other forwarding services were used as well.
There is no way to tell whether all the tweets which linked to that URL generated the same t.co address or not.
If not, there is no way to see which t.co forward was utilised by which tweet.
Nevertheless, in conjunction with complete referrer logs on a web server, it may be possible to narrow that further, assuming the referrer URL reports the URL of the tweet and not simply twitter.com. That, however, is more likely to be determined by the manner in which the person clicking on the link did so (i.e. were they just seeing the tweet in a stream or had they expanded it enough to display its full URL).
I suspect the effectiveness of referrer logs will be sporadic and likely reduced on smartphones and tablets where the apps in use are less likely to have expanded tweets in that way in order to then provide that data to third party websites.
#!/usr/bin/env python3
import requests
import urllib.parse
shorturl = input("Enter the shortened URL in its entirety: ")
r0 = requests.get(shorturl, verify=True)
t0 = "https://twitter.com/search?f=tweets&q="
t1 = urllib.parse.quote_plus(r0.url)
r1 = requests.get("{0}{1}".format(t0, t1), verify=True)
# the results will be in r1.content
# there may be some benefit from cutting the http:// or
# https:// from r0.url before creating the quoted string in t1.
That, however, is as good as it gets ... without paying Twitter for enhanced data access.
Find out which is the original URL the shortened URL is pointing to e.g. by using a service like http://www.getlinkinfo.com
Paste that original URL into Google's search box
If you are specifically looking for references from Twitter do like this: site:twitter.com "https://example.com"
If you use the Twitter search APIs, you can find tweets that mention the t.co URL (if they're visible to you) and find the link that way.
Here’s some Python for doing that, taken from a longer blog post I wrote:
from requests_oauthlib import OAuth1Session

sess = OAuth1Session(
    client_key=TWITTER_CONSUMER_KEY,
    client_secret=TWITTER_CONSUMER_SECRET,
    resource_owner_key=TWITTER_ACCESS_TOKEN,
    resource_owner_secret=TWITTER_ACCESS_TOKEN_SECRET
)

def find_tweets_using_tco(tco_url):
    """
    Given a shortened t.co URL, return a set of URLs for tweets that use this URL.
    """
    # See https://developer.twitter.com/en/docs/tweets/search/api-reference/get-search-tweets.html
    resp = sess.get(
        "https://api.twitter.com/1.1/search/tweets.json",
        params={
            "q": tco_url,
            "count": 100,
            "include_entities": True
        }
    )
    statuses = resp.json()["statuses"]
    tweet_urls = set()
    for status in statuses:
        # A retweet shows up as a new status in the Twitter API, but we're only
        # interested in the original tweet. If this is a retweet, look through
        # to the original.
        try:
            tweet = status["retweeted_status"]
        except KeyError:
            tweet = status
        # If this tweet shows up in the search results for a reason other than
        # "it has this t.co URL as a short link", it's not interesting.
        if not any(u["url"] == tco_url for u in tweet["entities"]["urls"]):
            continue
        url = "https://twitter.com/%s/status/%s" % (
            tweet["user"]["screen_name"], tweet["id_str"]
        )
        tweet_urls.add(url)
    return tweet_urls
Twitter's t.co URL shortener simply redirects to another URL in the HTTP response. To find that other URL, you only need to fetch the t.co URL and look at the location header in the response. curl can do this:
curl -v <t.co URL>
To extract only the URL from all that information, you can use:
curl -w "%{redirect_url}" <t.co URL>
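The -w option tells curl to print the redirect_url variable after the transfer. Adding -s -o /dev/null suppresses the progress meter and the response body, so that only the URL is printed.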
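The same Location-header lookup can be sketched in Python using only the standard library. This is one possible approach, not the only one: the handler subclass below declines to follow redirects, so urllib reports the 3xx response as an HTTPError whose headers can be inspected. The names _NoRedirect and first_redirect_target are hypothetical.

```python
import urllib.error
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Decline to follow redirects so the 3xx response surfaces as an error."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def first_redirect_target(url, timeout=10):
    """Return the Location header of the first response, or None if no redirect."""
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        with opener.open(url, timeout=timeout):
            return None  # a 2xx response: the URL did not redirect
    except urllib.error.HTTPError as err:
        if err.code in (301, 302, 303, 307, 308):
            return err.headers.get("Location")
        raise


if __name__ == "__main__":
    short_url = input("Enter the shortened URL in its entirety: ")
    print(first_redirect_target(short_url))
```

Unlike following the whole chain, this only reports the first hop, which matches what the curl commands above show.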
A list of tweets that referred to your pages is available directly in Google Analytics, under Social networks and then the Trackbacks menu.
This is how you find the original tweet:
Click the t.co link to find the original URL
Go to https://twitter.com/explore
Copy and paste the link into the "Search Twitter" box
You will see the tweet(s) with the link

URL for a link to Twitter for a specific tweet

I have some Javascript that uses Twitter API to get tweets. I parse the data and use jQuery to generate HTML for the DOM.
An aspect of what I want to display is a "View this tweet" link -- yeah, sorta sounds silly, but it allows a user to get a URL for a specific tweet.
I am generating an a tag with an href. The URL is of the form:
http://twitter.com/{twitter-user-id}/status/{tweet-status-id}
where the content in curly braces is actual data extracted from the tweet (no, I am not including the curly braces). For example:
http://twitter.com/Atechtrader/status/57432099984130050
What happens in operation is that this works for some tweets, but not others. For the ones that fail, the Twitter server responds with content that says the requested page does not exist.
Am I doing something wrong?
https://twitter.com/statuses/ID should work.
It will redirect to the needed status.
Unfortunately, all of the answers provided so far rely on an HTTP redirect.
The direct link is of the form: https://twitter.com/i/web/status/{tweet-status-id}
FYI: id_str is the variable you need to call instead of id
id_str should be taken from the tweet object and replaced in
https://twitter.com/statuses/[id_str]
You can use a URL like:
http://twitter.com/itdoesnotmatter/status/[YOURID]
Twitter redirects based on the status ID, not the username.
It works for desktop and mobile.
You can use
'https://www.twitter.com/'+ user.screen_name+'/status/' + id_str
I've tried it and it works well:
- Web : https://twitter.com/statuses/ID
- Mobile && Web: https://twitter.com/User_ID/statuses/Tweet_ID
I hope it's helpful for you.
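A likely reason why id_str is recommended over id, and why links may work for some tweets but not others: JavaScript numbers are double-precision floats, which cannot represent every integer as large as a tweet ID. The Python sketch below simulates that rounding with float, using the ID from the question. The tweet dictionary is a hand-written stand-in for an API tweet object, and tweet_url is a hypothetical helper.

```python
def tweet_url(tweet):
    """Build a status URL from the string ID, which is never rounded."""
    return "https://twitter.com/%s/status/%s" % (
        tweet["user"]["screen_name"],
        tweet["id_str"],
    )


# Hand-written stand-in for a tweet object, using the ID from the question.
tweet = {"id_str": "57432099984130050", "user": {"screen_name": "Atechtrader"}}

exact_id = int(tweet["id_str"])
# A double can only hold a nearby value, which is what JavaScript would see
# when the numeric id field is parsed as a number.
rounded_id = int(float(exact_id))

if __name__ == "__main__":
    print(tweet_url(tweet))
    print(exact_id == rounded_id)
```

When the rounded value differs from the real ID, a URL built from it points at a status that does not exist, which would match the error described in the question.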

dynamic seo title for news articles

I have a news section where the pages resolve to urls like
newsArticle.php?id=210
What I would like to do is use the title from the database to create seo friendly titles like
newsArticle/joe-goes-to-town
Any ideas how I can achieve this?
Thanks,
R.
I suggest you actually include the ID in the URL, before the title part, and ignore the title itself when routing. So your URL might become
/news/210/joe-goes-to-town
That's exactly what Stack Overflow does, and it works well. It means that the title can change without links breaking.
Obviously the exact details will depend on what platform you're using - you haven't specified - but the basic steps will be:
When generating a link, take the article title and convert it into something URL-friendly; you probably want to remove all punctuation, and you should consider accented characters etc. Bear in mind that the title won't need to be unique, because you've got the ID as well
When handling a request to anything starting with /news, take the next part of the path, parse it as an integer and load the appropriate article.
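The two steps above can be sketched in Python as follows. The platform in the question appears to be PHP, so this only illustrates the idea; the function names and the /news/ prefix handling are assumptions, and the accent handling simply drops anything that cannot be reduced to ASCII.

```python
import re
import unicodedata


def slugify(title):
    """Turn an article title into a lowercase, hyphen-separated URL fragment."""
    # Split accented characters into base letter + mark, then drop the marks.
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse every run of punctuation or whitespace into a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")


def article_url(article_id, title):
    """Generate the link: the ID comes first, the title is decoration."""
    return "/news/%d/%s" % (article_id, slugify(title))


def article_id_from_path(path):
    """Route a request: read only the ID and ignore the title part."""
    match = re.match(r"^/news/(\d+)(?:/|$)", path)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    print(article_url(210, "Joe goes to town"))
    print(article_id_from_path("/news/210/an-outdated-title"))
```

Because routing reads only the ID, an old link with an outdated title still reaches the right article.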
Assuming you are using PHP and can alter your source code (this is quite mandatory to get the article's title), I'd do the following:
First, you'll need to have a function (or maybe a method in an object-oriented architecture) to generate the URLs for you in your code. You'd supply the function with the article object or the article ID and it returns the friendly URL with the ID and the friendly title.
Basically function url(Article $article) => URL.
You will also need some URL rewriting rules to remove the PHP script from the URL. For Apache, refer to the mod_rewrite documentation for details (RewriteEngine, RewriteRule, RewriteCond).

How does a website know the Google query I used to find it?

When I search for something such as "rearrange table columns in asp.net" on Google, and click the link to Wrox's forum site, the site greets me with a message such as "Your Google search for 'rearrange table columns in asp.net' brought you to Wrox Forum...".
How does a site know what query I typed into Google? And how could I add such an ability to my site?
It is parsing your query from the query parameters in the HTTP_REFERER server variable, which contains the URL you're coming from and is provided in your HTTP request.
It uses a header known as the "HTTP referrer". See http://en.wikipedia.org/wiki/HTTP_referrer
To use it in your site, you would need some kind of dynamic page generation, such as ASP / ASP.NET, PHP, or Perl. For example in Perl, you could do something like:
if ($ENV{HTTP_REFERER} =~ /google\.com\/.*[?&]q=([^&]+)/) {
    print "Your google search of $1 brought you to this site";
}
WARNING: The code above is only an example and may not be correct or secure!
Like these guys are suggesting, it's the HTTP_REFERER header variable. The query is in the "q" key in the URL. So if you want to parse that, you can just sort out the querystring and URL decode the "q" variable.
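As a sketch of "sort out the querystring and URL decode the q variable", here is a Python version using the standard library. It assumes the referrer has already been read from the request (how that is done depends on the framework), only recognises google.com hosts, and escapes the query before echoing it into a page. The function names are hypothetical. Note that the referrer may be absent or may not contain the query at all, so the code treats both cases as "no query".

```python
import html
from urllib.parse import parse_qs, urlparse


def search_query_from_referrer(referrer):
    """Return the decoded "q" parameter of a Google referrer URL, or None."""
    parts = urlparse(referrer or "")
    host = parts.hostname or ""
    if host != "google.com" and not host.endswith(".google.com"):
        return None
    values = parse_qs(parts.query).get("q")
    return values[0] if values else None


def greeting(referrer):
    """Build the greeting text, escaped so it is safe to place in HTML."""
    query = search_query_from_referrer(referrer)
    if query is None:
        return ""
    return "Your Google search for '%s' brought you here." % html.escape(query)


if __name__ == "__main__":
    print(greeting("https://www.google.com/search?q=rearrange+table+columns+in+asp.net"))
```

Escaping matters because the referrer is supplied by the visitor's browser and can contain arbitrary text.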
It looks at the referrer header. Here is some fairly basic PHP code to do it.

Resources