How do short URLs work? [duplicate]

Possible Duplicate:
How do short URL services work?
Hi,
Can anybody explain how short URLs (technically) work, and for how long they are valid? Any articles about how they work are welcome too (but please no example provider sites).
Thank you in advance.

The short URL server has a database matching the short URL (or rather, the coded part of the URL) to the actual URL it represents.
When it gets a request, it looks up the coded part and sends a redirect to the actual URL.
So, for example, a request for the URL http://tinyurl.com/so-hints
1. will go to the tinyurl server,
2. the server will look up what full URL matches so-hints,
3. the server will issue a redirect telling the browser to go to the full URL.
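A minimal sketch of that lookup-and-redirect flow, using only Python's standard library, with an in-memory dict standing in for the real database (the short code and target URL below are made up for illustration):

from http.server import BaseHTTPRequestHandler, HTTPServer

# In a real service this would be a database table; a dict stands in here.
SHORT_TO_LONG = {
    "/so-hints": "https://stackoverflow.com/questions/tagged/hints",  # made-up target
}

class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = SHORT_TO_LONG.get(self.path)
        if target:
            # 301 = permanent redirect; the browser then fetches the full URL itself.
            self.send_response(301)
            self.send_header("Location", target)
        else:
            self.send_response(404)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), RedirectHandler).serve_forever()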

Create a unique identifier for a given URL and store it in a database;
when a user visits the short URL, look up the original URL in the database,
and return an HTTP 3xx (redirect) status code to the client with the actual address.
Short URLs usually use a combination of digits and letters. Even a case-insensitive set of 26 letters + 10 digits = 36 characters gives 36^6 = 2,176,782,336 unique IDs for a six-character path component, and including uppercase letters (62 characters) raises that to 62^6, roughly 56.8 billion.
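One common way to derive such an identifier (a sketch, not any particular service's scheme) is to encode the row's auto-increment database ID in base 36:

import string

ALPHABET = string.digits + string.ascii_lowercase  # 36 characters

def encode_id(n: int) -> str:
    """Convert a numeric database ID to a short base-36 string."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, len(ALPHABET))
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))

def decode_id(s: str) -> int:
    """Convert a short string back to the numeric database ID."""
    n = 0
    for ch in s:
        n = n * len(ALPHABET) + ALPHABET.index(ch)
    return n

# encode_id(127591) -> "2qg7"; decode_id("2qg7") -> 127591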
If you want to study some source code, this article highlights 7 open source scripts:
7 Open Source And Free URL Shortener Scripts To Create Your Own

There's just a relational database with a table that maps a short, high-entropy string to a given URL. The short strings are created each time someone asks for one. It's not any form of encryption; it's just a lookup.
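As a toy illustration of that table plus lookup, assuming SQLite and a random URL-safe string as the "high-entropy" part (all names here are invented):

import secrets
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE urls (code TEXT PRIMARY KEY, long_url TEXT NOT NULL)")

def shorten(long_url):
    # A new high-entropy code is generated on every request; no encryption involved.
    # A real service would retry on the (unlikely) primary-key collision.
    code = secrets.token_urlsafe(4)  # ~6 URL-safe characters
    conn.execute("INSERT INTO urls VALUES (?, ?)", (code, long_url))
    return code

def resolve(code):
    row = conn.execute("SELECT long_url FROM urls WHERE code = ?", (code,)).fetchone()
    return row[0] if row else None

code = shorten("https://example.com/a/very/long/path")
assert resolve(code) == "https://example.com/a/very/long/path"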

In its simplest form it is just a key that is matched to a URL. From there you can add functionality.
Have a look at the spec for the Google shortener as they have a pretty balanced feature set: http://code.google.com/apis/urlshortener/v1/getting_started.html

They maintain a mapping from short URLs to long URLs and redirect each request for a short URL to its original one.

Related

How to know all the possible query string parameters for a site?

I want to check what ALL the possible parameters are for any existing website URL. Assuming the site uses a query-string "architecture" for its parameters (and not MVC-style routes, for example), something like:
http://www.foobar.com/p1&itemsPerPage=50&size=500
Let's say there are other parameters which I don't know exist and don't currently see in the URL, for example parameters like max, day and OtherExoticVariable. Again, I don't know their names but want to know ALL of their names. Is there some way of requesting the server to respond with all possible URL parameters?
I would prefer a method using JavaScript that I could run quickly through a browser, but could also do ASP.NET/C# if necessary.
Thanks a lot!
Ray.
It is the script/app running on the server that decides what parameters are valid. Unless the app provides such a query mechanism you can't do it. The server has no idea what is valid and what isn't.
Not guaranteed to get you ALL query strings, but it is often helpful to Google
"foobar.com/p1& * ".
You will be able to see all the public occurrences of query strings for the foobar.com website.
(As the accepted answer says, there is no general method to access query strings unless the website provides an API.)
I do not think this is possible. Each Web application designer can decide on the parameters individually, and you only know them if you see them being used.

Twitter streaming API not tracking URLs

I have gone through https://dev.twitter.com/docs/streaming-apis/parameters
Per the documentation it should be able to track URLs such as example.com/foobarbaz, but I can't seem to get it to track such URLs. It just doesn't return any results when I tweet this URL and track it using the Streaming API. Am I missing something?
Pretty late, but I found this via Google, so it might help someone...
There are a few answers to this. The main one is that Twitter treats URLs differently than anything else.
First, make sure you do NOT include the "www".
Twitter currently canonicalizes the domain “www.example.com” to “example.com” before the match is performed, so omit the “www” from URL track terms.
For me, sending the track parameter as "example.com/foobarz" and then tweeting "a test, please ignore: http://example.com/foobarz" worked perfectly.
You can NOT, in general, ask for substrings of URLs:
URLs are considered words for the purposes of matches which means that the entire domain and path must be included in the track query for a Tweet containing an URL to match.
But if you are willing to take every tweet from the whole domain (and a few more edge cases), Twitter will accommodate:
Finally, to address a common use case where you may want to track all mentions of a particular domain name (i.e., regardless of subdomain or path), you should use “example com” as the track parameter for “example.com” (notice the lack of period between “example” and “com” in the track parameter).
All quotes are from the Twitter docs: https://dev.twitter.com/streaming/overview/request-parameters#track
They have more information, including examples.
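To make the rules concrete, here is a small Python helper; the name and interface are mine, and it merely encodes the normalizations quoted above (strip the scheme and "www", or turn a bare domain into a space-separated term for domain-wide tracking):

def track_term(url: str, whole_domain: bool = False) -> str:
    """Build a Streaming API 'track' term per the rules quoted above."""
    term = url.removeprefix("http://").removeprefix("https://")
    term = term.removeprefix("www.")  # Twitter canonicalizes away "www"
    if whole_domain:
        # "example.com" -> "example com": matches any subdomain or path
        term = term.split("/")[0].replace(".", " ")
    return term

# track_term("http://www.example.com/foobarz")            -> "example.com/foobarz"
# track_term("http://www.example.com", whole_domain=True) -> "example com"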
Good luck!

How to improve the structure of URLs

From the article at Google's Webmaster Central and the SEO PDF, I think I should improve my website's URL structure.
Right now a news URL looks like "news.php?id=127591". I want to rewrite it to something like "/news/127591/this-is-article-subject".
The problem is: if I change the URL structure to the new one, can I still keep the old one working? And if both URLs work, how do I stop search engines like Google and Bing from indexing the same article twice?
Thanks!
Use an HTTP 301 permanent redirect from the old URL to the new URL.
An HTTP 301 redirect communicates a new (permanent) URL for an old (outdated) resource to Google (and other clients). Google will transfer most/all of the value allocated to the old URL to the new URL.
Also, in order to improve the architecture of your website, you should keep a clean structure by inserting links within all its pages/posts. But be careful: you must not do this lightly, or Google's robot will get confused and leave.
Structure is key to your SEO:
1. Find the one page which is the "really important page" for any given keyword.
2. Link to it from other pages whose content is relevant to that particular keyword.
3. Repeat for every relevant keyword.
I'll leave this post for you, where I explain this more in depth, hoping that you understand Spanish: http://coach2coach.es/la-estructura-web-es-la-base-del-posicionamiento/
Yep, you can use robots.txt to exclude news.php and create an XML sitemap with the new URLs. mod_rewrite can be set to only change directories, with trailing slashes, so all files in your root directory should work fine (a sketch of such rules follows below).
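For concreteness, a hypothetical .htaccess sketch combining both answers: 301-redirect old-style links, and internally rewrite the new pretty URLs onto the existing script. The patterns are illustrative only and would need adapting to the real site:

# Hypothetical Apache .htaccess sketch, assuming mod_rewrite is enabled.
RewriteEngine On

# 301-redirect old-style requests (/news.php?id=127591) to the new URL form.
# THE_REQUEST is checked so this rule only fires on the original client
# request, not on the internal rewrite below (which would otherwise loop).
RewriteCond %{THE_REQUEST} \s/news\.php\?id=([0-9]+)\s
RewriteRule ^news\.php$ /news/%1/? [R=301,L]

# Internally map the new URLs (/news/127591/this-is-article-subject)
# back onto the existing script, invisibly to the client.
RewriteRule ^news/([0-9]+)(/[^/]*)?$ news.php?id=$1 [L]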

How would I find all the short urls that link to a particular long url?

Basically I want to know how many people have tweeted a link to a URL, but since there are dozens of link shorteners out there, I don't see any way to do this without having access to all of their URL maps. I found a previous question here, but it was over a year old and didn't have any new answers.
So #1, does anyone know of a service/API for doing this?
And #2, can anyone think of a way to accomplish this task other than submitting the long URL in question to all the popular link-shortening sites?
P.S. I'm also open to comments about why this is impossible or impractical.
You could perform a Google search (or the equivalent via API) for any pages that link to your page. This is done with the link: keyword. So if you're trying to figure out how many people link to www.example.com (regardless of whether it's through a link-shortener URL), you would just do a Google search for link:www.example.com.
e.g.: http://www.google.com/search?q=link:www.example.com
Note that this will only find pages that have been indexed, so pages that haven't been crawled, or pages that get crawled infrequently, will not show up in the results until a later date (if at all).
Since all sites have different algorithms for shortening the URLs, and these are different sites that most likely do not share their data with each other, how can you hope to find all of them in a single or small number of queries?
All you can do is brute-force it, and even then this might not be any good if a site is happy to create a new value for the same long-form URL (especially if you send a different long-form URL that maps to the same place, like http://www.stackoverflow.com/ rather than http://stackoverflow.com/).
For this to really work, there would have to be a site that ALREADY automatically collects all of this information and that the URL-shortening sites voluntarily report to. And even if you wrote such a site, that doesn't account for the URL-shortening sites already out there that already have data!
In short, I do not see how this is remotely possible, unless I'm wrong about there being such a database somewhere out there.
So, months after asking this question, I came across the solution to a similar question: how to tell how many times a link has been shared on Facebook. The solution, via a simple API call:
http://graph.facebook.com/http://stackoverflow.com
returns the following JSON data:
{
  "id": "http://stackoverflow.com",
  "shares": 1627
}
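Fetching that from Python is straightforward with the standard library; note the endpoint form is taken verbatim from the answer above, and the Graph API has since changed (newer versions require an access token and return a different shape):

import json
import urllib.request

# Endpoint form quoted from the answer above; may require an access token today.
with urllib.request.urlopen("http://graph.facebook.com/http://stackoverflow.com") as resp:
    data = json.load(resp)

print(data.get("shares"))  # e.g. 1627 at the time of the answer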

Track a short URL generated for a long URL

I'm writing a URL shortener similar to tinyurl and I'm wondering how to keep track of URLs that have already been shortened using my service. For example, tinyurl generates the same tiny URL for the same long URL regardless of who creates it. How can this be achieved in a way that scales? Bitly also does this, though they generate a new URL per person; yet they are able to track the aggregate (total number of) clicks on the long URL. How?
Thanks,
They store the URLs in their database, associated with the short URL(s). How else would it be done?
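One way to keep that "same long URL -> same short URL" check scalable, sketched below: index the table by a hash of the long URL, so the dedup test is a single keyed lookup rather than a scan. The schema and names are illustrative, not how tinyurl or bitly actually work:

import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE urls (
        url_hash TEXT PRIMARY KEY,   -- hash of the long URL, for keyed dedup lookups
        long_url TEXT NOT NULL,
        code     TEXT UNIQUE NOT NULL,
        clicks   INTEGER DEFAULT 0   -- aggregate clicks can be tallied per long URL
    )
""")

def get_or_create(long_url):
    """Return the existing short code for a URL, or create one if it's new."""
    url_hash = hashlib.sha256(long_url.encode()).hexdigest()
    row = conn.execute("SELECT code FROM urls WHERE url_hash = ?",
                       (url_hash,)).fetchone()
    if row:
        return row[0]  # same long URL -> same short URL, regardless of who asks
    code = url_hash[:7]  # placeholder scheme; a real service handles collisions
    conn.execute("INSERT INTO urls (url_hash, long_url, code) VALUES (?, ?, ?)",
                 (url_hash, long_url, code))
    return code

assert get_or_create("http://example.com/x") == get_or_create("http://example.com/x")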
