Understanding website and API URLs?

I have noticed that many sites have URLs that are generated and interpretable. For example, there is the Google search one below:
https://www.google.com/search?q=example+search&oq=example+search&aqs=chrome..69i57j0j69i60j0l3.2087j0j1&sourceid=chrome&ie=UTF-8
This type of pattern seems consistent across many different web services. What is the best practice for choosing URLs, and how could I learn more?

If I understand your question, you are referring to the key-value pairs separated by ampersands, right?
This is the query string of a GET request, which passes information to the server through the URL itself. The format is website.com/path/to/page?variable1=value1&variable2=value2, and so on.
The question mark indicates the beginning of the key-value pair section.
POST requests serve a similar purpose, but their data is sent in the request body rather than the URL, so they can contain more data.
Try looking here: https://www.w3schools.com/tags/ref_httpmethods.asp
This format is standard, but not essential. You could make your server interpret URLs any way you want; however, HTML forms submitted with method="get" will automatically generate this format. This page has more info, some of which is relevant to your question: https://www.w3schools.com/html/html_forms.asp.
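As a minimal sketch of how these pairs are built and read, here is the same idea in Ruby (standard library only; the URL is the question's Google example, trimmed down):

```ruby
require 'uri'

# Build a query string from key-value pairs; percent-encoding is handled for us.
query = URI.encode_www_form(q: 'example search', ie: 'UTF-8')
url   = "https://www.google.com/search?#{query}"
# => "https://www.google.com/search?q=example+search&ie=UTF-8"

# Parse the pairs back out of a URL.
URI.decode_www_form(URI(url).query)
# => [["q", "example search"], ["ie", "UTF-8"]]
```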

Related

Is it possible to detect uniquely-named files if you have the URL path?

Let's say someone has posted a resource at https://this-site.com/files/pdfs/some_file_name.pdf
Another resource is then posted, whose name we don't know. However, the path is the same: https://this-site.com/files/pdfs/another_unique_resource98237219.pdf
Is it possible to detect when a new PDF is posted to this location? Or would we have to know more about the backend infrastructure? Keeping in mind that:
None of the other pieces of the URL are valid paths; in other words, https://this-site.com/files/pdfs and https://this-site.com/files both return 404 errors.
The names of the files are unique and do not follow a specific pattern.
If this is not possible, what are other ways you might inspect the request/response infrastructure to look for resources posted to that URL?
My first suggestion is to look for another page that lists the resources available on the website, assuming the site owner actually provides such a page.
The other method is effectively brute-forcing all the URLs under that path. You will need to collect some SOCKS proxies to use with your crawlers so you can distribute your requests among multiple IP addresses; otherwise the server will probably block your IP address. If you can determine the minimum and maximum number of characters in the file names (not the pattern, just the length), this operation can be drastically optimized; see the sketch below.
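For illustration, a minimal Ruby sketch of the probing step; the candidate names here are made up, and proxy rotation is left out (plain Net::HTTP does not speak SOCKS without an extra gem such as socksify):

```ruby
require 'net/http'

# Hypothetical candidate names; a real run would enumerate every name
# within the known length bounds, which grows quickly with length.
candidates = %w[some_file_name.pdf another_unique_resource98237219.pdf]

candidates.each do |name|
  uri = URI("https://this-site.com/files/pdfs/#{name}")
  res = Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
    http.head(uri.request_uri)  # HEAD request: we only care whether it exists
  end
  puts "#{name}: #{res.code}"   # "200" means the file is there, "404" means not
end
```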

How to know what are all the possible parameters for a query string for a site?

I want to find ALL the possible parameters for an existing website URL. Assuming the site works with a query-string parameter "architecture" (and not MVC, for example), something like:
http://www.foobar.com/p1?itemsPerPage=50&size=500
Let's say there are other parameters which I don't know exist and don't currently see in the URL, for example parameters like max, day, and OtherExoticVariable. Again, I don't know their names, but I want to know ALL of their names. Is there some way of asking the server to respond with all possible URL parameters?
I would prefer a method with JavaScript that I could run quickly through a browser, but I could also do ASP.NET/C# if necessary.
Thanks a lot!
Ray.
It is the script/app running on the server that decides which parameters are valid. Unless the app provides such a query mechanism, you can't do it. The server itself has no idea what is valid and what isn't.
Not guaranteed to get you ALL query strings, but it is often helpful to Google "foobar.com/p1& * ".
You will be able to see all the public occurrences of query strings for the foobar.com website.
(As the accepted answer says, there is no general method to access query strings unless the website provides an API.)
I do not think this is possible. Each Web application designer can decide on the parameters individually, and you only know them if you see them being used.

Twitter streaming API not tracking URLs

I have gone through https://dev.twitter.com/docs/streaming-apis/parameters
Per the documentation it should be able to track URLs such as example.com/foobarbaz, but I can't get it to track such URLs. It just doesn't return any results when I tweet this URL and track it using the Streaming API. Am I missing something?
Pretty late, but I found this via Google, so it might help someone...
There are a few answers to this, the main one being that Twitter treats URLs differently from anything else.
First, make sure you do NOT include the "www".
Twitter currently canonicalizes the domain “www.example.com” to “example.com” before the match is performed, so omit the “www” from URL track terms.
For me, sending the track parameter as "example.com/foobarz" and then tweeting "a test, please ignore: http://example.com/foobarz" worked perfectly.
You can NOT, in general, ask for substrings of URLs:
URLs are considered words for the purposes of matches which means that the entire domain and path must be included in the track query for a Tweet containing an URL to match.
But if you are willing to take every tweet from the whole domain (and a few edge cases besides), Twitter will accommodate:
Finally, to address a common use case where you may want to track all mentions of a particular domain name (i.e., regardless of subdomain or path), you should use “example com” as the track parameter for “example.com” (notice the lack of period between “example” and “com” in the track parameter).
All quotes are from the Twitter docs: https://dev.twitter.com/streaming/overview/request-parameters#track
They have more information, including examples.
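To make the shape of the request concrete, here is a rough Ruby sketch of a call to the statuses/filter endpoint with the track parameter set as above. OAuth signing is omitted, so as written this would be rejected with a 401; a real client would sign the request (for example with the oauth gem) or use a streaming library:

```ruby
require 'net/http'
require 'uri'

uri = URI('https://stream.twitter.com/1.1/statuses/filter.json')

Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
  req = Net::HTTP::Post.new(uri)
  req.set_form_data('track' => 'example.com/foobarz')  # note: no "www"

  # The connection stays open; matching tweets arrive as
  # newline-delimited JSON chunks.
  http.request(req) do |res|
    res.read_body { |chunk| puts chunk }
  end
end
```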
Good luck!

Ruby on Rails 3: search external website source based on top Google result

I'm having a hard time finding out where to start with this one. I want to pull information from an external website and put some of its content on my page. I think I need two things: 1) a Google search that gives me the URL of the top result for the name of my current object, and 2) a way to examine the source of that result and output the contents of a tag with a specific class.
To better explain this, I'll create a hypothetical situation: say I have a website that lists mattresses and gives reviews, and I want to add other websites' reviews. On one such website there's a tag containing something like 3.5/5, and I want to display this review along with a link to the external page. Is there a way to search Google like "site:http://mattressreviewsite/ #{mattress.name}", pull the top URL, and then search the source for the string "class='rating'" and display it in my view?
Thanks for any help or guidance. I'm using Rails 3.
You need an HTTP client (httparty, or the built-in net/http) for that, plus some parsing to get the required results.
Go study the URL patterns of Google (the search endpoint is https://www.google.com/search?q=search+terms) and use the HTTP client for requests (GET/POST). Parse the result (there are many HTML parser gems available too, such as Nokogiri) to get what you need and for any subsequent HTTP requests. And don't forget the 'I'm Feeling Lucky' feature of Google, which returns only one result.
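A rough Ruby sketch of the second step (the URL is a placeholder for whatever the Google search returned, and the .rating selector assumes the external page really marks its score up that way):

```ruby
require 'net/http'
require 'nokogiri'  # gem install nokogiri

# Placeholder URL: in the real flow this would be the top search result.
url  = URI('http://mattressreviewsite/reviews/some-mattress')
html = Net::HTTP.get(url)

# Find the first element with class="rating" and pull out its text.
rating = Nokogiri::HTML(html).at_css('.rating')
puts rating ? rating.text : 'no rating found'  # e.g. "3.5/5"
```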
All the best!

What is the best URL strategy to handle multiple search parameters and operators?

Searching with multiple parameters
In my app I would like to allow the user to do complex searches based on several parameters, using a simple syntax similar to the Gmail functionality where a user can search for "in:inbox is:unread" etc.
However, GMail does a POST with this information and I would like the form to be a GET so that the information is in the URL of the search results page.
Therefore I need the parameters to be formatted in the URL.
Requirements:
Keep the URL as clean as possible
Avoid the use of invalid URL chars such as square brackets
Allow lots of search functionality
Have the ability to add more functions later.
I know StackOverflow allows the user to search by multiple tags in this way:
https://stackoverflow.com/questions/tagged/c+sql
However, I'd like to also allow users to search with multiple additional parameters.
Initial Design
My current design is to use URLs such as these:
http://example.com/search/tagged/c+sql/searchterm/transactions
http://example.com/search/searchterm/transactions
http://example.com/search/tagged/c+sql
http://example.com/search/tagged/c+sql/not-tagged/java
http://example.com/search/tagged/c+sql/created/yesterday
http://example.com/search/created_by/user1234
I intend to parse the URL after the /search/ segment and then decide how to construct my search query; a sketch of what I mean is below.
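A minimal sketch of that parsing step, assuming the segments after /search/ always come in operator/value pairs as in the examples above:

```ruby
# Turn "/search/tagged/c+sql/not-tagged/java/created/yesterday" into
# {"tagged" => "c+sql", "not-tagged" => "java", "created" => "yesterday"}.
path     = '/search/tagged/c+sql/not-tagged/java/created/yesterday'
segments = path.sub(%r{\A/search/}, '').split('/')
query    = segments.each_slice(2).to_h

# Multi-valued operators split further on "+".
tags = query.fetch('tagged', '').split('+')  # => ["c", "sql"]
```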
Has anyone seen URL parameters like this implemented well on a website?
If so, which do it best?
What you have here isn't a bad start.
Some things to keep in mind: there is a length restriction on URLs, roughly 2000 characters in IE. Weigh that in the battle between SEO and readability vs. brevity.
I'm not aware of any standards in this arena outside of common sense, which it appears you've captured.
Another thing to keep in mind is that most search engines use standard URL params, e.g. http://www.google.com/search?hl=en&source=hp&q=donkeys+for+sale&aq=f&aqi=g10&aql=&oq=&gs_rfai=
There is a good reason for this, namely URL encoding: it allows non-traditional characters in the search terms.
So while pretty URLs are nice, they fail here for a variety of reasons.
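To see why, compare how the same user input fares as an encoded query parameter versus dropped raw into a pretty URL (a quick Ruby illustration):

```ruby
require 'uri'

# Arbitrary search input survives as a query parameter, because
# encode_www_form percent-encodes anything unsafe.
URI.encode_www_form(q: 'C# & SQL: 50% off?')
# => "q=C%23+%26+SQL%3A+50%25+off%3F"

# Placed raw in a path segment, the same text would break the URL:
# "/search/tagged/C# & SQL: 50% off?" is not a valid URL at all.
```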
