Can someone point me to a good resource for Net::HTTP? I'm trying to understand why certain code works the way it does. For example:
require 'net/http'
require 'uri'

def url_check(domain)
  parsed = URI.parse(domain).host
  check = Net::HTTP.new(parsed).head('/').kind_of? Net::HTTPOK
  ( check == true ? "up" : "down" )
end
I understand 95% of the above code, but I can't find any resources that explain what .head('/') is doing. I'm hoping someone can point me to a good, beginner-friendly resource.
HEAD is an HTTP request method that returns just the HTTP headers, without a message body.
head("/") returns the HTTP headers sent by the server in response to the request URI "/", i.e. the root of the website. It is commonly used as a quick check of whether a page and/or site exists without fetching the entire HTML page.
You probably need to learn something about the HTTP protocol as well.
GET, POST, HEAD, PUT, DELETE, OPTIONS, TRACE are some common methods that come to mind right now; there are a couple more. You will have a better understanding of the code once you understand the basics of HTTP.
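To make the HEAD method concrete, here is a minimal sketch in Python's standard http.client module (example.com is just a placeholder host):

import http.client

# Send a HEAD request for "/" and inspect the result.
# HEAD responses carry a status code and headers, but no body.
conn = http.client.HTTPConnection("example.com")
conn.request("HEAD", "/")
response = conn.getresponse()
print(response.status)        # e.g. 200 if the page exists
print(response.getheaders())  # the same headers a GET would return
conn.close()

The status code alone is usually enough to decide whether a site is "up", which is exactly what the url_check method above relies on.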
First, you must know that I'm a total beginner; I'm trying to learn, so I hardly know anything yet.
On the API's main page, there is a curl command used as an example to show us how to make requests.
I'm using Ruby on Rails, so I used the "curl-to-ruby" website to translate it, but it did not work as expected.
I wanted it to show me this:
uri = URI.parse("REQUEST_URL")
response = JSON.parse(Net::HTTP.get(uri))
Instead, I got this:
uri = URI.parse("REQUEST_URL")
response = Net::HTTP.get_response(uri)
I don't understand any of this; I thought I wouldn't need to and could just use "curl-to-ruby", but apparently I really do need to understand it.
Would you please try to explain it to me?
Or give me links?
Or material to read (curl, APIs, HTTP)?
Thank you very much, have a nice day.
It's because that command doesn't return just the content; it returns the whole HTTP response object, including headers and body. You need to extract the response body and parse that using JSON.parse(), e.g.
JSON.parse(response.body)
See documentation here: https://docs.ruby-lang.org/en/2.0.0/Net/HTTP.html#method-c-get_response
(Also, there is nothing in the cURL command that would hint to the converter that the content type of the response was expected to be JSON (e.g. an Accept header or something), so even if it were able to produce extra code adding the JSON.parse part, it has no way of knowing that doing so would be appropriate in this case.)
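If a comparison in another language helps: the same two-step pattern, fetch the full response object and then parse only its body as JSON, looks like this in Python (the URL is a stand-in for your REQUEST_URL):

import json
import urllib.request

# Fetch the whole HTTP response, then parse just the body as JSON.
with urllib.request.urlopen("http://example.com/api.json") as response:
    data = json.loads(response.read().decode("utf-8"))
print(data)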
I am using two different HTTP POST utilities (the Poster add-on for Firefox as well as the Python requests library) to post a simple SPARQL insert to Virtuoso.
My URL is: http://localhost:8890/sparql
My request parameters are:
default-graph-uri: <MY_GRAPH>
should-sponge: soft
debug: on
timeout:
format: application/xml
save: display
fname:
I put the actual SPARQL (INSERT DATA { GRAPH...) in the content of the message.
I tried different content types, none of which worked. I do get a 200, but the response is HTML even though the parameter set above specifies application/xml, and no data is inserted. When I try a content type of text/turtle, I get 409 Invalid Path, which is also referenced in this post.
I can successfully do an HTTP GET; however, that has a payload-length limitation which I would like to exceed for performance reasons. The only difference with the GET is that the SPARQL goes in the URL under the query parameter, whereas the POST should enable a much larger payload in the message content, by including multiple triples in the same request rather than just one (I have hundreds of thousands of inserts). I was trying to follow this documentation page.
I stopped by this question days ago trying to achieve the same thing with curl. Since curl is a powerful (and far more convenient) alternative to browser extensions, here is the formulation that eventually proved successful:
curl -X POST \
-H "Content-Type:application/sparql-update" \
-H "Accept:text/html" \
--data "select distinct ?Concept where {[] a ?Concept} LIMIT 100" http://localhost:8890/sparql
More details on the headers in this thread.
If you are using Python, I would avoid the requests library here. There are dedicated libraries for RDF which abstract the process and make your life easier.
Try:
SPARQLWrapper
RDFLib
They are both from the same rdflib family of packages.
Based on experience, I find SPARQLWrapper significantly simpler and easier to use for your use case. It's an abstracted version of RDFLib. The docs suggest something like this could work:
from SPARQLWrapper import SPARQLWrapper, POST

sparql = SPARQLWrapper("https://example.org/sparql")
sparql.setCredentials("some-login", "some-password")  # if required
sparql.setMethod(POST)  # this is the crucial option
sparql.setQuery("""
<QUERY GOES HERE>
""")  # use str.format() here if you want to parse some variables into the query
results = sparql.query()
print(results.response.read())
Make sure you add the option for POST. You should be doing bulk I/O in no time :).
There are many aspects to this "question", making it difficult to provide a simple answer suitable to this site. This is one of the reasons I suggested the mailing list, which is better suited to conversational and/or multi-faceted assistance.
Have you tried using curl as most of our examples do?
Looking at the Poster page on Mozilla Add-Ons, I see that you may need to manually add a ? to the end of your target URI -- so http://localhost:8890/sparql? rather than http://localhost:8890/sparql -- and it's not clear whether you've done that in your testing. On the project page, I also note its last commit was in 2012, and there are a great many open issues.
I'm not at all familiar with Python, so I've not dug in there.
Have you tried setting an Accept: header? This can have significant impact on the content returned by the server.
If I understand your described efforts correctly, your format: query parameter should be output-format:, and its value should not be application/xml but one of the supported formats listed in the documentation.
Neither the virtuoso-users post you referenced nor this question have enough detail to analyze the cause of the 409 Invalid Path error. Explicit details that allow us to reproduce this result would be helpful, optimally in a distinct thread.
This seems to be a Virtuoso-specific issue. You can only post a query by using the content type "application/sparql-update" instead of "application/sparql-query", which is the common one.
The request is done as follows with Python:
from requests import Session, Response
from requests.adapters import HTTPAdapter

headers = {
    'Content-Type': 'application/sparql-update',
    'Accept': 'application/json'
}
s = Session()
s.mount(server_url, HTTPAdapter(max_retries=1))
# sparql_string holds the SPARQL text to execute (e.g. your INSERT DATA statement)
response: Response = s.post(server_url, data=sparql_string, headers=headers, timeout=100)
return response.json()
In the Kohana 3.1.x framework, what are the benefits of sending data with internal requests like this:
$post = Request::factory('module/data')
    ->method(Request::POST)
    ->post(array('some' => 'random data'))
    ->execute()
    ->response;
when you could simply send data like this:
Module::instance()->data(array('some' => 'random data'));
In this example, Module is a random module and data is some random method. I'll call this Module via Ajax and internal requests. I'm planning to design a RESTful API.
QUESTION IS: Why use HMVC instead of just using an internal class API directly?
Because they're internal requests, there is no additional HTTP request being made.
You might want to take a look at Request_Client_Internal and compare it to Request_Client_External. After that you should feel enlightened :)
Edit:
You should know that AJAX requests aren't the only "external HTTP requests": cURL, PECL HTTP, file_get_contents() and other PHP functions will also send an external HTTP request (IMHO you should read RFC 2616 to understand how HTTP actually works).
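If the distinction is hard to picture, here is a rough analogy in Python (hypothetical names, nothing to do with Kohana's actual API): an internal request is just an in-process call routed through the same dispatcher, while an external request is a real HTTP round trip:

import urllib.request

# Hypothetical in-process "router": an internal request is a plain function call.
ROUTES = {'module/data': lambda params: {'echo': params}}

def internal_request(route, params):
    # No socket, no HTTP parsing -- just a lookup and a call in the same process.
    return ROUTES[route](params)

def external_request(url):
    # A real HTTP round trip: TCP connection, request line, headers, body.
    with urllib.request.urlopen(url) as resp:
        return resp.read()

print(internal_request('module/data', {'some': 'random data'}))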
With HMVC calls you can use the same controller for both Ajax and internal requests. It can also handle standard (non-Ajax) HTTP requests, form submits for example. An all-in-one solution, with a single entry point.
If you don't want HMVC calls, you will need one call for internal requests (somewhere in a base controller) and another one in a special Ajax controller. You may also have problems with data rendering (usually HMVC and Ajax calls use different templates). It's not DRY.
I would have commented on what biakaveron said above, but I can't yet, so I'm putting it as an answer.
#stacknoob: Could you use Module::instance()->data(array(...)) as a controller's action? You could, with some extra code.
Instead, as biakaveron already said, you can keep your code logic in one place and have the action return the same result for Ajax and HMVC requests. DRY + KISS.
Suppose you are working on an API, and you want nice URLs. For example, you want to provide the ability to query articles based on author, perhaps with sorting.
Standard:
GET http://example.com/articles.php?author=5&sort=desc
I imagine a RESTful way of doing this might be:
GET http://example.com/articles/all/author/5/sort/desc
Am I correct? Or have I got this REST thing all wrong?
I'm afraid your question really misses the point of REST. From a purely theoretical perspective, there is absolutely no advantage or disadvantage to either of those URLs from a REST perspective. In practice, those URLs may behave differently with different caches, and server frameworks will certainly parse them differently. Despite what you hear from framework developers, there is no such thing as a RESTful URL.
From the perspective of REST, those two URLs are simply identifiers that can be dereferenced. If you want to start building REST APIs that will benefit from the characteristics described in the dissertation, you need to start thinking in terms of the content that is returned when you dereference the URL and how that content is linked together using URLs embedded in the content.
I realize this does not help you much in trying to resolve what you consider to be your problem. What I can tell you is that one of the major intents of REST is to allow your URLs to be completely under the control of the server, so that they can change without impacting your client applications. Therefore, my recommendation is to pick whatever URL structure works most easily with the framework you are using to serve the resource representations. Certainly do not look to the REST dissertation to tell you the right and wrong way of formatting your URLs, and anyone who tells you that your URLs are not RESTful is confused. Probably what they are telling you is that the server framework they are used to using for creating RESTful interfaces requires URLs to be structured this way.
It's not what your URI looks like that matters, it is what you do with it that matters.
Using a query string is not more or less RESTful than using path components. The URI Generic Syntax (RFC 3986, January 2005) defines that they're just as important in identifying the resource. So yes, as others point out, it's not important to REST. (Note that in the obsoleted-by-RFC-3986 RFC 2396, the query string was not defined to be identifying the resource, but rather a string of information to be interpreted by the resource.)
However, URI design is important, because as an owner of a URI namespace (i.e. the holder of the domain name where the URIs will live) you want the URIs to be long lived. As wise men have stated earlier: Cool URIs don't change!
The choice of using query strings vs path components depends on how your resources are identified, and how they will be identified in years to come. If there's a hierarchy that stands out, then it might be that this should be reflected in the URI, at least if that hierarchy is relatively permanent, and that things don't move around all the time.
It's also important to note that the actual URIs are only meaningful to two parties:
Servers, which need to forge and parse URIs
Human beings, who might see a URI in passing and learn things from it.
By contrast, client applications are usually not allowed to do URI introspection. So your choice of query strings vs path components boils down to what you think you can live with ten (or 100) years from now.
You are mostly right. The thing with REST APIs is to focus on the nouns.
What does the noun all do in this case? Wouldn't you expect your API to always return all articles, unless you filter it?
I would make sort a query string parameter; further, I would make any and all filtering query string parameters. If you look at how Stack Overflow is implemented, when you click on the "Newest" questions link you get a query string that filters the questions.
So perhaps something like:
GET http://example.com/articles/authors/5?sort=desc
But also think about what happens with each URL:
GET http://example.com/articles/ might return all current articles
GET http://example.com/articles/authors/ What does this URL do? Does it return all authors of all articles, or all the articles for all authors (which is essentially the same functionality as the URL above)?
GET http://example.com/articles/authors/5/ might return all articles by author 5, or does it return author 5's information?
I would maybe change it to:
http://example.com/articles returns all articles
http://example.com/articles/5 returns all articles from author 5
http://example.com/authors returns all authors
http://example.com/authors/5 returns information for author 5
Alan is mostly right, but his URLs are misleading. I believe the correct routes/URLs should reflect the following behavior (sketched in code after the list):
[GET] http://domain.com/articles #=> returns all articles (index action)
[GET] http://domain.com/articles/5 #=> returns article ID 5 (show action)
[GET] http://domain.com/authors #=> returns all authors (index action)
[GET] http://domain.com/authors/5 #=> returns author ID 5 (show action)
[GET] http://domain.com/authors/5/articles OR http://domain.com/articles/authors/5 #=> depending on the hierarchy of your routes (both belong to the index action)
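If it helps to make these routes concrete, here is a minimal sketch in Python using Flask; the sample data and handler names are hypothetical, not from the thread:

from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical in-memory stand-ins for a real data store.
ARTICLES = [
    {'id': 1, 'author_id': 5, 'title': 'First'},
    {'id': 2, 'author_id': 5, 'title': 'Second'},
]
AUTHORS = {5: {'id': 5, 'name': 'Alice'}}

@app.route('/articles')
def index_articles():
    # Sorting stays in the query string, e.g. /articles?sort=desc
    desc = request.args.get('sort', 'asc') == 'desc'
    return jsonify(sorted(ARTICLES, key=lambda a: a['id'], reverse=desc))

@app.route('/authors/<int:author_id>')
def show_author(author_id):
    return jsonify(AUTHORS[author_id])

@app.route('/authors/<int:author_id>/articles')
def index_author_articles(author_id):
    return jsonify([a for a in ARTICLES if a['author_id'] == author_id])

Note how the path segments identify resources (articles, authors, one author's articles) while filtering and sorting live in the query string.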
One of our websites has a URL like this: example.oursite.com. We decided to move our site to a URL like this: www.oursite.com/example. To do this, we wrote a rewrite rule on our Apache server that redirects to the new URL with a 301 code.
Many websites link to us with URLs of the form example.oursite.com/#id=23. The problem is that the redirection erases the hash part of the URL in IE. As far as I know, the hash part is never sent to the server.
I wanted to implement the redirection with JavaScript to keep the hash part, but then the search engines would not be aware that our URL changed (no 301 code returned).
I want the search engines to be notified of our new URL (301) because we need to transfer the page rank to the new URL.
Is there a way to redirect with a 301 code and keep the hash part (#id=23) of the URL?
Search engines do in fact care about hash fragments; they frequently use them to highlight specific content on a page.
As to the question, however: anchor locations are unfortunately not sent to the server as part of the HTTP request. If you want to redirect a user, you will need to do this in JavaScript on the client side.
Good article: http://web.archive.org/web/20090508005814/http://www.mikeduncan.com/named-anchors-are-not-sent/
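You can see the split for yourself: the fragment is a purely client-side component that never appears in the request line. A quick illustration in Python (the URL is just an example):

from urllib.parse import urlsplit

# Only the scheme, host, path and query can ever reach the server;
# the fragment is kept by the browser.
parts = urlsplit('http://example.oursite.com/#id=23')
print(parts.netloc)    # 'example.oursite.com'
print(parts.path)      # '/'      <- what the server receives
print(parts.fragment)  # 'id=23'  <- never sent over the wire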
Seeing as the server will never see the # (ruling out 301 Redirects) and Google has deprecated their AJAX Crawling scheme, it seems that a front-end solution is the only way!
How I did it:
(function() {
  // Map old hash URLs to their new locations
  var redirects = [
    ['#!/about', '/about'],
    ['#!/contact', '/contact'],
    ['#!/page-x', '/pageX']
  ];
  for (var i = 0; i < redirects.length; i++) {
    if (window.location.hash == redirects[i][0]) {
      // replace() keeps the redirect out of the browser history
      window.location.replace(redirects[i][1]);
    }
  }
})();
I'm assuming that because Google's crawlers do indeed execute JavaScript, the new pages will be indexed properly.
I've put it in a <script> tag directly underneath the <title> tag, so that it gets executed before any other JS/CSS. Note that this script should only be required for your index file.
I am fairly certain that the hash/page anchor/bookmark part of a URL is not indexed by search engines, and therefore has no effect on your page ranking. A Google search for "inurl:#" returns zero documents, which backs up my assumption. Links from external sites will be indexed without the hash.
You are right that the hash part isn't sent to the server, so as far as I am aware, there isn't a good way to create a redirection URL with the hash in it.
Because of this, it's up to the browser to correctly manage the hash during a redirect. Firefox 3.5 appears to do this successfully. If you append a hash to a URL that has a known redirect, you will see the URL change in the address bar to the new location, but the hash stays on there successfully.
Edit: In response to the comment below, if there isn't a hash sign in the external URL for the part you need, then it is entirely possible to rewrite the URL. An Apache rewrite rule would take care of it:
# Redirect the old subdomain to the new path, keeping the request path
RewriteCond %{HTTP_HOST} ^exemple\.oursite\.com [NC]
RewriteRule ^/?(.*) http://www.oursite.com/exemple/$1 [L,R=301]
If you're not using Apache, then you'll have to look into the server docs for something similar.
Google has a special syntax for AJAX applications that is based on hash URLs: http://code.google.com/web/ajaxcrawling/docs/getting-started.html
You could create a page at the old address that catches all requests and redirects to the new site with the correct address and code.
I did something like that, but it was in ASP.NET, which I guess is not the language you use. Anyway, there should be a way to do this in any language.
When returning status 301, your server is supposed to return a 'Location:' header which points to the new location. In practice, the way this is implemented varies; some servers provide the full URL (netloc and path), some just provide the new path and expect the browser to look for that path on the original netloc. It sounds like your rewrite rule is stripping the path.
An easy way to see what the returned Location header is, in the Python shell:
>>> import http.client
>>> conn = http.client.HTTPConnection('exemple.oursite.com')
>>> conn.request('HEAD', '/')
>>> res = conn.getresponse()
>>> print(res.getheader('location'))
I'm afraid I don't know enough about mod_rewrite to tell you how to do the rewrite rule correctly, but this should give you an idea of what your server is actually telling clients to do.
Search bots don't care about hash fragments. And if you are using them for some kind of Flash or AJAX calls, you have more serious problems than your 301 redirects not working: unless you have the content in an alternate form, the search engines are not indexing your site, and you are definitely suffering as far as SEO goes.
I just registered my account, so I can't edit.
zombat: I'm sorry, I made a mistake in my comment. The link to our video is exemple.oursite.com/#video_id=233. In this case, my Apache rewrite rule doesn't work.
Nick Berardi: We changed the way our links work. We no longer use #, except for backward compatibility.