trying to webscrape a website but getting 403 forbidden - ruby-on-rails

I'm trying to access Ticketmaster's website to get the prices of events.
However, HTTParty returns a 403 error when I try to scrape the site.
This is all I have:
page = HTTParty.get(ticketmaster_url)
doc = Nokogiri::HTML(page)
The 403 appears on the get call.
Thanks
Sam

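Not sure why you are using HTTParty here. You can simply open and parse the page using open-uri and Nokogiri:
require 'open-uri'
require 'nokogiri'
# URI.open is required on Ruby 3.0+; on Ruby older than 2.5 use open(ticketmaster_url)
doc = Nokogiri::HTML(URI.open(ticketmaster_url))
where ticketmaster_url is the URL of the page you want to extract information from.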

Related

Getting 500 Internal server error with open uri

I have a bookmarking site, built in Ruby on Rails, which needs to open many URLs and extract each page's title and base_uri. The method used for opening a URL is open(url). When I tried to open http://www.mysite.com/ with the open-uri method, I got a 500 Internal Server Error.
OpenURI::HTTPError in TestsController#test
500 Internal Server Error
I can access this URL through a browser.
My code is posted below:
require 'hpricot'
require 'open-uri'
require 'timeout'
require 'net/http'
url = 'http://www.mysite.com/'
@filep = open(url)
base_uri = @filep.base_uri
I tried the same with hpricot too, and with the following code, but I get the same error:
@doc = Nokogiri::HTML(open(url).read)
Please help me with this.
I had the exact same problem: I could open the website in my browser but not through open-uri. Adding a user agent did not fix it, but using RestClient (the 'restclient' library) did:
require 'restclient'
url = 'http://www....'
user_info = RestClient.get(url, "User-Agent" => "Ruby")

Google docs API: can't download a file, downloading documents works

I'm trying out http requests to download a pdf file from google docs using google document list API and OAuth 1.0. I'm not using any external api for oauth or google docs.
Following the documentation, I obtained download URL for the pdf which works fine when placed in a browser.
According to documentation I should send a request that looks like this:
GET https://doc-04-20-docs.googleusercontent.com/docs/secure/m7an0emtau/WJm12345/YzI2Y2ExYWVm?h=16655626&e=download&gd=true
However, the download URL has something funny going on with the parameters. It looks like this:
https://doc-00-00-docs.googleusercontent.com/docs/securesc/5ud8e...tMzQ?h=15287211447292764666&amp\;e=download&amp\;gd=true
(in the url '&amp\;' is actually without '\' but I put it here in the post to avoid escaping it as '&').
So what is the case here; do I have 3 parameters h,e,gd or do I have one parameter h with value 15287211447292764666&ae=download&gd=true, or maybe I have the following 3 param-value pairs: h = 15287211447292764666, amp;e = download, amp;gd = true (which I think is the case and it seems like a bug)?
In order to form a proper HTTP request I need to know exactly what the parameter names and values are, but the download URL I have is confusing. Moreover, if the parameter names are h, amp;e and amp;gd, is a request containing those parameters valid for obtaining the file content? (If not, it seems like a bug.)
I didn't have problems downloading and uploading documents (msword docs) and my scope for downloading a file is correct.
I experimented with different requests a lot. When I treat the 3 parameters (h, e, gd) separately I get 401 Unauthorized. If I assume that I have only one parameter, h, with value 15287211447292764666&ae=download&gd=true, I get 500 Internal Server Error (the Google API states: 'An unexpected error has occurred in the API.', 'If the problem persists, please post in the forum.').
If I don't send any parameters at all, or I send 3 parameters (h, amp;e, amp;gd), I get 302 Found. I tried following the redirects by sending more requests, but I still couldn't get the actual PDF content. I also experimented in the OAuth Playground and it doesn't seem to work as it's supposed to there either: sending a GET request with the download URL responds with 302 Found instead of the PDF content.
What is going on here? How can I obtain the pdf content in a response? Please help.
I experienced the same issue with OAuth 2 (error 401).
I solved it by sending the OAuth 2 token in a request header rather than in the URL.
I replaced &access_token=<token> in the URL with setRequestHeader("Authorization", "Bearer <token>").
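As a rough Ruby sketch of that header-based approach: the question does not say which HTTP client is in use, so Net::HTTP is an assumption here, and build_download_request, the EXAMPLE path segment and PLACEHOLDER_TOKEN are hypothetical names for illustration only. The &amp; in the download URL is most likely just the XML-escaped form of &, so the sketch unescapes it first, which leaves the three parameters h, e and gd. It follows the OAuth 2 Bearer style from the answer; an OAuth 1.0 request would need a signed Authorization header instead.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper, not part of any Google client library.
# Unescapes "&amp;" in the feed-provided URL and attaches the token
# as an Authorization header instead of a query parameter.
def build_download_request(raw_url, token)
  url = raw_url.gsub('&amp;', '&')   # handles only the &amp; entity
  uri = URI.parse(url)
  request = Net::HTTP::Get.new(uri)
  request['Authorization'] = "Bearer #{token}"
  [uri, request]
end

# Placeholder URL modelled on the one in the question (path is made up).
raw_url = 'https://doc-00-00-docs.googleusercontent.com/docs/securesc/EXAMPLE' \
          '?h=15287211447292764666&amp;e=download&amp;gd=true'

uri, request = build_download_request(raw_url, 'PLACEHOLDER_TOKEN')

# The query string now splits into three separate name/value pairs.
params = URI.decode_www_form(uri.query).to_h
puts params.keys.join(', ')
puts request['Authorization']
```

Sending the request would then look like Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(request) }, with any 302 followed by hand.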

Google Calendar Data API Integration

We're using OAuth to grab Calendar event data. I have successfully authorized the token and exchanged it for an access token. When I perform a GET request to the API endpoint I get a page that says "Moved Temporarily" with a link to something like https://www.google.com/calendar/feeds/default?gsessionid=xxxxxxxxxxxx
I'd like to interpret the response, whether it's JSON or XML, but I can't get beyond the redirect it's throwing out. Any idea how to follow this?
Here's my call to the feed:
access_token = current_user.google.client
response = access_token.get(ConsumerToken::GOOGLE_URL).body
Yep, just dealt with this myself. It says "Moved Temporarily" because it's a redirect, which the oauth gem unfortunately doesn't follow automatically. You can do something like this:
calendar_response = client.get "http://www.google.com/calendar/feeds/default"
if calendar_response.kind_of? Net::HTTPFound # a.k.a. 302 redirect
calendar_response = client.get(calendar_response['location'])
end
This might be worthy of a patch to oauth...
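The snippet above follows a single redirect. If the server chains several, a small loop with a hop limit may be safer. The following is only a sketch: get_following_redirects is a hypothetical helper, not something the oauth gem provides, and it assumes the client returns Net::HTTP response objects, as the kind_of? check above suggests.

```ruby
require 'net/http'

# Hypothetical helper: yields each URL to the caller's block, which must
# perform the request and return a Net::HTTPResponse. Follows Location
# headers until a non-redirect response arrives or the limit is hit.
def get_following_redirects(url, limit: 5)
  limit.times do
    response = yield(url)
    # Net::HTTPRedirection covers the 3xx classes (301, 302, 303, 307, ...).
    redirect = response.is_a?(Net::HTTPRedirection) && response['location']
    return response unless redirect
    url = response['location']
  end
  raise "Too many redirects (limit #{limit})"
end

# Usage with the client from the answer (not run here, needs a real token):
#   calendar_response = get_following_redirects(
#     "http://www.google.com/calendar/feeds/default"
#   ) { |u| client.get(u) }
```

Passing the request as a block keeps the helper independent of the OAuth client, so the same loop works with any object whose get returns a Net::HTTP response.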

Google Geocoding API request_denied

I am trying to geocode a batch of around 400 addresses using the Google Geocoding API through my rails app.
In one of my controllers I have these lines
require "net/http"
require "uri"
uri = URI.parse("http://maps.googleapis.com/maps/api/geocode/json?")
response = Net::HTTP.post_form(uri, {"address" => '5032-forbes-ave', 'sensor' => 'false'})
But I always get back "status": "REQUEST_DENIED" from that.
Does anyone know why I am getting this, or if there is a way I can see exactly the HTTP request that is being sent so I can try to debug it?
Update: This is the request that I am trying to make, if I do this from my browser I get a normal response from the api: http://maps.googleapis.com/maps/api/geocode/json?address=5032-forbes-ave&sensor=false
You are POSTing the request, but in your browser you are using GET.
So this works perfectly:
uri = URI.parse("http://maps.googleapis.com/maps/api/geocode/json?address=5032-forbes-ave&sensor=false")
response = Net::HTTP.get(uri)
The response is itself a String containing JSON (I'm no expert on Google's APIs).
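For a batch of addresses it may help to build the query string programmatically and parse the JSON body to check the status field. The sketch below makes no network call; geocode_uri and geocode_status are hypothetical helper names, and sample_body is a made-up response used only to show the parsing step. Whether the endpoint still accepts requests in this form is not something the thread covers.

```ruby
require 'json'
require 'uri'

GEOCODE_ENDPOINT = 'http://maps.googleapis.com/maps/api/geocode/json'

# Hypothetical helper: builds the GET URL, URL-encoding the parameters.
def geocode_uri(address)
  uri = URI.parse(GEOCODE_ENDPOINT)
  uri.query = URI.encode_www_form('address' => address, 'sensor' => 'false')
  uri
end

# Hypothetical helper: extracts the "status" field from a JSON body string.
def geocode_status(body)
  JSON.parse(body)['status']
end

uri = geocode_uri('5032-forbes-ave')
puts uri

# Made-up body for illustration; a real one would come from Net::HTTP.get(uri).
sample_body = '{"status": "REQUEST_DENIED", "results": []}'
puts geocode_status(sample_body)
```

With a live request, geocode_status(Net::HTTP.get(geocode_uri(address))) would give the status for each address in the batch.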

Adobe Flex 3 : Fault Event doesnt return XML Feed sent from Server

I am working on a Flex application which communicates with a Rails backend.
When I request some data, it sends back an XML feed.
In some cases, if the given parameters are not valid, Rails returns an error feed with status code 422, as follows:
email is wrong
But I don't get this feed in the FaultEvent in Flex. How can I read the error feed?
Thanks
Are you getting the result in ResultEvent in such cases? I am not sure which HTTP error codes cause FaultEvent to be invoked (I only know that it fires for 404 and 500). Maybe the response is still going to ResultEvent as a valid result!
You can use HTTPService instead of URLLoader.
Flex HTTP results will not include the actual underlying HTTP response codes. It just doesn't work. (TM)
