Scraping using Nokogiri


I'm trying to scrape information for a project from the Oil and Gas Authority open data site, but my code returns no data.
(The website I'm trying to scrape from)
http://data-ogauthority.opendata.arcgis.com/datasets/ab4f6b9519794522aa6ffa6c31617bf8_0?uiTab=table
I have also realized that the site has an API, but I do not know how to call an API in Rails. If anybody can assist, it would be greatly appreciated.

You can get the data using the requests module:
import requests
url = 'http://data-ogauthority.opendata.arcgis.com/datasets/ab4f6b9519794522aa6ffa6c31617bf8_0.geojson'
r = requests.get(url, timeout=30)
r.raise_for_status()  # raise an exception on an HTTP error status
data = r.json()  # the JSON response body parsed into a dict
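
Once the data is in a dict, the individual records still have to be pulled out. A minimal sketch, assuming the download is a standard GeoJSON FeatureCollection (the .geojson extension suggests this, but I have not checked this particular dataset). The helper name feature_properties and the sample field names are made up for illustration:

```python
def feature_properties(geojson):
    """Return the properties dict of every feature in a GeoJSON FeatureCollection."""
    rows = []
    for feature in geojson.get("features", []):
        # "properties" may be null in GeoJSON, so fall back to an empty dict.
        rows.append(feature.get("properties") or {})
    return rows


# Hypothetical sample standing in for the downloaded data; field names are invented.
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"NAME": "Well A", "STATUS": "Active"}, "geometry": None},
        {"type": "Feature", "properties": {"NAME": "Well B", "STATUS": "Plugged"}, "geometry": None},
    ],
}

rows = feature_properties(sample)
for row in rows:
    print(row)
```

With the real download you would call feature_properties(data) instead of passing the sample.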

Related

Importing Data from XML L Data of 9000 urls

Hi, I am trying to import the author of a blog from a URL using the formula:
=INDEX(IMPORTXML(A264,"//span[@class='auth-name']"),1)
It works for some URLs, but for others the data doesn't load and the cell shows the error "Loading data".
Please suggest what to do.
I got part of what I tried for, but not completely.

The content returned by requests.get(URL) does not match with the content on the webpage, why is that?

I'm trying to learn web scraping, but when I request this page using
import requests
url = 'https://fbref.com/en/players/fa/'
page = requests.get(url)
the page resembles what I'd get if I went one step down in the URL (https://fbref.com/en/players/), i.e. not including the "fa/".
My guess is that this redirection has to do with cookies or something similar. Is there a way to bypass this?
Thanks in advance!
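
One way to check whether a redirect is actually happening is to look at the response's history. A minimal sketch, not a diagnosis of this particular site: redirect_chain is a made-up helper name, and the stand-in objects below only mimic the .history, .status_code and .url attributes that a requests.Response exposes, so the example runs offline.

```python
from types import SimpleNamespace


def redirect_chain(response):
    """Return (status_code, url) for every hop, ending with the final response."""
    hops = [(r.status_code, r.url) for r in response.history]
    hops.append((response.status_code, response.url))
    return hops


# Stand-in objects (hypothetical URLs) in place of a real requests.Response.
first_hop = SimpleNamespace(status_code=301, url="https://example.com/a/b/", history=[])
final = SimpleNamespace(status_code=200, url="https://example.com/a/", history=[first_hop])

chain = redirect_chain(final)
for status, hop_url in chain:
    print(status, hop_url)
```

With a real request you would pass the result of requests.get(url). If the chain has only one entry, no HTTP redirect took place and the difference comes from somewhere else. Passing allow_redirects=False to requests.get returns the first response without following redirects.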

How to specify a language using Insight API for Twitter from IBM Bluemix Platform

I'm starting to use the Insight API for Twitter from IBM Bluemix.
It's hard to find supporting resources for it. So far I am using cURL with a specifically formed URL to query the API service, and the service returns results in JSON format.
Here's an example of the URL I used with cURL to search for some tweets with the API:
https://(my security key)@cdeservice.mybluemix.net:443/api/v1/messages/search?q=$MSFT%28posted:2016-01-01T00:00:00Z,2016-09-01T00:00:00Z%29&size=20
This URL returns a JSON object containing tweets with the keyword #MSFT, posted between 2016-01-01 and 2016-09-01, limited to 20 tweets.
I would like to extend that URL to specify a language for the tweets to search for, but so far I have come up empty. Can you please help me?
I have tried adding the following to the URL, but none of them had any effect:
lang=EN, lang="en"
lang:en, lang:EN
Thanks.

The syntax is lang:en and you need to make sure to include it as part of your query.
I created the following query based on the one you provided in your question:
https://username:password@cdeservice.mybluemix.net:443/api/v1/messages/search?q=(%24MSFT%20AND%20posted%3A2016-01-01T00%3A00%3A00Z%2C2016-09-01T00%3A00%3A00Z%20AND%20lang%3Aen)&size=20
The unencoded query is
($MSFT AND posted:2016-01-01T00:00:00Z,2016-09-01T00:00:00Z AND lang:en)
You can find documentation here.
But in this link you can find more details on the syntax, which is:
/api/v1/messages/search?q=QUERY&size=NUMBER&from=NUMBER
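
Encoding the query by hand is error-prone, so it can help to let a library do it. A minimal sketch using Python's standard urllib.parse.quote: build_search_path is a made-up helper name, and leaving the parentheses unescaped is a choice made only to match the encoded query shown above.

```python
from urllib.parse import quote


def build_search_path(query, size):
    """Percent-encode the query and return the search path with its parameters."""
    # Keep parentheses literal, as in the encoded URL above; other reserved
    # characters such as "$", ":", "," and spaces are percent-escaped.
    encoded = quote(query, safe="()")
    return f"/api/v1/messages/search?q={encoded}&size={size}"


query = "($MSFT AND posted:2016-01-01T00:00:00Z,2016-09-01T00:00:00Z AND lang:en)"
path = build_search_path(query, 20)
print(path)
```

Append the resulting path to the host and credentials part of your own URL.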

Google reader public RSS get more than 9 items

We need to parse the data from a Google Reader public RSS feed. The problem is that the URL parameter n=numberofitemstoretrieve only works up to n=9.
For example in our test url:
http://www.google.com/reader/shared/user%2F15926769355350523044%2Flabel%2FPublicas%20RSS?n=2
Retrieves 2 news items
http://www.google.com/reader/shared/user%2F15926769355350523044%2Flabel%2FPublicas%20RSS?n=20
Retrieves only 9 news items
How can we overcome this limitation? Is there another parameter for this case? Or another method?

We found that with this alternative URL the n parameter works fine:
https://www.google.com/reader/api/0/stream/contents/feed/http://www.google.com/reader/public/atom/user%2F15926769355350523044%2Flabel%2FPublicas%20RSS?n=20
The only problem is that the output format is different this way, so if someone finds a better solution we will accept that answer instead.
It seems the results are cropped only when the URL is viewed in the browser. If you fetch the contents from code, the original URL returns the correct item count. In contrast, the alternative URL returns the right contents both ways, whether fetched from code or viewed in the browser.

In Atom format (link in the top right of the two URLs in the OP):
http://www.google.com/reader/public/atom/user%2F15926769355350523044%2Flabel%2FPublicas%20RSS?n=20
The content with /api/ in the URL in the second post is in JSON format, slightly harder to parse than the Atom XML.
https://webapps.stackexchange.com/questions/26567/how-to-raise-google-reader-rss-feed-entry-limit
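
For the Atom variant, the standard library's XML parser is enough. A minimal sketch, assuming a standard Atom 1.0 feed: the sample feed below is invented so the example runs offline, and parse_entries is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Atom 1.0 namespace; every element in the feed lives under it.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Invented sample feed standing in for the downloaded XML.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Sample feed</title>
  <entry>
    <title>First item</title>
    <link rel="alternate" href="http://example.com/1"/>
  </entry>
  <entry>
    <title>Second item</title>
    <link rel="alternate" href="http://example.com/2"/>
  </entry>
</feed>"""


def parse_entries(xml_text):
    """Return the title and first link href of each Atom entry."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        link = entry.find("atom:link", ATOM_NS)
        href = link.get("href") if link is not None else None
        entries.append({"title": title, "link": href})
    return entries


entries = parse_entries(SAMPLE_FEED)
for item in entries:
    print(item["title"], item["link"])
```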

Parsing Bing News Search API Results

Hey, I am trying to parse Bing News Search API results using regex, but I am finding it really hard. Can anyone tell me how to extract (1) Snippet, (2) URL, and (3) Name from all the results (10 is the default number) returned in one response?
This is the response that I am receiving from Bing for a query (there are 5 results in it):
http://ideone.com/yd8yl

You don't need to use a regular expression for parsing the Bing response. The response is in JSON (JavaScript Object Notation) format, and depending on your programming environment, you may use an appropriate library to parse it. Please check http://www.json.org/ if you are not familiar with what JSON is.
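
As an illustration of parsing with a JSON library instead of a regex, here is a minimal Python sketch. The payload shape is an assumption: the top-level "results" key and the "Name", "Url" and "Snippet" field names are hypothetical and need to be adjusted to match the actual response. extract_results is a made-up helper name.

```python
import json

# Hypothetical response body; the real Bing payload is shaped differently.
raw = """
{
  "results": [
    {"Name": "Story one", "Url": "http://example.com/one", "Snippet": "First snippet"},
    {"Name": "Story two", "Url": "http://example.com/two", "Snippet": "Second snippet"}
  ]
}
"""


def extract_results(raw_json):
    """Return (name, url, snippet) tuples for every result in the response."""
    payload = json.loads(raw_json)
    return [
        (item.get("Name"), item.get("Url"), item.get("Snippet"))
        for item in payload.get("results", [])
    ]


results = extract_results(raw)
for name, url, snippet in results:
    print(name, url, snippet)
```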
