XPath queries to scrape data

XPath queries to scrape data - google-sheets

I'm using Copy XPath from Chrome to create my queries. It works very well but not for this question.
Here is the site I scrape data from.
One query that works (number next to "Senaste NAV-kurs" in table 1)
=IMPORTXML("http://www.di.se/di-fonder/fonddetaljer/?InstrumentId="&1085603;"//*[#id='fund-summary-wrap']/div[1]/dl[2]/dd/text()" )
But when I copy XPath from table with title "AVKASTNING" i'm getting no data, pls help
=IMPORTXML("http://www.di.se/di-fonder/fonddetaljer/?InstrumentId="&1085603;"//*[#id='ctl00_FourColumnWidthContent_ThreeColumnsContent_MainAndSecondColumnContent_fundInfo_fundPerformance_tableFund']/tbody/tr[4]/td[2]/span" )

If you are willing to try it another way, the following will get the "AVKASTNING" table.
=IMPORTHTML("http://www.di.se/di-fonder/fonddetaljer/?InstrumentId=1085603","table",7)
If you want a specific value from the table use index. The following example gets the second row second column value:
=index(IMPORTHTML("http://www.di.se/di-fonder/fonddetaljer/?InstrumentId=1085603","table",7),2,2)

Related

SELECT statement with multiple conditions on TAGs

For considerably long period of time I’ve been struggling the following problem. This is an example of data stored in the DB:
> show series
flights,cycleId=1535,cycleIdx=0,engineId=2,flightId=1696,flightIdx=0,type=fil
flights,cycleId=1535,cycleIdx=0,engineId=2,flightId=1696,flightIdx=0,type=std
flights,cycleId=1535,cycleIdx=0,engineId=2,flightId=1696,flightIdx=0,type=raw
...
and my intention is to select a specific one by using a query like this:
SELECT * FROM flights WHERE type='fil' AND engineId= '2' AND flightId = '1696' AND flightIdx = '0' AND cycleId = '1535' AND cycleIdx = '0'
Such query, however, yields always zero results. Zilch.
Selecting the first (and only) tag works fine:
SELECT * FROM flights WHERE cycleId = '1535'
but using this condition on any other tag, like for example
SELECT * FROM flights WHERE type='fil'
does never return a single row. Querying only the first tag and nothing else works.
Could you please give me a hint what am I doing wrong? From all I have found people are always selecting just by a single tag but never more. What is the part that I cannot see?
Many thanks for any ideas!

I believe I have discovered the reason: two keys from the tags made by mistake their way into the fields. I spotted the trouble when listing the tag and fields keys as
show tag keys
show field keys
Deleting all records does not remove the keys from these lists and the problem persists. One need to drop the entire database to restore the order of things.

App Inventor Fusion Table Column calling

I developed an app based on App Inventor and Fusion Tables. When I want to update total money by adding some money to already existing money it is giving some error.
When I use SELECT command to get information from fusion table it is taking then number with column Name. When i am trying to add both of them it is giving following error.
Error message

The result from a fusiontable includes always the header row...
from your example SELECT statement the result is
TotalPaid
5000
Obviously to add any value to that result will result in an error, because you only can add numeric values...
You first have to extract the value (in your example 5000) from the result. Convert the result into a list using the split block, just split at \n (new value) to get a list, then select the 2nd item using the select list item block.
Note: to be able to update something in a table, you need the ROWID, see also the SQL Reference Documentation of the Fusion Tables API.
For UPDATE statements the first step to be done is to get the ROWID of the row to be updated with a SELECT statement. The second step is to do the UPDATE.

Fusion tables query does not fetch some

I'm having an issue on this table: 1g_ydg74ooUSBzNfQBHOIdgrOKhxZD_92In8xTDg
I'm trying to fetch some results by CODE_DEPT. I used the filter
CODE_DEPT IN ('001', '002', '003', '02A', '02B')
and only the 3 first are fetched.
Any idea what's going on ?
Cheers

Looks like CODE_DEPT is identified as a numeric column. The last two codes are not numeric and so would not match anything. If you change the column to type Text and you should be OK.

How do I parse a table into its meaningful chunks?

I need to extract a table of data on a collection of pages. I can already traverse the pages just fine.
How do I extract the table's data? I'm using Ruby and Nokogiri, but I would assume that this is a pretty general issue.
I underlined the desired data points in each row in the following image.
A sample of the html is: http://pastebin.com/YYFPbFLC
How would I parse this table into a hash via Nokogiri into the meaningful chunks?
The table's xpath is:
/html/body/table/tbody/tr/td[2]/table/tbody/tr[2]/td/table/tbody/tr/td[2]/table/tbody/tr/td/table/tbody/tr/td[2]/table
The table has a variable number of rows of data and formatting rows. I only want to collect the rows with meaningful data, but I don't readily see a way to distinguish this via an XPath except the second column will reliably have "keyword" in it. Each of these rows have an XPath of:
1st meaningful row is: /html/body/table/tbody/tr/td[2]/table/tbody/tr[2]/td/table/tbody/tr/td[2]/table/tbody/tr/td/table/tbody/tr/td[2]/table/tbody/tr[2]
...
Last meaningful row: /html/body/table/tbody/tr/td[2]/table/tbody/tr[2]/td/table/tbody/tr/td[2]/table/tbody/tr/td/table/tbody/tr/td[2]/table/tbody/tr[N]
The first meaningful column that needs to match text content on the "keyword" is:
/html/body/table/tbody/tr/td[2]/table/tbody/tr[2]/td/table/tbody/tr/td[2]/table/tbody/tr/td/table/tbody/tr/td[2]/table/tbody/tr[2]/td[2]
The last column of this first row of data would be:
/html/body/table/tbody/tr/td[2]/table/tbody/tr[2]/td/table/tbody/tr/td[2]/table/tbody/tr/td/table/tbody/tr/td[2]/table/tbody/tr[2]/td[6]
Each row is a record and has a timestamp with this column/td being the time in the timestamp; The year, month and day are all in their own variables and can be appended for a full timestamp:
/html/body/table/tbody/tr/td[2]/table/tbody/tr[2]/td/table/tbody/tr/td[2]/table/tbody/tr/td/table/tbody/tr/td[2]/table/tbody/tr[2]/td[5]

The first rule of XPath is: never use the autogenerated XPath from Firebug or other browser tool. This creates brittle XPath that treats all page elements as equally important and required, even parts you don't care about. For example, if a notice went up at the top of the page and it happened to be in a table, it could throw off your parsing.
Instead, think about how a human would identify it. In this case, you want "the first table under the heading with the word 'today' in it". Here's the XPath for that:
//table[preceding-sibling::h2[contains(text(), "today")]][1]
This says take the tables that have a preceding h2 (in other words, that follow the h2), where the h2 contains the word "today". Then take the first such table.
Then you need to identify the rows you are interested in. Note that some rows are just dividers containing a single td, so you want to make sure you only parse the rows that have multiple td tags. In XPath, that is:
//tr[td[2]]
Then you just grab the content of all the columns. In the first one you can remove everything before the words "of magnitude" to get just the value. Putting it all together:
doc = Nokogiri::HTML.parse(html)
events = []
doc.xpath('//table[preceding-sibling::h2[contains(text(), "today")]][1]//tr[td[2]]').each do |row|
cols = row.search('td/text()').map(&:to_s)
events << {
:magnitude => cols[0].gsub(/^.*of magnitude /,''),
:temp_area => cols[1],
:time_start => cols[2],
:time_middle => cols[3],
:time_end => cols[4]
}
end
The output is:
[
{:magnitude=>"F1.7",
:temp_area=>"0",
:time_start=>"01:11:00",
:time_middle=>"01:24:00",
:time_end=>"01:32:00"},
{:magnitude=>"F3.1",
:temp_area=>"0",
:time_start=>"04:01:00",
:time_middle=>"04:10:00",
:time_end=>"04:26:00"},
{:magnitude=>"F3.5",
:temp_area=>"134F55",
:time_start=>"06:24:00",
:time_middle=>"06:42:00",
:time_end=>"06:53:00"},
{:magnitude=>"F1.4",
:temp_area=>"0",
:time_start=>"11:58:00",
:time_middle=>"12:06:00",
:time_end=>"12:16:00"},
{:magnitude=>"F1.0",
:temp_area=>"0",
:time_start=>"13:02:00",
:time_middle=>"13:05:00",
:time_end=>"13:09:00"},
{:magnitude=>"D53.7",
:temp_area=>"134F55",
:time_start=>"17:37:00",
:time_middle=>"18:37:00",
:time_end=>"18:56:00"}
]

Customizing jQuery UI Autcomplete

I want to make a few customizations to jQuery UI Autcomplete:
1) If there are no results found it should output "no results found" in the list.
2) Is it possible to highlight/bold the letters in the results as they are being typed? For example if I type "ball" and I have "football" in my results it needs to output as foot ball
3) Is it possible for the results that appear at the top to match the beginning of the string. For example suppose I have 3 entries in my database:
Astrologer
Space Station
Star
I start typing "st" - this will bring up those 3 entries in that order. But I want "Star" to be the first result.
The MySQL query being used at the moment to generate the results is:
$query = mysql_query("SELECT id, name FROM customer WHERE name LIKE '%".$_GET['term']."%' ORDER BY name");

You can simply echo 'No results found' inside the script that returns the list if num rows from your mysql_query is 0.
This was possible in the original Autocomplete plugin but I can't see it anywhere in the JQuery UI documentation.
You may have to run two separate mysql queries - the first one looking for LIKE '".$_GET['term']."%' and the second one as you have it, but excluding the results you've already got from the first query.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

XPath queries to scrape data - google-sheets

Related

SELECT statement with multiple conditions on TAGs

App Inventor Fusion Table Column calling

Fusion tables query does not fetch some

How do I parse a table into its meaningful chunks?

Customizing jQuery UI Autcomplete

Categories

Resources