Google ImportXML from QGIS metadata file - google-sheets

I am trying to capture elements of an qmd file (that is xml markup) using Google Sheets importxml. Based on How to use importXML function with a file from Google Drive? I think I've got the file importing correctly but can't seem to capture any of the tags.
Here's what I am trying -
=importXML("https://drive.google.com/uc?id=1AI2C8hQnSOuuoyJXizYBszGmpMXW8xxT&export=download","\\identifier")
Here's what the qmd/xml file looks like
<!DOCTYPE qgis PUBLIC 'http://mrcc.com/qgis.dtd' 'SYSTEM'>
<qgis version="3.9.0-Master">
<identifier>Z:/My Drive/Mangoesmapping/Spatial Projects/2019/DSC/132_Ongoing_Asset_Updates/Working/Sewerage_Updates/Sewerage_Manholes_InspectionShafts.TAB</identifier>
<parentidentifier>Sewerage Manhole Infrastructure</parentidentifier>
<language>AUS</language>
<type>dataset</type>
<title>Sewerage Manholes within Douglas Shire Council</title>
<abstract>Sewerage Manholes within Douglas Shire Council. Most data has been updated based on field work, review of existing AsCon files and discussion with council staff responsible for the assets in 2018/2019. In Port Douglas most of the infrastructure has been surveyed in. </abstract>
<keywords vocabulary="gmd:topicCategory">
<keyword>Infrastructure</keyword>
<keyword>Sewerage</keyword>
If I use
=importXML("https://drive.google.com/uc?id=1AI2C8hQnSOuuoyJXizYBszGmpMXW8xxT&export=download","*")
I get
But I really would like to just get the elements I want by placing the importxml for each tag in the cell I need it in.

You want to retrieve ### of <identifier>###</identifier> from https://drive.google.com/uc?id=1AI2C8hQnSOuuoyJXizYBszGmpMXW8xxT&export=download
I could understand like above. If my understanding is correct, how about this answer?
Issue:
In your question, the formula of =importXML("https://drive.google.com/uc?id=1AI2C8hQnSOuuoyJXizYBszGmpMXW8xxT&export=download","\\identifier") uses \\identifier as the xpath. From your data you want to retrieve the values, it seems that you are trying to retrieve ### of <identifier>###</identifier>.
In this case, in order to Selects nodes in the document from the current node that match the selection no matter where they are, // is required to be used instead of \\. This can be seen at the document of here.
Modified formula:
So =importXML("https://drive.google.com/uc?id=1AI2C8hQnSOuuoyJXizYBszGmpMXW8xxT&export=download","\\identifier") can be modified as follows.
=importXML("https://drive.google.com/uc?id=1AI2C8hQnSOuuoyJXizYBszGmpMXW8xxT&export=download","//identifier")
As other xpath, from your data in your question, you can also use the xpath of /qgis/identifier instead of //identifier. So you can also use the following formula.
=importXML("https://drive.google.com/uc?id=1AI2C8hQnSOuuoyJXizYBszGmpMXW8xxT&export=download","/qgis/identifier")
References:
IMPORTXML
XPath Tutorial

Related

Attempting to import from a XPath, seems to always yield blank information

Currently in my google doc, i'm working on a database for my card worth, and it seems like it doesn't want to grab the information no matter what xpath i want to attempt.
Website i'm trying to take information available here. *This is the hyperlink i'm feeding
In the top right corner i'm attempting to grab the worth box information, here is current xpaths i've attempted
"//a[#id='worthBox']/h4"
"/html/body/div[4]/div[1]/div[2]/form/div[1]/div[2]/div/a/h4"
"/h4"
"/h4[0-20]"
"//a[#id='worthBox'][1]/h4"
"//div[#id='estimate-box']/a/h4"
"//div[#id='estimate-box']/a[1]/h4"
Can someone explain to me why it doesn't seem to wanna fetch, is it even possible?
Thank you so much for your time and help!
In the URL, the value is put using the Javascript. But IMPORTXML cannot retrieve the result after Javascript was run. IMPORTXML retrieves the HTML without running Javascript. I think that your xpath is the result after Javascript was run. By this, they cannot be used. But it seems that the value you expect can be retrieved other xpath.
Modified xpath:
//input[#id='medianHiddenField']/#value
Sample formula:
=IMPORTXML(A1,"//input[#id='medianHiddenField']/#value")
In this case, the URL of https://mavin.io/search?q=Lugia%20NM%209%2F111%20-PSA&bt=sold# put in the cell "A1".
Result:
Reference:
IMPORTXML

Google Sheet find cell by URL parameter

I have a database of elements, each element has its own QR Code. After reading the code I would like to be able to open the worksheet on a specific tab and jump to the appropriate cell (according to the element name). Calling a worksheet through a URL with the #gid parameter allows you to open a tab.... the "range" parameter allows you to jump to a specific cell.... and what if I want to search for an item by name? Something like: https://docs.google.com/spreadsheets/d/1fER4x1p.../edit#gid=82420100&search=element_name.... is it possible?
Google has not introduced this yet
But you can look into Google Script (Googles SpreadSheets macros like) to achieve this.
Also a simpler approach will be to just filter the data, but this will change your requirement obviously. For example you can create a Filter with the name you are looking for and then you will get the URL.
This is the URL to a Sample of this, it should open the
Spreadsheet and filter the data when loaded. This is the Icon to
look for to create the filters
here is some documentation for you to get started on Google App Script, but I don't have a direct link to let you know how to catch the parameters for it to process them. What I can tell you is that this is a much more complicated approach than just a URL because it involves programmatic processing on the Spreadsheet side.

Google Spreadsheet getting text with importxml

I've tried this and other versions to no avail? Can anyone help please?
=IMPORTXML("http://performance.morningstar.com/fund/ratings-risk.action?t=MWTRX", "//*[#id='div_ratings_risk']/table/tbody/tr[4]/td[3]/text()")
As explained in the comments to your original question, initially the div Element with the id #div_ratings_risk is initially empty and does not consist of a table.
So Google spreadsheets is not able to parse content that is not there and yet needs to be loaded first.
The content (table) you try to fetch data from into your google spreadsheet is dynamically loaded using jQuery from another URL. You can get that URL using e.g. the chrome developer tools and filter for XHR request.
If you parse the content directly from that HTML it will work. So you would need to change your formula to that URL and adapt your XPath like so:
=IMPORTXML("http://performance.morningstar.com/ratrisk/RatingRisk/fund/rating-risk.action?&t=XNAS:MWTRX&region=usa&culture=en-US&cur=&ops=clear&s=0P00001G5L&ep=true&comparisonRemove=null&benchmarkSecId=&benchmarktype=", "//table/tbody/tr[4]/td[3]/text()")

How to get a note or comment

I have an Excel containing cell comments/notes, I've uploaded it to Google Drive, and converted it into a Google Spreadsheet. How may I retrieve the comments for cell A1?
The API doesn't describe how to get a note or comment from a cell.
The Google Apps Spreadsheet API only provides programmatic access to spreadsheet data. You cannot access comments using the API.
You can get the Notes for a cell or range of cells using Google Apps Script. See the Class Range documentation for these methods:
getNote() - Returns the note associated with the given cell.
getNotes() - Returns the notes associated with the cells in the range.
There are open issues related to this functionality, mainly concerning comments (the "other" type of note). See Issue 1818 and Issue 2566
In a comment on your question, SGC asks if you've looked at developers.google.com/drive/v2/reference/comments/get and developers.google.com/drive/v2/reference/comments/list; these are FILE level comments, not the comments attached to a spreadsheet cell.
1 - Add the following custom function to your spreadsheet (through the Tools -> Script Editor menu)
function getNote(cell) {
return SpreadsheetApp.getActiveSheet().getRange(cell).getComment()
}
2 - on your spreadsheet use the following formula to get contents
=getNote(cell("address",a1))
or
=getNote(cell("address",E9),GoogleClock())
if your comments are not static.
Bit of a muck around but it's possible to download the googlesheet as a .xlsx file, which, with varying degrees of success, converts the comments into Excel comments.
Uploading this xslx file converts the comments to notes!
There is a difference between Note and Comment.
getNote is working fine. You create a Note by using the cell menu "insert Note" , and not "insert Comment"
Very sad, but there is no getComment. The example of Oren did not work for me either.

Google XPATH importxml can find "show" but not "showcount" or "count" [duplicate]

This question already has answers here:
Scraping data to Google Sheets from a website that uses JavaScript
(2 answers)
Closed last month.
Using this webpage as an example http://forums.macrumors.com/showthread.php?t=1688317
On a google spreadsheet, the following DO NOT work with importxml():
//a[contains(#href,"showpost")]/#href
//a[contains(#href,"showcount")]/#href
//*[#id="postcount18545482"]
The last one (//*[#id="postcount18545482"]) was copied directly from Chrome's element viewer.
The following DO work but exclude any results with the word "showcount", "postcount", or "showpost":
//div[contains(#id,"post_message")]/#id
//a[contains(#href,"show")]/#href
//a[contains(#href,"post")]/#href
Is there something special about the word "count" when working with importxml() or XPATH? How can I get the missing entries?
ImportXML function in Google Docs spreadsheet can not process data that is created in a two-step process. For example, when an authentication token must be retrieved first before making the url request, or when the URL tells the server to dynamically create an xml output after which the user is redirected to the output, even when the URL stays the same. You might want to look into Google Apps Scripts (http://code.google.com/googleapps/appsscript/index.html) to handle this case.
Taken from here
In your particular case the anchor parameters get set in the vbulletin_post_loader.js script called after the page container is loaded.
...
pc_obj=fetch_object("postcount"+this.postid);
openWindow("showpost.php?"+(SESSIONURL?"s="+SESSIONURL:"")
+(pc_obj!=null?"&postcount="+PHP.urlencode(pc_obj.name):"")+"&p="+A)
...
In other words, when importXML() scans the page, the nodes containing 'showpost' or 'postcount' in href are not yet on the page:
Looks like importXML() works with static pages only and not able to handle dynamically loaded content.
Try to find another way of obtaining the number of post in a thread.

Resources