Google Sheets importxml xpath to parse marcxml

I'm trying to parse XML (MarcXML) in Google Sheets.
For example, I'm trying to get the value of the subfield with code="a" in the datafield with tag="245".
MarcXML example I'm trying to parse:
https://www.loc.gov/standards/marcxml/Sandburg/sandburg.xml
Google Sheets formula I tried:
=importxml(A1;"//datafield[@tag='245']/subfield[@code='a']")
However, with the above formula I get the dreaded error "Imported content is empty."
When I use this:
=importxml(A1;"/*")
it does output something (all the values together...)
Since Google Sheets' importxml outputs something with the XPath "/*", I think what I'm trying to do should work in theory. Could someone make a suggestion?
Thank you!

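I'm not sure of the exact problem with your XPath query, but it is probably the default namespace that MarcXML declares (xmlns="http://www.loc.gov/MARC21/slim"): with it in place, unprefixed names such as datafield and subfield don't match.
Anyway, this XPath works fine based on the structure of your sample data: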
=importxml(A1,"//*[@tag=245]/*[@code='a']")
It searches for any element that has a tag attribute with the value 245, and then for any child element that has a code attribute with the value 'a'.
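The wildcard XPath works, but it hides why the named one fails. A likely explanation (an assumption, not confirmed in the thread) is the MARC21 slim default namespace. Sheets formulas can't run outside Sheets, so this sketch uses Python's standard-library xml.etree.ElementTree on a small hypothetical MARCXML fragment: the unqualified path from the question matches nothing, while a namespace-qualified path finds the 245 $a subfield.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down MARCXML fragment for illustration only; real
# records such as the Sandburg sample contain many more fields.
MARCXML = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="a">Example title</subfield>
      <subfield code="c">Example statement</subfield>
    </datafield>
  </record>
</collection>"""

# Map a prefix of our choosing ("marc") to the MARC21 slim namespace URI.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def title_proper(xml_text):
    """Return the text of every 245 $a subfield, using qualified names."""
    root = ET.fromstring(xml_text)
    path = ".//marc:datafield[@tag='245']/marc:subfield[@code='a']"
    return [el.text for el in root.findall(path, MARC_NS)]


def unqualified_matches(xml_text):
    """Count matches for the namespace-unaware path from the question."""
    root = ET.fromstring(xml_text)
    # Elements are really named {http://www.loc.gov/MARC21/slim}datafield,
    # so a bare "datafield" name test finds nothing.
    return len(root.findall(".//datafield[@tag='245']/subfield[@code='a']"))


print(title_proper(MARCXML))         # ['Example title']
print(unqualified_matches(MARCXML))  # 0
```

IMPORTXML offers no way to register a prefix like "marc", which is presumably why the answer falls back to `*`, since a wildcard matches elements in any namespace.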

Related

X-Path to a library search engine

I am writing a short scraper in Google Sheets using XPath and IMPORTXML.
On that page, I am trying to get all the article titles (class 'library-document-summary') into B3 and the cells below it, and all the URLs of those articles into C3 and the cells below it.
However, I am getting nowhere, as my XPath always returns empty. Could someone with knowledge in this area help?
B2= https://resources.norrag.org/categories/591,595
=IMPORTXML(B2,"//div//a[@class='library-document-summary']/text()")
I don't think the IMPORTXML function supports XPaths that select text nodes. But if your XPath selects the a elements themselves, then their text content will be imported, e.g.
//div[@id='article_search_results']//a
... and for the links:
//div[@id='article_search_results']//a/@href
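A rough way to check the "select the elements, not text()" idea outside Sheets is Python's xml.etree.ElementTree. The markup below is a hypothetical, well-formed stand-in (the real norrag.org page was not inspected), and ElementTree supports only a subset of XPath with no attribute-selecting step, so the href values are read in Python rather than with `/@href`.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the search-results markup; only the id and
# class names come from the question and answer.
PAGE = """<html><body>
  <div id="article_search_results">
    <div><a class="library-document-summary" href="/doc/1">First title</a></div>
    <div><a class="library-document-summary" href="/doc/2">Second title</a></div>
  </div>
  <div id="footer"><a href="/about">About</a></div>
</body></html>"""


def titles_and_links(html_text):
    root = ET.fromstring(html_text)
    # Equivalent of //div[@id='article_search_results']//a: select the <a>
    # elements themselves, then take their text content.
    anchors = root.findall(".//div[@id='article_search_results']//a")
    titles = [a.text for a in anchors]
    # Equivalent of .../a/@href, done by reading each element's attribute.
    links = [a.get("href") for a in anchors]
    return titles, links


print(titles_and_links(PAGE))
```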

ImportXML Function on Google Sheets

I'm having a tough time pulling info in on Google Sheets using the ImportXML function. I want to pull in the price of a crypto coin so that I have a real-time feed. The link that I'm hoping to pull from is:
https://www.dextools.io/app/uniswap/pair-explorer/0x40f0e70a7d565985b967bcdb0ba5801994fc2e80
I've tried out a lot of different formulas and keep getting an #N/A or an error. Some of the ones I've tried:
Copy XPATH fully:
=IMPORTXML("https://www.dextools.io/app/uniswap/pair-explorer/0x40f0e70a7d565985b967bcdb0ba5801994fc2e80","/html/body/app-root/div[3]/div/main/app-uniswap/div/app-pairexplorer/app-layout/div/div/div[2]/div[2]/ul/li[2]/span")
Shortened XPath (I also tried deleting the second slash before 'li', but that didn't work):
=IMPORTXML("https://www.dextools.io/app/uniswap/pair-explorer/0x40f0e70a7d565985b967bcdb0ba5801994fc2e80","//li[2]/span")
Include class:
=IMPORTXML("https://www.dextools.io/app/uniswap/pair-explorer/0x40f0e70a7d565985b967bcdb0ba5801994fc2e80","//li[2]/span[@class='ng-tns-c93-2 ng-star-inserted']")
Does anyone have thoughts? Thanks!
Upon disabling JavaScript, the site is empty, so it can't be scraped by any of Google Sheets' import formulas.
To avoid the problem above, consider using a proper API service that gives you easy access to the data.
For instance, you could get the ZERO price in USD using
=IMPORTDATA("https://cryptoprices.cc/ZERO/")
If you need it relative to ETH, you could compute the ratio by hand:
=IMPORTDATA("https://cryptoprices.cc/ZERO/")/IMPORTDATA("https://cryptoprices.cc/ETH/")
Or use a more advanced API such as CoinGecko's
https://www.coingecko.com/en/api
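Whether an import formula can see a value can be roughly approximated by downloading the page's raw HTML, which, like the import formulas, involves no JavaScript execution, and searching it for the number shown in the browser. This sketch assumes the page answers a plain GET with a generic User-Agent header; it is a diagnostic aid, not a reproduction of how Google fetches pages, and both function names are hypothetical.

```python
import urllib.request


def fetch_raw_html(url, timeout=10):
    """Download a page's HTML without executing any JavaScript."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # Fall back to UTF-8 if the server does not declare a charset.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def value_in_static_html(html, expected):
    """True if the value seen in the browser is present in the raw HTML."""
    return expected in html
```

For example, `value_in_static_html(fetch_raw_html(url), price_seen_in_browser)` returning False for the dextools pair-explorer URL would be consistent with the explanation above; a True result would suggest the XPath itself is the problem.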

Conditional formatting using countif() with values from an external spreadsheet

I have a Google Sheet that references values from another sheet and, using conditional formatting, highlights the cells with the same value. Within the same document, I use the following formula:
=countif(indirect("Responses!D2:D103"),A1)=1
That works great.
However, when I try to get the same result by referencing the same sheet from an external spreadsheet, it doesn't work. I feel like I've tried every combination of IMPORTRANGE and INDIRECT out there, similar to this: =countif(importrange("sheet_url",indirect("Responses!$D$2:$D$103")),A1)=1
I'm sure I'm missing some small detail, I just can't tell what it is.
try:
=INDEX(COUNTIF(IMPORTRANGE("1ddqnVB9eDkk2tCadotN0NQlZdJDzIX4UyEEuXVs99nk",
"Responses!D1:D103"), A1)=1)
Note that access needs to be granted first for this to work.
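The custom formula marks a cell when its value appears exactly once in the imported range. This Python sketch restates that rule so the logic can be tried on sample values; the function name is hypothetical, and it assumes plain text compared case-insensitively, as COUNTIF does, while ignoring COUNTIF's wildcard and comparison-operator criteria.

```python
from collections import Counter


def countif_equals_one(range_values, candidates):
    """For each candidate, approximate COUNTIF(range, candidate)=1."""

    # casefold() approximates COUNTIF's case-insensitive text matching;
    # non-text values (numbers) are compared as-is.
    def norm(value):
        return value.casefold() if isinstance(value, str) else value

    counts = Counter(norm(v) for v in range_values)
    # True only when the value occurs exactly once; missing values count 0.
    return [counts[norm(c)] == 1 for c in candidates]


responses = ["alice", "Bob", "bob", "carol"]
print(countif_equals_one(responses, ["Alice", "BOB", "dave"]))  # [True, False, False]
```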

Correct path for =IMPORTXML on Google Sheets

This URL: https://www.screwfix.com/p/makita-jr3050t-2-1010w-reciprocating-saw-240v/27338
I'm trying to use IMPORTXML in Google Sheets to pull in the price (119.99 as of today)
Using the following formula:
(via Google Developer Tab, right-click Copy XPath)
=IMPORTXML("https://www.screwfix.com/p/makita-jr3050t-2-1010w-reciprocating-saw-240v/27338", "//*[@id='product_price']/text()")
Or
=IMPORTXML("https://www.screwfix.com/p/makita-jr3050t-2-1010w-reciprocating-saw-240v/27338","//meta[@itemprop='price']/@content")
Or
=importxml("https://www.screwfix.com/p/makita-jr3050t-2-1010w-reciprocating-saw-240v/27338", "//div[@class='pr__price']")
Plus a few other variations. Unfortunately, they all come out as #N/A.
Can anyone help me find the correct path?
It seems that in this case, when IMPORTXML() retrieves the URL, most of the values are included in the head. When I tried this URL, the body retrieved by IMPORTXML() was empty. So how about this workaround?
=REGEXEXTRACT(IMPORTXML(A1,"//head/*"),"(\d.+)INC")
Please put the URL https://www.screwfix.com/p/makita-jr3050t-2-1010w-reciprocating-saw-240v/27338 in cell "A1" and put the formula in another cell.
In this workaround, the value you want is retrieved from the values retrieved from head.
Note:
I'm not sure whether this formula can be used for other URLs. If you want to use it for other URLs, please check the retrieved values and adjust the XPath and regex.
If you use Google Apps Script, I think the value can be retrieved from the body of the page.
If this was not what you want, I'm sorry.
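The regex in this workaround is worth sanity-checking before reusing it elsewhere. Google Sheets uses RE2 and this sketch uses Python's re, but for a simple pattern like this one both should behave the same. The input strings are hypothetical; the real head text of the Screwfix page is not reproduced here, and `extract_price` is an illustrative name.

```python
import re

# Same pattern as the formula: group 1 runs from the first digit in the
# text up to (but not including) the last occurrence of "INC".
PRICE_PATTERN = re.compile(r"(\d.+)INC")


def extract_price(head_text):
    """Approximate REGEXEXTRACT(head_text, "(\\d.+)INC").

    Unlike REGEXEXTRACT, this trims surrounding whitespace and returns None
    instead of an error when there is no match.
    """
    match = PRICE_PATTERN.search(head_text)
    return match.group(1).strip() if match else None


print(extract_price("Reciprocating Saw | Price 119.99 INC VAT"))  # 119.99
print(extract_price("Makita JR3050T Saw 119.99 INC VAT"))         # 3050T Saw 119.99
```

The second string shows why the pattern needs checking per page: `(\d.+)` starts at the first digit anywhere in the text, so a model number appearing before the price would be swept into the match.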

Need help getting data from website using importxml and xpath

I would like some help getting the data next to ROE from this link using importxml / xpath: http://fundamentus.com.br/detalhes.php?papel=TAEE11 ... In this case the ROE value is 20,8%, and I would like to get it using importxml / xpath.
How do I do that? I've tried some formulas, but I'm not able to get the details from the website.
You can do this with a combination of importxml, match and index:
=index(IMPORTXML("http://fundamentus.com.br/detalhes.php?papel=TAEE11","//*[@class='txt']"),match("roe",IMPORTXML("http://fundamentus.com.br/detalhes.php?papel=TAEE11","//*[@class='txt']"),0)+1,0)
Basically, by pointing to the class txt, IMPORTXML returns all the labels and data fields stacked in one column, each label directly above its value, so you can consistently search for a label such as ROE and just increase the index by 1 to retrieve the corresponding value.
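The INDEX/MATCH trick relies on labels and values alternating in a single column. This Python sketch mirrors `INDEX(col, MATCH(label, col, 0) + 1)` on a made-up column (only the 20,8% ROE figure comes from the question; the helper name is hypothetical), using a case-insensitive exact match as MATCH does with match type 0.

```python
def value_after_label(cells, label):
    """Approximate INDEX(cells, MATCH(label, cells, 0) + 1).

    Labels are compared with casefold() to mimic MATCH's case-insensitive
    exact match. Returns None where the formula would error: label not
    found, or label sitting in the last cell.
    """
    wanted = label.casefold()
    for i, cell in enumerate(cells):
        if isinstance(cell, str) and cell.casefold() == wanted:
            return cells[i + 1] if i + 1 < len(cells) else None
    return None


# Hypothetical stand-in for the IMPORTXML column (labels and values interleaved).
column = ["P/L", "10,5", "ROE", "20,8%"]
print(value_after_label(column, "roe"))  # 20,8%
```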
