Need help getting data from website using importxml and xpath - google-sheets

I would like some help getting the value next to ROE from this link using importxml / xpath: http://fundamentus.com.br/detalhes.php?papel=TAEE11
In this case the ROE value is 20,8%. How can I do that? I've tried some formulas but have not been able to get the details from the website.

You can do this with a combination of importxml, match and index:
=index(IMPORTXML("http://fundamentus.com.br/detalhes.php?papel=TAEE11","//*[@class='txt']"),match("roe",IMPORTXML("http://fundamentus.com.br/detalhes.php?papel=TAEE11","//*[@class='txt']"),0)+1,0)
Basically, pointing the XPath at the class txt returns all the labels and data fields stacked in a single column, with each label directly above its value, so you can consistently search for a label such as ROE and just increase the index by 1 to retrieve the corresponding value.
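The label-then-value lookup described above can be sketched outside Sheets. This is only an illustration in Python, assuming the import returns a flat column in which each label is immediately followed by its value. The function name `value_after_label` and every sample cell except the ROE pair from the question are made up.

```python
def value_after_label(cells, label):
    """Return the cell that directly follows `label` in a flat column.

    Mirrors INDEX(range, MATCH(label, range, 0) + 1): the comparison is
    exact but case-insensitive, as MATCH is in Sheets.
    """
    lowered = [cell.lower() for cell in cells]
    position = lowered.index(label.lower())  # raises ValueError if the label is absent
    return cells[position + 1]


# Hypothetical column as the import might stack it: label, value, label, value...
SAMPLE_COLUMN = ["P/L", "8,5", "ROE", "20,8%", "ROIC", "15,1%"]

print(value_after_label(SAMPLE_COLUMN, "roe"))
```

If the label is the last cell in the column, the `+ 1` step runs past the end, just as the Sheets formula would return an error in that case.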

Related

Google Sheets importxml xpath to parse marcxml

I'm trying to parse XML (MarcXML) in Google Sheets.
For example I try to get the value in the subfield with code="a" in the datafield with tag="245"
MarcXML example I'm trying to parse:
https://www.loc.gov/standards/marcxml/Sandburg/sandburg.xml
Google Sheets formula I tried:
=importxml(A1;"//datafield[@tag='245']/subfield[@code='a']")
However with the above formula I get the dreaded error Imported content is empty.
When I use this:
=importxml(A1;"/*")
it does output something (all the values together...)
Since Google Sheet's importxml is outputting something with xpath "/*" I think what I try to do should in theory work? Could someone make a suggestion?
Thank you!
I'm not sure what is the exact problem with your xpath query, but I guess it might be due to the differences of XML and MarcXML.
Anyway this xpath works fine based on the structure of your sample data:
=importxml(A1,"//*[@tag=245]/*[@code='a']")
It searches for any node that has a tag attribute with the value of 245; and then looks for any child node that has a code attribute with the value of 'a'.
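One plausible explanation, not confirmed in the answer above, is that MarcXML documents declare a default namespace (assumed here to be http://www.loc.gov/MARC21/slim), so an unqualified name such as datafield matches nothing, while the wildcard * matches elements in any namespace. The sketch below shows the same effect with Python's standard xml.etree.ElementTree. The sample record is a shortened, made-up stand-in for the real file, and the helper name `find_titles` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up, shortened record; the default namespace is the assumed cause.
SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="a">Arithmetic /</subfield>
      <subfield code="c">Carl Sandburg.</subfield>
    </datafield>
  </record>
</collection>"""

NAMESPACES = {"marc": "http://www.loc.gov/MARC21/slim"}


def find_titles(xml_text):
    """Return (unqualified_hits, qualified_hits) for the 245$a lookup."""
    root = ET.fromstring(xml_text)
    # Unqualified names look for elements in no namespace: nothing matches.
    unqualified = root.findall(".//datafield[@tag='245']/subfield[@code='a']")
    # Namespace-qualified names match the namespaced elements.
    qualified = root.findall(
        ".//marc:datafield[@tag='245']/marc:subfield[@code='a']", NAMESPACES
    )
    return [e.text for e in unqualified], [e.text for e in qualified]


unqualified_hits, qualified_hits = find_titles(SAMPLE)
print(unqualified_hits, qualified_hits)
```

IMPORTXML offers no way to register a namespace prefix, which would be why matching on * with attribute predicates is the practical workaround in Sheets.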

X-Path to a library search engine

I am writing a short scraper in Google Spreadsheets using XPath and IMPORTXML.
On that page, I am trying to get all the article titles (class 'library-document-summary') into B3 and the cells below it, and all the URLs of those articles into C3 and the cells below it.
However, I am getting nowhere, as my XPath queries always return empty results. Could someone with knowledge in this area help?
B2= https://resources.norrag.org/categories/591,595
=IMPORTXML(B2,"//div//a[@class='library-document-summary']/text()")
I don't think the IMPORTXML function supports XPaths that select text nodes. But I think if your XPath selects the a elements themselves, then their text content will be imported. e.g.
//div[@id='article_search_results']//a
... and for the links:
//div[@id='article_search_results']//a/@href

Extract html table row to google sheet

I’m trying to extract a single row from a table
When using the google sheet importhtml function, I get the whole table.
=IMPORTHTML("https://www.marketwatch.com/investing/stock/jwn/options?mod=mw_quote_tab", "table",1)
How can I extract just the row right above the words "Current price as of"?
So e.g. in this case the row will have the data below. (this data will change as the date changes)
quote 1.5 0.53 76 1.36 1.47 142 39 quote 0.88 -1.73 23
I have several URLs to go through.
So, for example, if I use the following URL, the row position will change.
https://www.marketwatch.com/investing/stock/ge/options
Any idea how to extract just that last row right above the words "Current price as of"?
When I saw the HTML data from the URL of https://www.marketwatch.com/investing/stock/ge/options, I thought that the value you expect might be able to be retrieved using IMPORTXML and a xpath. So in this answer, I would like to propose to use IMPORTXML.
Sample formula:
=IMPORTXML(A1,"//tr[td[1]/@class='acenter inthemoney'][last()]")
In this case, the URL of https://www.marketwatch.com/investing/stock/ge/options is put in the cell "A1".
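The idea behind that XPath (keep only rows whose first cell has a given class, then take the last of them) can be sketched in Python with the standard library. This is an illustration only: the markup and its second-column values are made up, the helper name `last_matching_row` is hypothetical, and it assumes all rows sit in a single table, so the global last match is the same row that [last()] would select.

```python
import xml.etree.ElementTree as ET

# Made-up table fragment; only the class name comes from the formula above.
SAMPLE = """<table>
  <tr><td class="acenter inthemoney">quote</td><td>2.10</td></tr>
  <tr><td class="acenter inthemoney">quote</td><td>1.5</td></tr>
  <tr><td class="acenter">quote</td><td>0.88</td></tr>
</table>"""


def last_matching_row(markup, first_cell_class):
    """Return the cell texts of the last row whose first <td> has the class."""
    root = ET.fromstring(markup)
    matches = []
    for row in root.iter("tr"):
        cells = row.findall("td")
        # Same test as td[1]/@class='...': exact match on the whole attribute.
        if cells and cells[0].get("class") == first_cell_class:
            matches.append(cells)
    if not matches:
        return None
    return [cell.text for cell in matches[-1]]


print(last_matching_row(SAMPLE, "acenter inthemoney"))
```

Note that the comparison is against the full class attribute string, so a row whose first cell has only "acenter" does not match.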
Note:
This sample formula works for the current URL of https://www.marketwatch.com/investing/stock/ge/options. If the URL changes or the site's HTML structure is updated, the formula might stop working, so please be careful about this.
Reference:
IMPORTXML
ImportHTML() simply allows you to read an (entire!) HTML table or list into your Google sheet.
If you want to filter or manipulate the imported data, then you'll need to use other Google Sheets functions. These are documented here:
Google Sheets function list
Alternatively, you might want to import into one sheet, then select certain data into another, separate sheet:
Get data from other sheets in your spreadsheet
Here are some examples for "filtering" your data:
FILTER function

How to make importxml only give a certain data

I am trying to get only the number of likes from a website. Currently, I am using
=IMPORTXML("https://www.abillionveg.com/articles/vegan-diet-nutrition-guide","//button")
However, it gives me data from all of the buttons. Can someone help me modify the formula to show only the likes?
Sorry if this is a basic question, I am just learning.
You want to retrieve the number of likes using IMPORTXML.
If my understanding is correct, how about this answer?
Modified formula 1:
=INDEX(SPLIT(IMPORTXML(A1,"//div[@class='ArticleActions__Container-sc-15ye7g8-0 huWdyg'][1]//span[contains(text(),'likes')]")," "),1)
The URL of https://www.abillionveg.com/articles/vegan-diet-nutrition-guide is put in the cell "A1".
The xpath is //div[@class='ArticleActions__Container-sc-15ye7g8-0 huWdyg'][1]//span[contains(text(),'likes')].
Retrieve the value using IMPORTXML.
Retrieve the number of ### from the value like ### likes using SPLIT and INDEX.
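The SPLIT and INDEX step above amounts to splitting the text on spaces and keeping the first piece. A minimal Python sketch of the same idea, assuming the imported text always has the form "### likes" (the function name `likes_count` is made up for illustration):

```python
def likes_count(text):
    """Split '### likes' on spaces and return the first piece as a number."""
    first_piece = text.split(" ")[0]
    return int(first_piece)


print(likes_count("100 likes"))
```

If the site ever formats the count differently (for example with a thousands separator), the int conversion would fail and the parsing step would need adjusting.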
Modified formula 2:
=REGEXEXTRACT(IMPORTXML(A1,"//script[@id='__NEXT_DATA__']"),"likesCount""\:(\d+)") - 1
This result is the same as that of Modified formula 1.
Note:
For example, if =IMPORTXML(A1,"//div[@class='ArticleActions__Container-sc-15ye7g8-0 huWdyg'][1]//span[contains(text(),'likes')]") is used, 100 likes is retrieved.
References
IMPORTXML
SPLIT
INDEX

How to use ImportXML to extract text within <span> with multiple classes?

I'm using Google spreadsheet's ImportXML function, trying to fetch member counts from discordapp.com's invite link so that I can keep track on multiple servers' size and growth. The desired text is inside a span inside other divs. From what I've read, I'd think my code would work, but the error says content is empty. See details below:
My attempted code:
=ImportXML("https://discordapp.com/invite/steam","//span[@class='pillMessage-1btqlx medium-zmzTW- size16-14cGz5 height20-mO2eIN']")
Expected: Cell filled with current count, "24,013 Members".
Preferably: Cell filled with value 24013.
Actually: Cell: #N/A & Hovering: Error Imported content is empty.
How can I fix it to fetch the server's member count?
How about this answer?
It seems that at the site, the value like 24,013 is shown by the script. So the value cannot be directly retrieved by IMPORTXML(). But when I saw the HTML, it was found that the value is included in the metadata of HTML. In this answer, as a workaround, the value is retrieved from the metadata. Please think of this as just one of several answers.
Modified formula:
=VALUE(REGEXEXTRACT(IMPORTXML(A1,"//meta[3]/@content"),"hang out with ([0-9,]+) "))
The URL of https://discordapp.com/invite/steam is put in the cell "A1".
Content of metadata is retrieved using IMPORTXML().
In this case, I used //meta[3]/@content as the xpath.
The value is retrieved using REGEXEXTRACT().
The value is converted to the number using VALUE().
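The REGEXEXTRACT and VALUE steps above can be sketched in Python using the same regular expression. The sample metadata string below is made up to fit the pattern (the real content attribute may read differently), and `member_count` is a hypothetical helper name.

```python
import re

# Made-up metadata text shaped to fit the pattern used in the formula.
SAMPLE_CONTENT = "Check out the Steam community on Discord - hang out with 24,018 other members"


def member_count(content):
    """Extract the number after 'hang out with' and convert it to an int."""
    match = re.search(r"hang out with ([0-9,]+) ", content)
    if match is None:
        return None  # pattern not found, e.g. the page wording changed
    # Strip thousands separators before converting, similar to VALUE().
    return int(match.group(1).replace(",", ""))


print(member_count(SAMPLE_CONTENT))
```

The capture group accepts digits and commas only, so it stops at the space after the number, which matches how the formula's pattern is written.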
Result:
When I tried the above formula, 24018 was retrieved.
References:
IMPORTXML
REGEXEXTRACT
VALUE
If I misunderstood your question and this was not the result you want, I apologize.
