I am trying to get the URLs of the images on a web page.
For this I use the IMPORTXML function with an XPath query in Google Sheets.
This is the code:
=IMPORTXML(B1;"//*[@id='gallery-1']/figure/div/a//img/@src")
So far this works and I get the URL of each image. However, I am also getting the following text alongside each URL obtained.
data:image/svg+xml,%3Csvg%20xmlns='http://www.w3.org/2000/svg'%20viewBox='0%200%20500%20400'%3E%3C/svg%3E
I don't know how to prevent this text from appearing. I have looked at the source code of the linked page and this text does not appear anywhere, so I suspect it is something related to the spreadsheet.
I have checked the xpath again and again but I also find that there is an error ...
Here is a screenshot of what happened for your reference.
https://drive.google.com/file/d/1Tv_zk0JUI9EjCTU0kKfzRuQr936-88I7/view?usp=sharing
Any help is greatly appreciated. Thanks
try:
=QUERY(IMPORTXML(B1, "//*[@id='gallery-1']/figure/div/a//img/@src"),
"where not Col1 starts with 'data'")
Related
Can you help me with the correct XPath to select the text of the title "Quotes to Scrape" contained in the <a> tag inside an <h1> tag on the following webpage?
I need to use this XPath on the IMPORTXML function in Google Sheets, but I'm not sure if the XPath is correct.
=IMPORTXML("https://quotes.toscrape.com/","//div[@class='col-md-8']/h1/a")
I have an error in Google Sheets.
I am expecting to get the text inside /h1/a.
The answer was to use a semicolon instead of a comma:
=IMPORTXML("http://quotes.toscrape.com/";"//div[@class='col-md-8']/h1/a")
I'm writing this answer as a community wiki, for visibility to the community, since the solution was provided by @margusl in the comments section.
The issue was related to a typo with the formula. The formula with the issue was
=IMPORTHTML("https://quotes.toscrape.com/","//div[@class='col-md-8']/h1/a")
However, IMPORTHTML doesn't use XPath; it takes a query of either "list" or "table", as mentioned here.
So to fix the issue, you need to fix the typo and use:
=IMPORTXML("http://quotes.toscrape.com/","//div[@class='col-md-8']/h1/a")
Or you can also use:
=IMPORTXML("http://quotes.toscrape.com/","/html/body/div/div[1]/div[1]/h1/a")
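As a rough way to see why both XPaths in this answer select the same node, here is a minimal Python sketch using only the standard library. The markup in PAGE is a hypothetical, simplified stand-in for the real page, not a copy of it. ElementTree supports only a subset of XPath, so the paths below are written relative to the root element rather than starting with /html.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup (an assumption, not the real page source).
PAGE = """
<html><body>
  <div class="container">
    <div class="row">
      <div class="col-md-8">
        <h1><a href="/">Quotes to Scrape</a></h1>
      </div>
    </div>
  </div>
</body></html>
"""

root = ET.fromstring(PAGE)  # root is the <html> element

# Attribute-based path: find the div by its class, then walk to h1/a.
by_class = root.find(".//div[@class='col-md-8']/h1/a")

# Position-based path: walk down from <body> step by step.
by_position = root.find("./body/div/div[1]/div[1]/h1/a")

print(by_class.text)
print(by_class is by_position)
```

The attribute-based path tends to survive layout changes better, because it does not depend on how many wrapper div elements sit above the target.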
I'm trying to get the price of an item into my Sheet, but I am unable to get it using the IMPORTXML function; it keeps returning the error "Imported content empty".
I'm trying to retrieve the highlighted number on the image.
Site link: https://csgostash.com/sticker/3666/Battle-Scarred-Holo
My code: =IMPORTXML("https://csgostash.com/sticker/3666/Battle-Scarred-Holo","/html/body/div[3]/div[4]/div[1]/div/div[2]/div[2]/div[1]/a/span[2]")
The image is here!
It would be great if anyone could help me out, thanks!
The HTML for that website has no visual hierarchy, so it's tough to plow down through it. But this is what I arrived at:
=IMPORTXML("https://csgostash.com/sticker/3666/Battle-Scarred-Holo","/html/body/div[2]/div[4]/div/div/div/div/div[1]/a/span[2]")
From the website https://goldenmark.com/pl/mysaver/ceny-zlota/ I want to get the bolded value in "Aktualny kurs: 6940,28 PLN/uncję" into my spreadsheet. How can I obtain it?
Thank you
You could try xpath and the importxml function:
https://support.google.com/docs/answer/3093342?hl=en
The element you want to import is controlled by JavaScript.
Google Sheets is not able to import JS content.
You can always test this by disabling JS for a given site; whatever is left can be scraped.
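The "disable JS and see what is left" check can be approximated offline. The sketch below is only an illustration: STATIC_HTML is hypothetical markup, and the helper names VisibleText and static_text are made up for this example. It collects the text that exists in the raw HTML while ignoring anything inside script tags, which is roughly what a fetch without JavaScript has to work with.

```python
from html.parser import HTMLParser

# Hypothetical static markup: the rate container is empty until a script fills it.
STATIC_HTML = """
<html><body>
  <h1>Gold price</h1>
  <span id="rate"></span>
  <script>document.getElementById("rate").textContent = "6940,28";</script>
</body></html>
"""

class VisibleText(HTMLParser):
    """Collects non-empty text nodes, skipping the contents of <script> tags."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        text = data.strip()
        if text and not self.in_script:
            self.chunks.append(text)

def static_text(html):
    """Return the text present in the markup before any script has run."""
    parser = VisibleText()
    parser.feed(html)
    return parser.chunks

print(static_text(STATIC_HTML))
```

If the value you want is missing from this list, it is being written by a script and will not be available to a plain HTML fetch.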
In my Google Sheet I'm working on a database of what my cards are worth, and it doesn't seem to want to grab the information no matter what XPath I try.
The website I'm trying to take information from is available here. *This is the hyperlink I'm feeding
In the top right corner I'm attempting to grab the worth box information. Here are the XPaths I've attempted so far:
"//a[@id='worthBox']/h4"
"/html/body/div[4]/div[1]/div[2]/form/div[1]/div[2]/div/a/h4"
"/h4"
"/h4[0-20]"
"//a[@id='worthBox'][1]/h4"
"//div[@id='estimate-box']/a/h4"
"//div[@id='estimate-box']/a[1]/h4"
Can someone explain why it doesn't seem to want to fetch anything? Is it even possible?
Thank you so much for your time and help!
On that page, the value is inserted by JavaScript. IMPORTXML retrieves the HTML without running JavaScript, so it cannot see the result. I think your XPaths target the page as it looks after JavaScript has run, which is why they cannot be used. However, it seems that the value you expect can be retrieved with a different XPath.
Modified XPath:
//input[@id='medianHiddenField']/@value
Sample formula:
=IMPORTXML(A1,"//input[@id='medianHiddenField']/@value")
In this case, the URL https://mavin.io/search?q=Lugia%20NM%209%2F111%20-PSA&bt=sold# is put in cell A1.
Reference:
IMPORTXML
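To illustrate the idea behind the modified XPath, here is a small Python sketch under stated assumptions: STATIC_XHTML is hypothetical, simplified markup (the value "12.50" is invented), modelling a page where the visible h4 is empty before scripts run while a hidden input already carries the number. ElementTree has no /@value step, so the attribute is read with get().

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified static markup (an assumption, not the real page):
# the <h4> stays empty until a script runs, but a hidden input holds the value.
STATIC_XHTML = """
<html><body>
  <div id="estimate-box"><a id="worthBox"><h4></h4></a></div>
  <input type="hidden" id="medianHiddenField" value="12.50" />
</body></html>
"""

root = ET.fromstring(STATIC_XHTML)

# The XPath aimed at the rendered page finds the element, but it has no text yet.
h4 = root.find(".//a[@id='worthBox']/h4")
h4_text = h4.text

# The hidden field is part of the static HTML, so its attribute is readable.
hidden = root.find(".//input[@id='medianHiddenField']")
median = hidden.get("value")

print(repr(h4_text))
print(median)
```

The element targeted by the original XPaths exists but is empty without JavaScript, which would explain why those attempts return nothing while the hidden-field query returns a value.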
I was trying to import a table from hket website to run some analysis of my own.
When I use =IMPORTXML("http://www1.hket.com/finance/chart/industry-index.do","//*[@id='eti-finance-chart-table']") with the link to the site, I get the "N/A" error.
IMPORTXML works fine with the gurufocus site.
Can you help me out? I haven't been able to figure out what the issue could be.
From what I understand, hket doesn't use HTML or XML format for their table. If that is the case, is there a script I can use in Google Sheets that will let me extract data from hket?
You can see the culprit if you run this formula:
=IMPORTXML("http://www1.hket.com/finance/chart/industry-index.do", "//*")