I'm unable to import the table with IMPORTHTML (the website appears to block it) from the following page:
https://www.barchart.com/futures/quotes/SIH22/volatility-greeks/mar-22
I've also tried, without success, to use the IMPORTXML function to retrieve a specific value from this page,
for example the last price of a specific strike in the table, as shown in the attached image.
Is there any way to retrieve the data with IMPORTXML?
It seems that the content of the webpage you provided is generated by JavaScript, which means the IMPORT functions of Google Sheets won't be able to fetch the data you want. To check whether a webpage depends on JavaScript, click the lock button to the left of the URL in the browser, click Site settings, and set JavaScript to Block.
After I did this, the table on the website you provided disappeared.
I suggest finding another source website that provides the same information and can be fetched using the IMPORT functions.
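The same check can be approximated offline. The following is a minimal sketch, not how Google Sheets works internally: it counts the table rows present in a page's raw HTML without executing any JavaScript, which is roughly what the IMPORT functions get to see. The two sample pages, the class name and the function name are made up for illustration and are not taken from barchart.com.

```python
from html.parser import HTMLParser


class TableRowCounter(HTMLParser):
    """Counts <tr> elements that appear inside a <table> in raw HTML."""

    def __init__(self):
        super().__init__()
        self.table_depth = 0  # greater than 0 while inside a <table>
        self.rows = 0

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.table_depth += 1
        elif tag == "tr" and self.table_depth > 0:
            self.rows += 1

    def handle_endtag(self, tag):
        if tag == "table" and self.table_depth > 0:
            self.table_depth -= 1


def count_static_table_rows(html: str) -> int:
    """Return the number of table rows found without running any script."""
    counter = TableRowCounter()
    counter.feed(html)
    counter.close()
    return counter.rows


# Hypothetical page whose table is part of the HTML sent by the server.
STATIC_PAGE = """
<html><body>
<table>
  <tr><th>Strike</th><th>Last</th></tr>
  <tr><td>23.00</td><td>1.25</td></tr>
</table>
</body></html>
"""

# Hypothetical page whose table only exists after a script has run.
SCRIPT_RENDERED_PAGE = """
<html><body>
<div id="app"></div>
<script>
  renderTable(document.getElementById("app"));
</script>
</body></html>
"""

print(count_static_table_rows(STATIC_PAGE))           # 2
print(count_static_table_rows(SCRIPT_RENDERED_PAGE))  # 0
```

A count of zero for a page that shows a table in the browser suggests the table is built by JavaScript, so the IMPORT functions would come back empty.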
Related
I've tried to import the table data from https://www.set.or.th/th/market/index/set/agro/agri
into Google Sheets using:
=IMPORTHTML("https://www.set.or.th/th/market/index/set/agro/agri","table",1)
I changed "list" to "table" and I'm still unable to get the data.
My expected output in Sheets is
EE bunch of numbers
GFPT bunch of numbers
LEE bunch of numbers
. bunch of numbers
. bunch of numbers
VPO bunch of numbers
I'm writing this answer as a community wiki, since the issue was resolved from the comments section, in order to provide a proper response to the question.
The content you're trying to extract loads with JavaScript, and the IMPORT functions can’t extract content that loads with JavaScript. You can check this article.
To verify this, click the ‘Lock’ icon beside the browser’s address bar, select ‘Site settings’, and set JavaScript to ‘Block’. Then reload the page; as you can see in the screenshot below, the site needs JavaScript enabled to load certain content.
I am trying to figure out whether it is possible to import the latest parcel status from DPD.
For example, I would like to get the 'Delivered' status from the link below and import it into Google Sheets:
https://tracking.dpd.de/status/en_DE/parcel/05252044194808
However, all my attempts with IMPORTXML return empty results. Is there a way to import the latest status into Google Sheets? Perhaps the site is protected against scraping?
IMPORTXML and IMPORTHTML return data only when the web page content is not generated through JavaScript. Disabling JavaScript on the page you are trying to scrape shows no data, so most likely the whole content is generated dynamically through JavaScript, which is why those functions don't return any information.
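To illustrate the point about dynamically generated content, here is a small sketch that checks whether a phrase occurs in the text of the raw HTML once script and style blocks are ignored. The sample markup and the names used are hypothetical and are not taken from the DPD page; the sketch only assumes that a status such as 'Delivered' may exist solely inside a script.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes, ignoring anything inside <script> or <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside a skipped tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def static_text_contains(html: str, phrase: str) -> bool:
    """Case-insensitive search in the text that exists without JavaScript."""
    extractor = VisibleTextExtractor()
    extractor.feed(html)
    extractor.close()
    return phrase.lower() in " ".join(extractor.chunks).lower()


# Hypothetical markup: the status is only present as data inside a script.
SAMPLE_PAGE = """
<html><body>
<h1>Parcel tracking</h1>
<div id="status"></div>
<script>
  var state = {"status": "Delivered"};
</script>
</body></html>
"""

print(static_text_contains(SAMPLE_PAGE, "Parcel tracking"))  # True
print(static_text_contains(SAMPLE_PAGE, "Delivered"))        # False
```

In this sample the heading is found but the status is not, which mirrors the situation where IMPORTXML returns an empty result for a value that is clearly visible in the browser.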
I am interested in importing a table from a website. The table does not load all rows at first; it expands as the user scrolls, eventually reaching the end.
I'm using a GoodReads account as an example.
I want to import all rows; however, as the URL doesn't change, I believe I will need to use the IMPORTXML function rather than IMPORTHTML. However, I have not been able to identify an XPath that works.
IMPORTHTML (url, "table", 2) displays only the rows that are populated when the page first loads.
IMPORTXML currently displays the text from those initial rows in a single cell.
The following link has both options in individual sheets:
https://docs.google.com/spreadsheets/d/14jHGRyHKf866jrZiIfX2-GX6hZ2SE6vi_DezckpXaRs/edit?usp=sharing
Upon checking the website you are trying to get data from, I found that the automatic expansion of the data while scrolling is controlled by JavaScript. Thus, the additional rows cannot be fetched using the IMPORT functions of Google Sheets. This is why you can only fetch the initially available data.
To check whether a website is controlled by JavaScript, click the lock button beside the URL, go to Site settings, set JavaScript to 'Block', then refresh the website and see the difference.
In your case, after blocking JavaScript, the website turned into this:
I am trying to build a spreadsheet using Google Sheets and the IMPORTXML/IMPORTHTML functions. However, I can't find a solution for this URL, since the page uses tabs under a persistent URL: https://www.morningstar.co.uk/uk/etf/snapshot/snapshot.aspx?id=0P0001CY2T&tab=3&InvestmentType=FE
My goal is to extract the value & growth tables, but I can't find a way to do that. It only works on the main page of the URL: https://www.morningstar.co.uk/uk/etf/snapshot/snapshot.aspx?id=0P0001CY2T, which contains data I don't intend to use.
I tried IMPORTHTML with table selection, but it displays no data when the first URL is used. I also tried IMPORTXML with both the full XPath and the XPath for the items I'm interested in, and that doesn't work either.
Options used:
=importhtml("https://www.morningstar.co.uk/uk/etf/snapshot/snapshot.aspx?id=0P0001CY2T&tab=3&InvestmentType=FE";"table";"2")
=importxml("https://www.morningstar.co.uk/uk/etf/snapshot/snapshot.aspx?id=0P0001CY2T&tab=3&InvestmentType=FE";"//#html/body/div/sal-components-pillar-cards-process/div/div[2]/div/div[3]/div[2]/div/sal-components-mip-style-measures/div/div[3]/div/div[1]/sal-components-mip-measures/div/div[2]/div/div[2]/div/div/div/table/tbody/tr[1]/td[2]")
Any ideas?
It seems that the table you are trying to fetch is rendered by JavaScript, which the IMPORT functions in Google Sheets cannot handle. Thus, the table can't be scraped with them.
You can check whether a website, or a table on a website, is controlled by JavaScript as follows: click the lock button on the left side of the address bar, click Site settings, look for JavaScript, and block it. When you reload the website, you should notice a difference compared to before blocking JavaScript.
In this case, if you try it on your end, you will notice that after blocking JavaScript on the website, you won't be able to see the tables anymore.
The IMPORT functions of Google Sheets are not able to handle JavaScript elements. If you disable JS, you are left with the following (and only this can be imported):
First of all, I'm completely incompetent and my hours-long attempts at trying to make this work have been fruitless. So, please, is there someone who can help me?
I have
table id="..........." tablesorter class="........"
They are on the same line of code, and I'm only able to scrape using the first one. For me it's important to scrape by the second one. I've tried different approaches, but nothing works.
investing
In the image, in the part highlighted on the left where the drop-down menu is, it's possible to select the different American markets (Nasdaq, Dow Jones, S&P 500, etc.). When I select a market other than Dow Jones, the URL of the page always remains the same, while the part that I highlighted on the right changes (tablesorter class = "............").
In my sheet, I've done this, but it doesn't allow me to scrape the different markets (only the default table that you see when you open the webpage).
spreadsheet
Your main problem is that IMPORTXML can only retrieve information from static content in websites. Therefore, any content inserted dynamically can't be retrieved by this function.
In your case, you can check which content is not static by heading over to the website https://it.investing.com/equities/americas and then disabling JavaScript on it. To do so, if you are using Chrome, please follow this guide.
Because JavaScript adds the dynamic content to the site, when you disable it you will observe that the information tied to the dropdown no longer changes. This means that it was dynamically inserted and therefore can't be accessed by IMPORTXML. I have attached an image below showing this.
As a workaround, you will need to use other web scraping techniques.
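One such technique, offered here only as a sketch under assumptions: if the page's script loads its data as JSON from a separate request (something you would have to confirm in the browser's network tab; it is not established here for investing.com), that JSON can be read directly and flattened into rows suitable for a sheet. The payload shape, field names and values below are entirely hypothetical.

```python
import json

# Hypothetical payload; a real site would have its own structure and fields.
SAMPLE_PAYLOAD = """
{"rows": [
  {"name": "Example Corp", "last": 101.5},
  {"name": "Sample Inc", "last": 42.0}
]}
"""


def payload_to_rows(payload_text, columns):
    """Turn a JSON payload into a header row followed by one row per record."""
    records = json.loads(payload_text)["rows"]
    header = list(columns)
    # Missing fields become empty strings so every row has the same width.
    body = [[record.get(column, "") for column in columns] for record in records]
    return [header] + body


for row in payload_to_rows(SAMPLE_PAYLOAD, ["name", "last"]):
    print(row)
```

The resulting list of rows could then be written to a spreadsheet by whatever tool performs the request; that part depends on the chosen tooling and is left out here.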