First of all, I'm completely incompetent, and my hours-long attempts at making this work have been fruitless. So please, I hope someone can help me.
I have
table id="..........." tablesorter class="........"
They are on the same line of code, and I'm only able to scrape by the first element. For me it's important to scrape by the second one. I've tried different ways, but nothing works.
investing
In the image, in the part highlighted on the left where the drop-down menu is, it's possible to select the different American markets (Nasdaq, Dow Jones, S&P 500, etc.). When I select a market other than Dow Jones, the URL of the page always remains the same, while the part that I highlighted on the right changes (tablesorter class = "............").
In my sheet, I've done this, but it doesn't let me scrape a different market (only the default table that you see when you open the webpage).
spreadsheet
Your main problem is that IMPORTXML can only retrieve information from static content in websites. Therefore, any content inserted dynamically can't be retrieved by this function.
In your case, you can check which content is not static by going to the website https://it.investing.com/equities/americas and disabling JavaScript on it. If you are using Chrome, please follow this guide to do so.
Because JavaScript adds dynamic content to the site, once you disable it you will see that the information controlled by the dropdown no longer changes. This means it was inserted dynamically and therefore can't be accessed by IMPORTXML. I have attached an image below showing this.
As a workaround, you will need to use other web scraping techniques.
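The static-content check described above can also be done without a browser. The following is a minimal sketch, not a definitive tool: it downloads the raw HTML (no JavaScript is executed, which is roughly the situation IMPORTXML is in) and checks whether a piece of text you can see in the browser is present. The function names, the `User-Agent` header, and the marker text are assumptions made for illustration, and the site may refuse automated requests altogether.

```python
import urllib.request


def fetch_static_html(url: str, timeout: float = 10.0) -> str:
    """Download the raw HTML of a page without executing any JavaScript."""
    # Assumption: a browser-like User-Agent is sent because some sites
    # reject requests that do not carry one.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def appears_in_static_html(html: str, marker: str) -> bool:
    """Return True if `marker` occurs in the raw HTML (case-insensitive)."""
    return marker.lower() in html.lower()
```

To use it, call `fetch_static_html("https://it.investing.com/equities/americas")` and pass the result to `appears_in_static_html` together with some text that only shows up after you pick another market in the dropdown. If the function returns `False`, that text is most likely inserted by JavaScript and is out of reach for IMPORTXML.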
I've tried to get the table's data from https://www.set.or.th/th/market/index/set/agro/agri
into Google Sheets using
=IMPORTHTML("https://www.set.or.th/th/market/index/set/agro/agri","table",1)
I also tried switching between "list" and "table", and I am still unable to get the data.
My expected output in Sheets is
EE bunch of numbers
GFPT bunch of numbers
LEE bunch of numbers
. bunch of numbers
. bunch of numbers
VPO bunch of numbers
I'm writing this answer as a community wiki, since the issue was resolved from the comments section, in order to provide a proper response to the question.
The content you're trying to extract loads with JavaScript, and the IMPORT functions can’t extract content that loads with JavaScript. You can check this article.
To verify this, click on the ‘Lock’ icon beside the browser’s address bar, select ‘Site settings’ and set JavaScript to ‘Block’. Reload the page and, as you can see in the screenshot below, the site needs JavaScript enabled to load certain content.
This question already has answers here:
Scraping data to Google Sheets from a website that uses JavaScript
I am trying to build a spreadsheet using Google Sheets and the IMPORTXML/IMPORTHTML functions. However, I can't find a solution for this URL, since it uses tabs on a persistent URL: https://www.morningstar.co.uk/uk/etf/snapshot/snapshot.aspx?id=0P0001CY2T&tab=3&InvestmentType=FE
My goal is to extract the value & growth tables, but I don't see a way to do that. I can only make it work on the main page of the URL, https://www.morningstar.co.uk/uk/etf/snapshot/snapshot.aspx?id=0P0001CY2T, which holds data I don't intend to use.
I tried IMPORTHTML with table selection, but it displays no data when the first URL is used. I also tried IMPORTXML with both the full XPath and the XPath for the items I'm interested in, and neither works.
Options used:
=importhtml("https://www.morningstar.co.uk/uk/etf/snapshot/snapshot.aspx?id=0P0001CY2T&tab=3&InvestmentType=FE";"table";"2")
=importxml("https://www.morningstar.co.uk/uk/etf/snapshot/snapshot.aspx?id=0P0001CY2T&tab=3&InvestmentType=FE";"//#html/body/div/sal-components-pillar-cards-process/div/div[2]/div/div[3]/div[2]/div/sal-components-mip-style-measures/div/div[3]/div/div[1]/sal-components-mip-measures/div/div[2]/div/div[2]/div/div/div/table/tbody/tr[1]/td[2]")
Any ideas?
It seems that the table you are trying to fetch is rendered by JavaScript, which the IMPORT functions in Google Sheets cannot handle. Thus, the table can't be scraped this way.
You can check whether a website, or a table in a website, depends on JavaScript as follows. Click on the lock button on the left side of the address bar, click Site settings, look for JavaScript and block it. If you then reload the website, you should notice a difference compared with how it looked before blocking JavaScript.
In this case, if you try it on your end, you will notice that after blocking JavaScript on the website, you can no longer see the tables.
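A rough way to see which tables exist without JavaScript is to parse the raw HTML yourself. The sketch below uses only Python's standard library; `TableCollector` and `static_tables` are hypothetical names, nested tables are not handled, and it only approximates what the IMPORT functions see rather than reproducing their behavior.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect the cell text of every <table> present in static HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.tables = []        # list of tables -> list of rows -> list of cells
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
            self._in_cell = False
        elif tag == "tr" and self.tables:
            self.tables[-1].append([])
        elif tag in ("td", "th") and self.tables and self.tables[-1]:
            self.tables[-1][-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th", "table"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.tables[-1][-1][-1] += data.strip()


def static_tables(html: str) -> list:
    """Return all tables found in `html`, each as a list of rows of cell text."""
    parser = TableCollector()
    parser.feed(html)
    parser.close()
    return parser.tables
```

If `static_tables` returns an empty list for the downloaded page source, or the table you want is missing from the result, the table is most likely built by JavaScript after the page loads.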
The IMPORT functions of Google Sheets are not able to handle JavaScript elements. If you disable JS, you are left with the following (and only this can be imported):
Back in the day, I know it was possible to really customize your Google Docs/Forms. Is that still the case? There were all these tricks about how to make it look the way you want when you embed them onto a page (and remove Google's branding). The closest post I'm finding is from 2014 and it doesn't appear to work anymore.
What I'm trying to do is embed a small table into a webpage -- and only that table. I don't want anything else from Google. This table will be updated by me on the backend, so it can't be just an image.
You have to publish to the web and then get the iframe code to embed. At the top of your spreadsheet, click File and then Publish to the web. Click on the Embed tab and then on the "Publish" button; this will generate the iframe element you can use in your web page.
Within the iframe, the src attribute holds a URL that you can edit with query parameters to customize how the sheet behaves. For example, to specify the range to be displayed in the iframe, you can add the range query parameter:
<iframe src="https://docs.google.com/spreadsheets/d/e/[SPREADSHEET-ID]/pubhtml?gid=0&range=[RANGE]&single=true&widget=true&headers=false"></iframe>
In your case, the generated URL should already contain your spreadsheet ID, and you need to replace [RANGE] with the A1 notation of the range you want to display.
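If you need to generate such URLs for several ranges, the query string can be assembled programmatically. This is a small sketch in Python; `build_embed_src` is a hypothetical helper, and it only uses the query parameters shown in the iframe above (`gid`, `range`, `single`, `widget`, `headers`).

```python
from urllib.parse import urlencode


def build_embed_src(published_id: str, cell_range: str, gid: int = 0) -> str:
    """Build the src URL for an embedded, published Google Sheets range.

    `published_id` is the ID that appears in the generated iframe code,
    and `cell_range` is the range in A1 notation, for example "A1:C10".
    """
    params = {
        "gid": gid,
        "range": cell_range,
        "single": "true",
        "widget": "true",
        "headers": "false",
    }
    base = f"https://docs.google.com/spreadsheets/d/e/{published_id}/pubhtml"
    # Keep ":" unescaped so the A1 range stays readable in the URL.
    return f"{base}?{urlencode(params, safe=':')}"
```

The returned string goes into the `src` attribute of the iframe in place of the URL that Google generated.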
I would like a dashboard to link to a worksheet. Unfortunately, these instructions only show how to link to a webpage or to a file on a server.
http://onlinehelp.tableau.com/current/pro/online/mac/en-us/actions_url.html
How can I link to a worksheet?
What is the purpose of the link?
If you have a whole lot of worksheets and just want to open the one that you are looking at in the dashboard, you can click the small "go to" icon in the top right corner of the graph.
Creating an action that links to a worksheet is not possible, and it doesn't seem to make sense either. The URL action is meant to create a link based on a certain value in your data, but no matter what data is displayed, you would always end up on the same graph.
In case you want to further investigate certain data you select, you could create a filter in the same dialog that filters other graphs in your dashboard based on the values you select.
Some time in the last year, a tool that I use no longer displays Google search results in one of its frames. I suspect that Google started using JavaScript code to hide itself, if it is being displayed in a frame, which is understandable for most uses.
However, this is a tool that only I use, so I'm not misrepresenting to anyone. I use this tool to research data. One frame has a form where I enter data that I find online. The other frame has the Google results and the pages they link to. I can see both the data form that I'm working on and the changing search/results side-by-side in one window.
I tried going to an older browser version, but I think they are using JavaScript.
Now I have to right-click, choose "open in a new tab", switch to the new tab, and then close it. That is a lot of extra overhead when I'm doing this repetitive research over and over.
Any ideas? Confirmations as to what has changed? I suppose I could retrieve the page in PHP, strip out the part that hides the page and then put the page source in the other frame. A bit of a challenge for me.
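One thing that could be checked first, offered here as a possibility and not as a confirmed diagnosis, is whether the page is sent with response headers that tell the browser not to render it inside a frame. The `X-Frame-Options` header and the `frame-ancestors` directive of `Content-Security-Policy` are the standard mechanisms for this. The sketch below is a hypothetical helper that inspects a dictionary of response headers; for simplicity it treats any `frame-ancestors` directive as a restriction.

```python
from typing import Mapping


def forbids_framing(headers: Mapping[str, str]) -> bool:
    """Return True if the response headers ask browsers not to frame the page."""
    # Header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value.lower() for name, value in headers.items()}

    # X-Frame-Options: DENY or SAMEORIGIN blocks framing by other sites.
    if normalized.get("x-frame-options", "").strip() in ("deny", "sameorigin"):
        return True

    # Simplification: any frame-ancestors directive counts as a restriction.
    policy = normalized.get("content-security-policy", "")
    directives = [part.strip() for part in policy.split(";")]
    return any(part.startswith("frame-ancestors") for part in directives)
```

If this returns `True` for the headers of the search results page, the blank frame would be the browser honoring those headers, not JavaScript on the page hiding itself.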