Using IMPORTXML to scrape content from web pages - google-sheets

I’m trying to use IMPORTXML to pull job information from LinkedIn into Google Sheets. I have a list of job URLs, and I hope to pull various elements from each page (job title, description, company profile link, etc.) into a spreadsheet. No matter what I try, I can’t get it to pull anything.
As far as I can see, the formula below looks correct, but it returns a “Could not fetch URL” error:
=IMPORTXML("www.linkedin.com/jobs/view/585970109","//*[@id='job-details'])")
Any ideas of what I’m doing wrong would be greatly appreciated.
Thanks

Here is a tip that may help you. Internally, a built-in function such as IMPORTHTML() fetches the URL with a plain HTTP request, so it gets the page source without executing any JavaScript; it is the browser that runs JavaScript. If you go to the site settings for LinkedIn, disable JavaScript, and reload, the data on the page won't appear. Many sites work this way to protect their data from this kind of scraping.
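To make the point concrete, here is a minimal Python sketch of what a plain HTTP fetch "sees". The page source below is made up for illustration (it is not LinkedIn's real markup): the container is empty in the raw HTML, and only a browser running the script would fill it. The parser reads the raw source the way a non-browser client would, so it finds no text in the element.

```python
from html.parser import HTMLParser

# Hypothetical page source: the container is empty in the raw HTML and
# would only be filled in once a browser executes the script.
RAW_HTML = """
<html><body>
  <div id="job-details"></div>
  <script>
    document.getElementById("job-details").textContent = "Senior Engineer";
  </script>
</body></html>
"""


class TextById(HTMLParser):
    """Collect the text found inside the element with a given id.

    Sketch only: void elements such as <br> are not handled specially.
    """

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # > 0 while we are inside the target element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def static_text(html, target_id):
    """Return the text inside the element, as seen without running scripts."""
    parser = TextById(target_id)
    parser.feed(html)
    return "".join(parser.chunks).strip()


# The script is never executed, so the element's text is empty.
details = static_text(RAW_HTML, "job-details")
```

The same thing happens with the IMPORT functions: the XPath may be perfectly valid, but the element it points at has no content in the HTML that was actually fetched.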

Related

Is it possible to use Google sheet's IMPORTXML to extract data from a website document which file format is unknown to me?

I'm wondering if there's a way to extract data/information from the document shown in the link below, using Google sheets' IMPORTXML?
https://app.safetyculture.com/report/public/audit/3fa14de51938e5946b0c5f276db75e217e7dba82d0a9fe4f0abf695bd1bb284e?utm_source=iauditor_android&utm_medium=export_email
I tried understanding and using XPath, but can't make it work.
I would greatly appreciate any help from someone.
Thank you very much.
The site you're trying to extract data from loads its content with JavaScript, and the IMPORT functions can’t extract content that is loaded with JavaScript. You can check this thread.
You can verify that the site depends on JavaScript by clicking the ‘Lock’ icon beside the browser’s address bar, selecting ‘Site settings’, and setting JavaScript to ‘Block’. Reload the page and you will see that the site needs JavaScript enabled to load its data.

Looking to import HTML information into a Google sheet

After multiple tests and some research, I haven't succeeded in importing the data of this table (div) into a Google slide.
None of the formulas I tested actually work, including this simple test to extract the first column/line "Name":
=importxml("https://ecosystem.lafrenchtech.com/lists/18872/list?showGrid=false", "//span[@class='table-column-text']")
:(
Could anyone help me?
Thanks in advance.
Answer:
I've tested your function on a test sheet and it returns empty content.
According to an answer at Google Sheets importXML Returns Empty Value, IMPORTXML cannot retrieve data that is populated by a script; this is a known limitation. I checked that when JavaScript is disabled for the ecosystem.lafrenchtech.com site in Chrome, the table never loads. This confirms that the table is populated by a script, which is why the formula returns empty content.
A possible alternative is to check whether ecosystem.lafrenchtech.com offers an API from which you can get the data shown in their table directly, using an API key (if one is available). However, this would require you to use Apps Script to parse the data from their API and then write it to your spreadsheet, which would be quite tedious for such a simple process.
Note:
Your post was tagged google-slides.
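As a rough illustration of the "parse the API data, then write rows" step, here is a minimal Python sketch. Everything about the payload is an assumption: I don't know whether the site offers an API at all, and the "items", "name" and "city" keys are invented for the example. In practice the same logic would live in Apps Script.

```python
import json

# Hypothetical API response; the real site may not offer an API at all,
# and if it does, the field names will differ.
SAMPLE_RESPONSE = (
    '{"items": [{"name": "Acme", "city": "Paris"},'
    ' {"name": "Globex", "city": "Lyon"}]}'
)


def to_rows(payload, columns):
    """Turn a JSON document with an "items" list into spreadsheet-style rows."""
    items = json.loads(payload)["items"]
    header = list(columns)
    # Missing keys become empty cells instead of raising an error.
    body = [[item.get(col, "") for col in columns] for item in items]
    return [header] + body


rows = to_rows(SAMPLE_RESPONSE, ["name", "city"])
```

The resulting list of lists has the header in the first row and one row per item, which is the shape a spreadsheet range expects.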

Google web app to log page accesses to Google docs, slides, sheets, etc

I have a range of Google docs that are publicly viewable, but I would like to get some information about how often they are being viewed. I understand that there used to be a way of doing this with Google Analytics, but now that has been removed.
It seems to me that I have two main options, one of which is to make all my doc links point to a page which redirects according to a query string parameter, e.g.:
http://myurl.net?page=1 # Sends you to one page and logs the visit
http://myurl.net?page=2 # Sends you to another page and logs the visit
Or alternatively, I could try to embed some code in each doc that makes a call back to the server with its page number. But I don't know if this is possible.
The first option looks like it should be fairly easy, but I don't see how to redirect the client.
Could anyone give me some ideas about how to do this? It seems it would be useful for quite a lot of people.
Many thanks.
Justin.

Google spreadsheet importHTML Could not fetch URL

Can someone confirm it for me?
I'm helping someone with the importHTML problem on Google spreadsheet. I'm not familiar with importHTML but I thought it should work.
=importhtml("http://www.stockq.org/","table",1)
I don't care which table I'm importing so long as it imports something. It's giving out error message Error: Could not fetch url: http://www.stockq.org/. But the web site is accessible in my browser. That's really bizarre.
My Google Spreadsheet can't cope with the Chinese characters, but numbers I can recognise on the web page are happily imported, at least for the middle table of the three, with:
=importhtml("http://www.stockq.org/","table",A12)
This is much the same as what was, I think, mentioned by @DigitalSeraphim way back in September. To quote from an answer that was deleted (as not an answer?):
So, I have been building a page to help me keep up with mod updates for my minecraft server, using importxml heavily. I have found that I get the same error for some sites that load absolutely fine in the browser. Looking into it further, I found that the sites are reporting a 404 error, but actually returning the data requested. According to https://drupal.stackexchange.com/questions/110651/how-to-show-a-node-but-return-http-404-response, this is used to remove pages from search engines, as I had assumed. I don't think there is any way around this without some hackery... namely, setting up a "proxy" server that would "fix" the status.
However, it appears that the example you gave is now working, so maybe give it another try.
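The "404 status but real data in the body" behaviour described in the quote can be reproduced locally. The following is a minimal Python sketch, not anything the spreadsheet functions do themselves: a throwaway local server (the handler class and its table are invented for the example) answers every request with status 404 and an HTML body, and the client reads the body anyway, which is roughly what a status-"fixing" proxy would have to do.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class NotFoundWithBody(BaseHTTPRequestHandler):
    """Test handler: answers every GET with status 404 but a real HTML body."""

    def do_GET(self):
        body = b"<table><tr><td>42</td></tr></table>"
        self.send_response(404)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def fetch_ignoring_status(url):
    """Return (status, body) even when the server reports an HTTP error."""
    # An empty ProxyHandler bypasses any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=5) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # HTTPError is also a response object, so the body is still readable.
        return err.code, err.read()


def demo():
    """Start the local server on a free port, fetch once, and shut down."""
    server = HTTPServer(("127.0.0.1", 0), NotFoundWithBody)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = "http://127.0.0.1:%d/" % server.server_address[1]
        return fetch_ignoring_status(url)
    finally:
        server.shutdown()
        server.server_close()
```

A client that treats any error status as a failure would report "could not fetch" here, even though the table is present in the response.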
TL;DR
Use IMPORTXML with XPaths.
I encountered a similar problem and tried switching between http and https. The workaround worked occasionally, but the result was not consistent (both variants failed a lot).
Later I noticed there is another function named IMPORTXML (XML, not HTML here). With this one you can query the content from the same URL and apply an XPath instead.
Therefore I would suggest switching to IMPORTXML. For example, the following formula
=IMPORTXML("http://www.stockq.org/index/IBOV.php", "//table[@class='indexpagetable']")
will give you all the tables that have class indexpagetable from the page of the given URL.
Note the XPath is slightly different in the spreadsheet, you can refer to the documents for more specifics.
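To show what the attribute predicate in that XPath selects, here is a small Python sketch using the standard library's ElementTree, which supports a subset of XPath. The page markup is a made-up, well-formed stand-in (not the real stockq.org source); real pages are usually messier HTML and would need a more forgiving parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the page.
PAGE = """
<html><body>
  <table class="indexpagetable"><tr><td>IBOV</td><td>100</td></tr></table>
  <table class="other"><tr><td>ignored</td></tr></table>
  <table class="indexpagetable"><tr><td>SPX</td><td>200</td></tr></table>
</body></html>
""".strip()

root = ET.fromstring(PAGE)

# The predicate [@class='indexpagetable'] keeps only tables whose class
# attribute has exactly that value, as in the IMPORTXML query.
tables = root.findall(".//table[@class='indexpagetable']")

# Flatten the matching tables into rows of cell text.
rows = [
    [cell.text for cell in tr.findall("td")]
    for table in tables
    for tr in table.findall("tr")
]
```

Only the two tables with class indexpagetable are matched; the table with class "other" is skipped.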

Labview to google spreadsheet information transfer

I have been using LabVIEW to collect measurement data, and I would like to know if it is possible for LabVIEW to communicate the results to a Google Spreadsheet. If so, where could I find resources to learn how to make LabVIEW transmit information to the Google Spreadsheet ?
Thanks!
EDIT AND FOLLOW-UP- I used Jonathan's suggestion below and experimented with the LabVIEW http Post.vi. It's very simple, all you need to do is enter the URL of the Google form (replacing the final "viewform" with "formResponse") and a string with the data you want to enter (with rough syntax = ). A big thanks for that answer, it was really helpful !
However, when I try to use this method for a Google form with more than one page, the data isn't read properly... The form is still sent but every field not present on the first page of the form remains blank on the Spreadsheet. I feel that this is somehow linked to the fact that in the Google form, the URL of all the pages after page 1 are the URL of page 1 with the final "viewform" replaced with "formResponse". Is this what is causing the error or is it something else altogether, and how can I fix it ?
I can think of two ways to do this:
1. You can create a form in Google Spreadsheets. The form appears as an HTML document with standard tags. From here, I would use LabVIEW's HTTP functionality to submit data to that form using a POST request. This would be the easiest way to get data in there.
2. Using the Google Apps API, you can manipulate Google spreadsheets and dump data in there directly. This is going to be more complicated in terms of development time, but more configurable in the long run. https://developers.google.com/google-apps/spreadsheets/#what_can_this_api_do There are .NET and Java code examples throughout the documentation, so it would take some work to port this to LabVIEW, but it could be done.
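For the form-based approach, the request that LabVIEW has to send can be sketched in Python as follows. This only builds the request and does not send it. The form URL and the field names (entry.1111, entry.2222) are placeholders I made up; the real field names have to be read from the form's own HTML. The only details taken from the follow-up above are the POST method and the replacement of the final "viewform" with "formResponse".

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_form_post(view_url, fields):
    """Build (but do not send) a POST request for a form submission."""
    if not view_url.endswith("viewform"):
        raise ValueError("expected a URL ending in 'viewform'")
    # Swap the final "viewform" for "formResponse", as described above.
    post_url = view_url[: -len("viewform")] + "formResponse"
    # Form data is sent URL-encoded in the request body.
    body = urlencode(fields).encode("utf-8")
    return Request(post_url, data=body, method="POST")


# Hypothetical form URL and field names, for illustration only.
request = build_form_post(
    "https://docs.google.com/forms/d/e/FORM_ID/viewform",
    {"entry.1111": "23.5", "entry.2222": "sensor A"},
)
```

Whatever tool sends the request, the body is the same URL-encoded string of field/value pairs, so it can be assembled as a plain string in LabVIEW as well.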
