I am using IMPORTXML to get information on USPS tracking numbers for my orders, and I have been using it for about a month or so. It used to work on and off; sometimes it would stop, and all I had to do was refresh the page or add/remove the "s" in https and it would work again. But for about 5 days now it has not worked at all, no matter what I do, and doing this manually is very tedious. External third-party tracking apps won't work either, because we need everyone to use just the one sheet we have, since it contains not only the tracking info but everything else as well.
So is there any other way I can import some contents of the USPS tracking website that is reliable and won't stop working? I've seen some scripts here and there but haven't been able to apply them to my needs. If that script or workaround could also work with other websites like UPS and FedEx, that would be awesome, as IMPORTXML doesn't work with them (it always says the content is empty). Thanks in advance.
Looking for a way to get the stock price for a specific date (e.g. 31.1.2020).
I know I can use IMPORTHTML or IMPORTXML together with INDEX to get the table. However, when I use the browser to search for a specific date on investing.com, there is no direct URL for that date; it just presents me with the latest stock prices instead. This is the stock I'm looking for.
I'm afraid that investing.com does not provide an API:
https://www.investing-support.com/hc/en-us/articles/115005473825-Do-you-provide-an-API-
So you won't be able to do this very easily (if at all) with Google Sheets or Apps Script. The reason is that it looks like most of the content on the site is generated with JavaScript, so it is not part of the original HTML that is served when you first enter the site, and that original HTML is all IMPORTHTML ever sees.
Getting the information you are looking for without an API would involve browser automation: simulating the clicks a user might make and then reading the data. This can be very finicky and is prone to breaking whenever the website changes its layout or HTML for whatever reason (something that tends to happen quite often on busy websites).
I would recommend using a different service that has a Sheets-friendly HTML format. Better still, I would look into a service that has an API and interact with it from Apps Script. Finally, if it needs to be investing.com, you could look into something like Puppeteer, which can automate a browser (though it's a fair bit more complex than a formula or an API).
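For illustration, here is a minimal Puppeteer (Node.js) sketch of that last option. Treat it only as a starting point: the URL is the same placeholder used elsewhere in this thread, and the "table tr" selector is an assumption you would need to adjust to the page's real markup.

// Rough sketch: load the page in a headless browser so its JavaScript runs,
// then read whatever table rows end up in the rendered DOM.
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://www.investing.com/equities/STOCK-historical-data', {
    waitUntil: 'networkidle2', // wait until the page has (mostly) finished loading
  });

  // Grab every row of every table; adjust the selector to the actual layout.
  const rows = await page.$$eval('table tr', trs =>
    trs.map(tr => Array.from(tr.cells, td => td.innerText.trim()))
  );
  console.log(rows);

  await browser.close();
})();

From there you could push the rows into the sheet with the Sheets API, but at that point you have left the formula world entirely.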
You can import the historical data for the last 30 days using IMPORTHTML, and then use a lookup on that data.
To get historical data I use:
query(IMPORTHTML("https://investing.com/equities/STOCK-historical-data"; "table"; 2);"SELECT Col1, Col2")
I don't know if you can import more than 30; I'm searching for that answer myself.
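For the lookup step, something along these lines might work (untested; it assumes the date ends up in Col1 and the closing price in Col2, and that the imported dates parse as real dates in your locale, which they may not):
=VLOOKUP(DATE(2020;1;31); QUERY(IMPORTHTML("https://investing.com/equities/STOCK-historical-data"; "table"; 2); "SELECT Col1, Col2"); 2; FALSE)
If the dates come through as plain text, you would need to match on the text form instead.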
I hope that you guys are fine. I want to build a simple spreadsheet, and I thought I would be able to make one, but a blank sheet looks horrible to me. I am sure that you guys are kind enough to help me out.
I want to perform multiple Google search queries in a Google spreadsheet and parse the results of each search (the top 10 results of each search).
Something like this: https://www.youtube.com/watch?v=tBwEbuMRFlI
But when I tried the formula given in the video's description, Google Sheets returned #Error, and I don't know why.
Can you guys please help me make a simple spreadsheet that can handle multiple queries at once? For example, one column for keywords (where I could paste my list of keywords) and then 10 columns of search results, with all the results for one keyword on the same row.
Something like this:
My 1st Example Query = 1st search result, 2nd search result, 3rd result and so on.
My 2nd Example Query = 1st search result, 2nd search result, 3rd result and so on.
It must be easy to code, but it might be time-consuming, and I would be very grateful if any of you could help me with it.
Looking forward to your help guys.
The problem is that you want to scrape from inside Spreadsheets; that's a bad approach and is almost certainly not going to work. Even if you manage to write a scraper inside that limited environment, it will easily be spotted by Google.
Since you said time is not a problem, I would suggest another route:
Use a backend tool/script that scrapes the data
Use a backend tool/script that creates/modifies the Google spreadsheet
You can run such scripts manually on your PC, or fully automated from a server using a scheduler/cron job.
To create/modify spreadsheets look here: How do I access the Google Spreadsheets API in PHP?
To scrape Google look here: Is it ok to scrape data from Google results?
Those examples use PHP as the language of choice, but you can do exactly the same in Java, Python, or C#.
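The linked answers use PHP, but as a rough illustration of the "write into the spreadsheet from a backend script" half, here is a sketch in JavaScript (Node.js) using Google's googleapis client. The key file name, sheet range, and row contents are placeholders, and it assumes the spreadsheet has been shared with the service account.

// Minimal sketch: append scraped rows to a Google Sheet from a backend script.
// npm install googleapis
const { google } = require('googleapis');

async function appendRows(spreadsheetId, rows) {
  const auth = new google.auth.GoogleAuth({
    keyFile: 'service-account.json', // placeholder: your service account key
    scopes: ['https://www.googleapis.com/auth/spreadsheets'],
  });
  const sheets = google.sheets({ version: 'v4', auth });
  await sheets.spreadsheets.values.append({
    spreadsheetId,
    range: 'Sheet1!A1', // placeholder sheet/range
    valueInputOption: 'RAW',
    requestBody: { values: rows }, // e.g. [['keyword', 'result 1', 'result 2']]
  });
}

The scraping half would then feed appendRows with one row per keyword, produced by whatever scraper you run outside of Sheets.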
There is a third party solution like SerpApi you could use for this. It's a paid API with a free trial.
Google Sheets Add-on: SerpApi - Search Engine Results and Ranks
Example code to extract title from the first result:
=SERPAPI_RESULT("engine=google&q=coffee&location=Austin, Texas, United States&google_domain=google.com&gl=us&hl=en", "organic_results.0.title")
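To get one row per keyword as the question asks, you could put a keyword in A2 and fill ten columns by changing the result index, assuming the add-on's SERPAPI_RESULT function accepts a concatenated query string (I have not verified this):
=SERPAPI_RESULT("engine=google&q="&$A2&"&google_domain=google.com&gl=us&hl=en", "organic_results.0.title")
=SERPAPI_RESULT("engine=google&q="&$A2&"&google_domain=google.com&gl=us&hl=en", "organic_results.1.title")
and so on up to organic_results.9.title for the tenth column, then drag the row down for the rest of the keywords.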
My project relies on several APIs, like Twitter and YouTube. Recently, YouTube deprecated their old API, and it caused issues with my team's iPad app.
We could have stayed ahead of the change if we had been paying attention to YouTube's announcements of the upcoming deprecation. But alas, we were not, and the idea of staying up to date with all of our dependencies manually (browsing the web) seems exhausting and inefficient.
I have found the following tool, https://libraries.io, which helps notify you when external library dependencies change. However, it does not help with API dependencies.
Besides checking the API source webpages every so often, I was wondering if anyone had suggestions on how to stay notified and up to date with news regarding a specified list of external APIs?
After some time looking at different options, I have found a solution that is not perfect, but that seems to fit this need best.
Solution Description
This solution uses a combination of Twitter, Google Apps Script, and the website blogtrottr.com. I created a Twitter list of reliable dev handles that often post updates about their APIs; for example, I made a list that contained @twitterapi and @YouTubeDev. I then used Google Apps Script to create an online feed out of the Twitter list, and used Blogtrottr to email me every time that feed gets a new posting.
Steps to Implement
Create a Twitter list of reliable handles that often post about updates to their APIs
Create an RSS feed from that Twitter list. The details for how to do this can be found here; a rough sketch of the script is also included after these steps.
Plug the URL that you get from Google Apps Script into Blogtrottr.
I did find some other ways to do this, but so far this is the only solution that was 100% free!
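For what it is worth, the Google Apps Script part of step 2 boils down to publishing a doGet web app that returns RSS. This is only a loose sketch: fetchListTweets is a stand-in for your own function that calls the Twitter API for the list (the part covered by the linked instructions), and the feed fields are minimal.

// Loose sketch: serve the Twitter list as a bare-bones RSS feed.
// Deploy as a web app; the deployment URL is what gets pasted into Blogtrottr.
function doGet() {
  var items = fetchListTweets() // stand-in: returns [{title, link, date}, ...]
    .map(function (t) {
      return '<item><title>' + t.title + '</title>' +
             '<link>' + t.link + '</link>' +
             '<pubDate>' + t.date + '</pubDate></item>';
    })
    .join('');

  var rss = '<?xml version="1.0"?><rss version="2.0"><channel>' +
            '<title>API update feed</title>' + items + '</channel></rss>';

  return ContentService.createTextOutput(rss)
      .setMimeType(ContentService.MimeType.RSS);
}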
I'm helping someone with an importHTML problem in a Google spreadsheet. I'm not familiar with importHTML, but I thought the following should work; can someone confirm it for me?
=importhtml("http://www.stockq.org/","table",1)
I don't care which table I'm importing so long as it imports something. It gives the error message Error: Could not fetch url: http://www.stockq.org/. But the website is accessible in my browser. That's really bizarre.
My Google Spreadsheet can't cope with the Chinese characters, but the numbers recognisable to me on the web page are happily imported, at least for the middle table of the three, with:
=importhtml("http://www.stockq.org/","table",A12)
This is much the same as what was, I think, mentioned by @DigitalSeraphim way back in September. To quote from an answer that was deleted (as not an answer?):
So, I have been building a page to help me keep up with mod updates for my minecraft server, using importxml heavily. I have found that I get the same error for some sites that load absolutely fine in the browser. Looking into it further, I found that the sites are reporting a 404 error, but actually returning the data requested. According to https://drupal.stackexchange.com/questions/110651/how-to-show-a-node-but-return-http-404-response, this is used to remove pages from search engines, as I had assumed. I don't think there is any way around this without some hackery... namely, setting up a "proxy" server that would "fix" the status.
However, it appears that the example you gave is now working, so maybe give it another try.
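If you ever do need that "proxy" hackery, a very rough Apps Script version might look like the following, deployed as a web app and pointed at by IMPORTHTML instead of the original site. I have not verified that IMPORTHTML accepts the redirecting web-app URL, so treat this purely as a sketch of the idea.

// Rough idea only: fetch the real page server-side (ignoring its status code)
// and return the HTML with a normal 200 response.
function doGet(e) {
  var html = UrlFetchApp.fetch(e.parameter.url, { muteHttpExceptions: true })
      .getContentText();
  return HtmlService.createHtmlOutput(html);
}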
TL;DR
Use IMPORTXML with XPaths.
I encountered a similar problem where I tried to switch between http and https. The workaround worked occasionally, but the results were not consistent (either way failed a lot).
Later I noticed there is another function named IMPORTXML (XML, not HTML here). With this one you can query the content from the same URL and apply an XPath expression instead.
Therefore I would suggest switching to IMPORTXML. For example, the following formula
=IMPORTXML("http://www.stockq.org/index/IBOV.php", "//table[#class='indexpagetable']")
will give you all the tables that have class indexpagetable from the page of the given URL.
Note that the XPath syntax is slightly different in the spreadsheet; you can refer to the documentation for more specifics.
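If you only need a single value rather than every matching table, you can wrap the same call in INDEX, for example (the row and column numbers here are arbitrary):
=INDEX(IMPORTXML("http://www.stockq.org/index/IBOV.php", "//table[@class='indexpagetable']"), 2, 3)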
I have been using LabVIEW to collect measurement data, and I would like to know if it is possible for LabVIEW to communicate the results to a Google Spreadsheet. If so, where could I find resources to learn how to make LabVIEW transmit information to the Google Spreadsheet?
Thanks!
EDIT AND FOLLOW-UP: I used Jonathan's suggestion below and experimented with the LabVIEW HTTP Post.vi. It's very simple: all you need to do is enter the URL of the Google form (replacing the final "viewform" with "formResponse") and a string with the data you want to enter (with a rough syntax of field=value). A big thanks for that answer, it was really helpful!
However, when I try to use this method for a Google form with more than one page, the data isn't read properly. The form is still sent, but every field not present on the first page of the form remains blank in the Spreadsheet. I feel that this is somehow linked to the fact that, in the Google form, the URLs of all the pages after page 1 are the same as the URL of page 1, with the final "viewform" replaced with "formResponse". Is this what is causing the error or is it something else altogether, and how can I fix it?
I can think of two ways to do this:
You can create a form in Google Spreadsheets. The form appears as an HTML document with standard tags. From there, I would use LabVIEW's HTTP functionality to submit data to that form using a POST request (a rough sketch of what that request looks like follows this list). This would be the easiest way to get data in there.
Using the Google Apps API, you can manipulate Google spreadsheets and dump data in there directly. This is going to be more complicated in terms of development time, but more configurable in the long run. https://developers.google.com/google-apps/spreadsheets/#what_can_this_api_do There are .NET and Java code examples throughout the documentation, so it would take some work to port this to LabVIEW, but it could be done.
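As a concrete picture of option 1, here is what the POST can look like, sketched in Google Apps Script purely for illustration; the FORM_KEY and entry IDs are made up (yours come from inspecting the form's HTML), and in LabVIEW the same URL and name=value payload go into the HTTP Post VI.

// Illustration only: submit two values to a Google Form's formResponse endpoint.
// FORM_KEY and the entry.* field IDs below are placeholders.
function submitToForm(value1, value2) {
  var url = 'https://docs.google.com/forms/d/e/FORM_KEY/formResponse';
  UrlFetchApp.fetch(url, {
    method: 'post',
    payload: {
      'entry.1111111111': value1, // placeholder field ID
      'entry.2222222222': value2, // placeholder field ID
    },
  });
}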