Google spreadsheet importHTML Could not fetch URL - google-sheets

Can someone confirm it for me?
I'm helping someone with the importHTML problem on Google spreadsheet. I'm not familiar with importHTML but I thought it should work.
=importhtml("http://www.stockq.org/","table",1)
I don't care which table I'm importing so long as it imports something. It's giving out error message Error: Could not fetch url: http://www.stockq.org/. But the web site is accessible in my browser. That's really bizarre.

My Google Spreadsheet can't cope with the Chinese characters but numbers recognisable by me on the web page are happily imported, as least for the middle table of the three, with:
=importhtml("http://www.stockq.org/","table",A12)
This is much what was I think mentioned by #DigitalSeraphim way back in September. To quote from an answer that was deleted (as not an answer?):
So, I have been building a page to help me keep up with mod updates for my minecraft server, using importxml heavily. I have found that I get the same error for some sites that load absolutely fine in the browser. Looking into it further, I found that the sites are reporting a 404 error, but actually returning the data requested. According to https://drupal.stackexchange.com/questions/110651/how-to-show-a-node-but-return-http-404-response, this is used to remove pages from search engines, as I had assumed. I don't think there is any way around this without some hackery... namely, setting up a "proxy" server that would "fix" the status.
However, it appears that the example you gave is now working, so maybe give it another try.

TL;DR
Use IMPORTXML with XPaths.
I encountered similar problem where I tried to switch between http and https. The work around worked occasionally but the result is not consistent (either way failed a lot).
Later I noticed there is another API named IMPORTXML (XML, not HTML here). With this one you can actually query the content from the same URL and apply XPath instead.
Therefore I would suggest to switch to use IMPORTXML. For example, the following formula
=IMPORTXML("http://www.stockq.org/index/IBOV.php", "//table[#class='indexpagetable']")
will give you all the tables that have class indexpagetable from the page of the given URL.
Note the XPath is slightly different in the spreadsheet, you can refer to the documents for more specifics.

Related

Looking to import HTML information into a Google sheet

After multiple test and research I don't have success in importing the data of this table (div) into a Google slide.
None of the formula I tested actually work included this simple test to extract the first column/line "Name":
=importxml("https://ecosystem.lafrenchtech.com/lists/18872/list?showGrid=false", "//span[#class='table-column-text']")
:(
Anyone could help me ?
Thx by advance.
Answer:
I've tested your function on a test sheet and it returns an empty content.
According to an answer at Google Sheets importXML Returns Empty Value , IMPORTXML can not retrieve data which is being populated by a script and it is a limitation. Unfortunately, I have checked that when Javascript is disabled for the ecosystem.lafrenchtech.com site in Chrome browser, the table never loads. Thus, this confirms that the table is being populated by a script and this is the reason why it returns an empty content.
A possible alternative solution is to check if the ecosystem.lafrenchtech.com offers an API, where you can directly get the data that they show from their table using an API key (if it is available). However, this will require you to use Apps Script to parse the data from their API and then post it on your spreadsheet, which would be quite a tedious for a quite simple process.
Note:
On your post, google-slides was the set tag.

Google sheets IMPORTXML Could not fetch url

I am using IMPORTXML to get information on USPS tracking numbers for my orders and I have been using it for about a month or so, it used to work on and off, sometimes it would not work and all I had to do was either refresh the page or add/remove the "S" on https, and it would work again but it has been about 5 days that it is not working at all no matter what I do, and it is a very tedious task to do manually, external 3rd party tracking apps won't work either because we need everyone to use just the sheet we have because not only contains the tracking info but also everything else. So is there any other way I can import some contents of the USPS tracking website that is reliable and won't stop working, I've seen some scripts here and there but haven't been able to apply them to my needs. Also if that script or workaround could work with other websites like UPS and Fedex it would be awesome as IMPORTXML doesn't work with them (it always says that the content is empty). Thanks in advance.

Bingbot converts unicode characters to not understandable symbols

I get a lot of errors from my site when bing trying to index some pages which have unicode characters.
For example:
http://www.example.com/kjøp
Bing is trying to index
http://www.example.com/kjøp
Then I get en error "System.NullReferenceException: Object reference not set to an instance of an object." because there is no such controller.
Google works good with such links. How to help bing to understand norwegian letters?
You can confirm that Bing does not index these URLs correctly by doing an "INURL:" search like this... https://www.bing.com/search?q=inurl%3A%C3%B8
Only 6 pages are indexed which cannot be correct.
Unfortunately you won't be able to fix Bing. You may be able to do compensate for its shortcoming by making some changes to your site however. It is a burden that you shouldn't have to deal with. However the other option is to do nothing and continue not getting pages properly linked.
Bing will likely have issues with URLs containing characters in this list...
https://www.i18nqa.com/debug/utf8-debug.html
Your webserver needs to look for URL requests containing these characters. You will then replace the wrong characters with the correct ones and do a 301 redirect to the correct page. The specifics depend on what kind of server and programming language you are using. In your case it is most likely IIS and MVC so you would most likely look into Microsoft's URL Rewrite extension. https://www.iis.net/downloads/microsoft/url-rewrite
Before doing this however I would see what errors Bing's webmaster tools might provide.
https://www.bing.com/toolbox/webmaster
The other option is to not use those characters in your URL. My recommendation is to take the time to use the wrong to right translation. Bing will eventually fix this but it could be quite a while.

Labview to google spreadsheet information transfer

I have been using LabVIEW to collect measurement data, and I would like to know if it is possible for LabVIEW to communicate the results to a Google Spreadsheet. If so, where could I find resources to learn how to make LabVIEW transmit information to the Google Spreadsheet ?
Thanks!
EDIT AND FOLLOW-UP- I used Jonathan's suggestion below and experimented with the LabVIEW http Post.vi. It's very simple, all you need to do is enter the URL of the Google form (replacing the final "viewform" with "formResponse") and a string with the data you want to enter (with rough syntax = ). A big thanks for that answer, it was really helpful !
However, when I try to use this method for a Google form with more than one page, the data isn't read properly... The form is still sent but every field not present on the first page of the form remains blank on the Spreadsheet. I feel that this is somehow linked to the fact that in the Google form, the URL of all the pages after page 1 are the URL of page 1 with the final "viewform" replaced with "formResponse". Is this what is causing the error or is it something else altogether, and how can I fix it ?
I can think of two ways to do this:
You can create a form in google spreadsheets. The form appears as an html document with standard tags. From here, I would use labview's http functionality to submit data to that form using a POST request. This would be the easiest way to get data in there.
Using the Google Apps API, you can manipulate google spreadsheets and dump data in there directly. This is going to be more complicated in terms of development time, but more configurable in the long run. https://developers.google.com/google-apps/spreadsheets/#what_can_this_api_do There are .net and java code examples throughout the documentation, so it would take some work to port this to LabVIEW, but it could be done.

From a development perspective, how does does the indeed.com URL structure and site work?

On the webmaster's Q and A site, I asked the following:
https://webmasters.stackexchange.com/questions/42730/how-does-indeed-com-make-it-to-the-top-of-every-single-search-for-every-single-c
But, I would like a little more information about this from a development perspective.
If you search Google for anything job related, for example, Gastonia Jobs (City + jobs), then, in addition to their search results dominating the first page of Google, you get a URL structure back that looks like this:
indeed.com/l-Gastonia,-NC-jobs.html
I am assumming that the L stands for location in the URL structure. If you do a search for an industry related job, or a job with a specific company name, you will get back something like the following (Microsoft jobs):
indeed.com/q-Microsoft-jobs.html
With just over 40,000 cities in the USA I thought, ok, maybe it's possible they looped through them and created a page for every single one. That would not be hard for a computer. But then obviously the site is dynamic as each of those pages has 10000s of results and paginated by 10. The q above obviously stands for query. The locations I can understand, but they cannot possibly have created a web page for every single query combination, could they?
Ok, it gets a tad weirder. I wanted to see if they had a sitemap, so I typed into Google "indeed.com sitemap.xml" I got the response:
indeed.com/q-Sitemap-xml-jobs.html
.. again, I searched for "indeed.com url structure" and, as I mentioned in the other post on webmasters, I got back:
indeed.com/q-change-url-structure-l-Arkansas.html
Is indeed.com somehow using programming to create a webpage on the fly based on my search input into google? If they are not, how are they able to have a static page for millions and millions and millions possible query combinations, have them dynamically paginate, and then have all of those dominate google's first page of results (albeit that very last question may be best for the webmasters QA)?
Does the javascript in the page somehow interact with the URL
It's most likely not a bunch of pages. The "actual" page might be http://indeed.com/?referrer=google&searchterm=jobs%20in%20washington. The site then cleverly produces a human readable URL using URL rewrite, fetches jobs in the database that matches the query, and voíla...
I could be dead wrong of course. Truth be told, the technical aspect of it can probably be solved in a multitude of ways. Every time a job is added to the site, all pages that need to be done to match that job, might be created, thus producing an enormous amount of pages for Google to crawl.
This is a great question however remains unanswered on the ground that a basic Google search using,
ste:indeed.com
returns over 120MM results and secondly a query such as, "product manager new york" ranks #1 in results. These pages are obviously pre-generated which is confirmed by the fact the page is cached by the search engine (sometimes several days before) has different results from a live query on the site.
Easy when Googles search bot crawls the pages on indeed or any other job search site those page are dynamically created. Here is another site: http://jobuzu.co.uk i run this which is similar to how indeed works.
PHP is your friend in this and Indeed don't just use standard databases look into Sphinx and Solr as they offer Full text search for better performance then MySql etc.
They also make clever use of rel="canonical" and thorough internal linking:
http://www.indeed.com/find-jobs.jsp
Notice that all the pages that actually rank can be found from that direct internal link structure.

Resources