Google Sheets: IMPORTXML error, resource at url not found?

What you will see in the images below is that A1 is filled with a random number generated by a script. The number changes every time the cursor is moved; it is a method for forcing Google Sheets to refresh the XML data.
As we can see in the first picture, IMPORTXML worked like a charm using the recipe =IMPORTXML("Link"&A1, "//target content"), where A1 is the random number needed to refresh the data.
It worked for the first link, but not really for the second one. In the first image, B2 uses that second link and shows 1736.5 as the value; it displays fine without the &A1 trick.
After adding &A1 to the formula, it gives an #N/A error with "Resource at url not found" as the error detail.
I already tried using another cell with calculated numbers (more or less than A1), but it still gives that error.

Solution
If you look closely at the second URL, you will notice it ends with an = sign. In URLs this symbol is used to express key-value pairs. With your refresh trick you are, in this case, telling the server to look for a resource that doesn't exist, hence the IMPORTXML error. Just paste the generated URL into a browser to see the result.
Instead, append a separate random parameter to the URL: it will still force the page to refresh, without causing a 404 HTTP error.
For example:
https://www.cnbc.com/quotes/?symbol=XAU=&x=0
Won't cause any error and will give the desired result.
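In the sheet itself, that means building the URL as "https://www.cnbc.com/quotes/?symbol=XAU=&x="&A1 instead of appending A1 directly after the = sign. To see the difference outside Sheets, here is a minimal Python sketch (an illustration only; it assumes CNBC still serves this quote page and tolerates the extra parameter):

import urllib.error
import urllib.request

base = "https://www.cnbc.com/quotes/?symbol=XAU="

# Fusing the random number onto the symbol value asks the server for a
# different, nonexistent resource; adding it as its own key-value pair
# leaves the requested resource unchanged.
for url in (base + "12345", base + "&x=12345"):
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        code = urllib.request.urlopen(req).getcode()
    except urllib.error.HTTPError as err:
        code = err.code
    print(url, "->", code)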

Related

"Imported content is empty" error when trying to pull data using =IMPORTXML

I'm trying to pull the "Trophy points" value from https://trackmania.io/#/player/acf21a42-f517-42cc-a1b2-e8e7693da4ca using the following:
=IMPORTXML("https://trackmania.io/#/player/acf21a42-f517-42cc-a1b2-e8e7693da4ca","//*[#id='content']/div/section[2]/div/div[2]/div[2]/div[2]/table/tr[1]/td")
When I try this, the cell returns "#N/A", and hovering over it shows "Imported content is empty". The only reason I can think of is that the data on the website hasn't loaded by the time Sheets attempts to pull it, so it returns no value.
Your assumption about the site not being loaded is incorrect. The issue is the website itself: it builds the page with JavaScript, and the IMPORT formulae of Google Sheets do not support scraping of JS-rendered elements. Any such attempt leads to an #N/A error and "Imported content is empty". You can always check this yourself, for example as sketched below.
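A minimal Python sketch of that check (not from the original answer): fetch the raw HTML exactly as IMPORTXML sees it, i.e. before any JavaScript runs, and look for the value you are after.

import urllib.request

url = "https://trackmania.io/#/player/acf21a42-f517-42cc-a1b2-e8e7693da4ca"
req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
html = urllib.request.urlopen(req).read().decode("utf-8", errors="replace")

# If the trophy value is absent from the static HTML, it is injected by
# JavaScript and IMPORTXML will never see it.
print("Trophy" in html)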

Google Sheets IMPORTHTML error: resource at url not found

I'm using the following function to grab some stock data from a website
=index(split(index(IMPORTHTML("https://finance.yahoo.com/quote/"&A27,"Table", "2"),6,2)," "),1)
It works perfectly for every stock ticker except one, where it gives "Error: resource at url not found"
I double-checked and the ticker name is correct in column A, and the link works if I type it in manually like that. The Yahoo page for the TGH ticker does contain the info I need, in exactly the same way as any other ticker... I'm just lost as to why it doesn't work in this single case.
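One way to narrow this down outside Sheets is to let pandas parse every table on both a working ticker's page and the failing one, mirroring what IMPORTHTML indexes. A hedged Python sketch (Yahoo's page structure may well have changed since this was asked):

from io import StringIO
import urllib.request
import pandas as pd

def yahoo_tables(ticker):
    # Fetch the page ourselves so we can set a User-Agent, then let pandas
    # parse every <table> it finds.
    url = "https://finance.yahoo.com/quote/" + ticker
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    html = urllib.request.urlopen(req).read().decode("utf-8", errors="replace")
    return pd.read_html(StringIO(html))

for ticker in ("AAPL", "TGH"):  # a known-good ticker vs. the failing one
    print(ticker, "->", len(yahoo_tables(ticker)), "tables")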

Attempting to import from an XPath always seems to yield blank information

Currently, in my Google Sheet, I'm working on a database of my cards' worth, and it doesn't seem to want to grab the information no matter which XPath I attempt.
The website I'm trying to take the information from is available here. *This is the hyperlink I'm feeding it.
In the top right corner I'm attempting to grab the worth-box information; here are the XPaths I've tried:
"//a[#id='worthBox']/h4"
"/html/body/div[4]/div[1]/div[2]/form/div[1]/div[2]/div/a/h4"
"/h4"
"/h4[0-20]"
"//a[#id='worthBox'][1]/h4"
"//div[#id='estimate-box']/a/h4"
"//div[#id='estimate-box']/a[1]/h4"
Can someone explain to me why it doesn't seem to want to fetch? Is it even possible?
Thank you so much for your time and help!
On that page, the value is put in place by JavaScript. But IMPORTXML cannot retrieve results produced after JavaScript has run; it retrieves the HTML without running JavaScript. I think your XPaths describe the DOM after JavaScript has run, so they cannot be used. It seems, however, that the value you expect can be retrieved with a different XPath.
Modified XPath:
//input[@id='medianHiddenField']/@value
Sample formula:
=IMPORTXML(A1,"//input[@id='medianHiddenField']/@value")
In this case, the URL https://mavin.io/search?q=Lugia%20NM%209%2F111%20-PSA&bt=sold# is put in cell A1.
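To sanity-check that XPath outside Sheets, here is a minimal Python sketch using lxml against the raw, pre-JavaScript HTML (it assumes the page still exposes the hidden input):

import urllib.request
from lxml import html

url = "https://mavin.io/search?q=Lugia%20NM%209%2F111%20-PSA&bt=sold"
req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
tree = html.fromstring(urllib.request.urlopen(req).read())

# The hidden input carries the estimate before any script renders the worth box.
print(tree.xpath("//input[@id='medianHiddenField']/@value"))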
Reference:
IMPORTXML

Empty response when start-index >= 100

After a lot of debugging, it finally occurred to me that YouTube seemingly issues only the first 100 comments when using the v2 YouTube API to get comments. I finally tried using:
curl -Lk -X GET "http://gdata.youtube.com/feeds/api/videos/MShbP3OpASA/comments?alt=json&start-index=100&max-results=50"
And all I get is a response without an entry parameter. That is to say, I do not receive an error response or something like that - I get a perfectly good response, but without the entry parameter.
Digging a little deeper, in my response the value of openSearch$totalResults is 100, so according to this resource this seems to be the expected result (although it mentions some kind of error message, which I don't get).
But here comes the kicker: When I use
curl -Lk -X GET "http://gdata.youtube.com/feeds/api/videos/MShbP3OpASA/comments?alt=json&start-index=1&max-results=50&orderby=published"
openSearch$totalResults equals 3141, the actual count of the comments.
Now here is my question: since the v2 API was officially deprecated about a week ago, is it possible that Google has simply put a limit on the comments, so that only the first 100 are accessible? Since the v3 API does not allow comment retrieval, that would be a real bummer for me.
Does anyone have any ideas?
I've figured out how to retrieve all the comments using the navigation links embedded in the json response.
Suppose you retrieve the first page using a link like this (Python here, but you get the point):
r'https://gdata.youtube.com/feeds/api/videos/' + aVideoID + r'/comments?alt=json&start-index=1&max-results=50&prettyprint=true&orderby=published'
Embedded in the JSON under "feed" (and before the comments) is a four-element array called "link". The fourth element has "rel": "next", and under "href" is a link you can use to get the next 50 comments. The link will look something like:
https://gdata.youtube.com/feeds/api/videos/fH0cEP0mvlU/comments?alt=json&orderby=published&alt=json&start-token=EgkI2NqyoZDRvgIosK%2FPosPRvgIw653cmsXRvgI4AUAC&max-results=50&orderby=published
for an original URL of:
https://gdata.youtube.com/feeds/api/videos/fH0cEP0mvlU/comments?alt=json&start-index=1&max-results=50&prettyprint=true&orderby=published
If you follow the next link, it returns JSON similar to the original, with another 50 comments. Repeat this process until you have all the comments (in my code I check for either the absence of this item in the JSON or zero comments to determine when to stop).
You need the "&orderby=published" in the original URL because otherwise the "next" links eventually grow too large and cause an error (something in the token the API uses to track which comments you've seen takes a lot of space under the default ordering). The published ordering keeps the "start-token" small, whereas with the default ordering you start getting 414 Request-URI Too Long errors after about 500 comments.
Hope this helps.
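For completeness, a minimal Python sketch of the loop described above (illustrative only: the v2 gdata endpoint has long since been shut down):

import json
import urllib.request

def all_comments(video_id):
    url = ("https://gdata.youtube.com/feeds/api/videos/" + video_id +
           "/comments?alt=json&start-index=1&max-results=50&orderby=published")
    comments = []
    while url:
        feed = json.load(urllib.request.urlopen(url))["feed"]
        entries = feed.get("entry", [])
        if not entries:  # an empty page means we have everything
            break
        comments.extend(entries)
        # Follow the rel="next" navigation link, if present.
        url = next((link["href"] for link in feed.get("link", [])
                    if link.get("rel") == "next"), None)
    return comments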

Google XPATH importxml can find "show" but not "showcount" or "count" [duplicate]

This question already has answers here:
Scraping data to Google Sheets from a website that uses JavaScript (2 answers)
Using this webpage as an example http://forums.macrumors.com/showthread.php?t=1688317
On a google spreadsheet, the following DO NOT work with importxml():
//a[contains(@href,"showpost")]/@href
//a[contains(@href,"showcount")]/@href
//*[@id="postcount18545482"]
The last one (//*[@id="postcount18545482"]) was copied directly from Chrome's element viewer.
The following DO work but exclude any results with the word "showcount", "postcount", or "showpost":
//div[contains(@id,"post_message")]/@id
//a[contains(@href,"show")]/@href
//a[contains(@href,"post")]/@href
Is there something special about the word "count" when working with importxml() or XPATH? How can I get the missing entries?
The ImportXML function in Google Docs spreadsheets cannot process data that is created in a two-step process, for example when an authentication token must be retrieved before making the URL request, or when the URL tells the server to dynamically create an XML output after which the user is redirected to that output, even though the URL stays the same. You might want to look into Google Apps Script (http://code.google.com/googleapps/appsscript/index.html) to handle this case.
Taken from here
In your particular case, the anchor parameters are set by the vbulletin_post_loader.js script, which is called after the page container has loaded.
...
pc_obj=fetch_object("postcount"+this.postid);
openWindow("showpost.php?"+(SESSIONURL?"s="+SESSIONURL:"")
+(pc_obj!=null?"&postcount="+PHP.urlencode(pc_obj.name):"")+"&p="+A)
...
In other words, when importXML() scans the page, the nodes containing 'showpost' or 'postcount' in their href are not yet on the page. It looks like importXML() works with static pages only and is not able to handle dynamically loaded content.
Try to find another way of obtaining the number of post in a thread.
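You can verify this from outside Sheets with a small Python sketch using lxml (hedged: the forum software may have changed since this was written). It runs a failing and a working XPath against the static markup that importXML() actually sees:

import urllib.request
from lxml import html

url = "http://forums.macrumors.com/showthread.php?t=1688317"
req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
tree = html.fromstring(urllib.request.urlopen(req).read())

# No JavaScript runs here, so the "showpost" anchors injected by
# vbulletin_post_loader.js should yield zero matches, while the static
# "post_message" divs match fine.
for xpath in ('//a[contains(@href,"showpost")]/@href',
              '//div[contains(@id,"post_message")]/@id'):
    print(xpath, "->", len(tree.xpath(xpath)), "matches")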
