Not getting next_page_token on the third page of the Google Places API?
I am using the Google Places Text Search API in my Ruby on Rails application. Everything works fine, but after the third page I do not get any next_page_token, so for every text search I get only 60 results. Am I missing something? Please suggest; any help would be appreciated. This happens for every search text.
My request for the first page:
https://maps.googleapis.com/maps/api/place/textsearch/json?key=#{my_key}&query=#{my_query}
My request for subsequent pages, with the token:
https://maps.googleapis.com/maps/api/place/textsearch/json?key=#{my_key}&pagetoken=#{next_page_token}
Also, when I search on Google itself it usually shows hundreds of results for the same text. How can I get more than 60 results?
This is intended behavior for Google's Places API, as you can only get up to 60 places, split across 3 pages (3 queries). This is why there is no next_page_token on the third page.
By default, each Nearby Search or Text Search returns up to 20 establishment results per query; however, each search can return as many as 60 results, split across three pages. If your search will return more than 20, then the search response will include an additional value — next_page_token.
Reference here.
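For completeness, here is a rough sketch (in Node.js with the built-in fetch, using placeholder key/query values) of how the three-page loop typically looks. Note that a freshly issued next_page_token takes a short moment to become valid, so a brief delay between requests is usually needed:

// Sketch: fetch up to 3 pages (at most ~60 results) from the Places Text Search API.
// myKey and myQuery are placeholders for your own API key and search text.
const BASE = "https://maps.googleapis.com/maps/api/place/textsearch/json";

async function fetchAllPages(myKey, myQuery) {
  const results = [];
  let url = `${BASE}?key=${myKey}&query=${encodeURIComponent(myQuery)}`;

  for (let page = 0; page < 3; page++) {            // the API never returns more than 3 pages
    const data = await (await fetch(url)).json();
    results.push(...data.results);

    if (!data.next_page_token) break;               // no token on the last page: we are done
    await new Promise(r => setTimeout(r, 2000));    // let the token become valid
    url = `${BASE}?key=${myKey}&pagetoken=${data.next_page_token}`;
  }
  return results;                                   // at most ~60 places
}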
So... 90% of the time ImportXML seems to work just fine for me, but now I'm struggling with the two cases below. I don't know whether they are the same problem or two different problems.
CASE ONE - YAHOO
Go to this page: https://finance.yahoo.com/quote/AAPL/cash-flow?p=AAPL
The number I want to pull to my spreadsheet is "Free Cash Flow"
My first attempt:
=IMPORTXML("https://finance.yahoo.com/quote/AAPL/cash-flow?p=AAPL","//*[#id='Col1-1-Financials-Proxy']/section/div[4]/div[1]/div[1]/div[2]/div[12]/div[1]/div[2]/span")
My second attempt:
=IMPORTXML("https://finance.yahoo.com/quote/AAPL/cash-flow?p=AAPL","/html/body/div[1]/div/div/div[1]/div/div[3]/div[1]/div/div[2]/div/div/section/div[4]/div[1]/div[1]/div[2]/div[12]/div[1]/div[2]/span")
My third attempt:
=INDEX(IMPORTXML("https://finance.yahoo.com/quote/AAPL/cash-flow?p=AAPL","//div[#class='Ta(c) Py(6px) Bxz(bb) BdB Bdc($seperatorColor) Miw(120px) Miw(140px)--pnclg Bgc($lv1BgColor) fi-row:h_Bgc($hoverBgColor) D(tbc)']"),1,1)
My fourth attempt:
=INDEX(IMPORTXML("https://finance.yahoo.com/quote/AAPL/cash-flow?p=AAPL","//span[#data-reactid='277']"),1,1)
Nothing I do seems to work.
CASE TWO - MSN
Go to this page: https://www.msn.com/en-gb/money/stockdetails/analysis/fi-a1mou2
Click the "Price Ratios" link
The number I want to pull to my spreadsheet is "P/E Ratio 5-Year Low"
My first attempt:
=IMPORTXML("https://www.msn.com/en-us/money/stockdetails/analysis/nas-aapl/fi-a1mou2","//*[#id='main']/div[2]/div[2]/div[2]/div/div[3]/div/div/div[5]/div[1]/div[2]/div[4]/div[1]/div/div/div/ul[3]/li[2]/span[1]/p")
I only tried once with this case because I suspect that the number sitting on an internal page tab might be causing the issue.
ANY solutions that automatically pull the above two numbers into my spreadsheet are welcome; I'm open to workarounds with scripts/macros if ImportXML just isn't able to do it.
The reason you can't get the data from MSN is that the element you mentioned is inserted dynamically into the page. IMPORTXML can only retrieve a website's static content, so it will not be able to retrieve this dynamic content.
To check which content is static and which is dynamic, you can disable JavaScript in your browser (since JS is responsible for inserting dynamic content) and reload the page: the remaining content is what you can access with IMPORTXML. On the website you provided, if you follow these steps you will see that clicking on Price Ratios changes nothing, because that content is not static. This is a simple guide on how to disable JavaScript in Chrome.
Therefore, you will need to find an alternative method to scrape dynamic data.
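One possible workaround is a small Google Apps Script that fetches the underlying data source directly and writes it into the sheet. This is only a sketch under the assumption that you can locate the JSON endpoint the page itself calls for its dynamic figures (the URL and property name below are hypothetical placeholders, found e.g. via the browser's Network tab):

// Sketch: fetch a JSON endpoint directly and write one value into the active sheet.
// DATA_URL and data.peRatio5YearLow are hypothetical placeholders; you would need
// to find the real endpoint and field that serve the dynamic figures.
function pullDynamicValue() {
  var DATA_URL = "https://example.com/api/price-ratios?symbol=AAPL";
  var response = UrlFetchApp.fetch(DATA_URL);
  var data = JSON.parse(response.getContentText());

  var value = data.peRatio5YearLow;                       // adjust to the real payload
  SpreadsheetApp.getActiveSheet().getRange("A1").setValue(value);
}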
I'm making a new iOS (Swift) app to test some concepts, and I'm using the GitHub Search API to retrieve a list of filtered repositories.
The request is working fine so far, but I'm having trouble understanding the pagination process and how to know I reached the end of the results.
From what I've seen, the Search API returns a maximum of 1,000 results, broken into pages of at most 100 results each. But the total-count field in the returned JSON shows far more available results (I imagine it shows the total number of repositories that satisfy the query, not the maximum available for return through the API).
The only way I've found so far in the GitHub documentation to obtain information about the pages (and the pagination process) is in the response headers, like:
Status: 200 OK
Link: <https://api.github.com/resource?page=2>; rel="next",
<https://api.github.com/resource?page=5>; rel="last"
X-RateLimit-Limit: 20
X-RateLimit-Remaining: 19
Can anyone suggest the best approach to detect the end of the pages in this case?
Should I try to parse the information from the header, or infer it somehow from the returned JSON? I even got the "Link" header value but don't know how to parse it.
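For illustration, here is a minimal sketch of one way to parse that Link header and detect the last page (written in JavaScript for brevity; the same logic translates directly to Swift). When no rel="next" entry is present, you have reached the end:

// Sketch: parse a GitHub-style Link header into a { rel: url } map.
// When links.next is undefined, there is no further page.
function parseLinkHeader(header) {
  const links = {};
  if (!header) return links;

  for (const part of header.split(",")) {
    // Each part looks like: <https://api.github.com/resource?page=2>; rel="next"
    const match = part.match(/<([^>]+)>;\s*rel="([^"]+)"/);
    if (match) links[match[2]] = match[1];
  }
  return links;
}

// Usage:
const links = parseLinkHeader(
  '<https://api.github.com/resource?page=2>; rel="next", ' +
  '<https://api.github.com/resource?page=5>; rel="last"'
);
console.log(links.next); // https://api.github.com/resource?page=2
console.log(links.last); // https://api.github.com/resource?page=5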
Using this webpage as an example http://forums.macrumors.com/showthread.php?t=1688317
On a Google spreadsheet, the following DO NOT work with importxml():
//a[contains(@href,"showpost")]/@href
//a[contains(@href,"showcount")]/@href
//*[@id="postcount18545482"]
The last one (//*[@id="postcount18545482"]) was copied directly from Chrome's element viewer.
The following DO work but exclude any results with the word "showcount", "postcount", or "showpost":
//div[contains(@id,"post_message")]/@id
//a[contains(@href,"show")]/@href
//a[contains(@href,"post")]/@href
Is there something special about the word "count" when working with importxml() or XPATH? How can I get the missing entries?
The ImportXML function in a Google Docs spreadsheet cannot process data that is created in a two-step process, for example when an authentication token must be retrieved first before making the URL request, or when the URL tells the server to dynamically create an XML output after which the user is redirected to that output, even when the URL stays the same. You might want to look into Google Apps Script (http://code.google.com/googleapps/appsscript/index.html) to handle this case.
Taken from here
In your particular case the anchor parameters get set in the vbulletin_post_loader.js script called after the page container is loaded.
...
pc_obj=fetch_object("postcount"+this.postid);
openWindow("showpost.php?"+(SESSIONURL?"s="+SESSIONURL:"")
+(pc_obj!=null?"&postcount="+PHP.urlencode(pc_obj.name):"")+"&p="+A)
...
In other words, when importXML() scans the page, the nodes containing 'showpost' or 'postcount' in their href are not yet on the page.
It looks like importXML() works with static pages only and is not able to handle dynamically loaded content.
Try to find another way of obtaining the number of posts in a thread.
I'm planning out how to track internal search data in Omniture/SiteCatalyst.
It's a fairly straightforward plan for a standard "enter a term and get a page of results" model: set sProps and eVars with the terms, the count of results, and the page searched from, then fire a success event for searching and another for clicking a search result.
For a type-ahead search--where the user is given search results as they type in a search bar--what's a good strategy for handling the timing of event submissions so that you don't end up with different events/entries for letters 4, 5, 6, and 7 of a search term's entry?
Our solution was to leverage a delay on the autocomplete to reduce the number of calls. From a tracking standpoint, if someone pauses for 1 second (or 500 ms, whatever), then they're probably actually waiting for the autocomplete results, and that constitutes a valid search.
From a technical standpoint, we leveraged the delay option on the jQuery UI widget.
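As a rough sketch (using the jQuery UI autocomplete widget; fetchSuggestions and trackSearch are hypothetical stand-ins for your existing suggestion call and Omniture call), that looks something like this:

// Sketch: delay the autocomplete so a search only fires (and is tracked)
// after the user pauses typing. #search-box, fetchSuggestions and trackSearch
// are placeholders, not part of the original answer.
$("#search-box").autocomplete({
  delay: 500,        // wait 500 ms after the last keypress
  minLength: 3,      // ignore very short fragments
  source: function (request, response) {
    fetchSuggestions(request.term, response);  // populate the suggestion list
    trackSearch(request.term);                 // count this as one valid search
  }
});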
The strategy I've always used is to not track the "auto-complete" search features; put the tracking on the search results page, same as normal. Or are you saying the whole search results page is being output as the user types? If that is the case, one thing you could do is write some code to fire the Omniture code when the search field loses focus.
Another thing you can do is, as the visitor is typing in the search bar, write the current value to a cookie on each keypress. Then have some code that runs on page load to look for that cookie and, if it exists, fire the Omniture search variables and erase the cookie. Alternatively, you can keep track of the current value with a server-side session variable (since I assume this thing is AJAX-driven) and output the Omniture code server-side if the session variable exists. These methods mean the search events and vars would not fire on the search results page itself; that probably isn't a big deal, unless you have supporting variables you set, like an "internal search referrer" prop/eVar that keeps track of the previous page the visitor was on (or the page the visitor was on when they performed the search). You'll have to keep that in mind and carry that over as well.
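A minimal sketch of the cookie approach (the cookie name, eVar, and event below are illustrative, not from the original answer):

// Sketch: remember the latest typed term in a cookie, then on the next page load
// fire the Omniture search variables once and clear the cookie.
document.getElementById("search-box").addEventListener("keyup", function (e) {
  // Overwrite the cookie on every keypress with the current value.
  document.cookie = "lastSearchTerm=" + encodeURIComponent(e.target.value) + "; path=/";
});

// On page load, after the s object is available:
var match = document.cookie.match(/(?:^|;\s*)lastSearchTerm=([^;]*)/);
if (match && match[1]) {
  s.eVar1 = decodeURIComponent(match[1]);  // search term
  s.events = "event1";                     // internal search success event
  // Expire the cookie so the search is only counted once.
  document.cookie = "lastSearchTerm=; expires=Thu, 01 Jan 1970 00:00:00 GMT; path=/";
}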
Whenever you do a search, you may be aware that a query string parameter gets added to the end of the URL.
Suppose www.stackoverflow.com is the website; when you perform a search on it, the URL will look like www.stackoverflow.com?q=yourname, where yourname is the search keyword. We can capture this keyword in SiteCatalyst.
You can see this on google.com when searching the internet for "sitecatalyst":
www.google.co.in/search?q=sitecatalyst
In the same way, we can use a query string parameter such as q=something.
After doing all this, we can use the getQueryParam plugin in the plugin section of the s_code library file to fetch that parameter and store it in a SiteCatalyst variable.
Example:
function s_doPlugins(s) {
  // Capture the "q" query string parameter into eVar1.
  var one = s.getQueryParam("q");
  if (one) {
    s.eVar1 = one;
  }
}
s.doPlugins = s_doPlugins;
Insert the code below outside the plugin section:
/*
* Returns the value of a specified query string parameter, if found in the current page URL.
*/
s.getQueryParam=new Function("p","d","u",""
+"var s=this,v='',i,t;d=d?d:'';u=u?u:(s.pageURL?s.pageURL:s.wd.locati"
+"on);if(u=='f')u=s.gtfs().location;while(p){i=p.indexOf(',');i=i<0?p"
+".length:i;t=s.p_gpv(p.substring(0,i),u+'');if(t){t=t.indexOf('#')>-"
+"1?t.substring(0,t.indexOf('#')):t;}if(t)v+=v?d+t:t;p=p.substring(i="
+"=p.length?i:i+1)}return v");
s.p_gpv=new Function("k","u",""
+"var s=this,v='',i=u.indexOf('?'),q;if(k&&i>-1){q=u.substring(i+1);v"
+"=s.pt(q,'&','p_gvf',k)}return v");
s.p_gvf=new Function("t","k",""
+"if(t){var s=this,i=t.indexOf('='),p=i<0?t:t.substring(0,i),v=i<0?'T"
+"rue':t.substring(i+1);if(p.toLowerCase()==k.toLowerCase())return s."
+"epa(v)}return ''");
You will find that it captures your search keyword.
Please let me know if you need more clarification.
As there are two response formats (Atom and JSON) from the Twitter search API, I am trying to fetch the tweets (JSON format) by using the URL http://search.twitter.com/search.json?q=pareshmayani, but it is returning only one page with 15 tweets at a time (and yes, that's as per the Twitter search API documentation).
But what if I want to fetch the 2nd page of tweets? I know I can use the page query string value, such as http://search.twitter.com/search.json?q=pareshmayani&page=2, but when I use it, the response does not contain a second page of tweets.
Problem: It should return a 2nd page of 15 tweets in the response. Please give me suggestions if I am doing something wrong in fetching the 2nd page of tweets.
Thanx,
Paresh
There is only one page of results.
If you search for a popular term, like cheryl cole, you will see a 'next_page' field returned in the JSON. This field contains a query string to retrieve the next page.
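A rough sketch (in JavaScript, assuming a global fetch; the v1 search API shown in the question has long since been retired) of following next_page until it disappears:

// Sketch: keep requesting pages as long as the response includes next_page.
// next_page is a ready-made query string such as "?page=2&max_id=...&q=...".
async function fetchAllTweets(query) {
  const base = "http://search.twitter.com/search.json";
  let url = `${base}?q=${encodeURIComponent(query)}`;
  const tweets = [];

  while (url) {
    const data = await (await fetch(url)).json();
    tweets.push(...data.results);
    url = data.next_page ? base + data.next_page : null;  // stop when next_page is absent
  }
  return tweets;
}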