I've been using HTTPBuilder as a way of obtaining a site's HTML content. As an example, this is how I've used it:
import groovyx.net.http.HTTPBuilder

def http = new HTTPBuilder(url)
def root = http.get([:])
// Really just the standard approach.
This has worked very well for static HTML sites. However, I'm now trying to take data from sites where JavaScript runs on load and populates the page. For example this page.
My question is: does Grails / Groovy have a native way of waiting until all JavaScript has executed before returning the HTML content? If not native, is there a third-party option?
Research I have already attempted
I've had a look at libraries that attempt to mock a browser. I thought that if I could get the library to execute the Javascript and only return the result, I could mimic the behaviour I wanted. My research into this has been somewhat limited, as the libraries I have found only give you control over things like your User-Agent.
The method you are using only gets the raw HTML content from the server, so nothing downloads or executes the page's scripts. Selenium might work (or Geb, a Groovy wrapper around it), but the documentation for the getPageSource method says that whether you get the HTML content post-JavaScript depends on the driver. You might find that one of the drivers (chrome, firefox, etc.) does return the results post-JavaScript. If that doesn't work, try using PhantomJS (blog post on what you want).
Related
I know it's not a good question to ask, but sometimes I really need to know whether a webpage or website is static or not.
Sometimes I see a .html extension in a URL. Does that mean that those pages are static?
A .html extension means that the page contains only front-end code and does not have any server-side language included in it (I'm not talking about URL rewriters that add .html to the end of a virtual path).
This does not prevent these things:
The page can load its content via Ajax depending on inputs, URL params, time of day, etc.
The page can be generated as a static HTML page, but still be regenerated from time to time.
You can have an iframe in a static HTML page that leads to a .php file.
Not really; .html does not mean the webpage is static. Ajax can be used to load dynamic data into an HTML page.
Also, there is no defined method for finding out whether a page is static or dynamic.
One way is to check the requests in the browser's developer tools.
You can read more here.
No.
There is no guarantee of a direct relationship between a thing that looks like a file extension in a URL and how the server handles things behind the scenes.
It might be resolved using basic static file handling rules to a static file with that name.
It might use a tool like mod_rewrite to map the URL onto a server-side script with the same name but a different file extension (e.g. if the site used to be made of static files, but was changed to be dynamic with steps taken to keep the URLs unchanged).
It might use a tool like mod_rewrite to map the URL onto a server-side script that has no relation to the name of the file but implements the front controller pattern for the whole site.
It might map onto a server-side script which looks at the end of the URL to determine what format to return the content in (e.g. cars.html and cars.json might both be handled by the same script, which outputs a list of cars, but it might output it in JSON or HTML depending on the URL).
It might hit a 404 error or a 302 redirect.
It might do any number of other things.
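The case where one script serves both cars.html and cars.json can be sketched roughly as below. This is only an illustration: the handler name, the car list, and the returned object shape are all hypothetical and not tied to any particular server or framework.

```javascript
// Hypothetical data; a real site would load this from a database or similar.
const cars = ["Audi", "Fiat", "Volvo"];

// Hypothetical handler: picks the output format from the end of the URL path.
function handleCarsRequest(path) {
  const dot = path.lastIndexOf(".");
  const extension = dot === -1 ? "" : path.slice(dot + 1).toLowerCase();

  if (extension === "json") {
    return { status: 200, contentType: "application/json", body: JSON.stringify(cars) };
  }
  if (extension === "html") {
    const items = cars.map((name) => `<li>${name}</li>`).join("");
    return { status: 200, contentType: "text/html", body: `<ul>${items}</ul>` };
  }
  // Anything else is treated as unknown in this sketch.
  return { status: 404, contentType: "text/plain", body: "Not found" };
}

console.log(handleCarsRequest("/cars.html").body);
console.log(handleCarsRequest("/cars.json").body);
```

Both URLs are generated on every request here, so the .html suffix tells the client nothing about whether a static file exists behind it.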
Not always. Sometimes it can be a page generated by a Servlet or a PHP script. You can also have a .htaccess rule that adds .html to all documents.
I am evaluating Apache Sling as a potential backend CMS. I like how easy it is to push / get new content via REST. However, I also need to be able to search the content via REST. I compiled all the source code and am running their standalone jar. There are around 100 bundles installed, but I can't find a single REST query.
Some old documentation says you can do /content/mynode.query.json?
But this is not working, and there is no help on whether it's supported or not. Honestly, the only search option I found was in the console via /.explore.search.html/ which returns web pages.
How can you do restful search using sling?
The JsonQueryServlet, which provides an HTTP search interface was moved to a separate bundle as part of SLING-2226. See that issue's page for how to use it, and there's a related blog post at http://in-the-sling.blogspot.ch/2008/09/how-to-use-json-query-servlet.html
According to Wikipedia in "Browser Add-Ons and Extensions Exemption" section:
CSP should not interfere with the operation of browser add-ons or
extensions installed by the user...
But unfortunately it is blocking external scripts, injected by my add-on.
I can always put this injected code into the content script. But I'm wondering if there is another way to overcome this.
You should indeed put your code into a content script. If you insert a <script> tag into a page then it works exactly the same as if the web page itself inserted it. The browser has no way of knowing that this code belongs to your extension. What's worse, this code isn't safe from manipulations by the webpage - e.g. the webpage can redefine window.alert() method and your code won't be able to show messages. Extension code and content scripts on the other hand aren't affected by this, these see only the raw DOM objects without any JavaScript-induced changes.
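The window.alert() point can be illustrated without a browser. The sketch below uses a plain object as a stand-in for the page's global object; fakeWindow, hostilePage, and injectedCode are hypothetical names for this illustration only and are not part of any extension API.

```javascript
// Stand-in for the page's window object, so the sketch runs outside a browser.
const fakeWindow = {
  messages: [],
  alert(text) {
    this.messages.push(text);
  },
};

// Code injected through a <script> tag looks up alert at call time,
// on the same object the page's own scripts can modify.
function injectedCode(win) {
  win.alert("hello from the extension");
}

// The page replaces alert with a function that does nothing.
function hostilePage(win) {
  win.alert = function () {};
}

hostilePage(fakeWindow);
injectedCode(fakeWindow);

// The message was swallowed by the page's replacement.
console.log(fakeWindow.messages.length);
```

A content script does not share the page's JavaScript objects in this way, which is why the redefinition does not reach it.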
Long story short:
We've had errors being logged concerning a JQuery/JQueryUI based system for some time. At its core we're doing a pretty basic click link -> JQuery AJAX GET -> Open JQueryUI modal pattern.
The error we were getting appeared simple - "Object doesn't support property or method 'dialog'" - leading us to believe there was an error with JQueryUI. After expending a lot of time ruling out browser incompatibilities, bad code on JQuery's end, bad code on our end, angry code gods... we caught a lucky break. A 100% repro on one of the machines in the office.
Turns out the thing was riddled with adware - specifically [an older version of] easyinline - http://www.easyinline.com. When the user clicked any link a cascade of javascript files would be loaded, including reloading JQuery from Google's CDN.
For most links this isn't really a problem - they take you off the page anyway and everything reloads. But for our modals it meant that every modal link would stamp over our JQuery at the point the request was sent, resulting in the response trying to make use of the 'new' $ which would now be missing JQueryUI and any other plugins.
Initially we thought about making another global var ($$ or something) for 'our' JQuery and explicitly using that in our code instead of just $. The issue with that is that we were using a few other 3rd party tools which rely on $ and the adware-loaded $ is a different (older) version. So it's important that we preserve $ correctly.
Any ideas? I'm aware of JQuery's noConflict() method but after a cursory glance don't think it fits the bill.
Ultimately we've decided to re-establish our JQuery integrity when we receive any ajax responses (i.e. just before the open modal code is executed). All our ajax stuff is wrapped in our own handler so this was fairly easy to inject across the board.
Basically:
We have the original JQuery 'saved'. We've got it in scope thanks to our handler, but it could easily be put into a separate global (like $$) just after it is loaded. In our ajax response handler we've got a fairly straightforward check:
if (window.$ !== $$) {
window.$ = window.jquery = window.jQuery = $$;
}
This will reset the global jquery back to what it should be.
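One way to apply that check across the board is to wrap every response callback. The sketch below is an assumption about how such a wrapper might look, not the actual handler described above: withJQueryRestored is a hypothetical name, and the demo uses plain objects in place of window and the two JQuery copies so it can run outside a browser.

```javascript
// Hypothetical helper: returns a callback that restores the saved JQuery
// on the given global object before running the real response handler.
function withJQueryRestored(win, savedJQuery, onResponse) {
  return function (...args) {
    if (win.$ !== savedJQuery) {
      win.$ = win.jQuery = savedJQuery;
    }
    return onResponse.apply(this, args);
  };
}

// Demo with stand-ins for the real objects.
const ourJQuery = { name: "ours, with JQueryUI" };
const adwareJQuery = { name: "adware copy, no plugins" };
const demoWindow = { $: ourJQuery, jQuery: ourJQuery };

const openModal = withJQueryRestored(demoWindow, ourJQuery, function (html) {
  // By the time this runs, demoWindow.$ is our copy again.
  return demoWindow.$.name + ": " + html;
});

demoWindow.$ = demoWindow.jQuery = adwareJQuery; // the adware stamps over $
console.log(openModal("<p>modal body</p>"));
```

In a real page the first argument would be window and the saved reference would be captured immediately after JQuery and its plugins load.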
Well, this is just a workaround and not a full-fledged solution.
You can try several things here:
1. If you have control over what the adware loads, guard the place where it loads JQuery with a check such as if (!window.jQuery).
2. Try loading your plugin at the end of the page.
3. If loading at the end of the page does not work either, try injecting a script tag pointing at the plugin's CDN from the JQuery document ready event. The original idea was to use document.write for this, but calling document.write once the document has finished parsing replaces the page, so appending a script element is safer. This ensures the plugin's code is loaded at the end, when all the JQuery is already loaded (not a preferred approach).
If one uses remoteFunction or one of the Grails Ajax capabilities, rendering a template to update a portion of a page, how does one see any additions made to the Javascript functions associated with the resulting page in Chrome or Firefox?
In Chrome, one is able to see the updated page/DOM by going to the Tools -> Developer Tools menu item, then selecting "Elements". There, I'm able to use the magnifying glass to select a portion of the updated page that I want to see. But how do I also see the additional Javascript functions added to the page?
NOTE: Originally this question requested to see both html element content and Javascript content. Karthick AK's answer handles both.
In Chrome -> Developer Tools -> Network tab:
For each request sent, the response obtained can be seen in the Response tab. The rendered content can be seen there.
A similar option exists for Firefox/Firebug.
Another ajax gotcha I have experienced is that ajax requests are sometimes cached, so on click the content is served from the cache and no actual request hits the server. This is more prominent in old IE browsers.
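A common way around the caching gotcha is to make each GET URL unique by appending a throwaway query parameter. The sketch below is a minimal illustration; withCacheBuster is a hypothetical helper name, the example URLs are made up, and it assumes the URL has no #fragment. jQuery's ajax cache: false option works along the same lines by adding a "_" timestamp parameter to GET requests.

```javascript
// Hypothetical helper: appends a timestamp parameter so the browser
// treats each request as a new URL instead of answering from its cache.
// The "now" parameter exists only to make the function easy to test.
function withCacheBuster(url, now = Date.now()) {
  const separator = url.includes("?") ? "&" : "?";
  return `${url}${separator}_=${now}`;
}

console.log(withCacheBuster("/book/list"));
console.log(withCacheBuster("/book/list?max=10"));
```

Setting no-cache response headers on the server side is an alternative if the URLs need to stay unchanged.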