Quickly and accurately grabbing webpage titles - html-parsing

I'm looking to get the title of a webpage, a common feature of many IRC bots that I want to incorporate into an IRC client I'm writing for fun.
The method I currently have working connects, sends a GET request for the entire webpage, then seeks out the <title> tags and reads what's between them. For larger webpages this can be slower than I'd like. An additional problem I've noticed is that webpages with dynamic titles (such as some phpBB forums) will not return the title as it would appear in a browser, because I don't do any execution of JavaScript, etc.
It seems one way to get an accurate title is to dump the HTML into a browser control (such as the IE COM control) and pull the title, but this is just going to make it even more time-consuming.
Is there a simple method I am unaware of?

In a word, no, not really.
I guess rather than downloading the whole document you could stream the HTTP response into your application and simply stop downloading when you reach </title>; that would save you from waiting for the whole HTML document to download.
However, that doesn't help if you need to read the title after it's been changed by some client-side JavaScript. As you say, the only way I can think of to do that is by using a browser control.
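For what it's worth, here's a rough sketch of that early-abort idea using the Fetch API in JavaScript. The 64 KB cut-off and the regex are just illustrative; the same pattern applies in whatever HTTP library your client uses.

```javascript
// Sketch: read the response in chunks and stop as soon as </title> appears.
async function fetchTitle(url) {
  const response = await fetch(url);
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let html = '';

  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    html += decoder.decode(value, { stream: true });
    // Stop as soon as the closing tag shows up, or give up after ~64 KB.
    if (html.includes('</title>') || html.length > 65536) {
      await reader.cancel(); // abandon the rest of the download
      break;
    }
  }

  const match = html.match(/<title[^>]*>([\s\S]*?)<\/title>/i);
  return match ? match[1].trim() : null;
}
```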

Related

How to show response from GET ajax call after user is offline through service worker?

Basically, I want my PWA to work offline. But on page load of the website, there is a GET AJAX call which helps in showing some of the page's content.
The question is: how do I let my PWA work offline when there is an AJAX call on page load, which would require me to store the response in a cache?
As the content can be heavy, is it even correct to cache so much data?
Also, I read somewhere that we cannot cache GET requests, so how can I proceed with making the PWA work offline?
I have tried looking at the following links, but they don't tell me how to cache dynamic content:
https://developers.google.com/web/ilt/pwa/caching-files-with-service-worker
https://vaadin.com/pwa/learn/caching-strategies
https://jslovers.com/dynamic-cache-serviceworkers.html
Of course you can cache "dynamic" content – from the browser's point of view it's just another HTTP request :-) Whether that is useful is of course a matter of your application and server logic. For some applications, caching dynamic content and showing it to the user at a later time might work completely fine, but for others it might cause problems. You know, it would be fine to show a rarely updated avatar image, but not OK to show old currency info, right?
You could also design the app around these limitations, maybe show the user a notification saying "hey, you're using an offline version and the data is XX hours old!" or something like that.
You can easily store multiple megabytes of network responses in the cache. If you've got more than 50 MB, browsers start to limit you. Also, always have error handling ready in case the browser tells you the cache is full.
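To make that concrete, here's a minimal stale-while-revalidate sketch for a service worker: it serves cached GET responses immediately when available, refreshes them from the network in the background, and falls back to the cached copy when offline. The cache name and the structure are illustrative, not taken from the linked guides.

```javascript
// sw.js - minimal stale-while-revalidate sketch for GET requests.
const CACHE = 'dynamic-v1';

self.addEventListener('fetch', (event) => {
  if (event.request.method !== 'GET') return; // only GETs can go in the Cache API

  event.respondWith(
    caches.open(CACHE).then(async (cache) => {
      const cached = await cache.match(event.request);
      const network = fetch(event.request)
        .then((response) => {
          cache.put(event.request, response.clone()); // refresh the cached copy
          return response;
        })
        .catch(() => cached); // offline: fall back to the cached copy (if any)

      // Serve the cached copy immediately if present, otherwise wait for the network.
      return cached || network;
    })
  );
});
```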
Does this explanation help you?

Website fetch, using NSURLSession and changing INPUT Field value on this site

I want to fetch the content of a website. But to get the correct content, it is necessary to change an HTML input (scroll) field on the page.
Any idea how to manage this with Xcode?
Thanks a lot!!
Lars
If you want to retrieve the HTML you get after filling in an HTML form, you have to identify precisely what the series of requests to fetch the data looks like. And be careful, because it's often not as simple as just looking at the request the form generates: unfortunately, it is sometimes a complex series of requests (e.g. retrieving the original HTML often also seamlessly retrieves some critical hidden form fields and/or cookies that must accompany the follow-up request).
Bottom line, to reverse engineer the required HTTP requests, you often have to pore through the HTML code and/or watch the requests with something like Charles. It can take quite a bit of time with complicated sites.
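To give a feel for the shape of such a series, here's a purely hypothetical, platform-agnostic sketch (shown in JavaScript for brevity; the URLs, field names, and token pattern are made up, and the same flow would be expressed with NSURLSession on iOS):

```javascript
// Hypothetical example only: you have to discover the real URLs and
// field names for your target site by inspecting its HTML and traffic.
async function fetchWithFormValue(value) {
  // 1. Load the page containing the form, so the server sets its cookies
  //    and we can scrape any hidden fields it expects back.
  const page = await fetch('https://example.com/data', { credentials: 'include' });
  const html = await page.text();
  const token = html.match(/name="__token" value="([^"]+)"/)?.[1] ?? '';

  // 2. Re-submit the form with the hidden token plus the field we want to change.
  const body = new URLSearchParams({ __token: token, scrollField: value });
  const result = await fetch('https://example.com/data', {
    method: 'POST',
    credentials: 'include',
    body,
  });
  return result.text(); // the HTML you actually wanted
}
```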
Before you invest a lot of time here, though, you should first see if the web site provider's Terms of Service permit such usage. They often strictly prohibit this sort of practice. It's much better to contact the web site provider and see if they provide a web service to retrieve the data. That's far easier and will result in a far more robust interface for your app.
But if you're forced into programmatically parsing HTML, I'd refer you to How to Parse HTML on iOS on Ray Wenderlich's site.

How is this URL modification possible?

Could anyone please tell me how the site http://www.outsharked.com/imagemapster/default.aspx?what.html works this way? It modifies the URL without loading/reloading the page. I don't think this is done with HTML5, because it works in IE6, which doesn't support HTML5.
I created that site. The commenter is correct, it uses Javascript to change the URL. There's nothing about how that navigation works that is different for IE6 - that browser supports the necessary client-side functionality to do this kind of thing. The basic functionality involves:
capturing click events on the nav and loading the inner content via AJAX
updating the URL to reflect a working direct URL for the target.
The links are also valid anchor links that, in the absence of Javascript, would go to the same page (but load the whole thing). This is your basic AJAX web site setup, with one minor difference. It's common practice to use URLs like this in AJAX/single-page web sites:
http://mysite.com/home#somepage
or even just
http://mysite.com/#somepage
Where the hashtag part represents the actual page a user has navigated to. If someone accessed that url directly, e.g. from outside the site, the site would use Javascript to load the correct content based on the hashtag, after the page had loaded. This means that there might be a little delay for the inner content to reflect the correct page, since it has to run another request after the initial page has loaded from the browser to get the inner content via AJAX.
I was trying to avoid that by creating a setup that worked completely with and without Javascript. If you go directly to a URL within the site such as http://www.outsharked.com/imagemapster/default.aspx?faq.html you will notice it loads the content directly. This URL will work even if Javascript is disabled. You can't actually do this using hashtags, since hashtag content is not sent to the server. Only the client knows what's after the hashtag in a URL. That's why I was using query strings to represent inner pages.
This site architecture was sort of an experiment at the time. It works pretty well but the code isn't fantastic, I didn't really do anything else with it, and I'm sure there are other better-fleshed-out/tested/full-featured frameworks out there to do much the same thing.
But it might not be a bad example of the nuts and bolts of creating a basic AJAX navigation setup, as a learning tool, since it's pretty concise, and also does HTML5 history navigation (e.g. so the back button works on modern browsers).
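As a rough sketch of those nuts and bolts (this is not the actual code behind that site; the selectors and the extractInner helper are illustrative):

```javascript
// Capture nav clicks, load inner content via AJAX, and push a real URL.
document.querySelectorAll('a.nav-link').forEach((link) => {
  link.addEventListener('click', async (event) => {
    if (!window.history.pushState) return; // no History API: let the normal page load happen
    event.preventDefault();

    const url = link.href; // e.g. default.aspx?faq.html - also works with Javascript disabled
    const html = await fetch(url).then((r) => r.text());
    document.querySelector('#content').innerHTML = extractInner(html);
    history.pushState({ url }, '', url); // update the address bar without reloading
  });
});

// Make the back/forward buttons load the right inner content again.
window.addEventListener('popstate', async (event) => {
  if (!event.state) return;
  const html = await fetch(event.state.url).then((r) => r.text());
  document.querySelector('#content').innerHTML = extractInner(html);
});

// Placeholder: pull the inner-page markup out of the full response.
function extractInner(html) {
  const doc = new DOMParser().parseFromString(html, 'text/html');
  return doc.querySelector('#content').innerHTML;
}
```

Because each link is a real URL that the server can also render on its own, the site degrades gracefully when Javascript (or the History API) isn't available.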

iOS HTML Rewrite?

I've not written an iOS app and want to know if what I want to do is reasonably easy before I invest all my time in it. The idea is simply to leverage the built-in WebKit methods to write my own browser. I've seen tutorials where this is done fairly easily. However, the twist is I want to apply some rewrite/regex rules prior to the page rendering. I.e., you load http://example.com, which is a page containing the word 'foo'. Prior to displaying the page, the app rewrites 'foo' to 'bar' and renders.
Is this possible to do easily without actually writing a ground-up browser?
Thanks!
It's doable (assuming you're using the standard UIWebView component to render the page), and there are a few ways you could go about it. Among them:
You could download the HTML and parse it via Objective-C string handlers before loading it into the UIWebView
You could load the HTML as-is and use UIWebView's stringByEvaluatingJavaScriptFromString: message to "inject" JavaScript into the page, manipulating the DOM itself
You could go the Opera route, and pre-render the page via a server-side proxy before downloading it to the client.
How far down the rabbit hole you want to go would be up to you, of course. Easy is in the eye of the beholder.
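For the second option, the script you pass to stringByEvaluatingJavaScriptFromString: is ordinary JavaScript executed in the page. A minimal sketch of the kind of 'foo' to 'bar' rewrite you described, walking text nodes so tags and attributes are left untouched:

```javascript
// Example of a script you might inject after the page loads
// (e.g. from webViewDidFinishLoad:). It rewrites 'foo' to 'bar'
// in text nodes only, so markup and attributes are left alone.
(function () {
  var walker = document.createTreeWalker(document.body, NodeFilter.SHOW_TEXT, null, false);
  var node;
  while ((node = walker.nextNode())) {
    node.nodeValue = node.nodeValue.replace(/foo/g, 'bar');
  }
})();
```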

sifr3 - prefetch not working?

I am having a problem with the loading times/size of an sIFR 3 enabled site, and found out that the font SWF is requested several times in my application. This can be seen in the network tab of Firebug, as well as in the Apache logs.
On http://novemberborn.net/flash/prefetching-movies there are some instructions for prefetching. However, that does not work; the prefetch method is not available (though it's still in the documentation!). I understand that prefetching is done automatically, but that does not seem to work either.
Even on the demo page of the sIFR download package, with an empty browser cache I get several hits for rockwell.swf and cochin.swf! Both with Firefox 3 and IE7...
Any chance for an easy and quick fix?
Greetings,
Simon
Fundamentally, this is an issue between the browser and the Flash player. As sIFR inserts the Flash movies into the page, the browser initializes the Flash plugin with the path to the Flash movie. If the movie is not yet in a local cache, it's requested from the server. Since the movies are inserted within a few milliseconds, this would mean that a request is made for each inserted movie.
sIFR tries to prevent this by prefetching the Flash movies. It does this once per browser session, based on a session cookie. This merely fires off a request for the movie file, and hopefully that file is in the cache by the time replacement starts. Therefore it's important to load the sIFR JavaScript code as early as possible, and to activate sIFR properly by passing the Flash movies to the sIFR.activate() method.
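This isn't sIFR's actual source, but the prefetch mechanism amounts to roughly this kind of cookie-gated request (the cookie name is illustrative; rockwell.swf is one of the movies you mentioned):

```javascript
// Sketch of a cookie-gated, once-per-session prefetch: fire a request for
// the movie so it lands in the browser cache before replacement starts.
function prefetchMovie(src) {
  if (document.cookie.indexOf('sifrPrefetched=1') !== -1) return; // already done this session

  var img = new Image(); // any request for the URL will warm the HTTP cache
  img.src = src;

  // Session cookie (no expiry), so the prefetch happens once per browser session.
  document.cookie = 'sifrPrefetched=1; path=/';
}

prefetchMovie('/flash/rockwell.swf');
```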
In my experience the only way to reliably test this process is clearing browser cache, closing all browser instances (to get rid of the session cookie), then opening the browser and going straight to the page you want to test. I don't find the activity monitors within browsers reliable, so either check through an HTTP proxy or the server logs.
The one remaining improvement I could make is to try and detect the progress of the prefetch, and hold off on replacing elements until the prefetch is complete.
Do you have the option to move to Cufon? You'll find it much easier to use, and it isn't quirky.
