Retrieve/process/show Wikipedia page in iOS application - ios

I want to show the mobile version of a Wikipedia page in my app.
The easiest way is to use UIWebView to show the mobile page, e.g. https://en.m.wikipedia.org/w/index.php?title=White_House
However, I want to make certain changes to the page:
Remove the search bar.
Remove all external links in the page.
Keep all formatting, images, and layout unchanged.
I did some searching. It seems I have to retrieve all the content as JSON with the Wikipedia API and reformat everything myself.
Is there an easier way?

You can load the HTML, do a "find and replace" on it, and remove whatever you want (that is, modify the HTML itself).
After that, you can load the modified HTML into the UIWebView.
Note: This might break when Wikipedia changes its page structure.
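As a rough illustration of the find-and-replace idea, the sketch below strips external links from an HTML string while keeping their visible text. It is written in JavaScript purely for illustration; the same pattern could be applied from native code before the string is handed to the UIWebView. The function name stripExternalLinks is hypothetical, and it assumes that external links are exactly the anchors whose href starts with http:// or https://, which may not hold for every Wikipedia page. Regex-based HTML editing is fragile in general.

```javascript
// Sketch: remove anchors that point at absolute http(s) URLs, keeping their text.
// Assumption: internal links use relative hrefs (e.g. "/wiki/...") and are left alone.
function stripExternalLinks(html) {
  const externalAnchor = /<a\b[^>]*\bhref="https?:\/\/[^"]*"[^>]*>([\s\S]*?)<\/a>/gi;
  return html.replace(externalAnchor, "$1");
}

const sample =
  '<p>See <a href="/wiki/Washington">Washington</a> and ' +
  '<a class="external" href="https://www.whitehouse.gov/">the official site</a>.</p>';

// The internal link survives; the external one is reduced to plain text.
console.log(stripExternalLinks(sample));
```

Removing the search bar would need a similar replacement keyed on whatever markup the page currently uses for it, which is why this approach can break when the page structure changes.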

Related

jQuery Mobile multi-page internal hyperlinking

This appears to be pretty basic but I can't figure it out.
Using a jqm multipage template, I'm trying to allow users to jump from a link on one page (id='page1') directly to an image in another page (id='page2').
It appears I am constrained, by html hyperlinking rules and jqm, to this:
<a href='#page2'>go to image on p2</a>
... which of course jumps the user to the top of page2.
But that's not what I want. I want the user to jump directly to the IMAGE, which is close to the bottom of page2, tagged like so:
<img id='image-id'>
But tagging the link with the image's id (not the page's id), i.e. tagging it like this
<a href='#image-id'>go to image on p2</a>
doesn't work.
I get the feeling I'm missing something very obvious, but can't figure it out.
Any suggestions? Or is this not possible?
I've got a different problem but found this question in my travels, so I thought I would add an extract from the jQuery Mobile documentation:
http://demos.jquerymobile.com/1.4.5/navigation-linking-pages/
Note: You cannot link to a multipage document with Ajax navigation active because the framework will only load the first page it finds, not the full set of internal pages. In these cases, you must link without Ajax (see next section) for a full page refresh to prevent potential hash collisions. There is currently a subpage plugin that makes it possible to load in multi-page documents.

How is this URL modification possible?

Could anyone please explain how the site http://www.outsharked.com/imagemapster/default.aspx?what.html works the way it does? It modifies the URL without loading or reloading the page. I don't think this is done with HTML5, because it works in IE6, which doesn't support HTML5.
I created that site. The commenter is correct, it uses Javascript to change the URL. There's nothing about how that navigation works that is different for IE6 - that browser supports the necessary client-side functionality to do this kind of thing. The basic functionality involves:
capturing click events on the nav and loading the inner content via AJAX
updating the URL to reflect a working direct URL to the target.
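The two steps above could be sketched roughly as follows. This is not the site's actual code: the function names, the injected loadContent callback, and the use of pushState (the HTML5 History API) are assumptions made for illustration, and the loader and history objects are passed in so the logic can be run outside a browser.

```javascript
// Hypothetical helper: the query string names the inner page,
// e.g. "default.aspx?faq.html" -> "faq.html".
function innerPageFromHref(href, base) {
  return new URL(href, base).search.slice(1); // drop the leading "?"
}

// Sketch of a nav click handler. `loadContent` and `history` are assumed
// stand-ins for an AJAX loader and window.history.
function handleNavClick(href, base, loadContent, history) {
  const page = innerPageFromHref(href, base);
  loadContent(page);                     // fetch the inner content via AJAX
  history.pushState({ page }, "", href); // keep a working direct URL in the address bar
  return page;
}

// Exercise the handler with fakes that just record what happened.
const loaded = [];
const pushed = [];
const fakeHistory = { pushState: (state, title, url) => pushed.push(url) };
handleNavClick(
  "default.aspx?faq.html",
  "http://www.outsharked.com/imagemapster/default.aspx?what.html",
  (page) => loaded.push(page),
  fakeHistory
);
console.log(loaded, pushed);
```

In a real page the handler would be attached to the nav links' click events and would also cancel the default navigation, so that the link still works as a normal anchor when JavaScript is unavailable.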
The links are also valid anchor links that, in the absence of Javascript, would go to the same page (but load the whole thing). This is your basic AJAX web site setup with one minor difference. It's common practice to use URLs like this in AJAX/single page web sites:
http://mysite.com/home#somepage
or even just
http://mysite.com/#somepage
Where the hashtag part represents the actual page a user has navigated to. If someone accessed that url directly, e.g. from outside the site, the site would use Javascript to load the correct content based on the hashtag, after the page had loaded. This means that there might be a little delay for the inner content to reflect the correct page, since it has to run another request after the initial page has loaded from the browser to get the inner content via AJAX.
I was trying to avoid that by creating a setup that worked completely with and without Javascript. If you go directly to a URL within the site such as http://www.outsharked.com/imagemapster/default.aspx?faq.html you will notice it loads the content directly. This URL will work even if Javascript is disabled. You can't actually do this using hashtags, since hashtag content is not sent to the server. Only the client knows what's after the hashtag in a URL. That's why I was using query strings to represent inner pages.
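To make the difference concrete, the small sketch below shows which part of an address actually reaches the server. The helper name requestTarget is made up for this illustration; it relies only on the standard URL class.

```javascript
// What the server receives for a given address: path plus query string.
// The fragment (everything from "#" onward) stays in the browser.
function requestTarget(address) {
  const url = new URL(address);
  return url.pathname + url.search;
}

console.log(requestTarget("http://mysite.com/home#somepage"));
// prints "/home"
console.log(requestTarget("http://www.outsharked.com/imagemapster/default.aspx?faq.html"));
// prints "/imagemapster/default.aspx?faq.html"
```

With the hash form, the server cannot tell which inner page was requested; with the query string form, it can render that page directly.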
This site architecture was sort of an experiment at the time. It works pretty well but the code isn't fantastic, I didn't really do anything else with it, and I'm sure there are other better-fleshed-out/tested/full-featured frameworks out there to do much the same thing.
But it might not be a bad example of the nuts and bolts of creating a basic AJAX navigation setup, as a learning tool, since it's pretty concise, and also does HTML5 history navigation (e.g. so the back button works on modern browsers).

Stop part of page being indexed by search engines?

How can I stop search engines indexing part of my page? Is there an HTML5 element for this?
It's just a line of text that I want to hide (a co-worker doesn't want their name on Google for some reason). I'm thinking that I could inject the text with JavaScript, but I have heard Google does sometimes look inside JavaScript files.
I also thought of using images instead of text, but I'm concerned about how this will look across devices and platforms. I've noted text rendering can differ on Mac and PC, and that's before I've had to think about mobile devices, retina displays, etc.
Thanks
You can't hide content unless you use the methods you've already outlined above. Your best bet is to use JavaScript in an external file and then block that file using robots.txt.
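A minimal sketch of that approach is below. The file path /js/staff-name.js, the element id, and the name are placeholders invented for this example. Blocking the file relies on crawlers honoring robots.txt, which is a convention rather than a guarantee.

```javascript
// Contents of a hypothetical external file, e.g. /js/staff-name.js.
// A robots.txt rule such as "Disallow: /js/staff-name.js" would then ask
// crawlers not to fetch it.
function injectName(doc, elementId, name) {
  const target = doc.getElementById(elementId);
  if (target) {
    target.textContent = name; // the name never appears in the HTML source
  }
  return Boolean(target);
}

// In a browser, run against the real document; the id and name are placeholders.
if (typeof document !== "undefined") {
  injectName(document, "staff-name", "Jane Doe");
}
```

The page itself would contain only an empty element with the matching id, so the text exists solely in the blocked script.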

iOS HTML Rewrite?

I've not written an iOS app and want to know if what I want to do is reasonably easy before I invest all my time in it. The idea is simply to leverage the built-in webkit methods to write my own browser. I've seen tutorials where this is done fairly easily. However, the twist is I want to apply some rewrite/regex rules prior to the page rendering, i.e., you load http://example.com, which is a page containing the word 'foo'. Prior to displaying the page, the app rewrites 'foo' to 'bar' and renders.
Is this possible to do easily without actually writing a ground-up browser?
Thanks!
It's doable (assuming you're using the standard UIWebView component to render the page), and there are a few ways you could go about it. Among them:
You could download the HTML and parse it via Objective-C string handlers before loading it into the UIWebView
You could load the HTML as-is and use the UIWebView's stringByEvaluatingJavaScriptFromString: message to "inject" JavaScript onto the page, manipulating the DOM itself
You could go the Opera route, and pre-render the page via a server-side proxy before downloading it to the client.
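As a rough sketch of the rewrite step itself, the function below replaces 'foo' with 'bar' in the text of an HTML string while leaving tag markup alone. It is written in JavaScript for illustration only, and the function name rewriteVisibleText is hypothetical. Whether it runs on the downloaded string or inside an injected script is left open. It assumes simple markup where "<...>" reliably delimits tags, and it does not treat script or style contents specially.

```javascript
// Sketch: apply a text rewrite to an HTML string without touching tag markup.
// Each match is either a whole tag (left as-is) or a run of text (rewritten).
function rewriteVisibleText(html, pattern, replacement) {
  return html.replace(/<[^>]*>|[^<]+/g, (chunk) =>
    chunk.startsWith("<") ? chunk : chunk.replace(pattern, replacement)
  );
}

const page = '<p class="foo">foo walks into a <b>foo</b> bar</p>';

// The class attribute keeps its "foo"; only the text between tags changes.
console.log(rewriteVisibleText(page, /foo/g, "bar"));
```

A plain find-and-replace over the whole string would also rewrite attribute values such as class="foo", which is the main reason for splitting tags from text here.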
How far down the rabbit hole you want to go would be up to you, of course. Easy is in the eye of the beholder.

Hard refreshes and SEO

I couldn't find an answer to this on the web. I have a site where I try to avoid hard refreshes as much as possible. It's a sequence of photos, and when a user clicks a central div, a little page (a RoR partial) loads within that div with a new photo in it.
The user keeps clicking, the photo keeps changing, and the URL of the page never changes. The title of the photo does change though. And so I want the web crawlers to see this...
Is there any advantage to having a hard refresh or not in this scenario? Will the Web Crawlers see the title of the photo in the div, and index my home page? Or at least the url of the inner div?
I hope this makes sense! Thanks!
It all depends on what you mean by a hard refresh. If all of the pictures, and their related data (title etc.) are loaded when the page first loads, and the click is just a javascript event that changes the css a bit to display the next picture then that has no negative effect on SEO. If clicking that link makes an ajax request back to your server to retrieve the image, then it will never get picked up by the search engine web crawler, and will not contribute to SEO.
If you aren't sure if this click is an ajax request, or just a css change, you can look at your html source to figure it out. If all your image tags are in your html source then it's not making an ajax request. If you only see one (or zero) then it is making an ajax request.
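The check described above can be approximated with a few lines of code run against the page source. This is a rough heuristic rather than a real HTML parser, and the function name countImgTags is made up for the example.

```javascript
// Count <img> tags in an HTML source string (a rough heuristic, not a parser).
function countImgTags(html) {
  const matches = html.match(/<img\b/gi);
  return matches ? matches.length : 0;
}

const source = '<div><img src="a.jpg"><img src="b.jpg"></div>';
console.log(countImgTags(source)); // prints 2
```

If the count matches the number of photos in the sequence, the images are already in the initial HTML; a count of zero or one suggests they are fetched later.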
If the page title would never change, then there's no benefit. But if you're loading a new image, the page title should change for optimal SEO.
There's a workaround, though. Just make each image accessible on its own static page and make sure Google spiders it. You can keep the normal page flow as-is using this method.
Edit: I should add that I had a site that got 60% of its traffic from Google Image searches, so I'd say you'd definitely want them indexed separately.
