Expanding a website - providing different content across different regions - localization

I am working on a website that currently targets users from a specific geographic region. Now I would like to expand its user base to another region. The need is to serve different content to each region while keeping the same base functionality.
My initial thought (I might sound like a noob here) is to host the region-specific content in different databases, redirect users to region-specific domains, and thus map users geographically. Do suggest whether this is the right way to proceed.
Also, I would like to know whether there is a need to localize my website for these regions (the current language is English).
Please post your experiences in such scenarios, and your ideas for making the transition.
Thanks in advance.

How do you see users being matched to their specific regional content?
Will they be presented with an option to choose?
Will you use geo functions to determine location?
Will you use server based reverse DNS lookup to determine location?
Will each region get its own "entry" URL (aka different domains)?
The first three are fraught with their own specific problems...
Presenting a choice/menu is considered bad form because it adds to the number of "clicks" necessary for a user to get to the content they actually came for.
While geo functions are widely supported in all modern browsers, they are still seen as a privacy issue, and a large number of users will not "allow" the functionality, meaning you'll have to fall back to a choice/menu approach anyway.
Server-based reverse DNS, while a common practice, is very unreliable because many users use VPNs, proxies, Tor, etc. specifically to mask their actual location from this kind of lookup.
Personally, my experience has been to use completely separate entry URLs that are all hosted as virtual domains on a single web server. This gives you a wide array of methods for determining which entry URL was used to access your code, so you can format/customize the content appropriately.
There is really no need to set up separate servers and/or databases to handle these different domains/regions.
With that said, even if the language is common across regions, it is a very good habit to configure your servers and databases to support UTF-8 end-to-end, so that if any language-specific options need to be supported in the future, you won't need to change your code to do so. This is especially true if your site will capture any user-generated input.
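As a rough illustration of that single-server approach, here is a minimal Node.js sketch that branches on the Host header; the domain names and region keys are made up for the example, and a real site would plug this logic into whatever framework it already uses.

```javascript
// Minimal Node.js sketch of the single-server, multiple-entry-URL approach:
// branch on the Host header and vary only the content. The domain names and
// region keys are made up for illustration.
const http = require('http');

// Hypothetical mapping from entry domain to region-specific configuration.
const regions = {
  'www.example.com':   { region: 'us', tagline: 'Welcome, US visitors!' },
  'www.example.co.uk': { region: 'uk', tagline: 'Welcome, UK visitors!' },
};

http.createServer((req, res) => {
  const host = (req.headers.host || '').split(':')[0];        // strip any port
  const config = regions[host] || regions['www.example.com']; // default region

  // Same base functionality for everyone; only the content differs by region.
  res.writeHead(200, { 'Content-Type': 'text/html; charset=utf-8' }); // UTF-8 end-to-end
  res.end('<h1>' + config.tagline + '</h1><p>Region: ' + config.region + '</p>');
}).listen(8080);
```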

Related

How can I count unique views

I have a tricky problem to solve. I need to be able to track how many unique views a URL (which includes a promo code) receives: www.mysite.com/hello/PROMO
The issues are:
Since I pay for views based on promo code, I want to make sure people can't fake views.
It's difficult for me to count views from the same IP as fraudulent, because a lot of my users are on college campuses (i.e. behind the same NAT, often using one of a handful of IP addresses).
Any ideas on how I can have a robust view-tracking system, given my concern about repeated IPs among my users?
FYI - I am using Ruby on Rails.
Edit - My site uses Google Analytics, so I'm wondering how accurate GA is at determining unique visitors to a specific URL (and whether the API is capable of giving me that result).

How do you decide how much data to push to the user in Single Page Applications?

Say you have a Recipe Manager application that you're building with a Web Api project. Do you send the list of recipes along with their ingredient names in JSON? Or do you send the recipes, ingredient names, and ingredient details? What's the process in determining how big the initial payload should be for a SPA?
These are the determining factors in how much to send to the client in an initial page:
Data that will be displayed for that first page
Lookup list data for any drop downs on that page
Data that is required for presentation rules (it might not be displayed, but it is used)
On a recipe page that shows a list of recipes, I would get the recipes and a few key fields (like the recipe name, the dish, and other key info) that can be displayed in a list - enough for the user to decide what to pick. Then when the user dives into a recipe, go get that one recipe's details.
The general rule is: get what your user will almost certainly need up front. Then get other data as they request it.
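To make that concrete, here is a hypothetical sketch of the two payload shapes; the endpoint paths and field names are assumptions for illustration, not anything prescribed by Web API.

```javascript
// Hypothetical payload shapes (the field names are assumptions): a lightweight
// list for the first page, and a separate detail request when the user drills in.

// GET /api/recipes -> just enough to render the list and let the user choose
const recipeList = [
  { id: 1, name: 'Pad Thai',  dish: 'Noodles', prepMinutes: 30 },
  { id: 2, name: 'Carbonara', dish: 'Pasta',   prepMinutes: 25 },
];

// GET /api/recipes/1 -> full details, fetched only when the user asks for them
const recipeDetail = {
  id: 1,
  name: 'Pad Thai',
  ingredients: [
    { name: 'Rice noodles',   quantity: '200 g' },
    { name: 'Tamarind paste', quantity: '2 tbsp' },
  ],
  steps: ['Soak the noodles', 'Make the sauce', 'Stir-fry and serve'],
};
```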
The process by which you determine how much data to send depends solely on the experience you want to provide your users - but it's as simple as this: if my experience demands that I readily display all of the recipes with a brief description and then let the user drill into a recipe for more information, then I'm only going to send enough information to produce that display and navigate further into the entity.
If, after navigating into the recipe, you then need to display the ingredient names and measures, send that down along with enough information to navigate further into any single ingredient.
And as you can see, it just goes on and on.
It depends whether your application is just a simple HTTP API backing your web page, or whether your goal is something more akin to Platform as a Service. One driver for the adoption of SPAs is that they make the browser just another client, like an iOS or Android app, or a third party.
If you want to support multiple clients, then you likely want to design your APIs around the resources you are trying to expose, so that you can use the uniform interface of GET/POST/PUT etc. against each resource. This makes it much more likely that you are not coding in a client-specific style and that your API will be usable by a wide range of clients.
A resource is anything you would want to have its own URN.
I would suggest that in this case you would likely want a Recipe Book resource which links to individual Recipe resources, each of which probably contains all the information necessary for that Recipe. Ingredients would only become a separate resource if there were enough depth to what an Ingredient contains to warrant it.
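As a loose sketch of what that might look like (the URLs and fields below are assumptions, not a prescribed format):

```javascript
// One possible shape: the Recipe Book resource links to individual Recipe
// resources, and each Recipe inlines its ingredients rather than exposing them
// as separate resources.
const recipeBook = {
  name: 'My Recipe Book',
  recipes: [
    { name: 'Pad Thai',  href: '/recipes/1' },
    { name: 'Carbonara', href: '/recipes/2' },
  ],
};

// GET /recipes/1 would then return everything needed to display that Recipe.
```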
At Huddle we use a Documentation Driven Design approach. That is, we write the documentation for our API up front so that we can understand how usable it would be. You can measure API quality in WTFs. http://code.google.com/p/huddle-apis/
Now this logical division might not be optimal in terms of performance. You're dealing with a classic trade-off here (ultimately, architecture is all about balancing design trade-offs) between the usability of your API and its performance. Usually, don't favour performance until you know it is an issue, because you will pay a penalty in usability or maintainability for early optimization.
Another possibility is to implement the OData query support for WebAPI. http://www.asp.net/web-api/overview/odata-support-in-aspnet-web-api
That way, your clients can perform their own queries to return only the data they need.
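For example, with OData query support enabled, clients can use the standard system query options such as $select, $filter, $top and $expand to trim the response to what they need; the entity and property names below are assumptions for illustration.

```
GET /api/recipes?$select=Name,Dish&$top=20
GET /api/recipes?$filter=Dish eq 'Pasta'&$orderby=Name
GET /api/recipes(1)?$expand=Ingredients
```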

How would I go about routing visitors based on location in rails?

I'm working on a Rails app that needs to route users to a specific URL based on their location - preferably something that will present them with the appropriate content for their location, while still giving them the ability to view content for other locations.
Specifically, think of the location interface for Craigslist... Users are presented content from the city they are in and still allowed to select and view another city.
I've seen a few posts that answer parts of this question, but I'm trying to plan out the best solution.
It looks like there is going to need to be something, probably cookie based, that sets a 'default' location for a given user and still allows them to select other locations.
Again, just looking for concept/planning assistance and any direction on any gems that might be applicable.
Thanks in advance!
http://dev.maxmind.com/geoip/geolite is a free GeoIP database that works pretty well. It makes some mistakes (it put a client's office of mine in Kirkland, WA when they are in fact in downtown Seattle, WA), but it's certainly good enough for Craigslist-level specificity, since you'd be re-routing both of those to "seattle" anyway. There's a Ruby gem for it as well - "geoip-c" - and it's very easy to use.
The other option would be to use HTML5's "gimme your location" functionality. More intrusive for the user, but might be more specific.
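If you go that route, the browser Geolocation API looks roughly like the sketch below. The user has to grant permission, so always handle the failure case; the /locate endpoint that maps coordinates to a city page is hypothetical.

```javascript
// Sketch of the HTML5 Geolocation API approach.
if ('geolocation' in navigator) {
  navigator.geolocation.getCurrentPosition(
    (position) => {
      const { latitude, longitude } = position.coords;
      // Hypothetical endpoint that maps coordinates to a city/region page.
      window.location.href = '/locate?lat=' + latitude + '&lng=' + longitude;
    },
    (error) => {
      // Permission denied or unavailable: fall back to GeoIP or a manual city picker.
      console.warn('Geolocation unavailable:', error.message);
    }
  );
}
```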

Prevent bot from crawling certain areas of site

I don't know much about SEO or how web spiders work, so forgive my ignorance here. I'm creating a site (using ASP.NET-MVC) which has areas that display information retrieved from the database. The data is unique to the user, so there's no real server-side output caching going on. However, since the data can contain things the user may not wish to have exposed in search engine results, I'd like to prevent any spiders from accessing the search results page. Are there any special actions I should take to ensure that the search result directory isn't crawled? Also, would a spider even crawl a page that's dynamically generated, and would any actions preventing certain directories from being crawled mess up my search engine rankings?
edit: I should add that I'm reading up on the robots.txt protocol, but it relies on cooperation from the web crawler. I'd also like to keep out data-mining users who would ignore the robots.txt file.
I appreciate any help!
You can prevent some malicious clients from hitting your server too heavily by implementing throttling on the server. "Sorry, your IP has made too many requests to this server in the past few minutes. Please try again later." In practice, though, assume that you can't stop a truly malicious user from bypassing any throttling mechanisms that you put in place.
Given that, here's the more important question:
Are you comfortable with the information that you're making available for all the world to see? Are your users comfortable with this?
If the answer to those questions is no, then you should be ensuring that only authorized users are able to see the sensitive information. If the information isn't particularly sensitive but you don't want clients crawling it, throttling is probably a good alternative. Is it even likely that you're going to be crawled anyway? If not, robots.txt should be just fine.
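As a rough sketch of that throttling idea, here is a bare Node.js server with an in-memory per-IP counter. The window and limit are arbitrary illustration values; a real setup would more likely lean on the web server or existing middleware for this.

```javascript
// Per-IP throttling with an in-memory counter (illustration only).
const http = require('http');

const WINDOW_MS = 60 * 1000; // 1-minute window
const MAX_REQUESTS = 60;     // requests allowed per IP per window
const hits = new Map();      // ip -> { count, windowStart }

http.createServer((req, res) => {
  const ip = req.socket.remoteAddress;
  const now = Date.now();
  const entry = hits.get(ip);

  if (!entry || now - entry.windowStart > WINDOW_MS) {
    hits.set(ip, { count: 1, windowStart: now }); // new window for this IP
  } else if (++entry.count > MAX_REQUESTS) {
    res.writeHead(429, { 'Retry-After': '60' });
    return res.end('Sorry, your IP has made too many requests to this server. Please try again later.');
  }

  res.end('OK'); // normal request handling would go here
}).listen(8080);
```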
It seems like you have two issues.
The first is a concern about certain data appearing in search results; the second is about malicious or unscrupulous users harvesting user-related data.
The first issue will be covered by appropriate use of a robots.txt file as all the big search engines honour this.
The second issue seems more to do with data privacy. The first question which immediately springs to mind is: If there is user information which people may not want displayed, why are you making it available at all?
What is the privacy policy for such data?
Do users have the ability to control what information is made available?
If the information is potentially sensitive but important to the system could it be restricted so it is only available to logged in users?
Check out the Robots exclusion standard. It's a text file that you put on your site that tells a bot what it can and can't index. You will also want to address what happens if a bot doesn't honour the robots.txt file.
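A minimal robots.txt might look like the following, assuming the search results live under a /search/ path (adjust the path to whatever your routes actually are); remember that this only works for bots that choose to honour it.

```
# robots.txt at the site root; /search/ is an assumption -
# substitute whatever directory serves your search results.
User-agent: *
Disallow: /search/
```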
Use a robots.txt file, as mentioned. If that is not enough, then you can:
Block unknown user agents - hard to maintain, and easy for a bot to forge a browser's user agent (although most legitimate bots won't)
Block unknown IP addresses - not useful for a public site
Require logins
Throttle user connections - tricky to tune, and you will still be disclosing information
Perhaps use a combination. Either way it is a trade-off: if the public can browse to it, so can a bot. Be sure you don't block and alienate people in your attempts to block bots.
a few options:
force the user to login to view the content
add a CAPTCHA page before the content
embed content in Flash
load content dynamically with JavaScript (see the sketch below)
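For that last option, here is a minimal sketch of loading the sensitive content after page load, so it never appears in the initial HTML that simple crawlers fetch. The endpoint name and element id are hypothetical, and a determined scraper can still call the endpoint directly.

```javascript
// Fetch the protected content after page load and inject it into the page.
fetch('/api/protected-content', { credentials: 'same-origin' })
  .then((response) => response.text())
  .then((html) => {
    document.getElementById('content').innerHTML = html;
  })
  .catch(() => {
    document.getElementById('content').textContent = 'Unable to load content.';
  });
```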

Why would Google Search use client-side URL parameters?

Yesterday morning I noticed Google Search was using hash parameters:
http://www.google.com/#q=Client-side+URL+parameters
which seems to be the same as the more usual search (with search?q=Client-side+URL+parameters). (It seems they are no longer using it by default when doing a search using their form.)
Why would they do that?
More generally, I see hash parameters cropping up on a lot of web sites. Is it a good thing? Is it a hack? Is it a departure from REST principles? I'm wondering if I should use this technique in web applications, and when.
There's a discussion by the W3C of different use cases, but I don't see which one would apply to the example above. They also seem undecided about recommendations.
Google has many live experimental features that are turned on/off based on your preferences, location and other factors (probably random selection as well.) I'm pretty sure the one you mention is one of those as well.
What happens in the background when a hash is used instead of a query string parameter is that the page queries the "real" URL (http://www.google.com/search?q=hello) using JavaScript, then modifies the existing page with the content. This appears much more responsive to the user because the page does not have to reload entirely. The reason for the hash is so that browser history and state are maintained. If you go to http://www.google.com/#q=hello you'll find that you actually get the search results for "hello" (even though your browser is really only requesting http://www.google.com/). With JavaScript turned off, however, it wouldn't work, and you'd just get the Google front page.
Hashes are appearing more and more as dynamic web sites are becoming the norm. Hashes are maintained entirely on the client and therefore do not incur a server request when changed. This makes them excellent candidates for maintaining unique addresses to different states of the web application, while still being on the exact same page.
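A rough sketch of that pattern (this is not Google's actual code, just an illustration of hash-driven loading against a conventional server-side URL; it assumes an element with id="results"):

```javascript
// Read the query from the hash, fetch the "real" URL in the background,
// and patch the page in place.
function loadFromHash() {
  const match = window.location.hash.match(/^#q=(.+)$/);
  if (!match) return;

  const query = decodeURIComponent(match[1]);
  fetch('/search?q=' + encodeURIComponent(query))
    .then((response) => response.text())
    .then((html) => {
      // Only the results area is replaced; the rest of the page never reloads.
      document.getElementById('results').innerHTML = html;
    });
}

window.addEventListener('hashchange', loadFromHash); // back/forward keep working
loadFromHash(); // handle a hash that is already present on page load
```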
I have been using them myself more and more lately, and you can find one example here: http://blixt.org/js -- If you have a look at the "Hash" library on that page, you'll see my implementation of supporting hashes across browsers.
Here's a little guide for using hashes for storing state:
How?
Maintaining state in hashes implies that your application (I'll call it application since you generally only use hashes for state in more advanced web solutions) relies on JavaScript. Without JavaScript, the only function of hashes would be to tell the browser to find content somewhere on the page.
Once you have implemented some JavaScript to detect changes to the hash, the next step would be to parse the hash into meaningful data (just as you would with query string parameters.)
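A minimal sketch of that detection and parsing step, assuming a simple key=value&key=value convention inside the hash:

```javascript
// Parse "#tab=reviews&page=2" into { tab: 'reviews', page: '2' }, then react
// whenever the hash (and therefore the application state) changes.
function parseHash() {
  const state = {};
  const hash = window.location.hash.replace(/^#\??/, ''); // drop a leading "#" or "#?"
  for (const pair of hash.split('&')) {
    if (!pair) continue;
    const [key, value = ''] = pair.split('=');
    state[decodeURIComponent(key)] = decodeURIComponent(value);
  }
  return state;
}

window.addEventListener('hashchange', () => {
  const state = parseHash();
  // e.g. switch the active tab, load a different item, etc.
  console.log('New application state:', state);
});
```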
Why?
Once you've got the state in the hash, it can be modified by your code (or your user) to represent the current state in your application. There are many reasons for why you would want to do this.
One common case is when only a small part of a page changes based on a variable, and it would be inefficient to reload the entire page to reflect that change (Example: You've got a box with tabs. The active tab can be identified in the hash.)
Other cases are when you load content dynamically in JavaScript, and you want to tell the client what content to load (Example: http://beta.multifarce.com/#?state=7001, will take you to a specific point in the text adventure.)
When?
If you have a look at my "JavaScript realm" you'll see a borderline overkill case. I did it simply because I wanted to cram as much JavaScript dynamics into that page as possible. In a normal project I would be conservative about this, and only do it when you will see positive changes in one or more of the following areas:
User interactivity (usually the user won't see much difference, but the URLs can be confusing; and remember loading indicators - loading content dynamically can be frustrating to the user if it takes time)
Responsiveness (time from one state to another)
Performance (bandwidth, server CPU)
No JavaScript?
Here comes a big deterrent. While you can safely rely on 99% of your users to have a browser capable of using your page with hashes for state, there are still many cases where you simply can't rely on this. Search engine crawlers, for example. While Google is constantly working to make their crawler work with the latest web technologies (did you know that they index Flash applications?), it still isn't a person and can't make sense of some things.
Basically, you're at a crossroads between compatibility and user experience.
But you can always build a road in between, which of course requires more work. In less metaphorical terms: implement both solutions, so that there is a server-side URL for every client-side URL that outputs relevant content. Compatible clients are then redirected to the hash URL. This way, Google can index the "hard" URLs, and when users click them, they get the dynamic state stuff!
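One minimal way to sketch that bridge: the server renders full content at the plain URL so crawlers can index it, while a small script on that page moves JavaScript-capable browsers over to the hash-based equivalent. The /search and #q= conventions below are just illustrative.

```javascript
// Included on the server-rendered page at /search?q=hello: redirect capable
// browsers to the hash-based equivalent so they get the dynamic behaviour.
(function () {
  const params = new URLSearchParams(window.location.search);
  const query = params.get('q');
  if (query && !window.location.hash) {
    // location.replace avoids leaving the "hard" URL in the history stack.
    window.location.replace('/#q=' + encodeURIComponent(query));
  }
})();
```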
Recently Google also stopped serving direct links in search results, offering redirects instead.
I believe both have to do with gathering usage statistics: what searches were performed by the same user, in what sequence, which of the search results the user followed, etc.
P.S. Now, that's interesting - direct links are back. I definitely remember seeing only redirects there over the last couple of weeks. They are clearly experimenting with something.
