Yesterday morning I noticed Google Search was using hash parameters:
http://www.google.com/#q=Client-side+URL+parameters
which seems to give the same results as the more usual search URL (search?q=Client-side+URL+parameters). (It seems they are no longer using it by default when searching via their form.)
Why would they do that?
More generally, I see hash parameters cropping up on a lot of web sites. Is it a good thing? Is it a hack? Is it a departure from REST principles? I'm wondering if I should use this technique in web applications, and when.
There's a discussion by the W3C of different use cases, but I don't see which one would apply to the example above. They also seem undecided about recommendations.
Google has many live experimental features that are turned on/off based on your preferences, location and other factors (probably random selection as well.) I'm pretty sure the one you mention is one of those as well.
What happens in the background when a hash is used instead of a query string parameter is that the page queries the "real" URL (http://www.google.com/search?q=hello) using JavaScript and then modifies the existing page with the content. This appears much more responsive to the user, since the page does not have to reload entirely. The reason for the hash is that browser history and state are maintained. If you go to http://www.google.com/#q=hello you'll find that you actually get the search results for "hello" (even though your browser is really only requesting http://www.google.com/). With JavaScript turned off, however, it wouldn't work, and you'd just get the Google front page.
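A minimal sketch of that pattern (the element ID and the use of fetch are my own illustration, not Google's actual code): listen for hash changes, fetch the "real" results page in the background, and swap the results into the current page.

```javascript
// Hypothetical sketch, not Google's actual code. '#results' is an assumed
// container element on the page.
function loadResultsFromHash() {
  var query = decodeURIComponent(window.location.hash.replace(/^#q=/, ''));
  if (!query) return;
  // Fetch the "real" results page in the background...
  fetch('/search?q=' + encodeURIComponent(query))
    .then(function (response) { return response.text(); })
    .then(function (html) {
      // ...and swap it into the current page instead of reloading everything.
      document.getElementById('results').innerHTML = html;
    });
}

window.addEventListener('hashchange', loadResultsFromHash);
loadResultsFromHash(); // handle a hash that is present on the initial load
```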
Hashes are appearing more and more as dynamic web sites are becoming the norm. Hashes are maintained entirely on the client and therefore do not incur a server request when changed. This makes them excellent candidates for maintaining unique addresses to different states of the web application, while still being on the exact same page.
I have been using them myself more and more lately, and you can find one example here: http://blixt.org/js -- If you have a look at the "Hash" library on that page, you'll see my implementation of supporting hashes across browsers.
Here's a little guide for using hashes for storing state:
How?
Maintaining state in hashes implies that your application (I'll call it application since you generally only use hashes for state in more advanced web solutions) relies on JavaScript. Without JavaScript, the only function of hashes would be to tell the browser to find content somewhere on the page.
Once you have implemented some JavaScript to detect changes to the hash, the next step would be to parse the hash into meaningful data (just as you would with query string parameters.)
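A hedged sketch of that step (the hash format shown is just one possible convention):

```javascript
// Parse a hash like "#?state=7001&tab=info" (or "#a=1&b=2") into an object,
// just as you would parse query string parameters.
function parseHash() {
  var raw = window.location.hash.replace(/^#\??/, '');
  var state = {};
  raw.split('&').forEach(function (pair) {
    if (!pair) return;
    var parts = pair.split('=');
    state[decodeURIComponent(parts[0])] = decodeURIComponent(parts[1] || '');
  });
  return state;
}

// Re-parse whenever the hash changes. Modern browsers fire "hashchange";
// older browsers needed polling, which is what cross-browser hash libraries
// (like the one mentioned above) take care of.
window.addEventListener('hashchange', function () {
  var state = parseHash();
  console.log('New state:', state);
});
```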
Why?
Once you've got the state in the hash, it can be modified by your code (or your user) to represent the current state in your application. There are many reasons for why you would want to do this.
One common case is when only a small part of a page changes based on a variable, and it would be inefficient to reload the entire page to reflect that change (Example: You've got a box with tabs. The active tab can be identified in the hash.)
Other cases are when you load content dynamically in JavaScript, and you want to tell the client what content to load (Example: http://beta.multifarce.com/#?state=7001 will take you to a specific point in the text adventure.)
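For instance, a tab box driven by the hash might look roughly like this (the element IDs, class names and the "#tab=..." format are all assumptions for illustration):

```javascript
// Hypothetical tab box: panels have ids "tab-one", "tab-two", ... and the
// active tab is stored in the hash as "#tab=two", so it survives reloads and
// can be linked to directly.
function showTabFromHash() {
  var match = window.location.hash.match(/tab=([\w-]+)/);
  var active = match ? match[1] : 'one'; // default tab
  document.querySelectorAll('.tab-panel').forEach(function (panel) {
    panel.style.display = (panel.id === 'tab-' + active) ? 'block' : 'none';
  });
}

window.addEventListener('hashchange', showTabFromHash);
showTabFromHash();

// Switching tabs is then just a link that changes the hash, e.g.
// <a href="#tab=two">Second tab</a>, and no page reload happens.
```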
When?
If you have a look at my "JavaScript realm" you'll see a borderline overkill case. I did it simply because I wanted to cram as much JavaScript dynamism into that page as possible. In a normal project I would be conservative about when to do this, and only do it when it brings positive changes in one or more of the following areas:
User interactivity. Usually the user won't see much difference, but the URLs can be confusing. Remember loading indicators! Loading content dynamically can be frustrating to the user if it takes time.
Responsiveness (time from one state to another)
Performance (bandwidth, server CPU)
No JavaScript?
Here comes a big deterrent. While you can safely rely on 99% of your users to have a browser capable of using your page with hashes for state, there are still many cases where you simply can't rely on this. Search engine crawlers, for example. While Google is constantly working to make their crawler work with the latest web technologies (did you know that they index Flash applications?), it still isn't a person and can't make sense of some things.
Basically, you're at a crossroads between compatibility and user experience.
But you can always build a road in between, which of course requires more work. In less metaphorical terms: implement both solutions, so that there is a server-side URL for every client-side URL that outputs relevant content. For compatible clients it would redirect them to the hash URL. This way, Google can index the "hard" URLs, and when users click them, they get the dynamic state stuff!
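One possible shape of that middle road, sketched under the assumption that the server-rendered page lives at /search?q=... and the client-side version at /#q=...: a tiny script on the "hard" URL hands JavaScript-capable clients over to the hash URL, while crawlers and JS-less browsers keep the server-rendered content.

```javascript
// Included on the server-rendered page at /search?q=... (paths and parameter
// names are illustrative). Crawlers and JS-less browsers never run this and
// simply keep the server-rendered content.
(function () {
  var match = window.location.search.match(/[?&]q=([^&]*)/);
  if (match) {
    // location.replace avoids leaving the intermediate URL in history.
    window.location.replace('/#q=' + match[1]);
  }
})();
```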
Recently Google also stopped serving direct links in search results, offering redirects instead.
I believe both have to do with gathering usage statistics: which searches were performed by the same user, in what sequence, which of the search results the user followed, etc.
P.S. Now, that's interesting, the direct links are back. I absolutely remember seeing only redirects there in the last couple of weeks. They are definitely experimenting with something.
I am working with a Domestic Violence support organisation to build a website and have been asked to provide a "Quick Exit" function.
The purpose is to enable the user to exit the site quickly without closing the browser. I have seen such buttons on similar sites and the normal scenario is that they simply cause a Google search page to be shown. (easy but doesn't hide history)
I am looking for ideas to improve on this function to hide/disguise the history stored in the browser as this is currently a fairly significant flaw with the Quick Exit buttons I've seen to date.
I had a concept but I am looking for input on either fleshing out my concept, or other alternative directions to consider.
My concept was to have two domains: let's call them dv-site.com and decoy-site.com. The former being the source of domestic violence support information and the latter being some random content; it could be anything, let's just say weather information for the sake of the conversation.
If a user navigates directly to dv-site.com, the server redirects to decoy-site.com but also attaches some session-specific, or perhaps single-use, query string or similar.
decoy-site.com validates the query string and, if valid, loads dv-site.com within an iframe or something like that, so from the user's perspective they are just looking at dv-site.com, though the domain recorded in history is decoy-site.com.
Links within the iframe loaded site would similarly be redirected with the same or a new query string.
If a user were to click on the browser history and go directly to decoy-site.com, it would not be able to validate the query string and would just load the decoy site like a normal site, i.e. just showing the weather information that exists on that site.
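A very rough sketch of that token hand-off, purely to make the idea concrete (Node/Express, the routes and the token scheme are all my assumptions, and the two apps would of course run on separate servers with a shared, expiring token store):

```javascript
// Illustrative only: Node/Express, made-up routes, and an in-memory token
// store. In reality the two apps run on different servers, so the token store
// would have to be shared (and tokens should expire quickly).
const express = require('express');
const crypto = require('crypto');

const tokens = new Set();

// dv-site.com: issue a single-use token and bounce the visitor to the decoy.
const dvApp = express();
dvApp.get('/', (req, res) => {
  const token = crypto.randomBytes(16).toString('hex');
  tokens.add(token);
  res.redirect(302, 'https://decoy-site.com/?t=' + token);
});

// decoy-site.com: show the support content only if the token checks out,
// otherwise behave like an ordinary weather site.
const decoyApp = express();
decoyApp.get('/', (req, res) => {
  const token = req.query.t;
  if (token && tokens.delete(token)) { // delete on first use: single use
    res.send('<iframe src="https://dv-site.com/content" ' +
             'style="border:0;width:100%;height:100vh"></iframe>');
  } else {
    res.send('<h1>Weather</h1><p>Ordinary decoy content.</p>');
  }
});
```

Note that dv-site.com would also need to allow being framed by decoy-site.com (via its X-Frame-Options / frame-ancestors policy) for the iframe approach to work.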
Domestic violence is a serious systemic issue and I would love some input from anyone who has more technical knowledge than I do on fleshing out this concept.
Other aspects I am unsure of how to tackle:
ensuring that dv-site.com can get crawled and ranked by search engines, even though users are all redirected, as it is imperative that it appears in search results so it can be found
technical aspects of a redirect that does not appear in history.
I'm unsure if it's possible to do this without all content and engagement being attributed to the decoy site.
For the redirect, I believe that HTTP redirects do not get stored in history; you can use a 302 redirect for that. HTTP has a Set-Cookie header that lets you record a cookie. Coupled with the headers here, you can give the decoy site access without recording it in history. Then delete the cookie.
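A hedged sketch of that idea (Express and the cookie name are purely illustrative; since cookies are per-domain, the cookie has to be set and read on the decoy site's own domain):

```javascript
// Illustrative only: the cookie name and timings are made up. cookie-parser
// is used to read cookies back on later requests.
const express = require('express');
const cookieParser = require('cookie-parser');
const app = express();
app.use(cookieParser());

// Entry point on the decoy domain: set a short-lived cookie and issue a 302,
// so the entry URL itself does not linger as a separate history entry.
app.get('/enter', (req, res) => {
  res.cookie('access', '1', { httpOnly: true, maxAge: 60 * 1000 });
  res.redirect(302, '/');
});

// Front page: serve the support content only while the cookie is present,
// then delete it so later visits (e.g. from history) show the plain decoy.
app.get('/', (req, res) => {
  if (req.cookies.access === '1') {
    res.clearCookie('access');
    res.send('support content');
  } else {
    res.send('weather content');
  }
});
```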
As far as PageRank goes, you could add a line to robots.txt as described here (the last point) to get the bot to crawl using a query parameter. Then, in the backend, return the dv site only if that parameter is passed; otherwise redirect. If Googlebot removes query params when publishing, it will work out; otherwise, it might fail.
Best of luck.
When looking at how websites such as Facebook store profile images, the URLs seem to use randomly generated values. For example, Google's Facebook page's profile picture page has the following URL:
https://scontent-lhr3-1.xx.fbcdn.net/hprofile-xft1/v/t1.0-1/p160x160/11990418_442606765926870_215300303224956260_n.png?oh=28cb5dd4717b7174eed44ca5279a2e37&oe=579938A8
However why not just organise it like so:
https://scontent-lhr3-1.xx.fbcdn.net/{{ profile_id }}/50x50.png
Clearly this would be much easier in terms of storage and simplicity. Am I missing something? Thanks.
Companies like Facebook have fairly intense CDNs. The URLs may look randomly generated, but they aren't; each individual route is deliberate and programmed to be handled in that manner.
They aren't after simplicity of storage the way you would be if you were just using FTP to connect to a basic marketing website's server. While you might put all your images in an /images folder, Facebook is much too complex for that: dozens of different types of applications accessing hundreds if not thousands of CDNs and servers worldwide.
If you ever build a web app, such as a Ruby on Rails app, and you work with a service such as AWS (Amazon Web Services), you'll also encounter what seem like nonsensical URLs. But it's all part of the fast delivery network provided within the architecture. Every time you "push" your app up to the server, new URLs are generated automatically for each unique resource: CSS files, JavaScript files, image files, etc. You don't have to type in each of these unique URLs individually each time you publish the app; the code simply knows where to look for them as part of the publishing process.
Example: you tell the web app to look for
//= require jquery
and it returns you http://example.com/assets/jquery-eb3e278249152b5b5d5170b73d9dbf52.js?body=1 in your header.
It doesn't matter that the URL is more complex than it should be; the application recognizes it, and that's all that matters.
Simply put, I think it boils down to two main reasons: security and caching.
Security - Adding these long unpredictable hashes prevents others from guessing photo URLs and makes it pretty hard to download photos you aren't supposed to.
Consider what would happen if I could easily guess your profile photo URL and download it, even when you explicitly chose to share it only with friends.
Cache - by adding "random" query params to each photo, you make sure each photo instance gets its own URL. Thus you can store the photo in browser's cache for a long time, knowing that whenever you replace it with a new one, the new photo will have a fresh URL and the browser won't keep showing you the old photo.
If you were to keep the same URL for each user's profile photo (e.g. https://scontent-lhr3-1.xx.fbcdn.net/{{ profile_id }}/50x50.png), and then upload a new photo, either one of these can happen:
If you stored the photo in the browser's cache for a long time, the browser will keep showing you the cached version (as long as the URL is the same and the cache hasn't expired, there's no need to re-download the image).
If, instead, you only keep the image in cache for a short period of time, you end up hitting your server much more than actually needed, increasing the load and hurting performance.
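A tiny illustration of the cache-busting idea (the CDN host, path layout and hash length are made up): derive the URL from the photo's content, so a new upload automatically gets a new URL while the old one can be cached essentially forever.

```javascript
// Made-up CDN host and path layout; the point is that the URL changes
// whenever the photo's bytes change.
const crypto = require('crypto');

function profilePhotoUrl(profileId, photoBytes) {
  const digest = crypto.createHash('sha1')
    .update(photoBytes)
    .digest('hex')
    .slice(0, 16);
  // The old URL keeps serving the old bytes from caches; the new digest gives
  // the replacement photo a fresh URL, so browsers fetch it immediately.
  return 'https://cdn.example.com/profiles/' + profileId +
         '/' + digest + '_50x50.png';
}

console.log(profilePhotoUrl(42, Buffer.from('fake image bytes')));
```

With URLs like that you can serve the images with a very long Cache-Control lifetime and still never show a stale photo after an upload.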
I hope this clarifies it.
With your route scheme, how would you prevent strangers from accessing the pictures of a private account? The hash also prevents bots from downloading all the pictures.
I get your pain. :-) Rather than describing further how this problem can come up, let me speak to a solution. It's normal for hashed or base64-encoded values to look like a mess to deal with in code, but with an identifier alongside to explain them, not much of the mess remains.
I used to work at a company where we would collect Facebook posts, fetch their Insights objects via the Graph API, and extract information from them for easy passing around within the UI and for sending back to our Redis cache store. Once we defined a data structure in TaffyDB describing how an object would be organized, everything made sense, thanks to its ability to query the useful parts out of a long, junk-looking stream of minified JavaScript.
Refer: http://www.taffydb.com/
The extra values in the URL are useful to:
Track access. This is like when a newspaper appends "&homepage" vs. "&email" to an article URL, so their system knows how a reader found the page.
Avoid abuse and control access. Imagine that a user uploaded a small, popular pornographic image as a profile image. They could then hijack the CDN to be a free web host for their porn site. But that code is used internally by the CDN to limit the number of views.
I've built a Rails app, basically a CRUD app for memos/notes.
A note's title must be unique. If a user enters a name already taken, a warning message is shown prompting them to choose another.
My question is how to make the latency of this feedback as close to zero as possible. When creating a note, little UX speed bumps like this will quickly get annoying for the user.
Of course the main bottleneck is the network. Inspired by Meteor (and mini-mongo) I was thinking some kind of local storage could be a solution?
I.e. when the app first loads, send ALL the note titles to the client as JSON. The app (the front end is AngularJS) could check LocalStorage (or App Cache, Web SQL?) instead of incurring a network round trip. The feedback would be instant.
I've used LocalStorage in the past to augment an app, but in this scenario the app would really, seriously depend on it. I'm not sure how confident I'd be building on something that users might not have. Also, as the number of user notes/memos grows, I have doubts about how feasible it is to send a JSON object down the wire with ALL the note titles. That might get pretty big. On the other hand, MeteorJS seems to do this with no problems.
Has anyone done something similar or have any pointers? Thanks!
I don't know how Meteor works here, but you're right that storing all note titles in localStorage is not a good idea. Actually, you don't need localStorage here; you can just put them in a JS array, because you need this data only once (when checking a new note title).
I think, there could be 2 possible solutions:
You can change your business requirements and allow non-unique title. Is there really a necessity for titles to be unique?
You can verify the note title when the user submits the form. In this case you can provide suggestions for users, so they don't spend time guessing which titles are still available.
Or, if titles must be unique only within a user (two users can have the same title for their notes), you can really load all note titles into a JS array and check uniqueness while the user types a title.
Or you can send an AJAX request checking title uniqueness as soon as the user finishes typing the title. In this case you can save a few seconds.
Or you can send an AJAX request as soon as the user has typed 3 characters. The request returns all titles that begin with those 3 characters, so you don't need to load all the titles. A rough sketch of the AJAX approach follows below.
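This sketch assumes a GET /notes/check_title?title=... endpoint that returns { "available": true|false }; the endpoint, the element IDs and the JSON shape are my assumptions, and in a Rails app the endpoint would be a small controller action returning JSON.

```javascript
// Check the title against the server once the user pauses typing.
var timer;
document.getElementById('note-title').addEventListener('input', function (e) {
  clearTimeout(timer);
  var title = e.target.value;
  timer = setTimeout(function () {
    fetch('/notes/check_title?title=' + encodeURIComponent(title))
      .then(function (res) { return res.json(); })
      .then(function (data) {
        // Show the warning only when the title is already taken.
        document.getElementById('title-warning').hidden = data.available;
      });
  }, 300); // wait for a short pause in typing before hitting the server
});
```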
Apologies in advance as I'm sure this topic has no doubt been asked before but I couldn't find any post that answers my specific query.
Bearing in mind that I'm new to MVC this is where I have got to. I've got a project developed under VS 2010 using the MVC 3 framework. I've got a search page which consists of 6 fields and a nested model which itself holds around 3 fields.
I can successfully post all this data back to itself, and the data is successfully passed as a model and back again, so the fields keep the data which the user has supplied.
Before I move on to actually using this search criteria on another view, a thought hit me. I want to keep this search criteria, and possibly even the search results, in memory for the duration of the user's session.
The reasoning behind this is simply to save my users time by:
a) negating the need to keep re-inputting their search criteria regardless of how they enter or leave the search page
b) speed up the user experience by presenting the search results more quickly
The latter isn't as important as the first requirement.
I've done some Google searches and had a look through this site on similar topics. From what I've read, using sessions (which I would typically use if developing a PHP site) is a no-no. The reasons I've read for why you shouldn't use sessions seem valid, and I'm happy to go along with that.
But now I'm left in a place where I'm scratching my head wondering to myself what exactly is best practice to achieve this simple goal that could be applied to similar situations later down the line in the project.
I also looked at the OutputCache method and that didn't behave as I expected it to. In a test I set the timeout for 30 seconds. After submitting a search I clicked the link to my search page to see if the fields would auto-populate, they didn't. But then clicking the search button the values in the cache were retrieved. I thought I was making progress but when I tried to submit a new value the old value from the cache came back i.e. I couldn't actually change my search criteria with the cache enforced. So I've discounted this as an avenue to explore.
The last option seems to suggest the use of cookies as the most likely candidate, but rightly or wrongly I feel this isn't the best solution. I would have thought the MVC 3 design pattern would have an easier and recommended method of persisting values. I'm sure there is but I've just not discovered it yet.
I have started to use jQuery, and again this has been mentioned, but I'm not sure this is the right direction to take either.
So in summary, my question really comes down to what is considered by the wider community as best practice for persisting data in my situation. Efficiency, scalability and resiliency are paramount, as I'll have a large global user base that will end up using this web app.
Thanks in advance!
Pete
I'd just use cookies. They're simple to use, you can persist them for as long as you want or have them expire when the user closes their browser, and it doesn't sound like you are storing anything sensitive in them.
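As one hedged client-side variant of that (the cookie name and form handling are made up; in MVC you could just as well set the cookie from the controller): save the search form's values into a cookie on submit, and restore them when the page loads.

```javascript
// Made-up cookie name; "form" is the search form described in the question.
function saveSearch(form) {
  var data = {};
  Array.prototype.forEach.call(form.elements, function (el) {
    if (el.name) data[el.name] = el.value;
  });
  // No expires/max-age makes this a session cookie, gone when the browser
  // closes; add max-age if the criteria should survive longer.
  document.cookie = 'lastSearch=' +
    encodeURIComponent(JSON.stringify(data)) + '; path=/';
}

function restoreSearch(form) {
  var match = document.cookie.match(/(?:^|; )lastSearch=([^;]*)/);
  if (!match) return;
  var data = JSON.parse(decodeURIComponent(match[1]));
  Object.keys(data).forEach(function (name) {
    if (form.elements[name]) form.elements[name].value = data[name];
  });
}
```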
I have a Rails application that has a search field that allows any visitor to search a database.
I'm hesitant to implement a Captcha because I'd like to keep the site clean and user-friendly.
However, I'd like to make it difficult for bots to try to harvest everything from the database by making tons of consecutive random queries. So I'm considering adding a Captcha that appears only if it looks like this is happening (e.g., the Captcha appears after a few bad searches).
Any suggestions for how to implement this? Should I try to use a session variable or keep track of IP addresses? Would I be better off handling this issue at the server level (i.e., with an htaccess file)?
Consider using a honeypot. That means adding a form element that you hide with CSS. Bots cannot see that you've hidden the field and they will fill it in. Normal users will not fill it in.
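A minimal sketch of a honeypot, assuming the form contains an extra text input named "website" that is hidden with CSS (e.g. position: absolute; left: -9999px). Node/Express is used here purely for illustration; in a Rails app the same idea is a one-line check in the controller before running the search.

```javascript
// Illustrative only: "website" is the hidden honeypot field's name.
const express = require('express');
const app = express();
app.use(express.urlencoded({ extended: false })); // parse form posts

app.post('/search', (req, res) => {
  if (req.body.website) {
    // The invisible field was filled in: almost certainly a bot.
    return res.status(400).end();
  }
  // ...run the real search for human visitors...
  res.end('search results');
});

app.listen(3000);
```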