Cleaning Up Query Strings - asp.net-mvc

This is more of an open question. What is your opinion on query strings in a URL? While creating sites in ASP.NET MVC you spend a lot of time thinking about and crafting clean URLs only for them to be shattered the first time you have to use query strings, especially on a search form.
For example I recently did a fairly simple search form with half a dozen text field and two or three lists of checkboxes and selects. This produced the query string below when submitted
countrylcid=2057&State=England&StateId=46&Where=&DateFrom=&DateTo=&Tags=&Keywords=&Types
=1&Types=0&Types=2&Types=3&Types=4&Types=5&Costs=0.0-9.99&Costs=10.00-29.99&Costs=30.00-59.99&Costs=60.00-10000.00
Beautiful I think you'll agree. Half the fields had no information in them and the list inputs are very verbose indeed.
A while ago I implemented a simple solution to this for paging which produced a url such as
www.yourdomain.com/browse/filter-on/page-1/perpage-50/
This used a catchall route to grab what is essentially a replacement query string after the filter-on portion. Works quite well but breaks down when doing form submissions.
Id be keen to hear what other solutions people have come up with? There are lots of articles on clean urls but are aimed at asp.net developers creating basic restful urls which MVC has covered. I am half considering diving into model binding to produce a proper solution along those lines. With the above convention the large query string could be rewritten as:
filter-on/countrylcid-2057/state-England/stateId-46/types-{1,0,2,3,4,5}/costs-{0.0-9.99,10.00-29.99,30.00-59.99,60.00-10000.00}/
Is this worth the effort?
Thanks,

My personal view is that where users are likely to want to either bookmark or pass on URLs to other people then a nice, clean "friendly" URL is the way to go. Aesthetically they are much nicer. For simple pagination and ordering then a re-written URL is a good idea.
However, for pages that have a large number of temporary, dynamic fields (such as a search) then I think the humble query string is fine. Like wise for pages whose contents are likely to change significantly given the exact same URL in the future. In these cases, URLs with query strings are fine and, perhaps, even preferable as they at least indicate to the observant user that the page is dynamic. However, in these cases it may be better to use form POST variables, anyway, that way visitors are not tempted to "fiddle" with the values.

In addition to what others have said, a URL implies a hierarchy that is semantic. Whether true today or not, the ancestry is directories and people still think of it as such. That's why you have controller/action/id. Likewise, to me a querystring implies options or queries.
Personally, I think a rewritten URL is best when you can't tell if there's an interpreter behind it -- maybe it's just a generated HTML file?
So however you choose to do it (and it's a pain on the client in a search form -- I'd say more trouble than it's worth), I'd support you doing it for hierarchies.
E.g. /Search/Country/State/City
but once you start getting into prices and types, or otherwise having to preface a "directory" with the type of value (e.g. /prices=50.00/ or worse, with an array), then that's where you've lost me.
In fact, if all elements are filled in, then all you've really done is taken the querystring, replaced "&" with "/", and combined your arrays into a single field.
If you're going to be writing the javascript anyways, why don't you just loop through the form elements and:
Remove the empty ones, cleaning up the querystring from the "&price_low=&price_high=&" sorts of things.
Combine multiple values into an array structure
But then submit as a querystring.
James

Aren't the values of the different fields available in the FormsCollection anyway on post?

Related

Ransack/Metawhere querystring length

I have a Rails app, and it started like all Rails app, simple, with a searching function of first name and last name of users, now time has gone by, the user list has gotten very large, and the search has increased from a two field search to multiple field searching.
This created the problem of the query searching with MetaWhere is in the range of 10000+ characters. This has caused Nginx and HAProxy to break unless specific settings has been tweaked.
I am wondering what alternative solutions are there to solve this issue?
I thought about making the search a POST request, however, I am expecting pagination to work as well. Also the ability to "Copy and paste" the url.
Another potential solution is to encode the query string in database as a JSON blob, and then append a special hash for the query, but this has a lot of moving parts.

Lucene partial word matching

Lucene does not support it out of the box, so I need some help building my query.
Lets say I have the document with a field value "Develop"
I would like this document to be returned for the searches "Dev" and "lop".
Maybe creating two queries?
"*keyword"
and
"keyword*"
and
"keyword"
?
How would you go about doing this with multiple words? Would you split the sentence/search into a words list and do the previous example for each word?
What you're asking is if I understand you correctly not feasible on any large scale search engine.
Lucene creates an index over keywords using term-document matrix and inverted-file techniques (see links at the bottom). A fully fledged string matching might be very nice to have, but it does not scale: you will never be able to query a decently sized index (say more than a couple of dozen/hundreds of documents) in an acceptable time.
Still, here are two ideas that might help...
Syllable tokenization
To come back to your example with 'Develop'. As long as you are happy with letting users search for syllables I guess you can do something.
You would have to create use tokenizer that splits up words in your indexed according to their syllables and create a database index over the syllables. (I am not sure there are built in tokenizers for the English language that can do that and writing one on your own might be tricky...)
An important thing to note:
If you would index the full words AND the seperate syllables the size of your index will be much larger than if you only index one of the two.
However I would not suggest to index only syllables. If you want to also allow your users to search for the full word 'Develop' (which I guess you want) this would result in two queries with a logical and between them, namely <'dev' AND 'lop'>. Although Lucene supports such logical constructs in queries they are very expensive. I have personally had some trouble in the past using logical queries in Lucene.
Stemming
Another way to somehow arrive at what you're trying could be to use a brutal form of word stemming (http://en.wikipedia.org/wiki/Stemming) that stems words to their first syllable. (This would allow to search for 'dev' but not for 'lop'...)
Again, I don't think such a word stem feature is already in Lucene. Writing one for yourself will be a pain and involve working with/importing huge dictionaries.
Links
These might be looking into if you don't know about search engine internals:
http://en.wikipedia.org/wiki/Index_%28search_engine%29
http://en.wikipedia.org/wiki/Vector_space_model
http://en.wikipedia.org/wiki/Inverted_file
http://en.wikipedia.org/wiki/Term-document_matrix
http://en.wikipedia.org/wiki/Tf-idf

Using ASP.NET MVC2, How to make filters work with hackable urls?

Think about New Egg's e-commerce drill down model to start, where keyword links on the nav bar further limit the currently displayed product list with each selection.
I setup a house finder with Routes like this:
http://{website.com}/{Housetype}/{PriceLow}-{PriceHigh}
http://mysite.com/Residential/180000-230000
http://mysite.com/Commercial/300000-500000
But now I want to add links on the side navigation that further limit content:
1 Bedroom
2 Bedroom
4 Bedroom
1-4 Acres
4-10 Acres
10+ Acres
How would I pass and maintain selections between clicks using MVC? I think my options are:
Keep adding named parameters in the query string like /{acres}:{low}-{high}, but this will require that all parameters exist, I wouldn't be able to pick more than one criteria, or I hit the max querystring length.
Once I know the {housetype} and {price range} then flip to a totally different strategy, maybe passing an object (like a DTO) that will .AddCriteria( "Acres", ".gt.", "1"); .AddCriteria("Acres", ".lt.", "4"), etc.... This sounds very complicated
Something else?
Again, looking for starter answers, the most common way (almost design pattern way) to send filter and sort criteria from a client to ASP.NET MVC. Links are helpful too.
I built a URL expression parser a few weeks ago that will filter an IQueriable based on expressions from a URL. Here's a blog post for the explanation. Basically you can build your own expressions and plug them into the parser... or you could do it the standard IQueriable way with something like /price-greaterthan-3/price-lessthan-4000/acres-lessthan-10/
Here's a link to the code

What is the best approach for a interpreting an text input for geocoding purposes?

Consider the following site:
http://maps.google.com
It has a main text input, where the user can type business, countries, provinces, cities, addresses and zip codes. I wonder which is the best way to implement a search like this. I realize that probably Google Maps uses a full text search with all kinds of data in the same table, and it has a chance of having a parser which classifies the input (i.e. between numeric, like zip codes and coordinates, and textual, like business and addresses).
With the data spread in many tables and systems, a parser is essential. The parser could be built from regular expressions, or could be built with IA tools like Artificial Neural Networks and Genetic Algorithms.
Which approach would you recommend?
It might be best to aggregate the data from all of your tables into a search index. Lucene is a free search engine, similar to how Google's search engine works (inverted index), and it should allow you to search by any of those values or any combination of them with relative ease.
http://lucene.apache.org/java/docs/
Lucene comes with its own query language (again, very similar to Google's or any other Internet search sites syntax). The only drawback of using something like Lucene is you would need to build its index. You wouldn't be querying your database directly (which could get very complicated...inverted index are pretty much designed for what your trying to do), so you need to periodically gather up new information from your database and add it to your index. It might also be necessary to rebuild your index to remove unneeded data.
With Lucene, you get a pretty flexible query syntax that most people are familiar with (because pretty much everyone searches the internet), it performs very well, and is not terribly complicated. By using Lucene, you avoid the hit of using regular expressions (which are not the most performant text searching mechanism), and you don't have to write your own parser. Should be a win-win, aside from a little learning curve to build a Lucene index generator and figure out how to query that index.
I'd have the data in one database. If the data got to big or I knew it would be huge, I'd assign an id to each business, address etc, then have other tables which reference this data.
Regular Expressions would only be necessary if the user could define what they want to search for:
business: Argos
But then what happens if they want an Argos in Manchester (Sorry, I'm English), maybe then get the location of the user based on their IP but what happens if they say:
business: Argos Scotland
Now you don't know if the company has two words, or if there is a location next to it. All of this has to be taken into consideration.
P.s Sorry if that made no sense.
You will need to pre process the query before doing a full text search on it. If you are using a GIS database, then you will already have columns like city, areacode, country etc. Convert your query into tokens seperated on space or commas, or both. Then hit individual columns to see match. This way you will know what part of the query is the city, the areacode etc.
You could also try some naive approximation approaches,example - 6 consecutive numbers will probably be an area code. Look for common words like "road" , "restaurant" , "street" etc which will be part of many queries and then use some approximation to figure out what they are looking for. Hope this helps.

How would you design a hackable url

Imagine you had a group of product categories organized in a nice tree hierarchy and you wanted to provide hackable urls to browse these. You could do something like this
/catalog/categorya/categoryb/categoryc
You could then quite easily figure out which category you should list the products for (note that the full URL is needed since you could have categories with the same name but at different locations in the hierarchy)
Now what would be a good approach to add product information in that as well? To give you an example, you wanted to display the product Oblivion for this category
/catalog/games/consoles/playstation/adventure
It's tempting to just add the product at the end of the url
/catalog/games/consoles/playstation/adventure/oblivion
but the moment you do so you loose the ability to know if its category or a product which is called oblivion. I personally feel that not being forced to add a suffix such as .html
/catalog/games/consoles/playstation/adventure/oblivion.html
would be the nicest solution and using some sort of prefix, such as
/catalog/games/consoles/playstation/adventure/product:oblivion
You could also add some sort of trigger like
/catalog/games/consoles/playstation/adventure/PRODUCT/oblivion
not as nice either and you would (even though its very unlikely it would be a problem) restrict yourself from having a category called product
So far a suffix solution looks like the most user-friendly approach that I can think of from the top of my head but I'm not fond of having to use an extension
What are your thoughts on this?
Deep paths irk me. They're hideous to share.
/product/1234/oblivion --> direct page
/product/oblivion --> /product/1234/oblivion if oblivion is a unique product,
--> ~ Diambiguation page if oblivion is not a unqiue product.
/product/1234/notoblivion -> /product/1234/oblivion
/categories/79/adventure --> playstation adventure games
/categories/75/games --> console games page
/categories/76/games --> playstation games page
/categories/games --> Disambiguation Page.
Otherwise, the long urls, while seeming hackable, require you to get all node elements right to hack it.
Take php.net
php.net/str_replace --> goes to
http://nz2.php.net/manual/en/function.str-replace.php
And this model is so hackable people use it all the time blindly.
Note: The .html suffix is regarded by the W3C as functionally meaningless and redundant, and should be avoided in URLs.
http://www.w3.org/Provider/Style/URI
Lets disect your URL in order to be more DRY (non-repetitive). Here is what you are starting with:
/catalog/games/consoles/playstation/adventure/oblivion
Really, the category adventure is redundant as the game can belong to multiple genres.
/catalog/games/consoles/playstation/oblivion
The next thing that strikes me is that consoles is also not needed. It probably isn't a good idea to differentiate between PC's and Console machines as a subsection. They are all types of machines and by doing this you are just adding another level of complexity.
/catalog/games/playstation/oblivion
Now you are at the point of making some decisions about your site. I would recommend removing the playstation category on your page, as a game can exist across multiple platforms and also the games category. Your url should look like:
/catalog/oblivion
So how do you get a list of all the action games for the Playstation?
/catalog/tags/playstation+adventure
or perhaps
/catalog/tags/adventure/playstation
The order doesn't really matter. You have to also make sure that tags is a reserved name for a product.
Lastly, I am assuming that you cannot remove the root /catalog due to conflicts. However, if your site is tiny and doesn't have many other sections then reduce everything to the root level:
/oblivion
/tags/playstation/adventure
Oh and if oblivion isn't a unique product just construct a slug which includes it's ID:
/1234-oblivion
Those all look fine (except for the one with the colon).
The key is what to do when they guess wrong -- don't send them to a 404 -- instead, take the words you don't know and send them to your search page results for that word -- even better if you can spell check there.
If you see the different pieces as targets then the product itself is just another target.
All targets should be accessable by target.html or only target.
catalog/games/consoles/playstation.html
catalog/games/consoles/playstation
catalog/games/consoles/playstation/adventure.html
catalog/games/consoles/playstation/adventure
catalog/games/consoles/playstation/adventure/oblivion.html
catalog/games/consoles/playstation/adventure/oblivion
And so on to make it consistent.
My 5 cents...
One problem is that your user's notion of a "group of product categories organized in a nice tree hierarchy" may match yours.
Here's a google tech talk by David Weinberger's "Everything is Miscellaneous" with some interesting ideas on categorizing stuff:
http://www.youtube.com/watch?v=x3wOhXsjPYM
#Lou Franco yeah either method needs a sturdy fallback mechanism and sending it to some sort of suggestion page or seach engine would be good candidates
#Stefan the problem with treating both as targets are how to distinguish them (like I described). At worst case scenario is that you first hit your database to see if there is a category which satisfies the path and if it doesn't then you check if there is a product which does. The problem is that for each product path you will end up making a useless call to the database to make sure its not a category.
#some yeah a delimiter could be a possible solution but then a .html suffix is more userfriendly and commonly known of.
i like /videogames/consolename/genre/title" and use the amount of /'s to distinguish between category or product. The only thing i would be worried about multi (or hard to distinguish) genre. I highly recommend no extension on title. You could also do something like videogames(.php)?c=x360;t=oblivion; and just guess the missing information however i like the / method as it looks more neat. Why are you adding genre? it may be easier to use the first letter of the title or just to do videogame/console/title/
My humble experience, although not related to selling games, tells me:
editors often don't use the best names for these "slugs", they don't chose them wisely.
many items belong (logically) to several categories, so why restrict them (technically) to a single category?
Better design item urls by ids, (i.e. /item/435/ )
ids are stable (generated by the db, not editable by the editor), so the url stands a much bigger chance at not being changed over time
they don't expose (or depend on) the organization of the objects in the database like the category/item_name style of urls does. What if you change the underlying design (object structure) to allow an item to belong to multiple categories? the category/item urls suddenly won't make sense anymore; you'll change your url design and old urls might not work anymore.
Labels are better than categories. That is to say, allowing an item to belong to several categories is a better approach than assigning one category to each item.
the problem with treating both as
targets are how to distinguish them
(like I described). At worst case
scenario is that you first hit your
database to see if there is a category
which satisfies the path and if it
doesn't then you check if there is a
product which does. The problem is
that for each product path you will
end up making a useless call to the
database to make sure its not a
category.
So what? There's no real need to make a hard distinction between products and categories, least of all in the URI, except maybe a performance concern over an extra database call. If that's really such a big deal to you, consider these two suggestions:
Most page views will presumably be on products, not categories. So doing the check for a product first will minimize the frequency with which you need to double up on the database lookups.
Add code to your app to display the time taken to generate each page, then go out to the nearest internet cafe (not your internal LAN!) with a stopwatch. Bring up some pages from your site and time how long each takes to come up. Subtract the time taken to generate the page. Also compare the time taken to generate one-database-lookup pages vs. two-database-lookup pages. Then ask yourself, when it takes maybe 1-2 seconds total to establish a network connection, generate the content, and download the content, does it really matter whether you're spending an extra 0.05 second or less for an additional database lookup or not?
Optimize where it matters, like making URLs that will be human-friendly (as in Chris Lloyd's answer). Don't waste your time trying to shave off the last possible fraction of a percent.

Resources