I am implementing a complex search module whose results page supports paging. Most of the examples I have seen just pass the page number as a parameter to the Index action, and the action uses that page number to run the query each time the user clicks a different page.
My problem is that my search takes many more criteria (more than 10) than just a simple page number. I would therefore like to preserve either the search criteria or the search result data after the user's first submission, so that I only have to pass the page number back and forth.
So I don't know which way is better: preserve the search criteria, so that every click on a new page calls the controller action and runs the search again? Or preserve the search result data, so the application doesn't have to query the database again and again, although the preserved data would be large. If you have any ideas on how to implement this, please share. Thanks in advance.
Preserving the search criteria in the querystring is generally best. It will allow users to bookmark the search.
Preserving search result data brings up issues of potential stale data and consumes more resources server-side. This wouldn't work well with large data sets anyway, as you would only be selecting one page at a time, so caching in memory wouldn't help much when the user navigates to the next page.
I'd suggest you generate a unique key for each search and store an object that contains all the search criteria, in memory or in the DB, under that unique key. Then pass the unique key on the query string.
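A minimal sketch of that idea in ASP.NET MVC, assuming the criteria are cached server-side (SearchCriteria, the action names, and the RunSearch helper are hypothetical placeholders):

using System;
using System.Web.Caching;
using System.Web.Mvc;

public class SearchCriteria { /* the 10+ search fields go here */ }

public class SearchController : Controller
{
    private const int PageSize = 20;

    // First submission: the full criteria object is posted once and cached under a key.
    [HttpPost]
    public ActionResult Search(SearchCriteria criteria)
    {
        var key = Guid.NewGuid().ToString("N");
        HttpContext.Cache.Insert(key, criteria, null,
            DateTime.Now.AddMinutes(30), Cache.NoSlidingExpiration);
        return RedirectToAction("Results", new { key, page = 1 });
    }

    // Paging: only the key and the page number travel on the query string.
    [HttpGet]
    public ActionResult Results(string key, int page = 1)
    {
        var criteria = HttpContext.Cache[key] as SearchCriteria;
        if (criteria == null)
            return View("SearchExpired"); // cache entry gone: ask the user to search again

        var model = RunSearch(criteria, page, PageSize); // re-runs the query for this page only
        return View(model);
    }

    private object RunSearch(SearchCriteria criteria, int page, int pageSize)
    {
        // Placeholder for the real query: apply all criteria,
        // then Skip((page - 1) * pageSize).Take(pageSize).
        throw new NotImplementedException();
    }
}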
So you mean: save the search criteria with a unique key in the DB, and any time I need the same search results again (including when the page index changes), get the unique key from the query string and run the query again. Is that your suggestion? Thank you very much for your advice, very helpful.
I have an alphabetized index of people. My goal is to find what page of that index a person is listed on. For instance, "Tim Curry" might be listed on page 5 of the "T" section. Currently I'm getting the page number with ActiveRecord; Elasticsearch results are 20 per page, so I can work out the page number based on the index. But it seems wiser to get the page number directly from Elasticsearch if at all possible to ensure that I'm getting the right page. Is there a way to get this data from ES?
def page_index
  letter = name[0].downcase
  # Position of this record among all people whose names start with that letter
  position = Person.where("lower(name) LIKE ?", "#{letter}%")
                   .order("lower(name)")
                   .pluck(:id)
                   .index(id)
  # 20 results per page; integer division gives the zero-based page, +1 makes it one-based
  position / 20 + 1
end
This functionality does not come bundled with Elasticsearch. Using the result's position and the page size is the correct approach if that is the functionality you are looking for.
Since it's not clear exactly which document you need, or what the overall UX you are trying to achieve is, keep in mind that you can always search your index(es) for a specific document via various means (a filtered or term query on name if you need "Tim Curry", by id or _uid, etc.).
Also, ES is a full-text search engine; finding one object and its properties might be better served via a database call.
Again, this is somewhat speculative, as I don't know exactly what you need or are trying to achieve overall, but finding the page of a specific result in your set of returned results is best done by taking its index in the results and doing simple math.
I'm creating a registration app where users register themselves.
Now the admin has an option to export all registered users to a CSV/Excel file. There could be thousands of records. I can't fetch them all at once; how can I make it fast? Will indexing help?
When exporting the entire table, all data must be read.
Adding an index cannot reduce the amount of data.
An index would help only if you sort the output by some column(s).
Indexing will generally help when reading or updating a subset of the data in the table, such as a single row or group of rows.
For example, if you wanted to allow all the data to be exported, or even viewed, one page at a time, an indexed primary key field can be used to retrieve a subset, say 20 rows at a time. That gives better perceived performance, since the user isn't waiting for the entire table to be exported before seeing any of the data.
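A minimal sketch of that paged approach in C#, assuming a SQL Server Users table with an indexed, auto-incrementing Id primary key (table, column, and connection details are made up):

using System;
using System.Data.SqlClient;
using System.IO;

class UserExporter
{
    // Streams users out in fixed-size batches keyed on the indexed primary key,
    // instead of loading the whole table in one query.
    public static void ExportToCsv(string connectionString, string csvPath, int batchSize = 1000)
    {
        int lastId = 0;
        using (var connection = new SqlConnection(connectionString))
        using (var writer = new StreamWriter(csvPath))
        {
            connection.Open();
            writer.WriteLine("Id,Name,Email");

            while (true)
            {
                // Keyset pagination: the index on Id makes "Id > @lastId" a cheap seek.
                var command = new SqlCommand(
                    "SELECT TOP (@batch) Id, Name, Email FROM Users " +
                    "WHERE Id > @lastId ORDER BY Id", connection);
                command.Parameters.AddWithValue("@batch", batchSize);
                command.Parameters.AddWithValue("@lastId", lastId);

                int rows = 0;
                using (var reader = command.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        lastId = reader.GetInt32(0);
                        // Real code should escape commas/quotes in the values.
                        writer.WriteLine($"{lastId},{reader.GetString(1)},{reader.GetString(2)}");
                        rows++;
                    }
                }

                if (rows < batchSize) break; // last batch reached
            }
        }
    }
}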
It is pretty common for a web application to display a list of items and for each item in the list to indicate to the current user whether they have already viewed the associated item.
An approach that I have taken in the past is to store HasViewed objects that contain the Id of a viewed item/object and the Id of the User who has viewed that item/object.
When it comes time to display the list, this requires querying the database for the items, separately querying for the HasViewed objects, and then combining the results of those two queries into a set of objects constructed solely for the purpose of displaying them in the view.
Each list item then uses the has_viewed property of the objects constructed above.
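For reference, the two-query approach described above might look roughly like this in C# (Item, HasViewed, and ItemViewModel are hypothetical stand-ins for the real types):

using System.Collections.Generic;
using System.Linq;

public class Item { public int Id; public string Title; }
public class HasViewed { public int UserId; public int ItemId; }
public class ItemViewModel { public int Id; public string Title; public bool Viewed; }

public static class ViewedListBuilder
{
    // Combine the item list with the current user's HasViewed records into
    // objects built only for rendering the list.
    public static List<ItemViewModel> Build(
        IEnumerable<Item> items, IEnumerable<HasViewed> hasViewedRecords, int currentUserId)
    {
        var viewedIds = new HashSet<int>(
            hasViewedRecords.Where(v => v.UserId == currentUserId).Select(v => v.ItemId));

        return items.Select(i => new ItemViewModel
        {
            Id = i.Id,
            Title = i.Title,
            Viewed = viewedIds.Contains(i.Id)
        }).ToList();
    }
}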
I think it is time to find a better approach and would like to know what approaches you would recommend.
I'd also like to know whether there is a better idea.
Right now my solution is putting the viewed state in Redis and caching the view with a fragment cache.
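That Redis idea might look something like the following minimal sketch, shown here with StackExchange.Redis in C# (the key naming is made up, and the original setup appears to be Rails, so treat this as an illustration of the pattern rather than the exact implementation):

using StackExchange.Redis;

public class ViewedTracker
{
    private readonly IDatabase _redis;

    public ViewedTracker(IConnectionMultiplexer connection)
    {
        _redis = connection.GetDatabase();
    }

    // One Redis set per user, holding the ids of the items they have viewed.
    private static string KeyFor(int userId) => "user:" + userId + ":viewed";

    public void MarkViewed(int userId, int itemId) =>
        _redis.SetAdd(KeyFor(userId), itemId);

    public bool HasViewed(int userId, int itemId) =>
        _redis.SetContains(KeyFor(userId), itemId);
}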
After hearing about NoSQL for a couple years I finally started playing with RavenDB today in an .Net MVC app (simple blog). Getting the embedded database up and running was pretty quick and painless.
However, I've found that after inserting objects into the document store, they are not always there when the next page loads. When I refresh the page again, they do show up. I read somewhere that this is due to stale indexes.
My question is, how are you supposed to use this in production on a site with inserts happening all the time (example: e-commerce). Isn't this always going to result in stale indexes and unreliable query results?
Think of what actually happens with a traditional database like SQL Server.
When an item is created, updated, or deleted in a table, any indexes associated with that table also have to be updated.
The more indexes you have on a table, the slower your write operations will be.
If you create a new index on an existing table, it isn't used at all until it is fully built. If no other index can answer a query, then a slow table scan occurs.
If others attempt to query an existing index while it is being modified, the reader will block until the modification is complete, because Consistency is given higher priority than Availability.
This can often lead to slow reads, timeouts, and deadlocks.
The NoSQL concept of "Eventual Consistency" is designed to alleviate these concerns. It optimizes reads by prioritizing Availability over Consistency. RavenDB is not unique in this regard, but it is somewhat special in that it still has the ability to be consistent. If you are retrieving a single document, such as reviewing an order or an end user viewing their profile, these operations are ACID compliant and are not affected by the "eventual consistency" design.
To understand "eventual consistency", think about a typical user looking at a list of products on your web site. At the same time, the sales staff of your company is modifying the catalog, adding new products, changing prices, etc. One could argue that it's probably not super important that the list be fully consistent with these changes. After all, a user visiting the site a couple of seconds earlier would have received data without the changes anyway. The most important thing is to deliver product results quickly. Blocking the query because a write was in progress would mean a slower response time to the customer, and thus a poorer experience on your web site, and perhaps a lost sale.
So, in RavenDB:
Writes occur against the document store.
Single Load operations go directly to the document store.
Queries occur against the index store.
As documents are being written, data is being copied from the document store to the index store, for those indexes that are already defined.
At any time you query an index, you will get whatever is already in that index, regardless of the state of the copying that's going on in the background. This is why sometimes indexes are "stale".
If you query without specifying an index, and Raven needs a new index to answer your query, it will start building an index on the fly and return you some of those results right away. It only blocks long enough to give you one page of results. It then continues building the index in the background so next time you query you will have more data available.
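To make the load/query distinction concrete, here is a minimal sketch against the RavenDB client session (Product and the document id are hypothetical):

var store = new DocumentStore { Url = "http://localhost:8080" }.Initialize();

using (var session = store.OpenSession())
{
    // Write: goes to the document store; ACID.
    session.Store(new Product { Name = "Apples", Price = 1.50m });
    session.SaveChanges();
}

using (var session = store.OpenSession())
{
    // Load by id: served straight from the document store, never stale.
    var product = session.Load<Product>("products/1");

    // Query: served from the index store, so it may lag behind recent writes.
    var cheapProducts = session.Query<Product>()
        .Where(p => p.Price < 5m)
        .ToList();
}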
So now let's look at an example that shows the downside of this approach.
A sales person goes to a "products list" page that is sorted alphabetically.
On the first page, they see that "Apples" aren't currently being sold.
So they click "add product", and go to a new page where they enter "Apples".
They are then returned to the "products list" page and they still don't see any Apples because the index is stale. WTF - right?
Addressing this problem requires the understanding that not all viewers of data should be considered equal. That particular sales person might demand to see the newly added product, but a customer isn't going to know or care about it with the same level of urgency.
So on the "products list" page that the sales person is viewing, you might do something like:
// Block the query until the index has caught up with the most recent write.
var results = session.Query<Product>()
    .Customize(x => x.WaitForNonStaleResultsAsOfLastWrite())
    .OrderBy(x => x.Name)
    .Skip((pageNumber - 1) * pageSize).Take(pageSize);
While on the customer's view of the catalog, you would not want to add that customization line.
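For contrast, the customer-facing version of the same query just drops the customization and accepts possibly stale results:

var results = session.Query<Product>()
    .OrderBy(x => x.Name)
    .Skip((pageNumber - 1) * pageSize).Take(pageSize);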
If you wanted to get super precise, you could use a slightly more optimized strategy:
When going back from the "add product" page to the "list products" page, pass along the ProductID that was just added.
Just before you query on that page, if the ProductID was passed in then change your query code to:
// Load is ACID, so the just-added product and its etag are immediately available.
var product = session.Load<Product>(productId);
var etag = session.Advanced.GetEtagFor(product);

// Only wait for the index to catch up to that specific etag.
var results = session.Query<Product>()
    .Customize(x => x.WaitForNonStaleResultsAsOf(etag))
    .OrderBy(x => x.Name)
    .Skip((pageNumber - 1) * pageSize).Take(pageSize);
This will ensure that you only wait as long as absolutely necessary to get just that one product's changes included in the results list along with the other results from the index.
You could optimize this slightly by passing the etag back instead of the ProductId, but that might be less reusable from other places in your application.
But do keep in mind that if the list is sorted alphabetically, and we added "Plums" instead of "Apples", then you might not have seen these results instantly anyway. By the time the user had skipped to the page that includes that product, it would likely have been there already.
You are running into stale queries.
That is by design in RavenDB. You need to make a distinction between queries (BASE) and loading by id (ACID).
I was reading through the Rails tutorial (http://ruby.railstutorial.org/book/ruby-on-rails-tutorial#sidebar-database_indices) but got confused by the explanation of database indices. Basically, the author proposes that rather than searching in O(n) time through a list of emails (for login), it is much faster to create an index, giving the following example:
To understand a database index, it’s helpful to consider the analogy
of a book index. In a book, to find all the occurrences of a given
string, say “foobar”, you would have to scan each page for “foobar”.
With a book index, on the other hand, you can just look up “foobar” in
the index to see all the pages containing “foobar”.
Source: http://ruby.railstutorial.org/chapters/modeling-users#sidebar:database_indices
So what I understand from that example is that words can be repeated in the text, so the "index page" consists of unique entries. However, on the railstutorial site, the login is set up such that each email address is unique to an account, so how does having an index make it faster when we can have at most one occurrence of each email?
Thanks
Indexing isn't (much) about duplicates. It's about order.
When you do a search, you want to have some kind of order that lets you (for example) do a binary search to find the data in logarithmic time instead of searching through every record to find the one(s) you care about (that's not the only type of index, but it's probably the most common).
Unfortunately, you can only arrange the records themselves in a single order.
An index contains just the data (or a subset of it) that you're going to search on, and pointers (of some sort) to the records containing the actual data. This allows you to (for example) do searches based on as many different fields as you care about, and still be able to do a binary search on each of them, because each index is arranged in order by its own field.
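A toy illustration of that idea (nothing like a real B-tree on disk, just the sorted-keys-plus-pointers concept):

using System;

class EmailIndexDemo
{
    static void Main()
    {
        // A toy "index": email values kept in sorted order, each paired with a
        // pointer (here just a row number) into the actual table.
        var emails = new[]
        {
            "alice@example.com", "bob@example.com",
            "carol@example.com", "dave@example.com"
        };
        var rowPointers = new[] { 42, 7, 19, 3 };

        // Because the keys are sorted, lookup is a binary search:
        // O(log n) comparisons instead of scanning every row.
        int pos = Array.BinarySearch(emails, "carol@example.com", StringComparer.Ordinal);
        if (pos >= 0)
            Console.WriteLine("Found in table row " + rowPointers[pos]);
    }
}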
Because the index in the DB, like the index in the book example, is sorted alphabetically. The raw table / book is not. Then think: how do you search an index knowing it is sorted? You don't start reading at "A" and scan all the way to the point of interest. Instead you skip roughly to that point and start searching from there. Basically, a DB can do the same with an index.
It is faster because the index contains only values from the column in question, so it is spread across a smaller number of pages than the full table. Also, indexes usually include additional optimizations such as hash tables to limit the number of reads required.