I am fetching around 10,000 records from a service. I need to implement pagination in my ASP.NET UI. I don't want to store the records in the database. I have planned to fetch records in chunks (of 100 records) and put them in the cache.
If I display 10 records per page, then I can paginate between 10 pages. Now if a user clicks page number 11, I will call the service again, get the records, and refresh the cache to hold the new set of records. If the user then clicks back to the first page, I need to hit the service again.
Is this a feasible strategy for pagination in an ASP.NET context? Also, too many records in the cache could impact performance. Could anybody suggest an effective approach for this kind of scenario?
If you are using pagination, there is no reason to cache; otherwise this is generally the right approach.
The flow is: select (SELECT FROM) -> filter (WHERE) -> sort (ORDER BY) -> skip(page * pageSize) -> take(pageSize).
This should ALL get passed down to the data layer so that your user code is not actually executing any of it; the database is. The skip/take part is usually where people have the most issues, as it requires a generated column in the query (a row number), but it is usually doable on the DB side.
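For illustration only, here is a minimal sketch of that flow with LINQ against an IQueryable source. MyDbContext, Record, and the column names are placeholders standing in for your own data layer (Entity Framework is just an assumption here); the point is that the whole expression is translated to SQL, so only one page of rows ever comes back.

using System.Collections.Generic;
using System.Linq;

public class RecordPager
{
    // MyDbContext, Record, Category and Name are placeholders for your own data layer.
    public static IList<Record> GetPage(MyDbContext db, string category, int page, int pageSize)
    {
        return db.Records                           // select (FROM)
            .Where(r => r.Category == category)     // filter (WHERE)
            .OrderBy(r => r.Name)                   // sort (ORDER BY) - required before Skip/Take
            .Skip(page * pageSize)                  // skip(page * pageSize)
            .Take(pageSize)                         // take(pageSize)
            .ToList();                              // executes as a single paged query on the DB
    }
}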
After hearing about NoSQL for a couple of years, I finally started playing with RavenDB today in a .NET MVC app (a simple blog). Getting the embedded database up and running was pretty quick and painless.
However, I've found that after inserting objects into the document store, they are not always there when the next page loads. When I refresh the page, they do show up. I read somewhere that this is due to stale indexes.
My question is: how are you supposed to use this in production on a site with inserts happening all the time (for example, e-commerce)? Isn't this always going to result in stale indexes and unreliable query results?
Think of what actually happens with a traditional database like SQL Server.
When an item is created, updated, or deleted from a table, any indexes associated with that table also have to be updated.
The more indexes you have on a table, the slower your write operations will be.
If you create a new index on an existing table, it isn't used at all until it is fully built. If no other index can answer a query, then a slow table scan occurs.
If others attempt to query from an existing index while it is being modified, the reader will block until the modification is complete, because Consistency is given higher priority than Availability.
This can often lead to slow reads, timeouts, and deadlocks.
The NoSQL concept of "Eventual Consistency" is designed to alleviate these concerns. It optimizes reads by prioritizing Availability over Consistency. RavenDB is not unique in this regard, but it is somewhat special in that it still has the ability to be consistent. If you are retrieving a single document, such as reviewing an order or an end user viewing their profile, these operations are ACID compliant and are not affected by the "eventual consistency" design.
To understand "eventual consistency", think about a typical user looking at a list of products on your web site. At the same time, the sales staff of your company is modifying the catalog, adding new products, changing prices, etc. One could argue that it's probably not super important that the list be fully consistent with these changes. After all, a user visiting the site a couple of seconds earlier would have received data without the changes anyway. The most important thing is to deliver product results quickly. Blocking the query because a write was in progress would mean a slower response time to the customer, and thus a poorer experience on your web site, and perhaps a lost sale.
So, in RavenDB:
Writes occur against the document store.
Single Load operations go directly to the document store.
Queries occur against the index store.
As documents are being written, data is being copied from the document store to the index store, for those indexes that are already defined.
At any time you query an index, you will get whatever is already in that index, regardless of the state of the copying that's going on in the background. This is why sometimes indexes are "stale".
If you query without specifying an index, and Raven needs a new index to answer your query, it will start building an index on the fly and return you some of those results right away. It only blocks long enough to give you one page of results. It then continues building the index in the background so next time you query you will have more data available.
So now let's give an example that shows the downside of this approach.
A sales person goes to a "products list" page that is sorted alphabetically.
On the first page, they see that "Apples" aren't currently being sold.
So they click "add product", and go to a new page where they enter "Apples".
They are then returned to the "products list" page and they still don't see any Apples because the index is stale. WTF - right?
Addressing this problem requires the understanding that not all viewers of data should be considered equal. That particular sales person might demand to see the newly added product, but a customer isn't going to know or care about it with the same level of urgency.
So on the "products list" page that the sales person is viewing, you might do something like:
var results = session.Query<Product>()
.Customize(x => x.WaitForNonStaleResultsAsOfLastWrite())
.OrderBy(x=> x.Name)
.Skip((pageNumber-1) * pageSize).Take(pageSize);
While on the customer's view of the catalog, you would not want to add that customization line.
If you wanted to get super precise, you could use a slightly more optimized strategy:
When going back from the "add product" page to the "list products" page, pass along the ProductID that was just added.
Just before you query on that page, if the ProductID was passed in then change your query code to:
var product = session.Load<Product>(productId);
var etag = session.Advanced.GetEtagFor(product);
var results = session.Query<Product>()
.Customize(x => x.WaitForNonStaleResultsAsOf(etag))
.OrderBy(x=> x.Name)
.Skip((pageNumber-1) * pageSize).Take(pageSize);
This will ensure that you only wait as long as absolutely necessary to get just that one product's changes included in the results list along with the other results from the index.
You could optimize this slightly by passing the etag back instead of the ProductId, but that might be less reusable from other places in your application.
But do keep in mind that if the list is sorted alphabetically, and we added "Plums" instead of "Apples", then you might not have seen these results instantly anyway. By the time the user had skipped to the page that includes that product, it would likely have been there already.
You are running into stale queries.
That is by design in RavenDB. You need to make a distinction between queries (BASE) and loading by id (ACID).
I am working on an ASP.NET MVC application that reads from an Oracle database using a DataReader and presents those rows to the user (sometimes up to 10 million). The DataReader read operation throws an out-of-memory exception after reading about 900,000 rows.
I was discussing this issue with my colleague and he suggested that I should use a connectionless paradigm (maybe Entity Framework) or a stored procedure and bring the data back in chunks.
I wonder if there is someone out there who can authoritatively say which is the best way to handle this.
Don't retrieve all the rows into memory and do the paging there:
• Not all users visit the 2nd page
• So most of the data held in memory will go unused
If you have a large number of records, use SQL-side paging; you can use the ROW_NUMBER() function to perform the paging on the SQL side.
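As a rough sketch only, ROW_NUMBER-based paging from plain ADO.NET could look like the following. The "Articles" table, "DatePosted" column, and connection string are made-up placeholders, and the snippet uses the SQL Server client for brevity; Oracle also supports ROW_NUMBER() OVER (...), but you would use the Oracle data provider and its ":name" bind-parameter syntax instead.

using System.Data;
using System.Data.SqlClient;

public class ArticlePager
{
    // Sketch only: "Articles", "DatePosted", and the connection string are placeholders.
    public static DataTable GetPage(string connectionString, int pageNumber, int pageSize)
    {
        const string sql = @"
            SELECT *
            FROM (
                SELECT a.*, ROW_NUMBER() OVER (ORDER BY a.DatePosted DESC) AS RowNum
                FROM Articles a
            ) paged
            WHERE paged.RowNum BETWEEN @first AND @last";

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(sql, connection))
        {
            command.Parameters.AddWithValue("@first", (pageNumber - 1) * pageSize + 1);
            command.Parameters.AddWithValue("@last", pageNumber * pageSize);

            var table = new DataTable();
            connection.Open();
            using (var reader = command.ExecuteReader())
            {
                table.Load(reader);   // only one page of rows is ever pulled into memory
            }
            return table;
        }
    }
}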
You can also use ORM frameworks to access the data; they usually provide good approaches for performing data-related operations.
I prefer to use PetaPoco; it has a method to retrieve page-wise data.
var result=db.Page<article>(1, 20, // <-- page number and items per page
"SELECT * FROM articles WHERE category=@0 ORDER BY date_posted DESC", "coolstuff");
http://www.toptensoftware.com/petapoco/
I've created a WebApi project in VS 2012, using NHibernate as my ORM, and I intend to enable OData support on it. So I've created a test controller with a single Get method that returns a list of entities from a table in my database.
Everything works fine, I can use OData to filter and order my results, etc. The problem is I couldn't find a way to limit the amount of data that's being returned from the database to the controller, and this table has millions of records in it.
Using the PageSize property of the Queryable attribute only seems to limit the amount of data returned to the client, but not the amount of data returned from the DB.
I've tried applying a Take(n) on the IQueryable inside the get method before returning it, and it limits the results brought back from the DB, but it breaks the OData filtering, since if you try to query an entity that's not in the first n results, it just returns an empty collection.
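Roughly, that attempt looked like the sketch below (simplified: Product stands in for my real entity, and the NHibernate session wiring is only illustrative).

using System.Linq;
using System.Web.Http;
using NHibernate;
using NHibernate.Linq;   // provides the Query<T>() extension on ISession

public class ProductsController : ApiController
{
    private readonly ISession _session;   // NHibernate session, however it gets injected

    public ProductsController(ISession session)
    {
        _session = session;
    }

    [Queryable]
    public IQueryable<Product> Get()
    {
        // Take(100) does limit what NHibernate fetches from the DB, but the OData
        // $filter is then applied on top of that first slice only, so querying for
        // an entity outside the first 100 rows comes back as an empty collection.
        return _session.Query<Product>().Take(100);
    }
}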
I know you can use the $top parameter in OData to accomplish this, but I would like not to depend on the client/consumer providing it, in order to ensure that I'm not unnecessarily bringing back thousands or even millions of records that I'm not going to use.
I've also tried to manually check whether the client provided a $top parameter on the query string, apply the OData transformation to my IQueryable, and then apply the Take(n) method over the transformed query. This approach enabled me to filter for any entity through OData, but it breaks pagination, because if I use the $skip=n parameter, it again returns an empty collection.
So, is there any way to reliably limit the results fetched from the DB while not breaking the OData support?
We recently found that too. We are not applying a Take(pageSize) when server-driven paging is enabled, as we have to figure out whether a next-page link should be generated or not. We just enumerate the result set for pageSize entities and check whether there are more. We assumed that most providers bring back only a partial set of results, since IQueryable is generally a lazy implementation. It turns out that is not true. Also, the database can optimize the query if it knows only pageSize results are required.
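Just to illustrate the general technique (this is a generic sketch, not the actual Web API OData implementation): fetch one extra row so the database still sees a limited query, and use that extra row to decide whether a next-page link is needed.

using System.Collections.Generic;
using System.Linq;

public static class Paging
{
    // Fetch one extra row so the caller knows whether a next-page link is needed,
    // while still letting the database limit the query to pageSize + 1 rows.
    public static IList<T> GetPage<T>(IQueryable<T> source, int pageSize, out bool hasNextPage)
    {
        var rows = source.Take(pageSize + 1).ToList();
        hasNextPage = rows.Count > pageSize;
        return hasNextPage ? rows.Take(pageSize).ToList() : rows;
    }
}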
This is the issue that was opened for it. The good news is that Youssef has already fixed it :). This is the commit that fixed it. So if you grab the nightly builds, you should be good.
I'm trying to build a (simple) Twitter clone which uses CouchDB as the database backend.
Because of its reduced feature set, I'm almost finished with the coding, but there's one thing left I can't solve with CouchDB - the per-user timeline.
As with Twitter, the per-user timeline should show the tweets of all the people I'm following, in chronological order. With SQL it's a quite simple SELECT statement, but I don't know how to reproduce this with CouchDB's map/reduce.
Here's the SQL-Statement I would use with an RDBMS:
SELECT * FROM tweets WHERE user_id IN (1,5,20,33,...) ORDER BY created_at DESC;
CouchDB schema details
user-schema:
{
"_id":"xxxxxxx",
"_rev":"yyyyyy",
"type":"user",
"user_id":1,
"username":"john",
...
}
tweet-schema:
{
"_id":"xxxx",
"_rev":"yyyy",
"type":"tweet",
"text":"Sample Text",
"user_id":1,
...
"created_at":"2011-10-17 10:21:36 +000"
}
With view collations it's quite simple to query CouchDB for a list of "all tweets with user_id = 1 ordered chronologically".
But how do I retrieve a list of "all tweets which belong to the users with IDs 1, 2, 3, ..., ordered chronologically"? Do I need another schema for my application?
The best way of doing this would be to save created_at as a timestamp and then create a view that maps all tweets to their user_id:
function(doc){
  if(doc.type == 'tweet'){
    emit(doc.user_id, doc);
  }
}
Then query the view with the user ids as keys, and sort the rows in your application however you want (most languages have a sort method for arrays).
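For example, roughly like this from C#. The database, design document, and view names ("tweets_db", "tweets", "by_user") are placeholders for whatever you actually create; the relevant CouchDB feature is that a view accepts a POST with a "keys" array, so you can fetch the tweets for all followed user ids in one round trip.

using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

public static class Timeline
{
    // Rough sketch: "tweets_db" and the design doc/view names are placeholders.
    public static async Task<string> GetTimelineJson(int[] followedUserIds)
    {
        using (var http = new HttpClient { BaseAddress = new Uri("http://localhost:5984/") })
        {
            // CouchDB accepts a POST to a view with a "keys" array, returning the
            // rows for all of those keys in a single request.
            var body = "{\"keys\":[" + string.Join(",", followedUserIds) + "]}";
            var response = await http.PostAsync(
                "tweets_db/_design/tweets/_view/by_user",
                new StringContent(body, Encoding.UTF8, "application/json"));
            response.EnsureSuccessStatusCode();

            // Sort the returned rows by created_at in application code before rendering.
            return await response.Content.ReadAsStringAsync();
        }
    }
}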
Is this a CouchDB-only app? Or do you use something in between for additional business logic? In the latter case, you could achieve this by running multiple queries.
This might include merging different views. Another approach would be to add a list of "private readers" for each tweet. It allows user-specific (partial) views, but also introduces the complexity of adding the list of readers for each new tweet, or even updating the list in case of new followers or unfollow operations.
It's important to think about the possible operations and their frequencies. If you're mostly generating lists of tweets, it's better to shift the complexity into how the reader information is integrated into your documents (i.e. integrating the readers into your tweet doc), so that you can then easily build efficient view indices.
If you have many changes to your data, it's better to design your database not to update too many existing documents at the same time. Instead, try to add data by adding new documents and aggregate via complex views.
But you have shown an edge case where a simple (one-dimensional) list-based index is not enough. You'd actually need secondary indices to filter by time and user ids (given that you also need partial ranges for both). That is not possible in CouchDB, so you need to work around it by shifting "query" data into your docs and using it when building the view.
I am implementing a complex search module whose results page supports paging. Most examples just pass the page number as a parameter to the Index action, and the action uses the page number to perform a query each time the user hits a different page.
My problem is that my search takes many more criteria (more than 10) than just a simple page number. Therefore, I would like to preserve either the search criteria or the search result data after the user's first submission, so that I only have to pass the page number back and forth.
So I don't know which way is better: preserve the search criteria, so every time the user clicks to a new page, the controller action runs the search again? Or preserve the search result data, so the application doesn't need to query the database again and again, but then the preserved data will be big. If you have any ideas on how to implement this, I'd appreciate them. Thanks in advance.
Preserving the search criteria in the querystring is generally best. It will allow users to bookmark the search.
Preserving search result data brings up issues of potential stale data and consumes more resources server-side. This wouldn't work well with large data sets anyway, as you would only be selecting one page at a time, so caching in memory wouldn't help much when the user navigates to the next page.
I'd suggest you generate a unique key for each search and store an object that contains all the search criteria in memory, or in the DB, under that unique key. Then pass the unique key on the querystring.
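A rough sketch of that idea in an MVC controller follows. SearchCriteria, RunSearch, the action names, and the 30-minute cache policy are all placeholder assumptions; the same pattern works if you persist the criteria to a DB table instead of the ASP.NET cache.

using System;
using System.Web.Caching;
using System.Web.Mvc;

public class SearchCriteria { /* your 10+ search fields go here */ }

public class SearchController : Controller
{
    [HttpPost]
    public ActionResult Search(SearchCriteria criteria)
    {
        // Store the full criteria object under a unique key (here in the ASP.NET cache;
        // a DB table works the same way).
        var key = Guid.NewGuid().ToString("N");
        HttpContext.Cache.Insert(key, criteria, null,
            DateTime.UtcNow.AddMinutes(30), Cache.NoSlidingExpiration);

        // From now on only the key and the page number travel on the querystring.
        return RedirectToAction("Results", new { key, page = 1 });
    }

    public ActionResult Results(string key, int page = 1)
    {
        var criteria = (SearchCriteria)HttpContext.Cache[key];
        if (criteria == null)
            return RedirectToAction("Index");   // key expired: ask the user to search again

        var model = RunSearch(criteria, page);  // re-query the DB for just this page
        return View(model);
    }

    private object RunSearch(SearchCriteria criteria, int page)
    {
        // Placeholder: run the real query here, selecting only the requested page.
        throw new NotImplementedException();
    }
}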
So you mean: save the search criteria with a unique key in the DB, and any time I need the same search results again (including when the page index changes), get the unique key from the querystring and run the query again. Is that your suggestion? Thank you very much for your advice. Very helpful.