ASP.NET MVC 3 - Web Application - Efficiently Aggregate Data - asp.net-mvc

I am running an ASP.NET MVC 3 web application and would like to gather statistics such as:
How often is a specific product viewed
Which search phrases typically return specific products in their result list
How often (for specific products) does a search result convert to a view
I would like to aggregate this data and break it down:
By product
By product by week
etc.
I'm wondering what are the cleanest and most efficient strategies for aggregating the data. I can think of a couple but I'm sure there are many more:
Insert the data into a staging table, then run a job to aggregate the data and push it into permanent tables.
Use a queuing system (MSMQ/Rhino/etc.) and create a service to aggregate this data before it ever gets pushed to the database.
My concerns are:
I would like to limit the number of moving parts.
I would like to reduce impact on the database. The fewer round trips and less extraneous data stored the better
In certain scenarios (not listed) I would like the data to be somewhat close to real-time (accurate to the hour may be appropriate)
Does anyone have real world experience with this and if so which approach would you suggest and what are the positives and negatives? If there is a better solution that I am not thinking of I'd love ot hear it...
Thanks
JP

I needed to do something similar in a recent project. We've implemented a full audit system in a secondary database, it tracks changes on every record on the live db. Essentially every insert, update and delete actually updates 2 records, one in the live db and one in the audit db.
Since we have this data in realtime on the audit db, we use this second database to fill any reports we might need. One of the tricks I've found when working with a reporting DB is to forget about normalisation. Just create a table for each report you want, and have it carry just the data you want for that report. Its duplicating data, but the performance gains are worth it.
As to filling the actual data in the reports, we use a mixture. Daily reports are generated by a scheduled task at around 3am, ditto for the weekly and monthly reports, normally over weekends or late at night.
Other reports are generated on demand, using mostly the data since the last daily, so its not that many records, once again all from the secondary database.

I agree that you should create a separate database for your statistics, it will reduce the impact on your database.
You can go with your idea of having "Staging" tables and "Aggregate" tables; that way, if you want to access the near-real-time data you go o the staging table, when you want to historical data, you go to the aggregates.
Finally, I would recommend you use an asynchronous call to save your statistics; that way your pages will not have an impact in response time.

I suggest that you will create a separate database for this. The best way is to use BI technique. There is a separate services in
SQL server for Bi.

Related

Rails saving and/or caching complicated query results

I have an application that, at its core, is a sort of data warehouse and report generator. People use it to "mine" through a large amount of data with ad-hoc queries, produce a report page with a bunch of distribution graphs, and click through those graphs to look at specific result sets of the underlying items being "mined." The problem is that the database is now many hundreds of millions of rows of data, and even with indexing, some queries can take longer than a browser is willing to wait for a response.
Ideally, at some arbitrary cutoff, I want to "offline" the user's query, and perform it in the background, save the result set to a new table, and use a job to email a link to the user which could use this as a cached result to skip directly to the browser rendering the graphs. These jobs/results could be saved for a long time in case people wanted to revisit the particular problem they were working on, or emailed to coworkers. I would be tempted to just create a PDF of the result, but it's the interactive clicking of the graphs that I'm trying to preserve here.
None of the standard Rails caching techniques really captures this idea, so maybe I just have to do this all by hand, but I wanted to check to see if I wasn't missing something that I could start with. I could create a keyed model result in the in-memory cache, but I want these results to be preserved on the order of months, and I deploy at least once a week.
Considering Data loading from lots of join tables. That's why it's taking time to load.
Also you are performing calculation/visualization tasks with the data you fetch from DB, then show on UI.
I like to recommend some of the approaches to your problem:
Minimize the number of joins/nested join DB queries
Add some direct tables/columns, ex. If you are showing counts of comments of user the you can add new column in user table to store it in user table itself. You can add scheduled job to update data or add callback to update count
also try to minimize the calculations(if any) performing on UI side
you can also use the concept of lazy loading for fetching the data in chunks
Thanks, hope this will help you to decide where to start 🙂

Is it okay if Operational data store holds data for rolling time period?

We are planning to build an Operational data store for the front-end users data extraction requirements.
As far as I know the Kimball's approach to build ODS\DW, it should hold the data for complete time period and not like the rolling time period.
The reason being, there could be a need to extract older data from ODS\DW.
So I need your thoughts on this. How should I approach ?
I would create a snapshot table that could hold the values for the rolling period for each day, and filter on the client side of things which snapshot to display.
Once the period is over then the final values can be stored on the permanent data mart.
Kimball's approach for a data warehouse would be to load transactional data to any data warehouse if you can, because it is more flexible in terms of being rolled up. Certainly at the ODS stage you wouldn't want to 'pre-aggregate' your data, if there could be a need to get hold of older data.
If you store both the transactional data and then pre-aggregated versions of the data (in aggregate fact tables, with indexes/views or with a cube, or just filtering on the report side as the other answer suggests), you can get the best of both worlds.
(Note: Kimball's approach in fact does not require an ODS: they're fine if you want to build one, but their focus is on the dimensionally modelled data warehouse.)

How to keep track of daily data?

In my rails application, I want to keep several daily metrics in order to see how this data changes over times. In other words, if I want to see how many times a user logged in on a particular date (and therefore allowing me to accumulate this data over times).
Some of this data I can figure out through queries, such as the number of posts a user made on a particular day (because the post model includes a date). However, there are many different daily metrics I want to keep track of.
I thought of creating a DataPlayers model which has data for every player and every day creating a new instance of this, but I don't think that is the best approach.
Are there best practices for this type of data collection?
You could use a gem like SqlMetrics to track events as they happen.
It stores the events in your own database so its easy to query them via sql.

ASP.NET MVC 3 in-memory data store

I have a project which provides users with a list of current tasks that need to be completed. Any user can complete any task, and so to ensure that only one user is working on a task at a time I need to be able to 'lock' it. I'm using SignalR for this, so a user requests a lock on a task, and if they are successful (ie. if noone else has locked it) then they will be able to access the further information that they need.
My problem is how to store the list of locked tasks. The original plan was simply to add an additional bit field 'IsLocked' to the Task table and update this when the user requested a lock and when the task was unlocked. We have about 300 concurrent users, however, and a task takes only about 3-4 minutes, meaning huge numbers of additional - and tiny - queries on the database. Therefore we were wondering about in-memory storage, simply storing a list of task ids in a 'lockedTasks' list.
I had considered using caching, but am unsure on the best ways to do this, or even if better alternatives exist. If anyone has any experience in this then some advice would be great thanks
I would avoid memory completely as IIS is not that great with it, if you found your self in the IIS need for refreshing the Application Pool for some sort of reason, your list is simply gone!
Maybe a MemCache system? If it does not loose things in the above way, but...
I would advice to be in the middle, IO File is fast that request data to a Database, specially if it's not in the same machine (witch for security reasons, it should never be), so... why not, and just to hold your list, you don't use one of the currently famous NoSQL database?
MongoDB is a document database that has a .NET Library and it's easy to use, it is not as fast as Memmory, but extremely quicker than Physical databases for what you want.
Normally the NoSQL Database will be hosted in the App_Data folder so it will be extremely fast to access and you can just hold there the task_id and user_id of all locked tasks.
Have you considered stateful filters?
Check out this links for more info:
ASP.NET MVC Filters and Statefulness
Brad Wilson: Advanced MVC
3 - (Video)
Brad Wilson: Advanced MVC 3 - (PDF)
I'm sorry, but if your app can't handle a single query every 3-4 minutes x 300 users, then you're doing something very wrong. Just browsing a site typically generates orders of magnitude more queries than that.

Using statistical tables with Rails

I'm building an app that needs to store a fair amount of events that the users carry out. (Think LOTS as in millions per month).
I need to report on the these events (total of type x in the last month, etc) and need something resilient and fast.
I've toyed with Redis etc to store aggregates of the data, but this could just mean that I'm building up a massive store of single figure aggregates that aren't rebuildable.
Whilst this isn't a bad solution, I'm looking at storing the raw event data in tables that I can then query on a needs basis, and potentially generate aggregate counters on a periodic basis. This would thus give me the ability to add counters over time, and also carry out ad-hoc inspections on what is going on, something which aggregates don't allow.
Question is, how is best to do this? I obviously don't want to have to create a model for each table (which is what Rails would prefer), so do I just create the tables and interact with raw SQL on a needs basis, or is there some other choice for dealing with this sort of data?
I've worked on an app that had that type of data flow and the solution is the following :
-> store everything
-> create aggregates
-> delete everything after a short period (1 week or somehting) to free up resources
So you can simply store events with rails, have some background aggregate creation from another fast script (cron sql), read with rails the aggregates and yet another background script for raw event deletion.
Also .. rails and performance don't quite go hand in hand usually ;)

Resources