I want to store data for every page view on my site in InfluxDB, and later I will need to get some analytics out of it. For example, "How many views does post X have?"
The problem is, as I understand it, that if I have a page_views table with a url tag whose values are highly dynamic, then I am going to run into memory problems.
Right now I store this data in MySQL, but I am starting to face problems because sometimes I need to store 1000+ data points per second.
So is InfluxDB the right tool for this kind of job? Is there some smarter way of storing each page view without tagging the url?
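One pattern that might help, sketched below with the InfluxDB 1.x Python client (the connection details and the page_views measurement name are my assumptions): keep the highly dynamic url as a field rather than a tag, so it does not blow up series cardinality, and reserve tags for low-cardinality values.

```python
# Rough sketch, assuming the InfluxDB 1.x Python client (pip install influxdb).
# Host, database and measurement names are placeholders.
from influxdb import InfluxDBClient

client = InfluxDBClient(host="localhost", port=8086, database="analytics")

def record_page_view(url):
    point = {
        "measurement": "page_views",
        # Tags are indexed: every distinct tag value creates a new series,
        # so keep them low-cardinality (site, section, country, ...).
        "tags": {"site": "blog"},
        # Fields are not indexed, so a highly dynamic value like the full
        # URL is safer here and will not inflate series cardinality.
        "fields": {"url": url, "count": 1},
    }
    client.write_points([point])
```

The trade-off, as far as I know, is that you cannot GROUP BY a field in InfluxQL, so per-URL counts mean filtering on the field (a scan over the time range) or rolling the raw points up into a lower-cardinality measurement downstream.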
Related
How much data can a column of a Mnesia table store? Is there any limit, or can we store as much as we want? Any pointers? (The table is a disc_only_copy.)
As with any potentially large data set (in terms of total entries, not total volume of bytes), the real question isn't how much you can cram into a single table, but how you want to partition the data and how unified or distinct those partitions should appear to the system.
In the context of a chat system, for example, you may want to be able to save the chat history forever, which is a reasonable goal. But you may not want all chat entries to be in the same table forever and ever (10 years? how long? who knows!) right next to chat entries made yesterday. You may also discover as time moves on that storing every chat message in a single table was a painfully naive decision that you will have to overcome later on down the road.
So this brings up the issue of partitioning. How do you want to do it? (Staying within the context of a chat system, but easily transferable to another problem...) By time? By channel? By user? By time and channel?
How do you want to locate the data later? This brings up obvious answers that are the same as above: By time? By channel? By user? By time and channel?
This issue exists whether you're dealing with Mnesia or with Postgres -- or any database -- when you're contemplating the storage of lots of entries. So think about your problem in the context of how you want to partition the data.
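A minimal sketch of what "partition by time and channel" can mean in practice (the names and the monthly granularity are just assumptions): derive the table name from the record itself, so both the writer and any later reader can compute which partition to hit.

```python
from datetime import datetime

def partition_name(channel, when):
    # e.g. partition_name("lobby", datetime(2024, 5, 3)) -> "chat_lobby_2024_05"
    # Writer and reader both derive the table name the same way, so locating
    # old entries later is just string arithmetic plus a time range.
    return "chat_{}_{:%Y_%m}".format(channel, when)

# writing today's message
write_table = partition_name("lobby", datetime.utcnow())

# reading last month's history for the same channel
read_table = partition_name("lobby", datetime(2024, 4, 1))
```

The same idea applies whether the partitions end up as Mnesia table names, Postgres child tables, or key prefixes.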
The second issue is the volume of the data in bytes, and the most natural representation of that data. Considering basic chat data, it's not that hard to imagine simply plugging everything into the database. But if it's a chat system that can have large files attached within a message, I would probably want to have those files stored as what they are (files) somewhere in a system made for that (like a file system!) and store only a reference to them in the database. If I were creating a movie archive I would certainly feel comfortable using Mnesia to store titles, actors, years, and a pointer (URL or file system path) to the movie, but I wouldn't dream of storing movie file data in my database, even if I were using Postgres (which can actually stand up to that sort of abuse... but think about the new awkwardness of database dumps and backups, and the massive bottleneck introduced in the form of everyone's download/upload speed being whatever the core service's bandwidth to the database backend is!).
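A rough sketch of that "file on the file system, pointer in the database" idea, with sqlite3 standing in for whatever database you actually use (the paths and table layout are made up for illustration):

```python
import hashlib
import os
import sqlite3

STORE = "/var/lib/chat/attachments"   # made-up location

db = sqlite3.connect("chat.db")
db.execute(
    "CREATE TABLE IF NOT EXISTS attachments "
    "(id INTEGER PRIMARY KEY, message_id INTEGER, path TEXT)"
)

def save_attachment(message_id, data):
    # Content-addressed filename keeps the file system layout simple and
    # deduplicates identical uploads for free.
    digest = hashlib.sha256(data).hexdigest()
    path = os.path.join(STORE, digest[:2], digest)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    # The database only ever sees the pointer, never the bytes.
    db.execute(
        "INSERT INTO attachments (message_id, path) VALUES (?, ?)",
        (message_id, path),
    )
    db.commit()
    return path
```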
In addition to these issues, you want to think about how the data backend will interface with the rest of the system. What is the API you wish you could use? Write it now and think it through to see if it's silly. Once it seems perfect, go back through critically and toss out any elements you don't have an immediate need to actually use right now.
So, that gives us:
Partition scheme
Context of future queries
Volume of data in bytes
Natural state of the different elements of data you want to store
Interface to the overall system you wish you could use
When you start wondering how much data you can put into a database, these are the questions you have to start asking yourself.
Now that all that's been written, here is a question that discusses Mnesia in terms of entries, bytes, and how many bytes different types of entries might represent: What is the storage capacity of a Mnesia database?
Mnesia started as an in-memory database, which means it is not designed to store large amounts of data. If you are asking yourself this question, it means you should look at another ejabberd backend.
I am currently creating an iOS app, which connects to a database and asynchronously downloads a JSON object of data to display in a table view.
As it currently stands, this is an OK way to do it. However, when the database starts getting much larger, this will cause a massive inconvenience. I'm reasonably proficient in Objective-C, but not so much on the database side of things. What would be the best way to get this data from the server and keep it in the app? At the moment, I have a custom class storing the data for each of the 'objects' in the JSON object. There will, however, be many other aspects of the app that the database will handle, such as invites, logins and user details.
Would Core Data be the way to go? I.e. duplicating the database (to a certain extent), storing it locally, and then accessing it from there. As I said, I'm not really sure which route to take here, so any advice would be really appreciated.
Core Location is for handling location (satellite and Wi-Fi positioning).
I guess you mean Core Data. Core Data is an object graph model which allows you to manipulate data as objects. You don't dig directly into the database; you ask for object instantiation through predicates (a kind of WHERE clause in SQL) and then manipulate the objects.
That said, it all depends on what a "big" database means. If it's really big, you could consider copying part of it locally and requesting whatever remains from the server through your web service.
Another question you could ask yourself is how much of the data never changes, and whether your website database and your app database need to be kept synchronized (if your website database is always changing, then it would be dumb to copy it into your app in its entirety and keep it constantly synced).
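To make the synchronization point concrete, here is a hypothetical sketch of pulling only what changed since the last sync instead of re-downloading everything. The /products endpoint and the updated_since parameter are assumptions, not a real API, and the idea is language-agnostic even though the sketch is in Python:

```python
import requests

def sync(local_cache, last_sync_token):
    # Ask the server only for records changed since the last pull.
    resp = requests.get(
        "https://example.com/api/products",           # hypothetical endpoint
        params={"updated_since": last_sync_token},    # hypothetical parameter
        timeout=10,
    )
    resp.raise_for_status()
    payload = resp.json()
    for item in payload["items"]:
        local_cache[item["id"]] = item   # upsert into the local store
    return payload["sync_token"]         # remember for the next call
```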
Links:
Introduction to Core Data
Difference between Core Data and a Database (Cocoa With Love)
Edit:
A question you can ask yourself is: where does your data need to be saved?
If your app only ever displays 20 cells out of a total of 200, then I would go for a full download of all 200 cells. The remaining cells will then load with no delay after the first download, which is especially appreciated if you're using table views with reusable cells.
Is a delay of a few seconds acceptable between the first 20 cells and the following 20? I think there is no real "good" answer to your question; it depends on many factors (the purpose of your app, the acceptable time between loads, whether the info needs to be modified and saved back to the server or locally, what kind of customers you have, what your app will do with the cells, whether a local database would be totally independent from the "mother" database and, if not, what kind of synchronization you need, etc.).
Trying to sum things up according to what I've understood of your needs, I would say that web services are good if you just need to retrieve info and exploit it without saving it back (even though there are services that would let you do that too), and having a local database is good if you need your app to be independent from your server in some ways.
Only you have the key to answer all this and to take a decision according to your needs and your knowledge of your application and your customers.
Something like JSON or SOAP is the way to go with getting structured data from a web service into objects in your iPhone app.
Storing relational data on the iPhone itself is easy with SQLite. Here's a decent looking tutorial.
Make things easy for yourself by writing a data layer that abstracts away calls to the database, to avoid dotting SQL queries all over your code in places they shouldn't be, like the UI.
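The shape of such a data layer, sketched in Python with sqlite3 purely to illustrate the idea (in the app itself it would be Objective-C over SQLite, and the table and column names here are made up): the UI calls methods and never touches SQL.

```python
import sqlite3

class ProductStore:
    """Thin data layer: everything SQL-shaped lives here, nowhere else."""

    def __init__(self, path="app.db"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS products "
            "(id INTEGER PRIMARY KEY, name TEXT, price REAL)"
        )

    def upsert(self, product_id, name, price):
        self.db.execute(
            "INSERT OR REPLACE INTO products (id, name, price) VALUES (?, ?, ?)",
            (product_id, name, price),
        )
        self.db.commit()

    def all_products(self):
        return self.db.execute("SELECT id, name, price FROM products").fetchall()
```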
I have a project which provides users with a list of current tasks that need to be completed. Any user can complete any task, so to ensure that only one user is working on a task at a time I need to be able to 'lock' it. I'm using SignalR for this, so a user requests a lock on a task, and if they are successful (i.e. if no one else has locked it) then they will be able to access the further information that they need.
My problem is how to store the list of locked tasks. The original plan was simply to add an additional bit field 'IsLocked' to the Task table and update this when the user requested a lock and when the task was unlocked. We have about 300 concurrent users, however, and a task takes only about 3-4 minutes, meaning huge numbers of additional - and tiny - queries on the database. Therefore we were wondering about in-memory storage, simply storing a list of task ids in a 'lockedTasks' list.
I had considered using caching, but am unsure of the best ways to do this, or even whether better alternatives exist. If anyone has any experience with this, some advice would be great, thanks.
I would avoid in-memory storage completely, as IIS is not that great with it: if you find yourself needing to recycle the Application Pool for some reason, your list is simply gone!
Maybe a memcache system? It does not lose things in the way described above, but...
I would advise something in the middle: file I/O is faster than requesting data from a database, especially if the database is not on the same machine (which, for security reasons, it should never be). So why not use one of the currently popular NoSQL databases just to hold your list?
MongoDB is a document database that has a .NET library and is easy to use; it is not as fast as memory, but it is much quicker than a traditional relational database for what you want.
Normally the NoSQL database would be hosted in the App_Data folder, so it will be extremely fast to access, and you can just hold the task_id and user_id of all locked tasks there.
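A sketch of what that locked-tasks collection could look like, written with the Python MongoDB driver just to show the shape of it (the .NET driver exposes the same operations; collection and field names are my assumptions). The unique index makes "take the lock" a single atomic insert:

```python
from datetime import datetime, timezone
from pymongo import MongoClient, errors

client = MongoClient("mongodb://localhost:27017")
locks = client.tasks_db.locked_tasks

locks.create_index("task_id", unique=True)
# Optional: let forgotten locks expire on their own after 10 minutes.
locks.create_index("locked_at", expireAfterSeconds=600)

def try_lock(task_id, user_id):
    """Return True if this user obtained the lock, False if someone holds it."""
    try:
        locks.insert_one({
            "task_id": task_id,
            "user_id": user_id,
            "locked_at": datetime.now(timezone.utc),
        })
        return True
    except errors.DuplicateKeyError:
        return False

def unlock(task_id, user_id):
    locks.delete_one({"task_id": task_id, "user_id": user_id})
```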
Have you considered stateful filters?
Check out these links for more info:
ASP.NET MVC Filters and Statefulness
Brad Wilson: Advanced MVC 3 - (Video)
Brad Wilson: Advanced MVC 3 - (PDF)
I'm sorry, but if your app can't handle a single query every 3-4 minutes x 300 users (that's only a query or two per second), then you're doing something very wrong. Just browsing a site typically generates orders of magnitude more queries than that.
I am running an ASP.NET MVC 3 web application and would like to gather statistics such as:
How often is a specific product viewed
Which search phrases typically return specific products in their result list
How often (for specific products) does a search result convert to a view
I would like to aggregate this data and break it down:
By product
By product by week
etc.
I'm wondering what the cleanest and most efficient strategies for aggregating the data are. I can think of a couple, but I'm sure there are many more:
Insert the data into a staging table, then run a job to aggregate the data and push it into permanent tables.
Use a queuing system (MSMQ/Rhino/etc.) and create a service to aggregate this data before it ever gets pushed to the database.
My concerns are:
I would like to limit the number of moving parts.
I would like to reduce the impact on the database. The fewer round trips and the less extraneous data stored, the better.
In certain scenarios (not listed) I would like the data to be somewhat close to real-time (accurate to the hour may be appropriate)
Does anyone have real-world experience with this, and if so, which approach would you suggest, and what are the positives and negatives? If there is a better solution that I am not thinking of, I'd love to hear it...
Thanks
JP
I needed to do something similar in a recent project. We've implemented a full audit system in a secondary database; it tracks changes to every record in the live DB. Essentially, every insert, update and delete actually updates two records: one in the live DB and one in the audit DB.
Since we have this data in real time in the audit DB, we use this second database to fill any reports we might need. One of the tricks I've found when working with a reporting DB is to forget about normalisation. Just create a table for each report you want, and have it carry just the data you want for that report. It's duplicating data, but the performance gains are worth it.
As to filling the actual data in the reports, we use a mixture. Daily reports are generated by a scheduled task at around 3am, ditto for the weekly and monthly reports, normally over weekends or late at night.
Other reports are generated on demand, using mostly the data since the last daily run, so it's not that many records; once again, all from the secondary database.
I agree that you should create a separate database for your statistics, it will reduce the impact on your database.
You can go with your idea of having "staging" tables and "aggregate" tables; that way, when you want near-real-time data you go to the staging table, and when you want historical data you go to the aggregates.
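A rough sketch of that staging/aggregate split, with sqlite3 standing in for the separate stats database (the table names, the weekly grain and the ISO-8601 viewed_at format are all assumptions):

```python
import sqlite3

stats = sqlite3.connect("stats.db")
stats.executescript("""
    CREATE TABLE IF NOT EXISTS product_views_staging (
        product_id INTEGER,
        viewed_at  TEXT          -- ISO-8601, so strftime() below can parse it
    );
    CREATE TABLE IF NOT EXISTS product_views_weekly (
        product_id INTEGER,
        week       TEXT,
        views      INTEGER,
        PRIMARY KEY (product_id, week)
    );
""")

def aggregate():
    """Run on a schedule (e.g. hourly): roll staging rows up into the weekly
    table, then clear the staging table."""
    stats.executescript("""
        BEGIN;
        INSERT INTO product_views_weekly (product_id, week, views)
        SELECT product_id, strftime('%Y-%W', viewed_at), COUNT(*)
        FROM product_views_staging
        WHERE true
        GROUP BY product_id, strftime('%Y-%W', viewed_at)
        ON CONFLICT (product_id, week)
            DO UPDATE SET views = views + excluded.views;
        DELETE FROM product_views_staging;
        COMMIT;
    """)
```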
Finally, I would recommend you use an asynchronous call to save your statistics; that way the statistics write will not impact your page response times.
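For instance, a minimal fire-and-forget sketch (write_stat is a stand-in for the actual insert into the staging table): the request handler only enqueues, and a background thread does the database work.

```python
import queue
import threading

stat_queue = queue.Queue()

def write_stat(event):
    # Stand-in: replace with the INSERT into your staging table.
    pass

def _worker():
    while True:
        event = stat_queue.get()
        try:
            write_stat(event)
        finally:
            stat_queue.task_done()

threading.Thread(target=_worker, daemon=True).start()

def record_product_view(product_id):
    # Called from the request path: returns immediately, the write happens
    # on the background thread.
    stat_queue.put({"type": "view", "product_id": product_id})
```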
I suggest that you create a separate database for this. The best way is to use BI techniques; SQL Server ships separate services for BI.
Is there a plugin or a gem that I can use for this? I was thinking about just writing to a table whenever a view is rendered in the controller. Is this the best way? I see Stack Overflow has this functionality; how do they do it?
Google Analytics - Let Google or some other third-party analytics provider handle it for you for free. I don't think you want to do file writes on every page load - potentially costly. Another option is to store the information in memory and write to the database periodically instead of on every page load.
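A rough sketch of that "count in memory, write periodically" idea, in Python only to show the shape of it (in a Rails app the equivalent would live in a background job or a cache store; write_counts is a stand-in for the actual database update, and counts accumulated between flushes are lost if the process dies, which is usually tolerable for view stats):

```python
import threading
from collections import Counter

_counts = Counter()
_lock = threading.Lock()

def record_view(post_id):
    # Called on every page load: just bumps an in-process counter.
    with _lock:
        _counts[post_id] += 1

def start_flusher(write_counts, interval=60):
    """Every `interval` seconds, hand the accumulated counts to write_counts
    (e.g. one UPDATE per post adding the delta) and reset the counters."""
    def tick():
        with _lock:
            snapshot = dict(_counts)
            _counts.clear()
        if snapshot:
            write_counts(snapshot)
        threading.Timer(interval, tick).start()
    threading.Timer(interval, tick).start()
```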
[EDIT] This is an interesting question. I asked for help on this issue of what's more efficient - DB writes vs. file writes - and there's some good feedback there too.
If you just want to get something in there easily, you could use a real-time analytics provider like W3 Counter.
It gives you real-time data (as opposed to Google Analytics) and is relatively simple to deploy (a few lines in your global template), but it may not give you the granularity that you want. I guess it depends on whether you want this information programmatically, to display/use in the app, or just for statistical purposes.
Obviously, there are third party statistics services (Google Analytics, Mint, etc...), but if you must do it yourself then doing a write each time someone hits a page will seriously impact your DB.
I'd write individual hits to an intermediate file on the filesystem or memcached, then fire a task every 10 - 15 minutes that will parse that data and insert it into the database.
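A rough sketch of that intermediate-file approach, in Python just to show the mechanics (the log path and hits table are made up; parse_and_insert would be fired every 10-15 minutes by cron or whatever scheduler you use):

```python
import os
import sqlite3
import time

LOG = "/tmp/page_hits.log"   # made-up location

def record_hit(path):
    # Cheap append on every request: no database round trip.
    with open(LOG, "a") as f:
        f.write("{}\t{}\n".format(int(time.time()), path))

def parse_and_insert(db_path="stats.db"):
    if not os.path.exists(LOG):
        return
    # Rotate first so new hits keep flowing while we import this batch.
    batch = LOG + ".importing"
    os.rename(LOG, batch)
    db = sqlite3.connect(db_path)
    db.execute("CREATE TABLE IF NOT EXISTS hits (ts INTEGER, path TEXT)")
    with open(batch) as f:
        rows = [line.rstrip("\n").split("\t", 1) for line in f if line.strip()]
    db.executemany("INSERT INTO hits (ts, path) VALUES (?, ?)", rows)
    db.commit()
    os.remove(batch)
```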