I may need to process a large number of database records, stored locally on an iPad, using Swift. I've yet to choose the database (but likely SQLite unless suggestions point to another). The records could number up to 700,000 and would need to be processed by adding up totals, working out percentages, and so on.
Is this even possible on an iPad with limited processing power? Is there a software storage limit? Is the iPad up to processing such large amounts of data on the fly?
Another option may be to split the data into smaller chunks of around 30,000 records and work with those. Even then, I'm not sure it's a practical thing to attempt.
Any advice on how, or if, to approach this and what limitations may apply?
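To make concrete what I mean by processing, here is a rough sketch of the kind of aggregation I'd want, assuming SQLite (the table and column names below are made up purely for illustration):

```sql
-- Illustrative only: a made-up table of the kind of records I mean
CREATE TABLE records (
    id       INTEGER PRIMARY KEY,
    category TEXT NOT NULL,
    amount   REAL NOT NULL
);

-- Totals and percentage of the grand total per category,
-- computed inside SQLite rather than by looping in Swift
SELECT category,
       SUM(amount) AS total,
       100.0 * SUM(amount) / (SELECT SUM(amount) FROM records) AS pct_of_total
FROM records
GROUP BY category;
```

The idea would be to let the database do the summing rather than looping over all 700,000 rows in Swift.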
I am creating an SSRS report that returns data for several "Units", which are all to be displayed on a row, with Unit 1 first, Unit 2's data to its right, and so on.
I can either get all this data using a Stored Proc that queries the database using an "IN" clause, or with multiple targeted ("Unit = Bla") queries.
So I'm thinking I can either filter each "Unit" segment with something like "=UNIT:[Unit1]", or I can assign a different Dataset to each segment (with the targeted data).
Which way would be more "performant" - getting a big chunk of data, and then filtering the same thing in various locations, or getting several instances/datasets of targeted data?
My guess is the latter, but I don't know whether SSRS is smart enough to make the former approach work just as well or better by doing some optimizing "behind the scenes".
I think it really depends on how big the big chunk of data is. My experience has been that SSRS can process quite a large amount of data after it comes back from the database, and it does it quickly. If the report is going to aggregate the data in the end, I try to do as much of that as I can on the database end, because the database server usually has more resources to do all that work. But if the detail is needed, and you can aggregate easily enough on the report server end, pull the 10K records and do the aggregation there.
I lean toward hitting the database as few times as possible, but sometimes it just makes sense to get the data I need with individual queries. I have built reports with over 20 datasets, each for very specific measures that just didn't union together well. Breaking it up like this took the report run time from 3 minutes to 20 seconds.
Not a great answer if you were looking for which exact solution to go with. It depends on the situation. Often, trial and error gets you to the answer for the report in question.
SSRS is not going to do any "optimizing", and the rendering requirements sound trivial, so you should probably treat this as a SQL query issue rather than an SSRS one.
I would expect the single SELECT with an IN clause to be faster, as it will require fewer I/Os on the database files. A stored proc is not required; you can just write a SELECT statement.
A further benefit is that you will be left with N times less code to maintain (where N is the number of Units), and you can guarantee the consistency of the code/logic across Units.
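To illustrate the shape of it (the table and column names below are invented, as is the assumption that Unit is simply a column you can filter on):

```sql
-- One round trip for all Units; each report segment then filters on UnitName
SELECT UnitName, MeasureA, MeasureB
FROM   dbo.UnitData
WHERE  UnitName IN ('Unit1', 'Unit2', 'Unit3');
```

versus running the equivalent WHERE UnitName = 'Unit1' query once per Unit, each feeding its own dataset.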
I want to query an ActiveRecord model, modify it, and calculate the size of the new object in mb. How do I do this?
Unfortunately, neither the size of data rows in a database nor the in-memory size of Ruby objects is readily available.
While it is a bit easier to get a feel for the object size in memory, you would still have to find all the objects that form part of your ActiveRecord object and thus should be counted (which is not obvious). Even then, you would have to deal with non-obvious things like shared/cached data and class overhead, which you may or may not need to count.
On the database side, it heavily depends on the storage engine used. From the documentation of your database, you can normally deduce the storage requirements for each of the columns you defined in your table (which may vary in the case of VARCHAR, TEXT, or BLOB columns). On top of this come shared resources like indexes, general table overhead, and so on. To get an estimate, though, the documented size requirements for the various columns in your table should be sufficient.
Generally, it is really hard to get a correct size for complex things like database rows or in-memory objects. The systems are not built to collect or provide this information.
Unless you absolutely, positively need an exact figure, you should err on the side of too much space. Generally, for databases it doesn't hurt to have too much disk space (in which case the database will generally run a little faster) or too much memory (which will reduce memory pressure for Ruby, which will again make it faster).
Often, the memory usage of Ruby processes will not be obvious. Thus, the best course of action is almost always to write your program and then test it with the desired amount of real data, checking its performance and memory requirements. That way, you get the actual information you need, namely how much memory your program needs when handling your required dataset.
The size of the record will be totally dependent on your database, which is independent of your Ruby on Rails application. It's going to be a challenge to figure out how to get the size, as you need to ask the DATABASE how big it is, and Rails (by design) shields you very much from the actual implementation details of your DB.
If you need to know the storage to estimate how big of a hard disk to buy, I'd do some basic math like estimate size in memory, then multiply by 1.5 to give yourself some room.
If you REALLY need to know how much room it takes, try recording how much free space you have on disk, write a few thousand records, measure again, and then do the math.
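If you do want to ask the database directly, most databases will report a table's size. For example, on PostgreSQL (an assumption on my part, since you haven't said which database you're using) something like:

```sql
-- On-disk size of a table including its indexes (PostgreSQL)
SELECT pg_size_pretty(pg_total_relation_size('your_table_name'));
```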
Most trading applications receive a datafeed from commercial providers such as IQFeed, or from brokerages that support a trading API. Is there merit in storing it in a local database? An intraday datafeed is just massive in size, and the database would grow very quickly with 1-minute data for just 50 stocks, never mind tick-by-tick data. I suspect this would be a nightmare for database backups and may impact performance.
If you get historical data in text files on DVD or online, then storing it in the database is the only logical choice, but would it still be a good idea if you get it through an API?
It's all about storage space, really. You can definitely do it through the API, but make sure you don't do it from the same application that is doing the automated trading for you.
As you said, tick data is pretty much out of the question; for 1-minute data that would mean approximately 400 bars per day per symbol, or 20,000 bars per day across 50 symbols.
The required storage space can be calculated from that; if you are storing OHLC, each bar can be held in four values of type Int.
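To put rough numbers on it (assuming 4-byte integers and one extra column for a timestamp, which is my assumption): 20,000 bars per day x 5 values x 4 bytes is on the order of 400 KB per day, or roughly 100 MB per year of trading days, before indexes. A minimal table sketch, with invented names:

```sql
-- Minimal 1-minute bar table (names and types are illustrative only)
CREATE TABLE bars (
    symbol   TEXT    NOT NULL,
    bar_time INTEGER NOT NULL,  -- e.g. Unix time of the bar's open
    open     INTEGER NOT NULL,  -- prices scaled to integers, as described above
    high     INTEGER NOT NULL,
    low      INTEGER NOT NULL,
    close    INTEGER NOT NULL,
    PRIMARY KEY (symbol, bar_time)
);
```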
As the other answer pointed out, performance may be an issue with more and more symbols but shouldn't be a problem with 50 symbols on 1 minute bars.
This is a performance question. If the API is fast enough, then use that. If it's not and caching will help, then cache it. Only your application and your usage patterns can determine how much truth and necessity apply to these statements.
I have a spreadsheet, approximately 1500 rows x 1500 columns. The labels along the top and side are the same, and the data in the cells is a quantified similarity score for the two inputs. I'd like to make a Rails app allowing a user to enter the row and column values and retrieve the similarity score. The similarity scores were derived empirically, and can't be mathematically produced by the controller.
Some considerations: with every cell full, over half of the data is redundant; e.g., (row 34, column 985) holds the same value as (row 985, column 34). And row x will always be perfectly similar to column x. The data is static, and won't change for years.
Can this be done with one db table? Is there a better way? Can I skip the relational db entirely and somehow query the file directly?
All assistance and advice is much appreciated!
A database is always a safe place to store it, and a relational database is straightforward and a good idea. However, there are alternatives to consider. How often will this data be accessed? Rarely, or very frequently? If it's accessed rarely, just put it in the database and let your code take care of searching and presenting; you'll optimize it with database indexes and so on.
A flat file is a good idea, but reading and searching it at run time for every request is going to be too slow.
You could read all the data (from the db/file) at server startup, keep it in memory, and ensure that your servers don't restart too often. It means each one of your servers will hold the entire grid in memory, but lookups will be really fast. If you use REE and calibrate the garbage collection settings, you can also minimize the server's startup time to a large extent.
Here's my final suggestion. Just build your app in the simplest way you know, and once you know how often and how much your app is going to be used, start optimizing. You are fundamentally working with about 1,125,000 distinct cells (half of the 1500 x 1500 grid, given the symmetry). That is not an unreasonably large dataset for a database to handle, and since your dataset will not change, you can go a long way with conventional caching techniques.
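If you do go the single-table relational route, the symmetry means you only need to store each pair once and normalize the lookup so the smaller label always lands in the first column. A sketch with invented names:

```sql
-- Store each unordered pair once, with label_a <= label_b by convention
CREATE TABLE similarity_scores (
    label_a VARCHAR(255) NOT NULL,
    label_b VARCHAR(255) NOT NULL,
    score   FLOAT        NOT NULL,
    PRIMARY KEY (label_a, label_b)
);

-- The app sorts the two labels before querying, so (985, 34) and (34, 985)
-- both become the same lookup; the diagonal (x, x) can be skipped entirely
SELECT score
FROM   similarity_scores
WHERE  label_a = ?   -- the smaller of the two labels
  AND  label_b = ?;  -- the larger of the two labels
```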
I have a website backed by a relational database comprised of the usual e-commerce related tables (Order, OrderItem, ShoppingCart, CreditCard, Payment, Customer, Address, etc...).
The stored proc which returns order history is painfully slow due to the amount of data and the numerous joins which must occur, and depending on the search parameters it sometimes times out (despite the indexing that is in place).
The DB schema is pretty well normalized, and I believe I can achieve better performance by moving toward something like a data warehouse. DW projects aren't trivial, and then there's the issue of keeping the data in sync, so I was wondering if anyone knows of a shortcut. Perhaps an out-of-the-box solution that will create the DW schema and keep the data in sync (via triggers, perhaps). I've heard of Lucene, but it seems geared more toward text search and document management. Does anyone have other suggestions?
How big is your database?
There aren't really any shortcuts, but dimensional modelling is really NOT that hard. You first determine a grain, then identify your facts and the dimensions associated with those facts. Then you divide the dimensions into tables, which allows the dimensions to grow only slowly over time. The choice of dimensions is completely practical and based on the data's behavior.
I recommend you have a look at Kimball's books.
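As a toy sketch of what the fact/dimension split might look like for an order-history report (all names below are invented):

```sql
-- Dimensions change slowly and carry the descriptive attributes
CREATE TABLE dim_customer (
    customer_key  INT PRIMARY KEY,
    customer_name VARCHAR(100),
    city          VARCHAR(100)
);

CREATE TABLE dim_date (
    date_key      INT PRIMARY KEY,  -- e.g. 20240131
    calendar_date DATE,
    month_name    VARCHAR(20),
    year_number   INT
);

-- The fact table holds the measures at the chosen grain (one row per order line)
CREATE TABLE fact_order_item (
    order_key    INT,
    customer_key INT REFERENCES dim_customer (customer_key),
    date_key     INT REFERENCES dim_date (date_key),
    quantity     INT,
    line_total   DECIMAL(10, 2)
);

-- Order history then becomes a couple of cheap joins on surrogate keys
SELECT d.year_number, d.month_name, c.customer_name, SUM(f.line_total) AS total
FROM   fact_order_item f
JOIN   dim_customer c ON c.customer_key = f.customer_key
JOIN   dim_date     d ON d.date_key     = f.date_key
GROUP BY d.year_number, d.month_name, c.customer_name;
```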
For a database of a few GB, it's certainly possible to repopulate a reporting database from scratch several times a day (no history, just repopulating a different model of the same data from the 3NF source). There are also real-time data warehousing techniques which apply changes continuously throughout the day.
So while DW projects might not be trivial, the denormalization techniques are very approachable and usable without necessarily building a complete time-invariant data warehouse.
Materialized Views are what you might use in Oracle. They give you the "keeping the data in sync" feature you are looking for, combined with fast access to aggregate data. Since you didn't mention any specifics of your platform (server specs, number of rows, number of hits per second, etc.), I can't really help much more than that.
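For reference, an Oracle materialized view for order history might look roughly like this (the table and column names are guesses, and which refresh options you can use depends on your setup):

```sql
-- Precomputed per-customer, per-day order totals, refreshed on demand
-- (fast/on-commit refresh is possible too, but needs materialized view logs)
CREATE MATERIALIZED VIEW mv_order_history
REFRESH COMPLETE ON DEMAND
AS
SELECT o.customer_id,
       TRUNC(o.order_date)              AS order_day,
       COUNT(*)                         AS order_count,
       SUM(oi.quantity * oi.unit_price) AS order_total
FROM   orders o
JOIN   order_items oi ON oi.order_id = o.order_id
GROUP BY o.customer_id, TRUNC(o.order_date);
```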
Of course, we are assuming you've already checked that all your SQL is written properly and optimally, that your indexing is correct, that you are properly using caching in all levels of your app, that your DB server has enough RAM, fast hard drives, etc.
Also, have you considered denormalizing your schema just enough to serve up your most common queries faster? That's better than implementing an entire data warehouse, which might not even be what you want anyway. Usually a data warehouse is for reporting purposes, not for serving interactive apps.