I have a program with a table in SQLite containing about 100 rows and 6 columns of text (none longer than a hundred characters or so). Each time the user clicks a button, the program displays the contents of one row of the table in the view.
My question is: should I copy the contents of the whole table into an array and read from the array each time the user clicks the button, or should I query the database each time? Which is more efficient?
It all depends, but retrieving from the database as you need data (rather than storing the whole thing in an array) is generally the most efficient use of memory, which is a pretty precious resource. My most extravagant use of memory would be to store an array of the table's unique identifiers (i.e. a list of primary keys, returned in the correct order); that way I'm not scanning through the database every time, but since my unique identifiers are always numeric, it doesn't use up too much memory. So, for something of this size, I generally:
open the database;
load an array of solely the table's unique identifiers, returned in the right order for my table view using a SQL ORDER BY clause (see the SQL sketch after this list);
as table cells are built, I'll go back to the database and get the few fields I need for the one row corresponding to the unique identifier I've kept track of for that table row;
when I go to the details view, I'll again get that one row from the database; and
when I'm all done, I'll close the database.
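For illustration, the two queries behind that pattern might look like this (the table and column names are made up, not taken from the question):

```sql
-- Once, up front: only the ordered primary keys are held in memory.
SELECT id FROM items ORDER BY name;

-- Per cell or detail view: fetch the handful of fields for one identifier.
SELECT col1, col2, col3, col4, col5, col6
FROM items
WHERE id = ?;
```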
This yields good performance while not imposing too much of a drain on memory. If the table were different (much larger or much smaller) I might suggest a different approach, but this seems reasonable to me given your description of your data.
For me, it was much easier to re-read the database on each view load and drop the contents when done.
The overhead of keeping the contents in memory just wasn't worth it given how quickly a small dataset can be read.
100 rows and 6 columns is not a lot of data. iOS can handle much larger datasets than that very efficiently, so don't worry about creating a new array; reading from the database should work just fine.
I'm wondering about something that doesn't seem efficient to me.
I have 2 tables: one very large table, DATA (millions of rows and hundreds of columns), with an id as primary key.
I then have another table, NEW_COL, with a variable number of rows (1 to millions) but always 2 columns: id and new_col_name.
I want to update the first table, adding the new column to it.
Of course, I know how to do it with PROC SQL and a left join, or a DATA step merge.
Yet it seems inefficient: as far as I can tell from the execution time (and I may be wrong), both of these approaches rewrite the huge table completely, even when NEW_COL has only 1 row (almost 1 minute).
I tried doing it in two SQL statements, ALTER TABLE ADD COLUMN followed by UPDATE, but it's way too slow, as UPDATE with a join doesn't seem efficient at all.
So, is there an efficient way to "add a column" to an existing table WITHOUT rewriting this huge table?
Thanks!
SAS datasets are row stores, not columnar stores like tables in some other databases. As such, adding rows is far easier and more efficient than adding columns. A key-joined view could be argued to be the most 'efficient' way to add a column to a data rectangle.
If you are adding columns so often that the 1-minute resource cost is a problem, you may need to upgrade your hardware (faster drives, a less contended operating environment, or more memory plus SASFILE if the new columns are frequent but temporary in nature).
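For illustration, a key-joined view along those lines might look like the sketch below, using the table and column names from the question (the view name data_with_new_col is made up); nothing is rewritten, and the join happens whenever the view is read.

```sas
proc sql;
  create view data_with_new_col as
  select d.*, n.new_col_name
  from data as d
  left join new_col as n
    on d.id = n.id;
quit;
```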
@Richard's answer is perfect. If you are adding columns on a regular basis then there is a problem with your design; you would need to give more details on what you are doing so someone can suggest an alternative.
I would try a hash join (a sketch follows below). This is an efficient way of joining because in your case you have one large table and one small table; if the small table fits into memory, it is much better than a left join. I have done various joins this way and the query run times were considerably lower (on the order of 10x).
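A minimal sketch of a DATA step hash lookup, assuming NEW_COL fits in memory, using the dataset and variable names from the question (the output name data_with_new_col is made up):

```sas
/* Read DATA once and look up new_col_name from NEW_COL held in memory;
   the result is written to a new dataset rather than updating DATA in place. */
data data_with_new_col;
  if 0 then set new_col;             /* define new_col_name's attributes */
  if _n_ = 1 then do;
    declare hash h(dataset: 'new_col');
    h.defineKey('id');
    h.defineData('new_col_name');
    h.defineDone();
  end;
  set data;
  if h.find() ne 0 then call missing(new_col_name);
run;
```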
With the ALTER TABLE approach you are rewriting the table, and it also takes a lock on the table so nobody else can use it.
You should perform these joins when the workload is lower, which means outside office hours; you may need to schedule the jobs at night, when more SAS resources are available.
Thanks for your answers guys.
To add some information: I don't have any constraints around table locking, load balancing or anything else, as it's a "project tool" script that I use.
The goal, in a data prep step ('starting point data generator'), is to recompute an already existing column, or to add a new one (less often, but still quite regularly). I just don't want to "lose" time waiting for the whole table to be rewritten when I only need to update one column for specific rows.
When I monitor the server, the computation of the data and the joining step are very fast. But when I want to update only 1 row, I see the whole table being rewritten. It seems a waste of resources to me.
But it seems it's a mandatory step, so can't do much about it.
Too bad.
I have a lot of analytics data which I'm looking to aggregate every so often (let's say every minute). The data is sent to a process which stores it in an ETS table, and every so often a timer sends that process a message to process the table and remove old data.
The problem is that the amount of data that comes in varies wildly, and I basically need to do two things to it:
If the amount of data coming in is too big, drop the oldest data and push the new data in. This could be viewed as a fixed size queue, where if the amount of data hits the limit, the queue would start dropping things from the front as new data comes to the back.
If the queue isn't full, but the data has been sitting there for a while, automatically discard it (after a fixed timeout.)
If these two conditions are kept, I could basically assume the table has a constant size, and everything in it is newer than X.
The problem is that I haven't found an efficient way to do these two things together. I know I could use match specs to delete all entries older than X, which should be pretty fast if the index is the timestamp. I'm not sure, though, whether this is the best way to periodically trim the table.
The second problem is keeping the total table size under a certain limit, which I'm not really sure how to do. One solution that comes to mind is to use an auto-incremented field with each insert, and when the table is being trimmed, look at the first and the last index, calculate the difference and, again, use match specs to delete everything below the threshold.
Having said all this, it feels that I might be using the ETS table for something it wasn't designed to do. Is there a better way to store data like this, or am I approaching the problem correctly?
You can determine the amount of memory occupied using ets:info(Tab, memory). The result is a number of words, but there is a catch: if you are storing binaries, only small heap binaries are counted, not large reference-counted binaries that live off-heap. So if you are storing mostly normal Erlang terms you can use it, and together with a timestamp as you described, it is a reasonable way to go. For the size in bytes, just multiply by erlang:system_info(wordsize).
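For example, a quick check in the shell (assuming Tab is the handle of the table you want to measure):

```erlang
Words = ets:info(Tab, memory),                 %% table size in machine words
Bytes = Words * erlang:system_info(wordsize).  %% bytes; wordsize is 8 on 64-bit VMs
```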
I haven't used ETS for anything like this, but in other NoSQL DBs (DynamoDB) an easy solution is to use multiple tables: If you're keeping 24 hours of data, then keep 24 tables, one for each hour of the day. When you want to drop data, drop one whole table.
I would do the following: create a server responsible for two things (a sketch of the table handling follows this list):
receiving all the data storage messages. These messages should be timestamped by the client process (so it doesn't matter if a message waits a little in the server's queue). The server then stores them in the ETS table, configured as an ordered_set and using the timestamp, converted to an integer, as the key (if the timestamps come from erlang:now within a single VM they will all be different; if you are using several nodes, you will need to add some extra information such as the node name to guarantee uniqueness).
receiving a tick (using, for example, timer:send_interval) and then processing the messages received in the last N µsec: compute Key = current time - N, look up ets:next(Table, Key), and continue through to the last message. Finally you can discard all the messages via ets:delete_all_objects(Table). If you had to add information such as a node name, it is still possible to use the next function (for example, if the keys are of the form {TimeStamp :: integer(), Node :: atom()} you can compare against {Time, 0}, since a number is smaller than any atom).
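Not the poster's exact server, but a minimal sketch of the table handling under the same assumptions (an ordered_set keyed on a monotonic integer timestamp plus a uniquifier; the module and function names are made up), covering both trimming by age and trimming by size from the question:

```erlang
-module(event_buffer).
-export([new/0, insert/2, trim_old/2, trim_size/2]).

%% The table is an ordered_set whose key is {Timestamp, Uniquifier},
%% so ets:first/1 always returns the oldest entry.

new() ->
    ets:new(?MODULE, [ordered_set, public]).

insert(Tab, Value) ->
    Key = {erlang:monotonic_time(microsecond), erlang:unique_integer([monotonic])},
    true = ets:insert(Tab, {Key, Value}),
    Key.

%% Delete every entry older than MaxAgeUs microseconds, walking from the
%% oldest key forward and stopping at the first entry that is new enough.
trim_old(Tab, MaxAgeUs) ->
    Cutoff = erlang:monotonic_time(microsecond) - MaxAgeUs,
    trim_old_loop(Tab, ets:first(Tab), Cutoff).

trim_old_loop(_Tab, '$end_of_table', _Cutoff) ->
    ok;
trim_old_loop(Tab, {Ts, _} = Key, Cutoff) when Ts < Cutoff ->
    Next = ets:next(Tab, Key),
    ets:delete(Tab, Key),
    trim_old_loop(Tab, Next, Cutoff);
trim_old_loop(_Tab, _Key, _Cutoff) ->
    ok.

%% Keep at most MaxSize entries by repeatedly deleting the oldest one.
trim_size(Tab, MaxSize) ->
    case ets:info(Tab, size) of
        Size when is_integer(Size), Size > MaxSize ->
            ets:delete(Tab, ets:first(Tab)),
            trim_size(Tab, MaxSize);
        _ ->
            ok
    end.
```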
I'm creating a registration app where users register themselves.
Now the admin has an option to export all registered users to a CSV/Excel file. There could be thousands of records, and I can't fetch them all at once in one go. How do I make this fast? Will indexing help?
When exporting the entire table, all data must be read.
Adding an index cannot reduce the amount of data.
An index would help only if you sort the output by some column(s).
Indexing will generally help when reading or updating a subset of the data in the table, such as a single row or group of rows.
For example, if you wanted to allow all the data to be exported, or even viewed, one page at a time, a primary key field with an index can be used to retrieve a subset, such as 20 rows at a time, giving better "perceived" performance rather than making the user wait for the entire table to be exported before viewing any of the data.
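For illustration, one way to page through the export on an indexed primary key (the table and column names here are made up):

```sql
-- First page: start from the beginning of the key order.
SELECT id, name, email
FROM users
ORDER BY id
LIMIT 20;

-- Subsequent pages: continue from the last id seen on the previous page.
SELECT id, name, email
FROM users
WHERE id > :last_seen_id
ORDER BY id
LIMIT 20;
```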
I have a UITableView with data coming from NSFetchedResultsController.
Here is my tableView:
I need to add a row "All types". It also needs to be:
Sortable with all other items
Selectable (Design is now selected)
Selecting "All types" should deselect other rows
Give some indication that it's the "All types" row when it is selected
I've read Add extra row to a UITableView managed by NSFetchedResultsController and NSFetchedResultsController prepend a row or section. The approaches given there either make it impossible to sort the data, or look so hacky and produce so much hard-to-maintain code that it would become impossible to change the logic and maintain the code.
Are there any other good options?
PS. I understand that my question may sound "broad" and doesn't contain code, but I think it's a very common problem.
I do not think this is a very common problem at all. I can see why it seems natural to do what you are trying, but let's analyse your situation: what you generally have are 2 arrays of objects which you wish to sort as a single array. Now that is quite a common situation, and I believe everyone knows how to solve it: you need to create a single array of objects and then sort it.
The way I see it you have 3 options:
Fetch all the items, merge the 2 arrays, sort and present them. This is not a very good idea since your memory consumption can be a bit too large if there are a lot of items in the database.
Put the extra data into the database and use a fetched results controller as you normally would. This should work well, but you will probably need to mark these items so they can be removed later, or keep them in the database but ignore them wherever you don't wish to display them.
Create a temporary database combining what needs to be fetched from the original database with your additional data. This approach is great if the data in this list is read-only (which actually seems to be the case in what you posted). Still, it is best to create some kind of link between the objects, for instance an ID; that way, when the user selects an object from the second database, you simply read the ID and fetch the object from the original database.
I have a service that generates a large map through multiple iterations and calculations over multiple tables. My problem is that I cannot use a pagination offset to slice the data, because the data comes from multiple tables and different modifications are applied to it. To display this on screen, I have to send the map with 10,000-20,000 records to the view, and that is problematic with such a large dataset.
At this time I have on-page pagination but this is very slow and inefficient.
One thing I thought is to dump it on a table and query it each time but then I have to deal with concurrent users.
My question is what is the best approach to display this list when I cannot use database slicing (offset, max)?
I am using
grails 1.0.3
datatables and jquery
Maybe SlickGrid is an option for you. One of their examples works with 50,000 rows and it seems to be fast.
Christian
I ended up writing the result of the map to a table and using data slicing on that table for pagination. It takes some time to save the data, but at least I don't have to worry about performance with the large dataset. I use a timestamp to differentiate between requests; each request is saved and retrieved with its own timestamp.
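A minimal sketch of that staging-table idea in plain SQL (the table and column names are made up for illustration):

```sql
-- Staging table holding the computed map, tagged per request.
CREATE TABLE report_rows (
    request_ts  BIGINT NOT NULL,  -- identifies which request produced the row
    row_num     INT    NOT NULL,  -- position within that request's result
    payload     TEXT   NOT NULL,
    PRIMARY KEY (request_ts, row_num)
);

-- Serve one page of a given request's result, 20 rows at a time.
SELECT payload
FROM report_rows
WHERE request_ts = :request_ts
ORDER BY row_num
LIMIT 20 OFFSET :offset;
```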