We experience intermittent, seemingly random brownouts of a Firebase Realtime Database. We are beginning to shard our data into multiple databases; however, we are not sure this will solve our problem. It appears to us that Firebase cannot scale to meet our needs for frequent writes to a specific data set.
We sync data from a third-party data source in cycles (every 4-10 minutes, 1000 active jobs). Each update has the potential to change a few thousand nodes in Firebase, most of which lie fairly deep in the tree. However, most of the time the number of low-level nodes changed is much lower. We do differential updates on the synced data so that we only make very small writes to the lower-level nodes. This helps prevent our users from downloading a ton of additional data. We also batch all of our updates per cycle into only a handful of writes, between 10 and 20 (we are not sure of the performance impact of a batched write to multiple nodes vs. a write to a single node).
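For readers unfamiliar with the approach, here is a minimal Python sketch of the differential update described above. It is not our production code: the function names `diff_paths` and `chunk` are hypothetical, and the actual Firebase write call is left out. It only computes a map of slash-separated paths to new values, which is the shape a Firebase multi-location update accepts, and splits that map into a fixed number of entries per write.

```python
from typing import Any, Dict, List


def diff_paths(old: Any, new: Any, prefix: str = "") -> Dict[str, Any]:
    """Return {path: value} for every node that differs between old and new.

    A value of None marks a path present in `old` but missing in `new`
    (writing null to a Firebase path deletes it).
    """
    if isinstance(old, dict) and isinstance(new, dict):
        changes: Dict[str, Any] = {}
        for key in old.keys() | new.keys():
            path = f"{prefix}/{key}" if prefix else key
            changes.update(diff_paths(old.get(key), new.get(key), path))
        return changes
    if old == new:
        return {}
    # A changed leaf, or a subtree that was added, removed, or replaced.
    return {prefix: new}


def chunk(changes: Dict[str, Any], size: int) -> List[Dict[str, Any]]:
    """Split one large path map into smaller maps of at most `size` entries."""
    items = list(changes.items())
    return [dict(items[i:i + size]) for i in range(0, len(items), size)]
```

Each map returned by `chunk` would then be sent as one batched write, so the number of writes per cycle depends on how many nodes actually changed.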
Here is an image of the database load graph, which includes some sharding:
[Image: Database Load]
The blue line is our main database. The orange line is a database containing only the data that requires many writes, as described above. Currently, the main (blue) database supports normal operations, including reads and writes. The shard (orange) database handles only writes. The way the two lines mirror each other is pretty indicative of a write load issue, given that a large percentage of writes occurs in the morning.
At times, the database load reaches 100% and remains in this state for 30+ minutes.
Please let me know if I can expand on anything or explain anything in more detail. Would appreciate any suggestions on debugging strategies or explanations as to why this may be occurring.
We are actively refactoring a lot of code to mitigate this issue; however, it is not obvious what the main driver is.
I need to insert 800000 records into an MS Access table. I am using Delphi 2007 and the TAdoXxxx components. The table contains some integer fields, one float field and one text field with only one character. There is a primary key on one of the integer fields (which is not autoinc) and two indexes on another integer and the float field.
Inserting the data using AdoTable.AppendRecord(...) takes more than 10 minutes, which is not acceptable since this is done every time the user starts using a new database with the program. I cannot prefill the table because the data comes from another database (which is not accessible through ADO).
I managed to get down to around 1 minute by writing the records to a tab separated text file and using a tAdoCommand object to execute
insert into table (...) select * from [filename.txt] in "c:\somedir" "Text;HDR=Yes"
But I don't like the overhead of this.
There must be a better way, I think.
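To make the intermediate step concrete, here is a minimal sketch of writing the records to a tab-separated file with a header row, which is what the HDR=Yes option in the statement above expects. It is written in Python rather than Delphi purely for illustration; the function name `write_tsv`, the column names and the file encoding are assumptions, not part of the original program.

```python
import csv
from typing import Iterable, Sequence


def write_tsv(path: str, header: Sequence[str], rows: Iterable[Sequence]) -> None:
    """Write a header line followed by one tab-separated line per record."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="ascii") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\r\n")
        writer.writerow(header)   # first line holds the field names (HDR=Yes)
        writer.writerows(rows)    # streams rows; nothing is held in memory
```

The file written this way is then the source of the single INSERT ... SELECT statement, so the row-by-row work happens in plain file I/O instead of in ADO.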
EDIT:
Some additional information:
MS Access was chosen because it does not need any additional installation on the target machine(s) and the whole database is contained in one file which can be easily copied.
This is a single user application.
The data will be inserted only once and will not change for the lifetime of the database. Though, the table contains one additional field that is used as a flag to indicate that the corresponding record in another database has been processed by the user.
One minute is acceptable (up to 3 minutes would be, too) and my solution works, but it seems too complicated to me, so I thought there should be an easier way to do this.
Once the data has been inserted, the performance of the table is quite good.
When I started planning/implementing the feature of the program working with the Access database the table was not required. It only became necessary later on, when another feature was requested by the customer. (Isn't that always the case?)
EDIT:
From all the answers I got so far, it seems that I already got the fastest method for inserting that much data into an Access table. Thanks to everybody, I appreciate your help.
Since you've said that the 800K records data won't change for the life of the database, I'd suggest linking to the text file as a table, and skip the insert altogether.
If you insist on pulling it into the database, then 800,000 records in 1 minute is over 13,000 / second. I don't think you're gonna beat that in MS Access.
If you want it to be more responsive for the user, then you might want to consider loading some minimal set of data, and setting up a background thread to load the rest while they work.
It would be quicker without the indexes. Can you add them after the import?
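A minimal sketch of the load-first, index-later order, using Python's built-in sqlite3 module as a stand-in because Access cannot be driven from a self-contained example. The table layout loosely follows the question (integer key, integer field, float field, one-character text field); the table and index names are made up. Whether the gain carries over to Jet is something you would have to measure.

```python
import sqlite3
from typing import Iterable, Tuple


def bulk_load(conn: sqlite3.Connection, rows: Iterable[Tuple]) -> None:
    """Create the table, load all rows, and only then build the indexes."""
    conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, grp INTEGER, val REAL, flag TEXT)"
    )
    with conn:  # one transaction for the whole load
        conn.executemany("INSERT INTO items VALUES (?, ?, ?, ?)", rows)
    # Secondary indexes are built once, after the data is in place,
    # instead of being maintained on every single insert.
    conn.execute("CREATE INDEX idx_items_grp ON items (grp)")
    conn.execute("CREATE INDEX idx_items_val ON items (val)")
    conn.commit()
```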
There are a number of suggestions that may be of interest in this thread: "Slow MSAccess disk writing".
What about skipping the text file and using ODBC or OLEDB to import directly from the source table? That would mean altering your FROM clause to use the source table name and an appropriate connect string as the IN '' part of the FROM clause.
EDIT:
Actually I see you say the original format is xBase, so it should be possible to use the xBase ISAM that is part of Jet instead of needing ODBC or OLEDB. That would look something like this:
INSERT INTO table (...)
SELECT *
FROM tablename IN 'c:\somedir\'[dBase 5.0;HDR=NO;IMEX=2;];
You might have to tweak that -- I just grabbed the connect string for a linked table pointing at a DBF file, so the parameters might be slightly different.
Your text-based solution seems the fastest, but you can make it quicker if you start from a preallocated MS Access file whose size is close to the final one. You can do that by filling a typical user database, closing the application (so the buffers are flushed) and manually deleting all records of that big table, but without shrinking/compacting the file.
Then use that file to start the real filling; Access will request no (or very little) additional disk space. I don't remember whether MS Access has a way to automate this, but it can help a lot.
How about an alternate arrangement...
Would it be an option to make a copy of an existing Access database file that has this table you need and then just delete all the other data in there besides this one large table (don't know if Access has an equivalent to something like "truncate table" in SQL server)?
I would replace MS Access with another database. For your situation I think SQLite is the best choice: it doesn't require any installation on the client machine, it is very fast, and it is one of the best embedded database solutions.
You can use it in Delphi in two ways:
You can download the database engine DLL from the SQLite website and use a free Delphi component to access it, such as Delphi SQLite components or SQLite4Delphi.
Use DISQLite3, which has the engine built in, so you don't have to distribute the DLL with your application. They have a free version ;-)
If you still need to use MS Access, try using TAdoCommand with an SQL INSERT statement directly instead of TADOTable; that should be faster than TADOTable.Append.
You won't be importing 800,000 records in less than a minute, as someone mentioned; that's really fast already.
You can skip the annoying translate-to-text-file step however if you use the right method (DAO recordsets) for doing the inserts. See a previous question I asked and had answered on StackOverflow: MS Access: Why is ADODB.Recordset.BatchUpdate so much slower than Application.ImportXML?
Don't use INSERT INTO even with DAO; it's slow. Don't use ADO either; it's slow. But DAO + Delphi + Recordsets + instantiating the DbEngine COM object directly (instead of via the Access.Application object) will give you lots of speed.
You're looking in the right direction in one way. Using a single statement to bulk insert will be faster than trying to iterate through the data and insert it row by row. Access, being a file-based database, will be exceedingly slow in iterative writes.
The problem is that Access is handling how it optimizes writes internally and there's not really any way to control it. You've probably reached the maximum efficiency of an INSERT statement. For additional speed, you should probably evaluate if there's any way around writing 800,000 records to the database every time you start the application.
Get SQL Server Express (free) and connect to it from Access as an external table. SQL Server Express is much faster than MS Access.
I would prefill the database, and hand them the file itself, rather than filling an existing (but empty) database.
If the data you have to fill changes, then keep an Access database (MDB file) synchronized on the server via ODBC, using a bit of code to detect changes in the main database and copy them to the Access database.
When the user requests a new database zip up the MDB, transfer it to them, and open it.
Alternately, you may be able to find code that opens and inserts data into databases directly.
Alternately, alternately, you may be able to find another format (other than CSV) that Access can import faster.
-Adam
Also check to see how long it takes to copy the file. That will be the lower bound of how fast you can write data. In databases like SQL Server, it usually takes a bulk load utility to get close to that speed. As far as I know, MS never created a tool to write directly to MS Access tables the way bcp does. Specialized ETL tools will also optimize some of the steps surrounding the insert, such as the way SSIS does transformations in memory; DTS likewise has some optimizations.
Perhaps you could open an ADO Recordset on the table with lock mode adLockBatchOptimistic and CursorLocation adUseClient, write all the data to the recordset, then do a batch update (rs.UpdateBatch).
If it's coming from dBase, can you just copy the data and index files and attach directly without loading? Should be pretty efficient (from the people who bring you FoxPro). I imagine it would use the existing indexes too.
At the least, it should be a pretty efficient single-command Import.
How much do the 800,000 records change from one creation to the next? Would it be possible to prepopulate the records and then just update the ones that have changed in the external database when creating the new database?
This may allow you to create the new database file quicker.
How fast is your disk turning? If it's 7200RPM, then 800,000 rows in 3 minutes is still 37 rows per disk revolution. I don't think you're going to do much better than that.
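The back-of-the-envelope numbers quoted in this thread can be checked with a few lines of Python. The helper name is made up; the inputs are the figures from the posts (800,000 rows, one or three minutes, a 7200 RPM disk).

```python
def rows_per_revolution(rows: int, seconds: float, rpm: float) -> float:
    """Rows written per disk revolution for a load of `rows` in `seconds`."""
    revolutions = rpm / 60 * seconds  # 7200 RPM is 120 revolutions per second
    return rows / revolutions


# 800,000 rows in 3 minutes on a 7200 RPM disk
print(round(rows_per_revolution(800_000, 180, 7200), 1))  # 37.0

# 800,000 rows in 1 minute, expressed as rows per second
print(round(800_000 / 60))  # 13333
```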
Meanwhile, if the goal is to streamline the process, how about a table link?
You say you can't access the source database via ADO. Can you set up a table link in MS Access to a table or view in the source database? Then a simple append query from the table link would copy the data over from the source database to the target database for you. I'm not sure, but I think this would be pretty fast.
If you can't set up a table link until runtime, maybe you could build the table link programmatically via ADO, then build the append query programmatically, then invoke the append query.
Hi,
The best way is a bulk insert from a text file, as others have said.
Write your records to a text file, then bulk insert the text file into the table.
That should take less than 3 seconds.
I am using FMDB to access the standard iOS internal SQLite database, with one db connection shared among multiple threads.
To make it thread safe I'm locking access to the db to one block of code at a time. All works well, although the access to the db is now a bit of a bottleneck, obviously.
My question is: Can I ease this up a bit by allowing simultaneous queries from multiple threads, as long as they are all readonly SELECT statements?
I can't find an answer anywhere.
You cannot use the same connection to execute multiple queries at the same time.
However, for purely read-only accesses, you can use multiple connections.
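A minimal sketch of the one-connection-per-thread idea, written with Python's sqlite3 module rather than FMDB, so it only illustrates the SQLite-level behaviour. The table, the helper names and the thread count are made up; `mode=ro` is SQLite's URI parameter for opening a database read-only.

```python
import pathlib
import sqlite3
import threading
from typing import List, Optional


def make_db(path: str) -> None:
    """Create a small demo database with the numbers 0..99 in table t."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE t (n INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(100)])
    conn.commit()
    conn.close()


def read_sum(path: str, results: List[Optional[int]], idx: int) -> None:
    # Each thread opens its own read-only connection; nothing is shared.
    uri = pathlib.Path(path).as_uri() + "?mode=ro"  # path must be absolute
    conn = sqlite3.connect(uri, uri=True)
    try:
        results[idx] = conn.execute("SELECT SUM(n) FROM t").fetchone()[0]
    finally:
        conn.close()


def parallel_reads(path: str, n_threads: int = 4) -> List[Optional[int]]:
    """Run the same SELECT from several threads at once, one connection each."""
    results: List[Optional[int]] = [None] * n_threads
    threads = [
        threading.Thread(target=read_sum, args=(path, results, i))
        for i in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

No application-level lock is needed around the SELECTs here, because the threads never touch the same connection object.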
You can have one FMDatabase object for each thread. You might have to write code to test for genuine busy conditions and handle them properly. For example, set busyRetryTimeout to a value appropriate for your situation (i.e. how long you want it to retry under contention). Also handle the case gracefully where the timeout expires and your database query fails.
Clearly, using a shared FMDatabaseQueue is the easiest way to do database interactions from multiple threads. See the Using FMDatabaseQueue and Thread Safety section of the FMDB README.
I'm working on an app where I need to persist data in a reliable manner, i.e. updates need to be persisted all-or-nothing even in the face of application crashes and quits etc.
However I can't find much information on the level of resilience Core Data is able to support and from looking around it seems that Core Data corruption is a possibility. Is this correct or is Core Data able to provide the high and low level ACID properties needed to support reliable data storage?
Please be specific as to which APIs give these guarantees - for example is a save guaranteed to commit all updates or none, even if a crash occurs (possibly on another thread) during the save?
ACID only applies to databases, and Core Data isn't a database API, so the ACID standard doesn't really apply to Core Data. At best, ACID applies only when a SQLite persistent store is used; it does not apply to binary, XML, in-memory or custom atomic stores.
The NSManagedObjectContext does enforce the first three ACID components: Atomicity, Consistency and Isolation. You can in principle turn on SQLite's logging function and get Durability.
ACID was conceived as a theoretical standard for multiuser, big-iron databases. Atomicity, Consistency and Isolation are all intended to keep multiple simultaneous user transactions from corrupting the database by stepping on each other. Durability really applies only to a system that cannot be practically backed up otherwise.
Core Data, by contrast, is an API for implementing the model layer of a Model-View-Controller application. Its persistence functions are merely optional. It does not support multiple users but merely multiple subprocesses of the same app.
There is no system which can perfectly guarantee data integrity in the face of a hardware failure. At best you can guarantee that you can revert to an earlier version of the data but you cannot protect against hardware failure while you are making changes.
Having said all that, Core Data is very robust. I have seen a literal handful of cases of corrupt persistent stores and then usually only under extreme conditions. I don't think any other system currently available for desktop and mobile platforms is more reliable.
Update:
"Please be specific as to which APIs give these guarantees - for example is a save guaranteed to commit all updates or none, even if a crash occurs (possibly on another thread) during the save?... The kinds of failures I am referring to are crashes within the app or quitting of the app - in this case it is desirable that only the last transaction gets affected (i.e. lost, at worst), but I don't see how to express this with Core Data"
There are two areas of concern here, the operations of the persistent store writing to disk and the operations of the managed object context in maintaining object graph integrity and saving the graph.
For the SQLite store, SQLite itself is ACID compliant:
"A transactional database is one in which all changes and queries appear to be Atomic, Consistent, Isolated, and Durable (ACID). SQLite implements serializable transactions that are atomic, consistent, isolated, and durable, even if the transaction is interrupted by a program crash, an operating system crash, or a power failure to the computer."
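The all-or-nothing behaviour of a SQLite transaction can be seen in a small sketch using Python's sqlite3 module. This shows SQLite itself, not Core Data, and the table name and `save_all` helper are made up. A Python exception stands in for the failure; after a real process crash it is SQLite's journal that restores the previous state the next time the file is opened.

```python
import sqlite3
from typing import Optional, Sequence, Tuple


def save_all(
    conn: sqlite3.Connection,
    rows: Sequence[Tuple[str]],
    fail_after: Optional[int] = None,
) -> bool:
    """Insert all rows in one transaction; return False if it was rolled back."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for i, row in enumerate(rows):
                if fail_after is not None and i == fail_after:
                    raise RuntimeError("simulated failure in the middle of a save")
                conn.execute("INSERT INTO notes (body) VALUES (?)", row)
    except RuntimeError:
        return False
    return True
```

If the simulated failure fires after two of five rows, the table ends up with none of them, not two.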
For the other stores, which are written out as unitary files, Core Data will use safe write methods that ensure that an existing good file will not be overwritten by a corrupted file.
At the higher abstraction level of the object graph, the managed object context performs validations before writing to the store and will reject the entire write attempt if any validation errors occur. (Link Pending.) This behavior is necessitated by the object graph.
An object graph is a collection of related live objects in memory. Unlike a procedural database, where data is only encoded in fields, the relationships between the objects encode vital data. Therefore, the entire graph must be validated in one step and then saved in one step. (Of course, behind the scenes, when using a SQLite store, there will be procedural ACID-compliant steps, but validation of the object graph occurs before this level is reached.)
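As an illustration of what a safe write usually means, here is a minimal Python sketch that writes to a temporary file and then renames it over the original. This is an assumption about the general technique, not a description of Core Data's internals, and the `safe_write` name is made up.

```python
import os
import tempfile


def safe_write(path: str, data: bytes) -> None:
    """Replace `path` with `data` without ever exposing a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before renaming
        os.replace(tmp, path)     # swap the new file in over the old one
    except BaseException:
        # The old file is still intact; just discard the partial temp file.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

If the process dies before the rename, the original file is untouched and only a stray temporary file is left behind.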
E.g. You have a data model to model/simulate a file system. It has two entities: Folder and File. Each File object has a required relationship with a single Folder object because each real-world file is always inside a folder. However, you insert a file object that has no folder. When the context validates the object graph for a save, it will reject the entire graph because the graph is nonsensical with a file object that is not inside a folder.
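The Folder/File example can be sketched as a toy model in Python. None of this is Core Data API: the `Context`, `Folder`, `File` and `ValidationError` classes are hypothetical and only mimic the validate-everything-then-save behaviour described above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class ValidationError(Exception):
    pass


@dataclass
class Folder:
    name: str


@dataclass
class File:
    name: str
    folder: Optional[Folder] = None  # required relationship in the model


@dataclass
class Context:
    pending: List[object] = field(default_factory=list)
    saved: List[object] = field(default_factory=list)

    def insert(self, obj: object) -> None:
        self.pending.append(obj)

    def save(self) -> None:
        # Validate the whole pending graph first; nothing is written
        # unless every object passes.
        for obj in self.pending:
            if isinstance(obj, File) and obj.folder is None:
                raise ValidationError(f"file {obj.name!r} is not inside a folder")
        self.saved.extend(self.pending)
        self.pending.clear()
```

A single orphaned File makes `save()` raise, and the valid objects inserted alongside it are not saved either until the problem is fixed.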
Is it ever a good idea to let a large number of people connect to your website while it is using SQLite?
Edit: I am using it in a critical Ruby on Rails application that may have hundreds of concurrent users.
There are two important properties unique to SQLite that I know of that are relevant:
When doing multiple inserts, you will get better performance if you wrap them all in a single transaction. If the inserts are done individually, each one is its own transaction, and SQLite waits for the data to be safely written to disk before the transaction completes. On a spinning disk that costs at least one platter rotation per insert.
When writing to a SQLite file, the entire file is locked, which can cause writer starvation. This situation improved in SQLite 3.
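The first point can be sketched with Python's sqlite3 module. Both functions insert the same rows; the table name `t` and the function names are made up. On an on-disk database the per-row version pays for one commit, and therefore one wait for the disk, per row, while the second pays once for the whole batch.

```python
import sqlite3
from typing import Sequence, Tuple


def make_table(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE t (n INTEGER)")


def insert_one_by_one(conn: sqlite3.Connection, rows: Sequence[Tuple[int]]) -> None:
    for row in rows:
        conn.execute("INSERT INTO t (n) VALUES (?)", row)
        conn.commit()  # every row is its own transaction


def insert_in_one_transaction(conn: sqlite3.Connection, rows: Sequence[Tuple[int]]) -> None:
    with conn:  # a single transaction around the whole batch
        conn.executemany("INSERT INTO t (n) VALUES (?)", rows)
```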
The SQLite website says that SQLite is suitable for small to medium traffic websites, with low OLTP capability. This accounts for about 95% of all websites.