Xodus: How to open earlier states of the db

As I understand, the Xodus database appends transactions to its Log files, and full .xd files don't change anymore, so the Log files kind of become a record of the transaction history. Is there a way to read out past transactions and/or to open the database in an earlier state?

Right, full .xd files don't change anymore and get the read-only attribute, unless this behaviour is turned off.
You can open the database on the latest valid snapshot only. At runtime, you can open a read-only transaction and use it as long as you wish. It will hold the corresponding database snapshot, and the database GC will be suspended until the transaction is aborted.
In future versions, an API will appear for opening a read-only transaction at the address of a snapshot in the Log. It will be an unsafe operation, since the snapshot can be incomplete due to database GC, so it would require manual control over the GC.
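For example, a minimal Java sketch of pinning the latest snapshot by holding a read-only transaction, against the Xodus Environments API (the database path, store name and key here are hypothetical):

import jetbrains.exodus.ByteIterable;
import jetbrains.exodus.bindings.StringBinding;
import jetbrains.exodus.env.Environment;
import jetbrains.exodus.env.Environments;
import jetbrains.exodus.env.Store;
import jetbrains.exodus.env.StoreConfig;
import jetbrains.exodus.env.Transaction;

public class SnapshotHold {
    public static void main(String[] args) {
        final Environment env = Environments.newInstance("/path/to/db"); // hypothetical location
        try {
            // A read-only transaction holds the latest valid snapshot;
            // GC will not reclaim the data it sees until the transaction ends.
            final Transaction txn = env.beginReadonlyTransaction();
            try {
                final Store store = env.openStore("myStore", StoreConfig.USE_EXISTING, txn);
                final ByteIterable value = store.get(txn, StringBinding.stringToEntry("someKey"));
                if (value != null) {
                    System.out.println(StringBinding.entryToString(value));
                }
            } finally {
                txn.abort(); // releases the snapshot so GC may resume
            }
        } finally {
            env.close();
        }
    }
}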

Related

Is it safe to delete sqlite's WAL file?

I have a strange problem with Core Data in an iOS app where sometimes the WAL file becomes huge (~1GB). It appears there are other people with the problem (e.g. Core Data sqlite-wal file gets MASSIVE (>7GB) when inserting ~5000 rows).
My initial thought is to delete the WAL file at app launch. It seems from reading the sqlite documentation on the matter that this will be fine. But does anyone know of any downsides to doing this?
I'd of course like to get to the bottom of why the WAL file is growing so big, but I can't get to the bottom of it right now and want to put in a workaround while I dig deeper into the problem.
It's worth pointing out that my Core Data database is more of a cache. So it doesn't matter if I lose data that's in the WAL. What I really need to know is, will the database be completely corrupted if I delete the WAL? My suspicion is no, otherwise the WAL doesn't serve one of its purposes.
Couple of things:
You can certainly delete the WAL file. You will lose any committed transactions that haven't been checkpointed back to the main file. (Thus violating the "durability" part of ACID, but perhaps you don't care.)
You can control the size of the WAL file on disk with the journal_size_limit pragma (if it bothers you). You may want to manually checkpoint more often too. See "Avoiding Excessively Large WAL files" here: https://www.sqlite.org/wal.html
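As a hedged sketch of those pragmas from application code, here in Java with the xerial sqlite-jdbc driver (the file name and thresholds are made up for illustration):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class WalTuning {
    public static void main(String[] args) throws Exception {
        // Hypothetical database file; any SQLite db in WAL mode works the same way.
        try (Connection conn = DriverManager.getConnection("jdbc:sqlite:cache.db");
             Statement stmt = conn.createStatement()) {
            stmt.execute("PRAGMA journal_mode=WAL;");
            // Trim the WAL file back to ~4 MB after checkpoints (value in bytes).
            stmt.execute("PRAGMA journal_size_limit=4194304;");
            // Checkpoint sooner than the default 1000-page threshold.
            stmt.execute("PRAGMA wal_autocheckpoint=250;");
        }
    }
}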
I dislike all the superstitious bashing of WAL mode. WAL mode is faster, more concurrent, and much simpler since it dispenses with all the locking-level shenanigans (and most "database is busy" problems) that go with rollback journals. WAL mode is the right choice in almost every situation. (The only place it is problematic is on flash filesystems that don't support shared memory-mapped access to files. In that case, the "unofficial" SQLITE_SHM_DIRECTORY compile directive can be used to move the .shm file to a different kind of filesystem -- e.g. tmpfs -- but this should not be a concern on iOS.)
It baffles me how many people here are suggesting it's safe to delete WAL files, without so much as a word of caution.
The documentation explicitly lists this as one of the official ways to corrupt a database. It doesn't say deleting a hot WAL may cause you to lose the most recent transactions or something benign like that. It says it may corrupt the database.
Why? Because an application may have crashed in the middle of a checkpointing operation. When this happens, the database file itself is in an invalid state unless paired with the new data contained in the WAL.
So the answer is a clear no. Don't delete WAL files.
What you can do to clear the file is call PRAGMA schema.wal_checkpoint(TRUNCATE); (where schema is the name of the attached database, usually main).
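Run from application code, the checkpoint also reports what it did; here is a small sketch in Java with the xerial sqlite-jdbc driver (the file name is hypothetical):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class WalTruncate {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:sqlite:cache.db");
             Statement stmt = conn.createStatement();
             // The pragma returns one row: busy flag, WAL frames, frames checkpointed.
             ResultSet rs = stmt.executeQuery("PRAGMA wal_checkpoint(TRUNCATE);")) {
            if (rs.next()) {
                System.out.printf("busy=%d log=%d checkpointed=%d%n",
                        rs.getInt(1), rs.getInt(2), rs.getInt(3));
            }
        }
    }
}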
I have been seeing quite a few negative reports on WAL in iOS 7. I have had to disable it on several projects until I have had time to explore the issues more thoroughly.
I would not delete the journal file, but you could play with the option of vacuuming the SQLite file, which will cause SQLite to "consume" the journal file. You can do this by adding the NSSQLiteManualVacuumOption as part of the options when you add the NSPersistentStore to the NSPersistentStoreCoordinator.
If that ends up being time consuming then I would suggest disabling WAL. I have not seen any ill effects from disabling it (yet).
WAL mode has problems, don't use it. Problems vary, but the very large file size you report is one; other problems include failure during migration (using NSPersistentStoreCoordinator's migratePersistentStore) and failure during importing of iCloud transaction logs. So while there are reported benefits, until these bugs are fixed it's probably unwise to use WAL mode.
And NO you can't delete the Write Ahead Log, because that contains the most recent data.
Set the database to use rollback journal mode and I think you will find you no longer have these very large files when loading lots of data.
Here is an extract which explains how WAL works. Unless you can guarantee that your app has run a checkpoint I don't see how you can delete the WAL file without running the risk of deleting committed transactions.
How WAL Works
The traditional rollback journal works by writing a copy of the original unchanged database content into a separate rollback journal file and then writing changes directly into the database file. In the event of a crash or ROLLBACK, the original content contained in the rollback journal is played back into the database file to revert the database file to its original state. The COMMIT occurs when the rollback journal is deleted.
The WAL approach inverts this. The original content is preserved in the database file and the changes are appended into a separate WAL file. A COMMIT occurs when a special record indicating a commit is appended to the WAL. Thus a COMMIT can happen without ever writing to the original database, which allows readers to continue operating from the original unaltered database while changes are simultaneously being committed into the WAL. Multiple transactions can be appended to the end of a single WAL file.
Checkpointing
Of course, one wants to eventually transfer all the transactions that are appended in the WAL file back into the original database. Moving the WAL file transactions back into the database is called a "checkpoint".
Another way to think about the difference between rollback and write-ahead log is that in the rollback-journal approach, there are two primitive operations, reading and writing, whereas with a write-ahead log there are now three primitive operations: reading, writing, and checkpointing.
By default, SQLite does a checkpoint automatically when the WAL file reaches a threshold size of 1000 pages. (The SQLITE_DEFAULT_WAL_AUTOCHECKPOINT compile-time option can be used to specify a different default.) Applications using WAL do not have to do anything in order for these checkpoints to occur. But if they want to, applications can adjust the automatic checkpoint threshold. Or they can turn off the automatic checkpoints and run checkpoints during idle moments or in a separate thread or process.
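As a hedged sketch of that last option (disable automatic checkpoints, then checkpoint from a separate thread), again in Java with the xerial sqlite-jdbc driver; the file name and the 30-second cadence are made up:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class IdleCheckpointer {
    public static void main(String[] args) throws Exception {
        // Writer connection with automatic checkpoints turned off.
        Connection writer = DriverManager.getConnection("jdbc:sqlite:cache.db");
        try (Statement stmt = writer.createStatement()) {
            stmt.execute("PRAGMA journal_mode=WAL;");
            stmt.execute("PRAGMA wal_autocheckpoint=0;");
        }
        // A separate thread checkpoints periodically on its own connection.
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        ses.scheduleWithFixedDelay(() -> {
            try (Connection c = DriverManager.getConnection("jdbc:sqlite:cache.db");
                 Statement s = c.createStatement()) {
                s.execute("PRAGMA wal_checkpoint(PASSIVE);");
            } catch (Exception e) {
                // ignore and retry on the next tick
            }
        }, 30, 30, TimeUnit.SECONDS);
        // ... application writes go through `writer` here ...
    }
}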
There are quite good answers on this thread, but I'm adding this one to link to the official Apple Q&A about journaling mode in iOS 7 Core Data:
https://developer.apple.com/library/ios/qa/qa1809/_index.html
They give different solutions:
To safely back up and restore a Core Data SQLite store, you can do the following:
Use the following method of NSPersistentStoreCoordinator class, rather than file system APIs, to back up and restore the Core Data store:
- (NSPersistentStore *)migratePersistentStore:(NSPersistentStore *)store toURL:(NSURL *)URL options:(NSDictionary *)options withType:(NSString *)storeType error:(NSError **)error
Note that this is the option we recommend.
Change to rollback journaling mode when adding the store to a persistent store coordinator if you have to copy the store file. Listing 1 is the code showing how to do this:
Listing 1 Use rollback journaling mode when adding a persistent store
NSDictionary *options = @{NSSQLitePragmasOption: @{@"journal_mode": @"DELETE"}};
if (![persistentStoreCoordinator addPersistentStoreWithType:NSSQLiteStoreType
                                              configuration:nil
                                                        URL:storeURL
                                                    options:options
                                                      error:&error])
{
    // error handling.
}
For a store that was loaded with the WAL mode, if both the main store file and the corresponding -wal file exist, using rollback journaling mode to add the store to a persistent store coordinator will force Core Data to perform a checkpoint operation, which merges the data in the -wal file to the store file. This is actually the Core Data way to perform a checkpoint operation.
On the other hand, if the -wal file is not present, using this approach to add the store won't cause any exceptions, but the transactions recorded in the missing -wal file will be lost.
VERY IMPORTANT EDIT
If some of your users are on iOS 8.1 and you chose the first solution (the one Apple recommends), note that their many-to-many data relationships will be completely discarded. Lost. Deleted. In the entire migrated database.
This is a nasty bug apparently fixed in iOS 8.2. More info here http://mjtsai.com/blog/2014/11/22/core-data-relationships-data-loss-bug/
You should never delete the sqlite WAL file; it contains transactions that haven't been written to the actual sqlite file yet. Instead, force the database to checkpoint, which will clean up the WAL file for you.
In Core Data, the best way to do this is to open the database with the DELETE journal mode pragma. This will checkpoint and then delete the WAL file for you.
NSDictionary *options = @{NSSQLitePragmasOption: @{@"journal_mode": @"DELETE"}};
[psc addPersistentStoreWithType:NSSQLiteStoreType
                  configuration:nil
                            URL:_storeURL
                        options:options
                          error:&localError];
For sanity's sake you should ensure you only have one connection to the persistent store when you do this, i.e. only one persistent store instance in a single persistent store coordinator.
FWIW, in your particular case you may wish to use TRUNCATE or OFF for your initial database import, and switch to WAL for updates.
https://www.sqlite.org/pragma.html#pragma_journal_mode
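A sketch of that import pattern, once more in Java with the xerial sqlite-jdbc driver (the file and table names are made up; journal_mode=OFF is only reasonable here because the store is a rebuildable cache):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Statement;

public class BulkImport {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:sqlite:cache.db");
             Statement stmt = conn.createStatement()) {
            // Journaling off for the one-shot bulk load; a crash mid-import can
            // corrupt the file, which is acceptable only for a rebuildable cache.
            stmt.execute("PRAGMA journal_mode=OFF;");
            stmt.execute("CREATE TABLE IF NOT EXISTS items (id INTEGER PRIMARY KEY, name TEXT);");
            conn.setAutoCommit(false); // one big transaction for the import
            try (PreparedStatement ps = conn.prepareStatement("INSERT INTO items(name) VALUES (?)")) {
                for (int i = 0; i < 5000; i++) {
                    ps.setString(1, "row-" + i);
                    ps.addBatch();
                }
                ps.executeBatch();
            }
            conn.commit();
            conn.setAutoCommit(true);
            // Switch to WAL for subsequent concurrent updates.
            stmt.execute("PRAGMA journal_mode=WAL;");
        }
    }
}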

What should I do after crashing when writing into a sqlite db (iOS)?

If an app crashes when writing into a sqlite db (or Core Data), sometimes the db file will be broken, and the db may then fail to open during initialisation.
What I'm doing now is deleting the db file if it fails to open, and copying a new one to be used.
I'm wondering what's the BEST WAY to deal with such situation?
Due to the atomic commit nature of SQLite, you should never experience database corruption. If you are, it could be due to enabling features such as "Write Caching" within iOS or in the hard drive itself, or it could possibly even be caused by hardware failure.
SQLite maintains a journal file to rollback commits and return the database to a consistent state in the event of a power failure or other abrupt shutdown. If corruption occurs, it means that the OS responded to SQLite stating a write operation had completed when in actuality, it wasn't physically committed to the media yet. Please ensure Write Caching is disabled when using it in your App. For more information, please see the SQLite Atomic Commit reference.
Otherwise, the common method people use to "repair" a SQLite DB is to .dump the DB file into another, like so: echo ".dump" | sqlite3 old.db | sqlite3 new.db
Hope this helps...

Core data iCloud transaction logs

I'm testing Core Data and iCloud with UIManagedDocument and ubiquity options (NSPersistentStoreUbiquitousContentNameKey and NSPersistentStoreUbiquitousContentURLKey).
Everything is working OK. My devices get synced without problems and in a reasonable time. The DB is small (below 100K).
As I said, I'm testing the app and making a lot of changes to the db, and as a result, a lot of transaction logs are generated. The problem I have is that if I delete and reinstall the app on one of the devices used for testing (without deleting iCloud data), the app takes a very long time to open the document. openWithCompletionHandler takes minutes, sometimes never ends. If I turn on debugging (-com.apple.coredata.ubiquity.logLevel 3) I can see that there is a long wait and after that the DB is reconstructed from transaction logs.
If I remove the iCloud data and reinsert the data on the first device, the second one syncs without problems. Because of that, I think the reason for the delay is the high number of transaction logs (20-30 while testing, as I can see on developer.icloud.com).
According to Managing Core Data iCloud Transaction Logs, Core Data will handle this automatically, but I can't see any deletion. Perhaps that needs some more time.
My questions are: Do transaction logs ever get consolidated? Can I force the consolidation of logs? Is there another recommended option?
I only store the subset of essential information needed for syncing in the iCloud Core Data file. I have another local file with the full DB, so I can reconstruct the iCloud DB without any major loss of information. Perhaps I could delete the iCloud DB when I detect a bunch of logs and re-create it. Do you think this is a good option?
Thank you for helping.
Do transaction logs ever get consolidated?
That is how it's supposed to work.
Can I force the consolidation of logs?
No. There is no API that directly affects the existence of transaction logs. The iCloud system will consolidate them at some point, but there's no documentation regarding when that happens, and you can't force it.
Is there another recommended option?
You can limit the number of transaction logs indirectly-- save changes less frequently. A transaction log corresponds to saving changes in Core Data. It may not make much of a difference because, honestly, 20-30 transaction logs is not very many. You might be able to reduce the number of log files but you'll still have the same amount of data in them.
Transaction logs aren't really your problem. As you observed, there's a long wait before iCloud starts running through the transaction logs. During that delay, iCloud is communicating with Apple's servers and downloading the transaction logs. Some of this is affected by network speed and latency, and the rest is just the way iCloud is.

Interbase transaction monitoring

I have a very strange problem with transactions in Interbase 7.5 which seem to be stuck.
I can track the problem with IBConsole -> right click DB -> Performance Monitor -> Transactions
Usually this list should show only a few active transactions, but I get several hundred active transactions when I start my application (a web module for an Apache web server using Delphi 7 Interbase components, e.g. IBQuery, IBTransaction, ...).
Transaction type is always listed as snapshot, if this is of relevance.
I have already triple-checked all SQL statements and cannot find anything that should produce such problems...
Is there any way to get the SQL statements of a specific transaction?
Any other suggestion how to find such a problem would be very welcome.
Is there any way to get the SQL statements of a specific transaction?
Yes, you can SELECT from TMP$STATEMENTS WHERE TRANSACTION_ID = .... That's from memory, but should get you started.
In IB Performance Monitor, you can locate the transaction from the statements tab, using the button on the toolbar. Can't remember if you can go the other way in that app. It's been a long time since I wrote it!
Active IBX datasets require an active transaction all the time. If you don't have active datasets, just don't forget to commit all the active transactions.
If you have active datasets, you can configure all your components to use the same TIbTransaction object, and you can also configure that single TIbTransaction to commit or roll back after an idle time-out period via the IdleTimer and DefaultAction properties.
Terminating the transaction (by manually or automatically committing or rolling back) will close all the linked datasets (TIBQuery, TIBTable and the like).
You may be tempted to use the CommitRetaining or RollbackRetaining methods to terminate the transaction without closing the related datasets, but this may affect the performance of the server, and my advice is to always avoid using them.
If you want to improve your application, you should consider changing your database connection layer or introducing an in-memory-capable dataset over IBX, for example Delphi's TClientDataSet, which allows you to retrieve data and retain it in memory while closing all the underlying datasets (and transactions), while still allowing you to use the traditional Insert/Append/Edit/Delete methods to modify the data and then apply those changes to the database in a new short-lived transaction.

What's the difference between Jet OLEDB:Transaction Commit Mode and Jet OLEDB:User Commit Sync?

Although both Jet/OLE DB parameters are relatively well documented, I fail to understand the difference between these two connection parameters:
The first one:
Jet OLEDB:Transaction Commit Mode (DBPROP_JETOLEDB_TXNCOMMITMODE)
Indicates whether Jet writes data to disk synchronously or asynchronously when a transaction is committed.
The second one:
Jet OLEDB:User Commit Sync (DBPROP_JETOLEDB_USERCOMMITSYNC)
Indicates whether changes that were made in transactions are written in synchronous or asynchronous mode.
What's the difference? When to use which?
This is very long, so here's the short answer:
Don't set either of these. The default settings for these two options are likely to be correct. The first, Transaction Commit Mode, controls Jet's implicit transactions, and applies outside of explicit transactions, and is set to YES (asynchronous). The second controls how Jet interacts with its temporary database during an explicit transaction and is set to NO (synchronous). I can't think of a situation where you'd want to override the defaults here. However, you might want to set them explicitly just in case you're running in an environment where the Jet database engine settings have been altered from their defaults.
Now, the long explanation:
I have waded through a lot of Jet-related resources to see if I can find out what the situation here is. The two OLEDB constants seem to map onto these two members of the SetOptionEnum of the top-level DAO DBEngine object (details here for those who don't have the Access help file available):
dbImplicitCommitSync
dbUserCommitSync
These options are there for overriding the default registry settings for the Jet database engine at runtime for any particular connection, or for permanently altering the stored settings for it in the registry. If you look in the Registry under HKLM\Software\Microsoft\Jet\X.X\ you'll find that, under the key for the Jet version you're using, there are several values, two of which are these:
ImplicitCommitSync
UserCommitSync
The Jet 3.5 Database Engine Programmer's Guide defines these:
ImplicitCommitSync: A value of Yes indicates that Microsoft Jet will wait for commits to finish. A value other than Yes means that Microsoft Jet will perform commits asynchronously.
UserCommitSync: When the setting has a value of Yes, Microsoft Jet will wait for commits to finish. Any other value means that Microsoft Jet will perform commits asynchronously.
Now, this is just a restatement of what you'd already said. The frustrating thing is that the first has a default value of NO while the second defaults to YES. If they really were controlling the same thing, you'd expect them to have the same value, or that conflicting values would be a problem.
But the key actually turns out to be in the name, and it reflects the history of Jet in regard to how data writes are committed within and outside of transactions. Before Jet 3.0, Jet defaulted to synchronous updates outside of explicit transactions, but starting with Jet 3.0, IMPLICIT transactions were introduced, and were used by default (with caveats in Jet 3.5 -- see below). So, one of these two options applies to commits OUTSIDE of transactions (dbImplicitCommitSync) and the other for commits INSIDE of transactions (dbUserCommitSync). I finally located a verbose explanation of these in the Jet Database Engine Programmer's Guide (p. 607-8):
UserCommitSynch
The UserCommitSynch setting determines whether changes made as part of an explicit transaction...are written to the database in synchronous mode or asynchronous mode. The default value...is Yes, which specifies asynchronous mode. It is not recommended that you change this value because in synchronous mode, there is no guarantee that information has been written to disk before your code proceeds to the next command.
ImplicitCommitSync
By default, when performing operations that add, delete, or update records outside of explicit transactions, Microsoft Jet automatically performs internal transactions called implicit transactions that temporarily save data in its memory cache, and then later write the data as a chunk to the disk. The ImplicitCommitSync setting determines whether changes made by using implicit transactions are written to the database in synchronous mode or asynchronous mode. The default value...is No, which specifies that these changes are written to the database in asynchronous mode; this provides the best performance. If you want implicit transactions to be written to the database in synchronous mode, change the value...to Yes. If you change the value...you get behavior similar to Microsoft Jet versions 2.x and earlier when you weren't using explicit transactions. However, doing so can also impair performance considerably, so it is not recommended that you change the value of this setting.
Note: There is no longer a need to use explicit transactions to improve the performance of Microsoft Jet. A database application using Microsoft Jet 3.5 should use explicit transactions only in situations where there may be a need to roll back changes. Microsoft Jet can now automatically perform implicit transactions to improve performance whenever it adds, deletes or changes records. However, implicit transactions for SQL DML statements were removed in Microsoft Jet 3.5...see "Removal of Implicit Transactions for SQL DML Statements" later in this chapter.
That section:
Removal of Implicit Transactions for SQL DML Statements
Even with all the work in Microsoft Jet 3.0 to eliminate transactions in order to obtain better performance, SQL DML statements were still placed in an implicit transaction. In Microsoft Jet 3.5, SQL DML statements are not placed in an implicit transaction. This substantially improves performance when running SQL DML statements that affect many records of data.
Although this change provides a substantial performance improvement, it also introduces a change to the behavior of SQL DML statements. When using Microsoft Jet 3.0 and previous versions that use implicit transactions for SQL DML statements, an SQL DML statement rolls back if any part of the statement is not completed. When using Microsoft Jet 3.5, it is possible to have some of the records committed by an SQL DML statement while others are not. An example of this would be when the Microsoft Jet cache is exceeded. The data in the cache is written to disk and the next set of records is modified and placed in the cache. Therefore, if the connection is terminated, it is possible that some of the records were saved to disk, but others were not. This is the same behavior as using DAO looping routines to update data without an explicit transaction in Microsoft Jet 3.0. If you want to avoid this behavior, you need to add explicit transactions around the SQL DML statement to define a set of work and you must sacrifice the performance gains.
Confused yet? I certainly am.
The key point seems to me to be that dbUserCommitSync controls the way Jet writes to the TEMPORARY database it uses for staging EXPLICIT transactions, while dbImplicitCommitSync relates to how Jet uses its implicit transactions OUTSIDE of an explicit transaction. In other words, dbUserCommitSync controls the behavior of the engine while inside a BeginTrans/CommitTrans loop, while dbImplicitCommitSync controls how Jet behaves in regard to asynch/synch outside of explicit transactions.
Now, as to the "Removal of Implicit Transactions" section: my reading is that implicit transactions apply to updates when you're looping through a recordset outside of a transaction, but no longer apply to a SQL UPDATE statement outside a transaction. It stands to reason that an optimization that improves the performance of row-by-row updates would be good and wouldn't actually help so much with a SQL batch update, which is already going to be pretty darned fast (relatively speaking).
Also note that the fact that it is possible to do it both ways is what enables DoCmd.RunSQL to make incomplete updates. That is, a SQL command that would fail with CurrentDB.Execute strSQL, dbFailOnError, can run to completion if executed with DoCmd.RunSQL. If you turn off DoCmd.SetWarnings, you don't get a report of an error, and you don't get the chance to roll back to the initial state (or to commit anyway after being informed of the errors).
So, what I think is going on is that SQL executed through the Access UI is wrapped in a transaction by default (that's how you get a confirmation prompt), but if you turn off the prompts and there's an error, you get the incomplete updates applied. This has nothing to do with the DBEngine settings -- it's a matter of the way the Access UI executes SQL (and there's an option to turn it off/on).
This contrasts to updates in DAO, which were all wrapped in the implicit transactions starting with Jet 3.0, but starting with Jet 3.5, only sequential updates were wrapped in the implicit transactions -- batch SQL commands (INSERT/UPDATE/DELETE) are not.
At least, that's my reading.
So, in regard to the issue in your actual question, in setting up your OLEDB connection, you'd set the options for the Jet DBEngine for that connection according to what you were doing. It seems to me that the default Jet DBEngine settings are correct and shouldn't be altered -- you want to use implicit transactions for edits where you're walking through a recordset and updating one row at a time (outside of an explicit transaction). On the other hand, you can wrap the whole thing in a transaction and get the same result, so really, this only applies to cases where you're walking a recordset and updating and have not used an explicit transaction, and the default setting seems quite correct to me.
The other setting, UserCommitSync, seems to me to be something you'd definitely want to leave alone as well, as it seems to me to apply to the way Jet interacts with its temp database during an explicit transaction. Setting it to asynchronous would seem to me to be quite dangerous as you'd basically not know the state of the operation at the point that you committed the data.
You'd think that USERCOMMITSYNC=YES would be the option to commit synchronously. And that is the cause of the confusion.
I spent ages googling on this topic because I found that the behavior I was getting with old vb6 applications was not the same as I get in .net oledb/jet4
Now I really should back up what I'm going to say with a link to the actual page(s) I read but I can't find those pages now.
Anyway, I was browsing the MSDN website and found a page that described a 'by design' error in Jet 3 which transposed the functionality of USERCOMMITSYNC, meaning a value of NO gets you synchronous commit.
Therefore MS set the default to NO and we get synchronous commit by default. Exactly as described above by David Fenton. A behavior we've all come to accept.
But, the document then went on to explain that the behavior in oledb/Jet4 has been changed. Basically MS fixed their bug and now a setting of USERCOMMITSYNC=YES does what it says.
But did they change the default? I think not, because now my explicit transactions are NOT committing synchronously in .NET applications using oledb/jet4.
