Why is Informix dbexport generating corrupt data? - informix

We have a strange situation while trying to dbexport/dbimport an Informix database.
While importing the DB we got the error:
1213 - Character to numeric conversion error
I checked at which row the import stops.
I looked at the corresponding lines of the unload file (sed -n '1745813,1745815p' table.unl) and saw data that looks corrupt.
3.0]26.0]018102]0.0]20111001.0]0.0]77.38]20111012.0]978]04]0.0072]6.59]6.59]29.93]29.93]77.38]
3.0]26.0]018102]0.0]20111001.0]0.0]143.69]20111012.0]978]04]0.0144]6.59]6.59]48.79]48.79]143.69]
]0.000/]]-0.000000000000000000000000000000000000000000000000000044]8\00\00\07Ú\00\00Õ²\00\00\07P27\00\00\07Ú\00\00i]-0.00000000000000000000000000000000000000000000000000000000000000000000000000000000000000999995+']-49999992%(000000000000000000.0]-989074999997704800000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.0]-999992%(0000000000000000000000.0]]]Ú\00\00]*00000015056480000000000000000000000000000000000000000000000000000000000.0]-92%'9999)).'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.0]-;24944999992%(000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.0]-81%-999994;2475200000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.0]]-97704751999992%(00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.0]
The first two lines are OK. The rest seems to be corrupt data.
I do not know how this data got there, since it does not appear in a SELECT statement.
I exported only the affected table and found that the same data is there.
I looked for a filter that matches all the rows and used it in another export. This time the corrupt data is not there.
Any idea what might be the reason behind this?
Best Regards
Arthur

Arthur,
To answer the question of why the database is generating corrupt data:
You will need to investigate.
The common causes are:
A crash of your OS/hardware
A power failure
A crash of your database, or its processes being killed by an admin.
After any of the problems above, your FS can become corrupt, and the recovery (fsck) has probably damaged the database data.
You are probably working with a journaling FS, such as ext3, ext4 or NTFS...
If you don't know of any events like those described above, you need to look into the online.log of your Informix database for any start of the engine without a regular shutdown before it. Looking at your OS logs will also help you find any involuntary restart of the OS (power failure or crash).
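The online.log check can be partly automated. Here is a minimal sketch, assuming the log marks an engine start and a clean shutdown with recognizable message text. The two marker strings are assumptions and should be adjusted to whatever your online.log actually contains; find_unclean_starts is a hypothetical helper name.

```python
def find_unclean_starts(lines, start_marker="Dynamic Server Started",
                        stop_marker="Dynamic Server Stopped"):
    """Return 1-based line numbers of starts not preceded by a clean stop.

    The marker strings are assumptions; adjust them to your online.log.
    """
    suspicious = []
    running = False  # True after a start until a stop is seen
    for number, line in enumerate(lines, start=1):
        if start_marker in line:
            if running:
                # The engine started again without a logged shutdown.
                suspicious.append(number)
            running = True
        elif stop_marker in line:
            running = False
    return suspicious


# Synthetic log lines, for illustration only.
sample_log = [
    "10:00:01  IBM Informix Dynamic Server Started.",
    "18:00:00  IBM Informix Dynamic Server Stopped.",
    "08:00:02  IBM Informix Dynamic Server Started.",
    "09:15:40  IBM Informix Dynamic Server Started.",
]
unclean = find_unclean_starts(sample_log)
print(unclean)
```

Any line number it reports is a start that followed another start with no shutdown in between, which is the point in time to compare against the OS logs.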
Now about the solutions.
Recover from a backup
Then you can export just the corrupted table and replace it in your dbexport.
You can do this with archecker (requires an Informix version later than 10.FC4).
This article may help if you need it: Table Level Restore - Pretty Useful Stuff
Export your table just as you described in the comments.
This will not recover the corrupt data; it just "saves" the "good" data and discards the "bad" data.
Create a new table as a copy of the first one.
Insert into table2 select * from table1 where (my filter which matched all rows)
Recreate the table indexes
Rename the tables
Depending on how bad the corruption is, you are sometimes not able to export all the "good" data with just one select and need to work around the "bad" data. Check this IBM article:
Unloading around table corruption
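Before working around the bad rows, it helps to know exactly which lines of the unload file are damaged. The following is a minimal sketch, not an Informix tool: find_suspect_rows is a hypothetical helper, the choice of numeric columns is an assumption about the table, and the last two sample rows are made up for illustration. It assumes one row per line and does not handle escaped delimiters or multi-line values.

```python
def find_suspect_rows(lines, expected_fields, numeric_columns, delimiter="|"):
    """Yield (line_number, reason) for rows that do not look like valid data."""
    for number, raw in enumerate(lines, start=1):
        fields = raw.rstrip("\n").split(delimiter)
        # Rows end with a trailing delimiter, so drop the last empty piece.
        if fields and fields[-1] == "":
            fields.pop()
        if len(fields) != expected_fields:
            yield number, f"expected {expected_fields} fields, found {len(fields)}"
            continue
        for index in numeric_columns:
            value = fields[index]
            if value == "":
                continue  # treat an empty field as NULL and skip it
            try:
                float(value)
            except ValueError:
                yield number, f"column {index} is not numeric: {value!r}"
                break


# The first two rows come from the question (delimiter "]" as shown there);
# the last two are invented damaged rows.
sample = [
    "3.0]26.0]018102]0.0]20111001.0]0.0]77.38]20111012.0]978]04]0.0072]6.59]6.59]29.93]29.93]77.38]",
    "3.0]26.0]018102]0.0]20111001.0]0.0]143.69]20111012.0]978]04]0.0144]6.59]6.59]48.79]48.79]143.69]",
    "]0.000/]]-0.0044]8",
    "3.0]26.0]018102]0.0]2011x]0.0]77.38]20111012.0]978]04]0.0072]6.59]6.59]29.93]29.93]77.38]",
]
suspects = list(find_suspect_rows(sample, 16, [0, 1, 3, 4], delimiter="]"))
for line_number, reason in suspects:
    print(line_number, reason)
```

On a real file you would pass the open file object instead of the sample list, so the reported numbers match what sed -n shows.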
Ways to prevent this kind of problem or make recovery easier
First, of course, there is no way to prevent every crash...
What you can do is try to minimize the damage after a crash.
Do not use journaling file systems!
(on Linux, use an ext2 FS or RAW devices)
Enable KAIO (for RAW) or DIRECT_IO (for any FS) in the Informix configuration.
This prevents the database from using the OS cache, which makes writing data to your disk safer. In some situations this can slow down or speed up your database; it depends a lot on your hardware/storage.
Configure your backup and test/check it regularly.
I recommend configuring a full database backup plus logical log backups.
Depending on your Informix version and license, you may have the right to configure a cold RSS server ("cluster" secondary node), which works in active-passive mode on a different server and dramatically reduces your chances of losing data after a crash of the primary server.
After any crash, run oncheck to detect whether any corruption occurred:
How to use oncheck to detect corruption

Related

TClientDataSet : Mismatch in datapacket

I'm on Delphi 10.3. I use TClientDataSet a lot in my applications as a way to locally store small amounts of information, usually fewer than 10000 rows.
Basically I create the table structure and save it to disk:
myClientDataSet.savetofile('c:\mydata.dat') ;
This is a great way to have a persistence mechanism without having to install any database. It works, but I noticed that at some point I can't open the file anymore, because when I try to load:
myClientDataSet.loadfromfile('c:\mydata.dat');
I get this error:
Mismatch in datapacket.
When this happens, as far as I know the only fix is to delete the file and create a new one, losing all data (or restoring a backup, if one exists). In some cases this issue takes years to appear, but it always happens at some point.
What can I do to avoid this issue, and if it has already happened, is there a way to recover a data file with this error?
Thanks

Realm file size in iOS app

I have an app that uses Realm as a staging database. It receives information from a bluetooth device, processes it, and sends the processed result to a server.
The incoming data from bluetooth gets stored in a Realm table (table1). A separate thread reads data from the Realm database, processes it, and stores it into a second table (table2) for uploading to a server. When it has pulled this data and successfully processed it, it deletes it from table1.
The third thread pulls data from table2, and when it successfully sends, removes it from table2.
I'm using a database here in case, for whatever reason, the app is killed - data won't be lost... it will just resume where it left off when the app is restarted. But as you can see, the database is not something that hangs around (it's not like an address book or something... it is just temporary staging)
What I notice is that no matter what the heck I do, the realm database file just increases in size over time. I'll end up with a database that if I open it, will have one record in it, but the database file on disk could be 10s of MB in size if the app is running long enough.
Data is being processed on different background queues so as to not block any UX (one of the reasons I'm using Realm instead of CoreData). But I'm using things like autoreleasepools and the invalidate command to avoid objects that are read from having copies made (as suggested by many realm questions/answers)
What gives? I know I don't have a code sample here, but this just seems like a basic garbage collection problem in Realm. I've seen other questions related to this where people are like "why is my database so huge", and the answers suggest doing things like "writeCopyToPath", but that feels like an incredible hack, and regardless, it would be very difficult - this app is meant to be constantly connected and monitoring a bluetooth device, so to do this, it would mean stopping, making sure all threads that might alter the database are quiesced, doing the copy to compact the db, and then starting everything back up again. That just seems nonsensical to me. I might interrupt user operations for example. I don't want a user to not be able to do something because I decided it was time to do database maintenance.
I feel like I'm either missing some incredibly fundamental point in how to make Realm not keep junk around, or Realm is just the completely wrong solution for my problem. I've never seen this problem with databases - adding and deleting lots of records... quickly... seems like something a database should just be able to do without exploding in size.
Are you making sure that the background thread is not holding on to old versions of the Realm, preventing the space from being reused?
Quote from the docs (https://realm.io/docs/swift/latest/#seeing-changes-from-other-threads):
If a thread has no runloop (which is generally the case in a background thread), then Realm.refresh() must be called manually in order to advance the transaction to the most recent state.
Failing to refresh Realms on a regular basis could lead to some transaction versions becoming “pinned”, preventing Realm from reusing the disk space used by that version, leading to larger file sizes.

Keeping a 'revisionable' copy of Neo4j data in the file system; how?

The idea is to have git or a git-like system (users, revision tracking, branches, forks, etc) store the 'master copy' of objects and relationships.
Since the master copy is on the filesystem, any changes can be checked in, tracked, and backed up. Neo4j could then import the files and serve queries. This also gives freedom since node and connection files can be imported to any other database.
Changes in Neo4j can be written to these files as part of the query
Nodes and connections can be added by other means (like copying from a seed dataset)
Nodes and connections are rarely created/updated/deleted by users
Most of the usage is where Neo4j shines: querying
Due to these two, the performance penalty on importing can be safely ignored
What's the best way to set this up?
If this isn't wise; how come?
It's possible to do that, but it would be a lot of work without real value, IMHO.
With an unmanaged extension using the Transaction Event API you are able to store information about each transaction on disk in your common file format.
Here is some information about the Transaction Event API - http://graphaware.com/neo4j/transactions/2014/07/11/neo4j-transaction-event-api.html
Could you please tell us more about the use case and how you would design that system?
In general nothing keeps you from just keeping neo4j database files around (zipped).
Otherwise I would probably use a format which can be quickly exported / imported and diffed too.
So very probably csv files with node-file per label ordered by a sensible key
And then relationship-files between pairs of nodes, with neo4j-import you can recover that data quickly into a graph again.
If you want to write changes to the files you have to make sure they are replayable (appends + updates + deletes), i.e. you have to choose a format which is more or less a transaction log (which Neo4j already has).
If you want to do it yourself the TransactionHandler is what you want to look at. Alternatively you could dump the full database to a snapshot at times you request.
There are plans to add point-in-time recovery on the existing tx-logs, which I think would also address your question.
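The "CSV node-file per label, ordered by a sensible key" idea can be sketched without touching Neo4j at all. This is a minimal illustration under assumptions: nodes are given as plain dictionaries, write_nodes_csv is a hypothetical helper, and the column layout is not the exact header format neo4j-import expects.

```python
import csv
import io


def write_nodes_csv(nodes, key, out):
    """Write one label's nodes as CSV, sorted by `key` so diffs stay small."""
    # A fixed, sorted column order keeps the header stable between exports.
    columns = sorted({name for node in nodes for name in node})
    writer = csv.DictWriter(out, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for node in sorted(nodes, key=lambda n: n[key]):
        writer.writerow(node)


# Hypothetical nodes carrying a "Person" label.
people = [
    {"id": 2, "name": "Bob"},
    {"id": 1, "name": "Alice"},
]
buffer = io.StringIO()
write_nodes_csv(people, "id", buffer)
print(buffer.getvalue())
```

Because rows and columns always come out in the same order, adding or changing one node changes one line of the file, which is what makes the result reasonable to keep in git.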

Core Data sqlite-wal file gets MASSIVE (>7GB) when inserting ~5000 rows

I'm importing data into Core Data and find that the save operation is slow. Using the iOS simulator, I watch the sqlite-wal file grow and grow until it's over 7GB in size.
I'm importing approx 5000 records with about 10 fields. This isn't a lot of data.
Each object I'm inserting has a to-one relation to various other objects (6 relations total). All of those records combined equal less than 20 fields. There are no images or any binary data or anything that I can see that would justify why the resulting size of the WAL file is so huge.
I read the sqlite docs describing the wal file and I don't see how this can happen. The source data isn't more than 50 MB.
My app is multi-threaded.
I create a managed object context in the background thread that performs the import (creates and saves the core data objects).
Without writing the code out here, has anyone encountered this? Does anyone have a thought on what I should be checking? The code isn't super simple and all the parts would take time to input here, so let's start with general ideas.
I'll credit anyone who gets me going in the right direction.
Extra info:
I've disabled the undo manager for the context as I don't need that (I think it's nil by default on iOS but I explicitly set it to nil).
I only call save after the entire loop is complete and all managed objects are in ram (ram goes up to 100 MB btw).
The loop and creation of the core data objects takes only 5 seconds or so. The save takes almost 3 minutes as it writes the WAL file.
It seems my comment to try using the old rollback(DELETE) journal mode rather than WAL journal mode fixed the problem. NOTE that there seem to be a range of problems when using WAL journal mode including the following:
this problem
problems with database migrations when using the migratePersistentStore API
problems with lightweight migrations
Perhaps we should start a Core Data WAL problems page and get a comprehensive list and ask Apple to fix the bugs.
Note that the default mode under OS X 10.9 and iOS 7 now uses WAL mode. To change this back add the following option
@{ NSSQLitePragmaOptions : @{ @"journal_mode" : @"DELETE" } }
All changed pages of a transaction get appended to the -wal file.
If you are importing multiple records, you should, if possible, use a single transaction for the entire import.
SQLite cannot do a full WAL checkpoint while some other connection is reading the database (which might just be some statement that you forgot to close).
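The two SQLite-level points above (the journal mode and a single transaction for the whole import) can be tried outside Core Data. This is a minimal sketch using Python's sqlite3 module on a throwaway file; the table and column names are made up, and it only shows what happens at the SQLite layer, not how Core Data drives it.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "import_demo.sqlite")
conn = sqlite3.connect(path)

# Put the file into WAL mode, then switch back to the rollback journal.
# The pragma returns the mode that is actually in effect afterwards.
conn.execute("PRAGMA journal_mode=WAL").fetchone()
mode = conn.execute("PRAGMA journal_mode=DELETE").fetchone()[0]

conn.execute("CREATE TABLE record (id INTEGER PRIMARY KEY, payload TEXT)")
rows = [(i, f"row {i}") for i in range(5000)]

# One transaction for the entire import: `with conn` commits once at the end.
with conn:
    conn.executemany("INSERT INTO record (id, payload) VALUES (?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM record").fetchone()[0]
conn.close()
print(mode, count)
```

The mode change has to happen outside a transaction, which is why it is done right after opening the connection.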

serve my text from the filesystem instead of a database?

I am working on a content management application in which the data being stored on the database is extremely generic. In this particular instance a container has many resources and those resources map to some kind of digital asset, whether that be a picture, a movie, an uploaded file or even plain text.
I have been arguing with a colleague for a week now because, in addition to storing the pictures etc., they would like to store the text assets on the file system and have the application look up the file location (from the database) and read in the text file (from the file system) before serving it to the client application.
Common sense seemed to scream at me that this was ridiculous: if we are bothering to look something up in the database, we might as well store the text in a database column and have it served along with the row lookup. Database lookup + file I/O sounded far slower than just a database lookup. After going back and forth for some time, I decided to run some benchmarks and found the results a little surprising. There seems to be very little consistency in the benchmark times. The only clear winner was pulling a large dataset from the database and iterating over the results to display the text asset; pulling objects one at a time from the database and displaying their text content, however, seems to be neck and neck.
Now I know the limitations of running benchmarks, and I am not sure I am even running the right kind of tests (for example, file system writes are ridiculously faster than database writes, didn't know that!). I guess my question is for confirmation. Is file I/O comparable to database text storage/lookup? Am I missing a part of the argument here? Thanks ahead of time for your opinions/advice!
A quick word about what I am using:
This is a Ruby on Rails application, using Ruby 1.8.6 and Sqlite3. I plan on moving the same codebase to MySQL tomorrow and see if the benchmarks are the same.
The major advantage you'll get from not using the filesystem is that the database will manage concurrent access properly.
Let's say 2 processes need to modify the same text at the same time: synchronisation through the filesystem may lead to race conditions, whereas you will have no problem at all with everything in the database.
I think your benchmark results will depend on how you store the text data in your database.
If you store it as LOB then behind the scenes it is stored in an ordinary file.
With any kind of LOB you pay the Database lookup + File IO anyway.
VARCHAR is stored in the tablespace
Ordinary text data types (VARCHAR et al) are very limited in size in typical relational database systems. Something like 2000 or 4000 (Oracle), sometimes 8000 or even 65536 characters. Some databases support long text but these have serious drawbacks and are not recommended.
LOBs are references to file system objects
If your text is larger you have to use a LOB data type (e.g. CLOB in Oracle).
LOBs usually work like this:
The database stores only a reference to a file system object.
The file system object contains the data (e.g. the text data).
This is very similar to what your colleague proposes, except the DBMS does the heavy lifting of managing references and files.
The bottom line is:
If you can store your text in a VARCHAR then go for it.
If you can't you have two options: Use a LOB or store the data in a file referenced from the database. Both are technically similar and slower than using VARCHAR.
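The two access paths being compared can be timed side by side. This is a minimal sketch, assuming an in-memory SQLite database and a temporary file; the table, column and function names are made up, and the timings are not representative of a real server, since the OS file cache and the in-memory database both flatter the results.

```python
import os
import sqlite3
import tempfile
import timeit

workdir = tempfile.mkdtemp()
text = "lorem ipsum " * 200

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE asset (id INTEGER PRIMARY KEY, body TEXT, path TEXT)")

# Store the same text twice: in a column, and in a file referenced by path.
file_path = os.path.join(workdir, "asset_1.txt")
with open(file_path, "w", encoding="utf-8") as handle:
    handle.write(text)
with conn:
    conn.execute("INSERT INTO asset (id, body, path) VALUES (1, ?, ?)",
                 (text, file_path))


def read_from_column(asset_id):
    """Database lookup only: the text comes back with the row."""
    query = "SELECT body FROM asset WHERE id = ?"
    return conn.execute(query, (asset_id,)).fetchone()[0]


def read_from_file(asset_id):
    """Database lookup plus file I/O: fetch the path, then read the file."""
    query = "SELECT path FROM asset WHERE id = ?"
    path = conn.execute(query, (asset_id,)).fetchone()[0]
    with open(path, encoding="utf-8") as handle:
        return handle.read()


column_seconds = timeit.timeit(lambda: read_from_column(1), number=1000)
file_seconds = timeit.timeit(lambda: read_from_file(1), number=1000)
print(f"column: {column_seconds:.4f}s  file: {file_seconds:.4f}s")
```

Repeating the run with larger texts and a cold cache gives a better idea of where, if anywhere, the two approaches diverge.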
I did this before. It's a mess: you need to keep the filesystem and the database synchronized all the time, which makes the programming more complicated, as you would guess.
My advice is to go for either an all-filesystem solution or an all-database solution, depending on the data. Notably, if you require lots of searches or conditional data retrieval, go for the database, otherwise the fs.
Note that the database may not be optimized for storage of large binary files. Still, remember, if you use both, you're going to have to keep them synchronized, and that doesn't make for an elegant or enjoyable (to program) solution.
Good luck!
At least, if your problems come from the "performance side", you could use a "no SQL" storage solution like Redis (via Ohm, for example), or CouchDB...
