I have a few .dat and .idx files inside a .dbs folder, so presumably these were created using Informix SE. I need a way to view or extract info from these files.
I have Informix SE 7.25 installed in a Windows platform but I don't know how to read those files. Do I use dbexport.exe that comes with Informix SE or dbaccess.exe? I tried to have a play around with both but don't know how to use them properly.
Can anyone help me with this problem?
If the files are in a directory dbase.dbs, then you should be able to use:
dbexport dbase
to generate a directory dbase.exp containing a schema file dbase.sql and a series of .unl files — the unloaded data from each table in the database.
That's by far the easiest way to get the data.
The data format is Informix's UNLOAD format — usually pipe-delimited, backslash-escaped files with one logical line per record. There are tedious wrinkles in the data format, but most of them won't affect you.
It's relatively straight-forward (but by no means trivial) to convert the data into CSV format — and other formats would be possible to, with greater or lesser amounts of work.
Using DB-Access would be a labour of love — it would be a lot harder work. You'd probably generate a list of tables from the database (with DB-Access), then munge that output into a series of UNLOAD statements, which you'd then run into DB-Access again to get the actual data. It can certainly be done; you should aim to avoid having to do it that way.
If the database is incomplete — for example, the system catalog is missing — then all is not lost, but the work gets far harder as you have to read the raw .dat files and you have to deduce the schema for the tables. That is hard to do 100% reliably. If you have the schema, then I have tools to get the data out of the database — contact me (see my profile).
Related
Update
On a completely unrelated search, I have found this: Lightweight SQL database which doesn't require installation which lists a few possible options with about the same goals.
Original post
We have a desktop .Net/WPF app, with large (for a desktop app) amounts of data stored: it's has layouts, templates, products list and technical specs, and many more stuff.
Today it's stored in an Access DB, but we have hit the Access limitations pretty hard: it's not very fast, the DB weights 44Mb (which results in a large setup package), and more importantly, it's a pain to use with version control, because we can't merge the data from a branch to another. For instance, we might create a branch to add a few products, but then we have to add them manually in the trunk when we merge. We could use SQL scripts, but writing advanced SQL scripts for Access is a pain.
Basically, I want to replace the MS Access DB with another storage format, because Access is not well adapted.
I had tought of using JSON files that would be unzipped during or after install, but I'm a bit afraid of performance problems.
I'm also thinking of splitting the data into multiple files with multiple formats, depending on its usage, but using different formats might get complicated or annoying to develop.
Performance
Some parts of the DB are accessed pretty often and should be performance-optimized, whereas others are accessed maybe 1 or 2 times per work session, and using a poor-performance but high-compression format could be OK.
Size
We want the installer to be the smallest possible, so the library should be small, and the format should use small files. Using a library that adds 5 Mb to the installer file is out of the question.
Compatibility
The software must be able to run on .Net 4 (not 4.5), and it would be great if it ran on Windows XP (even though we're thinking more and more of just abandoning it going forward, it's still more than 7% of our market share).
Moreover, it should not need to install a server (like MS Access or SQLite) because it will be installed on end-user's computers, and we don't want to bloat them.
Versionning
It should be easy to version the data and the DB structure. The file should either be a text file (like JSON), or scripts should be easy to run in the continuous integration platform (like SQL server).
So, which technology would you use that answers all these contraints ?
Thanks !
As for your version control pains, it sounds from your description that if I were you, I'd keep the raw data in text files that are version-controlled, and only have the build process produce the database from them. This way, you should be able to use SQLite.
I would go for SQLite in your case, since the files are self-contained and easy to locate (hence easy to save on a version control system), installer is small, and performance is good.
http://www.sqlite.org/
In my application I want to use files for storing data. I don't want to use database or clear text file, the goal is to save double and integer values along with string just to identify the name of the record ; I simple need to save data on disk for generating reports. File can grow even to gigabyte. What format you suggest to use? Binary? If so what vcl component/library you know which is good to use? My goal is to create an application which creates and updates the files while another tool will "eat" those file
producing nice pdf reports for user on demand. What do you think? Any idea or suggestion?
Thanks in advance.
If you don't want to reinvent the wheel, you may find all needed Open Source tools for your task from our side:
Synopse Big Table to store huge amount of data - see in particular the TSynBigTableRecord class to store an unlimited number of records with fields, including indexes if needed - it will definitively be faster and use less disk size than any other regular SQL DB
Synopse SQLite3 Framework if you would rather use a standard SQLite engine for the storage - it comes with a full Client/Server ORM
Reporting from code, including pdf file generation
With full Source code, working from Delphi 6 up to XE.
I've just updated the documentation of the framework. More than 600 pages, with details of every class method, and new enhanced general introduction. See the SAD document.
Update: If you plan to use SQLite, you should first guess how the data will be stored, which indexes are to be created, and how a SQL query may speed up your requests. It's a bad idea to read all file content for every request: you should better structure your data so that a single SQL query would be able to return the expended results. Sometimes, using additional values (like temporary sums or means) to the data is a good idea. Also consider using the RTree virtual table of SQLite3, which is dedicated to speed up access to double min/max multi-dimensional data: it may speed up a lot your requests.
You don't want to use a full SQL database, and you think that a plain text file is too simple.
Points in between those include:
Something that isn't a full SQL database, but more of a key-value store, would technically not be a flat file, but it does provide a single "key+value" list, that is quickly searchable on a single primary key. Such as BSDDB. It has the letter D and B in the name. Does that make it a database, in your view? Because it's not a relational database, and doesn't do SQL. It's just a binary key-value (hashtable) blob storage mechanism, using a well-understood binary file format. Personally, I wouldn't start a new project and use anything in this category.
Recommended: Something that uses SQL but isn't as large as standalone SQL database servers. For example, you could use SQLite and a delphi wrapper. It is well tested, and used in lots of C/C++ and Delphi applications, and can be trusted more than anything you could roll yourself. It is a very light embedded database, and is trusted by many.
Roll your own ISAM, or VLIR, which will eventually morph over time into your own in-house DBMS. There are multiple files involved, and there are indexes, so you can look up data fast without loading everything into memory. Not recommended.
The most flat of flat binary fixed-record-length files. You mentioned originally in your question, power basic which has something called Random Access files, and then you deleted that from your question. Probably what you are looking for, especially for append-only write as the primary operation. Roll your own TurboPascal era "file of record". If you use the "FILE OF RECORD" type, you hit the 2gb limit, and there are problems with Unicode. So use TStream instead, like this. Binary file formats have a lot of strikes against them, especially since it is difficult to grow and expand your binary file format over time, without breaking your ability to read old files. This is a key reason why I would recommend you start out with what might at first seem like overkill (SQLite) instead of rolling your own binary solution.
(Update 2: After updating the question to mention PDFs and what sounds like a reporting-system requirement, I think you really should be using a real database but perhaps a small and easy to use one, like firebird, or interbase.)
I would suggest using TClientDataSet, and use it's SaveToFile() / SaveToStream() methods by the generating program, and LoadFromFile() / LoadFromStream() methods for the program that will "consume" the data. That way, you can still make indexed records without connecting to any external database, all while keeping the interchange data in a single file.
Define API to work with your flat file, so that the API can be implemented by a separate data layer in many ways.
Implement the API using standard embedded SQL database (ex SQLite or Firebird).
Only if there is something wrong with the standard solution think of your own.
I use KBMMemtable - see http://www.components4developers.com/ - fast, reliable, been around a long time - supports binary and CSV streaming in and out of files, as well indexing, filters, and lots of other goodies - TClientDataSet will not do well with large datasets.
We've got a rails app that processes large amounts of xml data imports. Right now we're storing these ~5MB xml docs in Postgres. This is not ideal given that we use each xml doc once or twice for parsing. We'd like to have an intelligent way of storing and archiving these docs, but not overly complicate the retrieval process for the sake of space. We've considered moving the docs to Mongo (which we're also using), but then aren't we just artificially boosting the memory requirements of our Mongo db servers?
What's the best way for us to deal with this?
I would just store a link to the file in the DB if you use it only for parsing once or twice and then load the file from the given link. Another aproach is to use a XML DB, e.g. eXist.
You could try eXist, an XML database. If you are just archiving them, though, why not just store them in a directory tree?
You may want to look into DB2's PureXML capabilities. To play with it, you can download the free DB2 Express-C version here. For the record, IBM is also the only database provider officially supporting their Ruby driver and Rails adapter, so you wouldn't be on your own.
What harm are they doing where they are? They will take up 'space' wherever you put them.
If are confident you will never need them again then there is a case for archival to less expensive storage (eg tape?) - otherwise whatever you do will 'overly complicate the retrieval process'
You could consider compressing them in-place if you are not already doing so
I'm looking to go through a bunch of text files in a bunch of folders. I'd like to go through each file line by line and do some basic statistics, like grabbing time stamp and count repeating values. Is there any tool or scripting solution that someone could recommend for doing this?
Another possibility is to have a script/tool that could just parse these files and add them to a database like sqlite or access, for easy filtering.
So far I tried using AIR, but it looks like there might be too much data for it to process, and it hangs, but that could be because of some inefficient filtering.
I have used QuickMacros for things like this. It can do just about anyting to a textfile (some illegal in 7 states) as well as connect to databases and perform sql tasks like create and modify tables etc.
I routinely used it to extract data, parse it, and then load it into another database. Especially useful with Scheduled Tasks.
Here's the website
I recommend Perl and CPAN
I'm having a HDF5 file with one-dimensional (N x 1) dataset of compound elements - actually it's a time series. The data is first collected offline into the HFD5 file, and then analyzed. During analysis most of the data turns out to be uninteresting, and only some parts of it are interesting. Since the datasets can be quite big, I would like to get rid of the uninteresting elements, while keeping the interesting ones. For instance, keep elements 0-100 and 200-300 and 350-400 of a 500-element dataset, dump the rest. But how?
Does anybody have experience on how accomplish this with HDF5? Apparently it could be done in several ways, at least:
(Obvious solution), create a new fresh file and write the necessary data there, element by element. Then delete the old file.
Or, into the old file, create a new fresh dataset, write the necessary data there, unlink the old dataset using H5Gunlink(), and get rid of the unclaimed free space by running the file through h5repack.
Or, move the interesting elements within the existing dataset towards the start (e.g. move elements 200-300 to positions 101-201 and elements 350-400 to positions 202-252). Then call H5Dset_extent() to reduce the size of the dataset. Then maybe run through h5repack to release the free space.
Since the files can be quite big even when the uninteresting elements have been removed, I'd rather not rewrite them (it would take a long time), but it seems to be required to actually release the free space. Any hints from HDF5 experts?
HDF5 (at least the version I am used to, 1.6.9) does not allow deletion. Actually, it does, but it does not free the used space, with the result that you still have a huge file. As you said, you can use h5repack, but it's a waste of time and resources.
Something that you can do is to have a lateral dataset containing a boolean value, telling you which values are "alive" and which ones have been removed. This does not make the file smaller, but at least it gives you a fast way to perform deletion.
An alternative is to define a slab on your array, copy the relevant data, then delete the old array, or always access the data through the slab, and then redefine it as you need (I've never done it, though, so I'm not sure if it's possible, but it should)
Finally, you can use the hdf5 mounting strategy to have your datasets in an "attached" hdf5 file you mount on your root hdf5. When you want to delete the stuff, copy the interesting data in another mounted file, unmount the old file and remove it, then remount the new file in the proper place. This solution can be messy (as you have multiple files around) but it allows you to free space and to operate only on subparts of your data tree, instead of using the repack.
Copying the data or using h5repack as you have described are the two usual ways of 'shrinking' the data in an HDF5 file, unfortunately.
The problem, as you may have guessed, is that an HDF5 file has a complicated internal structure (the file format is here, for anyone who is curious), so deleting and shrinking things just leaves holes in an identical-sized file. Recent versions of the HDF5 library can track the freed space and re-use it, but your use case doesn't seem to be able to take advantage of that.
As the other answer has mentioned, you might be able to use external links or the virtual dataset feature to construct HDF5 files that were more amenable to the sort of manipulation you would be doing, but I suspect that you'll still be copying a lot of data and this would definitely add additional complexity and file management overhead.
H5Gunlink() has been deprecated, by the way. H5Ldelete() is the preferred replacement.