MS's OneNote uses a data hierarchy that is essentially a simple tree, even though the info is displayed via a tabbed interface rather than a treeview. You begin with "notebooks," which can have "sections," which have "pages." I'm trying to model this. In my case, a page would be linked to the contents of a RichEdit.
My problem is not that I can't figure out a way to do this. My problem is that I am unsure which of several possibilities will ultimately be simplest. That's where I am hoping you will come in.
I could, for example, use a regular [MyBase] database. A Page dataset would have fields for its name and the RichEdit data. Pages would be nested inside Sections, and Sections would be nested inside Notebooks.
If I thought about it for a while, it seems like this could also be modeled with simple string lists, especially if each element in the list held comma-separated values for an ID and a position in the hierarchy, as well as the notebook/section/page name.
But then this also seems like something that might be well suited for XML ... if I were to learn more about XML :-)
What do you more experienced folks think?
Thanks, as always ...
Whether you go with a database or XML, try putting your data access routines in a datamodule. Let your GUI unit(s) make calls to public methods of the datamodule, and ensure that those calls do not depend on how your data are stored. That way, you can start with one approach, and switch to the other just by editing your datamodule.
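A minimal sketch of that idea follows; the unit, class, and method names are hypothetical (not from the question), and the usual designer-generated .dfm is assumed.

unit NotesData;

interface

uses
  Classes;

type
  // Storage-agnostic data module: the GUI calls only these public methods and
  // never knows whether pages live in a MyBase file, XML, or a SQL database.
  TdmNotes = class(TDataModule)
  public
    function LoadPageRTF(const Notebook, Section, Page: string): string;
    procedure SavePageRTF(const Notebook, Section, Page, RTF: string);
    procedure ListPages(const Notebook, Section: string; Pages: TStrings);
  end;

var
  dmNotes: TdmNotes;

implementation

{$R *.dfm}

function TdmNotes.LoadPageRTF(const Notebook, Section, Page: string): string;
begin
  // All storage-specific code lives here; switching from MyBase to XML later
  // means editing only this unit, not the forms that call it.
  Result := '';
end;

procedure TdmNotes.SavePageRTF(const Notebook, Section, Page, RTF: string);
begin
  // ...persist the RTF under Notebook/Section/Page...
end;

procedure TdmNotes.ListPages(const Notebook, Section: string; Pages: TStrings);
begin
  Pages.Clear;
  // ...fill Pages with the page names of the given section...
end;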
One thing you could try would be to use a structured storage system. The concept is simple: you work with a single data file much as you would a disk with folders and files; the difference is that the folders and files are all under your exclusive control. The only issue is that it doesn't exactly scale well to multiple users, but then neither does XML. (A good structured storage library is available on gabr's blog.)
For a "multi-user" system, your best option will be to implement using tables. You can probably get a good start at things using Microsoft Access and ADO, which migrates to MS Sql Server very easily. From your brief description, I would expect you to have three tables, NOTEBOOKS, SECTIONS, PAGES. The Pages would have a Foreign key relationship (detail/master) to Sections and sections would have a Foreign key relationship with Notebooks.
If you are storing it in a file system, why not use... folders? That's what OneNote does. A "notebook" and a "section group" are simply regular file system folders. The only other level is the section, which is the .one file. There is a very limited hierarchy in there (it's just pages, any of which can be marked as subpages, but these are not really linked to a parent in any way).
Inside the .one file you can use XML to represent your pages, though this is not what OneNote does. OneNote uses a binary file format in order to facilitate fast edits, object-level synchronization, multi-user access, and compact storage.
If you look around for information about "random access files" you can get an idea of how to do this. But try to avoid XML if you think the file can get large, because edits become cumbersome: you'd need to load the entire XML file, make the changes in memory, then write the whole thing back out again.
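By way of contrast, here is a minimal sketch of the random-access-file idea using TFileStream (Classes and SysUtils in the uses clause); the record layout is invented for illustration. Because every record has the same size, one page record can be rewritten in place without touching the rest of the file.

type
  TPageRec = packed record
    Name: array[0..63] of AnsiChar;   // fixed-size name field
    Flags: Integer;
  end;

procedure WritePageRec(const FileName: string; Index: Integer; const Rec: TPageRec);
var
  FS: TFileStream;
begin
  FS := TFileStream.Create(FileName, fmOpenReadWrite or fmShareDenyWrite);
  try
    // Record N starts at byte N * SizeOf(TPageRec); overwrite just that slot.
    FS.Position := Int64(Index) * SizeOf(TPageRec);
    FS.WriteBuffer(Rec, SizeOf(Rec));
  finally
    FS.Free;
  end;
end;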
I think it depends on how you want to save the data. If you plan to use a database for shared access, speed, and large amounts of data, just normalize the data and create the structure you proposed.
If you want the user to save data locally on a filesystem, I do think that XML will be a good solution because it allows you to store the data in a structured file.
So... how do you want the user to store and use the data?
Think of your data as a tree.
node {
  id
  parent_id
  content
  type
}
Nodes with parent_id = 0 are notebooks.
The type field is optional but could be useful.
A content field on pages and notebooks gives you a page/notebook description for free :D
I would use a simple table in SQLite or MyBase or whatever.
I think a little DB is better than XML, because XML forces you to load all the data into memory.
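A sketch of that single-table idea; the SQL and the helper below are only illustrative, and the Delphi part assumes Delphi 2009 or later (DB and Generics.Collections in the uses clause).

const
  // One table holds the whole tree; parent_id = 0 marks a notebook.
  CreateNodeTableSQL =
    'CREATE TABLE node (' +
    '  id        INTEGER PRIMARY KEY,' +
    '  parent_id INTEGER NOT NULL,' +
    '  type      INTEGER,' +          // e.g. 0 = notebook, 1 = section, 2 = page
    '  content   TEXT)';

// Group child ids by parent id so the tree can be walked without further queries.
function BuildChildMap(Nodes: TDataSet): TObjectDictionary<Integer, TList<Integer>>;
var
  Children: TList<Integer>;
  ParentId: Integer;
begin
  Result := TObjectDictionary<Integer, TList<Integer>>.Create([doOwnsValues]);
  Nodes.First;
  while not Nodes.Eof do
  begin
    ParentId := Nodes.FieldByName('parent_id').AsInteger;
    if not Result.TryGetValue(ParentId, Children) then
    begin
      Children := TList<Integer>.Create;
      Result.Add(ParentId, Children);
    end;
    Children.Add(Nodes.FieldByName('id').AsInteger);
    Nodes.Next;
  end;
end;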
What is the best way to load huge text file data in Delphi? Is there any component that can load a text file super fast?
Let's say I have a text file that contains a database, stored in fixed-length format.
It contains 150 fields, each at least 50 characters long.
1. I need to load it into memory
2. I need to parse it and probably store it in a memdataset for processing
My questions:
1. Is it enough to use the TStringList.LoadFromFile method?
2. Is there a better component for manipulating the text file?
3. Should I use low-level reading from the text file?
Thank you in advance.
TStringList is never the optimal way of working with lots of text, but it's the simplest. If you've got small files on your hands, you can use TStringList without issues. Even if you have large files (not huge files) you might implement a version of your algorithm using TStringList for testing purposes, because it's simple and easy to understand.
If your files are large, as they probably are since you call them "databases", you need to look into alternative technologies that will enable you to read only as much as you need from the database. Look into:
TFileStream
Memory mapped files.
Don't look at the old "file"-based APIs still available in Delphi; they're simply outdated.
I'm not going to go into details on how to access text using those methods because we've recently had two similar questions on SO:
How Can I Efficiently Read The First Few Lines of Many Files in Delphi
and
Fast Search to see if a String Exists in Large Files with Delphi
Since you are working with a fixed length, you can build an access class based on TList with a TWriter and TReader that takes your records into account. You'll have none of the overhead of a TStringList (not that it's a bad thing, but if you don't need it, why have it), and you can build your own record access into the class.
Ultimately it depends on what you are trying to accomplish with the data once you have it loaded into memory. While TStringList is easy to use, it isn't as efficient as "rolling your own".
However, efficiency in data manipulation may not be that much of an issue, as you are using text files to hold a database. If you just need to read in and make decisions based on data in the file, the more flexible TList may be overkill.
I recommend sticking with TStringList if you find it convenient for your problem. Optimization is another matter, and should be done later.
As for TStringList, the optimization is to declare a descendant class that overrides the TStrings.LoadFromStream method - you can make it practically as fast as possible by taking the structure of your files into account.
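A sketch of that idea for a file of fixed-length records with no line breaks; the class name and RecLen field are invented, and the single-parameter LoadFromStream signature shown is the pre-Unicode one (newer Delphi versions have encoding-aware overloads to account for).

type
  TFixedRecStringList = class(TStringList)
  public
    RecLen: Integer;  // length of one fixed-size record, in characters
    procedure LoadFromStream(Stream: TStream); override;
  end;

procedure TFixedRecStringList.LoadFromStream(Stream: TStream);
var
  Buf: AnsiString;
  P: Integer;
begin
  BeginUpdate;
  try
    Clear;
    // Read everything in one go, then slice it into RecLen-sized items;
    // no scanning for CR/LF as the default implementation does.
    SetLength(Buf, Stream.Size - Stream.Position);
    if Buf <> '' then
      Stream.ReadBuffer(Buf[1], Length(Buf));
    P := 1;
    while P + RecLen - 1 <= Length(Buf) do
    begin
      Add(string(Copy(Buf, P, RecLen)));
      Inc(P, RecLen);
    end;
  finally
    EndUpdate;
  end;
end;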
It is not entirely clear from your question why you need to load the entire file into memory before going on to create an in-memory data set. Are you conflating the two issues? That is, because you need to create an in-memory data set, do you think you first need to load the source data entirely into memory? Or is there some initial pre-processing of the source file that is only possible with the entire file loaded? (This is unlikely, and even if it is the case, it isn't necessary with a navigable stream object such as a TFileStream.)
But I think the answer you are looking for is right there in the question....
If you are loading this file in order to parse it and populate/initialise a further data structure (the data set) for further processing, then using an existing high level data structure is an unnecessary and potentially costly (in terms of time) step.
Use the lowest level means of access that provides the capabilities you need.
In this case a TFileStream will likely provide the best balance of performance and ease of use.
I haven't found an explicit "no" in the documentation/discussion but suspect it is not possible to generate CoreData objects programmatically, at runtime.
What I want to do is similar to executing DDL commands (e.g. "Create Table", "Drop Table", etc.) from inside running code, because I don't know until I ask the user how many columns his table needs, or what data types they need to be. He may need multiple tables, at that.
Does anyone know whether this is possible? Would appreciate a pointer to something to read.
(Would also appreciate learning the negative, so I can stop wondering.)
If not doable in CoreData, would this be a reason to switch to SQLite?
You can create the entire Core Data model at run time-- there's no requirement to use Xcode's data modeler at all, and there's API support for creating and configuring every detail of the model. But it's probably not as flexible as it sounds like you want it to be. Although you can create new entity descriptions or modify existing ones, you can only do so before loading a data store file. Once you're reading and writing data, you must consider the data model as being fixed. Changing it at that point will generate an exception.
It's not quite the same as typical SQLite usage. It's sort of as if the SQLite tables were defined in one file and the data were stored in another file - you can modify the tables on the fly, but only before loading the actual data. (I know that's not how SQLite really works, but that's basically the approach that Core Data enforces.)
If you expect to need to modify your model / schema as you describe, you'll probably be better off going with direct SQLite access. There are a couple of Objective-C SQLite wrappers that allow an ObjC-style approach while still supporting SQLite-style access:
PLDatabase
FMDB
I'm trying to figure out the cut-off with respect to when a "text entry" should be stored in the database vs. as a static file. Are there any rules of thumb here? The text entries will be at the most several paragraphs and have links to images and tables (and hyperlinks to other text entries). Some criteria for the text entry:
I'm thinking of using DITA as the content format
The text should be searchable
If the text is revised, a new version will be created
thanks in advance, Chuck
The "rails way" would be using a database.
The solution will be more scalable, therefore faster and probably easier to develop with (using migration and so on). Using the file system, you will have to build lots of functions on your own, that are already implemented for database usage.
You could create a Model (e.g.) Document and easily use existing versioning systems, like paper_trail. When using an indexed search, you can just have an has_many relation enabling you to realise the depencies between the models (destroy a model means to destroy the search index).
Rather than a cut-off, you could look at what databases provide and ask yourself if those features would be useful. Take Isolation (the I in ACID): if you have any worries that multiple people could be trying to edit an entry at the same time, a database would handle that well while you'd have to handle the locks yourself working with files. Or Atomicity: you might want to update two things at once (e.g. an index page and an entry page) and know they will either both succeed or both fail.
Databases do a number of things beyond ACID, such as taking advantage of multiple datatypes, making querying easier, and allowing for scaling. It's a question worth asking since most databases end up having data stored in a bunch of files on disk. Would you end up writing a mini-database if you used files yourself?
Besides, if you're using Rails you might as well take advantage of its ActiveRecord functionality, and make it possible to use the many plugins that expect a database.
I'd use a database for even a small, single user rails app.
In my application I want to use files for storing data. I don't want to use a database or a plain text file; the goal is to save double and integer values, along with a string just to identify the name of each record. I simply need to save data on disk for generating reports. A file can grow to a gigabyte or more. What format do you suggest? Binary? If so, which VCL component or library do you know that would be good to use? My goal is to create an application that creates and updates the files, while another tool will "eat" those files,
producing nice PDF reports for the user on demand. What do you think? Any ideas or suggestions?
Thanks in advance.
If you don't want to reinvent the wheel, you may find all the open source tools you need for this task on our side:
Synopse Big Table to store huge amounts of data - see in particular the TSynBigTableRecord class to store an unlimited number of records with fields, including indexes if needed - it will definitely be faster and use less disk space than any regular SQL DB
Synopse SQLite3 Framework if you would rather use a standard SQLite engine for the storage - it comes with a full client/server ORM
Reporting from code, including PDF file generation
With full source code, working from Delphi 6 up to XE.
I've just updated the documentation of the framework. More than 600 pages, with details of every class method, and new enhanced general introduction. See the SAD document.
Update: If you plan to use SQLite, you should first work out how the data will be stored, which indexes are to be created, and how a SQL query can speed up your requests. It's a bad idea to read the whole file content for every request: you should structure your data so that a single SQL query is able to return the expected results. Sometimes, adding precomputed values (like sums or means) to the data is a good idea. Also consider using the R-Tree virtual table of SQLite3, which is dedicated to speeding up access to min/max multi-dimensional double data: it can speed up your requests a lot.
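For instance, an R-Tree index over a pair of double dimensions looks like this in SQLite SQL (the table and column names are only examples); you would execute these statements from whichever Delphi SQLite layer you choose.

const
  // Virtual R-Tree table: an integer id plus min/max bounds per dimension.
  CreateRTreeSQL =
    'CREATE VIRTUAL TABLE sample_rtree USING rtree(id, min_x, max_x, min_y, max_y)';

  // Range query: the R-Tree narrows the candidate rows without a full table scan.
  QueryRTreeSQL =
    'SELECT id FROM sample_rtree ' +
    'WHERE min_x >= 10.0 AND max_x <= 20.0 AND min_y >= 0.5 AND max_y <= 1.5';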
You don't want to use a full SQL database, and you think that a plain text file is too simple.
Points in between those include:
Something that isn't a full SQL database, but more of a key-value store, would technically not be a flat file, but it does provide a single key+value list that is quickly searchable on a single primary key. Such as BSDDB (Berkeley DB). It has the letters D and B in the name; does that make it a database, in your view? Because it's not a relational database, and it doesn't do SQL. It's just a binary key-value (hashtable) blob storage mechanism, using a well-understood binary file format. Personally, I wouldn't start a new project using anything in this category.
Recommended: something that uses SQL but isn't as large as a standalone SQL database server. For example, you could use SQLite and a Delphi wrapper. It is well tested, used in lots of C/C++ and Delphi applications, and can be trusted more than anything you could roll yourself. It is a very light embedded database, and is trusted by many.
Roll your own ISAM, or VLIR, which will eventually morph over time into your own in-house DBMS. There are multiple files involved, and there are indexes, so you can look up data fast without loading everything into memory. Not recommended.
The most flat of flat binary fixed-record-length files. You originally mentioned PowerBASIC in your question, which has something called random access files, and then deleted that from your question. This is probably what you are looking for, especially if append-only writes are the primary operation. Roll your own Turbo Pascal-era "file of record". If you use the "file of record" type, you hit the 2 GB limit, and there are problems with Unicode, so use TStream instead (for example, as sketched below). Binary file formats have a lot of strikes against them, mainly because it is difficult to grow and expand a binary file format over time without breaking your ability to read old files. This is a key reason why I would recommend you start out with what might at first seem like overkill (SQLite) instead of rolling your own binary solution.
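A small sketch of that "file of record via TStream" idea, with a tiny header so the format has room to grow; the record layout and names are invented for the example, and Classes/SysUtils go in the uses clause.

type
  TDataHeader = packed record
    Version: Integer;     // bump this when the record layout changes
    RecordSize: Integer;  // lets older readers at least skip unknown records
  end;

  TSampleRec = packed record
    Name: array[0..31] of AnsiChar;  // fixed-size name instead of a string
    IntValue: Integer;
    DblValue: Double;
  end;

procedure AppendRec(const FileName: string; const Rec: TSampleRec);
var
  FS: TFileStream;
  Hdr: TDataHeader;
begin
  if FileExists(FileName) then
    FS := TFileStream.Create(FileName, fmOpenReadWrite or fmShareDenyWrite)
  else
    FS := TFileStream.Create(FileName, fmCreate);
  try
    if FS.Size = 0 then
    begin
      // New file: write the header once, before any records.
      Hdr.Version := 1;
      Hdr.RecordSize := SizeOf(TSampleRec);
      FS.WriteBuffer(Hdr, SizeOf(Hdr));
    end;
    FS.Position := FS.Size;              // append-only write
    FS.WriteBuffer(Rec, SizeOf(Rec));
  finally
    FS.Free;
  end;
end;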
(Update 2: after the question was updated to mention PDFs and what sounds like a reporting-system requirement, I think you really should be using a real database, but perhaps a small and easy-to-use one like Firebird or InterBase.)
I would suggest using TClientDataSet, and using its SaveToFile() / SaveToStream() methods in the generating program and its LoadFromFile() / LoadFromStream() methods in the program that will "consume" the data. That way, you can still have indexed records without connecting to any external database, while keeping the interchange data in a single file.
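A minimal sketch of the producing side (field names and the packet format are only examples; DB and DBClient go in the uses clause, and the consumer simply calls LoadFromFile on its own TClientDataSet).

procedure WriteReportData(const FileName: string);
var
  CDS: TClientDataSet;
begin
  CDS := TClientDataSet.Create(nil);
  try
    // Define the structure once, in code, with no external database involved.
    CDS.FieldDefs.Add('Name', ftString, 40);
    CDS.FieldDefs.Add('IntValue', ftInteger);
    CDS.FieldDefs.Add('DblValue', ftFloat);
    CDS.CreateDataSet;
    CDS.AppendRecord(['Sample record', 42, 3.14]);
    // dfBinary keeps the file compact; the reporting tool reads it back
    // with LoadFromFile and sees a fully indexable dataset.
    CDS.SaveToFile(FileName, dfBinary);
  finally
    CDS.Free;
  end;
end;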
Define an API for working with your flat file, so that the API can be implemented by a separate data layer in many ways.
Implement the API using a standard embedded SQL database (e.g. SQLite or Firebird).
Only if there is something wrong with the standard solution should you think about rolling your own.
I use kbmMemTable - see http://www.components4developers.com/ - fast, reliable, and it has been around a long time. It supports binary and CSV streaming in and out of files, as well as indexing, filters, and lots of other goodies - TClientDataSet will not do well with large datasets.