Some questions about Xodus

Yo! I have some questions about Xodus.
If I get something from an entity using getProperty, is it taken from memory or read from the database?
If it's read from the database, is there a built-in way to turn on some caching?
Can I work with entities directly, as with normal data classes, or should I wrap them in some extra intermediate layer?
Why does YouTrack need a schema-less database?
Why does Xodus support only ChaCha20 and Salsa20 out of the box, and not, for example, AES, which is used practically everywhere and recognized as a standard?
Is it a good idea to use Xodus in a mail server?

There are several caches that Xodus can use while getting a property value, so a getProperty call does not necessarily read from disk.
Entity methods require an implicit transaction in the current thread.
A schema-less database is much more suitable for implementing YouTrack's data model, e.g. custom fields.
AES is a block cipher, whereas Xodus uses stream ciphers which tend to be faster. There are no indications that ChaCha20 is less secure than AES, and ChaCha20 is always faster than AES on systems where the CPU does not feature AES acceleration.
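For illustration, here is a minimal sketch of transactional entity access with Xodus' EntityStore API (the store path and the "Issue" entity type are made up for this example):

    import jetbrains.exodus.entitystore.Entity;
    import jetbrains.exodus.entitystore.PersistentEntityStore;
    import jetbrains.exodus.entitystore.PersistentEntityStores;

    public class XodusDemo {
        public static void main(String[] args) {
            // Hypothetical store location
            PersistentEntityStore store = PersistentEntityStores.newInstance("/tmp/issues");
            // All entity reads and writes happen inside a transaction bound to the current thread
            store.executeInTransaction(txn -> {
                Entity issue = txn.newEntity("Issue");
                issue.setProperty("summary", "login fails");
            });
            store.executeInReadonlyTransaction(txn -> {
                for (Entity issue : txn.getAll("Issue")) {
                    // getProperty goes through Xodus' internal caches where possible
                    System.out.println(issue.getProperty("summary"));
                }
            });
            store.close();
        }
    }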

Related

iOS optimal performance persistent data solution?

I am building a HealthKit-based app and am wondering how best to save HealthKit data persistently.
My current approach is to get the data, save it as attributes of a custom class object, and then store it in Core Data as NSData.
In terms of performance, is Realm faster than Core Data?
According to http://qiita.com/moriyaman/items/1a2916f4c2b79e934370, Core Data is apparently slower than FMDB, which is slower than Realm. Can someone confirm whether this is true even after taking into account faults and indexes?
Disclaimer: I work at Realm
The question of which persistence product will perform best among the solutions you mentioned is highly dependent on the type/amount/frequency of your data. Since Core Data and FMDB are layers over SQLite, they can't be faster than SQLite by design, but they provide enough convenience to be worthwhile to many users. On the other hand, Realm isn't based on SQLite, but rather its own custom database engine that was designed specifically for modern smartphones. It was designed to strike a better balance between powerful features and a simple API, without adding a large performance hit.
You can see public benchmarks comparing Realm/SQLite/Core Data/FMDB in Realm's launch blog post here: http://realm.io/news/introducing-realm#fast
Finally, your approach of serializing HealthKit information into NSData using something like NSCoding is going to be terribly inefficient. No matter the persistence solution you choose, you'll be better served by using the serialization built into those products rather than storing an already serialized data blob.
As I commented to @jpsim, it is difficult to simply compare the performance of Core Data to lower-level frameworks like FMDB, or differently-abstracted frameworks like Realm. Which approach you choose will dramatically impact how you build your program, which will tend to shuffle the performance problems around to different places.
Core Data and SQLite solve very different problems. SQLite is a relational database. Core Data is an object persistence engine. I'm not an expert on Realm, but it seems to be trying to strike a balance between those two approaches with more low-level control than Core Data affords, but a closer tie-in to the object model than SQLite. The fact that Realm (at least in my impressions of it) gives you more low-level control opens up opportunities for you to optimize things, or to mess things up IMO. That's neither good nor bad, it just makes it hard to apples-to-apples compare them, and particularly makes generic "performance benchmarks" problematic. The question isn't whether it's possible for someone to write faster code using engine A vs. engine B. The question is whether you will likely write acceptably performant code in each engine, while avoiding bugs, and minimizing development time.
In general, I believe HealthKit data is supposed to be stored in HealthKit in order to protect privacy. You should be careful about storing this data in your own storage anyway. Be particularly aware of the guidelines about iCloud:
Apps using the HealthKit framework that store users’ health information in iCloud will be rejected
I don't know how this will impact documents that you store and are then backed-up to iCloud. Just leaving the data in HealthKit is the best way to not have to worry about such problems.
In any case, though, performance is just one axis to consider. You didn't indicate anything to suggest that you have very special performance problems (for example, that you're handling tens of thousands of records, or real-time data, or something like that). So I would focus first on what tool meets your general needs best, then do some basic experiments to make sure the performance is reasonable, and then optimize as you find issues.

How to store RDF graphs within a data storage?

I want to write a web app with Rails that uses RDF to represent linked data, but I really don't know the best approach to storing RDF graphs in a database for persistent storage. I also want to use something like paper_trail to provide versioning of database objects.
I read about RDF.rb and activeRDF. But RDF.rb does not include a layer to store data in a database. What about activeRDF?
I'm new to RDF. What is the best approach to handle large RDF graphs with rails?
Edit:
I found 4Store and AllegroGraph, which fit Ruby on Rails. I read that 4Store is entirely free, while AllegroGraph is limited to 50 million triples in the free version. What are the advantages of each of them?
Thanks.
Your database survey is quite incomplete. There are also BigData, OWLIM, Stardog, Virtuoso, Sesame, Mulgara, and TDB or SDB, which are provided by Jena.
To clarify, Fuseki is just a server component for a backend that supports the Jena API, providing support for the SPARQL protocol. Generally, since you're using Ruby, this is how you will interact with a database -- via HTTP using the SPARQL protocol. Probably every single database supports the SPARQL HTTP protocol for querying, and many will support something in the ballpark of the SPARQL Update protocol, the Graph Store protocol, or a similar custom HTTP protocol for handling updates.
So if you're set on using Rails, then your best bet is to pick a database, work out a simple wrapper for the HTTP protocol, perhaps forking support in an existing Ruby library if it exists, and building your application based on that support.
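For what it's worth, here is a minimal sketch of such a wrapper, shown in Java purely for illustration (the endpoint URL is hypothetical; the same GET request with a URL-encoded query parameter is easy to reproduce from Ruby with Net::HTTP):

    import java.net.URI;
    import java.net.URLEncoder;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.nio.charset.StandardCharsets;

    public class SparqlQuery {
        public static void main(String[] args) throws Exception {
            // Hypothetical endpoint; any SPARQL-protocol server (4store, Fuseki, ...) is queried the same way
            String endpoint = "http://localhost:3030/ds/sparql";
            String query = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10";
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create(endpoint + "?query=" +
                            URLEncoder.encode(query, StandardCharsets.UTF_8)))
                    .header("Accept", "application/sparql-results+json")
                    .GET()
                    .build();
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.body()); // JSON result bindings
        }
    }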
Versioning is something that's not readily supported in a lot of systems. I think there is still a lot of thought going into how to do it properly in an RDF database. So likely, if you want versioning in your application, you're going to have to do something custom.

Flat file in Delphi

In my application I want to use files for storing data. I don't want to use a database or a plain text file; the goal is to save double and integer values, along with a string to identify the name of the record. I simply need to save data on disk for generating reports. A file can grow even to a gigabyte. What format would you suggest? Binary? If so, which VCL component/library do you know that is good to use? My goal is to create an application which creates and updates the files, while another tool will "eat" those files, producing nice PDF reports for the user on demand. What do you think? Any ideas or suggestions?
Thanks in advance.
If you don't want to reinvent the wheel, you may find all the Open Source tools needed for your task on our side:
Synopse Big Table to store huge amounts of data - see in particular the TSynBigTableRecord class to store an unlimited number of records with fields, including indexes if needed - it will definitely be faster and use less disk space than any regular SQL DB
Synopse SQLite3 Framework if you would rather use a standard SQLite engine for the storage - it comes with a full Client/Server ORM
Reporting from code, including pdf file generation
With full source code, working from Delphi 6 up to XE.
I've just updated the documentation of the framework: more than 600 pages, with details of every class method and a new enhanced general introduction. See the SAD document.
Update: If you plan to use SQLite, you should first think about how the data will be stored, which indexes need to be created, and how a SQL query can speed up your requests. It's a bad idea to read the whole file content for every request: you should structure your data so that a single SQL query can return the expected results. Sometimes adding extra values (like precomputed sums or means) to the data is a good idea. Also consider using the R*Tree virtual table of SQLite3, which is dedicated to speeding up access to double min/max multi-dimensional data: it may speed up your requests a lot.
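To make the R*Tree idea concrete, here is a rough sketch (in Java with the sqlite-jdbc driver, purely for illustration; the table and column names are invented, and it assumes the driver's SQLite build enables the rtree module):

    import java.sql.*;

    public class RTreeDemo {
        public static void main(String[] args) throws SQLException {
            try (Connection c = DriverManager.getConnection("jdbc:sqlite:readings.db");
                 Statement st = c.createStatement()) {
                // One-dimensional R*Tree index over min/max value ranges
                st.execute("CREATE VIRTUAL TABLE IF NOT EXISTS reading_idx " +
                           "USING rtree(id, minVal, maxVal)");
                st.execute("INSERT INTO reading_idx VALUES (1, 10.5, 10.5)");
                // The range query is resolved through the R*Tree instead of a full scan
                try (ResultSet rs = st.executeQuery(
                        "SELECT id FROM reading_idx WHERE minVal >= 10 AND maxVal <= 11")) {
                    while (rs.next()) System.out.println(rs.getLong("id"));
                }
            }
        }
    }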
You don't want to use a full SQL database, and you think that a plain text file is too simple.
Points in between those include:
Something that isn't a full SQL database, but more of a key-value store, would technically not be a flat file, but it does provide a single "key+value" list, that is quickly searchable on a single primary key. Such as BSDDB. It has the letter D and B in the name. Does that make it a database, in your view? Because it's not a relational database, and doesn't do SQL. It's just a binary key-value (hashtable) blob storage mechanism, using a well-understood binary file format. Personally, I wouldn't start a new project and use anything in this category.
Recommended: Something that uses SQL but isn't as large as standalone SQL database servers. For example, you could use SQLite and a delphi wrapper. It is well tested, and used in lots of C/C++ and Delphi applications, and can be trusted more than anything you could roll yourself. It is a very light embedded database, and is trusted by many.
Roll your own ISAM, or VLIR, which will eventually morph over time into your own in-house DBMS. There are multiple files involved, and there are indexes, so you can look up data fast without loading everything into memory. Not recommended.
The most flat of flat binary fixed-record-length files. You originally mentioned PowerBASIC in your question, which has something called Random Access files, and then you deleted that from your question. This is probably what you are looking for, especially with append-only writes as the primary operation. You could roll your own Turbo Pascal era "file of record", but if you use the FILE OF RECORD type you hit the 2 GB limit, and there are problems with Unicode, so use a TStream instead. Binary file formats have a lot of strikes against them, especially since it is difficult to grow and expand a binary file format over time without breaking your ability to read old files. This is a key reason why I would recommend you start out with what might at first seem like overkill (SQLite) instead of rolling your own binary solution.
(Update 2: After the question was updated to mention PDFs and what sounds like a reporting-system requirement, I think you really should be using a real database, but perhaps a small and easy-to-use one like Firebird or InterBase.)
I would suggest using TClientDataSet, with its SaveToFile() / SaveToStream() methods in the generating program and LoadFromFile() / LoadFromStream() methods in the program that will "consume" the data. That way, you can still have indexed records without connecting to any external database, while keeping the interchange data in a single file.
Define an API for working with your flat file, so that the API can be implemented by a separate data layer in many ways.
Implement the API using a standard embedded SQL database (e.g. SQLite or Firebird).
Only if there is something wrong with the standard solution should you think of rolling your own.
I use kbmMemTable - see http://www.components4developers.com/ - fast, reliable, been around a long time - supports binary and CSV streaming in and out of files, as well as indexing, filters, and lots of other goodies - TClientDataSet will not do well with large datasets.

Would I use an ORM if I am using Stored Procedures?

If I use stored procedures, can I use an ORM?
EDIT:
If I can use an ORM, doesn't that defeat part of the database agnosticity reason for using an ORM? In other words, why else would I want to use an ORM if I am binding myself to a particular database with stored procedures (or is that assumption wrong)?
Using ORM to access stored procedures is one of the best uses of ORM. It'll give you strongly typed objects, while you still have full control over the SQL.
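For example, with JPA 2.1 a stored procedure can be mapped straight onto an entity. A minimal sketch (the procedure, parameter, and entity names are invented):

    import javax.persistence.*;
    import java.util.List;

    @Entity
    class Customer {
        @Id Long id;
        String name;
    }

    class CustomerDao {
        private final EntityManager em;

        CustomerDao(EntityManager em) { this.em = em; }

        // Each row returned by the stored procedure is mapped onto the Customer
        // entity, so callers get typed objects instead of raw result sets
        @SuppressWarnings("unchecked")
        List<Customer> findByRegion(int regionId) {
            StoredProcedureQuery query =
                    em.createStoredProcedureQuery("find_customers_by_region", Customer.class);
            query.registerStoredProcedureParameter("region_id", Integer.class, ParameterMode.IN);
            query.setParameter("region_id", regionId);
            return query.getResultList();
        }
    }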
In my experience I would let the ORM handle the CRUD operations and leave the specialty work to the stored procedures. Generally, using a stored procedure for CRUD operations is overkill, and letting the ORM handle them can drastically improve your productivity.
Yes, you can; all the main ORMs support stored procedures.
As for your assumption, you are partially right: when you use stored procedures with an ORM you are coupling your project to a particular database. But in practice there is a 99% chance that you will never need to change your database provider, so in that case you use the ORM not to abstract away the concrete DB provider, but to help yourself with the object-relational mapping task - which is an ORM's main task and what ORMs were originally made for.
It raises an interesting point.
Once you have ORM, and relatively simple queries, why do you need stored procedures? SP's are intimately bound to the database. ORM frees you from having to maintain a lot of DB-specific code. What is DB-specific can be isolated and managed.
I suggest that an ORM is a golden chance to cut the complexity and put all the processing in the code where it belongs.
Use the database for what it does best -- store data.
Use your application for what it does best -- process data.
You can use both ORM features and stored procedure functionality at once. Use the ORM as long as it fits you, but if you have some trouble with performance or need some low-level tuning, bring stored procedures into your business logic.
Yes, you can, but you will want to spend some time investigating what capabilities the ORM provides around stored procedures.
Most will allow you to run a stored procedure that returns a strongly typed object / entity. More advanced ORMs will let you plug stored procedures in for performing CRUD actions as well (so your generic querying, deleting, etc. goes via a stored procedure rather than a dynamic query).
Generally, ORMs are great for generating ad-hoc queries and getting strongly typed entities, but strong stored procedure support has the benefit of letting you (sometimes) more easily access native capabilities of your RDBMS that may not be exposed as first-class citizens in the ORM - especially if the ORM supports many database engines.
Following up from your edit:
Often you will want to use the ad-hoc querying engine provided by the ORM; however, as I alluded to earlier, sometimes you want to query using a capability not exposed by the ORM.
The benefits of strongly typed entities are invaluable, as it means you usually have domain objects rather than data readers, data tables, etc. You can cleanly encapsulate behaviors and logic within those entities that you have retrieved.
The list of additional benefits is very long indeed - for example, with the LightSpeed ORM (and most others) your entities will support standard binding interfaces, error reporting interfaces, validation etc. On the querying side you will lose out on lazy loading etc unless you write it yourself.
Database "agnosticity" (?) is not the only reason to use an ORM. However, you could take advantage of being DB independent on 99% of your interactions with the DB and in 1% (or 2% or 10% or whatever) you might need stored procedures for speed/clarity/complexity. If you changed DBs, you would need to rewrite those.
I use netTiers a lot at work and we let it generate our stored procedures for us. These only handle the basic CRUD operations, but they are very fast and save me a TON of time. netTiers will also let us create custom stored procedures and generate our data access code with these procedures.
You can, but many of the more advanced ORM features tend to become more cumbersome to use. Something like iBatis is very easy to integrate with stored procedures, while the more sophisticated features of more complex engines like (N?)Hibernate like generation of dynamic SQL and lazy loading of large fields can become more of a hassle than they're worth.
I believe that any tool that frees you from redoing work and lets you concentrate on solving the problems is valid. ORMs appear to be that tool when it comes to basic CRUD operations - even while using SPs to better implement a particular requirement (like using a hammer on a nail, it's just the right tool for the task).
The point is: there's no black or white, just a scale of gray. Very inefficient and badly coded applications use the excuse of being 'DB agnostic' to explain exaggerated use of DB resources. In many cases, being tied very closely to a database is not good either. The objective is: getting maximum 'DB agnosticism' while not needlessly wasting the customer's IT resources.
There's no 'old vs new', just people claiming that extreme 'pure' approaches are better. I don't really believe so. I believe that, as with any tool, the 'best' (notice the quotes) approach is using the ORM as long as it is still the right tool for your data access, and using SPs inside your ORM when you reach a point where you're wasting resources and reducing the scalability and useful life (the English equivalent of the Portuguese 'vida útil') of IT resources. Or, in other words, use SPs when they are to the processing at hand what a hammer is to the nail.

Lightweight Store Mechanisms

I'm about to write a small utility to organize and tag my mp3s.
What is the best way to store small amounts of data? More importantly, are there databases which don't require installing a client/server environment - where I can just include the library and I'm good?
I could use XML, but I'm afraid the file size would become large and hard to handle, not to mention keeping the memory footprint small.
Thanks
EDIT: I haven't decided on the language; I wanted to make my decision independent of platform. If I had to choose, most likely .NET, second Java, third C++.
My apologies, this is for a Windows App.
On Windows you can use the built-in ESENT database engine. There is an API you can use from C++:
http://blogs.msdn.com/windowssdk/archive/2008/10/23/esent-extensible-storage-engine-api-in-the-windows-sdk.aspx
There is also a managed interop layer that you can use from C# code:
http://www.codeplex.com/ManagedEsent
Which language/platform are you talking about?
In the Java world I prefer using embedded databases such as HSQLDB, H2 or JavaDB (f.k.a. Derby).
They don't need installing and still provide the simple access you're used to from a "real" DBMS.
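For instance, a minimal sketch with H2 (the file name and schema are invented for the mp3 use case) is just JDBC against a local file, with the H2 jar on the classpath and no server process:

    import java.sql.*;

    public class Mp3Catalog {
        public static void main(String[] args) throws SQLException {
            // "./mp3catalog" is a file-backed database created on first use
            try (Connection c = DriverManager.getConnection("jdbc:h2:./mp3catalog");
                 Statement st = c.createStatement()) {
                st.execute("CREATE TABLE IF NOT EXISTS track(" +
                           "id BIGINT PRIMARY KEY, title VARCHAR(255), tag VARCHAR(64))");
                st.execute("INSERT INTO track VALUES (1, 'Intro', 'ambient')");
                try (ResultSet rs = st.executeQuery(
                        "SELECT title FROM track WHERE tag = 'ambient'")) {
                    while (rs.next()) System.out.println(rs.getString("title"));
                }
            }
        }
    }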
In the C/Python/Unixy world SQLite is a hot contender in that area.
Another option is the various forms of the Berkeley database (e.g., db3, db4, SleepyCat).
SQLite, if you want the pain of a relational DB without a server install or hassle.
I would use one of the many text-serialization formats. I personally think YAML 1.1 is the most powerful (built-in support for referential object graphs) and the easiest for a human to read/modify (parsing is a bear, though, so use a library such as PyYAML or JYaml or some .NET library).
Otherwise XML or JSON are adequate file formats.
Whichever format you use, just compress the file if you're concerned about disk usage. If you're worried about in-memory usage, then I don't see how your serialization format matters...
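For instance, a JSON round-trip with a library like Jackson is only a few lines (the Track type is a made-up example; record support needs a reasonably recent jackson-databind, 2.12 or later):

    import com.fasterxml.jackson.core.type.TypeReference;
    import com.fasterxml.jackson.databind.ObjectMapper;
    import java.io.File;
    import java.util.List;

    public class JsonStore {
        // Hypothetical shape for one tagged mp3
        public record Track(String title, List<String> tags) {}

        public static void main(String[] args) throws Exception {
            ObjectMapper mapper = new ObjectMapper();
            File file = new File("tracks.json");
            // Serialize the whole collection to a human-readable file
            mapper.writeValue(file, List.of(new Track("Intro", List.of("ambient"))));
            // ...and read it back into typed objects
            List<Track> tracks = mapper.readValue(file, new TypeReference<List<Track>>() {});
            System.out.println(tracks);
        }
    }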
Have a look at Prevayler - it's a serialization persistence framework (use XStream etc. if you want your data to be human-readable), which is really fast, does not require annotations, and "just works". Some basic info:
It does impose a more rigorous transaction pattern, as it does not give you automatic rollback:
Ensure the transaction will succeed given the current state of the system - e.g. does it make sense now?
The transaction is added to a queue and stored in the journal (so it survives a power reset, etc.).
The transaction is executed and applied to the object structure.
Writes: thousands of transactions/sec.
Reads: hundreds of thousands of transactions/sec.
I haven't used it much, but it's sooo much nicer to use for small projects (persisting any serializable object is so nice)
Oh - as for everyone asking "what platform are you running on?": Prevayler (Java) has/had ports to quite a few platforms, but I can't find a decent list :(. I remember there were around 5-7, but I can only remember the .NET one.
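To show the pattern, here is a minimal sketch against the Prevayler 2.x API (the type and directory names are invented):

    import org.prevayler.Prevayler;
    import org.prevayler.PrevaylerFactory;
    import org.prevayler.Transaction;
    import java.io.Serializable;
    import java.util.ArrayList;
    import java.util.Date;
    import java.util.List;

    // The prevalent system: an ordinary serializable object graph kept in memory
    class TrackList implements Serializable {
        final List<String> titles = new ArrayList<>();
    }

    // Every change is a serializable transaction, journaled before it is applied
    class AddTrack implements Transaction<TrackList> {
        private final String title;
        AddTrack(String title) { this.title = title; }
        public void executeOn(TrackList system, Date executionTime) {
            system.titles.add(title);
        }
    }

    public class PrevaylerDemo {
        public static void main(String[] args) throws Exception {
            Prevayler<TrackList> prevayler =
                    PrevaylerFactory.createPrevayler(new TrackList(), "journal-dir");
            prevayler.execute(new AddTrack("Intro")); // journaled, then applied
            System.out.println(prevayler.prevalentSystem().titles);
            prevayler.close();
        }
    }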
If you're planning on storing everything in memory while your program does work on it, then serializing to a file using a basic load() and save() function that you write would be fine, and less pain than a full on DB.
In Java that can be done using standard Serialization (or can serialize to and from XML to make it somewhat human readable and editable).
It shouldn't affect your memory footprint at all as it is merely saving and restoring your objects. You just won't get transactions and random access and queries and all that good stuff.
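A bare-bones sketch of that load()/save() approach with standard Java serialization (the Library type is just an example):

    import java.io.*;
    import java.util.ArrayList;
    import java.util.List;

    public class Library implements Serializable {
        private static final long serialVersionUID = 1L;
        private final List<String> tracks = new ArrayList<>();

        public void add(String title) { tracks.add(title); }

        // Write the whole object graph to disk in one call
        public void save(File file) throws IOException {
            try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
                out.writeObject(this);
            }
        }

        // Restore the object graph exactly as it was saved
        public static Library load(File file) throws IOException, ClassNotFoundException {
            try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
                return (Library) in.readObject();
            }
        }
    }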
You could even use XML, JSON, an .ini file... even a plain text file.
I would advise a SQL-like database (such as SQLite). Today your requirements might make a full SQL database seem silly, but you never know how much this "little project" will grow over the years. When it does grow to the point where you have to have a SQL engine, you will be glad you didn't just serialize some Java objects or store stuff in JSON format.
