I want to write a web app with Rails that uses RDF to represent linked data, but I really don't know what might be the best approach to storing RDF graphs in a database for persistence. I also want to use something like paper_trail to provide versioning of database objects.
I have read about RDF.rb and ActiveRDF. RDF.rb does not include a layer for storing data in a database; what about ActiveRDF?
I'm new to RDF. What is the best approach to handling large RDF graphs with Rails?
Edit:
I found 4store and AllegroGraph, which fit Ruby on Rails. I read that 4store is entirely free, while AllegroGraph is limited to 50 million triples in its free version. What are the advantages of each?
Thanks.
Your database survey is quite incomplete. There are also Bigdata, OWLIM, Stardog, Virtuoso, Sesame, Mulgara, and Jena's TDB and SDB backends.
To clarify, Fuseki is just a server component that sits in front of a Jena-API backend to provide support for the SPARQL protocol. Generally, since you're using Ruby, this is how you will interact with a database: over HTTP using the SPARQL protocol. Nearly every database supports the SPARQL protocol over HTTP for querying, and many support something in the ballpark of the SPARQL update protocol, the graph store protocol, or a similar custom HTTP protocol for handling updates.
So if you're set on using Rails, your best bet is to pick a database, work out a simple wrapper for the HTTP protocol (perhaps forking an existing Ruby library if one exists), and build your application on that support.
Versioning is not readily supported in a lot of systems, and I think there is still a lot of thought going into how to do it properly in an RDF database. So, if you want versioning in your application, you will likely have to build something custom, as sketched below.
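To make that concrete, here is a minimal sketch of such a wrapper. The SPARQL protocol is plain HTTP, so the same two POST requests are easy to issue from Ruby's Net::HTTP; the sketch below is in Java purely for concreteness, and the endpoint URLs and graph names are assumptions. The COPY at the end illustrates one common custom approach to versioning: snapshotting a graph into a dated named graph before changing it.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Minimal sketch of talking to a triple store over the SPARQL protocol.
// Endpoint URLs and graph names are assumptions; adjust for your store.
public class SparqlClient {
    private static final HttpClient HTTP = HttpClient.newHttpClient();
    private final String queryEndpoint;   // e.g. http://localhost:3030/ds/query
    private final String updateEndpoint;  // e.g. http://localhost:3030/ds/update

    public SparqlClient(String queryEndpoint, String updateEndpoint) {
        this.queryEndpoint = queryEndpoint;
        this.updateEndpoint = updateEndpoint;
    }

    // SELECT/ASK queries go to the query endpoint; ask for JSON results.
    public String query(String sparql) throws Exception {
        HttpRequest req = HttpRequest.newBuilder(URI.create(queryEndpoint))
            .header("Content-Type", "application/x-www-form-urlencoded")
            .header("Accept", "application/sparql-results+json")
            .POST(HttpRequest.BodyPublishers.ofString(
                "query=" + URLEncoder.encode(sparql, StandardCharsets.UTF_8)))
            .build();
        return HTTP.send(req, HttpResponse.BodyHandlers.ofString()).body();
    }

    // INSERT/DELETE/COPY go to the update endpoint (SPARQL 1.1 Update).
    public void update(String sparql) throws Exception {
        HttpRequest req = HttpRequest.newBuilder(URI.create(updateEndpoint))
            .header("Content-Type", "application/x-www-form-urlencoded")
            .POST(HttpRequest.BodyPublishers.ofString(
                "update=" + URLEncoder.encode(sparql, StandardCharsets.UTF_8)))
            .build();
        HTTP.send(req, HttpResponse.BodyHandlers.ofString());
    }

    public static void main(String[] args) throws Exception {
        SparqlClient db = new SparqlClient(
            "http://localhost:3030/ds/query", "http://localhost:3030/ds/update");
        System.out.println(db.query("SELECT * WHERE { ?s ?p ?o } LIMIT 10"));
        // One custom versioning approach: snapshot the current graph into a
        // dated named graph before applying changes (graph names assumed).
        db.update("COPY <http://example.org/data> TO <http://example.org/data/v/2024-01-01>");
    }
}
```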
The question is fairly simple, but it is mainly directed at ProcessMaker experts.
I need to extract batches of data from ProcessMaker to perform analysis later.
Currently we have v3.3, which has a very well documented database model and a not-so-well documented REST API.
Having no clue about the best approach, I assume ProcessMaker developers are encouraged to use a direct database connection to fetch data batches.
However, looking ahead to the v4 upgrade, I see that the database model is no longer part of the official documentation, and neither is the "Data Integration" chapter. Everything points to using the REST API for any data access.
So, I am puzzled. Which way to go for v3.3 and v4? REST API or direct DB connection?
ProcessMaker 4 was designed and built as an API-first application. The idea is that everything that can and should be done through the application should be done via the API; in fact, this is the way all modern systems are designed. The days of accessing the database directly are gone, and for good reason. The API is a contract: it says that if you make a request in a certain way, you will get a certain response. On the other hand, we cannot guarantee that the database itself will always have the same tables. As a result, if you access the database directly and we then decide to change the database structure, you will be out of luck, and anything you built that accesses the database directly will potentially fail.
So the decision is clear. V4 is a modern architecture built with modern tooling. It performs and scales better than V3. It is the future of ProcessMaker. We highly recommend upgrading to this version, staying on our mainline, and using the API for all activities related to the data models.
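In practice, extracting batches through the API means paging over HTTP rather than scanning tables. A rough sketch of that loop follows; note that the endpoint path, paging parameters, and auth handling here are placeholders for illustration, not the documented ProcessMaker API, so check the docs for your version:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Illustrative only: the endpoint path, paging parameters, and auth header
// are placeholders, not the documented ProcessMaker API.
public class BatchExtractor {
    public static void main(String[] args) throws Exception {
        HttpClient http = HttpClient.newHttpClient();
        String base = "https://pm.example.com/api/1.0/workspace"; // placeholder
        String token = System.getenv("PM_ACCESS_TOKEN");          // OAuth2 bearer token

        int page = 1;
        while (true) {
            HttpRequest req = HttpRequest.newBuilder(
                    URI.create(base + "/cases?page=" + page + "&per_page=500"))
                .header("Authorization", "Bearer " + token)
                .GET()
                .build();
            String body = http.send(req, HttpResponse.BodyHandlers.ofString()).body();
            if (body.equals("[]")) break;   // naive end-of-data check for the sketch
            store(body, page);              // hand the batch to your analysis pipeline
            page++;
        }
    }

    static void store(String json, int page) {
        System.out.printf("fetched page %d (%d bytes)%n", page, json.length());
    }
}
```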
I am evaluating if I should use Spring Data Neo4j 4 or directly use the native APIs that Neo4j has. Is it possible to get the full potential of Neo4j when using Spring Data Neo4j 4 or will it limit my future usage of Neo4j?
I see the benefit that the POJO approach simplifies storing objects in the database.
The recently updated content on https://graphaware.com/spring-data-neo4j may provide you with additional information to consider.
In my view, yes, SDN allows you to make use of the full potential of Neo4j. That said, for use cases where needed you can also sidestep SDN and make direct use of the underlying OGM and/or Cypher directly. In other words, when making use of SDN you also have the freedom and flexibility to use alternate options of what best suits your needs, so your usage does not need to be an "all SDN" or "no SDN" approach; you can mix and match as needed.
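For example, in SDN 4 a single repository can combine derived finders with hand-written Cypher via @Query, so dropping down to Cypher doesn't mean leaving SDN. A minimal sketch (the domain class and property names are invented):

```java
import java.util.List;

import org.neo4j.ogm.annotation.GraphId;
import org.neo4j.ogm.annotation.NodeEntity;
import org.springframework.data.neo4j.annotation.Query;
import org.springframework.data.neo4j.repository.GraphRepository;

// Hypothetical domain class; names are illustrative.
@NodeEntity
public class Person {
    @GraphId private Long id;
    private String name;
}

interface PersonRepository extends GraphRepository<Person> {

    // Derived finder: SDN generates the Cypher for you.
    Person findByName(String name);

    // Hand-written Cypher for cases the object mapping doesn't cover,
    // still executed through the same SDN repository.
    @Query("MATCH (p:Person)-[:FRIEND_OF*1..2]-(f:Person) " +
           "WHERE p.name = {0} RETURN DISTINCT f")
    List<Person> findFriendsOfFriends(String name);
}
```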
There are two "native" APIs:
There is the Java API, which you can access in unmanaged extensions or when running Neo4j embedded.
There is the Neo4j Java driver (a.k.a. Bolt), which is what Neo4j itself promotes the most; a minimal sketch follows below.
OGM (and therefore SDN) supports both embedded and Bolt, with new Bolt features being covered shortly after they are released.
There are some features of the embedded database that can't be used directly (though you may reach them through user-defined procedures/functions), e.g. the traversal framework.
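Here is a minimal sketch of the second option, the Bolt driver (the 1.x Java driver; the URL and credentials are placeholders):

```java
import org.neo4j.driver.v1.AuthTokens;
import org.neo4j.driver.v1.Driver;
import org.neo4j.driver.v1.GraphDatabase;
import org.neo4j.driver.v1.Record;
import org.neo4j.driver.v1.Session;
import org.neo4j.driver.v1.StatementResult;

import static org.neo4j.driver.v1.Values.parameters;

// Minimal Bolt example with the 1.x Java driver; URL/credentials are placeholders.
public class BoltExample {
    public static void main(String[] args) {
        try (Driver driver = GraphDatabase.driver(
                 "bolt://localhost:7687", AuthTokens.basic("neo4j", "secret"));
             Session session = driver.session()) {

            // Parameterized write over Bolt.
            session.run("CREATE (p:Person {name: {name}})",
                        parameters("name", "Alice"));

            // Read back and print results.
            StatementResult result =
                session.run("MATCH (p:Person) RETURN p.name AS name");
            while (result.hasNext()) {
                Record record = result.next();
                System.out.println(record.get("name").asString());
            }
        }
    }
}
```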
You should also consider other aspects of your use case, such as performance and whether your domain model maps well onto the graph model.
I'm currently doing some R&D on moving some business functionality from an Oracle RDBMS to Neo4j to reduce join complexity in the application queries. Due to the maintenance and visibility requirements for the data, I believe the standalone server is the best option.
My thought is that within a Java program I would pull the relevant data out of the Oracle tables, map it to node objects, and persist them to Neo4j (creating the appropriate relationships in the process).
I'm curious: with SDN over REST not being an optimal solution, what options are available for persistence? Are server plugins or unmanaged extensions the preferred method, or am I overcomplicating the issue, as tends to happen from time to time?
Thank you!
REST refers to a way to query the data over a network, not a way to store it. Typically you're going to store the data on some machine; you then have the option of either making it accessible via RESTful services with the Neo4j server, or just using Java applications to access the data.
I assume by SDN you're referring to Spring Data Neo4j. Spring is a framework for Java applications, and SDN is, if you will, a plugin for Spring that allows Java programmers to store models in Neo4j. One could indeed use Spring Data Neo4j to read data in and then store it in Neo4j, but again, that is a method of how the data gets into Neo4j; it is not storage by itself.
The storage model is pretty much always the same; this link describes aspects of how storage actually happens.
Now, to your larger business objective: to do this with Neo4j, you're going to need to look at your Oracle data and decide how it is best modeled as a graph. There's a big difference between an Oracle RDBMS and Neo4j in terms of how the data is represented. Once you've settled on a graph design, you can then load your data into Neo4j (there are many different options for doing that; one is sketched below).
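As one concrete option matching your plan (pull rows over JDBC, persist nodes over Bolt), here is a rough sketch; the table, column, and label names and the connection details are all invented for illustration:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

import org.neo4j.driver.v1.AuthTokens;
import org.neo4j.driver.v1.Driver;
import org.neo4j.driver.v1.GraphDatabase;
import org.neo4j.driver.v1.Session;

import static org.neo4j.driver.v1.Values.parameters;

// Sketch: copy rows from Oracle into Neo4j nodes. Table/column/label
// names and connection details are invented for illustration.
public class OracleToNeo4j {
    public static void main(String[] args) throws Exception {
        try (Connection oracle = DriverManager.getConnection(
                 "jdbc:oracle:thin:@//dbhost:1521/ORCL", "user", "pass");
             Driver neo = GraphDatabase.driver(
                 "bolt://localhost:7687", AuthTokens.basic("neo4j", "secret"));
             Session session = neo.session();
             Statement stmt = oracle.createStatement();
             ResultSet rs = stmt.executeQuery(
                 "SELECT customer_id, name FROM customers")) {

            while (rs.next()) {
                // MERGE keeps the load idempotent if it is re-run.
                session.run("MERGE (c:Customer {id: {id}}) SET c.name = {name}",
                            parameters("id", rs.getLong("customer_id"),
                                       "name", rs.getString("name")));
            }
        }
    }
}
```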
Will all of this "reduce join complexity in the application queries"? Well, yes, in the sense that Neo4j doesn't do joins. Will it improve the speed/performance of your application? There's just no way to tell. The answer to that depends on what your app is, what the queries are, how you model the data as a graph, and how you express the resulting queries over that graph.
Is there a drop-in replacement for ActiveRecord that uses some sort of object store?
I am thinking something like Erlang's Mnesia would be ideal.
Update
I've been investigating CouchDB and I think this is the option I am going to go with. It's a toss-up between using CouchRest and ActiveCouch. CouchRest is pretty mature and is used in the CouchDB Peepcode episode, but it's not a drop-in replacement for ActiveRecord, which is a bit of a disadvantage.
Suffice it to say, CouchDB is pretty phenomenal.
Update (November 10, 2009)
CouchDB hasn't really worked for me. It doesn't really support arbitrary queries (queries need to be written and compiled ahead of time), and it breaks on very large datasets.
I have been playing with MongoDB and it's really incredible. Schema-less JSON data store with queries and indexing.
I've even started building a management tool for it called Ming.
Try Maglev!
ActiveCouch purports to be just such a library for CouchDB, which is, in fact, written in Erlang. I wouldn't say it's as mature as ActiveRecord, though.
That is the closest thing I know of to what you're asking for.
Madeleine is a Ruby implementation of the Java Prevayler object store;
see http://madeleine.rubyforge.org/
I'm currently working on a Ruby object database that uses MySQL as a backing store (hence it's called hybriddb) that you may be interested in.
It uses no SQL or migrations; you just save your objects to the database. It also tries to transparently work around the conventional problems with object databases (speed, finding objects quickly, large object graphs).
It is still an early version, so take care. The code is here:
http://github.com/pauliephonic/hybriddb/tree/master (the development branch has support for transactions, and I'm currently adding basic validations).
I have a web site with some tutorials etc. http://www.hybriddb.org/pages/tutorial_starter
Any comments are welcome there.
Apart from Madeleine, you can also see:
http://purple.rubyforge.org/
But it depends on scale, too. Mnesia is known to support large amounts of data and is clustered, whereas these solutions won't work so well with large amounts of data.
If the amount of data is not huge, another option is:
http://copiousfreetime.rubyforge.org/amalgalite/files/README.html
I'm about to write a small utility to organize and tag my MP3s.
What is the best way to store small amounts of data? More importantly, are there databases where I don't need to install a client/server environment, where I can just include the library and be good to go?
I could use XML, but I'm afraid the file size would become large and hard to handle, not to mention the difficulty of keeping the memory footprint small.
Thanks
EDIT: I haven't decided on the language, I wanted to make my decision independent of platform. If I had to choose, most likely .NET, second Java, third C++.
My apologies, this is for a Windows App.
On Windows you can use the built-in ESENT database engine. There is an API you can use from C++:
http://blogs.msdn.com/windowssdk/archive/2008/10/23/esent-extensible-storage-engine-api-in-the-windows-sdk.aspx
There is also a managed interop layer that you can use from C# code:
http://www.codeplex.com/ManagedEsent
Which language/platform are you talking about?
In the Java world I prefer using embedded databases such as HSQLDB, H2 or JavaDB (f.k.a. Derby).
They don't need installing and still provide the simple access you're used to from a "real" DBMS.
In the C/Python/Unixy world SQLite is a hot contender in that area.
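To make the embedded option concrete, here is a minimal sketch using H2 in file mode (the file name and schema are invented; HSQLDB and JavaDB look almost identical apart from the JDBC URL):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

// Embedded H2: no server to install, the database is just a local file.
// File name and schema are invented for illustration.
public class Mp3Store {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:h2:./mp3db");
             Statement stmt = conn.createStatement()) {

            stmt.execute("CREATE TABLE IF NOT EXISTS tracks(" +
                         "id IDENTITY PRIMARY KEY, path VARCHAR(1024), tag VARCHAR(64))");

            // Parameterized insert.
            try (PreparedStatement ins = conn.prepareStatement(
                     "INSERT INTO tracks(path, tag) VALUES(?, ?)")) {
                ins.setString(1, "/music/song.mp3");
                ins.setString(2, "rock");
                ins.executeUpdate();
            }

            // Query it back like any "real" DBMS.
            try (ResultSet rs = stmt.executeQuery(
                     "SELECT path FROM tracks WHERE tag = 'rock'")) {
                while (rs.next()) System.out.println(rs.getString("path"));
            }
        }
    }
}
```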
Another option is the various forms of the Berkeley database (e.g., db3, db4, Sleepycat).
SQLite, if you want the pain of a relational DB without a server install or hassle.
I would use one of the many text-serialization formats. I personally think YAML 1.1 is the most powerful (built-in support for referential object graphs) and the easiest for a human to read and modify (parsing is a bear; use a library such as PyYAML, JYaml, or some .NET library).
Otherwise XML or JSON are adequate file formats.
Whichever format you use, just compress the file if you're concerned about disk usage. If you're worried about in-memory usage, then I don't see how your serialization format matters...
Have a look at Prevayler: it's a serialization persistence framework (use XStream etc. if you want your data to be human-readable), which is really fast, does not require annotations, and "just works". Some basic info:
It does impose a more rigorous transaction pattern, as it does not give you automatic rollback (a minimal sketch follows below):
Ensure the transaction will succeed given the current state of the system, e.g. does it make sense now?
The transaction is added to a queue and stored on disk (so it survives a power reset etc.).
The transaction is executed and applied to the object structure.
Writes: thousands of transactions/sec.
Reads: hundreds of thousands of transactions/sec.
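A minimal sketch of that pattern against the classic (pre-generics) Prevayler API; the class names here are invented:

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

import org.prevayler.Prevayler;
import org.prevayler.PrevaylerFactory;
import org.prevayler.Transaction;

// The prevalent system: a plain serializable object graph held in memory.
class TagStore implements Serializable {
    final List<String> tags = new ArrayList<String>();
}

// Every change is a serializable Transaction; Prevayler journals it to disk
// before applying it, which is how state survives a power reset.
class AddTag implements Transaction {
    private final String tag;
    AddTag(String tag) { this.tag = tag; }
    public void executeOn(Object prevalentSystem, Date executionTime) {
        ((TagStore) prevalentSystem).tags.add(tag);
    }
}

public class PrevaylerExample {
    public static void main(String[] args) throws Exception {
        // "base" is the directory where the transaction journal is written.
        Prevayler prevayler = PrevaylerFactory.createPrevayler(new TagStore(), "base");
        prevayler.execute(new AddTag("rock"));
        TagStore store = (TagStore) prevayler.prevalentSystem();
        System.out.println(store.tags);
    }
}
```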
I haven't used it much, but it's sooo much nicer to use for small projects (persisting any serializable object is so nice)
Oh, and as for everyone asking "what platform are you running on?": Prevayler (Java) has/had ports to quite a few platforms, but I can't find a decent list :(. I remember there were around 5-7, but the only one I can recall is .NET.
If you're planning on storing everything in memory while your program works on it, then serializing to a file using basic load() and save() functions that you write would be fine, and less pain than a full-on DB.
In Java that can be done using standard serialization (or you can serialize to and from XML to make it somewhat human-readable and editable).
It shouldn't affect your memory footprint at all, as it merely saves and restores your objects. You just won't get transactions, random access, queries, and all that good stuff.
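A minimal sketch of that load()/save() approach with standard Java serialization (the class and file names are invented):

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Whole-library-in-memory persistence: save() writes the object graph
// out, load() reads it back. Class and file names are illustrative.
public class Library implements Serializable {
    private final List<String> tracks = new ArrayList<String>();

    public void add(String path) { tracks.add(path); }

    public void save(String file) throws IOException {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(this);
        }
    }

    public static Library load(String file)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                 new ObjectInputStream(new FileInputStream(file))) {
            return (Library) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Library lib = new Library();
        lib.add("/music/song.mp3");
        lib.save("library.bin");
        System.out.println(Library.load("library.bin").tracks);
    }
}
```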
You could even use XML, JSON, an .ini file... even a plain text file.
I would advise an SQL database (such as SQLite). Today your requirements might make a full SQL database seem silly, but you never know how much this "little project" will grow over the years. When it does grow to the point where you need a SQL engine, you will be glad you didn't just serialize some Java objects or store stuff in JSON format.