Document-Oriented or Graph databases - ruby-on-rails

It's a RoR project.
We want to store user activities, like uploading a photo, voting for somebody, following somebody, etc. When listing activities, we need to list your friends' activities as well. So, what is better to use in this case: a document-oriented database (CouchDB, MongoDB), a graph database (Neo4j), or maybe some other approach?
Thank you in advance for helping, guys :)

Yeah, I think Neo4j is a good choice. The Rails 3 support is excellent (see https://github.com/andreasronge/neo4j), there are social modeling examples with Cypher at http://docs.neo4j.org/chunked/snapshot/data-modeling-examples.html, and for activity streams there are various interesting approaches like Graphity; see http://www.rene-pickhardt.de/graphity-an-efficient-graph-model-for-retrieving-the-top-k-news-feeds-for-users-in-social-networks/
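To make that concrete, here is the rough shape of a Cypher feed query for "my activities plus my friends' activities". This is only a sketch: the labels and relationship types (:User, :Activity, :FRIEND, :PUBLISHED) are invented for illustration, not taken from the gem, and you would run it through whichever Neo4j client you use.

    # Hypothetical model: users PUBLISH activities and FRIEND each other.
    feed_query = <<-CYPHER
      MATCH (me:User {id: $user_id})
      MATCH (actor:User)-[:PUBLISHED]->(activity:Activity)
      WHERE actor = me OR (me)-[:FRIEND]->(actor)
      RETURN actor.name, activity.kind, activity.created_at
      ORDER BY activity.created_at DESC
      LIMIT 20
    CYPHER

Note this is the simple "compute the feed at read time" variant; Graphity (linked above) is precisely about making this kind of top-k feed read cheaper at scale.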

Depending on the scale of your application and the volume of activity, I'd recommend a combination of Couchbase (not CouchDB) for the actual activity data, which is extremely scalable and fast, and Neo4j for the graph discovery, running both databases side by side. I've used the combination very effectively in an application of mine that was both social and real-time.
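To make the division of labor concrete, here is a sketch of what a dual-store write could look like. The client handles (couchbase_bucket, neo4j_session) and all names are hypothetical stand-ins for whatever your gems actually provide:

    # Sketch only: the heavyweight activity document goes to Couchbase,
    # while Neo4j stores just enough to traverse the social graph.
    def record_activity(couchbase_bucket, neo4j_session, user_id, kind, payload)
      activity_id = "activity:#{user_id}:#{Time.now.to_f}"

      # Full document (photo metadata, vote details, ...) lives in Couchbase.
      couchbase_bucket.set(activity_id, user_id: user_id, kind: kind, payload: payload)

      # Neo4j keeps a lightweight pointer back to the document.
      neo4j_session.query(
        "MATCH (u:User {id: $user_id}) " \
        "CREATE (u)-[:PUBLISHED]->(:Activity {id: $activity_id, kind: $kind})",
        user_id: user_id, activity_id: activity_id, kind: kind
      )
    end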
If you want more info from me, please feel free to contact me directly and I can help with architectural decisions or implementation help.

Take a look at InfiniteGraph. It is built to scale out across machines, unlike Neo4j. I think they have a free download licensed for up to 1 million nodes.

Consider using SQLite. It is a single-file database that can be used as an embedded database.
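For what it's worth, embedding SQLite from Ruby takes only a few lines with the sqlite3 gem; a minimal sketch, with a table layout invented for this thread's example:

    require "sqlite3"

    # SQLite keeps the whole database in a single file next to your app.
    db = SQLite3::Database.new("activities.db")
    db.execute <<-SQL
      CREATE TABLE IF NOT EXISTS activities (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        kind TEXT NOT NULL,
        created_at TEXT DEFAULT CURRENT_TIMESTAMP
      )
    SQL
    db.execute("INSERT INTO activities (user_id, kind) VALUES (?, ?)", [1, "uploaded_photo"])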

Related

graph database revision control

GitHub for Neo4J?
I'm evaluating graph databases as a possible solution for modeling a complex computer network. It occurs to me that something like a revision control system would be useful for planning and testing updates to the database. I had been assuming we would instantiate a test network graph for such planning and then write a routine to sync the changes.
I see that this question has been asked and answered for relational databases (How do you maintain revision control of your database structure?). But I'm asking for graph databases, probably Neo4J.
In that relational thread someone pitches the Rails approach of making rollback a required element of database development. I like this idea too; I'm not sure how easy it is in graph databases.
How is this handled in the real world?
I found your question while also searching for an answer, so I don't have tested solutions to offer. But I can share that there's some discussion of this at How do I implement revisions with neo4j?, including a specific case at Neo4j / Strategy to keep history of node changes.
There's also a more detailed blog post at http://iansrobinson.com/2014/05/13/time-based-versioned-graphs/, which weighs the read-time / write-time / storage requirements of several alternatives. It also includes a number of diagrams and example queries that helped me wrap my head around what all this would look like.
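To give a flavor of the time-based approach described there (my paraphrase, not the author's exact model): each entity keeps a stable identity node, while its mutable state lives on dated state nodes, so you can read the graph "as of" any timestamp. All labels and property names below are illustrative:

    # Close the current state (9223372036854775807 acts as "end of time")
    # and attach a new one.
    new_state_query = <<-CYPHER
      MATCH (r:Router {id: $router_id})-[cur:STATE {to: 9223372036854775807}]->(:RouterState)
      SET cur.to = $now
      CREATE (r)-[:STATE {from: $now, to: 9223372036854775807}]->(:RouterState {config: $config})
    CYPHER

    # Reading the graph "as of" a timestamp filters on the interval.
    as_of_query = <<-CYPHER
      MATCH (r:Router {id: $router_id})-[s:STATE]->(state:RouterState)
      WHERE s.from <= $at AND $at < s.to
      RETURN state.config
    CYPHER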
Hope that's still useful, lo these months later, and sorry I can't be of more help! If you've found something that works in the meantime, can you let us know?

Geodata Querying Optimisations

I am planning to write a Node.js-powered RESTful web service that I will use for a mobile application which provides some sort of location based features. The most basic use case is going to look something like this:
the user can create a resource by sending a request to the web service containing the resource's name and the user's current location (latitude and longitude)
the web service will store the metadata about this resource internally in some sort of collection
the user can query the web service for a list of resources within 5km of his current location
One of the first problems that came to my mind was scalability. Let's suppose that at some point in the future the server will hold metadata for 1 million resources. When a user queries for nearby results, looping through 1 million entries to compute distances will take forever.
There are many services out there that have the same flow, so I thought implementing something like this is not going to take me a lot of time. I might have been wrong.
I am now two days into researching proven methods and algorithms. By now I have read everything I could get my hands on about QuadTrees, Geohashes, databases with spatial indexing support, formulas, and so on. However, I still can't get the whole picture of how everything is going to work.
I was hoping that maybe someone who has worked on something similar could share his insight on what approach might be the most suitable considering this use case and the technologies that I am planning to use. Also, a short description of how it can be implemented would help me a lot!
For those who are also looking for more information on this topic, my answer might not provide much clarity. However, some of the answers in here might help you understand how you could achieve proximity searches using Geohashes.
My approach, after doing a little research on Redis, will be not to overcomplicate things and just use the tools that are already out there. Redis has out-of-the-box support for spatial indexing and will most probably meet all my persistence requirements for this project.
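To show how little code that takes with the redis gem (the GEO commands shipped in Redis 3.2; key and member names here are just examples):

    require "redis"

    redis = Redis.new

    # Index each resource by longitude/latitude under one geo key.
    redis.geoadd("resources", -73.9857, 40.7484, "resource:42")
    redis.geoadd("resources", -73.9772, 40.7527, "resource:43")

    # All resources within 5 km of the user's current position, nearest first.
    nearby = redis.georadius("resources", -73.9850, 40.7500, 5, "km", sort: "asc")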
Apparently MongoDB also comes with built-in support for geodata. In fact, even RDBMS like MySQL or SQLite do come with such capabilities.

Can I use the Neo4j database for this use case?

Hello, I am going to make a project for my faculty; I am trying to build a web app like twoodo.com.
So, can I use the Neo4j database for it? Can I save all conversations, dates, files, videos, photos, tasks, etc., or should I use another database? Can anyone guide me or help me? I am working with Spring Boot and Angular. Thanks in advance.
Neo4j can store different kinds of documents, but you would choose a graph database because of how the documents are related. Other databases might be better for storing certain kinds of data; for example, Neo4j does not handle nested documents as well as MongoDB or CouchDB. But Neo4j is really good at linking and retrieving documents based on relationships. For example, here's Michael Hunger modeling the entire database of 10M Stack Overflow questions: http://neo4j.com/blog/import-10m-stack-overflow-questions/ Check some of his queries at the bottom showing off what you can do with relationship queries. This kind of thing could be pretty solid in a twoodo-style app.
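For a rough sense of what "linking documents" buys you, here is a hypothetical model for a twoodo-style thread; every label, relationship type and property below is invented for illustration:

    # A message posted by a user into a conversation, with an attachment.
    create_query = <<-CYPHER
      MATCH (u:User {id: $user_id}), (c:Conversation {id: $conversation_id})
      CREATE (u)-[:POSTED]->(m:Message {text: $text, at: timestamp()})
      CREATE (m)-[:IN]->(c)
      CREATE (m)-[:ATTACHED]->(:File {url: $file_url})
    CYPHER

    # Relationship queries then come cheap: every file a user shared in a thread.
    shared_files_query = <<-CYPHER
      MATCH (u:User {id: $user_id})-[:POSTED]->(m:Message)-[:IN]->(:Conversation {id: $conversation_id}),
            (m)-[:ATTACHED]->(f:File)
      RETURN f.url, m.at
    CYPHER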

Using different datastores in the same Rails app?

So this is more or less an implementation question. Here is the scenario: we have an app which uses MySQL as its datastore (user accounts, transactions, etc.), but we want to add a robust charting feature whose data will be stored in Redis. My question is basically:
Is it possible, and what are the best practices, for integrating another datastore into an app which already depends on a different one? Can I use Rack to generate the reports? Etc.
I want to turn this into a sort of open discussion, because I think the need for a solution like this is going to grow as we see more and more key/value and NoSQL stores that offer benefits quite different from an RDBMS. They all have their benefits, but no one solution covers them all.
Thoughts?
You can have models that do not inherit from ActiveRecord::Base. Add your preferred Redis client gem, do whatever configuration is necessary, and start writing Redis-backed models.
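A minimal sketch of such a non-ActiveRecord model using the redis gem; the class and key scheme are just examples:

    # app/models/chart_point.rb -- a plain Ruby model backed by Redis,
    # living alongside your ActiveRecord models.
    class ChartPoint
      REDIS = Redis.new # point this at your Redis host via an initializer

      # Record one data point in a sorted set keyed by metric name.
      def self.record(metric, value, at: Time.now)
        REDIS.zadd("chart:#{metric}", at.to_i, "#{at.to_i}:#{value}")
      end

      # Fetch the raw points for a time window, oldest first.
      def self.between(metric, from, to)
        REDIS.zrangebyscore("chart:#{metric}", from.to_i, to.to_i)
      end
    end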
Let me try to reopen this topic, because it should be very practical.
I have the same issue. I want to replicate data from SQL to NoSQL: SQL is used as the main database storage, because of data integrity, relations, etc., and NoSQL as a secondary storage optimized for reading. In SQL you have many associations split across many tables; many one-to-one associations are kept in separate tables for better readability. With NoSQL, those associations can be saved as a single document, which gives remarkable speed: only one load, which is great for serving data through an API.
Does anyone have positive experience with replicating SQL data into denormalized, self-contained NoSQL documents?
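One Rails-flavored way to do this, sketched under the assumption that MongoDB is the read side and the mongo gem is available (model, collection, and association names are invented): rebuild the document from an after_commit callback so it always tracks the SQL row.

    require "mongo"

    MONGO_CLIENT = Mongo::Client.new(["127.0.0.1:27017"], database: "readside")

    # SQL stays the source of truth; a denormalized copy of each user,
    # with its one-to-one associations inlined, is kept in MongoDB.
    class User < ActiveRecord::Base
      has_one :profile
      has_one :billing_address

      after_commit :replicate_to_mongo

      private

      def replicate_to_mongo
        doc = attributes.merge(
          "profile"         => profile&.attributes,
          "billing_address" => billing_address&.attributes
        )
        MONGO_CLIENT[:users].replace_one({ "_id" => id }, doc, upsert: true)
      end
    end

In production you would likely push the Mongo write into a background job rather than doing it inline in the commit path.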

Free data warehousing systems--specifically, for data storage

I am building out some reporting stuff for our website (a decent sized site that gets several million pageviews a day), and am wondering if there are any good free/open source data warehousing systems out there.
Specifically, I am looking for only something to store the data--I plan to build a custom front end/UI to it so that it shows the information we care about. However, I don't want to have to build a customized database for this, and while I'm pretty sure an SQL database would not work here, I'm not sure what to use exactly. Any pointers to helpful articles would also be appreciated.
Edit: I should mention--one DB I have looked at briefly was MongoDB. It seems like it might work, but their "Use Cases" specifically mention data warehousing as "Less Well Suited": http://www.mongodb.org/display/DOCS/Use+Cases . Also, it doesn't seem to be specifically targeted towards data warehousing.
http://www.hypertable.org/ might be what you are looking for: going by your description above, something to store large amounts of logged data, i.e. a visitor log.
Hypertable is based on Google's BigTable design.
See http://code.google.com/p/hypertable/wiki/PerformanceTestAOLQueryLog for benchmarks.
You lose the relational capabilities of SQL-based DBs, but you gain a lot in performance; you could easily use Hypertable to store millions of rows per hour (hard drive space permitting).
Hope that helps.
I may not understand the problem correctly -- however, if you find some time to (re)visit Kimball's "The Data Warehouse Toolkit", you will find that all it takes for a basic DW is a plain-vanilla SQL database; in other words, you could build a decent DW with MySQL using MyISAM for the storage engine. The question is only the desired granularity of information: what you want to keep and for how long. If your reports are mostly periodic, and you implement a report storage or cache, then you don't need to store pre-calculated aggregations (no need for cubes). In other words, a Kimball star with cached reporting can provide decent performance in many cases.
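To make the "plain-vanilla SQL DW" point concrete, here is what a minimal Kimball-style star could look like as a Rails migration; all table and column names are invented for illustration:

    # One fact table ringed by dimension tables: the classic star shape.
    class CreatePageviewStar < ActiveRecord::Migration
      def change
        create_table :dim_dates do |t|
          t.date :day, null: false
        end

        create_table :dim_pages do |t|
          t.string :url, null: false
        end

        create_table :fact_pageviews do |t|
          t.references :dim_date, null: false
          t.references :dim_page, null: false
          t.integer :views, null: false, default: 0
        end

        add_index :fact_pageviews, [:dim_date_id, :dim_page_id]
      end
    end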
You could also look at the community edition of “Pentaho BI Suite” (open source) to get a quick start with ETL, analytics and reporting -- and experiment a bit to evaluate the performance before diving into custom development.
Although this may not be what you were expecting, it may be worth considering.
Pentaho Mondrian
Open source
Uses a standard relational database
MDX (think pivot tables)
ETL (via Kettle)
I use this.
In addition to Mike's answer about Hypertable, you may want to take a look at Apache's Hadoop project:
http://hadoop.apache.org/
They provide a number of tools which may be useful for your application, including HBase, another implementation of the BigTable concept. I'd imagine that for reporting you might find their MapReduce implementation useful as well.
It all depends on the data and how you plan to access it. MonetDB is a column-oriented database engine from one of the most innovative teams in database technology; they just received VLDB's 10-year best paper award. The DB is open source, and there are plenty of reviews online praising it.
Perhaps you should have a look at TPC and see which of their test problem datasets best matches your case, and work from there.
Also consider the need for concurrency: it adds a big overhead to any kind of approach, and sometimes it is not really required. For example, you can pre-digest some summary or index data and only have that part protected for high concurrency. Profiling your data queries is the next step.
About SQL: I don't like it either, but I don't think it's smart to rule out an engine just because of its front-end language.
I see a similar problem and am thinking of using plain MyISAM with http://www.jitterbit.com/ as the data access layer. Jitterbit (or another similar free tool) seems very nice for this sort of transformation.
Hope this helps a bit.
A lot of people just use MySQL or Postgres :)
