How to load relational database tables into a Neo4j database

I have tables which have millions of records. I need to load these records as nodes in neo4j.
Please help me out on how to do it as I'm new to neo4j.

It is quite easy: map the entities that should become nodes into one set of CSV files, and the connections that should become relationships into another set of files.
Then load them with my batch-importer: https://github.com/jexp/batch-import/tree/20#binary-download
import.sh nodes1.csv,nodes2.csv rels1.csv,rels2.csv
Add types and index information to the headers and the batch.properties config file as needed.
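The CSV mapping step described above could be sketched as follows. This is only an illustration under stated assumptions: the `person` and `friendship` tables, the `KNOWS` relationship type, the header names and the tab delimiter are all hypothetical, so check the batch-import README for the exact header syntax it expects.

```python
import csv
import io
import sqlite3


def export_query_to_csv(conn, query, header, out):
    """Stream the rows of `query` into `out` as delimited text; return the row count."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    count = 0
    # Iterating the cursor avoids loading millions of rows into memory at once.
    for row in conn.execute(query):
        writer.writerow(row)
        count += 1
    return count


# Hypothetical source tables standing in for the real relational schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE friendship (person_id INTEGER, friend_id INTEGER, since INTEGER);
    INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO friendship VALUES (1, 2, 2012);
""")

# In a real export these would be files such as nodes1.csv and rels1.csv.
nodes_out = io.StringIO()
rels_out = io.StringIO()

node_count = export_query_to_csv(
    conn, "SELECT id, name FROM person ORDER BY id", ["id", "name"], nodes_out
)
rel_count = export_query_to_csv(
    conn,
    "SELECT person_id, friend_id, 'KNOWS', since FROM friendship",
    ["start", "end", "type", "since"],
    rels_out,
)
```

One query per node file and one per relationship file keeps the export easy to rerun when the source schema changes.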
You can use the batch-importer for the initial import and also for subsequent updates (but the database has to be shut down for that).
It is also pretty easy to connect to your existing database using its driver, extract the information in the right shape and kind, and insert it into your graph model, either using Cypher statements with parameters or the embedded, transactional Java API for ongoing updates.
See: http://jexp.de/blog/2013/05/on-importing-data-in-neo4j-blog-series/
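For the "Cypher statements with parameters" route, a rough sketch of batching source rows into parameterized statements might look like this. The `Person` label, the row shape and the batch size are hypothetical, and the code only builds the statement and its parameters; sending them to the server is left to whichever driver you use. `$rows` is the parameter syntax of current Neo4j versions (older releases wrote parameters as `{rows}`).

```python
from itertools import islice

# UNWIND expands one list parameter into many rows, so a single statement
# can create a whole batch of nodes. The label `Person` is hypothetical.
CREATE_PEOPLE = "UNWIND $rows AS row CREATE (p:Person) SET p = row"


def batched(rows, size):
    """Yield lists of at most `size` items from any iterable."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def build_statements(rows, batch_size=1000):
    """Yield (statement, parameters) pairs, one per batch of rows."""
    for chunk in batched(rows, batch_size):
        yield CREATE_PEOPLE, {"rows": chunk}


# Stand-in for rows read from the relational database.
source_rows = ({"id": i, "name": f"user{i}"} for i in range(2500))
statements = list(build_statements(source_rows, batch_size=1000))
```

Keeping the statement text constant and varying only the parameters lets the server reuse the query plan, and batching keeps each transaction to a manageable size.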

You can export to CSV and import it into Neo (this probably won't work well, since you have millions of records).
You can write a program to do it (this is what I am currently working on).
This also depends on what programming languages you know... but the bottom line is, because no two databases are created equally (unless on purpose), it's very difficult to create a catch-all solution for migrating data from SQL to Neo.
The best way that I've discovered so far is to create a program that queries the tables in the database, finds all related tables (i.e. foreign keys), imports all those table rows into Neo, labeling the nodes with the table name, and then processes the foreign keys as relationships.
It's not easy. I've been working on something for my database here for a week or so now... but I'm close!
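The approach above (table name as label, foreign keys as relationships) could be sketched like this. It is not the program mentioned in the answer, just a minimal illustration using SQLite's schema introspection; it assumes every table has a primary key column called `id`, and the `HAS_<TABLE>` relationship naming is made up for the example.

```python
import sqlite3


def table_names(conn):
    """List user tables, skipping SQLite's internal ones."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    )
    return [r[0] for r in rows]


def foreign_keys(conn, table):
    """Return (from_column, referenced_table) pairs for `table`."""
    # PRAGMA foreign_key_list columns: id, seq, table, from, to, ...
    return [(r[3], r[2]) for r in conn.execute(f'PRAGMA foreign_key_list("{table}")')]


def extract_graph(conn):
    """Turn every row into a node and every non-null foreign key into a relationship."""
    nodes, rels = [], []
    for table in table_names(conn):
        fks = foreign_keys(conn, table)
        cursor = conn.execute(f'SELECT * FROM "{table}"')
        columns = [d[0] for d in cursor.description]
        for values in cursor:
            row = dict(zip(columns, values))
            nodes.append({"label": table, "properties": row})
            for from_col, ref_table in fks:
                if row[from_col] is not None:
                    # Assumption: the primary key column is named `id`.
                    rels.append(
                        (table, row["id"], f"HAS_{ref_table.upper()}", ref_table, row[from_col])
                    )
    return nodes, rels


# Hypothetical two-table schema for demonstration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employee (
        id INTEGER PRIMARY KEY,
        name TEXT,
        department_id INTEGER REFERENCES department(id)
    );
    INSERT INTO department VALUES (10, 'R&D');
    INSERT INTO employee VALUES (1, 'Ann', 10), (2, 'Bob', NULL);
""")
nodes, rels = extract_graph(conn)
```

Other databases expose the same metadata through their own catalogs (for example `information_schema`), so only `table_names` and `foreign_keys` would need to change.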

Related

Neo4J ETL from SQL Server: Import from views

I'm currently trying to use the Neo4J ETL import (https://neo4j.com/developer/neo4j-etl/) to import some data from an existing SQL Server data warehouse to a new Neo4J Graph database.
Everything is working fine but, interestingly enough, the Neo4J model transformation logic doesn't seem to take views into consideration. The description at the link above also mentions just three mapping rules:
A table with a foreign key is treated as a join and imported as a node with a relationship
A table with 2 foreign keys is treated as a join table and imported as a relationship
A table with >2 foreign keys is treated as an intermediate node and imported as a node with multiple relationships
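The three rules quoted above amount to a decision on the number of foreign keys. A small sketch of that decision, not the ETL tool's actual code, with hypothetical table names; the case of zero foreign keys is not covered by the quoted rules and is treated here as a plain node by assumption:

```python
def classify_table(foreign_key_count):
    """Map a table's foreign key count to how the quoted rules would import it."""
    if foreign_key_count < 0:
        raise ValueError("foreign key count cannot be negative")
    if foreign_key_count == 0:
        # Assumption: not stated in the quoted rules.
        return "node"
    if foreign_key_count == 1:
        return "node with a relationship"
    if foreign_key_count == 2:
        return "relationship"
    return "node with multiple relationships"


# Hypothetical warehouse tables and their foreign key counts.
tables = {"customer": 0, "order": 1, "order_product": 2, "shipment": 3}
plan = {name: classify_table(count) for name, count in tables.items()}
```

Views carry no foreign key constraints of their own, which may be why rules based purely on foreign keys have nothing to say about them.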
Interestingly enough, I can't find any further information, and similar topics are several years old. It seems odd that this is not possible, as I would guess that transferring an existing data warehouse to a graph database isn't out of the ordinary.
Unfortunately, I can't give much more information, as there just doesn't seem to be more. Am I using the tool conceptually wrong or is this feature never worth being mentioned?

Join query in Cassandra

I am new to Cassandra. Although I can do some stuff in SQL, I am finding it pretty hard to do a simple join in Cassandra. My schema looks like this:
Now I have to find, for each department how many emails in total were sent out from employees working there. The output per department shall contain the corresponding number of emails.
Maybe I am missing something simple, but no matter what I do, I am not even able to retrieve data from two tables.
Cassandra has no join operation. It has been implemented in such a way as to increase the performance of basic operations like reading and writing, with the caveat that you write to and read from a single table at any given moment.
If your model is relational, that is, based on relations between tables, then Cassandra is not the way to go. In this case you should go with an RDBMS (Relational Database Management System) such as PostgreSQL, MySQL or SQL Server.
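To illustrate why an RDBMS fits this kind of question, here is a sketch of the per-department email count as a single join. The schema from the question is not shown, so the `department`, `employee` and `email` tables and their columns below are entirely hypothetical; SQLite stands in for any relational database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data; the real one is not given in the question.
conn.executescript("""
    CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employee (id INTEGER PRIMARY KEY, department_id INTEGER);
    CREATE TABLE email (id INTEGER PRIMARY KEY, sender_id INTEGER);
    INSERT INTO department VALUES (1, 'Sales'), (2, 'Support');
    INSERT INTO employee VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO email VALUES (1, 1), (2, 1), (3, 2);
""")

# LEFT JOINs keep departments whose employees sent no email (count 0).
counts = conn.execute("""
    SELECT d.name, COUNT(m.id) AS emails_sent
    FROM department AS d
    LEFT JOIN employee AS e ON e.department_id = d.id
    LEFT JOIN email AS m ON m.sender_id = e.id
    GROUP BY d.id, d.name
    ORDER BY d.name
""").fetchall()
```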

Importing SNOMED CT into Neo4J

I need to import SNOMED CT ontology into a graph database, in this case Neo4J but it could be another choice eventually.
However, I could not find a clear depiction of the relational data model underlying SNOMED CT in order to achieve this, or at least simplified SQL views that expose the entity relationships in a way that can be mapped to a graph database.
I would greatly appreciate any guidance or previous experience with this matter.
Directly trying to serialise the relational data model is probably going to be quite difficult and will take you further away from your goal.
It is worth noting that SNOMED data is actually available in RDF format already. So you get a graph structure for "free".
For example this project provides the data in a RDF format and putting RDF data into a graph is quite simple regardless of your choice of Titan or Neo4j.
Side Note:
A colleague of mine has actually worked on importing SNOMED data into a Grakn Graph, a semantic graph system we both work on. If you are interested, you can check out his work here. Grakn is a semantic graph solution which runs on top of Titan.
If you are looking for a sample of how to model the Concepts, Descriptions and Relationships in a graph database, I have a sample project on GitHub that can upload the SNOMED data into a Neo4j database.
https://github.com/pradeepvemulakonda/Snomed
Before you go into the implementation detail, I would suggest trying out the following Snomed data browser at
http://ontoserver.csiro.au/shrimp/
Once you get a feel of the concepts and relationships you can go through the implementation. You can use the following gist to understand how you can query the uploaded concepts and relationships in Neo4j.
https://neo4j.com/graphgist/95f4f165-0172-4b3d-981b-edcbab2e0a4b#listing_category=health-care-and-science
SNOMED can be loaded into MySQL using the UMLS (Unified Medical Language System) released by NIH. Once loaded, the table MRREL contains all the relations between SNOMED nodes. If you want to load it right away into Neo4j, you can skip the MySQL step entirely and work directly with the UMLS RRF files. The documentation of the RRF format is not great, but the files are easy-to-parse tabular text.
There are in fact three tables: Concepts, Descriptions and Relationships.
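As a rough sketch of working directly with the RRF files: they are pipe-delimited text with a trailing pipe on each line. The column list below is my understanding of the MRREL layout and `SNOMEDCT_US` of the source abbreviation for the US edition; verify both against the UMLS documentation for your release. The sample lines are made up for demonstration and are not real UMLS data.

```python
import csv

# Assumed MRREL column order; confirm against the UMLS Reference Manual.
MRREL_COLUMNS = [
    "CUI1", "AUI1", "STYPE1", "REL", "CUI2", "AUI2", "STYPE2", "RELA",
    "RUI", "SRUI", "SAB", "SL", "RG", "DIR", "SUPPRESS", "CVF",
]


def read_mrrel(lines, source="SNOMEDCT_US"):
    """Yield (CUI1, relation, CUI2) triples for rows from the given source."""
    reader = csv.reader(lines, delimiter="|", quoting=csv.QUOTE_NONE)
    for fields in reader:
        # zip() ignores the empty extra field produced by the trailing pipe.
        record = dict(zip(MRREL_COLUMNS, fields))
        if record["SAB"] == source:
            # Prefer the specific RELA label, fall back to the coarse REL.
            yield record["CUI1"], record["RELA"] or record["REL"], record["CUI2"]


def sample_line(cui1, rel, rela, cui2, sab):
    """Build a made-up MRREL line with 16 fields and a trailing pipe."""
    fields = [cui1, "", "CUI", rel, cui2, "", "CUI", rela, "", "", sab, sab, "", "", "N", ""]
    return "|".join(fields) + "|"


# In practice `lines` would be an open MRREL.RRF file object.
lines = [
    sample_line("C0000001", "CHD", "isa", "C0000002", "SNOMEDCT_US"),
    sample_line("C0000003", "RO", "", "C0000004", "MSH"),
]
edges = list(read_mrrel(lines))
```

MRREL's direction conventions are easy to misread, so check which concept is the source and which the target before creating relationships from these triples.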
You'll find them described here:
https://confluence.ihtsdotools.org/display/DOCTIG/3.1.+Components
Most important are the relations between Relationships and Concepts and Descriptions and Concepts.

Smartest way to import massive datasets into a Rails application?

I've got multiple massive (multi gigabyte) datasets I need to import into a Rails app. The datasets are currently each in their own database on my development machine, and I need to read from them and create rows in tables in my Rails database based on the information they contain. The tables in my Rails database will not be exactly the same as the tables in the source databases.
What's the smartest way to go about this?
I was thinking migrations, but I'm not exactly sure how to connect the migration to the databases, and even if that is possible, is that going to be ridiculously slow?
Without seeing the schemas or knowing the logic you want to apply to each row, I would say the fastest way to import this data is to create a view of the table you want to export, with the columns in the order you want (doing any processing in SQL), and then do a SELECT ... INTO OUTFILE on that view. You can then take the resulting file and import it into the target db.
This will not allow you to use any rails model validations on the imported data, though.
Otherwise, you have to go the slow way and create a model for each source db/table to extract the data (http://programmerassist.com/article/302 tells you how to connect to a different db for a given model) and import it that way. This is going to be quite slow, but you could set up an EC2 monster instance and run it as fast as possible.
Migrations would work for this, but I wouldn't recommend it for something like this.
Since georgian suggested it, I'll post my comment as an answer:
If the changes are superficial (column names changed, columns removed, etc), then I would just manually export them from the old database and into the new, and then run a migration to change the columns.

Rails: Script to import data

I have a couple of scripts that generate & gather large amounts of data that I will need both to seed my database with and in the future to add large amounts more to it. What is the best way to import lots of relational data into a rails database as both seed data and intermittently during production?
I haven't settled on an output format for my script yet but the data's structure largely mirrors my rails model's and contains has_many associations that I would like the import to preserve.
I've googled a fair bit and seen ar-extensions and seed_fu as well as the idea of using fixtures.
With ar-extensions all the examples seem to be straightforward CSV imports (likely from table dumps, which seems to be its primary use case) with no mention of handling associations or avoiding duplicate updates. In my case I have no ids, foreign keys, or join tables in my script, so this seems like it wouldn't work for me unless I was prepared to handle that complexity myself.
With seed_fu, it looks like it could handle the relational aspects of the data creation but would still require me to specify ids (how can you know which ones are available in production?) and mix code with data.
Fixtures also have the same id problem though now it requires objects to be named (I'd probably just end up using numbers for names) and I am not sure how I would avoid accidental duplications of records.
Or would I be better off just putting my data into a local SQLite db first and then using the straight table-dumping techniques?
I've done this before with CSV files. I have a cron job that collects the data and puts it in accessible CSV files. Then I have a rake task (also in cron, but on another box) that looks for CSVs, and if there are any, calls model methods to create objects out of the CSV rows.
The models have a csv_create_or_update method that takes a CSV row. This technique sidesteps the id issue, and also allows validations to run (or not) on the data coming in.
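The answer's `csv_create_or_update` is a Rails model method whose code isn't shown. The same idea, matching on a natural key instead of a database id, can be sketched in Python with `sqlite3`; the `products` table, its columns and the `sku` key are hypothetical.

```python
import csv
import io
import sqlite3


def csv_create_or_update(conn, row):
    """Update the record with the same natural key, or create it if absent."""
    params = (row["name"], float(row["price"]), row["sku"])
    cursor = conn.execute("UPDATE products SET name = ?, price = ? WHERE sku = ?", params)
    if cursor.rowcount == 0:
        # No existing record matched the key, so insert a new one.
        conn.execute("INSERT INTO products (name, price, sku) VALUES (?, ?, ?)", params)
        return "created"
    return "updated"


conn = sqlite3.connect(":memory:")
# The database assigns ids; the CSV only carries the natural key `sku`.
conn.execute(
    "CREATE TABLE products ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, sku TEXT UNIQUE, name TEXT, price REAL)"
)

# Stand-in for a CSV file picked up by the scheduled task.
csv_text = "sku,name,price\nA1,Widget,9.50\nB2,Gadget,4.00\nA1,Widget,8.25\n"
results = [csv_create_or_update(conn, row) for row in csv.DictReader(io.StringIO(csv_text))]
conn.commit()
stored = conn.execute("SELECT id, sku, name, price FROM products ORDER BY id").fetchall()
```

Because rows are matched on `sku`, rerunning the same file never duplicates records, and the import does not need to know which ids are free in production.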
I was handed a Rails app recently that required some imported data, and it was all handled in a rake task. All the data came in the form of CSV-formatted files and was handled per class/model as necessary. This worked out relatively well for me, being new to the system, as it was very easy to see where data was going and how it was being applied. As you import the data you can check for id conflicts/collisions and handle them accordingly.
