Neo4J ETL from SQL Server: Import from views

I'm currently trying to use the Neo4J ETL import (https://neo4j.com/developer/neo4j-etl/) to import some data from an existing SQL Server data warehouse into a new Neo4J graph database.
Everything is working fine, but interestingly enough, the Neo4J model transformation logic doesn't seem to take views into account. The description at the link above also mentions only three mapping rules:
A table with a foreign key is treated as a join and imported as a node with a relationship
A table with 2 foreign keys is treated as a join table and imported as a relationship
A table with >2 foreign keys is treated as an intermediate node and imported as a node with multiple relationships
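To make the three rules concrete, here is a small sketch that applies them to a schema's foreign-key metadata, using SQLite as a stand-in for SQL Server. It is not the ETL tool's code: the zero-foreign-key case ("plain node") is an assumption, since the page doesn't list it, and the table names are made up. It also illustrates one plausible reason views get no treatment: a view declares no foreign keys and is not listed as a table, so a purely FK-driven mapping has nothing to work with.

```python
import sqlite3

def classify(fk_count):
    """Map a table's foreign-key count to the import shape listed on the ETL page."""
    if fk_count == 0:
        return "node"  # assumption: the page does not describe this case
    if fk_count == 1:
        return "node with a relationship"
    if fk_count == 2:
        return "relationship (join table)"
    return "intermediate node with multiple relationships"

def mapping_plan(conn):
    """Classify every base table; views are skipped because only tables are listed."""
    plan = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name").fetchall()
    for (name,) in tables:
        # A composite foreign key yields several rows sharing one id, so count distinct ids.
        fk_ids = {row[0] for row in conn.execute(f'PRAGMA foreign_key_list("{name}")')}
        plan[name] = classify(len(fk_ids))
    return plan

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY);
    CREATE TABLE b (id INTEGER PRIMARY KEY);
    CREATE TABLE c (id INTEGER PRIMARY KEY, a_id INTEGER REFERENCES a(id));
    CREATE TABLE ab (a_id INTEGER REFERENCES a(id), b_id INTEGER REFERENCES b(id));
    CREATE VIEW v AS SELECT c.id, a.id AS a_id FROM c JOIN a ON c.a_id = a.id;
""")
print(mapping_plan(conn))
```

The view `v` never appears in the plan, even though it clearly encodes a relationship between `c` and `a`.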
Interestingly enough, I can't find any further information, and similar topics are several years old. It seems odd that this isn't possible, as I would guess that transferring an existing data warehouse to a graph database isn't anything out of the ordinary.
Unfortunately, I can't give much more information, as there just doesn't seem to be any more. Am I using the tool in a conceptually wrong way, or is this feature simply not worth mentioning?

Related

Join query in Cassandra

I am new to Cassandra. Although I can do some things in SQL, I am finding it pretty hard to do a simple join in Cassandra. My schema looks like this:
Now I need to find, for each department, how many emails in total were sent by the employees working there. The output should contain one row per department with the corresponding number of emails.
Maybe I am missing something simple, but no matter what I do, I can't even retrieve data from two tables at once.
Cassandra has no join operation. It was designed this way to maximize the performance of basic operations like reads and writes, with the caveat that each query reads from or writes to a single table.
If your model is relational, that is, based on relations between tables, then Cassandra is not the way to go. In that case you should use an RDBMS (Relational Database Management System) such as PostgreSQL, MySQL or SQL Server.
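If you do stay on Cassandra, the join has to happen in your application (or be avoided up front by denormalizing, for example by writing the department onto each email row). A minimal sketch of the application-side version, assuming the rows have already been fetched from two hypothetical tables, employees (employee_id, department) and emails (one sender employee_id per email):

```python
def emails_per_department(employees, email_senders):
    """Application-side join: count emails per department, including departments with none.

    employees: iterable of (employee_id, department) pairs
    email_senders: iterable of sender employee_ids, one entry per email
    """
    dept_of = dict(employees)
    counts = {dept: 0 for dept in dept_of.values()}
    for sender_id in email_senders:
        dept = dept_of.get(sender_id)
        if dept is not None:  # ignore emails from unknown senders
            counts[dept] += 1
    return counts

employees = [(1, "Sales"), (2, "Sales"), (3, "IT"), (4, "HR")]
email_senders = [1, 1, 2, 3, 99]
print(emails_per_department(employees, email_senders))
```

This reads every email row into the client, which is exactly the cost Cassandra's data model is meant to avoid; for a frequent query, a table keyed by department that is updated on each write is the more idiomatic design.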

Importing SNOMED CT into Neo4J

I need to import the SNOMED CT ontology into a graph database, in this case Neo4J, though it could eventually be another one.
However, I could not find a clear description of SNOMED CT's underlying relational data model that would let me do this, or at least simplified SQL views that expose the entity relationships in a way that can be mapped to a graph database.
I would greatly appreciate any guidance or previous experiences with this matter.
Directly trying to serialise the relational data model is probably going to be quite difficult and will take you further away from your goal.
It is worth noting that SNOMED data is actually available in RDF format already. So you get a graph structure for "free".
For example, this project provides the data in RDF format, and putting RDF data into a graph is quite simple whether you choose Titan or Neo4j.
Side Note:
A colleague of mine has actually worked on importing SNOMED data into a Grakn Graph, a semantic graph system we both work on. If you are interested, you can check out his work here. Grakn is a semantic graph solution which runs on top of Titan.
If you are looking for a sample of how to model the Concepts, Descriptions and Relationships in a graph database, I have a sample project on GitHub that can upload the SNOMED data into a Neo4j database.
https://github.com/pradeepvemulakonda/Snomed
Before you get into the implementation details, I would suggest trying out the following SNOMED data browser at
http://ontoserver.csiro.au/shrimp/
Once you get a feel for the concepts and relationships, you can go through the implementation. You can use the following gist to understand how to query the uploaded concepts and relationships in Neo4j.
https://neo4j.com/graphgist/95f4f165-0172-4b3d-981b-edcbab2e0a4b#listing_category=health-care-and-science
SNOMED can be loaded into MySQL using the UMLS (Unified Medical Language System) released by the NIH. Once it is loaded, the MRREL table contains all the relations between SNOMED nodes. If you want to load it straight into Neo4j, you can skip the MySQL step entirely and work directly with the UMLS RRF files. The documentation of the RRF format is not great, but the files are tabular text that is easy to parse.
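As an illustration of how little parsing the RRF files need, here is a sketch that turns MRREL rows into edge tuples for one source vocabulary. The column order is taken from the UMLS Reference Manual as I understand it and should be checked against MRFILES.RRF/MRCOLS.RRF in your release; the sample rows and CUIs are made up. The direction convention of REL/RELA is easy to get backwards, so check the UMLS documentation before choosing the arrow direction in Neo4j.

```python
# Assumed column order of MRREL.RRF; verify against MRCOLS.RRF in your UMLS release.
MRREL_COLUMNS = ["CUI1", "AUI1", "STYPE1", "REL", "CUI2", "AUI2", "STYPE2", "RELA",
                 "RUI", "SRUI", "SAB", "SL", "RG", "DIR", "SUPPRESS", "CVF"]

def mrrel_edges(lines, source="SNOMEDCT_US"):
    """Yield (CUI1, REL, RELA, CUI2) for MRREL rows that come from one source vocabulary."""
    for line in lines:
        fields = line.rstrip("\r\n").split("|")
        if fields and fields[-1] == "":
            fields.pop()  # every RRF row ends with a trailing '|'
        row = dict(zip(MRREL_COLUMNS, fields))
        if row.get("SAB") == source:
            yield row["CUI1"], row["REL"], row["RELA"], row["CUI2"]

# Made-up rows: the first is from SNOMEDCT_US, the second from another vocabulary.
sample = [
    "C0000001|A1|SCUI|CHD|C0000002|A2|SCUI|isa|R1||SNOMEDCT_US|SNOMEDCT_US|0|Y|N||\n",
    "C0000001|A1|SCUI|RO|C0000003|A3|SCUI||R2||MSH|MSH|0|Y|N||\n",
]
edges = list(mrrel_edges(sample))
print(edges)
```

With a real file, pass `open("MRREL.RRF", encoding="utf-8")` instead of the list; the generator streams rows, so the file never has to fit in memory.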
There are in fact three tables: Concepts, Descriptions and Relationships.
You'll find them described here:
https://confluence.ihtsdotools.org/display/DOCTIG/3.1.+Components
The most important links are those between Relationships and Concepts, and between Descriptions and Concepts.
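As a sketch of the Relationships-to-Concepts link, the snippet below reads an RF2 Relationship file (tab-separated, with a header row of id, effectiveTime, active, moduleId, sourceId, destinationId, relationshipGroup, typeId, characteristicTypeId, modifierId) and keeps the active rows as (sourceId)-[typeId]->(destinationId) edges between concept nodes. Apart from 116680003, the concept id of the "Is a" attribute, all ids in the sample are made up; check the column list against the release notes of your edition.

```python
import csv

IS_A = "116680003"  # SNOMED CT concept id of the "Is a" relationship type

def concept_edges(lines):
    """Yield (sourceId, typeId, destinationId) for each active RF2 relationship row."""
    # RF2 files are tab-separated and unquoted, so disable csv's quote handling.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if row["active"] == "1":
            yield row["sourceId"], row["typeId"], row["destinationId"]

header = "\t".join(["id", "effectiveTime", "active", "moduleId", "sourceId",
                    "destinationId", "relationshipGroup", "typeId",
                    "characteristicTypeId", "modifierId"])
sample = [
    header,
    "\t".join(["1", "20200131", "1", "900", "1001", "1002", "0", IS_A, "9001", "9002"]),
    "\t".join(["2", "20200131", "0", "900", "1001", "1003", "0", IS_A, "9001", "9002"]),
    "\t".join(["3", "20200131", "1", "900", "1001", "1004", "1", "2001", "9001", "9002"]),
]
edges = list(concept_edges(sample))
is_a_edges = [edge for edge in edges if edge[1] == IS_A]  # the concept hierarchy
```

With a real snapshot file, pass `open(path, newline="", encoding="utf-8")` instead of the list. Descriptions can be handled the same way, attaching each active description's term to its conceptId.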

Data Partitioning in Neo4j

I'm playing around with neo4j - seeing what I can and can't do with it before suggesting it for something serious. One of the things I'm looking at now is Data Partitioning. By this I mean having a single data store that contains data from many different customers, and knowing which customer the data belongs to.
In the SQL world, we've always done this by having a customer_id field on the tables that are customer-specific, and then always including it in the queries and indices. This works perfectly well for us, but in the graph DB world it feels like we can do better.
The options that I've come up with so far are:
The same as before - including a property on the nodes that is the Customer ID
Storing a Label on each Node that identifies the Customer. However, as far as I can tell you can't bind parameters to labels so this would mean that the queries are generated slightly awkwardly.
Storing a Customer Node, and linking all of the other nodes to it.
Option #3 seems to be the "correct" graph DB way of managing this, but I'm concerned about its impact on performance. It's perfectly feasible that there will be hundreds of thousands of links from a single Customer Node to the other data nodes, and there will be hundreds of different Customer Nodes (based on the volume of data in the existing SQL database).
What's the recommended way of achieving this level of data partitioning whilst maintaining performance?
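To show what each option means for query generation, here is a sketch that builds the Cypher text for all three. The `Order` label, the `customerId` property and the `OWNS` relationship type are hypothetical names. The point of option 2 is the awkwardness mentioned above: since the label can't be a bound parameter, it has to be spliced into the query string, so the sketch whitelists its characters and backtick-quotes it to keep user input from injecting Cypher.

```python
import re

_LABEL = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def query_by_property():
    # Option 1: the customer id is an ordinary property, so it can be a bound parameter.
    return "MATCH (o:Order {customerId: $customerId}) RETURN o"

def query_by_label(customer_label):
    # Option 2: labels can't be parameters, so the label is spliced into the query text.
    # Whitelist its characters and backtick-quote it to prevent Cypher injection.
    if not _LABEL.fullmatch(customer_label):
        raise ValueError(f"unsafe label: {customer_label!r}")
    return f"MATCH (o:Order:`{customer_label}`) RETURN o"

def query_by_customer_node():
    # Option 3: look up the customer node, then expand only its relationships.
    return "MATCH (:Customer {id: $customerId})-[:OWNS]->(o:Order) RETURN o"

print(query_by_label("Customer_42"))
```

Options 1 and 3 keep the query text constant, so the database can cache one plan per query shape. Option 2 produces a different query text per customer.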

Is Core Data a kind of graph database?

I have to develop a big application that requires graph database concepts, such as those described at http://sparsity-technologies.com/UserManual/API.html#transactions. I am planning to use Core Data instead of the framework at the link above. I would like answers to the following questions.
1) What exactly is a graph database? Please explain with a simple, general example of something we cannot do with SQLite.
2) Is Core Data a relational database or not? Please explain.
3) Is Core Data a graph database? Apple's documentation says that Core Data is for object graph management, and object graph management means a graph database. If I want to create relationships and weighted edges between objects, is Core Data suitable?
1) What exactly is a graph database? Please explain with a simple, general example of something we cannot do with SQLite.
Well, since this is all Turing complete, you can do any database operation with any other database; the real question is one of efficiency.
In conventional "relational" databases the "relationships" are nothing but pointers to entries in other tables. They don't inherently communicate any information other than "A is connected to B." To capture and structure anything more complex than that, you have to build a lot of pseudo-structure.
A1-->B1 // e.g. first-name, last-name
That is fine, but the relationship doesn't necessarily have a reciprocal, nor does the data in each table cell have to be names. To make the relationship always make sense, you've got to build a lot of logic to put the data into the tables directly. Ditto for getting it out.
In a GraphDB you have "nodes" and "relationships". Nodes are not entries in a table. They can be arbitrarily complex objects, persisted or not, and persisted in a variety of ways. Nodes generally model some "real-world" object like a person.
"Relationships" in GraphDBs really need another name, owing to the term's prior meaning in SQL et al., because instead of being simple pointers, they too can be arbitrarily complex objects. In a graph of names (way too simple to actually justify one):
Node-Name-A--(comes before)-->Node-Name-B
Node-Name-B--(comes after)-->Node-Name-A
In SQLite, to find first and last names you query both tables. In a graph, you grab one of the nodes and follow its relationship to the other node.
(Come to think of it, graph theory in math started out as a way to model the bridges of Königsberg connecting the islands that made up the city. So maybe a transportation map would be a better example.)
If cities are nodes, the roads are relationships. The road objects/descriptors would not just connect the two; they would contain their own logic and data, such as their direction, length, conditions, traffic, susceptibility to weather, and so on.
When you want to fetch the optimum route between two widely separated city nodes for a particular time, traffic, weather, etc., you start with the node representing the start city and then follow the relationship/road descriptors. In a complex model, any two nearby city nodes might have several roads connecting them, each best in certain circumstances.
Computationally, though, all you have to do is compare the relationships between any two nodes. This is called "walking the graph." The huge benefit is that no matter how big the overall DB is, you only have to process the relationships coming out of the first node, say 3 of them, and can utterly ignore the millions of other nodes and relationships that might be in the DB.
You can't do that in SQLite: the more data and the more "relationships" there are, the more you have to process.
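The road-map idea can be sketched in a few lines: roads are relationship objects carrying their own data, and a route search (Dijkstra's algorithm here, one standard choice rather than anything specific to a graph database) only ever looks at the roads leaving the city it is currently visiting. The `Road` fields and the cost functions are hypothetical; swapping the cost function is how "best in certain circumstances" shows up in code.

```python
import heapq
from dataclasses import dataclass

@dataclass(frozen=True)
class Road:
    """A relationship object: it links two city nodes and carries its own data."""
    start: str
    end: str
    length_km: float
    traffic_factor: float = 1.0  # hypothetical: >1 means slower than free-flowing traffic

def build_graph(roads):
    """Adjacency list: each city maps to the roads leaving it (one-way roads)."""
    graph = {}
    for road in roads:
        graph.setdefault(road.start, []).append(road)
    return graph

def best_route(graph, origin, destination, cost):
    """Dijkstra's algorithm: only the roads leaving each visited city are examined."""
    queue = [(0.0, origin, [origin])]
    settled = set()
    while queue:
        total, city, path = heapq.heappop(queue)
        if city == destination:
            return total, path
        if city in settled:
            continue
        settled.add(city)
        for road in graph.get(city, []):
            if road.end not in settled:
                heapq.heappush(queue, (total + cost(road), road.end, path + [road.end]))
    return None  # destination unreachable

roads = [Road("A", "B", 10.0), Road("B", "D", 10.0, traffic_factor=3.0),
         Road("A", "C", 15.0), Road("C", "D", 15.0)]
graph = build_graph(roads)
shortest = best_route(graph, "A", "D", cost=lambda r: r.length_km)
fastest = best_route(graph, "A", "D", cost=lambda r: r.length_km * r.traffic_factor)
print(shortest, fastest)
```

The shortest route goes through B, but once traffic on the B to D road is taken into account, the route through C wins. A two-way road would be modeled as two `Road` objects.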
2) Is Core Data a relational database or not? Please explain.
No, but if you hum a few bars you can fake it. By default, Core Data is an object graph, which means it does connect objects/nodes, but the relationships are not objects themselves; instead they are defined by information contained in the class of each object. E.g. you could have a Core Data model of the usual company, manager and employee.
CompanyClass
    set_of_manager_objects
    min_managers==1, max_managers==undefined
    delete_Company_Object_delete_all_manager_objects
    reciprocal_relationship_from_manager_is_company
ManagerClass
    one company object
    min_companies==1, max_companies==1
    delete_manager_object_nullify (remove from set in company class)
    reciprocal_relationship_from_company_is_manager
So Core Data is a kind of "missing link" in the evolution of GraphDBs. It has relationships, but they're not objects in themselves; they're inside the object/node. The relationship properties are hard-coded into the classes themselves, and only a few of their values can be changed. Still, Core Data does have the advantage of walking the graph. To find the employees of one manager at one company, you just start at the company object, go through a small set of managers to find the right one, then walk down to the employee set. Even if you had hundreds of companies, thousands of managers and tens of thousands of employees, you could find one employee out of tens of thousands with a couple of hops.
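To make the Company/Manager pseudo-structure concrete, here is a rough plain-Python analogy of a reciprocal relationship with a cascade delete rule on one side and a nullify rule on the other. It is not Core Data's API: in Core Data the inverse relationship and delete rules are declared in the model and maintained by the framework. The class names and the `context` set, which stands in loosely for a managed object context, are hypothetical, and the min/max counts are not enforced.

```python
class Company:
    def __init__(self, name):
        self.name = name
        self.managers = set()  # to-many relationship, inverse of Manager.company

    def add_manager(self, manager):
        # Keep both sides of the reciprocal relationship in sync.
        if manager.company is not None:
            manager.company.managers.discard(manager)
        manager.company = self
        self.managers.add(manager)

    def delete(self, context):
        # Delete rule "cascade": deleting the company deletes all its managers.
        for manager in list(self.managers):
            manager.delete(context)
        context.discard(self)

class Manager:
    def __init__(self, name):
        self.name = name
        self.company = None  # to-one relationship, inverse of Company.managers

    def delete(self, context):
        # Delete rule "nullify": remove this manager from its company's set.
        if self.company is not None:
            self.company.managers.discard(self)
            self.company = None
        context.discard(self)

context = set()
acme = Company("Acme")
ann, bob = Manager("Ann"), Manager("Bob")
context.update({acme, ann, bob})
acme.add_manager(ann)
acme.add_manager(bob)
ann.delete(context)  # nullify: Ann leaves Acme's manager set
remaining = {m.name for m in acme.managers}
acme.delete(context)  # cascade: Bob is deleted along with Acme
```

The relationship lives inside the two classes as attributes plus rules. No separate object represents the link itself.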
But you can fake a GraphDB by creating relationship objects and putting them between any two objects/nodes. Because Core Data allows any subclass of the entity named in a relationship definition to be in the same relationship set (e.g. ManagerClass --> LowManager, MidManager, HighManager), you can define a simple relationship in any given class and then populate it with objects of arbitrary complexity, as long as they are subclasses. These are usually termed "linking classes" or "linking relationships".
The normal pattern is to give the linking class a relationship to each of the two or more classes it might have to link (which can be generic as well; I've started class trees with a base class containing nothing but relationship properties, although there is a performance penalty if the tree gets huge).
If you give each node/object several relationships all defined on separate base linking classes, you can link the same nodes together in multiple ways.
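The linking-class pattern can be sketched the same way, again as a plain-Python analogy rather than Core Data code: every node has one generic relationship that can hold any subclass of a base `Link` class, so a subclass can carry data of its own, such as a weight. This also covers the "weighted edge" part of the question. All class names are hypothetical.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.links = []  # one generic relationship that can hold any Link subclass

class Link:
    """Base linking class: nothing but the two ends of the relationship."""
    def __init__(self, source, target):
        self.source, self.target = source, target
        source.links.append(self)
        target.links.append(self)

    def other(self, node):
        return self.target if node is self.source else self.source

class WeightedLink(Link):
    """A richer linking class: the relationship itself carries data."""
    def __init__(self, source, target, weight):
        super().__init__(source, target)
        self.weight = weight

class ReportsTo(Link):
    """A second kind of link between the same nodes, distinguished by its class."""

def neighbours(node, kind=Link):
    """Names of the nodes reachable over links of the given class (or its subclasses)."""
    return [link.other(node).name for link in node.links if isinstance(link, kind)]

a, b, c = Node("a"), Node("b"), Node("c")
WeightedLink(a, b, weight=2.5)
ReportsTo(a, c)
print(neighbours(a), neighbours(a, WeightedLink))
```

Filtering by link class is how the same nodes can be linked together "in multiple ways" and walked selectively.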
3) Is Core Data a graph database?
No, because the fundamental task of a database is persistence, saving the data. The fundamental task of Core Data is modeling the logic of the data inside the app.
Two different things. For example, when I start building a Core Data model, I start with an in-memory store, usually with test data. The model graph is built from scratch on every run, in memory, and never touches the disk. As it progresses, I shift to an XML store on disk so I can examine it if necessary. The XML and binary stores are written out in their entirety at once and read in the same way. Only at the end do I change the store to SQLite or something custom.
In a GraphDB, the nodes, relationships and the general graph are innately tied to the persistence system, AFAIK, and can't be altered. When you walk the graph, you walk the persistence every time (except for caching).
The usual question people ask is when to use Core Data and when to use SQL in the Apple Ecosystem.
The answer is pretty simple:
Core Data handles complexity inside the running app. The more complex the data model interactions, the more you get free with Core Data.
SQL-derived solutions handle large volumes of simple data: use them if the data model inside the app has little or no logic and there's a lot of data.
If your app is displaying something that would fit on a bunch of index cards, library book records, baseball cards etc., then an SQL solution is best, because the logic is just getting particular cards in and out of persistence.
If your app is a complex vector drawing app, where every document will be different and of arbitrary complexity, or you're modeling a V8 engine, then most of the logic is in the active data model while the app is running and persistence is trivial, so Core Data is the better choice.
Graph databases are catching on because our data is getting 1) really, really big and 2) increasingly complex. We need to model the complexity in the node-relationship graph in persistence, so we don't have to chew through the entire DB to find the data and then add an additional layer of logic.
Core Data is nothing but a data model layer; Core Data is NOT a database, and it is far from being a graph database.
Core Data only helps you create:
Tables (Entities)
Columns in a table (Attributes)
Relationships (such as primary key, foreign key, one-to-one, one-to-many)
Core Data typically uses SQLite to store data and run queries.
Core Data is used in iOS mobile apps; I believe what you want is a backend database solution.

How to load relational database tables into neo4j database

I have tables which have millions of records. I need to load these records as nodes in neo4j.
Please help me out on how to do it as I'm new to neo4j.
It is quite easy: just map the entities that should become nodes into one set of CSV files, and the connections that should become relationships into another set of files.
Then run them with my batch-importer: https://github.com/jexp/batch-import/tree/20#binary-download
import.sh nodes1.csv,nodes2.csv rels1.csv,rels2.csv
Add types and index information to the headers and the batch.properties config file as needed.
You can use the batch-importer for the initial import but also for subsequent updates (though the database has to be shut down for that).
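The first step, mapping tables into node and relationship CSV files, might look like the sketch below. It uses SQLite and two hypothetical tables (`person` with a `company_id` foreign key, and `company`). The headers are placeholders: the exact header syntax batch-import expects (types, index names, delimiter) is documented in its README and should be used instead.

```python
import csv
import io
import sqlite3

def export_for_import(conn):
    """Return (nodes_csv, rels_csv) text built from the hypothetical person/company tables."""
    nodes, rels = io.StringIO(), io.StringIO()
    node_writer, rel_writer = csv.writer(nodes), csv.writer(rels)
    # Placeholder headers; adapt them to the importer's documented header syntax.
    node_writer.writerow(["id", "label", "name"])
    rel_writer.writerow(["start", "end", "type"])
    # Prefix ids with the table name so ids from different tables can't collide.
    for pid, name in conn.execute("SELECT id, name FROM person ORDER BY id"):
        node_writer.writerow([f"person-{pid}", "Person", name])
    for cid, name in conn.execute("SELECT id, name FROM company ORDER BY id"):
        node_writer.writerow([f"company-{cid}", "Company", name])
    # Each non-null foreign key value becomes one relationship row.
    for pid, cid in conn.execute(
            "SELECT id, company_id FROM person WHERE company_id IS NOT NULL ORDER BY id"):
        rel_writer.writerow([f"person-{pid}", f"company-{cid}", "WORKS_AT"])
    return nodes.getvalue(), rels.getvalue()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT,
                         company_id INTEGER REFERENCES company(id));
    INSERT INTO company VALUES (1, 'Acme');
    INSERT INTO person VALUES (1, 'Ann', 1), (2, 'Bob', NULL);
""")
nodes_csv, rels_csv = export_for_import(conn)
```

For millions of rows, write to files opened with `open(path, "w", newline="")` instead of `StringIO`, so the export streams rather than building everything in memory.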
It is pretty easy to connect to your existing database using its driver, extract the information in the right shape and kind, and insert it into your graph model, using either Cypher statements with parameters or the embedded, transactional Java API for ongoing updates.
See: http://jexp.de/blog/2013/05/on-importing-data-in-neo4j-blog-series/
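For the "Cypher statements with parameters" route, a common pattern is to send rows in batches as a single list parameter and `UNWIND` it on the server. Below is a minimal Python sketch of that pattern (the Java API route isn't shown). The `Person` label and the row shape are hypothetical, and `run` is any callable that executes a query with parameters. With the official neo4j Python driver it could be something like `lambda q, p: session.run(q, p).consume()`, but treat that wiring as an assumption to check against the driver version you use.

```python
from itertools import islice

# One parameterized statement; the batch arrives as the $rows list parameter.
UPSERT = (
    "UNWIND $rows AS row "
    "MERGE (p:Person {id: row.id}) "
    "SET p.name = row.name"
)

def batches(rows, size):
    """Yield lists of at most `size` rows from any iterable, without loading it all."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def load(rows, run, batch_size=1000):
    """Send rows to the graph in parameterized batches; returns the number of rows sent."""
    sent = 0
    for chunk in batches(rows, batch_size):
        run(UPSERT, {"rows": chunk})
        sent += len(chunk)
    return sent

# Usage with a stand-in for the database call that just records the parameters.
calls = []
rows = ({"id": i, "name": f"person {i}"} for i in range(5))
total = load(rows, lambda query, params: calls.append(params), batch_size=2)
```

Batching keeps each transaction a manageable size, and because the query text never changes, the database can reuse its plan for every batch.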
You can export to CSV and import it into Neo4j (this probably won't work well, since you have millions of records).
You can write a program to do it (this is what I am currently working on).
This also depends on which programming languages you know, but the bottom line is that because no two databases are created equal (unless on purpose), it's very difficult to create a catch-all solution for migrating data from SQL to Neo.
The best way that I've discovered so far is to create a program that queries the tables in the database, finds all related tables (i.e. via foreign keys), imports all those table rows into Neo, labeling the nodes with the table name, and then processes the foreign keys as relationships.
It's not easy. I've been working on something for my database here for a week or so now... but I'm close!
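The approach in the last answer can be sketched generically against SQLite's schema metadata: every row becomes a node labeled with its table name, and every foreign key becomes relationships between the matching rows. This is an illustration under stated assumptions, not the answerer's program. Only single-column foreign keys are handled, relationship types are derived from the foreign-key column name (a hypothetical convention), and nodes are kept in memory, so for millions of rows you would stream them instead.

```python
import sqlite3

def table_names(conn):
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table' "
                        "AND name NOT LIKE 'sqlite_%' ORDER BY name")
    return [r[0] for r in rows]

def primary_key(conn, table):
    # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
    return next((r[1] for r in conn.execute(f'PRAGMA table_info("{table}")') if r[5] == 1), None)

def to_graph(conn):
    """Rows become nodes labeled with their table name; foreign keys become relationships."""
    nodes = {}  # (table, row position) -> (label, properties)
    for table in table_names(conn):
        cur = conn.execute(f'SELECT * FROM "{table}"')
        cols = [d[0] for d in cur.description]
        for pos, row in enumerate(cur):
            nodes[(table, pos)] = (table, dict(zip(cols, row)))
    rels = []
    for table in table_names(conn):
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            ref_table, from_col = fk[2], fk[3]
            to_col = fk[4] or primary_key(conn, ref_table)  # no column means the primary key
            # Index the referenced table's nodes by the referenced column's value.
            targets = {props.get(to_col): key for key, (label, props) in nodes.items()
                       if label == ref_table}
            for key, (label, props) in nodes.items():
                value = props.get(from_col)
                if label == table and value is not None and value in targets:
                    rels.append((key, from_col.upper(), targets[value]))
    return nodes, rels

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT,
                         company_id INTEGER REFERENCES company(id));
    INSERT INTO company VALUES (1, 'Acme');
    INSERT INTO person VALUES (1, 'Ann', 1), (2, 'Bob', NULL);
""")
nodes, rels = to_graph(conn)
```

Each node and relationship tuple can then be written out with parameterized `CREATE`/`MERGE` statements or exported to CSV for a bulk importer.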
