Fast way to mockup hierarchical data easily - neo4j

I'm seeking a quick and easy solution to help mock/populate/test an org chart (in ASP MVC), with tree based or hierarchical data. I need the test data for this...
I have used both http://www.generatedata.com and mock-aro (both of which I like, but the MS SQL data from the site doesn't work: it has multiple syntax errors, including issues with dates and date-based data) and looked at Redgate, which is not affordable and never got the nested data right.
What is the fastest/least effort way/tool to mockup hierarchy data like an org chart, with dept, name, cost and employees?

There is an online tool capable of generating graphs : http://graphgen.graphaware.com
It is based on the Cypher spec.
A simple pattern expressing a department org chart could be defined like this:
(Dept1:Department {name:word}*5)<-[:PART_OF *1..n]-(subDept1:Department {name:word} *10)
(Dept2:Department {name:word} *5)<-[:PART_OF *1..n]-(subDept2:Department {name:word} *10)
(ssd1:Department {name:word} *20)-[:PART_OF *n..1]->(subDept1)
(ssd2:Department {name:word} *20)-[:PART_OF *n..1]->(subDept2)
(employee1:Person {name:fullName} *50)-[:WORKS_IN_DEPT *n..1]->(ssd1)
(employee2:Person {name:fullName} *50)-[:WORKS_IN_DEPT *n..1]->(ssd2)
You can preview the graph here (click Generate after the page has loaded): http://graphgen.graphaware.com/?graph=koWvmnBTW7JMR7
You can also import the graph data into your database (even a local one), create a Neo4j console, or export it in GraphJSON format.
Feel free to try the tool and adapt the pattern to your needs; the documentation is available here: http://graphgen.graphaware.com/documentation
You can also ping me on Twitter with further questions regarding graphgen: https://twitter.com/ikwattro
Chris
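If an online generator is not an option, a similar three-level hierarchy (departments, sub-departments, employees) can be mocked with a few lines of Python. This is only a sketch and not part of graphgen: the function names, the name lists, and the cost range are all made up for illustration. It flattens the tree into id/parent_id rows, which is one common shape for loading an org chart into a SQL table.

```python
import itertools
import random


def build_org_chart(n_top=2, n_sub=3, n_employees=4, seed=0):
    """Return a list of top-level departments, each with sub-departments and employees."""
    rng = random.Random(seed)  # seeded so the mock data is reproducible
    first = ["Ann", "Bob", "Cara", "Dev", "Eli", "Fay"]   # hypothetical sample names
    last = ["Ng", "Roy", "Diaz", "Kim", "Hall", "Moss"]
    chart = []
    for t in range(1, n_top + 1):
        dept = {"name": f"Dept{t}", "cost": 0, "children": []}
        for s in range(1, n_sub + 1):
            employees = [
                {"name": f"{rng.choice(first)} {rng.choice(last)}",
                 "cost": rng.randrange(40_000, 120_000, 1_000)}
                for _ in range(n_employees)
            ]
            # A sub-department's cost is the sum of its employees' costs.
            sub = {"name": f"Dept{t}.{s}", "employees": employees,
                   "cost": sum(e["cost"] for e in employees)}
            dept["children"].append(sub)
            dept["cost"] += sub["cost"]  # roll the cost up to the parent
        chart.append(dept)
    return chart


def to_rows(chart):
    """Flatten the tree into (id, parent_id, kind, name, cost) rows."""
    ids = itertools.count(1)
    rows = []
    for dept in chart:
        dept_id = next(ids)
        rows.append((dept_id, None, "Department", dept["name"], dept["cost"]))
        for sub in dept["children"]:
            sub_id = next(ids)
            rows.append((sub_id, dept_id, "Department", sub["name"], sub["cost"]))
            for emp in sub["employees"]:
                rows.append((next(ids), sub_id, "Person", emp["name"], emp["cost"]))
    return rows
```

With the defaults, to_rows(build_org_chart()) gives 32 rows (2 departments, 6 sub-departments, 24 people); each row could then be written out as a CSV line or an INSERT statement.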

Related

Are multiple vertex labels in Gremlin/Janusgraph possible, or is an alternative solution better?

I am working on an import runner for a new graph database.
It needs to work with:
Amazon Neptune - Gremlin implementation, has great infrastructure support in production, but a pain to work with locally, and does not support Cypher. No visualization tool provided.
Janusgraph - easy to work with locally as a Gremlin implementation, but requires heavy investment to support in production, hence using Amazon Neptune. No visualization tool provided.
Neo4j - Excellent visualization tool, Cypher language feels very familiar, even works with Gremlin clients, but requires heavy investment to support in production, and there appears to be no visualization tool that is anywhere nearly as good as the one found in Neo4j that works with Gremlin implementations.
So I am creating the graph where the Entities (Nodes/Vertices) have multiple Types (Labels), some being orthogonal to each other, as well as multi-dimensional.
For example, an Entity representing an order made online would be labeled as Order, Online, Spend, Transaction.
            | Spend     Chargeback
----------------------------------------
Transaction | Purchase  Refund
Line        | Sale      Return
Zooming into the Spend column.
         | Online      Instore
----------------------------------------
Purchase | Order       InstorePurchase
Sale     | OnlineSale  InstoreSale
In Neo4j and its Cypher query language, this proves to be very powerful for creating Relationships/Edges across multiple types without explicitly knowing what transaction_id values are in the graph :
MATCH (a:Transaction), (b:Line)
WHERE a.transaction_id = b.transaction_id
MERGE (a)<-[edge:TRANSACTED_IN]-(b)
RETURN count(edge);
The problem is that Gremlin/TinkerPop does not natively support multiple labels for its vertices.
Server implementations like AWS Neptune support this using a delimiter, e.g. Order::Online::Spend::Transaction, and the Gremlin client does support it for a Neo4j server, but I haven't been able to find an example where this works for JanusGraph.
Ultimately, I need to be able to run a Gremlin query equivalent to the Cypher one above:
g
.V().hasLabel("Line").as("b")
.V().hasLabel("Transaction").as("a")
.where("b", eq("a")).by("transaction_id")
.addE("TRANSACTED_IN").from("b").to("a")
So there are multiple questions here:
1. Is there a way to make JanusGraph accept multiple vertex labels?
2. If not possible, or this is not the best approach, should there be an additional vertex property containing a list of labels?
3. In the case of option 2, should the label name be the high-level label (Transaction) or the low-level label (Order)?
Is there a way to make JanusGraph accept multiple vertex labels?
No, there is not a way to have multiple vertex labels in JanusGraph.
If not possible, or this is not the best approach, should there be an additional vertex property containing a list of labels?
In the case of option 2, should the label name be the high-level label (Transaction) or the low-level label (Order)?
I'll answer these two together. Based on what you have described, I would create a single label, probably named Transaction, with properties such as Location (Online or InStore) and Type (Purchase, Refund, Return, Chargeback, etc.). From your description, you are really talking about a single entity, a Transaction; the other items you are using as labels (Online/InStore, Spend/Refund) are just additional metadata about how that Transaction occurred. This approach allows simple filtering on one or more of these attributes, which achieves anything that could be done with the multiple labels you are using in Neo4j.
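To make the single-label idea concrete, here is a small in-memory sketch in plain Python. It does not talk to JanusGraph; the Vertex class, the has helper, and the lowercase property names location and type are assumptions made for illustration. The filter mirrors what a traversal along the lines of g.V().hasLabel("Transaction").has("location", "Online").has("type", "Purchase") would express.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    """Hypothetical stand-in for a graph vertex: one label plus a property map."""
    label: str
    properties: dict = field(default_factory=dict)


def has(vertices, label, **props):
    """Return vertices with the given label whose properties match every filter."""
    return [
        v for v in vertices
        if v.label == label
        and all(v.properties.get(key) == value for key, value in props.items())
    ]


# What used to be labels (Online, Order, ...) becomes property values.
vertices = [
    Vertex("Transaction", {"transaction_id": 1, "location": "Online", "type": "Purchase"}),
    Vertex("Transaction", {"transaction_id": 2, "location": "InStore", "type": "Purchase"}),
    Vertex("Transaction", {"transaction_id": 3, "location": "Online", "type": "Refund"}),
]

# Equivalent of the former "Order" label: an online purchase.
online_purchases = has(vertices, "Transaction", location="Online", type="Purchase")
```

Filtering on one property (for example only location="Online") covers what a single extra label did before, and combining properties covers the multi-label cases.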

Neo4j entity relationship diagram

How to extract an entity relationship diagram from a graph database? I have all the required files that were created from my application.
You can use call db.schema for a graph representation of the graph data model. There are a few other procedures to get the properties, keys, indexes, etc., such as call db.indexes and call db.propertyKeys.
The APOC procedure library has a few relevant functions that might help to get a tabular layout - or develop it yourself in Excel from the labels, property keys, etc.
You can also build a data model using the Arrows tool.
Please reorient your thinking to use graph terms - the equivalent for the ER diagram would be a model built using the Arrows tool or the db.schema.
I used CALL db.schema.visualization to visualize the database schema. As https://stackoverflow.com/a/45357049/7924573 already said, in graph databases this is as close as you can get to an ER diagram. In the remote interface you can export it directly, e.g. as an .svg graphic.
Here is an example:

Data model of existing data in Neo4J

I have a small dataset loaded into Neo4J consisting of 6 node labels with about 20 nodes for each label, and there are about 10 different relationships. I was wondering if you can automatically create a picture of this data model using the data available in the database.
I would like to create something like this automatically from the data (example taken from http://neo4j.com/docs/stable/cypherdoc-movie-database.html).
I know that it would be quite simple doing it manually in this example but it could come in handy looking at more complex data models.
Any suggestions?
Thank you Michael, that helped. There is also functionality in the web tool that ships with Neo4J that can do something similar, although less graphically.
You click on the little bubbles in the top left corner of the interface, and there is a predefined query that extracts all labels and relationships from the graph.

surveymonkey Where is qtype and respondent_id in the get_survey_details extract?

I'm trying to replicate the survey monkey relational database format (A relational database view of your data with a separate file created for each database table. Knowledge of SQL (Structured Query Language) is necessary.) to download responses for our reporting analytics using the Survey Monkey API. However I'm not able to find the QType and respondent_id data in the get_survey_details API extract method. Can someone help?
1. QType is found in the Questions.xls data in the current relational database format download.
I was able to find all of the other data in the Questions.xls data in the get_survey_details API (question_id, page_id, position, heading) but not QType.
2. Respondent_id is found in the Responses.xls data in the relational database format download.
I can see that respondent_id is in the get_responses API method, but that does not have the associated Key1 data that I also need. Key1 data is answer_id data in the get_survey_details API, which is why I expected to find the corresponding respondent_id there as well.
SurveyMonkey's deprecated relational database download (RDD) format and API provide data using very different paradigms. Using the API to recreate the RDD format in order to work with an old integration is probably a poor use of time. A more productive idea would be to use the API to build a more modern integration from the ground-up taking advantage of things like real-time data availability to modernize the functionality. But if you're determined:
You will need to map the family and subtype of the question type to the QTypes you're used to. The information you need to build the mapping can be found on SurveyMonkey's developer portal in Data Types.
get_responses returns answer_id as row and/or col. For matrix question types you will have both, and they cross-reference to the answers and answer items from get_survey_details. For matrix questions, you might consider concatenating the row and col to create a single unique key value like the Key1 you're accustomed to.
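The concatenation idea can be sketched roughly like this. It assumes each answer has already been parsed into a Python dict that may contain a "row" key, a "col" key, or both; the make_key name and the "_" separator are arbitrary choices for illustration, not anything defined by the SurveyMonkey API.

```python
def make_key(answer, sep="_"):
    """Build one key from an answer's row and/or col ids (hypothetical helper).

    Matrix answers carry both ids and get "row_col"; other answers carry
    only one of them and the key is just that id.
    """
    parts = [
        str(answer[name])
        for name in ("row", "col")
        if answer.get(name) not in (None, "")
    ]
    return sep.join(parts)
```

For example, an answer with row "123" and col "456" yields "123_456", while an answer with only a row yields "123". Pick a separator that cannot appear inside the ids themselves so the key stays unambiguous.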
I've done this. It got over the immediate need when the RDD format was withdrawn.
Now that I have more time, I'm looking at a better design but as always backwards compatibility with a large code base is the drag.
To answer your question on QType, see my reply at "What are the expected values for the various "ENUM" types returned by the SurveyMonkey API?"

oData - applying filters to SQL queries

I am relatively new to oData service and I am trying to explore if oData is feasible for my project.
In all the examples and demos that I have come across, the demo loads all the data into the repository first, and the oData filters are then applied over that data.
Is there a way to avoid loading all the data from SQL, i.e. to have the oData filters applied in the SQL query itself? Loading everything will obviously be highly inefficient with N requests coming in per second.
So for example if I had a movies service :
localhost:4502/OdataService/movies(55)
The above example is actually just filtering for movie id 55 from an "entire" set of movies. Is there a way to make this filter happen at the SQL level instead of bloating the memory first with all movies and then allowing oData to filter it?
Can anyone guide me in the right direction?
I found out after doing a small POC that Entity Framework takes care of building a dynamic query based on the request.
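As a language-neutral illustration of what "filtering at the SQL level" means, here is a small Python sketch using an in-memory SQLite database. It is not how Entity Framework works internally; the fetch_movie helper, the table layout, and the sample titles are all assumptions. The point is only that the key from a path such as movies(55) ends up in a WHERE clause, so the database returns one row instead of the whole table.

```python
import re
import sqlite3


def fetch_movie(conn, resource_path):
    """Translate a key lookup such as 'movies(55)' into a parameterized SQL query."""
    match = re.fullmatch(r"movies\((\d+)\)", resource_path)
    if match is None:
        raise ValueError(f"unsupported path: {resource_path!r}")
    movie_id = int(match.group(1))
    # The filter runs inside the database; only the matching row reaches memory.
    cursor = conn.execute("SELECT id, title FROM movies WHERE id = ?", (movie_id,))
    return cursor.fetchone()


# Hypothetical sample data standing in for the real movies table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO movies VALUES (?, ?)",
    [(54, "A"), (55, "B"), (56, "C")],
)
```

Here fetch_movie(conn, "movies(55)") returns the single row (55, 'B'), and a missing id returns None. An IQueryable-based OData service achieves the same effect by composing the request's filter into the query before it is executed.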
