Adding edge properties using DseGraphFrame API - datastax-enterprise

If I want to add edges with properties as data frames, what format should the properties data be in? I'm reading these docs, but they don't say anything about the format of the property columns.
For instance:
Dataset<Row> edgesToAdd = sparkDataSetContainingData
.select(
dseGraphFrame.idColumn(lit(srcLabel), col("sourceName")).as("src"),
dseGraphFrame.idColumn(lit(destLabel), col("destinationName")).as("dst"),
lit(inputEdgeLabel).as("~label"),
dseGraphFrame.idColumn("some_property_key", col("some_property_value")) // is this correct?
);
dseGraphFrame.updateEdges(edgesToAdd, true);

That should be correct - you must have 3 columns: src, dst & ~label. The first two are generated using the idColumn function, which accepts the vertex label and vertex ID as parameters. You can see that in the following example from DataStax-Examples.
There are also many resources available on that topic:
blog post introducing DSE GraphFrames
blog post about best practices working with DSE GraphFrames
presentation from Graph Day 2018
DS332 course on DSE Graph Analytics


Does odata v4 support aggregation on date values?

I am looking for an OData query syntax that solves Sum(DateDiff(minute, StartDate, EndDate)), which we do in SQL Server. Is it possible to do such things using OData v4?
I tried the aggregate function but was not able to use the sum operator on the duration type. Any ideas?
You can't execute a query like that directly against a standards-compliant v4 service, because the built-in aggregates all operate on single fields. For instance, there is no support for creating a new arbitrary column to project the results into, mainly because the new column would be undefined. By restricting the specification to columns that are pre-defined in the resource itself, we can have a strong level of certainty about the structure of the data that will be returned.
If you are the author of the API, there are three common approaches that can achieve a query similar to your request.
Define a Custom Data Aggregate. This is far more involved than is necessary, but it means you could define the aggregate once and use it in many resource queries.
Only research this solution if you truly need to reuse the same aggregate on multiple resources.
Define a Custom Function to compute the result of all or some elements in your query.
Think of a Function as similar to a SQL View. It is really just a way of expressing a custom query and a custom response object that is associated with a resource.
It is common to use Functions to apply complex filter conditions that still return the resource that they are bound to, but you can return an entirely different structure of data if you want.
Exploit Open Types. This can sometimes be more effort than you expect, but it is manageable if there is only a small number of common transformations that you want to apply to the resource and project as discrete properties in addition to the standard resource definition.
In your case you could project DateDiff(minute, StartDate, EndDate) into its own discrete column, perhaps called Minutes or Duration. Then you could $apply a simple SUM across this new field.
Exposing a custom Function is usually the least-effort approach: you are not constrained by the shape of the result at all, and it can be maintained in relative isolation from the main resource. As with Open Types, the useful thing about Functions is that the caller can still apply OData aggregates to the result.
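To make the target computation concrete, here is a minimal Java sketch of what the projected Minutes property and the subsequent SUM would compute on the server side. The Booking and DurationSum classes are hypothetical stand-ins for a resource row and a Function body; they are not part of any OData library.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for one row of the resource (not an OData type).
class Booking {
    final LocalDateTime startDate;
    final LocalDateTime endDate;

    Booking(LocalDateTime startDate, LocalDateTime endDate) {
        this.startDate = startDate;
        this.endDate = endDate;
    }

    // The projected property: whole elapsed minutes between start and end.
    // Note: SQL Server's DATEDIFF(minute, ...) counts minute boundaries crossed,
    // so results can differ when the timestamps carry seconds.
    long minutes() {
        return Duration.between(startDate, endDate).toMinutes();
    }
}

class DurationSum {
    // The simple SUM applied across the projected property.
    static long totalMinutes(List<Booking> rows) {
        long total = 0;
        for (Booking row : rows) {
            total += row.minutes();
        }
        return total;
    }

    public static void main(String[] args) {
        List<Booking> rows = Arrays.asList(
            new Booking(LocalDateTime.of(2020, 1, 1, 9, 0), LocalDateTime.of(2020, 1, 1, 10, 30)),
            new Booking(LocalDateTime.of(2020, 1, 2, 14, 0), LocalDateTime.of(2020, 1, 2, 14, 45)));
        System.out.println(totalMinutes(rows));
    }
}
```

Whether this logic lives in a Function or behind an Open Type property, the point is that the per-row difference is computed first and the aggregate is applied to that single, well-defined field.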
If the original post is updated with some more detailed code examples, I can elaborate on the function implementation. In its current state, I hope this information sets you on the right path.

Neo4j entity relationship diagram

How do I extract an entity relationship diagram from a graph database? I have all the required files that were created by my application.
You can use call db.schema for a graph representation of the graph data model. There are a few other procedures to get the properties, keys, indexes, etc., such as call db.indexes and call db.propertyKeys.
The APOC procedure library has a few relevant functions that might help to get a tabular layout - or develop it yourself in Excel from the labels, property keys, etc.
You can also build a data model using the Arrows tool
Please reorient your thinking to use graph terms - the equivalent for the ER diagram would be a model built using the Arrows tool or the db.schema.
I used CALL db.schema.visualization to visualize the database schema. As https://stackoverflow.com/a/45357049/7924573 already said, in graph databases this is as close as you can get to ER diagrams. In the remote interface you can export it directly, e.g. as an .svg graphic.

Fast way to mockup hierarchical data easily

I'm seeking a quick and easy solution to help mock/populate/test an org chart (in ASP MVC), with tree based or hierarchical data. I need the test data for this...
I have used both http://www.generatedata.com and mock-aro (both of which I like, but the MS SQL data from the site doesn't work: it has multiple syntax errors, including issues with dates and date-based data) and looked at redgate, which is not affordable and also never got the nested data right.
What is the fastest/least effort way/tool to mockup hierarchy data like an org chart, with dept, name, cost and employees?
There is an online tool capable of generating graphs: http://graphgen.graphaware.com
It is based on the Cypher spec.
A simple pattern expressing a Department org chart could be defined like this:
(Dept1:Department {name:word}*5)<-[:PART_OF *1..n]-(subDept1:Department {name:word} *10)
(Dept2:Department {name:word} *5)<-[:PART_OF *1..n]-(subDept2:Department {name:word} *10)
(ssd1:Department {name:word} *20)-[:PART_OF *n..1]->(subDept1)
(ssd2:Department {name:word} *20)-[:PART_OF *n..1]->(subDept2)
(employee1:Person {name:fullName} *50)-[:WORKS_IN_DEPT *n..1]->(ssd1)
(employee2:Person {name:fullName} *50)-[:WORKS_IN_DEPT *n..1]->(ssd2)
You can see a graph preview here (click Generate after the page has loaded): http://graphgen.graphaware.com/?graph=koWvmnBTW7JMR7
You can also import the graph data into your database (even a local one), create a Neo4j console, or get the data in GraphJSON format.
Don't hesitate to try the tool and adapt the pattern to your needs. The documentation is available here: http://graphgen.graphaware.com/documentation
Also, you can ping me on Twitter at https://twitter.com/ikwattro with further questions regarding graphgen.
Chris

surveymonkey Where is qtype and respondent_id in the get_survey_details extract?

I'm trying to replicate the survey monkey relational database format (A relational database view of your data with a separate file created for each database table. Knowledge of SQL (Structured Query Language) is necessary.) to download responses for our reporting analytics using the Survey Monkey API. However I'm not able to find the QType and respondent_id data in the get_survey_details API extract method. Can someone help?
1. QType is found in the Questions.xls data in the current relational database format download.
I was able to find all of the other data in the Questions.xls data in the get_survey_details API (question_id, page_id, position, heading) but not QType.
2. Respondent_id is found in the Responses.xls data in the relational database format download.
I can see that respondent_id is in the get_responses API method but that does not have the associated Key1 data that I also need. Key1 data is answer_id data in the get_survey_details API which is why I expected to find the corresponding respondent_id there as well.
SurveyMonkey's deprecated relational database download (RDD) format and API provide data using very different paradigms. Using the API to recreate the RDD format in order to work with an old integration is probably a poor use of time. A more productive idea would be to use the API to build a more modern integration from the ground-up taking advantage of things like real-time data availability to modernize the functionality. But if you're determined:
You will need to map the family and subtype of the question type to the QTypes you're used to. The information you need to build the mapping can be found on SurveyMonkey's developer portal in Data Types.
get_responses returns answer_id as row and/or col. For matrix question types you will have both, which cross-reference to an answer and its answer items from get_survey_details. For matrix questions, you might consider concatenating the row and col to create a single unique key value like the Key1 you're accustomed to.
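A minimal sketch of the concatenation idea, assuming the row and col ids have already been read from a get_responses answer as strings. The AnswerKey class and the "_" separator are hypothetical choices for illustration, not part of the SurveyMonkey API.

```java
// Hypothetical helper: builds a single Key1-style value from row/col ids.
class AnswerKey {
    // Returns "row_col" when both ids are present (matrix questions),
    // otherwise whichever single id exists.
    static String compositeKey(String row, String col) {
        if (row == null && col == null) {
            throw new IllegalArgumentException("answer has neither row nor col");
        }
        if (row == null) {
            return col;
        }
        if (col == null) {
            return row;
        }
        // The separator is an assumption; pick one that cannot occur in an id.
        return row + "_" + col;
    }

    public static void main(String[] args) {
        System.out.println(compositeKey("101", "202"));
        System.out.println(compositeKey("101", null));
    }
}
```

Using an explicit separator keeps keys such as row 1 / col 23 and row 12 / col 3 from colliding after concatenation.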
I've done this. It got over the immediate need when the RDD format was withdrawn.
Now that I have more time, I'm looking at a better design but as always backwards compatibility with a large code base is the drag.
To answer your question on Qtype, see my reply at
What are the expected values for the various "ENUM" types returned by the SurveyMonkey API?

Understanding Neo4j, creating unique nodes

I'm trying to wrap my head around how Neo4j works and how I can apply it to my problem. I thought it should be really easy and a matter of minutes, but I'm stuck.
I have data in MongoDB, say User and Item. What I want is connecting User and Item in a graph with a LIKE relationship (maybe with a score). Later I want to do things like recommending items based on connections, basic stuff.
But how do I get the data into Neo4j? Every document in MongoDB has a unique _id, so I thought I could just throw both _ids into Neo4j and have them connected. What I found so far is that it's not even possible to have unique nodes based on the _id field (Neo4j has numeric incremented ids), which is only possible with some "hack" (https://github.com/jexp/app-net-graph/blob/master/lib/appnet.rb#L11) or using MERGE (I'm stuck on < 2.0). Even their examples on the website add the same node again if executed multiple times. I think I have a fundamental misunderstanding of how to use Neo4j. Maybe I'm too spoiled by redis, where I can put strings in and it just works. Redis' sets aren't feasible though for complex graphs, only for simple connections.
Maybe someone can help me with a simple cypher example of how to add two nodes foo and bar and have them connected with a LIKE connection. And the operation should be idempotent, no matter if none or all of the nodes/relationships already existed before execution.
I'm accessing Neo4j via REST, in particular using this node module https://github.com/thingdom/node-neo4j
You could define your external ID as an extra property on your nodes. How you then insert the data depends on whether or not you are using SpringData.
If you are using SpringData, you can configure your external ID as a unique index and then save your nodes as usual (bear in mind, though, that inserting a duplicate ID will overwrite the existing node).
If you are using the plain java API, you can create unique nodes as described here:
http://docs.neo4j.org/chunked/stable/tutorials-java-embedded-unique-nodes.html#tutorials-java-embedded-unique-get-or-create
EDIT:
As for a sample query, does this help you?
http://console.neo4j.org/?id=b0z486
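With the Java API you would do it like this (note that this creates new nodes on every run):
// Neo4j 1.9 embedded API: writes must run inside a transaction.
Transaction tx = graphDb.beginTx();
try {
    Node firstNode = graphDb.createNode();
    firstNode.setProperty( "externalID", "1" );
    firstNode.setProperty( "name", "foo" );
    Node secondNode = graphDb.createNode();
    secondNode.setProperty( "externalID", "2" );
    secondNode.setProperty( "name", "bar" );
    Relationship relationship = firstNode.createRelationshipTo( secondNode, RelTypes.Likes );
    tx.success();
} finally {
    tx.finish();
}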
I suggest you read some tutorials here: http://docs.neo4j.org/chunked/stable/tutorials-java-embedded-hello-world.html
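The question asks for an idempotent operation, which the get-or-create pattern from the unique-nodes tutorial linked above is meant to provide. The following is a minimal in-memory sketch of that idea only: LikeGraph and its methods are hypothetical and do not use the Neo4j API. It shows the behaviour to aim for, where the external ID is the lookup key and repeated runs leave the graph unchanged.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// In-memory stand-in for a graph; NOT the Neo4j API.
class LikeGraph {
    // Nodes keyed by their external ID (e.g. the MongoDB _id).
    private final Map<String, Map<String, String>> nodesByExternalId = new HashMap<>();
    // LIKE relationships stored as "fromId->toId"; a set ignores duplicates.
    private final Set<String> likes = new HashSet<>();

    // Returns the existing node for externalId, or creates it once.
    Map<String, String> getOrCreate(String externalId, String name) {
        return nodesByExternalId.computeIfAbsent(externalId, id -> {
            Map<String, String> node = new HashMap<>();
            node.put("externalID", id);
            node.put("name", name);
            return node;
        });
    }

    // Adds the LIKE relationship only if it is not already present.
    void like(String fromId, String toId) {
        if (!nodesByExternalId.containsKey(fromId) || !nodesByExternalId.containsKey(toId)) {
            throw new IllegalStateException("both nodes must exist first");
        }
        likes.add(fromId + "->" + toId);
    }

    int nodeCount() { return nodesByExternalId.size(); }
    int likeCount() { return likes.size(); }

    public static void main(String[] args) {
        LikeGraph graph = new LikeGraph();
        // Running the same import twice must not duplicate anything.
        for (int run = 0; run < 2; run++) {
            graph.getOrCreate("1", "foo");
            graph.getOrCreate("2", "bar");
            graph.like("1", "2");
        }
        System.out.println(graph.nodeCount() + " nodes, " + graph.likeCount() + " relationship(s)");
    }
}
```

In a real database the lookup would go through a unique index on the external ID rather than a HashMap, which is what the linked tutorial and the answers here suggest.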
Given that you are using Neo4j 1.9, have you tried creating a unique index on your _id field?
Try this article from the docs.
If you were using Neo4j 2, then this article is helpful.
