how to define an index in the graphML format? - neo4j

i exported my data into graphML format, and want to import them into neo4j via gremlin's graphML.import() function. i need to create indexes to index all my imported data. is it even possible in the graphML format?
my export xml looks like this:
<graphml xmlns="http://graphml.graphdrawing.org/xmlns"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns
http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
<key id="user" for="node" attr.name="user" attr.type="int" />
<key id="item" for="node" attr.name="item" attr.type="int" />
<graph id="G" edgedefault="directed">
....
</graph>
</graphml>

No,
GrapmML does not contain this. You could enable autoindexes on the fields you want before you start the import, so they are recording the changes fro you? http://docs.neo4j.org/chunked/snapshot/auto-indexing.html

Related

Gremlin: Read edge GraphML file and node GraphML file in separate queries

I have two files that I want to load by using g.io(<name file>).read().iterate(): nodes.xml and edges.xml.
The nodes.xml file contains the nodes of the graph I want to upload, and its contents are this:
<?xml version='1.0' encoding='utf-8'?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
<key id="labelV" for="node" attr.name="labelV" attr.type="string" />
<key id="name" for="node" attr.name="name" attr.type="string" />
<key id="age" for="node" attr.name="age" attr.type="int" />
<graph id="G" edgedefault="directed">
<node id="1">
<data key="labelV">person</data>
<data key="name">marko</data>
<data key="age">29</data>
</node>
<node id="2">
<data key="labelV">person</data>
<data key="name">vadas</data>
<data key="age">27</data>
</node>
</graph>
</graphml>
The edges.xml file contains the edges of the graph I want to upload, and its content are this:
<?xml version='1.0' encoding='utf-8'?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
<key id="labelE" for="edge" attr.name="labelE" attr.type="string" />
<key id="weight" for="edge" attr.name="weight" attr.type="double" />
<graph id="G" edgedefault="directed">
<edge id="7" source="1" target="2">
<data key="labelE">knows</data>
<data key="weight">0.5</data>
</edge>
</graph>
</graphml>
I want to upload the nodes first by running g.io('nodes.xml').read().iterate() and then the edges by running g.io('edges.xml').read().iterate(). But when I upload the edges.xml, instead of adding edges to the previously created nodes, it creates new nodes.
It is possible to easily load the nodes first and then the edges in separate queries with a similar command in Gremlin? I know this can be accomplished with complex queries that involve reading and creating edge by edge the edges in the edges.xml file via user queries, but I'm wondering if there is something easier. Also, I wouldn't want to upload a single file with all the nodes and edges.
I'm afraid that the GraphMLReader doesn't work that way. It's not designed to read into an existing graph. I honestly can't remember if this was done purposefully or not.
The code isn't too complicated though. You could probably just modify it to work they way that you want. You can see here where the code checks the vertex cache for the id. That cache is empty on your second execution because it is only filled by way of new vertex additions - it doesn't remember any from your first run and it doesn't read from the graph directly for your second run. Simply change that to logic to better suit your needs.

Specify an edgeLabel in graphML that tinkerpop can understand

I have been struggling to load a graphml in to Tinkerpop3.
Graph graphMLGraph = TinkerGraph.open();
graphMLGraph.io(IoCore.graphml()).readGraph(file.getAbsolutePath());
While loading, I want the the edges to have a label.
graphTraversalSource.E().toStream().forEach(edge -> {
System.out.println(edge.label());
});
The above code always prints the labels as edge for all the edges in the graphml. My graphml snippets.
<edge id="1" source="1" target="3">
<data key="edgelabel">belongs-to</data>
<data key="weight">1.0</data>
</edge>
<edge id="2" source="1" target="4">
<data key="weight">1.0</data>
<data key="edgelabel">part-of</data>
</edge>
And the key definition
<key attr.name="Edge Label" attr.type="string" for="edge" id="edgelabel"/>
I am use DSE 5.1.3's java driver and tinkerpop 3.2.5 is used via a transitive dependency and used Gephi to author the graphml.
By default, your edge label will be recognized if you do define the key as:
<key id="labelE" for="edge" attr.name="labelE" attr.type="string" />
The important part being that the attr.name is defaulted to "labelE". See the IO Reference documentation for GraphML here. Note that the default can be changed when you instantiate the GraphMLReader.Builder object by setting the edgeLabelKey value on the builder itself.

Nosqlunit-neo4j unable to assign label to Node in Spring Data Neo4j 4

It seems that Nosqlunit-neo4j is not compatible with SDN 4 since TypeRepresentationStrategy is removed. It adds the node defined in following graphml xml file into test database but doesn't assign it a label due to which repository.count() returns 0. However, if I query the database natively, then it does fetches the node without any Label.
<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
<key id="__type__" for="node" attr.name="__type__" attr.type="string"></key>
<key id="productId" for="node" attr.name="productId" attr.type="string"></key>
<graph id="G" edgedefault="directed">
<node id="3">
<data key="__type__">com.my.package.Product</data>
<data key="productId">100001235</data>
<index name="__types__" key="className">com.my.package.Product
</index>
</node>
</graph>
</graphml>
Does anyone facing the same issue?
If you use the label Product directly it should work.
You don't need the index or the __type__ properties anymore.

How to retrieve specific tag from an xml file in spring batch in java

This is what I need to implement. I need to create a batch which:
1. READ: will read mutiple xml files from a folder
2. PROCESS: extract the values of some tags from one of these xmls (as explained below)and save the extracted data in a DB.
3. WRITE: move the processed xml file as it is to another directory.
There are no repeating tags in the xml.
For example if this is my XML:
<?xml version="1.0" encoding="UTF-8"?>
<report>
<info>
<ssn>5214365214356</ssn>
<name>abc</name>
<age>12</age>
<gender>male</gender>
</info>
<address>
<street>abc</street>
<city>atrdtysaf</city>
<state>abcsvc</state>
<country>USA</country>
</address>
<healthinfo>
<smoking>no</smoking>
<drinking>no</drinking>
</healthinfo>
</report>
I want to extract the values of "ssn, gender and country tags only". Please note that the actual xml would be relatively huge. I am supposed to use StaxEvenItemReader provided by spring batch.

Import custom Atom feed in PowerPivot

Question: I have created a sample xml document containing data conforming to the atom 1.0 schema. When I import the contents of this file (for testing purposes) in PowerPivot, it creates columns for each atom element in each entry, instead of creating a column per content element. Why is this?
Background: A customer wants to import data from a web service which provides a feed that uses a custom XML schema that is not supported by PowerPivot. The service provides the ability for the caller to supply an XSLT template that will be applied to the feed. I am hoping to be able to transform this feed into a valid atom feed thereby allowing the customer to import data into PowerPivot.
Sample atom xml:
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices"
xmlns="http://www.w3.org/2005/Atom"
xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata">
<title type="text">My Data Feed</title>
<id>http://temp/feed</id>
<updated>2012-12-13T00:00:00Z</updated>
<entry>
<id>http://temp/feed/1</id>
<title type="text">Title</title>
<author>
<name>Author</name>
</author>
<updated>2012-12-13T00:00:00Z</updated>
<content type="application/xml">
<d:Name>John Smith</d:Name>
<d:Address>Address</d:Address>
<d:Zip>1234</d:Zip>
</content>
</entry>
</feed>
When imported into PowerPivot (selecting "From Data Feeds", clicking "Browse" and pointing out the xml file), it looks like this:
I was excpecting three columns: Name, Address and Zip. If I change "Include Atom Elements" from Auto to False in the connection configuration, no columns are imported.
It seems I was just missing the m:properties element. Final result - also includes examples of null attributes and data types:
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices"
xmlns="http://www.w3.org/2005/Atom"
xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata">
<title type="text">My Data Feed</title>
<id>http://temp/feed</id>
<updated>2012-12-13T00:00:00Z</updated>
<entry>
<id>http://temp/feed/1</id>
<title type="text">Title</title>
<author>
<name>Author</name>
</author>
<updated>2012-12-13T00:00:00Z</updated>
<content type="application/xml">
<!-- attributes placed under the properties element -->
<m:properties>
<d:Name>John Smith</d:Name>
<d:Address>Address</d:Address>
<d:Zip m:type="Edm.Int32">1234</d:Zip>
<d:Comment m:null="true" />
</m:properties>
</content>
</entry>
</feed>

Resources