Importing Data to Neo4j

I am trying to import a database on Game of Thrones into Neo4j.
Github_link_to_the_data
I copied and pasted the code into the Neo4j Browser, but I keep getting errors.
Can someone please instruct me on how to import this data so I can start querying the database?
Here is the error I am getting:
Neo.ClientError.Statement.SyntaxError: Invalid input ')': expected whitespace, comment or an expression (line 630, column 3 (offset: 24452))
"CREATE (banner)-[:BANNERMAN_OF]->(euron);"
I would appreciate some help here.
Thanks!

The Neo4j Browser has only recently gained the ability to process multiple Cypher statements in the query editor (separated by ;), and there are still a couple of bugs being worked out as of 3.4.5.
Your best bet for processing these is cypher-shell: you can pipe in the input file and it will take care of the rest.
Check out this section of the docs, and pay attention to example 10.17 on how to pipe the input file.
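As a minimal sketch (the file name and credentials are placeholders for your setup), the piping looks like this:

# Run every ;-terminated statement in the file against the running database.
cat got-import.cypher | bin/cypher-shell -u neo4j -p <password> --format plain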

Related

ArangoDB - how to import neo4j database export into ArangoDB

Are there any utilities to import a database from Neo4j into ArangoDB? The arangoimp utility expects the data for edges and vertices to be in a different format than what Neo4j exports.
Thanks!
Note: this is not an answer per se, but a comment wouldn't allow me to structure the information I gathered in a readable way.
Resources online seem to be scarce with respect to the transition from Neo4j to ArangoDB.
One possible way is to combine APOC (https://github.com/neo4j-contrib/neo4j-apoc-procedures) and neo4j-shell-tools (https://github.com/jexp/neo4j-shell-tools):
1. Use APOC to create a Cypher export file for the database (see https://neo4j.com/developer/kb/export-sub-graph-to-cypher-and-import/).
2. Use the neo4j-shell-tools Cypher import with the -o switch; this should generate CSV files.
3. Analyse the CSV files, then either massage them with csvtool, or create JSON data with one of the numerous csv2json converters available (npm, ...) and massage those files with jq.
4. Feed the files to arangoimp, repeating step 3 if necessary.
There is also a GraphML-to-JSON converter (https://github.com/uskudnik/GraphGL/blob/master/examples/graphml-to-json.py) available, so you could instead use the aforementioned neo4j-shell-tools to export to GraphML, convert that representation to JSON, and massage those files into the necessary format.
I'm sorry that I can't be of more help, but maybe these thoughts, and the rough sketch below, get you started.
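A rough sketch of steps 3 and 4, assuming step 2 left you with a nodes.csv; csvjson is part of csvkit, and the jq filter, field names, and collection/database names here are made up for illustration (double-check the arangoimp flags against your ArangoDB version):

# Convert the CSV to one JSON document per line and shape each record;
# the target fields (_key, name) are hypothetical.
csvjson nodes.csv | jq -c '.[] | {_key: (.id | tostring), name: .name}' > nodes.jsonl
# Load the documents as a vertex collection (connection options omitted).
arangoimp --file nodes.jsonl --type jsonl --collection characters --server.database got --create-collection true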

SSIS - Vendor supplies 12 XML tables, we only need 3; how to remove or ignore the extra tables

I have found this to be a bug; however, I need to remove the warnings:
[SSIS.Pipeline] Warning: The output column "x" on output "y" and component "z" is not subsequently used in the Data Flow task.
How do I accomplish this?
Thanks
The vendor-supplied .xsd contains definitions for 12 XML tables.
We only use 3, but SSIS complains with the warning message:
[SSIS.Pipeline] Warning: The output column....
Most of what I've seen in web searches says to direct these input streams to a Union All task, but I haven't seen a good example of this, so I'm looking for other methods.
Thanks
You may need to open the XML Source component and, on the Columns tab, uncheck the columns that you are not using later in the data flow; this will clear the warning messages.

tFuzzyMatch apparently not working on Arabic text strings

I have created a job in Talend Open Studio for Data Integration v5.5.1.
I am trying to find matches between two customer-name columns; one is a lookup and the other contains dirty data.
The job runs as expected when the customer names are in English. However, for Arabic names only exact matches are found, regardless of the matching algorithm I used (Levenshtein, Metaphone, Double Metaphone), even with loose bounds for the Levenshtein algorithm (min 1, max 50).
I suspect this has to do with character encoding. How should I proceed? Is there any way I can operate on the Unicode or even UTF-8 interpretation in Talend?
I am using Excel data sources through tFileInputExcel.
I got it resolved by moving the data to MySQL with a UTF-8 collation. Somehow the Excel input wasn't preserving the encoding.
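For anyone hitting the same wall, a minimal sketch of that workaround (database, table, and file names are made up, and utf8mb4 is used as the safest MySQL character set for full Unicode):

# Create a UTF-8 table and load the CSV exported from Excel into it.
mysql -u root -p --default-character-set=utf8mb4 -e "
  CREATE DATABASE IF NOT EXISTS crm CHARACTER SET utf8mb4;
  CREATE TABLE IF NOT EXISTS crm.customers (name VARCHAR(255))
    CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;"
mysql -u root -p --default-character-set=utf8mb4 --local-infile=1 -e "
  LOAD DATA LOCAL INFILE 'customers.csv' INTO TABLE crm.customers
  CHARACTER SET utf8mb4
  FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
  IGNORE 1 LINES;"

The job can then read the table through a tMysqlInput component instead of tFileInputExcel.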

Mahout: Importing CSV file to Sequence Files using regexconverter or arff.vector

I just started learning how to use Mahout. I'm not a Java programmer, however, so I'm trying to stay away from having to use the Java library.
I noticed there is a shell tool, regexconverter. However, the documentation is sparse and uninstructive. What exactly does specifying a regex option do, and what do the transformer class and formatter class do? The Mahout wiki is marvelously opaque. I'm assuming the regex option specifies what counts as a "unit", or so.
The example they list uses regexconverter to convert HTTP log requests to sequence files, I believe. I have a CSV file of slightly altered HTTP log requests that I'm hoping to convert to sequence files. Do I simply change the regex to capture each entire row? I'm trying to run a Bayes classifier, similar to the 20 newsgroups example, which seems to be done completely in the shell without any Java coding.
Incidentally, the arff.vector command seems to allow me to convert an ARFF file directly to vectors. I'm unfamiliar with ARFF, though it seems to be something I can easily convert CSV log files into. Should I use this method instead and skip the sequence-file step completely?
Thanks for the help.
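For what it's worth, the shell-only 20 newsgroups route mentioned above sidesteps regexconverter entirely by laying the data out as one document per file; a rough sketch (paths are placeholders, and the flags are worth double-checking against your Mahout version):

# One subdirectory per class label, one log line (document) per file.
mahout seqdirectory -i /data/logs-by-class -o /data/logs-seq -xm sequential
# Turn the sequence files into TF-IDF vectors for the Bayes classifier.
mahout seq2sparse -i /data/logs-seq -o /data/logs-vectors -lnorm -nv -wt tfidf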

Recommended column delimiter for clickstream data to be consumed by SSIS

I am working with some clickstream data, and I need to give the vendor specifications for a preferred format to be consumed by SSIS.
Since it's URL data in a text file, which column delimiter would you recommend? I was thinking pipe "|", but I realize that pipes can be used within a URL.
I did some testing with multiple characters as the delimiter, like |^|, but when creating a flat file connection there is no such option in SSIS, so I had to type the characters in. When I went back to edit the flat file connection manager, it had changed to {|}^{|}. That made me nervous even though the import succeeded.
I just wanted to see if anybody has good ideas as to what would be a safe column delimiter to use.
Tab-delimited would probably be fairly safe, at least assuming that by "clickstream" you mean a list of URLs or something similar. But in theory any delimiter should be fine as long as the supplier quotes the data appropriately.
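As an illustration (hypothetical columns and file), a tab-delimited feed whose fields are wrapped in a text qualifier keeps an embedded pipe, or even an embedded tab, from breaking the columns; in the SSIS flat file connection manager you would then set the text qualifier to ":

# Produce a two-row sample file: tab delimiter, every field quoted.
printf '"url"\t"clicked_at"\n' > clicks.txt
printf '"https://example.com/a|b?x=1"\t"2018-07-01T12:00:00"\n' >> clicks.txt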
