How to connect to an Oracle database and export the data to CSV format using Dask?

How can I connect to an Oracle database using Dask, fetch the data from it, and create a CSV file from the fetched data?

You can read a SQL table into a Dask DataFrame with dask.dataframe.read_sql_table; that should let you create a Dask DataFrame from an Oracle table.
After creating the DataFrame, you can use to_csv to write out CSV files.
It'll look something like this:
import dask.dataframe as dd
ddf = dd.read_sql_table(...)
ddf.to_csv(...)

Related

How to export data from neo4j to a MySQL table

I have below data in my neo4j database which I want to insert into mysql table using jdbc.
"{""id"":7512,""labels"":[""person1""],""properties"":{""person1"":""Nishant"",""group_uuid"":""6b27c9c8-4d5b-4ebc-b8c2-667bb159e029""}}"
"{""id"":7513,""labels"":[""person1""],""properties"":{""person1"":""anish"",""group_uuid"":""6b27c9c8-4d5b-4ebc-b8c2-667bb159e029""}}"
"{""id"":7519,""labels"":[""person1""],""properties"":{""person1"":""nishant"",""group_uuid"":""6b27c9c8-4d5b-4ebc-b8c2-667bb159e029""}}"
"{""id"":7520,""labels"":[""person1""],""properties"":{""person1"":""xiaoyi"",""group_uuid"":""9d7d4bf6-6db6-4cf2-8186-d8d0621a58c5""}}"
"{""id"":7521,""labels"":[""person1""],""properties"":{""person1"":""pavan"",""group_uuid"":""3ddc954a-16f5-4c59-a94a-b262f9784211""}}"
"{""id"":7522,""labels"":[""person1""],""properties"":{""person1"":""jose"",""group_uuid"":""6b27c9c8-4d5b-4ebc-b8c2-667bb159e029""}}"
"{""id"":7523,""labels"":[""person1""],""properties"":{""person1"":""neil"",""group_uuid"":""9d7d4bf6-6db6-4cf2-8186-d8d0621a58c5""}}"
"{""id"":7524,""labels"":[""person1""],""properties"":{""person1"":""menish"",""group_uuid"":""9d7d4bf6-6db6-4cf2-8186-d8d0621a58c5""}}"
"{""id"":7525,""labels"":[""person1""],""properties"":{""person1"":""ankur"",""group_uuid"":""3ddc954a-16f5-4c59-a94a-b262f9784211""}}"
Desired Output in mysql database table.
id,name,group_id
7525,ankur,3ddc954a-16f5-4c59-a94a-b262f9784211
7524,menish,9d7d4bf6-6db6-4cf2-8186-d8d0621a58c5
...
Since you did not provide much info in your question, here is a general approach for exporting from neo4j to MySQL.
Execute a Cypher query using one of the APOC export to CSV procedures to export the data intended for the table to a CSV file.
Import from the CSV file into MySQL. (E.g., here is a tutorial.)
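As a concrete illustration of the translation step, the exported lines shown in the question (each CSV cell holding one JSON object, with doubled quotes as CSV escaping) can be reshaped into (id, name, group_id) rows with a short Python script before loading them into MySQL:

```python
import csv
import io
import json

# Two of the exported lines from the question, verbatim.
raw = (
    '"{""id"":7512,""labels"":[""person1""],""properties"":{""person1"":""Nishant"",'
    '""group_uuid"":""6b27c9c8-4d5b-4ebc-b8c2-667bb159e029""}}"\n'
    '"{""id"":7525,""labels"":[""person1""],""properties"":{""person1"":""ankur"",'
    '""group_uuid"":""3ddc954a-16f5-4c59-a94a-b262f9784211""}}"\n'
)

rows = []
for (cell,) in csv.reader(io.StringIO(raw)):
    # The csv module undoes the doubled-quote escaping, leaving plain JSON.
    obj = json.loads(cell)
    rows.append(
        (obj["id"], obj["properties"]["person1"], obj["properties"]["group_uuid"])
    )

# rows is now ready for e.g.
# cursor.executemany("INSERT INTO people (id, name, group_id) VALUES (%s, %s, %s)", rows)
```

The table and column names in the INSERT comment are made up to match the desired output in the question.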

How to "reindex" with Dask DataFrame

I'm looking into using dask for time-series research with large volumes of data. One common operation that I use is realignment of data to a different index (the reindex operation on pandas dataframe's). I noticed that the reindex function is not currently supported in the dask dataframe API, but is in the DataArray API. Are there plans to add this function?
I believe you could use the DataFrame.set_index() method combined with .resample() for the same purpose.

Converting Parquet to Avro

I see plenty of examples on how to convert Avro files to Parquet, with Parquet retaining the Avro schema in its metadata.
I'm confused however on if there's some simple way of doing the opposite - converting Parquet to Avro. Any examples of that?
I think that with Impala, a query like this should work:
CREATE TABLE parquet_table(....) STORED AS PARQUET;
CREATE TABLE avro_table(....) STORED AS AVRO;
INSERT INTO avro_table SELECT * FROM parquet_table;
The Parquet data stored in parquet_table will be converted to Avro format and written into avro_table.

Neo4j staged batch import

I want to import existing entities and their relationships from MySQL database to a new Neo4j db. I have several questions that I still do not quite understand -
Based on the description of the batch importer, it appears as if I need to have both an entity and relationship file. Can I execute an import without one or the other file type?
Can I execute a series of batch imports, using different files for different entities?
Are you using the batch importer from the Neo4j website or the one by jexp (Michael Hunger)?
If it's the jexp batch-import, you could execute just the entity/nodes file (resulting in a bunch of nodes and no edges) or just the rels file (resulting in an empty graph, since there are no nodes to connect). Or you could import the nodes and then the rels, either in the same import or in a series of imports.

Use CSV to populate Neo4j

I am very new to Neo4j and still learning this graph database. I need to load a CSV file into a Neo4j database. I have been trying for two days but couldn't find good information on reading a CSV file into Neo4j. Please point me to sample code or blog posts on reading a CSV file into Neo4j.
Example:
Suppose I have a CSV file like this; how can I read it into Neo4j?
id name language
1 Victor Richards West Frisian
2 Virginia Shaw Korean
3 Lois Simpson Belarusian
4 Randy Bishop Hiri Motu
5 Lori Mendoza Tok Pisin
You may want to try https://github.com/sroycode/neo4j-import
This populates data directly from a pair of CSV files (entries must be comma-separated).
To build (you need Maven):
sh build.sh
The nodes file has a mandatory field id plus any other fields you like:
NODES.txt
id,name,language
1,Victor Richards,West Frisian
2,Virginia Shaw,Korean
3,Lois Simpson,Belarusian
The relationships file has 3 mandatory fields: from, to, type. Assuming you also have a field age (a long integer) and a field info, the relations file will look like:
RELNS.txt
from,to,type,age#long,info
1,2,KNOWS,10,known each other from school
1,3,CLUBMATES,5,member of country club
Running:
sh run.sh graph.db NODES.txt RELNS.txt
will create graph.db in the current folder, which you can copy to the Neo4j data folder.
Note:
If you are using Neo4j later than 1.6.*, please add this line to conf/neo4j.properties:
allow_store_upgrade = true
Have fun.
Please take a look at https://github.com/jexp/batch-import
It can be used as a starting point.
There is nothing available to generically load CSV data into Neo4j because the source and destination data structures are different: CSV data is tabular whereas Neo4j holds graph data.
In order to achieve such an import, you will need to add a separate step to translate your tabular data into some form of graph (e.g. a tree) before it can be loaded into Neo4j. Taking the tree structure further as an example, the page below shows how XML data can be converted into Cypher which may then be directly executed against a Neo4j instance.
http://geoff.nigelsmall.net/xml2graph/
Please feel free to use this tool if it helps (bear in mind it can only deal with small files) but this will of course require you to convert your CSV to XML first.
Cheers
Nigel
There is probably no generic CSV importer for Neo4j, so you must import the data yourself.
I usually do it via Gremlin's g.loadGraphML() function.
http://docs.neo4j.org/chunked/snapshot/gremlin-plugin.html#rest-api-load-a-sample-graph
I parse my data into the XML syntax with an external script and load the resulting XML file. You can view the syntax here:
https://raw.github.com/tinkerpop/gremlin/master/data/graph-example-1.xml
Parsing a 100 MB file takes a few minutes.
In your case, what you need is a simple bipartite graph with vertices for users and languages, and "speaks" edges. If you know some programming, create user nodes with parameters id and name, unique language nodes with a name parameter, and relationships connecting each user to the appropriate language. Note that user names can be duplicated, whereas languages cannot.
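A sketch of that approach in Python: read the CSV with the standard library and emit Cypher CREATE statements for the bipartite graph. The node labels User/Language and the SPEAKS relationship type are made-up names for illustration; in practice you would run the generated statements in a single transaction:

```python
import csv
import io

# First three rows of the sample CSV from the question.
raw = """id,name,language
1,Victor Richards,West Frisian
2,Virginia Shaw,Korean
3,Lois Simpson,Belarusian
"""

statements = []
languages = {}  # language name -> Cypher identifier; keeps language nodes unique
for row in csv.DictReader(io.StringIO(raw)):
    # One User node per CSV row.
    statements.append(
        f'CREATE (u{row["id"]}:User {{id: {row["id"]}, name: "{row["name"]}"}})'
    )
    # One Language node per distinct language.
    lang = row["language"]
    if lang not in languages:
        languages[lang] = f"l{len(languages)}"
        statements.append(f'CREATE ({languages[lang]}:Language {{name: "{lang}"}})')
    # Connect the user to the language.
    statements.append(f"CREATE (u{row['id']})-[:SPEAKS]->({languages[lang]})")

script = "\n".join(statements)
```
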
I believe your question is too generic. What does your CSV file contain? The logical meaning of the contents of a CSV file can vary a great deal. For example, two columns of IDs could represent entities connected to each other:
3921 584
831 9891
3841 92
...
In this case you could either write a BatchInserter code snippet, which would import faster; see http://docs.neo4j.org/chunked/milestone/batchinsert.html.
Or you could import using the regular GraphDatabaseService with transaction sizes of a couple of thousand inserts for performance. See how to set up and use the graph DB at http://docs.neo4j.org/chunked/milestone/tutorials-java-embedded.html.
