Insert data into avro-formatted, partitioned hive table with data from HDFS - apache-hive

I have created a Hive table named employee (Avro formatted) with a partition on department.
I have the Avro dataset in my HDFS location, and the dataset also contains the department id.
I would like to load the data from HDFS into the Hive table, and during the load I want each row to be kept in its respective partition.
How can I achieve this? Any ideas?

There are two ways of doing it.
1. Static (manual) partitioning:
LOAD DATA INPATH '<hdfs_path>' INTO TABLE employee_table PARTITION (deptId='1');
LOAD DATA INPATH '<hdfs_path>' INTO TABLE employee_table PARTITION (deptId='2');
2. Dynamic partitioning:
a. Create an intermediate (staging) table.
b. Create the employee table with the partition.
c. Load the data from the intermediate table into the partitioned table.
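The dynamic-partitioning steps above could be sketched roughly like this (table names, column names, and the HDFS path are illustrative, not from the question):

```sql
-- a. Staging table pointing at the existing Avro files (schema is an assumption)
CREATE EXTERNAL TABLE employee_stage (name STRING, deptId STRING)
STORED AS AVRO
LOCATION '/path/to/avro/data';

-- b. Partitioned target table
CREATE TABLE employee_table (name STRING)
PARTITIONED BY (deptId STRING)
STORED AS AVRO;

-- c. Enable dynamic partitioning and load; Hive routes each row
--    to its partition based on the deptId column
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

INSERT OVERWRITE TABLE employee_table PARTITION (deptId)
SELECT name, deptId FROM employee_stage;
```

Note that the partition column must come last in the SELECT list for dynamic partitioning to pick it up.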

Related

Bulk Update neo4j relationship properties through csv

I have a CSV file which has 3 columns:
Follower_id,Following_id,createTime
Each node in my Neo4j graph represents a USER and has multiple properties, one of which is profileId. Two nodes in the graph can have a FOLLOW_RELATIONSHIP, and I have to update the createTime property of that FOLLOW_RELATIONSHIP. There are lots of relationships in the graph. I am new to Neo4j and don't have much idea of how to do a bulk update efficiently.
You can try something like this:
USING PERIODIC COMMIT 1000 LOAD CSV WITH HEADERS FROM 'FILEPATH' AS row
WITH row
MATCH (u1:User{profileId: row.Follower_id})
MATCH (u2:User{profileId: row.Following_id})
MERGE (u1)-[r:FOLLOW_RELATIONSHIP]->(u2)
SET r.createTime = row.createTime
FILEPATH is the path of the file on your system, usually within the database directory itself or some web link. You can learn how to set it from this article.
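For a graph with many users, the two MATCH clauses above will be slow without an index on profileId. Creating one first is a general Neo4j tuning step (not something from the question); the exact syntax depends on your Neo4j version:

```cypher
// Neo4j 3.x syntax
CREATE INDEX ON :User(profileId);

// Neo4j 4.x+ syntax
CREATE INDEX user_profile_id FOR (u:User) ON (u.profileId);
```

With the index in place, each MATCH becomes an index lookup instead of a label scan, which matters when the CSV has many rows.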

Can we use .set, .append or .replace command in ADX to combine data of Table A with Table B?

I have Table A, which contains one month of history data (August 2022), and Table B, into which I am doing real-time ingestion. I want to ingest the data of Table A into Table B and also check whether there is any duplicate data in Table B after combining the data of both tables. Can we use the .append command to combine the data of these tables without losing my existing data?
Also, how do I use tags in this code? I am a bit confused about using these tags:
.append TableB with (tags='["TagA","TagB"]') <|
TableA
| where timestamp between (datetime(2022-08-01T09:00:00.000Z) .. datetime(2022-08-31T11:00:00.000Z))
| project timeseries, value, timestamp, scaled_value
1.
Regarding set/append/replace, the documentation is quite clear.
You need .append.
Command          | If table exists                      | If table doesn't exist
.set             | The command fails                    | The table is created and data is ingested
.append          | Data is appended to the table        | The command fails
.set-or-append   | Data is appended to the table        | The table is created and data is ingested
.set-or-replace  | Data replaces the data in the table  | The table is created and data is ingested
2.
Regarding the tables' merge, it would be better to write a query that compares the two tables and returns only the data that actually needs to be appended to Table B (as opposed to ingesting everything and then deleting the excess data).
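Such a comparison could be sketched with a leftanti join. Assuming timestamp and timeseries together identify a row (purely an illustrative assumption; pick whatever columns make a row unique in your data), this appends only the TableA rows not already present in TableB:

```kusto
.set-or-append TableB <|
TableA
| join kind=leftanti TableB on timestamp, timeseries
| project timeseries, value, timestamp, scaled_value
```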
P.S.
If you need help with such a query, please open a new post for that.
3.
You can read all about tags here.
Theoretically, as part of the ingestion process, you can mark your A table data with a drop-by: tag, so it could be identified after the merge into table B.
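A minimal sketch of that idea, assuming a hypothetical tag value of your choosing:

```kusto
// Ingest the history data with a drop-by tag so the extents stay identifiable
.append TableB with (tags='["drop-by:tableA-history"]') <|
TableA
| project timeseries, value, timestamp, scaled_value

// Later, locate the extents carrying that tag
.show table TableB extents where tags has 'drop-by:tableA-history'
```

Extents found this way can then be dropped if the merged data turns out to be unwanted.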

How to export data from neo4j to a MySQL table

I have the below data in my Neo4j database, which I want to insert into a MySQL table using JDBC.
{"id":7512,"labels":["person1"],"properties":{"person1":"Nishant","group_uuid":"6b27c9c8-4d5b-4ebc-b8c2-667bb159e029"}}
{"id":7513,"labels":["person1"],"properties":{"person1":"anish","group_uuid":"6b27c9c8-4d5b-4ebc-b8c2-667bb159e029"}}
{"id":7519,"labels":["person1"],"properties":{"person1":"nishant","group_uuid":"6b27c9c8-4d5b-4ebc-b8c2-667bb159e029"}}
{"id":7520,"labels":["person1"],"properties":{"person1":"xiaoyi","group_uuid":"9d7d4bf6-6db6-4cf2-8186-d8d0621a58c5"}}
{"id":7521,"labels":["person1"],"properties":{"person1":"pavan","group_uuid":"3ddc954a-16f5-4c59-a94a-b262f9784211"}}
{"id":7522,"labels":["person1"],"properties":{"person1":"jose","group_uuid":"6b27c9c8-4d5b-4ebc-b8c2-667bb159e029"}}
{"id":7523,"labels":["person1"],"properties":{"person1":"neil","group_uuid":"9d7d4bf6-6db6-4cf2-8186-d8d0621a58c5"}}
{"id":7524,"labels":["person1"],"properties":{"person1":"menish","group_uuid":"9d7d4bf6-6db6-4cf2-8186-d8d0621a58c5"}}
{"id":7525,"labels":["person1"],"properties":{"person1":"ankur","group_uuid":"3ddc954a-16f5-4c59-a94a-b262f9784211"}}
Desired Output in mysql database table.
id,name,group_id
7525,ankur,3ddc954a-16f5-4c59-a94a-b262f9784211
7524,menish,9d7d4bf6-6db6-4cf2-8186-d8d0621a58c5
...
Since you did not provide much info in your question, here is a general approach for exporting from neo4j to MySQL.
Execute a Cypher query using one of the APOC export to CSV procedures to export the data intended for the table to a CSV file.
Import from the CSV file into MySQL. (E.g., here is a tutorial.)
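As a hedged sketch of both steps (the label and property names come from the question's data; the CSV path and the MySQL table name person are assumptions):

```cypher
// Step 1: export the desired columns to CSV
// (apoc.export.* must be enabled in neo4j.conf)
CALL apoc.export.csv.query(
  "MATCH (n:person1)
   RETURN id(n) AS id, n.person1 AS name, n.group_uuid AS group_id",
  "person1.csv",
  {});
```

```sql
-- Step 2: load the exported CSV into MySQL
-- (assumes a table person(id, name, group_id) already exists)
LOAD DATA INFILE '/path/to/person1.csv'
INTO TABLE person
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
IGNORE 1 LINES
(id, name, group_id);
```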

How to add a new record into a existing labelled node in neo4j GraphDB reading from csv file

I am trying to add a new record (a whole CSV row) as a labelled node in a Neo4j graph DB. Let's say I have a node labelled Customer:
{"DISTRICT":"abc","THANA":"xyzzy","DIVISION":"abc","REGDATE":"1-2-2015","ID":"0123"}
I want to add another row consisting of these fields, with the relevant values read from a CSV file. This label holds a lot of data, so I think APOC with periodic iteration would be a good idea for processing it in parallel, but I am confused about how to add a whole row as a labelled node. I have learnt to update property information through the "MERGE ... ON MATCH SET ... ON CREATE SET" approach, but I can't manage to add a new record under the label. I expect to end up with one new record per CSV row under the Customer label. Kindly help me solve this.
Here is an example of how to use LOAD CSV to create your neo4j data from a CSV file. Please pay attention to the Introduction section of the docs for important info on how to configure the neo4j server and for where to store the CSV file (if you want to use a local file).
Suppose your data is in a input.csv file that starts with a header row, like this:
DISTRICT,THANA,DIVISION,REGDATE,ID
abc,xyzzy,abc,1-2-2015,0123
def,foobar,nbc,1-3-2015,0124
This query should then create one Customer node per file row:
LOAD CSV WITH HEADERS FROM 'file:///input.csv' AS row
CREATE (c:Customer)
SET c = row
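Since the question mentions APOC with periodic iteration for a large file, the same load can be batched so it commits in chunks (the batch size is illustrative; parallel:true is risky here because concurrent batches touching related nodes can deadlock):

```cypher
CALL apoc.periodic.iterate(
  "LOAD CSV WITH HEADERS FROM 'file:///input.csv' AS row RETURN row",
  "CREATE (c:Customer) SET c = row",
  {batchSize: 1000, parallel: false});
```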

How to export Neo4j data from database A to database B?

I have two cases:
Case 1: export part of the data in Neo4j database A to database B. For example, I want to export the data for the label "Person" from A to B.
Case 2: export the whole graph from A to B.
How do I handle these two cases? Thanks.
APOC allows exporting the full graph or subgraphs into a Cypher file consisting of CREATE statements; see https://neo4j-contrib.github.io/neo4j-apoc-procedures/#_export_to_cypher_script for details.
The other option would be to access the other database via the Neo4j JDBC driver and use apoc.load.jdbc to retrieve data from there.
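For instance, the export side of both cases could look like this (file names are illustrative, and apoc.export.* must be enabled in the server config):

```cypher
// Case 1: export only the Person subgraph as Cypher statements
CALL apoc.export.cypher.query(
  "MATCH (p:Person) RETURN p",
  "person.cypher",
  {format: "plain"});

// Case 2: export the whole graph
CALL apoc.export.cypher.all("all.cypher", {format: "plain"});
```

The generated .cypher file can then be replayed against database B, e.g. with cypher-shell.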
