Neo4j creating relationships

I've got two CSV files imported into Neo4j, named uniq_names and all_names. The uniq_names file has one column and about 5,000 rows; the all_names file has three columns (name, id1 and id2) and about 300,000 rows.
Now I'm trying to create relationships with the code below:
MATCH (a:uniq_names),(b:all_names)
WHERE a.name=b.name AND b.id1<>b.id2
CREATE (a)-[:child]->(b);
When I execute the code, it runs for about 20 minutes but returns "0 rows returned" and doesn't create any relationships. It works perfectly when I've got 1,000 rows in the all_names file and 50 rows in the uniq_names file.
I've got Windows 7 64-bit, JDK 1.7.0_71, Neo4j 2.1.6 Enterprise. Any ideas?

That query basically creates a cross product of your 5k uniq_names and 300k all_names, i.e. 1.5bn operations, which is not very efficient.
To optimize:
Create an index:
CREATE INDEX ON :all_names(name);
Then go over all uniq_names, find the corresponding all_names via an index lookup, check the id condition, and create the relationships:
MATCH (a:uniq_names)
WITH a
MATCH (b:all_names {name: a.name})
WHERE b.id1<>b.id2
CREATE (a)-[:child]->(b);
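To verify the lookup actually hits the index rather than scanning all 300k all_names nodes, you can profile the read part of the query first (a quick sanity check, not part of the fix itself):
PROFILE
MATCH (a:uniq_names)
MATCH (b:all_names {name: a.name})
WHERE b.id1<>b.id2
RETURN count(*);
The plan should show an index lookup on :all_names(name) instead of a label scan.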

Related

How to create Relationships between two nodes for Bulk Records

I have millions of records in an existing database (PostgreSQL) that I have already imported into Neo4j using C#. Now how can we create relationships for millions of records?
For example, the 1st Drug (node) table looks like this:
ISR  Drug Name
1    DrugName1
1    DrugName2
For example, the 2nd Reaction (node) table looks like this:
ISR  Reaction Name
1    HeadPain
2    Rash
This is my Cypher query:
MERGE (d:Drug {id:'1', drug:'DrugName1'})
MERGE (r:Reaction {id:'1', reactionname:'ReactionName1'})
Please help me with this. Thanks in advance.
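No answer is shown here, but a minimal sketch of one common pattern would be: MERGE each node on its own key, then MERGE the relationship between them, batching the rows with LOAD CSV. The file name, header names and the :CAUSES relationship type below are illustrative assumptions, not from the question:
// Sketch: batch-create nodes and one relationship per CSV row.
// drug_reactions.csv, its headers and :CAUSES are assumed names.
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM 'file:///drug_reactions.csv' AS row
MERGE (d:Drug {isr: row.ISR, drug: row.DrugName})
MERGE (r:Reaction {isr: row.ISR, reactionname: row.ReactionName})
MERGE (d)-[:CAUSES]->(r);
MERGE-ing each node on a key and then the relationship keeps the work per row down to a few index lookups.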

Snowflake stream behavior

I have the following fields in table1:
db, schema, jobnm, status, runtime, ins_tstmp, upd_tstmp.
A stream has been created on table1.
A stored procedure was written to loop through another table's dataset (4 records) and write all 4 records to table1 if they don't already exist, else update them (using a MERGE statement here; ins_tstmp gets populated via the INSERT branch of the MERGE, while upd_tstmp gets set via the UPDATE branch).
As expected, table1 has all 4 records, and the stream also has 4 records with METADATA$ACTION as INSERT. UPD_TSTMP is NULL here.
Now on the 2nd run, the same 4 records were retrieved. Since they were a match, upd_tstmp got populated in both table1 and the stream, but why is METADATA$ACTION still INSERT? I am not seeing 2 entries for an update. Could someone please explain what I am missing here?
Thanks
Since they were a match, upd_tstmp got populated in both table1 and the stream, but why is METADATA$ACTION still INSERT?
The METADATA$ACTION column can have 2 possible values: INSERT and DELETE, so you can't see "UPDATE" in this column.
METADATA$ISUPDATE is an extra column indicating whether the operation was part of an UPDATE statement. In your case, you should also see it as FALSE, because streams record the differences between two offsets: if a row is added and then updated within the current offset, the delta change is a single new row, and its METADATA$ISUPDATE column records FALSE.
https://docs.snowflake.com/en/user-guide/streams-intro.html#stream-columns
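To see this for yourself, you can select the metadata columns directly from the stream (the stream name table1_stream is illustrative):
-- Inspect the change records currently held by the stream.
-- table1_stream is an assumed name for the stream on table1.
SELECT metadata$action,
       metadata$isupdate,
       jobnm,
       ins_tstmp,
       upd_tstmp
FROM table1_stream;
After the two runs you describe, each row should appear once, with METADATA$ACTION = INSERT and METADATA$ISUPDATE = FALSE.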

Model.group(:id) throws an error "Select list is not in GROUP BY clause contains non aggregated column "id"

I am using Model.group(:category) in order to get unique records based on the category field in a Rails 5 application.
Table data:
id  category  description
1   abc       test
2   abc       test1
3   abc       test2
4   xyz       test
5   xyz       testabc
I want records (1, 4) as a result. Therefore I am using Model.group(:category), which works fine for MySQL when sql_mode is "".
Unfortunately, it throws the error "SELECT list is not in GROUP BY clause and contains nonaggregated column which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by" when sql_mode is "only_full_group_by".
What's the best way to change the query to match the mode?
Perhaps try specifying which id you want? You could use MIN(id), MAX(id), etc.
MySQL supports a non-standard extension to SQL described here. To continue using that behavior, you could change the sql_mode to TRADITIONAL in config/database.yml.
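For example, a query along these lines satisfies only_full_group_by, because every selected column is either aggregated or listed in GROUP BY (the table name models is illustrative):
-- One representative (smallest) id per category.
SELECT MIN(id) AS id, category
FROM models
GROUP BY category;
In Rails that could be written as Model.select('MIN(id) AS id, category').group(:category).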

How to create a file in hdfs by combining two files in hadoop

I want to create a table in Hive combining the columns of two tables.
So I want to create a single file in HDFS that includes the columns of both files.
file1: a b c are the 3 columns
file2: x y z are the 3 columns
I want to create a file3: a b c x y z that has 6 columns.
How can I do this?
I tried many commands, but they just append the data; I want all the columns of both files side by side in a single file.
Thank you.
I think the simplest way will be to add an id column to both tables (you need some column to do the join on) and then join the tables on the id column:
CREATE TABLE joined AS
SELECT first.id, first.a, first.b, first.c, second.x, second.y, second.z
FROM first JOIN second ON (first.id = second.id)
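If the tables have no shared key yet, one possible way to fabricate one is row_number(). This pairs rows purely by their current order, which Hive does not guarantee to be stable, so treat it strictly as a sketch (the table names first and second follow the answer above):
-- Assign a synthetic join key to each table based on row order.
CREATE TABLE first_keyed AS
SELECT row_number() OVER () AS id, a, b, c FROM first;
CREATE TABLE second_keyed AS
SELECT row_number() OVER () AS id, x, y, z FROM second;
-- Then join as above.
CREATE TABLE joined AS
SELECT f.id, f.a, f.b, f.c, s.x, s.y, s.z
FROM first_keyed f JOIN second_keyed s ON (f.id = s.id);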

How to efficiently search for last record matching a condition in Rails and PostgreSQL?

Suppose you want to find the last record entered into the database (highest ID) matching a string: Model.where(:name => 'Joe'). There are 100,000+ records. There are many matches (say thousands).
What is the most efficient way to do this? Does PostgreSQL need to find all the records, or can it just find the last one? Is this a particularly slow query?
Working in Rails 3.0.7, Ruby 1.9.2 and PostgreSQL 8.3.
The important part here is to have a matching index. You can try this small test setup:
Create schema x for testing:
-- DROP SCHEMA x CASCADE; -- to wipe it all for a retest or when done.
CREATE SCHEMA x;
CREATE TABLE x.tbl(id serial, name text);
Insert 10000 random rows:
INSERT INTO x.tbl(name) SELECT 'x' || generate_series(1,10000);
Insert another 10000 rows with repeating names:
INSERT INTO x.tbl(name) SELECT 'y' || generate_series(1,10000)%20;
Delete a random 10% to make it more realistic:
DELETE FROM x.tbl WHERE random() < 0.1;
ANALYZE x.tbl;
The query can look like this:
SELECT *
FROM x.tbl
WHERE name = 'y17'
ORDER BY id DESC
LIMIT 1;
--> Total runtime: 5.535 ms
CREATE INDEX tbl_name_idx on x.tbl(name);
--> Total runtime: 1.228 ms
DROP INDEX x.tbl_name_idx;
CREATE INDEX tbl_name_id_idx on x.tbl(name, id);
--> Total runtime: 0.053 ms
DROP INDEX x.tbl_name_id_idx;
CREATE INDEX tbl_name_id_idx on x.tbl(name, id DESC);
--> Total runtime: 0.048 ms
DROP INDEX x.tbl_name_id_idx;
CREATE INDEX tbl_name_idx on x.tbl(name);
CLUSTER x.tbl using tbl_name_idx;
--> Total runtime: 1.144 ms
DROP INDEX x.tbl_name_idx;
CREATE INDEX tbl_name_id_idx on x.tbl(name, id DESC);
CLUSTER x.tbl using tbl_name_id_idx;
--> Total runtime: 0.047 ms
Conclusion
With a fitting index, the query performs more than 100x faster.
Top performer is a multicolumn index with the filter column first and the sort column last.
Matching sort order in the index helps a little in this case.
Clustering helps with the simple index, because many rows still have to be read from the table, and these can be found in adjacent blocks after clustering. It doesn't help with the multicolumn index in this case, because only one record has to be fetched from the table.
Read more about multicolumn indexes in the manual.
All of these effects grow with the size of the table. 10000 rows of two tiny columns is just a very small test case.
You can put the query together in Rails and the ORM will write the proper SQL:
Model.where(:name=>"Joe").order('created_at DESC').first
This should not result in retrieving all Model records, nor even a table scan.
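Under the hood, that call should generate SQL of roughly this shape (quoting and the exact column list vary by Rails version; this is illustrative):
SELECT "models".* FROM "models"
WHERE "models"."name" = 'Joe'
ORDER BY created_at DESC
LIMIT 1;
Note that to benefit from the multicolumn index discussed above, the ORDER BY column must match the index's second column, i.e. id DESC rather than created_at DESC, or else an index on (name, created_at DESC).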
This is probably the easiest:
SELECT [columns] FROM [table] WHERE [criteria] ORDER BY [id column] DESC LIMIT 1
Note: Indexing is important here. A huge DB will be slow to search no matter how you do it if you're not indexing the right way.
