Cypher load CSV eager and long action duration - neo4j

I'm loading a file with 85K lines (19M).
The server has 2 cores and 14GB RAM, and runs CentOS 7.1 and Oracle JDK 8.
The load can take 5-10 minutes with the following server config:
dbms.pagecache.memory=8g
cypher_parser_version=2.0
wrapper.java.initmemory=4096
wrapper.java.maxmemory=4096
disk mounted in /etc/fstab:
UUID=fc21456b-afab-4ff0-9ead-fdb31c14151a /mnt/neodata ext4 defaults,noatime,barrier=0 1 2
added this to /etc/security/limits.conf:
* soft memlock unlimited
* hard memlock unlimited
* soft nofile 40000
* hard nofile 40000
added this to /etc/pam.d/su
session required pam_limits.so
added this to /etc/sysctl.conf:
vm.dirty_background_ratio = 50
vm.dirty_ratio = 80
disabled journal by running:
sudo e2fsck /dev/sdc1
sudo tune2fs /dev/sdc1
sudo tune2fs -o journal_data_writeback /dev/sdc1
sudo tune2fs -O ^has_journal /dev/sdc1
sudo e2fsck -f /dev/sdc1
sudo dumpe2fs /dev/sdc1
Besides that, when I profile the query I get lots of "Eager" operators, and I really can't understand why:
PROFILE LOAD CSV WITH HEADERS FROM 'file:///home/csv10.csv' AS line
FIELDTERMINATOR '|'
WITH line limit 0
MERGE (session :Session { wz_session:line.wz_session })
MERGE (page :Page { page_key:line.domain+line.page })
ON CREATE SET page.name=line.page, page.domain=line.domain,
page.protocol=line.protocol,page.file=line.file
Compiler CYPHER 2.3
Planner RULE
Runtime INTERPRETED
+---------------+------+---------+---------------------+--------------------------------------------------------+
| Operator | Rows | DB Hits | Identifiers | Other |
+---------------+------+---------+---------------------+--------------------------------------------------------+
| +EmptyResult | 0 | 0 | | |
| | +------+---------+---------------------+--------------------------------------------------------+
| +UpdateGraph | 9 | 9 | line, page, session | MergeNode; Add(line.domain,line.page); :Page(page_key) |
| | +------+---------+---------------------+--------------------------------------------------------+
| +Eager | 9 | 0 | line, session | |
| | +------+---------+---------------------+--------------------------------------------------------+
| +UpdateGraph | 9 | 9 | line, session | MergeNode; line.wz_session; :Session(wz_session) |
| | +------+---------+---------------------+--------------------------------------------------------+
| +ColumnFilter | 9 | 0 | line | keep columns line |
| | +------+---------+---------------------+--------------------------------------------------------+
| +Filter | 9 | 0 | anon[181], line | anon[181] |
| | +------+---------+---------------------+--------------------------------------------------------+
| +Extract | 9 | 0 | anon[181], line | anon[181] |
| | +------+---------+---------------------+--------------------------------------------------------+
| +LoadCSV | 9 | 0 | line | |
+---------------+------+---------+---------------------+--------------------------------------------------------+
All the labels and properties have indexes / constraints.
Thanks for the help,
Lior

Hi Lior,
we tried to explain the Eager Loading here:
And Mark's original blog post is here: http://www.markhneedham.com/blog/2014/10/23/neo4j-cypher-avoiding-the-eager/
Rik tried to explain it in easier terms:
http://blog.bruggen.com/2015/07/loading-belgian-corporate-registry-into_20.html
Trying to understand the "Eager Operation"
I had read about this before, but did not really understand it until Andres explained it to me again: in all normal operations, Cypher loads data lazily. See for example this page in the manual - it basically just loads as little as possible into memory when doing an operation. This laziness is usually a really good thing. But it can get you into a lot of trouble as well - as Michael explained it to me:
"Cypher tries to honor the contract that the different operations within a statement are not affecting each other. Otherwise you might end up with non-deterministic behavior or endless loops. Imagine a statement like this:
MATCH (n:Foo) WHERE n.value > 100 CREATE (m:Foo {value: n.value + 100});
If the two statements were not isolated, then each node the CREATE generates would cause the MATCH to match again, and so on: an endless loop. That's why in such cases, Cypher eagerly runs all MATCH statements to exhaustion so that all the intermediate results are accumulated and kept (in memory).
Usually, with most operations, that's not an issue as we mostly match only a few hundred thousand elements max.
With data imports using LOAD CSV, however, this operation will pull in ALL the rows of the CSV (which might be millions), execute all operations eagerly (which might be millions of creates/merges/matches) and also keep the intermediate results in memory to feed the next operations in line.
This also effectively disables PERIODIC COMMIT, because when we get to the end of the statement execution all create operations will already have happened and the gigantic tx-state has accumulated."
So that's what's going on in my LOAD CSV queries. MATCH/MERGE/CREATE caused an eager pipe to be added to the execution plan, and it effectively disables the batching of my operations "using periodic commit". Apparently quite a few users run into this issue even with seemingly simple LOAD CSV statements. Very often you can avoid it, but sometimes you can't.
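The isolation problem described in the quote can be illustrated outside Neo4j. The following is only a toy sketch in Python, not how Cypher is implemented: a plain list stands in for the graph, and the names lazy_match, create_lazily and create_eagerly are made up for this illustration. The lazy version keeps matching the values it has just created and only stops because of an artificial cap; the eager version materialises all matches first, which is the role the Eager operator plays in the plan.

```python
def lazy_match(store, threshold):
    """Lazily yield matching values while the store may still be growing."""
    i = 0
    while i < len(store):  # the length is re-checked on every step
        if store[i] > threshold:
            yield store[i]
        i += 1


def create_lazily(store, threshold, max_creates):
    """Interleave matching and creating; capped so the demo terminates."""
    created = 0
    for value in lazy_match(store, threshold):
        if created == max_creates:  # safety cap: without it this never ends
            break
        store.append(value + 100)  # the new value also passes the filter
        created += 1
    return created


def create_eagerly(store, threshold):
    """Exhaust the match first (the 'Eager' step), then create."""
    matched = [v for v in store if v > threshold]  # fully materialised in memory
    for value in matched:
        store.append(value + 100)
    return len(matched)


lazy_store = [50, 150, 250]
eager_store = [50, 150, 250]
lazy_created = create_lazily(lazy_store, 100, max_creates=1000)
eager_created = create_eagerly(eager_store, 100)
print(lazy_created, eager_created)  # prints "1000 2"
```

The eager version is safe, but it pays for that safety by holding every matched row in memory at once, which is the cost the quote describes for large LOAD CSV imports.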


Grep finds the word and then hangs (bash)

I have a while loop (extracting two variables) in which one command does not work. Even when I run the command directly in the console (substituting the variable by hand), it prints the result but then keeps running without making any progress.
The command's goal is to search the first three lines of a big file (file.gct) for a value obtained from another file, and then print the match together with everything before it on that line.
If someone knows why it hangs and how to fix it, or knows an alternative that works well in loops and does not need more RAM, it would be appreciated.
head -3 file_2 | grep -E -o ".{0,1000}$variable."
Kind of an example as how it looks the big file (file_2):
head -3 file_2
| #1.2 |
| 57000 | 17300 |
|Irrelevant|Irrelevant2| DATA-B12-18 | DATA-Y17-72 | DATA-A12-44 | .... |
When I run in the terminal: head -3 file_2 | grep -E -o ".{0,1000}DATA-B12-18"
the output is:
Irrelevant Irrelevant2 DATA-B12-18, and then it hangs.
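One possible alternative that avoids the regular expression entirely is a plain substring search over the first three lines. The sketch below is an assumption-laden illustration, not a diagnosis of why grep hangs: find_prefix is a hypothetical helper name, the sample data is invented and assumed to be tab-separated, and unlike the original pattern it returns everything before the match rather than at most 1000 characters.

```python
import io
from itertools import islice


def find_prefix(lines, needle, max_lines=3):
    """Return the text up to and including needle from the first matching line.

    Only the first max_lines lines are read; None is returned if none match.
    """
    for line in islice(lines, max_lines):  # never reads past max_lines
        end = line.find(needle)
        if end != -1:
            return line[: end + len(needle)]
    return None


# Invented stand-in for the first lines of file_2 (tab-separated is an assumption).
sample = io.StringIO(
    "#1.2\n"
    "57000\t17300\n"
    "Irrelevant\tIrrelevant2\tDATA-B12-18\tDATA-Y17-72\tDATA-A12-44\n"
)
result = find_prefix(sample, "DATA-B12-18")
print(result)
```

For a real file, an open file handle can be passed in place of the StringIO object, since both yield one line at a time. The third line is still read in full, so memory use is bounded by the longest of the first three lines.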

FitNesse: Convert Fit fixture to Slim

I am looking for a way to convert a Fit fixture for a FitNesse test to Slim.
I have the Command-Line Fit fixture.
Since my whole FitNesse test system runs on Slim, I need CommandLineFixture as a Slim fixture to execute a bash script from my test.
Any other workaround would also work for me.
I am trying to execute a script from a FitNesse test; the script writes some text to a file on the server where my FitNesse server is running.
But what I observe with the provided fixture is that it opens the file and does not write any text into it.
So I wanted to check whether FitNesse imposes any constraint on executing a script that writes to a file.
Also, I have given full rwx permissions on the text file.
Below is my modified script:
!define TEST_SYSTEM {slim}
!path ../fixtures/*.jar
|Import |
| nl.hsac.fitnesse.fixture.slim.ExecuteProgramTest |
|script |
|set |-c |as argument|0 |
|set |ls -l / |as argument|1 |
|execute|/bin/bash |
|check |exit code |0 |
|show |standard out |
|check |standard error|!--! |
Executing the above test produced no output and gave this result:
Test Pages: 0 right, 0 wrong, 1 ignored, 0 exceptions
Assertions: 0 right, 0 wrong, 0 ignored, 0 exceptions
(0.456 seconds)
I already had a helper method to start a program in my fixture library, and I started work on a fixture for it today. Would the execute program test fixture work for you?
Example usage:
We can run a program with some arguments, check its exit code and show its output.
|script |execute program test |
|set |-c |as argument|0|
|set |ls -l / |as argument|1|
|execute|/bin/bash |
|check |exit code |0 |
|show |standard out |
|check |standard error|!--! |
The default timeout for program execution is one minute, but we can set a custom timeout. Furthermore we can control the directory it is invoked from, set all arguments using a list and get its output 'unformatted'.
|script |execute program test |
|check |timeout |60000 |
|set timeout of |100 |milliseconds|
|set working directory|/ |
|set |-c, ls -l |as arguments|
|execute |/bin/bash |
|check |exit code |0 |
|show |raw standard out |
|check |raw standard error|!--! |
The timeout can also be set in seconds, and we can pass environment variables (the process's output is escaped to ensure it is displayed properly).
|script |execute program test |
|set timeout of|1 |seconds |
|set value |Hi <there> |for |BLA|
|set |-c |as argument|0 |
|set |!-echo ${BLA}-!|as argument|1 |
|execute |/bin/bash |
|check |exit code |0 |
|check |raw standard out |!-Hi <there>
-!|
|check|standard out|{{{Hi <there>
}}}|

Does Release Management 2013 roll back across tags?

We're heavy users of tags and I'm confused about how tags and rollbacks interact.
I understand that rollbacks cascade (at least within a sequence) from this article:
http://incyclesoftware.com/2014/03/understanding-rollbacks-release-management/
But I'm not clear how this interacts with tags. We tag servers by the features installed on them (web, database, service) and vary the mix of features depending on the environment (e.g. DEV might have web and services running on the same machine, but UAT and PROD would have separate machines).
So does the rollback go back across the tag boundaries? For example, if your sequence looked like this:
+--Database tag --+
| Backup DB |
| | |
| Update DB |
| | | <- Runs against SQL server
| +--Rollback--+ |
| | Restore DB | |
| +------------+ |
+-----------------+
|
+---Web Tag-------+
| Do Stuff | <- Runs against WEB server
+-----------------+
|
+---Service tag----+
| Backup |
| | |
| Install new ver | <- Runs against Service server
| | |
| Smoke test |
| | |
| +--Rollback----+ |
| | Replace with | |
| | backup | |
| +--------------+ |
+------------------+
Would a rollback inside the service tag cause the database tag to execute its rollback? Do rollbacks cascade across sequences?
I haven't had time to set this up and test it yet, so I thought I'd ask the question instead.
By accident I've managed to test this with a suitable release, and the rollback does roll back across the tags, as @joerage says.
It appears I was wrong... faulty memory and all that. Rollbacks work across tag boundaries.
I generally recommend against using rollback blocks, since their behavior is generally backwards, unpredictable, and not immediately obvious. The current best practice is actually to not use agent-based releases at all, as they will not be portable to the forthcoming Release Management Service.

Removal of Role PostgreSQL Failed - cache lookup failed for database

This is my first time using PostgreSQL for production.
I made a database blog_production with username blog_production and a password generated by the capistrano-postgresql gem. Once it was generated, I tried to delete the database blog_production with this command from the terminal:
$ sudo -u postgres dropdb blog_production
After that I tried to delete the user blog_production with this command:
$ sudo -u postgres dropuser blog_production
And it returned:
dropuser: removal of role "blog_production" failed: ERROR: cache lookup failed for database 16417
1.) Why is this happening?
2.) I also tried to delete from psql using DELETE FROM pg_roles WHERE rolname='blog_production' but it returned the same error (cache lookup failed)
3.) How do I solve this problem?
Thank you.
Additional Information
PostgreSQL Version
PostgreSQL 9.1.15 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit
(1 row)
select * from pg_shdepend where dbid = 16417;
dbid | classid | objid | objsubid | refclassid | refobjid | deptype
-------+---------+-------+----------+------------+----------+---------
16417 | 1259 | 16419 | 0 | 1260 | 16418 | o
16417 | 1259 | 16426 | 0 | 1260 | 16418 | o
16417 | 1259 | 16428 | 0 | 1260 | 16418 | o
(3 rows)
select * from pg_database where oid = 16417;
datname | datdba | encoding | datcollate | datctype | datistemplate | datallowconn | datconnlimit | datlastsysoid | datfrozenxid | dattablespace | datacl
---------+--------+----------+------------+----------+---------------+--------------+--------------+---------------+--------------+---------------+--------
(0 rows)
select * from pg_authid where rolname = 'blog_production'
rolname | rolsuper | rolinherit | rolcreaterole | rolcreatedb | rolcatupdate | rolcanlogin | rolreplication | rolconnlimit | rolpassword | rolvaliduntil
-----------------+----------+------------+---------------+-------------+--------------+-------------+----------------+--------------+-------------------------------------+---------------
blog_production | f | t | f | f | f | t | f | -1 | md5d4d2f8789ab11ba2bd019bab8be627e6 |
(1 row)
Somehow the DROP DATABASE didn't drop the shared dependencies correctly. PostgreSQL still thinks that your user owns three tables in the database you dropped.
Needless to say this should not happen; it's almost certainly a bug, though I don't know how we'd even begin tracking it down unless you know exactly what commands you ran etc to get to this point, right from creating the DB.
If the PostgreSQL install's data isn't very big and if you can share the contents, can I get you to stop the database server and make a tarball of the whole database directory, then send it to me? I'd like to see if I can tell what happened to get you to this point.
Send a Dropbox link to craig@2ndquadrant.com. Just run:
sudo service postgresql stop
sudo tar cpzf ~abrahamks/abrahamks-postgres.tar.gz \
/var/lib/postgresql/9.1/main \
/etc/postgresql/9.1/main \
/var/log/postgresql/postgresql-9.1-main-*. \
/usr/lib/postgresql/9.1
sudo chown abrahamks ~abrahamks/abrahamks-postgres.tar.gz
and upload abrahamks-postgres.tar.gz from your home folder.
Replace abrahamks with your username on your system. You might need to adjust the paths above if I'm misremembering where the PostgreSQL data lives on Debian-derived systems.
Note that this contains all your databases not just the one that was an issue, and it also contains your PostgreSQL user accounts.
(If you're going to send me a copy, do so before continuing.)
Anyway, since the database is dropped, it is safe to manually remove the dependencies that should've been removed by DROP DATABASE:
DELETE FROM pg_shdepend WHERE dbid = 16417;
It should then be possible to DROP USER blog_production;

" Error Compilation error: encoded string too long:" when making a build

I have a Grails project that runs correctly in dev mode, but when I try to create a war file it gives me the following message and stops the build:
| Compiling 1 source files
| Compiling 1 source files.
| Compiling 1 source files..
| Compiling 1 source files...
| Compiling 1 source files....
| Compiling 1 source files.....
| Compiling 16 GSP files for package [ProjectName]
| Compiling 16 GSP files for package [ProjectName].
| Error Compilation error: encoded string too long: 108421 bytes
Grails doesn't give me any other information about which GSP or line has the problem. Has anyone seen this happen?
Here are the Grails stats; I would say it's a fairly small project:
+----------------------+-------+-------+
| Name | Files | LOC |
+----------------------+-------+-------+
| Controllers | 6 | 624 |
| Domain Classes | 6 | 109 |
| Java Helpers | 1 | 96 |
| Unit Tests | 12 | 565 |
| Scripts | 1 | 4 |
+----------------------+-------+-------+
| Totals | 26 | 1398 |
+----------------------+-------+-------+
It seems this is a Grails bug in versions prior to 2.3.7; it is fixed in 2.3.7 and above.
You have two options: upgrade, or follow the steps below.
Find all the GSP pages with a file size greater than 64K.
Add <% /* comment to break the static gsp block */ %> to the middle of your static pages (add it after a closing HTML tag, for example after </p>).
This makes Grails treat the page as two chunks, which allows it to be processed.
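The first step (finding the large GSP pages) can be sketched with a short script. This is only an illustration: oversized_gsps is a hypothetical helper name, and file size is just a rough proxy, since what matters is the size of a single static block rather than the whole file.

```python
from pathlib import Path

LIMIT_BYTES = 64 * 1024  # threshold taken from the steps above ("greater than 64K")


def oversized_gsps(root, limit=LIMIT_BYTES):
    """Return (path, size) pairs for .gsp files larger than limit bytes, largest first."""
    sizes = [(p, p.stat().st_size) for p in Path(root).rglob("*.gsp") if p.is_file()]
    too_big = [entry for entry in sizes if entry[1] > limit]
    return sorted(too_big, key=lambda entry: entry[1], reverse=True)
```

Calling oversized_gsps("grails-app/views") from the project root (assuming the standard Grails layout) returns the candidate pages, which can then be split with the comment trick described above.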
I've seen this before. Exactly what @tim_yates commented! I refactored some GSPs (using includes, for example) and all was good again. Also, doing a little research about this I found some interesting things about DataOutputStream.java: its writeUTF method rejects any string whose encoded form is longer than 65535 bytes, and its error message has the same "encoded string too long" wording as the one above.
Maybe this can also help you.
Cheers!
I never found out what the problem was; all I did was move all the needed files to a brand new project, and the error disappeared!
