Neo4j 2.0.4 browser cannot query large datasets - neo4j

Whenever I try to run cypher queries in Neo4j browser 2.0 on large (anywhere from 3 to 10GB) batch-imported datasets, I receive an "Unknown Error." Then Neo4j server stops responding, and I need to exit out using Task Manager. Prior to this operation, the server shuts down quickly and easily. I have no such issues with smaller batch-imported datasets.
I work on a Win 7 64bit computer, using the Neo4j browser. I have adjusted the .properties file to allow for much larger memory allocations. I have configured my JVM heap to 12g, which should be fine for 64bit JDK. I just recently doubled my RAM, which I thought would fix the issue.
My CPU usage is pegged. I have the logs enabled but I don't know where to find them.
I really like the visualization capabilities of the 2.0.4 browser, does anyone know what might be going wrong?

Your query is taking a long time, and the web browser interface reports "Unknown Error" after a certain timeout period. The query is still running, but you won't see the results in the browser. This drove me nuts too when it first happened to me. If you run the query in the neo4j shell you can verify whether or not this is the problem, because the shell won't time out.
Once this timeout occurs, you can find that the whole system becomes quite non-responsive, especially if you re-run the query, because now you have two extremely long queries running in parallel!
Depending on the type of query, you may be able to improve performance. Sometimes it's as simple as limiting the number of returned nodes (in cases where you only need to find one node or path).
Hope this helps.
Grace and peace,
Jim

Related

Neo4J Memory tuning having little effect

I am currently running some simple cypher queries (count etc) on a large dataset (>10G) and am having some issues with tuning NE04J.
The machine running the queries has 4TB of ram, 160 cores and is running Ubuntu 14.04/neo4j version 2.3. Originally I left all the settings as default as it is stated that free memory will be dynamically allocated as required. However, as the queries are taking several minutes to complete I assumed this was not the case. As such I have set various combinations of the following parameters within the neo4j-wrapper.conf:
wrapper.java.initmemory=1200000
wrapper.java.maxmemory=1200000
dbms.memory.heap.initial_size=1200000
dbms.memory.heap.max_size=1200000
dbms.jvm.additional=-XX:NewRatio=1
and the following within neo4j.properties:
use_memory_mapped_buffers=true
neostore.nodestore.db.mapped_memory=50G
neostore.relationshipstore.db.mapped_memory=50G
neostore.propertystore.db.mapped_memory=50G
neostore.propertystore.db.strings.mapped_memory=50G
neostore.propertystore.db.arrays.mapped_memory=1G
following every guide/Stackoverflow post I could find on the topic, but I seem to have exhausted the available material with little effect.
I am running queries through the shell using the following command neo4j-shell -c < "queries/$1.cypher", but have also tried explicitly passing the conf files with -config $NEO4J_HOME/conf/neo4j-wrapper.conf (restarting the sever everytime I make a change).
I imagine that I have missed something silly which is causing the issue, as there are many reports of neo4j working well with data of this size, but cannot think what it could be. As such any help would be greatly appreciated.
Type :SCHEMA in your neo4j browser to show if you have indexes.
Share a couple of your queries.
In the neo4j.properties file, you need to set the dbms.pagecache.memory setting to about 1.5x the size of your database files. In your example, you can set it to 15g

How to debug Neo4J stalls?

I have a Neo4J server running that periodically stalls out for 10s of seconds. The web frontend will say it's "disconnected" with the red warning bar at the top, and a normally instant REST query in my application will apparently hang, until the stall ends and then everything returns to normal. The web frontend becomes usable and my REST query completes fine.
Is there any way to debug what is happening during one of these stall periods? Can you get a list of currently running queries? Or a list of what hosts are connected to the server? Or any kind of indication of server load?
Most likely JVM garbage collection kicking in because you haven't allocated enough heap space.
There's a number of ways to debug this. You can for example enable GC logging (uncomment appropriate lines in neo4j-wrapper.conf), or use a profiler (e.g. YourKit) to see what's going on and why the pauses.

How to configure Neo4j to run in a minimal memory environment?

For demo purposes, I am running Neo4j in a low memory environment -- A laptop with 4GB of RAM, 1644MB is use for video memory, leaving only 2452 MB available for use.. It's also running SQL Server, our WCF services, and our clients.. So there's little memory for Neo4j.
I'm running LOAD CSV cypher scripts via REST from a C# service. There are more than 20 scripts, and theyt work well in a server environment. I've written code to paginate, so that they run in smaller batches. I've reduced the batch size very low ( 25 csv rows ) and a given script may do 300 batches, but I continue to get "Java heap space" errors at some point.
I've tried configuring Neo4j with a relatively large heap space ( 640MB ) which is all the available RAM size plus setting the cache_type to none, and it gets much further before I get the java heap space error. What I don't understand is in that case, why does it grow that much? Also until I restart the neo4j service, I get these java heap space errors quickly. The batch size doesn't seem to impact how much memory is used appreciably.
However, after doing that, and I run the application with these settings, the query performance becomes very slow due to the cache settings.
I am running this on a Windows 7 laptop with 4G RAM -- using Neo4j 2.2.1 Community Edition.
Thoughts?
Perhaps you can share your LOAD CSV statement and the other queries you run.
I think you just run into this:
http://markhneedham.com/blog/2014/10/23/neo4j-cypher-avoiding-the-eager/
So PROFILE or EXPLAIN your queries and make it not to use that much intermediate state. We can help if you share your statements.
And you should use PERIODIC COMMIT 100.
Something like:
heap=512M
dbms.pagecache.memory=200M
keep_logical_logs=false
cache_type=none
http://console.neo4j.org runs neo4j in memory putting up to 50 instances in a single gigabyte of memory. So it should be doable.

Performance problems jasper reports and grails/groovy

I am expericencing heavy performance problems with generating PDFs while using Jasper Reports in my grails application. I am invoking the jasperService:
def reportDef = jasperService.buildReportDefinition(parameter, LocaleContextHolder.getLocale(), [data: emptyData])
Running in Jboss several times, performance is good. After X hours, performance is 100+ times worse than after the start of Jboss... Response time is changing from 7-12 seconds to several minutes for creating a PDF with one single page. I am sure, that the performance lag is within this invocation, because I have added time measurements around it. As the report data is passed within the parameters, I can exclude also data base connection issues.
I have analyzed the HEAP, but it is used ~50% and not changing much during PDF creation. Overall memory is also not fully used.
I have analyzed the PermGen, but it is also far from being full.
The CPU ist permanently at 100% during creation, which is ok, knowing that PDF creation is very CPU consuming. I have ensured that no other process is holding the PDF creation up, 1st by restarting the process several times and measuring no difference, so I can exclude external interruption and 2nd) knowing that performance is much better if JBoss is restarted.
Due to the facts, I have started to analyze the JBoss itself by analyzing the Thread dumps while running the PDF creation thread. I see that nothing else is running (except the thread dumping thread), neither when it is slow nor fast after restart. I can just see that in several Thread dumps Groovy is making several AST transformations which is not strange for Groovy...
Now, I am despaired. HEAP/PermGen is ok, CPU is ok. What the hell is Jasper Reports / Grails doing?
Maybe someone has made similar experiences or an idea for the root cause? Is there something which needs/should to be cleaned up in Jasper Reports?
EDIT: My further analysis yield to the unproofed but certain outcome that JBoss 7.1.1 (latest stable) is the root cause. After installing the app on a Tomcat, everything runs smoothly, also after several days. I'll keep this open. Maybe someone has made same experience and likes to post it...? Otherwise, I will close it with this solution. I will maybe test my app on earlier versions of Jboss or 7.2/7.3.
The solution was that we haven't perceived that JBoss was partially ignoring our Log4J configuration and was massively logging into the server.log which we were not monitoring. Jasper and Grails plugins were writing dozens of MB for each PDF generation into the log file. After removing these log inserts, performance was good again.

Mysterious time out on .NET MVC application with SQL Server

I have a very peculiar problem and I'm looking for suggestions that might help me get to the bottom of it.
I have an application in .NET 3.5 (MVC3) on a SQL Server 2008 R2 database.
Locally and on two other servers, it runs fine. But on the live server there is a stored procedure that always times out after 30 seconds.
If I run the stored procedure on the database, it takes a couple of seconds. But the if the stored procedure is received by the application, then profiler says it took over 30 seconds.
The same query the profiler receives, runs immediately if we run it directly on the DB.
Furthermore, the same problem doesn't occur on any of the other 3 local servers.
As you can understand, it's driving me nuts and I don't even have a clue how to diagnose this.
The even logs just show the timeout as a warning.
Has anyone had anything like this before and where could I start looking for a fix?
Many thanks
You probably have some locking taking place in your application that doesn't occur when running the query on the server.
To test this run your query in your application using READ UNCOMMITTED or the NOLOCK hint. If it works you need to check your sequence of calls or check to see whether your isolation level isn't too aggressive.
These can be tricky to nail down.

Resources