Store Lock Issue on Neo4j - neo4j

I am getting the following exception:
java.lang.RuntimeException: Error starting org.neo4j.kernel.EmbeddedGraphDatabase
at org.neo4j.kernel.InternalAbstractGraphDatabase.run(InternalAbstractGraphDatabase.java:335)
at org.neo4j.kernel.EmbeddedGraphDatabase.<init>(EmbeddedGraphDatabase.java:59)
at org.neo4j.graphdb.factory.GraphDatabaseFactory.newDatabase(GraphDatabaseFactory.java:108)
at org.neo4j.graphdb.factory.GraphDatabaseFactory$1.newDatabase(GraphDatabaseFactory.java:95)
at org.neo4j.graphdb.factory.GraphDatabaseBuilder.newGraphDatabase(GraphDatabaseBuilder.java:176)
at org.neo4j.graphdb.factory.GraphDatabaseFactory.newEmbeddedDatabase(GraphDatabaseFactory.java:67)
at com.tpgsi.mongodb.dataPollingWithOplog.ORBCreateLink.main(ORBCreateLink.java:62)
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.kernel.StoreLockerLifecycleAdapter@5b6ca687' was successfully initialized, but failed to start. Please see attached cause exception.
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:513)
at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:115)
at org.neo4j.kernel.InternalAbstractGraphDatabase.run(InternalAbstractGraphDatabase.java:331)
... 6 more
Caused by: org.neo4j.kernel.StoreLockException: Unable to obtain lock on store lock file: /home/aps/neo4j-community-2.2.3/CompleteTest/store_lock. Please ensure no other process is using this database, and that the directory is writable (required even for read-only access)
at org.neo4j.kernel.StoreLocker.checkLock(StoreLocker.java:78)
at org.neo4j.kernel.StoreLockerLifecycleAdapter.start(StoreLockerLifecycleAdapter.java:44)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:507)
... 8 more
Caused by: java.nio.channels.OverlappingFileLockException
at sun.nio.ch.SharedFileLockTable.checkList(FileLockTable.java:255)
at sun.nio.ch.SharedFileLockTable.add(FileLockTable.java:152)
at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:1087)
at java.nio.channels.FileChannel.tryLock(FileChannel.java:1154)
at org.neo4j.io.fs.StoreFileChannel.tryLock(StoreFileChannel.java:135)
at org.neo4j.io.fs.FileLock.wrapFileChannelLock(FileLock.java:38)
at org.neo4j.io.fs.FileLock.getOsSpecificFileLock(FileLock.java:99)
at org.neo4j.io.fs.DefaultFileSystemAbstraction.tryLock(DefaultFileSystemAbstraction.java:85)
at org.neo4j.kernel.StoreLocker.checkLock(StoreLocker.java:74)
I have one process that creates the graph and, once it completes, a second process that creates a few more relationships on top of it. I get the above exception when running the second process after the first one has finished. I checked that the Neo4j data directory is not being used by any other process, but I still get the lock issue.
I ran the following code once to create the graph:
static GraphDatabaseFactory dbFactory = null;
static GraphDatabaseService graphdb = null;
static {
    dbFactory = new GraphDatabaseFactory();
    graphdb = dbFactory.newEmbeddedDatabase(com.tpgsi.mongodb.dataPollingWithOplog.CommonConstants.NEO4J_DATA_DIRECTORY);
}

try {
    Transaction tx = graphdb.beginTx();
    try {
        // creating Node and relationships
        tx.success();
    } catch (Exception e) {
        tx.failure();
        e.printStackTrace();
    } finally {
        tx.close();
    }
} catch (Exception e) {
    e.printStackTrace();
}
I created the graphdb object as a global variable and use it everywhere. I only close the transaction. I am not calling registerShutdownHook() or shutdown() on the graphdb object, because I am running this in a Storm environment with multiple executors: if I shut it down, I would have to create it again for every thread, which is not good either.
I suspect that not shutting down the graphdb could be the reason.
If I run the same code with one executor it works fine, but with multiple executors I get the lock issue.
Can anybody tell me what I have to do to get rid of it?

A graph database instance is thread-safe, so you can use it across all bolts; just make it accessible as a global variable.
Only one graph database instance at a time can access a store directory.
Otherwise, create a service that your Storm bolts access via another protocol, e.g. HTTP or binary.
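
To illustrate the innermost cause in the stack trace: the JDK throws OverlappingFileLockException when the same JVM already holds an overlapping lock on the file, whereas a lock held by a different process normally makes tryLock() return null. So the trace in the question may suggest that the store was opened twice inside one JVM. The following is a minimal sketch using only the JDK; the class name StoreLockDemo and the temporary file are made up for the illustration and have nothing to do with Neo4j's own code.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class, not part of Neo4j.
final class StoreLockDemo {

    /** Returns true if a second tryLock() in the same JVM is rejected as overlapping. */
    static boolean secondLockOverlaps() throws IOException {
        // A throwaway file standing in for a store_lock file.
        Path lockFile = Files.createTempFile("store_lock_demo", ".tmp");
        try (FileChannel first = FileChannel.open(lockFile, StandardOpenOption.WRITE);
             FileChannel second = FileChannel.open(lockFile, StandardOpenOption.WRITE)) {
            FileLock held = first.tryLock(); // first "database instance" takes the lock
            try {
                second.tryLock();            // second instance in the same JVM tries again
                return false;                // not expected with default JVM settings
            } catch (OverlappingFileLockException e) {
                return true;                 // same exception as in the stack trace
            } finally {
                if (held != null) {
                    held.release();
                }
            }
        } finally {
            Files.deleteIfExists(lockFile);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("second lock rejected: " + secondLockOverlaps());
    }
}
```

If this reasoning applies to your case, look for a second place in the same JVM that calls newEmbeddedDatabase() on the same directory.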

I faced the same issue. My mistake was that I ran the program while the Neo4j server was still running. Once I stopped the server, the program ran successfully.

I was running my code with multiple workers and executors in a Storm environment. Because multiple workers were creating multiple graphdb instances, I was getting the store lock exception. I changed the number of workers to 1 and wrote a singleton class to create the graphdb object, so that only one graphdb object exists at a time, and it is a global variable.
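
A singleton like the one described can be sketched with the initialization-on-demand holder idiom. This is only an illustration: ExpensiveResource is a hypothetical stand-in for the GraphDatabaseService, and the real class would call newEmbeddedDatabase() where the stand-in is constructed. Note that a singleton is per JVM; as far as I know each Storm worker is a separate JVM process, which is why the worker count also had to be 1.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical holder class; the names are made up for this sketch.
final class GraphDbHolder {

    /** Stand-in for the real graph database object (assumption, not a Neo4j type). */
    static final class ExpensiveResource {
        static final AtomicInteger CREATED = new AtomicInteger();

        ExpensiveResource() {
            CREATED.incrementAndGet(); // counts how many instances were ever built
        }
    }

    private GraphDbHolder() {
    }

    // The JVM initializes this nested class lazily and exactly once,
    // so no explicit synchronization is needed.
    private static final class Lazy {
        static final ExpensiveResource INSTANCE = new ExpensiveResource();
    }

    static ExpensiveResource get() {
        return Lazy.INSTANCE;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] executors = new Thread[8];
        for (int i = 0; i < executors.length; i++) {
            executors[i] = new Thread(GraphDbHolder::get); // every thread asks for the instance
            executors[i].start();
        }
        for (Thread t : executors) {
            t.join();
        }
        System.out.println("instances created: " + ExpensiveResource.CREATED.get());
    }
}
```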

Related

Impossible (?) NullPointerException - Springframework RabbitMQ, Failed to invoke afterAckCallback

I'm running a Java application that uses RabbitMQ Server 3.8.9, spring-amqp-2.2.10.RELEASE, and spring-rabbit-2.2.10.RELEASE.
My test case does something like the following:
1. Start the RabbitMQ Server
2. Start my Java application
3. Test and validate some functionality on my Java application
4. Gracefully stop my Java application
5. Gracefully stop the RabbitMQ Server
6. Repeat steps 1-5 a few more times
Everything looks fine except sometimes during one of the restarts about 10 minutes into it, I see the following error in my application's logs:
2021-02-05 12:52:46.498 UTC,ERROR,org.springframework.amqp.rabbit.connection.PublisherCallbackChannelImpl,null,rabbitConnectionFactory23,runWorker():1149,Failed to invoke afterAckCallback
java.lang.NullPointerException: null
at org.springframework.amqp.rabbit.connection.PublisherCallbackChannelImpl.lambda$doHandleConfirm$1(PublisherCallbackChannelImpl.java:1027) ~[spring-rabbit.jar:2.2.10.RELEASE]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_181]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_181]
at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_181]
Further analysis doesn't point to anything specific. There are no errors in the RabbitMQ log files, no restarts of the RabbitMQ server, nothing weird in the RabbitMQ logs during the time stamp above.
The code in question:
https://github.com/spring-projects/spring-amqp/blob/v2.2.10.RELEASE/spring-rabbit/src/main/java/org/springframework/amqp/rabbit/connection/PublisherCallbackChannelImpl.java#L1027
My tests are automated and run as part of a CI pipeline. The issue is intermittent and I have had trouble reproducing it locally in my sandbox.
From what I can tell, the functionality of my Java application is unaffected.
Code that creates the RabbitMQ connection factory used everywhere:
final CachingConnectionFactory connectionFactory = new CachingConnectionFactory(HOST_NAME);
connectionFactory.setChannelCacheSize(1);
connectionFactory.setPublisherConfirms(true);
It seems like a concurrency problem, but I'm not sure how to get to the bottom of it. For the most part, we use the RabbitTemplate and other Spring facilities to connect to RabbitMQ.
Anyone in the Spring world with some knowledge in RabbitMQ care to chime in?
Thanks
The code in question looks like this:
finally {
    try {
        if (this.afterAckCallback != null && getPendingConfirmsCount() == 0) {
            this.afterAckCallback.accept(this);
            this.afterAckCallback = null;
        }
    }
    catch (Exception e) {
        this.logger.error("Failed to invoke afterAckCallback", e);
    }
}
There really could be a race condition around the this.afterAckCallback property.
One thread may pass the if() check, and then a different thread sets this.afterAckCallback to null, so the first thread fails with that NPE.
We have to copy its value to a local variable, then check that variable and call accept() on it.
Feel free to raise a GitHub issue against the Spring AMQP project: https://github.com/spring-projects/spring-amqp/issues
The race condition exists because doHandleConfirm(), with its async logic, is called from the loop in processMultipleAck().
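
The local-variable approach described above can be sketched as follows. This is not the Spring AMQP code: ConfirmChannel and fireAfterAck() are hypothetical names, and the getPendingConfirmsCount() check is left out to keep the example small.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical class illustrating the fix; not the actual PublisherCallbackChannelImpl.
final class ConfirmChannel {

    // volatile so that a write from one thread is visible to the others
    private volatile Consumer<ConfirmChannel> afterAckCallback;

    void setAfterAckCallback(Consumer<ConfirmChannel> callback) {
        this.afterAckCallback = callback;
    }

    void fireAfterAck() {
        // Read the field once; later null writes by other threads cannot affect this copy.
        Consumer<ConfirmChannel> callback = this.afterAckCallback;
        if (callback != null) {
            this.afterAckCallback = null;
            callback.accept(this);
        }
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        ConfirmChannel channel = new ConfirmChannel();
        channel.setAfterAckCallback(c -> calls.incrementAndGet());
        channel.fireAfterAck();
        channel.fireAfterAck(); // callback already cleared; nothing happens, no NPE
        System.out.println("callback invocations: " + calls.get());
    }
}
```

The local copy only removes the NullPointerException. Two threads could still both read a non-null callback and both invoke it; if the callback must run at most once, an AtomicReference with getAndSet(null) would be one way to get that.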

HSQL server mode while connection from DatabaseSwingManager throws exception java.sql.SQLTransientConnectionException

I have written Java code that starts HSQLDB in server mode and connects to it:
p.setProperty("server.database.3",
"file:G:/SERVERMODE/soamware;user=soamware;password=123#123");
p.setProperty("server.dbname.3", "soamware");
server.setProperties(p);
server.setLogWriter(null); // can use custom writer
server.setErrWriter(null); // can use custom writer
server.start();
try {
    // Registering the HSQLDB JDBC driver
    Class.forName("org.hsqldb.jdbc.JDBCDriver");
    con = DriverManager.getConnection("jdbc:hsqldb:hsql://ip/soamware;file:G:/SERVERMODE/soamware;user=soamware;password=123#123");
This code works fine in NetBeans with JDK 8 and HSQLDB 2.5.1; however, the console shows that the build has not terminated and is still running. When I connect from DatabaseManagerSwing with the same URL, username and password as in the Java code, it throws the exception mentioned above (java.sql.SQLTransientConnectionException). Please also clarify why my program doesn't exit. I am not adding a "server.shutdownCatalogs(1);" statement at the end because then I cannot perform multiple operations in one session.
Because you are starting the server with only one database, you should set database.0 properties. You shouldn't use the # character at all on a connection string because it has a special meaning. You shouldn't use the file path when connecting to a server database. Use the dbname.0 value only. Edited code below:
p.setProperty("server.database.0", "file:G:/SERVERMODE/soamware;user=soamware;password=123x123");
p.setProperty("server.dbname.0", "soamware");
server.setProperties(p);
server.setLogWriter(null); // can use custom writer
server.setErrWriter(null); // can use custom writer
server.start();
try {
Class.forName("org.hsqldb.jdbc.JDBCDriver");
con = DriverManager.getConnection("jdbc:hsqldb:hsql://localhost/soamware", "soamware", "123x123");

Re-using Bigtable connection with AbstractCloudBigtableTableDoFn

I have a DoFn that extends AbstractCloudBigtableTableDoFn<> in order to send frequent Buffered Mutation requests to Bigtable.
When I run the job in the Cloud, I see repeated log entries at this step of the Dataflow pipeline that look like this:
Opening connection for projectId XXX, instanceId XXX, on data host batch-bigtable.googleapis.com, table admin host bigtableadmin.googleapis.com...
and
Bigtable options: BigtableOptions{XXXXX (lots of option entries here}
The code within the DoFn looks something like this:
@ProcessElement
public void processElement(ProcessContext c)
{
    try
    {
        BufferedMutator mPutUnit = getConnection().getBufferedMutator(TableName.valueOf(TABLE_NAME));
        for (CONDITION)
        {
            // create lots of different rowsIDs
            Put p = new Put(newRowID).addColumn(COL_FAMILY, COL_NAME, COL_VALUE);
            mPutUnit.mutate(p);
        }
        mPutUnit.close();
    } catch (IOException e) { e.printStackTrace(); }
    c.output(0);
}
This DoFn gets called very frequently.
Should I worry that Dataflow tries to re-establish the connection to Bigtable with every call to this DoFn? I was under the impression that inheriting from this class should ensure that a single connection to Bigtable should be re-used across all calls?
"Opening connection for projectId ..." should appear once per worker per AbstractCloudBigtableTableDoFn instance. Can you double check that connections are being opened per call as opposed to per worker?
Limit the number of workers to a handful
In Stackdriver, expand the "Opening connection for projectId" messages and check whether jsonPayload.worker is duplicated across different log messages.
Also, can you detail what version of the client you are using and what version of beam?
Thanks!
To answer your questions...
Yes, you should be worried if Dataflow tries to reestablish a connection to Bigtable with each call to the DoFn. The expected behavior of AbstractCloudBigtableTableDoFn is that a Connection instance is maintained per worker.
No, inheriting from AbstractCloudBigtableTableDoFn does not ensure that a single Connection instance is reused for every call to the DoFn. This is not possible because the DoFn is serialized across multiple physical machines based on the number of workers allocated for the Dataflow job.
First, ensure that there are no connection/authentication issues to Bigtable. Occasionally, Dataflow will need to reestablish a connection to Bigtable. However, doing so for each call to the DoFn is not expected.

Grails Quartz Plugin Freezes on the 8th execution

Environment: Grails 2.0.3, Quartz plugin 1.0-RC2
I have a simple quartz job that reads a value from the database. On the 8th execution, the Job freezes while reading from the database. There is also a web page that retrieves the value from the DB. Once the Job gets into the waiting state, attempting to read the value through the web page also freezes.
Environment: Grails 2.2.0, Quartz plugin 1.0-RC5
I ran into the same problem using quartz-1.0-RC5.
As a workaround I replaced the SessionBinderJobListener class with the one from quartz-0.4.2 (changed only the package to the new one) and the job runs again without any problem. So it looks like the persistenceInterceptor bean does not close the connections or return them to the pool. Maybe there is a problem in org.codehaus.groovy.grails.orm.hibernate.support.HibernatePersistenceContextInterceptor with flush and destroy.
If org.quartz.threadPool.threadCount is much less than maxActive in dataSource properties, the problem does not appear (perhaps each job thread already got its connection) or it will only take longer.
The default size of the datasource connection pool is 8, so you're probably not properly closing the connections to return them to the pool.
I'm seeing the same thing with Quartz plugin version 1.0.1. On the 8th execution both the Job and Tomcat workers freeze. Used withSession and called Hibernate session.disconnect() in the finally {} block of the job. That did the trick.
def execute() {
    def hsession
    try {
        DomainObject.withSession { ses ->
            hsession = ses
            ....
        }
    } catch(Exception e) {
        //log it etc.
    } finally {
        hsession?.disconnect()
    }
}

ServiceController seems to be unable to stop a service

I'm trying to stop a Windows service on a local machine (the service is Topshelf.Host, if that matters) with this code:
serviceController.Stop();
serviceController.WaitForStatus(ServiceControllerStatus.Stopped, timeout);
timeout is set to 1 hour, but the service never actually stops. The strange thing is that in the Services MMC snap-in I first see it in the "Stopping" state, but after a while it reverts to "Started". However, when I try to stop it manually, an error occurs:
Windows could not stop the Topshelf.Host service on Local Computer.
Error 1061: The service cannot accept control messages at this time.
Am I missing something here?
I know I am quite late to answer this, but I faced a similar issue, i.e. the error "The service cannot accept control messages at this time.", and would like to add this as a reference for others.
You can try killing this service using PowerShell (run PowerShell as administrator):
#Get the PID of the required service with the help of the service name, say, service name.
$ServicePID = (get-wmiobject win32_service | where { $_.name -eq 'service name'}).processID
#Now with this PID, you can kill the service
taskkill /f /pid $ServicePID
Either your service is busy processing some big operation or it is in the middle of a state transition, and hence it cannot accept any more input. Think of it as having bitten off more than it can chew.
If you are sure that you haven't fed anything big to it, just go to Task Manager and kill the process for this service, or restart your machine.
I had the exact same problem with a Topshelf-hosted service. The cause was a long service start time, more than 20 seconds. This left the service in a state where it was unable to process further requests.
I was able to reproduce the problem only when the service was started from the command line (net start my_service).
The proper initialization for a Topshelf service with a long start time is the following:
namespace Example.My.Service
{
    using System;
    using System.Threading.Tasks;
    using Topshelf;

    internal class Program
    {
        public static void Main()
        {
            HostFactory.Run(
                x =>
                {
                    x.Service<MyService>(
                        s =>
                        {
                            MyService testServerService = null;
                            s.ConstructUsing(name => testServerService = new MyService());
                            s.WhenStarted(service => service.Start());
                            s.WhenStopped(service => service.Stop());
                            s.AfterStartingService(
                                context =>
                                {
                                    if (testServerService == null)
                                    {
                                        throw new InvalidOperationException("Service not created yet.");
                                    }
                                    testServerService.AfterStart(context);
                                });
                        });
                    x.SetServiceName("my_service");
                });
        }
    }

    public sealed class MyService
    {
        private Task starting;

        public void Start()
        {
            this.starting = Task.Run(() => InitializeService());
        }

        private void InitializeService()
        {
            // TODO: Provide service initialization code.
        }

        [CLSCompliant(false)]
        public void AfterStart(HostControl hostStartedContext)
        {
            if (hostStartedContext == null)
            {
                throw new ArgumentNullException(nameof(hostStartedContext));
            }
            if (this.starting == null)
            {
                throw new InvalidOperationException("Service start was not initiated.");
            }
            while (!this.starting.Wait(TimeSpan.FromSeconds(7)))
            {
                hostStartedContext.RequestAdditionalTime(TimeSpan.FromSeconds(10));
            }
        }

        public void Stop()
        {
            // TODO: Provide service shutdown code.
        }
    }
}
I've seen this issue as well, specifically when a service is start pending and I send it a stop programmatically which succeeds but does nothing. Also sometimes I see stop commands to a running service fail with this same exception but then still actually stop the service. I don't think the API can be trusted to do what it says. This error message explanation is quite helpful...
http://technet.microsoft.com/en-us/library/cc962384.aspx
I ran into a similar issue and found out it was due to one of the services getting stuck in a start-pending, stop-pending, or stopped state.
Rebooting the server or trying to restart the services did not work.
To solve this, I opened Task Manager on the server, located the stuck services in the "Details" tab and killed their processes by ending the task. After that I was able to restart the services without problems.
In brief:
1. Go to Task Manager
2. Click on the "Details" tab
3. Locate your service
4. Right click on it and stop/kill the process.
That is it.
I know this was opened a while ago, but I am missing the Windows command prompt option here, so for the sake of completeness:
Open Task Manager and find the respective process and its PID, e.g. PID = 111.
Alternatively, you can identify it by its executable file, e.g. Image name = notepad.exe.
In the command prompt, use the TASKKILL command.
Example: TASKKILL /F /PID 111 ; TASKKILL /F /IM notepad.exe
I had this exact issue internally when starting and stopping a service using PowerShell (Via Octopus Deploy). The root cause for the service not responding to messages appeared to be related to devs accessing files/folders within the root service install directory via an SMB connection (looking at a config file with notepad/explorer).
If the service gets stuck in that situation then the only option is to kill it and sever the connections using computer management. After that, service was able to be redeployed fine.
May not be the exact root cause, but something we now check for.
I faced a similar issue. This error sometimes occurs because the service can no longer accept control messages, which may be due to a lack of disk space on the server where that service's log file is located.
If this occurs, you can consider the option below as well.
Go to the location where the service exe and its log file are located.
Free up some space.
Kill the service's process via Task Manager.
Start the service.
I just fought this problem while moving code from an old multi partition box to a newer single partition box. On service stop I was writing to D: and since it didn't exist anymore I got a 1061 error. Any long operation during the OnStop will cause this though unless you spin the call off to another thread with a callback delegate.
