I have successfully set up a Neo4j 3-instance HA cluster using version 2.0.2 Enterprise, but I'm having a problem using the built-in backup script (../bin/neo4j-backup).
I manually run:
./bin/neo4j-backup -from ha://10.6.10.48:5001 -to /usr/local/neo4j/backup
...on the master, and it works fine the first time, dumping the data into ../neo4j/backup.
Subsequent tries with the same command yield only this on the command line:
Could not find backup server in cluster neo4j.ha at 10.6.10.48:5001, operation timed out
and this in messages.log:
2014-04-29 17:08:00.919+0000 DEBUG [o.n.c.p.c.ClusterState$4]: ClusterState: entered-[configurationRequest]->entered from:cluster://10.6.10.48:5002 conversation-id:-1/8# payload:-1:cluster://0.0.0.0:5002/?name=Backup
2014-04-29 17:08:00.922+0000 ERROR [o.n.c.c.NetworkSender]: Receive exception:
java.nio.channels.ClosedChannelException: null
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.cleanUpWriteBuffer(AbstractNioWorker.java:409) ~[netty-3.6.3.Final.jar:na]
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.writeFromUserCode(AbstractNioWorker.java:127) ~[netty-3.6.3.Final.jar:na]
at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink.eventSunk(NioClientSocketPipelineSink.java:83) ~[netty-3.6.3.Final.jar:na]
at org.jboss.netty.channel.Channels.write(Channels.java:725) ~[netty-3.6.3.Final.jar:na]
at org.jboss.netty.handler.codec.oneone.OneToOneEncoder.doEncode(OneToOneEncoder.java:71) ~[netty-3.6.3.Final.jar:na]
at org.jboss.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:59) ~[netty-3.6.3.Final.jar:na]
at org.jboss.netty.channel.Channels.write(Channels.java:704) ~[netty-3.6.3.Final.jar:na]
at org.jboss.netty.channel.Channels.write(Channels.java:671) ~[netty-3.6.3.Final.jar:na]
at org.jboss.netty.channel.AbstractChannel.write(AbstractChannel.java:248) ~[netty-3.6.3.Final.jar:na]
at org.neo4j.cluster.com.NetworkSender$2.run(NetworkSender.java:266) ~[neo4j-cluster-2.0.2.jar:2.0.2]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_15]
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) ~[na:1.7.0_15]
at java.util.concurrent.FutureTask.run(FutureTask.java:166) ~[na:1.7.0_15]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_15]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_15]
at java.lang.Thread.run(Thread.java:722) [na:1.7.0_15]
(The exception repeats every 5 seconds for a while.)
Relevant neo4j.properties values:
online_backup_enabled=true
online_backup_server=127.0.0.1:6362
ha.cluster_server=10.6.10.48:5001
I've checked all firewall settings for all instances.
Any help would be appreciated!
Running an online backup against single://<host> is in general much easier than against ha://<host>, and from a functional point of view ha:// has no advantage.
Also note that online_backup_server=127.0.0.1:6362 binds the backup service to the loopback interface only. So you might change it to
online_backup_server=10.6.10.48:6362
and then run
./bin/neo4j-backup -from single://10.6.10.48:6362 -to /usr/local/neo4j/backup
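A minimal sketch of the resulting workflow (addresses taken from the question; the instance needs a restart after the config change, and repeated runs against the same target directory should be applied as incremental backups):

# conf/neo4j.properties (restart the instance afterwards)
online_backup_enabled=true
online_backup_server=10.6.10.48:6362

# the first run produces a full backup; later runs into the same
# directory apply incremental updates
./bin/neo4j-backup -from single://10.6.10.48:6362 -to /usr/local/neo4j/backup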
Related
Even after I remove the GitHub plugin, Jenkins is not working. It is running under the Tomcat 7 server.
hudson.util.HudsonFailedToLoad: org.jvnet.hudson.reactor.ReactorException: java.io.IOException: Unable to read /usr/share/tomcat7/.jenkins/config.xml
at hudson.WebAppMain$3.run(WebAppMain.java:237)
Caused by: org.jvnet.hudson.reactor.ReactorException: java.io.IOException: Unable to read /usr/share/tomcat7/.jenkins/config.xml
at org.jvnet.hudson.reactor.Reactor.execute(Reactor.java:269)
at jenkins.InitReactorRunner.run(InitReactorRunner.java:44)
at jenkins.model.Jenkins.executeReactor(Jenkins.java:914)
at jenkins.model.Jenkins.<init>(Jenkins.java:813)
at hudson.model.Hudson.<init>(Hudson.java:83)
at hudson.model.Hudson.<init>(Hudson.java:79)
at hudson.WebAppMain$3.run(WebAppMain.java:225)
Caused by: java.io.IOException: Unable to read /usr/share/tomcat7/.jenkins/config.xml
at hudson.XmlFile.unmarshal(XmlFile.java:165)
at jenkins.model.Jenkins$16.run(Jenkins.java:2642)
at org.jvnet.hudson.reactor.TaskGraphBuilder$TaskImpl.run(TaskGraphBuilder.java:169)
at org.jvnet.hudson.reactor.Reactor.runTask(Reactor.java:282)
at jenkins.model.Jenkins$7.runTask(Jenkins.java:903)
at org.jvnet.hudson.reactor.Reactor$2.run(Reactor.java:210)
at org.jvnet.hudson.reactor.Reactor$Node.run(Reactor.java:117)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
If you see a Java stack trace, you can usually find what the problem is by looking at the "Caused by:" line. Here is yours:
Caused by: org.jvnet.hudson.reactor.ReactorException: java.io.IOException: Unable to read /usr/share/tomcat7/.jenkins/config.xml
Jenkins is unable to read its configuration file. This could be due to one of:
- The file doesn't exist
- The file isn't readable by the user that runs the Jenkins process (note that it needs to be writable as well)
- The JENKINS_HOME environment variable isn't set correctly for the user that runs the Jenkins process
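A quick way to check all three, assuming Jenkins runs as the tomcat7 user (adjust the user and paths to your setup):

ls -ld /usr/share/tomcat7/.jenkins /usr/share/tomcat7/.jenkins/config.xml   # does it exist? who owns it?
sudo -u tomcat7 head /usr/share/tomcat7/.jenkins/config.xml                 # readable by the Jenkins user?
sudo chown -R tomcat7:tomcat7 /usr/share/tomcat7/.jenkins                   # fix ownership if that was the problem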
I am using Jenkins with the Xvnc plugin to run acceptance tests on Firefox on a CentOS slave. I have limited the display numbers to 2-4, since there will be at most 3 instances of testing that need a display. The tests and the plugin worked fine until Jenkins had to be restarted a few times due to issues in other builds. The following error now occurs whenever the build tries to run:
FATAL: All available display numbers are allocated or blacklisted.
allocated: [2, 3, 4]
blacklisted: []
java.lang.RuntimeException: All available display numbers are allocated or blacklisted.
allocated: [2, 3, 4]
blacklisted: []
at hudson.plugins.xvnc.DisplayAllocator.doAllocate(DisplayAllocator.java:59)
at hudson.plugins.xvnc.DisplayAllocator.allocate(DisplayAllocator.java:49)
at hudson.plugins.xvnc.Xvnc.doSetUp(Xvnc.java:99)
at hudson.plugins.xvnc.Xvnc.setUp(Xvnc.java:89)
at jenkins.tasks.SimpleBuildWrapper.setUp(SimpleBuildWrapper.java:146)
at hudson.model.Build$BuildExecution.doRun(Build.java:156)
at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:537)
at hudson.model.Run.execute(Run.java:1741)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at hudson.model.ResourceController.execute(ResourceController.java:98)
at hudson.model.Executor.run(Executor.java:381)
I checked a working build where I restarted Jenkins without manually stopping each job and found a potential cause:
Terminating xvnc.
FATAL: hudson.remoting.Channel$OrderlyShutdown
hudson.remoting.RequestAbortedException: hudson.remoting.Channel$OrderlyShutdown
at hudson.remoting.Request.abort(Request.java:296)
at hudson.remoting.Channel.terminate(Channel.java:815)
at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1034)
at hudson.remoting.Channel$2.handle(Channel.java:484)
at hudson.remoting.AbstractByteArrayCommandTransport$1.handle(AbstractByteArrayCommandTransport.java:61)
at org.jenkinsci.remoting.nio.NioChannelHub$2.run(NioChannelHub.java:594)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
at ......remote call to jenkinstest.build.thoughtwire.com.test(Native Method)
at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1361)
at hudson.remoting.Request.call(Request.java:171)
at hudson.remoting.Channel.call(Channel.java:752)
at hudson.Launcher$RemoteLauncher.kill(Launcher.java:954)
at hudson.plugins.xvnc.Xvnc$DisposerImpl.tearDown(Xvnc.java:183)
at jenkins.tasks.SimpleBuildWrapper$EnvironmentWrapper.tearDown(SimpleBuildWrapper.java:175)
at hudson.model.Build$BuildExecution.doRun(Build.java:173)
at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:537)
at hudson.model.Run.execute(Run.java:1741)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at hudson.model.ResourceController.execute(ResourceController.java:98)
at hudson.model.Executor.run(Executor.java:381)
Caused by: hudson.remoting.Channel$OrderlyShutdown
at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1034)
at hudson.remoting.Channel$2.handle(Channel.java:484)
at hudson.remoting.AbstractByteArrayCommandTransport$1.handle(AbstractByteArrayCommandTransport.java:61)
at org.jenkinsci.remoting.nio.NioChannelHub$2.run(NioChannelHub.java:594)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: Command close created at
at hudson.remoting.Command.<init>(Command.java:56)
at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:1028)
at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:1026)
at hudson.remoting.Channel.close(Channel.java:1109)
at hudson.remoting.Channel.close(Channel.java:1092)
at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1033)
at hudson.remoting.Channel$2.handle(Channel.java:484)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:60)
It seems like the job did not close properly and the Xvnc plugin did not get a chance to deallocate the display. I made sure the processes and tests in the slave are properly terminated and nothing is running.
The core issue here is that display numbers 2, 3, and 4 are now permanently allocated and cannot be reused even though no builds are running. If the slave (TEST) is mirrored (TEST2) then TEST2 can use display 2, 3, and 4 but TEST cannot. I have tried reinstalling the plugin but the numbers stay allocated and linked to TEST.
Does anyone know of a way to clear the list of allocated display numbers?
Is this a bug with the plugin?
Is there a way to prevent display numbers from staying allocated if say Jenkins suddenly dies while jobs are running?
The allocated display numbers are saved in the hudson.plugins.xvnc.Xvnc.xml file on the Jenkins master (under the Jenkins home directory). To clear the numbers, you need to stop Jenkins, clean up <allocatedNumbers> in that XML file, and start the Jenkins server again.
It is important to edit the file after you stop the Jenkins server, since Jenkins saves the current numbers when it stops.
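The element names below are illustrative (the exact layout varies between plugin versions), but after the edit each allocator in the file should end up with an empty allocation list, e.g.:

<allocators>
  <entry>
    <string>TEST</string>
    <hudson.plugins.xvnc.DisplayAllocator>
      <!-- previously: <int>2</int> <int>3</int> <int>4</int> -->
      <allocatedNumbers/>
      <blacklistedNumbers/>
    </hudson.plugins.xvnc.DisplayAllocator>
  </entry>
</allocators>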
This is a Groovy script I created to clean up the Xvnc display numbers without stopping Jenkins (run it from the Jenkins Script Console). Note that it might also clean up the numbers of jobs that are still running.
https://github.com/sdiepend/jenkins-monitoring/blob/master/cleanXvncDisplayNumbers.groovy
import jenkins.*
import jenkins.model.Jenkins

Jenkins jenkins = Jenkins.getActiveInstance()

xvncDescriptor = jenkins.getDescriptorByType(hudson.plugins.xvnc.Xvnc.DescriptorImpl.class)
xvncDescriptor.allocators.each {
    allocator = it.value
    // collect() copies the list so that numAlloc is an entirely new list and not
    // just a reference to the same list object; otherwise removing entries while
    // iterating would throw a ConcurrentModificationException
    numAlloc = allocator.allocatedNumbers.collect()
    numAlloc.each {
        allocator.allocatedNumbers.remove(it)
    }
}
This question is similar to:
java.lang.IllegalStateException:Could not find backup for factory javax.faces.application.ApplicationFactory
But unfortunately that is not my case. I have richfaces-5.0.0.Alpha1.jar inside my WAR and jboss-jsf-api_2.1_spec-2.0.2.Final.jar in the Tomcat lib folder. Nothing else. I don't use MyFaces and never did.
The log is the following:
Grave: Application was not properly initialized at startup, could not find Factory: javax.faces.context.FacesContextFactory. Attempting to find backup.
dic 12, 2013 1:41:41 PM org.apache.catalina.core.ApplicationContext log
Grave: StandardWrapper.Throwable
java.lang.IllegalStateException: Could not find backup for factory javax.faces.context.FacesContextFactory.
at javax.faces.FactoryFinder$FactoryManager.getFactory(FactoryFinder.java:1008)
at javax.faces.FactoryFinder.getFactory(FactoryFinder.java:343)
at javax.faces.webapp.FacesServlet.init(FacesServlet.java:302)
at org.apache.catalina.core.StandardWrapper.initServlet(StandardWrapper.java:1266)
at org.apache.catalina.core.StandardWrapper.loadServlet(StandardWrapper.java:1185)
at org.apache.catalina.core.StandardWrapper.load(StandardWrapper.java:1080)
at org.apache.catalina.core.StandardContext.loadOnStartup(StandardContext.java:5027)
at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5314)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1559)
at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1549)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
"jboss-jsf-api_2.1_spec-2.0.2.Final.jar" why that one and not the proper JSF jar you can download from javaserverfaces.java.net? I think you have only the API and no JSF implementation here.
(Don't know how to convert a comment into an answer, so I just reposted).
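For example, with Maven you could drop the spec jar and pull in Mojarra instead, API and implementation together (the version below is just one JSF 2.1 release):

<dependency>
    <groupId>com.sun.faces</groupId>
    <artifactId>jsf-api</artifactId>
    <version>2.1.26</version>
</dependency>
<dependency>
    <groupId>com.sun.faces</groupId>
    <artifactId>jsf-impl</artifactId>
    <version>2.1.26</version>
</dependency>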
I installed the Neo4j 2.0.0-M06 version on my Ubuntu PC. The service worked fine, and I could use the new web browser perfectly.
Then, I used the sample Java project (https://github.com/neo4j/neo4j/blob/2.0.0-M06/community/embedded-examples/src/main/java/org/neo4j/examples/EmbeddedNeo4jWithIndexing.java) to connect to the DB the embedded way and add some nodes. (Btw, I'm sure I stopped the neo4j service before launching the Java application.)
I changed the number of nodes added by the program to 100,000, and the application crashed on exceeding the heap size (GC overhead limit).
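(For what it's worth, the linked example creates all nodes inside a single transaction, so scaling it to 100,000 nodes keeps all of that uncommitted state on the heap at once. Below is a minimal sketch of batching the inserts instead, assuming the Neo4j 2.0 embedded API; the store path and property name are illustrative:)

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;

public class BatchedInsert
{
    public static void main( String[] args )
    {
        // path is illustrative; point it at your own store directory
        GraphDatabaseService db = new GraphDatabaseFactory()
                .newEmbeddedDatabase( "/tmp/neo4j-batched" );
        final int total = 100_000;
        final int batchSize = 10_000;
        for ( int start = 0; start < total; start += batchSize )
        {
            // each batch gets its own transaction, so the uncommitted
            // state never grows beyond batchSize nodes
            try ( Transaction tx = db.beginTx() )
            {
                for ( int i = start; i < Math.min( start + batchSize, total ); i++ )
                {
                    Node node = db.createNode();
                    node.setProperty( "id", i );
                }
                tx.success();
            }
        }
        db.shutdown();
    }
}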
Now, when trying to launch Neo4j I get a startup error:
2013-11-01 09:53:13.806+0000 DEBUG [API] Failed to start Neo Server on port [7474]
2013-11-01 10:00:52.865+0000 INFO [API] Setting startup timeout to: 120000ms based on -1
2013-11-01 10:00:52.998+0000 DEBUG [API]
org.neo4j.server.ServerStartupException: Starting Neo4j Server failed: org/neo4j/helpers/Settings
at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:193) ~[neo4j-server-2.0.0-M06.jar:2.0.0-M06]
at org.neo4j.server.Bootstrapper.start(Bootstrapper.java:87) [neo4j-server-2.0.0-M06.jar:2.0.0-M06]
at org.neo4j.server.Bootstrapper.main(Bootstrapper.java:50) [neo4j-server-2.0.0-M06.jar:2.0.0-M06]
Caused by: java.lang.NoClassDefFoundError: org/neo4j/helpers/Settings
at org.neo4j.shell.ShellSettings.<clinit>(ShellSettings.java:42) ~[neo4j-shell-2.0.0-M06.jar:2.0.0-M06]
at org.neo4j.server.database.CommunityDatabase.getDbTuningPropertiesWithServerDefaults(CommunityDatabase.java:106) ~[neo4j-server-2.0.0-M06.jar:2.0.0-M06]
at org.neo4j.server.enterprise.EnterpriseDatabase.start(EnterpriseDatabase.java:89) ~[neo4j-server-enterprise-2.0.0-M06.jar:2.0.0-M06]
at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:141) ~[neo4j-server-2.0.0-M06.jar:2.0.0-M06]
... 2 common frames omitted
Caused by: java.lang.ClassNotFoundException: org.neo4j.helpers.Settings
at java.net.URLClassLoader$1.run(URLClassLoader.java:366) ~[na:1.7.0_45]
at java.net.URLClassLoader$1.run(URLClassLoader.java:355) ~[na:1.7.0_45]
at java.security.AccessController.doPrivileged(Native Method) ~[na:1.7.0_45]
at java.net.URLClassLoader.findClass(URLClassLoader.java:354) ~[na:1.7.0_45]
at java.lang.ClassLoader.loadClass(ClassLoader.java:425) ~[na:1.7.0_45]
at java.lang.ClassLoader.loadClass(ClassLoader.java:358) ~[na:1.7.0_45]
... 6 common frames omitted
2013-11-01 10:00:53.000+0000 DEBUG [API] Failed to start Neo Server on port [7474]
I found the problem with the jar files. Unfortunately, after solving the jar file problem, I had to reinstall Neo4j for the service to work again.
I've set up an HA cluster using one standalone instance and one embedded instance.
When the embedded instance is the slave and I run a longer transaction (commit time about 30s), it throws:
org.neo4j.graphdb.TransactionFailureException: Unable to commit transaction
at org.neo4j.kernel.TopLevelTransaction.finish(TopLevelTransaction.java:143)
... 4 more
Caused by: org.neo4j.com.TransactionNotPresentOnMasterException: Transaction RequestContext[session: 1372752815336, ID:3, eventIdentifier:0, [lucene-index/1311102, nioneodb/7397657]] has either timed out on the master or was not started on this master. There may have been a master switch between the time this transaction started and up to now. This transaction cannot continue since the state from the previous master isn't transferred.
at org.neo4j.kernel.ha.com.master.MasterImpl.suspendOtherAndResumeThis(MasterImpl.java:280)
at org.neo4j.kernel.ha.com.master.MasterImpl.finishTransaction(MasterImpl.java:457)
at org.neo4j.kernel.ha.com.HaRequestType18$14.call(HaRequestType18.java:188)
at org.neo4j.kernel.ha.com.HaRequestType18$14.call(HaRequestType18.java:183)
at org.neo4j.com.Server$4.run(Server.java:559)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
As stated in the exception, I've checked whether the standalone instance loses its master state, but it doesn't. So no master switch occurs. The instances are running on the same machine, so a network error can't be the cause.
When the embedded instance is master or not in any cluster, the transaction succeeds.
The code generating this error is quite complex, so I can't put an example here at the moment.