Jenkins is spawning a lot of daemon processes and server crashes - jenkins

I've recently installed Jenkins on a cheap VM on Azure. The specs are very low, since I use this server for testing the setup: 1vCPU & 1GB RAM. There will usually only be 1 build at the same time, with a max. of 3, in very rare occassions.
During the build process from Jenkins quite frequently my server would crash completely and stay so for +- 10 - 15 minutes until being able to be used again.
I checked the processes on the server and this is the result:
The full line is like this:
/etc/alternatives/java -Dcom.sun.akuma.Daemon=daemonized -Djava.awt.headless=true -DJENKINS_HOME=/var/lib/jenkins -jar /usr/lib/jenkins/jenkins.war --logfile=/var/log/jenkins/jenkins.log --webroot=/var/cache/jenkins/war --daemon --httpPort=8080 --debug=5 --handlerCountMax=100 --handlerCountMaxIdle=20
It is the same for every single one of those daemons, not a single parameter is different.
Is this normal behavior, and is this the reason why my server is crashing? Or are my specs just too low for Jenkins to run on to?
Thanks in advance!
EDIT:
My jenkins.log file looks pretty normal except for one NullPointerException that keeps coming back up:
2020-01-08 12:43:17.702+0000 [id=148] WARNING h.ExpressionFactory2$JexlExpression#evaluate: Caught exception evaluating: h.filterDescriptors(it,attrs.descriptors) in /configure. Reason: java.lang.NullPointerException: Descriptor list is null for context 'class hudson.model.Hudson' in thread 'Handling GET /configure from 85.154.65.124 : qtp2085857771-148 Jenkins/configure.jelly GlobalLibraries/config.jelly LibraryConfiguration/config.jelly SCMRetriever/DescriptorImpl/config.jelly MultiSCM/DescriptorImpl/config.jelly'
java.lang.NullPointerException: Descriptor list is null for context 'class hudson.model.Hudson' in thread 'Handling GET /configure from 85.154.65.124 : qtp2085857771-148 Jenkins/configure.jelly GlobalLibraries/config.jelly LibraryConfiguration/config.jelly SCMRetriever/DescriptorImpl/config.jelly MultiSCM/DescriptorImpl/config.jelly'
at hudson.model.DescriptorVisibilityFilter.apply(DescriptorVisibilityFilter.java:73)
...

Related

tdb2.tdbcompact command line tool returns Failed to get a lock: file

I'm running apache-jena-fuseki-3.13-1 and just found tdb2.tdbcompact from its bin-directory. I should run tdb2.tdbcompact nightly to prevent my jena-fuseki from running out of disk space, but now I get error message( Failed to get a lock: file) when running it:
miettinj#ramen:~/jena> ./apache-jena-3.13.1/bin/tdb2.tdbcompact --loc=./apache-jena-fuseki- 3.13.1/run/databases/test_TDB2
org.apache.jena.dboe.DBOpEnvException: Failed to get a lock: file='/srv/work/miettinj/jena/apache-jena-fuseki-3.13.1/run/databases/test_TDB2/tdb.lock': held by process 6136
ps -x|grep 6136
6136 ? Sl 30:48 /usr/lib64/jvm/java/bin/java -Xmx1200M -cp /srv/work/miettinj/jena/apache-jena-fuseki-3.13.1/fuseki-server.jar
"held by process 6136"
Another process is using the database. Compaction has to happen from the process using the database.
Apache Jena Fuseki Jena 3.17.0 added a function endpoint so that the administrator can ask for compaction on a running Fuseki server.

vertx clustered mode hazelcast log config on linux

Using Eclipse on Windows, a vertx Verticle with a misconfigured cluster.xml shows the following error in the Eclipse console:
11:46:18.536 [hz._hzInstance_1_dev.generic-operation.thread-0] ERROR com.hazelcast.cluster - [192.168.25.8]:5701 [dev] [3.5.2] Node could not join cluster. A Configuration mismatch was detected: Incompatible joiners! expected: multicast, found: tcp-ip Node is going to shutdown now!
11:46:22.529 [vert.x-worker-thread-0] ERROR com.hazelcast.cluster.impl.TcpIpJoiner - [192.168.25.8]:5701 [dev] [3.5.2] com.hazelcast.core.HazelcastInstanceNotActiveException: Hazelcast instance is not active!
This is fine, I know to reconfigure the cluster for multicast. The problem is when I deploy the same code and configuration to Linux, and run it as a fat jar then the same log doesn't show either the hz thread or the vertx worker thread logs. Instead it shows the verticle logs as:
2015-11-05 12:03:09,329 Starting clustered Vertx
2015-11-05 12:03:13,549 ERROR: VerticleService failed to start: java.lang.NullPointerException
So if I run on Linux the log to tell me there's a misconfiguration isn't showing. There's something I am missing in the vertx / maven log config but I don't know what. Maven properties are as follows:
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<exec.mainClass>main.java.eiger.isct.service.Verticle</exec.mainClass>
<log4j.configurationFile>log4j2.xml</log4j.configurationFile>
<hazelcast.logging.type>log4j2</hazelcast.logging.type>
</properties>
and I start the fat jar using:
java -Dlog4j.configuration=log4j2.xml -jar Verticle-0.5-SNAPSHOT-fat.jar
How can I get the hz thread and vertx thread to log on Linux?
I've tried adding a vertx-default-jul-logging.properties file below to the maven resources dir but no luck.
com.hazelcast.level=ALL
java.util.logging.ConsoleHandler.level=ALL
java.util.logging.FileHandler.level=ALL
THANKS for your comment.
Vertx has started logging having added
-Djava.util.logging.config.file=../logging.properties
to the java start command and with the default logging.properties like (and this is a nice config for lower level stuff):
handlers=java.util.logging.ConsoleHandler,java.util.logging.FileHandler
java.util.logging.SimpleFormatter.format=%1$tY-%1$tm-%1$td %1$tH:%1$tM:%1$tS:%1$tL %4$s %2$s %5$s%6$s%n
java.util.logging.ConsoleHandler.formatter=java.util.logging.SimpleFormatter
java.util.logging.ConsoleHandler.level=ALL
java.util.logging.FileHandler.level=ALL
java.util.logging.FileHandler.formatter=java.util.logging.SimpleFormatter
java.util.logging.FileHandler.pattern=../logs/vertx.log
.level=ALL
io.vertx.level=ALL
com.hazelcast.level=ALL
io.netty.util.internal.PlatformDependent.level=ALL
and vertx is logging to ../logs/vertx.log on Linux

Flume NullPointerExceptions on checkpoint

I've setup a file to file source/sink , just as a test of basic flume functionality.
Im currently using the "exec" source, with the command being "tail -F mytmpfile".
In my script, I continuously echo "....." >> mytmpfile , so that the tail command constitutes a stream.
However, I've started seeing the following exception in the flume logs:
java.lang. IllegalStateException: Channel closed [channel=c1]. Due to
java.lang.NullPointerException: null
at org.apache.flume.channel.file.FileChannel.createTransaction(FileChannel.java:353)
at org.apache.flume.channel.BasicChannelSemantics.getTransaction(BasicChannelSemantics.java:122)
at org.apache.flume.sink.RollingFileSink.process(RollingFileSink.java:183)
at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
at java.lang.Thread.run(Thread.java:662) Caused by: java.lang.NullPointerException
at org.apache.flume.channel.file.Log.writeCheckpoint(Log.java:895)
at org.apache.flume.channel.file.Log.replay(Log.java:406)
at org.apache.flume.channel.file.FileChannel.start(FileChannel.java:303)
at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:236)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
... 1 more
Any thoughts on where this NullPointerException is coming from? It appears from scanning the code that maybe it related to a missing folder or directory. But I cant find the exact line on the git hub branches.
This is using apache-flume-1.3.1.23-...
In the past I've had problems with file channels, and they've normally boiled down to two problems:
1) If you're running multiple agents on the same box, make sure you configure them to have separate dataDirs and checkpointDir.
2) On Linux boxes, check that your tmpfs isn't near its capacity. If it's getting full, flume will complain. Try stopping the flume agent, unmount tmpfs, enlarge it, remount and restart the agent.

Neo4j server failed to start

Tried starting neo4j service and got a message like
WARNING: Detected a limit of 1024 for maximum open files, while a
minimum value of 40000 is recommended.
WARNING: Problems with the
operation of the server may occur. Please refer to the Neo4j manual
regarding lifting this limitation. Starting Neo4j Server...
WARNING:
not changing user process [17348]... waiting for server to be
ready... BAD. Neo4j Server may have failed to start, please check the
logs.
The log says :
Opened [/home/ub/graph_db/neo4j-community-1.7.M01/data/graph.db/nioneo_logical.log.1] clean empty log, version=224, lastTxId=654769
2013-03-14 11:26:28.111+0000: TM opening log: /home/ub/graph_db/neo4j-community-1.7.M01/data/graph.db/tm_tx_log.1
2013-03-14 11:26:28.159+0000: Failed to load index provider lucene Target file[lucene.log.v318] already exists
org.neo4j.graphdb.NotFoundException: Target file[lucene.log.v318] already exists
at org.neo4j.kernel.impl.util.FileUtils.renameFile(FileUtils.java:165)
at org.neo4j.kernel.DefaultFileSystemAbstraction.renameFile(DefaultFileSystemAbstraction.java:78)
at org.neo4j.kernel.impl.transaction.xaframework.XaLogicalLog.renameLogFileToRightVersion(XaLogicalLog.java:700)
at org.neo4j.kernel.impl.transaction.xaframework.XaLogicalLog.renameIfExists(XaLogicalLog.java:219)
at org.neo4j.kernel.impl.transaction.xaframework.XaLogicalLog.open(XaLogicalLog.java:171)
at org.neo4j.kernel.impl.transaction.xaframework.XaContainer.openLogicalLog(XaContainer.java:64)
at org.neo4j.index.impl.lucene.LuceneDataSource.<init>(LuceneDataSource.java:229)
at org.neo4j.index.lucene.LuceneIndexProvider.load(LuceneIndexProvider.java:71)
at org.neo4j.kernel.AbstractGraphDatabase$DefaultKernelExtensionLoader.loadIndexImplementations(AbstractGraphDatabase.java:986)
at org.neo4j.kernel.AbstractGraphDatabase$DefaultKernelExtensionLoader.init(AbstractGraphDatabase.java:958)
at org.neo4j.kernel.LifeSupport$LifecycleInstance.init(LifeSupport.java:362)
at org.neo4j.kernel.LifeSupport.init(LifeSupport.java:76)
at org.neo4j.kernel.LifeSupport.start(LifeSupport.java:110)
at org.neo4j.kernel.AbstractGraphDatabase.run(AbstractGraphDatabase.java:178)
at org.neo4j.kernel.EmbeddedGraphDatabase.<init>(EmbeddedGraphDatabase.java:69)
at org.neo4j.server.NeoServerBootstrapper$1.createDatabase(NeoServerBootstrapper.java:65)
at org.neo4j.server.database.Database.createDatabase(Database.java:80)
at org.neo4j.server.database.Database.<init>(Database.java:63)
at org.neo4j.server.NeoServerWithEmbeddedWebServer.startDatabase(NeoServerWithEmbeddedWebServer.java:186)
at org.neo4j.server.NeoServerWithEmbeddedWebServer.start(NeoServerWithEmbeddedWebServer.java:97)
at org.neo4j.server.Bootstrapper.start(Bootstrapper.java:87)
at org.neo4j.server.Bootstrapper.main(Bootstrapper.java:52)
2013-03-14 11:26:28.160+0000: TM shutting down
2013-03-14 11:26:28.382+0000: Closed log /home/biju/graph_db/neo4j-community-1.7.M01/data/graph.db/nioneo_logical.log
2013-03-14 11:26:28.945+0000: NeoStore closed
2013-03-14 11:26:28.946+0000: --- SHUTDOWN diagnostics START ---
2013-03-14 11:26:28.947+0000: --- SHUTDOWN diagnostics END ---
This started happening when I have installed ElasticSearch on my machine. There was one issue with starting Elastic search "JAVA_HOME issue", which is sorted.
I had such a problem when I was installing Neo4j the first time on my Linux laptop, I solved putting this couple of rows at the end of the /etc/security/limits.conf file:
user hard nofile 100000
user soft nofile 40000
where user is the login name of the user who starts Neo4j.
The 10000 and 40000 are somewhat arbirtrary, they were ok for me, in case you still get the error try to increase them.
If you've got a db with that problem, upgrading won't make it go away. 1.8.2 will prevent this from happening though. You're running community I see so keeping old logs around isn't all that necessary. Try deleting the existing lucene.log.v318 file, or move it away at least and see what happens the next startup.

hadoop only launch local job by default why?

I have written my own hadoop program and I can run using pseudo distribute mode in my own laptop, however, when I put the program in the cluster which can run example jar of hadoop, it by default launches the local job though I indicate the hdfs file path, below is the output, give suggestions?
./hadoop -jar MyRandomForest_oob_distance.jar hdfs://montana-01:8020/user/randomforest/input/genotype1.txt hdfs://montana-01:8020/user/randomforest/input/phenotype1.txt hdfs://montana-01:8020/user/randomforest/output1_distance/ hdfs://montana-01:8020/user/randomforest/input/genotype101.txt hdfs://montana-01:8020/user/randomforest/input/phenotype101.txt 33 500 1
12/03/16 16:21:25 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
12/03/16 16:21:25 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/03/16 16:21:25 INFO mapred.JobClient: Running job: job_local_0001
12/03/16 16:21:25 INFO mapred.MapTask: io.sort.mb = 100
12/03/16 16:21:25 INFO mapred.MapTask: data buffer = 79691776/99614720
12/03/16 16:21:25 INFO mapred.MapTask: record buffer = 262144/327680
12/03/16 16:21:25 WARN mapred.LocalJobRunner: job_local_0001
java.io.FileNotFoundException: File /user/randomforest/input/genotype1.txt does not exist.
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:125)
at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:356)
at Data.Data.loadData(Data.java:103)
at MapReduce.DearMapper.loadData(DearMapper.java:261)
at MapReduce.DearMapper.setup(DearMapper.java:332)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
12/03/16 16:21:26 INFO mapred.JobClient: map 0% reduce 0%
12/03/16 16:21:26 INFO mapred.JobClient: Job complete: job_local_0001
12/03/16 16:21:26 INFO mapred.JobClient: Counters: 0
Total Running time is: 1 secs
LocalJobRunner has been chosen as your configuration most probably has the mapred.job.tracker property set to local or has not been set at all (in which case the default is local). To check, go to "wherever you extracted/installed hadoop"/etc/hadoop/ and see if the file mapred-site.xml exists (for me it did not, a file called mapped-site.xml.template was there). In that file (or create it if it doesn't exist) make sure it has the following property:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
See the source for org.apache.hadoop.mapred.JobClient.init(JobConf)
What is the value of this configuration property in the hadoop configuration on the machine you are submitting this from? Also confirm that the hadoop executable you are running references this configuration (and that you don't have 2+ installations configured differently) - type which hadoop and trace any symlinks you come across.
Alternatively you can override this when you submit your job, if you know the JobTracker host and port number using the -jt option:
hadoop jar MyRandomForest_oob_distance.jar -jt hostname:port hdfs://montana-01:8020/user/randomforest/input/genotype1.txt hdfs://montana-01:8020/user/randomforest/input/phenotype1.txt hdfs://montana-01:8020/user/randomforest/output1_distance/ hdfs://montana-01:8020/user/randomforest/input/genotype101.txt hdfs://montana-01:8020/user/randomforest/input/phenotype101.txt 33 500 1
If you're using Hadoop 2 and your job is running locally instead of on the cluster, ensure that you have setup mapred-site.xml to contain the mapreduce.framework.name property with a value of yarn. You also need to set up an aux-service in yarn-site.xml
Checkout the Cloudera Hadoop 2 operator migration blog for more information.
I had the same problem that every mapreduce v2 (mrv2) or yarn task only ran with the mapred.LocalJobRunner
INFO mapred.LocalJobRunner: Starting task: attempt_local284299729_0001_m_000000_0
The Resourcemanager and Nodemanagers were accessible and the mapreduce.framework.name was set to yarn.
Setting the HADOOP_MAPRED_HOME before executing the job fixed the problem for me.
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
cheers
dan

Resources