Confluent Docker log4j logger level configurations - docker

I am running Kafka locally using the confluentinc/cp-kafka Docker image, and I am setting the following logging environment variables on the container:
KAFKA_LOG4J_ROOT_LOGLEVEL: ERROR
KAFKA_LOG4J_LOGGERS: >-
  org.apache.zookeeper=ERROR,
  org.apache.kafka=ERROR,
  kafka=ERROR,
  kafka.cluster=ERROR,
  kafka.controller=ERROR,
  kafka.coordinator=ERROR,
  kafka.log=ERROR,
  kafka.server=ERROR,
  kafka.zookeeper=ERROR,
  state.change.logger=ERROR
and I see in the Kafka logs that Kafka is starting with the following configuration:
===> ENV Variables ...
ALLOW_UNSIGNED=false
COMPONENT=kafka
CONFLUENT_DEB_VERSION=1
CONFLUENT_PLATFORM_LABEL=
CONFLUENT_VERSION=5.4.1
...
KAFKA_LOG4J_LOGGERS=org.apache.zookeeper=ERROR, org.apache.kafka=ERROR, kafka=ERROR, kafka.cluster=ERROR, kafka.controller=ERROR, kafka.coordinator=ERROR, kafka.log=ERROR, kafka.server=ERROR, kafka.zookeeper=ERROR, state.change.logger=ERROR
KAFKA_LOG4J_ROOT_LOGLEVEL=ERROR
...
Still, further down in the logs I see entries at the INFO and TRACE levels. For example:
[2020-03-26 16:22:12,838] INFO [Controller id=1001] Ready to serve as the new controller with epoch 1 (kafka.controller.KafkaController)
[2020-03-26 16:22:12,848] INFO [Controller id=1001] Partitions undergoing preferred replica election: (kafka.controller.KafkaController)
[2020-03-26 16:22:12,849] INFO [Controller id=1001] Partitions that completed preferred replica election: (kafka.controller.KafkaController)
[2020-03-26 16:22:12,855] INFO [Controller id=1001] Skipping preferred replica election for partitions due to topic deletion: (kafka.controller.KafkaController)
How can I really deactivate the logs below a certain level? In the example above, I really want only ERROR logs.
The approach above is the way described in the Confluent documentation.
And the Apache Kafka source code lists all sorts of loggers that I could not influence using the KAFKA_LOG4J_LOGGERS Docker environment variable.

I troubleshot the Dockerfiles and inspected the Kafka container. The cause of this behaviour was YAML multiline string folding: the folded style joins the lines with spaces.
Hence the environment variable provided as a YAML multiline value is, at runtime:
KAFKA_LOG4J_LOGGERS=org.apache.zookeeper=ERROR, org.apache.kafka=ERROR, kafka=ERROR, kafka.cluster=ERROR, kafka.controller=ERROR, kafka.coordinator=ERROR, kafka.log=ERROR, kafka.server=ERROR, kafka.zookeeper=ERROR, state.change.logger=ERROR
instead of (no spaces in between):
KAFKA_LOG4J_LOGGERS=org.apache.zookeeper=ERROR,org.apache.kafka=ERROR,kafka=ERROR,kafka.cluster=ERROR,kafka.controller=ERROR,kafka.coordinator=ERROR,kafka.log=ERROR,kafka.server=ERROR,kafka.zookeeper=ERROR,state.change.logger=ERROR
And this was visible inside the container in the generated /etc/kafka/log4j.properties file:
log4j.rootLogger=ERROR, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=[%d] %p %m (%c)%n
log4j.logger.kafka.authorizer.logger=WARN
log4j.logger.kafka.cluster=ERROR
log4j.logger.kafka.producer.async.DefaultEventHandler=DEBUG
log4j.logger.kafka.zookeeper=ERROR
log4j.logger.org.apache.kafka=ERROR
log4j.logger.kafka.coordinator=ERROR
log4j.logger.org.apache.zookeeper=ERROR
log4j.logger.kafka.log.LogCleaner=INFO
log4j.logger.kafka.controller=ERROR
log4j.logger.kafka=INFO
log4j.logger.kafka.log=ERROR
log4j.logger.state.change.logger=ERROR
log4j.logger.kafka=ERROR
log4j.logger.kafka.server=ERROR
log4j.logger.kafka.controller=TRACE
log4j.logger.kafka.network.RequestChannel$=WARN
log4j.logger.kafka.request.logger=WARN
log4j.logger.state.change.logger=TRACE
If you really need to split the long line in a YAML multiline value, you would have to use this YAML syntax.
More hints from the code:
here is where the log4j.properties file is generated when a confluent container is run.
these are the default log levels that Kafka will start with.
these should be all the loggers supported by Kafka
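The effect of the stray spaces can be illustrated with a small Python sketch. It assumes, as a simplification, that the variable is split on commas and on `=` without trimming whitespace; the function names `parse_loggers` and `join_loggers` are hypothetical and are not taken from the Confluent image.

```python
def parse_loggers(value, defaults=None):
    """Naive parser: split on ',' and then '=' without trimming whitespace."""
    loggers = dict(defaults or {})
    for item in value.split(","):
        name, level = item.split("=")
        loggers[name] = level
    return loggers


def join_loggers(levels):
    """Build a comma-separated value with no whitespace between entries."""
    return ",".join(f"{name}={level}" for name, level in levels.items())


# Two of the default levels visible in the generated file shown above.
defaults = {"kafka": "INFO", "kafka.controller": "TRACE"}

# What YAML folding produces: a space after each comma.
folded = "kafka=ERROR, kafka.controller=ERROR"
# The same overrides joined without any whitespace.
compact = join_loggers({"kafka": "ERROR", "kafka.controller": "ERROR"})

print(parse_loggers(folded, defaults))
print(parse_loggers(compact, defaults))
```

With the folded value, the default `kafka.controller=TRACE` survives, because the override is stored under a key that starts with a space. With the compact value, both defaults are replaced.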

Related

Spark executor sends result to a random port though all the ports are explicitly set up

I am trying to run a Spark job with PySpark through a Jupyter notebook running in Docker. The workers are located on separate machines in the same network. I am performing a take operation on an RDD:
data.take(number_of_elements)
When number_of_elements is 2000, everything works fine. When it is 20000, an exception occurs. It seems to me that it breaks when the size of the result exceeds 2 GB. The 2 GB idea comes from the fact that Spark can send results smaller than 2 GB in one block; when the result is bigger than 2 GB, another mechanism takes over and something breaks there (see here). Here is the exception from the executor log:
19/11/05 10:27:14 INFO CodeGenerator: Code generated in 205.7623 ms
19/11/05 10:27:40 INFO PythonRunner: Times: total = 25421, boot = 3, init = 1751, finish = 23667
19/11/05 10:27:42 INFO MemoryStore: Block taskresult_4 stored as bytes in memory (estimated size 927.7 MB, free 6.4 GB)
19/11/05 10:27:42 INFO Executor: Finished task 0.0 in stage 3.0 (TID 4). 972788748 bytes result sent via BlockManager)
19/11/05 10:27:49 ERROR TransportRequestHandler: Error sending result ChunkFetchSuccess{streamChunkId=StreamChunkId{streamId=1585998572000, chunkIndex=0}, buffer=org.apache.spark.storage.BlockManagerManagedBuffer@4399ad49} to /10.0.0.9:56222; closing connection
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:65)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
at org.apache.spark.util.io.ChunkedByteBufferFileRegion.transferTo(ChunkedByteBufferFileRegion.scala:64)
at org.apache.spark.network.protocol.MessageWithHeader.transferTo(MessageWithHeader.java:121)
at io.netty.channel.socket.nio.NioSocketChannel.doWriteFileRegion(NioSocketChannel.java:355)
at io.netty.channel.nio.AbstractNioByteChannel.doWrite(AbstractNioByteChannel.java:224)
at io.netty.channel.socket.nio.NioSocketChannel.doWrite(NioSocketChannel.java:382)
at io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:934)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0(AbstractNioChannel.java:362)
at io.netty.channel.AbstractChannel$AbstractUnsafe.flush(AbstractChannel.java:901)
at io.netty.channel.DefaultChannelPipeline$HeadContext.flush(DefaultChannelPipeline.java:1321)
at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:776)
at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:768)
at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:749)
at io.netty.channel.ChannelOutboundHandlerAdapter.flush(ChannelOutboundHandlerAdapter.java:115)
at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:776)
at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:768)
at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:749)
at io.netty.channel.ChannelDuplexHandler.flush(ChannelDuplexHandler.java:117)
at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:776)
at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:768)
at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:749)
at io.netty.channel.DefaultChannelPipeline.flush(DefaultChannelPipeline.java:983)
at io.netty.channel.AbstractChannel.flush(AbstractChannel.java:248)
at io.netty.channel.nio.AbstractNioByteChannel$1.run(AbstractNioByteChannel.java:284)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
at java.lang.Thread.run(Thread.java:748)
As the log shows, the executor tries to send the result to 10.0.0.9:56222 and fails because that port is not opened in docker compose. 10.0.0.9 is the IP address of the master node, but port 56222 is random, even though I explicitly set every port I could find in the documentation to disable random port selection:
spark = SparkSession.builder\
.master('spark://spark.cyber.com:7077')\
.appName('My App')\
.config('spark.task.maxFailures', '16')\
.config('spark.driver.port', '20002')\
.config('spark.driver.host', 'spark.cyber.com')\
.config('spark.driver.bindAddress', '0.0.0.0')\
.config('spark.blockManager.port', '6060')\
.config('spark.driver.blockManager.port', '6060')\
.config('spark.shuffle.service.port', '7070')\
.config('spark.driver.maxResultSize', '14g')\
.getOrCreate()
I mapped these ports with docker compose:
version: "3"
services:
  jupyter:
    image: jupyter/pyspark-notebook:latest
    ports:
      - "4040-4050:4040-4050"
      - "6060:6060"
      - "7070:7070"
      - "8888:8888"
      - "20000-20010:20000-20010"
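To see which ports fall outside the mapping above, a check like the following can help. It is only a minimal sketch: the helpers `parse_range` and `is_mapped` are hypothetical, and the mappings are copied from the compose file on the assumption that host and container ranges are identical.

```python
def parse_range(spec):
    """Turn '4040-4050' or '6060' into an inclusive (low, high) tuple."""
    low, _, high = spec.partition("-")
    return int(low), int(high or low)


def is_mapped(port, mappings):
    """Return True if the port is covered by any 'host:container' mapping."""
    for mapping in mappings:
        container_spec = mapping.split(":")[-1]
        low, high = parse_range(container_spec)
        if low <= port <= high:
            return True
    return False


# Port mappings copied from the compose file above.
mappings = [
    "4040-4050:4040-4050",
    "6060:6060",
    "7070:7070",
    "8888:8888",
    "20000-20010:20000-20010",
]

# The configured driver and block manager ports, plus the port from the log.
for port in (20002, 6060, 56222):
    print(port, "mapped" if is_mapped(port, mappings) else "NOT mapped")
```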
You should probably configure your Spark driver memory to match your Docker container memory settings.
I added
.config('spark.driver.memory', '14g')
as @ML_TN proposed, and everything works now.
From my point of view, it is strange that the memory setting affects the ports that Spark uses.

Starting Zabbix Server within docker replaces strings with nothing in config file →

→ or strings are ignored entirely, such as the name of a new DB for testing purposes.
First I tried to add about 250 hosts to the 250 already added, and the Zabbix server shut down. I restarted it and saw this in the docker logs:
6:20191014:091840.201 using configuration file: /etc/zabbix/zabbix_server.conf
6:20191014:091840.223 current database version (mandatory/optional): 04020000/04020001
6:20191014:091840.223 required mandatory version: 04020000
6:20191014:091840.484 __mem_malloc: skipped 7 asked 108424 skip_min 304 skip_max 12192
6:20191014:091840.484 [file:dbconfig.c,line:94] __zbx_mem_realloc(): out of memory (requested 108424 bytes)
6:20191014:091840.484 [file:dbconfig.c,line:94] __zbx_mem_realloc(): please increase CacheSize configuration parameter
6:20191014:091840.484 === memory statistics for configuration cache ===
The solution for that problem was to increase CacheSize in zabbix_server.conf. That is easy enough, so I pushed a new config to the Zabbix server and restarted it, but the server stopped right after starting and the logs showed the same problem. After reading the config inside the container, I saw that the lines I had changed were missing. The lines had been deleted.
My config:
LogType=console
DBHost=postgres-server
DBName=zabbix_pwd
DBSchema=public
DBUser=zabbix
DBPassword=zabbix
DBPort=5432
StartPollers=5
StartIPMIPollers=5
StartPollersUnreachable=5
SNMPTrapperFile=/var/lib/zabbix/snmptraps/snmptraps.log
StartSNMPTrapper=1
CacheSize=512M
HistoryCacheSize=512M
HistoryIndexCacheSize=512M
TrendCacheSize=512m
ValueCacheSize=256M
AlertScriptsPath=/usr/lib/zabbix/alertscripts
ExternalScripts=/usr/lib/zabbix/externalscripts
FpingLocation=/usr/sbin/fping
Fping6Location=/usr/sbin/fping6
SSHKeyLocation=/var/lib/zabbix/ssh_keys
SSLCertLocation=/var/lib/zabbix/ssl/certs/
SSLKeyLocation=/var/lib/zabbix/ssl/keys/
SSLCALocation=/var/lib/zabbix/ssl/ssl_ca/
LoadModulePath=/var/lib/zabbix/modules/
And what I get after starting the Zabbix server:
LogType=console
DBHost=postgres-server
DBName=zabbix_pwd
DBSchema=public
DBUser=zabbix
DBPassword=zabbix
DBPort=5432
SNMPTrapperFile=/var/lib/zabbix/snmptraps/snmptraps.log
StartSNMPTrapper=1
AlertScriptsPath=/usr/lib/zabbix/alertscripts
ExternalScripts=/usr/lib/zabbix/externalscripts
FpingLocation=/usr/sbin/fping
Fping6Location=/usr/sbin/fping6
SSHKeyLocation=/var/lib/zabbix/ssh_keys
SSLCertLocation=/var/lib/zabbix/ssl/certs/
SSLKeyLocation=/var/lib/zabbix/ssl/keys/
SSLCALocation=/var/lib/zabbix/ssl/ssl_ca/
LoadModulePath=/var/lib/zabbix/modules/
Any suggestions on how to rule the world without being captured by doctors?
With Docker you need to pass the configuration parameters as environment variables, either in the docker-compose.yml file or in your docker run command using the -e option.
For example, from my docker-compose.yml file:
zabbix-server:
  image: zabbix/zabbix-server-pgsql:ubuntu-4.2.6
  environment:
    ZBX_MAXHOUSEKEEPERDELETE: 5000
    ZBX_STARTPOLLERS: 15
    ZBX_CACHESIZE: 8M
    ZBX_STARTDBSYNCERS: 4
    ZBX_HISTORYCACHESIZE: 16M
    ZBX_TRENDCACHESIZE: 4M
    ZBX_VALUECACHESIZE: 8M
    ZBX_LOGSLOWQUERIES: 3000
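Judging by the example above, the variable names follow the pattern `ZBX_` plus the upper-cased parameter name (for instance, `CacheSize` becomes `ZBX_CACHESIZE`). Assuming that pattern holds for the parameters you need (check the image documentation to be sure), a hypothetical helper such as `conf_to_env` could translate an existing zabbix_server.conf into environment entries:

```python
def conf_to_env(conf_text, prefix="ZBX_"):
    """Map 'Name=value' lines to {'ZBX_NAME': 'value'}, skipping blanks and comments."""
    env = {}
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        env[prefix + name.strip().upper()] = value.strip()
    return env


# A few of the parameters from the config in the question.
conf = """
StartPollers=5
CacheSize=512M
HistoryCacheSize=512M
ValueCacheSize=256M
"""

# Print the entries in the form used under 'environment:' in a compose file.
for key, value in conf_to_env(conf).items():
    print(f"{key}: {value}")
```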
Another way to work with zabbix:
https://hub.docker.com/r/monitoringartist/zabbix-3.0-xxl/

Docker container Application logs to ELK stack without filebeat

I'm using Elastic Cloud, as it appears to be the most suitable for quickly setting up application logging. I have 24 Docker containers running on different nodes, and some containers also have a number of replicas. I want to export the logs from inside the Docker containers to the ELK stack. I don't want to install Filebeat in each of my containers, because that seems to go directly against Docker's separation-of-duties mantra.
How do I get logs from my application containers to the Logstash server?
You can send your syslog to Logstash by configuring rsyslogd like this:
# /etc/rsyslog.d/99-ship-syslog.conf
*.*;syslog;auth,authpriv.none action(
type="omfwd"
Target="myremote.elk-server.net"
Port="5001"
Protocol="udp"
)
If you don't have rsyslog running yet, you can add it like so (alpine linux example):
# Dockerfile
FROM alpine:3.7
RUN apk update \
&& apk add rsyslog
COPY rsyslog.conf /etc/rsyslog.conf
EXPOSE 514 514/udp
VOLUME [ "/var/log", "/etc/rsyslog.d" ]
ENTRYPOINT [ "rsyslogd", "-n" ]
--
# rsyslog.conf
#
# if you experience problems, check:
# http://www.rsyslog.com/troubleshoot
#### MODULES ####
module(load="imuxsock") # local system logging support (e.g. via logger command)
#module(load="imklog") # kernel logging support (previously done by rklogd)
module(load="immark") # --MARK-- message support
module(load="imudp") # UDP listener support
input(type="imudp" port="514")
# Log all kernel messages to the console.
# Logging much else clutters up the screen.
#kern.* action(type="omfile" file="/dev/console")
# Log anything (except mail) of level info or higher.
# Don't log private authentication messages!
*.info;mail.none;authpriv.none;cron.none action(type="omfile" file="/var/log/messages")
# The authpriv file has restricted access.
authpriv.* action(type="omfile" file="/var/log/secure")
# Log all the mail messages in one place.
mail.* action(type="omfile" file="/var/log/maillog")
# Log cron stuff
cron.* action(type="omfile" file="/var/log/cron")
# Everybody gets emergency messages
*.emerg action(type="omusrmsg" users="*")
# Save news errors of level crit and higher in a special file.
uucp,news.crit action(type="omfile" file="/var/log/spooler")
# Save boot messages also to boot.log
local7.* action(type="omfile" file="/var/log/boot.log")
# log every host in its own directory
if $fromhost-ip then /var/log/$fromhost-ip/messages
# Include all .conf files in /etc/rsyslog.d
$IncludeConfig /etc/rsyslog.d/*.conf
$template GRAYLOGRFC5424,"<%PRI%>%PROTOCOL-VERSION% %TIMESTAMP:::date-rfc3339% %HOSTNAME% %APP-NAME% %PROCID% %MSGID% %STRUCTURED-DATA% %msg%\n"
*.info;mail.none;authpriv.none;cron.none;*.* @@graylog:514;GRAYLOGRFC5424 # forward everything to remote server
As you're running a Java application, you can even send your logs directly to syslog. Here's a small configuration example with log4j:
log4j.rootLogger=INFO, SYSLOG
log4j.appender.SYSLOG=org.apache.log4j.net.SyslogAppender
log4j.appender.SYSLOG.syslogHost=myremote.elk-server.net
log4j.appender.SYSLOG.layout=org.apache.log4j.PatternLayout
log4j.appender.SYSLOG.layout.conversionPattern=%d{ISO8601} %-5p [%t] %c{2} %x - %m%n
log4j.appender.SYSLOG.Facility=LOCAL1
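The same idea works outside Java. As a minimal sketch, a Python application could ship its logs over UDP with the standard library's `logging.handlers.SysLogHandler`. The host, port, and logger name below are placeholders, and the `LOCAL1` facility simply mirrors the log4j example above.

```python
import logging
import logging.handlers


def make_syslog_logger(host, port, name="myapp"):
    """Return a logger that forwards records to a syslog endpoint over UDP."""
    handler = logging.handlers.SysLogHandler(
        address=(host, port),
        facility=logging.handlers.SysLogHandler.LOG_LOCAL1,
    )
    handler.setFormatter(
        logging.Formatter(
            "%(asctime)s %(levelname)-5s [%(threadName)s] %(name)s - %(message)s"
        )
    )
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger


# Placeholder endpoint; replace it with the address of your rsyslog or Logstash input.
log = make_syslog_logger("127.0.0.1", 5001)
log.info("application started")
```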

vertx clustered mode hazelcast log config on linux

Using Eclipse on Windows, a vertx Verticle with a misconfigured cluster.xml shows the following error in the Eclipse console:
11:46:18.536 [hz._hzInstance_1_dev.generic-operation.thread-0] ERROR com.hazelcast.cluster - [192.168.25.8]:5701 [dev] [3.5.2] Node could not join cluster. A Configuration mismatch was detected: Incompatible joiners! expected: multicast, found: tcp-ip Node is going to shutdown now!
11:46:22.529 [vert.x-worker-thread-0] ERROR com.hazelcast.cluster.impl.TcpIpJoiner - [192.168.25.8]:5701 [dev] [3.5.2] com.hazelcast.core.HazelcastInstanceNotActiveException: Hazelcast instance is not active!
This is fine; I know to reconfigure the cluster for multicast. The problem is that when I deploy the same code and configuration to Linux and run it as a fat jar, the log shows neither the hz thread nor the vertx worker thread entries. Instead it shows only the verticle logs:
2015-11-05 12:03:09,329 Starting clustered Vertx
2015-11-05 12:03:13,549 ERROR: VerticleService failed to start: java.lang.NullPointerException
So when I run on Linux, the log entry that would tell me there's a misconfiguration isn't showing. There's something I am missing in the vertx / Maven log config, but I don't know what. The Maven properties are as follows:
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<exec.mainClass>main.java.eiger.isct.service.Verticle</exec.mainClass>
<log4j.configurationFile>log4j2.xml</log4j.configurationFile>
<hazelcast.logging.type>log4j2</hazelcast.logging.type>
</properties>
and I start the fat jar using:
java -Dlog4j.configuration=log4j2.xml -jar Verticle-0.5-SNAPSHOT-fat.jar
How can I get the hz thread and vertx thread to log on Linux?
I've tried adding the vertx-default-jul-logging.properties file below to the Maven resources directory, but no luck.
com.hazelcast.level=ALL
java.util.logging.ConsoleHandler.level=ALL
java.util.logging.FileHandler.level=ALL
THANKS for your comment.
Vertx started logging after I added
-Djava.util.logging.config.file=../logging.properties
to the java start command, together with a default logging.properties like this (a nice config for lower-level stuff):
handlers=java.util.logging.ConsoleHandler,java.util.logging.FileHandler
java.util.logging.SimpleFormatter.format=%1$tY-%1$tm-%1$td %1$tH:%1$tM:%1$tS:%1$tL %4$s %2$s %5$s%6$s%n
java.util.logging.ConsoleHandler.formatter=java.util.logging.SimpleFormatter
java.util.logging.ConsoleHandler.level=ALL
java.util.logging.FileHandler.level=ALL
java.util.logging.FileHandler.formatter=java.util.logging.SimpleFormatter
java.util.logging.FileHandler.pattern=../logs/vertx.log
.level=ALL
io.vertx.level=ALL
com.hazelcast.level=ALL
io.netty.util.internal.PlatformDependent.level=ALL
and vertx is logging to ../logs/vertx.log on Linux

hadoop only launch local job by default why?

I have written my own Hadoop program, and I can run it in pseudo-distributed mode on my own laptop. However, when I put the program on the cluster, which can run the Hadoop example jar, it launches a local job by default even though I specify the HDFS file paths. The output is below. Any suggestions?
./hadoop -jar MyRandomForest_oob_distance.jar hdfs://montana-01:8020/user/randomforest/input/genotype1.txt hdfs://montana-01:8020/user/randomforest/input/phenotype1.txt hdfs://montana-01:8020/user/randomforest/output1_distance/ hdfs://montana-01:8020/user/randomforest/input/genotype101.txt hdfs://montana-01:8020/user/randomforest/input/phenotype101.txt 33 500 1
12/03/16 16:21:25 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
12/03/16 16:21:25 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/03/16 16:21:25 INFO mapred.JobClient: Running job: job_local_0001
12/03/16 16:21:25 INFO mapred.MapTask: io.sort.mb = 100
12/03/16 16:21:25 INFO mapred.MapTask: data buffer = 79691776/99614720
12/03/16 16:21:25 INFO mapred.MapTask: record buffer = 262144/327680
12/03/16 16:21:25 WARN mapred.LocalJobRunner: job_local_0001
java.io.FileNotFoundException: File /user/randomforest/input/genotype1.txt does not exist.
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:125)
at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:356)
at Data.Data.loadData(Data.java:103)
at MapReduce.DearMapper.loadData(DearMapper.java:261)
at MapReduce.DearMapper.setup(DearMapper.java:332)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
12/03/16 16:21:26 INFO mapred.JobClient: map 0% reduce 0%
12/03/16 16:21:26 INFO mapred.JobClient: Job complete: job_local_0001
12/03/16 16:21:26 INFO mapred.JobClient: Counters: 0
Total Running time is: 1 secs
LocalJobRunner has been chosen because your configuration most probably has the mapred.job.tracker property set to local, or it has not been set at all (in which case the default is local). To check, go to "wherever you extracted/installed hadoop"/etc/hadoop/ and see whether the file mapred-site.xml exists (for me it did not; a file called mapred-site.xml.template was there). In that file (create it if it doesn't exist), make sure it has the following property:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
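A quick way to verify the setting is to parse the file. The sketch below is an illustration only: `framework_name` is a hypothetical helper, and it assumes the standard Hadoop `<configuration>/<property>` layout. When the property is absent it reports `local`, which is Hadoop's default for `mapreduce.framework.name`.

```python
import xml.etree.ElementTree as ET


def framework_name(xml_text, default="local"):
    """Return the value of mapreduce.framework.name, or the default if it is not set."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == "mapreduce.framework.name":
            return prop.findtext("value", default).strip()
    return default


# Sample content matching the property shown above.
sample = """
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
"""

print(framework_name(sample))                             # configured value
print(framework_name("<configuration></configuration>"))  # falls back to the default
```

To check a real installation, read the file first, for example `framework_name(open(path).read())` with `path` pointing at your mapred-site.xml.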
See the source for org.apache.hadoop.mapred.JobClient.init(JobConf)
What is the value of this configuration property in the hadoop configuration on the machine you are submitting this from? Also confirm that the hadoop executable you are running references this configuration (and that you don't have 2+ installations configured differently) - type which hadoop and trace any symlinks you come across.
Alternatively, if you know the JobTracker host and port number, you can override this when you submit your job using the -jt option:
hadoop jar MyRandomForest_oob_distance.jar -jt hostname:port hdfs://montana-01:8020/user/randomforest/input/genotype1.txt hdfs://montana-01:8020/user/randomforest/input/phenotype1.txt hdfs://montana-01:8020/user/randomforest/output1_distance/ hdfs://montana-01:8020/user/randomforest/input/genotype101.txt hdfs://montana-01:8020/user/randomforest/input/phenotype101.txt 33 500 1
If you're using Hadoop 2 and your job is running locally instead of on the cluster, ensure that you have set up mapred-site.xml to contain the mapreduce.framework.name property with a value of yarn. You also need to set up an aux-service in yarn-site.xml.
Check out the Cloudera Hadoop 2 operator migration blog for more information.
I had the same problem: every MapReduce v2 (MRv2) or YARN task ran only with the mapred.LocalJobRunner:
INFO mapred.LocalJobRunner: Starting task: attempt_local284299729_0001_m_000000_0
The ResourceManager and NodeManagers were accessible, and mapreduce.framework.name was set to yarn.
Setting HADOOP_MAPRED_HOME before executing the job fixed the problem for me:
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
cheers
dan
