Kafka cannot resolve Zookeper's DNS name - docker

I have a kafka 0.10.1.0 cluster (2 nodes) and zookeeper 3.4.6 (3 nodes)
The clusters are hosted on Kubernetes following this tutorial.
Relevant entries from Kafka's server.properties:
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://kafka.internal.<companyname>.com:9092
zookeeper.connect=zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181
Upon server startup, each Kafka broker fails quickly with the following. To me, it looks like it cannot resolve the DNS name zookeeper-1. I also attempted removing the ports from zookeeper.connect, although my reading of the relevant code, I don't believe that will make a difference.
Naturally, I confirmed that zookeeper-1 can be resolved from within the cluster. Other containers from within the cluster can resolve the name.
I also attempted with a series of other aliases, including the services' DNS name and Zookeeper's load balancer(s), all of which I independently confirmed working. In each case, Kafka alone reported Name or service not known.
[2016-11-22 19:55:45,506] INFO Initiating client connection, connectString=zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181 sessionTimeout=6000 watcher=org.I0Itec.zkclient.ZkClient#7722c3c3 (org.apache.zookeeper.ZooKeeper)
[2016-11-22 19:56:05,571] INFO Terminate ZkClient event thread. (org.I0Itec.zkclient.ZkEventThread)
[2016-11-22 19:56:05,572] FATAL Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
org.I0Itec.zkclient.exception.ZkException: Unable to connect to zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181
at org.I0Itec.zkclient.ZkConnection.connect(ZkConnection.java:71)
at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:1227)
at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:156)
at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:130)
at kafka.utils.ZkUtils$.createZkClientAndConnection(ZkUtils.scala:76)
at kafka.utils.ZkUtils$.apply(ZkUtils.scala:58)
at kafka.server.KafkaServer.initZk(KafkaServer.scala:327)
at kafka.server.KafkaServer.startup(KafkaServer.scala:200)
at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:39)
at kafka.Kafka$.main(Kafka.scala:67)
at kafka.Kafka.main(Kafka.scala)
Caused by: java.net.UnknownHostException: zookeeper-1: Name or service not known
at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
at java.net.InetAddress.getAllByName0(InetAddress.java:1276)
at java.net.InetAddress.getAllByName(InetAddress.java:1192)
at java.net.InetAddress.getAllByName(InetAddress.java:1126)
at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:61)
at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:446)
at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:380)
at org.I0Itec.zkclient.ZkConnection.connect(ZkConnection.java:69)
... 10 more
[2016-11-22 19:56:05,575] INFO shutting down (kafka.server.KafkaServer)
[2016-11-22 19:56:05,616] INFO shut down completed (kafka.server.KafkaServer)
Other info related to the Kafka image: It is based off wurstmeister/kafka-docker but updated to inherit from openjdk:8-jre.

It turns out that this was an issue with Kubernetes itself.
After an unrelated upgrade to v1.4.6 and no other changes, the names were able to resolve normally.

Related

Q: Apache Geode - Unable to reconnect a node after SO patching "15 seconds have elapsed while waiting for replies"

I have a cluster situation consisting of 4 total nodes, 3 servers and 1 management node, working properly.
At the beginning of the month we planned to patch the OS and we started from the first server node with this procedure:
Stop service
S.O. patching
Server restart
Start service
The service of the first patched node named "serverA" fails to restart with this error:
Log entries cluster join:
serverA:
| INFO | region-dm-12 | ache.geode.internal.tcp.Connection | --> Connection: shared=true ordered=false failed to connect to peer 10.237.110.195( Server serverB:9993):1024 because: java.net.ConnectException: Connection timed out (Connection timed out)
| WARN | region-dm-12 | ache.geode.internal.tcp.Connection | --> Connection: Attempting reconnect to peer 10.237.110.195( Server serverB:9993):1024
ServerMgmt:
| WARN | pool-3-thread-1 | tributed.internal.ReplyProcessor21 | --> 15 seconds have elapsed while waiting for replies: <CreateRegionProcessor$CreateRegionReplyProcessor 44180 waiting for 1 replies from [10.237.110.194( Server serverA:632):1024]> on 10.237.110.225( Management:6033):1024 whose current membership list is: [[10.237.110.196( Server serverC:16805):1024, 10.237.110.225( Management:6033):1024, 10.237.110.195( Server serverB:9993):1024, 10.237.110.194( Server serverA:632):1024]]
The connection between the systems was verified with tcpdumps, udp 1024 is running fine.
We have tried redeploying the service and making numerous attempts but we always get the same error during startup.
Any suggestions? Thank you.
Marco.
I think to see this error message, serverA was probably able to send UDP messages to serverB but it is failing to create a TCP connection. It's hard to say why though - a firewall issue, some TCP configuration issue, ... ?
Check to see if serverB has anything interesting in its logs. Since you are using TCP dump, you should be watching for that TCP connection for serverB:9993, since it looks like that is wwhat failed.
There is no firewall between the systems, we've analyzed again the network connection, during startup from node a, and we can see that the communication can be established between all systems. But what we detected is, that on port 2323 which is configured as locater, the node sends packages to the b and c node, but only receives back packages from the c node, and not from the b node. This is for us again a sign that the b node has an issue. Does it give a way to check our assumption from the b node?
A node ip .194
B node ip .195
C node ip .196
Management ip .225

Quarkus Docker JVM SSL issue

I have a quarkus app connecting to a MySQL database.
I use the generated Dockerfile.jvm to build my docker image. And I have no issue to run this image on my local desktop.
However when I want to run the same image on my vm, I have the following stacktrace:
Caused by: com.mysql.cj.jdbc.exceptions.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
at com.mysql.cj.jdbc.exceptions.SQLError.createCommunicationsException(SQLError.java:174)
at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:64)
at com.mysql.cj.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:836)
at com.mysql.cj.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:456)
at com.mysql.cj.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:246)
at com.mysql.cj.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:197)
at io.agroal.pool.ConnectionFactory.createConnection(ConnectionFactory.java:200)
at io.agroal.pool.ConnectionPool$CreateConnectionTask.call(ConnectionPool.java:419)
at io.agroal.pool.ConnectionPool$CreateConnectionTask.call(ConnectionPool.java:401)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at io.agroal.pool.util.PriorityScheduledExecutor.beforeExecute(PriorityScheduledExecutor.java:65)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1126)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: com.mysql.cj.exceptions.CJCommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:61)
at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:105)
at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:151)
at com.mysql.cj.exceptions.ExceptionFactory.createCommunicationsException(ExceptionFactory.java:167)
at com.mysql.cj.protocol.a.NativeProtocol.negotiateSSLConnection(NativeProtocol.java:334)
at com.mysql.cj.protocol.a.NativeAuthenticationProvider.connect(NativeAuthenticationProvider.java:164)
at com.mysql.cj.protocol.a.NativeProtocol.connect(NativeProtocol.java:1342)
at com.mysql.cj.NativeSession.connect(NativeSession.java:157)
at com.mysql.cj.jdbc.ConnectionImpl.connectOneTryOnly(ConnectionImpl.java:956)
at com.mysql.cj.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:826)
... 11 more
Caused by: javax.net.ssl.SSLHandshakeException: No appropriate protocol (protocol is disabled or cipher suites are inappropriate)
at java.base/sun.security.ssl.HandshakeContext.<init>(HandshakeContext.java:171)
at java.base/sun.security.ssl.ClientHandshakeContext.<init>(ClientHandshakeContext.java:98)
at java.base/sun.security.ssl.TransportContext.kickstart(TransportContext.java:222)
at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:433)
at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:411)
at com.mysql.cj.protocol.ExportControlled.performTlsHandshake(ExportControlled.java:336)
at com.mysql.cj.protocol.StandardSocketFactory.performTlsHandshake(StandardSocketFactory.java:188)
at com.mysql.cj.protocol.a.NativeSocketConnection.performTlsHandshake(NativeSocketConnection.java:99)
at com.mysql.cj.protocol.a.NativeProtocol.negotiateSSLConnection(NativeProtocol.java:325)
... 16 more
I have no idea about how to solve this issue. Any help would be very appreciated.
I solved my problem with replacing the original Dockerfile.jvm content by the following one:
FROM openjdk:11-jdk-slim
COPY target/lib/* lib/
COPY target/*.jar app.jar
ENTRYPOINT ["java","-jar","/app.jar"]
This maybe create heavier image, but at least this is always working.
Just stumbled into this;
My quarkus container in running in ubi8 (rhel8) and connecting to mysql 5.7
https://access.redhat.com/documentation/en-us/red_hat_support_for_spring_boot/2.1/html/release_notes_for_spring_boot_2.1/known-issues-spring-boot
Just modify mysql jdbc url to "useSSL=true&enabledTLSProtocols=TLSv1.2"

Neo4J bolt connector not working

I am trying to start Neo4J embedded db (3.2.0) and then access this db from another jvm process via bolt driver (1.4.4). Although my code below do print sysouts indicating that db has started,
System.out.println("Starting Neo embedded database at " + databaseFile);
BoltConnector bolt = new BoltConnector("key");
service = new GraphDatabaseFactory().newEmbeddedDatabaseBuilder(new File(databaseFile))
.setConfig(bolt.enabled, "true").setConfig(bolt.type, "BOLT")
.setConfig(bolt.encryption_level, "DISABLED").newGraphDatabase();
System.out.println("Started Neo embedded database");
my another jvm process running on same machine trying to query db via bolt driver fails with below error:
Exception in thread "main" org.neo4j.driver.v1.exceptions.ServiceUnavailableException: Unable to connect to localhost:7687, ensure the database is running and that there is a working network connection to it.
at org.neo4j.driver.internal.net.SocketClient.start(SocketClient.java:132)
at org.neo4j.driver.internal.net.SocketConnection.startSocketClient(SocketConnection.java:92)
at org.neo4j.driver.internal.net.SocketConnection.<init>(SocketConnection.java:67)
at org.neo4j.driver.internal.net.SocketConnector.createConnection(SocketConnector.java:77)
at org.neo4j.driver.internal.net.SocketConnector.connect(SocketConnector.java:50)
at org.neo4j.driver.internal.net.pooling.SocketConnectionPool$ConnectionSupplier.get(SocketConnectionPool.java:216)
at org.neo4j.driver.internal.net.pooling.SocketConnectionPool$ConnectionSupplier.get(SocketConnectionPool.java:198)
at org.neo4j.driver.internal.net.pooling.BlockingPooledConnectionQueue.acquire(BlockingPooledConnectionQueue.java:96)
at org.neo4j.driver.internal.net.pooling.SocketConnectionPool.acquireConnection(SocketConnectionPool.java:149)
at org.neo4j.driver.internal.net.pooling.SocketConnectionPool.acquire(SocketConnectionPool.java:76)
at org.neo4j.driver.internal.DirectConnectionProvider.acquireConnection(DirectConnectionProvider.java:47)
at org.neo4j.driver.internal.DirectConnectionProvider.verifyConnectivity(DirectConnectionProvider.java:67)
at org.neo4j.driver.internal.DirectConnectionProvider.<init>(DirectConnectionProvider.java:41)
at org.neo4j.driver.internal.DriverFactory.createDirectDriver(DriverFactory.java:109)
at org.neo4j.driver.internal.DriverFactory.createDriver(DriverFactory.java:93)
at org.neo4j.driver.internal.DriverFactory.newInstance(DriverFactory.java:67)
at org.neo4j.driver.v1.GraphDatabase.driver(GraphDatabase.java:135)
at org.neo4j.driver.v1.GraphDatabase.driver(GraphDatabase.java:117)
at
Caused by: java.net.ConnectException: Connection refused: no further information
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:111)
at org.neo4j.driver.internal.net.ChannelFactory.connect(ChannelFactory.java:79)
at org.neo4j.driver.internal.net.ChannelFactory.create(ChannelFactory.java:41)
at org.neo4j.driver.internal.net.SocketClient.start(SocketClient.java:126)
... 20 more
running netstat on windows doesn't show any process listening to port 7687, can someone please help.

Apache nifi is not starting up

I am trying to start Apache nifi version 1.2.0 on window 8 machine. It used to start properly. After I restarted the system the nifi is not starting at all. I had check status Its keep getting "Apacha Nifi not running".
Below are logs from nifi.bootstrap.log file:-
2017-07-05 15:41:57,105 WARN [NiFi Bootstrap Command Listener]
org.apache.nifi.bootstrap.RunNiFi Failed to set permissions so that only the
owner can read pid file E:\softwares\nifi-1.2.0\bin\..\run\nifi.pid; this
may allows others to have access to the key needed to communicate with NiFi.
Permissions should be changed so that only the owner can read this file
2017-07-05 15:41:57,142 WARN [NiFi Bootstrap Command Listener]
org.apache.nifi.bootstrap.RunNiFi Failed to set permissions so that only the
owner can read status file E:\softwares\nifi-1.2.0\bin\..\run\nifi.status;
this may allows others to have access to the key needed to communicate with
NiFi. Permissions should be changed so that only the owner can read this
file
2017-07-05 15:41:57,168 INFO [NiFi Bootstrap Command Listener]
org.apache.nifi.bootstrap.RunNiFi Apache NiFi now running and listening for
Bootstrap requests on port 50765
2017-07-05 15:43:12,077 ERROR [NiFi logging handler] org.apache.nifi.StdErr
Failed to start web server: Unable to start Flow Controller.
2017-07-05 15:43:12,078 ERROR [NiFi logging handler] org.apache.nifi.StdErr
Shutting down...
2017-07-05 15:43:14,501 INFO [main] org.apache.nifi.bootstrap.RunNiFi NiFi
never started. Will not restart NiFi
Stack trace from nifi.app.log: -
2017-07-05 15:43:12,077 WARN [main] org.apache.nifi.web.server.JettyServer Failed to start web server... shutting down.
org.apache.nifi.web.NiFiCoreException: Unable to start Flow Controller.
at org.apache.nifi.web.contextlistener.ApplicationStartupContextListener.contextInitialized(ApplicationStartupContextListener.java:88)
at org.eclipse.jetty.server.handler.ContextHandler.callContextInitialized(ContextHandler.java:876)
at org.eclipse.jetty.servlet.ServletContextHandler.callContextInitialized(ServletContextHandler.java:532)
at org.eclipse.jetty.server.handler.ContextHandler.startContext(ContextHandler.java:839)
at org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:344)
at org.eclipse.jetty.webapp.WebAppContext.startWebapp(WebAppContext.java:1480)
at org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1442)
at org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:799)
at org.eclipse.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:261)
at org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:540)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:131)
at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:113)
at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:113)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:131)
at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:105)
at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:113)
at org.eclipse.jetty.server.handler.gzip.GzipHandler.doStart(GzipHandler.java:290)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:131)
at org.eclipse.jetty.server.Server.start(Server.java:452)
at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:105)
at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:113)
at org.eclipse.jetty.server.Server.doStart(Server.java:419)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.apache.nifi.web.server.JettyServer.start(JettyServer.java:695)
at org.apache.nifi.NiFi.<init>(NiFi.java:160)
at org.apache.nifi.NiFi.main(NiFi.java:267)
Caused by: java.io.IOException: Expected to read a Sentinel Byte of '1' but got a value of '0' instead
at org.apache.nifi.repository.schema.SchemaRecordReader.readRecord(SchemaRecordReader.java:65)
at org.apache.nifi.controller.repository.SchemaRepositoryRecordSerde.deserializeRecord(SchemaRepositoryRecordSerde.java:115)
at org.apache.nifi.controller.repository.SchemaRepositoryRecordSerde.deserializeEdit(SchemaRepositoryRecordSerde.java:109)
at org.apache.nifi.controller.repository.SchemaRepositoryRecordSerde.deserializeEdit(SchemaRepositoryRecordSerde.java:46)
at org.wali.MinimalLockingWriteAheadLog$Partition.recoverNextTransaction(MinimalLockingWriteAheadLog.java:1096)
at org.wali.MinimalLockingWriteAheadLog.recoverFromEdits(MinimalLockingWriteAheadLog.java:459)
at org.wali.MinimalLockingWriteAheadLog.recoverRecords(MinimalLockingWriteAheadLog.java:301)
at org.apache.nifi.controller.repository.WriteAheadFlowFileRepository.loadFlowFiles(WriteAheadFlowFileRepository.java:381)
at org.apache.nifi.controller.FlowController.initializeFlow(FlowController.java:712)
at org.apache.nifi.controller.StandardFlowService.initializeController(StandardFlowService.java:953)
at org.apache.nifi.controller.StandardFlowService.load(StandardFlowService.java:534)
at org.apache.nifi.web.contextlistener.ApplicationStartupContextListener.contextInitialized(ApplicationStartupContextListener.java:72)
... 28 common frames omitted
Thanks in advance
After Googling on this error "Caused by: java.io.IOException: Expected to read a Sentinel Byte of '1' but got a value of '0' instead" I found that this error indicates a partial write to the repos.
Here are a couple of things you can check/try to bring your Dataflow back online ;
check if your dsks are not full
Did you launch nifi with the same user ? Did you run it with administrator privileges ?
You can backup/move your repositories and try to start Nifi with empty repositories, you will still have your dataflows there but any file that was processing when you shutdown will be gone.
Could you please try that ?
I think the issue is with incompatible java version, use JAVA 8 version.
If you haven't set JAVA_HOME then set in environment variables with path Like "C:/program files/jdk1.8"
Jira addressing when NiFi run with java 9 version and the issue not resolved yet
https://issues.apache.org/jira/browse/NIFI-4419

Failed to bind to: spark-master, using a remote cluster with two workers

I am managing to get everything working with the local master and two remote workers. Now, I want to connect to a remote master that has the same remote workers. I have tried different combinations of settings withing the /etc/hosts and other reccomendations on the Internet, but NOTHING worked.
The Main class is:
public static void main(String[] args) {
ScalaInterface sInterface = new ScalaInterface(CHUNK_SIZE,
"awsAccessKeyId",
"awsSecretAccessKey");
SparkConf conf = new SparkConf().setAppName("POC_JAVA_AND_SPARK")
.setMaster("spark://spark-master:7077");
org.apache.spark.SparkContext sc = new org.apache.spark.SparkContext(
conf);
sInterface.enableS3Connection(sc);
org.apache.spark.rdd.RDD<Tuple2<Path, Text>> fileAndLine = (RDD<Tuple2<Path, Text>>) sInterface.getMappedRDD(sc, "s3n://somebucket/");
org.apache.spark.rdd.RDD<String> pInfo = (RDD<String>) sInterface.mapPartitionsWithIndex(fileAndLine);
JavaRDD<String> pInfoJ = pInfo.toJavaRDD();
List<String> result = pInfoJ.collect();
String miscInfo = sInterface.getMiscInfo(sc, pInfo);
System.out.println(miscInfo);
}
It fails at:
List<String> result = pInfoJ.collect();
The error I am getting is:
1354 [sparkDriver-akka.actor.default-dispatcher-3] ERROR akka.remote.transport.netty.NettyTransport - failed to bind to spark-master/192.168.0.191:0, shutting down Netty transport
1354 [main] WARN org.apache.spark.util.Utils - Service 'sparkDriver' could not bind on port 0. Attempting port 1.
1355 [main] DEBUG org.apache.spark.util.AkkaUtils - In createActorSystem, requireCookie is: off
1363 [sparkDriver-akka.actor.default-dispatcher-3] INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Shutting down remote daemon.
1364 [sparkDriver-akka.actor.default-dispatcher-3] INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Remote daemon shut down; proceeding with flushing remote transports.
1364 [sparkDriver-akka.actor.default-dispatcher-5] INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Remoting shut down.
1367 [sparkDriver-akka.actor.default-dispatcher-4] INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
1370 [sparkDriver-akka.actor.default-dispatcher-6] INFO Remoting - Starting remoting
1380 [sparkDriver-akka.actor.default-dispatcher-4] ERROR akka.remote.transport.netty.NettyTransport - failed to bind to spark-master/192.168.0.191:0, shutting down Netty transport
Exception in thread "main" 1382 [sparkDriver-akka.actor.default-dispatcher-6] INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Shutting down remote daemon.
1382 [sparkDriver-akka.actor.default-dispatcher-6] INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Remote daemon shut down; proceeding with flushing remote transports.
java.net.BindException: Failed to bind to: spark-master/192.168.0.191:0: Service 'sparkDriver' failed after 16 retries!
at org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:393)
at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:389)
at scala.util.Success$$anonfun$map$1.apply(Try.scala:206)
at scala.util.Try$.apply(Try.scala:161)
at scala.util.Success.map(Try.scala:206)
at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
1383 [sparkDriver-akka.actor.default-dispatcher-7] INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Remoting shut down.
1385 [delete Spark temp dirs] DEBUG org.apache.spark.util.Utils - Shutdown hook called
Thank you kindly for your help!
Setting the environment variable SPARK_LOCAL_IP=127.0.0.1 solved this for me.
I had this problem when my /etc/hosts file was mapping the wrong IP address to my local hostname.
The BindException in your logs complains about the IP address 192.168.0.191. I assume that resolves to the hostname of your machine and it's not the actual IP address that your network interface is using. It should work fine once you fix that.
I had spark working in my EC2 instance. I started a new web server and to meet its requirement I had to change hostname to ec2 public DNS name i.e.
hostname ec2-54-xxx-xxx-xxx.compute-1.amazonaws.com
After that my spark could not work and showed error as below:
16/09/20 21:02:22 WARN Utils: Service 'sparkDriver' could not bind on port 0. Attempting port 1.
16/09/20 21:02:22 ERROR SparkContext: Error initializing SparkContext.
I solve it by setting SPARK_LOCAL_IP to as below:
export SPARK_LOCAL_IP="localhost"
then just launched sparkling shell as below:
$SPARK_HOME/bin/spark-shell
Possily your master is running on non-default port. Can you post your submit command?
Have a look in https://spark.apache.org/docs/latest/spark-standalone.html#connecting-an-application-to-the-cluster

Resources