End of File Exception in Spark cluster - docker

I created docker containers for Spark Standalone mode. Like in this article:
https://dev.to/mvillarrealb/creating-a-spark-standalone-cluster-with-docker-and-docker-compose-2021-update-6l4
As a driver, I make a simple spark job with calculation and try to start it in cluster mode.
val handle = new SparkLauncher()
.setMaster("spark://spark-master:7077")
.setAppName("MyDriver")
.setVerbose(true)
.setAppResource(s"hdfs://spark-master:7077/opt/spark-apps/sparkcluster_2.12-0.1.0-SNAPSHOT.jar")
.setMainClass(DriverTest.getClass.getName)
.setDeployMode("cluster")
.startApplication()
import org.apache.spark.launcher.SparkAppHandle
while (!handle.getState.equals(SparkAppHandle.State.FINISHED)) {
println("App State: " + handle.getState)
Thread.sleep(1000)
}
Could you please tell me, what URL needs set to "setAppResource" from the machine where I run SparkLaunch, from master node or HDFS path?
If I used path "hdfs://spark-master:7077/opt/spark-apps/sparkcluster_2.12-0.1.0-SNAPSHOT.jar" where spark-master:7077 master node, /opt/spark-apps/sparkcluster_2.12-0.1.0-SNAPSHOT.jar path on master node, I will get this exception.
INFO: 22/05/26 16:48:04 ERROR ClientEndpoint: Exception from cluster
was: java.io.EOFException: End of File Exception between local host
is: "spark-worker-b/172.26.0.3"; destination host is:
"spark-master":7077; : java.io.EOFException; For more details see:
http://wiki.apache.org/hadoop/EOFException мая 26, 2022 4:48:04 PM
org.apache.spark.launcher.OutputRedirector redirect INFO:
java.io.EOFException: End of File Exception between local host is:
"spark-worker-b/172.26.0.3"; destination host is: "spark-master":7077;
: java.io.EOFException; For more details see:
http://wiki.apache.org/hadoop/EOFException
Also I see, that my master node lost worker nodes in execution time. After kill process Master see workers.

Related

jib gradle plugin + static docker client: cannot build image due to permission error: layer.tar: A required privilege is not held by the client

I want to build Docker image with jib Gradle plugin in Windows, and use a Windows docker client to load it into my WSL 2 container running dockerd, and use WSL 2 as server. Resource-wise I think this is the lightest solution. .
On WSL 2 side, I run dockerd service in Ubuntu 20 on WSL 2, and it's listening on [::]:2375. TLS disabled(--tls=false), only http.
On Windows side, I only downloaded the Docker client(static client, from https://download.docker.com/win/static/stable/x86_64/), and added the dynamic WSL 2 container IP into the insecure-registry in daemon.json. This file is put in the same dir of docker.exe client.
On Intellij IDEA side, I use gradle 5.2.1 wrapper, and jib plugin 3.2.1. I configure jib as follows:
jib {
dockerClient.executable = 'E:\\coding\\environment\\docker\\docker.exe'
dockerClient.environment = [ DOCKER_HOST: '172.21.169.180:2375',
DOCKER_INSECURE_REGISTRIES: "172.21.169.180:5000"]
from.image = 'docker://mini/java#sha256:d3ded1fd0df592c33185d930d976304994bbc539c7bf70a6091cb3da0f7e11fa'
to.image = 'spring-plugins-demo'
container.mainClass = 'dev.westerngun.oldway.ApplicationV1'
}
I know it can connect to dockerd in my WSL 2, because before I add the dynamic IP of Ubuntu the error was not able to connect to daemon. Now it can load the base image and start building.
Then, when I run jibDockerBuild --stacktrace, I see this error:
Execution failed for task ':jibDockerBuild'.
> com.google.cloud.tools.jib.plugins.common.BuildStepsExecutionException: C:\Users\WESTER~1\AppData\Local\Temp\16164656866093264693\33e3f3775358985441c3bea658f06f5307326c83f9c0bcbf8aa4acb327abffde\layer.tar: �ͻ���û���������Ȩ��
* Try:
Run with --info or --debug option to get more log output. Run with --scan to get full insights.
* Exception is:
org.gradle.api.tasks.TaskExecutionException: Execution failed for task ':jibDockerBuild'.
at org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter$2.accept(ExecuteActionsTaskExecuter.java:121)
at org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter$2.accept(ExecuteActionsTaskExecuter.java:117)
at org.gradle.internal.Try$Failure.ifSuccessfulOrElse(Try.java:184)
at org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter.execute(ExecuteActionsTaskExecuter.java:110)
at org.gradle.api.internal.tasks.execution.ResolveIncrementalChangesTaskExecuter.execute(ResolveIncrementalChangesTaskExecuter.java:84)
at org.gradle.api.internal.tasks.execution.ResolveTaskOutputCachingStateExecuter.execute(ResolveTaskOutputCachingStateExecuter.java:91)
at org.gradle.api.internal.tasks.execution.ResolveBeforeExecutionStateTaskExecuter.execute(ResolveBeforeExecutionStateTaskExecuter.java:74)
at org.gradle.api.internal.tasks.execution.ValidatingTaskExecuter.execute(ValidatingTaskExecuter.java:58)
at org.gradle.api.internal.tasks.execution.SkipEmptySourceFilesTaskExecuter.execute(SkipEmptySourceFilesTaskExecuter.java:109)
at org.gradle.api.internal.tasks.execution.ResolveBeforeExecutionOutputsTaskExecuter.execute(ResolveBeforeExecutionOutputsTaskExecuter.java:67)
at org.gradle.api.internal.tasks.execution.ResolveAfterPreviousExecutionStateTaskExecuter.execute(ResolveAfterPreviousExecutionStateTaskExecuter.java:46)
at org.gradle.api.internal.tasks.execution.CleanupStaleOutputsExecuter.execute(CleanupStaleOutputsExecuter.java:93)
at org.gradle.api.internal.tasks.execution.FinalizePropertiesTaskExecuter.execute(FinalizePropertiesTaskExecuter.java:45)
at org.gradle.api.internal.tasks.execution.ResolveTaskExecutionModeExecuter.execute(ResolveTaskExecutionModeExecuter.java:94)
at org.gradle.api.internal.tasks.execution.SkipTaskWithNoActionsExecuter.execute(SkipTaskWithNoActionsExecuter.java:57)
at org.gradle.api.internal.tasks.execution.SkipOnlyIfTaskExecuter.execute(SkipOnlyIfTaskExecuter.java:56)
at org.gradle.api.internal.tasks.execution.CatchExceptionTaskExecuter.execute(CatchExceptionTaskExecuter.java:36)
at org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter$1.executeTask(EventFiringTaskExecuter.java:63)
at org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter$1.call(EventFiringTaskExecuter.java:49)
at org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter$1.call(EventFiringTaskExecuter.java:46)
at org.gradle.internal.operations.DefaultBuildOperationExecutor$CallableBuildOperationWorker.execute(DefaultBuildOperationExecutor.java:416)
at org.gradle.internal.operations.DefaultBuildOperationExecutor$CallableBuildOperationWorker.execute(DefaultBuildOperationExecutor.java:406)
at org.gradle.internal.operations.DefaultBuildOperationExecutor$1.execute(DefaultBuildOperationExecutor.java:165)
at org.gradle.internal.operations.DefaultBuildOperationExecutor.execute(DefaultBuildOperationExecutor.java:250)
at org.gradle.internal.operations.DefaultBuildOperationExecutor.execute(DefaultBuildOperationExecutor.java:158)
at org.gradle.internal.operations.DefaultBuildOperationExecutor.call(DefaultBuildOperationExecutor.java:102)
at org.gradle.internal.operations.DelegatingBuildOperationExecutor.call(DelegatingBuildOperationExecutor.java:36)
at org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter.execute(EventFiringTaskExecuter.java:46)
at org.gradle.execution.plan.LocalTaskNodeExecutor.execute(LocalTaskNodeExecutor.java:43)
at org.gradle.execution.taskgraph.DefaultTaskExecutionGraph$InvokeNodeExecutorsAction.execute(DefaultTaskExecutionGraph.java:355)
at org.gradle.execution.taskgraph.DefaultTaskExecutionGraph$InvokeNodeExecutorsAction.execute(DefaultTaskExecutionGraph.java:343)
at org.gradle.execution.taskgraph.DefaultTaskExecutionGraph$BuildOperationAwareExecutionAction.execute(DefaultTaskExecutionGraph.java:336)
at org.gradle.execution.taskgraph.DefaultTaskExecutionGraph$BuildOperationAwareExecutionAction.execute(DefaultTaskExecutionGraph.java:322)
at org.gradle.execution.plan.DefaultPlanExecutor$ExecutorWorker$1.execute(DefaultPlanExecutor.java:134)
at org.gradle.execution.plan.DefaultPlanExecutor$ExecutorWorker$1.execute(DefaultPlanExecutor.java:129)
at org.gradle.execution.plan.DefaultPlanExecutor$ExecutorWorker.execute(DefaultPlanExecutor.java:202)
at org.gradle.execution.plan.DefaultPlanExecutor$ExecutorWorker.executeNextNode(DefaultPlanExecutor.java:193)
at org.gradle.execution.plan.DefaultPlanExecutor$ExecutorWorker.run(DefaultPlanExecutor.java:129)
at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:63)
at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:46)
at org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:55)
Caused by: org.gradle.internal.UncheckedException: com.google.cloud.tools.jib.plugins.common.BuildStepsExecutionException: C:\Users\WESTER~1\AppData\Local\Temp\16164656866093264693\33e3f3775358985441c3bea658f06f5307326c83f9c0bcbf8aa4acb327abffde\layer.tar: �ͻ���û���������Ȩ��
at org.gradle.internal.UncheckedException.throwAsUncheckedException(UncheckedException.java:67)
at org.gradle.internal.UncheckedException.throwAsUncheckedException(UncheckedException.java:41)
at org.gradle.internal.reflect.JavaMethod.invoke(JavaMethod.java:106)
at org.gradle.api.internal.project.taskfactory.StandardTaskAction.doExecute(StandardTaskAction.java:48)
at org.gradle.api.internal.project.taskfactory.StandardTaskAction.execute(StandardTaskAction.java:41)
at org.gradle.api.internal.project.taskfactory.StandardTaskAction.execute(StandardTaskAction.java:28)
at org.gradle.api.internal.AbstractTask$TaskActionWrapper.execute(AbstractTask.java:705)
at org.gradle.api.internal.AbstractTask$TaskActionWrapper.execute(AbstractTask.java:672)
at org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter$4.run(ExecuteActionsTaskExecuter.java:338)
at org.gradle.internal.operations.DefaultBuildOperationExecutor$RunnableBuildOperationWorker.execute(DefaultBuildOperationExecutor.java:402)
at org.gradle.internal.operations.DefaultBuildOperationExecutor$RunnableBuildOperationWorker.execute(DefaultBuildOperationExecutor.java:394)
at org.gradle.internal.operations.DefaultBuildOperationExecutor$1.execute(DefaultBuildOperationExecutor.java:165)
at org.gradle.internal.operations.DefaultBuildOperationExecutor.execute(DefaultBuildOperationExecutor.java:250)
at org.gradle.internal.operations.DefaultBuildOperationExecutor.execute(DefaultBuildOperationExecutor.java:158)
at org.gradle.internal.operations.DefaultBuildOperationExecutor.run(DefaultBuildOperationExecutor.java:92)
at org.gradle.internal.operations.DelegatingBuildOperationExecutor.run(DelegatingBuildOperationExecutor.java:31)
at org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter.executeAction(ExecuteActionsTaskExecuter.java:327)
at org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter.executeActions(ExecuteActionsTaskExecuter.java:312)
at org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter.access$200(ExecuteActionsTaskExecuter.java:75)
at org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter$TaskExecution.execute(ExecuteActionsTaskExecuter.java:158)
at org.gradle.internal.execution.impl.steps.ExecuteStep.execute(ExecuteStep.java:46)
at org.gradle.internal.execution.impl.steps.CancelExecutionStep.execute(CancelExecutionStep.java:34)
at org.gradle.internal.execution.impl.steps.TimeoutStep.executeWithoutTimeout(TimeoutStep.java:69)
at org.gradle.internal.execution.impl.steps.TimeoutStep.execute(TimeoutStep.java:49)
at org.gradle.internal.execution.impl.steps.CatchExceptionStep.execute(CatchExceptionStep.java:34)
at org.gradle.internal.execution.impl.steps.CreateOutputsStep.execute(CreateOutputsStep.java:49)
at org.gradle.internal.execution.impl.steps.SnapshotOutputStep.execute(SnapshotOutputStep.java:42)
at org.gradle.internal.execution.impl.steps.SnapshotOutputStep.execute(SnapshotOutputStep.java:28)
at org.gradle.internal.execution.impl.steps.CacheStep.executeWithoutCache(CacheStep.java:133)
at org.gradle.internal.execution.impl.steps.CacheStep.lambda$execute$5(CacheStep.java:83)
at org.gradle.internal.execution.impl.steps.CacheStep.execute(CacheStep.java:82)
at org.gradle.internal.execution.impl.steps.CacheStep.execute(CacheStep.java:37)
at org.gradle.internal.execution.impl.steps.PrepareCachingStep.execute(PrepareCachingStep.java:33)
at org.gradle.internal.execution.impl.steps.StoreSnapshotsStep.execute(StoreSnapshotsStep.java:38)
at org.gradle.internal.execution.impl.steps.StoreSnapshotsStep.execute(StoreSnapshotsStep.java:23)
at org.gradle.internal.execution.impl.steps.SkipUpToDateStep.executeBecause(SkipUpToDateStep.java:95)
at org.gradle.internal.execution.impl.steps.SkipUpToDateStep.lambda$execute$0(SkipUpToDateStep.java:88)
at org.gradle.internal.execution.impl.steps.SkipUpToDateStep.execute(SkipUpToDateStep.java:52)
at org.gradle.internal.execution.impl.steps.SkipUpToDateStep.execute(SkipUpToDateStep.java:36)
at org.gradle.internal.execution.impl.DefaultWorkExecutor.execute(DefaultWorkExecutor.java:34)
at org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter.execute(ExecuteActionsTaskExecuter.java:109)
... 37 more
Caused by: com.google.cloud.tools.jib.plugins.common.BuildStepsExecutionException: C:\Users\WESTER~1\AppData\Local\Temp\16164656866093264693\33e3f3775358985441c3bea658f06f5307326c83f9c0bcbf8aa4acb327abffde\layer.tar: �ͻ���û���������Ȩ��
at com.google.cloud.tools.jib.plugins.common.JibBuildRunner.runBuild(JibBuildRunner.java:285)
at com.google.cloud.tools.jib.gradle.BuildDockerTask.buildDocker(BuildDockerTask.java:126)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at org.gradle.internal.reflect.JavaMethod.invoke(JavaMethod.java:103)
... 75 more
Caused by: java.nio.file.FileSystemException: C:\Users\WESTER~1\AppData\Local\Temp\16164656866093264693\33e3f3775358985441c3bea658f06f5307326c83f9c0bcbf8aa4acb327abffde\layer.tar: �ͻ���û���������Ȩ��
at com.google.cloud.tools.jib.tar.TarExtractor.extract(TarExtractor.java:93)
at com.google.cloud.tools.jib.tar.TarExtractor.extract(TarExtractor.java:49)
at com.google.cloud.tools.jib.builder.steps.LocalBaseImageSteps.cacheDockerImageTar(LocalBaseImageSteps.java:217)
at com.google.cloud.tools.jib.builder.steps.LocalBaseImageSteps.lambda$retrieveDockerDaemonLayersStep$0(LocalBaseImageSteps.java:133)
at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:74)
at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
The error message in Chinese is
C:\Users\WESTER~1\AppData\Local\Temp\16164656866093264693\33e3f3775358985441c3bea658f06f5307326c83f9c0bcbf8aa4acb327abffde\layer.tar: 客户端没有所需的特权。
And I think it can be translated into "A required permission is not held by the client".
I suspect this is because my user is not added to docker-user group, as stated here. But, I uninstalled Docker toolbox and I don't see this group anymore, as it sets DOCKER_HOST and interferes with my setup. Secondly, I don't have Local Users and Group available, seems Windows 10 Home edition does not have it.
Should I try to install gpedit in my Home Edition, add the group and try? But without Docker toolbox, I doubt it would work. Docker documentation explains here that it creates the group and configure it to ensure separation of permissions between root/admin and non-root/non-admin users; I think only creating that group will not work. https://docs.docker.com/desktop/windows/permission-requirements/
But, when I use docker.exe to connect to WSL 2 and save a tar file to C:\Users\WESTER~1\AppData\Local\Temp, it works. The tar file is created and not corrupted. So I think it's not a permission error; anyone can access that dir.
Windows bundled bsd-tar.exe has nothing to do with it; renaming the tar.exe in System32 and build, the error is the same.
It is solved when I run cmd as admin and cd to project dir and do gradlew jibDockerBuild. Image built and loaded into WSL 2 daemon successfully. It is indeed file system permission error.
Although still very strange(as I allowed the permission to everyone on that folder), but at least this is one workaround.
Another workaround, even better:
As per https://github.com/microsoft/WSL/issues/4983, I changed the jib config to set docker host to be http://[::1]:2375, and suddenly it works. Seems only ipv6 is bind.
Now not only the host is reachable, even permission error disappears; no insecure_registries settings needed, neither.

Facing issue while deploying Docker images through AWS-Greengrass Connector Service

BACKGROUND:
We are trying to deploy App as a docker container through AWS-Greengrass Connector Service to the edge device (Running Greengrass core as container in Linux env).
We are configuring the greengrass group connector in cloud for docker app deployment.
ISSUES:
While deploying from AWS greengrass group (AWS cloud), we are able to see successful deployment message, but application is not getting deployed to the edge device (running greengrass core as container).
LOGS:
DockerApplicationDeploymentLog:
[2020-11-05T10:35:42.632Z][FATAL]-lambda_runtime.py:381,Failed to initialize Lambda runtime due to exception: "getgrnam(): name not found: 'docker'"
[2020-11-05T10:35:44.789Z][WARN]-ipc_client.py:162,deprecated arg port=8000 will be ignored
[2020-11-05T10:35:45.012Z][WARN]-ipc_client.py:162,deprecated arg port=8000 will be ignored
[2020-11-05T10:35:45.012Z][INFO]-docker_deployer.py:41,docker deployer starting up
[2020-11-05T10:35:45.012Z][INFO]-docker_deployer.py:45,checking inputs
[2020-11-05T10:35:45.012Z][INFO]-docker_deployer.py:52,docker group permissions
[2020-11-05T10:35:45.02Z][FATAL]-lambda_runtime.py:141,Failed to import handler function "handlers.function_handler" due to exception: "getgrnam(): name not found: 'docker'"
RuntimeSystemLog:
[2020-11-05T10:31:49.78Z][DEBUG]-Restart worker because it was killed. {"workerId": "8b0ee21d-e481-4d27-5e30-cb4d912547f5", "funcArn": "arn:aws:lambda:ap-south-1:aws:function:DockerApplicationDeployment:6"}
[2020-11-05T10:31:49.78Z][DEBUG]-Reserve worker. {"workerId": "8b0ee21d-e481-4d27-5e30-cb4d912547f5", "funcArn": "arn:aws:lambda:ap-south-1:aws:function:DockerApplicationDeployment:6"}
[2020-11-05T10:31:49.78Z][DEBUG]-Doing start attempt: {"Attempt count": 0, "workerId": "8b0ee21d-e481-4d27-5e30-cb4d912547f5", "funcArn": "arn:aws:lambda:ap-south-1:aws:function:DockerApplicationDeployment:6"}
[2020-11-05T10:31:49.78Z][DEBUG]-Creating directory. {"dir": "/greengrass/ggc/packages/1.11.0/var/lambda/8b0ee21d-e481-4d27-5e30-cb4d912547f5"}
[2020-11-05T10:31:49.78Z][DEBUG]-changed ownership {"path": "/greengrass/ggc/packages/1.11.0/var/lambda/8b0ee21d-e481-4d27-5e30-cb4d912547f5", "new uid": 121, "new gid": 121}
[2020-11-05T10:31:49.782Z][DEBUG]-Resolving environment variable {"Variable": "PYTHONPATH=/greengrass/ggc/deployment/lambda/arn.aws.lambda.ap-south-1.aws.function.DockerApplicationDeployment.6"}
[2020-11-05T10:31:49.79Z][DEBUG]-Resolving environment variable {"Variable": "PATH=/usr/bin:/usr/local/bin"}
[2020-11-05T10:31:49.799Z][DEBUG]-Resolving environment variable {"Variable": "DOCKER_DEPLOYER_DOCKER_COMPOSE_DESTINATION_FILE_PATH=/home/ggc_user"}
[2020-11-05T10:31:49.82Z][DEBUG]-Creating new worker. {"functionArn": "arn:aws:lambda:ap-south-1:aws:function:DockerApplicationDeployment:6", "workerId": "8b0ee21d-e481-4d27-5e30-cb4d912547f5"}
[2020-11-05T10:31:49.82Z][DEBUG]-Starting worker process. {"workerId": "8b0ee21d-e481-4d27-5e30-cb4d912547f5"}
[2020-11-05T10:31:49.829Z][DEBUG]-Worker process started. {"workerId": "8b0ee21d-e481-4d27-5e30-cb4d912547f5", "pid": 20471}
[2020-11-05T10:31:49.83Z][DEBUG]-Start work result: {"workerId": "8b0ee21d-e481-4d27-5e30-cb4d912547f5", "funcArn": "arn:aws:lambda:ap-south-1:aws:function:DockerApplicationDeployment:6", "state": "Starting", "initDurationSeconds": 0.012234454}
[2020-11-05T10:31:49.831Z][INFO]-Created worker. {"functionArn": "arn:aws:lambda:ap-south-1:aws:function:DockerApplicationDeployment:6", "workerId": "8b0ee21d-e481-4d27-5e30-cb4d912547f5", "pid": 20471}
[2020-11-05T10:31:53.155Z][DEBUG]-Received a credential provider request {"serverLambdaArn": "arn:aws:lambda:::function:GGTES", "clientId": "8b0ee21d-e481-4d27-5e30-cb4d912547f5"}
[2020-11-05T10:31:53.156Z][DEBUG]-WorkManager getting work {"workerId": "148f7a1a-168f-40a5-682d-92e00d56a5df", "funcArn": "arn:aws:lambda:::function:GGTES", "invocationId": "955c2c43-1187-4001-7988-4213b95eb584"}
[2020-11-05T10:31:53.156Z][DEBUG]-Successfully GET work. {"invocationId": "955c2c43-1187-4001-7988-4213b95eb584", "fromWorkerId": "148f7a1a-168f-40a5-682d-92e00d56a5df", "ofFunction": "arn:aws:lambda:::function:GGTES"}
[2020-11-05T10:31:53.156Z][DEBUG]-POST work result. {"invocationId": "955c2c43-1187-4001-7988-4213b95eb584", "ofFunction": "arn:aws:lambda:::function:GGTES"}
[2020-11-05T10:31:53.156Z][DEBUG]-WorkManager putting work result. {"workerId": "148f7a1a-168f-40a5-682d-92e00d56a5df", "invocationId": "955c2c43-1187-4001-7988-4213b95eb584"}
[2020-11-05T10:31:53.156Z][DEBUG]-WorkManager put work result successfully. {"workerId": "148f7a1a-168f-40a5-682d-92e00d56a5df", "invocationId": "955c2c43-1187-4001-7988-4213b95eb584"}
[2020-11-05T10:31:53.156Z][DEBUG]-Successfully POST work result. {"invocationId": "955c2c43-1187-4001-7988-4213b95eb584", "ofFunction": "arn:aws:lambda:::function:GGTES"}
[2020-11-05T10:31:53.157Z][DEBUG]-Handled a credential provider request {"clientId": "8b0ee21d-e481-4d27-5e30-cb4d912547f5"}
[2020-11-05T10:31:53.158Z][DEBUG]-GET work item. {"fromWorkerId": "148f7a1a-168f-40a5-682d-92e00d56a5df", "ofFunction": "arn:aws:lambda:::function:GGTES"}
[2020-11-05T10:31:53.158Z][DEBUG]-Worker timer doesn't exist. {"workerId": "148f7a1a-168f-40a5-682d-92e00d56a5df"}
Did you doublecheck to meet the requirments listed in
https://docs.aws.amazon.com/greengrass/latest/developerguide/docker-app-connector.html
https://docs.aws.amazon.com/greengrass/latest/developerguide/docker-app-connector.html#docker-app-connector-linux-user
I dont know this particular error, but it complains about some missing basic user/group settings:
[2020-11-05T10:35:42.632Z][FATAL]-lambda_runtime.py:381,Failed to initialize Lambda runtime due to exception: "getgrnam(): name not found: 'docker'"

Clustered Vert.x not working in docker if vert.x instance is created manually

I setup a small project for learning the capabilities of Vert.x in a cluster environment but I'm facing some weird issues when I try to create the vertx instance inside a Docker image.
The project consists just in 2 verticles being deployed in different Docker containers and using the event bus to communicate with each other
If I use the vertx provided launcher:
Launcher.executeCommand("run", verticleClass, "--cluster")
(or just stating that the main class is io.vertx.core.Launcher and putting the right arguments)
Everything works both locally and inside docker images. But if I try to create the vertx instance manually with
Vertx.rxClusteredVertx(VertxOptions())
.flatMap { it.rxDeployVerticle(verticleClass) }
.subscribe()
Then it's not working in Docker (it works locally). Or, more visually
| | Local | Docker |
|:---------------: |:-----: |:------: |
| Vertx launcher | Y | Y |
| Custom launcher | Y | N |
By checking the Docker logs it seems that everything works. I can see that both verticles know each other:
Members [2] {
Member [172.18.0.2]:5701 - c5e9636d-b3cd-4e24-a8ce-e881218bf3ce
Member [172.18.0.3]:5701 - a09ce83d-e0b3-48eb-aad7-fbd818c389bc this
}
But when I try to send a message through the event bus the following exception is thrown:
WARNING: Connecting to server localhost:33845 failed
io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:33845
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:325)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:886)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Connection refused
... 11 more
Just for simplifying stuff I uploaded the project to Github. I tried to make it as simple as possible so it has 2 verticles & 2 main classes and lots of scripts for every combination
By checking the Docker logs it seems that everything works. I can see
that both verticles know each other
Yes, because your cluster manager works fine, but you should also make your event bus configuration be consistent in every node (machine/docker container) in your cluster because as mentioned in Vert.x documentation: cluster managers do not handle the event bus inter-node transport, this is done directly by Vert.x with TCP connections.
You have to set the cluster host on each node to be the IP address of this node itself and any arbitrary port number (unless you try to run more than Vert.x instance on the same node you have to choose a different port number for each Vert.x instance). The error you are facing is because the default cluster host is local host
For example if a node's IP address is 192.168.1.12 then you would do the following:
VertxOptions options = new VertxOptions()
.setClustered(true)
.setClusterHost("192.168.1.12") // node ip
.setClusterPort(17001) // any arbitrary port but make sure no other Vert.x instances using same port on the same node
.setClusterManager(clusterManager);
on another node whose IP address is 192.168.1.56 then you would do the following:
VertxOptions options = new VertxOptions()
.setClustered(true)
.setClusterHost("192.168.1.56") // other node ip
.setClusterPort(17001) // it is ok because this is a different node
.setClusterManager(clusterManager);
By researching a bit more into the Vert.X Launcher code I found out that internally it's doing more things than just "parsing" the inputs
In case vertx is configured to run in cluster mode the Launcher is setting the cluster address itself (via BareCommand class) so in order to replicate the Launcher behaviour with your own Main class (and having the flexibility to configure your vertx instance via code instead of via args) the code will be as follow:
fun main(args: Array<String>) {
Vertx.rxClusteredVertx(
VertxOptions(
clusterHost = getDefaultAddress()
)
)
.flatMap { it.rxDeployVerticle(verticleClass) }
.subscribe()
}
// As taken from io.vertx.core.impl.launcher.commands.BareCommand
fun getDefaultAddress(): String? {
val nets: Enumeration<NetworkInterface>
try {
nets = NetworkInterface.getNetworkInterfaces()
} catch (e: SocketException) {
return null
}
var netinf: NetworkInterface
while (nets.hasMoreElements()) {
netinf = nets.nextElement()
val addresses = netinf.inetAddresses
while (addresses.hasMoreElements()) {
val address = addresses.nextElement()
if (!address.isAnyLocalAddress && !address.isMulticastAddress
&& address !is Inet6Address) {
return address.hostAddress
}
}
}
return null
}

vertx clustered mode hazelcast log config on linux

Using Eclipse on Windows, a vertx Verticle with a misconfigured cluster.xml shows the following error in the Eclipse console:
11:46:18.536 [hz._hzInstance_1_dev.generic-operation.thread-0] ERROR com.hazelcast.cluster - [192.168.25.8]:5701 [dev] [3.5.2] Node could not join cluster. A Configuration mismatch was detected: Incompatible joiners! expected: multicast, found: tcp-ip Node is going to shutdown now!
11:46:22.529 [vert.x-worker-thread-0] ERROR com.hazelcast.cluster.impl.TcpIpJoiner - [192.168.25.8]:5701 [dev] [3.5.2] com.hazelcast.core.HazelcastInstanceNotActiveException: Hazelcast instance is not active!
This is fine, I know to reconfigure the cluster for multicast. The problem is when I deploy the same code and configuration to Linux, and run it as a fat jar then the same log doesn't show either the hz thread or the vertx worker thread logs. Instead it shows the verticle logs as:
2015-11-05 12:03:09,329 Starting clustered Vertx
2015-11-05 12:03:13,549 ERROR: VerticleService failed to start: java.lang.NullPointerException
So if I run on Linux the log to tell me there's a misconfiguration isn't showing. There's something I am missing in the vertx / maven log config but I don't know what. Maven properties are as follows:
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<exec.mainClass>main.java.eiger.isct.service.Verticle</exec.mainClass>
<log4j.configurationFile>log4j2.xml</log4j.configurationFile>
<hazelcast.logging.type>log4j2</hazelcast.logging.type>
</properties>
and I start the fat jar using:
java -Dlog4j.configuration=log4j2.xml -jar Verticle-0.5-SNAPSHOT-fat.jar
How can I get the hz thread and vertx thread to log on Linux?
I've tried adding a vertx-default-jul-logging.properties file below to the maven resources dir but no luck.
com.hazelcast.level=ALL
java.util.logging.ConsoleHandler.level=ALL
java.util.logging.FileHandler.level=ALL
THANKS for your comment.
Vertx has started logging having added
-Djava.util.logging.config.file=../logging.properties
to the java start command and with the default logging.properties like (and this is a nice config for lower level stuff):
handlers=java.util.logging.ConsoleHandler,java.util.logging.FileHandler
java.util.logging.SimpleFormatter.format=%1$tY-%1$tm-%1$td %1$tH:%1$tM:%1$tS:%1$tL %4$s %2$s %5$s%6$s%n
java.util.logging.ConsoleHandler.formatter=java.util.logging.SimpleFormatter
java.util.logging.ConsoleHandler.level=ALL
java.util.logging.FileHandler.level=ALL
java.util.logging.FileHandler.formatter=java.util.logging.SimpleFormatter
java.util.logging.FileHandler.pattern=../logs/vertx.log
.level=ALL
io.vertx.level=ALL
com.hazelcast.level=ALL
io.netty.util.internal.PlatformDependent.level=ALL
and vertx is logging to ../logs/vertx.log on Linux

Flume NullPointerExceptions on checkpoint

I've setup a file to file source/sink , just as a test of basic flume functionality.
Im currently using the "exec" source, with the command being "tail -F mytmpfile".
In my script, I continuously echo "....." >> mytmpfile , so that the tail command constitutes a stream.
However, I've started seeing the following exception in the flume logs:
java.lang. IllegalStateException: Channel closed [channel=c1]. Due to
java.lang.NullPointerException: null
at org.apache.flume.channel.file.FileChannel.createTransaction(FileChannel.java:353)
at org.apache.flume.channel.BasicChannelSemantics.getTransaction(BasicChannelSemantics.java:122)
at org.apache.flume.sink.RollingFileSink.process(RollingFileSink.java:183)
at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
at java.lang.Thread.run(Thread.java:662) Caused by: java.lang.NullPointerException
at org.apache.flume.channel.file.Log.writeCheckpoint(Log.java:895)
at org.apache.flume.channel.file.Log.replay(Log.java:406)
at org.apache.flume.channel.file.FileChannel.start(FileChannel.java:303)
at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:236)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
... 1 more
Any thoughts on where this NullPointerException is coming from? It appears from scanning the code that maybe it related to a missing folder or directory. But I cant find the exact line on the git hub branches.
This is using apache-flume-1.3.1.23-...
In the past I've had problems with file channels, and they've normally boiled down to two problems:
1) If you're running multiple agents on the same box, make sure you configure them to have separate dataDirs and checkpointDir.
2) On Linux boxes, check that your tmpfs isn't near its capacity. If it's getting full, flume will complain. Try stopping the flume agent, unmount tmpfs, enlarge it, remount and restart the agent.

Resources