Dask Gateway, set worker resources - dask

I am trying to set the resources for workers as per the docs here, but on a set up that uses Dask Gateway. Specifically, I'd like to be able to follow the answer to this question, but using Dask Gateway.
I haven't been able to find a reference to worker resources in the ClusterConfig options, and I tried the following (as per this answer), which doesn't seem to work:
def set_resources(dask_worker):
dask_worker.set_resources(task_limit=1)
return dask_worker.available_resources, dask_worker.total_resources
client.run(set_resources)
# output from a 1 worker cluster
> {'tls://255.0.91.211:39302': ({}, {})}
# checking info known by scheduler
cluster.scheduler_info
> {'type': 'Scheduler',
'id': 'Scheduler-410438c9-6b3a-494d-974a-52d9e9fss121',
'address': 'tls://255.0.44.161:8786',
'services': {'dashboard': 8787, 'gateway': 8788},
'started': 1632434883.9022279,
'workers': {'tls://255.0.92.232:39305': {'type': 'Worker',
'id': 'dask-worker-f95c163cf41647c6a6d85da9efa9919b-wvnf6',
'host': '255.0.91.211',
'resources': {}, #### still {} empty dict
'local_directory': '/home/jovyan/dask-worker-space/worker-ir8tpkz_',
'name': 'dask-worker-f95c157cf41647c6a6d85da9efa9919b-wvnf6',
'nthreads': 4,
'memory_limit': 6952476672,
'services': {'dashboard': 8787},
'nanny': 'tls://255.0.92.232:40499'}}}
How can this be done, either when the cluster is created using the config.yaml of the helm chart (ideally, a field in the cluster options that a user can change!) for Dask Gateway, or after the workers are already up and running?

I've found a way to specify this, at least on Kubernetes, is through the KubeClusterConfig.worker_extra_container_config. This is my yaml snippet for a working configuration (specifically, this is in my config for the daskhub helm deploy):
dask-gateway:
gateway:
backend:
worker:
extraContainerConfig:
env:
- name: DASK_DISTRIBUTED__WORKER__RESOURCES__TASKSLOTS
value: "1"
An option to set worker resources isn't exposed in the cluster options, and isn't explicitly exposed in the KubeClusterConfig. The specific format for the environment variable is described here. Resource environment variables need to be set before the dask worker process is started, I found it doesn't work when I set KubeClusterConfig.environment.
Using this, I am able to run multithreaded numpy (np.dot) using mkl in a dask worker container that has been given 4 cores. I see 400% CPU usage and only one task assigned to each worker.

Related

Cypher queries fails with Neo4jError: Unknown function 'apoc.convert.fromJsonMap' but apoc should be installed

I deployed Neo4j in my AKS cluster using the standalone Helm chart.
It all gets deployed and my Node.js server connects to Neo4j correctly.
However queries throw the Neo4jError: Unknown function 'apoc.convert.fromJsonMap' error, so apoc is clearly missing.
I followed the procedure described here https://neo4j.com/docs/operations-manual/current/kubernetes/configuration/#operations-installing-plugins and my Values are here below.
The only difference I find is that in the guide apoc core is actually enabled afterwards by upgrading the helm chart, while I'm installing it with the option enabled already.
Looking at https://neo4j.com/docs/apoc/current/config/ I saw
As of Neo4j v.5.0, APOC config settings are no longer supported in the neo4j.conf file. Please move all apoc.* settings to apoc.conf. It is also possible to set the config settings using environment variables.
so as neo4j-standalone is using version 4.4.16 I moved the apoc configurations from apoc.config to neo4.config but still apoc procedures are not found by the queries.
Is there something I'm missing out to configure in order to enable apoc?
Thank you very much.
neo4j-db:
# neo4j-standalone:
nameOverride: "neo4j"
fullnameOverride: 'neo4j'
neo4j:
# Name of your cluster
name: "fixit-neo4j" # this will be the label: app: value for the service selector
password: "password"
##
passwordFromSecret: ""
passwordFromSecretLookup: false
edition: "community"
acceptLicenseAgreement: "yes"
offlineMaintenanceModeEnabled: false
resources:
cpu: "1000m"
memory: "2Gi"
volumes:
data:
mode: 'volumeClaimTemplate'
volumeClaimTemplate:
accessModes:
- ReadWriteOnce
storageClassName: neo4j-sc-data
resources:
requests:
storage: 4Gi
backups:
mode: 'share' # share an existing volume (e.g. the data volume)
share:
name: 'logs'
logs:
mode: 'volumeClaimTemplate'
volumeClaimTemplate:
accessModes:
- ReadWriteOnce
storageClassName: neo4j-sc-logs
resources:
requests:
storage: 4Gi
services:
# A ClusterIP service with the same name as the Helm Release name should be used for Neo4j Driver connections originating inside the
# Kubernetes cluster.
default:
# Annotations for the K8s Service object
annotations: { }
# A LoadBalancer Service for external Neo4j driver applications and Neo4j Browser
neo4j:
### this would create cluster-neo4j svc
enabled: false
# env:
# NEO4J_PLUGINS: '["graph-data-science"]'
config:
server.bolt.enabled : "true"
server.bolt.tls_level: "REQUIRED"
server.bolt.listen_address: "0.0.0.0:7687"
dbms.ssl.policy.bolt.client_auth: "NONE"
dbms.ssl.policy.bolt.enabled: "true"
server.directories.plugins: "/var/lib/neo4j/labs"
dbms.security.procedures.unrestricted: "apoc.*"
server.config.strict_validation.enabled: "false"
dbms.security.procedures.allowlist: "gds.*,apoc.*"
apoc_config:
apoc.trigger.enabled: "true"
apoc.jdbc.neo4j.url: "jdbc:foo:bar"
apoc.import.file.enabled: "true"
startupProbe:
failureThreshold: 1000
periodSeconds: 50
ssl:
# setting per "connector" matching neo4j config
bolt:
privateKey:
secretName: tls-secret
subPath: tls.key
publicCertificate:
secretName: tls-secret
subPath: tls.crt
trustedCerts:
sources: [ ]
revokedCerts:
sources: [ ]
OK after a bit of looking at quite a few issues on the same subject, I found that some solutions for this problem was to add dbms.directories.plugins: "/var/lib/neo4j/labs" and dbms.config.strict_validation: "false" in the config section which, as I understand it, mirrors these settings both for server and dbms. It indeed worked, but it's weird that in the official guide it's not mentioned. I mean, these mirrored settings make sense, tell both the server and the dbms where to look for plugins, but still it should be mentioned. I see so many post about this, which means the documentation is not clear enough. It's easy to take things for granted and in fact because this mirrored plugin location both for the server AND dbms need is just not stated anywhere in the docs, I as many others thought that dbms was already configured with the same location as server.directories.plugins: "/var/lib/neo4j/labs" ( which the docs say to configure ) and haven't added it, but hey.. ain't nobody's perfect I guess. Hope they change the docs then for future devs' sake, but meanwhile this answer could be helpful.
So the correct configuration is
env:
NEO4J_PLUGINS: '["graph-data-science"]'
config:
server.bolt.enabled: 'true'
server.bolt.tls_level: 'REQUIRED'
server.bolt.listen_address: '0.0.0.0:7687'
dbms.ssl.policy.bolt.client_auth: 'NONE'
dbms.ssl.policy.bolt.enabled: 'true'
## apoc
server.directories.plugins: '/var/lib/neo4j/labs'
server.config.strict_validation.enabled: 'false'
dbms.security.procedures.unrestricted: 'apoc.*'
dbms.security.procedures.allowlist: 'gds.*,apoc.*'
### additional needed dbms config mirroring server config
dbms.directories.plugins: "/var/lib/neo4j/labs"
dbms.config.strict_validation: "false"
apoc_config:
apoc.trigger.enabled: "true"
apoc.jdbc.neo4j.url: "jdbc:foo:bar"
apoc.import.file.enabled: "true"
It seems the docs are missing installing the APOC plugin. Change the following line to include APOC as well:
NEO4J_PLUGINS: '["graph-data-science", "apoc"]'
and you should be good

Deploying smart contract using truffle on private blockchain node on docker

I am facing problems deploying a smart contract on my private blockchain network. I created my blockchain network on three VMs (miners) using puppeth on a fourth VM (controller) by following the steps in this blog: https://medium.com/#collin.cusce/using-puppeth-to-manually-create-an-ethereum-proof-of-authority-clique-network-on-aws-ae0d7c906cce
Afterwards, I installed truffle on one of the miners VMs and i initialized truffle using the command:
truffle init
Then I wrote a simple hello world smart contract, compiled it and deployed it on truffle development blockchain and it worked. However, I tried to deploy it on my private blockchain but I can't connect to the network.
The admin.nodeInfo command in geth console returns the folowing output:
docker exec -it 954cd3955065 geth attach ipc:/root/.ethereum/geth.ipc
Welcome to the Geth JavaScript console!
instance: Geth/v1.9.25-unstable-ead81461-20201123/linux-amd64/go1.15.5
coinbase: 0xe8cc4bea2cfdfd14cddefe1141bedd109576b9a9
at block: 78558 (Tue Dec 01 2020 22:01:02 GMT+0000 (UTC))
datadir: /root/.ethereum
modules: admin:1.0 clique:1.0 debug:1.0 eth:1.0 miner:1.0 net:1.0 personal:1.0 rpc:1.0 txpool:1.0 web3:1.0
To exit, press ctrl-d
> admin.nodeInfo
{
enode: "enode://7206ca3c62f6db47e1230dcf14a765d4c9b4870a66470dbb21fcc5ed2fab2167d6bcc47eec8044c42037b3e6e0017aeb8ddfc3580471da54a6c7274a0c1fe46b#10.100.2.32:30303",
enr: "enr:-Je4QGXlVAESp8r2s1uHBJxoDLWQo8IvZsbe5sX2YRBb0un9Gdlt8nfDKQBR_j0lDPtaoCCuis4cJJlqtEHfa4tLO2EIg2V0aMfGhG5b-B6AgmlkgnY0gmlwhApkAiCJc2VjcDI1NmsxoQNyBso8YvbbR-EjDc8Up2XUybSHCmZHDbsh_MXtL6shZ4N0Y3CCdl-DdWRwgnZf",
id: "027a351994ac1b127df56180b6210310cc0164f17f1b12c167cb167c4ffaa122",
ip: "10.100.2.32",
listenAddr: "[::]:30303",
name: "Geth/v1.9.25-unstable-ead81461-20201123/linux-amd64/go1.15.5",
ports: {
discovery: 30303,
listener: 30303
},
protocols: {
eth: {
config: {
byzantiumBlock: 0,
chainId: 1515,
clique: {...},
constantinopleBlock: 0,
eip150Block: 0,
eip150Hash: "0x0000000000000000000000000000000000000000000000000000000000000000",
eip155Block: 0,
eip158Block: 0,
homesteadBlock: 0,
istanbulBlock: 0,
petersburgBlock: 0
},
difficulty: 98201,
genesis: "0x17f752387c901db617cf0594ecd2cb9811dfcd666318c2e0e7cb0239471da979",
head: "0xf8a37d0390558746901faa55463c127c553f02cf2d23ce0cb469fcd470c810f9",
network: 1515
}
}
}
I tried adding the network configuration in truffle-config.js like this:
devnet2: {
host: "localhost",
port: "30303", //port where the node is
network_id: "*",
from: 0x91cd7b879fefff34259d577a56d290b3315bf9b3 // Treats this network as if it was a public net. (default: false)
}
then, when deploying using the command truffle deploy --network devnet2 i always get this error:
Compiling your contracts...
===========================
> Everything is up to date, there is nothing to compile.
/usr/local/lib/node_modules/truffle/build/webpack:/packages/provider/index.js:56
throw new Error(errorMessage);
^
Error: There was a timeout while attempting to connect to the network.
Check to see that your provider is valid.
If you have a slow internet connection, try configuring a longer timeout in your Truffle config. Use the networks[networkName].networkCheckTimeout property to do this.
at Timeout.setTimeout (/usr/local/lib/node_modules/truffle/build/webpack:/packages/provider/index.js:56:1)
at ontimeout (timers.js:436:11)
at tryOnTimeout (timers.js:300:5)
at listOnTimeout (timers.js:263:5)
at Timer.processTimers (timers.js:223:10)
I tried extending the timeout limit but it didn't work. I also tried using Web3 Providers (HTTPProvider and IPCProvider) but without any luck (i can give more details, if needed).
Any help is well appreciated because i spent a lot of time on it without getting anywhere. Unfortunately, i couldn't find anything on deploying smart contracts to a node that is running on docker. If needed, i can gladly give more details about what i did.
I managed to run smart contracts on a private network, not using docker however. Some things come to mind. did you run a miner on your network? you will need to run a miner so that the contract gets migrated. Did you make sure that the gaslimit is met when running the contract? the miners will wait for the max gas limit to be reached before processing any request.
Did you already deploy the contract? in the migration scripts you either create a new migration script by bumping the version or use the reset flag to run all migration scripts again.

Facing issue while deploying Docker images through AWS-Greengrass Connector Service

BACKGROUND:
We are trying to deploy App as a docker container through AWS-Greengrass Connector Service to the edge device (Running Greengrass core as container in Linux env).
We are configuring the greengrass group connector in cloud for docker app deployment.
ISSUES:
While deploying from AWS greengrass group (AWS cloud), we are able to see successful deployment message, but application is not getting deployed to the edge device (running greengrass core as container).
LOGS:
DockerApplicationDeploymentLog:
[2020-11-05T10:35:42.632Z][FATAL]-lambda_runtime.py:381,Failed to initialize Lambda runtime due to exception: "getgrnam(): name not found: 'docker'"
[2020-11-05T10:35:44.789Z][WARN]-ipc_client.py:162,deprecated arg port=8000 will be ignored
[2020-11-05T10:35:45.012Z][WARN]-ipc_client.py:162,deprecated arg port=8000 will be ignored
[2020-11-05T10:35:45.012Z][INFO]-docker_deployer.py:41,docker deployer starting up
[2020-11-05T10:35:45.012Z][INFO]-docker_deployer.py:45,checking inputs
[2020-11-05T10:35:45.012Z][INFO]-docker_deployer.py:52,docker group permissions
[2020-11-05T10:35:45.02Z][FATAL]-lambda_runtime.py:141,Failed to import handler function "handlers.function_handler" due to exception: "getgrnam(): name not found: 'docker'"
RuntimeSystemLog:
[2020-11-05T10:31:49.78Z][DEBUG]-Restart worker because it was killed. {"workerId": "8b0ee21d-e481-4d27-5e30-cb4d912547f5", "funcArn": "arn:aws:lambda:ap-south-1:aws:function:DockerApplicationDeployment:6"}
[2020-11-05T10:31:49.78Z][DEBUG]-Reserve worker. {"workerId": "8b0ee21d-e481-4d27-5e30-cb4d912547f5", "funcArn": "arn:aws:lambda:ap-south-1:aws:function:DockerApplicationDeployment:6"}
[2020-11-05T10:31:49.78Z][DEBUG]-Doing start attempt: {"Attempt count": 0, "workerId": "8b0ee21d-e481-4d27-5e30-cb4d912547f5", "funcArn": "arn:aws:lambda:ap-south-1:aws:function:DockerApplicationDeployment:6"}
[2020-11-05T10:31:49.78Z][DEBUG]-Creating directory. {"dir": "/greengrass/ggc/packages/1.11.0/var/lambda/8b0ee21d-e481-4d27-5e30-cb4d912547f5"}
[2020-11-05T10:31:49.78Z][DEBUG]-changed ownership {"path": "/greengrass/ggc/packages/1.11.0/var/lambda/8b0ee21d-e481-4d27-5e30-cb4d912547f5", "new uid": 121, "new gid": 121}
[2020-11-05T10:31:49.782Z][DEBUG]-Resolving environment variable {"Variable": "PYTHONPATH=/greengrass/ggc/deployment/lambda/arn.aws.lambda.ap-south-1.aws.function.DockerApplicationDeployment.6"}
[2020-11-05T10:31:49.79Z][DEBUG]-Resolving environment variable {"Variable": "PATH=/usr/bin:/usr/local/bin"}
[2020-11-05T10:31:49.799Z][DEBUG]-Resolving environment variable {"Variable": "DOCKER_DEPLOYER_DOCKER_COMPOSE_DESTINATION_FILE_PATH=/home/ggc_user"}
[2020-11-05T10:31:49.82Z][DEBUG]-Creating new worker. {"functionArn": "arn:aws:lambda:ap-south-1:aws:function:DockerApplicationDeployment:6", "workerId": "8b0ee21d-e481-4d27-5e30-cb4d912547f5"}
[2020-11-05T10:31:49.82Z][DEBUG]-Starting worker process. {"workerId": "8b0ee21d-e481-4d27-5e30-cb4d912547f5"}
[2020-11-05T10:31:49.829Z][DEBUG]-Worker process started. {"workerId": "8b0ee21d-e481-4d27-5e30-cb4d912547f5", "pid": 20471}
[2020-11-05T10:31:49.83Z][DEBUG]-Start work result: {"workerId": "8b0ee21d-e481-4d27-5e30-cb4d912547f5", "funcArn": "arn:aws:lambda:ap-south-1:aws:function:DockerApplicationDeployment:6", "state": "Starting", "initDurationSeconds": 0.012234454}
[2020-11-05T10:31:49.831Z][INFO]-Created worker. {"functionArn": "arn:aws:lambda:ap-south-1:aws:function:DockerApplicationDeployment:6", "workerId": "8b0ee21d-e481-4d27-5e30-cb4d912547f5", "pid": 20471}
[2020-11-05T10:31:53.155Z][DEBUG]-Received a credential provider request {"serverLambdaArn": "arn:aws:lambda:::function:GGTES", "clientId": "8b0ee21d-e481-4d27-5e30-cb4d912547f5"}
[2020-11-05T10:31:53.156Z][DEBUG]-WorkManager getting work {"workerId": "148f7a1a-168f-40a5-682d-92e00d56a5df", "funcArn": "arn:aws:lambda:::function:GGTES", "invocationId": "955c2c43-1187-4001-7988-4213b95eb584"}
[2020-11-05T10:31:53.156Z][DEBUG]-Successfully GET work. {"invocationId": "955c2c43-1187-4001-7988-4213b95eb584", "fromWorkerId": "148f7a1a-168f-40a5-682d-92e00d56a5df", "ofFunction": "arn:aws:lambda:::function:GGTES"}
[2020-11-05T10:31:53.156Z][DEBUG]-POST work result. {"invocationId": "955c2c43-1187-4001-7988-4213b95eb584", "ofFunction": "arn:aws:lambda:::function:GGTES"}
[2020-11-05T10:31:53.156Z][DEBUG]-WorkManager putting work result. {"workerId": "148f7a1a-168f-40a5-682d-92e00d56a5df", "invocationId": "955c2c43-1187-4001-7988-4213b95eb584"}
[2020-11-05T10:31:53.156Z][DEBUG]-WorkManager put work result successfully. {"workerId": "148f7a1a-168f-40a5-682d-92e00d56a5df", "invocationId": "955c2c43-1187-4001-7988-4213b95eb584"}
[2020-11-05T10:31:53.156Z][DEBUG]-Successfully POST work result. {"invocationId": "955c2c43-1187-4001-7988-4213b95eb584", "ofFunction": "arn:aws:lambda:::function:GGTES"}
[2020-11-05T10:31:53.157Z][DEBUG]-Handled a credential provider request {"clientId": "8b0ee21d-e481-4d27-5e30-cb4d912547f5"}
[2020-11-05T10:31:53.158Z][DEBUG]-GET work item. {"fromWorkerId": "148f7a1a-168f-40a5-682d-92e00d56a5df", "ofFunction": "arn:aws:lambda:::function:GGTES"}
[2020-11-05T10:31:53.158Z][DEBUG]-Worker timer doesn't exist. {"workerId": "148f7a1a-168f-40a5-682d-92e00d56a5df"}
Did you doublecheck to meet the requirments listed in
https://docs.aws.amazon.com/greengrass/latest/developerguide/docker-app-connector.html
https://docs.aws.amazon.com/greengrass/latest/developerguide/docker-app-connector.html#docker-app-connector-linux-user
I dont know this particular error, but it complains about some missing basic user/group settings:
[2020-11-05T10:35:42.632Z][FATAL]-lambda_runtime.py:381,Failed to initialize Lambda runtime due to exception: "getgrnam(): name not found: 'docker'"

Flink consumer job is unable to connect to Kafka from Docker

I have a simple streaming Flink Scala job which connects to a Kafka topic and maps its
org.apache.avro.generic.GenericRecord messages and map into Json string.
When it is running in IntelliJ it ingests the topic well and printing out the jsons.
When I run it in docker-compose I got the following exception:
com.esotericsoftware.kryo.KryoException: Error constructing instance of class: org.apache.avro.Schema$LockableArrayList
Serialization trace:
types (org.apache.avro.Schema$UnionSchema)
schema (org.apache.avro.Schema$Field)
fieldMap (org.apache.avro.Schema$RecordSchema)
schema (org.apache.avro.generic.GenericData$Record)
at com.twitter.chill.Instantiators$$anon$1.newInstance(KryoBase.scala:136)
at com.esotericsoftware.kryo.Kryo.newInstance(Kryo.java:1061)
at com.esotericsoftware.kryo.serializers.CollectionSerializer.create(CollectionSerializer.java:89)
at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:93)
at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:22)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:679)
at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:528)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:679)
at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:528)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:761)
at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:143)
at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:21)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:679)
at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:528)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:679)
at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:528)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:657)
at org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.copy(KryoSerializer.java:262)
at org.apache.flink.api.java.typeutils.runtime.TupleSerializer.copy(TupleSerializer.java:111)
at org.apache.flink.api.java.typeutils.runtime.TupleSerializer.copy(TupleSerializer.java:37)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:635)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:612)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:592)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:727)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:705)
at org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collect(StreamSourceContexts.java:104)
at org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collectWithTimestamp(StreamSourceContexts.java:111)
at org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.emitRecordWithTimestamp(AbstractFetcher.java:398)
at org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.emitRecord(KafkaFetcher.java:185)
at org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.runFetchLoop(KafkaFetcher.java:150)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:715)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:100)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:63)
at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:208)
Caused by: java.lang.IllegalAccessException: Class com.twitter.chill.Instantiators$ can not access a member of class org.apache.avro.Schema$LockableArrayList with modifiers "public"
at sun.reflect.Reflection.ensureMemberAccess(Reflection.java:102)
at java.lang.reflect.AccessibleObject.slowCheckMemberAccess(AccessibleObject.java:296)
at java.lang.reflect.AccessibleObject.checkAccess(AccessibleObject.java:288)
at java.lang.reflect.Constructor.newInstance(Constructor.java:413)
at com.twitter.chill.Instantiators$.$anonfun$normalJava$1(KryoBase.scala:170)
at com.twitter.chill.Instantiators$$anon$1.newInstance(KryoBase.scala:133)
... 37 more
I tried forcing Avro serialization with:
val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
env.getConfig.disableForceKryo()
env.getConfig.enableForceAvro()
but got the same error.
Based on [this][1] I'm using all Flink related dependencies as "provided" with no good result.
What can be the difference between running the job in the IDE and in
Docker?
How can I fix the job to be able to read the Kafka topic from
Docker?
How shall I setup Docker for this?
What can I handle Kryo/Avro serialization issue?
SS
[1]: http://www.alternatestack.com/development/com-esotericsoftware-kryo-kryoexception-unusual-solution-upgrading-flink/

How to capture logs from workers from a Dask-Yarn job?

I have tried using the following in ~/.config/dask/distributed.yaml and ~/.config/dask/yarn.yaml,
logging-file-config: "/path/to/config.ini"
or
logging:
version: 1
disable_existing_loggers: false
root:
level: INFO
handlers: [consoleHandler]
handlers:
consoleHandler:
class: logging.StreamHandler
level: INFO
formatter: sample_formatter
stream: ext://sys.stderr
formatters:
sample_formatter:
format: '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
and then in my function that gets evaluated at the worker:
import logging
from distributed.worker import logger
import dask
from dask.distributed import Client
from dask_yarn import YarnCluster
log = logging.getLogger(__name__)
#dask.delayed
def worker_func(args):
logger.info("This will show up in the worker logs")
log.info("This does not show up in worker logs")
return
if __name__ == "__main__":
dag_1 = {'worker_func': (worker_func, arg_1)}
tasks = dask.get(dag_1, 'load-1')
log.info("This also shows up in logs, and custom formatted)
cluster = YarnCluster()
client = Client(cluster)
dask.compute(tasks)
When I try to view the yarn logs using:
yarn logs -applicationId {application_id}
I do not see the log from log.info inside worker_func, but I do see the logs from distributed.worker.logger and from outside that function on the console. I also tried using client.get_worker_logs(), but that returned an empty dictionary. Is there a way to see customized logs from inside the function that gets evaluated at a worker?
There's a lot going on in this question, so I'm going to answer "How do I configure logging for dask-yarn workers" and hope everything else becomes clear by answering that.
Dask's configuration system is loaded locally on the machine you start a dask cluster from (usually the edge node). This configuration is not distributed to the workers automatically, you're responsible for doing that yourself. You have a few options here:
Have admin/IT put configuration in /etc/dask/ on every node, which will affect all users.
Bundle configuration with your packaged environment. Dask will load configuration from {prefix}/etc/dask/, where prefix is sys.prefix.
For example, if you have a conda environment at /path/to/environment, you'd do the following to bundle the configuration
# Create the configuration directory in the environment
mkdir -p /path/to/environment/etc/dask/
# Add your configuration to this directory
mv config.yaml /path/to/environment/etc/dask/config.yaml
# Package the environment
conda pack -p /path/to/environment -o environment.tar.gz
Any configuration values set in config.yaml will now be available on all the worker nodes. An example configuration file setting some logging configuration would be:
logging:
version: 1
root:
level: INFO
handlers: [consoleHandler]
handlers:
consoleHandler:
class: logging.StreamHandler
level: INFO
formatter: sample_formatter
stream: ext://sys.stderr
formatters:
sample_formatter:
format: '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
Logs from completed dask-yarn applications can be retrieved using the YARN cli at
yarn logs -applicationId <application-id>
Logs for running dask-yarn applications can be retrieved using client.get_worker_logs(). Note that these logs will only contain logs written to the distributed.worker logger. You cannot write to your own logger and have them appear in the output of client.get_worker_logs(). To write to this logger, get it via
import logging
logger = logging.getLogger("distributed.worker")
logger.info("Writing with the worker logger")
Any logger appropriately configured to log to stdout or stderr will appear in the logs accessed via the yarn CLI, but only the distributed.worker logger output will also be available to get_worker_logs().
Side note
I have tried using the following in ~/.config/dask/distributed.yaml and ~/.config/dask/yarn.yaml
The name of the config files doesn't matter, dask loads all yaml files in all config directories and merges their contents. For more information please read the configuration docs

Resources