sparklyr in multiple Databricks notebooks: second connection fails with org.apache.spark.SparkException: Missing Credential Scope

First Databricks notebook
After a restart of the cluster, this works fine:
library(sparklyr)
sc <- sparklyr::spark_connect(method="databricks")
iris_tbl <- copy_to(sc, iris)
Second Databricks notebook
The second notebook fails:
library(sparklyr)
sc <- sparklyr::spark_connect(method="databricks")
iris_tbl <- copy_to(sc, iris)
Error : org.apache.spark.SparkException: Missing Credential Scope.
at com.databricks.unity.UCSDriver$Manager.$anonfun$scope$1(UCSDriver.scala:103)
at scala.Option.getOrElse(Option.scala:189)
at com.databricks.unity.UCSDriver$Manager.scope(UCSDriver.scala:103)
at com.databricks.unity.UCSDriver$Manager.currentScope(UCSDriver.scala:97)
at com.databricks.unity.UnityCredentialScope$.currentScope(UnityCredentialScope.scala:100)
This makes no sense to me. Or do I have to set a different configuration for the second connection?
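If a different configuration does turn out to be needed, sparklyr accepts an explicit config object; a minimal sketch of that mechanism only (not a confirmed fix for the Missing Credential Scope error):
library(sparklyr)

# Sketch only: spark_config() returns a config list that spark_connect() accepts;
# any per-notebook settings would go in here. Whether a particular setting resolves
# the credential-scope error on the second connection is not confirmed.
conf <- spark_config()
sc <- spark_connect(method = "databricks", config = conf)
iris_tbl <- copy_to(sc, iris, overwrite = TRUE)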

Related

End of File Exception in Spark cluster

I created Docker containers for Spark standalone mode, as in this article:
https://dev.to/mvillarrealb/creating-a-spark-standalone-cluster-with-docker-and-docker-compose-2021-update-6l4
As a driver, I wrote a simple Spark job with a calculation and tried to start it in cluster mode.
import org.apache.spark.launcher.{SparkAppHandle, SparkLauncher}

val handle = new SparkLauncher()
  .setMaster("spark://spark-master:7077")
  .setAppName("MyDriver")
  .setVerbose(true)
  .setAppResource("hdfs://spark-master:7077/opt/spark-apps/sparkcluster_2.12-0.1.0-SNAPSHOT.jar")
  .setMainClass(DriverTest.getClass.getName)
  .setDeployMode("cluster")
  .startApplication()

while (!handle.getState.equals(SparkAppHandle.State.FINISHED)) {
  println("App State: " + handle.getState)
  Thread.sleep(1000)
}
Could you please tell me what URL needs to be set in setAppResource: a path on the machine where I run SparkLauncher, a path on the master node, or an HDFS path?
If I use the path "hdfs://spark-master:7077/opt/spark-apps/sparkcluster_2.12-0.1.0-SNAPSHOT.jar", where spark-master:7077 is the master node and /opt/spark-apps/sparkcluster_2.12-0.1.0-SNAPSHOT.jar is the path on the master node, I get this exception:
INFO: 22/05/26 16:48:04 ERROR ClientEndpoint: Exception from cluster was: java.io.EOFException: End of File Exception between local host is: "spark-worker-b/172.26.0.3"; destination host is: "spark-master":7077; : java.io.EOFException; For more details see: http://wiki.apache.org/hadoop/EOFException
May 26, 2022 4:48:04 PM org.apache.spark.launcher.OutputRedirector redirect
INFO: java.io.EOFException: End of File Exception between local host is: "spark-worker-b/172.26.0.3"; destination host is: "spark-master":7077; : java.io.EOFException; For more details see: http://wiki.apache.org/hadoop/EOFException
I also see that my master node loses its worker nodes while the job is running. After I kill the process, the master sees the workers again.
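For illustration, a hedged sketch of the distinction in play: spark://spark-master:7077 is the master's RPC endpoint (the same port used in setMaster), not an HDFS service, so an hdfs:// app resource would normally point at the HDFS namenode instead. The namenode host/port below are assumptions, and in cluster deploy mode the jar has to be reachable from the worker that ends up running the driver.
import org.apache.spark.launcher.SparkLauncher

// Sketch: the application jar is addressed via the HDFS namenode (hdfs://namenode:8020
// is an assumed endpoint, not taken from the question), while setMaster keeps pointing
// at the Spark standalone master's RPC port 7077.
val handle = new SparkLauncher()
  .setMaster("spark://spark-master:7077")
  .setAppName("MyDriver")
  .setDeployMode("cluster")
  .setMainClass(DriverTest.getClass.getName)
  .setAppResource("hdfs://namenode:8020/opt/spark-apps/sparkcluster_2.12-0.1.0-SNAPSHOT.jar")
  .startApplication()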

Facing issue while deploying Docker images through AWS-Greengrass Connector Service

BACKGROUND:
We are trying to deploy an app as a Docker container through the AWS Greengrass connector service to the edge device (running Greengrass core as a container in a Linux environment).
We are configuring the Greengrass group connector in the cloud for the Docker app deployment.
ISSUES:
While deploying from the AWS Greengrass group (AWS cloud), we see a successful deployment message, but the application does not get deployed to the edge device (running Greengrass core as a container).
LOGS:
DockerApplicationDeploymentLog:
[2020-11-05T10:35:42.632Z][FATAL]-lambda_runtime.py:381,Failed to initialize Lambda runtime due to exception: "getgrnam(): name not found: 'docker'"
[2020-11-05T10:35:44.789Z][WARN]-ipc_client.py:162,deprecated arg port=8000 will be ignored
[2020-11-05T10:35:45.012Z][WARN]-ipc_client.py:162,deprecated arg port=8000 will be ignored
[2020-11-05T10:35:45.012Z][INFO]-docker_deployer.py:41,docker deployer starting up
[2020-11-05T10:35:45.012Z][INFO]-docker_deployer.py:45,checking inputs
[2020-11-05T10:35:45.012Z][INFO]-docker_deployer.py:52,docker group permissions
[2020-11-05T10:35:45.02Z][FATAL]-lambda_runtime.py:141,Failed to import handler function "handlers.function_handler" due to exception: "getgrnam(): name not found: 'docker'"
RuntimeSystemLog:
[2020-11-05T10:31:49.78Z][DEBUG]-Restart worker because it was killed. {"workerId": "8b0ee21d-e481-4d27-5e30-cb4d912547f5", "funcArn": "arn:aws:lambda:ap-south-1:aws:function:DockerApplicationDeployment:6"}
[2020-11-05T10:31:49.78Z][DEBUG]-Reserve worker. {"workerId": "8b0ee21d-e481-4d27-5e30-cb4d912547f5", "funcArn": "arn:aws:lambda:ap-south-1:aws:function:DockerApplicationDeployment:6"}
[2020-11-05T10:31:49.78Z][DEBUG]-Doing start attempt: {"Attempt count": 0, "workerId": "8b0ee21d-e481-4d27-5e30-cb4d912547f5", "funcArn": "arn:aws:lambda:ap-south-1:aws:function:DockerApplicationDeployment:6"}
[2020-11-05T10:31:49.78Z][DEBUG]-Creating directory. {"dir": "/greengrass/ggc/packages/1.11.0/var/lambda/8b0ee21d-e481-4d27-5e30-cb4d912547f5"}
[2020-11-05T10:31:49.78Z][DEBUG]-changed ownership {"path": "/greengrass/ggc/packages/1.11.0/var/lambda/8b0ee21d-e481-4d27-5e30-cb4d912547f5", "new uid": 121, "new gid": 121}
[2020-11-05T10:31:49.782Z][DEBUG]-Resolving environment variable {"Variable": "PYTHONPATH=/greengrass/ggc/deployment/lambda/arn.aws.lambda.ap-south-1.aws.function.DockerApplicationDeployment.6"}
[2020-11-05T10:31:49.79Z][DEBUG]-Resolving environment variable {"Variable": "PATH=/usr/bin:/usr/local/bin"}
[2020-11-05T10:31:49.799Z][DEBUG]-Resolving environment variable {"Variable": "DOCKER_DEPLOYER_DOCKER_COMPOSE_DESTINATION_FILE_PATH=/home/ggc_user"}
[2020-11-05T10:31:49.82Z][DEBUG]-Creating new worker. {"functionArn": "arn:aws:lambda:ap-south-1:aws:function:DockerApplicationDeployment:6", "workerId": "8b0ee21d-e481-4d27-5e30-cb4d912547f5"}
[2020-11-05T10:31:49.82Z][DEBUG]-Starting worker process. {"workerId": "8b0ee21d-e481-4d27-5e30-cb4d912547f5"}
[2020-11-05T10:31:49.829Z][DEBUG]-Worker process started. {"workerId": "8b0ee21d-e481-4d27-5e30-cb4d912547f5", "pid": 20471}
[2020-11-05T10:31:49.83Z][DEBUG]-Start work result: {"workerId": "8b0ee21d-e481-4d27-5e30-cb4d912547f5", "funcArn": "arn:aws:lambda:ap-south-1:aws:function:DockerApplicationDeployment:6", "state": "Starting", "initDurationSeconds": 0.012234454}
[2020-11-05T10:31:49.831Z][INFO]-Created worker. {"functionArn": "arn:aws:lambda:ap-south-1:aws:function:DockerApplicationDeployment:6", "workerId": "8b0ee21d-e481-4d27-5e30-cb4d912547f5", "pid": 20471}
[2020-11-05T10:31:53.155Z][DEBUG]-Received a credential provider request {"serverLambdaArn": "arn:aws:lambda:::function:GGTES", "clientId": "8b0ee21d-e481-4d27-5e30-cb4d912547f5"}
[2020-11-05T10:31:53.156Z][DEBUG]-WorkManager getting work {"workerId": "148f7a1a-168f-40a5-682d-92e00d56a5df", "funcArn": "arn:aws:lambda:::function:GGTES", "invocationId": "955c2c43-1187-4001-7988-4213b95eb584"}
[2020-11-05T10:31:53.156Z][DEBUG]-Successfully GET work. {"invocationId": "955c2c43-1187-4001-7988-4213b95eb584", "fromWorkerId": "148f7a1a-168f-40a5-682d-92e00d56a5df", "ofFunction": "arn:aws:lambda:::function:GGTES"}
[2020-11-05T10:31:53.156Z][DEBUG]-POST work result. {"invocationId": "955c2c43-1187-4001-7988-4213b95eb584", "ofFunction": "arn:aws:lambda:::function:GGTES"}
[2020-11-05T10:31:53.156Z][DEBUG]-WorkManager putting work result. {"workerId": "148f7a1a-168f-40a5-682d-92e00d56a5df", "invocationId": "955c2c43-1187-4001-7988-4213b95eb584"}
[2020-11-05T10:31:53.156Z][DEBUG]-WorkManager put work result successfully. {"workerId": "148f7a1a-168f-40a5-682d-92e00d56a5df", "invocationId": "955c2c43-1187-4001-7988-4213b95eb584"}
[2020-11-05T10:31:53.156Z][DEBUG]-Successfully POST work result. {"invocationId": "955c2c43-1187-4001-7988-4213b95eb584", "ofFunction": "arn:aws:lambda:::function:GGTES"}
[2020-11-05T10:31:53.157Z][DEBUG]-Handled a credential provider request {"clientId": "8b0ee21d-e481-4d27-5e30-cb4d912547f5"}
[2020-11-05T10:31:53.158Z][DEBUG]-GET work item. {"fromWorkerId": "148f7a1a-168f-40a5-682d-92e00d56a5df", "ofFunction": "arn:aws:lambda:::function:GGTES"}
[2020-11-05T10:31:53.158Z][DEBUG]-Worker timer doesn't exist. {"workerId": "148f7a1a-168f-40a5-682d-92e00d56a5df"}
Did you double-check that you meet the requirements listed in
https://docs.aws.amazon.com/greengrass/latest/developerguide/docker-app-connector.html
https://docs.aws.amazon.com/greengrass/latest/developerguide/docker-app-connector.html#docker-app-connector-linux-user
I don't know this particular error, but it complains about some missing basic user/group settings:
[2020-11-05T10:35:42.632Z][FATAL]-lambda_runtime.py:381,Failed to initialize Lambda runtime due to exception: "getgrnam(): name not found: 'docker'"
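For context, getgrnam() is the standard group lookup, so the Python Lambda runtime is failing to find a group literally named 'docker'. A minimal Python sketch of that check (where exactly the group must exist, host vs. the Greengrass container, is an assumption to verify against the linked requirements):
from grp import getgrnam

# Sketch: this mirrors the lookup that fails in the log above. It raises KeyError
# when no group named 'docker' exists in the environment where the connector's
# Lambda runs; per the linked requirements, ggc_user typically also has to be a
# member of that group so it can call Docker without sudo.
try:
    print("docker group id:", getgrnam("docker").gr_gid)
except KeyError:
    print("no 'docker' group found in this environment")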

Flink consumer job is unable to connect to Kafka from Docker

I have a simple streaming Flink Scala job which connects to a Kafka topic and maps its
org.apache.avro.generic.GenericRecord messages into JSON strings.
When it runs in IntelliJ it ingests the topic fine and prints out the JSON.
When I run it in docker-compose I get the following exception:
com.esotericsoftware.kryo.KryoException: Error constructing instance of class: org.apache.avro.Schema$LockableArrayList
Serialization trace:
types (org.apache.avro.Schema$UnionSchema)
schema (org.apache.avro.Schema$Field)
fieldMap (org.apache.avro.Schema$RecordSchema)
schema (org.apache.avro.generic.GenericData$Record)
at com.twitter.chill.Instantiators$$anon$1.newInstance(KryoBase.scala:136)
at com.esotericsoftware.kryo.Kryo.newInstance(Kryo.java:1061)
at com.esotericsoftware.kryo.serializers.CollectionSerializer.create(CollectionSerializer.java:89)
at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:93)
at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:22)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:679)
at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:528)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:679)
at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:528)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:761)
at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:143)
at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:21)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:679)
at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:528)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:679)
at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:528)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:657)
at org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.copy(KryoSerializer.java:262)
at org.apache.flink.api.java.typeutils.runtime.TupleSerializer.copy(TupleSerializer.java:111)
at org.apache.flink.api.java.typeutils.runtime.TupleSerializer.copy(TupleSerializer.java:37)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:635)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:612)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:592)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:727)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:705)
at org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collect(StreamSourceContexts.java:104)
at org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collectWithTimestamp(StreamSourceContexts.java:111)
at org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.emitRecordWithTimestamp(AbstractFetcher.java:398)
at org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.emitRecord(KafkaFetcher.java:185)
at org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.runFetchLoop(KafkaFetcher.java:150)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:715)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:100)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:63)
at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:208)
Caused by: java.lang.IllegalAccessException: Class com.twitter.chill.Instantiators$ can not access a member of class org.apache.avro.Schema$LockableArrayList with modifiers "public"
at sun.reflect.Reflection.ensureMemberAccess(Reflection.java:102)
at java.lang.reflect.AccessibleObject.slowCheckMemberAccess(AccessibleObject.java:296)
at java.lang.reflect.AccessibleObject.checkAccess(AccessibleObject.java:288)
at java.lang.reflect.Constructor.newInstance(Constructor.java:413)
at com.twitter.chill.Instantiators$.$anonfun$normalJava$1(KryoBase.scala:170)
at com.twitter.chill.Instantiators$$anon$1.newInstance(KryoBase.scala:133)
... 37 more
I tried forcing Avro serialization with:
val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
env.getConfig.disableForceKryo()
env.getConfig.enableForceAvro()
but got the same error.
Based on [this][1], I'm marking all Flink-related dependencies as "provided", with no luck.
What can be the difference between running the job in the IDE and in Docker?
How can I fix the job so that it can read the Kafka topic from Docker?
How should I set up Docker for this?
How can I handle the Kryo/Avro serialization issue?
SS
[1]: http://www.alternatestack.com/development/com-esotericsoftware-kryo-kryoexception-unusual-solution-upgrading-flink/
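One avenue worth noting (a hedged sketch, not a verified fix for this setup): the flink-avro module provides GenericRecordAvroTypeInfo, which gives generic Avro records explicit type information so they never reach the Kryo fallback that is failing in the trace above. The schema below is a placeholder and the Kafka consumer wiring is omitted.
import org.apache.avro.SchemaBuilder
import org.apache.avro.generic.GenericRecord
import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flink.formats.avro.typeutils.GenericRecordAvroTypeInfo
import org.apache.flink.streaming.api.scala._

// A small stand-in schema; in the real job this would be the topic's writer schema.
val schema = SchemaBuilder.record("Example").fields().requiredString("name").endRecord()

// With this implicit in scope, DataStream[GenericRecord] uses Flink's Avro serializer
// instead of falling back to Kryo reflection over org.apache.avro.Schema internals.
implicit val genericRecordInfo: TypeInformation[GenericRecord] =
  new GenericRecordAvroTypeInfo(schema)

val env = StreamExecutionEnvironment.getExecutionEnvironment
// env.addSource(kafkaConsumer) would now pick up the implicit type information;
// the Kafka consumer setup itself is omitted here.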

How to run presto queries in python using pyhive?

I am trying to run a Presto query in Python using the pyhive library, but I get a max-retries error. I am running it in a Jupyter notebook locally (on my laptop), and I think it is not able to connect to the Presto node. I am using an Azure HDInsight cluster and installed the Presto application on the head node (using the Starburst distribution). I have used the cluster username and password, and I have also tried the head node SSH user and password, but nothing works. Below is my code:
from pyhive import presto
conn = presto.connect(
    host='clustername-ssh.azurehdinsight.net',
    port=8085,
    username='sshuser',
    password='sshpassword',
    protocol='https'
).cursor()
conn.execute('SELECT * FROM hive.default.parquettest limit 1')
The error I am getting is:
ConnectionError:
HTTPConnectionPool(host='sm-hdinsight01-ssh.azurehdinsight.net',
port=8085): Max retries exceeded with url: /v1/statement (Caused by
NewConnectionError(': Failed to establish a new connection: [Errno 110]
Connection timed out',))
But when I run it in a terminal on the head node, it works:
from pyhive import presto
conn = presto.connect(
    host='localhost',
    port=8085
).cursor()
conn.execute('SELECT * FROM hive.default.parquettest limit 1')
I think I am missing something crucial here. Please help.
Sounds like a permission/authentication problem. I am currently using a Jupyter notebook on my local machine to query the company Presto cluster like this, using the prestodb library.
So basically:
import prestodb
conn = prestodb.dbapi.connect(
    host='presto.bar.foo.com',
    port=80,
    user='foo',
    password='bar',
    catalog='hive',
    schema='default',
)
cur = conn.cursor()
cur.execute('SELECT * FROM "schema"."db" limit 10')
records = cur.fetchall()
print(records[0])
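For completeness, roughly the same connection written with pyhive (the library from the question), using placeholder host and credentials; note that pyhive's Presto client only accepts a password when protocol='https', and host/port should point at the Presto coordinator's HTTP(S) endpoint:
from pyhive import presto

# Sketch with placeholder host/credentials: pyhive rejects a password unless
# protocol='https'; point host/port at the Presto coordinator itself.
cursor = presto.connect(
    host='presto.bar.foo.com',
    port=8085,
    username='foo',
    password='bar',
    catalog='hive',
    schema='default',
    protocol='https',
).cursor()
cursor.execute('SELECT * FROM hive.default.parquettest LIMIT 1')
print(cursor.fetchall())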

Clustered Vert.x not working in docker if vert.x instance is created manually

I set up a small project to learn the capabilities of Vert.x in a cluster environment, but I'm facing some weird issues when I try to create the Vertx instance inside a Docker image.
The project consists of just 2 verticles deployed in different Docker containers, using the event bus to communicate with each other.
If I use the Vert.x-provided launcher:
Launcher.executeCommand("run", verticleClass, "--cluster")
(or just set the main class to io.vertx.core.Launcher and pass the right arguments),
everything works both locally and inside Docker images. But if I try to create the Vertx instance manually with
Vertx.rxClusteredVertx(VertxOptions())
    .flatMap { it.rxDeployVerticle(verticleClass) }
    .subscribe()
then it does not work in Docker (it works locally). Or, more visually:
| | Local | Docker |
|:---------------: |:-----: |:------: |
| Vertx launcher | Y | Y |
| Custom launcher | Y | N |
By checking the Docker logs it seems that everything works. I can see that both verticles know each other:
Members [2] {
Member [172.18.0.2]:5701 - c5e9636d-b3cd-4e24-a8ce-e881218bf3ce
Member [172.18.0.3]:5701 - a09ce83d-e0b3-48eb-aad7-fbd818c389bc this
}
But when I try to send a message through the event bus the following exception is thrown:
WARNING: Connecting to server localhost:33845 failed
io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:33845
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:325)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:886)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Connection refused
... 11 more
To simplify things, I uploaded the project to GitHub. I tried to make it as simple as possible, so it has 2 verticles, 2 main classes, and lots of scripts for every combination.
By checking the Docker logs it seems that everything works. I can see that both verticles know each other
Yes, because your cluster manager works fine, but you also need to make your event bus configuration consistent on every node (machine/Docker container) in your cluster, because, as mentioned in the Vert.x documentation, cluster managers do not handle the event bus inter-node transport; this is done directly by Vert.x with TCP connections.
You have to set the cluster host on each node to the IP address of that node itself, plus an arbitrary port number (if you run more than one Vert.x instance on the same node, you have to choose a different port for each instance). The error you are facing occurs because the default cluster host is localhost.
For example, if a node's IP address is 192.168.1.12, then you would do the following:
VertxOptions options = new VertxOptions()
    .setClustered(true)
    .setClusterHost("192.168.1.12") // this node's IP
    .setClusterPort(17001) // any arbitrary port, but make sure no other Vert.x instance on this node uses the same port
    .setClusterManager(clusterManager);
On another node, whose IP address is 192.168.1.56, you would do the following:
VertxOptions options = new VertxOptions()
    .setClustered(true)
    .setClusterHost("192.168.1.56") // other node ip
    .setClusterPort(17001) // it is ok because this is a different node
    .setClusterManager(clusterManager);
By digging a bit more into the Vert.x Launcher code, I found out that internally it does more than just "parse" the inputs.
When Vert.x is configured to run in cluster mode, the Launcher sets the cluster address itself (via the BareCommand class), so to replicate the Launcher's behaviour with your own main class (while keeping the flexibility of configuring your Vertx instance via code instead of via args), the code is as follows:
import java.net.Inet6Address
import java.net.NetworkInterface
import java.net.SocketException
import java.util.Enumeration
// Vert.x / RxJava imports and `verticleClass` come from the original project.

fun main(args: Array<String>) {
    Vertx.rxClusteredVertx(
        VertxOptions(
            clusterHost = getDefaultAddress()
        )
    )
        .flatMap { it.rxDeployVerticle(verticleClass) }
        .subscribe()
}

// As taken from io.vertx.core.impl.launcher.commands.BareCommand
fun getDefaultAddress(): String? {
    val nets: Enumeration<NetworkInterface>
    try {
        nets = NetworkInterface.getNetworkInterfaces()
    } catch (e: SocketException) {
        return null
    }
    var netinf: NetworkInterface
    while (nets.hasMoreElements()) {
        netinf = nets.nextElement()
        val addresses = netinf.inetAddresses
        while (addresses.hasMoreElements()) {
            val address = addresses.nextElement()
            if (!address.isAnyLocalAddress && !address.isMulticastAddress
                && address !is Inet6Address
            ) {
                return address.hostAddress
            }
        }
    }
    return null
}
