IoT Edge with DPS certificate provisioning fails with a cert permissions error - azure-iot-edge

I have set up Azure IoT Edge on Ubuntu 20.04 and configured it to provision via DPS with self-signed certificates. If I store the identity cert and key in /home/[user]/certs, everything works fine. If I move the cert files to /var/certs and update /etc/aziot/config.toml accordingly, I get the following errors from sudo iotedge system logs:
systemd[1]: Started Azure IoT Identity Service.
aziot-identityd[5699]: 2021-12-20T03:46:33Z [INFO] - Starting service...
aziot-identityd[5699]: 2021-12-20T03:46:33Z [INFO] - Version - 1.2.4
aziot-identityd[5699]: 2021-12-20T03:46:33Z [INFO] - Provisioning starting. Reason: Startup
aziot-certd[2113]: 2021-12-20T03:46:33Z [INFO] - <-- GET /certificates/device-id?api-version=2020-09-01 {"host": "certd.sock"}
aziot-certd[2113]: 2021-12-20T03:46:33Z [ERR!] - !!! internal error
aziot-certd[2113]: 2021-12-20T03:46:33Z [ERR!] - !!! caused by: could not read cert file
aziot-certd[2113]: 2021-12-20T03:46:33Z [ERR!] - !!! caused by: Permission denied (os error 13)
aziot-certd[2113]: 2021-12-20T03:46:33Z [INFO] - --> 500 {"content-type": "application/json"}
aziot-keyd[2121]: 2021-12-20T03:46:33Z [INFO] - <-- POST /keypair?api-version=2020-09-01 {"content-type": "application/json", "host": "keyd.sock", "content-length": "56"}
aziot-keyd[2121]: 2021-12-20T03:46:33Z [ERR!] - Permission denied (os error 13)
aziot-keyd[2121]: 2021-12-20T03:46:33Z [ERR!] - !!! internal error
aziot-keyd[2121]: 2021-12-20T03:46:33Z [ERR!] - !!! caused by: could not create key pair
aziot-keyd[2121]: 2021-12-20T03:46:33Z [ERR!] - !!! caused by: could not create key pair: AZIOT_KEYS_RC_ERR_EXTERNAL
aziot-keyd[2121]: 2021-12-20T03:46:33Z [INFO] - --> 500 {"content-type": "application/json"}
aziot-identityd[5699]: 2021-12-20T03:46:33Z [ERR!] - Failed to provision with IoT Hub, and no valid device backup was found: internal error
aziot-identityd[5699]: 2021-12-20T03:46:33Z [ERR!] - service encountered an error
aziot-identityd[5699]: 2021-12-20T03:46:33Z [ERR!] - caused by: internal error
aziot-identityd[5699]: 2021-12-20T03:46:33Z [ERR!] - caused by: could not create certificate
aziot-identityd[5699]: 2021-12-20T03:46:33Z [ERR!] - caused by: internal error
aziot-identityd[5699]: 2021-12-20T03:46:33Z [ERR!] - 0: <unknown>
aziot-identityd[5699]: 1: <unknown>
systemd[1]: aziot-identityd.service: Main process exited, code=exited, status=1/FAILURE
systemd[1]: aziot-identityd.service: Failed with result 'exit-code'.
There is obviously a permissions error with the new certificate location, but I can't find proper documentation on which users, groups, or services need which permissions. Some docs say the user iotedge needs read access, but even making iotedge the owner of the files does not fix the problem. There also appear to be groups aziotid, aziotks, and aziotcs that may or may not need access, but assigning ownership or read permissions to those groups doesn't fix the problem either. If anyone can help from their own experience, or point me to valid, up-to-date docs, I'd appreciate it.
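For reference, a hedged sketch of the ownership scheme implied by the service names in the logs (aziot-certd appears to run as the aziotcs user and aziot-keyd as aziotks); the file names below are placeholders for whatever paths your config.toml points at, not a verified fix:

# Hedged sketch: cert file readable by certd (aziotcs), key file by keyd (aziotks).
# Substitute the actual paths from /etc/aziot/config.toml.
sudo chown aziotcs:aziotcs /var/certs/device-id.cert.pem
sudo chmod 644 /var/certs/device-id.cert.pem
sudo chown aziotks:aziotks /var/certs/device-id.key.pem
sudo chmod 600 /var/certs/device-id.key.pem
# The service users also need search (execute) permission on every parent directory,
# which /home/[user] typically grants but a freshly created /var/certs may not:
sudo chmod 755 /var/certs

If the directory traversal bit is what differs between /home/[user]/certs and /var/certs, that alone could explain why only the new location fails.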

Related

Orderer Container exited while turning on fabric network

2022-06-11 05:12:53.108 UTC [orderer.common.server] initializeServerConfig -> INFO 004 Starting orderer with TLS enabled
2022-06-11 05:12:53.120 UTC [blkstorage] NewProvider -> INFO 005 Creating new file ledger directory at /var/hyperledger/production/orderer/chains
2022-06-11 05:12:53.128 UTC [orderer.common.server] Main -> PANI 006 Failed validating bootstrap block: initializing channelconfig failed: could not create channel Orderer sub-group config: setting up the MSP manager failed: the supplied identity is not valid: x509: certificate signed by unknown authority (possibly because of "x509: ECDSA verification failure" while trying to verify candidate authority certificate "ca.example.com")
panic: Failed validating bootstrap block: initializing channelconfig failed: could not create channel Orderer sub-group config: setting up the MSP manager failed: the supplied identity is not valid: x509: certificate signed by unknown authority (possibly because of "x509: ECDSA verification failure" while trying to verify candidate authority certificate "ca.example.com")
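For reference, a hedged way to check whether the orderer's identity certificate actually chains to the CA named in the panic; the crypto-config paths below are placeholders based on a typical fabric-samples layout:

# Hypothetical paths; point these at your actual MSP material.
openssl verify \
  -CAfile crypto-config/ordererOrganizations/example.com/ca/ca.example.com-cert.pem \
  crypto-config/ordererOrganizations/example.com/orderers/orderer.example.com/msp/signcerts/orderer.example.com-cert.pem

If this fails, the bootstrap block was likely generated from a different crypto-config run than the certs mounted into the orderer container.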

IoTEdge on K8S, Could not initialize module runtime

I'm running iotedge on kubernetes.
The K8S cluster is a local cluster, set up largely using the "Kubernetes the hard way" method, with some modifications.
I did manage to get things working on one installation. However, I'm now getting this on another installation. The initial installation works fine, but after shutting down a machine to simulate a hardware failure, the pod gets recreated and starts showing this error again. This error happens even if the node that was shut down is NOT the one iotedged is running on.
Environment
3 Nodes running Ubuntu 20.04 LTS
Two networks on each node, one for the internet, one for an internal network. K8S is set up using the internal, static IP address
HAProxy/Keepalived for HA without a load balancer, running on a Virtual IP address
Multus CNI for attaching pods to additional networks
CoreDNS
Troubleshooting
Confirmed that CoreDNS seems to be functioning fine, and is able to resolve internal and external addresses
Remaining nodes are able to ping pods on other nodes
Deleting the iotedged pod and allowing k8s to recreate it works, but then edgeAgent and edgeHub have errors until I delete/recreate them as well
Re-ran the entire k8s installation. The initial installation works fine, but simulating machine failure continues to be problematic.
Kubernetes Versions:
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0", GitCommit:"cb303e613a121a29364f75cc67d3d580833a7479", GitTreeState:"clean", BuildDate:"2021-04-08T16:31:21Z", GoVersion:"go1.16.1", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0", GitCommit:"cb303e613a121a29364f75cc67d3d580833a7479", GitTreeState:"clean", BuildDate:"2021-04-08T16:25:06Z", GoVersion:"go1.16.1", Compiler:"gc", Platform:"linux/amd64"}
iotedged error:
<6>2021-07-09T22:00:02Z [INFO] - Starting Azure IoT Edge Security Daemon - Kubernetes mode
<6>2021-07-09T22:00:02Z [INFO] - Version - 1.1.3
<6>2021-07-09T22:00:02Z [INFO] - Using config file: /etc/iotedged/config.yaml
<6>2021-07-09T22:00:02Z [INFO] - Configuring /var/lib/iotedge as the home directory.
<6>2021-07-09T22:00:02Z [INFO] - Configuring certificates...
<6>2021-07-09T22:00:02Z [INFO] - Transparent gateway certificates not found, operating in quick start mode...
<6>2021-07-09T22:00:02Z [INFO] - Finished configuring provisioning environment variables and certificates.
<6>2021-07-09T22:00:02Z [INFO] - Initializing hsm...
<6>2021-07-09T22:00:02Z [INFO] - Finished initializing hsm.
<6>2021-07-09T22:00:02Z [INFO] - Provisioning edge device...
<6>2021-07-09T22:00:02Z [INFO] - Starting provisioning edge device via manual mode using a device connection string...
<6>2021-07-09T22:00:02Z [INFO] - Manually provisioning device "********" in hub "********.azure-devices.net"
<6>2021-07-09T22:00:02Z [INFO] - Finished provisioning edge device.
<6>2021-07-09T22:00:02Z [INFO] - Initializing the module runtime...
<6>2021-07-09T22:00:02Z [INFO] - Attempting to use config from /home/edgeletuser/.kube/config file.
<6>2021-07-09T22:00:02Z [INFO] - Using in-cluster config
<3>2021-07-09T22:00:34Z [ERR!] - The daemon could not start up successfully: Could not initialize module runtime
<3>2021-07-09T22:00:34Z [ERR!] - caused by: Could not initialize kubernetes module runtime
<3>2021-07-09T22:00:34Z [ERR!] - caused by: HTTP response error: SelfSubjectAccessReviewCreate
<3>2021-07-09T22:00:34Z [ERR!] - caused by: Hyper HTTP error
<3>2021-07-09T22:00:34Z [ERR!] - caused by: error trying to connect: Connection timed out (os error 110)
<6>2021-07-09T22:00:02Z [INFO] (/project/hsm-sys/azure-iot-hsm-c/src/hsm_log.c:log_init:41) Initialized logging
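The SelfSubjectAccessReviewCreate timeout suggests iotedged cannot reach the API server from inside the cluster. For reference, a hedged connectivity check; the image name and the default in-cluster service DNS name are assumptions:

# Run a throwaway pod and hit the API server through the cluster service DNS name.
kubectl run api-check --rm -it --image=curlimages/curl --restart=Never -- \
  curl -k -m 10 https://kubernetes.default.svc/version
# A timeout here (rather than a 401/403 or a JSON version reply) points at
# networking/CNI or the HAProxy/Keepalived virtual IP, not at IoT Edge itself.

edgeHub logs after recreating iotedged: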
2021-08-18 19:05:40 Starting Edge Hub
2021-08-18 19:05:40.481 +00:00 Edge Hub Main()
<7> 2021-08-18 19:05:40.609 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Edged.WorkloadClient] - Making a Http call to http://localhost:35001/ to CreateServerCertificateAsync
<7> 2021-08-18 19:05:40.912 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Edged.WorkloadClient] - Error when getting an Http response from http://localhost:35001/ for CreateServerCertificateAsync
HTTP Response:
{"message":"Module not found"}
Microsoft.Azure.Devices.Edge.Util.Edged.Version_2019_01_30.GeneratedCode.IoTEdgedException`1[Microsoft.Azure.Devices.Edge.Util.Edged.Version_2019_01_30.GeneratedCode.ErrorResponse]: Not Found
at Microsoft.Azure.Devices.Edge.Util.Edged.Version_2019_01_30.GeneratedCode.HttpWorkloadClient.CreateServerCertificateAsync(String api_version, String name, String genid, ServerCertificateRequest request, CancellationToken cancellationToken) in /home/vsts/work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/edged/version_2019_01_30/generatedCode/HttpWorkloadClient.cs:line 624
at Microsoft.Azure.Devices.Edge.Util.TaskEx.TimeoutAfter[T](Task`1 task, TimeSpan timeout) in /home/vsts/work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/TaskEx.cs:line 126
at Microsoft.Azure.Devices.Edge.Util.Edged.WorkloadClientVersioned.Execute[T](Func`1 func, String operation) in /home/vsts/work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/edged/WorkloadClientVersioned.cs:line 59
Unhandled exception. System.AggregateException: One or more errors occurred. (Error calling CreateServerCertificateAsync: Module not found)
---> Microsoft.Azure.Devices.Edge.Util.Edged.WorkloadCommunicationException- Message:Error calling CreateServerCertificateAsync: Module not found, StatusCode:404, at: at Microsoft.Azure.Devices.Edge.Util.Edged.Version_2019_01_30.WorkloadClient.HandleException(Exception ex, String operation) in /home/vsts/work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/edged/version_2019_01_30/WorkloadClient.cs:line 109
at Microsoft.Azure.Devices.Edge.Util.Edged.WorkloadClientVersioned.Execute[T](Func`1 func, String operation) in /home/vsts/work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/edged/WorkloadClientVersioned.cs:line 77
at Microsoft.Azure.Devices.Edge.Util.Edged.Version_2019_01_30.WorkloadClient.CreateServerCertificateAsync(String hostname, DateTime expiration) in /home/vsts/work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/edged/version_2019_01_30/WorkloadClient.cs:line 35
at Microsoft.Azure.Devices.Edge.Util.CertificateHelper.GetServerCertificatesFromEdgelet(Uri workloadUri, String workloadApiVersion, String workloadClientApiVersion, String moduleId, String moduleGenerationId, String edgeHubHostname, DateTime expiration) in /home/vsts/work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/CertificateHelper.cs:line 260
at Microsoft.Azure.Devices.Edge.Hub.Service.EdgeHubCertificates.LoadAsync(IConfigurationRoot configuration, ILogger logger) in /home/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.Service/EdgeHubCertificates.cs:line 54
at Microsoft.Azure.Devices.Edge.Hub.Service.Program.MainAsync(IConfigurationRoot configuration) in /home/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.Service/Program.cs:line 54
--- End of inner exception stack trace ---
at System.Threading.Tasks.Task`1.GetResultCore(Boolean waitCompletionNotification)
at System.Threading.Tasks.Task`1.get_Result()
at Microsoft.Azure.Devices.Edge.Hub.Service.Program.Main() in /home/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.Service/Program.cs:line 33
Are you still blocked? What troubleshooting steps have you tried so far? Did you check the Common issues and resolutions for Azure IoT Edge? Based on the error messages Transparent gateway certificates not found, operating in quick start mode and The daemon could not start up successfully: Could not initialize module runtime, it looks like the setup is not configured properly. Try restarting the server and check the transparent gateway setup. Please refer to the transparent gateway setup documentation and check whether you have missed anything.
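For reference, in IoT Edge 1.1 the transparent gateway certificates are declared in the certificates section of /etc/iotedge/config.yaml; a minimal sketch with placeholder paths (quick start mode is what you get when this section is absent or commented out):

certificates:
  device_ca_cert: "/etc/iotedge/certs/device-ca.cert.pem"
  device_ca_pk: "/etc/iotedge/certs/device-ca.key.pem"
  trusted_ca_certs: "/etc/iotedge/certs/trusted-ca.cert.pem"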

Timeout exception Flink

I have a question regarding Flink. I am running an application in a local cluster, with 1 TaskManager and 4 task slots.
After some time running the application, I got a Timeout error:
java.util.concurrent.TimeoutException: Heartbeat of TaskManager with id feea6a6702a0cf960ae2847b5bd25665 timed out.
I have seen some posts on this topic but no answers to them. Could you help me find the root cause, or suggest possible troubleshooting steps?
I am using Flink version 1.5.3.
It seems that the Docker containers of the TaskManagers and the JobManager are stopped when this happens.
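For reference, the heartbeat tolerance is configurable in flink-conf.yaml; a hedged sketch raising it from the defaults (these option names exist in Flink 1.5, but note that raising the timeout only masks whatever is stalling the TaskManager, e.g. long GC pauses or the container being killed):

# flink-conf.yaml (defaults: heartbeat.interval 10000 ms, heartbeat.timeout 50000 ms)
heartbeat.interval: 10000
heartbeat.timeout: 180000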
Let me add the error trace from the JobManager container logs:
2019-06-09 13:31:06,300 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job Socket Window NgsiEvent (ef3a860de48d54544d973754c6170d8b) switched from state FAILING to FAILED.
java.util.concurrent.TimeoutException: Heartbeat of TaskManager with id 63dbab620797b84da023b33578478238 timed out.
at org.apache.flink.runtime.jobmaster.JobMaster$TaskManagerHeartbeatListener.notifyHeartbeatTimeout(JobMaster.java:1609)
at org.apache.flink.runtime.heartbeat.HeartbeatManagerImpl$HeartbeatMonitor.run(HeartbeatManagerImpl.java:339)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.apache.flink.runtime.concurrent.akka.ActorSystemScheduledExecutorAdapter$ScheduledFutureTask.run(ActorSystemScheduledExecutorAdapter.java:154)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2019-06-09 13:31:06,308 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Could not restart the job Socket Window NgsiEvent (ef3a860de48d54544d973754c6170d8b) because the restart strategy prevented it.
java.util.concurrent.TimeoutException: Heartbeat of TaskManager with id 63dbab620797b84da023b33578478238 timed out.
at org.apache.flink.runtime.jobmaster.JobMaster$TaskManagerHeartbeatListener.notifyHeartbeatTimeout(JobMaster.java:1609)
at org.apache.flink.runtime.heartbeat.HeartbeatManagerImpl$HeartbeatMonitor.run(HeartbeatManagerImpl.java:339)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.apache.flink.runtime.concurrent.akka.ActorSystemScheduledExecutorAdapter$ScheduledFutureTask.run(ActorSystemScheduledExecutorAdapter.java:154)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2019-06-09 13:31:06,317 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Stopping checkpoint coordinator for job ef3a860de48d54544d973754c6170d8b.
2019-06-09 13:31:06,322 INFO org.apache.flink.runtime.checkpoint.StandaloneCompletedCheckpointStore - Shutting down
2019-06-09 13:31:06,331 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@16363182f31f:36715] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@16363182f31f:36715]] Caused by: [16363182f31f]
2019-06-09 13:31:06,351 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Job ef3a860de48d54544d973754c6170d8b reached globally terminal state FAILED.
2019-06-09 13:31:06,434 INFO org.apache.flink.runtime.jobmaster.JobMaster - Stopping the JobMaster for job Socket Window NgsiEvent(ef3a860de48d54544d973754c6170d8b).
2019-06-09 13:31:06,447 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPool - Suspending SlotPool.
2019-06-09 13:31:06,448 INFO org.apache.flink.runtime.jobmaster.JobMaster - Close ResourceManager connection 883e842633b0fd9a2e53ab45778581fe: JobManager is shutting down..
2019-06-09 13:31:06,449 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcActor - The rpc endpoint org.apache.flink.runtime.jobmaster.slotpool.SlotPool has not been started yet. Discarding message org.apache.flink.runtime.rpc.messages.LocalRpcInvocation until processing is started.
2019-06-09 13:31:06,457 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - Disconnect job manager 00000000000000000000000000000000@akka.tcp://flink@jobmanager:6123/user/jobmanager_2 for job ef3a860de48d54544d973754c6170d8b from the resource manager.
2019-06-09 13:31:06,459 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPool - Stopping SlotPool.
2019-06-09 13:31:06,460 INFO org.apache.flink.runtime.jobmaster.JobManagerRunner - JobManagerRunner already shutdown.
2019-06-09 13:31:16,304 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@16363182f31f:36715] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@16363182f31f:36715]] Caused by: [16363182f31f: Name or service not known]
2019-06-09 13:31:26,320 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@16363182f31f:36715] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@16363182f31f:36715]] Caused by: [16363182f31f: Name or service not known]
2019-06-09 13:31:36,286 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@16363182f31f:36715] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@16363182f31f:36715]] Caused by: [16363182f31f]
Thanks in advance!

Cannot enable AlwaysOn SQL in DSE

I get this error when starting AlwaysOn SQL; I've tried many things but the result is still the same. Any ideas why?
I'm using 1 cluster, 1 analytics+search datacenter, and 2 Ubuntu 16.04 nodes.
INFO [ALWAYSON-SQL] 2019-02-14 11:36:01,348 ALWAYSON-SQL AlwaysOnSqlRunner.scala:304 - Shutting down AlwaysOn SQL.
INFO [ALWAYSON-SQL] 2019-02-14 11:36:01,617 ALWAYSON-SQL AlwaysOnSqlRunner.scala:328 - Set status to stopped
INFO [ALWAYSON-SQL] 2019-02-14 11:36:01,620 ALWAYSON-SQL AlwaysOnSqlRunner.scala:382 - Reserve port for AlwaysOn SQL
INFO [ALWAYSON-SQL] 2019-02-14 11:36:04,621 ALWAYSON-SQL AlwaysOnSqlRunner.scala:375 - Release reserved port
INFO [ALWAYSON-SQL] 2019-02-14 11:36:04,622 ALWAYSON-SQL AlwaysOnSqlRunner.scala:805 - Set InCluster token to DseFs client
INFO [ForkJoinPool-1-worker-1] 2019-02-14 11:36:04,650 AlwaysOnSqlRunner.scala:740 - dsefs server heartbeat response: pong
INFO [ForkJoinPool-1-worker-3] 2019-02-14 11:36:04,757 AlwaysOnSqlRunner.scala:704 - Create DseFs directory /var/log/spark/alwayson_sql
INFO [ForkJoinPool-1-worker-3] 2019-02-14 11:36:04,758 AlwaysOnSqlRunner.scala:805 - Set InCluster token to DseFs client
ERROR [ForkJoinPool-1-worker-3] 2019-02-14 11:36:04,788 AlwaysOnSqlRunner.scala:722 - Failed to check dsefs directory alwayson_sql
com.datastax.bdp.fs.model.AccessDeniedException: Insufficient permissions to path /
at com.datastax.bdp.fs.model.DseFsJsonProtocol$ThrowableReader$.read(DseFsJsonProtocol.scala:258)
at com.datastax.bdp.fs.model.DseFsJsonProtocol$ThrowableReader$.read(DseFsJsonProtocol.scala:232)
at spray.json.JsValue.convertTo(JsValue.scala:31)
at com.datastax.bdp.fs.rest.RestResponse$stateMachine$macro$331$1.apply(RestResponse.scala:48)
at com.datastax.bdp.fs.rest.RestResponse$stateMachine$macro$331$1.apply(RestResponse.scala:44)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:465)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884)
at java.lang.Thread.run(Thread.java:748)
INFO [ALWAYSON-SQL] 2019-02-14 11:36:04,788 ALWAYSON-SQL AlwaysOnSqlRunner.scala:247 - ALWAYSON-SQL caused an exception in state RUNNING : com.datastax.bdp.fs.model.AccessDeniedException: Insufficient permissions to path /
com.datastax.bdp.fs.model.AccessDeniedException: Insufficient permissions to path /
at com.datastax.bdp.fs.model.DseFsJsonProtocol$ThrowableReader$.read(DseFsJsonProtocol.scala:258)
at com.datastax.bdp.fs.model.DseFsJsonProtocol$ThrowableReader$.read(DseFsJsonProtocol.scala:232)
at spray.json.JsValue.convertTo(JsValue.scala:31)
at com.datastax.bdp.fs.rest.RestResponse$stateMachine$macro$331$1.apply(RestResponse.scala:48)
at com.datastax.bdp.fs.rest.RestResponse$stateMachine$macro$331$1.apply(RestResponse.scala:44)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:465)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884)
at java.lang.Thread.run(Thread.java:748)
I have seen this problem too! It was a permissions problem in dsefs! To fix it, log in with the root Cassandra user and change the permissions of your alwayson log directory so the AlwaysOn SQL user can access it.
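For reference, a hedged sketch of that fix from the DSEFS shell (user, password, and mode are placeholders; check help inside the shell for the exact chmod syntax your DSE version supports):

# Open the DSEFS shell as a superuser and relax permissions on the log directory.
dse -u cassandra -p cassandra fs
> chmod 777 /var/log/spark/alwayson_sql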

Helm's Tiller container gets x509: certificate signed by unknown authority

I'm running Kubernetes (version 1.5.2) on AWS. I have installed helm using
helm init --node-selectors="nodeType=master"
forcing it to run on the master.
When I try to run helm list I get the following error: Error: Get https://192.0.0.1:443/api/v1/namespaces/kube-system/configmaps?labelSelector=OWNER%3DTILLER: x509: certificate signed by unknown authority
The logs from the tiller container (seems the issue is from the tiller to Kubernetes-api):
E0219 08:15:12.546100 1 config.go:330] Expected to load root CA config from /var/run/secrets/kubernetes.io/serviceaccount/ca.crt, but got err: open /var/run/secrets/kubernetes.io/serviceaccount/ca.crt: no such file or directory
E0219 08:15:12.547957 1 config.go:330] Expected to load root CA config from /var/run/secrets/kubernetes.io/serviceaccount/ca.crt, but got err: open /var/run/secrets/kubernetes.io/serviceaccount/ca.crt: no such file or directory
[main] 2018/02/19 08:15:12 Starting Tiller v2.7.0 (tls=false)
[main] 2018/02/19 08:15:12 GRPC listening on :44134
[main] 2018/02/19 08:15:12 Probes listening on :44135
[main] 2018/02/19 08:15:12 Storage driver is ConfigMap
[main] 2018/02/19 08:15:12 Max history per release is 0
[storage] 2018/02/19 08:20:47 listing all releases with filter
[storage/driver] 2018/02/19 08:20:47 list: failed to list: Get https://192.0.0.1:443/api/v1/namespaces/kube-system/configmaps?labelSelector=OWNER%3DTILLER: x509: certificate signed by unknown authority
Is there a way to configure tiller to ignore the untrusted certificate?
It looks like your Kubernetes cluster isn't properly configured. Usually there is a CA certificate for every pod in /var/run/secrets/kubernetes.io/serviceaccount/ca.crt that allows pods to communicate with the API server.
The first two lines in your log show that no such file could be found:
Expected to load root CA config from /var/run/secrets/kubernetes.io/serviceaccount/ca.crt, but got err: open /var/run/secrets/kubernetes.io/serviceaccount/ca.crt: no such file or directory.
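For reference, a hedged way to confirm the mount is missing from the running pod (the pod name is a placeholder):

# List the service-account mount inside the tiller pod; expect ca.crt, namespace, token.
kubectl -n kube-system exec tiller-deploy-xxxx -- ls /var/run/secrets/kubernetes.io/serviceaccount/
# If the files are absent, check that kube-controller-manager is started with
# --root-ca-file and --service-account-private-key-file, which is what injects
# the CA and token into service account secrets on clusters of that era.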
