Do the IoT Edge Built-In Metrics work in the iotedgehubdev simulator? - azure-iot-edge

I'm trying to determine if the Built-In Metrics for Azure IoT Edge (described here) can be accessed when running locally with the edgeHubDev simulator for v1.0.9. I've created a deployment.template.json with a edgeHub configuration like so
"edgeHub": {
"type": "docker",
"status": "running",
"restartPolicy": "always",
"settings": {
"image": "mcr.microsoft.com/azureiotedge-hub:1.0.9",
"createOptions": {
"User": "ContainerAdministrator",
"ExposedPorts": {
"9600/tcp": {}
},
"HostConfig": {
"PortBindings": {
"5671/tcp": [
{
"HostPort": "5671"
}
],
"8883/tcp": [
{
"HostPort": "8883"
}
],
"443/tcp": [
{
"HostPort": "443"
}
]
}
}
}
},
"env": {
"ExperimentalFeatures__Enabled": {
"value": "true"
},
"ExperimentalFeatures__EnableMetrics": {
"value": "true"
}
}
}
At present I get a
System.Net.Http.HttpRequestException: No connection could be made because the target machine actively refused it.
---> System.Net.Sockets.SocketException (10061): No connection could be made because the target machine actively refused it.
whether I try to access them at http://edgeHub:9600/metrics on the IoT Edge module network or http://localhost:9600/metrics from my pc
Anyone know if this is possible or better yet able to tell me what I might be doing wrong?
output of docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
4753b9bb2416 mcr.microsoft.com/azureiotedge-simulated-temperature-sensor:1.0 "/bin/sh -c 'echo \"$…" 2 minutes ago Up 2 minutes SimulatedTemperatureSensor
d264bde9bfb1 mcr.microsoft.com/azureiotedge-hub:1.0 "/bin/sh -c 'echo \"$…" 2 minutes ago Up 2 minutes 0.0.0.0:443->443/tcp, 0.0.0.0:5671->5671/tcp, 0.0.0.0:8883->8883/tcp, 0.0.0.0:9600->9600/tcp edgeHubDev

Related

"Cloud Run error: Container failed to start. Failed to start and then listen on the port defined by the PORT environment variable."

I'm trying to build a container image that I will later use to update the code inside of a virtual machine. The docker image works fine as I can build and run it inside of my terminal. However, I keep getting an error when I try to deploy it to cloud run: "Cloud Run error: Container failed to start. Failed to start and then listen on the port defined by the PORT environment variable." How can I fix this error?
The build log contains this:
Deploying container to Cloud Run service [SERVICE] in project [PROJECT_ID] region [REGION]
Deploying...
Creating Revision.......................................................................................................................................................................failed
Deployment failed
ERROR: (gcloud.run.deploy) Cloud Run error: Container failed to start. Failed to start and then listen on the port defined by the PORT environment variable. Logs for this revision might contain more information.
The revision log contains this:
{
"protoPayload": {
"#type": "type.googleapis.com/google.cloud.audit.AuditLog",
"status": {
"code": 9,
"message": "Ready condition status changed to False for Revision {REVISION_NAME} with message: Cloud Run error: Container failed to start. Failed to start and then listen on the port defined by the PORT environment variable. Logs for this revision might contain more information.\n\nLogs URL:{URL_LINK}"
},
"serviceName": "run.googleapis.com",
"resourceName": "{REVISION_NAME}",
"response": {
"metadata": {
"name": "{REVISION_NAME}",
"namespace": "{NAMESPACE}",
"selfLink": "{SELFLINK}",
"uid": "{UID}",
"resourceVersion": "{RESOURCEVER}",
"generation": 1,
"creationTimestamp": "{TIMESTAMP}",
"labels": {
"serving.knative.dev/route": "{SERVICE}",
"serving.knative.dev/configuration": "{SERVICE}",
"serving.knative.dev/configurationGeneration": "15",
"serving.knative.dev/service": "{SERVICE}",
"serving.knative.dev/serviceUid": "{SERVICE_UID}",
"cloud.googleapis.com/location": "{REGION}"
},
"annotations": {
"run.googleapis.com/client-name": "gcloud",
"serving.knative.dev/creator": "{NAMESPACE}#cloudbuild.gserviceaccount.com",
"client.knative.dev/user-image": "gcr.io/{PROJECT_ID}/{IMAGE}",
"run.googleapis.com/client-version": "357.0.0",
"autoscaling.knative.dev/maxScale": "100"
},
"ownerReferences": [
{
"kind": "Configuration",
"name": "{SERVICE}",
"uid": "{UID}",
"apiVersion": "serving.knative.dev/v1",
"controller": true,
"blockOwnerDeletion": true
}
]
},
"apiVersion": "serving.knative.dev/v1",
"kind": "Revision",
"spec": {
"containerConcurrency": 80,
"timeoutSeconds": 300,
"serviceAccountName": "{NAMESPACE}-compute#developer.gserviceaccount.com",
"containers": [
{
"image": "gcr.io/{PROJECT_ID}/{IMAGE}",
"ports": [
{
"name": "h2c",
"containerPort": 8080
}
],
"resources": {
"limits": {
"cpu": "1000m",
"memory": "512Mi"
}
}
}
]
},
"status": {
"observedGeneration": 1,
"conditions": [
{
"type": "Ready",
"status": "False",
"reason": "HealthCheckContainerError",
"message": "Cloud Run error: Container failed to start. Failed to start and then listen on the port defined by the PORT environment variable. Logs for this revision might contain more information.\n\nLogs URL:{LOG_LINK}",
"lastTransitionTime": "{TIME}"
},
{
"type": "Active",
"status": "Unknown",
"reason": "Reserve",
"lastTransitionTime": "{TIME}",
"severity": "Info"
},
{
"type": "ContainerHealthy",
"status": "False",
"reason": "HealthCheckContainerError",
"message": "Cloud Run error: Container failed to start. Failed to start and then listen on the port defined by the PORT environment variable. Logs for this revision might contain more information.\n\nLogs URL:{LOG_LINK}",
"lastTransitionTime": "{TIME}"
},
{
"type": "ResourcesAvailable",
"status": "True",
"lastTransitionTime": "{TIME}"
},
{
"type": "Retry",
"status": "True",
"reason": "ImmediateRetry",
"message": "System will retry after 0:00:00 from lastTransitionTime for attempt 0.",
"lastTransitionTime": "{TIME}",
"severity": "Info"
}
],
"logUrl": "{LOG_LINK}",
"imageDigest": "gcr.io/{PROJECT_ID}/{IMAGE_SHA}"
},
"#type": "type.googleapis.com/google.cloud.run.v1.Revision"
}
},
"insertId": "{ID}",
"resource": {
"type": "cloud_run_revision",
"labels": {
"location": "{REGION}",
"configuration_name": "{SERVICE}",
"service_name": "{SERVICE}",
"project_id": "{PROJECT_ID}",
"revision_name": "{REVISION_NAME}"
}
},
"timestamp": "{TIME}",
"severity": "ERROR",
"logName": "projects/{PROJECT_ID}/logs/cloudaudit.googleapis.com%2Fsystem_event",
"receiveTimestamp": "{TIME}"
}
This is my cloudbuild.yaml:
steps:
- name: 'gcr.io/cloud-builders/docker'
args: ['build', '-t', 'gcr.io/PROJECT_ID/IMAGE', '.']
- name: 'gcr.io/cloud-builders/docker'
args: ['push', 'gcr.io/PROJECT_ID/IMAGE']
- name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
entrypoint: gcloud
args: ['run', 'deploy', 'SERVICE-NAME', '--image', 'gcr.io/PROJECT_ID/IMAGE', '--region', 'REGION', '--port', '8080']
images:
- gcr.io/PROJECT_ID/IMAGE
This is my Dockerfile:
FROM python:3.9.7-slim-buster
WORKDIR /app
COPY . .
CMD [ "python3", "hello.py" ]
This is the code in hello.py:
print("Hello World")
When Cloud Run starts your container, a health check is sent to the container. Your container is not responding to the health check. Therefore, Cloud Run determines that your service is failing.
Cloud Run requires that a container provide service/process/program that listens for and responds to HTTP requests.
Your hello.py file only prints a message to stdout. Your program does not start a process to listen for requests.
A very simple example that converts your example into a working program:
import os
from flask import Flask
app = Flask(__name__)
#app.route('/')
def home():
return "Hello world"
if __name__ == '__main__':
app.run(debug=True, host='0.0.0.0', port=int(os.environ.get('PORT', 8080)))
Note: You will need to add a file requirements.txt to your build to include Flask. Create requirements.txt in the same location as Dockerfile.
requirements.txt:
Flask==2.0.1

Unable to spin dockerized cassandra cluster on Mesosphere DC/OS

Can anyone have some idea to create a Cassandra cluster on Mesosphere DC/OS using Docker?
The issue is that Cassandra containers keep getting started after every few seconds.
It seems that Marathon is failing to get the health status of the newly created containers because it keeps creating new ones continuously. In DC/OS GUI service debug, it shows
State: TASK_FAILED
Message: Container terminated with signal Broken pipe
While checking on the machine, the containers are up and running and also new containers are getting creating repeatedly in every minute or two.
Why marathon doesn't get the correct response from the container that it has started successfully so that it can stop creating a new one?
I am sharing my current JSON configuration for the service.
Cassandra.json
{
"id": "/cassandra",
"acceptedResourceRoles": [
"*"
],
"backoffFactor": 1.15,
"backoffSeconds": 1,
"container": {
"portMappings": [
{
"containerPort": 8000,
"hostPort": 0,
"protocol": "tcp",
"servicePort": 10003,
"name": "main"
}
],
"type": "DOCKER",
"volumes": [],
"docker": {
"image": "cassandra:3.9",
"forcePullImage": false,
"privileged": false,
"parameters": []
}
},
"cpus": 3,
"disk": 10000,
"instances": 1,
"maxLaunchDelaySeconds": 300,
"mem": 6000,
"gpus": 0,
"networks": [
{
"mode": "container/bridge"
}
],
"requirePorts": false,
"upgradeStrategy": {
"maximumOverCapacity": 1,
"minimumHealthCapacity": 1
},
"killSelection": "YOUNGEST_FIRST",
"unreachableStrategy": {
"inactiveAfterSeconds": 0,
"expungeAfterSeconds": 0
},
"fetch": [],
"constraints": []
}
DC/OS open source version 1.13
Marathon Version 1.8.194
Please help if anyone have some idea what's going on? I can share further details if needed.

Orphaned Tasks in Docker Swarm after removal of failed node

Last week I had to remove a failed node from my Docker Swarm Cluster, leaving some tasks that ran on that node in desired state "Remove".
Even after deleting the stack and recreating it with the same name, docker stack ps stackname still shows them.
Interestingly enough, after recreating the stack, the tasks are still there, but with no node assigned.
Here's what I tried so far to "cleanup" the stack:
Recreating the stack with the same name
docker container prune
docker volume prune
docker system prune
Is there a way to remove a specific task?
Here's the output for docker inspect fkgz0oihexzs, the first task in the list:
[
{
"ID": "fkgz0oihexzsjqwv4ju0szorh",
"Version": {
"Index": 14422171
},
"CreatedAt": "2018-11-05T16:15:31.528933998Z",
"UpdatedAt": "2018-11-05T16:27:07.422368364Z",
"Labels": {},
"Spec": {
"ContainerSpec": {
"Image": "redacted",
"Labels": {
"com.docker.stack.namespace": "redacted"
},
"Env": [
"redacted"
],
"Privileges": {
"CredentialSpec": null,
"SELinuxContext": null
},
"Isolation": "default"
},
"Resources": {},
"Placement": {
"Platforms": [
{
"Architecture": "amd64",
"OS": "linux"
}
]
},
"Networks": [
{
"Target": "3i998stqemnevzgiqw3ndik4f",
"Aliases": [
"redacted"
]
}
],
"ForceUpdate": 0
},
"ServiceID": "g3vk9tgfibmcigmf67ik7uhj6",
"Slot": 1,
"Status": {
"Timestamp": "2018-11-05T16:15:31.528892467Z",
"State": "new",
"Message": "created",
"PortStatus": {}
},
"DesiredState": "remove"
}
]
I had the same problem. I resolved it following this instructions :
docker run --rm -v /var/run/docker/swarm/control.sock:/var/run/swarmd.sock dperny/tasknuke <taskid>
Be sure to use the full long task id or it will not work (fkgz0oihexzsjqwv4ju0szorh in your case).

Mesos Marathon(ctl) Debugging - "Abnormal executor termination: unknown container"

I'm still new to Mesos, but am trying to figure out the best way to debug a Mesos application I'm attempting to develop. I'm getting the error message "Abnormal executor termination: unknown container" through the web application, and am unsure how to get more descriptive error messages to figure out what's going on. The error message would seem to indicate it can't find the Docker image, but I know for a fact it's referencing the correct image that is installed and running.
{
"id": "pgprimary",
"cmd": null,
"cpus": 1,
"mem": 128,
"disk": 0,
"instances": 1,
"container": {
"docker": {
"image": "example/postgres:centos7-10.0-1.6.0",
"network": "BRIDGE",
"parameters": [{
"key": "hostname",
"value": "pgprimary"
}],
"portMappings": [
]
},
"type": "DOCKER",
"volumes": [
{
"hostPath": "/mnt/nfsfileshare/pgdata",
"containerPath": "/pgdata",
"mode": "RW"
}
]
},
"env": {
"PG_MODE": "primary",
"PG_USER": "testuser",
"PG_PASSWORD": "testuser",
"PG_DATABASE": "userdb",
"PG_ROOT_PASSWORD": "password",
"PG_PRIMARY_USER": "primaryuser",
"PG_PRIMARY_PASSWORD": "password",
"PG_PRIMARY_PORT": "5432"
},
"labels": {},
"healthChecks": [
{
"protocol": "COMMAND",
"command": {
"value": "/usr/pgsql-10/bin/pg_isready --host=pgprimary.marathon.mesos"
},
"gracePeriodSeconds": 300,
"intervalSeconds": 60,
"timeoutSeconds": 20,
"maxConsecutiveFailures": 3,
"ignoreHttp1xx": false
}
]
}
The command I'm using to deploy the Marathon app:
marathonctl -h http://10.0.2.15:8080 app create postgres.json
Not image, but docker is what marathon cannot find.
Specify the use of the Docker containerizer:
echo 'docker,mesos' > /etc/mesos-slave/containerizers
Provisioning Containers with the Docker Containerizer
https://mesosphere.github.io/marathon/docs/native-docker.html

How to pull docker image with marathon which need to be authorized

I wan to deploy a docker container with marathon, if the docker image without authorized, the image can be pull normally, but when I try to pull an image from repository which need to be authorized, task deploy fail, the response is
Failed to launch container: Failed to run 'docker -H unix:///var/run/docker.sock pull example.com/web:laest': exited with status 1; stderr='Error response from daemon: repository example.com/web not found: does not exist or no pull access '
I changed the permission of /var/run/docker.sock file to 777 on node, and master, but the issue is still appeared, that seems permission is not the root cause for the issue; I try to run "docker login" on the node, and pull the image manually, then the marathon task run correctly, my marathon json like below:
{
"id": "/web",
"cmd": "docker login --username='sam' --passwoer='123456' example.com/web:latest",
"cpus": 0.3,
"mem": 32,
"disk": 0,
"instances": 1,
"env": {
"EMAIL_USE_TLS": "False",
"DATABASE_URI": "mysql://user:123456#RDS:3306/test"
},
"container": {
"type": "DOCKER",
"volumes": [
{
"containerPath": "/data/supervisor/",
"hostPath": "/data/workspace/logs/supervisor/",
"mode": "RW"
}
],
"docker": {
"image": "daocloud.io/gizwits2015/gwaccounts:1.6.0",
"network": "BRIDGE",
"portMappings": [
{
"containerPort": 0,
"hostPort": 0,
"servicePort": 10000,
"protocol": "tcp",
"labels": {}
}
],
"privileged": false,
"parameters": [
{
"key": "add-host",
"value": "RDS:10.66.125.161"
}
],
"forcePullImage": false
}
},
"portDefinitions": [
{
"port": 10000,
"protocol": "tcp",
"name": "default",
"labels": {}
}
]
}
How can I pull the image with authorized with marathon?
You should read: https://mesosphere.github.io/marathon/docs/native-docker-private-registry.html
Follow step 1, and in step 2 replace the uris section with
"fetch" : [
{
"uri" : "https://path.to/file",
"extract" : true,
"outputFile" : "dockerConfig.tar.gz"
}
]
I've written more detailed explanation here: http://blog.itaysk.com/2017/05/22/using-a-custom-private-docker-registry-with-marathon

Resources