Azure edgeAgent not starting after upgrade to 1.3.0 - azure-iot-edge

After upgrading our edge devices to the latest IoT Edge version (1.3.0), the edgeAgent container refuses to start. This in turn completely bricks the devices. The only option is to prune the agent container and images so that it reverts to an older version.
Has anyone experienced something similar, or does anyone have a suggestion on how to fix this?
Old situation:
Ubuntu 18.04 server on AMD64 hardware
IoT Edge runtime version: 1.2.7
azureiotedge-hub:1.2.8
azureiotedge-agent:1.2.8
Running our modules without a problem.
New situation:
IoT Edge runtime version: 1.3.0
azureiotedge-agent:1.3.0
azureiotedge-hub:1.2.8 (edgeAgent crashes before it can upgrade edgeHub to 1.3.0)
What happens:
After upgrading the IoT Edge runtime to 1.3.0, everything still works fine. The problems start once we roll out the new edgeAgent image: after deploying the new manifest to the devices, azureiotedge-agent:1.3.0 is downloaded and started. It then crashes because the service can't access the storage folder (/iotedge/storage/edgeAgent), which is bind-mounted from the host machine.
I can follow the steps in the updated 'agentStart.sh' script:
I see that a user 'edgeagentuser' with UID 13622 has been created on the host.
The ownership of the storage directory and the management socket is changed to 13622.
The Edge Agent service DLL is started and crashes.
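Since the crash is a permission error on the bind-mounted folder, one way to see what the 1.3.0 start script actually left behind is to inspect that folder on the host. The sketch below is a hypothetical, read-only diagnostic (not part of IoT Edge; the script name in the usage comment is made up). It assumes you pass the host directory that is bind-mounted to /iotedge/storage/edgeAgent, whose location depends on your deployment manifest, and it only reports the owner UID, the mode bits, and whether UID 13622 owns it with full owner permissions. It does not look at ACLs or security modules such as AppArmor or SELinux.

```python
import os
import stat
import sys
from typing import Dict, Union

# UID the question reports for 'edgeagentuser'; treat it as an assumption elsewhere.
EDGE_AGENT_UID = 13622


def describe_dir(path: str, expected_uid: int = EDGE_AGENT_UID) -> Dict[str, Union[str, int, bool]]:
    """Report ownership and permission bits of a directory without changing anything."""
    st = os.stat(path)
    mode = stat.S_IMODE(st.st_mode)  # drop the file-type bits, keep the permission bits
    return {
        "path": path,
        "uid": st.st_uid,
        "gid": st.st_gid,
        "mode": oct(mode),
        "is_dir": stat.S_ISDIR(st.st_mode),
        "owned_by_expected_uid": st.st_uid == expected_uid,
        # Creating a subdirectory needs write + execute on the parent;
        # this checks the full owner rwx set.
        "owner_has_rwx": (mode & stat.S_IRWXU) == stat.S_IRWXU,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 check_edge_storage.py /host/path/bound/to/edgeAgent
    for key, value in describe_dir(sys.argv[1]).items():
        print(f"{key}: {value}")
```

If owned_by_expected_uid is False or owner_has_rwx is False on the host directory, that would be consistent with the UnauthorizedAccessException in the logs below.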
The logs
iotedge check shows only a DNS server warning; everything else is green.
edgeAgent container logs:
2022-07-19 08:23:29 Starting Edge Agent
2022-07-19 08:23:29 Changing ownership of storage folder: /iotedge/storage//edgeAgent to 13622
2022-07-19 08:23:29 Changing ownership of management socket: /var/run/iotedge/mgmt.sock
2022-07-19 08:23:29 Completed necessary setup. Starting Edge Agent.
2022-07-19 08:23:29.368 +00:00 Edge Agent Main()
<6> 2022-07-19 08:23:29.935 +00:00 [INF] - Initializing Edge Agent.
<6> 2022-07-19 08:23:30.473 +00:00 [INF] - Version - 1.3.0.57041647 (b022069058d21deb30c7760c4e384b637694f464)
<6> 2022-07-19 08:23:30.475 +00:00 [INF] -

[excluded the ASCII art]
<0> 2022-07-19 08:23:30.527 +00:00 [FTL] - Fatal error reading the Agent's configuration.
System.UnauthorizedAccessException: Access to the path '/iotedge/storage/edgeAgent' is denied.
---> System.IO.IOException: Permission denied
--- End of inner exception stack trace ---
at System.IO.FileSystem.CreateDirectory(String fullPath)
at System.IO.Directory.CreateDirectory(String path)
at Microsoft.Azure.Devices.Edge.Agent.Service.Program.GetOrCreateDirectoryPath(String baseDirectoryPath, String directoryName) in /mnt/vss/_work/1/s/edge-agent/src/Microsoft.Azure.Devices.Edge.Agent.Service/Program.cs:line 361
at Microsoft.Azure.Devices.Edge.Agent.Service.Program.MainAsync(IConfiguration configuration)

We are currently discussing this issue with other people on this thread:
https://github.com/Azure/iotedge/issues/6541

Related

netty dubbojson read timeout

I have two environments, dev and test, both deployed on a k8s cluster.
dev:
RPC framework: Dubbo 2.7.0
protocol: dubbo
JVM version: OpenJDK 1.8
operating system: Red Hat 8.5
kernel config:
kernel.sysrq=1
vm.swappiness=10
net.ipv4.ip_forward=1
net.bridge.bridge-nf-call-iptables=1
net.ipv4.neigh.default.gc_thresh1=4096
net.ipv4.neigh.default.gc_thresh2=6144
net.ipv4.neigh.default.gc_thresh3=8192
test:
RPC framework: Dubbo 2.7.0
protocol: dubbo
JVM version: OpenJDK 1.8
operating system: Red Hat 7.9
kernel config:
kernel.sysrq=1
net.ipv4.ip_forward=1
net.bridge.bridge-nf-call-iptables=1
net.ipv4.neigh.default.gc_thresh1=4096
net.ipv4.neigh.default.gc_thresh2=6144
net.ipv4.neigh.default.gc_thresh3=8192
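Side by side, the only difference between the two kernel configs above is vm.swappiness=10 on dev. When comparing more than a handful of parameters (for example the full sysctl -a output captured on a dev node and on a test node), a small diff makes it easy to rule kernel settings in or out. This is an illustrative sketch: it only compares key=value lines and says nothing about differences in the kernel or OS version itself.

```python
from typing import Dict, List, Tuple


def parse_sysctl(text: str) -> Dict[str, str]:
    """Parse 'key=value' (or 'key = value') lines, ignoring blanks and comments."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def diff_sysctl(a: Dict[str, str], b: Dict[str, str]) -> Tuple[List[str], List[str], List[str]]:
    """Return (keys only in a, keys only in b, keys in both with different values)."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    changed = sorted(k for k in set(a) & set(b) if a[k] != b[k])
    return only_a, only_b, changed


# The two kernel configs exactly as listed in the question.
DEV = """
kernel.sysrq=1
vm.swappiness=10
net.ipv4.ip_forward=1
net.bridge.bridge-nf-call-iptables=1
net.ipv4.neigh.default.gc_thresh1=4096
net.ipv4.neigh.default.gc_thresh2=6144
net.ipv4.neigh.default.gc_thresh3=8192
"""

TEST = """
kernel.sysrq=1
net.ipv4.ip_forward=1
net.bridge.bridge-nf-call-iptables=1
net.ipv4.neigh.default.gc_thresh1=4096
net.ipv4.neigh.default.gc_thresh2=6144
net.ipv4.neigh.default.gc_thresh3=8192
"""

if __name__ == "__main__":
    print(diff_sysctl(parse_sysctl(DEV), parse_sysctl(TEST)))  # (['vm.swappiness'], [], [])
```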
I updated my project's libraries, but Netty stayed at version 4.1.25. When I deploy to a k8s pod in test, it throws a timeout exception via dubbojson, although it works well on dev.
If I update Netty to 4.1.71 on Red Hat 7.9, the timeout exception goes away.
The tcpdump I captured with Netty 4.1.25 in the test environment shows everything as normal, i.e.:
docker container (psh ack)---provider
provider (psh ack)---docker container
docker (ack) --- provider
tcpdump shows that the data was received and ACKed to the provider, but Netty still timed out reading it.
When I use two servers (one running Red Hat 7.9, the other Red Hat 8.5, both with Docker installed) and pull and deploy the (problematic) image on both, the timeout no longer occurs.
Can anyone help me?
At first I thought the Netty version might be too old to be compatible with the operating system version, but testing showed that was wrong.

How to start the Web UI of Chirpstack-Application?

OS: Windows 10 Pro
The whole setup for properly starting up the Web UI seems confusing to me.
There's the source code of chirpstack-application-server and its prebuilt Docker image. Running docker-compose up in the source code directory starts all the necessary backend services, but not the UI. The source code contains the UI in the /ui directory. Starting it through npm works until just after this console output:
Note that the development build is not optimized. To create a
production build, use npm run build.
After this I get this proxy error:
Proxy error: Could not proxy request /swagger/internal.swagger.json from localhost:3000 to http://localhost:8080/. See https://nodejs.org/api/errors.html#errors_common_system_errors for more information (ECONNREFUSED).
Then there's the chirpstack-application-server precompiled binary. I started it by first creating the config file with chirpstack-application-server configfile > chirpstack-application-server.toml and then running ./chirpstack-application-server.exe. Here I just get a connection error to PostgreSQL:
time="2020-09-17T11:09:08+02:00" level=warning msg="storage: ping PostgreSQL database error, will retry in 2s" error="dial tcp [::1]:5432: connectex: No connection could be made because the target machine actively refused it."
So what am I missing to get the UI up and running locally?
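Both errors are "connection refused" failures: the UI dev server on localhost:3000 cannot reach the application server API on localhost:8080, and the binary cannot reach PostgreSQL on [::1]:5432 (localhost resolved to the IPv6 loopback there). A plain TCP connect test shows which backend is actually listening. This is an illustrative sketch; the host/port list is taken from the two error messages and may not match your configuration.

```python
import socket
from typing import List, Tuple


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timeout, unreachable or unsupported address family
        return False


# Endpoints taken from the error messages above; adjust to your own config.
ENDPOINTS: List[Tuple[str, str, int]] = [
    ("application server API", "127.0.0.1", 8080),
    ("PostgreSQL (IPv4 loopback)", "127.0.0.1", 5432),
    ("PostgreSQL (IPv6 loopback)", "::1", 5432),
]

if __name__ == "__main__":
    for label, host, port in ENDPOINTS:
        state = "listening" if port_open(host, port) else "not reachable"
        print(f"{label:28} {host}:{port} -> {state}")
```

Checking both loopback addresses for 5432 tells you whether PostgreSQL is simply not running or only listening on one address family.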

OpenAM: Web Policy Agent login to OpenAM fails

I can't identify the source of the error. I have checked the settings dozens of times, tried both the local and the public IPs, even tried different Web Agent versions, and read everything I could find on the topic (at least that is what it feels like).
Question: Why is my Web Agent unable to login to OpenAM?
Initial situation: I have two Docker containers. The first runs a Tomcat server with OpenAM and the second runs an Apache webserver. The containers are deployed on two different virtual machines. Both machines can reach each other via their public as well as their private IPs, and 'network_mode: host' is set in the docker-compose files.
Following this official guide, I created an agent profile in the AM console with the following settings:
Agent ID: WebAgent
Agent URL: http://<public_ip_apache_server>:80
Server URL: http://<public_ip_openam_server>:8080/openam
password: password
Within the container running the Apache webserver, I do the following:
Stop the Apache webserver.
Install OpenSSL.
Add the directories containing /<path>/libcrypto.so and /<path>/libssl.so to LD_LIBRARY_PATH.
Make sure that libc.so.6 is available and that it supports the GLIBC_2.3 API by running
strings libc.so.6 | grep GLIBC_2 in /usr/lib/x86_64-linux-gnu/.
Create a password file via echo password > /tmp/pwd.txt followed by chmod 400 /tmp/pwd.txt.
Run the config command for the Web Agent:
/apache24_agent/bin/agentadmin --s "/usr/local/apache2/conf/httpd.conf" \
"http://<public_ip_openam_server>:8080/openam" "http://<public_ip_apache_server>:80" "/" \
"WebAgent" "/tmp/pwd.txt" --changeOwner --acceptLicence
Problem:
The last command always fails with the following output:
OpenAM Web Agent for Apache Server installation.
Validating...
Error validating OpenAM - Agent configuration.
Installation failed.
See installation log /usr/local/apache2/apache24_agent/bin/../log/install_20201227114136.log file for more details. Exiting.
Checking the error log:
2020-12-27 11:41:36 license accepted with --acceptLicence option
2020-12-27 11:41:36 license was accepted earlier
2020-12-27 11:41:36 Found user daemon, uid 1, gid 1
2020-12-27 11:41:36 Found group daemon, gid 1
2020-12-27 11:41:36 OpenSSL library status: <removed for readability> OpenSSL v1.1.x library support is available
2020-12-27 11:41:36 validating configuration parameters...
2020-12-27 11:41:36 error validating OpenAM agent configuration
agent login to http://<public_ip_openam_server>:8080/openam fails
2020-12-27 11:41:36 installation error
2020-12-27 11:41:36 installation exit
System and software:
OpenAM Version: 14.5.4
Container running Apache Webserver: x86_64 system, Debian
Apache version: 2.4.46
Web Policy Agent: Platform = Apache, Platform Version = 2.4, Operating System = Linux, Architecture = 64bit, Platform Version = 5.6, Version = 5.6.2.0
OpenSSL Version: v1.1
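The libc check in the steps above (strings libc.so.6 | grep GLIBC_2) only needs to find GLIBC_2.x version tags inside the library file. If strings (from binutils) is not installed in a slim Debian container, a rough Python stand-in is to scan the file's bytes for the same pattern. This is a sketch under that assumption, not part of the agentadmin tooling; the default path is the directory named in the question.

```python
import re
import sys
from pathlib import Path
from typing import Set

# Matches version tags such as GLIBC_2.3, GLIBC_2.3.4 or GLIBC_2.28.
GLIBC_TAG = re.compile(rb"GLIBC_2\.[0-9]+(?:\.[0-9]+)?")


def glibc_versions(data: bytes) -> Set[str]:
    """Return every distinct GLIBC_2.x tag found in a binary blob."""
    return {m.decode("ascii") for m in GLIBC_TAG.findall(data)}


def supports(versions: Set[str], required: str = "GLIBC_2.3") -> bool:
    """True if the exact version tag (e.g. GLIBC_2.3) is present."""
    return required in versions


if __name__ == "__main__":
    path = Path(sys.argv[1] if len(sys.argv) > 1 else "/usr/lib/x86_64-linux-gnu/libc.so.6")
    if path.is_file():
        found = glibc_versions(path.read_bytes())
        print(f"{path}: GLIBC_2.3 present = {supports(found)} ({len(found)} tags found)")
    else:
        print(f"{path} not found")
```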
Are you using the Open Identity Platform community version? I'm afraid Web Agent 5.6.2.0 and OpenAM 14.5.4 may be incompatible. Try an earlier Web Agent version, for example 4.1.1, or switch to OpenIG as an alternative to the Web Agent.
There are a couple of useful links below:
https://github.com/OpenIdentityPlatform/OpenAM/wiki/Quick-Start-Guide
https://github.com/OpenIdentityPlatform/OpenAM/wiki/How-to-Add-Authorization-and-Protect-Your-Application-With-OpenAM-and-OpenIG-Stack

IBM Bluemix - Unable to start a container even after job success

What is wrong with this deployment?
Why isn't a container created and running?
It is a fork of the ice-pipeline-demo project in IBM Bluemix.
----- START logs ------
LOGMET setup failed with return code 2
IMAGE_NAME: registry.ng.bluemix.net/fs_container_demo/infydevopsdemoimage:3
debconf: unable to initialize frontend: Dialog
debconf: (Dialog frontend will not work on a dumb terminal, an emacs shell buffer, or without a controlling terminal.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (This frontend requires a controlling tty.)
debconf: falling back to frontend: Teletype
dpkg-preconfigure: unable to re-open stdin:
Initialization complete
Init runtime of 0m 53s
Starting deployment script
git clone https://github.com/Osthanes/deployscripts.git deployscripts
Cloning into 'deployscripts'...
Deploying using clean strategy, for myApplicationName, deploy number 3
Cleaning up previous deployments. Will keep 1 versions active.
No previous deployments found to clean up
Container Information:
Group Id   Name   Status   Created   Updated   Port
Routes:
Getting routes as e-mail id ...
host   domain   apps
No routes found
Running Containers:
Container Id   Name   Group   Image   Created   State   Private IP   Public IP   Ports
(Use '-q' to display container names non-truncated)
IP addresses
Number of allocated public IP addresses: 0
Images:
Image Id                               Created                Virt Size   Image Name
5996bb6e51a11afbca89793940269abf8b7b   Oct 16 17:20:51 2015   0           registry.ng.bluemix.net/ibm-mobilefirst-starter:latest
ef21e9d1656c5c90b8cb74eff007d6bb3aa8   Aug 26 21:53:12 2015   0           registry.ng.bluemix.net/ibm-node-strong-pm:latest
2209a9732f35a906491005f87c130bb73e26   Jul 15 16:24:27 2015   0           registry.ng.bluemix.net/ibmliberty:latest
8f962f6afc9a30b646b9347ecb7f458bf75b   Jul 15 16:18:04 2015   8549240     registry.ng.bluemix.net/ibmnode:latest
90b7d9479645b76b9e359105985c9f47dc6f   Dec 7 04:25:31 2015    0           registry.ng.bluemix.net/fs_container_demo/infydevopsdemoimage:3
To send notifications, set SLACK_WEBHOOK_PATH or HIP_CHAT_TOKEN in the environment
Execution complete
Finished: SUCCESS
-----END ----
Thanks
Sachin
I suggest opening a support request directly from your Bluemix console using the support/help widget; that way the IBM Containers support team will be involved in checking and fixing this issue, and they can investigate your error in depth.
Please provide your org and space GUIDs and some details on the image you used (for example, the Dockerfile if you have it).
You can retrieve the org and space GUIDs with the CF CLI (once you are logged in):
cf org <orgname> --guid
cf space <spacename> --guid
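To collect both GUIDs for the support request in one go, a thin wrapper around the same two CLI calls is enough. This is a hypothetical helper (the function and script names are made up); it assumes the cf CLI is on the PATH and that you are already logged in and targeting the right org.

```python
import subprocess
import sys
from typing import Callable


def cf_guid(kind: str, name: str, run: Callable = subprocess.run) -> str:
    """Run 'cf <org|space> <name> --guid' and return the GUID it prints."""
    if kind not in ("org", "space"):
        raise ValueError(f"kind must be 'org' or 'space', not {kind!r}")
    # check=True raises CalledProcessError if cf exits non-zero (e.g. not logged in).
    result = run(["cf", kind, name, "--guid"], capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 guids.py <orgname> <spacename>
    org, space = sys.argv[1], sys.argv[2]
    print("org guid:  ", cf_guid("org", org))
    print("space guid:", cf_guid("space", space))
```

The run parameter only exists so the command construction can be checked without a real cf installation.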

Problems running Protractor on a Windows Jenkins Slave Node

I'm having an issue running Protractor on IE11 on a Windows Jenkins slave node.
When I connect by remote desktop, I'm able to run Protractor with no issues. However, when I try to run Protractor from Jenkins I run into this issue:
[launcher] Error: UnknownError: JavaScript error (WARNING: The server did not provide any stacktrace information)
Command duration or timeout: 396 milliseconds
Build info: version: '2.47.1', revision: '411b314', time: '2015-07-30 03:03:16'
System info: host: 'Win7', ip: '142.133.132.199', os.name: 'Windows 7', os.arch: 'amd64', os.version: '6.1', java.version: '1.8.0_60'
Driver info: org.openqa.selenium.ie.InternetExplorerDriver
Capabilities [{browserAttachTimeout=0, enablePersistentHover=true, ie.forceCreateProcessApi=false, pageLoadStrategy=normal, ie.usePerProcessProxy=false, ignoreZoomSetting=false, handlesAlerts=true, version=11, platform=WINDOWS, nativeEvents=true, ie.ensureCleanSession=false, elementScrollBehavior=0, ie.browserCommandLineSwitches=, requireWindowFocus=false, browserName=internet explorer, initialBrowserUrl=http://localhost:12492/, takesScreenshot=true, javascriptEnabled=true, ignoreProtectedModeSettings=true, enableElementCacheCleanup=true, cssSelectorsEnabled=true, unexpectedAlertBehaviour=dismiss}]
Session ID: a43ccc90-f9f7-4465-98c3-dfb88751a5a9
at new bot.Error (C:\Jenkins\workspace\sandbox\node_modules\protractor\node_modules\selenium-webdriver\lib\atoms\error.js:108:18)
at Object.bot.response.checkResponse (C:\Jenkins\workspace\sandbox\node_modules\protractor\node_modules\selenium-webdriver\lib\atoms\response.js:109:9)
at C:\Jenkins\workspace\sandbox\node_modules\protractor\node_modules\selenium-webdriver\lib\webdriver\webdriver.js:379:20
at Array.forEach (native)
at goog.async.run.processWorkQueue (C:\Jenkins\workspace\sandbox\node_modules\protractor\node_modules\selenium-webdriver\lib\goog\async\run.js:130:15)
at process._tickCallback (node.js:356:9)
[launcher] Process exited with error code 100
This is weird, because I can run the same tests using any other browsers. Here are some things I tried:
Made sure that Jenkins uses the same account as the one I log in with.
Followed the steps on this blog: http://elgalu.github.io/2014/run-protractor-against-internet-explorer-vm/
Changed the protected settings, zoom level and registry key as per this website: https://code.google.com/p/selenium/wiki/InternetExplorerDriver
Tried using the x86 version of the IEDriverServer
Is there anything that I am missing?
I found out that reading documentation is a good thing :) https://code.google.com/p/selenium/wiki/InternetExplorerDriver
Read the configuration section:
For IE 11 only, you will need to set a registry entry on the target computer so that the driver can maintain a connection to the instance of Internet Explorer it creates. For 32-bit Windows installations, the key you must examine in the registry editor is HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Internet Explorer\Main\FeatureControl\FEATURE_BFCACHE. For 64-bit Windows installations, the key is HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Internet Explorer\Main\FeatureControl\FEATURE_BFCACHE. Please note that the FEATURE_BFCACHE subkey may or may not be present, and should be created if it is not present. Important: Inside this key, create a DWORD value named iexplore.exe with the value of 0.
Worked for me...
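The registry change quoted above can also be scripted, e.g. when provisioning several Jenkins nodes. A minimal sketch using Python's Windows-only winreg module, assuming it runs from an elevated prompt with a Python build whose bitness matches the OS (on 64-bit Windows a 32-bit interpreter is subject to WOW64 registry redirection). The key choice follows the quoted rule; the function and script names are hypothetical.

```python
import sys

KEY_32BIT_WINDOWS = r"SOFTWARE\Microsoft\Internet Explorer\Main\FeatureControl\FEATURE_BFCACHE"
KEY_64BIT_WINDOWS = r"SOFTWARE\Wow6432Node\Microsoft\Internet Explorer\Main\FeatureControl\FEATURE_BFCACHE"


def bfcache_key_path(is_64bit_windows: bool) -> str:
    """Key path under HKEY_LOCAL_MACHINE, per the InternetExplorerDriver docs quoted above."""
    return KEY_64BIT_WINDOWS if is_64bit_windows else KEY_32BIT_WINDOWS


def disable_bfcache_for_iexplore(is_64bit_windows: bool) -> None:
    """Create FEATURE_BFCACHE if needed and set DWORD iexplore.exe = 0 (admin rights required)."""
    import winreg  # standard library, but only available on Windows

    path = bfcache_key_path(is_64bit_windows)
    # CreateKeyEx opens the key, creating it first if it does not exist yet.
    with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, path, 0, winreg.KEY_SET_VALUE) as key:
        winreg.SetValueEx(key, "iexplore.exe", 0, winreg.REG_DWORD, 0)


if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1] in ("32", "64"):
    # Usage (elevated prompt on the Jenkins node): python set_bfcache.py 64
    disable_bfcache_for_iexplore(sys.argv[1] == "64")
```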
I also had problems with Protractor tests under Jenkins: everything worked when I ran the tests from the console, but not in Jenkins.
It turned out that IEDriverServer does not work when Jenkins runs as a service (the default with the Jenkins Windows installer). For IE tests, Jenkins MUST NOT RUN AS A SERVICE; instead, stop the Jenkins service and start Jenkins with
java -jar jenkins.war
(in the Jenkins directory)
(see https://github.com/SeleniumHQ/selenium/wiki/InternetExplorerDriver)
Make sure the versions of your node modules, specifically selenium, are the same on both the local machine and the remote machine.
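One way to act on this is to dump the versions actually installed on each machine and compare them. A hypothetical helper, assuming the usual node_modules/<package>/package.json layout; in the stack trace above selenium-webdriver is nested under protractor/node_modules, so that nested path is what gets checked. Running npm ls on each machine gives similar information.

```python
import json
import sys
from pathlib import Path
from typing import Dict, Iterable, Optional, Tuple


def installed_versions(node_modules: str, packages: Iterable[str]) -> Dict[str, str]:
    """Read the 'version' field from <node_modules>/<package>/package.json."""
    root = Path(node_modules)
    versions: Dict[str, str] = {}
    for name in packages:
        manifest = root / name / "package.json"
        if manifest.is_file():
            versions[name] = json.loads(manifest.read_text(encoding="utf-8")).get("version", "unknown")
        else:
            versions[name] = "missing"
    return versions


def mismatches(a: Dict[str, str], b: Dict[str, str]) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Packages whose versions differ between two machines (None = not listed)."""
    return {n: (a.get(n), b.get(n)) for n in sorted(set(a) | set(b)) if a.get(n) != b.get(n)}


PACKAGES = ["protractor", "protractor/node_modules/selenium-webdriver"]

if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python versions.py C:\Jenkins\workspace\sandbox\node_modules
    print(json.dumps(installed_versions(sys.argv[1], PACKAGES), indent=2))
```

Save the JSON from the local machine and from the Jenkins node, load both, and pass them to mismatches.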
