IBM Bluemix - Unable to start a container even after job success


What is wrong with this deployment?
Why is no container created and running?
It is a fork of the ice-pipeline-demo project in IBM Bluemix.
----- START logs ------
LOGMET setup failed with return code 2
IMAGE_NAME: registry.ng.bluemix.net/fs_container_demo/infydevopsdemoimage:3
debconf: unable to initialize frontend: Dialog
debconf: (Dialog frontend will not work on a dumb terminal, an emacs shell buffer, or without a controlling terminal.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (This frontend requires a controlling tty.)
debconf: falling back to frontend: Teletype
dpkg-preconfigure: unable to re-open stdin:
Initialization complete
Init runtime of 0m 53s
Starting deployment script
git clone https://github.com/Osthanes/deployscripts.git deployscripts
Cloning into 'deployscripts'...
Deploying using clean strategy, for myApplicationName, deploy number 3
Cleaning up previous deployments. Will keep 1 versions active.
No previous deployments found to clean up
Container Information:
Group Id Name Status Created Updated Port
Routes:
Getting routes as e-mail id ...
host domain apps
No routes found
Running Containers:
Container Id Name Group Image Created State Private IP Public IP Ports
(Use '-q' to display container names non-truncated)
IP addresses
Number of allocated public IP addresses: 0
Images:
Image Id Created Virt Size Image Name
5996bb6e51a11afbca89793940269abf8b7b Oct 16 17:20:51 2015 0 registry.ng.bluemix.net/ibm-mobilefirst-starter:latest
ef21e9d1656c5c90b8cb74eff007d6bb3aa8 Aug 26 21:53:12 2015 0 registry.ng.bluemix.net/ibm-node-strong-pm:latest
2209a9732f35a906491005f87c130bb73e26 Jul 15 16:24:27 2015 0 registry.ng.bluemix.net/ibmliberty:latest
8f962f6afc9a30b646b9347ecb7f458bf75b Jul 15 16:18:04 2015 8549240 registry.ng.bluemix.net/ibmnode:latest
90b7d9479645b76b9e359105985c9f47dc6f Dec 7 04:25:31 2015 0 registry.ng.bluemix.net/fs_container_demo/infydevopsdemoimage:3
To send notifications, set SLACK_WEBHOOK_PATH or HIP_CHAT_TOKEN in the environment
Execution complete
Finished: SUCCESS
Finished: SUCCESS
-----END ----
Thanks
Sachin

I suggest you open a support request directly from your Bluemix console using the support/help widget. That way the IBM Containers support team gets involved and can investigate the error in depth and fix it.
Please provide the org and space GUIDs and some details on the image you used (for example the Dockerfile, if you have it).
You can retrieve the org and space GUIDs using the CF CLI (once you are logged in):
cf org <orgname> --guid
cf space <spacename> --guid
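
If both GUIDs have to be collected for a support request, the two commands above can be wrapped in a small script. This is only a minimal Python sketch: it assumes the cf CLI is on the PATH and a login session already exists, and the helper names guid_command and fetch_guid are hypothetical.

```python
import subprocess


def guid_command(kind, name):
    """Build the argument list for `cf org|space <name> --guid`."""
    if kind not in ("org", "space"):
        raise ValueError("kind must be 'org' or 'space'")
    return ["cf", kind, name, "--guid"]


def fetch_guid(kind, name):
    """Run the cf CLI and return the GUID it prints.

    Assumes cf is installed and you are already logged in.
    """
    result = subprocess.run(
        guid_command(kind, name), capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

For example, fetch_guid("org", "<orgname>") and fetch_guid("space", "<spacename>") would return the two values to paste into the request.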

Related

Azure edgeAgent not starting after upgrade to 1.3.0

After upgrading our edge devices to the latest iot-edge version (1.3.0), the edgeAgent container refuses to start. This in turn completely bricks the devices. The only option is to prune the agent container and images so it reverts to an older version again.
Has anyone experienced something similar, or does anyone have a suggestion on how to fix this?
Old situation:
Ubuntu 18.04 server on Amd64 hardware
IotEdge runtime version: 1.2.7
azureiotedge-hub:1.2.8
azureiotedge-agent:1.2.8
Running our modules without a problem.
New situation:
IotEdge runtime version: 1.3.0
azureiotedge-agent:1.3.0
azureiotedge-hub:1.2.8 (edgeAgent crashes before it upgrades to 1.3.0)
What happens:
After upgrading the iotEdge runtime to 1.3.0 everything works fine. Problems start after releasing the new iotedge-agent software. After deploying the new manifest to the devices the azureiotedge-agent:1.3.0 is being downloaded and started. It crashes because the service can't access the storage folder (/iotedge/storage/edgeAgent) which binds to the host machine.
I can follow the steps in the updated 'agentStart.sh' script:
I see a user 'edgeagentuser' with UID 13622 on the host has been created.
The ownership on the storage directory and management socket are being changed to '13622'.
The Edge Agent Service dll is being started and crashes.
The logs
iotedge check shows only a DNS server warning. Everything 'green' besides that.
iotEdgeAgent container logs
2022-07-19 08:23:29 Starting Edge Agent
2022-07-19 08:23:29 Changing ownership of storage folder: /iotedge/storage//edgeAgent to 13622
2022-07-19 08:23:29 Changing ownership of management socket: /var/run/iotedge/mgmt.sock
2022-07-19 08:23:29 Completed necessary setup. Starting Edge Agent.
2022-07-19 08:23:29.368 +00:00 Edge Agent Main()
<6> 2022-07-19 08:23:29.935 +00:00 [INF] - Initializing Edge Agent.
<6> 2022-07-19 08:23:30.473 +00:00 [INF] - Version - 1.3.0.57041647 (b022069058d21deb30c7760c4e384b637694f464)
<6> 2022-07-19 08:23:30.475 +00:00 [INF] -

[excluded the ASCII art]
<0> 2022-07-19 08:23:30.527 +00:00 [FTL] - Fatal error reading the Agent's configuration.
System.UnauthorizedAccessException: Access to the path '/iotedge/storage/edgeAgent' is denied.
---> System.IO.IOException: Permission denied
--- End of inner exception stack trace ---
at System.IO.FileSystem.CreateDirectory(String fullPath)
at System.IO.Directory.CreateDirectory(String path)
at Microsoft.Azure.Devices.Edge.Agent.Service.Program.GetOrCreateDirectoryPath(String baseDirectoryPath, String directoryName) in /mnt/vss/_work/1/s/edge-agent/src/Microsoft.Azure.Devices.Edge.Agent.Service/Program.cs:line 361
at Microsoft.Azure.Devices.Edge.Agent.Service.Program.MainAsync(IConfiguration configuration)

We are currently discussing this issue with other people on this thread:
https://github.com/Azure/iotedge/issues/6541
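
While that discussion is ongoing, the ownership change described in the logs can be checked from the host. A minimal Python sketch, assuming it is run on the host against whatever host directory is bound to /iotedge/storage/edgeAgent (that host path is not given in the post); the function name ownership_report is hypothetical and the UID 13622 is the one reported above.

```python
import os

EDGE_AGENT_UID = 13622  # UID of 'edgeagentuser' as reported in the question


def ownership_report(path, expected_uid=EDGE_AGENT_UID):
    """Return (owner_uid, owner_matches, writable_by_this_process) for path."""
    info = os.stat(path)
    return info.st_uid, info.st_uid == expected_uid, os.access(path, os.W_OK)
```

If the owner UID is not 13622 after the container has started, the chown step in agentStart.sh probably did not take effect on the bind mount, which would be consistent with the "Permission denied" in the stack trace.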

Install Jenkins as Windows Service when I already have existing Jenkins install?

We have an existing Jenkins install that I run from the command line. I want to start using it as a Windows Service instead, so that it launches when the machine restarts, without requiring someone to log in.
I have read about how to do it, but I am worried that it might break our existing setup, the jobs and other scripts that rely on the current location. Apparently when you go to Install Jenkins as a Windows Service, it asks you for a location for JENKINS_HOME.
Can I just give it the existing location? Will it just work or is there a danger of it wiping out what's there? And if I want to be safe and back up everything just in case, can I just make a copy of the existing .jenkins folder and then copy it back if something goes wrong? Or are there other files somewhere that I need to back up?
My question is basically the same as this one, which never got an answer:
Installing existing Jenkins as a Windows Service
Thanks

You should just be able to do this directly from the UI. (It used to be documented on the Jenkins wiki, but that's presently down.)
Fire up your command-line Jenkins (java -jar jenkins.war) and go to "Manage Jenkins" (${JENKINS_URL}/manage). You should see an "Install as Windows Service" icon.
Click on it and you arrive at ${JENKINS_URL}/install. Point it at your existing install and click "Install". You will get a prompt to restart as a service and then it restarts.
You're done. You should see in your logs the system restarting messages:
2021-09-10 00:25:44.077+0000 [id=96] INFO jenkins.model.Jenkins#cleanUp: Stopping Jenkins
2021-09-10 00:25:44.080+0000 [id=96] INFO jenkins.model.Jenkins$18#onAttained: Started termination
2021-09-10 00:25:44.099+0000 [id=96] INFO jenkins.model.Jenkins$18#onAttained: Completed termination
2021-09-10 00:25:44.100+0000 [id=96] INFO jenkins.model.Jenkins#_cleanUpDisconnectComputers: Starting node disconnection
2021-09-10 00:25:44.115+0000 [id=96] INFO jenkins.model.Jenkins#_cleanUpShutdownPluginManager: Stopping plugin manager
2021-09-10 00:25:44.115+0000 [id=96] INFO jenkins.model.Jenkins#_cleanUpPersistQueue: Persisting build queue
2021-09-10 00:25:44.127+0000 [id=96] INFO jenkins.model.Jenkins#_cleanUpAwaitDisconnects: Waiting for node disconnection completion
2021-09-10 00:25:44.127+0000 [id=96] INFO jenkins.model.Jenkins#cleanUp: Jenkins stopped
[.jenkins] $ C:\Users\ \.jenkins\jenkins.exe start
2021-09-09 17:25:45,153 INFO - Starting the service with id 'jenkins'
You should also now see the jenkins service running in Windows Services:
You can manage it via the Services UI, the command line via SC, or the jenkins.exe binary:
NOTE: The same security caveats about running as LocalSystem apply whether you use this mechanism or the MSI install. I recommend changing the service to run as a local user, which needs the "Log on as a service" right (see "Using the LocalSystem Account as a Service Logon Account" and "Why running a service as Local System is bad on windows"): Local Security Policy > Local Policies > User Rights Assignment > Log on as a service.
C:\>sc query jenkins
SERVICE_NAME: jenkins
TYPE : 10 WIN32_OWN_PROCESS
STATE : 4 RUNNING
(STOPPABLE, NOT_PAUSABLE, ACCEPTS_SHUTDOWN)
WIN32_EXIT_CODE : 0 (0x0)
SERVICE_EXIT_CODE : 0 (0x0)
CHECKPOINT : 0x0
WAIT_HINT : 0x0
C:\> sc stop jenkins
SERVICE_NAME: jenkins
TYPE : 10 WIN32_OWN_PROCESS
STATE : 3 STOP_PENDING
(STOPPABLE, NOT_PAUSABLE, ACCEPTS_SHUTDOWN)
WIN32_EXIT_CODE : 0 (0x0)
SERVICE_EXIT_CODE : 0 (0x0)
CHECKPOINT : 0x0
WAIT_HINT : 0x0
C:\> sc delete jenkins
[SC] DeleteService SUCCESS
C:\>
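
If the service state needs to be checked from a script rather than read by eye, the STATE line of the sc query output can be parsed. A minimal Python sketch; the function name service_state is hypothetical and the sample text is abbreviated from the output above.

```python
import re


def service_state(sc_output):
    """Extract the state name (e.g. 'RUNNING') from `sc query` output."""
    match = re.search(r"STATE\s*:\s*\d+\s+(\w+)", sc_output)
    return match.group(1) if match else None


sample = """SERVICE_NAME: jenkins
TYPE : 10 WIN32_OWN_PROCESS
STATE : 4 RUNNING
(STOPPABLE, NOT_PAUSABLE, ACCEPTS_SHUTDOWN)
"""
print(service_state(sample))  # prints RUNNING
```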
C:\Users\ \.jenkins> jenkins.exe /?
A wrapper binary that can be used to host executables as Windows services
Usage: winsw [/redirect file] <command> [<args>]
Missing arguments trigger the service mode
Available commands:
install install the service to Windows Service Controller
uninstall uninstall the service
start start the service (must be installed before)
stop stop the service
stopwait stop the service and wait until it's actually stopped
restart restart the service
restart! self-restart (can be called from child processes)
status check the current status of the service
test check if the service can be started and then stopped
testwait starts the service and waits until a key is pressed then stops the service
version print the version info
help print the help info (aliases: -h,--help,-?,/?)
Extra options:
/redirect redirect the wrapper's STDOUT and STDERR to the specified file
WinSW 2.9.0.0
More info: https://github.com/kohsuke/winsw
Bug tracker: https://github.com/kohsuke/winsw/issues
Images captured from 2.303.1 on Win 10 Enterprise; YMMV.
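
On the backup part of the question: copying the existing .jenkins folder before installing the service is cheap insurance. A minimal Python sketch, assuming Jenkins is stopped while the copy runs; the function name backup_jenkins_home is hypothetical, and whether anything outside JENKINS_HOME also needs backing up is not settled by this thread.

```python
import shutil
import time
from pathlib import Path


def backup_jenkins_home(jenkins_home, backup_root):
    """Copy the whole JENKINS_HOME tree into a timestamped folder; return its path."""
    source = Path(jenkins_home)
    if not source.is_dir():
        raise FileNotFoundError(f"{source} is not a directory")
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / f"{source.name}-backup-{stamp}"
    shutil.copytree(source, target)  # fails if target already exists
    return target
```

Restoring is the reverse: stop the service, then copy the backup folder back over the original location.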

Failed to connect Hyperledger Explorer to Fabric project

I have a Fabric project with a 7 org/5 channel setup, each org having 2 peers. Everything is up and running. Now I am trying to connect Hyperledger Explorer to view the blockchain data. However, I am facing an issue in the configuration part.
Steps I performed:
Pulled the images and added the following containers in a single docker-compose.yaml file for startup: hyperledger/explorer-db:latest, hyperledger/explorer:latest, prom/prometheus:latest, grafana/grafana:latest
Edited the created containers with the respective configurations needed and volume mounts.
volumes:
  - ./config.json:/opt/explorer/app/platform/fabric/config.json
  - ./connection-profile:/opt/explorer/app/platform/fabric/connection-profile/
  - ./crypto-config:/tmp/crypto
  - walletstore:/opt/wallet
Since it's a multi-org setup, I edited the config.json file and pointed each entry to the respective connection profile as per the organization setup
{
"network-configs": {
"org1-network": {
"name": "Sample-1",
"profile": "./connection-profile/org1-network.json"
}, and so on for other orgs
Edited the prometheus.yml to put in the static configurations
static_configs:
  - targets: ['localhost:8443','localhost:8444', and so on for every peer service]
  - targets: ['orderer0-service:8443','orderer1-service:8444', and so on for every orderer service]
Edited the peer services in my docker-compose.yaml file to add in the below values on each peer config
CORE_OPERATIONS_LISTENADDRESS=0.0.0.0:9449 # RESTful API for Hyperledger Explorer
CORE_METRICS_PROVIDER=prometheus # Prometheus will pull metrics
Issue: (Now resolved - see below)
It seems that Explorer isn't able to find my 'Admin@org1-cert.pem' path in the given location. But I double-checked everything, and that particular path is present and accessible. All permissions on that path are also open to avoid any permission issue.
Path in question [the full path is provided, not the relative path]: /home/auro/Desktop/HLF/fabricapp/crypto-config/peerOrganizations/org1/users/Admin@org1/msp/signcerts/Admin@org1-cert.pem
The config files are also set up properly. I am unable to find a way to correct this. I would be really glad if someone could tell me what is going on with this path issue, because I tried everything I could think of and still can't get it working.
Other details:
Using Hyperledger Explorer - v1.1.0 - Pulling the latest docker image
Using Hyperledger Fabric - v1.4.6 - Pulling the specific version from docker hub for this
Update: Okay, I managed to solve this. Apparently the path to be given in the config file isn't the path on the local system but the path inside the docker container. I replaced it with the path in my docker container where the files are placed, and it worked.
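
The host-to-container translation behind that fix can be sketched in a few lines of Python. This assumes the ./crypto-config:/tmp/crypto volume mount listed earlier and that ./crypto-config resolves to /home/auro/Desktop/HLF/fabricapp/crypto-config on the host; the function name to_container_path is hypothetical.

```python
from pathlib import PurePosixPath


def to_container_path(host_path, host_mount, container_mount):
    """Translate a host path under host_mount into its path inside the container."""
    # relative_to raises ValueError when host_path is not under host_mount
    relative = PurePosixPath(host_path).relative_to(PurePosixPath(host_mount))
    return str(PurePosixPath(container_mount) / relative)


print(to_container_path(
    "/home/auro/Desktop/HLF/fabricapp/crypto-config/peerOrganizations/org1/users",
    "/home/auro/Desktop/HLF/fabricapp/crypto-config",
    "/tmp/crypto",
))
# prints /tmp/crypto/peerOrganizations/org1/users
```

Any certificate or key path in the connection profile would need the same translation, since Explorer resolves it from inside its own container.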
New Problem -1: (Now solved) Now I am getting the error shown below, highlighted in yellow.
I had a look at peer-0-org-1-service node logs when this happened and this is the error it had logged.
2020-07-20 04:38:15.995 UTC [core.comm] ServerHandshake -> ERRO 028 TLS handshake failed with error tls: first record does not look like a TLS handshake server=PeerServer remoteaddress=172.18.0.53:33300
Update: Okay, I managed to solve this too. There were 2 issues. The TLS handshake wasn't happening because the TLS setting wasn't set to true in the config. The second issue, "STREAM removed", happened because the url in the config wasn't specified with a grpc scheme. Once both changes were made, the problem was resolved.
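
A quick check for the second of those issues can be sketched in Python. It assumes a connection profile with a top-level "peers" map whose entries carry a "url" field, and that TLS is enabled so grpcs:// is the expected scheme (with TLS off it would be grpc://). The function name and the sample profile are hypothetical, not taken from the actual config in this post.

```python
def peers_with_wrong_scheme(profile, scheme="grpcs://"):
    """Return the sorted names of peers whose url does not start with scheme."""
    peers = profile.get("peers", {})
    return sorted(
        name for name, cfg in peers.items()
        if not str(cfg.get("url", "")).startswith(scheme)
    )


# Hypothetical profile fragment; peer names and ports are illustrative only.
profile = {
    "peers": {
        "peer0-org1-service": {"url": "grpcs://peer0-org1-service:7051"},
        "peer0-org2-service": {"url": "peer0-org2-service:7051"},
    }
}
print(peers_with_wrong_scheme(profile))  # prints ['peer0-org2-service']
```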
New Problem -2: (Current Issue)
The channel issue still seems to be there. It still shows "not assigned to this channel", plus a new error, "Error: 14 UNAVAILABLE: failed to connect to all addresses". The same error occurs for all the peers across the 7 orgs.
On top of that, the peers are suddenly unable to talk to each other.
Error Received: Could not connect to Endpoint: peer0-org2-service:7051, InternalEndpoint: peer0-org2-service:7051, PKI-ID: , Metadata: : context deadline exceeded
I checked the peer channel connection details and everything seems to be in order. I am stuck on this for now. Let me know if anyone has any ideas.

As you can see from the edits, I got one problem solved before another came along. After banging my head against it for a long time, I removed the entire build and rebuilt it with the corrections given above, and it simply started working.

You seem to be using an old Explorer image. I strongly recommend using the latest one, v1.1.1. Note: there are some updates to the settings format in the connection profile (e.g. the login credentials for Explorer). Please refer to README-CONFIG for details.

Unable to create machine in docker

I've just installed Docker on my Windows 7 machine. When I start Docker QuickStart, I get the following error, which seems to occur while creating the machine:
Creating machine...
(default) Unable to get the latest Boot2Docker ISO release version: Get https://api.github.com/repos/boot2docker/boot2docker/releases/latest: dial tcp 192.30.252.124:443: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
(default) Copying C:\Users\robot\.docker\machine\cache\boot2docker.iso to C:\Users\robot\.docker\machine\machines\default\boot2docker.iso...
(default) Creating VirtualBox VM...
(default) Creating SSH key...
Error attempting heartbeat call to plugin server: read tcp 127.0.0.1:60733->127.0.0.1:60732: wsarecv: An existing connection was forcibly closed by the remote host.
Error attempting heartbeat call to plugin server: connection is shut down
Error attempting heartbeat call to plugin server: connection is shut down
Error attempting heartbeat call to plugin server: connection is shut down
Error attempting heartbeat call to plugin server: connection is shut down
Error creating machine: Error in driver during machine creation: read tcp 127.0.0.1:60733->127.0.0.1:60732: wsarecv: An existing connection was forcibly closed by the remote host.
Looks like something went wrong... Press any key to continue...

There is a similar issue in docker/machine/issues/2773.
Try and see if the issue persists when creating a machine yourself instead of using quick-start:
Find where docker-machine.exe has been installed (or copy the latest release to a folder on your %PATH%) and use that from a regular CMD session:
First test the existing machine:
# find the name of the machine created.
docker-machine ls
docker-machine env --shell cmd <nameOfTheMachine>
docker-machine ssh <nameOfTheMachine>
Then try creating a new one:
docker-machine create -d virtualbox <aNewMachine>
docker-machine env --shell cmd <aNewMachine>
docker-machine ssh <aNewMachine>

I do not have a solution, but I found the root cause.
I had installed boot2docker and had been using it for months. I had been creating all my vbox images in the same folder all the while.
One fine day I decided to archive my machines and changed the folder in which I was creating the vbox images. It started giving this weird error. I restored my archive and tested again, and it started working fine.
The difference I found between the two setups was that in the archived folder it skipped the CA cert creation step and created the machine directly. In the new folder it created a cert and then created the machine. It looks like the server doesn't like the new certs!

DataStax AMI installed in VPC hangs at Installation started

I experienced this issue today with ami-ada2b6c4 when creating a new instance in a VPC without a public IP: I logged in via SSH and the terminal hung on "Installation started".
I used Ctrl-C to interrupt and this printed on the terminal:
Installation started ^CTraceback (most recent call last):
File "datastax_ami/ds4_motd.py", line 239, in <module>
run()
File "datastax_ami/ds4_motd.py", line 228, in run
waiting_for_status()
File "datastax_ami/ds4_motd.py", line 100, in waiting_for_status
time.sleep(5)
KeyboardInterrupt
I followed a link to GitHub posted by joaquin for a similar problem and added an entry to /etc/hosts. I logged off the instance and then reconnected. This time I got a different error.
Raiding complete
Waiting for nodetool...
The cluster is now in it's finalization phase. This should only take a moment...
Note: You can also use CTRL+C to view the logs if desired:
AMI log: ~/datastax_ami/ami.log
Cassandra log: /var/log/cassandra/system.log
Note: Ownership information does not include topology; for complete information, specify a keyspace
Datacenter: Solr
================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Owns Host ID Token Rack
UN 10.0.10.92 53 KB 100.0% c010f1d3-3d74-4c2b-ae88-9e3fecfc447c -9223372036854755808 rack1
Opscenter: http://10.0.10.92:8888/
Please wait 60 seconds if this is the cluster's first start...
Tools:
Run: datastax_tools
Demos:
Run: datastax_demos
Support:
Run: datastax_support
------------------------------------
DataStax AMI for DataStax Enterprise
and DataStax Community
AMI version 2.5
DataStax Enterprise version 4.5.2-1
------------------------------------
These notices occurred during the startup of this instance:
[ERROR] 10/15/14-18:16:08 git pull:
error: Failed connect to github.com:443; Connection timed out while accessing https://github.com/riptano/ComboAMI.git/info/refs
The security group does allow access to the internet - I was able to sudo apt-get update, for example.

A couple of learnings from my experience. Joaquin from Datastax was very helpful with his suggestions.
I passed in userdata using the advanced config in AWS Management GUI -
--clustername EIP_ami --totalnodes 3 --version enterprise --username some.guy_some.com --password changeme --searchnodes 3
but I didn't change the number of instances to 3 in the GUI, so only one node was created. This may have contributed to the problem.
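
That mismatch between --totalnodes and the launched instance count is easy to catch before launching. A minimal Python sketch that parses only the options shown in the user data above; the function names are hypothetical and this is not part of the DataStax AMI tooling.

```python
import argparse
import shlex


def parse_userdata(userdata):
    """Parse the AMI user data options shown in the post into a namespace."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--clustername")
    parser.add_argument("--totalnodes", type=int)
    parser.add_argument("--version")
    parser.add_argument("--username")
    parser.add_argument("--password")
    parser.add_argument("--searchnodes", type=int)
    # Unknown options are ignored rather than treated as errors.
    options, _unknown = parser.parse_known_args(shlex.split(userdata))
    return options


def instance_count_matches(userdata, launched_instances):
    """True when --totalnodes equals the number of instances being launched."""
    return parse_userdata(userdata).totalnodes == launched_instances
```

With the user data above, instance_count_matches(userdata, 1) is False, which flags the situation described here: three nodes requested, one instance launched.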
After interrupting the initial hanging installation -
added this entry to /etc/hosts:
127.0.1.1 ip-10-0-1-234
added an elastic IP to the instance and rebooted
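
The /etc/hosts line above follows the ip-a-b-c-d hostname pattern derived from the instance's private IP. A minimal Python sketch that builds the line for a given address; the function name hosts_entry is hypothetical, and the 127.0.1.1 loopback address is simply the one used in this post.

```python
def hosts_entry(private_ip, loopback="127.0.1.1"):
    """Build the /etc/hosts line for a hostname such as ip-10-0-1-234."""
    octets = private_ip.split(".")
    if len(octets) != 4 or not all(o.isdigit() and int(o) <= 255 for o in octets):
        raise ValueError(f"not an IPv4 address: {private_ip!r}")
    return f"{loopback} ip-{'-'.join(octets)}"


print(hosts_entry("10.0.1.234"))  # prints 127.0.1.1 ip-10-0-1-234
```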
I got to the 2nd error:
These notices occurred during the startup of this instance:
[ERROR] 10/15/14-18:16:08 git pull:
error: Failed connect to github.com:443; Connection timed out while accessing https://github.com/riptano/ComboAMI.git/info/refs
When I got to a bash prompt, I tested the git pull manually per Joaquin's advice:
ubuntu@ip-10-0-1-234:~/datastax_ami$ git pull
Already up-to-date.
per Joaquin:
Yes, that error message will stick around for the life of the machine.
Perhaps the git pull issue was a fluke.
Nodetool status says the node is up and acting normally. So, the learning here is to ignore the errors - none of them seem to affect the creation or operation of the node.
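
To tell a transient timeout from a real routing problem in the VPC, the github.com:443 connection from the notice can be retried by hand. A minimal Python sketch using only the standard library; the function name can_connect is hypothetical.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and name resolution failures.
        return False
```

Running can_connect("github.com", 443) on the instance and getting True would suggest that the startup error was a one-off, in line with the manual git pull succeeding.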
