I am using the following command to start the MLflow server:
mlflow server --backend-store-uri postgresql://mlflow_user:mlflow@localhost/mlflow --artifacts-destination <S3 bucket location> --serve-artifacts -h 0.0.0.0 -p 8000
Before production deployment, we have a requirement to print or fetch the configuration the server is running under. For example, the above command uses a localhost Postgres connection and an S3 bucket.
Is there a way to achieve this?
Also, how do I set the server's environment to "production"? In the end, I should see a log like this:
[LOG] Started MLflow server:
Env: production
postgres: localhost:5432
S3: <S3 bucket path>
You can wrap it in a bash script or in a Makefile script, e.g.
start_mlflow_production_server:
#echo "Started MLflow server:"
#echo "Env: production"
#echo "postgres: localhost:5432"
#echo "S3: <S3 bucket path>"
#mlflow server --backend-store-uri postgresql://mlflow_user:mlflow#localhost/mlflow --artifacts-destination <S3 bucket location> --serve-artifacts -h 0.0.0.0 -p 8000
Additionally, you can set environment variables specific to that server, print them, and use them in the command.
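For example, a minimal sketch of such a wrapper script (the variable names below are placeholders I made up, not MLflow settings; substitute your real bucket and credentials):

#!/bin/bash
# Hypothetical wrapper: DEPLOY_ENV, BACKEND_STORE_URI and ARTIFACTS_DESTINATION are
# example variable names, not MLflow configuration keys.
export DEPLOY_ENV="production"
export BACKEND_STORE_URI="postgresql://mlflow_user:mlflow@localhost:5432/mlflow"
export ARTIFACTS_DESTINATION="s3://<your-bucket>/<path>"

echo "[LOG] Started MLflow server:"
echo "Env: ${DEPLOY_ENV}"
echo "postgres: localhost:5432"
echo "S3: ${ARTIFACTS_DESTINATION}"

mlflow server \
  --backend-store-uri "${BACKEND_STORE_URI}" \
  --artifacts-destination "${ARTIFACTS_DESTINATION}" \
  --serve-artifacts -h 0.0.0.0 -p 8000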
I am trying to deploy using Capistrano 3.x.
I configured agent forwarding in my ~/.ssh/config file:
Host git-codecommit.*.amazonaws.com
Hostname xxxx
ForwardAgent yes
IdentityFile /path/to/codecommit_rsa
I did the same thing for my server connection with ForwardAgent yes also.
I verified my server allows agent forwarding in the /etc/ssh/sshd_config file also:
AllowAgentForwarding yes
INFO ----------------------------------------------------------------
INFO START 2017-11-18 16:09:44 -0500 cap production deploy
INFO ---------------------------------------------------------------------------
INFO [b43ed70f] Running /usr/bin/env mkdir -p /tmp as deploy@50.116.2.15
DEBUG [b43ed70f] Command: /usr/bin/env mkdir -p /tmp
INFO [b43ed70f] Finished in 1.132 seconds with exit status 0 (successful).
DEBUG Uploading /tmp/git-ssh-testapp-production-blankman.sh 0.0%
INFO Uploading /tmp/git-ssh-testapp-production-blankman.sh 100.0%
INFO [b1a90dc1] Running /usr/bin/env chmod 700 /tmp/git-ssh-testapp-production-blankman.sh as deploy@50.116.2.15
DEBUG [b1a90dc1] Command: /usr/bin/env chmod 700 /tmp/git-ssh-testapp-production-blankman.sh
INFO [b1a90dc1] Finished in 0.265 seconds with exit status 0 (successful).
INFO [b323707d] Running /usr/bin/env git ls-remote ssh://git-codecommit.us-east-1.amazonaws.com/v1/repos/fuweb HEAD as deploy@50.116.2.15
DEBUG [b323707d] Command: ( export GIT_ASKPASS="/bin/echo" GIT_SSH="/tmp/git-ssh-testapp-production-blankman.sh" ; /usr/bin/env git ls-remote ssh://git-codecommit.us-east-1.amazonaws.com/v1/repos/fuweb HEAD )
DEBUG [b323707d] Permission denied (publickey).
DEBUG [b323707d] fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
What am I missing here?
You need to make Capistrano aware that you expect it to forward your local key. This can be done by going into your project's config/deploy.rb and adding this line:
set :ssh_options, { forward_agent: true }
IIRC, Capistrano executes commands remotely through SSHKit, so even if you invoke the ssh-agent and add a key locally, I can't say if it will persist for the next command.
As discussed in the comments, an SSH agent must run on the remote server as well as on the local machine that contains the key because the agents at each end need to cooperate to forward the key information. The agent (ssh-agent) is different from the SSH server (sshd). The server accepts connections, while the (otherwise optional) agent manages credentials.
Some systems start an agent automatically upon login. To check if this is the case, log in to the server and run:
$ env | grep SSH
...looking for variables like SSH_AGENT_PID or SSH_AUTH_SOCK. If the agent isn't started, we can execute the following command to start it on the server:
$ eval "$(ssh-agent)"
As we can see, this evaluates the output of the ssh-agent command because ssh-agent returns a script that sets some needed environment variables in the session.
We'll need to make sure the agent starts automatically upon login so that it doesn't interfere with the deploy process. If we checked and determined that the agent does not, in fact, start on login, we can add the last command to the "deploy" user's ~/.profile file (or ~/.bash_profile).
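A minimal sketch of that addition (assuming the stock OpenSSH ssh-agent; the guard avoids starting a second agent when a socket, including a forwarded one, is already present):

# in the deploy user's ~/.profile (or ~/.bash_profile)
if [ -z "$SSH_AUTH_SOCK" ]; then
    eval "$(ssh-agent)" > /dev/null
fi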
Note also that the host specified in the local ~/.ssh/config must match the name or IP address of the host that we want to forward credentials to, not the host that ultimately authenticates using the forwarded key. We need to change:
Host git-codecommit.*.amazonaws.com
...to:
Host 50.116.2.15
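So the relevant stanza in the local ~/.ssh/config would look roughly like this (a sketch; ForwardAgent is the important setting here, and the CodeCommit key itself only needs to be loaded into the local agent with ssh-add):

Host 50.116.2.15
    ForwardAgent yes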
We can verify that the SSH client performs agent forwarding by checking the verbose output:
$ ssh -v deploy@50.116.2.15
...
debug1: Requesting authentication agent forwarding.
...
Of course, be sure to register any needed keys with the local agent by using ssh-add (this can also be done automatically when logging in as shown above). We can check which keys the agent loaded at any time with:
$ ssh-add -l
This usually helps me:
ssh-add -D
eval "$(ssh-agent)"
ssh-add
In my CI chain I execute end-to-end tests after a "docker-compose up". Unfortunately my tests often fail because even if the containers are properly started, the programs contained in my containers are not.
Is there an elegant way to verify that my setup is completely started before running my tests?
You could poll the required services to confirm they are responding before running the tests.
curl has built-in retry logic, or it's fairly trivial to build retry logic around some other type of service test.
#!/bin/bash
await(){
  local url=${1}
  local seconds=${2:-30}
  curl --max-time 5 --retry 60 --retry-delay 1 \
       --retry-max-time ${seconds} "${url}" \
       || exit 1
}
docker-compose up -d
await http://container_ms1:3000
await http://container_ms2:3000
run-ze-tests
The alternative to polling is an event-based system.
If all your services push notifications to an external service (scaeda gave the example of a log file, or you could use something like Amazon SNS), your services emit a "started" event. Then you can subscribe to those events and run whatever you need once everything has started.
Docker 1.12 did add the HEALTHCHECK Dockerfile instruction. Maybe this is available via Docker events?
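As a rough sketch of that idea, if your images define a HEALTHCHECK you can poll the reported health state from your CI script instead of probing the service directly (the container name and timeout below are placeholders):

# wait until a container with a HEALTHCHECK reports "healthy", or fail after ~60s
wait_healthy() {
  local container=${1} tries=${2:-60}
  for i in $(seq "${tries}"); do
    if [ "$(docker inspect --format '{{.State.Health.Status}}' "${container}")" = "healthy" ]; then
      return 0
    fi
    sleep 1
  done
  return 1
}
wait_healthy container_ms1 || exit 1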
If you have control over the Docker engine in your CI setup, you could execute docker logs [Container_Name] and read out the last line, which could be emitted by your application.
RESULT=$(docker logs [Container_Name] 2>&1 | grep [Search_String])
logs output example:
Agent pid 13
Enter passphrase (empty for no passphrase): Enter same passphrase again: Identity added: id_rsa (id_rsa)
#host SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.6
#host SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.6
parse specific line:
RESULT=$(docker logs ssh_jenkins_test 2>&1 | grep Enter)
result:
Enter passphrase (empty for no passphrase): Enter same passphrase again: Identity added: id_rsa (id_rsa)
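A sketch of the same idea with a retry loop, so the CI step waits for the expected line instead of checking the logs only once (the container name and search string are the ones from the example above and are placeholders):

# poll the container logs until the start-up message appears, or fail after ~60s
ready=false
for i in $(seq 60); do
  if docker logs ssh_jenkins_test 2>&1 | grep -q "Identity added"; then
    ready=true
    break
  fi
  sleep 1
done
"$ready" || exit 1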
Before anything, please note that I have found several similar questions on Stack Overflow and articles all over the web, but none of those helped me fix my issue:
PG Error could not connect to server: Connection refused Is the server running on port 5432?
PG::ConnectionBad - could not connect to server: Connection refused
psql: could not connect to server: Connection refused
Now, here is the issue:
I have a Rails app that works like a charm.
With my collaborator, we use GitHub to work together.
We have a master and an mvp branches.
I recently updated my git version with Homebrew (Mac).
We use Foreman to start our app locally.
Now, when I try to launch the app locally, I get the following error:
PG::ConnectionBad at /
could not connect to server: Connection refused
Is the server running on host "localhost" (::1) and accepting
TCP/IP connections on port 5432?
could not connect to server: Connection refused
Is the server running on host "localhost" (127.0.0.1) and accepting
TCP/IP connections on port 5432?
I tried to reboot my computer several times.
I also checked the content of /usr/local/var/postgres:
PG_VERSION pg_dynshmem pg_multixact pg_snapshots pg_tblspc postgresql.conf
base pg_hba.conf pg_notify pg_stat pg_twophase postmaster.opts
global pg_ident.conf pg_replslot pg_stat_tmp pg_xlog server.log
pg_clog pg_logical pg_serial pg_subtrans postgresql.auto.conf
As you can see, there is no postmaster.pid file in there.
Any idea how I could fix this?
run postgres -D /usr/local/var/postgres and you should see something like:
FATAL: lock file "postmaster.pid" already exists
HINT: Is another postmaster (PID 379) running in data directory "/usr/local/var/postgres"?
Then run kill -9 <PID>, using the PID shown in the HINT.
And you should be good to go.
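As a shortcut, the PID shown in the HINT is the first line of the stale lock file, so the two steps can be combined (a sketch assuming the Homebrew data directory from the question):

kill -9 "$(head -1 /usr/local/var/postgres/postmaster.pid)"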
You most likely ran out of battery and your PostgreSQL server didn't shut down correctly.
The easiest workaround is to download the official Postgres.app and launch it: it will force the server to start (http://postgresapp.com/).
Most likely it's because your system shut down unexpectedly.
Try
postgres -D /usr/local/var/postgres
You might see
FATAL: lock file "postmaster.pid" already exists
HINT: Is another postmaster (PID 449) running in data directory "/usr/local/var/postgres"?
Then try
kill -9 PID
example
kill -9 449
And it should start postgres normally
The postgresql server might be down and the solution might be as simple as running:
sudo service postgresql start
which fixed the issue for me.
This could be caused by the pid file created for postgres, which has not been deleted due to an unexpected shutdown. To fix this, remove the pid file.
Find the postgres data directory. On a Mac using Homebrew it is /usr/local/var/postgres/; on other systems it might be /usr/var/postgres/.
Remove pid file by running:
rm postmaster.pid
Restart postgres. On Mac, run:
brew services restart postgresql
I had almost the same error with my Ruby on Rails application running PostgreSQL (Mac). This worked for me:
brew services restart postgresql
This worked in my case:
brew uninstall postgresql
rm -fr /usr/local/var/postgres/
brew install postgresql
In my case PostgreSQL had been updated from version 13.4 to 14 in the background, so it was fixed by:
brew postgresql-upgrade-database
In other cases the problem is fixed by:
rm -rf /usr/local/var/postgres/postmaster.pid
or
rm -rf /opt/homebrew/var/postgres/postmaster.pid
Restart service postgresql:
brew services restart postgresql
PS:
How can you figure out what the problem is?
First, see which service did not start correctly:
brew services list
Second, look at the postgres.log file, which will contain the error:
tail -f /usr/local/var/log/postgres.log
or
tail -f /opt/homebrew/var/log/postgres*
Then search for a solution based on that error's text.
I resolved the issue via this command
pg_ctl -D /usr/local/var/postgres start
At times, you might get this error
pg_ctl: another server might be running; trying to start server anyway
So, try running the following command and then run the first command given above.
pg_ctl -D /usr/local/var/postgres stop
Step 1:
cd /etc/postgresql/12/main/
Open the file named postgresql.conf:
sudo nano postgresql.conf
Add this line to that file:
listen_addresses = '*'
Then open the file named pg_hba.conf:
sudo nano pg_hba.conf
And add this line to that file:
host all all 0.0.0.0/0 md5
It allows access to all databases for all users with an encrypted password
Then restart your server:
sudo /etc/init.d/postgresql restart
This is how I solved my problem:
see the status of services
brew services list
and the output was:
Name Status User Plist
postgresql error myuser /Users/myuser/Library/LaunchAgents/homebrew.mxcl.postgresql.plist
unbound stopped
I changed the file name at this path (you could also remove it):
mv /Users/myuser/Library/LaunchAgents/homebrew.mxcl.postgresql.plist /Users/myuser/Library/LaunchAgents/homebrew.mxcl.postgresql.plist_temp
and then rebooted the OS:
sudo reboot
After booting, I started PostgreSQL and it worked:
brew services start postgresql
Find the postgresql@10 service directory:
$ ls /usr/local/var/postgresql@10
Find the file postmaster.pid and delete it:
$ rm -f postmaster.pid
Restart the postgres service using:
$ brew services restart postgresql@10
This worked for me:
run
sudo lsof -i :<port_number>
After running it, it will display the PID currently attached to that port.
After that, run sudo kill -9 <PID>.
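For example, both steps can be combined in one line (a sketch assuming PostgreSQL's default port 5432; the -t flag makes lsof print only the PID):

sudo kill -9 "$(sudo lsof -t -i :5432)"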
If that doesn't work,
try the solution offered by user8376606; it would definitely work!
If you want to restart Postgresql on Linux, then you have to use the following command.
/etc/init.d/postgresql restart
In my case,
I had changed the port in the postgresql.conf file
and restarted the PostgreSQL service via
Run => services.msc => Restart.
Now retry.
First I tried
lsof -wni tcp:5432
but it didn't show any PID number.
Second I tried
postgres -D /usr/local/var/postgres
and it showed that the server was listening.
So I just restarted my Mac to restore all the ports, and it worked for me.
For Docker users: In my case it was caused by excessive Docker image size. You can remove unused data using the prune command:
docker system prune --all --force --volumes
Warning: as per manual (docker system prune --help):
This will remove:
all stopped containers
all networks not used by at least one container
all anonymous volumes not used by at least one container (since --volumes is passed)
all dangling images
all dangling build cache
I encountered a similar problem when I was trying to connect my Django application to a PostgreSQL database.
I wrote my Dockerfile with instructions to set up the Django project, followed by instructions in my docker-compose.yml to install PostgreSQL and run the Django server.
I defined two services in my docker-compose.yml:
services:
  postgres:
    image: "postgres:latest"
    environment:
      - POSTGRES_DB=abc
      - POSTGRES_USER=abc
      - POSTGRES_PASSWORD=abc
    volumes:
      - pg_data:/var/lib/postgresql/data/
  django:
    build: .
    command: python /code/manage.py runserver 0.0.0.0:8004
    volumes:
      - .:/app
    ports:
      - 8004:8004
    depends_on:
      - postgres

volumes:
  # top-level declaration for the named volume referenced above
  pg_data:
Unfortunately, whenever I ran docker-compose up, the same error popped up.
And this is how my database was defined in Django's settings.py:
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'abc',
        'USER': 'abc',
        'PASSWORD': 'abc',
        'HOST': '127.0.0.1',
        'PORT': '5432',
        'OPTIONS': {
            'client_encoding': 'UTF8',
        },
    }
}
So, in the end I made use of docker-compose networking, which means that changing the host of my database to postgres, the service name defined in docker-compose.yml, does the trick.
Replacing 'HOST': '127.0.0.1' with 'HOST': 'postgres' did wonders for me.
After the replacement, this is how your database config in settings.py will look:
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'abc',
        'USER': 'abc',
        'PASSWORD': 'abc',
        'HOST': 'postgres',
        'PORT': '5432',
        'OPTIONS': {
            'client_encoding': 'UTF8',
        },
    }
}
I often encounter this problem on Windows. The way I solved it: open Services, click PostgreSQL Database Server 8.3, click the second tab ("Log On"), and choose the first option, "Local System account".
It also gives the same error if you just stop your PostgreSQL app. You just need to start it again. (PostgreSQL 11)
I faced the same issue: I was unable to start the PostgreSQL server and unable to access my db even after giving the password, and I had tried all the possible ways.
This solution worked for me,
For the Ubuntu users:
Through the command line, type the following commands:
1. service --status-all (which gives a list of all services and their status, where "+" means running and "-" means the service is no longer running)
Check the postgresql status; if it's "-", then type the following command:
2. systemctl start postgresql (starts the server again)
Refresh the postgresql page in the browser, and it works.
For the Windows users:
Search for Services, where we can see the list of services; right-click on postgresql, click Start, and the server works perfectly fine.
In my case, I forgot to change the database from postgres (on my production) back to sqlite3, which I was using for development.
This worked for me (Node.js app):
user@MacBook-Pro % sudo lsof -i :5430
Output
COMMAND PID user FD TYPE DEVICE SIZE/OFF NODE NAME
node 7885 user 21u IPv6 0x2e7d89f6118f95b9 0t0 TCP *:radec-corp (LISTEN)
Kill the PID:
user@MacBook-Pro % sudo kill -9 7885
One more test:
user@MacBook-Pro % sudo lsof -i :5430
user@MacBook-Pro % "No more running PID for the port 5430"
In my case, on a Ruby on Rails project, I removed a .pid file from the folder tmp/pids and restarted the system.
Had the same issue. I checked my database.yml file and the (dev mode) port was pointing to 5433. I updated it to 5432 and it worked.
Just in case someone needs this for windows, read on.
On Windows, hit the Windows key + R,
then enter services.msc and look for postgresql-x64-14; right-click it and click Start.
Then go back to pgAdmin 4 for Windows and enter your master password if asked.
From here, you should be able to proceed as usual with viewing of the db schemas.
Also, for Django, restart your server with CTRL+C then python manage.py runserver (assuming you're working inside a virtual env).
Good luck
ps -ef | grep postgres
Then kill the process with PID
sudo kill -9 PID
Then start the postgresql
sudo service postgresql start
In my case, when it happens, I need to do the following steps:
Step 1
Log in as the postgres user:
# sudo su postgres
Step 2
Run the following command: /opt/PostgreSQL/10/bin/postgres -D /opt/PostgreSQL/10/data -r /usr/local/var/postgres/server.log
Explanation:
We run the postgres utility located at /opt/PostgreSQL/10/bin/; in your case it could be somewhere else, so identify where it is.
With the -D option we tell postgres where its data folder is; this data folder contains all the necessary configuration of the postgres server.
With the -r option we tell postgres to send stdout and stderr to the given file; in my case the file I used is /usr/local/var/postgres/server.log.
Note:
I'm using PostgreSQL 10
on Ubuntu Linux
I am new to nagios.
I am trying to configure the "check_disk" service for one host but I am not getting the expected results.
I should get emails when disk usage goes beyond 80%.
So, there is already a service defined for this task with multiple hosts, as below:
define service{
        use                     local-service         ; Name of service template to use
        host_name               localhost, host1, host2, host3, host4, host5, host6
        service_description     Root Partition
        check_command           check_local_disk!20%!10%!/
        contact_groups          unix-admins,db-admins
        }
The issue:
Next, I tried to test a single host, i.e. "host2". The current usage of host2 is as follows:
# df -h /
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/rootvg-rootvol 94G 45G 45G 50% /
So, to get instant emails, I wrote another service as below, with warning set to <60% and critical set to <40%.
define service{
        use                     local-service
        host_name               host2
        service_description     Root Partition again
        check_command           check_local_disk!60%!40%!/
        contact_groups          dev-admins
        }
But I am still not receiving any emails for it.
Where is it going wrong?
The "check_local_disk" command is defined as below:
define command{
        command_name    check_local_disk
        command_line    $USER1$/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
        }
Your command definition is currently set up to only check your Nagios server's disk, not the remote hosts (such as host2). You need to define a new command definition to execute check_disk on the remote host via NRPE (Nagios Remote Plugin Executor).
On Nagios server, define the following:
define command {
        command_name    check_remote_disk
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_disk -a $ARG1$ $ARG2$ $ARG3$
        register        1
        }
define service{
        use                     generic-service
        host_name               host1, host2, host3, host4, host5, host6
        service_description     Root Partition
        check_command           check_remote_disk!20%!10%!/
        contact_groups          unix-admins,db-admins
        }
Restart the Nagios service.
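The exact restart command depends on your distro and init system; on a systemd host it would typically be something like:

sudo systemctl restart nagios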
On the remote host:
Ensure you have the NRPE plugin installed.
Instructions for Ubuntu: http://tecadmin.net/install-nrpe-on-ubuntu/
Instructions for CentOS / RHEL: http://sharadchhetri.com/2013/03/02/how-to-install-and-configure-nagios-nrpe-in-centos-and-red-hat/
Ensure there is a command defined for check_disk on the remote host. This is usually included in nrpe.cfg, but commented out; you'd have to un-comment the line (see the sample nrpe.cfg entry after this list).
Ensure you have the check_disk plugin installed on the remote host. Mine is located at: /usr/lib64/nagios/plugins/check_disk
Ensure that allowed_hosts field of nrpe.cfg includes the IP address / hostname of your Nagios server.
Ensure that dont_blame_nrpe field of nrpe.cfg is set to 1 to allow command line arguments to NRPE commands: dont_blame_nrpe=1
If you made any changes, restart the nrpe service.
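For reference, the nrpe.cfg command entry mentioned above usually looks something like the line below once un-commented and adjusted to accept arguments (the plugin path is an example; the $ARG…$ placeholders only work because dont_blame_nrpe=1 is set):

command[check_disk]=/usr/lib64/nagios/plugins/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$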