AKS cluster network problems - azure-aks

I have an AKS cluster and need to connect to a customer's SFTP server from an AKS node. This worked reliably until about a month ago, when connections started failing with a "connection timed out" error. The same SFTP connection works fine from my local machine and from another AKS cluster. I also created a test SFTP server and could connect to it from the problematic cluster without problems. I am using Calico. Where should I look to find out what is blocking the connection to the customer's SFTP server? Thanks.

By default, Calico permits all traffic. Once a network policy selects a pod, however, traffic in the direction that policy covers is blocked unless a policy explicitly allows it. Check the network policies using the steps below.
Connect to the AKS cluster.
Verify whether any network policy exists that could conflict with access to the SFTP server:
kubectl get networkpolicy -A
Disable the policy using the command below:
kubectl delete networkpolicy <policy-name> -n <namespace>
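
Before deleting any policy, it may help to confirm from inside the cluster how the connection actually fails. The following is a minimal Python sketch, not part of the original answer: `probe_tcp` is a hypothetical helper name and `sftp.example.com` is a placeholder for the customer's host. A timeout usually suggests packets are being dropped silently (for example by a network policy or a firewall), while a refusal means the host was reached but nothing is listening on that port.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 5.0) -> str:
    """Attempt a TCP connection and classify the outcome.

    Returns 'open', 'refused', 'timeout', or 'error: <details>'.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No answer at all within the timeout: traffic is likely dropped on the way.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered, but no service accepts connections on this port.
        return "refused"
    except OSError as exc:
        # Anything else, e.g. DNS failure or no route to host.
        return f"error: {exc}"


# Example (placeholder host, an assumption): probe_tcp("sftp.example.com", 22)
```

Running the same probe from the working cluster and from the problematic one gives a direct comparison of the two network paths.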

Related

cert-manager does not issue certificate after upgrading to AKS k8s 1.24.6

I have an automatic setup with scripts and helm to create a Kubernetes Cluster on MS Azure and to deploy my application to the cluster.
First of all: everything works fine when I create a cluster with Kubernetes 1.23.12, that means after a few minutes everything is installed and I can access my website and there is a certificate issued by letsencrypt.
But when I delete this cluster completely and reinstall it, changing only the Kubernetes version from 1.23.12 to 1.24.6, I don't get a certificate any more.
I see that the acme challenge is not working. I get the following error:
Waiting for HTTP-01 challenge propagation: failed to perform self check GET request 'http://my.hostname.de/.well-known/acme-challenge/2Y25fxsoeQTIqprKNR4iI4X81jPoLknmRNvj9uhcOLk': Get "http://my.hostname.de/.well-known/acme-challenge/2Y25fxsoeQTIqprKNR4iI4X81jPoLknmRNvj9uhcOLk": dial tcp: lookup my.hostname.de on 10.0.0.10:53: no such host
After some time the error message changes to:
'Error accepting authorization: acme: authorization error for my.hostname.de:
400 urn:ietf:params:acme:error:connection: 20.79.77.156: Fetching http://my.hostname.de/.well-known/acme-challenge/2Y25fxsoeQTIqprKNR4iI4X81jPoLknmRNvj9uhcOLk:
Timeout during connect (likely firewall problem)'
10.0.0.10 is the cluster IP of kube-dns in my kubernetes cluster. When I look at "Services and Ingresses" in Azure portal I can see the port 53/UDP;53/TCP for the cluster IP 10.0.0.10
And I can see there that 20.79.77.156 is the external IP of the ingress-nginx-controller (Ports: 80:32284/TCP;443:32380/TCP)
So I do not understand why the acme challenge cannot be performed successfully.
Here some information about the version numbers:
Azure Kubernetes 1.24.6
helm 3.11
cert-manager 1.11.0
ingress-nginx helm-chart: 4.4.2 -> controller-v1.5.1
I have tried to find the same error on the internet, but it does not come up often and the solutions I found do not seem to fit my problem.
Of course I have read a lot about k8s 1.24.
It is not a dockershim problem, because I have tested the cluster with the Detector for Docker Socket (DDS) tool.
I have updated cert-manager and ingress-nginx to new versions (see above)
I have also tried it with Kubernetes 1.25.4 -> same error
I have found this on the cert-manager Website: "cert-manager expects that ServerSideApply is enabled in the cluster for all versions of Kubernetes from 1.24 and above."
I think I understood the difference between Server Side Apply and Client Side Apply, but I don't know if and how I can enable it in my cluster and if this could be a solution to my problem.
Any help is appreciated. Thanks in advance!
I've solved this myself recently, try this for your ingress controller:
ingress-nginx:
  rbac:
    create: true
  controller:
    service:
      annotations:
        service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path: /healthz
Kubernetes 1.24+ uses a different endpoint for health probes.
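
The first error in the question is a DNS lookup failure from inside the cluster ("no such host"), which is a separate symptom from the later connect timeout. As a rough sketch, not taken from the original answer, the following Python snippet checks whether a hostname resolves from wherever it is run, for example from a pod. `resolve` is a hypothetical helper name, and `my.hostname.de` is the placeholder hostname used in the question.

```python
import socket
from typing import List


def resolve(hostname: str) -> List[str]:
    """Return the sorted unique IP addresses for hostname, or [] if lookup fails."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        # Corresponds to the "no such host" case in the cert-manager self check.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


# Example (hostname from the question): resolve("my.hostname.de")
```

If the result does not include the external IP of the ingress controller (20.79.77.156 in the question), the DNS record is a likely place to look before the health probe setting.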

Can not run kubernetes dashboard on Master node

I installed a Kubernetes cluster (one master and two nodes), and the nodes show as Ready on the master. When I deploy the dashboard and open it via the link http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/, I get the error
'dial tcp 10.32.0.2:8443: connect: connection refused' Trying to reach: 'https://10.32.0.2:8443/'
The dashboard pod is Ready, and I tried to ping 10.32.0.2 (the dashboard's IP) without success.
I run the dashboard as the Web UI (Dashboard) guide suggests.
How can I fix this ?
There are a few options here:
Most of the time if there is some kind of connection refused, timeout or similar error it is most likely a configuration problem. If you can't get the Dashboard running then you should try to deploy another application and try to access it. If you fail then it is not a Dashboard issue.
Check if you are using root/sudo.
Have you properly installed flannel or any other network for containers?
Have you checked your API logs? If not, please do so.
Check the description of the dashboard pod (kubectl describe) if there is anything suspicious.
Similarly, check the description of the service.
What is your cluster version? Check if any updates are required.
Please let me know if any of the above helped.
Start the proxy if it is not already running:
kubectl proxy --address='0.0.0.0' --port=8001 --accept-hosts='.*'

timeout when I try to ssh cpanel

I am trying to connect to the cPanel server through SSH, but I get a timeout.
What I have tried
I have tried ssh user@domain.com and user@ipaddress to no avail
I already tried using the ftp user account but the result is the same. I don't know what to do.
The goal
I have to connect in order to install rails on the server and this is a huge delay.
Please help.
the error is
ssh: connect to host xxx.co.ke port 22: Connection timed out
There are many possible reasons for this.
SSH is not configured on this port
SSH is not enabled on this server
The firewall on your server is blocking your IP
We need more information, such as whether you have a VPS or a shared hosting plan, because many hosting companies do not allow SSH on shared hosting plans. If this is a VPS, ask your hosting provider for the SSH port and try that. Can you ping your server's IP? If not, also check the server firewall to see whether your IP is blocked.
Either you are using the wrong SSH port, or your ISP's IP is blocked on the server, which is why you get the connection timeout. Confirm the SSH port with your hosting provider and ask them to enable SSH access for your cPanel user.

JIRA Usage on AWS

I just set up JIRA on my ec2 instance after installing it via .bin installer file. But when I hit the ec2 url:
ec2-xxxxx.xxxxx.amazonaws.com
it shows the test success page for Apache2, which I installed after installing JIRA.
How do I determine the correct URL for JIRA and reach the JIRA app?
Thanks
JIRA's default HTTP port is 8080, so you need to access it via
ec2-xxxxx.xxxxx.amazonaws.com:8080
If you are not using the default setting, check which port is configured by following the document Changing JIRA's TCP Ports.
You may also need to open port 8080 in the firewall and in the security group where you opened port 22. Otherwise, you cannot access that port directly.
Apart from the previous answer you may wish to ensure the following:
Your AWS EC2 instance's security group has the port open
Your AWS VPC network ACL allows TCP traffic on this port
Your VPC has an internet gateway
Your VPC has the routes configured
Your Apache proxy is configured to point to the Tomcat port
Your Tomcat is configured
You have enabled port binding using the setcap utility (needed for ports below 1024)
Your local machine's firewall allows the connection (on Red Hat, iptables is enabled by default and blocks connections)
As you can see it may be tricky to install Jira on AWS. It may be a good idea to use a deployment service like Deploy4Me to do this quickly.
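
Several of the checks above (security group, ACL, firewall, port 8080) can be exercised from your local machine with one HTTP request. The following is a minimal Python sketch under stated assumptions, not part of either answer: `http_status` is a hypothetical helper name, and the URL in the comment reuses the redacted hostname from the question together with the default port 8080.

```python
import urllib.error
import urllib.request


def http_status(url: str, timeout: float = 5.0):
    """Return the HTTP status code for url, or a short description of the failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # The server answered with an error status (e.g. 404 or 503),
        # so the port itself is reachable.
        return exc.code
    except (urllib.error.URLError, OSError) as exc:
        # Nothing answered: closed port, blocked traffic, or DNS failure.
        return f"unreachable: {exc}"


# Example (redacted host from the question):
# http_status("http://ec2-xxxxx.xxxxx.amazonaws.com:8080/")
```

Any numeric status means something is answering on that port. An "unreachable" result points back at the security group, ACL, or firewall items in the list.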

neo4j backup error when backing up from ha cluster

I'm trying to set up backup for a Neo4j cluster with 3 instances. Neo4j is embedded.
If I run:
./neo4j-backup -from ha://10.106.4.80:5001,10.106.4.203:5001,10.106.14.164:5001 -to /tmp/neobak2/
from a host outside the 10.106.4.0 network, I get this error:
Could not find backup server in cluster neo4j.ha at 10.106.4.80:5001,10.106.4.203:5001,10.106.14.164:5001, operation timed out.
If I run it from a cluster member, it works just fine. Also, if I run the backup script with single instead of ha, it works fine from anywhere.
Below is the basic cluster config I'm using:
ha.server_id: 1
ha.initial_hosts: 10.106.4.80:5001,10.106.4.203:5001,10.106.14.164:5001
ha.tx_push_factor: 2
I already checked for firewall issues, there aren't any. Neo4j version used is 1.9.5.
The webadmin interface shows the cluster has online backup enabled and listening to the default port.
Any help will be appreciated.
According to RFC 5735 (see also RFC 1918), IP addresses in 10.0.0.0/8 are private, so I assume they are not routable from an external host.
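
To illustrate the point about private addresses, here is a small Python sketch (an illustration, not part of the original answer). It uses the standard `ipaddress` module to check whether the cluster members from the question fall inside 10.0.0.0/8. `all_in_private_10` is a hypothetical helper name.

```python
import ipaddress
from typing import Iterable

# The private block discussed in the answer.
PRIVATE_10 = ipaddress.ip_network("10.0.0.0/8")


def all_in_private_10(hosts: Iterable[str]) -> bool:
    """Return True if every address in hosts lies inside 10.0.0.0/8."""
    return all(ipaddress.ip_address(h) in PRIVATE_10 for h in hosts)


# Cluster members taken from ha.initial_hosts in the question (ports stripped).
members = ["10.106.4.80", "10.106.4.203", "10.106.14.164"]
print(all_in_private_10(members))
```

For these three addresses the check returns True. That alone does not prove the backup host cannot reach them. It means reachability depends on routing between the two networks, such as a VPN or peering.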
