Can't add Kubernetes backend pool to outbound rule in Azure Load Balancer created automatically for AKS cluster - azure-aks

I have created an AKS cluster (Kubernetes version 1.24.6) using the Pulumi Azure Classic KubernetesCluster provider, which resulted in an AKS cluster with the following network profile:
{
  "networkProfile": {
    "dnsServiceIp": "10.0.0.10",
    "dockerBridgeCidr": "172.17.0.1/16",
    "ipFamilies": [
      "IPv4"
    ],
    "loadBalancerProfile": {
      "allocatedOutboundPorts": 0,
      "effectiveOutboundIPs": [
        {
          "id": "/subscriptions/<subscription_id>/resourceGroups/<rg-id>/providers/Microsoft.Network/publicIPAddresses/<ip-id>",
          "resourceGroup": "MC_rg-gw-aks-test00e1a208_aks-gw-konsolidator-test_westeurope"
        }
      ],
      "enableMultipleStandardLoadBalancers": null,
      "idleTimeoutInMinutes": 25,
      "managedOutboundIPs": {
        "count": 1,
        "countIpv6": null
      },
      "outboundIPs": null,
      "outboundIpPrefixes": null
    },
    "loadBalancerSku": "Standard",
    "natGatewayProfile": null,
    "networkMode": null,
    "networkPlugin": "azure",
    "networkPolicy": "calico",
    "outboundType": "loadBalancer",
    "podCidr": null,
    "podCidrs": null,
    "serviceCidr": "10.0.0.0/16",
    "serviceCidrs": [
      "10.0.0.0/16"
    ]
  }
}
My main goal was to have a static egress IP for traffic leaving the cluster and to enforce a higher-than-default idleTimeoutInMinutes. The auto-generated Azure Load Balancer has the Standard SKU. I'm using ingress-nginx as the ingress controller. However, after deploying the cluster and doing some troubleshooting, I noticed the following:
1. The egress IP is different from what I expected (tested by running curl -s checkip.dyndns.org from a pod inside the AKS cluster; see the sketch further below): it doesn't match any of the IPs in the Frontend IP configuration, and in particular it doesn't match the one listed in networkProfile.effectiveOutboundIPs. Outbound traffic works in general and isn't blocked; only long-running TCP connections without keepalives are killed silently (without a TCP RST), which I verified with requests lasting 5 minutes. I'm not using any firewalls or other network components outside of the AKS cluster apart from the Load Balancer.
2. In the Azure Load Balancer there is an outbound rule "aksOutboundRule", created automatically when the AKS cluster was provisioned, that uses one of the frontend IP addresses defined in Frontend IP Configurations (that is the expected egress IP), but it has an empty backend pool (0 instances) assigned to it. Moreover, if I try to add the "kubernetes" backend pool to it manually, the option is greyed out in the portal.
a. When creating a brand new outbound rule, it is also not possible to select the "kubernetes" backend pool for it, as it is greyed out there too.
b. It is also not possible to add any IP configurations of the cluster nodes to any other backend pool (new or existing); saving the backend pool fails with a bad request and the following details:
{
  "status": "Failed",
  "error": {
    "code": "MinimumApiVersionNotSpecifiedToSetTheProperty",
    "message": "Specified api-version 2020-12-01 does not meet the minimum required api-version 2022-03-01 to set this property skuOnPublicIPAddressConfiguration.",
    "details": []
  }
}
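For reference, the egress IP test from point 1 was done roughly like this; a minimal sketch, where the pod name and the curl image are just illustrative choices:

# Start a throwaway pod inside the cluster and check which public IP its traffic leaves with
kubectl run egress-test --rm -it --restart=Never \
  --image=curlimages/curl --command -- curl -s checkip.dyndns.org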
Does anyone have a clue what I'm missing? What could be the cause of the issue I'm facing?
I've looked into MS documentation, especially into:
https://learn.microsoft.com/en-us/azure/load-balancer/outbound-rules
https://learn.microsoft.com/en-us/azure/aks/load-balancer-standard
https://learn.microsoft.com/en-us/azure/aks/limit-egress-traffic
and I haven't found any explanation of why the kubernetes backend pool may be greyed out or why the errors mentioned above are thrown.
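For completeness, the outbound settings I'm after can also be expressed through the Azure CLI; a minimal sketch with placeholder resource group and cluster names (the outbound-ports value is just an example):

# Placeholders: adjust the AKS-managed Standard Load Balancer outbound profile
az aks update \
  --resource-group <resource-group> \
  --name <cluster-name> \
  --load-balancer-managed-outbound-ip-count 1 \
  --load-balancer-idle-timeout 25 \
  --load-balancer-outbound-ports 4000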

Related

Azure Application gateway fails with terminal provisioning state "Failed"

I am deploying an internal Azure Application Gateway (V2). It succeeded a couple of times in other subscriptions (environments); however, it is now failing with a strange error and without much detail about it.
The deployment fails after 30 minutes of applying/creating.
There is a UDR, but it serves a different purpose and does not block or restrict the default internet route.
The deployment is done with Terraform, and everything worked well in the other deployments.
I tried to reproduce the same in my environment and got the same error, shown below.
"details": [
{
"code": "Conflict",
"message": "{\r\n \"status\": \"Failed\",\r\n \"error\": {\r\n \"code\": \"ResourceDeploymentFailure\",\r\n \"message\": \"![The resource operation completed with terminal provisioning state 'Failed](https://i.imgur.com/eipLRgp.png)'.\"\r\n }\r\n}"
}
]
This issue generally occurs when an unsupported route, typically a 0.0.0.0/0 route to a firewall advertised via BGP, affects the Application Gateway subnet.
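A quick way to verify this is to inspect the route table attached to the Application Gateway subnet; a minimal sketch, with placeholder resource names:

# Placeholders: see which route table (if any) is associated with the AppGW subnet
az network vnet subnet show \
  --resource-group <rg> --vnet-name <vnet> --name <appgw-subnet> \
  --query routeTable

# List its routes and look for an unsupported 0.0.0.0/0 next hop
az network route-table route list \
  --resource-group <rg> --route-table-name <route-table> --output table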
Try deploying with a default VNet and subnet configuration.
When I tried to deploy that way, the Azure Application Gateway deployment succeeded.
If your deployment fails after 30 minutes, you can use diagnostic logs to check the error messages in any logs pertaining to the failed operation.
Once you determine the cause, the diagnosis will guide you to the steps needed to fix the issue, for example resolving network problems, depending on the cause of the failure.
Found the issue and resolution
Raised a Microsoft case to see the logs of the AppGW at the platform level.
Microsoft reviewed the logs and identified that the AppGW was not able to communicate with the Key Vault to read the SSL certificate (we use Key Vault to store the SSL cert for TLS encryption).
It turned out that subnet-to-subnet communication was blocked, hence the AppGW was unable to reach the Key Vault in another subnet.
Resolution:
Allowed subnet-to-subnet communication between the subnets where the AppGW and the Key Vault reside.
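For illustration, the rule that re-enables this traffic could look roughly like the following NSG rule; all names and address prefixes are placeholders:

# Placeholders: allow the AppGW subnet to reach the Key Vault subnet over HTTPS
az network nsg rule create \
  --resource-group <rg> \
  --nsg-name <kv-subnet-nsg> \
  --name AllowAppGwToKeyVault \
  --priority 200 \
  --direction Inbound \
  --access Allow \
  --protocol Tcp \
  --source-address-prefixes <appgw-subnet-cidr> \
  --destination-address-prefixes <kv-subnet-cidr> \
  --destination-port-ranges 443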
Conclusion:
It would have helped if Microsoft had surfaced better logging information (error details) in the AppGW resource deployment and/or resource activity logs.

Will external ip be stuck on pending if the pod fails?

I have a Node.js app which connects to an external DB; the DB will refuse the connection until I whitelist my IP, so my pod will fail. So is it possible that the external IP for the Service will be stuck on pending if the pod fails?
is it possible that the external IP for the Service will be stuck on pending if the pod fails?
The Service and Pods are created separately, so if you're creating a LoadBalancer-type Service and your cluster is correctly configured, you should be able to get an externalIP: address for it even if the Pods aren't starting up correctly.
But:
I have a Node.js app which connects to an external DB; the DB will refuse the connection until I whitelist my IP
The Service only accepts inbound connections. In a cloud environment like AWS, the externalIP: is frequently the address of a specific load balancer. Outbound requests to a database won't usually come from this address.
If your cluster is in the same network environment as the database, you probably need to allow every individual worker node in the database configuration. Tools like the cluster autoscaler can cause the node pool to change, so if you can allow the entire CIDR block containing the cluster, that's easier. If the cluster is somewhere else and outbound traffic passes through a NAT gateway of some sort, then you need to allow that gateway.
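As a quick illustration (the Service name here is hypothetical), you can compare the Service's inbound address with the node addresses that outbound traffic typically uses:

# Hypothetical Service name: the LoadBalancer address used for inbound traffic
kubectl get service my-service -o wide

# Node addresses, which is where outbound traffic to the database usually originates
kubectl get nodes -o wide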

GKE not able to reach MongoDB Atlas

I have an issue with trying to deploy my containerized app to GKE. It is not able to reach my MongoDB Atlas cluster. Running the Docker container locally creates no issues and works perfectly. I am by no means an expert in Docker or Kubernetes, but I am assuming it is something to do with the DNS name resolution.
I have followed this tutorial - Deploying a containerized web application - with the addition of adding the EXTERNAL-IP of the LoadBalancer to my 'Network Access' IP whitelist in the MongoDB Atlas console and using the port mapping 443 -> 8443, since I am using HTTPS.
These are the only logs my app is able to produce before failing:
(mongodb): 2020/05/30 15:07:39 logger.go:96: 2020-05-30T15:07:39Z [error] Failed to connect to mongodb. Check if mongo is running...
(mongodb): 2020/05/30 15:07:39 logger.go:132: 2020-05-30T15:07:39Z [fatal] server selection error: server selection timeout, current topology: { Type: ReplicaSetNoPrimary, Servers: [{ Addr: biomas-cluster-shard-<removed>.azure.mongodb.net:27017, Type: Unknown, State: Connected, Average RTT: 0, Last error: connection() : connection(biomas-cluster-shard-<removed>.azure.mongodb.net:27017[-180]) incomplete read of message header: EOF }, { Addr: biomas-cluster-shard-<removed>.azure.mongodb.net:27017, Type: Unknown, State: Connected, Average RTT: 0, Last error: connection() : connection(biomas-cluster-shard-<removed>.azure.mongodb.net:27017[-181]) incomplete read of message header: EOF }, { Addr: biomas-cluster-shard-<removed>.azure.mongodb.net:27017, Type: Unknown, State: Connected, Average RTT: 0, Last error: connection() : connection(biomas-cluster-shard-<removed>.azure.mongodb.net:27017[-179]) incomplete read of message header: EOF }, ] }
If there is a simple workaround to this, that would be preferred since the app is still in the development stage, so I just need a basically working application using the said technologies.
The full workflow:
Android App -> Golang API running on Docker -> MongoDB Atlas
Thanks
Exactly as @Marc pointed out, your traffic goes out with the EXTERNAL-IP of your worker nodes, not your load balancer.
To find the nodes' EXTERNAL-IP addresses, use:
kubectl get nodes -owide
To be more precise and output only IPs use (taken from kubectl Cheat Sheet):
kubectl get nodes -o jsonpath='{.items[*].status.addresses[?(@.type=="ExternalIP")].address}'
Next, whitelist those IPs and you should be good. Keep in mind, though, that after a Kubernetes upgrade or cluster scaling the IPs might change, so I recommend using Cloud NAT to always have the same IP for your outgoing traffic.
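A rough sketch of setting that up with gcloud; the address, router and NAT names, region and network are placeholders:

# Placeholders: reserve a static IP and route all egress through Cloud NAT
gcloud compute addresses create nat-ip --region=<region>
gcloud compute routers create nat-router --network=<vpc-network> --region=<region>
gcloud compute routers nats create nat-config \
  --router=nat-router --region=<region> \
  --nat-external-ip-pool=nat-ip \
  --nat-all-subnet-ip-ranges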

Deploying jenkins into an existing kubernetes cluster fails dues to "Insufficient oath scopes"

I am trying to deploy Jenkins into my existing k8s cluster, but I am getting this notification (insufficient OAuth scopes). What scope am I missing on my service account?
This is a GCP issue.
Make sure that the user you are using to launch the service has Owner or Compute Admin privileges on your GCP project.
By default, node pools are created with the following scopes, which do not include the right one:
"nodePools": [
{
"name": "default-pool",
"config": {
"oauthScopes": [
"https://www.googleapis.com/auth/compute",
"https://www.googleapis.com/auth/devstorage.read_only",
"https://www.googleapis.com/auth/logging.write",
"https://www.googleapis.com/auth/monitoring",
"https://www.googleapis.com/auth/servicecontrol",
"https://www.googleapis.com/auth/service.management.readonly",
"https://www.googleapis.com/auth/trace.append"
],
To fix the error:
If this is a GKE test cluster, just recreate it with the right scope. Otherwise, you can create a new node pool with the required scopes and then migrate your workloads over to the new node pool.
If you are using gcloud, add this scope:
--scopes=https://www.googleapis.com/auth/cloud-platform
If you decide to recreate the node pool, remember to drain and delete the old node pool afterwards. I think the following post will be helpful, as it relates to your case.
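A rough sketch of that migration with gcloud and kubectl; the cluster, zone and new pool names are placeholders, and default-pool is the pool shown in the output above:

# Placeholders: create a new node pool with the cloud-platform scope
gcloud container node-pools create new-pool \
  --cluster=<cluster-name> --zone=<zone> \
  --scopes=https://www.googleapis.com/auth/cloud-platform

# Move workloads off the old pool, then remove it
kubectl cordon -l cloud.google.com/gke-nodepool=default-pool
kubectl drain -l cloud.google.com/gke-nodepool=default-pool --ignore-daemonsets --delete-emptydir-data
gcloud container node-pools delete default-pool --cluster=<cluster-name> --zone=<zone>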

Bluemix Container status reads 'Networking'

I am attempting to setup a container on Bluemix that accepts UDP traffic and forwards it using a TCP connection to Logentries. When running the container locally, I used NetCat to simulate UDP traffic and saw it successfully displayed at the destination of the TCP connection.
However, when I attempt to start a container based off the same image with the Bluemix containers service, the container remains stuck in a 'Networking' state and no data is transmitted to the destination. The logs only print a warning about the version listed in the syslog-ng.conf file (the same warning appears when run locally), and inspecting the container with cf ic inspect <container-id> returns the following portion about the Networking state:
"Path": "date",
"ResolvConfPath": "/etc/resolv.conf",
"State": {
"ExitCode": 0,
"FinishedAt": "0001-01-01T00:00:00Z",
"Ghost": "",
"Pid": 1,
"Running": true,
"StartedAt": "2015-10-14T19:45:43.000000000Z",
"Status": "Networking"
},
One thing to note is that I had to change the nameserver to 8.8.8.8 (Google's DNS) for necessary domain name resolution, due to the following error:
Error resolving hostname; host='data.logentries.com'
Error initializing message pipeline;
Error resolving hostname; host='data.logentries.com'
Error initializing message pipeline;
You can find the source code of the Docker image I originally adopted at https://github.com/oinopion/syslog-ng-logentries.
So my questions are:
What does the 'Networking' state of a Bluemix container mean?
Why does my container work locally but not on Bluemix?
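For reference, the local test described above could look roughly like this; the image name and the UDP port are assumptions:

# Assumed image and port: run the syslog-ng container locally and send a test UDP datagram
docker run -d -p 514:514/udp <syslog-ng-logentries-image>
echo "test message from netcat" | nc -u -w1 127.0.0.1 514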
Docker containers on Bluemix don't support incoming UDP traffic routing yet. As far as I know, this feature is already planned for future updates.
This is the reason your container works fine locally but doesn't receive traffic remotely.
The 'Networking' state means that networking is being set up for your container so that its public and private IPs can be accessed and routed to your instance. When a container gets stuck in Networking, it is typically a problem with the infrastructure rather than anything you have done.
Please try removing your container and recreating it to see if that solves the problem. If not, you will need to open a support ticket with Bluemix (via the 'Get Help' support link in the Bluemix UI) so the IBM Containers team can investigate why your container is having issues.
Is the container able to get a response from the endpoint?
You can just forward UDP traffic to Logentries; there is no need to switch to TCP.
Maybe configure syslog-ng to connect directly to one of the IPs associated with an ingestion node.
Any of the below should work.
data.logentries.com. 105 IN A 54.217.225.23
data.logentries.com. 105 IN A 54.217.226.18
data.logentries.com. 105 IN A 54.246.89.117
data.logentries.com. 105 IN A 54.217.226.8
data.logentries.com. 105 IN A 79.125.113.75
data.logentries.com. 105 IN A 54.228.220.150
