> REF https://cloud.google.com/kubernetes-engine/docs/concepts/alias-ips#cluster_sizing : In GKE Autopilot clusters running version 1.27 and later, and GKE Standard clusters running version 1.29 and later, GKE assigns IP addresses for GKE Services from a GKE-managed range: `34.118.224.0/20` by default. This eliminates the need for you to specify your own IP address range for Services.

# GKE IP Address Usage Demo
This repo contains a terraform configuration that demonstrates efficient use of RFC-1918 IP addresses with GKE kubernetes clusters. IT IS NOT meant to be an example of best practices (for example, in real use I would use flux to apply kubernetes manifests instead of terraform, I would use Horizontal Pod Autoscaling, and I would use node pool auto scaling) but rather it is a contrived example of nearly minimal RFC-1918 IP address consumption.
## TL;DR
- Service IP addresses are not accessible outside a cluster.
- Pod IP addresses can be contained to the cluster by configuring SNAT to the node IP addresses.
- Therefore, we can use slices of `240.0.0.0/4` for pods and slices of `100.64.0.0/10` for services.
- This is recommended by Google.
## What is spun up
The terraform configuration spins up:
- A Compute Engine virtual machine for you to use to test `gce-internal` ingresses.
- Cloud DNS for your (sub)domain
- ExternalDNS for automatically creating DNS records for each ingress
- 14 clusters
And on each cluster it spins up:
- 2 nodes
- 1 gateway
- 12 HTTP Routes
- 12 services
- 24 pods
For a grand total of:
- 28 nodes (and 1 user machine)
- 14 gateways
- 168 HTTP Routes (and 168 subdomains)
- 168 services
- 336 pods
All of this while only using `10.10.10.0/26` from the RFC-1918 space (64 addresses).
## What do I need to provide
To use the terraform configuration, you will need:
- An already existing Google Cloud project. (There is no need to set anything up in the project; terraform will handle all of that, but this configuration does not create a project for you.)
- `gcloud` authenticated with an account that has access to that project.
- A (sub)domain that can have its nameservers pointed at Google Cloud DNS.
## IP Address Allocations
- REF: https://cloud.google.com/kubernetes-engine/docs/concepts/alias-ips#cluster_sizing_secondary_range_pods
- REF: https://cloud.google.com/vpc/docs/subnets#valid-ranges
| Purpose | CIDR | Notes |
| --- | --- | --- |
| Node IP range | `10.10.10.0/26` | 1 address per node, 1 address per gateway, 1 address per cluster (cluster private endpoint) |
| Service IP range | `100.64.0.0/19` | |
| Pod IP range | `240.10.0.0/17` | |
| Envoy Proxy range | `100.64.96.0/24` | This is used by the GKE ingress controller. Consumes a /24 per network. |
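As a rough illustration of how ranges like these get wired into a VPC-native cluster, here is a minimal Terraform sketch. The resource names, region, and single-cluster layout are assumptions for illustration only, not the repo's actual configuration:

```hcl
# Illustrative only: attach the node /26 plus non-RFC-1918 secondary ranges to a
# subnetwork, then point a VPC-native GKE cluster at those secondary ranges.
resource "google_compute_network" "demo" {
  name                    = "demo-network" # hypothetical name
  auto_create_subnetworks = false
}

resource "google_compute_subnetwork" "demo" {
  name          = "demo-subnet" # hypothetical name
  network       = google_compute_network.demo.id
  region        = "us-central1"
  ip_cidr_range = "10.10.10.0/26" # node range (RFC-1918)

  secondary_ip_range {
    range_name    = "pods"
    ip_cidr_range = "240.10.0.0/17" # pod range (class E, not RFC-1918)
  }

  secondary_ip_range {
    range_name    = "services"
    ip_cidr_range = "100.64.0.0/19" # service range (CGN space, not RFC-1918)
  }
}

resource "google_container_cluster" "demo" {
  name               = "demo-cluster" # hypothetical name
  location           = "us-central1"
  network            = google_compute_network.demo.id
  subnetwork         = google_compute_subnetwork.demo.id
  initial_node_count = 2

  ip_allocation_policy {
    cluster_secondary_range_name  = "pods"
    services_secondary_range_name = "services"
  }
}
```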
## What consumes RFC-1918 IP addresses
| Thing | Quantity Consumed | Notes |
| --- | --- | --- |
| Unusable addresses | 4 addresses | The first two and last two addresses of a primary IP range are unusable. |
| Each node | 1 address | This example uses 2 nodes per cluster to make it numerically distinct from the quantity of clusters. |
| The user-machine virtual machine | 1 address | This is not needed in a production deploy. |
| Each gateway | 1 address | This can be 1 per cluster. |
| The control plane private endpoint | 1 address | 1 per cluster. |
With our 64 addresses from `10.10.10.0/26`, we lose 4 as unusable addresses, use another 1 for the user machine, and then need 4 addresses per cluster, which means we can fit 14 clusters with 3 IP addresses left over.
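The arithmetic, written out as Terraform locals (purely illustrative):

```hcl
locals {
  subnet_size           = 64 # 10.10.10.0/26
  unusable              = 4  # first two and last two addresses of the primary range
  user_machine          = 1
  addresses_per_cluster = 4 # 2 nodes + 1 gateway + 1 control plane private endpoint

  usable       = local.subnet_size - local.unusable - local.user_machine # 59
  max_clusters = floor(local.usable / local.addresses_per_cluster)       # 14
  left_over    = local.usable % local.addresses_per_cluster              # 3
}
```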
## Usage
To apply the terraform, authenticate with the gcloud CLI tool:
gcloud auth application-default login
Then go into the `terraform` folder and apply the configuration. We need to apply the config in two phases via the `cluster_exists` variable because the kubernetes terraform provider does not have native support for the Gateway API, and the `kubernetes_manifest` terraform resource has a shortcoming that requires the cluster to exist at plan time.
terraform apply -var dns_root="k8sdemo.mydomain.example" -var quota_email="MrManager@mydomain.example" -var quota_justification="Explain why you need quotas increased here." -var cluster_exists=false
terraform apply -var dns_root="k8sdemo.mydomain.example" -var quota_email="MrManager@mydomain.example" -var quota_justification="Explain why you need quotas increased here." -var cluster_exists=true
Please note that this will exceed the default quotas on new Google Cloud projects. The terraform configuration will automatically put in requests for quota increases, but they can take multiple days to be approved or denied. Until then, you should be able to fit 3 clusters within the default quotas.
Please note that the kubernetes clusters will take a couple of extra minutes to get fully set up and running after the `terraform apply` command has finished. During this time, the clusters are getting IP addresses assigned to `Gateway` objects and updating DNS records via ExternalDNS.
This will spin up the kubernetes clusters and output some helpful information. One such piece of information is the list of nameservers for Google Cloud DNS. We need to point our (sub)domain at those nameservers. If you want to get the list of nameservers again without having to wait for `terraform apply`, you can run `terraform output dns_name_servers`.
Personally, I run PowerDNS, so as an example, I would first clear the old `NS` records from previous runs from `k8sdemo.mydomain.example` (if you are setting this up for the first time, you can skip this step):
pdnsutil delete-rrset mydomain.example k8sdemo NS
And then I'd add the new records (naturally, you should use the nameservers output by `terraform`; they will change each time you add the domain to Cloud DNS):
pdnsutil add-record mydomain.example k8sdemo NS 600 ns-cloud-a1.googledomains.com.
pdnsutil add-record mydomain.example k8sdemo NS 600 ns-cloud-a2.googledomains.com.
pdnsutil add-record mydomain.example k8sdemo NS 600 ns-cloud-a3.googledomains.com.
pdnsutil add-record mydomain.example k8sdemo NS 600 ns-cloud-a4.googledomains.com.
Give some time for DNS caches to expire and then you should be able to access `service<num>.cluster<num>.k8sdemo.mydomain.example` by connecting to the `user-machine` over `ssh` and using `curl` to hit the internal ingresses. First, get the `gcloud` command to `ssh` into the `user-machine`:
terraform output user_machine_ssh_command
Then `ssh` into the machine (your command will be different):
gcloud compute ssh --zone 'us-central1-c' 'user-machine' --project 'k8s-ip-demo-1aa0405a'
and hit the various ingresses on the various clusters:
curl service1.cluster1.k8sdemo.mydomain.example
## Explanation
To conserve the RFC-1918 address space, we need to take advantage of two facts:
- Service IP addresses aren't real
- Pod IP addresses do not need to leave the cluster
### Service IP Addresses
Service IP addresses are a fiction created by kubernetes. They are not routable from outside the cluster, and packets to service IP addresses are never written to the wire. When a pod sends a packet to a service IP address, it is intercepted by iptables, which performs DNAT to either the pod's or the node's IP address (depending on cluster type). We can see this on our GKE cluster by connecting to the compute engine instance for a node over `ssh` and inspecting its iptables rules.
gcloud compute ssh --zone 'us-central1-f' 'gke-cluster1-cluster1-pool-9d7804fe-fl8w' --project 'k8s-ip-demo-90bdaee2'
First, we look at the `PREROUTING` chain:
$ sudo /sbin/iptables --table nat --list PREROUTING
Chain PREROUTING (policy ACCEPT)
target prot opt source destination
KUBE-SERVICES all -- anywhere anywhere /* kubernetes service portals */
DNAT tcp -- anywhere metadata.google.internal tcp dpt:http-alt /* metadata-concealment: bridge traffic to metadata server goes to metadata proxy */ to:169.254.169.252:987
DNAT tcp -- anywhere metadata.google.internal tcp dpt:http /* metadata-concealment: bridge traffic to metadata server goes to metadata proxy */ to:169.254.169.252:988
That sends all our traffic to the `KUBE-SERVICES` chain:
$ sudo /sbin/iptables --table nat --list KUBE-SERVICES
Chain KUBE-SERVICES (2 references)
target prot opt source destination
KUBE-SVC-XBBXYMVKK37OV7LG tcp -- anywhere 100.64.28.70 /* gmp-system/gmp-operator:webhook cluster IP */ tcp dpt:https
KUBE-SVC-GQKLSXF4KTGNIMSQ tcp -- anywhere 100.64.28.107 /* default/service11 cluster IP */ tcp dpt:http
KUBE-SVC-AI5DROXYLCYX27ZS tcp -- anywhere 100.64.11.22 /* default/service5 cluster IP */ tcp dpt:http
KUBE-SVC-F4AADAVBSY5MPKOB tcp -- anywhere 100.64.12.233 /* default/service6 cluster IP */ tcp dpt:http
KUBE-SVC-NPX46M4PTMTKRN6Y tcp -- anywhere 100.64.0.1 /* default/kubernetes:https cluster IP */ tcp dpt:https
KUBE-SVC-XP4WJ6VSLGWALMW5 tcp -- anywhere 100.64.25.226 /* kube-system/default-http-backend:http cluster IP */ tcp dpt:http
KUBE-SVC-TCOU7JCQXEZGVUNU udp -- anywhere 100.64.0.10 /* kube-system/kube-dns:dns cluster IP */ udp dpt:domain
KUBE-SVC-QMWWTXBG7KFJQKLO tcp -- anywhere 100.64.7.174 /* kube-system/metrics-server cluster IP */ tcp dpt:https
KUBE-SVC-3ISFTUHJIYANB2XG tcp -- anywhere 100.64.9.63 /* default/service4 cluster IP */ tcp dpt:http
KUBE-SVC-T467R3VJHOQP3KAJ tcp -- anywhere 100.64.8.240 /* default/service9 cluster IP */ tcp dpt:http
KUBE-SVC-ERIFXISQEP7F7OF4 tcp -- anywhere 100.64.0.10 /* kube-system/kube-dns:dns-tcp cluster IP */ tcp dpt:domain
KUBE-SVC-JOVDIF256A6Q5HDW tcp -- anywhere 100.64.16.250 /* default/service8 cluster IP */ tcp dpt:http
KUBE-SVC-E7SFLZD2Y2FAKTSV tcp -- anywhere 100.64.16.205 /* default/service2 cluster IP */ tcp dpt:http
KUBE-SVC-OA62VCLUSJYXZDQQ tcp -- anywhere 100.64.16.149 /* default/service10 cluster IP */ tcp dpt:http
KUBE-SVC-SAREEPXIBVBCS5LQ tcp -- anywhere 100.64.8.122 /* default/service12 cluster IP */ tcp dpt:http
KUBE-SVC-MVJGFDRMC5WIL772 tcp -- anywhere 100.64.6.210 /* default/service7 cluster IP */ tcp dpt:http
KUBE-SVC-4RM6KDP54NYR4K6S tcp -- anywhere 100.64.22.23 /* default/service1 cluster IP */ tcp dpt:http
KUBE-SVC-Y7ZLLRVMCD5M4HRL tcp -- anywhere 100.64.12.22 /* default/service3 cluster IP */ tcp dpt:http
KUBE-NODEPORTS all -- anywhere anywhere /* kubernetes service nodeports; NOTE: this must be the last rule in this chain */ ADDRTYPE match dst-type LOCAL
This matches packets destined for each service IP address and sends them to their respective chains. For `service1`, it matches packets destined for `100.64.22.23`. That happens to be our service IP address for `service1`:
$ kubectl --kubeconfig /path/to/kubernetes_ip_demo/output/kubeconfig/cluster1.yaml get svc service1
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service1 ClusterIP 100.64.22.23 <none> 80/TCP 34m
So it's matching packets destined for `service1` and sending them to `KUBE-SVC-4RM6KDP54NYR4K6S`:
$ sudo /sbin/iptables --table nat --list KUBE-SVC-4RM6KDP54NYR4K6S
Chain KUBE-SVC-4RM6KDP54NYR4K6S (1 references)
target prot opt source destination
KUBE-MARK-MASQ tcp -- !240.10.0.0/24 100.64.22.23 /* default/service1 cluster IP */ tcp dpt:http
KUBE-SEP-XCTUYJ3QDWA727EN all -- anywhere anywhere /* default/service1 -> 240.10.0.24:8080 */ statistic mode random probability 0.50000000000
KUBE-SEP-5LQWHS2W6LUXXNGL all -- anywhere anywhere /* default/service1 -> 240.10.0.25:8080 */
This is how kubernetes load balances services: it uses `iptables` on the machine opening the connection to randomly distribute connections across the various pods. If we take a look at the chain for the first pod:
$ sudo /sbin/iptables --table nat --list KUBE-SEP-XCTUYJ3QDWA727EN
Chain KUBE-SEP-XCTUYJ3QDWA727EN (1 references)
target prot opt source destination
KUBE-MARK-MASQ all -- 240.10.0.24 anywhere /* default/service1 */
DNAT tcp -- anywhere anywhere /* default/service1 */ tcp to:240.10.0.24:8080
This corresponds to one of our pod IP addresses:
$ kubectl --kubeconfig /path/to/kubernetes_ip_demo/output/kubeconfig/cluster1.yaml get pods -l 'app=hello-app-1' -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
deployment1-69bddf99b6-gjl94 1/1 Running 0 55m 240.10.0.24 gke-cluster1-cluster1-pool-9d7804fe-fl8w <none> <none>
deployment1-69bddf99b6-vrtc7 1/1 Running 0 55m 240.10.0.25 gke-cluster1-cluster1-pool-9d7804fe-fl8w <none> <none>
If we launched a routes-based cluster instead of a VPC-native cluster, the IP addresses in `KUBE-SERVICES` would instead come from RFC-1918 space, alongside the node IP addresses:
$ sudo /sbin/iptables --table nat --list KUBE-SERVICES
Chain KUBE-SERVICES (2 references)
target prot opt source destination
KUBE-SVC-NPX46M4PTMTKRN6Y tcp -- anywhere 10.107.240.1 /* default/kubernetes:https cluster IP */ tcp dpt:https
KUBE-SVC-Y7ZLLRVMCD5M4HRL tcp -- anywhere 10.107.245.254 /* default/service3 cluster IP */ tcp dpt:http
KUBE-SVC-OA62VCLUSJYXZDQQ tcp -- anywhere 10.107.250.149 /* default/service10 cluster IP */ tcp dpt:http
KUBE-SVC-JOVDIF256A6Q5HDW tcp -- anywhere 10.107.250.156 /* default/service8 cluster IP */ tcp dpt:http
KUBE-SVC-4RM6KDP54NYR4K6S tcp -- anywhere 10.107.250.111 /* default/service1 cluster IP */ tcp dpt:http
KUBE-SVC-3ISFTUHJIYANB2XG tcp -- anywhere 10.107.241.148 /* default/service4 cluster IP */ tcp dpt:http
KUBE-SVC-E7SFLZD2Y2FAKTSV tcp -- anywhere 10.107.255.251 /* default/service2 cluster IP */ tcp dpt:http
KUBE-SVC-T467R3VJHOQP3KAJ tcp -- anywhere 10.107.246.240 /* default/service9 cluster IP */ tcp dpt:http
KUBE-SVC-AI5DROXYLCYX27ZS tcp -- anywhere 10.107.253.168 /* default/service5 cluster IP */ tcp dpt:http
KUBE-SVC-GQKLSXF4KTGNIMSQ tcp -- anywhere 10.107.255.31 /* default/service11 cluster IP */ tcp dpt:http
KUBE-SVC-XP4WJ6VSLGWALMW5 tcp -- anywhere 10.107.252.203 /* kube-system/default-http-backend:http cluster IP */ tcp dpt:http
KUBE-SVC-SAREEPXIBVBCS5LQ tcp -- anywhere 10.107.249.4 /* default/service12 cluster IP */ tcp dpt:http
KUBE-SVC-F4AADAVBSY5MPKOB tcp -- anywhere 10.107.250.177 /* default/service6 cluster IP */ tcp dpt:http
KUBE-SVC-MVJGFDRMC5WIL772 tcp -- anywhere 10.107.252.157 /* default/service7 cluster IP */ tcp dpt:http
KUBE-NODEPORTS all -- anywhere anywhere /* kubernetes service nodeports; NOTE: this must be the last rule in this chain */ ADDRTYPE match dst-type LOCAL
But regardless, the end result is the same: service IP addresses aren't real, so they can be anything. Despite their fictional nature, Google uses a "flat" network architecture that does not allow re-using IP addresses across multiple clusters, and so recommends using slices of `100.64.0.0/10` for service IP ranges.
### Pod IP Addresses
Pod IP Addresses, unlike Service IP Addresses, are actually real. We can see this by spinning up a basic http server on the user machine:
gcloud compute ssh --zone 'us-central1-c' 'user-machine' --project 'k8s-ip-demo-1aa0405a'
python3 -m http.server 8080
We can then spin up a pod in our cluster:
kubectl --kubeconfig /path/to/kubernetes_ip_demo/output/kubeconfig/cluster1.yaml run --rm -i -t --image alpine:3.21 "testpod-$(uuidgen | cut -d '-' -f 2)" -- /bin/sh
and hit the web server on the user machine:
# apk --no-cache add curl
# curl http://usermachine.k8sdemo.mydomain.example:8080
We will see the output in the `user-machine` `ssh` session:
240.10.1.35 - - [18/Mar/2025 01:00:27] "GET / HTTP/1.1" 200 -
But that doesn't mean that we need to spend valuable RFC-1918 IP address space on them. Instead, we can configure our cluster to perform SNAT to the node's IP address using kubernetes' ip-masq-agent. This frees us up to use other reserved-but-not-universally-supported IP address ranges, like slices of `240.0.0.0/4`.
To demonstrate, we can apply the terraform config again, this time with the `enable_snat` variable set to `true`:
terraform apply -var dns_root="k8sdemo.mydomain.example" -var quota_email="MrManager@mydomain.example" -var quota_justification="Explain why you need quotas increased here." -var cluster_exists=true -var enable_snat=true
Then, in our kubernetes pod, we can run the `curl` again:
# curl http://usermachine.k8sdemo.mydomain.example:8080
which now shows the node IP address in the `user-machine` console:
10.10.10.14 - - [18/Mar/2025 22:43:25] "GET / HTTP/1.1" 200 -
So this means that, just like service IP addresses, we can make the pod IP addresses anything. Google recommends using slices of `240.0.0.0/4` for pod IP ranges, and then enabling SNAT if you need to talk to networks outside of Google Cloud.
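For context, SNAT behaviour like this is normally driven by the `ip-masq-agent` ConfigMap in `kube-system`: destinations listed in `nonMasqueradeCIDRs` keep the pod source IP, and everything else is SNATed to the node's address. Below is a hedged sketch of such a ConfigMap expressed in Terraform; the CIDRs are taken from the table above, and the exact mechanism behind `enable_snat=true` in this repo may differ:

```hcl
resource "kubernetes_config_map" "ip_masq_agent" {
  metadata {
    name      = "ip-masq-agent"
    namespace = "kube-system"
  }

  data = {
    config = yamlencode({
      # Traffic to these ranges keeps the pod IP; everything else is SNATed to
      # the node's address. CIDRs shown are illustrative.
      nonMasqueradeCIDRs = [
        "240.10.0.0/17", # the pod range
        "10.10.10.0/26", # the node range
      ]
      masqLinkLocal  = false
      resyncInterval = "60s"
    })
  }
}
```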
## Question and Answer
### Why Gateway instead of Ingress?
GKE assigns a separate IP address to each `Ingress`, but we can have a single `Gateway` with an IP address and then any number of `HTTPRoute` objects attached to it. This is a design choice in GKE, not a limitation of kubernetes.
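As an illustration of that pattern, one internal `Gateway` plus per-service `HTTPRoute` objects looks roughly like the sketch below (written as `kubernetes_manifest` resources; the names, gateway class, and hostname are assumptions, not the repo's exact manifests):

```hcl
resource "kubernetes_manifest" "demo_gateway" {
  manifest = {
    apiVersion = "gateway.networking.k8s.io/v1"
    kind       = "Gateway"
    metadata   = { name = "internal-gateway", namespace = "default" }
    spec = {
      gatewayClassName = "gke-l7-rilb" # GKE's regional internal load balancer class
      listeners        = [{ name = "http", protocol = "HTTP", port = 80 }]
    }
  }
}

# Any number of HTTPRoutes can attach to the one Gateway, so each additional
# service costs a hostname and a route, not another RFC-1918 address.
resource "kubernetes_manifest" "demo_route_service1" {
  manifest = {
    apiVersion = "gateway.networking.k8s.io/v1"
    kind       = "HTTPRoute"
    metadata   = { name = "service1", namespace = "default" }
    spec = {
      parentRefs = [{ name = "internal-gateway" }]
      hostnames  = ["service1.cluster1.k8sdemo.mydomain.example"]
      rules      = [{ backendRefs = [{ name = "service1", port = 80 }] }]
    }
  }
}
```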
If you need to use `Ingress`, we can achieve the same IP address efficiency by using the nginx ingress controller, which can be enabled by passing `-var ingress_type=nginx`. If you need to use the built-in ingress controller instead of nginx, you can set `-var ingress_type=gce`, but then each `Ingress` will cost 1 IP address.
## Clean Up
Just like we did a 2-stage apply by toggling the `cluster_exists` variable, we will need to do a 2-stage destroy. First, we tear down the kubernetes resources by running apply with the `cluster_exists` variable set to `false`. Then we can destroy the entire project.
terraform apply -var dns_root="k8sdemo.mydomain.example" -var quota_email="MrManager@mydomain.example" -var quota_justification="Explain why you need quotas increased here." -var cluster_exists=false
terraform destroy -var dns_root="k8sdemo.mydomain.example" -var quota_email="MrManager@mydomain.example" -var quota_justification="Explain why you need quotas increased here." -var cluster_exists=false