From 47cca17e51e6046df7865e7c0846eae58362a6f7 Mon Sep 17 00:00:00 2001
From: Tom Alexander
Date: Tue, 18 Mar 2025 19:06:26 -0400
Subject: [PATCH] Clean up the doc.

---
 LICENSE                                   | 10 +++
 README.md                                 | 81 +++++++++++------------
 terraform/modules/k8s_workload/ip_masq.tf |  3 +-
 terraform/user_machine.tf                 |  1 +
 4 files changed, 50 insertions(+), 45 deletions(-)
 create mode 100644 LICENSE

diff --git a/LICENSE b/LICENSE
new file mode 100644
index 0000000..5aec258
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,10 @@
+Permission to use, copy, modify, and/or distribute this software for any
+purpose with or without fee is hereby granted.
+
+THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH
+REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY
+AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT,
+INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM
+LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR
+OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
+PERFORMANCE OF THIS SOFTWARE.
diff --git a/README.md b/README.md
index 46e9686..f58b85c 100644
--- a/README.md
+++ b/README.md
@@ -1,19 +1,4 @@
-REF https://cloud.google.com/kubernetes-engine/docs/concepts/alias-ips#cluster_sizing
-REF Services only available within the cluster: https://cloud.google.com/kubernetes-engine/docs/how-to/alias-ips
-REF https://wdenniss.com/gke-network-planning
-REF https://cloud.google.com/blog/products/containers-kubernetes/best-practices-for-kubernetes-pod-ip-allocation-in-gke
-
-REF SHARE IP: https://cloud.google.com/kubernetes-engine/docs/how-to/internal-load-balancing#terraform
-REF GATEWAY: https://github.com/GoogleCloudPlatform/gke-networking-recipes/tree/main/gateway/single-cluster/regional-l7-ilb
-REF node NAT: https://cloud.google.com/kubernetes-engine/docs/how-to/ip-masquerade-agent
-
-REF "GKE networking model doesn't allow IP addresses to be reused across the network. When you migrate to GKE, you must plan your IP address allocation to Reduce internal IP address usage in GKE." : https://cloud.google.com/kubernetes-engine/docs/concepts/network-overview
-
-REF "Combining multiple Ingress resources into a single Google Cloud load balancer is not supported." : https://cloud.google.com/kubernetes-engine/docs/concepts/ingress
-
-REF "At minimum, the nonMasqueradeCIDRs property should include the node and Pod IP address ranges of your cluster." : https://cloud.google.com/kubernetes-engine/docs/how-to/ip-masquerade-agent
-
-TOOD: replace tf with terraform
+REF https://cloud.google.com/kubernetes-engine/docs/concepts/alias-ips#cluster_sizing : In GKE Autopilot clusters running version 1.27 and later, and GKE Standard clusters running version 1.29 and later, GKE assigns IP addresses for GKE Services from a GKE-managed range: 34.118.224.0/20 by default. This eliminates the need for you to specify your own IP address range for Services.

 GKE IP Address Usage Demo
 =========================
@@ -22,9 +7,9 @@ This repo contains a terraform configuration that demonstrates efficient use of

 TL;DR
 -----
-- Service IP addresses are not accessible outside a cluster (TODO REF)
+- [Service IP addresses are not accessible outside a cluster](https://cloud.google.com/kubernetes-engine/docs/how-to/alias-ips#restrictions)
 - Pod IP addresses can be contained to the cluster by configuring SNAT to the node IP addresses.
-- Therefore, we can use (TODO ranges) for pods and services (TODO REF)
+- [Therefore, we can use slices of `240.0.0.0/4` for pods and slices of `100.64.0.0/10` for services](https://cloud.google.com/blog/products/containers-kubernetes/best-practices-for-kubernetes-pod-ip-allocation-in-gke)
 - [This is recommended by Google](https://cloud.google.com/blog/products/containers-kubernetes/best-practices-for-kubernetes-pod-ip-allocation-in-gke)

 What is spun up
@@ -69,12 +54,12 @@ REF: https://cloud.google.com/kubernetes-engine/docs/concepts/alias-ips#cluster_

 REF: https://cloud.google.com/vpc/docs/subnets#valid-ranges

-| Purpose           | CIDR          | Notes                                                                                       |
-|-------------------|---------------|---------------------------------------------------------------------------------------------|
-| Node IP range     | 10.10.10.0/26 | 1 address per node, 1 address per gateway, 1 address per cluster (cluster private endpoint) |
-| Service IP range  |               |                                                                                             |
-| Pod IP range      |               |                                                                                             |
-| Envoy Proxy range |               | This is used by the GKE ingress controller. Consumes a `/24` per network                    |
+| Purpose           | CIDR           | Notes                                                                                       |
+|-------------------|----------------|---------------------------------------------------------------------------------------------|
+| Node IP range     | 10.10.10.0/26  | 1 address per node, 1 address per gateway, 1 address per cluster (cluster private endpoint) |
+| Service IP range  | 100.64.0.0/19  |                                                                                             |
+| Pod IP range      | 240.10.0.0/17  |                                                                                             |
+| Envoy Proxy range | 100.64.96.0/24 | This is used by the GKE ingress controller. Consumes a `/24` per network                    |

 What consumes RFC-1918 IP addresses

@@ -103,15 +88,15 @@ gcloud auth application-default login

 Then go into the `terraform` folder and apply the configuration. We need to apply the config in two phases via the `cluster_exists` variable because the kubernetes terraform provider does not have native support for the Gateway API and the `kubernetes_manifest` terraform resource [has a shortcoming that requires the cluster exists at plan time](https://github.com/hashicorp/terraform-provider-kubernetes/issues/1775).
 ```
-tf apply -var dns_root="k8sdemo.mydomain.example." -var quota_email="MrManager@mydomain.example" -var quota_justification="Explain why you need quotas increased here." -var cluster_exists=false
-tf apply -var dns_root="k8sdemo.mydomain.example." -var quota_email="MrManager@mydomain.example" -var quota_justification="Explain why you need quotas increased here." -var cluster_exists=true
+terraform apply -var dns_root="k8sdemo.mydomain.example." -var quota_email="MrManager@mydomain.example" -var quota_justification="Explain why you need quotas increased here." -var cluster_exists=false
+terraform apply -var dns_root="k8sdemo.mydomain.example." -var quota_email="MrManager@mydomain.example" -var quota_justification="Explain why you need quotas increased here." -var cluster_exists=true
 ```

 Please note that this will exceed the default quotas on new Google Cloud projects. The terraform configuration will automatically put in requests for quota increases but they can take multiple days to be approved or denied. You should be able to fit 3 clusters in the default quota until then.

-Please note that the kubernetes cluster will take a couple extra minutes to get fully set up and running after the `tf apply` command has finished. During this time, the cluster is getting IP addresses assigned to `Gateway` objects and updating DNS records via `ExternalDNS`.
+Please note that the kubernetes cluster will take a couple extra minutes to get fully set up and running after the `terraform apply` command has finished. During this time, the cluster is getting IP addresses assigned to `Gateway` objects and updating DNS records via `ExternalDNS`.

-This will spin up the kubernetes clusters and output some helpful information. One such piece of information is the nameservers for Google Cloud DNS. We need to point our (sub)domain at those name servers. If you want to get the list of nameservers again without having to wait for `tf apply`, you can run `tf output dns_name_servers`.
+This will spin up the kubernetes clusters and output some helpful information. One such piece of information is the nameservers for Google Cloud DNS. We need to point our (sub)domain at those name servers. If you want to get the list of nameservers again without having to wait for `terraform apply`, you can run `terraform output dns_name_servers`.

 Personally, I run [PowerDNS](https://github.com/PowerDNS/pdns), so as an example, I would first clear the old `NS` records from previous runs from `k8sdemo.mydomain.example` (if you are setting this up for the first time you can skip this step):

@@ -131,7 +116,7 @@ pdnsutil add-record mydomain.example k8sdemo NS 600 ns-cloud-a4.googledomains.co
 Give some time for DNS caches to expire and then you should be able to access `service.cluster.k8sdemo.mydomain.example` by connecting the to `user-machine` over `ssh` and using `curl` to hit the internal ingresses.
 First, get the `gcloud` command to `ssh` into the `user-machine`:
 ```
-tf output user_machine_ssh_command
+terraform output user_machine_ssh_command
 ```

 Then `ssh` into the machine (your command will be different):
@@ -146,15 +131,6 @@ and hit the various ingresses on the various clusters:
 ```
 curl service1.cluster1.k8sdemo.mydomain.example
 ```
-Clean Up
-========
-Just like we did a 2-stage apply by toggling the `cluster_exists` variable, we will need to do a 2-stage destroy. First we tear down any kubernetes resources by running *apply* with the `cluster_exists` variable set to `false`. Then we can destroy the entire project.
-
-```
-tf apply -var dns_root="k8sdemo.mydomain.example." -var quota_email="MrManager@mydomain.example" -var quota_justification="Explain why you need quotas increased here." -var cluster_exists=false
-tf destroy -var dns_root="k8sdemo.mydomain.example." -var quota_email="MrManager@mydomain.example" -var quota_justification="Explain why you need quotas increased here." -var cluster_exists=false
-```
-
 Explanation
 ===========

@@ -211,7 +187,7 @@ KUBE-NODEPORTS all -- anywhere anywhere /* kubernetes

 This matches packets destinated for each service IP address and sends them to their respective chains. For `service1` it is matching packets destined for `100.64.22.23`. That happens to be our service IP address for `service1`:
 ```
-$ kubectl --kubeconfig /bridge/git/kubernetes_ip_demo/output/kubeconfig/cluster1.yaml get svc service1
+$ kubectl --kubeconfig /path/to/kubernetes_ip_demo/output/kubeconfig/cluster1.yaml get svc service1
 NAME       TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
 service1   ClusterIP   100.64.22.23   <none>        80/TCP    34m
 ```
@@ -239,7 +215,7 @@ DNAT tcp -- anywhere anywhere /* default/service

 This corresponds to one of our pod IP addresses:
 ```
-$ kubectl --kubeconfig /bridge/git/kubernetes_ip_demo/output/kubeconfig/cluster1.yaml get pods -l 'app=hello-app-1' -o wide
+$ kubectl --kubeconfig /path/to/kubernetes_ip_demo/output/kubeconfig/cluster1.yaml get pods -l 'app=hello-app-1' -o wide
 NAME                           READY   STATUS    RESTARTS   AGE   IP            NODE                                       NOMINATED NODE   READINESS GATES
 deployment1-69bddf99b6-gjl94   1/1     Running   0          55m   240.10.0.24   gke-cluster1-cluster1-pool-9d7804fe-fl8w   <none>           <none>
 deployment1-69bddf99b6-vrtc7   1/1     Running   0          55m   240.10.0.25   gke-cluster1-cluster1-pool-9d7804fe-fl8w   <none>           <none>
@@ -267,7 +243,7 @@ KUBE-SVC-MVJGFDRMC5WIL772 tcp -- anywhere 10.107.252.157 /*
 KUBE-NODEPORTS all -- anywhere anywhere /* kubernetes service nodeports; NOTE: this must be the last rule in this chain */ ADDRTYPE match dst-type LOCAL
 ```

-But regardless, the end result is the same: Service IP addresses aren't real, so they can be anything. Despite their fictional nature, Google uses a "flat" architecture that does not allow re-using IP addresses across multiple clusters so [Google recommends using slices of `100.64.0.0/10` for service IP ranges](https://cloud.google.com/blog/products/containers-kubernetes/best-practices-for-kubernetes-pod-ip-allocation-in-gke).
+But regardless, the end result is the same: Service IP addresses aren't real, so they can be anything. Despite their fictional nature, Google [uses a "flat" architecture that does not allow re-using IP addresses](https://cloud.google.com/kubernetes-engine/docs/concepts/network-overview) across multiple clusters so [Google recommends using slices of `100.64.0.0/10` for service IP ranges](https://cloud.google.com/blog/products/containers-kubernetes/best-practices-for-kubernetes-pod-ip-allocation-in-gke).

 Pod IP Addresses
 ----------------
@@ -284,7 +260,7 @@ python3 -m http.server 8080

 We can then spin up a pod in our cluster:
 ```
-kubectl --kubeconfig /bridge/git/kubernetes_ip_demo/output/kubeconfig/cluster1.yaml run --rm -i -t --image alpine:3.21 "testpod-$(uuidgen | cut -d '-' -f 2)" -- /bin/sh
+kubectl --kubeconfig /path/to/kubernetes_ip_demo/output/kubeconfig/cluster1.yaml run --rm -i -t --image alpine:3.21 "testpod-$(uuidgen | cut -d '-' -f 2)" -- /bin/sh
 ```

 and hit the web server on the user machine:
@@ -303,7 +279,7 @@ But that doesn't mean that we need to use the valuable RFC-1918 IP address space

 To demonstrate, we can apply the terraform config again but with the `enable_snat=true` variable set:
 ```
-tf apply -var dns_root="k8sdemo.mydomain.example." -var quota_email="MrManager@mydomain.example" -var quota_justification="Explain why you need quotas increased here." -var cluster_exists=true -var enable_snat=true
+terraform apply -var dns_root="k8sdemo.mydomain.example." -var quota_email="MrManager@mydomain.example" -var quota_justification="Explain why you need quotas increased here." -var cluster_exists=true -var enable_snat=true
 ```

 Then in our kubernetes pod, we can run the `curl` again:
@@ -318,4 +294,21 @@ which now shows the node IP address in the `user-machine` console:
 10.10.10.14 - - [18/Mar/2025 22:43:25] "GET / HTTP/1.1" 200 -
 ```

-So this means that just like Service IP addresses, we can make the pod IP addresses anything. [Google recommends using slices of `` for pod IP ranges](https://cloud.google.com/blog/products/containers-kubernetes/best-practices-for-kubernetes-pod-ip-allocation-in-gke), and then enabling SNAT if you need to talk to networks outside of Google Cloud.
+So this means that just like Service IP addresses, we can make the pod IP addresses anything. [Google recommends using slices of `240.0.0.0/4` for pod IP ranges, and then enabling SNAT if you need to talk to networks outside of Google Cloud](https://cloud.google.com/blog/products/containers-kubernetes/best-practices-for-kubernetes-pod-ip-allocation-in-gke).
+
+
+Question and Answer
+===================
+
+## Why Gateway instead of Ingress?
+
+[GKE assigns a separate IP address to each `Ingress`](https://cloud.google.com/kubernetes-engine/docs/concepts/ingress#limitations), but a single `Gateway` can have one IP address shared by any number of `HTTPRoute` resources. This is a design choice in GKE, not a limitation of Kubernetes.
+
+Clean Up
+========
+Just like we did a 2-stage apply by toggling the `cluster_exists` variable, we will need to do a 2-stage destroy. First we tear down any kubernetes resources by running *apply* with the `cluster_exists` variable set to `false`. Then we can destroy the entire project.
+
+```
+terraform apply -var dns_root="k8sdemo.mydomain.example." -var quota_email="MrManager@mydomain.example" -var quota_justification="Explain why you need quotas increased here." -var cluster_exists=false
+terraform destroy -var dns_root="k8sdemo.mydomain.example." -var quota_email="MrManager@mydomain.example" -var quota_justification="Explain why you need quotas increased here." -var cluster_exists=false
+```
diff --git a/terraform/modules/k8s_workload/ip_masq.tf b/terraform/modules/k8s_workload/ip_masq.tf
index e4061d8..230c5a9 100644
--- a/terraform/modules/k8s_workload/ip_masq.tf
+++ b/terraform/modules/k8s_workload/ip_masq.tf
@@ -7,7 +7,8 @@ resource "kubernetes_config_map" "ip_masq_agent" {
   }

   data = {
-    config = "nonMasqueradeCIDRs:\n - 100.64.0.0/19\n - 240.10.0.0/17\nmasqLinkLocal: false\nresyncInterval: 60s\n"
+    # nonMasqueradeCIDRs must include pod and node IP address ranges : https://cloud.google.com/kubernetes-engine/docs/how-to/ip-masquerade-agent
+    config = "nonMasqueradeCIDRs:\n - 10.10.10.0/26\n - 240.10.0.0/17\nmasqLinkLocal: false\nresyncInterval: 60s\n"
   }

   depends_on = [var.node_pool]
diff --git a/terraform/user_machine.tf b/terraform/user_machine.tf
index c307ca3..d8111e9 100644
--- a/terraform/user_machine.tf
+++ b/terraform/user_machine.tf
@@ -78,6 +78,7 @@ resource "google_dns_record_set" "user_machine" {
 }

 resource "google_compute_firewall" "allow_python_http" {
+  # This is for demoing SNAT, not needed for production.
   project = google_project.project.project_id
   name    = "allow-python-http"
   network = google_compute_network.default.id
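The address plan in this patch (the README table plus the `nonMasqueradeCIDRs` change in `ip_masq.tf`) can be sanity-checked with a few lines of Python using only the standard-library `ipaddress` module. This is a minimal illustrative sketch, not part of the repository. The CIDRs are copied from the README table and `ip_masq.tf`. The expected parent blocks (RFC 1918 for nodes, `100.64.0.0/10` for services, `240.0.0.0/4` for pods) follow the Google blog post linked in the README. The `/24`-per-node pod allocation is an assumption based on GKE's default of 110 maximum pods per node. Function names such as `plan_problems` are hypothetical.

```python
import ipaddress
import itertools

# CIDRs copied from the README table and ip_masq.tf in this patch.
PLAN = {
    "node": ipaddress.ip_network("10.10.10.0/26"),
    "service": ipaddress.ip_network("100.64.0.0/19"),
    "pod": ipaddress.ip_network("240.10.0.0/17"),
    "envoy_proxy": ipaddress.ip_network("100.64.96.0/24"),
}

# Assumed parent blocks: RFC 1918 for nodes, 100.64.0.0/10 for services,
# 240.0.0.0/4 for pods (per the Google blog post linked in the README).
RFC1918 = [ipaddress.ip_network(c) for c in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]
EXPECTED_PARENTS = {
    "node": RFC1918,
    "service": [ipaddress.ip_network("100.64.0.0/10")],
    "pod": [ipaddress.ip_network("240.0.0.0/4")],
}


def covered(net, parents):
    """Return True if `net` lies entirely inside one of `parents`."""
    return any(net.subnet_of(parent) for parent in parents)


def plan_problems(plan):
    """List ranges that sit outside their expected block or overlap each other."""
    problems = []
    for name, parents in EXPECTED_PARENTS.items():
        if not covered(plan[name], parents):
            problems.append(f"{name} range {plan[name]} is outside its expected block")
    # GKE's flat network model does not allow reuse, so every pair must be disjoint.
    for (name_a, net_a), (name_b, net_b) in itertools.combinations(plan.items(), 2):
        if net_a.overlaps(net_b):
            problems.append(f"{name_a} {net_a} overlaps {name_b} {net_b}")
    return problems


def missing_from_non_masq(plan, non_masq_cidrs):
    """Return the node/pod ranges that a nonMasqueradeCIDRs list fails to cover."""
    nets = [ipaddress.ip_network(c) for c in non_masq_cidrs]
    return [name for name in ("node", "pod") if not covered(plan[name], nets)]


def max_nodes(pod_range, per_node_prefix=24):
    """Nodes the pod range can serve if each node gets one /per_node_prefix block (assumed /24)."""
    return 2 ** (per_node_prefix - pod_range.prefixlen)


def usable_subnet_addresses(subnet):
    """Google Cloud reserves four addresses in every primary subnet range."""
    return subnet.num_addresses - 4


if __name__ == "__main__":
    print(plan_problems(PLAN))  # []
    # nonMasqueradeCIDRs before and after this patch.
    print(missing_from_non_masq(PLAN, ["100.64.0.0/19", "240.10.0.0/17"]))  # ['node']
    print(missing_from_non_masq(PLAN, ["10.10.10.0/26", "240.10.0.0/17"]))  # []
    print(max_nodes(PLAN["pod"]), usable_subnet_addresses(PLAN["node"]))  # 128 60
```

Run against the pre-patch `nonMasqueradeCIDRs`, the check reports the node range as missing, which is the gap the new comment in `ip_masq.tf` points at; with the post-patch list nothing is reported.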