TL;DR: For some time we have had a stable, working infrastructure supporting our services. It has been in place for almost a year now, though, and we felt it was time to give it a refresh and a revisit. Upgrading the various components we use, such as Terraform, Helm, and EKS, brings plenty of benefits. Once we got the ball rolling, however, we hit some surprises that turned into valuable lessons, including how we managed to keep reusing our persistence layer (a Kinesis Stream) on the newly revamped infrastructure. That gave us the best of both worlds: we retained all previously persisted data while enjoying the benefits of the newer versions.
(8 mins read)
As part of the Modernization program, our team was tasked with establishing a data-ingestion service. Everything has been working smoothly, and things flow through nicely in both our test and production environments. That said, the infrastructure we built is almost a year old, and we wanted to take this opportunity to refresh it and upgrade to the most recent stable versions.
WHY
The first question: if it ain't broke, why upgrade? This can be a challenging question to answer. First and foremost, we understood there were many benefits to be gained from upgrading our existing infrastructure in terms of performance, efficiency, maintainability, and other areas. For instance, Terraform v0.12 gives us improved error messages, first-class expressions, rich value types, and more. Upgrading to Helm 3 rewards us with a cleaner, simpler, Tiller-free environment that increases the security of the cluster, plus distributed repositories and Helm Hub, improved Helm tests, JSON schema validation, and better command-line syntax. Last but not least, staying up to date is good security practice.
WHAT WAS UPGRADED
Initially, we were looking at just upgrading to Terraform 0.12 and Helm 3. However, since we were already performing surgery on our infrastructure, we decided it was best to take the chance to upgrade other components as well:
| Component | Old Version | New Version |
|---|---|---|
| PROMETHEUS OPERATOR | 5.0.3 (chart) | 8.12.10 (chart) |
| PROMETHEUS ADAPTER | 1.2.0 (chart), v0.5.0 (image tag) | 2.2.0 (chart), v0.6.0 (image tag) |
| PROMETHEUS PUSHGATEWAY | 0.4.0 (chart), v1.1.2 (image tag) | 1.4.0 (chart), v1.2.0 (image tag) |
| AWS ALB INGRESS | 0.1.10 (chart), v1.1.2 (image tag) | 0.1.14 (chart), v1.1.6 (image tag) |
| EXTERNAL DNS | 2.0.2 (chart) | 2.20.4 (chart) |
| METRICS SERVER | 2.8.2 (chart) | 2.10.2 (chart) |
| VAULT SECRET WEBHOOK | 0.4.3 (chart) | 1.0.1 (chart) |
SURPRISES & LESSONS LEARNT
String interpolation syntax has changed for the better and is clearer: v0.12 supports first-class expressions, so references no longer need to be wrapped in `"${...}"`. For example:

| v0.11 | v0.12 |
|---|---|
| `instance_type = "${var.instance_type}"` | `instance_type = var.instance_type` |
Map syntax changes and becomes cleaner: v0.12 makes the distinction between arguments and blocks explicit, so assigning a map to an argument now requires an `=`. For example:

| v0.11 | v0.12 |
|---|---|
| `tags { Name = "data-ingestion" }` | `tags = { Name = "data-ingestion" }` |
The outputted JSON also changes slightly between versions, so when using jq to process this output, our filters needed a bit of modification.
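As an illustration (the exact filters from our scripts are not shown here, and the sample payloads below are assumptions based on the documented Terraform formats), v0.12 represents complex output types in `terraform output -json` as structured JSON (e.g. `["list", "string"]`) where v0.11 used plain strings (e.g. `"list"`). Extracting `.value` keeps working, but anything that inspects the `type` field needs updating:

```python
import json

# Hypothetical `terraform output -json` payloads; the output name and values
# are made up, but the shapes follow the v0.11 vs v0.12 conventions.
v011 = json.loads('{"subnets": {"sensitive": false, "type": "list", "value": ["a", "b"]}}')
v012 = json.loads('{"subnets": {"sensitive": false, "type": ["list", "string"], "value": ["a", "b"]}}')

# Extracting the value (jq: '.subnets.value') is unchanged across versions...
assert v011["subnets"]["value"] == v012["subnets"]["value"] == ["a", "b"]

# ...but a filter matching on the type string (jq: 'select(.type == "list")')
# must now handle a structured type instead of a plain string.
print(v011["subnets"]["type"])  # prints: list
print(v012["subnets"]["type"])  # prints: ['list', 'string']
```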
For example, in our Makefile we look up the existing ClusterIP of the service (if it already exists) and feed it back into `helm upgrade`, so the service keeps its address across the upgrade:

if (kubectl get services $(DIFF_SERVICE_RELEASE_NAME)-rethinkdb-proxy -o jsonpath="{.spec.clusterIP}") > /dev/null; then \
DIFF_RETHINKDB_PROXY_CLUSTERIP=$$(kubectl get services $(DIFF_SERVICE_RELEASE_NAME)-rethinkdb-proxy -o jsonpath="{.spec.clusterIP}"); \
else \
DIFF_RETHINKDB_PROXY_CLUSTERIP=""; \
fi && \
helm upgrade $(DIFF_SERVICE_RELEASE_NAME) $(DIFF_SERVICE_CHART_PATH) \
--set crs.rethinkdb.proxy.service.clusterIP=$${DIFF_RETHINKDB_PROXY_CLUSTERIP} \
. . .
. . .
To migrate our existing releases, we use the `2to3` plugin to move the Helm 2 configuration, convert each release, and clean up:

migrate-helm2-to-helm3:
helm3 2to3 move config && \
helmCharts=$$(helm list -aq --output json | jq '') && \
for helmChart in $${helmCharts}; do \
helmChart=$${helmChart%\"};helmChart=$${helmChart#\"};helmChart=$${helmChart%\",}; \
if [ "$$helmChart" != "[" ] && [ "$$helmChart" != "]" ]; then \
helm3 2to3 convert $$helmChart; \
fi; \
done && \
helm3 2to3 cleanup
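The quote- and bracket-stripping in the loop above suggests that `helm list -aq --output json` returns a plain JSON array of release names, which the shell then tokenizes word by word. A small sketch of the same extraction in Python makes the intent clearer (the release names here are hypothetical):

```python
import json

# Hypothetical output of `helm list -aq --output json`: a JSON array of
# release names, inferred from the stripping logic in the Makefile recipe.
raw = '["data-ingestion", "prometheus-operator", "external-dns"]'

# json.loads replaces the manual stripping of quotes, commas, and brackets.
releases = json.loads(raw)
for release in releases:
    print(release)
```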
if kubectl get namespace monitoring > /dev/null ; \
then echo "Namespace 'monitoring' exists"; \
else \
kubectl create namespace monitoring; \
echo "Namespace 'monitoring' has been created"; \
fi
helm repo remove local
helm repo add stable https://kubernetes-charts.storage.googleapis.com
The timeout value needs to be suffixed with a time unit, since Helm 3 parses `--timeout` as a duration rather than a number of seconds:

| Helm 2 | Helm 3 |
|---|---|
| `--timeout 300` | `--timeout 300s` |
if kubectl get namespace vswh > /dev/null ; \
then echo "Namespace 'vswh' exists"; \
else \
kubectl create namespace vswh; \
kubectl label ns vswh name="vswh"; \
echo "Namespace 'vswh' has been created and label is set"; \
fi
Since the Prometheus Operator CRDs from the previous installation were still present in the cluster, we also disabled CRD creation in the chart values:

prometheusOperator:
  enabled: true
  createCustomResource: false
FURTHER THOUGHTS