Defining infrastructure in code makes it easy to build and track, but code implies logic and change; infrastructure is more appropriately defined as data. Translated from: Shifting from Infrastructure as Code to Infrastructure as Data[1]
Everybody wants IaC
I’ve always been fascinated by the idea of IaC (Infrastructure as Code). Back in 2014, my role was to help the team install new versions of vendor products, which meant not only installing new software but also provisioning new hardware. There was no cloud computing (at least not in the organization I worked for), so we had to provision new virtual machines through a very painful manual process, with lots of documentation and approvals. Although we managed to provision the virtual machines and install the software, there were many technical and political hurdles along the way.
That’s when I thought, “This is really fun! (Uh... except maybe the politics.) If I could find a job where I could provision hardware as well as apply my software skills, that would be great.” I felt like I was on to something… 😉
Fast forward to 2021. In the cloud age, it is impossible to imagine provisioning and managing IT infrastructure (networks, virtual machines, Kubernetes clusters, load balancers, etc.) without programming. We have entered the age of Infrastructure as Code (IaC).
Creating infrastructure out of thin air?
Most importantly, we need to understand that infrastructure does not materialize out of thin air when we use IaC. Let’s dig a little deeper…
Most people are familiar with public clouds, which include Amazon’s AWS[2], Google’s GCP[3], Microsoft’s Azure[4], Oracle’s OCI[5], IBM Cloud[6], and so on. Underlying these public clouds is a vast network of data centers across the globe, with mountains of hardware, all managed by the cloud providers’ own frameworks.
While cloud infrastructure may be thought of as infinite, private clouds in fact have only limited physical infrastructure. There are two types of private cloud: internal and hosted. Internal private clouds are hosted in an organization’s own machine room or data center, while hosted private clouds are owned and operated by third-party service providers. Hosted private clouds can be single-tenant (dedicated to one company’s data center) or multi-tenant (hosting the data centers of multiple companies). Frameworks such as OpenStack[7], Apache CloudStack[8], Azure Stack[9], and IBM Cloud Private[10] can be used to manage private clouds.
When we provision infrastructure in a public or private cloud, we typically do so through APIs that talk to the data center management framework, which allocates virtual resources (such as virtual disks, virtual networks, and virtual machines) to us from the available pool of physical resources.
Best practices
When provisioning cloud infrastructure, we should always follow these key practices to keep ourselves happy and stress-free:
1. Ephemerality
Cloud infrastructure should be short-lived. If you need to change the infrastructure, tear it down and rebuild it. If you are still making changes by hand, that is all the more reason to do so. Update the configuration files instead, so that changes are under version control and can all be tracked.
Never be afraid to create and destroy infrastructure many times over. If the infrastructure is well defined, it will behave the same way every time it is recreated, so why be afraid?
Note: Infrastructure should be immutable, and stateless components should be short-lived, but stateful components are a different matter. Databases and event hubs cannot simply be torn down and re-provisioned at will. The process should still be fully automated, but re-provisioning a database or an event hub such as Kafka can be very complex and disruptive.
2. Versioning
Always version your infrastructure definitions so that resources can be easily recreated as needed.
3. Simplicity
Once the infrastructure automation process becomes too complex and you find yourself trying to fit a square peg into a round hole, stop immediately. For example, if you find yourself working around the limits of an API or an infrastructure provisioning tool, it’s time to step back and rethink your strategy.
Infrastructure as Data
Best practices aside, there is one thing we have yet to address about provisioning cloud infrastructure. When we write code for cloud infrastructure, are we really writing code? The answer is no. 😱
Let’s go back to where we started, when I had to provision infrastructure the old-fashioned way by handing a set of specs or checklists to a team that could fulfill my requirements. We do the same thing with cloud-based infrastructure, except that our specifications are now delivered to and fulfilled by APIs rather than by a person. You don’t care how the request is fulfilled, only that it is fulfilled correctly.
For example, suppose you want to create a Kubernetes cluster with five nodes. The workload on the cluster is memory intensive, so you want nodes with high RAM and medium compute power. When you ask Google for a GKE cluster, you just tell it to start five RAM-heavy compute nodes. You don’t care what happens behind the scenes, only that you end up with the cluster you asked for.
Therefore, all we need to do is describe the resources we want to create, which means we should use some common format like JSON or YAML to describe them. What are JSON and YAML? They are just plain text representing data. 🤯
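To make "plain text representing data" concrete, here is a minimal sketch that writes the five-node, memory-heavy cluster from the example above as a plain Python dictionary and serializes it to JSON. The field names (kind, name, node_count, node_profile) are hypothetical and chosen for illustration only; they do not match any cloud provider's real schema.

```python
import json

# Hypothetical description of the cluster from the example above.
# Field names are illustrative only, not a real GKE or Kubernetes schema.
cluster_spec = {
    "kind": "KubernetesCluster",
    "name": "demo-cluster",
    "node_count": 5,
    "node_profile": {"memory": "high", "cpu": "medium"},
}


def render(spec: dict) -> str:
    """Serialize the description to JSON: plain text that represents data."""
    return json.dumps(spec, indent=2, sort_keys=True)


def load(text: str) -> dict:
    """Parse the JSON text back into the same description."""
    return json.loads(text)


if __name__ == "__main__":
    print(render(cluster_spec))
```

Note that nothing in the description says how the cluster gets built. It contains no loops, no branches, and no API calls, which is exactly the point.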
This concept is not new. In fact, Ansible founder Michael DeHaan[11] said as much in a 2013 blog post[12]:
“… Infrastructure is best modeled not in code or GUI, but in a text-based, neutral, data-driven strategy.”
Later in the article, he coined the term “IaD” (Infrastructure as Data) to describe this concept.
IaD is a declarative approach to infrastructure: you say what you want without specifying the exact actions or steps needed to achieve it. This is the idea behind Kubernetes controllers[13], many CI/CD tools (such as GitHub Actions[14]), and of course Ansible[15].
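One way to picture the declarative idea is as a tool that compares the data you wrote with what currently exists and works out the steps on its own. The sketch below is a toy under stated assumptions: resources are modeled as a name-to-attributes mapping, and the function name plan and the action names are made up for illustration. It is not how Kubernetes, Ansible, or any other real tool is implemented.

```python
from typing import Dict, List, Tuple

# Resource name -> attributes. This shape is a hypothetical simplification.
State = Dict[str, dict]


def plan(desired: State, actual: State) -> List[Tuple[str, str]]:
    """Return the (action, resource_name) steps needed to reach `desired`."""
    steps = []
    for name in sorted(desired):
        if name not in actual:
            steps.append(("create", name))
        elif actual[name] != desired[name]:
            steps.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            steps.append(("delete", name))
    return steps


if __name__ == "__main__":
    desired = {"vm-a": {"size": "large"}, "vm-b": {"size": "small"}}
    actual = {"vm-a": {"size": "small"}, "vm-c": {"size": "small"}}
    for action, name in plan(desired, actual):
        print(action, name)
```

The user only ever edits the desired data. Deciding whether that means a create, an update, or a delete is the tool's job.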
Who cares?
By this point, you might be wondering why I’m spending so much energy trying to convince you that IaD is a better paradigm than IaC. Well, it’s normal to think that.
Honestly, it comes down to one word: simplicity. I’ve spent enough time provisioning cloud infrastructure on GCP and Azure to tell you that things get very complicated once you start scaling your infrastructure.
Of course you want to provision and manage infrastructure in a clean, structured way. Dude, that’s what JSON is for. But writing code to do it? Honestly, that makes no sense: infrastructure is static, and it is not an application.
Treating infrastructure as code opens the door to technical debt. Remember, not all code is created equal, and bad code can make your life a living hell.
It also introduces unnecessary complexity. Why do you need to manage a bunch of code to define the infrastructure when all you really need to do is describe it?
Provisioning cloud infrastructure
Let’s set IaD aside for a moment and talk about provisioning cloud infrastructure. I promise it will all make sense. Please bear with me.
There are many ways to provision cloud infrastructure. With some, our own code runs to do the provisioning; with others, a framework interprets our infrastructure definitions. Let’s take a quick look at the different approaches.
Terraform
Terraform[17] from HashiCorp[16] is a very popular platform-neutral tool for provisioning infrastructure on a variety of public and private cloud frameworks. It uses its own JSON-like language, HCL (HashiCorp Configuration Language)[18], to define infrastructure, and supports some fairly rudimentary loops, conditionals, and variables. Terraform relies on a JSON state file[19] to keep track of the infrastructure it has created, which can be thought of as a structured log.
Pulumi
Pulumi[21] was founded by several former Microsoft employees[20] to go after the large Terraform market. It is also platform-neutral and borrows many of Terraform’s concepts, including state files. The main difference is that Pulumi lets you create infrastructure using regular, general-purpose programming languages. Pulumi currently supports TypeScript, JavaScript, Python, Go, and C#.
Note: Terraform recently hit back at Pulumi by launching the Pulumi-style Terraform CDK[22].
Ansible
Although Ansible began as a configuration management tool, it can now be used to configure and manage cloud infrastructure as well [23]. Ansible uses YAML to define infrastructure. Unlike Terraform and Pulumi, Ansible is stateless.
Crossplane
Crossplane[24] is a cloud-neutral tool that runs on Kubernetes and provisions cloud resources outside of Kubernetes. (I know… this can be confusing!) Because it is native to Kubernetes, it is declarative and uses YAML to describe the infrastructure being provisioned. Crossplane is still fairly new, so while it supports many AWS resources[25], support for other cloud services such as Azure[26] and GCP[27] is relatively limited. I wrote an article[28] documenting my exploration of provisioning a GCP cluster with Crossplane.
Command-line tools (cloud provider CLIs)
It is worth mentioning that cloud providers’ command-line tools are another popular way to provision infrastructure. By “cloud provider command-line tools” I mean commands like Azure’s az[29] and Google Cloud Platform’s gcloud[30]. Other cloud providers have similar tools. They all have one thing in common: they provide a way to interact with the cloud provider’s APIs to create and manage resources. With this approach, we need to write wrapper code or scripts that make the various command-line calls to provision the infrastructure.
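As a rough sketch of what such wrapper code tends to look like, the function below builds a gcloud command line for the five-node cluster discussed earlier. It only constructs the argument list and does not run anything. The function name, cluster name, machine type, and zone are assumptions picked for illustration.

```python
from typing import List


def gke_create_command(name: str, nodes: int, machine_type: str, zone: str) -> List[str]:
    """Build (but do not run) a gcloud command line for creating a GKE cluster."""
    return [
        "gcloud", "container", "clusters", "create", name,
        f"--num-nodes={nodes}",
        f"--machine-type={machine_type}",
        f"--zone={zone}",
    ]


if __name__ == "__main__":
    # Illustrative values only: a memory-optimized machine type in one zone.
    cmd = gke_create_command("demo-cluster", 5, "e2-highmem-4", "us-central1-a")
    print(" ".join(cmd))
```

To actually provision the cluster, a script would hand the list to something like subprocess.run(cmd, check=True). Every such call is an imperative step, and the script has to decide on its own what to do if the cluster already exists.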
Which tool is the best?
Of all the tools listed above, I think only two truly treat infrastructure as data and follow the best practices for cloud infrastructure: Ansible and Crossplane.
Why is that?
Terraform (declarative) and Pulumi (code-based) both use state files, and I realize that I’m probably one of the few people who has a problem with this. State files exist to keep track of which resources Terraform has created, so that they are not recreated unnecessarily or by accident. While this is a noble idea, I find that it violates the short-lifecycle principle of cloud infrastructure: we should just recreate it.
In addition, suppose we start by creating three cloud resources using Terraform and later add a fourth. Because the first three resources are in the state file, they are left as they are and only the new resource is created. But how do you know that these four resources play well together? You don’t, unless you delete all four and recreate them from scratch.
Finally, suppose we create a cloud resource using Terraform and then modify it through the cloud provider’s admin console or CLI. Guess what? Terraform has no idea that anything happened. The state file is no longer in sync with the actual state of the resource. You’re in trouble!
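To illustrate the out-of-sync scenario, here is a toy comparison between what a state file recorded and what actually exists in the cloud. Everything here is a simplifying assumption: the data shapes, the function name find_drift, and the attribute names are invented for this sketch and have nothing to do with Terraform's real state format or commands.

```python
from typing import Dict, List


def find_drift(recorded: Dict[str, dict], live: Dict[str, dict]) -> List[str]:
    """List human-readable differences between recorded state and live resources."""
    drift = []
    for name, attrs in sorted(recorded.items()):
        if name not in live:
            drift.append(f"{name}: missing from cloud")
            continue
        for key in sorted(attrs):
            live_value = live[name].get(key)
            if live_value != attrs[key]:
                drift.append(f"{name}.{key}: recorded {attrs[key]!r}, live {live_value!r}")
    return drift


if __name__ == "__main__":
    # Someone resized vm-a by hand in the admin console after it was created.
    recorded = {"vm-a": {"size": "small", "disk_gb": 50}}
    live = {"vm-a": {"size": "large", "disk_gb": 50}}
    print(find_drift(recorded, live))
```

Any tool that keeps its own record of the world needs some check along these lines, and the record is only trustworthy right after that check has run.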
As a matter of personal preference, I don’t like Terraform’s HCL. I find HCL code very hard to read. (Sorry, Hashi fans!) I also don’t like Terraform’s attempts at control flow and loops. They make me dizzy, and I hate them. 🤢
As a software engineer, I find Pulumi more appealing, and I would pick it over Terraform any day. From an SRE’s point of view, however, it is not declarative by design. Programming languages are great, but they can lead to overly complex automation. Remember: there is such a thing as bad code, and bad code leads to technical debt.
Cloud provider CLIs are stateless, which I like, but they are not declarative either. In addition, to be useful they need to be wrapped in code or scripts. Pass.
That leaves us with Ansible and Crossplane. Ansible is declarative and stateless by design, and it uses YAML, which is very easy to read, even easier than JSON. Win! I was also surprised at how easy it is to create cloud infrastructure with Ansible using the libraries provided for the various cloud providers.
Crossplane, being Kubernetes-native, is also declarative, but it is not completely stateless. Because Crossplane runs on Kubernetes and uses etcd[31] (Kubernetes’ distributed key-value store), every change we make to a Crossplane infrastructure definition is recorded in etcd. You might think that this makes it behave just like a Terraform/Pulumi state file, but there is a difference. Crossplane’s blog[32] states: “Whether or not the changes are anticipated, it continuously observes and corrects the organization’s infrastructure to fit the desired configuration.” I really love that!
Conclusion
Whew, that was a lot to take in! Let’s review what we’ve covered:
- Resources created on the cloud (public or private) are not created out of thin air, but are virtual resources allocated from existing (limited) physical resources.
- Cloud-based infrastructure should be transient, simple, and version-controlled.
- It’s better to think of infrastructure as data, not code.
- Infrastructure as Data (IaD) is not a new concept — it dates back to at least 2013!
- Most cloud configuration tools are not suitable for IaD. Only Ansible and Crossplane meet the requirements.
And finally, a picture of Susie the mouse.
Peace, love, and code.
Related articles:
- Using Ansible’s GCP Library to Provision a Kubernetes Cluster in Google Cloud: medium.com/dzerolabs/u…
- Using Crossplane to Provision a Kubernetes Cluster in Google Cloud: medium.com/dzerolabs/u…
Read more:
- Product Silos Using the Power of Infrastructure as Data in Kubernetes: www.redhat.com/en/blog/bre…
- I do declare! Infrastructure automation with Configuration as Data: cloud.google.com/blog/produc…
- The Rise of Infrastructure as Data: radar.oreilly.com/2013/08/the…
References:
[1] Shifting from Infrastructure as Code to Infrastructure as Data: medium.com/dzerolabs/s…
[2] AWS: aws.amazon.com/
[3] GCP: cloud.google.com/
[4] Azure: azure.microsoft.com/en-ca/
[5] OCI: www.oracle.com/ca-en/cloud…
[6] IBM Cloud: www.ibm.com/cloud
[7] OpenStack: www.openstack.org/
[8] Apache CloudStack: cloudstack.apache.org/
[9] Azure Stack: azure.microsoft.com/en-ca/overv…
[10] IBM Cloud Private: www.ibm.com/blogs/cloud…
[11] Ansible: en.wikipedia.org/wiki/Ansibl…
[12] The Rise of Infrastructure as Data: radar.oreilly.com/2013/08/the…
[13] Just-in-time Kubernetes: A Beginner’s Guide to Understanding Kubernetes Core Concepts: medium.com/dzerolabs/j…
[14] Workflow syntax for GitHub Actions: docs.github.com/en/actions/…
[15] Ansible: www.ansible.com/
[16] HashiCorp: www.hashicorp.com/
[17] Terraform: www.terraform.io/
[18] HashiCorp Configuration Language: github.com/hashicorp/h…
[19] The Terraform State File: An Overview: www.infrastructurecode.io/blog/the-te…
[20] Former Microsoft Midori team members launch Pulumi, an open-source cloud development company: www.zdnet.com/article/for…
[21] Pulumi: www.pulumi.com/
[22] Terraform CDK: github.com/hashicorp/t…
[23] Cloud Support with Ansible: www.ansible.com/integration…
[24] Crossplane: crossplane.io/
[25] crossplane/provider-aws: doc.crds.dev/github.com/…
[26] crossplane/provider-azure: doc.crds.dev/github.com/…
[27] crossplane/provider-gcp: doc.crds.dev/github.com/…
[28] Using Crossplane to Provision a Kubernetes Cluster in Google Cloud: medium.com/dzerolabs/u…
[29] Azure Command-Line Interface (CLI) documentation: docs.microsoft.com/en-us/cli/a…
[30] gcloud tool overview: cloud.google.com/sdk/gcloud
[31] About etcd: The data backbone of Kubernetes: medium.com/pradpoddar/…
[32] Crossplane vs Terraform: blog.crossplane.io/crossplane-…
Hello, my name is Yu Fan. I used to do R&D at Motorola, and I now work in a technical role at Mavenir. I have always been interested in communications, networking, back-end architecture, cloud native, DevOps, CI/CD, blockchain, AI, and other technologies. My official WeChat account is DeepNoMind.