Author: admin

Mastering Podman: A Comprehensive Guide with Detailed Command Examples

Mastering Podman on Ubuntu: A Comprehensive Guide with Detailed Command Examples

Podman has become a popular alternative to Docker due to its flexibility, security, and rootless operation capabilities. This guide will walk you through the installation process and various advanced usage scenarios of Podman on Ubuntu, providing detailed examples for each command.

Table of Contents

.

1. How to Install Podman

To get started with Podman on Ubuntu, follow these steps:

Update Package Index

Before installing any new software, it’s a good idea to update your package index to ensure you’re getting the latest version of Podman:

.

.

sudo apt update

Install Podman

With your package index updated, you can now install Podman. This command will download and install Podman and any necessary dependencies:

.

.

sudo apt install podman -y

Example Output:

kotlin

.

Reading package lists… Done

Building dependency tree

Reading state information… Done

The following additional packages will be installed:

After this operation, X MB of additional disk space will be used.

Do you want to continue? [Y/n] y

Setting up podman (4.0.2) …

Verifying Installation

After installation, verify that Podman is installed correctly:

.

.

podman –version

Example Output:

.

podman version 4.0.2

2. How to Search for Images

Before running a container, you may need to find an appropriate image. Podman allows you to search for images in various registries.

Search Docker Hub

To search for images on Docker Hub:

.

.

podman search ubuntu

Example Output:

lua

.

INDEX NAME DESCRIPTION STARS OFFICIAL AUTOMATED

docker.io docker.io/library/ubuntu Ubuntu is a Debian-based Linux operating sys… 12329 [OK]

docker.io docker.io/ubuntu-upstart Upstart is an event-based replacement for the … 108 [OK]

docker.io docker.io/tutum/ubuntu Ubuntu image with SSH access. For the root p… 39

docker.io docker.io/ansible/ubuntu14.04-ansible Ubuntu 14.04 LTS with ansible 9 [OK]

This command will return a list of Ubuntu images available in Docker Hub.

3. How to Run Rootless Containers

One of the key features of Podman is the ability to run containers without needing root privileges, enhancing security.

Running a Rootless Container

As a non-root user, you can run a container like this:

.

.

podman run –rm -it ubuntu

Example Output:

ruby

.

root@d2f56a8d1234:/#

This command runs an Ubuntu container in an interactive shell, without requiring root access on the host system.

Configuring Rootless Environment

Ensure your user is added to the subuid and subgid files for proper UID/GID mapping:

.

.

echo “$USER:100000:65536” | sudo tee -a /etc/subuid /etc/subgid

Example Output:

makefile

.

user:100000:65536

user:100000:65536

4. How to Search for Containers

Once you start using containers, you may need to find specific ones.

Listing All Containers

To list all containers (both running and stopped):

.

.

podman ps -a

Example Output:

.

.

CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES

d13c5bcf30fd docker.io/library/ubuntu:latest 3 minutes ago Exited (0) 2 minutes ago confident_mayer

Filtering Containers

You can filter containers by their status, names, or other attributes. For instance, to find running containers:

.

.

podman ps –filter status=running

Example Output:

.

CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES

No output indicates there are no running containers at the moment.

5. How to Add Ping to Containers

Some minimal Ubuntu images don’t come with ping installed. Here’s how to add it.

Installing Ping in an Ubuntu Container

First, start an Ubuntu container:

.

.

podman run -it –cap-add=CAP_NET_RAW ubuntu

Inside the container, install ping (part of the iputils-ping package):

.

.

apt update

apt install iputils-ping

Example Output:

mathematica

.

Get:1 http://archive.ubuntu.com/ubuntu focal InRelease [265 kB]

Setting up iputils-ping (3:20190709-3) …

Now you can use ping within the container.

6. How to Expose Ports

Exposing ports is crucial for running services that need to be accessible from outside the container.

Exposing a Port

To expose a port, use the -p flag with the podman run command:

.

.

podman run -d -p 8080:80 ubuntu -c “apt update && apt install -y nginx && nginx -g ‘daemon off;'”

Example Output:

.

54c11dff6a8d9b6f896028f2857c6d74bda60f61ff178165e041e5e2cb0c51c8

This command runs an Ubuntu container, installs Nginx, and exposes port 80 in the container as port 8080 on the host.

Exposing Multiple Ports

You can expose multiple ports by specifying additional -p flags:

.

.

podman run -d -p 8080:80 -p 443:443 ubuntu -c “apt update && apt install -y nginx && nginx -g ‘daemon off;'”

Example Output:

wasm

.

b67f7d89253a4e8f0b5f64dcb9f2f1d542973fbbce73e7cdd6729b35e0d1125c

7. How to Create a Network

Creating a custom network allows you to isolate containers and manage their communication.

Creating a Network

To create a new network:

.

.

podman network create mynetwork

Example Output:

.

mynetwork

This command creates a new network named mynetwork.

Running a Container on a Custom Network

.

.

podman run -d –network mynetwork ubuntu -c “apt update && apt install -y nginx && nginx -g ‘daemon off;'”

Example Output:

.

1e0d2fdb110c8e3b6f2f4f5462d1c9b99e9c47db2b16da6b2de1e4d9275c2a50

This container will now communicate with others on the mynetwork network.

8. How to Connect a Network Between Pods

Podman allows you to manage pods, which are groups of containers sharing the same network namespace.

Creating a Pod and Adding Containers

.

.

podman pod create mypod

podman run -dt –pod mypod ubuntu -c “apt update && apt install -y nginx && nginx -g ‘daemon off;'”

podman run -dt –pod mypod ubuntu -c “apt update && apt install -y redis-server && redis-server”

Example Output:

.

f04d1c28b030f24f3f7b91f9f68d07fe1e6a2d81caeb60c356c64b3f7f7412c7

8cf540eb8e1b0566c65886c684017d5367f2a167d82d7b3b8c3496cbd763d447

4f3402b31e20a07f545dbf69cb4e1f61290591df124bdaf736de64bc3d40d4b1

Both containers now share the same network namespace and can communicate over the mypod network.

Connecting Pods to a Network

To connect a pod to an existing network:

.

.

podman pod create –network mynetwork mypod

Example Output:

.

f04d1c28b030f24f3f7b91f9f68d07fe1e6a2d81caeb60c356c64b3f7f7412c7

This pod will use the mynetwork network, allowing communication with other containers on that network.

9. How to Inspect a Network

Inspecting a network provides detailed information about the network configuration and connected containers.

Inspecting a Network

Use the podman network inspect command:

.

.

podman network inspect mynetwork

Example Output:

json

.

[

{

“name”: “mynetwork”,

“id”: “3c0d6e2eaf3c4f3b98a71c86f7b35d10b9d4f7b749b929a6d758b3f76cd1f8c6”,

“driver”: “bridge”,

“network_interface”: “cni-podman0”,

“created”: “2024-08-12T08:45:24.903716327Z”,

“subnets”: [

{

“subnet”: “10.88.1.0/24”,

“gateway”: “10.88.1.1”

}

],

“ipv6_enabled”: false,

“internal”: false,

“dns_enabled”: true,

“network_dns_servers”: [

“8.8.8.8”

]

}

]

This command will display detailed JSON output, including network interfaces, IP ranges, and connected containers.

10. How to Add a Static Address

Assigning a static IP address can be necessary for consistent network configurations.

Assigning a Static IP

When running a container, you can assign it a static IP address within a custom network:

.

.

podman run -d –network mynetwork –ip 10.88.1.100 ubuntu -c “apt update && apt install -y nginx && nginx -g ‘daemon off;'”

Example Output:

.

f05c2f18e41b4ef3a76a7b2349db20c10d9f2ff09f8c676eb08e9dc92f87c216

Ensure that the IP address is within the subnet range of your custom network.

11. How to Log On to a Container with

Accessing a container’s shell is often necessary for debugging or managing running applications.

Starting a Container with

If the container image includes , you can start it directly:

.

.

podman run -it ubuntu

Example Output:

ruby

.

root@e87b469f2e45:/#

Accessing a Running Container

To access an already running container:

.

.

podman exec -it <container_id>

Replace <container_id> with the actual ID or name of the container.

Example Output:

ruby

.

root@d2f56a8d1234:/#

.

.

What the Hell is Helm? (And Why You Should Care)

What the Hell is Helm?

If you’re tired of managing 10+ YAML files for every app, Helm is your new best friend. It’s basically the package manager for Kubernetes — like apt or brew — but for deploying apps in your cluster.

Instead of editing raw YAML over and over for each environment (dev, staging, prod), Helm lets you template it, inject dynamic values, and install with a single command.


Why Use Helm?

Here’s the reality:

  • You don’t want to maintain 3 sets of YAMLs for each environment.
  • You want to roll back fast if something breaks.
  • You want to reuse deployments across projects without rewriting.

Helm fixes all that. It gives you:

  • Templated YAML (no more copy-paste hell)
  • One chart, many environments
  • Version control + rollback support
  • Easy upgrades with helm upgrade
  • Access to thousands of ready-made charts from the community

Real Talk: What’s a Chart?

Think of a chart like a folder of YAML files with variables in it. You install it, pass in your config (values.yaml), and Helm renders the final manifests and applies them to your cluster.

When you install a chart, Helm creates a release — basically, a named instance of the chart running in your cluster.


How to Get Started (No BS)

1. Install Helm

brew install helm       # mac  
choco install kubernetes-helm  # windows  
sudo snap install helm  # linux  

2. Create Your First Chart

helm create myapp

Boom — you now have a scaffolded chart in a folder with templates, a values file, and everything else you need.


Folder Breakdown

myapp/
├── Chart.yaml         # Metadata
├── values.yaml        # Config you can override
└── templates/         # All your actual Kubernetes YAMLs (as templates)

Example Template (deployment.yaml)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-app
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      app: {{ .Chart.Name }}
  template:
    metadata:
      labels:
        app: {{ .Chart.Name }}
    spec:
      containers:
      - name: {{ .Chart.Name }}
        image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
        ports:
        - containerPort: 80

Example values.yaml

replicaCount: 3
image:
  repository: nginx
  tag: latest

Change the values, re-deploy, and you’re done.


Deploying to Your Cluster

helm install my-release ./myapp

Upgrading later?

helm upgrade my-release ./myapp -f prod-values.yaml

Roll it back?

helm rollback my-release 1

Uninstall it?

helm uninstall my-release

Simple. Clean. Versioned.


Want a Database?

Don’t write your own MySQL config. Just pull it from Bitnami’s chart repo:

helm repo add bitnami https://charts.bitnami.com/bitnami
helm install my-db bitnami/mysql

Done.


Helm lets you:

  • Turn your Kubernetes YAML into reusable templates
  • Manage config per environment without duplicating files
  • Version your deployments and roll back instantly
  • Install apps like MySQL, Redis, etc., with one command

It’s the smart way to scale your Kubernetes setup without losing your mind.

How to renable the tempurl in latest Cpanel

.

As some of you have noticed the new cpanel by default has a bunch of new default settings that nobody likes.

FTPserver is not configured out of the box.

TempURL is disabled for security reasons. Under certain conditions, a user can attack another user’s account if they access a malicious script through a mod_userdir URL.

So they removed it by default.

.

They did not provide instructions for people who need it. You can easily enable it BUT php wont work on the temp url unless you do the following

remove below: and by remove I mean you need to recompile easyapache 4 the following changes.

mod_ruid2
mod_passenger
mod_mpm_itk
mod_proxy_fcgi
mod_fcgid

Install

Mod_suexec
mod_suphp

Then go into Apache_mode_user dir tweak and enable it and exclude default host only.
It
wont save the setting in the portal, but the configuration is updated. If you go back and look it will look like the settings didnt take. Looks like a bug in cpanel they need to fix on their front end.

Then PHP will work again on the tempurl

.

How to integrate VROPS with Ansible

Automating VMware vRealize Operations (vROps) with Ansible

In the world of IT operations, automation is the key to efficiency and consistency. VMware’s vRealize Operations (vROps) provides powerful monitoring and management capabilities for virtualized environments. Integrating vROps with Ansible, an open-source automation tool, can take your infrastructure management to the next level. In this blog post, we’ll explore how to achieve this integration and demonstrate its benefits with a practical example.

What is vRealize Operations (vROps)?

vRealize Operations (vROps) is a comprehensive monitoring and analytics solution from VMware. It helps IT administrators manage the performance, capacity, and overall health of their virtual environments. Key features of vROps include:

 Performance Monitoring: Continuous tracking of VMs, hosts, and other resources.
 Capacity Management: Planning and optimizing resource usage.
 Troubleshooting: Identifying and resolving issues promptly.
 Automated Actions: Responding to specific events with predefined actions.

Why Integrate vROps with Ansible?

Integrating vROps with Ansible allows you to automate routine tasks, enforce consistent configurations, and rapidly respond to changes or issues in your virtual environment. This integration enables you to:

 Automate Monitoring Setup: Configure monitoring for new virtual machines or environments automatically.
 Trigger Remediation Actions: Automate responses to alerts generated by vROps.
 Generate Reports: Automate the creation and distribution of performance and capacity reports.
 Maintain Configuration Compliance: Ensure consistent vROps configurations across environments.

Setting Up the Integration

Prerequisites

Before you start, ensure you have:

1.vROps Environment: A running instance of VMware vRealize Operations.
2.Ansible Installed: Ansible should be installed on your control node.

Step-by-Step Guide

Step 1: Configure API Access in vROps

First, ensure you have the necessary API access in vROps. You’ll need:

 vROps Host: The URL of your vROps instance.
 vROps Username: A user with API access permissions.
 vROps Password: The password for the above user.

Step 2: Install Ansible

If you haven’t installed Ansible yet, you can do so by following these commands:

sh

sudo apt update

sudo apt install ansible

Step 3: Create an Ansible Playbook

Create an Ansible playbook to interact with vROps. Below is an example playbook that retrieves the status of vROps resources.

Note: to use the other api end points you will need to acquire the token and set it as a fact to pass later.

Example

If you want to acquire the auth token:

.

name: Authenticate with vROps and Check vROps Status

  hosts: localhost

  vars:

    vrops_host: “your-vrops-host”

    vrops_username: “your-username”

    vrops_password: “your-password”

  tasks:

    – name: Authenticate with vROps

      uri:

        url: “https://{{ vrops_host }}/suite-api/api/auth/token/acquire”

        method: POST

        body_format: json

        body:

          username: {{ vrops_username }}”

          password: {{ vrops_password }}”

        headers:

          Content-Type: “application/json

        validate_certs: no

      register: auth_response

.

    – name: Fail if authentication failed

      fail:

        msg: “Authentication with vROps failed: {{ auth_response.json }}”

      when: auth_response.status != 200

.

    – name: Set auth token as fact

      set_fact:

        auth_token: {{ auth_response.json.token }}”

.

    – name: Get vROps status

      uri:

        url: “https://{{ vrops_host }}/suite-api/api/resources”

        method: GET

        headers:

          Authorization: vRealizeOpsToken {{ auth_token }}”

          Content-Type: “application/json

        validate_certs: no

      register: vrops_response

.

    – name: Display vROps status

      debug:

        msg: vROps response: {{ vrops_response.json }}”

.

Save this playbook to a file, for example, check_vrops_status.yml.

Step 4: Define Variables

Create a variables file to store your vROps credentials and host information.
Save it as vars.yml:

vrops_host: your-vrops-host

vrops_username: your-username

vrops_password: your-password

Step 5: Run the Playbook

Execute the playbook using the following command:

sh

ansible-playbook -e @vars.yml check_vrops_status.yml

This above command runs the playbook and retrieves the status of vROps resources, displaying the results if you used the first example.

Here are some of the key API functions you can use:

The Authentication to use the endpoints listed below, you will need to acquire the auth token and set it as a fact to pass to other tasks inside ansible to use with the various endpoints below.

 Login: Authenticate and get a session token.
 Endpoint: POST /suite-api/api/auth/token/acquire

Resource Management

 Get Resources: Retrieve a list of resources managed by vROps.
 Endpoint: GET /suite-api/api/resources
 Get Resource by ID: Retrieve details of a specific resource.
 Endpoint: GET /suite-api/api/resources/{resourceId}
 Create Resource: Add a new resource to vROps.
 Endpoint: POST /suite-api/api/resources
 Update Resource: Update information for an existing resource.
 Endpoint: PUT /suite-api/api/resources/{resourceId}
 Delete Resource: Remove a resource from vROps.
 Endpoint: DELETE /suite-api/api/resources/{resourceId}

Metrics and Data

 Get Metrics for a Resource: Retrieve metrics for a specific resource.
 Endpoint: GET /suite-api/api/resources/{resourceId}/stats
 Get Metric Definitions: List available metrics for a resource kind.
 Endpoint: GET /suite-api/api/resources/kind/{resourceKindKey}/statkeys
 Get Historical Metrics: Retrieve historical metric data for a resource.
 Endpoint: GET /suite-api/api/resources/{resourceId}/stats/historical

Alerts and Notifications

 Get Alerts: Retrieve a list of alerts.
 Endpoint: GET /suite-api/api/alerts
 Get Alert by ID: Retrieve details of a specific alert.
 Endpoint: GET /suite-api/api/alerts/{alertId}
 Acknowledge Alert: Acknowledge a specific alert.
 Endpoint: POST /suite-api/api/alerts/{alertId}/acknowledge
 Cancel Alert: Cancel a specific alert.
 Endpoint: POST /suite-api/api/alerts/{alertId}/cancel
 Generate Notifications: Send notifications based on specific conditions.
 Endpoint: POST /suite-api/api/notifications

Policies and Configurations

 Get Policies: Retrieve a list of policies.
 Endpoint: GET /suite-api/api/policies
 Get Policy by ID: Retrieve details of a specific policy.
 Endpoint: GET /suite-api/api/policies/{policyId}
 Create Policy: Add a new policy.
 Endpoint: POST /suite-api/api/policies
 Update Policy: Update an existing policy.
 Endpoint: PUT /suite-api/api/policies/{policyId}
 Delete Policy: Remove a policy.
 Endpoint: DELETE /suite-api/api/policies/{policyId}

Dashboards and Reports

 Get Dashboards: Retrieve a list of dashboards.
 Endpoint: GET /suite-api/api/dashboards
 Get Dashboard by ID: Retrieve details of a specific dashboard.
 Endpoint: GET /suite-api/api/dashboards/{dashboardId}
 Create Dashboard: Add a new dashboard.
 Endpoint: POST /suite-api/api/dashboards
 Update Dashboard: Update an existing dashboard.
 Endpoint: PUT /suite-api/api/dashboards/{dashboardId}
 Delete Dashboard: Remove a dashboard.
 Endpoint: DELETE /suite-api/api/dashboards/{dashboardId}
 Get Reports: Retrieve a list of reports.
 Endpoint: GET /suite-api/api/reports
 Generate Report: Generate a new report based on a template.
 Endpoint: POST /suite-api/api/reports/{reportTemplateId}/generate
 Get Report by ID: Retrieve details of a specific report.
 Endpoint: GET /suite-api/api/reports/{reportId}

Capacity and Utilization

 Get Capacity Remaining: Retrieve remaining capacity for a specific resource.
 Endpoint: GET /suite-api/api/resources/{resourceId}/capacity/remaining
 Get Capacity Usage: Retrieve capacity usage for a specific resource.
 Endpoint: GET /suite-api/api/resources/{resourceId}/capacity/usage

Additional Functionalities

 Get Custom Groups: Retrieve a list of custom groups.
 Endpoint: GET /suite-api/api/groups
 Create Custom Group: Add a new custom group.
 Endpoint: POST /suite-api/api/groups
 Update Custom Group: Update an existing custom group.
 Endpoint: PUT /suite-api/api/groups/{groupId}
 Delete Custom Group: Remove a custom group.
 Endpoint: DELETE /suite-api/api/groups/{groupId}
 Get Recommendations: Retrieve a list of recommendations.
 Endpoint: GET /suite-api/api/recommendations
 Get Recommendation by ID: Retrieve details of a specific recommendation.
 Endpoint: GET /suite-api/api/recommendations/{recommendationId}

These are just a few examples of the many functions available through the vROps REST API.

.

.

More Cheat Sheet for DevOps Engineers

More Cheat Sheet for DevOps Engineers

This guide is focused entirely on the most commonly used Kubernetes YAML examples and why you’d use them in a production or staging environment. These YAML definitions act as the foundation for automating, scaling, and managing containerized workloads.


1. Pod YAML (Basic Unit of Execution)

Use this when you want to run a single container on the cluster.

apiVersion: v1
kind: Pod
metadata:
  name: simple-pod
spec:
  containers:
  - name: nginx
    image: nginx

This is the most basic unit in Kubernetes. Ideal for testing and debugging.


2. Deployment YAML (For Scaling and Updates)

Use deployments to manage stateless apps with rolling updates and replicas.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: nginx
        image: nginx:1.21

3. Production-Ready Deployment Example

Use this to deploy a resilient application with health checks and resource limits.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: production-app
  labels:
    app: myapp
spec:
  replicas: 4
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp-container
        image: myorg/myapp:2.1.0
        ports:
        - containerPort: 80
        livenessProbe:
          httpGet:
            path: /healthz
            port: 80
          initialDelaySeconds: 15
          periodSeconds: 20
        readinessProbe:
          httpGet:
            path: /ready
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 10
        resources:
          requests:
            cpu: "250m"
            memory: "512Mi"
          limits:
            cpu: "500m"
            memory: "1Gi"

4. Service YAML (Stable Networking Access)

apiVersion: v1
kind: Service
metadata:
  name: web-service
spec:
  selector:
    app: web
  ports:
  - port: 80
    targetPort: 80
  type: ClusterIP

5. ConfigMap YAML (External Configuration)

apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  LOG_LEVEL: "debug"
  FEATURE_FLAG: "true"

6. Secret YAML (Sensitive Information)

apiVersion: v1
kind: Secret
metadata:
  name: app-secret
stringData:
  password: supersecret123

7. PersistentVolumeClaim YAML (For Storage)

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi

8. Job YAML (Run Once Tasks)

apiVersion: batch/v1
kind: Job
metadata:
  name: hello-job
spec:
  template:
    spec:
      containers:
      - name: hello
        image: busybox
        command: ["echo", "Hello World"]
      restartPolicy: Never

9. CronJob YAML (Recurring Tasks)

apiVersion: batch/v1
kind: CronJob
metadata:
  name: scheduled-task
spec:
  schedule: "*/5 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: task
            image: busybox
            args: ["/bin/sh", "-c", "echo Scheduled Job"]
          restartPolicy: OnFailure

10. Ingress YAML (Routing External Traffic)

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
  - host: myapp.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web-service
            port:
              number: 80

11. NetworkPolicy YAML (Security Control)

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-nginx
spec:
  podSelector:
    matchLabels:
      app: nginx
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend

Cheat Sheet for DevOps Engineers

kubectl Cheat Sheet – In-Depth Guide

Managing Kubernetes clusters efficiently requires fluency with kubectl, the command-line interface for interacting with the Kubernetes API. Whether you’re deploying applications, viewing logs, or debugging infrastructure, this tool is your gateway to smooth cluster operations.

This in-depth cheat sheet will give you a comprehensive reference of how to use kubectl effectively in real-world operations, including advanced flags, filtering tricks, rolling updates, patching, output formatting, and resource exploration.


Shell Autocompletion

Boost productivity with shell autocompletion for kubectl:

Bash

source <(kubectl completion bash)
echo "source <(kubectl completion bash)" >> ~/.bashrc

ZSH

source <(kubectl completion zsh)
echo '[[ $commands[kubectl] ]] && source <(kubectl completion zsh)' >> ~/.zshrc

Aliases

alias k=kubectl
complete -o default -F __start_kubectl k

Working with Contexts & Clusters

kubectl config view                        # Show merged config
kubectl config get-contexts               # List contexts
kubectl config current-context            # Show current context
kubectl config use-context my-context     # Switch to context

Set default namespace for future commands:

kubectl config set-context --current --namespace=my-namespace

Deployments & YAML Management

kubectl apply -f ./manifest.yaml          # Apply resource(s)
kubectl apply -f ./dir/                   # Apply all YAMLs in directory
kubectl create deployment nginx --image=nginx
kubectl explain pod                       # Show pod schema

Apply resources from multiple sources:

kubectl apply -f ./one.yaml -f ./two.yaml
kubectl apply -f https://example.com/config.yaml

Viewing & Finding Resources

kubectl get pods                          # List all pods
kubectl get pods -o wide                  # Detailed pod listing
kubectl get services                      # List services
kubectl describe pod my-pod               # Detailed pod info

Filter & sort:

kubectl get pods --field-selector=status.phase=Running
kubectl get pods --sort-by='.status.containerStatuses[0].restartCount'

Find pod labels:

kubectl get pods --show-labels

Updating & Rolling Deployments

kubectl set image deployment/web nginx=nginx:1.19       # Set new image
kubectl rollout status deployment/web                   # Watch rollout
kubectl rollout history deployment/web                  # Show revisions
kubectl rollout undo deployment/web                     # Undo last change
kubectl rollout undo deployment/web --to-revision=2     # Revert to specific

Editing, Scaling, Deleting

kubectl edit deployment/web                    # Live edit YAML
kubectl scale deployment/web --replicas=5      # Scale up/down
kubectl delete pod my-pod                      # Delete by name
kubectl delete -f pod.yaml                     # Delete from file

Logs, Execs & Debugging

kubectl logs my-pod                            # View logs
kubectl logs my-pod -c container-name          # Logs from container
kubectl logs my-pod --previous                 # Logs from crashed
kubectl exec my-pod -- ls /                    # Run command
kubectl exec -it my-pod -- bash                # Shell access

Working with Services

kubectl expose pod nginx --port=80 --target-port=8080   # Create service
kubectl port-forward svc/my-service 8080:80             # Forward to local

Resource Metrics

kubectl top pod                              # Pod metrics
kubectl top pod --sort-by=cpu                # Sort by CPU
kubectl top node                             # Node metrics

Patching Resources

Strategic merge patch:

kubectl patch deployment my-deploy -p '{"spec":{"replicas":4}}'

JSON patch with array targeting:

kubectl patch pod my-pod --type='json' -p='[
  {"op": "replace", "path": "/spec/containers/0/image", "value":"nginx:1.21"}
]'

Cluster & Node Management

kubectl cordon my-node                        # Prevent new pods
kubectl drain my-node                         # Evict pods for maintenance
kubectl uncordon my-node                      # Resume scheduling
kubectl get nodes                             # List all nodes
kubectl describe node my-node                 # Node details

Output Formatting

kubectl get pods -o json
kubectl get pods -o yaml
kubectl get pods -o custom-columns="NAME:.metadata.name,IMAGE:.spec.containers[*].image"

Exploring API Resources

kubectl api-resources                         # List all resources
kubectl api-resources --namespaced=false      # Non-namespaced
kubectl api-resources -o wide                 # Extended info

Logging & Verbosity

kubectl get pods -v=6                         # Debug output
kubectl get deployment -v=9                   # Full API trace

Deployment Template Example

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  namespace: my-namespace
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80

Deploying Production-Grade Systems on Oracle Cloud Infrastructure (OCI) with Terraform

Launching a virtual machine is easy. Running secure, reliable, production-grade systems is not. This guide shows how to deploy enterprise-ready compute infrastructure on Oracle Cloud Infrastructure (OCI) using Terraform, with a focus on security, fault tolerance, and long-term operability.


What “Production-Grade” Actually Means

A production environment is defined by predictability, not convenience. Production systems must survive failures, scale safely, and be observable at all times.

  • Private networking by default
  • No public SSH access
  • Replaceable compute instances
  • Persistent storage separated from OS
  • Infrastructure defined as code

Target Architecture Overview

  • Private VCN and subnet with NAT gateway for outbound access
  • Network Security Groups (NSGs) with explicit rules
  • Flex compute shape
  • Detached block storage with iSCSI attachment
  • SSH key authentication only

This architecture is suitable for:

  • SaaS backends
  • Internal APIs
  • Databases
  • AI / ML inference nodes
  • HPC control or login nodes

Terraform: Provider Configuration


terraform {
  required_version = ">= 1.6"

  required_providers {
    oci = {
      source  = "oracle/oci"
      version = ">= 5.0.0"
    }
  }
}

provider "oci" {
  tenancy_ocid     = var.tenancy_ocid
  user_ocid        = var.user_ocid
  fingerprint      = var.fingerprint
  private_key_path = var.private_key_path
  region           = var.region
}

This ensures reproducible deployments and enforces secure API-based authentication.


Variables


variable "tenancy_ocid" {
  description = "OCID of the tenancy"
  type        = string
}

variable "user_ocid" {
  description = "OCID of the user calling the API"
  type        = string
}

variable "fingerprint" {
  description = "Fingerprint of the API signing key"
  type        = string
}

variable "private_key_path" {
  description = "Path to the private key for API authentication"
  type        = string
}

variable "region" {
  description = "OCI region identifier"
  type        = string
}

variable "compartment_ocid" {
  description = "OCID of the compartment for resources"
  type        = string
}

variable "image_ocid" {
  description = "OCID of the compute image (e.g., Oracle Linux 8)"
  type        = string
}

variable "ssh_public_key" {
  description = "Path to SSH public key file"
  type        = string
}

variable "allowed_cidr" {
  description = "CIDR block allowed to access instances (e.g., VPN range)"
  type        = string
  default     = "10.0.0.0/16"
}

Data Sources


data "oci_identity_availability_domains" "ads" {
  compartment_id = var.tenancy_ocid
}

locals {
  availability_domain = data.oci_identity_availability_domains.ads.availability_domains[0].name
}

This retrieves the list of availability domains in your region. We select the first AD for simplicity, but production deployments should consider multi-AD placement.


Virtual Cloud Network (VCN)


resource "oci_core_vcn" "prod_vcn" {
  cidr_blocks    = ["10.0.0.0/16"]
  display_name   = "prod-vcn"
  dns_label      = "prodvcn"
  compartment_id = var.compartment_ocid
}

A /16 CIDR allows future expansion without redesign. VCNs act as the first isolation boundary for production systems.


Internet Gateway


resource "oci_core_internet_gateway" "igw" {
  compartment_id = var.compartment_ocid
  vcn_id         = oci_core_vcn.prod_vcn.id
  display_name   = "prod-igw"
  enabled        = true
}

The internet gateway enables outbound connectivity for the NAT gateway. It does not expose private instances directly.


NAT Gateway


resource "oci_core_nat_gateway" "nat_gw" {
  compartment_id = var.compartment_ocid
  vcn_id         = oci_core_vcn.prod_vcn.id
  display_name   = "prod-nat-gw"
  block_traffic  = false
}

The NAT gateway allows private subnet instances to reach the internet for package updates and external API calls without exposing inbound access.


Route Tables


resource "oci_core_route_table" "private_rt" {
  compartment_id = var.compartment_ocid
  vcn_id         = oci_core_vcn.prod_vcn.id
  display_name   = "private-route-table"

  route_rules {
    destination       = "0.0.0.0/0"
    destination_type  = "CIDR_BLOCK"
    network_entity_id = oci_core_nat_gateway.nat_gw.id
  }
}

All outbound traffic from the private subnet routes through the NAT gateway. This ensures instances can reach external resources without being directly accessible.


Private Subnet (No Public IPs)


resource "oci_core_subnet" "private_subnet" {
  cidr_block                 = "10.0.1.0/24"
  vcn_id                     = oci_core_vcn.prod_vcn.id
  compartment_id             = var.compartment_ocid
  display_name               = "private-subnet"
  prohibit_public_ip_on_vnic = true
  route_table_id             = oci_core_route_table.private_rt.id
  dns_label                  = "private"
}

Instances in this subnet are never reachable from the internet. Access must go through a bastion, VPN, or private load balancer.


Network Security Groups (NSG)


resource "oci_core_network_security_group" "app_nsg" {
  compartment_id = var.compartment_ocid
  vcn_id         = oci_core_vcn.prod_vcn.id
  display_name   = "app-nsg"
}

# Allow SSH from internal network only
resource "oci_core_network_security_group_security_rule" "allow_ssh" {
  network_security_group_id = oci_core_network_security_group.app_nsg.id
  direction                 = "INGRESS"
  protocol                  = "6" # TCP

  source      = var.allowed_cidr
  source_type = "CIDR_BLOCK"

  tcp_options {
    destination_port_range {
      min = 22
      max = 22
    }
  }
}

# Allow HTTPS from internal network
resource "oci_core_network_security_group_security_rule" "allow_https" {
  network_security_group_id = oci_core_network_security_group.app_nsg.id
  direction                 = "INGRESS"
  protocol                  = "6" # TCP

  source      = var.allowed_cidr
  source_type = "CIDR_BLOCK"

  tcp_options {
    destination_port_range {
      min = 443
      max = 443
    }
  }
}

# Allow all outbound traffic
resource "oci_core_network_security_group_security_rule" "allow_egress" {
  network_security_group_id = oci_core_network_security_group.app_nsg.id
  direction                 = "EGRESS"
  protocol                  = "all"

  destination      = "0.0.0.0/0"
  destination_type = "CIDR_BLOCK"
}

# Allow ICMP for path MTU discovery
resource "oci_core_network_security_group_security_rule" "allow_icmp" {
  network_security_group_id = oci_core_network_security_group.app_nsg.id
  direction                 = "INGRESS"
  protocol                  = "1" # ICMP

  source      = "10.0.0.0/16"
  source_type = "CIDR_BLOCK"

  icmp_options {
    type = 3
    code = 4
  }
}

NSGs provide service-level firewalling and are preferred over subnet-wide security lists. These rules allow SSH and HTTPS only from your internal network, while permitting all outbound traffic.


Compute Instance (Flex Shape)


resource "oci_core_instance" "prod_instance" {
  availability_domain = local.availability_domain
  compartment_id      = var.compartment_ocid
  display_name        = "prod-app-01"
  shape               = "VM.Standard.E4.Flex"

  shape_config {
    ocpus         = 2
    memory_in_gbs = 16
  }

  create_vnic_details {
    subnet_id        = oci_core_subnet.private_subnet.id
    assign_public_ip = false
    nsg_ids          = [oci_core_network_security_group.app_nsg.id]
    hostname_label   = "prod-app-01"
  }

  source_details {
    source_type             = "image"
    source_id               = var.image_ocid
    boot_volume_size_in_gbs = 50
  }

  metadata = {
    ssh_authorized_keys = file(var.ssh_public_key)
  }

  preserve_boot_volume = true
}

Flex shapes allow independent scaling of CPU and memory, ensuring predictable performance without overpaying for unused resources. Setting preserve_boot_volume = true protects the boot volume if the instance is accidentally terminated.


Persistent Block Storage


resource "oci_core_volume" "data_volume" {
  availability_domain = local.availability_domain
  compartment_id      = var.compartment_ocid
  display_name        = "prod-data-vol"
  size_in_gbs         = 200
  vpus_per_gb         = 10 # Balanced performance tier
}

resource "oci_core_volume_attachment" "data_attach" {
  attachment_type = "paravirtualized"
  instance_id     = oci_core_instance.prod_instance.id
  volume_id       = oci_core_volume.data_volume.id
  display_name    = "prod-data-attachment"
}

Separating OS and data ensures instances are disposable while data remains protected. Paravirtualized attachments are simpler than iSCSI and work automatically on Oracle Linux.

Post-Deployment: Mounting the Block Volume

After Terraform applies, SSH into the instance and mount the volume:


# Find the attached volume (usually /dev/sdb)
lsblk

# Create filesystem (first time only)
sudo mkfs.xfs /dev/sdb

# Create mount point and mount
sudo mkdir -p /data
sudo mount /dev/sdb /data

# Add to fstab for persistence across reboots
echo '/dev/sdb /data xfs defaults,_netdev,nofail 0 2' | sudo tee -a /etc/fstab

The _netdev and nofail options ensure the system boots even if the volume is temporarily unavailable.


Outputs


output "instance_private_ip" {
  description = "Private IP address of the compute instance"
  value       = oci_core_instance.prod_instance.private_ip
}

output "instance_id" {
  description = "OCID of the compute instance"
  value       = oci_core_instance.prod_instance.id
}

output "vcn_id" {
  description = "OCID of the VCN"
  value       = oci_core_vcn.prod_vcn.id
}

output "volume_id" {
  description = "OCID of the data volume"
  value       = oci_core_volume.data_volume.id
}

Security & Operational Checklist

  • ✓ No public SSH access
  • ✓ Key-based authentication only
  • ✓ Private networking with NAT for outbound
  • ✓ Explicit NSG rules (no default allow)
  • ✓ Persistent storage with separate lifecycle
  • ✓ Infrastructure fully defined in code
  • ✓ Boot volume preservation enabled

What to Add Next

  • Bastion Service – OCI’s managed bastion for secure SSH access without VPN
  • Site-to-Site VPN – Connect to on-premises networks
  • OCI Load Balancer – For multi-instance deployments
  • Monitoring and Alerting – OCI Monitoring service with custom alarms
  • Dynamic Groups and IAM policies – Instance principals for secure API access
  • Cloud-init or Ansible – OS hardening and application deployment
  • CI/CD pipelines – GitOps workflow for Terraform changes
  • Volume backups – Scheduled backup policies for data protection

Example terraform.tfvars


tenancy_ocid     = "ocid1.tenancy.oc1..aaaaaaaaexample"
user_ocid        = "ocid1.user.oc1..aaaaaaaaexample"
fingerprint      = "aa:bb:cc:dd:ee:ff:00:11:22:33:44:55:66:77:88:99"
private_key_path = "~/.oci/oci_api_key.pem"
region           = "eu-frankfurt-1"
compartment_ocid = "ocid1.compartment.oc1..aaaaaaaaexample"
image_ocid       = "ocid1.image.oc1.eu-frankfurt-1.aaaaaaaaexample"
ssh_public_key   = "~/.ssh/id_rsa.pub"
allowed_cidr     = "10.0.0.0/16"

Nick Tailor’s Thoughts

Production infrastructure is not about clicking faster. It is about repeatability, security, and recovery. OCI combined with Terraform provides an extremely strong foundation when engineered correctly from day one.

If you treat infrastructure as software, production becomes predictable.

The complete code from this guide is available as a ready-to-use Terraform module. Clone it, update your variables, and run terraform apply to deploy.

How to Power Up or Power Down multiple instances in OCI using CLI with Ansible

 This assume you have already configured the OCI cli and added your key to the user inside the OCI interface so your Ubuntu or Jump box can connect to your OCI infrastructure
 Ansible
 Role to control power up/down instances using the OCI CLI
 This assume you already have ansible setup
 You will need to install the ansible oci collections

.

Now the reason why you would probably want this is over terraform is because terraform is more suited for infrastructure orchestration and not really suited to deal with the instances once they are up and running.

If you have scaled servers out in OCI powering servers up and down in bulk currently is not available. If you are doing a migration or using a staging environment that you need need to use the machine when building or doing troubleshooting.

Then having a way to power up/down multiple machines at once is convenient.

.

Install the OCI collections if you don’t have it already.

Linux/macOS

curl -L https://raw.githubusercontent.com/oracle/oci-ansible-collection/master/scripts/install.sh | bash -s — —verbose

.

ansible-galaxy collection list – Will list the collections installed

# /path/to/ansible/collections

Collection Version

——————- ——-

amazon.aws 1.4.0

ansible.builtin 1.3.0

ansible.posix 1.3.0

oracle.oci 2.10.0

.

Once you have it installed you need to test the OCI client is working

oci iam compartment list –all (this will list out the compartment ID list for your instances.

Compartments in OCI are a way to organise infrastructure and control access to those resources. This is great for if you have contractors coming and you only want them to have access to certain things not everything.

Now there are two ways you can your instance names.

 One logging in via the OCI interface and going the correct compartment, which is very slow and mind numbing to wait for.
 Or you can use automated approaches which is what you should be doing with everything you do that needs to be done over and over.

.

Bash Script to get the instances names from OCI

 This will use the OCI CLI and provide all instances name and ips
 It loops through each availability domain.
 for each availability domain, it lists the instance IDs and writes them to instance_ids.txt.
 It cleans up the instance_ids.txt file to remove brackets, quotes, and commas.
 It reads each instance ID from instance_ids.txt.
 For each instance, it retrieves the VNIC information.
 It extracts the display name, public IP, and private IP, and prints them.
 The script ends the loop and moves to the next availability domain.

compartment_id=ocid1.compartment.oc1..insert compartment ID here

.

# Explicitly define the availability domains based on your provided data

availability_domains=(“zcLB:US-CHICAGO-1-AD-1” “zcLB:US-CHICAGO-1-AD-2” “zcLB:US-CHICAGO-1-AD-3”)

.

# For each availability domain, list the instances

for ad in “${availability_domains[@]}”; do

.

    # List instances within the specific AD and compartment, extracting the “id” field

    oci compute instance list –compartment-id $compartment_id –availability-domain $ad –query data[].id” –raw-output > instance_ids.txt

.

    # Clean up the instance IDs (removing brackets, quotes, etc.)

    sed i ‘s/\[//g’ instance_ids.txt

    sed i ‘s/\]//g’ instance_ids.txt

    sed i ‘s/”//g’ instance_ids.txt

    sed i ‘s/,//g’ instance_ids.txt

.

    # Read each instance ID from instance_ids.txt

    while read -r instance_id; do

        # Get instance VNIC information

        instance_info=$(oci compute instance list-vnics –instance-id $instance_id)

.

        # Extract the required fields and print them

        display_name=$(echo $instance_info | jq -r ‘.data[0].”display-name”‘)

        public_ip=$(echo $instance_info | jq -r ‘.data[0].”public-ip“‘)

        private_ip=$(echo $instance_info | jq -r ‘.data[0].”private-ip“‘)

.

        echo “Availability Domain: $ad

        echo “Display Name: $display_name

        echo “Public IP: $public_ip

        echo “Private IP: $private_ip

        echo “—————————————–“

    done < instance_ids.txt

done

.

The output of the script when piped in to a file will look like

Instance.names

Availability Domain: zcLB:US-CHICAGO-1-AD-1

Display Name: Instance1

Public IP: 192.0.2.1

Private IP: 10.0.0.1

—————————————–

Availability Domain: zcLB:US-CHICAGO-1-AD-1

Display Name: Instance2

Public IP: 192.0.2.2

Private IP: 10.0.0.2

—————————————–

.

.

You can now grep this file for the name of the servers you want to power on or off quickly

 grep instance.names | grep <Instance*>

.

Now we have an ansible playbook that can power on or power off the instance by name provided by the OCI client

Ansible playbook to power on or off multiple instances via OCI CLI

name: Control OCI Instance Power State based on Instance Names

  hosts: localhost

  vars:

    instance_names_to_stop:

       instance1

      # Add more instance names here if you wish to stop them…

.

    instance_names_to_start:

      # List the instance names you wish to start here…

      # Example:

       Instance2

.

  tasks:

   name: Fetch all instance details in the compartment

    command:

      cmd: oci compute instance list –compartment-id ocid1.compartment.oc1..aaaaaaaak7jc7tn2su2oqzmrbujpr5wmnuucj4mwj4o4g7rqlzemy4yvxrza –output json

    register: oci_output

.

   set_fact:

      instances: {{ oci_output.stdout | from_json }}”

.

   name: Extract relevant information

    set_fact:

      clean_instances: {{ clean_instances | default([]) + [{ ‘name’: item[‘display-name’], ‘id’: item.id, ‘state’: item[‘lifecycle-state’] }] }}”

    loop: {{ instances.data }}”

    when: “‘display-name’ in item and ‘id’ in item and ‘lifecycle-state’ in item”

.

   name: Filter out instances to stop

    set_fact:

      instances_to_stop: {{ instances_to_stop | default([]) + [item] }}”

    loop: {{ clean_instances }}”

    when: “item.name in instance_names_to_stop and item.state == ‘RUNNING'”

.

   name: Filter out instances to start

    set_fact:

      instances_to_start: {{ instances_to_start | default([]) + [item] }}”

    loop: {{ clean_instances }}”

    when: “item.name in instance_names_to_start and item.state == ‘STOPPED'”

.

   name: Filter out instances to stop

    set_fact:

      instances_to_stop: {{ clean_instances | selectattr(‘name’, ‘in’, instance_names_to_stop) | selectattr(‘state’, ‘equalto‘, ‘RUNNING’) | list }}”

.

   name: Filter out instances to start

    set_fact:

      instances_to_start: {{ clean_instances | selectattr(‘name’, ‘in’, instance_names_to_start) | selectattr(‘state’, ‘equalto‘, ‘STOPPED’) | list }}”

.

   name: Display instances to stop (you can remove this debug task later)

    debug:

      var: instances_to_stop

.

   name: Display instances to start (you can remove this debug task later)

    debug:

      var: instances_to_start

.

   name: Power off instances

    command:

      cmd: oci compute instance action —action STOP –instance-id {{ item.id }}”

    loop: {{ instances_to_stop }}”

    when: instances_to_stop | length > 0

    register: state

.

#  – debug:

#      var: state

.

   name: Power on instances

    command:

      cmd: oci compute instance action —action START –instance-id {{ item.id }}”

    loop: {{ instances_to_start }}”

    when: instances_to_start | length > 0

.

The output will look like

PLAY [Control OCI Instance Power State based on Instance Names] **********************************************************************************

.

TASK [Gathering Facts] ***************************************************************************************************************************

ok: [localhost]

.

TASK [Fetch all instance details in the compartment] *********************************************************************************************

changed: [localhost]

.

TASK [Parse the OCI CLI output] ******************************************************************************************************************

ok: [localhost]

.

TASK [Extract relevant information] **************************************************************************************************************

ok: [localhost] => (item={‘display-name’: ‘Instance1’, ‘id’: ‘ocid1.instance.oc1..exampleuniqueID1’, ‘lifecycle-state’: ‘STOPPED’})

ok: [localhost] => (item={‘display-name’: ‘Instance2’, ‘id’: ‘ocid1.instance.oc1..exampleuniqueID2’, ‘lifecycle-state’: ‘RUNNING’})

.

TASK [Filter out instances to stop] **************************************************************************************************************

ok: [localhost]

.

TASK [Filter out instances to start] *************************************************************************************************************

ok: [localhost]

.

TASK [Display instances to stop (you can remove this debug task later)] **************************************************************************

ok: [localhost] => {

    instances_to_stop: [

        {

            “name”: “Instance2”,

            “id”: ocid1.instance.oc1..exampleuniqueID2″,

            “state”: RUNNING”

        }

    ]

}

.

TASK [Display instances to start (you can remove this debug task later)] *************************************************************************

ok: [localhost] => {

    instances_to_start: [

        {

            “name”: “Instance1”,

            “id”: ocid1.instance.oc1..exampleuniqueID1″,

            “state”: STOPPED”

        }

    ]

}

.

TASK [Power off instances] ***********************************************************************************************************************

changed: [localhost] => (item={‘name’: ‘Instance2’, ‘id’: ‘ocid1.instance.oc1..exampleuniqueID2’, ‘state’: ‘RUNNING’})

.

TASK [Power on instances] ************************************************************************************************************************

changed: [localhost] => (item={‘name’: ‘Instance1’, ‘id’: ‘ocid1.instance.oc1..exampleuniqueID1’, ‘state’: ‘STOPPED’})

.

PLAY RECAP ****************************************************************************************************************************************

localhost                  : ok=9    changed=3    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

.

.

How to Deploy Another VPC in AWS with Scalable EC2’s for HA using Terraform

 This will configure a new VPC
 Create a new subnet for use
 Create a new security group with bunch rules
 Create a key pair for your new instances
 Allow you to scale your instances cleanly use the count attribute.

So we are going to do this a bit different than the other post. As the other post is just deploying one instance in an existing VPC.

This one is more fun. The structure we will use this time will allow you to scale your ec2 instances very cleanly. If you are using git repos to push out changes. Then having a main.tf for your instance is much simpler to manage at scale.

File structure:

terraform-project/

├── main.tf <– Your main configuration file

├── variables.tf <– Variables file that has the inputs to pass

├── outputs.tf <– Outputs file

├── security_group.tf <– File containing security group rules

└── modules/

└── instance/

        ├── main.tf <- this file contains your ec2 instances

└── variables.tf <- variable file that defines we will pass for the module in main.tf to use

.

Explaining the process:

Main.tf

 We have defined the provider and availability zone; if you have more than one cloud, then its good create a provider.tf and carve them out. 
 The key-pair to import into aws in the second availability zone that was generated locally in my terraform directory using.
 ssh-keygen -t rsa -b 2048 -f ./terraform-aws-key
 We are then saying lets create a new vpc called vpc2
 with the subnet cidr block 10.0.1.0/24 to use internally
 this will also map the public address to the new internal address assigned upon launch
 We will be creating servers using variables defined in the variables.tf
 Instance type
 AMID
 key_pair name to use
 new subnet to use
 and assign the new security group to the ec2 instance deployed
 We also added a count on the module so when we deploy ec2’s we can simply adjust the count number and pushed the code with a one tiny change as opposed to an entire block. You will see what I mean later.
main.tf

provider aws {

  region = “us-west-2”

}

.

resource aws_key_pair “my-nick-test-key” {

  key_name   = “my-nick-test-key”

  public_key = file(${path.module}/terraform-aws-key.pub”)

}

.

resource aws_vpc “vpc2” {

  cidr_block = “10.0.0.0/16”

}

.

resource aws_subnet newsubnet {

  vpc_id                  = aws_vpc.vpc2.id

  cidr_block              = “10.0.1.0/24”

  map_public_ip_on_launch = true

}

.

module web_server {

  source           = “./module/instance”

  ami_id           = var.ami_id

  instance_type    = var.instance_type

  key_name         = var.key_name_instance

  subnet_id        = aws_subnet.newsubnet.id

  instance_count   = 2  // Specify the number of instances you want

  security_group_id = aws_security_group.newcpanel.id

}

.

Variables.tf

 Here we define the variables we want to pass to the module in main.tf for the instance.
 The linux image
 Instance type (size of the machine)
 Key-pair to use for the image

variable ami_id {

  description = “The AMI ID for the instance”

  default     = “ami-0913c47048d853921” // Amazon Linux 2 AMI ID

}

.

variable instance_type {

  description = “The instance type for the instance”

  default     = “t2.micro

}

.

variable key_name_instance {

  description = “The key pair name for the instance”

  default     = “my-nick-test-key”

}

.

Security_group.tf

 This will create a new security group in the us-west-2 with inbound rules similar to cpanel with the name newcpanel

resource aws_security_group newcpanel {

  name        = newcpanel

  description = “Allow inbound traffic”

  vpc_id      = aws_vpc.vpc2.id

.

  // POP3 TCP 110

  ingress {

    from_port   = 110

    to_port     = 110

    protocol    = tcp

    cidr_blocks = [“0.0.0.0/0”]

  }

.

  // Custom TCP 20

  ingress {

    from_port   = 20

    to_port     = 20

    protocol    = tcp

    cidr_blocks = [“0.0.0.0/0”]

  }

.

  // Custom TCP 587

  ingress {

    from_port   = 587

    to_port     = 587

    protocol    = tcp

    cidr_blocks = [“0.0.0.0/0”]

  }

.

  // DNS (TCP) TCP 53

  ingress {

    from_port   = 53

    to_port     = 53

    protocol    = tcp

    cidr_blocks = [“0.0.0.0/0”]

  }

.

  // SMTPS TCP 465

  ingress {

    from_port   = 465

    to_port     = 465

    protocol    = tcp

    cidr_blocks = [“0.0.0.0/0”]

  }

.

  // HTTPS TCP 443

  ingress {

    from_port   = 443

    to_port     = 443

    protocol    = tcp

    cidr_blocks = [“0.0.0.0/0”]

  }

.

  // DNS (UDP) UDP 53

  ingress {

    from_port   = 53

    to_port     = 53

    protocol    = udp

    cidr_blocks = [“0.0.0.0/0”]

  }

.

  // IMAP TCP 143

  ingress {

    from_port   = 143

    to_port     = 143

    protocol    = tcp

    cidr_blocks = [“0.0.0.0/0”]

  }

.

  // IMAPS TCP 993

  ingress {

    from_port   = 993

    to_port     = 993

    protocol    = tcp

    cidr_blocks = [“0.0.0.0/0”]

  }

.

  // Custom TCP 21

  ingress {

    from_port   = 21

    to_port     = 21

    protocol    = tcp

    cidr_blocks = [“0.0.0.0/0”]

  }

.

  // Custom TCP 2086

  ingress {

    from_port   = 2086

    to_port     = 2086

    protocol    = tcp

    cidr_blocks = [“0.0.0.0/0”]

  }

.

  // Custom TCP 2096

  ingress {

    from_port   = 2096

    to_port     = 2096

    protocol    = tcp

    cidr_blocks = [“0.0.0.0/0”]

  }

.

  // HTTP TCP 80

  ingress {

    from_port   = 80

    to_port     = 80

    protocol    = tcp

    cidr_blocks = [“0.0.0.0/0”]

  }

.

  // SSH TCP 22

  ingress {

    from_port   = 22

    to_port     = 22

    protocol    = tcp

    cidr_blocks = [“0.0.0.0/0”]

  }

.

  // POP3S TCP 995

  ingress {

    from_port   = 995

    to_port     = 995

    protocol    = tcp

    cidr_blocks = [“0.0.0.0/0”]

  }

.

  // Custom TCP 2083

  ingress {

    from_port   = 2083

    to_port     = 2083

    protocol    = tcp

    cidr_blocks = [“0.0.0.0/0”]

  }

.

  // Custom TCP 2087

  ingress {

    from_port   = 2087

    to_port     = 2087

    protocol    = tcp

    cidr_blocks = [“0.0.0.0/0”]

  }

.

  // Custom TCP 2095

  ingress {

    from_port   = 2095

    to_port     = 2095

    protocol    = tcp

    cidr_blocks = [“0.0.0.0/0”]

  }

.

  // Custom TCP 2082

  ingress {

    from_port   = 2082

    to_port     = 2082

    protocol    = tcp

    cidr_blocks = [“0.0.0.0/0”]

  }

}

output newcpanel_sg_id {

  value       = aws_security_group.newcpanel.id

  description = “The ID of the security group ‘newcpanel‘”

}

.

.

Outputs.tf

 We want some information to be outputted upon creating the machines like the assigned public addresses. In terraform it needs somethings outputted for the checks to work. In ansible arent forced to do this, but it looks like in terraform you are.

output public_ips {

  value       = module.web_server.public_ips

  description = “List of public IP addresses for the instances.”

}

.

Okay so now we want to create the scalable ec2

 Up on deployment in the us-west-2 which essentially is for HA purposes.
 You want the key pair to used
 And the security group we defined earlier to be added the instance.

We create a modules/instance directory and inside here define the instances as resources

 Now there are a couple of ways to do this. Depends on how you grew your infrastructure out. If all your machines are the same then you don’t need a resource block for each instance which can make the code uglier to manage. You can use the count attribute to simply add or subtract inside the main.tf where the instance_count is defined under the module  instance_count   = 2

modules/instance/main.tf

resource aws_instance “Tailor-Server” {

  count          = var.instance_count  // Control the number of instances with a variable

.

  ami            = var.ami_id

  instance_type  = var.instance_type

  subnet_id      = var.subnet_id

  key_name       = var.key_name

  vpc_security_group_ids = [var.security_group_id]

.

  tags = {

    Name = format(“Tailor-Server%02d”, count.index + 1)  // Naming instances with a sequential number

  }

.

  root_block_device {

    volume_type           = “gp2”

    volume_size           = 30

    delete_on_termination = true

  }

}

.

Modules/instance/variables.tf

Each variable serves as an input that can be set externally when the module is called, allowing for flexibility and reusability of the module across different environments or scenarios.

So here we defining it as a list of items we need to pass for the module to work. We will later provide the actual parameter to pass to the variables being called in the main.tf

Cheat sheet:

ami_id: Specifies the Amazon Machine Image (AMI) ID that will be used to launch the EC2 instances. The AMI determines the operating system and software configurations that will be loaded onto the instances when they are created.

instance_type: Determines the type of EC2 instance to launch. This affects the computing resources available to the instance (CPU, memory, etc.).

Type: It is expected to be a string that matches one of AWS’s predefined instance types (e.g., t2.micro, m5.large).

key_name: Specifies the name of the key pair to be used for SSH access to the EC2 instances. This key should already exist in the AWS account.

subnet_id: Identifies the subnet within which the EC2 instances will be launched. The subnet is part of a specific VPC (Virtual Private Cloud).

instance_names: A list of names to be assigned to the instances. This helps in identifying the instances within the AWS console or when querying using the AWS CLI.

security_group_Id: Specifies the ID of the security group to attach to the EC2 instances. Security groups act as a virtual firewall for your instances to control inbound and outbound traffic.

 We are also adding a count here so we can scale ec2 very efficiently, especially if you have a lot of hands working in the pot keeps things very easy to manage.

variable ami_id {}

variable instance_type {}

variable key_name {}

variable subnet_id {}

variable instance_names {

  type        = list(string)

  description = “List of names for the instances to create.”

}

variable security_group_id {

  description = “Security group ID to assign to the instance”

  type        = string

}

variable instance_count {

  description = “The number of instances to create”

  type        = number

  default     = 1  // Default to one instance if not specified

}

.

Time to deploy your code: I didnt bother showing the plan here just the apply

my-terraform-vpc$ terraform apply

Do you want to perform these actions?

  Terraform will perform the actions described above.

  Only ‘yes’ will be accepted to approve.

.

  Enter a value: yes

.

aws_subnet.newsubnet: Destroying… [id=subnet-016181a8999a58cb4]

aws_subnet.newsubnet: Destruction complete after 1s

aws_subnet.newsubnet: Creating…

aws_subnet.newsubnet: Still creating… [10s elapsed]

aws_subnet.newsubnet: Creation complete after 11s [id=subnet-0a5914443d2944510]

module.web_server.aws_instance.Tailor-Server[1]: Creating…

module.web_server.aws_instance.Tailor-Server[0]: Creating…

module.web_server.aws_instance.Tailor-Server[1]: Still creating… [10s elapsed]

module.web_server.aws_instance.Tailor-Server[0]: Still creating… [10s elapsed]

module.web_server.aws_instance.Tailor-Server[0]: Still creating… [20s elapsed]

module.web_server.aws_instance.Tailor-Server[1]: Still creating… [20s elapsed]

module.web_server.aws_instance.Tailor-Server[1]: Still creating… [30s elapsed]

module.web_server.aws_instance.Tailor-Server[0]: Still creating… [30s elapsed]

module.web_server.aws_instance.Tailor-Server[0]: Still creating… [40s elapsed]

module.web_server.aws_instance.Tailor-Server[1]: Still creating… [40s elapsed]

module.web_server.aws_instance.Tailor-Server[1]: Still creating… [50s elapsed]

module.web_server.aws_instance.Tailor-Server[0]: Still creating… [50s elapsed]

module.web_server.aws_instance.Tailor-Server[0]: Creation complete after 52s [id=i-0d103937dcd1ce080]

module.web_server.aws_instance.Tailor-Server[1]: Still creating… [1m0s elapsed]

module.web_server.aws_instance.Tailor-Server[1]: Still creating… [1m10s elapsed]

module.web_server.aws_instance.Tailor-Server[1]: Creation complete after 1m12s [id=i-071bac658ce51d415]

.

Apply complete! Resources: 3 added, 0 changed, 1 destroyed.

.

Outputs:

.

newcpanel_sg_id = “sg-0df86c53b5de7b348”

public_ips = [

  “34.219.34.165”,

  “35.90.247.94”,

]

.

Results:

VPC successful:

EC2 successful:

Security-Groups:

Key Pairs:

Ec2 assigned SG group:

How to deploy an EC2 instance in AWS with Terraform

.

  • How to install terraform
  • How to configure your aws cli
  • How to steup your file structure
  • How to deploy your instance
  • You must have an AWS account already setup
    • You have an existing VPC
    • You have existing security groups

Depending on which machine you like to use. I use varied distros for fun.

For this we will use Ubuntu 22.04

How to install terraform

  • Once you are logged into your linux jump box or whatever you choose to manage.

wget -O- https://apt.releases.hashicorp.com/gpg | sudo gpg –dearmor -o /usr/share/keyrings/hashicorp-archive-keyring.gpg

echo “deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main” | sudo tee /etc/apt/sources.list.d/hashicorp.list

sudo apt update && sudo apt install terraform

.

ThanosJumpBox:~/myterraform$ terraform -v

Terraform v1.8.2

on linux_amd64

+ provider registry.terraform.io/hashicorp/aws v5.47.

  • Okay next you want to install the awscli

sudo apt update

sudo apt install awscli

2. Okay Now you need to go into your aws and create a user and aws cli key

  • Log into your aws console
  • Go to IAM
    • Under users create a user called Terrform-thanos

Next you want to either create a group or add it to an existing. To make things easy for now we are going to add it administrator group

Next click on the new user and create the ACCESS KEY

Next select the use case for the key

Once you create the ACCESS-KEY you will see the key and secret

Copy these to a text pad and save them somewhere safe.

Next you we going to create the RSA key pair

  • Go under EC2 Dashboard
  • Then Network & ecurity
  • Then Key Pairs
  • Create a new key pair and give it a name

Now configure your Terrform to use the credentials

thanosjumpbox-myterraform$ aws configure

AWS Access Key ID [****************RKFE]:

AWS Secret Access Key [****************aute]:

Default region name [us-west-1]:

Default output format [None]:

.

So a good terraform file structure to use in work environment would be

my-terraform-project/

├── main.tf

├── variables.tf

├── outputs.tf

├── provider.tf

├── modules/

   ├── vpc/

      ├── main.tf

      ├── variables.tf

      └── outputs.tf

   └── ec2/

       ├── main.tf

       ├── variables.tf

       └── outputs.tf

├── environments/

   ├── dev/

      ├── main.tf

      ├── variables.tf

      └── outputs.tf

   ├── prod/

      ├── main.tf

      ├── variables.tf

      └── outputs.tf

├── terraform.tfstate

├── terraform.tfvars

└─ .gitignore

That said for the purposes of this post we will keep it simple. I will be adding separate posts to deploy vpc’s, autoscaling groups, security groups etc.

This would also be very easy to display if you VSC to connect to your
linux machine

mkdir myterraform

cd myterraform

touch main.tf outputs.tf variables.tf

.

So we are going to create an Instance as follows

 EC2 in my existing VPC
 Using a AMI Amazon Linux 2023 AMI (AMI CATALOG will have the ID)
 ami-0827b6c5b977c020e (ID)
 t2.micro instance type
 using a subnet that is available in the us-west-1 zone available for my vpc
 You can find the ID in the console VPC-subnets
 Security groups again will be found under Network & Security > Security Groups
 Use a general purpose volume 30G SSD
 Which is using a custom security group I created earlier
 The outputs will provided via the outputs.tf

Main.tf

provider “aws” {

  region = var.region

}

.

resource “aws_instance” “my_instance” {

  ami           = “ami-0827b6c5b977c020e  # Use a valid AMI ID for your region

  instance_type = “t2.micro              # Free Tier eligible instance type

  key_name      = “”           # Ensure this key pair is already created in your AWS account

.

  subnet_id              = “subnet-0e80683fe32a75513  # Ensure this is a valid subnet in your VPC

  vpc_security_group_ids = [“sg-0db2bfe3f6898d033]  # Ensure this is a valid security group ID

.

  tags = {

    Name = “thanos-lives”

  }

.

  root_block_device {

    volume_type = “gp2  # General Purpose SSD, which is included in the Free Tier

    volume_size = 30     # Maximum size covered by the Free Tier

  }

.

Outputs.tf

output “instance_ip_addr” {

  value = aws_instance.my_instance.public_ip

  description = “The public IP address of the EC2 instance.”

}

.

output “instance_id” {

  value = aws_instance.my_instance.id

  description = “The ID of the EC2 instance.”

}

.

output “first_security_group_id” {

  value = tolist(aws_instance.my_instance.vpc_security_group_ids)[0]

  description = “The first Security Group ID associated with the EC2 instance.”

}

.

Variables.tf

variable “region” {

  description = “The AWS region to create resources in.”

  default     = “us-west-1”

}

.

variable “ami_id” {

  description = “The AMI ID to use for the server.”

}

.

.

Terraform.tfsvars

region = “us-west-1”

ami_id = “ami-0827b6c5b977c020e  # Replace with your chosen AMI ID

.

.

Deploying your code:

thanosjumpbox:~/my-terraform$ terraform init

.

Initializing the backend…

.

Initializing provider plugins…

Reusing previous version of hashicorp/aws from the dependency lock file

Using previously-installed hashicorp/aws v5.47.0

.

Terraform has been successfully initialized!

.

You may now begin working with Terraform. Try running “terraform plan” to see

any changes that are required for your infrastructure. All Terraform commands

should now work.

.

If you ever set or change modules or backend configuration for Terraform,

rerun this command to reinitialize your working directory. If you forget, other

commands will detect it and remind you to do so if necessary.

thanosjumpbox:~/my-terraform$ terraform$

.

thanosjumpbox:~/my-terraform$ terraform$ terraform plan

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:

  + create

.

Terraform will perform the following actions:

.

  # aws_instance.my_instance will be created

  + resource “aws_instance” “my_instance” {

      + ami                                  = “ami-0827b6c5b977c020e”

      + arn                                  = (known after apply)

      + associate_public_ip_address          = (known after apply)

      + availability_zone                    = (known after apply)

      + cpu_core_count                       = (known after apply)

      + cpu_threads_per_core                 = (known after apply)

      + disable_api_stop                     = (known after apply)

      + disable_api_termination              = (known after apply)

      + ebs_optimized                        = (known after apply)

      + get_password_data                    = false

      + host_id                              = (known after apply)

      + host_resource_group_arn              = (known after apply)

      + iam_instance_profile                 = (known after apply)

      + id                                   = (known after apply)

      + instance_initiated_shutdown_behavior = (known after apply)

      + instance_lifecycle                   = (known after apply)

      + instance_state                       = (known after apply)

      + instance_type                        = “t2.micro

      + ipv6_address_count                   = (known after apply)

      + ipv6_addresses                       = (known after apply)

      + key_name                             = “nicktailor-aws”

      + monitoring                           = (known after apply)

      + outpost_arn                          = (known after apply)

      + password_data                        = (known after apply)

      + placement_group                      = (known after apply)

      + placement_partition_number           = (known after apply)

      + primary_network_interface_id         = (known after apply)

      + private_dns                          = (known after apply)

      + private_ip                           = (known after apply)

      + public_dns                           = (known after apply)

      + public_ip                            = (known after apply)

      + secondary_private_ips                = (known after apply)

      + security_groups                      = (known after apply)

      + source_dest_check                    = true

      + spot_instance_request_id             = (known after apply)

      + subnet_id                            = “subnet-0e80683fe32a75513”

      + tags                                 = {

          + “Name” = “Thanos-lives”

        }

      + tags_all                             = {

          + “Name” = “Thanos-lives”

        }

      + tenancy                              = (known after apply)

      + user_data                            = (known after apply)

      + user_data_base64                     = (known after apply)

      + user_data_replace_on_change          = false

      + vpc_security_group_ids               = [

          + “sg-0db2bfe3f6898d033”,

        ]

.

      + root_block_device {

          + delete_on_termination = true

          + device_name           = (known after apply)

          + encrypted             = (known after apply)

          + iops                  = (known after apply)

          + kms_key_id            = (known after apply)

          + tags_all              = (known after apply)

          + throughput            = (known after apply)

          + volume_id             = (known after apply)

          + volume_size           = 30

          + volume_type           = “gp2”

        }

    }

.

Plan: 1 to add, 0 to change, 0 to destroy.

.

Changes to Outputs:

  + first_security_group_id = “sg-0db2bfe3f6898d033”

  + instance_id             = (known after apply)

  + instance_ip_addr        = (known after apply)

.

─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

.

Note: You didn’t use the -out option to save this plan, so Terraform can’t guarantee to take exactly these actions if you run “terraform

apply” now.

.

thanosjumpbox:~/my-terraform$ terraform$ terraform apply

.

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:

  + create

.

Terraform will perform the following actions:

.

  # aws_instance.my_instance will be created

  + resource “aws_instance” “my_instance” {

      + ami                                  = “ami-0827b6c5b977c020e”

      + arn                                  = (known after apply)

      + associate_public_ip_address          = (known after apply)

      + availability_zone                    = (known after apply)

      + cpu_core_count                       = (known after apply)

      + cpu_threads_per_core                 = (known after apply)

      + disable_api_stop                     = (known after apply)

      + disable_api_termination              = (known after apply)

      + ebs_optimized                        = (known after apply)

      + get_password_data                    = false

      + host_id                              = (known after apply)

      + host_resource_group_arn              = (known after apply)

      + iam_instance_profile                 = (known after apply)

      + id                                   = (known after apply)

      + instance_initiated_shutdown_behavior = (known after apply)

      + instance_lifecycle                   = (known after apply)

      + instance_state                       = (known after apply)

      + instance_type                        = “t2.micro

      + ipv6_address_count                   = (known after apply)

      + ipv6_addresses                       = (known after apply)

      + key_name                             = “nicktailor-aws”

      + monitoring                           = (known after apply)

      + outpost_arn                          = (known after apply)

      + password_data                        = (known after apply)

      + placement_group                      = (known after apply)

      + placement_partition_number           = (known after apply)

      + primary_network_interface_id         = (known after apply)

      + private_dns                          = (known after apply)

      + private_ip                           = (known after apply)

      + public_dns                           = (known after apply)

      + public_ip                            = (known after apply)

      + secondary_private_ips                = (known after apply)

      + security_groups                      = (known after apply)

      + source_dest_check                    = true

      + spot_instance_request_id             = (known after apply)

      + subnet_id                            = “subnet-0e80683fe32a75513”

      + tags                                 = {

          + “Name” = “Thanos-lives”

        }

      + tags_all                             = {

          + “Name” = “Thanos-lives”

        }

      + tenancy                              = (known after apply)

      + user_data                            = (known after apply)

      + user_data_base64                     = (known after apply)

      + user_data_replace_on_change          = false

      + vpc_security_group_ids               = [

          + “sg-0db2bfe3f6898d033”,

        ]

.

      + root_block_device {

          + delete_on_termination = true

          + device_name           = (known after apply)

          + encrypted             = (known after apply)

          + iops                  = (known after apply)

          + kms_key_id            = (known after apply)

          + tags_all              = (known after apply)

          + throughput            = (known after apply)

          + volume_id             = (known after apply)

          + volume_size           = 30

          + volume_type           = “gp2”

        }

    }

.

Plan: 1 to add, 0 to change, 0 to destroy.

.

Changes to Outputs:

  + first_security_group_id = “sg-0db2bfe3f6898d033”

  + instance_id             = (known after apply)

  + instance_ip_addr        = (known after apply)

.

Do you want to perform these actions?

  Terraform will perform the actions described above.

  Only ‘yes’ will be accepted to approve.

.

  Enter a value: yes

.

aws_instance.my_instance: Creating…

aws_instance.my_instance: Still creating… [10s elapsed]

aws_instance.my_instance: Still creating… [20s elapsed]

aws_instance.my_instance: Creation complete after 22s [id=i-0ee382e24ad28ecb8]

.

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

.

Outputs:

.

first_security_group_id = “sg-0db2bfe3f6898d033”

instance_id = “i-0ee382e24ad28ecb8”

instance_ip_addr = “50.18.90.217”

Result: