Category: Oracle Cloud

admin | May 21, 2024

Deploying Production-Grade Systems on Oracle Cloud Infrastructure (OCI) with Terraform

Launching a virtual machine is easy. Running secure, reliable, production-grade systems is not. This guide shows how to deploy enterprise-ready compute infrastructure on Oracle Cloud Infrastructure (OCI) using Terraform, with a focus on security, fault tolerance, and long-term operability.

What “Production-Grade” Actually Means

A production environment is defined by predictability, not convenience. Production systems must survive failures, scale safely, and be observable at all times.

Private networking by default
No public SSH access
Replaceable compute instances
Persistent storage separated from OS
Infrastructure defined as code

Target Architecture Overview

Private VCN and subnet with NAT gateway for outbound access
Network Security Groups (NSGs) with explicit rules
Flex compute shape
Separate block storage volume with a paravirtualized attachment
SSH key authentication only

This architecture is suitable for:

SaaS backends
Internal APIs
Databases
AI / ML inference nodes
HPC control or login nodes

Terraform: Provider Configuration


terraform {
  required_version = ">= 1.6"

  required_providers {
    oci = {
      source  = "oracle/oci"
      version = ">= 5.0.0"
    }
  }
}

provider "oci" {
  tenancy_ocid     = var.tenancy_ocid
  user_ocid        = var.user_ocid
  fingerprint      = var.fingerprint
  private_key_path = var.private_key_path
  region           = var.region
}

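The version constraints set minimum Terraform and provider versions. To make deployments reproducible, commit the .terraform.lock.hcl file that terraform init generates, which records the exact provider version in use. The provider authenticates to the OCI API with an API signing key.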

Variables


variable "tenancy_ocid" {
  description = "OCID of the tenancy"
  type        = string
}

variable "user_ocid" {
  description = "OCID of the user calling the API"
  type        = string
}

variable "fingerprint" {
  description = "Fingerprint of the API signing key"
  type        = string
}

variable "private_key_path" {
  description = "Path to the private key for API authentication"
  type        = string
}

variable "region" {
  description = "OCI region identifier"
  type        = string
}

variable "compartment_ocid" {
  description = "OCID of the compartment for resources"
  type        = string
}

variable "image_ocid" {
  description = "OCID of the compute image (e.g., Oracle Linux 8)"
  type        = string
}

variable "ssh_public_key" {
  description = "Path to SSH public key file"
  type        = string
}

variable "allowed_cidr" {
  description = "CIDR block allowed to access instances (e.g., VPN range)"
  type        = string
  default     = "10.0.0.0/16"
}

Data Sources


data "oci_identity_availability_domains" "ads" {
  compartment_id = var.tenancy_ocid
}

locals {
  availability_domain = data.oci_identity_availability_domains.ads.availability_domains[0].name
}

This retrieves the list of availability domains in your region. We select the first AD for simplicity, but production deployments should consider multi-AD placement.

Virtual Cloud Network (VCN)


resource "oci_core_vcn" "prod_vcn" {
  cidr_blocks    = ["10.0.0.0/16"]
  display_name   = "prod-vcn"
  dns_label      = "prodvcn"
  compartment_id = var.compartment_ocid
}

A /16 CIDR allows future expansion without redesign. VCNs act as the first isolation boundary for production systems.
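To put numbers on that claim, here is a small pure-bash sketch of the capacity arithmetic. The helper names subnet_count and usable_ips are illustrative, and the reserved-address count assumes OCI's documented rule of reserving the first two addresses and the last one in every subnet.

```bash
#!/usr/bin/env bash
# Capacity arithmetic for the VCN layout in this guide (no cloud calls).

# subnet_count PARENT CHILD: how many /CHILD subnets fit in a /PARENT block.
subnet_count() {
  local parent="$1" child="$2"
  echo $(( 1 << (child - parent) ))
}

# usable_ips PREFIX: assignable addresses in one OCI subnet of that size.
# Assumption: OCI reserves the first two and the last address of each subnet.
usable_ips() {
  local prefix="$1"
  echo $(( (1 << (32 - prefix)) - 3 ))
}

echo "/24 subnets in the /16 VCN: $(subnet_count 16 24)"   # 256
echo "usable addresses per /24:   $(usable_ips 24)"        # 253
```

The single 10.0.1.0/24 subnet used in this guide therefore leaves 255 more /24 ranges free for later tiers.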

Internet Gateway


resource "oci_core_internet_gateway" "igw" {
  compartment_id = var.compartment_ocid
  vcn_id         = oci_core_vcn.prod_vcn.id
  display_name   = "prod-igw"
  enabled        = true
}

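OCI's NAT gateway does not depend on an internet gateway. No route table in this configuration points at this internet gateway, so it does not expose the private instances; it only becomes useful if you later add a public subnet (for example, for a public load balancer), and you can omit it otherwise.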

NAT Gateway


resource "oci_core_nat_gateway" "nat_gw" {
  compartment_id = var.compartment_ocid
  vcn_id         = oci_core_vcn.prod_vcn.id
  display_name   = "prod-nat-gw"
  block_traffic  = false
}

The NAT gateway allows private subnet instances to reach the internet for package updates and external API calls without exposing inbound access.

Route Tables


resource "oci_core_route_table" "private_rt" {
  compartment_id = var.compartment_ocid
  vcn_id         = oci_core_vcn.prod_vcn.id
  display_name   = "private-route-table"

  route_rules {
    destination       = "0.0.0.0/0"
    destination_type  = "CIDR_BLOCK"
    network_entity_id = oci_core_nat_gateway.nat_gw.id
  }
}

All outbound traffic from the private subnet routes through the NAT gateway. This ensures instances can reach external resources without being directly accessible.

Private Subnet (No Public IPs)


resource "oci_core_subnet" "private_subnet" {
  cidr_block                 = "10.0.1.0/24"
  vcn_id                     = oci_core_vcn.prod_vcn.id
  compartment_id             = var.compartment_ocid
  display_name               = "private-subnet"
  prohibit_public_ip_on_vnic = true
  route_table_id             = oci_core_route_table.private_rt.id
  dns_label                  = "private"
}

Instances in this subnet are never reachable from the internet. Access must go through a bastion, VPN, or private load balancer.

Network Security Groups (NSG)


resource "oci_core_network_security_group" "app_nsg" {
  compartment_id = var.compartment_ocid
  vcn_id         = oci_core_vcn.prod_vcn.id
  display_name   = "app-nsg"
}

# Allow SSH from internal network only
resource "oci_core_network_security_group_security_rule" "allow_ssh" {
  network_security_group_id = oci_core_network_security_group.app_nsg.id
  direction                 = "INGRESS"
  protocol                  = "6" # TCP

  source      = var.allowed_cidr
  source_type = "CIDR_BLOCK"

  tcp_options {
    destination_port_range {
      min = 22
      max = 22
    }
  }
}

# Allow HTTPS from internal network
resource "oci_core_network_security_group_security_rule" "allow_https" {
  network_security_group_id = oci_core_network_security_group.app_nsg.id
  direction                 = "INGRESS"
  protocol                  = "6" # TCP

  source      = var.allowed_cidr
  source_type = "CIDR_BLOCK"

  tcp_options {
    destination_port_range {
      min = 443
      max = 443
    }
  }
}

# Allow all outbound traffic
resource "oci_core_network_security_group_security_rule" "allow_egress" {
  network_security_group_id = oci_core_network_security_group.app_nsg.id
  direction                 = "EGRESS"
  protocol                  = "all"

  destination      = "0.0.0.0/0"
  destination_type = "CIDR_BLOCK"
}

# Allow ICMP for path MTU discovery
resource "oci_core_network_security_group_security_rule" "allow_icmp" {
  network_security_group_id = oci_core_network_security_group.app_nsg.id
  direction                 = "INGRESS"
  protocol                  = "1" # ICMP

  source      = "10.0.0.0/16"
  source_type = "CIDR_BLOCK"

  icmp_options {
    type = 3
    code = 4
  }
}

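NSGs provide service-level firewalling and are preferred over subnet-wide security lists. These rules allow SSH and HTTPS only from your internal network, allow the ICMP type 3 code 4 ("fragmentation needed") messages that path MTU discovery relies on, and permit all outbound traffic. Because the subnet does not set security_list_ids, it also receives the VCN's default security list, and OCI allows traffic that either the security lists or the NSGs permit. The default list allows SSH from 0.0.0.0/0, so attach a more restrictive security list to the subnet if you want the NSG rules to be the only filter.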
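Before applying, it is worth checking that the addresses you will connect from actually fall inside allowed_cidr. A minimal pure-bash sketch for IPv4; ip_to_int and ip_in_cidr are hypothetical helpers, not part of Terraform or the OCI CLI.

```bash
#!/usr/bin/env bash
# Check whether a client address falls inside allowed_cidr, i.e. whether the
# NSG ingress rules above would match it. IPv4 only, no cloud calls.

# ip_to_int A.B.C.D: convert a dotted-quad address to a 32-bit integer.
ip_to_int() {
  local IFS=.
  local a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# ip_in_cidr IP NET/PREFIX: exit status 0 if IP lies inside the CIDR block.
ip_in_cidr() {
  local ip="$1" net="${2%/*}" prefix="${2#*/}"
  # Top PREFIX bits set; bash arithmetic is 64-bit, so trim back to 32 bits.
  local mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  (( ($(ip_to_int "$ip") & mask) == ($(ip_to_int "$net") & mask) ))
}

# Usage: ip_in_cidr 10.0.1.5 10.0.0.0/16 && echo "covered by allowed_cidr"
```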

Compute Instance (Flex Shape)


resource "oci_core_instance" "prod_instance" {
  availability_domain = local.availability_domain
  compartment_id      = var.compartment_ocid
  display_name        = "prod-app-01"
  shape               = "VM.Standard.E4.Flex"

  shape_config {
    ocpus         = 2
    memory_in_gbs = 16
  }

  create_vnic_details {
    subnet_id        = oci_core_subnet.private_subnet.id
    assign_public_ip = false
    nsg_ids          = [oci_core_network_security_group.app_nsg.id]
    hostname_label   = "prod-app-01"
  }

  source_details {
    source_type             = "image"
    source_id               = var.image_ocid
    boot_volume_size_in_gbs = 50
  }

  metadata = {
    ssh_authorized_keys = file(var.ssh_public_key)
  }

  preserve_boot_volume = true
}

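Flex shapes allow independent scaling of CPU and memory, ensuring predictable performance without overpaying for unused resources. Setting preserve_boot_volume = true tells Terraform to keep the boot volume when it terminates the instance (for example during terraform destroy or a forced replacement), so an accidental termination does not also delete the OS disk. Preserved boot volumes are not cleaned up automatically and continue to incur storage charges.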
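OCPU and memory are independent but bounded. As a sketch, the pre-flight check below validates a shape_config before you run terraform plan. The limits it encodes (1 to 64 OCPUs, 1 to 64 GB per OCPU, 1024 GB total for VM.Standard.E4.Flex) are assumptions taken from Oracle's shape documentation at the time of writing, so confirm them for your shape; check_e4_flex is a hypothetical helper and accepts whole-GB values only.

```bash
#!/usr/bin/env bash
# check_e4_flex OCPUS MEMORY_GB: validate a VM.Standard.E4.Flex shape_config.
# Assumed limits: 1-64 OCPUs, 1-64 GB per OCPU, at most 1024 GB in total.
check_e4_flex() {
  local ocpus="$1" mem="$2"
  if (( ocpus < 1 || ocpus > 64 )); then
    echo "invalid: ocpus must be between 1 and 64"
    return 1
  fi
  # Memory must be at least 1 GB per OCPU and at most 64 GB per OCPU / 1024 GB.
  if (( mem < ocpus || mem > ocpus * 64 || mem > 1024 )); then
    echo "invalid: memory must be ${ocpus}-$(( ocpus * 64 < 1024 ? ocpus * 64 : 1024 )) GB"
    return 1
  fi
  echo "ok: $(( mem / ocpus )) GB per OCPU"
}

# Usage: check_e4_flex 2 16   (the values used in prod_instance above)
```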

Persistent Block Storage


resource "oci_core_volume" "data_volume" {
  availability_domain = local.availability_domain
  compartment_id      = var.compartment_ocid
  display_name        = "prod-data-vol"
  size_in_gbs         = 200
  vpus_per_gb         = 10 # Balanced performance tier
}

resource "oci_core_volume_attachment" "data_attach" {
  attachment_type = "paravirtualized"
  instance_id     = oci_core_instance.prod_instance.id
  volume_id       = oci_core_volume.data_volume.id
  display_name    = "prod-data-attachment"
}

Separating OS and data ensures instances are disposable while data remains protected. Paravirtualized attachments are simpler than iSCSI because no iscsiadm commands are needed on the instance; the volume shows up as a disk once the attachment completes.

Post-Deployment: Mounting the Block Volume

After terraform apply completes, connect to the instance over SSH (through your bastion or VPN, since it has no public IP) and mount the volume:


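# Find the attached volume (usually /dev/sdb)
lsblk

# Create filesystem (first time only; this erases anything already on the volume)
sudo mkfs.xfs /dev/sdb

# Create mount point and mount
sudo mkdir -p /data
sudo mount /dev/sdb /data

# Add to fstab by UUID for persistence across reboots
# (device names such as /dev/sdb are not guaranteed to stay the same)
uuid=$(sudo blkid -s UUID -o value /dev/sdb)
echo "UUID=${uuid} /data xfs defaults,_netdev,nofail 0 2" | sudo tee -a /etc/fstab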

The _netdev and nofail options ensure the system boots even if the volume is temporarily unavailable.
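The "first time only" step is the risky one: running mkfs.xfs on a volume that already holds data erases it, for example when you reattach the volume to a replacement instance. A hedged sketch of a guard, assuming the volume is /dev/sdb and the helpers run as root; fs_type and format_if_blank are illustrative names.

```bash
#!/usr/bin/env bash
# Guarded first-time formatting for the data volume.
# Assumptions: the volume is /dev/sdb (confirm with lsblk) and this runs as root.

# fs_type DEVICE: print the filesystem type on DEVICE, or nothing if it is blank.
fs_type() {
  # blkid exits non-zero when it finds no filesystem signature; treat that as blank.
  blkid -s TYPE -o value "$1" 2>/dev/null || true
}

# format_if_blank DEVICE: create an XFS filesystem only if DEVICE has none yet.
format_if_blank() {
  local dev="$1" existing
  existing="$(fs_type "$dev")"
  if [ -n "$existing" ]; then
    echo "skip: $dev already has a filesystem ($existing)"
    return 0
  fi
  echo "formatting $dev as xfs"
  mkfs.xfs "$dev"
}

# Usage (as root): format_if_blank /dev/sdb
```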

Outputs


output "instance_private_ip" {
  description = "Private IP address of the compute instance"
  value       = oci_core_instance.prod_instance.private_ip
}

output "instance_id" {
  description = "OCID of the compute instance"
  value       = oci_core_instance.prod_instance.id
}

output "vcn_id" {
  description = "OCID of the VCN"
  value       = oci_core_vcn.prod_vcn.id
}

output "volume_id" {
  description = "OCID of the data volume"
  value       = oci_core_volume.data_volume.id
}
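Because the instance has no public IP, you reach it through a host that can route into the VCN. A minimal sketch that reads the instance_private_ip output and uses OpenSSH's ProxyJump (-J) option; it assumes a jump host you already control, the default opc user of Oracle Linux platform images, and that it runs from the Terraform working directory. ssh_private is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# ssh_private JUMP_HOST: SSH to the private instance through a jump host.
# Assumptions: run from the Terraform working directory, JUMP_HOST can reach
# 10.0.1.0/24, and the image uses Oracle Linux's default "opc" user.
ssh_private() {
  local jump="$1" ip
  # -raw prints the bare string value instead of a quoted JSON string.
  ip="$(terraform output -raw instance_private_ip)" || return 1
  # -J (ProxyJump) tunnels the SSH session through the jump host.
  ssh -J "$jump" "opc@${ip}"
}

# Usage: ssh_private user@jump.example.internal
```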

Security & Operational Checklist

✓ No public SSH access
✓ Key-based authentication only
✓ Private networking with NAT for outbound
✓ Explicit NSG rules (no default allow)
✓ Persistent storage with separate lifecycle
✓ Infrastructure fully defined in code
✓ Boot volume preservation enabled

What to Add Next

Bastion Service – OCI’s managed bastion for secure SSH access without VPN
Site-to-Site VPN – Connect to on-premises networks
OCI Load Balancer – For multi-instance deployments
Monitoring and Alerting – OCI Monitoring service with custom alarms
Dynamic Groups and IAM policies – Instance principals for secure API access
Cloud-init or Ansible – OS hardening and application deployment
CI/CD pipelines – GitOps workflow for Terraform changes
Volume backups – Scheduled backup policies for data protection

Example terraform.tfvars


tenancy_ocid     = "ocid1.tenancy.oc1..aaaaaaaaexample"
user_ocid        = "ocid1.user.oc1..aaaaaaaaexample"
fingerprint      = "aa:bb:cc:dd:ee:ff:00:11:22:33:44:55:66:77:88:99"
private_key_path = "~/.oci/oci_api_key.pem"
region           = "eu-frankfurt-1"
compartment_ocid = "ocid1.compartment.oc1..aaaaaaaaexample"
image_ocid       = "ocid1.image.oc1.eu-frankfurt-1.aaaaaaaaexample"
ssh_public_key   = "~/.ssh/id_rsa.pub"
allowed_cidr     = "10.0.0.0/16"
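The fingerprint value must match the API signing key referenced by private_key_path. Oracle's key setup documentation derives it from the private key with OpenSSL; the sketch below wraps that pipeline in a small helper (api_key_fingerprint is an illustrative name).

```bash
#!/usr/bin/env bash
# api_key_fingerprint KEY.pem: print the colon-separated MD5 fingerprint that
# OCI displays for an API signing key.
api_key_fingerprint() {
  # Derive the public key in DER form, then hash it. "openssl md5 -c" prints
  # "(stdin)= aa:bb:..." or "MD5(stdin)= aa:bb:..." depending on the OpenSSL
  # version, so keep only the last field.
  openssl rsa -pubout -outform DER -in "$1" 2>/dev/null | openssl md5 -c | awk '{print $NF}'
}

# Usage: api_key_fingerprint ~/.oci/oci_api_key.pem
```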

Nick Tailor’s Thoughts

Production infrastructure is not about clicking faster. It is about repeatability, security, and recovery. OCI combined with Terraform provides an extremely strong foundation when engineered correctly from day one.

If you treat infrastructure as software, production becomes predictable.

The complete code from this guide is available as a ready-to-use Terraform module. Clone it, update your variables, and run terraform apply to deploy.
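Rather than a bare terraform apply, a slightly safer routine saves the plan to a file and applies that file, so the change that gets applied is exactly the one that was planned. A minimal sketch, assuming it runs in the directory holding the .tf files and terraform.tfvars; deploy is a hypothetical function name, and it stops at the first failing step.

```bash
#!/usr/bin/env bash
# Plan-then-apply routine for this configuration.
# Assumption: run from the directory containing the .tf files and terraform.tfvars.
deploy() {
  terraform init -input=false || return 1
  # Fails if any file is not in canonical format; run "terraform fmt" to fix.
  terraform fmt -check || return 1
  terraform validate || return 1
  # Save the plan so that apply executes exactly what was planned.
  terraform plan -input=false -out=tfplan || return 1
  terraform apply -input=false tfplan
}

# Usage: deploy
```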

admin | May 21, 2024 | Oracle Cloud, Terraform

admin | May 15, 2024

How to Power Up or Power Down multiple instances in OCI using CLI with Ansible

• This assumes you have already configured the OCI CLI and added your API key to your user in the OCI console, so your Ubuntu machine or jump box can connect to your OCI infrastructure

• Ansible

◦ A playbook to power instances up or down using the OCI CLI

◦ This assumes you already have Ansible set up

◦ You will need to install the Ansible OCI collection

You would want this rather than Terraform because Terraform is suited to provisioning and orchestrating infrastructure, not to managing instances once they are up and running.

If you have scaled out servers in OCI, powering them up and down in bulk is not currently available. That matters when you are doing a migration or running a staging environment where you only need the machines while building or troubleshooting; in those cases, being able to power multiple machines up or down at once is convenient.

Install the OCI collection if you don’t already have it:

Linux/macOS

curl -L https://raw.githubusercontent.com/oracle/oci-ansible-collection/master/scripts/install.sh | bash -s -- --verbose

To list the installed collections, run:

ansible-galaxy collection list

# /path/to/ansible/collections
Collection        Version
----------------- -------
amazon.aws        1.4.0
ansible.builtin   1.3.0
ansible.posix     1.3.0
oracle.oci        2.10.0

Once it is installed, test that the OCI CLI is working:

oci iam compartment list --all

This lists your compartments and their OCIDs, including the compartment that holds your instances.

Compartments in OCI are a way to organise infrastructure and control access to those resources. This is useful if, for example, you have contractors coming in and want them to have access only to certain resources rather than everything.

There are two ways to get your instance names:

• Log in to the OCI console and navigate to the correct compartment, which is slow and tedious.

• Use an automated approach, which is what you should do for any task that has to be repeated over and over.

Bash Script to get the instance names from OCI

• It uses the OCI CLI to print the name and IP addresses of every instance.

• It loops through each availability domain.

• For each availability domain, it lists the instance IDs and writes them to instance_ids.txt.

• It cleans up the instance_ids.txt file to remove brackets, quotes, and commas.

• It reads each instance ID from instance_ids.txt.

• For each instance, it retrieves the VNIC information.

• It extracts the display name, public IP, and private IP, and prints them.

• The script ends the loop and moves to the next availability domain.

#!/bin/bash

compartment_id="ocid1.compartment.oc1..insert compartment ID here"

# Explicitly define the availability domains based on your provided data
availability_domains=("zcLB:US-CHICAGO-1-AD-1" "zcLB:US-CHICAGO-1-AD-2" "zcLB:US-CHICAGO-1-AD-3")

# For each availability domain, list the instances
for ad in "${availability_domains[@]}"; do

  # List instances within the specific AD and compartment, extracting the "id" field
  # (--all follows pagination so large compartments are not truncated)
  oci compute instance list --compartment-id "$compartment_id" --availability-domain "$ad" --all --query "data[].id" --raw-output > instance_ids.txt

  # Clean up the instance IDs (removing brackets, quotes, etc.)
  sed -i 's/\[//g' instance_ids.txt
  sed -i 's/\]//g' instance_ids.txt
  sed -i 's/"//g' instance_ids.txt
  sed -i 's/,//g' instance_ids.txt

  # Read each instance ID from instance_ids.txt
  while read -r instance_id; do

    # Skip the blank lines left where the brackets were
    [ -z "$instance_id" ] && continue

    # Get instance VNIC information
    instance_info=$(oci compute instance list-vnics --instance-id "$instance_id")

    # Extract the required fields and print them
    display_name=$(echo "$instance_info" | jq -r '.data[0]."display-name"')
    public_ip=$(echo "$instance_info" | jq -r '.data[0]."public-ip"')
    private_ip=$(echo "$instance_info" | jq -r '.data[0]."private-ip"')

    echo "Availability Domain: $ad"
    echo "Display Name: $display_name"
    echo "Public IP: $public_ip"
    echo "Private IP: $private_ip"
    echo "-----------------------------------------"

  done < instance_ids.txt

done

When the script's output is redirected to a file (here instance.names), it looks like this:

Availability Domain: zcLB:US-CHICAGO-1-AD-1
Display Name: Instance1
Public IP: 192.0.2.1
Private IP: 10.0.0.1
-----------------------------------------
Availability Domain: zcLB:US-CHICAGO-1-AD-1
Display Name: Instance2
Public IP: 192.0.2.2
Private IP: 10.0.0.2
-----------------------------------------
…

You can now grep this file for the names of the servers you want to power on or off:

• grep -A2 "Display Name: <instance name>" instance.names
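grep is fine for a quick look. If you want a compact name-to-private-IP list instead, for example to paste into an inventory, a small awk sketch works on the same report; report_lookup is a hypothetical helper and assumes display names without spaces, as in the sample above.

```bash
#!/usr/bin/env bash
# report_lookup PATTERN FILE: print "name private_ip" for every instance in the
# report whose display name matches PATTERN (an awk regular expression).
# Assumes display names contain no spaces.
report_lookup() {
  local pattern="$1" file="$2"
  awk -v pat="$pattern" '
    /^Display Name:/ { name = $3 }
    /^Private IP:/   { if (name ~ pat) print name, $3 }
  ' "$file"
}

# Usage: report_lookup '^Instance' instance.names
```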

The following Ansible playbook powers instances on or off by their display names, as reported by the OCI CLI.

Ansible playbook to power on or off multiple instances via OCI CLI

---
- name: Control OCI Instance Power State based on Instance Names
  hosts: localhost
  vars:
    compartment_id: "ocid1.compartment.oc1..insert compartment ID here"
    instance_names_to_stop:
      - Instance2
      # Add more instance names here if you wish to stop them...
    instance_names_to_start:
      # List the instance names you wish to start here...
      # Example:
      - Instance1

  tasks:
    - name: Fetch all instance details in the compartment
      command:
        cmd: "oci compute instance list --compartment-id {{ compartment_id }} --all --output json"
      register: oci_output

    - name: Parse the OCI CLI output
      set_fact:
        instances: "{{ oci_output.stdout | from_json }}"

    - name: Extract relevant information
      set_fact:
        clean_instances: "{{ clean_instances | default([]) + [{ 'name': item['display-name'], 'id': item.id, 'state': item['lifecycle-state'] }] }}"
      loop: "{{ instances.data }}"
      when: "'display-name' in item and 'id' in item and 'lifecycle-state' in item"

    # Names are matched case-sensitively against the OCI display names
    - name: Filter out instances to stop
      set_fact:
        instances_to_stop: "{{ clean_instances | selectattr('name', 'in', instance_names_to_stop) | selectattr('state', 'equalto', 'RUNNING') | list }}"

    - name: Filter out instances to start
      set_fact:
        instances_to_start: "{{ clean_instances | selectattr('name', 'in', instance_names_to_start) | selectattr('state', 'equalto', 'STOPPED') | list }}"

    - name: Display instances to stop (you can remove this debug task later)
      debug:
        var: instances_to_stop

    - name: Display instances to start (you can remove this debug task later)
      debug:
        var: instances_to_start

    - name: Power off instances
      command:
        cmd: "oci compute instance action --action STOP --instance-id {{ item.id }}"
      loop: "{{ instances_to_stop }}"
      when: instances_to_stop | length > 0

    - name: Power on instances
      command:
        cmd: "oci compute instance action --action START --instance-id {{ item.id }}"
      loop: "{{ instances_to_start }}"
      when: instances_to_start | length > 0

The output will look like this:

PLAY [Control OCI Instance Power State based on Instance Names] **********************************************************************************

TASK [Gathering Facts] ***************************************************************************************************************************
ok: [localhost]

TASK [Fetch all instance details in the compartment] *********************************************************************************************
changed: [localhost]

TASK [Parse the OCI CLI output] ******************************************************************************************************************
ok: [localhost]

TASK [Extract relevant information] **************************************************************************************************************
ok: [localhost] => (item={'display-name': 'Instance1', 'id': 'ocid1.instance.oc1..exampleuniqueID1', 'lifecycle-state': 'STOPPED'})
ok: [localhost] => (item={'display-name': 'Instance2', 'id': 'ocid1.instance.oc1..exampleuniqueID2', 'lifecycle-state': 'RUNNING'})

TASK [Filter out instances to stop] **************************************************************************************************************
ok: [localhost]

TASK [Filter out instances to start] *************************************************************************************************************
ok: [localhost]

TASK [Display instances to stop (you can remove this debug task later)] **************************************************************************
ok: [localhost] => {
    "instances_to_stop": [
        {
            "name": "Instance2",
            "id": "ocid1.instance.oc1..exampleuniqueID2",
            "state": "RUNNING"
        }
    ]
}

TASK [Display instances to start (you can remove this debug task later)] *************************************************************************
ok: [localhost] => {
    "instances_to_start": [
        {
            "name": "Instance1",
            "id": "ocid1.instance.oc1..exampleuniqueID1",
            "state": "STOPPED"
        }
    ]
}

TASK [Power off instances] ***********************************************************************************************************************
changed: [localhost] => (item={'name': 'Instance2', 'id': 'ocid1.instance.oc1..exampleuniqueID2', 'state': 'RUNNING'})

TASK [Power on instances] ************************************************************************************************************************
changed: [localhost] => (item={'name': 'Instance1', 'id': 'ocid1.instance.oc1..exampleuniqueID1', 'state': 'STOPPED'})

PLAY RECAP ****************************************************************************************************************************************
localhost : ok=9 changed=3 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
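If you want the same selection rule without Ansible, for example on a jump box where only the OCI CLI is installed, here is a plain-bash sketch. select_ids and power are hypothetical helpers: they mirror the playbook's logic (STOP only RUNNING instances, START only STOPPED ones) and expect one "name ocid state" line per instance on stdin, such as the jq pipeline in the comment produces.

```bash
#!/usr/bin/env bash
# select_ids ACTION NAME...: read "name ocid state" lines on stdin and print the
# OCIDs to act on (STOP only RUNNING instances, START only STOPPED ones).
# Example input source (assumes jq, which the listing script above also uses):
#   oci compute instance list --compartment-id "$compartment_id" --all \
#     | jq -r '.data[] | "\(.["display-name"]) \(.id) \(.["lifecycle-state"])"'
select_ids() {
  local action="$1" want
  shift
  case "$action" in
    STOP)  want=RUNNING ;;
    START) want=STOPPED ;;
    *) echo "unsupported action: $action" >&2; return 1 ;;
  esac
  # Space-padded list of requested names; assumes names contain no spaces.
  local names=" $* "
  local name id state
  while read -r name id state; do
    [ "$state" = "$want" ] || continue
    case "$names" in
      *" $name "*) echo "$id" ;;
    esac
  done
}

# power ACTION OCID...: send the power action to each instance via the OCI CLI.
power() {
  local action="$1" id
  shift
  for id in "$@"; do
    oci compute instance action --action "$action" --instance-id "$id"
  done
}
```

Pipe the listing into select_ids and pass the resulting OCIDs to power; as in the playbook, instances already in the target state are simply skipped.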

admin | May 15, 2024 | Ansible, Linux, Oracle Cloud

Nick Tailor's Technical Blog

A detail-minded individual, combining strong technical understanding and communication skills with experiences in Systems Administration, Engineering, Automation, AI Automation and Solutions; a proven methodical problem solver.
