Deploying Production-Grade Systems on Oracle Cloud Infrastructure (OCI) with Terraform

Launching a virtual machine is easy. Running secure, reliable, production-grade systems is not. This guide shows how to deploy enterprise-ready compute infrastructure on Oracle Cloud Infrastructure (OCI) using Terraform, with a focus on security, fault tolerance, and long-term operability.


What “Production-Grade” Actually Means

A production environment is defined by predictability, not convenience. Production systems must survive failures, scale safely, and be observable at all times.

  • Private networking by default
  • No public SSH access
  • Replaceable compute instances
  • Persistent storage separated from OS
  • Infrastructure defined as code

Target Architecture Overview

  • Private VCN and subnet with NAT gateway for outbound access
  • Network Security Groups (NSGs) with explicit rules
  • Flex compute shape
  • Detached block storage with paravirtualized attachment
  • SSH key authentication only

This architecture is suitable for:

  • SaaS backends
  • Internal APIs
  • Databases
  • AI / ML inference nodes
  • HPC control or login nodes

Terraform: Provider Configuration


terraform {
  required_version = ">= 1.6"

  required_providers {
    oci = {
      source  = "oracle/oci"
      version = ">= 5.0.0"
    }
  }
}

provider "oci" {
  tenancy_ocid     = var.tenancy_ocid
  user_ocid        = var.user_ocid
  fingerprint      = var.fingerprint
  private_key_path = var.private_key_path
  region           = var.region
}

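The version constraints keep deployments reproducible (commit the generated .terraform.lock.hcl file to pin the exact provider build), and the provider authenticates to the OCI API with a signing key rather than a password.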


Variables


variable "tenancy_ocid" {
  description = "OCID of the tenancy"
  type        = string
}

variable "user_ocid" {
  description = "OCID of the user calling the API"
  type        = string
}

variable "fingerprint" {
  description = "Fingerprint of the API signing key"
  type        = string
}

variable "private_key_path" {
  description = "Path to the private key for API authentication"
  type        = string
}

variable "region" {
  description = "OCI region identifier"
  type        = string
}

variable "compartment_ocid" {
  description = "OCID of the compartment for resources"
  type        = string
}

variable "image_ocid" {
  description = "OCID of the compute image (e.g., Oracle Linux 8)"
  type        = string
}

variable "ssh_public_key" {
  description = "Path to SSH public key file"
  type        = string
}

variable "allowed_cidr" {
  description = "CIDR block allowed to access instances (e.g., VPN range)"
  type        = string
  default     = "10.0.0.0/16"
}

Data Sources


data "oci_identity_availability_domains" "ads" {
  compartment_id = var.tenancy_ocid
}

locals {
  availability_domain = data.oci_identity_availability_domains.ads.availability_domains[0].name
}

This retrieves the list of availability domains in your region. We select the first AD for simplicity, but production deployments should consider multi-AD placement.


Virtual Cloud Network (VCN)


resource "oci_core_vcn" "prod_vcn" {
  cidr_blocks    = ["10.0.0.0/16"]
  display_name   = "prod-vcn"
  dns_label      = "prodvcn"
  compartment_id = var.compartment_ocid
}

A /16 CIDR allows future expansion without redesign. VCNs act as the first isolation boundary for production systems.
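To make the expansion headroom concrete, the arithmetic can be sketched in shell. The helper names below are hypothetical, and the figure of three reserved addresses per subnet is an assumption about OCI that is worth confirming against the current networking documentation.

```shell
#!/bin/bash
# Hypothetical helpers for CIDR capacity planning (not part of any OCI tooling).

# Number of subnets with prefix length $2 that fit in a network with prefix length $1.
subnet_count() {
  echo $(( 1 << ($2 - $1) ))
}

# Usable host addresses in a subnet with prefix length $1,
# assuming OCI reserves 3 addresses in every subnet.
usable_hosts() {
  echo $(( (1 << (32 - $1)) - 3 ))
}

echo "/24 subnets in a /16: $(subnet_count 16 24)"
echo "Usable hosts per /24: $(usable_hosts 24)"
```

With a /16 VCN and /24 subnets, the 10.0.1.0/24 private subnet used later in this guide consumes one of 256 possible slots.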


Internet Gateway


resource "oci_core_internet_gateway" "igw" {
  compartment_id = var.compartment_ocid
  vcn_id         = oci_core_vcn.prod_vcn.id
  display_name   = "prod-igw"
  enabled        = true
}

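An OCI NAT gateway does not depend on an internet gateway. This internet gateway is only needed if you later add a public subnet, for example for a bastion host or a public load balancer. It does not expose private instances, because the private route table has no rule that points to it.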


NAT Gateway


resource "oci_core_nat_gateway" "nat_gw" {
  compartment_id = var.compartment_ocid
  vcn_id         = oci_core_vcn.prod_vcn.id
  display_name   = "prod-nat-gw"
  block_traffic  = false
}

The NAT gateway allows private subnet instances to reach the internet for package updates and external API calls without exposing inbound access.


Route Tables


resource "oci_core_route_table" "private_rt" {
  compartment_id = var.compartment_ocid
  vcn_id         = oci_core_vcn.prod_vcn.id
  display_name   = "private-route-table"

  route_rules {
    destination       = "0.0.0.0/0"
    destination_type  = "CIDR_BLOCK"
    network_entity_id = oci_core_nat_gateway.nat_gw.id
  }
}

All outbound traffic from the private subnet routes through the NAT gateway. This ensures instances can reach external resources without being directly accessible.


Private Subnet (No Public IPs)


resource "oci_core_subnet" "private_subnet" {
  cidr_block                 = "10.0.1.0/24"
  vcn_id                     = oci_core_vcn.prod_vcn.id
  compartment_id             = var.compartment_ocid
  display_name               = "private-subnet"
  prohibit_public_ip_on_vnic = true
  route_table_id             = oci_core_route_table.private_rt.id
  dns_label                  = "private"
}

Instances in this subnet are never reachable from the internet. Access must go through a bastion, VPN, or private load balancer.


Network Security Groups (NSG)


resource "oci_core_network_security_group" "app_nsg" {
  compartment_id = var.compartment_ocid
  vcn_id         = oci_core_vcn.prod_vcn.id
  display_name   = "app-nsg"
}

# Allow SSH from internal network only
resource "oci_core_network_security_group_security_rule" "allow_ssh" {
  network_security_group_id = oci_core_network_security_group.app_nsg.id
  direction                 = "INGRESS"
  protocol                  = "6" # TCP

  source      = var.allowed_cidr
  source_type = "CIDR_BLOCK"

  tcp_options {
    destination_port_range {
      min = 22
      max = 22
    }
  }
}

# Allow HTTPS from internal network
resource "oci_core_network_security_group_security_rule" "allow_https" {
  network_security_group_id = oci_core_network_security_group.app_nsg.id
  direction                 = "INGRESS"
  protocol                  = "6" # TCP

  source      = var.allowed_cidr
  source_type = "CIDR_BLOCK"

  tcp_options {
    destination_port_range {
      min = 443
      max = 443
    }
  }
}

# Allow all outbound traffic
resource "oci_core_network_security_group_security_rule" "allow_egress" {
  network_security_group_id = oci_core_network_security_group.app_nsg.id
  direction                 = "EGRESS"
  protocol                  = "all"

  destination      = "0.0.0.0/0"
  destination_type = "CIDR_BLOCK"
}

# Allow ICMP for path MTU discovery
resource "oci_core_network_security_group_security_rule" "allow_icmp" {
  network_security_group_id = oci_core_network_security_group.app_nsg.id
  direction                 = "INGRESS"
  protocol                  = "1" # ICMP

  source      = "10.0.0.0/16"
  source_type = "CIDR_BLOCK"

  icmp_options {
    type = 3
    code = 4
  }
}

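NSGs provide service-level firewalling and are preferred over subnet-wide security lists. These rules allow SSH and HTTPS only from the CIDR in allowed_cidr, permit path MTU discovery ICMP from inside the VCN, and allow all outbound traffic. Security lists still apply alongside NSGs: because the subnet above does not set security_list_ids, it also inherits the VCN's default security list, so review or replace that list if the NSG rules are meant to be the only ingress path.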
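To see which addresses a source CIDR such as 10.0.0.0/16 admits, the match can be sketched in shell. This is only an illustration of prefix matching, not how OCI implements it, and the function names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers: check whether an IPv4 address falls inside a CIDR block,
# which is the comparison implied by an NSG rule's source field.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=.
  set -- $1   # split the address on dots into $1..$4
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Return 0 if address $1 is inside CIDR block $2 (for example 10.0.0.0/16).
in_cidr() {
  local net=${2%/*} bits=${2#*/}
  local mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}

in_cidr 10.0.1.25 10.0.0.0/16 && echo "10.0.1.25 matches the rule"
in_cidr 203.0.113.7 10.0.0.0/16 || echo "203.0.113.7 does not match"
```

A quick check like this is handy when deciding whether a VPN or bastion range actually falls inside the value you pass as allowed_cidr.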


Compute Instance (Flex Shape)


resource "oci_core_instance" "prod_instance" {
  availability_domain = local.availability_domain
  compartment_id      = var.compartment_ocid
  display_name        = "prod-app-01"
  shape               = "VM.Standard.E4.Flex"

  shape_config {
    ocpus         = 2
    memory_in_gbs = 16
  }

  create_vnic_details {
    subnet_id        = oci_core_subnet.private_subnet.id
    assign_public_ip = false
    nsg_ids          = [oci_core_network_security_group.app_nsg.id]
    hostname_label   = "prod-app-01"
  }

  source_details {
    source_type             = "image"
    source_id               = var.image_ocid
    boot_volume_size_in_gbs = 50
  }

  metadata = {
    ssh_authorized_keys = file(var.ssh_public_key)
  }

  preserve_boot_volume = true
}

Flex shapes allow independent scaling of CPU and memory, ensuring predictable performance without overpaying for unused resources. Setting preserve_boot_volume = true protects the boot volume if the instance is accidentally terminated.


Persistent Block Storage


resource "oci_core_volume" "data_volume" {
  availability_domain = local.availability_domain
  compartment_id      = var.compartment_ocid
  display_name        = "prod-data-vol"
  size_in_gbs         = 200
  vpus_per_gb         = 10 # Balanced performance tier
}

resource "oci_core_volume_attachment" "data_attach" {
  attachment_type = "paravirtualized"
  instance_id     = oci_core_instance.prod_instance.id
  volume_id       = oci_core_volume.data_volume.id
  display_name    = "prod-data-attachment"
}

Separating OS and data ensures instances are disposable while data remains protected. Paravirtualized attachments are simpler than iSCSI and work automatically on Oracle Linux.

Post-Deployment: Mounting the Block Volume

After Terraform applies, SSH into the instance and mount the volume:


# Find the attached volume (usually /dev/sdb)
lsblk

# Create filesystem (first time only)
sudo mkfs.xfs /dev/sdb

# Create mount point and mount
sudo mkdir -p /data
sudo mount /dev/sdb /data

# Add to fstab for persistence across reboots
echo '/dev/sdb /data xfs defaults,_netdev,nofail 0 2' | sudo tee -a /etc/fstab

The _netdev and nofail options ensure the system boots even if the volume is temporarily unavailable.
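Device names such as /dev/sdb are not guaranteed to stay the same across reboots or when more volumes are attached, so an fstab entry keyed on the filesystem UUID is usually safer. A minimal sketch follows; fstab_entry is a hypothetical helper and the UUID shown is a made-up placeholder.

```shell
#!/bin/bash
# Hypothetical helper: build an fstab entry keyed on a filesystem UUID
# instead of a device name such as /dev/sdb.

# $1 = filesystem UUID, $2 = mount point, $3 = filesystem type
fstab_entry() {
  printf 'UUID=%s %s %s defaults,_netdev,nofail 0 2\n' "$1" "$2" "$3"
}

# On the instance, the UUID would come from blkid after mkfs, for example:
#   uuid=$(sudo blkid -s UUID -o value /dev/sdb)
#   fstab_entry "$uuid" /data xfs | sudo tee -a /etc/fstab
# The UUID below is a placeholder for illustration only.
fstab_entry "0a1b2c3d-1111-2222-3333-444455556666" /data xfs
```

Running sudo mount -a after editing /etc/fstab is a cheap way to confirm the entry parses before the next reboot.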


Outputs


output "instance_private_ip" {
  description = "Private IP address of the compute instance"
  value       = oci_core_instance.prod_instance.private_ip
}

output "instance_id" {
  description = "OCID of the compute instance"
  value       = oci_core_instance.prod_instance.id
}

output "vcn_id" {
  description = "OCID of the VCN"
  value       = oci_core_vcn.prod_vcn.id
}

output "volume_id" {
  description = "OCID of the data volume"
  value       = oci_core_volume.data_volume.id
}

Security & Operational Checklist

  • ✓ No public SSH access
  • ✓ Key-based authentication only
  • ✓ Private networking with NAT for outbound
  • ✓ Explicit NSG rules (no default allow)
  • ✓ Persistent storage with separate lifecycle
  • ✓ Infrastructure fully defined in code
  • ✓ Boot volume preservation enabled

What to Add Next

  • Bastion Service – OCI’s managed bastion for secure SSH access without VPN
  • Site-to-Site VPN – Connect to on-premises networks
  • OCI Load Balancer – For multi-instance deployments
  • Monitoring and Alerting – OCI Monitoring service with custom alarms
  • Dynamic Groups and IAM policies – Instance principals for secure API access
  • Cloud-init or Ansible – OS hardening and application deployment
  • CI/CD pipelines – GitOps workflow for Terraform changes
  • Volume backups – Scheduled backup policies for data protection

Example terraform.tfvars


tenancy_ocid     = "ocid1.tenancy.oc1..aaaaaaaaexample"
user_ocid        = "ocid1.user.oc1..aaaaaaaaexample"
fingerprint      = "aa:bb:cc:dd:ee:ff:00:11:22:33:44:55:66:77:88:99"
private_key_path = "~/.oci/oci_api_key.pem"
region           = "eu-frankfurt-1"
compartment_ocid = "ocid1.compartment.oc1..aaaaaaaaexample"
image_ocid       = "ocid1.image.oc1.eu-frankfurt-1.aaaaaaaaexample"
ssh_public_key   = "~/.ssh/id_rsa.pub"
allowed_cidr     = "10.0.0.0/16"

Nick Tailor’s Thoughts

Production infrastructure is not about clicking faster. It is about repeatability, security, and recovery. OCI combined with Terraform provides an extremely strong foundation when engineered correctly from day one.

If you treat infrastructure as software, production becomes predictable.

The complete code from this guide is available as a ready-to-use Terraform module. Clone it, update your variables, and run terraform apply to deploy.

How to Power Up or Power Down multiple instances in OCI using CLI with Ansible

  • This assumes you have already configured the OCI CLI and added your API key to your user in the OCI console, so that your Ubuntu machine or jump box can connect to your OCI infrastructure.
  • Ansible: a playbook that powers instances up or down using the OCI CLI.
  • This assumes you already have Ansible set up.
  • You will need to install the Ansible OCI collection.

The reason you would want this instead of Terraform is that Terraform is suited to infrastructure orchestration, not to managing instances once they are up and running.

If you have scaled out servers in OCI, there is currently no built-in way to power them up or down in bulk. During a migration, or in a staging environment where the machines only need to be running while you build or troubleshoot, a way to power multiple machines up or down at once is convenient.

Install the OCI collection if you don't already have it.

Linux/macOS

curl -L https://raw.githubusercontent.com/oracle/oci-ansible-collection/master/scripts/install.sh | bash -s -- --verbose

ansible-galaxy collection list will list the installed collections:

# /path/to/ansible/collections
Collection      Version
--------------- -------
amazon.aws      1.4.0
ansible.builtin 1.3.0
ansible.posix   1.3.0
oracle.oci      2.10.0

Once it is installed, test that the OCI client is working:

oci iam compartment list --all

This lists the compartment IDs that hold your instances.

Compartments in OCI are a way to organise infrastructure and control access to those resources. This is useful if, for example, you have contractors coming in and you only want them to have access to certain things, not everything.

There are two ways you can get your instance names:

  • Log in to the OCI console and navigate to the correct compartment, which is very slow and mind-numbing to wait for.
  • Use an automated approach, which is what you should be doing for anything that has to be done over and over.

Bash script to get the instance names from OCI

  • It uses the OCI CLI to print the name and IP addresses of every instance.
  • It loops through each availability domain.
  • For each availability domain, it lists the instance IDs and writes them to instance_ids.txt.
  • It cleans up instance_ids.txt to remove brackets, quotes, and commas.
  • It reads each instance ID from instance_ids.txt.
  • For each instance, it retrieves the VNIC information.
  • It extracts the display name, public IP, and private IP, and prints them.
  • It then moves on to the next availability domain.

#!/bin/bash

compartment_id="ocid1.compartment.oc1..insert compartment ID here"

# Explicitly define the availability domains for your tenancy and region
availability_domains=("zcLB:US-CHICAGO-1-AD-1" "zcLB:US-CHICAGO-1-AD-2" "zcLB:US-CHICAGO-1-AD-3")

# For each availability domain, list the instances
for ad in "${availability_domains[@]}"; do

    # List instances within the specific AD and compartment, extracting the "id" field
    oci compute instance list --compartment-id "$compartment_id" --availability-domain "$ad" --query "data[].id" --raw-output > instance_ids.txt

    # Clean up the instance IDs (removing brackets, quotes, commas, and blank lines)
    sed -i 's/\[//g' instance_ids.txt
    sed -i 's/\]//g' instance_ids.txt
    sed -i 's/"//g' instance_ids.txt
    sed -i 's/,//g' instance_ids.txt
    sed -i '/^[[:space:]]*$/d' instance_ids.txt

    # Read each instance ID from instance_ids.txt
    while read -r instance_id; do
        # Get instance VNIC information
        instance_info=$(oci compute instance list-vnics --instance-id "$instance_id")

        # Extract the required fields and print them
        display_name=$(echo "$instance_info" | jq -r '.data[0]."display-name"')
        public_ip=$(echo "$instance_info" | jq -r '.data[0]."public-ip"')
        private_ip=$(echo "$instance_info" | jq -r '.data[0]."private-ip"')

        echo "Availability Domain: $ad"
        echo "Display Name: $display_name"
        echo "Public IP: $public_ip"
        echo "Private IP: $private_ip"
        echo "-----------------------------------------"
    done < instance_ids.txt
done

When redirected to a file, the script's output looks like this:

instance.names

Availability Domain: zcLB:US-CHICAGO-1-AD-1
Display Name: Instance1
Public IP: 192.0.2.1
Private IP: 10.0.0.1
-----------------------------------------
Availability Domain: zcLB:US-CHICAGO-1-AD-1
Display Name: Instance2
Public IP: 192.0.2.2
Private IP: 10.0.0.2
-----------------------------------------

You can now quickly grep this file for the names of the servers you want to power on or off (replace Instance with the name or pattern you are looking for):

 grep "Instance" instance.names
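If you want the matching name and its private IP on one line rather than raw grep hits, a small awk wrapper can do it. This is a sketch that assumes the record layout shown in the sample output (a Display Name line followed later by a Private IP line); find_instances is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: print "name private-ip" for every record in the
# listing script's output file whose display name matches a pattern.

# $1 = awk regular expression matched against the display name
# $2 = path to the file produced by the listing script
find_instances() {
  awk -v pat="$1" '
    /^Display Name: / { name = substr($0, 15) }
    /^Private IP: /   { if (name ~ pat) print name, substr($0, 13); name = "" }
  ' "$2"
}

# Demo against a temporary copy of the sample output.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Availability Domain: zcLB:US-CHICAGO-1-AD-1
Display Name: Instance1
Public IP: 192.0.2.1
Private IP: 10.0.0.1
-----------------------------------------
Availability Domain: zcLB:US-CHICAGO-1-AD-1
Display Name: Instance2
Public IP: 192.0.2.2
Private IP: 10.0.0.2
-----------------------------------------
EOF
find_instances 'Instance2' "$sample"
rm -f "$sample"
```

The names it prints are the values that go into the instance name lists of the playbook.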

Now we have an Ansible playbook that can power instances on or off by the names reported by the OCI client.

Ansible playbook to power on or off multiple instances via OCI CLI

---
- name: Control OCI Instance Power State based on Instance Names
  hosts: localhost

  vars:
    instance_names_to_stop:
      - Instance2
      # Add more instance names here if you wish to stop them...

    instance_names_to_start:
      # List the instance names you wish to start here...
      # Example:
      - Instance1

  tasks:
    - name: Fetch all instance details in the compartment
      command:
        cmd: oci compute instance list --compartment-id ocid1.compartment.oc1..aaaaaaaak7jc7tn2su2oqzmrbujpr5wmnuucj4mwj4o4g7rqlzemy4yvxrza --output json
      register: oci_output

    - name: Parse the OCI CLI output
      set_fact:
        instances: "{{ oci_output.stdout | from_json }}"

    - name: Extract relevant information
      set_fact:
        clean_instances: "{{ clean_instances | default([]) + [{ 'name': item['display-name'], 'id': item.id, 'state': item['lifecycle-state'] }] }}"
      loop: "{{ instances.data }}"
      when: "'display-name' in item and 'id' in item and 'lifecycle-state' in item"

    # Names are matched exactly (case-sensitive) against the display name
    - name: Filter out instances to stop
      set_fact:
        instances_to_stop: "{{ clean_instances | selectattr('name', 'in', instance_names_to_stop) | selectattr('state', 'equalto', 'RUNNING') | list }}"

    - name: Filter out instances to start
      set_fact:
        instances_to_start: "{{ clean_instances | selectattr('name', 'in', instance_names_to_start) | selectattr('state', 'equalto', 'STOPPED') | list }}"

    - name: Display instances to stop (you can remove this debug task later)
      debug:
        var: instances_to_stop

    - name: Display instances to start (you can remove this debug task later)
      debug:
        var: instances_to_start

    - name: Power off instances
      command:
        cmd: "oci compute instance action --action STOP --instance-id {{ item.id }}"
      loop: "{{ instances_to_stop }}"
      when: instances_to_stop | length > 0
      register: state

    #  - debug:
    #      var: state

    - name: Power on instances
      command:
        cmd: "oci compute instance action --action START --instance-id {{ item.id }}"
      loop: "{{ instances_to_start }}"
      when: instances_to_start | length > 0

The output will look like this:

PLAY [Control OCI Instance Power State based on Instance Names] **********************************************************************************

TASK [Gathering Facts] ***************************************************************************************************************************
ok: [localhost]

TASK [Fetch all instance details in the compartment] *********************************************************************************************
changed: [localhost]

TASK [Parse the OCI CLI output] ******************************************************************************************************************
ok: [localhost]

TASK [Extract relevant information] **************************************************************************************************************
ok: [localhost] => (item={'display-name': 'Instance1', 'id': 'ocid1.instance.oc1..exampleuniqueID1', 'lifecycle-state': 'STOPPED'})
ok: [localhost] => (item={'display-name': 'Instance2', 'id': 'ocid1.instance.oc1..exampleuniqueID2', 'lifecycle-state': 'RUNNING'})

TASK [Filter out instances to stop] **************************************************************************************************************
ok: [localhost]

TASK [Filter out instances to start] *************************************************************************************************************
ok: [localhost]

TASK [Display instances to stop (you can remove this debug task later)] **************************************************************************
ok: [localhost] => {
    "instances_to_stop": [
        {
            "name": "Instance2",
            "id": "ocid1.instance.oc1..exampleuniqueID2",
            "state": "RUNNING"
        }
    ]
}

TASK [Display instances to start (you can remove this debug task later)] *************************************************************************
ok: [localhost] => {
    "instances_to_start": [
        {
            "name": "Instance1",
            "id": "ocid1.instance.oc1..exampleuniqueID1",
            "state": "STOPPED"
        }
    ]
}

TASK [Power off instances] ***********************************************************************************************************************
changed: [localhost] => (item={'name': 'Instance2', 'id': 'ocid1.instance.oc1..exampleuniqueID2', 'state': 'RUNNING'})

TASK [Power on instances] ************************************************************************************************************************
changed: [localhost] => (item={'name': 'Instance1', 'id': 'ocid1.instance.oc1..exampleuniqueID1', 'state': 'STOPPED'})

PLAY RECAP ****************************************************************************************************************************************
localhost                  : ok=9    changed=3    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
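Before pointing the playbook at real machines, it can help to see the exact CLI calls the two power tasks boil down to. The sketch below only prints the commands and never runs them; power_command is a hypothetical helper, and the list of accepted actions is a subset of what the OCI CLI supports.

```shell
#!/bin/bash
# Hypothetical helper: print (not run) the OCI CLI command that the playbook's
# power tasks execute for one instance.

# $1 = action (START, STOP, SOFTSTOP, SOFTRESET or RESET), $2 = instance OCID
power_command() {
  case "$1" in
    START|STOP|SOFTSTOP|SOFTRESET|RESET) ;;
    *) echo "unsupported action: $1" >&2; return 1 ;;
  esac
  echo "oci compute instance action --action $1 --instance-id $2"
}

# Example OCIDs taken from the sample playbook output.
power_command STOP  "ocid1.instance.oc1..exampleuniqueID2"
power_command START "ocid1.instance.oc1..exampleuniqueID1"
```

Note that STOP is an immediate power-off, while SOFTSTOP asks the operating system to shut down first. If the instances run databases or other stateful services, swapping STOP for SOFTSTOP in the playbook may be worth considering.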
