Deploying Production-Grade Systems on Oracle Cloud Infrastructure (OCI) with Terraform

Launching a virtual machine is easy. Running secure, reliable, production-grade systems is not. This guide shows how to deploy enterprise-ready compute infrastructure on Oracle Cloud Infrastructure (OCI) using Terraform, with a focus on security, fault tolerance, and long-term operability.


What “Production-Grade” Actually Means

A production environment is defined by predictability, not convenience. Production systems must survive failures, scale safely, and be observable at all times.

  • Private networking by default
  • No public SSH access
  • Replaceable compute instances
  • Persistent storage separated from OS
  • Infrastructure defined as code

Target Architecture Overview

  • Private VCN and subnet with NAT gateway for outbound access
  • Network Security Groups (NSGs) with explicit rules
  • Flex compute shape
  • Detached block storage with iSCSI attachment
  • SSH key authentication only

This architecture is suitable for:

  • SaaS backends
  • Internal APIs
  • Databases
  • AI / ML inference nodes
  • HPC control or login nodes

Terraform: Provider Configuration


terraform {
  required_version = ">= 1.6"

  required_providers {
    oci = {
      source  = "oracle/oci"
      version = ">= 5.0.0"
    }
  }
}

provider "oci" {
  tenancy_ocid     = var.tenancy_ocid
  user_ocid        = var.user_ocid
  fingerprint      = var.fingerprint
  private_key_path = var.private_key_path
  region           = var.region
}

This ensures reproducible deployments and enforces secure API-based authentication.


Variables


variable "tenancy_ocid" {
  description = "OCID of the tenancy"
  type        = string
}

variable "user_ocid" {
  description = "OCID of the user calling the API"
  type        = string
}

variable "fingerprint" {
  description = "Fingerprint of the API signing key"
  type        = string
}

variable "private_key_path" {
  description = "Path to the private key for API authentication"
  type        = string
}

variable "region" {
  description = "OCI region identifier"
  type        = string
}

variable "compartment_ocid" {
  description = "OCID of the compartment for resources"
  type        = string
}

variable "image_ocid" {
  description = "OCID of the compute image (e.g., Oracle Linux 8)"
  type        = string
}

variable "ssh_public_key" {
  description = "Path to SSH public key file"
  type        = string
}

variable "allowed_cidr" {
  description = "CIDR block allowed to access instances (e.g., VPN range)"
  type        = string
  default     = "10.0.0.0/16"
}

Data Sources


data "oci_identity_availability_domains" "ads" {
  compartment_id = var.tenancy_ocid
}

locals {
  availability_domain = data.oci_identity_availability_domains.ads.availability_domains[0].name
}

This retrieves the list of availability domains in your region. We select the first AD for simplicity, but production deployments should consider multi-AD placement.


Virtual Cloud Network (VCN)


resource "oci_core_vcn" "prod_vcn" {
  cidr_blocks    = ["10.0.0.0/16"]
  display_name   = "prod-vcn"
  dns_label      = "prodvcn"
  compartment_id = var.compartment_ocid
}

A /16 CIDR allows future expansion without redesign. VCNs act as the first isolation boundary for production systems.


Internet Gateway


resource "oci_core_internet_gateway" "igw" {
  compartment_id = var.compartment_ocid
  vcn_id         = oci_core_vcn.prod_vcn.id
  display_name   = "prod-igw"
  enabled        = true
}

The internet gateway enables outbound connectivity for the NAT gateway. It does not expose private instances directly.


NAT Gateway


resource "oci_core_nat_gateway" "nat_gw" {
  compartment_id = var.compartment_ocid
  vcn_id         = oci_core_vcn.prod_vcn.id
  display_name   = "prod-nat-gw"
  block_traffic  = false
}

The NAT gateway allows private subnet instances to reach the internet for package updates and external API calls without exposing inbound access.


Route Tables


resource "oci_core_route_table" "private_rt" {
  compartment_id = var.compartment_ocid
  vcn_id         = oci_core_vcn.prod_vcn.id
  display_name   = "private-route-table"

  route_rules {
    destination       = "0.0.0.0/0"
    destination_type  = "CIDR_BLOCK"
    network_entity_id = oci_core_nat_gateway.nat_gw.id
  }
}

All outbound traffic from the private subnet routes through the NAT gateway. This ensures instances can reach external resources without being directly accessible.


Private Subnet (No Public IPs)


resource "oci_core_subnet" "private_subnet" {
  cidr_block                 = "10.0.1.0/24"
  vcn_id                     = oci_core_vcn.prod_vcn.id
  compartment_id             = var.compartment_ocid
  display_name               = "private-subnet"
  prohibit_public_ip_on_vnic = true
  route_table_id             = oci_core_route_table.private_rt.id
  dns_label                  = "private"
}

Instances in this subnet are never reachable from the internet. Access must go through a bastion, VPN, or private load balancer.


Network Security Groups (NSG)


resource "oci_core_network_security_group" "app_nsg" {
  compartment_id = var.compartment_ocid
  vcn_id         = oci_core_vcn.prod_vcn.id
  display_name   = "app-nsg"
}

# Allow SSH from internal network only
resource "oci_core_network_security_group_security_rule" "allow_ssh" {
  network_security_group_id = oci_core_network_security_group.app_nsg.id
  direction                 = "INGRESS"
  protocol                  = "6" # TCP

  source      = var.allowed_cidr
  source_type = "CIDR_BLOCK"

  tcp_options {
    destination_port_range {
      min = 22
      max = 22
    }
  }
}

# Allow HTTPS from internal network
resource "oci_core_network_security_group_security_rule" "allow_https" {
  network_security_group_id = oci_core_network_security_group.app_nsg.id
  direction                 = "INGRESS"
  protocol                  = "6" # TCP

  source      = var.allowed_cidr
  source_type = "CIDR_BLOCK"

  tcp_options {
    destination_port_range {
      min = 443
      max = 443
    }
  }
}

# Allow all outbound traffic
resource "oci_core_network_security_group_security_rule" "allow_egress" {
  network_security_group_id = oci_core_network_security_group.app_nsg.id
  direction                 = "EGRESS"
  protocol                  = "all"

  destination      = "0.0.0.0/0"
  destination_type = "CIDR_BLOCK"
}

# Allow ICMP for path MTU discovery
resource "oci_core_network_security_group_security_rule" "allow_icmp" {
  network_security_group_id = oci_core_network_security_group.app_nsg.id
  direction                 = "INGRESS"
  protocol                  = "1" # ICMP

  source      = "10.0.0.0/16"
  source_type = "CIDR_BLOCK"

  icmp_options {
    type = 3
    code = 4
  }
}

NSGs provide service-level firewalling and are preferred over subnet-wide security lists. These rules allow SSH and HTTPS only from your internal network, while permitting all outbound traffic.


Compute Instance (Flex Shape)


resource "oci_core_instance" "prod_instance" {
  availability_domain = local.availability_domain
  compartment_id      = var.compartment_ocid
  display_name        = "prod-app-01"
  shape               = "VM.Standard.E4.Flex"

  shape_config {
    ocpus         = 2
    memory_in_gbs = 16
  }

  create_vnic_details {
    subnet_id        = oci_core_subnet.private_subnet.id
    assign_public_ip = false
    nsg_ids          = [oci_core_network_security_group.app_nsg.id]
    hostname_label   = "prod-app-01"
  }

  source_details {
    source_type             = "image"
    source_id               = var.image_ocid
    boot_volume_size_in_gbs = 50
  }

  metadata = {
    ssh_authorized_keys = file(var.ssh_public_key)
  }

  preserve_boot_volume = true
}

Flex shapes allow independent scaling of CPU and memory, ensuring predictable performance without overpaying for unused resources. Setting preserve_boot_volume = true protects the boot volume if the instance is accidentally terminated.


Persistent Block Storage


resource "oci_core_volume" "data_volume" {
  availability_domain = local.availability_domain
  compartment_id      = var.compartment_ocid
  display_name        = "prod-data-vol"
  size_in_gbs         = 200
  vpus_per_gb         = 10 # Balanced performance tier
}

resource "oci_core_volume_attachment" "data_attach" {
  attachment_type = "paravirtualized"
  instance_id     = oci_core_instance.prod_instance.id
  volume_id       = oci_core_volume.data_volume.id
  display_name    = "prod-data-attachment"
}

Separating OS and data ensures instances are disposable while data remains protected. Paravirtualized attachments are simpler than iSCSI and work automatically on Oracle Linux.

Post-Deployment: Mounting the Block Volume

After Terraform applies, SSH into the instance and mount the volume:


# Find the attached volume (usually /dev/sdb)
lsblk

# Create filesystem (first time only)
sudo mkfs.xfs /dev/sdb

# Create mount point and mount
sudo mkdir -p /data
sudo mount /dev/sdb /data

# Add to fstab for persistence across reboots
echo '/dev/sdb /data xfs defaults,_netdev,nofail 0 2' | sudo tee -a /etc/fstab

The _netdev and nofail options ensure the system boots even if the volume is temporarily unavailable.


Outputs


output "instance_private_ip" {
  description = "Private IP address of the compute instance"
  value       = oci_core_instance.prod_instance.private_ip
}

output "instance_id" {
  description = "OCID of the compute instance"
  value       = oci_core_instance.prod_instance.id
}

output "vcn_id" {
  description = "OCID of the VCN"
  value       = oci_core_vcn.prod_vcn.id
}

output "volume_id" {
  description = "OCID of the data volume"
  value       = oci_core_volume.data_volume.id
}

Security & Operational Checklist

  • ✓ No public SSH access
  • ✓ Key-based authentication only
  • ✓ Private networking with NAT for outbound
  • ✓ Explicit NSG rules (no default allow)
  • ✓ Persistent storage with separate lifecycle
  • ✓ Infrastructure fully defined in code
  • ✓ Boot volume preservation enabled

What to Add Next

  • Bastion Service – OCI’s managed bastion for secure SSH access without VPN
  • Site-to-Site VPN – Connect to on-premises networks
  • OCI Load Balancer – For multi-instance deployments
  • Monitoring and Alerting – OCI Monitoring service with custom alarms
  • Dynamic Groups and IAM policies – Instance principals for secure API access
  • Cloud-init or Ansible – OS hardening and application deployment
  • CI/CD pipelines – GitOps workflow for Terraform changes
  • Volume backups – Scheduled backup policies for data protection

Example terraform.tfvars


tenancy_ocid     = "ocid1.tenancy.oc1..aaaaaaaaexample"
user_ocid        = "ocid1.user.oc1..aaaaaaaaexample"
fingerprint      = "aa:bb:cc:dd:ee:ff:00:11:22:33:44:55:66:77:88:99"
private_key_path = "~/.oci/oci_api_key.pem"
region           = "eu-frankfurt-1"
compartment_ocid = "ocid1.compartment.oc1..aaaaaaaaexample"
image_ocid       = "ocid1.image.oc1.eu-frankfurt-1.aaaaaaaaexample"
ssh_public_key   = "~/.ssh/id_rsa.pub"
allowed_cidr     = "10.0.0.0/16"

Nick Tailor’s Thoughts

Production infrastructure is not about clicking faster. It is about repeatability, security, and recovery. OCI combined with Terraform provides an extremely strong foundation when engineered correctly from day one.

If you treat infrastructure as software, production becomes predictable.

The complete code from this guide is available as a ready-to-use Terraform module. Clone it, update your variables, and run terraform apply to deploy.

Leave a Reply

Your email address will not be published. Required fields are marked *