Deploying Production-Grade Systems on Oracle Cloud Infrastructure (OCI) with Terraform
Launching a virtual machine is easy. Running secure, reliable, production-grade systems is not. This guide shows how to deploy enterprise-ready compute infrastructure on Oracle Cloud Infrastructure (OCI) using Terraform, with a focus on security, fault tolerance, and long-term operability.
What “Production-Grade” Actually Means
A production environment is defined by predictability, not convenience. Production systems must survive failures, scale safely, and be observable at all times.
- Private networking by default
- No public SSH access
- Replaceable compute instances
- Persistent storage separated from OS
- Infrastructure defined as code
Target Architecture Overview
- Private VCN and subnet with NAT gateway for outbound access
- Network Security Groups (NSGs) with explicit rules
- Flex compute shape
- Detached block storage with paravirtualized attachment
- SSH key authentication only
This architecture is suitable for:
- SaaS backends
- Internal APIs
- Databases
- AI / ML inference nodes
- HPC control or login nodes
Terraform: Provider Configuration
terraform {
  required_version = ">= 1.6"

  required_providers {
    oci = {
      source  = "oracle/oci"
      version = ">= 5.0.0"
    }
  }
}
provider "oci" {
  tenancy_ocid     = var.tenancy_ocid
  user_ocid        = var.user_ocid
  fingerprint      = var.fingerprint
  private_key_path = var.private_key_path
  region           = var.region
}
This ensures reproducible deployments and enforces secure API-based authentication.
Variables
variable "tenancy_ocid" {
  description = "OCID of the tenancy"
  type        = string
}

variable "user_ocid" {
  description = "OCID of the user calling the API"
  type        = string
}

variable "fingerprint" {
  description = "Fingerprint of the API signing key"
  type        = string
}

variable "private_key_path" {
  description = "Path to the private key for API authentication"
  type        = string
}

variable "region" {
  description = "OCI region identifier"
  type        = string
}

variable "compartment_ocid" {
  description = "OCID of the compartment for resources"
  type        = string
}

variable "image_ocid" {
  description = "OCID of the compute image (e.g., Oracle Linux 8)"
  type        = string
}

variable "ssh_public_key" {
  description = "Path to SSH public key file"
  type        = string
}

variable "allowed_cidr" {
  description = "CIDR block allowed to access instances (e.g., VPN range)"
  type        = string
  default     = "10.0.0.0/16"
}
Data Sources
data "oci_identity_availability_domains" "ads" {
  compartment_id = var.tenancy_ocid
}

locals {
  availability_domain = data.oci_identity_availability_domains.ads.availability_domains[0].name
}
This retrieves the list of availability domains in your region. We select the first AD for simplicity, but production deployments should consider multi-AD placement.
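If you do want to spread instances across ADs, one approach is to index into the full AD list round-robin. A minimal sketch, assuming a hypothetical `instance_count` variable that is not defined elsewhere in this guide:

```hcl
# Sketch: collect every AD name in the region so instances can be
# distributed round-robin instead of pinned to the first AD.
locals {
  ad_names = [for ad in data.oci_identity_availability_domains.ads.availability_domains : ad.name]
}

# Inside a resource using count = var.instance_count (hypothetical):
#   availability_domain = local.ad_names[count.index % length(local.ad_names)]
```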
Virtual Cloud Network (VCN)
resource "oci_core_vcn" "prod_vcn" {
  cidr_blocks    = ["10.0.0.0/16"]
  display_name   = "prod-vcn"
  dns_label      = "prodvcn"
  compartment_id = var.compartment_ocid
}
A /16 CIDR allows future expansion without redesign. VCNs act as the first isolation boundary for production systems.
Internet Gateway
resource "oci_core_internet_gateway" "igw" {
  compartment_id = var.compartment_ocid
  vcn_id         = oci_core_vcn.prod_vcn.id
  display_name   = "prod-igw"
  enabled        = true
}
The internet gateway supports any future public-facing subnet (for example, a load balancer tier); the NAT gateway handles outbound traffic for the private subnet independently. Neither exposes private instances directly.
NAT Gateway
resource "oci_core_nat_gateway" "nat_gw" {
  compartment_id = var.compartment_ocid
  vcn_id         = oci_core_vcn.prod_vcn.id
  display_name   = "prod-nat-gw"
  block_traffic  = false
}
The NAT gateway allows private subnet instances to reach the internet for package updates and external API calls without exposing inbound access.
Route Tables
resource "oci_core_route_table" "private_rt" {
  compartment_id = var.compartment_ocid
  vcn_id         = oci_core_vcn.prod_vcn.id
  display_name   = "private-route-table"

  route_rules {
    destination       = "0.0.0.0/0"
    destination_type  = "CIDR_BLOCK"
    network_entity_id = oci_core_nat_gateway.nat_gw.id
  }
}
All outbound traffic from the private subnet routes through the NAT gateway. This ensures instances can reach external resources without being directly accessible.
Private Subnet (No Public IPs)
resource "oci_core_subnet" "private_subnet" {
  cidr_block                 = "10.0.1.0/24"
  vcn_id                     = oci_core_vcn.prod_vcn.id
  compartment_id             = var.compartment_ocid
  display_name               = "private-subnet"
  prohibit_public_ip_on_vnic = true
  route_table_id             = oci_core_route_table.private_rt.id
  dns_label                  = "private"
}
Instances in this subnet are never reachable from the internet. Access must go through a bastion, VPN, or private load balancer.
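If you go the bastion route, OCI's managed Bastion service can be declared alongside the rest of the stack. A hedged sketch (the resource name and CIDR reuse are illustrative; check the provider documentation for the full argument set before relying on it):

```hcl
# Sketch: managed bastion fronting the private subnet.
# Reusing var.allowed_cidr here is an assumption; restrict it to the
# CIDR your operators actually connect from.
resource "oci_bastion_bastion" "prod_bastion" {
  bastion_type     = "STANDARD"
  compartment_id   = var.compartment_ocid
  target_subnet_id = oci_core_subnet.private_subnet.id
  name             = "prodbastion"

  client_cidr_block_allow_list = [var.allowed_cidr]
}
```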
Network Security Groups (NSG)
resource "oci_core_network_security_group" "app_nsg" {
  compartment_id = var.compartment_ocid
  vcn_id         = oci_core_vcn.prod_vcn.id
  display_name   = "app-nsg"
}

# Allow SSH from internal network only
resource "oci_core_network_security_group_security_rule" "allow_ssh" {
  network_security_group_id = oci_core_network_security_group.app_nsg.id
  direction                 = "INGRESS"
  protocol                  = "6" # TCP
  source                    = var.allowed_cidr
  source_type               = "CIDR_BLOCK"

  tcp_options {
    destination_port_range {
      min = 22
      max = 22
    }
  }
}

# Allow HTTPS from internal network
resource "oci_core_network_security_group_security_rule" "allow_https" {
  network_security_group_id = oci_core_network_security_group.app_nsg.id
  direction                 = "INGRESS"
  protocol                  = "6" # TCP
  source                    = var.allowed_cidr
  source_type               = "CIDR_BLOCK"

  tcp_options {
    destination_port_range {
      min = 443
      max = 443
    }
  }
}

# Allow all outbound traffic
resource "oci_core_network_security_group_security_rule" "allow_egress" {
  network_security_group_id = oci_core_network_security_group.app_nsg.id
  direction                 = "EGRESS"
  protocol                  = "all"
  destination               = "0.0.0.0/0"
  destination_type          = "CIDR_BLOCK"
}

# Allow ICMP for path MTU discovery
resource "oci_core_network_security_group_security_rule" "allow_icmp" {
  network_security_group_id = oci_core_network_security_group.app_nsg.id
  direction                 = "INGRESS"
  protocol                  = "1" # ICMP
  source                    = "10.0.0.0/16"
  source_type               = "CIDR_BLOCK"

  icmp_options {
    type = 3
    code = 4
  }
}
NSGs provide service-level firewalling and are preferred over subnet-wide security lists. These rules allow SSH and HTTPS only from your internal network, while permitting all outbound traffic.
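As the rule count grows, the per-port resources above can be collapsed with `for_each`. A sketch under the assumption that every port follows the same pattern (the `ingress_ports` local is hypothetical, not part of the guide's module):

```hcl
# Sketch: generate one ingress rule per entry in a hypothetical port map,
# avoiding copy-pasted rule resources.
locals {
  ingress_ports = {
    ssh   = 22
    https = 443
  }
}

resource "oci_core_network_security_group_security_rule" "ingress" {
  for_each                  = local.ingress_ports
  network_security_group_id = oci_core_network_security_group.app_nsg.id
  direction                 = "INGRESS"
  protocol                  = "6" # TCP
  source                    = var.allowed_cidr
  source_type               = "CIDR_BLOCK"

  tcp_options {
    destination_port_range {
      min = each.value
      max = each.value
    }
  }
}
```

Note that this would replace, not supplement, the individual `allow_ssh` and `allow_https` resources.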
Compute Instance (Flex Shape)
resource "oci_core_instance" "prod_instance" {
  availability_domain = local.availability_domain
  compartment_id      = var.compartment_ocid
  display_name        = "prod-app-01"
  shape               = "VM.Standard.E4.Flex"

  shape_config {
    ocpus         = 2
    memory_in_gbs = 16
  }

  create_vnic_details {
    subnet_id        = oci_core_subnet.private_subnet.id
    assign_public_ip = false
    nsg_ids          = [oci_core_network_security_group.app_nsg.id]
    hostname_label   = "prod-app-01"
  }

  source_details {
    source_type             = "image"
    source_id               = var.image_ocid
    boot_volume_size_in_gbs = 50
  }

  metadata = {
    ssh_authorized_keys = file(var.ssh_public_key)
  }

  preserve_boot_volume = true
}
Flex shapes allow independent scaling of CPU and memory, ensuring predictable performance without overpaying for unused resources. Setting preserve_boot_volume = true protects the boot volume if the instance is accidentally terminated.
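If you later adopt cloud-init (listed under "What to Add Next"), the same `metadata` map can carry a user-data payload next to the SSH key. A sketch assuming a local `cloud-init.yaml` file exists alongside the Terraform code:

```hcl
# Sketch: launch-time bootstrap via cloud-init. "cloud-init.yaml" is a
# hypothetical local file; OCI expects user_data base64-encoded.
metadata = {
  ssh_authorized_keys = file(var.ssh_public_key)
  user_data           = base64encode(file("cloud-init.yaml"))
}
```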
Persistent Block Storage
resource "oci_core_volume" "data_volume" {
  availability_domain = local.availability_domain
  compartment_id      = var.compartment_ocid
  display_name        = "prod-data-vol"
  size_in_gbs         = 200
  vpus_per_gb         = 10 # Balanced performance tier
}

resource "oci_core_volume_attachment" "data_attach" {
  attachment_type = "paravirtualized"
  instance_id     = oci_core_instance.prod_instance.id
  volume_id       = oci_core_volume.data_volume.id
  display_name    = "prod-data-attachment"
}
Separating OS and data ensures instances are disposable while data remains protected. Paravirtualized attachments are simpler than iSCSI and work automatically on Oracle Linux.
Post-Deployment: Mounting the Block Volume
After Terraform applies, SSH into the instance and mount the volume:
# Find the attached volume (usually /dev/sdb)
lsblk
# Create filesystem (first time only)
sudo mkfs.xfs /dev/sdb
# Create mount point and mount
sudo mkdir -p /data
sudo mount /dev/sdb /data
# Add to fstab for persistence across reboots
echo '/dev/sdb /data xfs defaults,_netdev,nofail 0 2' | sudo tee -a /etc/fstab
The _netdev and nofail options ensure the system boots even if the volume is temporarily unavailable.
Outputs
output "instance_private_ip" {
  description = "Private IP address of the compute instance"
  value       = oci_core_instance.prod_instance.private_ip
}

output "instance_id" {
  description = "OCID of the compute instance"
  value       = oci_core_instance.prod_instance.id
}

output "vcn_id" {
  description = "OCID of the VCN"
  value       = oci_core_vcn.prod_vcn.id
}

output "volume_id" {
  description = "OCID of the data volume"
  value       = oci_core_volume.data_volume.id
}
Security & Operational Checklist
- ✓ No public SSH access
- ✓ Key-based authentication only
- ✓ Private networking with NAT for outbound
- ✓ Explicit NSG rules (no default allow)
- ✓ Persistent storage with separate lifecycle
- ✓ Infrastructure fully defined in code
- ✓ Boot volume preservation enabled
What to Add Next
- Bastion Service – OCI’s managed bastion for secure SSH access without VPN
- Site-to-Site VPN – Connect to on-premises networks
- OCI Load Balancer – For multi-instance deployments
- Monitoring and Alerting – OCI Monitoring service with custom alarms
- Dynamic Groups and IAM policies – Instance principals for secure API access
- Cloud-init or Ansible – OS hardening and application deployment
- CI/CD pipelines – GitOps workflow for Terraform changes
- Volume backups – Scheduled backup policies for data protection
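The volume-backup item above can be sketched directly in Terraform by assigning one of the Oracle-defined backup policies to the data volume. The index into the policy list is illustrative; in practice, filter on `display_name` rather than relying on ordering:

```hcl
# Sketch: attach an Oracle-defined backup policy to the data volume.
data "oci_core_volume_backup_policies" "oracle_defined" {}

resource "oci_core_volume_backup_policy_assignment" "data_backup" {
  asset_id = oci_core_volume.data_volume.id

  # Index 0 is illustrative; the Oracle-defined tiers (bronze/silver/gold)
  # should be selected by display_name in real code.
  policy_id = data.oci_core_volume_backup_policies.oracle_defined.volume_backup_policies[0].id
}
```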
Example terraform.tfvars
tenancy_ocid = "ocid1.tenancy.oc1..aaaaaaaaexample"
user_ocid = "ocid1.user.oc1..aaaaaaaaexample"
fingerprint = "aa:bb:cc:dd:ee:ff:00:11:22:33:44:55:66:77:88:99"
private_key_path = "~/.oci/oci_api_key.pem"
region = "eu-frankfurt-1"
compartment_ocid = "ocid1.compartment.oc1..aaaaaaaaexample"
image_ocid = "ocid1.image.oc1.eu-frankfurt-1.aaaaaaaaexample"
ssh_public_key = "~/.ssh/id_rsa.pub"
allowed_cidr = "10.0.0.0/16"
Nick Tailor’s Thoughts
Production infrastructure is not about clicking faster. It is about repeatability, security, and recovery. OCI combined with Terraform provides an extremely strong foundation when engineered correctly from day one.
If you treat infrastructure as software, production becomes predictable.
The complete code from this guide is available as a ready-to-use Terraform module. Clone it, update your variables, and run terraform apply to deploy.
How to Power Up or Power Down multiple instances in OCI using CLI with Ansible
Now, the reason you would want this over Terraform is that Terraform is suited to infrastructure orchestration, not to managing instances once they are up and running.
If you have scaled servers out in OCI, powering servers up and down in bulk is not currently available out of the box. If you are doing a migration, or using a staging environment where machines are only needed during builds or troubleshooting, then having a way to power multiple machines up or down at once is convenient.
Install the OCI Ansible collection if you don’t have it already.
Linux/macOS
curl -L https://raw.githubusercontent.com/oracle/oci-ansible-collection/master/scripts/install.sh | bash -s -- --verbose
ansible-galaxy collection list   # lists the installed collections
# /path/to/ansible/collections
Collection      Version
--------------- -------
amazon.aws      1.4.0
ansible.builtin 1.3.0
ansible.posix   1.3.0
oracle.oci      2.10.0
Once it is installed, verify that the OCI CLI is working:
oci iam compartment list --all   # lists the compartments (and their OCIDs) in your tenancy
Compartments in OCI are a way to organise infrastructure and control access to those resources. This is useful if, for example, you bring in contractors and only want them to have access to certain resources, not everything.
Now there are two ways you can get your instance names.
Bash Script to get the instances names from OCI
compartment_id="ocid1.compartment.oc1..insert compartment ID here"

# Explicitly define the availability domains based on your provided data
availability_domains=("zcLB:US-CHICAGO-1-AD-1" "zcLB:US-CHICAGO-1-AD-2" "zcLB:US-CHICAGO-1-AD-3")

# For each availability domain, list the instances
for ad in "${availability_domains[@]}"; do
  # List instances within the specific AD and compartment, extracting the "id" field
  oci compute instance list --compartment-id "$compartment_id" --availability-domain "$ad" --query "data[].id" --raw-output > instance_ids.txt

  # Clean up the instance IDs (removing brackets, quotes, etc.)
  sed -i 's/\[//g; s/\]//g; s/"//g; s/,//g' instance_ids.txt

  # Read each instance ID from instance_ids.txt
  while read -r instance_id; do
    # Get instance VNIC information
    instance_info=$(oci compute instance list-vnics --instance-id "$instance_id")

    # Extract the required fields and print them
    display_name=$(echo "$instance_info" | jq -r '.data[0]."display-name"')
    public_ip=$(echo "$instance_info" | jq -r '.data[0]."public-ip"')
    private_ip=$(echo "$instance_info" | jq -r '.data[0]."private-ip"')

    echo "Availability Domain: $ad"
    echo "Display Name: $display_name"
    echo "Public IP: $public_ip"
    echo "Private IP: $private_ip"
    echo "-----------------------------------------"
  done < instance_ids.txt
done
The output of the script, when piped into a file (e.g. instance.names), will look like this:
Availability Domain: zcLB:US-CHICAGO-1-AD-1
Display Name: Instance1
Public IP: 192.0.2.1
Private IP: 10.0.0.1
-----------------------------------------
Availability Domain: zcLB:US-CHICAGO-1-AD-1
Display Name: Instance2
Public IP: 192.0.2.2
Private IP: 10.0.0.2
-----------------------------------------
…
You can now grep this file for the names of the servers you want to power on or off quickly.
Next is an Ansible playbook that powers instances on or off by name, using the instance details returned by the OCI CLI.
Ansible playbook to power on or off multiple instances via OCI CLI
---
- name: Control OCI Instance Power State based on Instance Names
  hosts: localhost
  vars:
    instance_names_to_stop:
      - Instance2
      # Add more instance names here if you wish to stop them...
    instance_names_to_start:
      # List the instance names you wish to start here...
      # Example:
      - Instance1
  tasks:
    - name: Fetch all instance details in the compartment
      command:
        cmd: "oci compute instance list --compartment-id ocid1.compartment.oc1..aaaaaaaak7jc7tn2su2oqzmrbujpr5wmnuucj4mwj4o4g7rqlzemy4yvxrza --output json"
      register: oci_output

    - name: Parse the OCI CLI output
      set_fact:
        instances: "{{ oci_output.stdout | from_json }}"

    - name: Extract relevant information
      set_fact:
        clean_instances: "{{ clean_instances | default([]) + [{'name': item['display-name'], 'id': item.id, 'state': item['lifecycle-state']}] }}"
      loop: "{{ instances.data }}"
      when: "'display-name' in item and 'id' in item and 'lifecycle-state' in item"

    - name: Filter out instances to stop
      set_fact:
        instances_to_stop: "{{ clean_instances | selectattr('name', 'in', instance_names_to_stop) | selectattr('state', 'equalto', 'RUNNING') | list }}"

    - name: Filter out instances to start
      set_fact:
        instances_to_start: "{{ clean_instances | selectattr('name', 'in', instance_names_to_start) | selectattr('state', 'equalto', 'STOPPED') | list }}"

    - name: Display instances to stop (you can remove this debug task later)
      debug:
        var: instances_to_stop

    - name: Display instances to start (you can remove this debug task later)
      debug:
        var: instances_to_start

    - name: Power off instances
      command:
        cmd: "oci compute instance action --action STOP --instance-id {{ item.id }}"
      loop: "{{ instances_to_stop }}"
      when: instances_to_stop | length > 0
      register: state
    # - debug:
    #     var: state

    - name: Power on instances
      command:
        cmd: "oci compute instance action --action START --instance-id {{ item.id }}"
      loop: "{{ instances_to_start }}"
      when: instances_to_start | length > 0
The output will look like this:
PLAY [Control OCI Instance Power State based on Instance Names] **********************************************************************************
TASK [Gathering Facts] ***************************************************************************************************************************
ok: [localhost]
TASK [Fetch all instance details in the compartment] *********************************************************************************************
changed: [localhost]
TASK [Parse the OCI CLI output] ******************************************************************************************************************
ok: [localhost]
TASK [Extract relevant information] **************************************************************************************************************
ok: [localhost] => (item={'display-name': 'Instance1', 'id': 'ocid1.instance.oc1..exampleuniqueID1', 'lifecycle-state': 'STOPPED'})
ok: [localhost] => (item={'display-name': 'Instance2', 'id': 'ocid1.instance.oc1..exampleuniqueID2', 'lifecycle-state': 'RUNNING'})
TASK [Filter out instances to stop] **************************************************************************************************************
ok: [localhost]
TASK [Filter out instances to start] *************************************************************************************************************
ok: [localhost]
TASK [Display instances to stop (you can remove this debug task later)] **************************************************************************
ok: [localhost] => {
    "instances_to_stop": [
        {
            "name": "Instance2",
            "id": "ocid1.instance.oc1..exampleuniqueID2",
            "state": "RUNNING"
        }
    ]
}
TASK [Display instances to start (you can remove this debug task later)] *************************************************************************
ok: [localhost] => {
    "instances_to_start": [
        {
            "name": "Instance1",
            "id": "ocid1.instance.oc1..exampleuniqueID1",
            "state": "STOPPED"
        }
    ]
}
TASK [Power off instances] ***********************************************************************************************************************
changed: [localhost] => (item={'name': 'Instance2', 'id': 'ocid1.instance.oc1..exampleuniqueID2', 'state': 'RUNNING'})
TASK [Power on instances] ************************************************************************************************************************
changed: [localhost] => (item={'name': 'Instance1', 'id': 'ocid1.instance.oc1..exampleuniqueID1', 'state': 'STOPPED'})
PLAY RECAP ****************************************************************************************************************************************
localhost : ok=9 changed=3 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
