[CloudStarter] Automating VM Management with Ansible and GitHub Runners

September 16, 2024

In our previous article, we explored how to provision a VM on Oracle Cloud Infrastructure (OCI) using Terraform. Today, we're taking a step further in automation by integrating Ansible and GitHub Runners to manage our VM. This powerful combination allows us to automate not just the creation of our infrastructure, but also its configuration and ongoing management.

What are Ansible and GitHub Runners?

Ansible is an open-source automation tool that can configure systems, deploy software, and orchestrate more advanced IT tasks such as continuous deployments or zero downtime rolling updates. It uses a simple, human-readable language called YAML to describe system configurations and automation jobs.
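To give a sense of that syntax, here is a minimal, hypothetical playbook (the host group and package are purely illustrative, not part of this project) that ensures a single package is installed:

---
- name: Ensure nginx is present
  hosts: webservers        # an inventory group you would define yourself
  become: yes
  tasks:
    - name: Install nginx
      apt:
        name: nginx
        state: present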

GitHub Runners, on the other hand, are part of GitHub Actions - GitHub's integrated CI/CD solution. Runners are the machines that execute the jobs in your GitHub Actions workflows. They can be hosted by GitHub or self-hosted, allowing you to run workflows on your own infrastructure.
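Switching between the two is a one-line change in the workflow. A minimal sketch (the job and step are illustrative):

jobs:
  example:
    runs-on: ubuntu-latest    # a runner hosted by GitHub
    # runs-on: self-hosted    # or a machine you registered yourself
    steps:
      - run: echo "Hello from the runner"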

Why Use Ansible and GitHub Runners?

The combination of Ansible and GitHub Runners offers several compelling benefits:

  1. Automation: With Ansible playbooks executed via GitHub Actions, you can automate complex multi-step processes, from VM provisioning to software installation and configuration.

  2. Consistency: Ansible ensures that your VM configurations are consistent across environments, reducing the "it works on my machine" problem.

  3. Integration: GitHub Runners integrate seamlessly with your existing GitHub workflows, allowing you to trigger VM management tasks based on code pushes, pull requests, or other GitHub events (a trigger sketch follows this list).

  4. Flexibility: While our example uses GCP, you can adapt this approach to work with any cloud provider or even on-premises infrastructure.
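To illustrate point 3, the on: block of a workflow can react to a range of GitHub events. The sketch below shows a few common triggers; the workflow later in this article uses only push and workflow_dispatch:

on:
  push:
    branches: [main]        # every push to main
  pull_request:             # every pull request against the repository
  workflow_dispatch:        # manual runs from the Actions tab
  schedule:
    - cron: '0 6 * * *'     # optionally, a scheduled run every day at 06:00 UTC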

Implementing VM Management with Ansible and GitHub Runners

Let's break down the key components of our VM management solution:

1. GitHub Actions Workflow

We've created a deploy.yml file in our .github/workflows directory. This workflow is triggered on pushes to the main branch or manually via the workflow_dispatch event. It performs the following steps:

  • Checks out the code
  • Authenticates with Google Cloud
  • Checks the VM status
  • Creates the VM using Terraform if it doesn't exist
  • Configures the VM using Ansible

It is important to note that this workflow relies on two GitHub Secrets. You can add secrets by navigating to your repository, clicking on Settings, then Secrets and variables, and finally Actions. There I added GCP_SA_KEY and VM_SSH_PRIVATE_KEY.

  • The GCP_SA_KEY comes from the Google Cloud Console. Select your project, open IAM & Admin and then Service Accounts, pick your service account, and create a JSON key under the Keys tab. Paste the contents of the downloaded JSON file as a GitHub Secret.
  • The VM_SSH_PRIVATE_KEY is the private half of the SSH key pair for the VM. You already added its public key to main.tf as metadata when creating the VM; now copy the matching private key and add it as a GitHub Secret.
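If you prefer the command line, the same two secrets can be created with gcloud and the GitHub CLI. The service account address and key file names below are placeholders; substitute your own:

# Create a JSON key for your service account
gcloud iam service-accounts keys create sa-key.json \
  --iam-account=my-service-account@long-classifier-435414-r1.iam.gserviceaccount.com

# Store both values as repository secrets
gh secret set GCP_SA_KEY < sa-key.json
gh secret set VM_SSH_PRIVATE_KEY < ~/.ssh/gcp_vm_key

With both secrets in place, here is the complete workflow: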
name: GCP VM Management

on:
  push:
    branches: [main]
  workflow_dispatch:

env:
  PROJECT_ID: long-classifier-435414-r1
  VM_NAME: gcp-free
  ZONE: us-central1-f
  SSH_USER: paul

jobs:
  manage-vm:
    runs-on: ubuntu-latest
    permissions:
      contents: 'read'
      id-token: 'write'

    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - name: Google Auth
        id: auth
        uses: 'google-github-actions/auth@v1'
        with:
          credentials_json: '${{ secrets.GCP_SA_KEY }}'

      - name: Set up Cloud SDK
        uses: google-github-actions/setup-gcloud@v1
        with:
          project_id: ${{ env.PROJECT_ID }}

      - name: Check VM status
        id: vm_status
        run: |
          VM_STATUS=$(gcloud compute instances describe ${{ env.VM_NAME }} \
            --zone ${{ env.ZONE }} \
            --format="value(status)" 2>/dev/null || echo "NOT_FOUND")
          echo "status=$VM_STATUS" >> $GITHUB_OUTPUT

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2

      - name: Create VM if not exists
        if: steps.vm_status.outputs.status == 'NOT_FOUND'
        run: |
          cd terraform/gcp
          terraform init
          terraform apply -auto-approve

      - name: Get VM IP
        id: vm_ip
        run: |
          VM_IP=$(gcloud compute instances describe ${{ env.VM_NAME }} \
            --zone ${{ env.ZONE }} \
            --format='get(networkInterfaces[0].accessConfigs[0].natIP)')
          echo "ip=$VM_IP" >> $GITHUB_OUTPUT

      - name: Set up SSH key
        env:
          VM_SSH_PRIVATE_KEY: ${{ secrets.VM_SSH_PRIVATE_KEY }}
        run: |
          mkdir -p ~/.ssh
          echo "$VM_SSH_PRIVATE_KEY" > ~/.ssh/vm_ssh_key
          chmod 600 ~/.ssh/vm_ssh_key
          ssh-keygen -y -f ~/.ssh/vm_ssh_key > ~/.ssh/vm_ssh_key.pub

      - name: Wait for VM to be ready
        env:
          VM_IP: ${{ steps.vm_ip.outputs.ip }}
        run: |
          timeout 300 bash -c 'until nc -z ${{ env.VM_IP }} 22; do echo "Waiting for SSH..."; sleep 5; done'

      - name: Add VM host key to known hosts
        env:
          VM_IP: ${{ steps.vm_ip.outputs.ip }}
        run: |
          mkdir -p ~/.ssh
          ssh-keyscan -H ${{ env.VM_IP }} >> ~/.ssh/known_hosts

      - name: Configure VM
        env:
          VM_IP: ${{ steps.vm_ip.outputs.ip }}
        run: |
          ansible-playbook -i "${VM_IP}," \
            -u ${{ env.SSH_USER }} \
            --private-key ~/.ssh/vm_ssh_key \
            ansible/playbooks/install_docker.yml
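Besides pushing to main, you can start the workflow manually, either from the Actions tab or with the GitHub CLI (assuming the file is named deploy.yml as above):

gh workflow run deploy.yml
gh run watch    # pick the run and follow it until it completes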

2. Ansible Playbook

Our install_docker.yml Ansible playbook handles the configuration of the VM. It performs these tasks:

  • Updates the apt cache
  • Installs required system packages
  • Adds the Docker GPG key and repository
  • Installs Docker
  • Starts the Docker service
  • Adds the user to the Docker group
  • Checks and displays the Docker version
---
- name: Install Docker
  hosts: all
  become: yes
  vars:
    docker_packages:
      - docker-ce
      - docker-ce-cli
      - containerd.io
      - docker-buildx-plugin
      - docker-compose-plugin

  tasks:
    - name: Update apt cache (Debian/Ubuntu)
      apt:
        update_cache: yes
        cache_valid_time: 3600
      when: ansible_os_family == "Debian"

    - name: Install required system packages (Debian/Ubuntu)
      apt:
        name:
          - apt-transport-https
          - ca-certificates
          - curl
          - gnupg-agent
          - software-properties-common
        state: latest
      when: ansible_os_family == "Debian"

    - name: Add Docker GPG apt key (Debian/Ubuntu)
      apt_key:
        url: https://download.docker.com/linux/ubuntu/gpg
        state: present
      when: ansible_os_family == "Debian"

    - name: Add Docker repository (Debian/Ubuntu)
      apt_repository:
        repo: deb [arch=amd64] https://download.docker.com/linux/{{ ansible_distribution | lower }} {{ ansible_distribution_release }} stable
        state: present
      when: ansible_os_family == "Debian"

    - name: Install Docker
      package:
        name: "{{ docker_packages }}"
        state: latest

    - name: Start Docker service
      service:
        name: docker
        state: started
        enabled: yes

    - name: Ensure group "docker" exists
      group:
        name: docker
        state: present

    - name: Add user to docker group
      user:
        name: "{{ ansible_user }}"
        groups: docker
        append: yes

    - name: Reset connection to apply user changes
      meta: reset_connection

    - name: Check Docker version
      command: docker --version
      register: docker_version
      changed_when: false

    - name: Display Docker version
      debug:
        var: docker_version.stdout
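Once the workflow has finished, a quick way to verify the result is to SSH into the VM with the same key pair and start a throwaway container (replace <VM_IP> with the address printed by the Get VM IP step, and the key path with your local copy of the private key):

ssh -i ~/.ssh/vm_ssh_key paul@<VM_IP> 'docker run --rm hello-world'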

Putting It All Together

With this setup, every time we push to the main branch or manually trigger the workflow, GitHub Actions will ensure our VM is created (if it doesn't exist) and properly configured with Docker. This approach gives us several advantages:

  1. Reproducibility: Our entire VM setup is codified, making it easy to recreate the same environment consistently.

  2. Auditability: All changes to our infrastructure are tracked in Git, providing a clear audit trail.

  3. Ease of Updates: Need to update Docker or install a new package? Just modify the Ansible playbook and push the changes (a sketch of such an addition follows this list).

  4. Reduced Manual Intervention: Once set up, the entire process runs automatically, reducing the need for manual intervention and the potential for human error.
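As an example of point 3, adding another tool to the VM is just one more task at the end of install_docker.yml; a hypothetical addition such as htop would look like this:

    - name: Install htop
      apt:
        name: htop
        state: present
      when: ansible_os_family == "Debian"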

Conclusion

By combining Terraform for provisioning, Ansible for the initial Docker setup, and GitHub Runners for automation, we've created an easy, one-click VM management solution. This approach shows how a handful of tools can be wired together so you can start reaping their combined benefits.