Building Amazon Linux 2 VMs in vSphere

Overview

Reece utilises many different virtualised Linux-based operating systems across on-premises and cloud environments. These include nodes for Kubernetes clusters as well as virtual machines used for other purposes such as application servers.

For certain use cases we utilise Red Hat Enterprise Linux and clones such as CentOS and Rocky Linux. For Kubernetes master and worker nodes, however, which essentially only need to be able to run containers, we have switched to Amazon Linux 2 as the base operating system, even in our on-premises vSphere environment. This also aligns on-premises and cloud builds and post-build steps where possible.

Builds are performed via a GitHub Actions pipeline and Ansible interfacing with vSphere.

Preparation

Download the latest Amazon Linux 2 VMware OVA (at the time of writing https://cdn.amazonlinux.com/os-images/2.0.20220207.1/vmware/) from the Amazon Linux website. A new OVA version is usually published monthly.
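
The download can also be automated with an Ansible task along these lines; note that the exact OVA file name below is an assumption, so check the directory listing for the current release:

- name: Download Amazon Linux 2 OVA
  get_url:
    # file name is an assumption; verify against the cdn.amazonlinux.com listing
    url: "https://cdn.amazonlinux.com/os-images/2.0.20220207.1/vmware/amzn2-vmware_esx-2.0.20220207.1-x86_64.xfs.gpt.ova"
    dest: "{{ ova_file_path }}"
  delegate_to: localhost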

The next step is to create the seed ISO image, which contains the network configuration and other settings for the new VM. The files below are Jinja2-templated YAML rendered by an Ansible playbook.

meta-data file

The meta-data file mainly contains static networking configuration for the new VM.

local-hostname: {{ vm_hostname }}
network-interfaces: |
  auto eth0
  iface eth0 inet static
  address {{ vm_host_ip }}
  network {{ vm_host_network }}
  netmask {{ vm_host_netmask }}
  broadcast {{ vm_host_broadcast }}
  gateway {{ vm_host_gateway }}  
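
Rendered with example values (host name and addresses below are hypothetical), the resulting meta-data looks like this:

local-hostname: k8s-worker-01
network-interfaces: |
  auto eth0
  iface eth0 inet static
  address 10.20.30.40
  network 10.20.30.0
  netmask 255.255.255.0
  broadcast 10.20.30.255
  gateway 10.20.30.1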

user-data file

The user-data file has to start with the #cloud-config tag (this is not too obvious from the documentation) and contains post-install tasks for the new VM. We update DNS servers, yum repositories, passwords and so on. We also point yum at a repository snapshot hosted in our S3 bucket instead of the default repository to ensure version consistency between builds. It is important to disable yum updates / upgrades when using a custom yum repository bucket, as package version conflicts may otherwise arise.

#cloud-config
#vim:syntax=yaml
repo_update: false
repo_upgrade: false

users:
  - default

chpasswd:
  list: |
    ec2-user:{{ secret_user_ssh_pass }}
    root:{{ secret_root_ssh_pass }}    

write_files:
  - path: /etc/cloud/cloud.cfg.d/80_disable_network_after_firstboot.cfg
    content: |
      network:
        config: disabled      

  - path: /etc/yum/vars/awsregion
    content: |
      ap-southeast-2

  - path: /etc/cloud/cloud.cfg.d/01_enable_password_ssh.cfg
    content: |
      ssh_pwauth: true

  - path: /etc/yum.repos.d/reece.repo
    content: |
      [reece]
      name = Reece
      enabled = 1
      baseurl = https://{{ al2_repo_bucket_name }}.s3-ap-southeast-2.amazonaws.com
      gpgcheck = 0
      repo_gpgcheck = 0      

  - path: /etc/yum/pluginconf.d/priorities.conf
    content: |
      [main]
      enabled = 0      

  - path: /etc/resolv.conf
    content: |
      search {{ domains|join(' ') }}
      {% for name_server in name_servers -%}
      nameserver {{ name_server }}
      {% endfor %}      

  - path: /etc/ssh/sshd_config
    content: |
      HostKey /etc/ssh/ssh_host_rsa_key
      HostKey /etc/ssh/ssh_host_ecdsa_key
      HostKey /etc/ssh/ssh_host_ed25519_key
      SyslogFacility AUTHPRIV
      AuthorizedKeysFile .ssh/authorized_keys
      PasswordAuthentication yes
      ChallengeResponseAuthentication no
      GSSAPIAuthentication yes
      GSSAPICleanupCredentials no
      UsePAM yes
      AcceptEnv LANG LC_CTYPE LC_NUMERIC LC_TIME LC_COLLATE LC_MONETARY LC_MESSAGES
      AcceptEnv LC_PAPER LC_NAME LC_ADDRESS LC_TELEPHONE LC_MEASUREMENT
      AcceptEnv LC_IDENTIFICATION LC_ALL LANGUAGE
      AcceptEnv XMODIFIERS
      Subsystem sftp /usr/libexec/openssh/sftp-server      

runcmd:
  - rm -rf /etc/yum.repos.d/amzn2-core.repo /etc/yum.repos.d/amzn2-extras.repo
  - service sshd restart
  - yum -y install containerd.io
  - yum clean all
  - systemctl disable amazon-ssm-agent
  - yum -y update && reboot

As you can see, the SSM agent is disabled since it is not required for on-premises builds. The other post-install tasks are patching, installing containerd for Kubernetes, and restarting sshd after updating its configuration.
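
The rendered user-data can also be sanity-checked before a build, for example with cloud-init's schema validator (assuming cloud-init is installed on the control host; newer releases use cloud-init schema instead of cloud-init devel schema):

cloud-init devel schema --config-file /tmp/seed/user-data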

The following snippet shows the Ansible tasks required to create the seed ISO file, upload it to vSphere, and create a new VM from the OVA.

- name: Create seed iso image dir
  file: dest=/tmp/seed state=directory
  tags: install_ova
  delegate_to: localhost

- name: Meta data file
  template: src=meta-data.j2 dest=/tmp/seed/meta-data
  tags: install_ova
  delegate_to: localhost

- name: User data file
  template: src=user-data.j2 dest=/tmp/seed/user-data
  tags: install_ova
  delegate_to: localhost

- name: Create seed iso image
  shell: genisoimage -output seed.iso -volid cidata -joliet -rock user-data meta-data
  args:
    chdir: /tmp/seed
  tags: install_ova
  delegate_to: localhost
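
genisoimage has to be available on the Ansible control host. On RHEL-compatible control hosts it can be installed with:

yum -y install genisoimage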

- name: Upload seed iso image to datastore
  vsphere_copy:
    hostname: "{{ vmware_host }}"
    username: "{{ vsphere_datastore_username }}"
    password: "{{ vsphere_datastore_password }}"
    src: "/tmp/seed/seed.iso"
    datastore: "{{ vm_iso_datastore }}"
    path: "Temp/temp-{{ vm_name }}.iso"
    datacenter: "{{ vmware_datacenter }}"
    validate_certs: "no"
  tags: install_ova
  retries: 3
  delay: 10
  delegate_to: localhost
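
One caveat with the upload task: on older ansible-core releases, retries only takes effect in combination with an until condition (ansible-core 2.16 made until optional). To be safe on older versions, register the result and add an until to the same task, e.g.:

  register: iso_upload
  retries: 3
  delay: 10
  until: iso_upload is succeeded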

- name: Deleting local ISO image and seed files
  file:
    path: /tmp/seed
    state: absent
  tags: install_ova
  delegate_to: localhost

- name: Create New VM from OVA
  vmware_deploy_ovf:
    hostname: "{{ vmware_host }}"
    username: "{{ vsphere_admin_username }}"
    password: "{{ vsphere_admin_password }}"
    validate_certs: no
    name: "{{ vm_name }}"
    folder: "{{ vmware_workdir }}"
    datacenter: "{{ vmware_datacenter }}"
    datastore: "{{ vm_datastore }}"
    cluster: "{{ vm_cluster_name }}"
    disk_provisioning: thin
    ovf: "{{ ova_file_path }}"
    allow_duplicates: no
    power_on: no
    fail_on_spec_warnings: yes
    wait_for_ip_address: yes
    networks:
      # 'bridged' is the network name defined inside the OVA;
      # the value is the vSphere port group to attach the VM to
      bridged: "{{ network_adapter }}"
  delegate_to: localhost
  tags: install_ova

After this step, the VM will still be powered off. Before finally powering it on, we can perform additional changes such as mounting the seed ISO image, updating CPU and memory, and increasing the disk size. First, however, we need to find the UUID of the new VM based on its name.

- name: Get UUID from new VM
  block:
    - name: Get virtual machine info
      vmware_vm_info:
        hostname: "{{ vmware_host }}"
        username: "{{ vsphere_admin_username }}"
        password: "{{ vsphere_admin_password }}"
        folder: "{{ vmware_workdir }}"
        validate_certs: no
      delegate_to: localhost
      register: vm_info

    - set_fact:
        vm_uuid: "{{ item.uuid }}"
      with_items:
        - "{{ vm_info.virtual_machines | json_query(query) }}"
      vars:
        query: "[?guest_name=='{{ vm_name }}']"
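
Note that the json_query filter requires the jmespath Python library on the Ansible control host:

pip install jmespath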

- name: Attach CDROM
  vmware_guest:
    hostname: "{{ vmware_host }}"
    username: "{{ vsphere_admin_username }}"
    password: "{{ vsphere_admin_password }}"
    validate_certs: no
    name: "{{ vm_name }}"
    folder: "{{ vmware_workdir }}"
    datacenter: "{{ vmware_datacenter }}"
    cluster: "{{ vm_cluster_name }}"
    uuid: "{{ vm_uuid }}"
    cdrom:
      type: "iso"
      iso_path: "[{{ vm_iso_datastore }}] Temp/temp-{{ vm_name }}.iso"
  delegate_to: localhost
  tags: install_ova

- name: Add more CPU and memory
  vmware_guest:
    hostname: "{{ vmware_host }}"
    username: "{{ vsphere_admin_username }}"
    password: "{{ vsphere_admin_password }}"
    validate_certs: no
    name: "{{ vm_name }}"
    folder: "{{ vmware_workdir }}"
    datacenter: "{{ vmware_datacenter }}"
    cluster: "{{ vm_cluster_name }}"
    uuid: "{{ vm_uuid }}"
    hardware:
      num_cpus: "{{ vm_cpus }}"
      memory_mb: "{{ vm_memory }}"
  delegate_to: localhost
  tags: install_ova

- name: Add additional disk
  vmware_guest:
    hostname: "{{ vmware_host }}"
    username: "{{ vsphere_admin_username }}"
    password: "{{ vsphere_admin_password }}"
    validate_certs: no
    name: "{{ vm_name }}"
    folder: "{{ vmware_workdir }}"
    datacenter: "{{ vmware_datacenter }}"
    cluster: "{{ vm_cluster_name }}"
    uuid: "{{ vm_uuid }}"
    disk:
      - size_gb: "{{ vm_disk_size }}"
        datastore: "{{ vm_datastore }}"
  delegate_to: localhost
  tags: install_ova
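
A caveat with the task above: vmware_guest matches entries in the disk list to existing disks by position, so a single entry addresses the first (OS) disk and grows it to vm_disk_size rather than attaching a second disk. If a genuinely new data disk is wanted, the existing disk has to be listed first and the new one appended, along these lines (vm_data_disk_size is a hypothetical variable):

    disk:
      - size_gb: "{{ vm_disk_size }}"       # existing OS disk (disk 1)
        datastore: "{{ vm_datastore }}"
      - size_gb: "{{ vm_data_disk_size }}"  # new second disk
        datastore: "{{ vm_datastore }}"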

- name: Power on VM
  vmware_guest:
    hostname: "{{ vmware_host }}"
    username: "{{ vsphere_admin_username }}"
    password: "{{ vsphere_admin_password }}"
    validate_certs: no
    name: "{{ vm_name }}"
    folder: "{{ vmware_workdir }}"
    datacenter: "{{ vmware_datacenter }}"
    cluster: "{{ vm_cluster_name }}"
    uuid: "{{ vm_uuid }}"
    state: "poweredon"
  delegate_to: localhost
  tags: install_ova

This starts the new Amazon Linux 2 vSphere VM, and we should now be able to SSH to the new server.
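
For example, using the ec2-user password set via chpasswd and the sample address from the rendered meta-data shown earlier:

ssh ec2-user@10.20.30.40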

Pitfalls

As Amazon Linux 2 cannot auto-detect the AWS region when running outside of EC2, we needed to add the region (used for updates etc.) to /etc/yum/vars/awsregion.
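
For context, the stock Amazon Linux repository definitions resolve their mirror URLs via yum variables; the mirrorlist in amzn2-core.repo looks roughly like this (reproduced from memory, illustrative only):

mirrorlist = $amazonlinux.$awsregion.$awsdomain/$releasever/core/mirrors/$basearch/mirror.list

Without a value in /etc/yum/vars/awsregion, yum cannot construct this URL outside of EC2.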

Conclusion

Besides aligning on-premises and cloud builds, advantages of this process include reduced build times (compared to the previous Red Hat / CentOS / Rocky Linux builds) as well as a smaller ISO and image footprint.