OpenStack Grizzly Release: High-Availability Automated Deployment Guide


Background

There are two common ways of installing OpenStack: manually, or by using automation tools such as the Cisco OpenStack Installer (Cisco OSI). This guide provides step-by-step instructions for using Cisco OSI to automate the installation, configuration and deployment of a highly available OpenStack Grizzly environment. The deployment covers the core OpenStack software components described in the Server Requirements section: Nova, Quantum, Glance, Keystone, Cinder, Horizon (the OpenStack Dashboard) and Swift.

In addition to highly-available OpenStack services, the system will also enable basic monitoring functionality based on Nagios, Collectd, and Graphite.

In Cisco OSI, an initial build server outside of the OpenStack cluster is used to manage and automate the OpenStack software deployment. This build server primarily functions as a Puppet Master for software deployment and configuration management of the OpenStack cluster, as well as a Cobbler server for managing the bare-metal installation of the Ubuntu Operating System. Once the build server is installed and configured, it is used as an out-of-band automation and management workstation to manage the OpenStack nodes. It also functions as a monitoring server to collect statistics about the health and performance of the OpenStack deployment, as well as to monitor the availability of the machines and services.

You can find bugs, release milestones, and other information on Launchpad. You can find release notes for the most recent release here.

High-Availability Introduction

Most OpenStack deployments are maturing from evaluation-level environments to highly available and highly scalable environments to support production applications and services. The architecture consists of the following components used to provide high-availability to OpenStack services:

  • MySQL Galera provides synchronous multi-master clustering to the MySQL/InnoDB databases.
  • RabbitMQ Clustering and RabbitMQ Mirrored Queues provide active/active and highly scalable message queuing for OpenStack services.
  • HAProxy and Keepalived provide load-balancing between clients and OpenStack API Endpoints.
  • The Multiple Quantum L3 and DHCP Agents Blueprint allows multiple Quantum Layer-3 and DHCP Agents to be deployed for high-availability and scalability purposes. At this time (Grizzly release), multiple DHCP Agents can service a Quantum network; however, only a single L3 Agent can service a given Quantum network at a time. Therefore, the L3 Agent is a single point of failure and is not included in the Cisco High-Availability Deployment Guide. Quantum Provider Network Extensions are used to map physical data center networks to Quantum networks. In this deployment model, Quantum relies on the physical data center to provide Layer-3 high-availability instead of the L3 Agent.
  • Glance uses Swift as the back-end to store OpenStack images. Just as with the rest of the OpenStack APIs, HAProxy and Keepalived provide high-availability to the Glance API and Registry endpoints.
  • Swift: Multiple Swift Proxy nodes are used to provide high-availability to the Swift proxy service. Replication provides high-availability to data stored within a Swift object-storage system. The replication processes compare local data with each remote copy to ensure they all contain the latest version. Object replication uses a hash list to quickly compare subsections of each partition, and container and account replication use a combination of hashes and shared high water marks.

For more details about the high-availability architecture, please refer to the manual deployment guide.

Dependencies

Critical Reminders

The most common OpenStack HA deployment issues are incorrect site.pp settings and deploying the nodes out of order. To save yourself troubleshooting later, ENSURE that you deploy the nodes in the order described in this document and verify the accuracy of your site.pp file. You will likely be using your own IP addressing, passwords and node names in your setup, so it is critical to ensure any variations from this guide are fully understood.

Do not configure RAID on the hard disks of Swift Storage Nodes. Swift performs better without RAID and disk redundancy is unneeded since Swift protects the data through replication. Therefore, if a RAID Controller manages the hard disks, ensure you present each of the hard disks independently. Our example uses disk /dev/sda for the Operating System installation and disks /dev/sdb-/dev/sdf for Swift storage. Please remember to modify these definitions within the site.pp example file based on your specific deployment environment. Additional Swift considerations and tuning information can be found here.

The passwords used in our example site.pp file are insecure, so it's highly recommended to change them.

Operating System

The operating system used for this installation is Ubuntu 12.04 LTS (Precise).

Server Requirements

Our deployment uses 12 Cisco UCS C-series servers to serve the roles of Controller, Compute, Load-Balancer and Swift Proxy/Storage. The environment scales linearly, therefore individual nodes can be added to increase capacity for any particular OpenStack service. The five distinct node types used in this document are:

  • 3 Controller Nodes- Runs Nova API, Nova Conductor, Nova Consoleauth, Nova Novncproxy, Nova Scheduler, NoVNC, Quantum Server, Quantum Plugin OVS, Quantum DHCP Agent, Glance API/Registry, Keystone, Cinder API, Cinder Scheduler, OpenStack Dashboard, RabbitMQ Server, and MySQL Server (WSREP and Galera).
    • Provides management functionality of the OpenStack environment.
  • 2 Compute Nodes- Runs Nova Compute, Quantum OVS Agent and Cinder Volume services.
    • Provides the hypervisor role for running Nova instances (Virtual Machines) and presents LVM volumes for Cinder block storage.
  • 2 Load-Balancer Nodes- Runs HAProxy and Keepalived to load-balance traffic across Controller and Swift Proxy clusters.
  • 2 Swift Proxy Nodes- The Proxy Node is responsible for tying together users and their data within the Swift object storage system. For each request, it looks up the location of the account, container or object in the Swift ring and routes the request accordingly. The public API is also exposed by the Proxy Node.
  • 3 Swift Storage Nodes- Each Storage Node contains Swift object, container, and account services. At a very high level, these are the servers that contain the user data and perform replication among one another to keep the system in a consistent state.

Networking Requirements

The OpenStack HA environment uses five separate networks: one management network, three tenant networks and one storage network. Three tenant networks are used as an example; the number of tenant networks can be increased or decreased based on your deployment needs. Connectivity within Tenants uses Quantum with the Open vSwitch (OVS) plugin and Provider Network Extensions. Provider Network Extensions allow cloud administrators to create OpenStack networks that map directly to physical networks in the data center and support local, VLAN and GRE deployment models. Our example uses the Provider VLAN networking model (an illustrative example of creating such a network appears after the list below). The network details are as follows:

  • 1 Management Network
    • This network is used to perform management functions against the nodes, for example, SSH'ing to the nodes to change a configuration setting. The network is also used for lights-out management using the CIMC interface of the UCS servers. Lastly, the OpenStack APIs and the Horizon web dashboard are associated with this network.
    • An IP address for each node is required for this network. If using lights-out management such as CIMC, each node will require 2 addresses from this network.
    • This network typically employs private (RFC1918) IP addressing.
  • 3 Tenant Networks
    • These networks are used to provide connectivity to Instances. Since Quantum Provider Networking Extensions are being used, it is common to give tenants direct access to a "public" network that can be used to reach the Internet.
    • Compute and Controller Nodes will have an interface attached to this network. Since the Compute/Controller Node interfaces that attach to this network are managed by OVS, the interface should not contain an IP address.
    • This network typically employs publicly routable IP addressing if external NAT'ing is not used upstream towards the Internet edge (Note: in this document all IP addressing for all interfaces comes out of various private addressing blocks).
  • 1 Storage Network
    • This network is used for providing separate connectivity between Swift Proxy and Storage Nodes. This ensures storage traffic is not interfering with Instance traffic.
    • This network typically employs private (RFC1918) IP addressing.
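
As an illustration of how Provider Network Extensions map a tenant network onto a data center VLAN, a minimal sketch of the Quantum CLI calls is shown below. The network name, physical network label (physnet1), VLAN ID (221) and subnet range are assumptions for this example only and must be replaced with values that match your switch trunks and OVS bridge mappings:

# Hypothetical example: create a provider VLAN network and a subnet on it
quantum net-create public221 --shared \
  --provider:network_type vlan \
  --provider:physical_network physnet1 \
  --provider:segmentation_id 221
quantum subnet-create public221 192.168.221.0/24 --name public221-subnet \
  --allocation-pool start=192.168.221.10,end=192.168.221.250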


Figure 1 is used to help visualize the network deployment and to act as a reference for configuration steps within the document. It is highly recommended to print the diagram so it can easily be referenced throughout the installation process.

Figure 1: OpenStack HA Network Design Details

Grizzly automated ha network design details v1 0.png
  • Physical Network Switches: Each node in the reference deployment is physically connected to a Cisco Nexus switch acting as a Top-of-Rack access layer device. Trunking is configured on each interface connecting to the eth0 and eth1 NICs of each node.
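
As a point of reference, a trunk configuration on a node-facing Nexus interface might look like the following sketch; the interface number and VLAN range are placeholders and must match the management, tenant and storage VLANs used in your environment:

! Hypothetical NX-OS configuration for a port connected to a node's eth0/eth1
interface Ethernet1/1
  description OpenStack node uplink
  switchport
  switchport mode trunk
  switchport trunk allowed vlan 220-224
  spanning-tree port type edge trunk
  no shutdown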

Installation

The installation of the nodes should be in the following order:

  1. Build Node- build-server
  2. Load-Balancer Nodes- slb01 and slb02
  3. Swift Storage Nodes- swift01, swift02 and swift03
  4. Swift Proxy Nodes- swiftproxy01 and swiftproxy02
  5. Controller Nodes- control01, control02 and control03
  6. Compute Nodes- compute01 and compute02

Build Node Installation

Ensure you have reviewed the Critical Reminders section before proceeding. The build node is named build-server in our reference deployment. This server has relatively modest hardware requirements: 2 GB RAM, 20 GB storage, Internet connectivity, and a network interface on the same network as the management interfaces (CIMC and eth0 in our reference deployment) of the OpenStack nodes.

Install Ubuntu 12.04 LTS. A minimal install with openssh-server is sufficient. Configure the network interface on the OpenStack cluster management segment with a static IP. Also, when partitioning the storage, choose a partitioning scheme which provides at least 15 GB free space under /var, as installation packages and ISO images used to deploy OpenStack will eventually be cached there.
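
For reference, a minimal static configuration of the management interface in /etc/network/interfaces on Ubuntu 12.04 might look like the following; the interface name and addresses are example values only and must be adjusted for your management segment:

# /etc/network/interfaces (example values only)
auto lo
iface lo inet loopback

# Build node interface on the OpenStack management segment
auto eth0
iface eth0 inet static
    address 192.168.220.254
    netmask 255.255.255.0
    gateway 192.168.220.1
    dns-nameservers 192.168.220.1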
When the installation finishes, log in and become root:

sudo -H bash

NOTE: Please read the following if you have proxied Internet access or no Internet access:

If you require a proxy server to access the Internet, be aware that proxy users have occasionally reported problems during the phases of the installation process that download and install software packages. A common symptom of proxy trouble is that apt will complain about hash mismatches or file corruptions when verifying downloaded files. A few known scenarios and workarounds include:

  • If the apt-get process reports a "HASH mismatch", you may be facing an issue with a caching engine. If it's possible to do so, bypassing the caching engine may resolve the problem.
  • If you do have a proxy, you will want, at a minimum, to export the two types of proxies needed in your root shell when running fetch commands, as noted in the relevant sections.
  • You will also want to change the $proxy setting in site.pp to reflect your local proxy.

Another consideration applies if you don't have "public" Internet-accessible IPs for all of your machines (build, control, compute, etc.) and are building this in a controlled environment. If this is the case, ensure that $default_gateway is *not* set in site.pp; all of the files required for installing the control and compute nodes will then be fetched from the boot server.

IMPORTANT: If you have proxies, and you set your proxy information in either your .profile or in a file like /etc/environment, you will need to set both http_proxy and https_proxy. You will also need to set a no_proxy entry, at least for the build node. An example might look like:

http_proxy=http://your-proxy.address.com:80/
https_proxy=https://your-https-proxy.address.com:443/
no_proxy=your-build-node-name,*yourbuild.domain.name,127.0.0.1,127.0.1.1,localhost
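
For example, exporting these in the root shell before running the fetch commands (using the same placeholder addresses) would look like:

export http_proxy=http://your-proxy.address.com:80/
export https_proxy=https://your-https-proxy.address.com:443/
export no_proxy=your-build-node-name,*yourbuild.domain.name,127.0.0.1,127.0.1.1,localhost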

You have two choices for setting up the build server. You can follow the manual steps below, or you can run a one line script that tries to automate this process. In either case, you should end up with the puppet modules installed, and a set of template site manifests in /etc/puppet/manifests.

Model 1: Run the Script

To run the install script, copy and paste the following on your command line (as root with your proxy set if necessary as above):

curl -s -k -B https://raw.github.com/CiscoSystems/grizzly-manifests/multi-node/install_os_puppet | /bin/bash

With a proxy, use:

https_proxy=http://proxy.example.com:80/ curl -s -k -B https://raw.github.com/CiscoSystems/grizzly-manifests/multi-node/install_os_puppet > install_os_puppet
chmod +x install_os_puppet
./install_os_puppet -p http://proxy.example.com:80/ 

You can now jump to "Customizing your build server". Otherwise, follow along with the steps below.

Model 2: Run the Commands Manually

First, install any pending security updates:

apt-get update && apt-get dist-upgrade -y && apt-get install -y puppet git ipmitool

NOTE: The system may need to be restarted after applying the updates.

Get the Cisco Edition example manifests. Under the grizzly-manifests GitHub repository you will find different branches, so select the one that matches your topology plans most closely. In the following examples the multi-node branch will be used, which is likely the most common topology:

git clone https://github.com/CiscoSystems/grizzly-manifests ~/cisco-grizzly-manifests/
cd ~/cisco-grizzly-manifests
git checkout -q g.2

With a proxy:

https_proxy=http://proxy.example.com:80 git clone https://github.com/CiscoSystems/grizzly-manifests ~/cisco-grizzly-manifests/
cd ~/cisco-grizzly-manifests
https_proxy=http://proxy.example.com:80 git checkout multi-node

Copy the puppet manifests from ~/cisco-grizzly-manifests/manifests/ to /etc/puppet/manifests/

cp ~/cisco-grizzly-manifests/manifests/* /etc/puppet/manifests

Copy the puppet templates from ~/cisco-grizzly-manifests/templates/ to /etc/puppet/templates/

cp ~/cisco-grizzly-manifests/templates/* /etc/puppet/templates

Then get the Cisco Edition puppet modules from Cisco's GitHub repository:

(cd /etc/puppet/manifests; python /etc/puppet/manifests/puppet-modules.py)

With a proxy:

(cd /etc/puppet/manifests; http_proxy=http://proxy.example.com:80 https_proxy=http://proxy.example.com:80 python /etc/puppet/manifests/puppet-modules.py)

Build Node Configuration

In the /etc/puppet/manifests directory you will find these files:

clean_node.sh
cobbler-node.pp
core.pp
modules.list
puppet-modules.py
reset_nodes.sh
site.pp.example
site.pp.ha.example

At a high level, the key files are:

  • cobbler-node.pp -- manages the deployment of Cobbler to support booting of additional servers into your environment.
  • core.pp -- defines the core definitions for OpenStack service deployment.
  • site.pp.example -- captures the configuration components that can be modified by end-users for a non-HA deployment.
  • site.pp.ha.example -- captures the configuration components that can be modified by end-users for an HA deployment.
  • clean_node.sh -- a shell script provided as a convenience to deployment users; it wraps several cobbler and puppet commands for ease of use when building and rebuilding the nodes of the OpenStack cluster.
  • reset_nodes.sh -- a wrapper around clean_node.sh to rebuild your entire cluster quickly with one command.

IMPORTANT! You must copy site.pp.ha.example to site.pp and then edit it as appropriate for your installation. It is internally documented.

cp /etc/puppet/manifests/site.pp.ha.example /etc/puppet/manifests/site.pp 
vi /etc/puppet/manifests/site.pp
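
Before applying the manifest, it can help to double-check the values you changed. For example, a quick (optional) sanity check of the proxy, gateway and password settings referenced earlier in this guide:

grep -E 'proxy|default_gateway|password' /etc/puppet/manifests/site.pp | less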

Then, use the 'puppet apply' command to activate the manifest:

puppet apply -v /etc/puppet/manifests/site.pp

When the puppet apply command runs, the puppet client on the build server will follow the instructions in the site.pp and cobbler-node.pp manifests and will configure several programs on the build server:

  • ntpd -- a time synchronization server used on all OpenStack cluster nodes to ensure time throughout the cluster is correct
  • tftpd-hpa -- a TFTP server used as part of the PXE boot process when OpenStack nodes boot up
  • dnsmasq -- a DNS and DHCP server used as part of the PXE boot process when OpenStack nodes boot up
  • cobbler -- an installation and boot management daemon which manages the installation and booting of OpenStack nodes
  • apt-cacher-ng -- a caching proxy for package installations, used to speed up package installation on the OpenStack nodes
  • nagios -- an infrastructure monitoring application, used to monitor the servers and processes of the OpenStack cluster
  • collectd -- a statistics collection application, used to gather performance and other metrics from the components of the OpenStack cluster
  • graphite and carbon -- a real-time graphing system for parsing and displaying metrics and statistics about OpenStack
  • apache -- a web server hosting sites to implement graphite, nagios, and puppet web services

The initial puppet configuration of the build node will take several minutes to complete as it downloads, installs, and configures all the software needed for these applications.


Once the site.pp manifest has been applied to your system, you need to stage puppet plugins so they can be accessed by the managed nodes:

puppet plugin download

After the build server is configured, the systems listed in site.pp should be defined in cobbler on the build server:

# cobbler system list
   slb01
   slb02
   control01
   control02
   control03
   swiftproxy01
   swiftproxy02
   swift01
   swift02
   swift03
   compute01
   compute02

Load-Balancer Node Deployment

Now that the OpenStack nodes are being managed by the Build Node, deploying a node should be as simple as calling the clean_node.sh script with the node name:

/etc/puppet/manifests/clean_node.sh {node_name}

NOTE: Replace node_name with the name of your load-balancer node.

Example:
/etc/puppet/manifests/clean_node.sh slb01
/etc/puppet/manifests/clean_node.sh slb02


clean_node.sh is a script which does several things:

  • Configures Cobbler to PXE boot the specified node with appropriate PXE options to do an automated install of Ubuntu
  • Uses Cobbler to power-cycle the node
  • Removes any existing client registrations for the node from Puppet, so Puppet will treat it as a new install
  • Removes any existing key entries for the node from the SSH known hosts database

You can watch the automated Ubuntu installation progress by using the Virtual KVM from the UCS CIMC. Once the installation finishes, nodes reboot and then automatically run the puppet agent. The puppet agent will pull and apply the configuration associated with its name defined in the site manifest on the build node. This step will take several minutes, as puppet downloads, installs, and configures the various OpenStack components and support applications needed on the node. Observe the progress of the puppet deployment process by tail'ing syslog:

tail -f /var/log/syslog

Once the load-balancer nodes complete the build process, run puppet on the build node a second time:

puppet agent -t

This second puppet run will gather information about the individual OpenStack nodes collected by puppet when they were being built, and use that information to set up status monitoring of the OpenStack cluster on the build server.

Verify the Load-Balancer Node Installation

After the load-balancer nodes complete the automated deployment process, use the following steps to verify proper functionality:

Make sure puppet agent runs clean:

puppet agent -t -d

Note: If you are unable to complete a clean puppet agent run, then do not proceed with the deployment. Send a support request email to openstack-support@cisco.com with a copy of the puppet run error.

Verify that the haproxy service is running:

service haproxy status
haproxy is running.

Verify that the virtual IP address for the Controller cluster is bound to node slb01:

ip addr list

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether f8:72:ea:00:1c:4d brd ff:ff:ff:ff:ff:ff
    inet 192.168.220.81/24 brd 192.168.220.255 scope global eth0
    ""inet 192.168.220.40/32 scope global eth0""
    inet6 fe80::fa72:eaff:fe00:1c4d/64 scope link 
       valid_lft forever preferred_lft forever

Verify that the virtual IP address for the swift proxy cluster is bound to node slb02:

ip addr list

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether f8:72:ea:00:1c:4d brd ff:ff:ff:ff:ff:ff
    inet 192.168.220.82/24 brd 192.168.220.255 scope global eth0
    ""inet 192.168.220.60/32 scope global eth0""
    inet6 fe80::fa72:eaff:fe00:1c4d/64 scope link 
       valid_lft forever preferred_lft forever

Controller Node Deployment

Now that the OpenStack nodes are being managed by the Build Node, deploying the controller nodes should be as simple as calling the clean_node.sh script with the node name. It is IMPORTANT to deploy the controller nodes one at a time starting with control01, then control02 and lastly control03

/etc/puppet/manifests/clean_node.sh {node_name}

Note: Replace node_name with the name of your first controller node.

Example:
/etc/puppet/manifests/clean_node.sh control01


Once control01 completes the build process, run puppet on the build node a second time:

puppet agent -t

This second puppet run will gather information about the individual OpenStack nodes collected by puppet when they were being built, and use that information to set up status monitoring of the OpenStack cluster on the build server.

Verify the Controller Node Installation

After the controller nodes complete the automated deployment process, use the following steps to verify proper functionality:

Make sure puppet agent runs clean:

puppet agent -t -d

""Note:"" If you are unable to complete a clean puppet agent run, then do not proceed with the deployment. Send a support request email to openstack-support@cisco.com with a copy of the puppet run error.

Verify that the MySQL Galera monitoring service is functioning:

curl http://192.168.220.41
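
The address above corresponds to the Galera monitoring service in this reference deployment; adjust it if your addressing differs. As an optional additional check (a sketch; supply your own MySQL credentials), you can confirm the cluster size directly on a controller:

# wsrep_cluster_size should equal the number of controller nodes (3 in this guide)
mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_size';"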

Verify that the RabbitMQ service is running:

service rabbitmq-server status
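
Optionally, you can also confirm that the controllers have joined a single RabbitMQ cluster (run on any controller):

# running_nodes should list all three controller nodes
rabbitmqctl cluster_status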

Verify that the OpenStack services are running:

service keystone status
service glance-api status
service glance-registry status
service nova-api status
service nova-conductor status
service nova-scheduler status
service nova-consoleauth status
service quantum-server status
service quantum-dhcp-agent status
service quantum-plugin-openvswitch status
service cinder-api status
service cinder-scheduler status

Verify that you can obtain a token from Keystone:
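
Note that keystone token-get requires admin credentials in your environment. If you have not already sourced a credentials file, export them first; the values below are placeholders (the auth URL uses the Controller cluster virtual IP from Figure 1) and must match your site.pp settings:

export OS_USERNAME=admin
export OS_PASSWORD=your_admin_password
export OS_TENANT_NAME=admin
export OS_AUTH_URL=http://192.168.220.40:5000/v2.0/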

keystone token-get
