Computing hardware requires maintenance; it’s an unavoidable fact of running a data center. Servers need both planned and unplanned maintenance for a variety of reasons, but one objective remains constant: minimize the impact on end users. The Surgient Virtual Automation Platform™ includes a powerful Maintenance Window scheduling system that helps administrators find the optimal time to perform maintenance on server hardware and automatically shifts workloads away from the targeted machines just before they go offline, minimizing user impact.
While some events require an immediate maintenance window, most maintenance of computing hardware is preventive and scheduled ahead of time. The Surgient Platform provides a unique dynamic capacity management approach that includes a complete reservation system supporting both end-user infrastructure requests and administrative maintenance windows. When an administrator schedules a maintenance window, the Surgient Platform lists all workloads that would be affected by taking the targeted server(s) offline. The administrator may then either choose a different time with less impact on running workloads or select from a list of automated tasks to resolve each conflict, including using VMotion technology from VMware to move the running VMs. Once the administrator has selected the disposition for each affected workload, the Surgient Platform creates the maintenance window.
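The conflict check described above amounts to an interval-overlap test between existing reservations and the proposed window. The following is a minimal sketch of that idea, not the Surgient Platform's implementation: the `Reservation` record and the functions `affected_workloads` and `least_impact_start` are hypothetical names introduced for illustration, and time spans are assumed to be half-open.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Reservation:
    """Hypothetical record of a workload reserved on a host for a time span."""
    workload: str
    host: str
    start: datetime
    end: datetime


def affected_workloads(reservations, hosts, window_start, window_end):
    """Return reservations on the targeted hosts that overlap the window.

    Spans are treated as half-open intervals: [start, end).
    """
    targeted = set(hosts)
    return [
        r for r in reservations
        if r.host in targeted and r.start < window_end and window_start < r.end
    ]


def least_impact_start(reservations, hosts, candidate_starts, duration):
    """Pick the candidate start whose window touches the fewest workloads.

    Ties go to the earliest-listed candidate (an assumption of this sketch).
    """
    return min(
        candidate_starts,
        key=lambda s: len(affected_workloads(reservations, hosts, s, s + duration)),
    )
```

An administrator-facing tool could call `affected_workloads` to build the list of conflicts for a proposed window, and `least_impact_start` to suggest an alternative time from a set of candidates.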
Before the scheduled start time of a maintenance window, the Surgient Platform leverages its unique reservation system to prevent users from creating reservations that would rely on the resources provided by the specific server(s) targeted for maintenance. This block protects those resources from being reserved during the period when they will be unavailable due to maintenance. Because no new workloads can be scheduled against those resources, the dispositions the administrator chose during scheduling are enough for the Surgient Platform to clear all workloads from the affected servers automatically. Thanks to this reservation system, the farther in advance administrators schedule maintenance windows, the smaller the impact on users, since fewer reservations are likely to exist already for that period.
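One way to picture this blocking behavior is as a filter applied to each new reservation request: any host with a maintenance window overlapping the requested period is excluded from consideration. The sketch below illustrates that idea only. `MaintenanceWindow` and `reservable_hosts` are hypothetical names, and the real platform's data model is not described in this document.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class MaintenanceWindow:
    """Hypothetical record: a set of hosts that are offline for [start, end)."""
    hosts: frozenset
    start: datetime
    end: datetime


def reservable_hosts(all_hosts, windows, req_start, req_end):
    """Return the hosts a new reservation for [req_start, req_end) may use.

    A host is excluded if any maintenance window that targets it overlaps
    the requested period.
    """
    blocked = set()
    for w in windows:
        if req_start < w.end and w.start < req_end:
            blocked |= w.hosts
    return [h for h in all_hosts if h not in blocked]
```

Under these assumptions, a request that ends before the window starts, or begins once it has ended, still sees the full host list.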
When the scheduled start time for the maintenance window occurs, the Surgient Platform initiates the automated tasks defined as part of the Maintenance Window scheduling function. For lower-priority workloads, the Surgient Platform either shuts down the VMs or puts them into a suspended state from which they will be “resumed” upon completion of the hardware maintenance. For higher-priority workloads – those that must remain available – the Surgient Platform leverages VMotion technology from VMware to move the VMs to alternate hosts.
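The start-of-window behavior can be sketched as a planning step that turns each VM's chosen disposition into an action. This is an illustrative sketch under stated assumptions: the `Disposition` enum and `plan_start_actions` are hypothetical, the round-robin choice of alternate host is an assumption of the example, and the code only builds a plan rather than calling any virtualization API.

```python
from enum import Enum
from itertools import cycle


class Disposition(Enum):
    """Hypothetical per-workload choices made when the window was scheduled."""
    SHUT_DOWN = "shut_down"
    SUSPEND = "suspend"
    MIGRATE = "migrate"


def plan_start_actions(dispositions, alternate_hosts):
    """Build the list of (action, vm, target_host) tuples for window start.

    `dispositions` maps VM name -> Disposition. Only migrated VMs get a
    target host; targets are assigned round-robin (an assumption here).
    """
    needs_target = any(d is Disposition.MIGRATE for d in dispositions.values())
    if needs_target and not alternate_hosts:
        raise ValueError("no alternate host available for VMs that must stay online")
    targets = cycle(alternate_hosts) if alternate_hosts else None
    plan = []
    for vm, disposition in dispositions.items():
        target = next(targets) if disposition is Disposition.MIGRATE else None
        plan.append((disposition.value, vm, target))
    return plan
```

Separating the plan from its execution keeps the example testable and leaves the actual shutdown, suspend, and migration calls to whatever virtualization layer is in use.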
When the maintenance window ends, the Surgient Platform automatically “resumes” any suspended VMs and gives the administrator the option to move VMs back to their original hosts. Apart from that choice, no further action by the administrator is necessary to return the environment to its pre-maintenance state. When an immediate, or emergency, maintenance window is required, the Surgient Platform provides a single-click action that shifts all running workloads to another server.
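The end-of-window handling described above can be sketched in the same planning style: suspended VMs are always resumed, and migrated VMs return to their original hosts only if the administrator opts in. `StartRecord` and `plan_end_actions` are hypothetical names for illustration. Because the text mentions resuming only suspended VMs, this sketch leaves VMs that were shut down untouched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StartRecord:
    """Hypothetical note of what was done to a VM when the window started."""
    vm: str
    action: str          # "shut_down", "suspend", or "migrate"
    original_host: str


def plan_end_actions(records, move_back=False):
    """Build the list of (action, vm, host) tuples for window end.

    Suspended VMs are resumed on their original host. Migrated VMs are
    moved back only when `move_back` is True. Shut-down VMs are skipped.
    """
    plan = []
    for r in records:
        if r.action == "suspend":
            plan.append(("resume", r.vm, r.original_host))
        elif r.action == "migrate" and move_back:
            plan.append(("migrate", r.vm, r.original_host))
    return plan
```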
With the Surgient Virtual Automation Platform, administrators can confidently schedule hardware maintenance and determine the effect on end users well ahead of time. Even in emergency maintenance situations, the Surgient Platform provides powerful automation tools that make removing a server from service faster and easier. Creating maintenance windows within the Surgient Platform greatly reduces the manual labor and end-user impact of maintaining computing hardware.