In this section I discuss service level agreements (SLAs), drawing on my own experience of managing a cloud computing business and reworking its SLA, and borrowing a number of ideas from a ZDNet.com article by Frank Ohlhorst (Ohlhorst, 2009).
By its nature a cloud computing purchase is usually impersonal and automated. You typically buy a service online and pay as you go, and there is often no way to negotiate a service level agreement – you just get the standard one that every customer gets unless your business is enterprise class and you are considering a serious investment. Moreover, most suppliers create SLAs to protect themselves, not their customers, against litigation, and, typically, they only offer customers minimal assurances. Understandably this state of affairs deters many would-be cloud customers, but some SLAs are better than others and there are a number of things to look out for in the small print. There are three key areas to consider when you are reviewing SLAs and talking to suppliers: data protection; continuity of service; and Quality of Service (QoS).
If you store business data in public clouds then system security failures and data loss are obvious risks, and there may be legal risks, too (see Chapter 3). In any case this data is your responsibility and you would not want it to be stolen or lost, so there follow five sets of questions on data protection for you or your legal team to bear in mind when you are reading the Service Level Agreement of a cloud provider. The five sets of questions cover the issues of ownership; security; access; storage; and retrieval. If you are in a highly regulated industry or you handle sensitive data then you will need satisfactory answers to many if not all of these questions because you, not your provider, are legally responsible for protecting your data.
First, on data ownership:
- Is there agreement that you own your data and any software you develop on the provider’s systems?
- Who owns the data about your data, such as access and modification log files?
Second, on data security:
- How many data centres does the provider have and how are they secured physically?
- How are data encrypted?
- How are their customers’ data and backup files segregated?
- Is security continually tested as they develop and improve their systems?
- Can they produce evidence to show that their security systems have been externally audited and certified?
Third, on data access:
- What personnel policies do they have, and are background checks carried out on new employees?
- How is access controlled and logged?
- What access, if any, do system administrators have to your data?
- What access control reporting facilities do they provide for audit trails?
- Do they permit any subcontractors or partners to access their systems?
- Do they use two-factor authentication for remote access?
Fourth, on data storage:
- How, and in what format, are data backed up, and where are the backups stored?
- Are data ever stored on third-party systems?
- Are data stored only in countries that subscribe to Safe Harbour agreements?
- What happens to data copies when an agreement is terminated?
- What happens to data copies if the provider’s business fails?
- Do they offer the facility for periodic offline data backups, and, if so, what measures are in place to prevent unauthorized backups?
- Can specific data retention policies be applied for regulatory purposes?
- What is their disaster recovery plan?
And, fifth, on data retrieval for legal purposes:
- What procedures do they follow in the event of international or domestic government inquiries into data stored on their systems?
- Do they provide assurances that your data will not be compromised or seized if another of their customers is under legal investigation?
- Do their systems satisfy your internal requirements for governance and compliance?
- What facility for litigation searches or electronic discovery can they provide to investigators?
- How quickly can data preservation or production requests be satisfied?
- What costs will be charged by the provider to customers under investigation?
Continuity of service
Part of the appeal of cloud computing services is that you can access them at any time. Problems do occur, however, and sometimes systems have to be taken offline temporarily for upgrades and maintenance (scheduled downtime). Even so, you can typically expect a guaranteed uptime of between 99.5 per cent and 99.9 per cent from a provider. SLAs are full of legalese, but they should contain details on systems outages, and if you want to gain a better understanding of how the provider deals with outages, here are some good questions to ask:
- What notice period do they have to give before any scheduled downtime?
- How often do they have scheduled downtime and how much time is usually involved?
- How are complete service outages and partial systems outages defined?
- How do customers report service problems – is there a ticketing system in place?
- How do they measure downtime and the severity of outages?
- How are customers compensated for outages?
- What redundancy is built into the systems to minimize outages?
- Do they have alternative methods for accessing data if an outage is prolonged?
- Do they provide reports on outages and other problems?
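To put the uptime percentages quoted earlier in this section into perspective, it helps to convert them into permitted downtime. The Python sketch below does that arithmetic. It assumes a 30-day month and a 365-day year, and it assumes that all downtime counts against the guarantee, which a particular SLA may not do (scheduled downtime is often excluded). The function name is illustrative only.

```python
def allowed_downtime_minutes(uptime_percent: float, period_days: float) -> float:
    """Return the downtime, in minutes, that an uptime guarantee permits.

    Assumption: every minute of the period counts towards the guarantee.
    Real SLAs often exclude scheduled maintenance windows.
    """
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    period_minutes = period_days * 24 * 60
    return period_minutes * (100 - uptime_percent) / 100


# Compare the two ends of the typical range.
for uptime in (99.5, 99.9):
    per_month = allowed_downtime_minutes(uptime, 30)
    per_year = allowed_downtime_minutes(uptime, 365)
    print(f"{uptime}% uptime: {per_month:.0f} min/month, {per_year / 60:.1f} h/year")
```

On these assumptions a 99.5 per cent guarantee permits about 3.6 hours of downtime a month, whereas 99.9 per cent permits about 43 minutes, so the difference between the two figures is larger than it first appears.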
Quality of Service
Just as you would expect a Quality of Service (QoS) level for IP telephony or your broadband connection, you should also expect a desktop-like experience from Software as a Service and Platform as a Service, with no noticeable latency, and consistently fast provisioning of computing resources from Infrastructure as a Service. The supplier is not responsible for your internet connection or your local network, but they are responsible for the availability of their services and the performance of their cloud infrastructure. If your potential supplier’s SLA does not cover QoS to your satisfaction then here are some questions to ask them about availability and performance:
- If additional resources are allocated dynamically to an overloaded application or server, how quickly does this happen?
- If a server instance fails, how quickly is it rebooted or replaced?
- Where in the world are the services hosted and how do the response times differ between geographical regions?
- Does the failure or poor performance of an individual application or server instance count as an outage for SLA purposes?
- Do they provide customers with monitoring tools for individual servers, applications and the cloud as a whole, and are these tools external?
- What general QoS metrics do they measure, if any?
As it is difficult to determine where the fault lies when using a service based on the internet, here are a few things you can do to understand and maximize QoS:
- Test and monitor your local internet connections (packet transmission speed, packet loss and response latency) during peak times, and measure data transfer speeds between your local networks and your chosen cloud – your Internet Service Provider may be able to help you improve connectivity.
- If you are migrating software from your private network to the cloud you can benchmark the performance of affected applications and operations on a powerful local server and network first, and then see what effect variations in memory and storage have on the performance by virtualizing the local server. What performance level and response time are acceptable for end users?
- Test your applications in the cloud, compare the performance with your local setup and document the differences. Can your cloud-based system deliver acceptable performance?
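The measurements suggested above can be started with very little code. The following Python sketch is one possible approach, not a substitute for a proper network monitoring tool: it times TCP connections to a host and summarizes the results. Failed connection attempts are used here as a rough stand-in for packet loss, which is an assumption of this sketch. The function names and the hostname in the usage comment are hypothetical.

```python
import socket
import statistics
import time


def tcp_connect_ms(host: str, port: int = 443, timeout: float = 3.0) -> float:
    """Time a single TCP connection to host:port, in milliseconds."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # we only care how long the connection took to open
    return (time.perf_counter() - start) * 1000


def sample_latency(probe, attempts: int = 10) -> dict:
    """Call probe() repeatedly and summarize the results.

    probe must return a latency in milliseconds, or raise OSError
    (which includes socket timeouts) when an attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    samples = []
    failures = 0
    for _ in range(attempts):
        try:
            samples.append(probe())
        except OSError:
            failures += 1
    return {
        "attempts": attempts,
        "failure_rate": failures / attempts,
        "median_ms": statistics.median(samples) if samples else None,
        "max_ms": max(samples) if samples else None,
    }


# Example usage (hypothetical hostname; replace with your provider's endpoint):
# print(sample_latency(lambda: tcp_connect_ms("cloud.example.com")))
```

Running the same probe at peak and off-peak times, from your local network and later from within the cloud, gives you comparable figures to document.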
A very useful (and free!) tool for testing the performance of web-based applications and multi-page transactions is KITE, the Keynote Internet Testing Environment (http://kite.keynote.com/). You use the tool to navigate through a website and record the process as a ‘script’, and then you can re-run this script locally (from your PC) and from five geographically separate locations in the KITE network to compare its performance. If, however, you want to continually monitor your applications there is a charge for that Software as a Service product; and there are other products on the market, including CapCal (http://www.capcal.com/), a web scalability and performance testing application that runs on Amazon EC2 servers.
Quality of Service is a subjective term, but if you can define objective performance measurement tests and repeat them on a regular basis then you will be more likely to spot any gradual degradation of cloud services and bring it to the attention of your supplier. And if you can use your supplier’s own measurement tools to prove your point then you will be in a stronger position. Whether you can negotiate an SLA on the basis of QoS will probably depend on the size and influence of your organization, but it could be worth a try, and it is certainly worth monitoring your systems because you are sharing a public cloud with an increasing number of tenants and you are relying on your supplier to ensure their cloud’s capacity grows with demand.
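As a minimal illustration of what a repeatable, objective test might look like, the Python sketch below compares a recent set of response-time measurements with an earlier baseline and flags a slowdown. The 20 per cent tolerance, the use of the median and the sample figures are all assumptions made for this example; you would choose thresholds that reflect what your end users find acceptable.

```python
import statistics


def has_degraded(baseline_ms, recent_ms, tolerance=0.20):
    """Return True if the recent median response time exceeds the
    baseline median by more than the given tolerance (0.20 = 20%)."""
    if not baseline_ms or not recent_ms:
        raise ValueError("both series need at least one measurement")
    baseline = statistics.median(baseline_ms)
    recent = statistics.median(recent_ms)
    return recent > baseline * (1 + tolerance)


# Invented sample data: response times in milliseconds for the same test,
# measured when the service was first adopted and again some months later.
baseline = [210, 190, 205, 200, 198]
recent = [240, 251, 260, 245, 255]

if has_degraded(baseline, recent):
    print("Response times have degraded; raise this with the supplier.")
```

The median is used rather than the mean so that a single slow request does not trigger a false alarm.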