Thursday, May 23, 2013

Image management - closing the install gap

I introduced the concept of an 'install gap' in this illustration that I used in 'Image management - an introduction':
The broad point is that to do useful work a VM (or cloud instance) needs to be running an application or offering services that are used as part of an application. An operating system on its own (or even an operating system with some middleware installed) isn't all that useful.

Installing by hand

After launching a virtual machine it's fairly typical (at least in the first instance) for the user to install stuff by hand. This might be done with a package manager (e.g. yum on Enterprise Linux, apt-get on Debian/Ubuntu and derivatives), a source control system (e.g. git/github or Mercurial), language specific packages (e.g. Ruby gems or Java ear/war/jar files) or simple file archives (e.g. tar or zip).

Installing by hand provides infinite flexibility, but it's time consuming, resource intensive and error prone, which is why it's often desirable to automate installation (particularly once something has been developed and it's moving into a production setting).

Scripting

There's a mantra that 'any good systems administrator will replace themselves with a script', and scripting provides ways for stuff that would be done by hand to be automated. Many operations people start out writing their own scripts in bash, perl, python or whatever, but in a large organisation the scripts can become a tangled mess with their own set of (inter)dependencies.


DevOps tools

Tools like Chef and Puppet are essentially scripting frameworks that are (or at least can be) strongly connected to version control systems. This allows infrastructure (or at least infrastructure configuration) to be treated like code. There is perhaps too much focus on the tools used in DevOps (a natural IT tendency) rather than the fact that it's an artifact of more mature design.


Deployment automation

Deployment automation tools seek to resolve scripting interdependencies by introducing centralised repositories and management. Such tools were mostly developed for and targeted at physical servers before the advent of virtualisation.

A painting analogy

Building a server up from 'bare metal' can be a bit like spray painting a car. Different layers are applied to build up to a glossy finish - primer, base coat, colour, lacquer. Like new cars coming out of highly automated factories the aim is to get a perfect finish on each individually customised order.

This is where things can go wrong with deployment automation tools. They tend to be used to spray across multiple machines at the same time, and it gets messy when a new sort of primer (aka patch) is sprayed on after the lacquer (finished application) was already applied.

The layers found in deployment automation can often become like tectonic plates - any movement and you get earthquakes across the data center.

Containment

The point of a virtual appliance is to provide containment. A single machine to be built up (like a single car in the paint shop). The virtual appliance is a unitary deployable unit. The parts within it can be known to work together. Change in one virtual appliance doesn't ripple through its neighbours.

Conclusion

There are many approaches to closing the install gap, but each has its drawbacks. Virtualisation brings with it the opportunity to do something different with virtual appliances, and in the next part of this series I'll take a look inside a virtual appliance factory at the stages where installation can take place.

Wednesday, May 22, 2013

Cloud Networking Webinar Wrap Up

From best practices to life in the cloud

We've just wrapped up our most recent webinar series on cloud networking. Senior cloud solution architect +Sam Mitchell hosted the 30 minute webinars through May. Sam set out the cloud networking concepts, highlighted the basic technologies for enterprise customers, and walked through real-life case studies.
  
For the final webinar, Sam was feeling poorly (and coughing, which makes for a challenging webinar) so CEO +Patrick Kerpan stepped in for Part 3. 

If you missed the live events, catch up here. We've got slides and recordings hosted on our website too: cohesiveft.com/webinars


Part 1: Cloud Networking Best Practices

Cloud computing infrastructure is elastic, scalable, highly available and accessible, but is it safe? The first webinar introduces the cloud networking basics, security lattice strategy, and concepts of security and control in public cloud networking.





Part 2: Solution Cases

What can cloud networking do for you? Cloud networking provides answers to many of the questions enterprises face when looking to the public cloud. Part 2 highlights technical use-cases of user-controlled cloud networking.







Part 3: Life in the Cloud
Who is using cloud networking?  This webinar focuses on case studies examining some of our current customers and how they are using cloud networking to solve their cloud related problems.


Part 3 Recording

Monday, May 20, 2013

A hybrid strategy

It seems that the debate on Twitter over hybrid cloud rumbles on.

I might have hoped that this was the end of it:
But then a few days ago we're back to the usual:
I felt obliged to reply:
This brings me back to a point I made during a panel session at ODCA Forecast last year. There's a difference between a hybrid cloud and a hybrid strategy:
  • A hybrid cloud is a public and private cloud munged together so that they (ostensibly) work as some sort of unified thing. It's a bit of a mythical beast as most private clouds work very differently to public clouds, so there's usually a massive impedance mismatch between one and the other. Tools (including CohesiveFT products) can be used to overcome those difficulties, but there are reasons why things like 'cloud bursting' appear in lots of slide shows and very few practical applications.
  • A hybrid strategy is where public cloud and private cloud are used as appropriate for a given application or workload. Any individual thing might not be bursting across the boundary, but the organisations adopting this strategy get to pick and choose what suits them. Zynga is a notable case study for this because they use public cloud for new apps at launch (when initial demand and growth are essentially unknowable) then migrate to a private environment once the dust has settled and capacity management is under control.

I'll be back at ODCA Forecast next month, on a panel about Software Defined Data Centers, and I hope to be able to once again cut through some of the marketing, myth and buzzwords and talk about the real progress that's being made in the industry.

Friday, May 17, 2013

Google Compute Engine - first impressions

One of the announcements that seemed to get lost in the noise at this week's I/O conference was that Google Compute Engine (GCE) is now available for everyone.

I took it for a quick test drive yesterday, and here are some of my thoughts about what I found.

Web interface

The web UI is less bad than most of the other public clouds I've tried of late, but it's nowhere near as good as AWS. I see a number of places where I think 'that works fine now whilst I'm just playing, but I'm not going to like that when I'm using this in anger and I've got LOTS of stuff to manage'.

One thing I like a lot about the web interface is how well it has been connected to the REST API and gcutil command line tool. The overall effect is to give the impression 'this is just for when you're running with training wheels, if you're serious about using this platform then you'll use (or build) some grown up tools elsewhere'.

Gcutil

Google have gone with their own API, which means you can't use third party tools adapted to AWS and other popular APIs. If (as most pundits predict) Google grows to be the #2 public IaaS this won't be a big deal as an ecosystem will grow around them. For the time being I expect the main way that people will use the API is through the gcutil command line tool. It's very easy to get going with gcutil due to the integration with the web interface, though I do wish there were direct links from the tool guide rather than links to links (a trap for those like me that just copy links and paste into wget commands).

Access control

GCE uses OAuth 2.0 for access control. This is both a very clever use of standards, and a Lovecraftian horror to use.
Beware, Fluffy Cthulhu will eat your brains if you think you can just source different creds to switch between accounts
This manifests itself when you first use gcutil, where the invocation creates a challenge/response: paste a URL into the browser, authenticate, approve, then paste the token back into gcutil. A ~/.gcutil_auth file is then written to save you jumping through the same hoops every time. It's possible to make the tool look elsewhere for the credentials stored in that file (and I guess equally possible to write a script to move files into and out of the default location), but the net effect is to bind a user on a local machine to an account in the cloud, which I think will be jarring to many people who are used to just sourcing creds files into environment variables as they hop between accounts (and providers).
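The 'script to move files into and out of the default location' could look something like the sketch below. It assumes the token file lives at ~/.gcutil_auth as described above; the ~/.gcutil_profiles directory and the function names are hypothetical, made up for illustration, and nothing here calls gcutil itself.

```python
import shutil
from pathlib import Path

# ~/.gcutil_auth is the location mentioned in the post.
# The profiles directory is a hypothetical convention, not part of gcutil.
DEFAULT_AUTH = Path.home() / ".gcutil_auth"
PROFILE_DIR = Path.home() / ".gcutil_profiles"


def save_profile(name, auth_file=DEFAULT_AUTH, profile_dir=PROFILE_DIR):
    """Copy the currently active token file into the profile store as `name`."""
    profile_dir.mkdir(parents=True, exist_ok=True)
    target = profile_dir / name
    shutil.copy2(auth_file, target)
    return target


def use_profile(name, auth_file=DEFAULT_AUTH, profile_dir=PROFILE_DIR):
    """Make the stored token file for `name` the active one."""
    source = profile_dir / name
    if not source.is_file():
        raise FileNotFoundError(f"no saved profile called {name!r}")
    shutil.copy2(source, auth_file)
    return auth_file
```

The idea would be to authenticate once per account, call save_profile('personal') or save_profile('work') straight afterwards, and then use_profile(...) before each gcutil session. It's still a workaround rather than a substitute for sourcing creds into environment variables.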

SSH

Google also breaks with convention over how it manages SSH keys. Most other clouds either force you to create a key pair before launching an instance, or allow the upload of the public key from a keypair you made yourself.

GCE creates a keypair for you the first time that you try to access an instance using SSH (with a different name):
  • gcutil creates a keypair and copies the private key to ~/.ssh/google_compute_engine
  • the public key is uploaded to your project metadata as name:key_string
  • new users of 'name' are created on instances in the project
    • and the key_string is copied into ~/.ssh/authorized_keys on those instances
  • meanwhile gcutil sits there for 5 minutes waiting for all that to finish
    • I've found that the whole process is much faster than that, and in the time it takes me to convert a key to PuTTY format everything is ready for me to log into an instance (whilst gcutil is still sat there waiting).
The whole process is a little creepy, as you end up signing into cloud machines with the same local username as you're using on whatever machine you have running gcutil. This also feels like another way that gcutil ends up binding a little too hard to a single local account.
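To make the name:key_string step concrete, here's a minimal sketch of how entries in that form could be turned into per-user authorized_keys content. It assumes one entry per line, which is my guess at the layout rather than something documented here, and the function names are hypothetical; it isn't the code that runs on GCE instances.

```python
def parse_ssh_metadata(value):
    """Group public keys by username from 'name:key_string' lines."""
    keys_by_user = {}
    for line in value.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first colon only, as key comments may contain colons.
        name, sep, key_string = line.partition(":")
        if not sep or not name or not key_string:
            raise ValueError(f"expected 'name:key_string', got {line!r}")
        keys_by_user.setdefault(name, []).append(key_string.strip())
    return keys_by_user


def authorized_keys_text(keys):
    """Render a list of keys as the contents of an authorized_keys file."""
    return "".join(key + "\n" for key in keys)
```

Each key in the resulting dictionary corresponds to a user that would be created on the instances, with authorized_keys_text(...) giving the contents of that user's ~/.ssh/authorized_keys.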

Access control redux - multi accounts

The OAuth 2.0 system for creating gcutil tokens does support Google's multiple account sign on - allowing me to choose between my personal and work accounts.

The web interface doesn't.

If I want to use the web interface with my work account then I have to use my browser in incognito mode (and jump through the 2FA auth every time, which is a pain).

At this stage I'm glad I'm only wrangling two GCE accounts. Any more and I'd be quickly running out of browsers (and out of luck if I was using my Chromebook).

Image management

The entire GCE image library presently fits onto a single browser page, and half of that is deprecated or deleted, so the choice of base OS is limited to Debian (6 or 7) and CentOS 6.

There are no choices for anything more than a base OS (though there are instructions for creating your own images once you've added stuff to a base OS).

There is no (documented) way to import an image that didn't start out from one of the official base images.

There is no image sharing mechanism.

There is no image marketplace (or any means to protect IP within images).

Network

This is an area where it seems Google have learned from Amazon how to do things more intelligently. The network functionality is more like an Amazon Virtual Private Cloud (VPC) than the regular EC2 network. By default you get a 10.x.x.x/16 network with a gateway to the outside world and firewall rules that let instances talk to each other on that network, and SSH in from the outside.

Firewall rules apply to the network (like VPC security groups) rather than the instance (like EC2 security groups), and there's a very flexible source/target tagging system there that can be used to describe interconnectivity.
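A toy model helps show how source/target tags can describe interconnectivity. The sketch below is not the GCE API: the Instance and Rule classes, their fields and the is_allowed function are all hypothetical, and it assumes that a rule matches when the source instance carries one of the rule's source tags, the target carries one of its target tags, and the port agrees.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    name: str
    tags: frozenset


@dataclass(frozen=True)
class Rule:
    source_tags: frozenset
    target_tags: frozenset
    port: int


def is_allowed(rules, source, target, port):
    """Return True if any network-level rule permits source -> target on port."""
    return any(
        rule.port == port
        and bool(rule.source_tags & source.tags)
        and bool(rule.target_tags & target.tags)
        for rule in rules
    )
```

With a single rule from tag 'web' to tag 'db' on port 5432, every instance tagged 'web' can reach every instance tagged 'db' on that port, and adding an instance to the tier is just a matter of tagging it rather than editing rules.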

The launch announcement talks about 'Advanced Routing features help you create gateways and VPN servers, and enable you to build applications that span your local network and Google’s cloud', but if those features exist in the API I don't (yet) see them exposed anywhere in the web UI.

Disks

The approach to disks is much more like Azure's IaaS than AWS, at least in terms of default behaviour. Terminating an instance doesn't destroy the disk underneath it, and it's possible to leave that disk hanging around (and the meter running) then go back and attach another instance to it later. If you don't want the disks to be persistent then that needs to be specified at launch time (or you have to delete the disk after deleting the instance).

There's no real difference in capability here, it's just a difference in default behaviour.

Speed

GCE feels fast compared to AWS and very fast compared to most of the other public clouds I've used. Launches and other actions happen quickly, and the entire environment feels responsive. I hope this isn't a honeymoon period (like Azure IaaS storage) where everything is fine for the first few days and crumbles under load once people have the time to get onto the service (given how Google have handled the launch of GCE I'm fairly confident they won't repeat Microsoft's mistakes here).

I haven't benchmarked any instances to see if machine performance is roughly equivalent to AWS instances, but I've heard on the grapevine that GCE has more robust performance.

Price

Pricing seems to be set at about the same level as the AWS benchmarks across instances, storage and network. GCE doesn't seem to be competing on price (yet), but it might be offering better quality (albeit for fewer services) at the same price.

One thing that has caught people's attention is the move to per minute billing (with 10m minimum):
I'm not so sure:
Paying for a whole hour when you tried something for a few minutes (and it didn't work so start again) might be a big deal for people tinkering with the cloud. It might also be a thing for those bursty workloads, but I think for most users the integral of their minute-hour overrun is a small number (and Google will no doubt have run the numbers to know that exactly).

In effect per minute billing means GCE runs at a small discount to AWS for superficially similar price cards, but I don't see this being a major differentiator. It's also something that AWS (and other clouds) can easily replicate.
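The minute-hour overrun is easy to put numbers on. This is a minimal sketch that assumes hourly billing rounds usage up to whole hours and per minute billing rounds up to whole minutes with a 10 minute minimum; the function names are mine, and no actual prices are involved.

```python
import math

HOUR = 60  # minutes


def billed_minutes_hourly(runtime_minutes):
    """Minutes charged when usage is rounded up to whole hours."""
    if runtime_minutes <= 0:
        raise ValueError("runtime must be positive")
    return math.ceil(runtime_minutes / HOUR) * HOUR


def billed_minutes_per_minute(runtime_minutes, minimum=10):
    """Minutes charged with per minute billing and a minimum charge."""
    if runtime_minutes <= 0:
        raise ValueError("runtime must be positive")
    return max(minimum, math.ceil(runtime_minutes))


def overrun(runtime_minutes):
    """Extra minutes paid for under hourly billing versus per minute billing."""
    return billed_minutes_hourly(runtime_minutes) - billed_minutes_per_minute(runtime_minutes)
```

A 7 minute experiment gives an overrun of 50 minutes, which is where the tinkerers feel it. An instance that runs for 10 hours gives an overrun of 0, and one that runs for a month gives at most 59 minutes, which is noise against the total bill.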


Conclusion

There's a lot to like about GCE. It gets the basics right, and no doubt more functionality will come with time.

I see room for improvement in the identity management pieces, but the underlying security bits are well thought out and executed.

Image management is the area most in need of attention. People are religious about their OS choices, and having one flavour from each of the big Linux camps is enough for a start but not enough for the long term. Google's next major area for improvement has to be getting the right stuff in place for a storefront to compete with AWS Marketplace. Some people might even want to run Windows :-0

Wednesday, May 15, 2013

Image management - an introduction

One of the key elements of any cloud is an image library, which gives users the choice of what software they get when they start a virtual machine[1]. With a popular cloud like Amazon's EC2 there are thousands of images (in their case called Amazon Machine Images [AMIs]) to choose from:

Most image libraries start out with bare operating systems, but it's possible to add additional functionality such as middleware or complete applications (often referred to as 'virtual appliances'):

Amazon these days has a shop front for images ranging from bare operating systems to complete virtual appliances - the AWS Marketplace, which has categories like Operating Systems, Application Stacks and Business Software.

Image management encompasses the tools and processes used to make images to go into an image library, and deal with the lifecycle of those images such as which should be offered (based on things like patching of various components). It's therefore more than just the image library itself. For example OpenStack has a module called Glance, which 'provides services for discovering, registering, and retrieving virtual machine images'. Glance could therefore be appropriately described as the image library component of OpenStack, but the task of getting images into Glance, and determining what should go into those images - the broader issues of image management - aren't handled by Glance itself.

This post is the first in a series, which will go on to cover things like base image selection, how to close the install gap, and how to connect the operational side of image management to development processes.

Notes:


[1] Physical machines are installed from physical media (e.g. CDs and DVDs). Virtual machines are installed from virtual media (e.g. .iso files of CDs and DVDs). Cloud machines aren't installed. This may deserve a place on the list '15 Ways to Tell Its Not Cloud Computing': 12a: if you can't choose from a range of images - it's not a cloud.