Wednesday, June 19, 2013

Inside an image factory

In the previous post of this series I looked at what’s in an image. This post will cover the workings of an image factory.


Base library and imports

Polly and Dolly, clones. Photo credit: The Telegraph
The starting point for an image is... another image. 
Whilst it’s possible to create an image entirely from scratch, that would involve having entire operating systems available as packages that could be deployed into those empty images. It’s just more efficient to start from a base operating system image or an imported image of the user’s choice.


Mount here

To put things into an image the factory mounts it (or partitions within it) using the loopback interface, so the image appears to be a block device like any other. Once an image is mounted it’s very straightforward to copy packages into it.
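
As a rough sketch, the mounting step might look something like the following. The image name, loop device and mount point are assumptions for illustration, and the `run` helper and `DRY_RUN` switch are hypothetical names introduced here so that the commands are only printed unless they are explicitly enabled. It also assumes the image carries a partition table with the filesystem on its first partition.

```shell
#!/bin/sh
# Sketch only: attach an image to a loop device and mount its first partition.
# DRY_RUN=1 (the default here) just prints the commands; set DRY_RUN=0 and run
# as root to execute them for real.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

mount_image() {
    image="$1"
    mountpoint="$2"
    loopdev="${3:-/dev/loop0}"   # assumed to be a free loop device

    # -P asks the kernel to scan the partition table, so that partitions
    # show up as /dev/loop0p1, /dev/loop0p2 and so on.
    run losetup -P "$loopdev" "$image"
    run mkdir -p "$mountpoint"
    run mount "${loopdev}p1" "$mountpoint"
}

mount_image my_file.img /mnt/my_image
```

With the image mounted, copying a package in is an ordinary `cp` into the mount point.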

Chroots aren't just for smoking

Photo credit: chrisjohnbeckett on Flickr
Let’s say the image is mounted at /mnt/my_image... This is no good for running scripts, because if I try to modify files in /etc I’ll end up modifying the /etc of the factory rather than the /etc of the image (which is in fact /mnt/my_image/etc). I need to change root to /mnt/my_image, which can be done using the chroot command.

Chrooting allows scripts to change the image rather than the factory, but it’s not a panacea. Any code that I execute is running in the context of the factory - so (for example) if I try to find out what the OS type is then at best I’ll get the answer of the factory’s OS, and at worst I’ll get no answer at all (because the /proc filesystem that gives me a window into the inner workings of the operating system won’t be populated).


This means that I can use chroot to get some work done at inception, but not every type of script or installation is going to work as might be expected.
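
A chroot session against the mounted image could be sketched along these lines. This is an illustration under assumptions rather than how any particular factory does it: `/mnt/my_image` is the example mount point from above, `enter_image`, `run` and `DRY_RUN` are hypothetical names, and mounting a proc filesystem inside the image is one common way to give scripts a populated /proc. Note that the /proc seen inside the chroot still describes the factory's kernel, not the image's.

```shell
#!/bin/sh
# Sketch only: run a command inside the image with /proc available.
# DRY_RUN=1 (the default here) prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

enter_image() {
    root="$1"
    shift
    # Give the chroot a /proc; it reflects the factory's running kernel.
    run mount -t proc proc "$root/proc"
    # Everything after the root directory is the command to run inside it.
    run chroot "$root" "$@"
    # Always unmount again so the image can be detached cleanly.
    run umount "$root/proc"
}

enter_image /mnt/my_image /bin/sh -c 'echo configured > /etc/motd'
```

Here the write to /etc/motd lands in /mnt/my_image/etc/motd rather than in the factory's own /etc.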



Meta virtualization to the rescue

Chroot is often used as a ‘poor man’s virtualization’, and the answer to many of the problems with chroot is to use a real virtualization container. As (in the case of Server3 at least) the factory runs on a virtual machine (in the cloud) this implies virtualization within virtualization, which we can call meta virtualization.

Using meta virtualization the image is actually running within the factory, so installation steps that might go awry in a chrooted environment should work perfectly. There are however a couple of things that can catch out the unwary:



  1. When the image is instantiated as an instance inside the factory it will go through its first boot process. Some aspects of this (e.g. creating machine SSH keys, udev rules for IP addresses etc.) should not be carried over into the true runtime environment - especially where many instances will be launched from a single image.
  2. The image shouldn’t be allowed to customise itself to the factory’s hypervisor and network, as it will meet new ones when it’s launched later as an instance.
Both of these issues mean that some tidying up is needed at the end of the factory installation process so that final customisations can happen on first boot in the target environment.
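
The tidying up might be sketched as below, run against the image's root directory once installation has finished. The paths are assumptions that hold on many Linux distributions of this era (SSH host keys under /etc/ssh, and a 70-persistent-net.rules file for udev) and will differ elsewhere; `tidy_image` is a hypothetical name. Whether the keys get regenerated on first boot also depends on the distribution's own startup scripts.

```shell
#!/bin/sh
# Sketch only: remove first-boot artefacts from an image's root directory.
tidy_image() {
    root="$1"
    # Refuse to run against an empty path or the factory's own root.
    if [ -z "$root" ] || [ "$root" = "/" ]; then
        echo "tidy_image: refusing to tidy '$root'" >&2
        return 1
    fi
    # Machine SSH host keys: each instance should generate its own.
    rm -f "$root"/etc/ssh/ssh_host_*
    # udev rules that tie interface names to the factory's MAC addresses
    # (assumed file name; varies by distribution).
    rm -f "$root/etc/udev/rules.d/70-persistent-net.rules"
}

# Usage, with the image mounted at the example path used earlier:
#   tidy_image /mnt/my_image
```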

Meta virtualization also allows for inception of non-Unix-derived operating systems (like Windows), which don't have chroot.


Conclusion

An image factory allows software installation to be done before a machine reaches a cloud so that it can be productive more quickly. Getting software into an image is relatively simple, but actually running stuff inside the factory is tougher - needing chroot or meta virtualization. It's worth it though, particularly when an image is going to be launched many times.

Friday, June 14, 2013

We've been feeling lucky lately.

CohesiveFT Wins 2013 Award for Public Cloud Services & Infrastructure, First Runner Up in DataCentre Solutions Award 


Last night, at the 6th Annual International Datacenters Awards in London, Senior Solutions Architect +Sam Mitchell brought home the award for Public Cloud Services & Infrastructure.

Winner! International Datacenters Awards - Public Cloud Services & Infrastructure

The Public Cloud Services & Infrastructure Award recognises a company with a demonstrable track record that can be used as an exemplar to the global datacentre industry.

CohesiveFT won the award for the company track record and the software-defined networking (SDN) product, VNS3. VNS3 gives customers connectivity that allows businesses to extend an existing network to any cloud environment.

Runner Up - Datacentre Solutions Awards

CohesiveFT was also recently named runner up in the Data Centre Solutions Award for Public Cloud Project of the Year.

The recognition is shared with CohesiveFT UK based partners. The solution that earned the nomination focused on migrating and connecting public utility data to the public cloud to reduce IT overhead.

The award recognizes the best implementation of a cloud computing project (public cloud, private cloud or hybrid cloud) in any public-sector organisation with tangible benefits in either cost savings or efficiency gains.

The solution that got runner-up helped a customer organize and visualize 20+ years of information, or nearly 250M data points relating to 25 million UK households, with cloud compute capacity in the IBM SmartCloud.

The customer was able to migrate and connect their cloud and physical data centers. Compute functions are now accessible at any time, from anywhere, while retaining all the functionality of a traditional enterprise-level solution. The customer used the IBM SmartCloud as the most ecologically sound way of rolling out software, dramatically reducing their carbon footprint.

Interested in hearing more from the CohesiveFT team? Come find us in June: 


CTO +Chris Swan at ODCA Forecast 
17 - 18 June
Panel: Software Defined Data Centers
14.15 - 15.15pm June 17 in Colonial Room - Mezzanine Level
Moderator: Jo Maitland
Agenda and event info

CohesiveFT at Cloud Computing World Forum
26 - 27 June
Agenda and event info
Demos with +Sam Mitchell at Stand 3050:
  • 10.00 & 14.00 Control and Security in the Cloud - how to take control and secure your data in the cloud? 
  • 10.45 & 14.45 What is Software Defined Overlay Networking? - an introduction to VNS3, the market leading overlay SDN solution
  • 11.30 & 15.30 Delivering applications to your cloud extended network - import, transform and deliver applications to your chosen cloud
  • 12.15 & 16.45 Getting started with VNS3 - Demo of using VNS3 Free Edition from AWS Marketplace
CTO Chris Swan's CWF speaking schedule:
  • Panel: CloudCamp London Preview - Network Virtualisation, SDN, NfV what's it all about? - 11:45 in the Executive Track
  • Panel: How is Cloud Changing Your Data Centre?  - 26 June at 16:20 - Evolution Theatre
  • Panel: The Role of Cloud in the Internet of Things Transition - 27 June at 16:20 in the Connection Theatre

CloudCamp London - 'Network Virtualisation, SDN, and NfV - what's all the fuss?'
18.30 at  The Crypt in Clerkenwell, London
Organized by +Chris Purrington

Thursday, June 13, 2013

What's in an image anyway?

In the previous post in this series I looked at the process of software installation in order to create a fully functional virtual appliance. This post is going to look at some of the lower level details (and will get technical in places).

What's an image?


An image is in essence the representation of storage (a hard disk) as a file. Unix like operating systems typically offer the means to treat files like disks through a loopback device[1], so I can type something like:

sudo losetup /dev/loop0 my_file.img
This lets me use the device /dev/loop0 as a virtual hard disk that reads and writes into my_file.img[2].

Windows users aren't left out now either, as various types of image (most typically .iso for CD/DVD images and .vhd for hard disk images) can be mounted in recent versions.

Image types


The image I've described above is the most basic form - just a (mostly empty) file with some bytes in it. Such a file might sometimes be called a 'raw' image, and may also be given the .raw extension. Each of the virtualization platforms has its own native/preferred image type:

  • VMDK - used by VMWare (and also supported as an alternative type or import type on many other platforms)
  • VHD - used by Microsoft (initially for their Virtual Server platform and more recently for Hyper-V and Azure)
  • VDI - used by Oracle's VirtualBox
  • HDD - used by Parallels
  • QCOW/QCOW2 - used by KVM
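
Some of these formats can be told apart by the 'magic' bytes at the start of the file. The sketch below is illustrative and deliberately partial: `image_format` is a hypothetical name, it only recognises the QCOW family (which starts with the bytes "QFI" followed by 0xfb) and sparse VMDK extents (which start with "KDMV"), and anything else is reported as raw or unknown. VHD, for instance, keeps its "conectix" cookie in a footer at the end of the file, so it isn't covered here.

```shell
#!/bin/sh
# Sketch only: guess an image's format from its first four bytes.
image_format() {
    # Render the first four bytes as a lowercase hex string, e.g. 514649fb.
    magic="$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')"
    case "$magic" in
        514649fb) echo "qcow" ;;            # "QFI" + 0xfb
        4b444d56) echo "vmdk" ;;            # "KDMV" (sparse extent)
        *)        echo "raw-or-unknown" ;;  # no recognised magic
    esac
}

# Usage:
#   image_format my_file.img
```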

Standards?


There's something of a common misconception that the Open Virtualization Format (OVF) provides a standard for images. It doesn't. OVF is a metadata file about a VM that provides some degree of portability between virtualization platforms. An OVF file will usually accompany an image, but import support for OVF doesn't guarantee import support for a given image type provided with an arbitrary OVF file. OVF files are often bundled along with images and other key files in Open Virtualization Archives (OVA).
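
Since an OVA is a tar archive, ordinary tools are enough to see the OVF descriptor sitting next to the image files it describes. A minimal sketch, in which `ova_descriptor` and the file name appliance.ova are hypothetical:

```shell
#!/bin/sh
# Sketch only: print the name of the OVF descriptor inside an OVA archive.
ova_descriptor() {
    # List the archive members and keep the one ending in .ovf.
    tar -tf "$1" | grep '\.ovf$'
}

# Usage (appliance.ova is a made-up file name):
#   ova_descriptor appliance.ova
#   tar -tf appliance.ova      # everything bundled alongside the descriptor
```

The image files listed next to the descriptor still have to be in a format that the importing platform understands.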

Partitions


Partitioning allows a hard disk to be carved up into smaller chunks, which can be useful for a variety of reasons. This practice continues into the virtual world, so it's usual for images to have one or more partitions inside them.

I could create a partition on my image:

sudo parted -s /dev/loop0 mklabel msdos
sudo parted -s /dev/loop0 unit cyl mkpart primary ext2 -- 0 -0
sudo partprobe /dev/loop0


This is one of the places that the various image types used by different virtualisation systems and clouds can start to diverge. Some systems don't use partitions, some specify a single partition for the entire (virtual) disk, whilst others allow for the same flexibility found on many physical systems, with perhaps different partitions for the main filesystem (/), boot files (/boot) and virtual memory (swap).

File systems


Just like real disks, an image needs a file system such as ext3 or ext4 on Linux or NTFS on Windows. The file systems are created using the same tools as for real disks, though there's no need for the formatting process to scan a disk for possible errors.

I could create a file system on my image:
sudo mkfs.ext4 /dev/loop0p1
At this stage I could mount my image and start copying stuff into it.

The boot process 


The point of most images is to be bootable units of functionality. So it's important to understand the boot process and how that changes in some virtual and cloud environments.

The usual boot order goes something like this:
  1. BIOS - When a CPU is powered on it will load its reset vector, which in a PC will point to some firmware called the basic input/output system (BIOS). The BIOS will keep a list of devices that might be present with bootable media, and the order to try them in.
  2. Bootloader 1 - Found in the first sector of a bootable disk, which is known as the Master Boot Record (MBR). With only 446 bytes to play with it can't do much more than launch the next stage.
  3. Bootloader 2 - Able to offer a menu of choices for what gets launched next. On Linux systems this will usually be the Grand Unified Bootloader (GRUB).
  4. Kernel - The operating system kernel is loaded into memory and begins to initialise the rest of the system by launching...
  5. Init - A set of scripts to determine what gets started at boot time according to the desired 'run level', which determines things like number of users and whether a GUI is launched.
  6. Runlevel programs - Bring up various services as the machine starts.
This is the model that's existed for the PC since it first came to market over 30 years ago, so there's some fairly old stuff buried in there. With virtualisation it's possible to reinvent things...
  • The BIOS doesn't need to look for a boot order - you can't plug a USB drive or a CD into the cloud.
  • There's really no need for an MBR if all it's doing is pointing to the next link in the chain.
  • A boot menu is also pretty pointless for a machine that will never have anybody stood in front of it.
  • If a limited range of kernels is supported then they don't have to be read off disk, they can be integrated into the environment.
This is why some clouds effectively launch straight into a kernel, which then hands off to the init scripts found in an image. EC2 took this approach during the early years, and it's what Google Compute Engine does today; however EC2 opened things up a few years ago by offering PV-GRUB - effectively a kernel that acts like a boot loader (so you can use your own kernels).
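
To see whether an image carries a traditional MBR at all, it's enough to look at the end of its first sector. The 512 byte sector holds 446 bytes of bootloader code, a 64 byte partition table, and a two byte signature (0x55 0xAA). The sketch below checks for that signature; `has_boot_signature` is a hypothetical name, and a match only shows that the sector is marked as an MBR, not that the image will actually boot.

```shell
#!/bin/sh
# Sketch only: succeed if bytes 510-511 of the image are the MBR signature.
has_boot_signature() {
    # Skip the 446 bytes of code and 64 bytes of partition table,
    # then render the last two bytes of the sector as hex.
    sig="$(dd if="$1" bs=1 skip=510 count=2 2>/dev/null | od -An -tx1 | tr -d ' \n')"
    [ "$sig" = "55aa" ]
}

# Usage:
#   if has_boot_signature my_file.img; then echo "MBR present"; fi
```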

A waste of space


The process I've outlined in the examples through this piece wastes a lot of space, because if I want a 1G disk then I have a 1G file full of zeros. Of course I can compress the file when I move it from machine to machine, but it will still take up storage when it gets to the destination.

This can be resolved by using copy on write (COW) techniques. In storage systems COW is used (often in combination with 'thin provisioning') to allocate storage as a disk fills up, and the same trick can be used when a disk is represented as a file. Of course the cleverness needed to do this has to live somewhere, so more advanced drivers are required to sit between the image file and the operating system manipulating it, and these are usually baked in to the various virtualization platforms.

Most of the different image types described above use some kind of COW approach so that big virtual disks result in small image files.
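
A related trick is available even for raw images, in the form of sparse files. This isn't copy on write as used by the image formats above, just the filesystem declining to store blocks that have never been written, but it illustrates the same idea of allocating space only as it's used. The sketch assumes GNU coreutils (`truncate` and `stat -c`) on a filesystem that supports sparse files; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: create a 1G raw image that occupies almost no disk space.
make_sparse_image() {
    # truncate sets the file's length without writing any data blocks.
    truncate -s "$2" "$1"
}

apparent_bytes() {
    # The size the file claims to be.
    stat -c %s "$1"
}

allocated_bytes() {
    # Blocks actually allocated, multiplied by the size of each block.
    echo $(( $(stat -c %b "$1") * $(stat -c %B "$1") ))
}

img="$(mktemp)"
make_sparse_image "$img" 1G
echo "apparent:  $(apparent_bytes "$img") bytes"
echo "allocated: $(allocated_bytes "$img") bytes"
rm -f "$img"
```

The catch is that sparseness is easily lost when a file is copied or transferred by tools that don't know about it, which is one reason the dedicated image formats are more robust.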

Conclusion


VMware Server 1.0.6 for Windows
running Linux as a guest (credit: Wikipedia)
An image is simply a disk represented as a file. Images are fairly straightforward to create and manipulate, using a small number of tools alongside those that would normally be used for handling hard disks. There is however a somewhat bewildering array of proprietary and open source image formats, and many ways of putting an operating system inside an image. This all makes moving images between systems something of a challenge. The one main standard in this area - OVF - sadly does little to help.



[1] Not to be confused with the network loopback interface that's present on most systems as 127.0.0.1.
[2] It is of course necessary to create the file first, usually by filling it full of empty space: 'dd if=/dev/zero of=my_file.img bs=1M count=1024' would make an empty 1G file.

Tuesday, June 11, 2013

Software Defined Networking and Network Function Virtualization

Software Defined Networking (SDN) has been a hot topic for a little while now, and remains of great interest as people try to figure out approaches to a Software Defined Data Center (SDDC).

Network Function Virtualization (NFV) is a less familiar term to many, but is a good label for some important developments where networking meets virtualization and cloud computing.
SDN is where an application programming interface (API) is used for configuration and management of a network.
NFV is where commodity hardware is used to run networking functions in software.

The Overlap
Since both SDN and NFV relate to networking and software there's quite an overlap between the two things, especially when it comes to talking about implementations:
Since NFV is implementing the network in software it would be somewhat ridiculous to have a good old command line interface (CLI) as the only means for configuration, but that doesn't mean it's not possible. Most NFV implementations come with an API (and quite often a web interface built on top of that API), but that's in no way mandatory.
Similarly Software Defined Networking doesn't have to mean software implemented networking. For sure there are a bunch of SDN implementations that have software moving packets around, but there are also plenty of SDN implementations that are hardware based.

Standards
The standard that everybody seems to be talking about in the SDN space is OpenFlow (to the extent that for some it seems like SDN==OpenFlow). OpenFlow has become by far the most common 'south bound' interface - between an SDN controller and a network forwarding plane.
Standards for NFV are yet to emerge, though there is some work going on at ETSI. To a certain extent there's less of a need for standards with NFV as it's a classic case of same wine, different bottles. The network functions are usually the same as they've ever been, and the protocols used are too. For large scale operators (like telcos) there might be some desire to standardise aspects of the hardware and any virtualization layers used and associated management/deployment tools, but as we've seen more generally in the world of virtualization and cloud platforms (lack of) portability between implementations hasn't been much of a barrier to adoption.

What's it Good For?
The advantages of SDN have been covered at length elsewhere. For me the key point is moving away from the interface to the network being a trouble ticket to the network operations team, which is a key prerequisite to data center automation at scale.
The advantages of NFV depend more upon perspective. For large network operators it's about moving from costly proprietary models to commodity implementations, which also has impact upon scale and agility (it's much easier to source, provision and deploy onto a bunch of regular servers or VMs).
For end users NFV is about the democratisation of networking. If it's now possible to deploy a router, firewall, VPN concentrator or protocol redistributor onto a cloud where it costs a couple of cents an hour then all kinds of things can be done that were previously too difficult or costly.
SDN is making networks easier to configure and manage, whilst NFV is making networks easier to deploy and scale. The two fit together quite naturally, as the key theme is software. This is especially true in the cloud, where you can't ask a service provider to install a lump of hardware for you.

This post originally appeared on Wired Innovation Insights.



Catch more from Chris Swan on SDN and NfV at CloudCamp on 26 June

The CohesiveFT UK team, including +Sam Mitchell and +Chris Purrington, are hosting the next CloudCamp London on 26 June. The theme is Network Virtualisation, SDN, NfV what's it all about? 
+Chris Swan will speak on "5 Years and 500 customers later -  a look at real world SDN overlays case studies"
Register here (free)

Friday, June 7, 2013

Image management - installation stages

In the previous post in this series I looked at closing the installation gap, and the tools that can be used to create a fully functioning virtual appliance. In this post I'm going to look in a little more detail at where installation can happen within a virtual appliance factory like Server3.

Images


The main feedstock for a virtual appliance factory is an image.

A 'base' image will usually be nothing more than a minimal installation of an operating system in order to minimise resource use and security vulnerability surface area. Stuff that isn't there doesn't take up disk space, doesn't consume network bandwidth (and time) on deployment, and can't be hacked.

There's no need to stick to base images though, and images may be imported that already contain additional functionality such as middleware or an application. This can be particularly useful when moving functionality from one place to another, whether that's a different virtualisation platform or cloud type, or simply different regions or accounts in the same cloud.

Packages and bundles


Packages and bundles are different levels of granularity for stuff that can be added to an image (with package being the term we use to define the smaller level of granularity - so packages go into bundles). Packages and bundles get the files to where they're needed for installation, but in most cases there's a little more work to be done - scripts need to run to move files around, update configuration metadata etc.

Putting a package or a bundle into an image is analogous to downloading something - it's there and ready to be setup, but the setup process hasn't run.

First boot


When an image is first launched (and becomes an instance) there's an opportunity for scripts to run, so at this stage any packages and bundles can turn from downloads into installed.

The problem with doing things at first boot is that running a bunch of scripts can be time consuming. This is especially problematic if the whole point of launching a new instance is to absorb a spike in load. What's needed is a way to push that effort back into the factory, and deliver an experience that's more like resuming a system from hibernation rather than booting up from cold.
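
A first boot script typically needs to make sure each setup step runs exactly once, however many times the instance is restarted. One common pattern is a marker file per step, sketched below. The names `run_once` and `STATE_DIR`, and the default directory, are assumptions for illustration and not anything specific to Server3.

```shell
#!/bin/sh
# Sketch only: run each setup step once, recording completion in a marker file.
STATE_DIR="${STATE_DIR:-/var/lib/firstboot}"   # hypothetical location

run_once() {
    name="$1"
    shift
    marker="$STATE_DIR/$name.done"
    mkdir -p "$STATE_DIR"
    if [ -e "$marker" ]; then
        # This step already completed on an earlier boot.
        echo "skip $name"
        return 0
    fi
    # Run the step, and only record it as done if it succeeded.
    "$@" && touch "$marker"
}

# Usage (the setup script path is made up):
#   run_once install-app /opt/packages/my_app/setup.sh
```

Every step that runs here adds to the time before the instance can do useful work, which is the cost described above.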

Inception


Inception is the name for the process where packages and bundles are installed within the virtual appliance factory, so that applications are ready to run as soon as a machine is started. Inception closes the gap between a base image + some stuff and a fully baked virtual appliance. It can also work with DevOps tools like Chef, so that the config (expressed as recipes or roles) doesn't have to be replicated into packages and bundles within the factory. In that case all that's needed is a package for the DevOps tool itself and a small lump of configuration metadata so that it can connect to the right place to retrieve roles/recipes etc.

Conclusion


There are essentially three places that installation can take place:

  1. Before the factory - by importing an image that already has stuff installed.
  2. After the factory - by running scripts on first boot.
  3. In the factory - by using the inception process to go through what would happen on first boot before a machine is ever deployed.
Options 1 and 2 each come with compromises in terms of flexibility and timeliness. Option 3, inception, offers a best of both worlds approach, particularly when paired up with DevOps tools.

