Relocating UK Public Sector to the Cloud

Introduction

A recent guidance paper published by The Commission for Smart Government urges the UK Government to take action towards transforming public services into intrinsically digital services. The Commission advises the government to move all services to the cloud by 2023.

It is clear from the paper that strong leadership and digital understanding amongst decision makers are incredibly important. This is something I noted when writing this post on defining a cloud strategy for public sector organisations. The cloud strategy should set out how technology supports and delivers the overall organisational goals.

If implemented correctly, cloud computing can maximise security and business benefits, automating and streamlining many tasks that are currently manual and slow. Published by the National Cyber Security Centre in November 2020, the Security Benefits of Good Cloud Service whitepaper provides some great pointers that should be incorporated into any cloud migration strategy.

This article discusses how to achieve a common cloud infrastructure, focusing on brownfield environments where local government, and other public sector organisations like the NHS, need to address some of the challenges below.

Common Challenges

  • IT is rarely seen as delivering value to end users, citizens, patients, and so on. Budgets are often being reduced while IT is asked to deliver more, faster. In general, people have higher demands of technology and digital services. Smart phones are now just called phones. Internet-era companies like Amazon, Google, and Netflix provide instant access to products, services, and content. Consumer expectations have shifted and the bar is raised for public services.
  • IT staff are under pressure to maintain infrastructure hardware and software. More vulnerabilities are being exposed, and more targeted cyber attacks launched, than ever before, which means constant security patching and fire-fighting. I’d like to add that it means more systems being architecturally reviewed and improved, but the reality is that most IT teams are still reacting. Running data centres comes with an incredible operational burden.
  • Understanding new technologies well enough to implement them confidently requires time and experience. There are more options than ever for infrastructure: on-premises, in the cloud, at the edge, and managed services such as Platform as a Service (PaaS) and Infrastructure as a Service (IaaS). Furthermore, applications are no longer just monolithic or 3-tier; they are becoming containerised, packaged, hybrid, and managed – Software as a Service (SaaS). IT teams are expected to maintain and securely join up all these different services whilst repurposing existing investments in supporting software and technical knowledge.
  • Business models are changing at pace, and successful organisations are able to react quickly and make use of data to predict and understand their customers and consumers. The emergence of smart cities and smart hospitals can improve public services and enable cost savings, but they need to be delivered on a strong digital foundation with fast, reliable connectivity. This approach requires joined-up systems that share a secure, scalable, and resilient platform. In an ideal world, applications and data should be abstracted from the underlying infrastructure in a way that allows them to securely move or be redeployed with the same policies and user experience, regardless of the hardware or provider. Legacy hardware and older systems are mostly disjointed, built in silos, with single points of failure and either non-existent or expensive business continuity models.
  • Innovation typically takes longer when the risk extends beyond monetary value. The ideas of agile development and fail-fast experimentation will naturally be challenged more for public facing services. A 999 operator locating a specialist hospital for an ambulance response unit cannot afford unpredictability or instability because developers and engineers were failing-fast. Neither can a family dependent on a welfare payment system. In environments where services are stable and reliable there is less appetite for change, even when other areas of the organisation are crying out for fast and flexible delivery.

Cloud Migration Strategies

Greater economic and technical benefits can be achieved at scale. Hyperscalers have access to cheaper commodity hardware and renewable energy sources. They are able to invest more in physical security and auditing. Infrastructure operations that are stood up and duplicated thousands of times over across the UK by individual public sector organisations can shift to the utility-based model of the cloud, freeing up IT staff from fire-fighting to focus on delivering quality digital services at speed.

There are seven widely accepted cloud migration strategies, known as the 7 R’s. These are listed below with a particular focus on relocate. Whilst a brand new startup might go straight into a cloud-native architecture by deploying applications through micro-services, those with existing customers and users have additional considerations. Migrating to the cloud will in most cases use more than one of the options below. Implementing the correct migration strategy for existing environments, alongside new cloud-native services, can reduce the desire for people to use shadow IT. Finding the right balance is about understanding the trade-off between risk, cost, time, and the core organisational drivers mentioned earlier.

  1. Retire. No longer needed – shut it down. Don’t know what it is – shut it down. This is a very real option for infrastructure teams hosting large numbers of Virtual Machines. VM sprawl that has built up over the years could surprise you.
  2. Retain. Leaving on-premises. This doesn’t necessarily mean doing nothing. For the most part your existing applications should run in the cloud. The requirement for applications that need to be closer to the action has driven edge computing forward. Hardware advancements in areas like Hyper-Converged Infrastructure (HCI) enable high performance computing in small single-socket footprints, or hardware that withstands higher operating temperatures for locations away from data centre cooling. The key is to maintain that common underlying infrastructure, enabling service deployment in the cloud or at the edge with consistent operations and technologies.
  3. Repurchase. For example changing an on-premises and self-maintained application to a SaaS alternative. This could be the same product in SaaS form, or a competitor. The main technical consideration now becomes connectivity and how the application is accessed. Focus is generally shifted away from the overall architecture of the application itself, and more into transitioning or onboarding users and importing data.
  4. Rehost. Changing a Virtual Machine to run on a different hypervisor. This could be a VMware or Hyper-V VM, converted to run on a cloud provider’s hypervisor as a particular instance type. This can be relatively straightforward for small numbers of Virtual Machines, but consider other dependencies that will need building out, such as networking, security, load balancing, backups, and Disaster Recovery. Although not huge, this potential change in architecture adds more time, complexity, and risk as the size of the environment grows.
  5. Replatform. Tweaking elements of an application to run as a cloud service. This is often shifting from self-hosted to managed services, such as migrating a database from a VM with an Operating System to a managed database service. Replatform is a common approach for like-for-like infrastructure services like databases and storage.
  6. Refactor. The big bang. Rearchitecting an entire application to run as a cloud-native app. This normally means rewriting source code from scratch using a micro-services architecture or serverless / function based deployment. Infrastructure is deployed and maintained as code and can be stateless and portable. A desirable end state for modern applications.
  7. Relocate. Moves applications and Virtual Machines to a hyperscaler / cloud provider without changing network settings, dependencies, or underlying VM file format and hypervisor. This results in a seamless transition without business disruption.

Why Relocate Virtual Machines?

Relocating Virtual Machines is a great ‘lift-and-shift’ method for moving applications into the cloud. To get the most value out of this migration strategy it can be combined with one or more of the other approaches: generally replatforming some of the larger infrastructure components like database and file storage, or refactoring a certain part of an application – a component that is problematic, one that will provide a commercial or functional benefit, or one that improves the end user experience. By auditing the whole infrastructure and applying this blueprint we can strike the right balance between moving to the cloud and protecting existing services.

For existing VMware customers, VMware workloads can be moved to AWS (VMware Cloud on AWS), Azure (Azure VMware Solution), Google Cloud (Google Cloud VMware Engine), as well as IBM Cloud, Oracle Cloud, and UK based VMware Cloud Provider Partners without changing the workload format or network settings. This provides the following benefits:

  • Standardised software stack – A Software-Defined Data Centre (SDDC) that can be deployed across commodity hardware in public and private clouds or at the edge, creating a common software-based cloud infrastructure.
  • Complete managed service – The hardware and software stack is managed from the infrastructure down, removing the operational overhead of patching, maintenance, troubleshooting, and failure remediation. Data centre tasks become automated workflows, allowing for on-demand scaling of compute and storage.
  • Operational continuity – Retain skills and investment for managing applications and supporting software (backups, monitoring, security, etc.). This allows solution replacement and application refactoring to take place at a gradual pace, for example when contracts expire, and with lower risk.
  • Full data control – Everything from the Virtual Machine up is managed by the customer: security policies, data location (UK), VM and application configuration, providing the best of both worlds. Cloud security guardrails can be implemented to standardise and enforce policies and prevent insecure configurations. These same policies can extend into native cloud services and across different cloud providers using CloudHealth Secure State.
  • Sensible transformation – Although a longer term switch from capex investment to opex expenditure is required, due to the on-demand subscription-based nature of many cloud services, dedicated hardware lease arrangements in solutions like those listed above can potentially be billed as capital costs. This gives finance teams time to adapt and change, along with the wider business culture and processes.
  • Hybrid applications – Running applications that make use of native cloud services in conjunction with existing components, such as Virtual Machines and containers, supports a gradual refactoring process and de-risks the overall project.
Azure VMware Solution Basic Architecture
Example application migration and modernisation using Azure VMware Solution

To read more about the information available from the Government Digital Service and other UK sources see Helping Public Sector Organisations Define Cloud Strategy.

If you’re interested in seeing VMware workloads relocated to public cloud check out The Complete Guide to VMware Hybrid Cloud.

Featured image by Scott Webb on Unsplash

Bridging the Gap Between NHS and Public Cloud with VMware Cloud on AWS

Following on from How VMware is Accelerating NHS Cloud Adoption, this post dives into more detail around how the UK National Health Service (NHS) can use VMware Cloud on AWS to bridge the gap between existing investments and Public Cloud.

Part 1: How VMware is Accelerating NHS Cloud Adoption

Part 2: Bridging the Gap Between NHS and Public Cloud with VMware Cloud on AWS

Example NHS VMware Cloud on AWS Use Cases

Modern Applications: The VMware strategy of late has seen a significant shift towards cloud-agnostic software and the integration of cloud-native application development. VMware Cloud on AWS makes use of the full VMware Software-Defined Data Centre (SDDC) stack; enhancing the security of NHS applications with micro-segmentation, and future-proofing application development with Project Pacific (Understand VMware Tanzu, Pacific, and Kubernetes for VMware Administrators).

Data Centre Expansion or Disaster Recovery: VMware Cloud on AWS can reduce the NHS data centre footprint on-premises, by expanding new capacity into VMware Cloud on AWS (Deploy and Configure VMware Cloud on AWS), or through the addition of a Disaster Recovery (DR) site accompanied by VMware Site Recovery Manager (SRM).

Legacy Data Centre Evacuation: VMware Cloud on AWS can replace legacy data centres by facilitating the migration of VMware Virtual Machines (VMs) from end of life hardware to VMware Cloud on AWS (Migrate VMware Virtual Machines to VMware Cloud on AWS). In some cases, dependent on internal finance policies, NHS organisations may be able to capitalise the cost of reserved instances (dedicated physical hosts for 1 or 3 years) in VMware Cloud on AWS using the recently introduced IFRS 16 Leases standard. For more information, review the Capitalising Your Cloud Booklet.

Hosting NHS Patient Data: There are several security principles which should be implemented to host patient or sensitive data; further information is available on the NHS Digital website. Important detail on the shared security model of Public Cloud, and other NHS, VMware, and AWS specific links, can be found in the How VMware is Accelerating NHS Cloud Adoption article, as well as VMware Cloud on AWS Security One Stop Shop. A summary excerpt is below:

“In January 2018 NHS Digital released guidance for NHS and social care data: off-shoring and the use of public cloud services, along with a toolset for identifying and assessing data risk classification. The NHS and social care data: off-shoring and the use of public cloud services guidance paper published by NHS Digital states; ‘NHS and social care organisations can safely put health and care data, including non-personal data and confidential patient information, into the public cloud’. The NHS and social care providers may use cloud computing services for NHS data, providing it is hosted in the UK, or European Economic Area (EEA), or in the US where covered by Privacy Shield.”

“Each individual data controller organisation is responsible for implementing and reviewing their own processes around data risk classifications, however to assist NHS Digital have provided a consistent health and social care data risk model. For organisations that do not yet have cloud governance in place NHS Digital have also provided guidance on the health and social care cloud risk framework.

Cloud services introduce a shared security model. NHS organisations can be compliant by implementing a cloud risk framework and proportionate controls outlined by NHS Digital; summarised in the health and social care cloud security one page overview. Security considerations for different data classifications are detailed in the health and social care cloud security – good practice guide.”

The deployment of any native AWS services should follow best practices outlined in the Security Pillar White Paper of the AWS Well-Architected Framework. VMware Cloud on AWS can make up part of a more comprehensive cloud framework, read more about multi-account and VPC management at Building AWS Environments for VMware Cloud Customers.

Moving to Internet First: As well as the Cloud First strategy outlined in the article referenced above, the UK Government also seeks to make public sector applications, systems, and services accessible over the Internet, with the Internet First strategy. VMware Cloud on AWS can utilise existing on-premise Health and Social Care Network (HSCN) connections, but can also offer the ideal opportunity to move services to Internet-facing. This can be supported with the correct network design, and through making use of native AWS services. There is more information below on how VMware Cloud on AWS complements Internet First, and further reading on the NHS Digital Internet First policy can be found here.

“Health and care services now have an Internet First policy that states new digital services should operate over the internet. Existing services should also be updated to do the same at the earliest opportunity and ideally by March 2021.”

Example Native AWS Service Integrations

In the example architecture below a Stretched Cluster has been deployed across 2 AWS Availability Zones in the London region (eu-west-2), providing VMware Virtual Machine (VM) availability across sites and fault domains. AWS Direct Connect provides a private link from on-premises networks and should be deployed with resilience; a standby Virtual Private Network (VPN) encrypted connection can also be used. To see these features in action review Watch VMware vSphere HA Recover Virtual Machines Across AWS Availability Zones, and Watch a Failover from Direct Connect to Backup VPN for VMware Cloud on AWS. Optional access to the Health and Social Care Network (HSCN) is provided by the existing on-premises HSCN connection.

[Figure: Example VMware Cloud on AWS architecture]

Focusing on the VMware Cloud on AWS connectivity into native AWS services from the example architecture, we can note the following:

  • Connectivity to native AWS services is provided using Elastic Network Interfaces (ENI), a 25Gbps link into Amazon’s backbone network.
  • Traffic traversing the ENI (ingress and egress) is not chargeable. Any deployed services in AWS are chargeable as usual against the connected AWS account.
  • Using a Virtual Private Cloud (VPC) Endpoint, NHS organisations can make use of additional services such as Simple Storage Service (S3), which offers a tiered approach to object storage and pricing, or S3 Glacier for data archiving (a minimal provisioning sketch follows this list).
  • Using the Virtual Private Cloud (VPC) router NHS organisations can make use of services such as Elastic Compute Cloud (EC2), or managed databases with Relational Database Service (RDS).
  • See AWS Native Services Integration With VMware Cloud on AWS to understand more about VMware Cloud integration with native AWS.
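To make the VPC Endpoint point above concrete, here is a minimal, hypothetical boto3 sketch that provisions an S3 gateway endpoint in the connected VPC so that workloads in the SDDC can reach S3 over the ENI. The region reflects the London example; the VPC and route table IDs are placeholders for your own environment.

```python
# Hypothetical sketch: create an S3 gateway endpoint in the VPC connected to
# the VMware Cloud on AWS SDDC, so VMs can reach S3 privately over the ENI.
# The VPC ID and route table ID are placeholders for your environment.
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-2")  # London region

response = ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",
    VpcId="vpc-0123456789abcdef0",             # connected AWS account VPC
    ServiceName="com.amazonaws.eu-west-2.s3",   # S3 in eu-west-2
    RouteTableIds=["rtb-0123456789abcdef0"],    # route table for the ENI subnet
)

print(response["VpcEndpoint"]["VpcEndpointId"])
```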

An example scenario could be an on-premises application with a large database that does not have the development resource or funding to refactor for native Public Cloud. It could also be that refactoring this application doesn’t offer any additional business benefit or functionality. In this case, the database could be migrated to RDS, and the front end web/application servers could be migrated ‘as is’ to run on VMware Cloud on AWS. Using the 25Gbps ENI would, in most cases, remove any latency concerns between the application and the database.
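As a hedged illustration of that scenario, the sketch below provisions the managed database with Amazon RDS in the connected VPC; the engine, sizing, subnet group, and security group are assumptions to adapt to your own application.

```python
# Hypothetical sketch: create the managed database a replatformed application
# would use, placed in the VPC connected to the SDDC so the web/application
# VMs in VMware Cloud on AWS can reach it over the ENI. Values are examples.
import boto3

rds = boto3.client("rds", region_name="eu-west-2")

rds.create_db_instance(
    DBInstanceIdentifier="app-db",
    Engine="postgres",                       # assumed engine
    DBInstanceClass="db.m5.large",
    AllocatedStorage=200,                    # GiB
    MasterUsername="appadmin",
    MasterUserPassword="change-me-please",   # use Secrets Manager in practice
    DBSubnetGroupName="vmc-connected-subnets",
    VpcSecurityGroupIds=["sg-0123456789abcdef0"],
    MultiAZ=True,                            # resilience across Availability Zones
)
```

The front end servers in the SDDC then point at the RDS endpoint DNS name instead of the old self-hosted database server.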

It is important to remember that it isn’t only the consumption of traditional infrastructure services that is on offer. Opening up existing workloads to native AWS services drives innovation and modernisation of applications. One example is Amazon’s Artificial Intelligence (AI) powered voice assistant Alexa, which now gives health advice using information from the NHS website. In addition to AI and Machine Learning, AWS has a portfolio of data lakes and analytics services, enabling cost-effective methods for NHS organisations to collect, store, analyse, and share data.

[Figure: Example native AWS service integrations with VMware Cloud on AWS]

In the case of Internet First, VMware Cloud on AWS, in conjunction with native AWS, can help scale and consolidate publicly available applications, as documented in VMware Cloud on AWS Reference Architectures. In one such example, the following AWS services are used to facilitate public services hosted in VMware Cloud on AWS (a hypothetical provisioning sketch follows the lists below):

  • Amazon Route 53 is a highly available and scalable cloud Domain Name System (DNS) web service for name resolution.
  • Elastic Load Balancing automatically distributes incoming application traffic across multiple targets. The Application Load Balancer is best suited for load balancing of HTTP and HTTPS traffic operating at the individual request level (Layer 7).
  • AWS Certificate Manager is a service that lets you easily provision, manage, and deploy public and private SSL/TLS certificates for use with AWS services and your internal connected resources.

Additional optional services for performance and security:

  • Amazon CloudFront is a fast Content Delivery Network (CDN) service that securely delivers data, videos, applications, and APIs to customers with low latency, high transfer speeds.
  • AWS Shield is a managed Distributed Denial of Service (DDoS) protection service that safeguards applications running on AWS.
  • AWS WAF is a Web Application Firewall that helps protect your web applications from common web exploits that could affect application availability or compromise security.
  • AWS CloudTrail is a service that enables governance, compliance, operational auditing, and risk auditing of your AWS account.
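The sketch below shows how this pattern might be wired up with boto3: an Application Load Balancer with IP targets pointing at the private addresses of web server VMs in the SDDC, plus a Route 53 alias record. All IDs, addresses, certificate ARNs, and domain names are placeholders, and the WAF, Shield, and CloudFront configuration is omitted for brevity.

```python
# Hypothetical sketch: publish a web service running on VMs in VMware Cloud
# on AWS behind an internet-facing Application Load Balancer, then point a
# DNS record at it. All identifiers and addresses are placeholders.
import boto3

elbv2 = boto3.client("elbv2", region_name="eu-west-2")
route53 = boto3.client("route53")

# Target group with IP targets - the private addresses of the SDDC web VMs,
# reachable from the connected VPC over the ENI.
tg = elbv2.create_target_group(
    Name="vmc-web-servers",
    Protocol="HTTPS",
    Port=443,
    VpcId="vpc-0123456789abcdef0",
    TargetType="ip",
)["TargetGroups"][0]

elbv2.register_targets(
    TargetGroupArn=tg["TargetGroupArn"],
    Targets=[{"Id": "192.168.10.11", "Port": 443},
             {"Id": "192.168.10.12", "Port": 443}],
)

# Internet-facing Application Load Balancer in public subnets
alb = elbv2.create_load_balancer(
    Name="vmc-public-alb",
    Subnets=["subnet-0aaa1111", "subnet-0bbb2222"],
    SecurityGroups=["sg-0123456789abcdef0"],
    Scheme="internet-facing",
    Type="application",
)["LoadBalancers"][0]

# HTTPS listener using a certificate issued by AWS Certificate Manager
elbv2.create_listener(
    LoadBalancerArn=alb["LoadBalancerArn"],
    Protocol="HTTPS",
    Port=443,
    Certificates=[{"CertificateArn": "arn:aws:acm:eu-west-2:111122223333:certificate/example"}],
    DefaultActions=[{"Type": "forward", "TargetGroupArn": tg["TargetGroupArn"]}],
)

# Route 53 alias record resolving the public name to the load balancer
route53.change_resource_record_sets(
    HostedZoneId="Z0123456789EXAMPLE",
    ChangeBatch={"Changes": [{
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": "service.example.org",
            "Type": "A",
            "AliasTarget": {
                "HostedZoneId": alb["CanonicalHostedZoneId"],
                "DNSName": alb["DNSName"],
                "EvaluateTargetHealth": True,
            },
        },
    }]},
)
```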

[Figure: Publishing services hosted in VMware Cloud on AWS with Elastic Load Balancing]

Further Reading: How to Deploy and Configure VMware Cloud on AWS (Part 1), How to Migrate VMware Virtual Machines to VMware Cloud on AWS (Part 2).

VMware Cloud on AWS FAQs | Resources | Documentation | Factbook | Evaluation Guide | On-Boarding Handbook | Operating Principles

How to Migrate VMware Virtual Machines to VMware Cloud on AWS

This post pulls together the workload migration planning and lessons learned notes made during a real-life customer use case of evacuating an on-premises data centre to VMware Cloud (VMC) on AWS (Amazon Web Services). The content is a work in progress and intended as a generic list of considerations and useful links for both VMware and AWS; it is not a comprehensive guide. Cloud, more so than traditional infrastructure, is continuously changing. Features are implemented regularly and transparently, so always validate against official documentation. This post was last updated on September 16th 2019.

Part 1: SDDC Deployment

Part 2: Migration Planning & Lessons Learned

See Also: VMware Cloud on AWS Security One Stop Shop

1. Virtual Machine Migrations

The following points should help with the planning of Virtual Machine (VM) workload migrations to VMware Cloud on AWS. An assumption is made that the Software-Defined Data Centre (SDDC) is stood up and operational with monitoring, backups, Anti-Virus, etc. in place. Review Part 1: SDDC Deployment for more information. I found that deploying the SDDC and getting the environment available was the easy part. Internal processes and the complexity of the existing environment are going to determine how quickly you can migrate workloads to the SDDC.

We started by exporting a list of Virtual Machines from each vCenter; from that we identified the service each was running and the service owner or a business owner. The biggest surprise here was the number of servers deployed by, or for, people who had left the organisation. These servers were still being hosted, maintained, and patched, but were no longer needed. We were able to decommission more workloads than expected due to years of VM sprawl. While VMware Cloud on AWS isn’t directly responsible for this, the project forced us to evaluate each server we hosted. For the remaining workloads, we put together a migration flow which identified the following criteria:

  • CPU, RAM, storage requirements: we specified a baseline for automatic acceptance; anything above the baseline required a manual check (see the sketch after this list).
  • Network dependencies: is there a large amount of data in transit, is IP retention required, is the VLAN stretched using Hybrid Cloud Extension (HCX), and are there load balancer requirements?
  • Data flows: used vRealize Network Insight to identify potential egress costs and additional service dependencies.
  • Additional application or organisation specific considerations: e.g. data classification, tagging / charge-back model, backups, security, monitoring, DNS, authentication, licensing or support.
  • Service Management considerations: is the service platinum/gold/silver/bronze or unclassified, do the platform Service Level Agreements (SLAs) fulfil the existing SLAs in place for each service, is the proposed migration type (i.e. the amount of downtime) taking this into consideration. Involving Service Management right from the start was useful as they were able to advise on internal processes for Service Acceptance and Business Continuity.
  • Service Owner considerations: if the technical criteria above are met then the next step was to meet with service owners and get their buy-in for the migration. We migrated internal services we owned first and then used that as a success story to onboard other services. This process involved meeting with various departments, presenting the solution and the benefits over their existing hosting, in our case DR and performance improvements, and migrating dev or test workloads first to build confidence.
  • Migration passport: one of our Senior Engineers came up with this concept as a one-pager for each service that was migrated; it consisted of migration details (change ID, date, status), migration scope (server names, locations, and notes), firewall rules, vRNI outputs, and other information such as associated documentation.
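As referenced in the first criterion above, the baseline check lends itself to simple automation. The snippet below is a minimal, hypothetical sketch that reads a CSV exported from vCenter and flags anything above the agreed thresholds for manual review; the column names and limits are assumptions to adjust for your own export and baseline.

```python
# Hypothetical sketch: flag VMs that exceed an agreed migration baseline so
# they receive a manual check. Assumes a CSV exported from vCenter with the
# columns "Name", "CPUs", "MemoryGB", and "ProvisionedGB".
import csv

BASELINE = {"CPUs": 8, "MemoryGB": 32, "ProvisionedGB": 1000}  # example limits

def needs_manual_check(vm: dict) -> bool:
    """Return True if any resource figure is above the baseline."""
    return any(float(vm[key]) > limit for key, limit in BASELINE.items())

with open("vcenter-vm-export.csv", newline="") as handle:
    for vm in csv.DictReader(handle):
        status = "manual review" if needs_manual_check(vm) else "auto-accept"
        print(f"{vm['Name']}: {status}")
```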

Each environment is different, so these are provided as example considerations only. Use resources such as those outlined below to develop your own migration strategy.

[Figure: Workload mobility options for VMware Cloud on AWS]

2. Network Design

  • Updated Feb 2020 – see also AWS Native Services Integration With VMware Cloud on AWS
  • Research the differences and limitations around the different VMware on AWS connection types, especially under 1Gbps – Configuring AWS Direct Connect with VMware Cloud on AWS
    • Make sure you understand the terminology around a Virtual Interface (VIF) and the difference between a Standard VIF, Hosted VIF, and Hosted Connection: What’s the difference between a hosted virtual interface (VIF) and a hosted connection? It is important to consider that VMware Cloud on AWS requires a dedicated Virtual Interface (VIF) – or a pair of VIFs for resilience. If you have a standard 1Gbps or 10Gbps connection direct from Amazon then you can create and allocate VIFs for this purpose. If you are using a hosted connection from an Amazon Partner Network (APN) for sub-1G connectivity then you may need to procure additional VIFs, or a dedicated Direct Connect with the ability to have multiple VIFs on a single circuit. This is a discussion you should have with your APN partner.

  • The Virtual Private Cloud (VPC) provided by the shadow AWS account cannot be used as a transit VPC. In other words, if you want to connect to private IP addressing of native AWS services, you cannot hop via VMware Cloud. In this instance, a Transit Gateway can be used.
  • At the time of writing a VPN attachment must be created to connect the SDDC to a Transit Gateway; if Direct Connect is in use, then the minimum requirement is 1Gbps (a sketch of the VPN attachment is included after this list).
  • If there is a requirement to connect multiple existing AWS VPCs, or multiple SDDCs, with on-premise networks then definitely check out VMware Cloud on AWS with Transit Gateway Demo.
  • If a backup VPN is in use, then you may be able to reduce failover time using Bidirectional Forwarding Detection (BFD), which is automatically enabled by AWS; in our case, it was not supported by our third-party provider.
  • Use vRealize Network Insight to get an idea of dependencies and data flows that you can use to plan firewall rules and estimate egress or cross-AZ charges. In general, my experience is that these charges have been minimal; this depends entirely on your own environment but should be considered when calculating overall VMware on AWS pricing.
  • If you want to update your default route see How to Set the Default Route in VMware Cloud on AWS: Part 1 & Part 2.
  • VMware Cloud on AWS: NSX Networking and Security eBook
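To illustrate the Transit Gateway approach mentioned above, here is a minimal, hypothetical boto3 sketch that creates a Transit Gateway, a customer gateway representing the SDDC's public VPN endpoint, and the VPN attachment between them. The ASNs and IP address are placeholders, and the SDDC end of the IPsec tunnel is configured separately in the VMC console.

```python
# Hypothetical sketch: attach a VMware Cloud on AWS SDDC to a Transit Gateway
# using a VPN attachment. The SDDC side of the IPsec tunnel is configured in
# the VMC console; all values below are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-2")

tgw = ec2.create_transit_gateway(
    Description="Hub for SDDC, native VPCs, and on-premises networks",
    Options={"AmazonSideAsn": 64512},
)["TransitGateway"]
# In practice, wait for the Transit Gateway to reach the 'available' state.

# Customer gateway representing the SDDC's public VPN endpoint
cgw = ec2.create_customer_gateway(
    BgpAsn=65000,                    # SDDC side ASN (placeholder)
    PublicIp="203.0.113.10",         # SDDC public VPN IP (placeholder)
    Type="ipsec.1",
)["CustomerGateway"]

# VPN attachment from the Transit Gateway to the SDDC
vpn = ec2.create_vpn_connection(
    CustomerGatewayId=cgw["CustomerGatewayId"],
    TransitGatewayId=tgw["TransitGatewayId"],
    Type="ipsec.1",
    Options={"StaticRoutesOnly": False},  # use BGP over the tunnel
)["VpnConnection"]

print(tgw["TransitGatewayId"], vpn["VpnConnectionId"])
```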

3. Load Balancing & Security

  • Updated Feb 2020 – see also VMware Cloud on AWS Security One Stop Shop
  • With the acquisition of Avi Networks, we can expect Avi Networks services as a paid add-on for VMware Cloud: VMware Cloud on AWS: NSX and Avi Networks Load Balancing and Security.
  • Third-party load balancers such as virtual F5 can be deployed in a virtual appliance format. If you are planning on using AWS Elastic Load Balancer (ELB) on a private IP address accessible on-premise ensure you have a connectivity method as outlined above.
  • The NSX Distributed Firewall (DFW) feature is included in the price of VMware Cloud; the paid-for message is removed from SDDC v1.8 onwards, as announced at VMworld 2019.
  • Another VMworld 2019 announcement was the inclusion of syslog forwarding with the free version of VMware Cloud Log Intelligence (SaaS offering for log analytics). However, for troubleshooting NSX DFW logs you still need the paid-for version.
  • If you are using HCX, note that it uses its own IPsec tunnel, and therefore we could not get it working with the private IP address over a backup VPN. It was assumed that HCX would also not work with the private IP address via Transit Gateway either, due to the SDDC VPN requirement, and would need to be reconfigured to use the public IP address.
  • Another HCX migration consideration is that when you are stretching a network, all traffic goes via the HCX Interconnects. This means you are encapsulating everything in port UDP 4500, and essentially bypassing your on-premise firewall rules while the network is stretched. It is essential to double-check all rules are correct before eventually moving the gateway to VMC.
  • Again, if you are doing VMware HCX migrations, remember to remove stretched networks once complete. This involves shutting down the gateway on-premises, removing the L2 stretch, and changing the network in the SDDC to routed; for us the downtime was around 30 seconds. The HCX appliances in our environment, although covered by vSphere High Availability (HA), didn’t have resilience built in; therefore we decided to minimise the amount of time they were in use by planning a migration strategy around each subnet.
  • If you use NSX Service Deployments for Anti-Virus, i.e. Guest Introspection for agentless AV, then you will need to deploy an agent on each VM, as this feature is still currently unavailable.

4. General

  • The Cloud Services Portal (CSP) can be integrated with enterprise federation, allowing you to control access using your organisational policies – ideally enforcing Multi-Factor Authentication (MFA) and removing access as part of a leavers’ process. Federation will only work with a tenant organisation; it will not work with a master organisation.
  • It is not possible at the time of writing to easily transfer an SDDC deployed in the root/master organisation into a tenant. The process currently is a redeploy and migrate.
  • Druva offers a product that will back up Virtual Machines from VMware Public Cloud direct into an S3 bucket they manage. For a greenfield deployment, if you are not transferring any existing licenses, this could be a good option as you only pay for the capacity you use. Having a backup environment set up in AWS has many benefits but also adds a management overhead and the consideration of replicating between Availability Zones.
  • In general, internal support was good once teams were educated on the platform and the slightly different operating model we were implementing. In terms of external support, we have not encountered any compatibility issues yet. There was one application vendor with a published KB article stating they supported running the application on VMware Cloud on AWS, who then retracted support, stating the vSphere version being run was not GA.