Over the past 12 months we have seen further growth within the cloud, as many organisations scale or create new digital services in response to the coronavirus pandemic. Improved speed and agility have allowed businesses to pivot where traditional siloed infrastructure may have caused them to stall.
As the usage of cloud services expands, standardising and consolidating cloud tooling becomes important for financial management, operational governance, and security and compliance. Visibility into distributed system architectures across many accounts or subscriptions, or even multi-cloud, is another key challenge. For some customers cloud workloads are not optimised or configured to best standards; many will spend more than their anticipated budget, and others may accidentally expose data or services.
Those with an established cloud strategy may decide to implement a Cloud Centre of Excellence (CCoE), responsible for cloud operations, security, and financial management. The CCoE will navigate the security and configuration landscape of cloud assets, automating response and remediation to configuration drift or threats. As the team grows in maturity, optimisations are made continuously and automatically, in line with the key drivers of the business. This is where CloudHealth comes in.
CloudHealth by VMware is a multi-cloud SaaS solution managing more than $11B of public cloud spend for over 10,000 customers. CloudHealth accelerates business transformation in the cloud by providing a single platform solution for visibility into AWS, Microsoft Azure, Google Cloud Platform, Oracle Cloud Infrastructure, VMware Cloud on AWS, and on-premises VMware based environments. The key functionality is broken down into the 2 products we’ll look at below.
CloudHealth Multicloud Platform
CloudHealth takes data from cloud platforms, data centres, and third party tools for application, security, and configuration management. Data is ingested and aggregated using CloudHealth’s integrated data layer, which performs analysis on usage, performance, cost, and security posture. CloudHealth becomes a single source for multi-cloud management across environments, strengthening security and compliance, consolidating management, and improving collaboration between previously siloed teams of people and tools.
Data and assets can be categorised by tags or other metadata, and viewed in logical business groups known as perspectives. Perspectives provide a breakdown for cost allocation using dynamic groups such as line of business, department, cost centre, or project. The output can be used to identify trends and build dashboards and reports. This approach simplifies financial management, saves time, aids with budgeting and forecasting, and encourages accountability through accurate chargeback or showback.
CloudHealth Cost Dashboard
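To make the idea of tag-based cost allocation concrete, here is a minimal sketch of grouping billing line items by a tag value, in the spirit of the perspectives described above. This is not CloudHealth's API or data model; the record layout, the `allocate_costs` function, and the sample values are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical billing line items; the field names are assumptions for this sketch.
LINE_ITEMS = [
    {"resource": "i-001", "cost": 120.0, "tags": {"department": "finance"}},
    {"resource": "i-002", "cost": 80.0, "tags": {"department": "finance"}},
    {"resource": "vol-003", "cost": 45.5, "tags": {"department": "housing"}},
    {"resource": "eip-004", "cost": 3.5, "tags": {}},
]


def allocate_costs(line_items, tag_key, unallocated="unallocated"):
    """Sum cost per value of tag_key; items missing the tag fall into a catch-all group."""
    totals = defaultdict(float)
    for item in line_items:
        group = item["tags"].get(tag_key, unallocated)
        totals[group] += item["cost"]
    return dict(totals)


if __name__ == "__main__":
    # Print a simple showback summary, one line per group.
    for group, cost in sorted(allocate_costs(LINE_ITEMS, "department").items()):
        print(f"{group}: {cost:.2f}")
```

Keeping an explicit catch-all group is a deliberate choice here: spend that cannot be attributed to a department stays visible rather than silently disappearing from the chargeback or showback report.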
Whilst visibility is great, to really have a positive impact on operations we need to know what to do with the data collected. CloudHealth presents back cost optimisation recommendations and security risks, but can also carry out remediation actions automatically.
Cost optimisation is where you can save money. Using AWS as an example, savings can come from things like EC2 instances that are oversized or on an inefficient purchase plan, elastic IP addresses or EBS volumes that are not attached to any resources, and snapshots that have not been deleted. In the physical on-premises world all of these issues were common as part of VM sprawl; they impacted capacity planning and resource consumption but were mostly hidden or swallowed as part of the wider infrastructure cost. As organisations shift from large capital investments to ongoing revenue and consumption based pricing, oversized or unused resources literally convert to money going out of the door every single month.
CloudHealth Health Check
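The kinds of checks described above can be sketched in a few lines. The example below is an illustration only, not how CloudHealth works internally: it runs over hypothetical inventory records (the field names, the `find_waste` function, and the 10% CPU threshold are all assumptions) and flags unattached volumes, unassociated IP addresses, and instances that look oversized.

```python
# Hypothetical inventory records; field names are assumptions for this sketch.
VOLUMES = [
    {"id": "vol-1", "attached_to": "i-1"},
    {"id": "vol-2", "attached_to": None},
]
ADDRESSES = [{"id": "eip-1", "associated_with": None}]
INSTANCES = [
    {"id": "i-1", "avg_cpu_percent": 4.2},
    {"id": "i-2", "avg_cpu_percent": 63.0},
]


def find_waste(volumes, addresses, instances, cpu_threshold=10.0):
    """Return (finding_type, resource_id) tuples for likely wasted spend.

    cpu_threshold is an assumed cut-off for illustration, not a recommended value.
    """
    findings = []
    for volume in volumes:
        if volume["attached_to"] is None:
            findings.append(("unattached_volume", volume["id"]))
    for address in addresses:
        if address["associated_with"] is None:
            findings.append(("unassociated_ip", address["id"]))
    for instance in instances:
        if instance["avg_cpu_percent"] < cpu_threshold:
            findings.append(("oversized_candidate", instance["id"]))
    return findings


if __name__ == "__main__":
    for finding_type, resource_id in find_waste(VOLUMES, ADDRESSES, INSTANCES):
        print(f"{finding_type}: {resource_id}")
```

In practice average CPU alone is a crude signal for rightsizing, so treat the last check as a prompt for investigation rather than an instruction to resize.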
Recommendations and actions are where CloudHealth carries out remediation for incorrectly configured or under-utilised resources. Policies can also be used to define desired states and ensure operational compliance. For example, an organisation may want to report on untagged resources, connected accounts, or open ports. The number of available actions currently appears to only cover AWS and Azure, but with support recently added for Oracle Cloud Infrastructure, and Google Cloud Platform before that, hopefully this functionality will continue to be built out.
CloudHealth Remediation Actions
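As a rough illustration of a desired-state policy such as reporting on untagged resources, the sketch below checks each resource against a set of required tags. The required tag names, the resource records, and the `untagged_report` function are hypothetical and are not taken from CloudHealth.

```python
# Assumed tagging standard for this sketch; a real policy would define its own keys.
REQUIRED_TAGS = frozenset({"owner", "cost-centre"})

# Hypothetical resources; the third has no tags at all.
RESOURCES = [
    {"id": "vm-1", "tags": {"owner": "team-a", "cost-centre": "1001"}},
    {"id": "vm-2", "tags": {"owner": "team-b"}},
    {"id": "bucket-3"},
]


def missing_tags(resource, required=REQUIRED_TAGS):
    """Return the required tag keys that the resource does not carry, sorted by name."""
    return sorted(required - set(resource.get("tags", {})))


def untagged_report(resources, required=REQUIRED_TAGS):
    """Map each non-compliant resource id to the list of tags it is missing."""
    report = {}
    for resource in resources:
        missing = missing_tags(resource, required)
        if missing:
            report[resource["id"]] = missing
    return report


if __name__ == "__main__":
    for resource_id, missing in untagged_report(RESOURCES).items():
        print(f"{resource_id} is missing: {', '.join(missing)}")
```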
At the time of writing CloudHealth is priced based on cloud spend, and can be purchased as a 1, 2, or 3 year prepaid commitment, or with variable pricing based on the previous month’s cloud spend. A free trial is available to uncover ROI in your own environment from CloudHealth here.
Where VMware environments are in use with vRealize Operations, the CloudHealth management pack for vRealize Operations can be installed. Bringing CloudHealth dashboards and perspectives into vROps allows IT ops teams to track on-premises infrastructure and public cloud costs from a single interface. The CloudHealth management pack for vROps can be downloaded from the VMware Marketplace; instructions are here.
CloudHealth Secure State
By default CloudHealth provides real-time information on security risk exposure, but for deep-dive visibility and remediation those who are serious about security will want to look at Secure State. CloudHealth Secure State is available with CloudHealth or standalone, and currently supports AWS, Azure, and GCP.
Dashboards within CloudHealth Secure State enable at-a-glance checks on security posture and compliance. There are over 700 built-in security rules and compliance frameworks that can be used as security guardrails, with the ability to add custom rules and frameworks on top.
As systems become distributed over multiple accounts, subscriptions, or even clouds, the dynamics of securing an organisation’s assets shift significantly. Previously all services were contained within a data centre, secured firstly using perimeter firewalls and then with micro-segmentation. IT teams were generally in control and had visibility throughout the corporate network. Nowadays a developer or user responsible for a service can potentially open applications or data to the public, either on purpose or by accident. Cloud security guardrails form an important baseline for security posture and cloud strategy. Security guardrails are made up of critical must-have configurations in policies with auto-remediation actions attached; they help avoid mistakes or configuration drift to ultimately reduce security risk.
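A guardrail with an auto-remediation action attached can be pictured as a rule plus a fix. The sketch below is a simplified model, not the Secure State rule format: the `FirewallRule` and `SecurityGroup` types, the choice of SSH and RDP as sensitive ports, and the `enforce` function are all assumptions for illustration.

```python
from dataclasses import dataclass, field

PUBLIC_CIDR = "0.0.0.0/0"
# Assumption for this sketch: SSH (22) and RDP (3389) must never be open to the Internet.
SENSITIVE_PORTS = frozenset({22, 3389})


@dataclass
class FirewallRule:
    port: int
    source_cidr: str


@dataclass
class SecurityGroup:
    name: str
    rules: list = field(default_factory=list)


def violates_guardrail(rule):
    """A rule violates the guardrail if it exposes a sensitive port to any source."""
    return rule.source_cidr == PUBLIC_CIDR and rule.port in SENSITIVE_PORTS


def enforce(group):
    """Remove violating rules in place and return them so the change can be audited."""
    removed = [rule for rule in group.rules if violates_guardrail(rule)]
    group.rules = [rule for rule in group.rules if not violates_guardrail(rule)]
    return removed


if __name__ == "__main__":
    web = SecurityGroup("web", [
        FirewallRule(443, PUBLIC_CIDR),   # public HTTPS is allowed
        FirewallRule(22, PUBLIC_CIDR),    # public SSH breaks the guardrail
        FirewallRule(22, "10.0.0.0/8"),   # internal SSH is allowed
    ])
    for rule in enforce(web):
        print(f"removed port {rule.port} open to {rule.source_cidr} from {web.name}")
```

Returning the removed rules rather than discarding them keeps an audit trail, which matters when remediation happens automatically and the service owner needs to know what changed.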
CloudHealth Secure State gives further visibility into resource relationships and context, using the Explore UI. Explore enables a powerful model of multi-cloud or account architectures, with visual topology diagrams of complex environments. Cyber security analysts or operations centres can drill down into individual resources with all interoperable components and dependencies already mapped out.
A recent guidance paper published by The Commission for Smart Government urges the UK Government to take action towards transforming public services into intrinsically digital services. The Commission advises the government to move all services to the cloud by 2023.
It is clear from the paper that strong leadership and digital understanding amongst decision makers is incredibly important. This is something I noted when writing this post on defining a cloud strategy for public sector organisations. The cloud strategy should set out how technology supports and delivers the overall organisational goals.
If implemented correctly, cloud computing can maximise security and business benefits, automating and streamlining many tasks that are currently manual and slow. Published by the National Cyber Security Centre in November 2020, the Security Benefits of Good Cloud Service whitepaper provides some great pointers that should be incorporated into any cloud migration strategy.
This article discusses how to achieve a common cloud infrastructure, focusing on brownfield environments where local government, and other public sector organisations like the NHS, need to address some of the challenges below.
Common Challenges
IT is rarely seen as delivering value to end users, citizens, patients, etc. Often budgets are being reduced but IT are being asked to deliver more, faster. In general, people have higher demands of technology and digital services. Smart phones are now just called phones. Internet-era companies like Amazon, Google, and Netflix provide instant access to products, services, and content. Consumer expectations have shifted and the bar is raised for public services.
IT staff are under pressure to maintain infrastructure hardware and software. There are more vulnerabilities being exposed, and targeted cyber attacks, than ever before, which means constant security patching and fire-fighting. I’d like to add that it means more systems being architecturally reviewed and improved, but the reality is that most IT teams are still reacting. Running data centres comes with an incredible operational burden.
Understanding new technologies well enough to implement them confidently requires time and experience. There are more options than ever for infrastructure: on-prem, in the cloud, at the edge, managed services – Platform as a Service (PaaS), Infrastructure as a Service (IaaS). Furthermore, applications are no longer just monolithic or 3-tier; they are becoming containerised, packaged, hybrid, managed – Software-as-a-Service (SaaS). IT teams are expected to maintain and securely join up all these different services whilst repurposing existing investments in supporting software and technical knowledge.
Business models are changing at pace. Successful organisations are able to react quickly and make use of data to predict and understand their customers and consumers. The emergence of smart cities and smart hospitals can improve public services and enable cost-savings, but needs to be delivered on a strong digital foundation with fast, reliable connectivity. This approach requires joined up systems that share a secure, scalable, and resilient platform. In an ideal world applications and data should be abstracted from the underlying infrastructure in a way that allows them to securely move or be redeployed with the same policies and user experience, regardless of the hardware or provider. Legacy hardware and older systems are mostly disjointed, built in silos, with single points of failure and either non-existent or expensive business continuity models.
Innovation typically takes longer when the risk extends beyond monetary value. The ideas of agile development and fail-fast experimentation will naturally be challenged more for public facing services. A 999 operator locating a specialist hospital for an ambulance response unit cannot afford unpredictability or instability because developers and engineers were failing-fast. Neither can a family dependent on a welfare payment system. In environments where services are stable and reliable there is less appetite for change, even when other areas of the organisation are crying out for fast and flexible delivery.
Cloud Migration Strategies
Greater economic and technical benefits can be achieved at scale. Hyperscalers have access to cheaper commodity hardware and renewable energy sources. They are able to invest more in physical security and auditing. Infrastructure operations that are stood up and duplicated thousands of times over across the UK by individual public sector organisations can shift to the utility based model of the cloud, freeing up IT staff from fire-fighting to focus on delivering quality digital services at speed.
There are 7 R’s widely accepted as cloud migration strategies. These are listed below with a particular focus on relocate. Whilst a brand new startup might go straight into a cloud-native architecture by deploying applications through micro-services, those with existing customers and users have additional considerations. Migrating to the cloud will in most cases use more than one of the options below. Implementing the correct migration strategy for existing environments, alongside new cloud-native services, can reduce the desire for people to use shadow IT. Finding the right balance is about understanding the trade-off between risk, cost, time, and the core organisational drivers mentioned earlier.
Retire. No longer needed – shut it down. Don’t know what it is – shut it down. This is a very real option for infrastructure teams hosting large numbers of Virtual Machines. VM sprawl that has built up over the years could surprise you.
Retain. Leaving on-premises. This doesn’t necessarily mean doing nothing. For the most part your existing applications should run in the cloud. A requirement for applications that need to be closer to the action has driven progress in edge computing. Hardware advancements in areas like Hyper-Converged Infrastructure (HCI) enable high performance computing with single socket small footprints, or withstanding higher operating temperatures for locations away from data centre cooling. The key is to maintain that common underlying infrastructure, enabling service deployment in the cloud or at the edge with consistent operations and technologies.
Repurchase. For example changing an on-premises and self-maintained application to a SaaS alternative. This could be the same product in SaaS form, or a competitor. The main technical consideration now becomes connectivity and how the application is accessed. Focus is generally shifted away from the overall architecture of the application itself, and more into transitioning or onboarding users and importing data.
Rehost. Changing a Virtual Machine to run on a different hypervisor. This could be a VMware or Hyper-V VM, converted to run on a cloud provider’s hypervisor as a particular instance type. This can be relatively straightforward for small numbers of Virtual Machines, but consider other dependencies that will need building out such as networking, security, load balancing, backups, and Disaster Recovery. Although not huge, this potential change in architecture adds more time, complexity, and risk, as the size of the environment grows.
Replatform. Tweaking elements of an application to run as a cloud service. This is often shifting from self-hosted to managed services, such as migrating a database from a VM with an Operating System to a managed database service. Replatform is a common approach for like-for-like infrastructure services like databases and storage.
Refactor. The big bang. Rearchitecting an entire application to run as a cloud-native app. This normally means rewriting source code from scratch using a micro-services architecture or serverless / function based deployment. Infrastructure is deployed and maintained as code and can be stateless and portable. A desirable end state for modern applications.
Relocate. Moves applications and Virtual Machines to a hyperscaler / cloud provider without changing network settings, dependencies, or underlying VM file format and hypervisor. This results in a seamless transition without business disruption.
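Since most migrations will mix several of these options, it can help to record the chosen strategy against each workload and summarise the plan. The sketch below is one hypothetical way to do that; the workload names and their assignments are invented for illustration, and the short descriptions simply paraphrase the list above.

```python
from collections import Counter
from enum import Enum


class Strategy(Enum):
    """The 7 R's, with short descriptions paraphrased from the list above."""
    RETIRE = "No longer needed, shut it down"
    RETAIN = "Leave on-premises or at the edge"
    REPURCHASE = "Move to a SaaS alternative"
    REHOST = "Convert the VM to run on a different hypervisor"
    REPLATFORM = "Shift components to managed cloud services"
    REFACTOR = "Rearchitect as a cloud-native application"
    RELOCATE = "Move VMs without changing format, hypervisor, or network settings"


# Hypothetical workload inventory; names and assignments are illustrative only.
INVENTORY = {
    "legacy-intranet": Strategy.RETIRE,
    "theatre-imaging": Strategy.RETAIN,
    "email": Strategy.REPURCHASE,
    "case-management-db": Strategy.REPLATFORM,
    "case-management-app": Strategy.RELOCATE,
    "payments-gateway": Strategy.RELOCATE,
    "citizen-portal": Strategy.REFACTOR,
}


def summarise(plan):
    """Count how many workloads are assigned to each strategy name."""
    return Counter(strategy.name for strategy in plan.values())


if __name__ == "__main__":
    for name, count in summarise(INVENTORY).most_common():
        print(f"{name}: {count} workload(s) - {Strategy[name].value}")
```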
Why Relocate Virtual Machines?
Relocating Virtual Machines is a great ‘lift-and-shift’ method for moving applications into the cloud. To get the most value out of this migration strategy it can be combined with one or more of the other approaches, generally replatforming some of the larger infrastructure components like database and file storage, or refactoring a certain part of an application; a component that is problematic, one that will provide a commercial or functional benefit, or that improves the end user experience. By auditing the whole infrastructure and applying this blueprint we can strike the right balance between moving to the cloud and protecting existing services.
Standardised software stack – A Software-Defined Data Centre (SDDC) that can be deployed across commodity hardware in public and private clouds or at the edge, creating a common software-based cloud infrastructure.
Complete managed service – The hardware and software stack is managed infrastructure down, removing the operational overhead of patching, maintenance, troubleshooting, and failure remediation. Data centre tasks become automated workflows allowing for on-demand scaling of compute and storage.
Operational continuity – Retain skills and investment for managing applications and supporting software (backups, monitoring, security, etc.). This allows replacement of solutions and application refactoring to take place at a gradual pace, for example when contracts expire, and with lower risk.
Full data control – The Virtual Machine up is managed by the customer; security policies, data location (UK), VM and application configuration, providing the best of both worlds. Cloud security guardrails can be implemented to standardise and enforce policies and prevent insecure configurations. These same policies can extend into native cloud services and across different cloud providers using CloudHealth Secure State.
Sensible transformation – Although a longer term switch from capex investment to opex expenditure is required, due to the on-demand subscription based nature of many cloud services, dedicated hardware lease arrangements in solutions like those listed above can potentially be billed as capital costs. This gives finance teams time to adapt and change, along with the wider business culture and processes.
Hybrid applications – Running applications that make use of native cloud services in conjunction with existing components, such as Virtual Machines and containers, supports a gradual refactoring process and de-risks the overall project.
Cloud computing services have grown exponentially in recent years. In many cases they are the driving force behind industry 4.0, or the fourth industrial revolution, enabling Artificial Intelligence (AI) and Machine Learning (ML), or the Internet of Things (IoT) powering smart homes and smart cities.
High speed networks are enabling secure data sharing over the Internet, resulting in a shift from compute processing in one’s own server rooms or data centres to a central processing plant. Here technology can be agile and highly available whilst taking advantage of economies of scale. This is much the same as the way our ancestors built their own generators to consume electricity, each factory buying and installing components with specialist staff to keep systems running, before eventually moving to utility based consumption of electricity as a service.
Data sharing and data analytics are at the heart of digital transformation. Successful companies are using data with services consumed over the Internet to innovate faster and deliver value; enhancing user experience and increasing revenue.
It is important for organisations adopting cloud computing to define a cloud strategy; this helps ensure coordination and connectivity of existing and new applications, whilst providing a sustainable delivery method for future digital services. A cloud strategy can assist with standardising security and governance alongside reducing shadow IT sprawl and spiralling costs.
The first step is to have a clear understanding of what the organisation as a whole expects to gain from the consumption of cloud technologies. This isn’t limited to the IT teams but is predominantly about business outcomes, for example improved innovation and agility, faster deployment of products or features, application performance and security enhancements for a remote workforce, or simply the change in consumption and charging model.
It may be that a compelling event triggered the cloud focus, such as a security breach, site reliability issue, or major system outage. Reducing carbon emissions is part of the wider corporate strategy for many public sector organisations, and replacing old or inefficient data centre and cooling equipment with hyperscalers generating renewable energy can certainly help. Whatever the reasons, they should be captured and worked into your strategy. Doing so will help identify deliverables and migration assessments for brownfield environments.
Public Sector Cloud First
The UK Government first introduced the Government Cloud First policy in 2013 for technology decisions when procuring new or existing services. The definition being that public cloud should be considered in the first instance, primarily aimed at Software as a Service (SaaS) models, unless it can be demonstrated that an alternative offers better value for money.
During the COVID-19 outbreak, the UK saw unprecedented demand for digital services. The National Health Service (NHS) in particular responded quickly, scaling out the 111 Online service to handle 30 million users between 26 February and 11 August, with 6 million people completing the dynamic coronavirus assessment. The peak number of users in a single day was over 950,000, up 95 times from the previous 10,000 daily average. NHS Digital and NHSmail rolled out Microsoft Teams to 1.3 million NHS staff in 4 days, which would go on to host over 13 million meetings and 63 million messages between 23 March and 5 October. Both of these achievements were made possible virtually overnight by the speed and agility of cloud services.
NHS 111 Online was part of the UK digital response to COVID-19
Cloud Guidance for the Public Sector
Following up on the Government Cloud First policy of 2013, the UK Government released further information in 2017 around the use of cloud first, how to choose cloud computing services for your organisation, how to approach legacy technology, and considerations for vendor lock-in. The guidance reiterates the need to consider cloud computing before other options to meet point 5 of the Technology Code of Practice (use cloud first). The Technology Code of Practice can also feed into your cloud strategy:
Define user needs
Use open source and open standards to ensure interoperability and future compatibility
Make sure systems and data are secured appropriately
More recently, in March 2020, the Government Digital Service published Cloud Guidance for the Public Sector. The guidance is set out in easy to consume chunks with links out to further content for each area. Noteworthy sections include:
People and Skills: the way technical, security, commercial, and financial teams work will change. New processes and skills will be introduced, and people need to be fully informed throughout the process. It is essential that HR are able to recruit and retain the right skillsets, and upskill people through training and development. Roles and responsibilities should be defined, and extended to service providers and teams as the strategy is executed.
Security: the first 2 words in the above guidance paper are key: “Properly implemented”. The overwhelming majority of security breaches in the cloud are due to incorrect configurations. Links are included to the National Cyber Security Centre (NCSC) guidance on cloud security and zero trust principles. Published by the National Cyber Security Centre in November 2020, the Security Benefits of Good Cloud Service whitepaper also provides some great pointers that should be incorporated into any cloud strategy.
Data Residency and Offshoring: each data controller organisation is responsible for their own decisions about the use of cloud providers and data offshoring. The government say you should take risk-based decisions whilst considering the Information Commissioner’s Office guidance. Data offshoring is not just the physical location of the data but also who has access to it, and whether any elements of the service are managed outside of the UK.
Further documentation from the UK Government on Managing Your Spending in the Cloud identifies procurement models and cost optimisation techniques when working with cloud services. It advises that a central cloud operations team, made up of both technical and commercial specialists, is formed to monitor usage, billing, and resource optimisation to reduce costs.
Tools like CloudHealth by VMware help simplify financial management. CloudHealth makes recommendations on cost savings, works across cloud platforms, and crucially provides financial accountability by cost centre. A charging model where internal departments or lines of business pay for what they consume will typically yield reduced consumption and therefore lower costs.
Build management tooling into your cloud framework and aim for consolidated and cloud agnostic tooling. This blog article with Sarah Lucas, Head of Platforms and Infrastructure at William Hill, discusses some best practices for a successful hybrid and multi-cloud management strategy.
Incorporating hybrid and multi-cloud into your strategy can help protect against vendor lock-in, enhance business continuity, and leverage the full benefit of the cloud by deploying applications to their most suited platform or service. Furthermore, having an exit strategy insures against any future price rises, service issues, data breaches, or political changes. The NHS COVID-19 track and trace app, for example, was moved between hyperscalers overnight during development. This is all the more impressive considering it needed to scale securely on a national level, whilst incorporating new features and updates as more virus symptoms and medical guidance were added. This blog article with Joe Baguley, CTO at VMware, outlines the lessons learned developing during a pandemic.
The National Data Strategy
In September 2020 the UK Government published the National Data Strategy. The strategy focuses on making better use of data to improve public services and society as a whole. It does this by identifying the following pillars: data foundations, data skills, data availability, and responsible data. Underpinning the National Data Strategy is a modern infrastructure which should be safe and secure with effective governance, joined-up and interoperable, resilient and highly available. New technology models like cloud, edge, and secure computing enhance our capabilities of providing shared data in a secure manner. The infrastructure on which data relies is defined by the strategy as the following:
“The infrastructure on which data relies is the virtualised or physical data infrastructure, systems and services that store, process and transfer data. This includes data centres (that provide the physical space to store data), peering and transit infrastructure (that enable the exchange of data), and cloud computing that provides virtualised computing resources (for example servers, software, databases, data analytics) that are accessed remotely.”
Section 4.2.1 of the document notes that “Even the best-quality data cannot be maximised if it is placed on ageing, non-interoperable systems”, and identifies long-running problems of legacy IT as one such technical barrier. The theme of this section is that data, and we can extend this to applications, should be independent of the infrastructure it runs on. Some of the commitments outlined are also relevant to cloud strategy and can be used as part of an internal IT governance framework:
Creating a central team of experts ensuring a consistent interpretation and implementation of policies
Building a community of good practice
Learning and setting best practice and guidance through lighthouse projects
Further demonstrating the importance of data, NHSx launched the Centre for Improving Data Collaboration, a new national resource for data-driven innovation. In a blog announcing the new team Matthew Gould, CEO, NHSx, said “Good quality data is crucial to driving innovation in healthcare. It can unlock new technologies, power the development of AI, accelerate clinical trials and enable better interactions with patients”. NHSx are working on a new UK Data Strategy for Health and Social Care expected late 2020, and have also collaborated with Health Education England on the Digital Readiness Programme to support data as a priority specialism in health and care.
The UK National Data Strategy was published in September 2020
NHS Digital Public Cloud Guidance
In January 2018 NHS Digital, along with the Department of Health and Social Care, NHS England, and NHS Improvement, released guidance for NHS and social care data: off-shoring and the use of public cloud services. This national guidance for health and care organisations can also be applied to the wider public sector dealing with personal information. Andy Callow, CDIO, Kettering General Hospital, also makes a great case for the NHS to embrace the cloud in this Health Tech Newspaper article.
As per the Government Cloud Guidance for the Public Sector, each data controller is responsible for the security of their data. The NHS Digital guidance outlines a 4-step process for making risk-based decisions on cloud migrations.
Steps 1 & 2 are to understand the data and assess the risks:
The National Data Guardian advises that a Senior Information Risk Owner (SIRO) is involved in the decision-making process and is comfortable with the security arrangements in place. Where patient data is being hosted this should also include Caldicott Guardians.
In its review into patient data in the NHS, the Care Quality Commission defines data security as an umbrella for availability, integrity, and confidentiality. With this in mind systems should always be designed with the expectation of failure, across multiple Availability Zones or regions where offshoring policies permit, and with appropriate Disaster Recovery and backup strategies.
As systems and dependencies become cloud based and potentially distributed across multiple providers, more importance than ever is placed on network architecture, latency, and resilience. Software Defined Wide Area Network (SD-WAN) and Secure Access Service Edge (SASE) solutions like VeloCloud by VMware provide secure, high performance connectivity to enterprise and cloud or SaaS based applications.
Where digital services need to be accessed externally using national private networks, like the Health and Social Care Network (HSCN), organisations may consider moving them to Internet facing. This reduces network complexity and duplication whilst making services more accessible and interoperable. According to NHS Digital’s Internet First Policy “new services should be made available on the internet, secured appropriately using the best available standards-based approaches”.
Closing Notes
Your cloud strategy document should be based on the goals and objectives of the organisation. The strategy document does not necessarily need to define the cloud provider or type of hosting; instead it should set out how you meet or solve your business needs or problems, creating outcomes that have a direct impact on the experience of patients, users, or service consumers.
The strategy should be kept simple and high level enough that all areas of the business are able to understand it. Cloud technology moves fast, and guidance shifts with it. Your strategy and policies should be reviewed regularly, but the overarching strategy should not require wholesale changes that create ambiguity. Eventually, leaders will need to define lower level frameworks that balance visibility, cost, availability, and security, with agility, flexibility, choice, and productivity. These frameworks along with the high-level strategy should be well documented and easily accessible.