The global impact of the novel coronavirus disease (COVID-19) has yet to be fully felt, but the number of businesses canceling travel and shifting their workers to remote, work-from-home arrangements is already exposing gaps in many pandemic and disaster recovery plans – particularly around shadow IT and the use of consumer-grade technologies in the business environment.
Like many issues, these concerns are best met through the hard work of planning, testing and communicating. However, unlike other situations, we are fortunate in that the recent move toward cloud-based services, self-service portals and user monitoring also provides the tools we need to address the issues currently being discovered and explored.
As COVID-19 continues to make inroads in the U.S., most businesses are focusing their initial crisis response on:
- Canceling all unnecessary business travel.
- Encouraging workers to cancel personal travel.
- Encouraging workers to work from home wherever COVID-19 appears to be spreading.
- Reminding people to practice basic hygiene (see the World Health Organization’s Basic Protective Measures Against the New Coronavirus).
However, the devil is in the details, especially when it comes to enabling workers to work from home.
Remote Work Considerations
When you shift a majority of your workforce to remote working, you face four primary issues:
- Access
- System availability
- Worker availability
- Workarounds
Many organizations set up remote access systems long before the workforce actually needs them. This creates an access issue because workers often skip testing and only discover missing applications and files when they finally have to use the systems.
On the availability side, we see concerns both with how remote access systems scale when a large percentage of the workforce uses them, and whether the critical people in the organization are affected by whatever issue is forcing the work-from-home scenario.
Lastly, there is the tendency of people to simply work around any problems they find using easily available technology.
Addressing Access Concerns
Classic remote access designs either leverage a company-assigned laptop or a dedicated remote desktop solution like Citrix or virtual desktop infrastructure (VDI). When a company-assigned laptop is also the workers’ normal workstation, issues largely involve virtual private network (VPN) connectivity and access to any IP-whitelisted services that may be in use.
Some organizations configure split routing for VPNs, which sends traffic for company resources through the tunnel but lets workers reach the internet directly. This design reduces the load on the VPN terminators, but it blocks employee access to IP-whitelisted sites unless those sites are explicitly defined in the VPN configuration, because that traffic no longer leaves from the company’s whitelisted addresses.
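One way to catch this gap before workers do is to compare the destinations that rely on IP whitelisting against the networks the VPN actually routes. The following is a minimal sketch using Python's standard `ipaddress` module; the network ranges, site names and helper functions are hypothetical placeholders, not values from any real VPN configuration.

```python
from ipaddress import ip_address, ip_network

# Hypothetical networks routed through the VPN tunnel (illustrative values only).
TUNNELED_NETWORKS = [
    ip_network("10.0.0.0/8"),       # assumed internal corporate range
    ip_network("203.0.113.0/24"),   # assumed partner site already added to the VPN config
]


def uses_vpn(destination: str) -> bool:
    """Return True if traffic to `destination` would be sent through the tunnel."""
    addr = ip_address(destination)
    return any(addr in net for net in TUNNELED_NETWORKS)


def sites_bypassing_vpn(whitelisted_sites: dict) -> list:
    """List IP-whitelisted sites whose traffic would leave from the worker's home IP."""
    return sorted(name for name, ip in whitelisted_sites.items() if not uses_vpn(ip))


# Hypothetical inventory of services that only accept the company's egress addresses.
sites = {
    "payroll-portal": "203.0.113.25",
    "bank-gateway": "198.51.100.7",
}
print(sites_bypassing_vpn(sites))
```

Any site this reports would need to be added to the VPN configuration, or reached some other way, before remote workers depend on it.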
When companies use a Citrix or VDI solution, the core issue is more one of changing business practices. It is not uncommon for a recovery solution to be used well after it was set up, putting workers in a position where the icons and bookmarks they expect are missing or altered.
The best way to address such issues is periodic real-world testing. While it is common to do “bubble testing” for disaster recovery, such testing is too limited and can provide a false sense of security. However, if the disaster recovery solution is identical to the remote working solution and workers are encouraged to work remotely one day a week or one week per quarter, the DR configuration – be it VPN-based or a remote workspace – will naturally be kept up to date with business requirements.
Addressing System Availability Concerns
Another issue commonly discovered when large groups of people begin to work remotely is system availability. System availability problems can occur at the network level – at either end of the connection – as well as at the system level.
Fortunately, real-world testing also helps identify whether any local availability issues may exist (such as family members consuming bandwidth by streaming movies or games).
To identify whether system-level or network-level availability concerns may exist at the business end, it is important to:
- Schedule periodic, real-world remote work tests with large groups of workers. By comparing system and network load in such situations to times when fewer workers are remote, it is possible to model the resources needed should all workers work remotely.
- Once the ideal resource level is known, multiply it by 1.2. This adds a 20 percent safety overhead, at which point you can explore available options.
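The arithmetic behind these two steps can be sketched as follows. This is only an illustration under a simplifying assumption that load grows linearly with the number of remote workers; the function names and the sample figures (user counts, Mbps values) are hypothetical, and real measurements may call for a more careful model.

```python
def ideal_capacity(baseline_users, baseline_load, test_users, test_load, total_workers):
    """Linearly extrapolate the load expected if every worker is remote.

    Assumes (for illustration) that each additional remote worker adds the
    same amount of load as observed between the baseline and the test.
    """
    if test_users <= baseline_users:
        raise ValueError("the test must involve more remote workers than the baseline")
    load_per_user = (test_load - baseline_load) / (test_users - baseline_users)
    return baseline_load + load_per_user * (total_workers - baseline_users)


def with_safety_overhead(ideal, factor=1.2):
    """Apply the 20 percent safety overhead described above."""
    return ideal * factor


# Hypothetical measurements: 50 remote users on a normal day, 250 during a test,
# and 1,000 workers in total. Loads are VPN throughput in Mbps.
ideal = ideal_capacity(50, 100.0, 250, 400.0, 1000)
target = with_safety_overhead(ideal)
print(f"ideal: {ideal:.0f} Mbps, target with overhead: {target:.0f} Mbps")
```

The same calculation can be repeated for each constrained resource, such as concurrent VPN sessions, VDI hosts or internet bandwidth.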
Many organizations may find permanently adding such capacity to their systems is cost-prohibitive. In those cases, it may be possible to invest in bursting, where your internet service provider (ISP) or cloud provider allows systems to run over their contracted levels for short periods of time.
This time can then be used to take new measures and provision the actual levels needed during a disaster – presuming your business partners are relatively unaffected by the same issues facing you and are able to respond quickly.
Addressing Worker Availability Concerns
In a pandemic situation, you must plan for the possibility that critical people may be unavailable – either due to their own health issues or because they are caring for someone else affected by the pandemic. To prepare for these situations, it is wise to:
- Engage in periodic skill mapping throughout the organization. Individuals with critical skills should be identified and placed into cross-training programs to develop internal redundancy should one person become unavailable.
- Identify key contracting/consulting partners. Look for partners who can provide requisite skills should subject matter experts become unavailable.
- Consider life insurance policies. There is no getting around the fact that pandemics are unpredictable and indiscriminate. There is the unfortunate possibility that some individuals won’t survive the disaster. While it’s important to hope for the best, it’s also wise to plan for the worst. Taking out life insurance policies on critical employees is a cost-effective way to ensure sufficient funds exist should the worst happen and you need to identify, hire and train replacements. This is particularly true should executives be affected by the disaster and executive search firms be required.
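The skill-mapping step lends itself to a simple check: for each critical skill, count how many people hold it and flag any skill with too few holders. The sketch below assumes the skill map is kept as a plain dictionary; the names, skills and the two-holder threshold are hypothetical examples.

```python
from collections import defaultdict


def single_points_of_failure(skill_map, critical_skills, minimum_holders=2):
    """Return critical skills held by fewer than `minimum_holders` people.

    `skill_map` maps a person to the set of skills that person holds.
    The result maps each under-covered skill to its current holders.
    """
    holders = defaultdict(set)
    for person, skills in skill_map.items():
        for skill in skills:
            holders[skill].add(person)
    return {
        skill: sorted(holders[skill])
        for skill in sorted(critical_skills)
        if len(holders[skill]) < minimum_holders
    }


# Hypothetical skill map for illustration only.
skill_map = {
    "alice": {"firewall", "payroll"},
    "bob": {"payroll"},
    "carol": {"vpn"},
}
critical = {"firewall", "payroll", "vpn", "backup"}
gaps = single_points_of_failure(skill_map, critical)
for skill, people in gaps.items():
    print(skill, people)
```

Skills with one holder are candidates for cross-training; skills with none are candidates for a contracting or consulting partner.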
Addressing Workaround Concerns
When cut off from physical corporate environments, people tend to solve problems on their own. If they can’t reach team members or management and are faced with a critical deadline, they will meet it with whatever tools they have available – including Gmail, Facebook Messenger and Dropbox. They likely already connect with work friends via social media; if normal office communications are unavailable, they will simply reach out and collaborate in the ways they know best.
The other common set of workarounds is the one created by management. If a worker brings a critical issue to management along with a solution, managers often simply accept the solution and deal with the consequences later. This tendency means that during a disaster, new cloud services and software often pop up via shadow IT.
Addressing these tendencies requires a two-fold approach:
- Focus on consumer-like messaging services. Early adoption of consumer-like secure technologies within the business is critical if you want to avoid people selecting weaker options during a disaster. Leveraging applications within Office 365, such as Microsoft Teams for ad hoc instant messaging, forestalls workers who would otherwise start using Facebook Messenger, WhatsApp, Twitter and other unsanctioned options.
- Focus on consumer-like file storage services. Consider implementing applications like Office 365 OneDrive or Google Drive. Make it easy for workers to access them from anywhere, ideally through a VDI-type environment. This reduces the tendency for workers to use non-sanctioned applications like Dropbox, YouSendIt, WeTransfer, etc.
Planning for Failure
When a business is forced to fragment itself into an untold number of independent parts, each of which interacts with others in unforeseen ways, what is leadership to do?
Fortunately, just as people have the technology they need to deal with the unexpected, we have both technology and operational practices we can leverage to limit how far things spiral out of control and to rein them back in once the disaster has passed.
Prior to any disaster that forces a change in operations, be sure to proactively:
- Test the technology. Properly planning for such a situation requires focused, department-by-department training and testing of remote access solutions. It will only be successful if executive management properly prioritizes it and if IT and security approach the issue as true partners with the business. They can’t be perceived as trying to shoe-horn processes into artificial constraints.
- Focus on department-level workflows. Begin the process by determining exactly how each department functions, including:
  - Data flow mapping
  - Reviews of preferred applications
  - Browser bookmarks
  - Email contact lists
  - Instant messaging technologies
  - Anything else workers use on a daily, weekly or monthly basis
Each of these elements must be considered, represented within the recovery environment and easy for workers to find/use.
Focus on Support
Of course, proactive approaches never fully cover all the possibilities. It is also imperative to design recovery environments to be supportable. At bare minimum, this means:
- Designing methods by which IT workers can share the screens of the individuals having problems.
- Having multiple methods by which people can reach IT.
It’s also a good idea to implement a revised prioritized queueing system, since priorities can shift significantly when people are working through a disaster experience. Support staff also need to know the names workers actually use for their tools. For example, if IT provisions a critical tool with the proper name of “WidgetCorp Combobulator 2.4.1,” workers might just call it “Combo” because that is how the icon is shortened when displayed on their desktop.
When stressed workers finally figure out how to reach IT support, after digging up the instructions they haven’t looked at in over a year, figuring out how to use the VPN and access email – both of which likely involved remembering passwords they saved somewhere at the office – the last thing they will have patience for is IT not knowing what “Combo” is.
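One lightweight way to prepare for this is to keep a lookup of the nicknames each department uses alongside the official application names, so a support ticket that mentions “Combo” can be matched quickly. The sketch below is only an illustration; the alias table and the `resolve_app` helper are hypothetical, and the second nickname is invented for the example.

```python
# Hypothetical alias table: official application name -> nicknames workers use.
APP_ALIASES = {
    "WidgetCorp Combobulator 2.4.1": ["Combo", "Combobulator"],
}


def resolve_app(reported_name, aliases=APP_ALIASES):
    """Map the name a worker reports to the official application name, if known."""
    needle = reported_name.strip().lower()
    for official, nicknames in aliases.items():
        known = {official.lower()} | {nick.lower() for nick in nicknames}
        if needle in known:
            return official
    return None  # unknown name: a prompt to ask the worker, then extend the table


print(resolve_app(" combo "))
```

The table is best filled in during the department-level workflow reviews, while workers are calm and available to explain what they call things.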
As stressful as remote working during a disaster scenario can be for IT staffers, they at least have the advantage of knowing how the system is supposed to work, what the components are and how to troubleshoot processes. Non-technical workers often feel cast adrift in a mostly unfamiliar environment, while feeling pressure from their management to get work done – all while also likely worrying about themselves and their loved ones.
The best thing you can do to defuse these tensions is to learn how the business works from each department’s perspective and invest in technology so that screen mirroring is possible.
Also, for environments that lack a remotely available password self-service portal, we recommend generating a policy exception and disabling password rotation for the period of the pandemic to help prevent unrecoverable password lockouts.
Once the disaster is over and recovery processes are mostly complete, it is critical to engage in post-mortem meetings with a large number of individuals to identify what worked, what didn’t work so well and – most importantly – what teams did to work around perceived limitations of remote work.
Be sure to set the stage so that no blame is placed on anyone and everyone is working toward the same goal of managing risk and improving the recovery process. It is far better to be told that several departments used their credit cards to get three different Dropbox for Business subscriptions, a new Office 365 instance and Google Drive than it would be to discover each issue individually over the course of the next six months.
You can also work closely with accounting to catch this ad hoc use of technology. They can help review new expense reimbursement forms as well as any new vendors signed up during the disaster and the months directly following.
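If accounting can export expense lines as CSV, even a simple keyword scan can surface likely shadow IT purchases for follow-up. The sketch below assumes an export with `department`, `description` and `amount` columns and a hand-maintained watchlist of service names; both the column names and the sample data are hypothetical.

```python
import csv
import io

# Hypothetical watchlist of services worth a follow-up conversation.
WATCHLIST = ("dropbox", "google drive", "office 365")


def flag_expenses(csv_text, watchlist=WATCHLIST):
    """Return (department, description) pairs whose description mentions a watched service."""
    flagged = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        description = row["description"].lower()
        if any(term in description for term in watchlist):
            flagged.append((row["department"], row["description"]))
    return flagged


# Hypothetical expense export for illustration only.
sample = """department,description,amount
Marketing,Dropbox for Business subscription,150.00
Sales,Client lunch,82.40
Finance,Google Drive storage upgrade,19.99
"""
for department, description in flag_expenses(sample):
    print(department, "-", description)
```

A match is a conversation starter for the post-mortem, not evidence of wrongdoing, which keeps the review consistent with the no-blame approach described above.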
Lastly, it is wise to create an abbreviated fast-track vendor approval process to address situations where new vendors unexpectedly become critical to new business processes. The last thing you want is to artificially slow a recovery by forcing creative problem-solvers to work through an unnecessary bureaucratic process just to keep doing what they already believe is successful.
The post-mortem is an ideal time to look at existing workflows and identify what, if anything, can be moved to a 100 percent cloud environment. Any service your workers can access the same way from the office, on the road or at home is one more service IT does not need to worry about during a disaster.
While leveraging cloud technology this way brings its own risks, reviewing the data being shared and the security around how your teams work can significantly improve how you work today – and during a disaster.
Any views or opinions presented in this document are solely those of the Faculty and do not necessarily represent the views and opinions of IANS. Although reasonable efforts will be made to ensure the completeness and accuracy of the information contained in our written reports, no liability can be accepted by IANS or our Faculty members for the results of any actions taken by the client in connection with such information, opinions, or advice.