Your ERP system is likely the backbone of your business. It will span multiple departments and underpin many broad and critical business processes, such as supply chain, logistics and fulfilment. So when it goes wrong, whether the cause is inside or outside your control, it could become a multibillion-dollar problem.
If you are a large company, you hopefully already have this under control. If you don't, the consequences could be severe and could well cost you your career. Either way, here are five ways to improve the resilience of your ERP solution.
In the olden days, everything lived in a server room, and backup and restore was the answer to everything. Unfortunately, from a resilience point of view, we now operate in a hybrid on-premises and cloud world, with many applications, components and integration points sharing and spreading data across our global businesses. Add the physical components (firewalls, switches, etc.) and we have a nightmare waiting to happen. A break in the chain could come from anywhere. When you review your contingency and resilience plans, start at the enterprise architecture level and build resilience into ALL aspects of the value chain. There is no point mirroring your database if everything else is standalone: a single failure elsewhere will still stop your business processes flowing end to end. Resilience planning is now more complex than ever.
Let's start simple. We all have backups (right?), but many organisations fail to test them regularly. Importantly, it isn't enough to test the backup; you also need to test the restore process. We would advise testing at least annually, but the right frequency clearly depends on the criticality of what you are running. You also need to think about the integration impacts of your backups and restores. For example, what happens to data that has been moved elsewhere in the organisation? How will a restore affect those data points? Will any data be missing, or incorrectly mapped? What about those critical reference points, the unique keys that are common across several systems?
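As an illustration of that last question, a restore test can finish with a simple reconciliation of shared keys. The sketch below is a minimal Python example, assuming you can export the common keys (here, hypothetical purchase order numbers) from both the restored ERP and one downstream system as plain lists; the function name and data are made up for illustration and are not part of any ERP product. It only detects missing keys, not incorrectly mapped ones, which would need a field-by-field comparison.

```python
# Hypothetical post-restore check (illustrative names and data only):
# compare the unique keys in a restored ERP system with the keys held by a
# downstream system that received the same records through an integration.


def reconcile_keys(restored_erp_keys, downstream_keys):
    """Return the keys that exist on only one side after a restore."""
    erp = set(restored_erp_keys)
    downstream = set(downstream_keys)
    return {
        # Records the downstream system still holds but the restore has lost,
        # e.g. orders created after the backup was taken.
        "missing_after_restore": sorted(downstream - erp),
        # Records in the restored ERP that the downstream system never received.
        "not_yet_propagated": sorted(erp - downstream),
    }


if __name__ == "__main__":
    # Example keys, e.g. purchase order numbers exported from each system.
    restored = ["PO-1001", "PO-1002", "PO-1003"]
    warehouse = ["PO-1001", "PO-1002", "PO-1003", "PO-1004"]
    for category, keys in reconcile_keys(restored, warehouse).items():
        print(f"{category}: {keys}")
```

Run against the example data, it flags PO-1004 as held by the warehouse system but missing from the restored ERP: exactly the kind of gap to resolve before declaring a restore successful.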
Linked to the above, all good organisations have a process they default to if their technology fails. Typically this is a paper-based approach, but make sure you test both the process itself and the supporting processes around it. For example, who authorises the switch to manual? Does everyone know how to operate the contingency process? Do the manual processes capture all the right data for online entry once the situation is over? Who authorises the "return to normal" process? How long is manual operation sustainable for? At what point do volumes trigger "more people are needed to operate" or "more people are needed to return to business as usual (BAU)"?
Make sure you understand the metrics around your resilience processes. In a crisis, everyone drops into survival mode and emotions run high. Combat this by managing through facts and figures. The challenge is knowing which facts and figures you should be monitoring in and around your business while in contingency mode. For example, how long does a restore from backup take? What does the restore give you (e.g. only partial functionality)? How long can each department cope on its contingency plan, and what is the impact after that? What are the service level agreements with your suppliers? What is the "return to business as usual" position, and how do you know it is safe to make that move? How do you address legislative issues?
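Two of these figures, restore duration and how long each department can cope, can be combined to show where a plan has gaps. The Python sketch below is illustrative only: the department names and durations are hypothetical, and it assumes the expected restore time comes from your own restore tests rather than from any vendor figure.

```python
from datetime import timedelta

# Hypothetical figures for illustration: how long a full ERP restore is
# expected to take, and how long each department can run on its
# contingency plan before the business impact becomes unacceptable.


def departments_at_risk(expected_restore, tolerances):
    """Return each department whose contingency tolerance is shorter than
    the expected restore time, mapped to the size of the shortfall."""
    at_risk = {}
    for department, tolerance in tolerances.items():
        shortfall = expected_restore - tolerance
        if shortfall > timedelta(0):
            at_risk[department] = shortfall
    return at_risk


if __name__ == "__main__":
    expected_restore = timedelta(hours=10)  # e.g. measured in the last restore test
    tolerances = {
        "Finance": timedelta(hours=24),
        "Warehouse": timedelta(hours=4),
        "Customer service": timedelta(hours=8),
    }
    for department, shortfall in departments_at_risk(expected_restore, tolerances).items():
        print(f"{department}: contingency runs out {shortfall} before the restore completes")
```

With these example numbers, Warehouse and Customer service are flagged, which points to where extra contingency measures, or a faster partial restore, would be needed.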
Every organisation should have a set of operating procedures it defaults to in the event of an incident. Everyone should understand their role in a contingency event, from exec level down to the grass roots. Procedures should be documented, the command and control management structure (the typical response) made clear, and roles and responsibilities clearly defined. There should be no doubt in anyone's mind about who is running the "event" and who is responsible for returning the organisation to business as usual.
An important note here for all Exec team members. I can all but guarantee that any Exec involvement in a critical event will confuse the chain of command. Employees will break protocol as soon as an Exec starts asking questions, and will start managing to that Exec's needs and expectations. This will frustrate the "return to health" process and will probably burn valuable time. Instead, make sure Execs are kept up to speed through an agreed update routine with the incident manager, and refrain from interfering. Let the experts do their jobs; dig into your questions later.
This isn't meant to be a definitive list, and of course an expert will think of many other ways to improve resilience (people have built whole careers on this), but understanding the importance of this area is key. It is also relevant to professionals at every level. And you never know: implementing a few of these ideas might save you a few million.
Neil ran his first SAP transformation programme in his early twenties. He spent the next 18 years working both client-side and for various consultancies, running numerous SAP programmes. After successfully completing more than 15 full lifecycles, he took a senior leadership/board position, and his work moved on to creating the same success for others.