High Availability and/or DR? What do you really need for your AIX and Unix environments?
Of course, the answer depends on how critical the applications running in these environments are, but as Unix experts we have come across various misunderstandings over the years. Events of the past few months have brought the discussion to the fore again as companies re-evaluate their critical infrastructure and applications in light of inevitable budget constraints.
Although high availability (HA) and disaster recovery (DR) can be implemented with similar mechanisms, and the terms are often used interchangeably, there are important differences between them. If you believe you have one but in fact have the other, then should an incident occur there will be a significant gap between the recovery time you expect and the one you get. Conversely, you may be paying for an HA solution when your requirements have changed over time and a simpler, more cost-effective DR implementation would suffice.
HA tends to be fully automated, switching from the failed resources to those held in reserve to maintain availability, whereas a basic DR solution is often triggered manually and may require a decision by authorised personnel to invoke it. Moreover, with HA the resources the systems use remain, in most cases, unchanged: network configuration, data storage, monitoring and security are switched to the ‘supporting’ platform, and the process remains transparent to end users.
HA is typically implemented within a single data centre; its aim is to protect against the loss of the underlying server or node on which the application runs, so recovery takes place within that same data centre. DR, however, should address the failure of an entire location by allowing recovery at a different location (ideally not on the same physical campus or site).
The pandemic has exposed cases where DR solutions were implemented within the same physical location as production. Travel restrictions, local and national lockdowns and increased home working have all raised questions about the suitability of current implementations, both because they sit on the same physical site or campus and because of the need for Sys Admins to support them remotely.
RPO v RTO
The key drivers in choosing a solution are the Recovery Time Objective (RTO) and the Recovery Point Objective (RPO), and it is essential that businesses are very clear about both.
RTO is the time within which you need to recover your IT infrastructure to maintain business continuity, i.e. how long your business can survive following a disaster before your systems are restored to normal. A 24-hour RTO means a business can keep operating for up to 24 hours without its data and IT infrastructure being available.
RPO is the maximum amount of data, measured in time, that you can tolerate losing as a result of a disaster or critical system failure. It also helps determine how often backups must be taken (daily or more frequently) and whether a high availability solution is needed. For example, a 4-hour RPO means you can afford to lose up to the last 4 hours of data processing before the failure, so backups or replication must capture the data at least every 4 hours.
The phrase ‘loss of data’ sounds alarming, but, as with RTO, these parameters should be set per application. For our AIX and Unix clients, the key questions are therefore: which applications are running in the environment, how long can each one be down in an emergency, and what is required to restore or rebuild the associated data?
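As a rough illustration of assessing each application against its own RTO and RPO, the Python sketch below compares per-application objectives with a list of recovery options ordered from cheapest to most expensive, and picks the first option that satisfies both. All application names, option names and hour figures are hypothetical, invented purely for illustration. The sketch also assumes that worst-case data loss is roughly the interval between backups or replication points (a failure just before the next copy is taken).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Application:
    name: str
    rto_hours: float  # maximum tolerable downtime
    rpo_hours: float  # maximum tolerable data loss, measured in time


@dataclass
class RecoveryOption:
    name: str
    recovery_hours: float  # estimated time to restore service
    # Interval between backups or replication points; assumed here to
    # approximate worst-case data loss (failure just before the next copy).
    data_loss_hours: float


def meets_objectives(app: Application, option: RecoveryOption) -> bool:
    """True if the option restores service within the RTO and loses no more data than the RPO allows."""
    return (option.recovery_hours <= app.rto_hours
            and option.data_loss_hours <= app.rpo_hours)


def cheapest_suitable(app: Application, options: list[RecoveryOption]) -> Optional[str]:
    """Return the first option meeting both objectives; options must be ordered by cost."""
    for option in options:
        if meets_objectives(app, option):
            return option.name
    return None  # nothing on the list is good enough


# Hypothetical recovery options, ordered from cheapest to most expensive.
OPTIONS = [
    RecoveryOption("Nightly backup, manual DR restore", recovery_hours=24, data_loss_hours=24),
    RecoveryOption("Replicated DR site, manual invocation", recovery_hours=4, data_loss_hours=0.25),
    RecoveryOption("HA cluster, automated failover", recovery_hours=0.1, data_loss_hours=0),
]

# Hypothetical applications with business-defined objectives.
APPS = [
    Application("payroll", rto_hours=24, rpo_hours=24),
    Application("reporting", rto_hours=8, rpo_hours=4),
    Application("order-processing", rto_hours=1, rpo_hours=0.25),
]

if __name__ == "__main__":
    for app in APPS:
        print(f"{app.name}: {cheapest_suitable(app, OPTIONS)}")
```

With these made-up figures, payroll is covered by the nightly backup, reporting needs the replicated DR site, and only order-processing justifies the HA cluster. The sketch deliberately ignores that an application may need both, for example HA within the data centre plus DR to a second site, and that real costs and recovery times must come from your own environment.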
By re-analysing requirements, not only in light of the past few months (and, sadly, the outlook for the next few) but also in light of how companies have migrated applications to public clouds, we can help ensure continuity at the optimum cost. Despite the migration of workloads to public clouds, the majority of AIX systems still run critical applications and therefore need a cost-effective solution backed by the skills and expertise to support it. For systems running less critical but still necessary applications, we can design a cost-effective solution that still ensures business continuity. Contact us at email@example.com for an initial discussion with our technical team.