Transcript for:
Data Backup and Restoration Strategy

In a business, a lot of things that happen in the background are underappreciated, and backup is one of them. It is rarely asked for, but it becomes the most sought-after thing when something bad happens to your data, whether it is a ransomware attack, accidentally deleted data, or a hardware failure. In this episode, I will talk about what to consider while designing a data backup and restoration strategy. If you are in DevOps, are managing IT infrastructure, or are running a business that relies on data availability, then this video may be of some value.

In an enterprise, a typical technology landscape has some sort of business continuity and disaster recovery plan, and backups are an integral component of it. The common perception is that you zip a folder and save it someplace safe, such as Google Drive, or use Apple's Time Machine, and you are good. But as a business owner, to maintain business continuity and have a usable disaster recovery plan, you first need to answer a few key questions.

The first question is: what is the RPO, the recovery point objective, of your organization? In case of a disaster, how much data loss is your organization willing to tolerate: three days, 12 hours, or one hour? That will ultimately determine the backup frequency, the strategy, and the costs.

The second question: have you set the RTO, the recovery time objective? That is, in case of an IT incident, how soon do you need to recover the data and get the systems running again?

Then, how many days' worth of backups do you need to keep? The answer might also depend on your agreements with customers regarding data backup and storage requirements.

And finally, do you need the data as a hot backup, which means it is readily available for restoration, or could it be a cold backup, which might mean it is written to a slower or cheaper storage medium for archival and long-term storage? Hot backups are the ones that will be put to use in case of an IT incident, and they will contain the data that is essential to run the business. On the other hand, data that is required for auditing or legal reasons can be kept as a cold backup. Just remember that hot, warm, and cold backups have different meanings depending on the context.

The answers to these questions will then be used by the IT team to build a data backup strategy and provide you with a cost estimate.

If you are part of an IT team responsible for business continuity and disaster recovery planning, you need to get clarity from the stakeholders on the business objectives, such as RPO and RTO. You then also have to consider other parameters to define your backup strategy and capacity planning.

The first question you need to answer is what type of backup to take: a full backup each time, or a full backup the first time followed by incremental backups.

The next question is how you will address the ever-growing storage requirement. Do you have sufficient hardware and redundancy, or will you be using cloud storage?

Then, how much are you willing to spend? The expenses could include storage, data transfer, hardware, licensing, and personnel costs.

What about backing up databases that are changing in real time, or software that keeps data in memory rather than writing it to disk? How will you address those situations?

The most important question is how you will ensure data security. Is encryption also required for data at rest, especially if the backups are transferred to a secondary location?

Finally, how will you determine whether you can rely on the backups you have created? Do you have a sandbox environment where you can restore them and test their integrity?

When you devise a data backup strategy, you first have to identify your audience and get a sense of how important the data is to them. Realize that unless it is just files, such as PowerPoint presentations or Excel spreadsheets, they may not always know where all their data is kept. If they are using a CRM application, it may be maintaining its data in a different location, and if they have a database or Docker containers, it might be difficult to identify the actual path where the data resides on that system. It is your duty to guide them.

For individual users, it is best to have at least some backup solution in place. Apple macOS has Time Machine, Microsoft Windows has its built-in backup feature, and Linux has a few equivalents, such as Déjà Dup and Cronopete. These backups can be moved periodically to a remote location or to a portable hard drive.

But in an organization that has multiple servers running alongside user machines, there is a need for a more central and robust backup mechanism. The easiest next step is to move towards full and incremental backups, done either manually or automatically at predefined intervals. They are easy to implement, especially in smaller organizations, although they require monitoring, and the IT team needs to work with the business team to identify the frequency and schedule of backups and to set expectations on restoration and data availability.

There are off-the-shelf solutions that help you do that. They are expensive, but they can scale enterprise-wide and take the logistical load off your shoulders. They can handle backup scheduling and recovery at an enterprise level, including special scenarios such as database servers, Microsoft Exchange, or Active Directory. For large organizations, this works out quite well, since there is software support and usable graphical interfaces to manage the backups and scheduling.

If you are enterprising enough, you could do the same with free and open-source tools. The only disadvantage is that, unlike the mostly predefined methodology offered by enterprise backup solutions, you will be dictating the backup strategy end to end, and you will be on your own in identifying and addressing the points of failure.

The gold standard for backups is CDP, continuous data protection. In a CDP setup, every save is backed up, effectively creating multiple versions of a single file. Instead of synchronizing file-level differences, it saves the underlying block-level or byte-level differences. That means if a few bytes of a 100 MB file are modified, only those modified bytes are synchronized, not the entire file, saving storage and bandwidth. Because it is near real time, setting aside network transfer delays, it can theoretically offer an RPO of zero; that is, in case of a major IT incident, no data would be lost. In practice, though, implementations range between continuous and near-continuous.

For cloud environments such as AWS, and for virtual machines on platforms such as VMware, Xen, or KVM, there are options for creating snapshots, which periodically capture the entire setup. You may utilize them too.

I work with a number of small and medium enterprises, and I have noticed that a daily incremental backup is good enough for the majority of use cases, especially for software development teams or DevOps teams managing a select set of servers. Having control of your backups provides peace of mind and a good night's sleep. Since I manage a large set of Linux servers, I always gravitate towards rsync, which allows me to create differential backups without any prohibitive costs and gives me more control over my setup.

In the next episode, I will talk about some open-source or free tools, including rsync, that you can smartly stitch together to create an effective data backup strategy, as well as how to prepare an environment where you can try out data restoration. Stay tuned.
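To make the block-level idea behind CDP (and behind rsync's delta transfer) a bit more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not how any particular CDP product works: it assumes fixed-size 4 KiB blocks and compares SHA-256 hashes of each block between the previous and current version of a file, so only changed blocks need to be stored or sent. Real tools differ; rsync, for instance, uses rolling checksums so that inserted bytes which shift the rest of the file can still be matched. The names BLOCK_SIZE, changed_blocks, and apply_delta are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for this sketch; real tools choose their own


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Return the SHA-256 digest of each fixed-size block of data."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> dict[int, bytes]:
    """Return {block_index: block_bytes} for every block of `new` that differs from `old`."""
    old_hashes = block_hashes(old, block_size)
    new_hashes = block_hashes(new, block_size)
    delta = {}
    for index, digest in enumerate(new_hashes):
        # Blocks past the end of the old version are always treated as changed.
        if index >= len(old_hashes) or old_hashes[index] != digest:
            start = index * block_size
            delta[index] = new[start:start + block_size]
    return delta


def apply_delta(old: bytes, delta: dict[int, bytes], new_length: int,
                block_size: int = BLOCK_SIZE) -> bytes:
    """Rebuild the new version from the old version plus the changed blocks."""
    blocks = [old[i:i + block_size] for i in range(0, len(old), block_size)]
    for index, block in delta.items():  # dict keeps ascending index order
        if index < len(blocks):
            blocks[index] = block
        else:
            blocks.append(block)  # file grew: extra blocks arrive in order
    # Truncate in case the file shrank and stale old blocks remain at the end.
    return b"".join(blocks)[:new_length]


if __name__ == "__main__":
    old = bytes(1024 * 1024)          # a 1 MiB file of zeros
    edited = bytearray(old)
    edited[500_000:500_004] = b"edit"  # modify four bytes
    new = bytes(edited)

    delta = changed_blocks(old, new)
    print(sorted(delta))  # → [122]
    print(apply_delta(old, delta, len(new)) == new)  # → True
```

Only one 4 KiB block out of 256 is transferred for a four-byte edit, which is the storage and bandwidth saving described above; a byte-level scheme would shrink that further at the cost of more bookkeeping.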
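One of the questions raised for the IT team is how to determine whether you can rely on the backups you have created. Part of that check can be automated. The following is a minimal Python sketch, assuming the backup is a plain directory copy: it records a SHA-256 manifest of the source, then compares a restored copy in a sandbox directory against it. The function names file_sha256, build_manifest, and verify_restore are hypothetical, and the check only confirms that file contents match; it does not prove that an application such as a database can actually start from the restored data.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root, POSIX style) to its SHA-256 digest."""
    return {
        path.relative_to(root).as_posix(): file_sha256(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_restore(manifest: dict[str, str], restored_root: Path) -> list[str]:
    """Compare a restored tree against a manifest; an empty list means it matches."""
    restored = build_manifest(restored_root)
    problems = []
    for name, digest in manifest.items():
        if name not in restored:
            problems.append(f"missing: {name}")
        elif restored[name] != digest:
            problems.append(f"corrupted: {name}")
    # Files that appear only in the restored copy are also worth flagging.
    for name in sorted(restored.keys() - manifest.keys()):
        problems.append(f"unexpected: {name}")
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp, "source")
        (source / "docs").mkdir(parents=True)
        (source / "docs" / "report.txt").write_text("quarterly numbers")
        manifest = build_manifest(source)

        sandbox = Path(tmp, "sandbox")
        shutil.copytree(source, sandbox)  # stands in for a real restore step
        print(verify_restore(manifest, sandbox))  # → []
```

In practice the manifest would be stored alongside the backup at backup time, and the restore step would use whatever tool created the backup, so that the sandbox test exercises the real restoration path rather than a simple copy.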