What is data protection?
Data protection is the process of safeguarding important information from corruption, compromise or loss.
The importance of data protection increases as the amount of data created and stored continues to grow at unprecedented rates.
There is also little tolerance for downtime that can make it impossible to access important information.
Consequently, a large part of a data protection strategy is ensuring data can be restored quickly after any corruption or loss.
Protecting data from compromise and ensuring data privacy are other key components of data protection.
What are the principles of data protection?
The key principles of data protection are to safeguard and make available data under all circumstances. The term data protection is used to describe both the operational backup of data and business continuity/disaster recovery (BC/DR). Data protection strategies are evolving along two lines:
1. Data availability, and
2. Data management.
Data availability ensures users have the data they need to conduct business even if the data is damaged or lost.
There are two key areas of data management used in data protection:
1. Data lifecycle management, and
2. Information lifecycle management.
Data lifecycle management is the process of automating the movement of critical data to online and offline storage. Information lifecycle management is a comprehensive strategy for valuing, cataloging and protecting information assets from application and user errors, malware and virus attacks, machine failure or facility outages and disruptions.
More recently, data management has come to include finding ways to unlock business value from otherwise dormant copies of data for reporting, test/dev enablement, analytics and other purposes.
What is the purpose of data protection?
Storage technologies that can be used to protect data include a disk or tape backup that copies designated information to a disk-based storage array or a tape cartridge device so it can be safely stored. Mirroring can be used to create an exact replica of a website or files so they’re available from more than one place.
Storage snapshots can automatically generate a set of pointers to information stored on tape or disk, enabling faster data recovery, while continuous data protection (CDP) backs up all the data in an enterprise whenever a change is made.
What is data portability?
Data portability is the ability to move data among different application programs, computing environments or cloud services. It brings its own set of data protection problems and solutions. On the one hand, cloud-based computing makes it possible for customers to migrate data and applications between or among cloud service providers (CSPs). On the other hand, it requires safeguards against data duplication.
Either way, cloud backup is becoming more prevalent and popular. Organizations frequently move backup data to public clouds or clouds maintained by backup vendors. These backups can replace on-site disk and tape libraries, or they can serve as additional protected copies of data.
Backup has traditionally been the key to an effective data protection strategy. Data was periodically copied, typically each night, to a tape drive or tape library where it would sit until something went wrong with the primary data storage. That’s when the backup data would be accessed and used to restore lost or damaged data.
Backups are no longer a stand-alone function; they’re being combined with other data protection functions to save storage space and lower costs.
Backup and archiving, for example, have been treated as two separate functions. Backup’s purpose was to restore data after a failure. An archive provided a searchable copy of data. However, that led to redundant data sets.
Today, there are products that back up, archive and index data in a single pass. This approach saves organizations time and cuts down on the amount of data in long-term storage.
What is the convergence of disaster recovery and backup?
Another area where data protection technologies are coming together is in the merging of backup and disaster recovery (DR) capabilities. Virtualization has played a major role here, shifting the focus from copying data at a specific point in time to continuous data protection.
Historically, data backup has been about making duplicate copies of data. Disaster recovery, on the other hand, focuses on how backups are used once a disaster happens.
Snapshots and replication have made it possible to recover much faster from a disaster than in the past. When a server fails, data from a backup array is used in place of the primary storage, but only if steps have been taken to prevent that backup from being modified.
Those steps involve using a snapshot of the data from the backup array to immediately create a differencing disk. The original data from the backup array is then used for read operations. Write operations are directed to the differencing disk. This approach leaves the original backup data unchanged. While all this is happening, the failed server’s storage is being rebuilt and the data replicated from the backup array to the failed server’s newly rebuilt storage.
Once the replication is complete, the contents of the differencing disk are merged onto the server’s storage and users are back in business.
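The read and write redirection described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any vendor's implementation: the class name DifferencingDisk and its methods are hypothetical, and blocks are modeled as dictionary entries rather than real disk sectors.

```python
class DifferencingDisk:
    """Overlay that redirects writes away from a read-only backup image."""

    def __init__(self, backup_blocks):
        self._base = backup_blocks   # original backup data, never modified
        self._diff = {}              # block number -> data written during failover

    def read(self, block_no):
        # Prefer data written since failover; otherwise fall back to the backup.
        if block_no in self._diff:
            return self._diff[block_no]
        return self._base[block_no]

    def write(self, block_no, data):
        # Writes land only in the differencing disk.
        self._diff[block_no] = data

    def merge_into(self, rebuilt_storage):
        # After replication finishes, apply the accumulated writes.
        rebuilt_storage.update(self._diff)


backup = {0: b"boot", 1: b"orders-v1", 2: b"logs"}
disk = DifferencingDisk(backup)
disk.write(1, b"orders-v2")          # user activity during the outage

rebuilt = dict(backup)               # stands in for replication to the rebuilt server
disk.merge_into(rebuilt)
```

In this sketch the backup dictionary is never written to, which mirrors the point that the original backup data stays unchanged while users keep working.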
Data deduplication, also known as data dedupe, plays a key role in disk-based backup. Dedupe eliminates redundant copies of data to reduce the storage capacity required for backups. Deduplication can be built into backup software or can be a software-enabled feature in disk libraries.
Dedupe applications replace redundant data blocks with pointers to unique data copies. Subsequent backups only include data blocks that have changed since the previous backup. Deduplication began as a data protection technology and has moved into primary storage as a key feature for reducing the amount of capacity required on more expensive flash media.
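The idea of replacing redundant blocks with pointers can be illustrated with a short Python sketch. It assumes fixed-size blocks and SHA-256 fingerprints purely for illustration; the function names dedupe and rehydrate are hypothetical, and commercial products may chunk and fingerprint data differently.

```python
import hashlib


def dedupe(data, block_size=4):
    """Split data into fixed-size blocks and store each unique block once."""
    store = {}      # fingerprint -> block contents (unique copies)
    pointers = []   # one fingerprint per logical block, in order
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        fingerprint = hashlib.sha256(block).hexdigest()
        store.setdefault(fingerprint, block)
        pointers.append(fingerprint)
    return store, pointers


def rehydrate(store, pointers):
    """Rebuild the original data by following the pointers."""
    return b"".join(store[p] for p in pointers)


data = b"AAAABBBBAAAAAAAACCCC"   # five blocks, three of them identical
store, pointers = dedupe(data)
```

Here five logical blocks are represented by three stored blocks plus a list of pointers, and rehydrate(store, pointers) returns the original bytes.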
Continuous Data Protection (CDP) plays a key role in disaster recovery and enables fast restores of backup data. CDP enables organizations to roll back to the last good copy of a file or database, reducing the amount of information loss in the case of corruption or deletion of data. CDP started as a separate product category, but evolved to the point where it is now built into most replication and backup applications. CDP can also eliminate the need to keep multiple copies of data. Instead, organizations retain a single copy that’s updated continuously as changes occur.
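One way to picture the rollback capability is as a journal of every change that can be replayed up to a chosen point. The Python sketch below is an assumption-laden illustration of that idea, not a description of how any CDP product is built; ChangeJournal and its methods are hypothetical names.

```python
class ChangeJournal:
    """Records every write with a sequence number so an earlier state can be rebuilt."""

    def __init__(self):
        self._entries = []   # (sequence, key, value) in write order

    def record(self, key, value):
        seq = len(self._entries) + 1
        self._entries.append((seq, key, value))
        return seq

    def state_at(self, seq):
        """Replay the journal up to and including the given sequence number."""
        state = {}
        for entry_seq, key, value in self._entries:
            if entry_seq > seq:
                break
            state[key] = value
        return state


journal = ChangeJournal()
journal.record("report.txt", "draft")
good = journal.record("report.txt", "final")    # last known good version
journal.record("report.txt", "#corrupted#")     # corruption arrives later

restored = journal.state_at(good)
```

Because every change is captured as it happens, the restore point can be the write immediately before the corruption rather than the previous night's backup.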
How to set up effective enterprise data protection strategies
Modern data protection for primary storage involves using a built-in system that supplements, or replaces, backups and protects against the following potential problems:
1. Media failure. The goal here is to make data available even if a storage device fails. Synchronous mirroring is one approach in which data is written to a local disk and a remote site at the same time. The write is not considered complete until a confirmation is sent from the remote site, ensuring that the two sites are always identical. Mirroring requires 100% capacity overhead.
2. Redundant Array of Independent Disks (RAID) protection is an alternative that requires less overhead capacity. With RAID, physical drives are combined into a logical unit that’s presented as a single hard drive to the operating system. RAID enables the same data to be stored in different places on multiple disks. As a result, I/O operations overlap in a balanced way, improving performance and increasing protection.
RAID protection must calculate parity, a technique that checks whether data has been lost or written over when it’s moved from one storage location to another, and that calculation consumes computer resources.
The cost of recovering from a media failure is the time it takes to return to a protected state. Mirrored systems can return to a protected state quickly. RAID systems take longer because they must recalculate all the parity. Advanced RAID controllers don’t have to read an entire drive to recover data when doing a drive rebuild; they only need to rebuild the data that is on that drive. Given that most drives run at about one-third capacity, intelligent RAID can reduce recovery times significantly.
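To make the parity calculation concrete, the following Python sketch uses single XOR parity, the scheme associated with RAID levels such as RAID 5. The source does not say which RAID level it has in mind, so treat this as an assumption; the three-byte "drives" and the helper xor_blocks are illustrative only.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data_drives = [b"\x01\x02\x03", b"\x10\x20\x30", b"\xaa\xbb\xcc"]
parity = xor_blocks(data_drives)

# Simulate losing drive 1, then rebuild it from the survivors plus parity.
survivors = [data_drives[0], data_drives[2], parity]
rebuilt = xor_blocks(survivors)
```

The rebuild has to read every surviving drive and recompute the missing one, which is why returning to a protected state takes longer than with a mirror.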
3. Erasure coding is an alternative to advanced RAID that’s often used in scale-out storage environments. Like RAID, erasure coding uses parity-based data protection systems, writing both data and parity across a cluster of storage nodes. With erasure coding, all the nodes in the storage cluster can participate in the replacement of a failed node, so the rebuilding process doesn’t get CPU-constrained and it happens faster than it might in a traditional RAID array.
4. Replication is another data protection alternative for scale-out storage. Data is mirrored from one node to another or to multiple nodes. Replication is simpler than erasure coding, but it consumes at least twice the capacity of the protected data.
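The capacity trade-offs among these approaches come down to simple arithmetic. The sketch below compares them using a hypothetical helper, overhead_percent; the specific layouts (four data drives plus one parity drive, eight data fragments plus two parity fragments, three-way replication) are example assumptions, not figures from any particular product.

```python
def overhead_percent(data_units, redundancy_units):
    """Extra capacity needed, as a percentage of the protected data."""
    return 100.0 * redundancy_units / data_units


mirror = overhead_percent(1, 1)               # one extra full copy
raid_single_parity = overhead_percent(4, 1)   # 4 data drives + 1 parity drive
erasure_8_2 = overhead_percent(8, 2)          # 8 data fragments + 2 parity fragments
three_way_replica = overhead_percent(1, 2)    # two extra full copies
```

Under these assumptions, mirroring costs 100% extra capacity, the parity-based layouts cost 25%, and three-way replication costs 200%.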
5. Data corruption. When data is corrupted or accidentally deleted, snapshots can be used to set things right. Most storage systems today can track hundreds of snapshots without any significant effect on performance.
Storage systems using snapshots can work with key applications, such as Oracle and Microsoft SQL Server, to capture a clean copy of data while the snapshot is occurring. This approach enables frequent snapshots that can be stored for long periods of time.
When data becomes corrupted or is accidentally deleted, a snapshot can be mounted and the data copied back to the production volume, or the snapshot can replace the existing volume. With this method, minimal data is lost and recovery time is almost instantaneous.
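Both recovery paths, copying data back from a snapshot and replacing the volume with the snapshot, can be shown with a toy model. This Python sketch treats a snapshot as a frozen copy of a block-pointer table; the Volume class and its method names are hypothetical, and real storage systems implement snapshots at a much lower level.

```python
class Volume:
    """Toy volume whose snapshots are copies of a block-pointer table."""

    def __init__(self):
        self._blocks = {}      # block number -> data currently visible
        self._snapshots = {}   # snapshot name -> frozen pointer table

    def write(self, block_no, data):
        self._blocks[block_no] = data

    def read(self, block_no):
        return self._blocks.get(block_no)

    def snapshot(self, name):
        # Copies only the table of references, not the data itself.
        self._snapshots[name] = dict(self._blocks)

    def restore_block(self, name, block_no):
        # Copy one item back from the snapshot to the production volume.
        self._blocks[block_no] = self._snapshots[name][block_no]

    def revert(self, name):
        # Replace the whole volume with the snapshot's view.
        self._blocks = dict(self._snapshots[name])


vol = Volume()
vol.write(0, b"customers")
vol.write(1, b"invoices")
vol.snapshot("hourly")
vol.write(1, b"garbage")           # corruption after the snapshot
vol.write(2, b"temp")

vol.restore_block("hourly", 1)     # targeted repair; block 2 is kept
after_repair = (vol.read(1), vol.read(2))

vol.revert("hourly")               # full rollback; block 2 disappears
after_revert = (vol.read(1), vol.read(2))
```

The targeted repair keeps changes made after the snapshot, while the full revert discards them, which is the trade-off between the two methods.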
6. Storage system failure. To protect against multiple drive failures or some other major event, data centers rely on replication technology built on top of snapshots.
With snapshot replication, only blocks of data that have changed are copied from the primary storage system to an off-site secondary storage system. Snapshot replication is also used to replicate data to on-site secondary storage that’s available for recovery if the primary storage system fails.
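The "only changed blocks" behavior can be sketched by comparing two successive snapshots. The Python below is a simplified illustration with hypothetical function names; it assumes snapshots are dictionaries of block number to data and, for brevity, does not handle blocks deleted between snapshots.

```python
def changed_blocks(previous, current):
    """Return blocks that were added or modified since the previous snapshot."""
    return {no: data for no, data in current.items() if previous.get(no) != data}


def replicate(secondary, previous, current):
    """Ship only the delta to the secondary system; return how many blocks were sent."""
    delta = changed_blocks(previous, current)
    secondary.update(delta)
    return len(delta)


snap1 = {0: b"a", 1: b"b", 2: b"c"}
snap2 = {0: b"a", 1: b"B", 2: b"c", 3: b"d"}   # block 1 modified, block 3 added

secondary = dict(snap1)                         # secondary already holds snap1
sent = replicate(secondary, snap1, snap2)
```

Only two of the four blocks cross the wire in this example, yet the secondary copy ends up identical to the latest snapshot.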
7. Full-on data center failure. Protection against the loss of a data center requires a full disaster recovery plan. As with the other failure scenarios, there are multiple options. Snapshot replication, where data is replicated to a secondary site, is one option. However, the cost of running a secondary site can be prohibitive.
8. Cloud services are another alternative. Replication and cloud backup products and services can be used to store the most recent copies of data that is most likely to be needed if a major disaster occurs and to instantiate application images. The result is a rapid recovery in the event of a data center loss.
What is trending in data protection?
The latest trends in data protection policy and technology include:
Hyper-convergence. With the advent of hyper-convergence, vendors have started offering appliances that provide backup and recovery for physical and virtual environments that are hyper-converged, non-hyper-converged and mixed. Data protection capabilities integrated into hyper-converged infrastructure are replacing a range of devices in the data center.
Cohesity, Rubrik and other vendors offer hyper-convergence for secondary storage, providing backup, disaster recovery, archiving, copy data management and other nonprimary storage functions. These products integrate software and hardware, and they can serve as a backup target for existing backup applications in the data center. They can also use the cloud as a target and provide backup for virtual environments.
What is ransomware?
Ransomware is a type of malware that holds data hostage for an extortion fee. It is a fast-growing problem. Traditional backup methods have been used to protect data from ransomware. However, more sophisticated ransomware is adapting to and circumventing traditional backup processes.
The latest version of this malware slowly infiltrates an organization’s data over time so the organization ends up backing up the ransomware virus along with its data. This situation makes it difficult, if not impossible, to roll back to a clean version of the data.
To counter this problem, vendors are working on adapting backup and recovery products and methodologies to thwart the new ransomware capabilities.
What are the benefits of copy data management (CDM)?
Copy data management (CDM) cuts down on the number of copies of data an organization must save. CDM reduces the overhead required to store and manage data and simplifies data protection. CDM can speed up application release cycles, increase productivity and lower administrative costs through automation and centralized control.
The next step with CDM is to add more intelligence. Companies such as Veritas Technologies are combining CDM with their intelligent data management platforms.
Why does your business need disaster recovery as a service (DRaaS)?
Disaster recovery as a service (DRaaS) use is expanding as more options are offered and prices come down.
This service is being used for critical business systems where an increasing amount of data is being replicated rather than just backed up.
The importance of mobile data protection for you & your business
Data protection on mobile devices has its own challenges. It can be difficult to extract data from these devices. Inconsistent connectivity makes scheduling backups difficult, if not impossible.
Mobile data protection is further complicated by the need to keep personal data stored on mobile devices separate from business data.
Selective file sync and share is one approach to data protection on mobile devices. While it isn’t true backup, file sync-and-share products typically use replication to sync users’ files to a repository in the public cloud or on an organization’s network; that location must then be backed up.
File sync and share gives users access to the data they need from a mobile device, while synchronizing any changes they make to the data with the original copy. However, it doesn’t protect the state of the mobile device, which is needed for quick recovery.
Is there a difference between security and privacy? Yes, and you need to know what they are
In general:
Data security refers specifically to measures taken to protect the integrity of the data itself against manipulation and malware.
Privacy refers to controlling access to the data.
Understandably, a privacy breach can lead to data security issues.
Data protection and privacy
Data privacy laws and regulations vary from country to country and even from state to state, and new ones are constantly being created. China’s data privacy law went into effect June 1, 2017. The European Union’s General Data Protection Regulation (GDPR) went into effect in 2018.
Compliance with any one set of rules is complicated and challenging.
Coordinating among all the disparate rules and regulations is a massive undertaking. Being out of compliance can lead to steep fines and other penalties, including having to stop doing business in the country or region covered by the law or regulation.
For global organizations, experts recommend having a data protection policy that complies with the most stringent set of rules the business faces while, at the same time, using a security and compliance framework that covers a broad set of requirements.
The basics of data protection and privacy apply across the board and include:
1. Safeguarding data
2. Getting consent from the person whose data is being collected
3. Identifying the regulations that apply to the organization in question and the data it collects, and
4. Ensuring employees are fully trained in the nuances of data privacy and security.
Is your business in compliance with the new California Consumer Privacy Act (CCPA)? Does it have to be?
The California Consumer Privacy Act (CCPA) is a comprehensive consumer protection law that took effect on January 1, 2020. It is one of the most sweeping pieces of legislation enacted by a U.S. state to bolster consumer privacy. Following on the heels of the GDPR, the CCPA may mark the beginning of stricter U.S. consumer privacy protections.
The intentions of CCPA are to provide California residents with the right to:
1. Know what personal data is being collected about them.
2. Know whether their personal data is sold or disclosed and to whom.
3. Say “no” to the sale of personal data.
4. Access their personal data.
5. Request that a business delete any personal information it has collected from them.
6. Not be discriminated against for exercising their privacy rights.
The CCPA applies to any for-profit entity that collects consumers’ personal data, does business in California and satisfies at least one of the following thresholds:
1. Has annual gross revenues in excess of $25 million;
2. Buys or sells the personal information of 50,000 or more consumers or households; or
3. Earns more than half of its annual revenue from selling consumers’ personal information.
Organizations are also required to “implement and maintain reasonable security procedures and practices” in protecting consumer data.
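The applicability thresholds listed above reduce to a simple "any one of three" test. The Python sketch below encodes them exactly as stated in this article; the function name ccpa_applies and its parameter names are hypothetical, and the statute itself contains definitions and exceptions that this simplification ignores.

```python
def ccpa_applies(does_business_in_california, annual_gross_revenue_usd,
                 consumers_bought_or_sold, share_of_revenue_from_selling_data):
    """Apply the three thresholds listed above; meeting any one is sufficient."""
    if not does_business_in_california:
        return False
    return (
        annual_gross_revenue_usd > 25_000_000            # threshold 1
        or consumers_bought_or_sold >= 50_000            # threshold 2
        or share_of_revenue_from_selling_data > 0.5      # threshold 3
    )


large_retailer = ccpa_applies(True, 30_000_000, 0, 0.0)
data_broker = ccpa_applies(True, 1_000_000, 50_000, 0.1)
small_shop = ccpa_applies(True, 1_000_000, 100, 0.1)
out_of_state = ccpa_applies(False, 30_000_000, 0, 0.0)
```

In this sketch the large retailer and the data broker are covered, while the small shop and the business with no California activity are not.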
Responsibility and accountability
1. Implement processes to obtain parental or guardian consent for minors under 13 years and the affirmative consent of minors between 13 and 16 years to data sharing for purposes (Cal. Civ. Code § 1798.120(c)).
2. Provide a “Do Not Sell My Personal Information” link on the home page of the business’s website that directs users to a web page enabling them, or someone they authorize, to opt out of the sale of the resident’s personal information (Cal. Civ. Code § 1798.102).
3. Designate methods for submitting data access requests, including, at a minimum, a toll-free telephone number (Cal. Civ. Code § 1798.130(a)).
4. Update privacy policies with newly required information, including a description of California residents’ rights (Cal. Civ. Code § 1798.135(a)(2)).
5. Avoid requesting opt-in consent for 12 months after a California resident opts out (Cal. Civ. Code § 1798.135(a)(5)).
Sanctions and remedies
The following sanctions and remedies can be imposed:
1. Companies, activists, associations, and others can be authorized to exercise opt-out rights on behalf of California residents (Cal. Civ. Code § 1798.135(c)).
2. Companies that become victims of data theft or other data security breaches can be ordered in civil class action lawsuits to pay statutory damages of between $100 and $750 per California resident and incident, or actual damages, whichever is greater, and any other relief a court deems proper, subject to an option of the California Attorney General’s Office to prosecute the company instead of allowing civil suits to be brought against it (Cal. Civ. Code § 1798.150).
3. A fine up to $7,500 for each intentional violation and $2,500 for each unintentional violation (Cal. Civ. Code § 1798.155).
4. Privacy notices must be accessible and have alternative format access clearly called out.
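The dollar figures quoted above can be turned into a rough exposure estimate. The Python sketch below uses only the amounts stated in this article ($100 to $750 per resident and incident, $7,500 and $2,500 per violation); the function names are hypothetical, and how a court or regulator would actually count residents, incidents or violations is outside what the code models.

```python
def statutory_damages_range(residents_affected, incidents=1):
    """Low and high statutory damages for a breach, in U.S. dollars."""
    units = residents_affected * incidents
    return 100 * units, 750 * units


def damages_owed(statutory_award, actual_damages):
    """The greater of the statutory award and actual damages."""
    return max(statutory_award, actual_damages)


def civil_penalties(intentional_violations, unintentional_violations):
    """Maximum fines under the per-violation caps quoted above."""
    return 7_500 * intentional_violations + 2_500 * unintentional_violations


low, high = statutory_damages_range(10_000)   # one incident, 10,000 residents
fines = civil_penalties(2, 3)
```

For a single incident affecting 10,000 residents, the statutory range in this sketch runs from $1 million to $7.5 million before any actual damages are considered.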
Definition of personal data
CCPA defines personal information as: information that identifies, relates to, describes, is reasonably capable of being associated with, or could reasonably be linked, directly or indirectly, with a particular consumer or household such as a real name, alias, postal address, unique personal identifier, online identifier, Internet Protocol address, email address, account name, social security number, driver’s license number, passport number, or other similar identifiers.
An additional category covers information that identifies, relates to, describes, or is capable of being associated with a particular individual, including, but not limited to, their name, signature, Social Security number, physical characteristics or description, address, telephone number, passport number, driver’s license or state identification card number, insurance policy number, education, employment, employment history, bank account number, credit card number, debit card number, or any other financial information, medical information, or health insurance information.
The CCPA does not consider publicly available information to be personal information.
Key differences between CCPA and the European Union’s GDPR include:
1. The scope and territorial reach of each
2. Definitions related to protected information
3. Levels of specificity
4. Opt-out right for sales of personal information.
The CCPA’s definition of personal information differs from the GDPR’s. In some cases, the CCPA only considers data that was provided by a consumer and excludes personal data that was purchased by, or acquired through, third parties. The GDPR does not make that distinction and covers all personal data regardless of source. Even for sensitive personal information, the GDPR’s prohibition on processing does not apply if the information was manifestly made public by the data subject, under the exception in Art. 9(2)(e). As such, the definition in the GDPR is much broader than the one in the CCPA.