Multi-level Selective Deduplication for VM Snapshots in Cloud Storage


Wei Zhang, Hong Tang†, Hao Jiang†, Tao Yang, Xiaogang Li†, Yue Zeng†
University of California at Santa Barbara; † Aliyun.com Inc.

Motivations
- Virtual machines on the cloud use frequent backups to improve service reliability; used in Alibaba's Aliyun, the largest public cloud service in China.
- High storage demand: daily backup workload of hundreds of TB at Aliyun; 10,000+ VMs per cluster; large amounts of duplicate content.
- Limited resources for deduplication: no special hardware or dedicated machines; small CPU and memory footprint.

Focus and Related Work
- Previous work:
  - Version-based incremental snapshot backup: inter-block and inter-VM duplicates are not detected.
  - Chunk-based file deduplication: high cost for chunk lookup.
- Focus:
  - Parallel backup of a large number of virtual disks.
  - Large files for VM disk images.

Contributions
- A cost-constrained solution using very limited computing resources.
- Multi-level selective duplicate detection and parallel backup.


Requirements
- Negligible impact on the existing cloud service and VM performance: must minimize the CPU and I/O bandwidth consumed by the backup and deduplication workload (e.g., <1% of total resources).
- Fast backup speed: compute backups for 10,000+ users within a few hours each day, during light cloud workload.
- Fault tolerance constraint: adding data deduplication must not decrease the degree of fault tolerance.

Design Considerations
- Design alternatives: an external, dedicated backup storage system, or a decentralized, co-hosted backup system with full deduplication.
- Chosen design: a decentralized architecture running on a general-purpose cluster that co-hosts both the elastic computing service and the backup service.
- Multi-level deduplication: localize backup traffic, exploit data parallelism, and increase fault tolerance.
- Selective deduplication: use minimal resources while still removing most redundant content and achieving good efficiency.

Key Observations
- Inner-VM data characteristics: exploit unchanged data to localize deduplication.
- Cross-VM data characteristics: a small common dataset dominates the duplicates; VM OS and user data follow a Zipf-like distribution; OS and user data deserve separate consideration.

VM Snapshot Representation
- Segments are fixed-size; data blocks are variable-sized.

Processing Flow of Multi-level Deduplication (a code sketch follows this list)
1. Segment-level checkup: use the dirty bitmap to see which segments were modified.
2. Block-level checkup: divide a modified segment into variable-sized blocks and compare their signatures with the parent snapshot.
3. Checkup against the common data set (CDS): identify duplicate chunks present in the CDS.
4. Write new snapshot blocks: write new content chunks to storage.
5. Save recipes: save the segment metadata information.
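The five processing steps can be summarized in a short sketch. This is a minimal illustration under stated assumptions, not the paper's implementation: the helper names, the SHA-1 fingerprints, the dict-based chunk store, and the fixed 4 KB cuts (standing in for the variable-sized blocks) are all assumptions; only the three detection levels, the 2 MB segment size, and the ~4 KB average block size come from the slides.

    import hashlib

    SEGMENT_SIZE = 2 * 1024 * 1024   # fixed-size segments (2 MB in the evaluation)
    BLOCK_SIZE = 4096                # stand-in for variable-size blocks (~4 KB average)

    def fingerprint(block):
        # Content signature for duplicate detection; the hash choice is an assumption.
        return hashlib.sha1(block).digest()

    def chunk_blocks(segment):
        # Placeholder chunker: fixed cuts stand in for the content-defined
        # chunking implied by "variable-sized blocks".
        return [segment[i:i + BLOCK_SIZE] for i in range(0, len(segment), BLOCK_SIZE)]

    def backup_snapshot(disk, dirty_bitmap, parent_recipe, cds, store):
        """One snapshot backup pass over a VM disk; returns the new recipe.

        disk          -- bytes of the virtual disk image
        dirty_bitmap  -- per-segment modified flags since the parent snapshot
        parent_recipe -- per-segment lists of (source, fingerprint) pairs
        cds           -- set of fingerprints in the common data set
        store         -- dict mapping fingerprint -> chunk (toy chunk store)
        """
        recipe = []
        for offset in range(0, len(disk), SEGMENT_SIZE):
            idx = offset // SEGMENT_SIZE
            # Step 1, segment-level checkup: an unmodified segment is shared
            # with the parent snapshot without reading its data.
            if not dirty_bitmap[idx]:
                recipe.append(parent_recipe[idx])
                continue
            segment = disk[offset:offset + SEGMENT_SIZE]
            parent_fps = {fp for _, fp in parent_recipe[idx]}
            seg_meta = []
            for block in chunk_blocks(segment):
                fp = fingerprint(block)
                if fp in parent_fps:      # Step 2: block-level checkup within the VM
                    seg_meta.append(("parent", fp))
                elif fp in cds:           # Step 3: checkup against the common data set
                    seg_meta.append(("cds", fp))
                else:                     # Step 4: write new content chunks to storage
                    store[fp] = block
                    seg_meta.append(("new", fp))
            recipe.append(seg_meta)       # Step 5: save the segment metadata (recipe)
        return recipe

Note that the first two levels consult only the VM's own parent snapshot; the CDS lookup is the sole cross-VM query, which is what keeps most backup traffic local to the node.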
Architecture of Multi-level VM Snapshot Backup
- (Figure: components inside a cluster node; only the caption survives the conversion.)

Status and Evaluation
- Prototype system running on Alibaba's Aliyun cloud, based on Xen.
- 100 nodes, each with 16 cores, 48 GB of memory, and 25 VMs; backup and deduplication use <150 MB of memory per machine.
- Evaluation data from Aliyun's production cluster: 41 TB, 10 snapshots per VM.
- Segment size: 2 MB; average block size: 4 KB.

Data Characteristics of the Benchmark
- Each VM uses 40 GB of storage space on average; the OS disk and the user data disk each take ~50% of the space.
- OS data: 7 mainstream OS releases: Debian, Ubuntu, Redhat, CentOS, Win2003 32-bit, Win2003 64-bit, and Win2008 64-bit.
- User data: from 1,323 VM users.

Impacts of 3-Level Deduplication
- Level 1: segment-level detection within a VM.
- Level 2: block-level detection within a VM.
- Level 3: common data block detection across VMs.

Impact for Different OS Releases
- OS and user data are considered separately: both follow a Zipf-like distribution, but their popularity grows differently as the cluster size and the number of VM users increase.
- Commonality among OS releases: 1 GB of common OS metadata covers 70+%.

Cumulative Coverage of Popular User Data
- Coverage is the summation of covered data block size × frequency (a worked sketch follows this slide).
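As a worked sketch of this metric: the coverage of a candidate CDS is the sum of size × frequency over the blocks placed in it, and the Zipf-like popularity suggests that filling the CDS with the most-referenced blocks first captures most duplicate bytes. The function below is illustrative; the greedy policy and all names are assumptions, and only the size × frequency definition comes from the slide.

    def cds_coverage(blocks, cds_capacity):
        """Cumulative coverage of a size-bounded common data set (CDS).

        blocks       -- iterable of (size_bytes, frequency) per unique block
        cds_capacity -- CDS budget in bytes
        A block covers size * frequency bytes: the bytes it deduplicates
        across all snapshots that reference it.
        """
        blocks = list(blocks)
        total = sum(size * freq for size, freq in blocks)
        used = covered = 0
        # Greedy fill, most-referenced blocks first: under a Zipf-like
        # distribution a small prefix captures most of the duplicate bytes.
        for size, freq in sorted(blocks, key=lambda b: b[1], reverse=True):
            if used + size > cds_capacity:
                continue
            used += size
            covered += size * freq
        return covered / total   # fraction of all referenced bytes covered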
Space Saving Compared to Perfect Deduplication as CDS Size Increases
- A 100 GB CDS (1 GB index) achieves 75% of perfect deduplication (a back-of-envelope check of the index size follows this slide).
- Impact of dataset-size increase: (figure; only the title survives the conversion.)
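A back-of-envelope check of the "100 GB CDS, 1 GB index" figure, assuming the 4 KB average block size from the evaluation setup and roughly 40 bytes per index entry (a 20-byte SHA-1 fingerprint plus location data; the entry size is an assumption, not from the slides):

    CDS_BYTES = 100 * 2**30     # 100 GB common data set
    AVG_BLOCK = 4 * 2**10       # 4 KB average block size (evaluation setup)
    ENTRY_BYTES = 40            # assumed: ~20 B SHA-1 fingerprint + location info

    entries = CDS_BYTES // AVG_BLOCK          # 26,214,400 unique blocks
    index_gib = entries * ENTRY_BYTES / 2**30
    print(entries, round(index_gib, 2))       # -> 26214400 0.98, i.e. ~1 GB of index

Under these assumptions the arithmetic reproduces the 1 GB index quoted on the slide.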
Conclusions
- A multi-level selective deduplication scheme for VM snapshots:
  - Inner-VM deduplication localizes backup and exposes more parallelism.
  - Global deduplication with a small common data set that appears across OS and data disks.
- Uses less than 0.5% of memory per node to meet a stringent cloud resource requirement, while accomplishing 75% of what perfect deduplication does.
- Experiments: achieves 500 TB/hour on a 1000-node cloud cluster and reduces backup bandwidth by 92%, from 500 TB/hour of logical snapshot data to 40 TB/hour actually transferred.