CS Seminar: Migratory Compression: Coarse-grained Data Reordering to Improve Compressibility

Friday, March 21, 2014, 2:00pm-3:00pm
Engineering Building, Room 4201
Fred Douglis, EMC Corporation

Abstract

We propose Migratory Compression (MC), a coarse-grained data transformation, to improve the effectiveness of traditional compressors in modern storage systems. In MC, similar data chunks are relocated so that they sit next to each other, which improves compression factors. After decompression, migrated chunks return to their previous locations. We evaluate the compression effectiveness and overhead of MC, explore reorganization approaches on a variety of datasets, and present a prototype implementation of MC in a commercial deduplicating file system. We also compare MC to the more established technique of delta compression, which is significantly more complex to implement within file systems. We find that Migratory Compression improves compression effectiveness compared to traditional compressors by 11 percent to 105 percent, with relatively low impact on runtime performance. Frequently, adding MC to a relatively fast compressor like gzip results in compression that is more effective in both space and runtime than slower alternatives. In archival migration, MC improves gzip compression by 44–157 percent. Most importantly, MC can be implemented in broadly used, modern file systems.

Joint work with Xing Lin (Univ. of Utah) and Guanlin Lu, Philip Shilane, and Grant Wallace (EMC Corporation, Data Protection and Availability Division). This will be an expanded version of the talk presented by Xing at FAST'14.

Speaker's Bio

Fred Douglis holds a Ph.D. in computer science from U.C. Berkeley. He has worked in industrial applied research throughout his career, at Matsushita, AT&T, IBM, and currently EMC. He has also been a visiting professor at VU Amsterdam and Princeton University. He received an IBM Outstanding Technical Achievement award for his contributions to System S, productized as InfoSphere Streams. His research interests include storage, distributed systems, and Internet tools and performance. He served as EIC of IEEE Internet Computing from 2007 to 2010 and has been on its editorial board since 1999. He has published one book, 40 workshop or conference papers, 7 journal or magazine articles, and over 50 patents and patent applications.
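
Illustration of the abstract's reordering idea

The abstract describes relocating similar chunks next to each other before compression and returning them to their original positions after decompression. The following is a minimal Python sketch of that idea only, not the implementation presented in the talk. The fixed 4 KiB chunk size, the 16-byte window, the min-CRC32 similarity feature, and the function names chunk_feature, mc_compress, and mc_decompress are all assumptions made for illustration; zlib stands in for gzip.

```python
import zlib

CHUNK_SIZE = 4096  # assumed fixed chunk size for this sketch
WINDOW = 16        # assumed sliding-window width for the similarity feature


def chunk_feature(chunk: bytes) -> int:
    """Toy similarity feature: the minimum CRC32 over all sliding windows.

    Chunks that share most of their content are likely to share this value,
    so sorting by it tends to place similar chunks next to each other.
    """
    if len(chunk) < WINDOW:
        return zlib.crc32(chunk)
    return min(zlib.crc32(chunk[i:i + WINDOW])
               for i in range(len(chunk) - WINDOW + 1))


def mc_compress(data: bytes) -> tuple[list[int], bytes]:
    """Reorder chunks by feature, then compress the reordered stream.

    Returns the chunk order (needed to undo the migration) and the blob.
    """
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    # order[k] is the original index of the chunk stored at position k.
    order = sorted(range(len(chunks)), key=lambda i: chunk_feature(chunks[i]))
    migrated = b"".join(chunks[i] for i in order)
    return order, zlib.compress(migrated)


def mc_decompress(order: list[int], blob: bytes) -> bytes:
    """Decompress, then move every chunk back to its original position."""
    migrated = zlib.decompress(blob)
    n = len(order)
    # Only the last original chunk can be shorter than CHUNK_SIZE.
    last_len = len(migrated) - (n - 1) * CHUNK_SIZE if n else 0
    restored = [b""] * n
    pos = 0
    for idx in order:
        size = last_len if idx == n - 1 else CHUNK_SIZE
        restored[idx] = migrated[pos:pos + size]
        pos += size
    return b"".join(restored)
```

Calling mc_decompress(*mc_compress(data)) returns the original bytes. Any space saving depends on the data: DEFLATE-based compressors such as zlib and gzip only find matches within a 32 KiB window, so moving similar chunks closer together can help when the similar content was originally farther apart than that.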