Shredded Storage in SharePoint 2013 Preview

Greetings from SharePoint Fest in Chicago! Numerous discussions are going on here about a small but important new feature in SharePoint 2013: shredded storage. There’s a surprising amount of misinformation about this feature, so I thought I would try to clarify its purpose and dependencies.

As you probably know, documents stored in a library or as attachments are stored as binary large objects (BLOBs) in the content database, by default. Remote BLOB Storage (RBS) is a set of APIs that let you move BLOBs out of the SQL Server content database to another storage mechanism.

You can read my whitepaper on BLOB externalization for details.

In SharePoint 2010, there was room for improvement in both storage utilization and I/O performance for documents. First, storage: if version history is enabled on a document library, each new version results in a new BLOB for that document. Conceptually, a 1MB file with 10 versions consumes 10MB of storage.

What a lot of people don’t consider is that a “new version” doesn’t mean just a change to the document; it can also mean a change to metadata. So if a user changes a metadata field, that is a new version, and a copy of the BLOB is created, even if no change was made to the document itself!

So BLOBs can proliferate quickly and, to put it bluntly, “pointlessly.” By the way, it’s a best practice to set version retention limits on any library where version history is enabled.
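To make the arithmetic concrete, here is a minimal Python sketch of the full-copy behavior described above. It is only a back-of-the-envelope model, not SharePoint code: the function name, the "content"/"metadata" labels, and the `retention_limit` parameter are all hypothetical names I made up for illustration.

```python
def full_copy_footprint(file_size_mb, changes, retention_limit=None):
    """Storage used when every version keeps its own full BLOB copy.

    changes: list of "content" or "metadata" edits made after the initial upload.
    retention_limit: hypothetical cap on the number of versions kept (None = keep all).
    """
    # The initial upload is one version; every later change, of either kind, adds one more.
    versions = 1 + len(changes)
    if retention_limit is not None:
        versions = min(versions, retention_limit)
    return versions * file_size_mb


# A 1MB file followed by 9 edits, 6 of which touch only metadata.
edits = ["content"] * 3 + ["metadata"] * 6
print(full_copy_footprint(1, edits))                     # 10
print(full_copy_footprint(1, edits, retention_limit=5))  # 5
```

Notice that the kind of edit never enters the calculation. In this model a metadata-only change costs exactly as much as a content change, which is the whole problem.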

Second, I/O performance is problematic in SharePoint 2010. There’s an unnecessary file read that occurs when changes—at least to Office documents—are uploaded to the SharePoint web server.

At the highest level, what SharePoint 2013 shredded storage does is “chunk” or “page” the BLOB into numerous smaller shreds. So a single BLOB is now a construct made up of numerous shreds.

One result of this architecture is an effect similar to deduplication or single instancing: only differences are saved, not entire BLOBs. So, for example, if you have versioning enabled and a user makes a change to a document, only changed shreds are added to the storage footprint of that document. Shreds that have not changed from the previous version are simply “associated” with both versions.

You can see significant improvements in storage utilization. That same 1MB file with 10 versions may be consuming 2.2MB of storage, for example.

Shredded storage also reduces the amount of information about a file that has to be retrieved by the web server from the content database, so I/O improves.
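The idea can be sketched with a toy model in Python. To be clear, this is not how SharePoint implements shredded storage internally (those details are not well documented yet); it is a simple content-addressed chunk store that I am using to illustrate the concept. The `ShredStore` class, the fixed 64 KB `SHRED_SIZE`, and the use of SHA-256 hashes to identify shreds are all assumptions made for this sketch.

```python
import hashlib

SHRED_SIZE = 64 * 1024  # assumed shred size, for this sketch only


class ShredStore:
    """Toy content-addressed store: identical shreds are kept only once."""

    def __init__(self):
        self.shreds = {}    # SHA-256 digest -> shred bytes
        self.versions = []  # each version is an ordered list of digests

    def save(self, data):
        """Store a new version and return the number of bytes newly written."""
        written = 0
        digests = []
        for offset in range(0, len(data), SHRED_SIZE):
            shred = data[offset:offset + SHRED_SIZE]
            digest = hashlib.sha256(shred).hexdigest()
            if digest not in self.shreds:  # only unseen shreds cost storage and I/O
                self.shreds[digest] = shred
                written += len(shred)
            digests.append(digest)  # unchanged shreds are just "associated"
        self.versions.append(digests)
        return written

    def load(self, version):
        """Reassemble a version from its shreds."""
        return b"".join(self.shreds[d] for d in self.versions[version])

    def footprint(self):
        """Total bytes held across all versions."""
        return sum(len(s) for s in self.shreds.values())


# A 1MB document made of 16 distinct shreds, then 9 small in-place edits.
doc = bytearray(b"".join(bytes([i]) * SHRED_SIZE for i in range(16)))
original = bytes(doc)
store = ShredStore()
writes = [store.save(original)]
for k in range(1, 10):
    doc[k * SHRED_SIZE] = 255  # each edit touches exactly one shred
    writes.append(store.save(bytes(doc)))

print(writes[0])                  # 1048576 (the whole file on first upload)
print(writes[1])                  # 65536 (one shred for a small edit)
print(store.footprint() / 2**20)  # 1.5625 (MB held for all 10 versions)
```

In this toy run, 10 versions occupy about 1.56MB instead of 10MB, and each save after the first writes a single shred rather than the whole file. The numbers depend entirely on the assumed shred size and on how the edits fall, so don't read them as SharePoint measurements. Also note that this naive fixed-offset chunking would do poorly if an edit inserted bytes and shifted everything after it.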

With that conceptual introduction in place, let me list a few things you need to know, several of which I’ve found misrepresented in the community:

  • Shredded storage is, on the whole, a good thing, and is on by default.
  • You can disable (or re-enable) shredded storage on a per-web application basis.
  • BLOBs are not shredded on an upgrade, but are shredded when uploaded or modified.
  • Shredded storage is a SharePoint 2013 feature, and it works whether the content database runs on SQL Server 2008 R2 or SQL Server 2012.
  • Shredded storage is different from Cobalt. Cobalt is a framework that allows Office client applications to efficiently synchronize changes to SharePoint using the File Synchronization via SOAP over HTTP (FSSHTTP) protocol. Shredded storage is about how a document is shredded, stored, and reassembled by SQL Server. I’m hearing lots of people suggest that shredded storage works only on Office documents. Not true. Such statements conflate shredded storage with Cobalt. When we look inside a content database on SharePoint 2013, we see PDFs and other file formats being shredded as well.
  • Shredded storage is independent of RBS. You can use RBS with or without shredded storage, and vice versa. Now whether you would use RBS with shredded storage is another question. Folks in the community are currently running tests to determine the performance implications of doing so. My guess is that many of the performance advantages of RBS that I saw with customers, and that both Microsoft and I have documented in white papers, will be reduced or eliminated due to shredded storage. There was also a benefit in SharePoint 2010 to running RBS to store BLOBs on SAN and NAS devices that support deduplication. Shredded storage might very well reduce or eliminate that benefit. However, RBS will continue to be critically important in hierarchical storage management, where you are managing tiers of storage (and therefore cost and other characteristics) based on business rules.

So that’s the “net net” of shredded storage.

The inner workings are still not well documented, so I’ll strive in a future article to detail how shredded storage works. Even some of the “official” training and documentation I’ve seen has holes, particularly in relation to the interaction of Cobalt and shredded storage, and how shredded storage does (or does not) help I/O for non-Office document formats.

The good news is you don’t really have to understand how it works, just that it does work. The bottom line is that the SharePoint web server is doing more work, and SQL Server is doing more work, to reduce I/O bottlenecks. Because I/O is likely to be the number-one bottleneck in SharePoint performance, this is all quite desirable. And, along the way, the storage footprint of a document can be reduced—perhaps significantly.

Each release of SharePoint offers a “feature” that is terribly named (does “shredded storage” sound like a good thing?), poorly documented, and misrepresented in the community. This is one of them for SharePoint 2013.

The situation will improve as Microsoft continues to release new documentation on the way to RTM. Where I’ve been vague or have left open questions, it was with the goal of being accurate here today, and coming back with more information later once it’s been fully vetted.

I hope I’ve hit that target, and I am definitely open to feedback from folks who have dug deep into this feature. My goal is simply to help clarify the feature for us all.

I’d like to give a HUGE shout-out to Jeremy Thake and Randy Williams at AvePoint, and to a great guy at Microsoft (I didn’t have time to check whether I could thank him publicly), each of whom is dedicated to helping us all succeed with SharePoint 2013! Thank you!!

Discuss this Blog Entry 2

on May 22, 2013

Shredded storage testing framework is available here:

http://shreddedstorage.codeplex.com

on Jun 24, 2013

Nice Post Dan...!!!

What's Dan Holme's Viewpoint on SharePoint Blog?

SharePoint expert Dan Holme shares tips, how-to's, ideas, and news about all things SharePoint, and more.

Contributors

Dan Holme

Dan Holme's 18 years of experience and his impact on hundreds of thousands of IT professionals and business decision makers have earned him a reputation as one of the world's most respected...