14 February, 2006

Storing documents in the database versus on the file sytem

The long-term strategy for storage technology at Microsoft is to take advantage of SQL Server relational database technology wherever possible, and SharePoint Products and Technologies is a showcase example. In SharePoint, all content including BLOBs (Office docs, images, etc.) and configuration information is stored in SQL databases.

Recently, I have come across a number of DBAs that religiously refuse to have any BLOBs stored in their databases. This obviously limits how custom applications handling images and other documents can be designed. In addition, Windows SharePoint Services is not a popular technology with those DBAs.

There are a number of significant benefits of having your BLOBs stored in the database. First of all, you gain all the ACID properties of your transactions. Trying to obtain that in an architecture with data in the database and BLOBs on the file system is certainly not a trivial challenge. Secondly, BLOB data is backed up with the database easing administration. Another benefit is that SQL Server Full Text Search operations can be performed against formatted text-based data contained within BLOBs (eg. Word docs, PDF docs, etc.).

SharePoint takes fully advantage of all these benefits. The benefits do however come with a cost. Because SQL Server breaks BLOBs up into chunks that fit on database pages, there is a performance overhead when reassembling the BLOBs on retrieval compared to if they were stored on a file system.

In summary, there is no one right answer to where you should store your BLOBs. It all depends on the context. You need to consider the performance issues in the database and hold them up against the benefits of having your BLOBs stored in the database. A good example of resources that are better stored on the file system rather than in the database is images that are typically referenced via HTTP HREF.

For SharePoint, where BLOBs (mainly Office docs) are not extraordinary large files, the benefits definitely outweighs the slight performance overhead. Images that are referenced from within the HTML in the SharePoint site definitions are stored on the file system. These scnerios are good to keep in mind when considering BLOB storage strategy for your own custom applications.

2 comments:

Marcus Rosen said...

Good Article that discusses using BLOBs in SQL Server 2000 on MS TechNet:

http://www.microsoft.com/technet/prodtechnol/sql/2000/reskit/part3/c1161.mspx?mfr=true

Anonymous said...

cons:
* there is already a high-specializid "database" to store files... commonly called "filesystem".
pro:
* in database (application) you can more easy build in features like versioning, authorization, authentication, who-accessed-when, ...
* single-point of consistent backup. (with all the problems to do this with a DB)
* referiential integrety (foreign-key). (e.g. you cant delete document X out of DB, as long something links to it)