When you are designing a data warehouse solution that publishes out data marts, always be sure that your warehouse is granularity agnostic. Granularity means the level of detail you track (defined more in depth below). Being agnostic means that you don't care about or have knowledge of something. To be "granularity agnostic" means that your warehouse doesn't care or have knowledge of how granular your data is. In essence, you design it up front so that it can get as much or as little detail as possible.
For example, your warehouse could start out housing daily items. At some point, you may have a need to trickle feed your warehouse (per events, possibly multiple times per day). If your warehouse is granularity agnostic, you do not have to change your database design. To do that, we borrow the concept of batching, which moves the level of detail away from time or any other element of granularity. You can batch as much or as little as your want. That means we can pull in as much or as little detail as we need without making changes to our database design. And it's so simple it doesn't really require big up front design.
This is how Wikipedia defines granularity of data:
The granularity of data refers to the fineness with which data fields are sub-divided. For example, a postal address can be recorded, with low granularity, as one field:
- address = 200 2nd Ave. South #358, St. Petersburg, FL 33701-4313 USA
or with high granularity, as many fields:
- street address = 200 2nd Ave. South #358
- city = St. Petersburg
- postal code = FL 33701-4313
- country = USA
or even higher granularity:
- street number = 200
- street = 2nd Ave. South #358
- city = St. Petersburg
- postal code state = FL
- postal-code-first-part = 33701
- postal-code-second-part = 4313
- country = USA
Higher granularity has overheads for data input and storage, but offers benefits in flexibility of data processing.
To change that a little, we think of granularity within the concept of Data Warehousing to mean how finely you track events/changes. For example, if you think of investing, a trade could go through several stages during the day. At the start of a day a trade could be set up. Then it gets approved. Then it might get settled. All before the close of the day. If your grain is on a daily basis, you will only see the trade as it exists at the end of the day. The day before you don't see anything with this trade and the day of you only see the trade in a settled status. If instead you are trickle feeding, you see every one of these events in your warehouse.
Now think in terms of making changes to how often you feed the data to the warehouse. If you want to change it from daily to hourly, the warehouse design should not have to change at all. If you want to move from daily to a trickle feed, it should handle that appropriately. One way it is achieved is by using batching and surrogate keys. That is a data warehouse that is granularity agnostic.