Thursday, August 23, 2007

Wherescape Red and creating dimensions

My data source is a very denormalised reporting-style source system extract. I need to pluck out the dimensions and facts that are embedded in this source data, and I was hoping Red's much touted code generation would help me through this.

Dimension #1
- Dragged a load table into the dimension middle panel and specified the name
- Selected Normal dimension type
- Clicked OK to accept definition
- Clicked Finish to avoid creating the dimension based on the load table
- Deleted lots of irrelevant columns
- Right-clicked the dimension (in the left panel) and selected Create (recreate)
- Right-clicked the dimension and selected Properties
- Specified Update Procedure as ***Create New Procedure ***
- Selected the business key column
- Right-clicked the dimension and selected Update

Phew, I'm done! It didn't take long but I had a lot of steps to remember. It strikes me that a suitably constructed wizard could have completed a cycle of creating multiple dimensions and facts from one load table.

Dimension #2
More of the same, except that the Update failed with error:

-2 dim_2 update FAILED with error 2601 Step 100. Error Msg: Cannot insert duplicate key row in object 'dbo.dim_2' with unique index 'dim_2_idx_A'.

The dim_2 table was partially populated, so it appears a specific cursor row upset it partway through the data load. My first guess was there were multiple cursor rows for the same business key. However queries on the load table didn't find any.

Turns out that the generated update procedure doesn't handle nulls. To fix this I had to:
- right-click the dimension
- select properties
- click Rebuild for the update procedure
- tick Allow where clause editing and click OK to leave the business key as is
- type where business_key_column is not null and click OK

I guess there may be a tick box somewhere to make the procedure ignore nulls but I couldn't find it.

Now on to dimension hierarchies! I right-clicked the dimension and selected Add Hierarchy and it only offered me the option to reorganise the existing fields. I had to create separate dimension columns for calculated hierarchies by right-clicking the dimension and selecting Add Column.

Comments:
- The fields are mostly self explanatory but not very typo-proof, being free text fields.
- At various points it is necessary to update the dialog box which is done by clicking the Update button. Annoyingly this closes the dialog box and I had to reopen the column properties to be able to continue defining the column.
- The transformation tab could potentially hold some serious sql but is fixed size with limited IDE functionality.

No comments: