http://laurentschneider.com/wordpress/2007/09/isnumber-in-sql.html
select x,
case when
regexp_like(:x, ‘^[+-]?(\d+\.?\d*\.\d+)([eE][+-]?\d+)?$’)
then to_number(x)
end n
from t;
Friday, September 28, 2007
Monday, September 24, 2007
Wherescape Red and multi stage ETL
In Red, you associate dimensions with a fact in the Stage table. If your load table has data that needs to be transformed before it can be used to join to the appropriate dimension business keys then you may want a 2 step stage process to keep the transformations readable.
For instance if your load table has a column that contains data in the form ddmon (e.g. 01SEP) this can be converted to a date before joining to dim_date to get the dim_date_key. So stage_table_1 would have a date column derived from the ddmon column and stage_table_2 would have the date column plus a dim_date_key column using the relevant join criteria.
Of course stage_table_2 could be derived in one step but my preference is to keep the transformations simple.
For instance if your load table has a column that contains data in the form ddmon (e.g. 01SEP) this can be converted to a date before joining to dim_date to get the dim_date_key. So stage_table_1 would have a date column derived from the ddmon column and stage_table_2 would have the date column plus a dim_date_key column using the relevant join criteria.
Of course stage_table_2 could be derived in one step but my preference is to keep the transformations simple.
Wednesday, September 19, 2007
Wherescape Red and data repositories
I wanted to create a separate data repository from WslWarehouse (which is full of metadata), so I created a new SQL Server database instance called Test_Warehouse on the same server.
I had to create a system DSN called Test_Warehouse pointing to this instance. I don't see the need for this step. Red supports SQL Server in theory which means that I should be able to specify the server, database and credentials somewhere.
Then in Red I right-clicked Connections, selected New Object and specified properties as follows:
Name of object: Test_Warehouse
Connection Method: IP
ODBC Source: Test_Warehouse
(No need for credentials as I specified Windows Authentication in the DSN)
The next step is to get the load table to create itself in Test_Warehouse. I thought that Change Connect/Schema would do the trick but it made no difference. After much messing around and consulting help files I still didn't figure out how to do it. So it's back to mingling meta and data...
I had to create a system DSN called Test_Warehouse pointing to this instance. I don't see the need for this step. Red supports SQL Server in theory which means that I should be able to specify the server, database and credentials somewhere.
Then in Red I right-clicked Connections, selected New Object and specified properties as follows:
Name of object: Test_Warehouse
Connection Method: IP
ODBC Source: Test_Warehouse
(No need for credentials as I specified Windows Authentication in the DSN)
The next step is to get the load table to create itself in Test_Warehouse. I thought that Change Connect/Schema would do the trick but it made no difference. After much messing around and consulting help files I still didn't figure out how to do it. So it's back to mingling meta and data...
Wherescape Red and fixed width text source
Two annoyances:
1. The data load wizard doesn't let you throw anything away in the source. You have to create junk columns to hold separators and other unneeded file content.
2. If you use the Back button in the data load wizard, it appears that you lose your column specifications. Actually the column definition has already been created and you are creating a duplicate. Clicking ok to Create and Load when you finish the wizard will not work because of this, and you will have to go back and delete the old column definitions.
If you get the following errors, create a c:\temp directory.
Attempt to bulk insert has FAILED![Microsoft][SQL Native Client][SQL Server]Cannot bulk load because the file "c:\temp\wsl6.fmt" could not be opened. Operating system error code 3(The system cannot find the path specified.).
[Microsoft][SQL Native Client][SQL Server]Cannot bulk load because the file "c:\temp\wsl6.fmt" could not be opened. Operating system error code 3(The system cannot find the path specified.).
.
1. The data load wizard doesn't let you throw anything away in the source. You have to create junk columns to hold separators and other unneeded file content.
2. If you use the Back button in the data load wizard, it appears that you lose your column specifications. Actually the column definition has already been created and you are creating a duplicate. Clicking ok to Create and Load when you finish the wizard will not work because of this, and you will have to go back and delete the old column definitions.
If you get the following errors, create a c:\temp directory.
Attempt to bulk insert has FAILED![Microsoft][SQL Native Client][SQL Server]Cannot bulk load because the file "c:\temp\wsl6.fmt" could not be opened. Operating system error code 3(The system cannot find the path specified.).
[Microsoft][SQL Native Client][SQL Server]Cannot bulk load because the file "c:\temp\wsl6.fmt" could not be opened. Operating system error code 3(The system cannot find the path specified.).
.
Wherescape Red and creating facts
Much the same as dimensions but I got stuck going from load to fact - I eventually figured out that stage tables are not just a nice to have, they are pretty much essential as there are wizard steps that won't appear when constructing facts directly from load tables.
Tuesday, September 18, 2007
Wherescape Red and projects
At first I couldn't figure out how to drag-drop or multi-drag objects into a newly created project. Surely I wouldn't have to do them one at a time?? Turns out the trick is to select the container (for instance, Load Table) then make my selections from the middle pane, right-click and Add to Project.
Thursday, August 23, 2007
Wherescape Red and creating dimensions
My data source is a very denormalised reporting-style source system extract. I need to pluck out the dimensions and facts that are embedded in this source data, and I was hoping Red's much touted code generation would help me through this.
Dimension #1
- Dragged a load table into the dimension middle panel and specified the name
- Selected Normal dimension type
- Clicked OK to accept definition
- Clicked Finish to avoid creating the dimension based on the load table
- Deleted lots of irrelevant columns
- Right-clicked the dimension (in the left panel) and selected Create (recreate)
- Right-clicked the dimension and selected Properties
- Specified Update Procedure as ***Create New Procedure ***
- Selected the business key column
- Right-clicked the dimension and selected Update
Phew, I'm done! It didn't take long but I had a lot of steps to remember. It strikes me that a suitably constructed wizard could have completed a cycle of creating multiple dimensions and facts from one load table.
Dimension #2
More of the same, except that the Update failed with error:
-2 dim_2 update FAILED with error 2601 Step 100. Error Msg: Cannot insert duplicate key row in object 'dbo.dim_2' with unique index 'dim_2_idx_A'.
The dim_2 table was partially populated, so it appears a specific cursor row upset it partway through the data load. My first guess was there were multiple cursor rows for the same business key. However queries on the load table didn't find any.
Turns out that the generated update procedure doesn't handle nulls. To fix this I had to:
- right-click the dimension
- select properties
- click Rebuild for the update procedure
- tick Allow where clause editing and click OK to leave the business key as is
- type where business_key_column is not null and click OK
I guess there may be a tick box somewhere to make the procedure ignore nulls but I couldn't find it.
Now on to dimension hierarchies! I right-clicked the dimension and selected Add Hierarchy and it only offered me the option to reorganise the existing fields. I had to create separate dimension columns for calculated hierarchies by right-clicking the dimension and selecting Add Column.
Comments:
- The fields are mostly self explanatory but not very typo-proof, being free text fields.
- At various points it is necessary to update the dialog box which is done by clicking the Update button. Annoyingly this closes the dialog box and I had to reopen the column properties to be able to continue defining the column.
- The transformation tab could potentially hold some serious sql but is fixed size with limited IDE functionality.
Dimension #1
- Dragged a load table into the dimension middle panel and specified the name
- Selected Normal dimension type
- Clicked OK to accept definition
- Clicked Finish to avoid creating the dimension based on the load table
- Deleted lots of irrelevant columns
- Right-clicked the dimension (in the left panel) and selected Create (recreate)
- Right-clicked the dimension and selected Properties
- Specified Update Procedure as ***Create New Procedure ***
- Selected the business key column
- Right-clicked the dimension and selected Update
Phew, I'm done! It didn't take long but I had a lot of steps to remember. It strikes me that a suitably constructed wizard could have completed a cycle of creating multiple dimensions and facts from one load table.
Dimension #2
More of the same, except that the Update failed with error:
-2 dim_2 update FAILED with error 2601 Step 100. Error Msg: Cannot insert duplicate key row in object 'dbo.dim_2' with unique index 'dim_2_idx_A'.
The dim_2 table was partially populated, so it appears a specific cursor row upset it partway through the data load. My first guess was there were multiple cursor rows for the same business key. However queries on the load table didn't find any.
Turns out that the generated update procedure doesn't handle nulls. To fix this I had to:
- right-click the dimension
- select properties
- click Rebuild for the update procedure
- tick Allow where clause editing and click OK to leave the business key as is
- type where business_key_column is not null and click OK
I guess there may be a tick box somewhere to make the procedure ignore nulls but I couldn't find it.
Now on to dimension hierarchies! I right-clicked the dimension and selected Add Hierarchy and it only offered me the option to reorganise the existing fields. I had to create separate dimension columns for calculated hierarchies by right-clicking the dimension and selecting Add Column.
Comments:
- The fields are mostly self explanatory but not very typo-proof, being free text fields.
- At various points it is necessary to update the dialog box which is done by clicking the Update button. Annoyingly this closes the dialog box and I had to reopen the column properties to be able to continue defining the column.
- The transformation tab could potentially hold some serious sql but is fixed size with limited IDE functionality.
Subscribe to:
Posts (Atom)