An SQL alternative to the SCD
In SQL 2008 a new T-SQL construct was added - the MERGE operation. (Ok, pedants will know this wasn’t new to Oracle, but it was new to SQL Server).
This operation allows for the merging of a dataset into a reference dataset – which can be remarkably similar to Insert / Update operations effected by the Slowly Changing Dimension transformation. However the way it operates is very different. Instead of the SCD’s row by row evaluation approach, the MERGE operation is a set based operation. What this means is it compares the whole of the source dataset to the reference dataset in a single pass. This has significant implications for performance – on a site where I implemented this the operation which took 1,200 seconds in the SCD cut down to 51 seconds using a Merge.
There are limitations and differences to be aware of:
- You cannot directly return row counts for Insert / Update / Ignore operations in the Merge
- As it is a bulk operation a single row will cause failure of the whole batch
- There’s no GUI – just hand crafted SQL
- Less error trapping / logging options
- More flexibility in terms of actions when matches / non matches are found
The main reason why you would consider the SQL Merge – it handles Type 1, and with a little cunning, Type 2 dimensions – in a fraction of the time it takes the SCD to plod through. It’s still not as fast as a proper in memory comparison using something such as TableDifference – but it’s always good to know you have something else available in your toolkit.
Further information:
- Using the SQL MERGE Statement for Slowly Changing Dimension Processing – from the Kimball Group
- How to create type 1 & 2 SCD’s using standard SSIS components (other than the SCD) (at the bottom of the post) – Benny Austin
Kimball Methods SCD component is one the best option available as of day for SCDs.
http://kimballscd.codeplex.com/
–Siddharth
Agreed. However some IT departments won’t allow unsupported 3rd party components – so it’s a handy alternative.
The only problem I have with the MERGE statement is that both the dimension key lookup from the dimension table and the source data have to be in the same location, on the same server- unless you set linked servers, which introduces performance issues itself. Also, it is typically the case that ETL developers cannot create linked servers on a production warehouse- or even get the DBAs to do it for them.
Esta es una implementación para SCD tipo 2
– Dimensión Cliente
IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N’[Comisiones].[FNX_Cliente]‘) AND type in (N’U'))
BEGIN
CREATE TABLE [Comisiones].[FNX_Cliente](
[IdSKCliente] [int] IDENTITY (1, 1) NOT NULL,
[IdCliente] [varchar](12) NOT NULL, — IdCliente = Cedula
[Nombre] [varchar](80) NULL,
[IdCategoria] [varchar](4) NULL,
[Estrato] [varchar](3) NULL,
[TelefonoAvisar] [varchar] (10) NULL,
[FInicio] [date] NOT NULL CONSTRAINT [DF_FNX_Cliente_FInicio] DEFAULT (GetDate()),
[FFin] [date] NULL,
[Version] [smallint] NOT NULL,
[EsVersionActual] [bit] NOT NULL CONSTRAINT [DF_FNX_Cliente_EsVersionActual] DEFAULT (1),
) ON [fgDimensiones_dat];
End
IF Not EXISTS (SELECT name FROM sys.indexes WHERE name = N’PK_FNX_Cliente’)
Begin
ALTER TABLE [Comisiones].[FNX_Cliente]
ADD CONSTRAINT [PK_FNX_Cliente] PRIMARY KEY CLUSTERED ( [IdSKCliente] ASC )
ON [fgDimensiones_dat];
End
GO
IF Not EXISTS (SELECT name FROM sys.indexes WHERE name = N’IX_FNX_Cliente_1′)
Begin
CREATE NONCLUSTERED INDEX [IX_FNX_Cliente_1] ON [Comisiones].[FNX_Cliente]
( [IdCliente] ASC ) ON [fgDimensiones_ind]
End
GO
IF NOT EXISTS (SELECT * FROM sys.check_constraints WHERE object_id = OBJECT_ID(N’[Comisiones].[CK_FNX_Cliente_FFinFInicio]‘) AND parent_object_id = OBJECT_ID(N’[Comisiones].[FNX_Cliente]‘))
ALTER TABLE [Comisiones].[FNX_Cliente] WITH CHECK ADD CONSTRAINT [CK_FNX_Cliente_FFinFInicio] CHECK ([FFin] >= [FInicio])
GO
IF EXISTS (SELECT * FROM sys.check_constraints WHERE object_id = OBJECT_ID(N’[Comisiones].[CK_FNX_Cliente_FFinFInicio]‘) AND parent_object_id = OBJECT_ID(N’[Comisiones].[FNX_Cliente]‘))
ALTER TABLE [Comisiones].[FNX_Cliente] CHECK CONSTRAINT [CK_FNX_Cliente_FFinFInicio]
GO
DROP TABLE [Comisiones].[tmpFNX_Cliente]
TRUNCATE TABLE [Comisiones].[FNX_Cliente]
IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N’[Comisiones].[tmpFNX_Cliente]‘) AND type in (N’U'))
BEGIN
CREATE TABLE [Comisiones].[tmpFNX_Cliente](
[IdSKCliente] [int] IDENTITY (1, 1) NOT NULL,
[IdCliente] [varchar](12) NOT NULL, — IdCliente = Cedula
[ClienteNombre] [varchar](80) NULL,
[IdClienteCategoria] [varchar](4) NULL,
[ClienteEstrato] [varchar](3) NULL,
[TelefonoAvisar] [varchar] (10) NULL,
) ON [fgDimensiones_dat];
End
IF Not EXISTS (SELECT name FROM sys.indexes WHERE name = N’PK_tmpFNX_Cliente’)
Begin
ALTER TABLE [Comisiones].[tmpFNX_Cliente]
ADD CONSTRAINT [PK_tmpFNX_Cliente] PRIMARY KEY CLUSTERED ( [IdSKCliente] ASC )
ON [fgDimensiones_dat];
End
GO
INSERT INTO [Comisiones].[tmpFNX_Cliente]([IdCliente],[ClienteNombre],[IdClienteCategoria],[ClienteEstrato],[TelefonoAvisar])
VALUES(1, ‘C1′, ’1′, ’1′, ’000′);
INSERT INTO [Comisiones].[tmpFNX_Cliente]([IdCliente],[ClienteNombre],[IdClienteCategoria],[ClienteEstrato],[TelefonoAvisar])
VALUES(2, ‘C2′, ’2′, ’2′, ’000′);
INSERT INTO [Comisiones].[tmpFNX_Cliente]([IdCliente],[ClienteNombre],[IdClienteCategoria],[ClienteEstrato],[TelefonoAvisar])
VALUES(3, ‘C3′, ’3′, ’3′, ’000′);
EXECUTE [Comisiones].[spETL_ActualizarFNX_Cliente] 0
Select * From [Comisiones].[tmpFNX_Cliente]
Select * From [Comisiones].[FNX_Cliente]
/*
–INSERT INTO [Comisiones].[tmpFNX_Cliente]([IdCliente],[ClienteNombre],[IdClienteCategoria],[ClienteEstrato],[TelefonoAvisar])
–VALUES(1, ‘C1_2′, ’1_2′, ’1_2′, ’000′);
–2
UPDATE [Comisiones].[tmpFNX_Cliente] SET
[ClienteNombre] = ‘C1_2′
,[IdClienteCategoria] = ‘C1_2′
,[ClienteEstrato] = ’1_2′
WHERE [IdCliente] = 1;
EXECUTE [Comisiones].[spETL_ActualizarFNX_Cliente] 0
Select * From [Comisiones].[FNX_Cliente]
–3
UPDATE [Comisiones].[tmpFNX_Cliente] SET
[ClienteNombre] = ‘C1_3′
,[IdClienteCategoria] = ‘C1_3′
,[ClienteEstrato] = ’1_3′
WHERE [IdCliente] = 1;
EXECUTE [Comisiones].[spETL_ActualizarFNX_Cliente] 0
Select * From [Comisiones].[FNX_Cliente]
–4
INSERT INTO [Comisiones].[tmpFNX_Cliente]([IdCliente],[ClienteNombre],[IdClienteCategoria],[ClienteEstrato],[TelefonoAvisar])
VALUES(4, ‘C4′, ’4′, ’4′, ’000′);
UPDATE [Comisiones].[tmpFNX_Cliente] SET
[ClienteNombre] = ‘C1_4′
,[IdClienteCategoria] = ‘C1_4′
,[ClienteEstrato] = ’1_4′
WHERE [IdCliente] = 1;
EXECUTE [Comisiones].[spETL_ActualizarFNX_Cliente] 0
Select * From [Comisiones].[FNX_Cliente]
–5
UPDATE [Comisiones].[FNX_Cliente] SET
[FFin] = GetDate()
,[EsVersionActual] = 0
WHERE [IdCliente] = 1;
UPDATE [Comisiones].[FNX_Cliente] SET
[FFin] = Null
,[EsVersionActual] = 1
WHERE [IdSKCliente] = 4;
Select * From [Comisiones].[FNX_Cliente] order by [IdCliente];
EXECUTE [Comisiones].[spETL_ActualizarFNX_Cliente] 0
Select * From [Comisiones].[FNX_Cliente] order by [IdCliente];
–6
UPDATE [Comisiones].[tmpFNX_Cliente] SET
[ClienteNombre] = ‘C1_5′
,[IdClienteCategoria] = ‘C1_5′
,[ClienteEstrato] = ’1_5′
WHERE [IdCliente] = 1;
EXECUTE [Comisiones].[spETL_ActualizarFNX_Cliente] 0
Select * From [Comisiones].[FNX_Cliente] order by [IdCliente];
*/
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N’[Comisiones].[spETL_ActualizarFNX_Cliente]‘) AND type in (N’P', N’PC’))
DROP PROCEDURE [Comisiones].[spETL_ActualizarFNX_Cliente]
GO
CREATE procedure [Comisiones].[spETL_ActualizarFNX_Cliente]
@idLog int
as
/* ============================================================================================
Propósito: Actualiza los datos de la dimensión Clientes con los datos provenientes del proceso ETL.
Creado por: Hugo González Olaya
Fecha creación: 2010-10-15
Actualizado: 2010-10-15
Parametros:
@idLog: Código de ejecución
Ejemplo:
EXECUTE [Comisiones].[spETL_ActualizarFNX_Cliente] 0
Select * From [Comisiones].[FNX_Cliente]
============================================================================================ */
begin
set nocount on
– Adicionar filas duplicadas a tabla de inconsistencia
;WITH DuplicadosCTE([IdCliente], NumeroFila)
AS
(
SELECT [IdCliente], ROW_NUMBER() OVER (PARTITION BY [IdCliente] ORDER BY [IdCliente])
From [Comisiones].[tmpFNX_Cliente]
)
INSERT INTO [Comisiones].[ETLInconsistencia]([IdProceso], [Tabla], [Estado], [CampoIncon], [ValorIncon], [Campo1], [Valor1], [Descripcion])
Select @idLog, ‘FNX_Cliente’, ‘A’, ‘IdCliente’, [IdCliente], ‘IdCliente’, [IdCliente], ‘Fila duplicada borrada’
FROM DuplicadosCTE
WHERE NumeroFila > 1
– Borar filas duplicadas
;WITH DuplicadosCTE([IdCliente], NumeroFila)
AS
(
SELECT [IdCliente], ROW_NUMBER() OVER (PARTITION BY [IdCliente] ORDER BY [IdCliente])
From [Comisiones].[tmpFNX_Cliente]
)
DELETE
FROM DuplicadosCTE
WHERE NumeroFila > 1
/*
Tabla para insertar SCD Tipo 2
Este es un cambio al MERGE para evitar filas repetidas con versión superior y
También por que MERGE actualza bien el número de versión y cierre de vigencia cuando la versión actual es
la última, de lo contrario, adiciona filas cuando es una versión intermedia
*/
DECLARE @SCD_Clientes TABLE(
[Accion] [varchar](20) NULL,
[IdSKCliente] [int] NULL,
[IdCliente] [varchar](12) NOT NULL,
[ClienteNombre] [varchar](80) NULL,
[IdClienteCategoria] [varchar](4) NULL,
[ClienteEstrato] [varchar] (10) NULL,
[TelefonoAvisar] [varchar] (10) NULL,
[FFin] [date] NULL,
[Version] [smallint] NULL,
[VersionAnterior] [smallint] NULL,
[EsVersionActual] [bit] NULL,
[Insertado] [bit] NULL
);
– Insertar cambio SCD tipo 2 en tabla de dimensiones
–INSERT INTO [Comisiones].[FNX_Cliente] ( [IdCliente], [Nombre], [IdCategoria], [Estrato], [TelefonoAvisar],
– [FFin], [Version], [EsVersionActual])
–SELECT M.[IdCliente], M.[ClienteNombre], M.[IdClienteCategoria], M.[ClienteEstrato], M.[TelefonoAvisar],
– NULL, M.[VersionNueva], 1
– Insertar cambio SCD tipo 2 en tabla temporal
INSERT INTO @SCD_Clientes ([Accion], [IdSKCliente], [IdCliente], [ClienteNombre], [IdClienteCategoria],
[ClienteEstrato], [TelefonoAvisar], [FFin], [Version], [VersionAnterior], [EsVersionActual])
SELECT M.[Accion], M.[IdSKCliente], M.[IdCliente], M.[ClienteNombre], M.[IdClienteCategoria],
M.[ClienteEstrato], M.[TelefonoAvisar], M.[FFin], M.[VersionNueva], M.[VersionAnterior], 1
FROM
( MERGE [Comisiones].[FNX_Cliente] As C
USING (SELECT DISTINCT T.[IdCliente], T.[ClienteNombre], T.[IdClienteCategoria], T.[ClienteEstrato], T.[TelefonoAvisar]
FROM [Comisiones].[tmpFNX_Cliente] As T — [Comisiones].[tmpFNX_PedidoDetalle_PedidosInstalados]
) AS Origen
ON (C.[IdCliente] = Origen.[IdCliente])
WHEN NOT MATCHED BY TARGET THEN
– Insertar nuevos valores
INSERT ([IdCliente], [Nombre], [IdCategoria], [Estrato], [TelefonoAvisar], [FInicio], [FFin], [Version], [EsVersionActual])
VALUES ([IdCliente], [ClienteNombre], [IdClienteCategoria], [ClienteEstrato], [TelefonoAvisar], GETDATE(), Null, 1, 1)
WHEN MATCHED AND C.[EsVersionActual] = 1 AND
(Origen.[ClienteNombre] C.[Nombre] OR Origen.[IdClienteCategoria] C.[IdCategoria] OR
Origen.[ClienteEstrato] C.[Estrato] OR Origen.[TelefonoAvisar] C.[TelefonoAvisar]) THEN
– Cerrar vigencia por adición de nueva versión
UPDATE SET [FFin] = GETDATE(), [EsVersionActual] = 0
OUTPUT $Action As [Accion], Inserted.[IdSKCliente], Origen.[IdCliente], Origen.[ClienteNombre],
Origen.[IdClienteCategoria], Origen.[ClienteEstrato],
Origen.[TelefonoAvisar], Deleted.[FFin], Deleted.[Version] + 1 As [VersionNueva], Deleted.[Version] As [VersionAnterior]
) AS M;
–WHERE M.[Accion] = ‘UPDATE’; — Usar filtro cuando inserta en tabla de dimensiones
– Actualizar bandera
UPDATE M SET
[Insertado] = 1
FROM @SCD_Clientes M LEFT JOIN [Comisiones].[FNX_Cliente] C ON M.[IdCliente] = C.[IdCliente] AND M.[ClienteNombre] = C.[Nombre] AND
M.[IdClienteCategoria] = C.[IdCategoria] AND M.[ClienteEstrato] = C.[Estrato] AND M.[TelefonoAvisar] = C.[TelefonoAvisar]
WHERE C.[IdCliente] IS NULL AND M.Accion = ‘UPDATE’ –’INSERT’;
– Insertar cambio SCD tipo 2 en tabla de dimension
INSERT INTO [Comisiones].[FNX_Cliente] ([IdCliente], [Nombre], [IdCategoria], [Estrato], [TelefonoAvisar],
[FInicio], [FFin], [Version], [EsVersionActual])
SELECT M.[IdCliente], M.[ClienteNombre], M.[IdClienteCategoria], M.[ClienteEstrato], M.[TelefonoAvisar],
GETDATE(), Null, (SELECT MAX(CMax.[Version]) + 1 FROM [Comisiones].[FNX_Cliente] CMax
WHERE CMax.[IdCliente] = M.[IdCliente]), 1
FROM @SCD_Clientes M LEFT JOIN [Comisiones].[FNX_Cliente] C ON M.[IdCliente] = C.[IdCliente] AND M.[ClienteNombre] = C.[Nombre] AND
M.[IdClienteCategoria] = C.[IdCategoria] AND M.[ClienteEstrato] = C.[Estrato] AND M.[TelefonoAvisar] = C.[TelefonoAvisar]
WHERE C.[IdCliente] IS NULL AND M.Accion = ‘UPDATE’ –’INSERT’
ORDER BY M.[IdCliente];
– Ajustar vigencia modificada en el MERGE
UPDATE C SET
[FFin] = M.[FFin]
,[EsVersionActual] = 1
FROM @SCD_Clientes M INNER JOIN [Comisiones].[FNX_Cliente] C ON M.[IdSKCliente] = C.[IdSKCliente]
WHERE M.[VersionAnterior] = C.[Version] AND M.[Insertado] 1;
UPDATE C SET
[FFin] = M.[FFin]
,[EsVersionActual] = 1
FROM @SCD_Clientes M INNER JOIN [Comisiones].[FNX_Cliente] C ON M.[IdSKCliente] = C.[IdSKCliente]
WHERE M.[VersionAnterior] = C.[Version] AND M.[Insertado] IS NULL
AND (M.[ClienteNombre] = C.[Nombre] OR M.[IdClienteCategoria] C.[IdCategoria] OR
M.[ClienteEstrato] C.[Estrato] OR M.[TelefonoAvisar] C.[TelefonoAvisar]);
set nocount off
End
GO
.