An SQL alternative to the SCD

In SQL 2008 a new T-SQL construct was added – the MERGE operation. (Ok, pedants will know this wasn’t new to Oracle,  but it was new to SQL Server).

This operation allows for the merging of a dataset into a reference dataset – which can be remarkably similar to Insert / Update operations effected by the Slowly Changing Dimension transformation. However the way it operates is very different. Instead of the SCD’s row by row evaluation approach, the MERGE operation is a set based operation. What this means is it compares the whole of the source dataset to the reference dataset in a single pass. This has significant implications for performance – on a site where I implemented this the operation which took 1,200 seconds in the SCD cut down to 51 seconds using a Merge.

There are limitations and differences to be aware of:

  • You cannot directly return row counts for Insert / Update / Ignore operations in the Merge
  • As it is a bulk operation a single row will cause failure of the whole batch
  • There’s no GUI – just hand crafted SQL
  • Less error trapping / logging options
  • More flexibility in terms of actions when matches / non matches are found

The main reason why you would consider the SQL Merge – it handles Type 1, and with a little cunning, Type 2 dimensions – in a fraction of the time it takes the SCD to plod through. It’s still not as fast as a proper in memory comparison using something such as TableDifference – but it’s always good to know you have something else available in your toolkit.

Further information:

6 thoughts on “An SQL alternative to the SCD

  1. The only problem I have with the MERGE statement is that both the dimension key lookup from the dimension table and the source data have to be in the same location, on the same server- unless you set linked servers, which introduces performance issues itself. Also, it is typically the case that ETL developers cannot create linked servers on a production warehouse- or even get the DBAs to do it for them.

  2. Esta es una implementación para SCD tipo 2

    — Dimensión Cliente
    IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[Comisiones].[FNX_Cliente]’) AND type in (N’U’))
    BEGIN
    CREATE TABLE [Comisiones].[FNX_Cliente](
    [IdSKCliente] [int] IDENTITY (1, 1) NOT NULL,
    [IdCliente] [varchar](12) NOT NULL, — IdCliente = Cedula
    [Nombre] [varchar](80) NULL,
    [IdCategoria] [varchar](4) NULL,
    [Estrato] [varchar](3) NULL,
    [TelefonoAvisar] [varchar] (10) NULL,
    [FInicio] [date] NOT NULL CONSTRAINT [DF_FNX_Cliente_FInicio] DEFAULT (GetDate()),
    [FFin] [date] NULL,
    [Version] [smallint] NOT NULL,
    [EsVersionActual] [bit] NOT NULL CONSTRAINT [DF_FNX_Cliente_EsVersionActual] DEFAULT (1),
    ) ON [fgDimensiones_dat];
    End

    IF Not EXISTS (SELECT name FROM sys.indexes WHERE name = N’PK_FNX_Cliente’)
    Begin
    ALTER TABLE [Comisiones].[FNX_Cliente]
    ADD CONSTRAINT [PK_FNX_Cliente] PRIMARY KEY CLUSTERED ( [IdSKCliente] ASC )
    ON [fgDimensiones_dat];
    End
    GO

    IF Not EXISTS (SELECT name FROM sys.indexes WHERE name = N’IX_FNX_Cliente_1′)
    Begin
    CREATE NONCLUSTERED INDEX [IX_FNX_Cliente_1] ON [Comisiones].[FNX_Cliente]
    ( [IdCliente] ASC ) ON [fgDimensiones_ind]
    End
    GO

    IF NOT EXISTS (SELECT * FROM sys.check_constraints WHERE object_id = OBJECT_ID(N'[Comisiones].[CK_FNX_Cliente_FFinFInicio]’) AND parent_object_id = OBJECT_ID(N'[Comisiones].[FNX_Cliente]’))
    ALTER TABLE [Comisiones].[FNX_Cliente] WITH CHECK ADD CONSTRAINT [CK_FNX_Cliente_FFinFInicio] CHECK ([FFin] >= [FInicio])
    GO
    IF EXISTS (SELECT * FROM sys.check_constraints WHERE object_id = OBJECT_ID(N'[Comisiones].[CK_FNX_Cliente_FFinFInicio]’) AND parent_object_id = OBJECT_ID(N'[Comisiones].[FNX_Cliente]’))
    ALTER TABLE [Comisiones].[FNX_Cliente] CHECK CONSTRAINT [CK_FNX_Cliente_FFinFInicio]
    GO

    DROP TABLE [Comisiones].[tmpFNX_Cliente]
    TRUNCATE TABLE [Comisiones].[FNX_Cliente]
    IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[Comisiones].[tmpFNX_Cliente]’) AND type in (N’U’))
    BEGIN
    CREATE TABLE [Comisiones].[tmpFNX_Cliente](
    [IdSKCliente] [int] IDENTITY (1, 1) NOT NULL,
    [IdCliente] [varchar](12) NOT NULL, — IdCliente = Cedula
    [ClienteNombre] [varchar](80) NULL,
    [IdClienteCategoria] [varchar](4) NULL,
    [ClienteEstrato] [varchar](3) NULL,
    [TelefonoAvisar] [varchar] (10) NULL,

    ) ON [fgDimensiones_dat];
    End

    IF Not EXISTS (SELECT name FROM sys.indexes WHERE name = N’PK_tmpFNX_Cliente’)
    Begin
    ALTER TABLE [Comisiones].[tmpFNX_Cliente]
    ADD CONSTRAINT [PK_tmpFNX_Cliente] PRIMARY KEY CLUSTERED ( [IdSKCliente] ASC )
    ON [fgDimensiones_dat];
    End
    GO

    INSERT INTO [Comisiones].[tmpFNX_Cliente]([IdCliente],[ClienteNombre],[IdClienteCategoria],[ClienteEstrato],[TelefonoAvisar])
    VALUES(1, ‘C1′, ‘1’, ‘1’, ‘000’);
    INSERT INTO [Comisiones].[tmpFNX_Cliente]([IdCliente],[ClienteNombre],[IdClienteCategoria],[ClienteEstrato],[TelefonoAvisar])
    VALUES(2, ‘C2′, ‘2’, ‘2’, ‘000’);
    INSERT INTO [Comisiones].[tmpFNX_Cliente]([IdCliente],[ClienteNombre],[IdClienteCategoria],[ClienteEstrato],[TelefonoAvisar])
    VALUES(3, ‘C3′, ‘3’, ‘3’, ‘000’);

    EXECUTE [Comisiones].[spETL_ActualizarFNX_Cliente] 0
    Select * From [Comisiones].[tmpFNX_Cliente]
    Select * From [Comisiones].[FNX_Cliente]
    /*
    –INSERT INTO [Comisiones].[tmpFNX_Cliente]([IdCliente],[ClienteNombre],[IdClienteCategoria],[ClienteEstrato],[TelefonoAvisar])
    –VALUES(1, ‘C1_2′, ‘1_2′, ‘1_2′, ‘000’);

    –2
    UPDATE [Comisiones].[tmpFNX_Cliente] SET
    [ClienteNombre] = ‘C1_2′
    ,[IdClienteCategoria] = ‘C1_2′
    ,[ClienteEstrato] = ‘1_2′
    WHERE [IdCliente] = 1;
    EXECUTE [Comisiones].[spETL_ActualizarFNX_Cliente] 0
    Select * From [Comisiones].[FNX_Cliente]

    –3
    UPDATE [Comisiones].[tmpFNX_Cliente] SET
    [ClienteNombre] = ‘C1_3′
    ,[IdClienteCategoria] = ‘C1_3′
    ,[ClienteEstrato] = ‘1_3′
    WHERE [IdCliente] = 1;
    EXECUTE [Comisiones].[spETL_ActualizarFNX_Cliente] 0
    Select * From [Comisiones].[FNX_Cliente]

    –4
    INSERT INTO [Comisiones].[tmpFNX_Cliente]([IdCliente],[ClienteNombre],[IdClienteCategoria],[ClienteEstrato],[TelefonoAvisar])
    VALUES(4, ‘C4′, ‘4’, ‘4’, ‘000’);
    UPDATE [Comisiones].[tmpFNX_Cliente] SET
    [ClienteNombre] = ‘C1_4′
    ,[IdClienteCategoria] = ‘C1_4′
    ,[ClienteEstrato] = ‘1_4′
    WHERE [IdCliente] = 1;
    EXECUTE [Comisiones].[spETL_ActualizarFNX_Cliente] 0
    Select * From [Comisiones].[FNX_Cliente]

    –5
    UPDATE [Comisiones].[FNX_Cliente] SET
    [FFin] = GetDate()
    ,[EsVersionActual] = 0
    WHERE [IdCliente] = 1;
    UPDATE [Comisiones].[FNX_Cliente] SET
    [FFin] = Null
    ,[EsVersionActual] = 1
    WHERE [IdSKCliente] = 4;
    Select * From [Comisiones].[FNX_Cliente] order by [IdCliente];
    EXECUTE [Comisiones].[spETL_ActualizarFNX_Cliente] 0
    Select * From [Comisiones].[FNX_Cliente] order by [IdCliente];

    –6
    UPDATE [Comisiones].[tmpFNX_Cliente] SET
    [ClienteNombre] = ‘C1_5′
    ,[IdClienteCategoria] = ‘C1_5′
    ,[ClienteEstrato] = ‘1_5′
    WHERE [IdCliente] = 1;
    EXECUTE [Comisiones].[spETL_ActualizarFNX_Cliente] 0
    Select * From [Comisiones].[FNX_Cliente] order by [IdCliente];
    */

    IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[Comisiones].[spETL_ActualizarFNX_Cliente]’) AND type in (N’P’, N’PC’))
    DROP PROCEDURE [Comisiones].[spETL_ActualizarFNX_Cliente]
    GO
    CREATE procedure [Comisiones].[spETL_ActualizarFNX_Cliente]
    @idLog int
    as
    /* ============================================================================================
    Propósito: Actualiza los datos de la dimensión Clientes con los datos provenientes del proceso ETL.
    Creado por: Hugo González Olaya
    Fecha creación: 2010-10-15
    Actualizado: 2010-10-15

    Parametros:
    @idLog: Código de ejecución

    Ejemplo:
    EXECUTE [Comisiones].[spETL_ActualizarFNX_Cliente] 0
    Select * From [Comisiones].[FNX_Cliente]
    ============================================================================================ */
    begin
    set nocount on

    — Adicionar filas duplicadas a tabla de inconsistencia
    ;WITH DuplicadosCTE([IdCliente], NumeroFila)
    AS
    (
    SELECT [IdCliente], ROW_NUMBER() OVER (PARTITION BY [IdCliente] ORDER BY [IdCliente])
    From [Comisiones].[tmpFNX_Cliente]
    )
    INSERT INTO [Comisiones].[ETLInconsistencia]([IdProceso], [Tabla], [Estado], [CampoIncon], [ValorIncon], [Campo1], [Valor1], [Descripcion])
    Select @idLog, ‘FNX_Cliente’, ‘A’, ‘IdCliente’, [IdCliente], ‘IdCliente’, [IdCliente], ‘Fila duplicada borrada’
    FROM DuplicadosCTE
    WHERE NumeroFila > 1

    — Borar filas duplicadas
    ;WITH DuplicadosCTE([IdCliente], NumeroFila)
    AS
    (
    SELECT [IdCliente], ROW_NUMBER() OVER (PARTITION BY [IdCliente] ORDER BY [IdCliente])
    From [Comisiones].[tmpFNX_Cliente]
    )
    DELETE
    FROM DuplicadosCTE
    WHERE NumeroFila > 1

    /*
    Tabla para insertar SCD Tipo 2
    Este es un cambio al MERGE para evitar filas repetidas con versión superior y
    También por que MERGE actualza bien el número de versión y cierre de vigencia cuando la versión actual es
    la última, de lo contrario, adiciona filas cuando es una versión intermedia
    */

    DECLARE @SCD_Clientes TABLE(
    [Accion] [varchar](20) NULL,
    [IdSKCliente] [int] NULL,
    [IdCliente] [varchar](12) NOT NULL,
    [ClienteNombre] [varchar](80) NULL,
    [IdClienteCategoria] [varchar](4) NULL,
    [ClienteEstrato] [varchar] (10) NULL,
    [TelefonoAvisar] [varchar] (10) NULL,
    [FFin] [date] NULL,
    [Version] [smallint] NULL,
    [VersionAnterior] [smallint] NULL,
    [EsVersionActual] [bit] NULL,
    [Insertado] [bit] NULL
    );

    — Insertar cambio SCD tipo 2 en tabla de dimensiones
    –INSERT INTO [Comisiones].[FNX_Cliente] ( [IdCliente], [Nombre], [IdCategoria], [Estrato], [TelefonoAvisar],
    — [FFin], [Version], [EsVersionActual])
    –SELECT M.[IdCliente], M.[ClienteNombre], M.[IdClienteCategoria], M.[ClienteEstrato], M.[TelefonoAvisar],
    — NULL, M.[VersionNueva], 1

    — Insertar cambio SCD tipo 2 en tabla temporal
    INSERT INTO @SCD_Clientes ([Accion], [IdSKCliente], [IdCliente], [ClienteNombre], [IdClienteCategoria],
    [ClienteEstrato], [TelefonoAvisar], [FFin], [Version], [VersionAnterior], [EsVersionActual])
    SELECT M.[Accion], M.[IdSKCliente], M.[IdCliente], M.[ClienteNombre], M.[IdClienteCategoria],
    M.[ClienteEstrato], M.[TelefonoAvisar], M.[FFin], M.[VersionNueva], M.[VersionAnterior], 1
    FROM
    ( MERGE [Comisiones].[FNX_Cliente] As C
    USING (SELECT DISTINCT T.[IdCliente], T.[ClienteNombre], T.[IdClienteCategoria], T.[ClienteEstrato], T.[TelefonoAvisar]
    FROM [Comisiones].[tmpFNX_Cliente] As T — [Comisiones].[tmpFNX_PedidoDetalle_PedidosInstalados]
    ) AS Origen
    ON (C.[IdCliente] = Origen.[IdCliente])

    WHEN NOT MATCHED BY TARGET THEN
    — Insertar nuevos valores
    INSERT ([IdCliente], [Nombre], [IdCategoria], [Estrato], [TelefonoAvisar], [FInicio], [FFin], [Version], [EsVersionActual])
    VALUES ([IdCliente], [ClienteNombre], [IdClienteCategoria], [ClienteEstrato], [TelefonoAvisar], GETDATE(), Null, 1, 1)

    WHEN MATCHED AND C.[EsVersionActual] = 1 AND
    (Origen.[ClienteNombre] C.[Nombre] OR Origen.[IdClienteCategoria] C.[IdCategoria] OR
    Origen.[ClienteEstrato] C.[Estrato] OR Origen.[TelefonoAvisar] C.[TelefonoAvisar]) THEN
    — Cerrar vigencia por adición de nueva versión
    UPDATE SET [FFin] = GETDATE(), [EsVersionActual] = 0

    OUTPUT $Action As [Accion], Inserted.[IdSKCliente], Origen.[IdCliente], Origen.[ClienteNombre],
    Origen.[IdClienteCategoria], Origen.[ClienteEstrato],
    Origen.[TelefonoAvisar], Deleted.[FFin], Deleted.[Version] + 1 As [VersionNueva], Deleted.[Version] As [VersionAnterior]
    ) AS M;
    –WHERE M.[Accion] = ‘UPDATE'; — Usar filtro cuando inserta en tabla de dimensiones

    — Actualizar bandera
    UPDATE M SET
    [Insertado] = 1
    FROM @SCD_Clientes M LEFT JOIN [Comisiones].[FNX_Cliente] C ON M.[IdCliente] = C.[IdCliente] AND M.[ClienteNombre] = C.[Nombre] AND
    M.[IdClienteCategoria] = C.[IdCategoria] AND M.[ClienteEstrato] = C.[Estrato] AND M.[TelefonoAvisar] = C.[TelefonoAvisar]
    WHERE C.[IdCliente] IS NULL AND M.Accion = ‘UPDATE’ –‘INSERT';

    — Insertar cambio SCD tipo 2 en tabla de dimension
    INSERT INTO [Comisiones].[FNX_Cliente] ([IdCliente], [Nombre], [IdCategoria], [Estrato], [TelefonoAvisar],
    [FInicio], [FFin], [Version], [EsVersionActual])
    SELECT M.[IdCliente], M.[ClienteNombre], M.[IdClienteCategoria], M.[ClienteEstrato], M.[TelefonoAvisar],
    GETDATE(), Null, (SELECT MAX(CMax.[Version]) + 1 FROM [Comisiones].[FNX_Cliente] CMax
    WHERE CMax.[IdCliente] = M.[IdCliente]), 1
    FROM @SCD_Clientes M LEFT JOIN [Comisiones].[FNX_Cliente] C ON M.[IdCliente] = C.[IdCliente] AND M.[ClienteNombre] = C.[Nombre] AND
    M.[IdClienteCategoria] = C.[IdCategoria] AND M.[ClienteEstrato] = C.[Estrato] AND M.[TelefonoAvisar] = C.[TelefonoAvisar]
    WHERE C.[IdCliente] IS NULL AND M.Accion = ‘UPDATE’ –‘INSERT’
    ORDER BY M.[IdCliente];

    — Ajustar vigencia modificada en el MERGE
    UPDATE C SET
    [FFin] = M.[FFin]
    ,[EsVersionActual] = 1
    FROM @SCD_Clientes M INNER JOIN [Comisiones].[FNX_Cliente] C ON M.[IdSKCliente] = C.[IdSKCliente]
    WHERE M.[VersionAnterior] = C.[Version] AND M.[Insertado] 1;

    UPDATE C SET
    [FFin] = M.[FFin]
    ,[EsVersionActual] = 1
    FROM @SCD_Clientes M INNER JOIN [Comisiones].[FNX_Cliente] C ON M.[IdSKCliente] = C.[IdSKCliente]
    WHERE M.[VersionAnterior] = C.[Version] AND M.[Insertado] IS NULL
    AND (M.[ClienteNombre] = C.[Nombre] OR M.[IdClienteCategoria] C.[IdCategoria] OR
    M.[ClienteEstrato] C.[Estrato] OR M.[TelefonoAvisar] C.[TelefonoAvisar]);

    set nocount off
    End
    GO
    .

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>