4 min læsning

Change data capture in SSIS SQL Server 2012

Featured Image

SQL Server 2012 gives us some new SSIS components that make reading from the CDC tables created by SQL Server a lot easier.

Before this release, we had to do all the housekeeping our self but now almost everything is done out of the box.

Learn more about SQL here

There are three new components that you need to know. The first one is the CDC Source with reads

from the CDC tables and returns data like any other data source component. The second one is the CDC Splitter it's basically a conditional split hardcoded to split on the CDC type (__$operation). It will return three data flows containing rows for insert, delete or update. The last component is the CDC Control Task it’s the housekeeper. The CDC Control Task keeps track of which rows to process.

Working with incremental loads always requires two SSIS packages. One handling the first initial load and one handling the future incremental loads.

Hands-on:

First, create an SSIS package for the initial load. Insert a CDC Control Task, a Data Flow Task, and again a CDC Control Task.

ssis

Open the first CDC Control Task and set it up like this.

  1. Select or create a connection to the database containing the CDC tables.
  2. Select “Mark initial load start” in the CDC control operation.
  3. Create a new variable by pressing the “New”-button. It will be called CDC_State unless you change it.
  4. Select or create a connection to the database where you what the housekeeping table to be stored. This table is holding a string for tracking the rows to process.
  5. Press the “New”-button to create the cdc_state table.
  6. State name holds the key value for this CDC flow. If have more than one CDC table, include ex. the name of the table.

The result should be something like this.

init-begin

Now open the last CDC Control Task and set it up like the first one except from the CDC control operation. Now select “Mark initial load end”.

The data flow for the testing purposes could be like this.

ssis-2

Read everything from the CDC source table and write it to a new table in the stage or data warehouse database.

The second SSIS package begins like the first one. Insert a CDC Control Task, a Data Flow Task, and again a CDC Control Task.

Configure the first CDC Control Task like the other ones except for the CDC control operation. Now select “Get processing range”.

Configure the last CDC Control Task like the other ones except for the CDC control operation. Now select “Mark processing range”.

Insert a CDC Source, a CDC Splitter and three Row Counts in the Data Flow

Setup the CDC Source like this.

  1. The connection in the CDC Source is the connection to the database from where you want to read.
  2. CDC enabled table is the table containing the data you want to read.
  3. Select “Net” in the CDC processing mode. It will only give us the newest row if it has been updated more than one time.
  4. Select the same variable as in the CDC control task.

Connect the CDC Source and the CDC Splitter. One each of the three outputs from the CDC splitter connects one of the row counts.

The result should be like this:

DF

To see it working run the packages like this.

  1. Run the Init package. All the data from the source table should be copied.
  2. Run the Incr. package. No rows should be copied.
  3. Run an update, a delete, and an insert on the source table.
  4. Run the Incr. package again. Now there should be one row in each row count.

Any questions?
Please reach out to info@inspari.dk or +45 70 24 56 55 if you have any questions. We are looking forward to hearing from you.