This is the 12th day of my participation in the November Gwen Challenge. Check out the event details: The last Gwen Challenge 2021

Translation reference

This article mainly refers to Deleting Rows in Incremental Loads — Level 5 of the Stairway to, translated from the Stairway to Integration Services series Integration Services aims to have a comprehensive and clear understanding of SSIS. All contents are carried out on the basis of the original text. Due to various reasons such as version differences and personal doubts, the principle of complete translation is not adopted. Hopefully eventually it will be more conducive to learning and understanding SSIS,

Thanks for your support!

Delete rows for incremental loading

Deleting Rows in Incremental Loads Deletes Rows in Incremental Loads.

In the previous article, we learned how to transport updates from source to target, and we also introduced the use of collection-based updates to optimize the implementation.

In this article we continue to build incremental loading capabilities in SSIS, focusing on removing rows that have been removed from the source when loaded to the target.

In order to accomplish the removal of rows in an incremental load, many Update operations were basically modified in subsequent data flow tasks.

Simulate deleting rows from the source table

Start setting test conditions. Deleting a row from the source table indicates that the row is in the target table but not in the source table. That is, the row previously inserted into the target has been removed from the source.

Then execute the following T-SQL in SSMS, adding four statements

Use AdventureWorks2012
go
Insert Into dbo.FullName
 (BusinessEntityID,FirstName, MiddleName, LastName)
Values
 (20778.'Andy'.'Ray'.'Leonard'),
 (20779.'Candy'.'Ray'.'Leonard'),
 (20780.'Beany'.'Ray'.'Leonard'),
 (20781.'Andy'.'Tony'.'Leonard');
Copy the code

When the incremental load delete logic is performed it should check for missing records in the source and remove the rows from the target.

At this point, you can run the current SSIS package to check how the new records in the target table affect the existing logic for handling incremental loading.

Delete the incremental load implementation of rows

Before adding components to the SSIS package, first rename the “Data Flow Task” to “Insert and Update Rows”.

Add and delete data flow tasks

Drag another Data Flow task to the control Flow and add a “Precedence Constraint” to the new Data Flow task from the “Apply Staged Updates” execution SQL task. Rename the data flow task to “Delete Rows”.

Add the OLEDB source

Open the data flow task for data flow “Delete Rows”, add an OLE DB source and open the editor to configure the following properties:

  • OLE DB Connection manager: [server-name].AdventureWorks2012.sa
  • Data access mode: table or view (Table or view)
  • Table or view name:dbo.FullName

Add a lookup transform

The target table FullName is used as the source in the current data flow task, and the next step is to remove the missing rows from that source using Lookup Transformation. This process is the same as the previous data flow task.

Drag and drop a search conversion to the “Delete Rows” data stream task. Connect the data flow path from the OLE DB source adapter to Find.

Open the Find Transformation editor, on the General page, Modify the “Specify how to handle rows with no matching entries” drop-down list to “Redirect rows to no match Output “), the cache mode can be set according to the actual situation.

On the Connections page, make sure the OLE DB Connection Manager drop-down list is set to “[server-name].AdventureWorks2012.sa”. Select the “Use Results of an SQL Query” option and enter the following T-SQL statement in the textbox

Select BusinessEntityID As BID
 From Person.Person
Copy the code

To detect rows that have been removed from the source table but exist in the target table, simply specify the “BusinessEntityID” field. On the Columns page, drag the “BusinessEntityID” column from the available input column cell to the available lookup column cell:

Because we configured the lookup to redirect mismatched rows to the no match output (in “regular pages”), so that if there is no match between the source (dbo.fullname) and the lookup (Person.person), the row will be sent to the “no match output”. Click ok to close the editor.

Add OLE DB target

How to implement the actual delete operation? As in the previous exercise, in the original update logic, we did this by adding and configuring an OLE DB command transform. You can do the same here. But you should already know the final result. So let’s skip the middle step (try to do it yourself). Add an OLE DB target to the “Delete Rows” data flow task. Then wire the unmatched output from the lookup transformation.

Rename the target to “StageDeletes” and double-click to open the editor. Select “OLE DB Connection Manager “. Click the New button next to the “Name of the table or View” drop-down list to change the DDL statement displayed as follows:

CREATE TABLE [StageDeletes] (
    [BusinessEntityID] int PRIMARY KEY,
    [FirstName] nvarchar(50) NOT NULL,
    [MiddleName] nvarchar(50),
    [LastName] nvarchar(50) NOT NULL
)
Copy the code

Click OK to create the StageDeletes table. Then click on the “Mapping” page to complete the (automatic mapping,auto-mapped) configuration of the OLE DB target

Click the OK button to close the OLE DB target editor.

Collection based deletion

As with the previous update, we need to apply “collection-based deletion”, implemented using similar collection-based and related statements.

In the control flow, add a Execute SQL Task and connect the green “priority constraint” from the “Delete Rows” data flow Task.

Open Execute SQL Task Editor and modify the following properties:

  • Name: Apply Staged Deletes
  • Connection: [Server-Name].AdventureWorks2012.sa
  • SQLStatement:
Delete src
 From dbo.FullName src
   Join StageDeletes stage
     On stage.BusinessEntityID = src.BusinessEntityID
Copy the code

Click OK to close the Execute SQL Task Editor.

Unit tests (unit-test)

Before continuing, let’s unit test this configuration.

How do I unit test SSIS tasks? · Right click on “Apply Staged tasks “and click on” Execute Task”

It will work if we configure it correctly.

Management StageDeletes table

To stop SSIS debugging, like collection-based update logic, we need to manage “StageDeletes” tables.

As before, we truncate it before each load, load it into the data stream task, and keep records in the table in the middle of the load in case we need to look at them.

You do not need to create a separate Execute SQL Task for this. We already have a Truncate SQL execution task — “Truncate StageUpdates”. Double-click on it to open the editor and add the following T-SQL statement to the existing SQL statement:

Truncate Table StageDeletes;
Copy the code

The window for entering SQL queries is closed. Then change the Name property to “Truncate StageUpdates and StageDeletes” and click “OK” to close the editor.

Perform incremental updates to delete data

Press “F5″(or the “Start debugging” button) to execute the SSIS package. Check the control flow and “Delete Rows” data flow tasks executed

The additional four rows have been removed from the target table (dbo.fullname). This can be verified in SSMS with the following SQL statement:

 select * from  dbo.FullName f
 left join Person.Person  p
 on p.BusinessEntityID=f.BusinessEntityID
 where p.BusinessEntityID IS NULL;
Copy the code

[Here you can also use the where condition to determine whether it has been deleted.]

The number of results returned is 0.

conclusion

We have built the first SSIS Design Pattern — incremental loading. At the same time, you learned some ETL test primer examples, found transformations, and performed SQL tasks.

We designed and built a re-executable SSIS package, combined with “incremental loading”. You can perform this once a month to load the records changed in the last month; It can also be done conveniently every five minutes to capture data changes during that time.

Very flexible!