Click here to Skip to main content
15,891,184 members
Articles / Database Development / SQL Server
Tip/Trick

SSIS: Transform data using Regex

Rate me:
Please Sign up or sign in to vote.
5.00/5 (1 vote)
21 Jan 2013BSD3 min read 42.4K   401   5   1
Transform data by using regex and Script Component.

Introduction

In this tutorial I will show how to transform data using a Script Component within a Data Flow Task and regex. My source data contains customer name, and I need to separate it to first, middle, and last names.

Background

Regular Expressions are a powerful way for you to search for patterns in strings of text. In SSIS, we can use Regular Expressions to assist us in cleansing or transforming data.

This is my partial "Customer" reference table.

Image 1

What you need

Download script and execute in SSMS.

SQL
-- Create database
create database Test
go
use Test
go
 
USE [Test]
GO

-- Create customer table
if exists (select * from sys.tables where name = 'customer_destination')
drop table customer_destination
go

CREATE TABLE [dbo].[customer_destination](
	[customer_number] [varchar](10) NULL,
	[customer_type] [varchar](1) NULL,
	[name] [varchar](100) NULL,
	[gender] [varchar](1) NULL,
	[email_address] [varchar](200) NULL,
	[date_of_birth] [datetime] NULL,
	[occupation] [varchar](50) NULL,
	[status] [varchar](3) NULL,
	[fname] [varchar](50) NULL,
	[mname] [varchar](50) NULL,
	[lname] [varchar](50) NULL
)

This script was tested in SQL Server 2008.

Create Project

Open SQL Server Business Intelligence Development Studio.

Image 2

Then go to File->New->Project and select Integration Service Project.

Image 3

Select "Data Flow Task" from "Control Flow Items" and drag it on "Control Flow" tab. Then double click it.

Image 4

Select "Excel Source" from "Data Flow Source" and drag it on "Data Flow" tab. Double click on “Excel Source” task to configure it.

Image 5

Click New button for new OLE DB connection or select from existing connection.

Image 6

Click Browse button to select the "Customer.xls" file from your local path.

Image 7

Select the Excel sheet name.

Image 8

Check the available column and click OK.

Image 9

Select "Data Conversion" from "Data Flow Transformation" and drag it on the "Data Flow" tab and connect the extended green arrow from “Excel Source” to your "Data Conversion".

Image 10

Convert data. The length should be the same, otherwise you will get a warning message. Add "convert" prefix in each value.

Image 11

Select "Script Component" from "Data Flow Transformation" and drag it on the "Data Flow" tab. Connect the extended green arrow from “Data Conversion” to your "Script Component".

Image 12

Image 13

Click Input Column tab and select the "convert_name" column.

Image 14

Click the Output Column tab and create three outputs: "fname", "mname", and "lname".

Image 15

Click the "Script" tab and click "Edit Script". I have chosen my ScriptLanguage as "Microsoft Visual C# 2008".

Image 16

In the Main.cs file we need to add a reference. We need this for Regular Expressions.

Image 17

C#
using System.Text.RegularExpressions;

In the Main.cs file it's time to write some code in the Input0_ProcessInputRow procedure.

Image 18

C#
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    /*
      Add your code here
    */
    string keyVal = Row.convertname.ToString();
    Match match = Regex.Match(keyVal, @"^(?<first>\w+) (?<last>\w+)(?: (?<middle>\w+))?$");
    if (match.Success)
    {
        string f = match.Groups["first"].Value;
        string l = match.Groups["last"].Value;
        string m = null;
        if (match.Groups["middle"].Success)
        {
            m = match.Groups["middle"].Value;
        }

        Row.fname = f;
        Row.lname = l;
        Row.mname = m;
    }
    else
        Row.fname = Row.convertname.ToString();
}</middle></last></first>

Select "OLE DB Destination" from "Data Flow Destination" and drag it on the "Data Flow" tab. Connect the extended green arrow from “Script Component” to your "OLE DB Destination".

Image 19

Double click on the "OLE DB Destination" task to configure it.

Image 20

Click New to create a new connection.

Image 21

Select "Server Name", "Authentication", and "Database" which will be "Test" for this example. Click Test Connection for checking, then click OK and then again click OK.

Image 22

Select the "customer_destination" table.

Image 23

Click the mapping tab and map each field and click OK.

Image 24

If you execute the package with debugging (press F5), the package should succeed and appear as shown here:

Image 25

To check what is happening here:

SQL
SELECT [customer_number]
      ,[customer_type] as [type]
      ,[name]
      ,[gender]
      ,[email_address]
      ,[date_of_birth]
      ,[occupation]
      ,[status]
      ,[fname]
      ,[mname]
      ,[lname]
  FROM [Test].[dbo].[customer_destination]

Image 26

Conclusion

You can see that the fname, mname, and lname fields have been populated. The .NET Framework includes full support for regular expressions (Regex), in the System.Text.RegularExpressions namespace. Regex provides an incredibly powerful way of defining and finding string patterns. You can use them for string pattern match, data cleaning, and data transformation. Email, postal code, and phone number verification are a few of the uses. I have included some Regex here, though all are not perfect, they are collected from Google.

Mail: [a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?

Phone: ^[0-9+\(\)#\.\s\/ext-]+$

First character capital:- ^[A-Z][a-zA-Z0-9]+$

I hope this might be helpful to you!

References

History

None so far.

License

This article, along with any associated source code and files, is licensed under The BSD License


Written By
Software Developer (Senior) icddr,b
Bangladesh Bangladesh
More than 8 years experience on Programming and Project implementation, I was primarily involved with projects for private organization,Govt.(Bangladesh Army,DG Health,RJSC), NGO (SEDF,WFP). Presently I am working at ICDDR,B and enhancing Hospital Management System developed by Microsoft Dynamic NAV and Windows Mobile Application 5.0

An active supporter of Open Source technology, my interested areas are ERP, IT Audit, Data warehouse, BI etc.

Playing Guitar for 15 years, my interested music style is Blues Rock,Neo Classical.

Certification

70-540:Microsoft® Windows Mobile® 5.0 - Application Development
MB7-514:Microsoft Dynamics™ NAV 5.0 C/SIDE Introduction
MB7-516:Microsoft Dynamics™ NAV 5.0 Solution Development
MB7-517:Microsoft Dynamics™ NAV 5.0 Installation and Configuration
MB7-515:Microsoft Dynamics™ NAV 5.0 Financials
70-432:Microsoft SQL Server 2008 - Implementation and Maintenance
70-450:PRO: Designing, Optimizing and Maintaining a Database Administrative Solution Using Microsoft SQL Server 2008
70-448:Microsoft SQL Server 2008, Business Intelligence Development and Maintenance
312-50:Certified Ethical Hacker

Web :http://masudparvezshabuz.appspot.com
Blog :http://masudparvezshabuz.wordpress.com
linkedin :http://www.linkedin.com/in/masudparvez

Comments and Discussions

 
QuestionUsing RegEx from another table Pin
Stephen Rajan2-Oct-18 18:40
Stephen Rajan2-Oct-18 18:40 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.