Click here to Skip to main content
15,891,607 members
Articles / .NET
Article

MasterFileUpdater

Rate me:
Please Sign up or sign in to vote.
5.00/5 (2 votes)
25 Nov 2011CPOL6 min read 17.3K   116   10   5
A generic class to help plumb the matching of items of two sorted lists

Background

Waaaaayyyy back when I was in college (circa 1986), I was subjected to two (count 'em, two) whole semesters of COBOL (on a VAX!). The only thing I learned (other than a healthy dislike of COBOL) was the Master File Update -- a crusty old technique from pre-database days. The teacher said that it was old but that it was a good tool to have in the toolbox.

And -- surprise surprise -- it has been; I've used this technique (or variations thereof) many times, including twice in the last year. When I mentioned this in this[^] thread, Luc begged and pleaded on bended knee (and I believe that he offered beer as well) that I impart this ancient knowledge to all so that it shall not perish in oblivion.

Introduction

This article presents a small class that encapsulates the mechanics of the Master File Update and (I hope) allows the developer to more easily avoid some pitfalls.

If you have two sorted lists of data and try to match the items to each other, there will be three sets of outcomes:

  1. Successful matches
  2. Items in the first list that don't match the second
  3. Items in the second that don't match the first

(Yes, this requires that the lists be sorted the same way and that the items match on the same keys, but that should be obvious if you use your little grey cells.)

To quote my first post from the thread, the basic algorithm follows:

0) While both lists have items to process:
0.1) Compare the two current items
0.1.1) If they match, process the item, and advance both lists
0.1.2) If the first list's item is less than the second list's item, advance the first list
0.1.3) Otherwise, advance the second list
1) Do stuff with any items left in the first list
2) Do stuff with any items left in the second list
What you do with the items is dependent on the task at hand -- you may only be interested in matches, or only in non-matches.

In response to Luc's sad puppy-dog eyes, I sought to write a general technique that should suit the situations that I have encountered.

You can learn a little bit more about the Master File Update here[^] (not a college I attended). As far as I recall the Master File Update presented in my class only dealt with changes to a Master File (e.g. debits and credits to accounts), not additions and deletions, so that's all this code supports.

Implementation

My goal was to achieve as general a framework for this technique as possible, so I made it generic. I quickly realized that the class would require four generic parameters:

C#
public partial class MasterFileUpdater<LeftList,LeftItem,RightList,RightItem>
{
  ...
}

This should allow as much flexibility as I need.

Then I decided to use delegates and events for all the actions that need to be taken.

Events

Here are the events that are raised after a comparison:

C#
public delegate void MatchDelegate          ( LeftItem  LeftItem  , RightItem RightItem ) ;
public delegate void UnmatchedLeftDelegate  ( LeftItem  LeftItem  ) ;
public delegate void UnmatchedRightDelegate ( RightItem RightItem ) ;

public event MatchDelegate          OnMatch          ;
public event UnmatchedLeftDelegate  OnUnmatchedLeft  ;
public event UnmatchedRightDelegate OnUnmatchedRight ;

Each event is raised with a method of the following pattern:

C#
private void
RaiseOnMatch
(
  LeftItem  LeftItem
,
  RightItem RightItem
)
{
  if ( this.OnMatch != null )
  {
    this.OnMatch ( LeftItem , RightItem ) ;
  }

  return ;
}

If you are concerned about potential threading issues with this pattern (I'm not), you may alter it to suit you.

Delegates

And here are the delegates that allow the manipulation of the lists and items:

C#
public delegate bool LeftListGetItemDelegate  ( LeftList  LeftList  , out LeftItem  LeftItem  ) ;
public delegate bool RightListGetItemDelegate ( RightList RightList , out RightItem RightItem ) ;
public delegate int  ComparerDelegate         ( LeftItem  LeftItem  , RightItem     RightItem ) ;

public virtual LeftListGetItemDelegate  LeftListInitializer  { get ; set ; }
public virtual LeftListGetItemDelegate  LeftListAdvancer     { get ; set ; }
public virtual RightListGetItemDelegate RightListInitializer { get ; set ; }
public virtual RightListGetItemDelegate RightListAdvancer    { get ; set ; }
public virtual ComparerDelegate         Comparer             { get ; set ; }

Note that the Advancers and Initializers share the same delegates; there are situations where one method can be used for both purposes.

DataExistsFor

The class contains no other fields, but there is this enumeration...

C#
private enum DataExistsFor
{
  Neither = 0
,
  Left = 1
,
  Right = 2
,
  Both = 3
}

...to help with tracking the states of the lists. If you try to implement a Master File Update yourself, you may be tempted (as I have been) to use a single boolean to serve this purpose, but experience shows that it just doesn't work. In fact, my first attempts at this class tried to use a single boolean and testing showed it to be faulty. Then after I got this class working properly and I began to rework it into a recent utility that had its own implementation, I found that utility to be faulty as well.

Constructors

Which brings us to the constructors; there are two so you can either provide values for the delegates or not, depending on your personal style:

C#
public MasterFileUpdater
(
)
{
  return ;
}

public MasterFileUpdater
(
  LeftListGetItemDelegate  LeftListInitializer
,
  LeftListGetItemDelegate  LeftListAdvancer
,
  RightListGetItemDelegate RightListInitializer
,
  RightListGetItemDelegate RightListAdvancer
,
  ComparerDelegate         Comparer
)
{
  this.LeftListInitializer  = LeftListInitializer  ;
  this.LeftListAdvancer     = LeftListAdvancer     ;
  this.RightListInitializer = RightListInitializer ;
  this.RightListAdvancer    = RightListAdvancer    ;
  this.Comparer             = Comparer             ;

  return ;
}

PerformMatching

Once you have an instance of the class, you can use it to perform the matching of two lists. There are two overloads of PerformMatching: one that allows you to specify the delegates if you choose...

C#
public virtual void
PerformMatching
(
  LeftList                 LeftList
,
  LeftListGetItemDelegate  LeftListInitializer
,
  LeftListGetItemDelegate  LeftListAdvancer
,
  RightList                RightList
,
  RightListGetItemDelegate RightListInitializer
,
  RightListGetItemDelegate RightListAdvancer
,
  ComparerDelegate         Comparer
)
{
  this.LeftListInitializer  = LeftListInitializer  ;
  this.LeftListAdvancer     = LeftListAdvancer     ;
  this.RightListInitializer = RightListInitializer ;
  this.RightListAdvancer    = RightListAdvancer    ;
  this.Comparer             = Comparer             ;

  this.PerformMatching ( LeftList , RightList ) ;

  return ;
}

...and one that just takes the two lists, performs some checks, and gets on with it:

C#
public virtual void
PerformMatching
(
  LeftList  LeftList
,
  RightList RightList
)
{
  if ( this.LeftListAdvancer == null )
  {
    throw ( new System.ArgumentNullException 
	( "LeftListAdvancer" , "LeftListAdvancer must not be null" ) ) ;
  }

  if ( this.RightListAdvancer == null )
  {
    throw ( new System.ArgumentNullException 
	( "RightListAdvancer" , "RightListAdvancer must not be null" ) ) ;
  }

  if ( this.Comparer == null )
  {
    throw ( new System.ArgumentNullException ( "Comparer" , "Comparer must not be null" ) ) ;
  }

  if ( this.LeftListInitializer == null )
  {
    this.LeftListInitializer = this.LeftListAdvancer ;
  }

  if ( this.RightListInitializer == null )
  {
    this.RightListInitializer = this.RightListAdvancer ;
  }

  ...
}

Note that if the same method can be used for an Advancer and an Initializer, you need only provide it as the Advancer and it will be copied to the Initializer for you.

Algorithm

The basic pattern of the algorithm is stated above; here's how it's implemented in this class. There are fields to hold the states of the lists and the current items in the lists. These fields need to be initialized before the matching can commence.

C#
DataExistsFor liststate = DataExistsFor.Neither ;

LeftItem leftitem = default(LeftItem) ;

if ( this.LeftListInitializer ( LeftList , out leftitem ) )
{
  liststate |= DataExistsFor.Left ;
}

RightItem rightitem = default(RightItem) ;

if ( this.RightListInitializer ( RightList , out rightitem ) )
{
  liststate |= DataExistsFor.Right ;
}

The comparison of values is only performed as long as values exist in both lists. The Comparer delegate is executed on the current values from the lists, one of the events is raised (if there is at least one handler attached), and then at least one Advancer is executed. If a list no longer contains values, the liststate will be altered to reflect this.

C#
while ( liststate == DataExistsFor.Both )
{
  int diff = this.Comparer ( leftitem , rightitem ) ;

  if ( diff == 0 )
  {
    this.RaiseOnMatch ( leftitem , rightitem ) ;

    if ( !this.LeftListAdvancer ( LeftList , out leftitem ) )
    {
      liststate &= ~DataExistsFor.Left ;
    }

    if ( !this.RightListAdvancer ( RightList , out rightitem ) )
    {
      liststate &= ~DataExistsFor.Right ;
    }
  }
  else if ( diff < 0 )
  {
    this.RaiseOnUnmatchedLeft ( leftitem ) ;

    if ( !this.LeftListAdvancer ( LeftList , out leftitem ) )
    {
      liststate &= ~DataExistsFor.Left ;
    }
  }
  else
  {
    this.RaiseOnUnmatchedRight ( rightitem ) ;

    if ( !this.RightListAdvancer ( RightList , out rightitem ) )
    {
      liststate &= ~DataExistsFor.Right ;
    }
  }
}

The matching ends when at least one list has no more items, but the other list may still have items, and you may need to process those items. The following will handle that situation.

C#
if ( this.OnUnmatchedLeft != null )
{
  while ( liststate == DataExistsFor.Left )
  {
    this.RaiseOnUnmatchedLeft ( leftitem ) ;

    if ( !this.LeftListAdvancer ( LeftList , out leftitem ) )
    {
      liststate &= ~DataExistsFor.Left ;
    }
  }
}

if ( this.OnUnmatchedRight != null )
{
  while ( liststate == DataExistsFor.Right )
  {
    this.RaiseOnUnmatchedRight ( rightitem ) ;

    if ( !this.RightListAdvancer ( RightList , out rightitem ) )
    {
      liststate &= ~DataExistsFor.Right ;
    }
  }
}

Using the Code

MFUtester.cs

I have included MFUtester.cs as an example of how this can be used to perform matching on two lists of integers:

C#
using ListType = System.Collections.Generic.List<int> ;
using MFU = PIEBALD.Types.MasterFileUpdater<System.Collections.Generic.List<int>,
		int,System.Collections.Generic.List<int>,int> ;

For this demonstration, we need two lists, an index for each, and a suitable instance of MasterFileUpdater:

C#
int leftindex = 0 ;
ListType leftlist = new ListType() ;

int rightindex = 0 ;
ListType rightlist = new ListType() ;

for ( int i = 0 ; i < 10 ; i++ )
{
  leftlist.Add  ( i * 2 ) ;
  rightlist.Add ( i * 3 ) ;
}

MFU mfu = new MFU() ;
The two lists created will have some values that match and some that don't.

I use anonymous methods as event handlers:

C#
mfu.OnMatch += delegate ( int leftitem , int rightitem )
{
  System.Console.WriteLine ( "Match {0}=={1}" , leftitem , rightitem ) ;

  return ;
} ;

mfu.OnUnmatchedLeft += delegate ( int leftitem )
{
  System.Console.WriteLine ( "Unmatched left {0}" , leftitem ) ;

  return ;
} ;

mfu.OnUnmatchedRight += delegate ( int rightitem )
{
  System.Console.WriteLine ( "Unmatched right {0}" , rightitem ) ;

  return ;
} ;

The two Initializers are similar; here's the one for the left list:

C#
mfu.LeftListInitializer = delegate ( ListType list , out int item )
{
  item = leftindex = 0 ;

  return ( list != null ) ;
} ;

Likewise, the Advancers are similar:

C#
mfu.LeftListAdvancer = delegate ( ListType list , out int item )
{
  bool advanced ;

  if ( advanced = ( ( list != null ) && ( leftindex < list.Count - 1 ) ) )
  {
    item = list [ ++leftindex ] ;
  }
  else
  {
    item = 0 ;
  }

  return ( advanced ) ;
} ;

The Comparer for integers is very simple:

C#
mfu.Comparer = delegate ( int left , int right )
{
  return ( left - right ) ;
} ;

Once everything is set up, you just call PerformMatching:

C#
mfu.PerformMatching
(
  leftlist
,
  rightlist
) ;

MFUtester.cs flexes the algorithm with the two lists various ways. It can be built at the command line with:

C#
CSC MFUtester.cs MasterFileUpdater.cs 

SQLcomparator.cs

The utility I wrote recently (not included with the article, but maybe in a future one) that uses this, uses IDataReaders for the two lists. And, as luck would have it, the same IDataReaders make suitable items as well:

C#
using MFU = PIEBALD.Types.MasterFileUpdater
  <
    System.Data.IDataReader
  ,
    System.Data.IDataReader
  ,
    System.Data.IDataReader
  ,
    System.Data.IDataReader
  > ;

Without going into too much detail, the utility gets an IDataReader from each of two data connections and matches the records based on the ItemID fields.

C#
using
(
  System.Data.IDataReader
  dr0 = db0.ExecuteReader ( sql , args [ 0 ] , args [ 1 ] , filter )
,
  dr1 = db1.ExecuteReader ( sql , args [ 2 ] , args [ 3 ] , filter )
)
{
  MFU mfu = new MFU() ;

  mfu.Comparer = delegate ( System.Data.IDataReader left , System.Data.IDataReader right )
  {
    return ( System.StringComparer.InvariantCulture.Compare 
		( left [ "ItemID" ] , right [ "ItemID" ] ) ) ;
  } ;

The event handlers write the differences between the two lists to the Console.

C#
mfu.OnMatch += delegate ( System.Data.IDataReader leftitem , System.Data.IDataReader rightitem )
{
  string left  = (string) leftitem  [ "ItemContent" ] ;
  string right = (string) rightitem [ "ItemContent" ] ;

  ...

  WriteLine ( args [ 0 ] , args [ 1 ] , left  ) ;

  WriteLine ( args [ 2 ] , args [ 3 ] , right ) ;

  return ;
} ;

mfu.OnUnmatchedLeft += delegate ( System.Data.IDataReader leftitem )
{
  WriteLine ( args [ 0 ] , args [ 1 ] , leftitem [ "ItemContent" ] ) ;

  WriteLine ( null , null , null ) ;

  return ;
} ;

mfu.OnUnmatchedRight += delegate ( System.Data.IDataReader rightitem )
{
  WriteLine ( args [ 2 ] , args [ 3 ] , rightitem [ "ItemContent" ] ) ;

  WriteLine ( null , null , null ) ;

  return ;
} ;

The two lists can share a single method as its Initializer and Advancer.

C#
  mfu.LeftListInitializer =
  mfu.LeftListAdvancer = Advancer ;

  mfu.RightListInitializer =
  mfu.RightListAdvancer = Advancer ;

private static bool
Advancer
(
  System.Data.IDataReader     list
,
  out System.Data.IDataReader item
)
{
  bool advanced = ( list != null ) && !list.IsClosed && list.Read() ;

  if ( advanced )
  {
    item = list ;
  }
  else
  {
    item = null ;
  }

  return ( advanced ) ;
}

Then perform the matching and you're done.

C#
    mfu.PerformMatching
    (
      dr0
    ,
      dr1
    ) ;

    dr0.Close() ;
    dr1.Close() ;
  }
}

Afterthought

It occurred to me, as I was writing this article, that perhaps, rather than events, this might be better as an Enumerator, so you can do:

C#
foreach ( something result in mfu.MatchResults ( left , right ) )
{
  ...
}

What are your thoughts?

Points of Interest

I found that...

C#
using ListType = System.Collections.Generic.List<int> ;
using MFU = PIEBALD.Types.MasterFileUpdater<ListType,int,ListType,int> ;

...won't compile, but that...

C#
using ListType = System.Collections.Generic.List<int> ;
namespace whatever 
{
  using MFU = PIEBALD.Types.MasterFileUpdater<ListType,int,ListType,int> ;
}

...will.

History

  • 2011-11-25 First submitted

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer (Senior)
United States United States
BSCS 1992 Wentworth Institute of Technology

Originally from the Boston (MA) area. Lived in SoCal for a while. Now in the Phoenix (AZ) area.

OpenVMS enthusiast, ISO 8601 evangelist, photographer, opinionated SOB, acknowledged pedant and contrarian

---------------

"I would be looking for better tekkies, too. Yours are broken." -- Paul Pedant

"Using fewer technologies is better than using more." -- Rico Mariani

"Good code is its own best documentation. As you’re about to add a comment, ask yourself, ‘How can I improve the code so that this comment isn’t needed?’" -- Steve McConnell

"Every time you write a comment, you should grimace and feel the failure of your ability of expression." -- Unknown

"If you need help knowing what to think, let me know and I'll tell you." -- Jeffrey Snover [MSFT]

"Typing is no substitute for thinking." -- R.W. Hamming

"I find it appalling that you can become a programmer with less training than it takes to become a plumber." -- Bjarne Stroustrup

ZagNut’s Law: Arrogance is inversely proportional to ability.

"Well blow me sideways with a plastic marionette. I've just learned something new - and if I could award you a 100 for that post I would. Way to go you keyboard lovegod you." -- Pete O'Hanlon

"linq'ish" sounds like "inept" in German -- Andreas Gieriet

"Things would be different if I ran the zoo." -- Dr. Seuss

"Wrong is evil, and it must be defeated." –- Jeff Ello

"A good designer must rely on experience, on precise, logical thinking, and on pedantic exactness." -- Nigel Shaw

“It’s always easier to do it the hard way.” -- Blackhart

“If Unix wasn’t so bad that you can’t give it away, Bill Gates would never have succeeded in selling Windows.” -- Blackhart

"Use vertical and horizontal whitespace generously. Generally, all binary operators except '.' and '->' should be separated from their operands by blanks."

"Omit needless local variables." -- Strunk... had he taught programming

Comments and Discussions

 
GeneralMy vote of 5 Pin
Kanasz Robert29-Nov-11 22:47
professionalKanasz Robert29-Nov-11 22:47 
QuestionAn example using DataTables Pin
PIEBALDconsult28-Nov-11 10:04
mvePIEBALDconsult28-Nov-11 10:04 
GeneralYour Aliased Namespace Finding is Interesting Pin
AspDotNetDev25-Nov-11 17:07
protectorAspDotNetDev25-Nov-11 17:07 
GeneralRe: Your Aliased Namespace Finding is Interesting Pin
PIEBALDconsult26-Nov-11 7:31
mvePIEBALDconsult26-Nov-11 7:31 
GeneralRe: Your Aliased Namespace Finding is Interesting Pin
AspDotNetDev26-Nov-11 8:37
protectorAspDotNetDev26-Nov-11 8:37 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.