Background
Waaaaayyyy back when I was in college (circa 1986), I was subjected to two (count 'em, two) whole semesters of COBOL (on a VAX!). The only thing I learned (other than a healthy dislike of COBOL) was the Master File Update -- a crusty old technique from pre-database days. The teacher said that it was old but that it was a good tool to have in the toolbox.
And -- surprise surprise -- it has been; I've used this technique (or variations thereof) many times, including twice in the last year. When I mentioned this in this[^] thread, Luc begged and pleaded on bended knee (and I believe that he offered beer as well) that I impart this ancient knowledge to all so that it shall not perish in oblivion.
Introduction
This article presents a small class that encapsulates the mechanics of the Master File Update and (I hope) allows the developer to more easily avoid some pitfalls.
If you have two sorted lists of data and try to match the items to each other, there will be three sets of outcomes:
- Successful matches
- Items in the first list that don't match the second
- Items in the second that don't match the first
(Yes, this requires that the lists be sorted the same way and that the items match on the same keys, but that should be obvious if you use your little grey cells.)
To quote my first post from the thread, the basic algorithm follows:
0) While both lists have items to process:
0.1) Compare the two current items
0.1.1) If they match, process the item, and advance both lists
0.1.2) If the first list's item is less than the second list's item, advance the first list
0.1.3) Otherwise, advance the second list
1) Do stuff with any items left in the first list
2) Do stuff with any items left in the second list
What you do with the items is dependent on the task at hand -- you may only be interested in matches, or only in non-matches.
In response to Luc's sad puppy-dog eyes, I sought to write a general technique that should suit the situations that I have encountered.
You can learn a little bit more about the Master File Update here[^] (not a college I attended). As far as I recall the Master File Update presented in my class only dealt with changes to a Master File (e.g. debits and credits to accounts), not additions and deletions, so that's all this code supports.
Implementation
My goal was to achieve as general a framework for this technique as possible, so I made it generic. I quickly realized that the class would require four generic parameters:
public partial class MasterFileUpdater<LeftList,LeftItem,RightList,RightItem>
{
...
}
This should allow as much flexibility as I need.
Then I decided to use delegates and events for all the actions that need to be taken.
Events
Here are the events that are raised after a comparison:
public delegate void MatchDelegate ( LeftItem LeftItem , RightItem RightItem ) ;
public delegate void UnmatchedLeftDelegate ( LeftItem LeftItem ) ;
public delegate void UnmatchedRightDelegate ( RightItem RightItem ) ;
public event MatchDelegate OnMatch ;
public event UnmatchedLeftDelegate OnUnmatchedLeft ;
public event UnmatchedRightDelegate OnUnmatchedRight ;
Each event is raised with a method of the following pattern:
private void
RaiseOnMatch
(
LeftItem LeftItem
,
RightItem RightItem
)
{
if ( this.OnMatch != null )
{
this.OnMatch ( LeftItem , RightItem ) ;
}
return ;
}
If you are concerned about potential threading issues with this pattern (I'm not), you may alter it to suit you.
Delegates
And here are the delegates that allow the manipulation of the lists and items:
public delegate bool LeftListGetItemDelegate ( LeftList LeftList , out LeftItem LeftItem ) ;
public delegate bool RightListGetItemDelegate ( RightList RightList , out RightItem RightItem ) ;
public delegate int ComparerDelegate ( LeftItem LeftItem , RightItem RightItem ) ;
public virtual LeftListGetItemDelegate LeftListInitializer { get ; set ; }
public virtual LeftListGetItemDelegate LeftListAdvancer { get ; set ; }
public virtual RightListGetItemDelegate RightListInitializer { get ; set ; }
public virtual RightListGetItemDelegate RightListAdvancer { get ; set ; }
public virtual ComparerDelegate Comparer { get ; set ; }
Note that the Advancer
s and Initializer
s share the same delegates; there are situations where one method can be used for both purposes.
DataExistsFor
The class contains no other fields, but there is this enumeration...
private enum DataExistsFor
{
Neither = 0
,
Left = 1
,
Right = 2
,
Both = 3
}
...to help with tracking the states of the lists. If you try to implement a Master File Update yourself, you may be tempted (as I have been) to use a single boolean to serve this purpose, but experience shows that it just doesn't work. In fact, my first attempts at this class tried to use a single boolean and testing showed it to be faulty. Then after I got this class working properly and I began to rework it into a recent utility that had its own implementation, I found that utility to be faulty as well.
Constructors
Which brings us to the constructors; there are two so you can either provide values for the delegates or not, depending on your personal style:
public MasterFileUpdater
(
)
{
return ;
}
public MasterFileUpdater
(
LeftListGetItemDelegate LeftListInitializer
,
LeftListGetItemDelegate LeftListAdvancer
,
RightListGetItemDelegate RightListInitializer
,
RightListGetItemDelegate RightListAdvancer
,
ComparerDelegate Comparer
)
{
this.LeftListInitializer = LeftListInitializer ;
this.LeftListAdvancer = LeftListAdvancer ;
this.RightListInitializer = RightListInitializer ;
this.RightListAdvancer = RightListAdvancer ;
this.Comparer = Comparer ;
return ;
}
PerformMatching
Once you have an instance of the class, you can use it to perform the matching of two lists. There are two overloads of PerformMatching
: one that allows you to specify the delegates if you choose...
public virtual void
PerformMatching
(
LeftList LeftList
,
LeftListGetItemDelegate LeftListInitializer
,
LeftListGetItemDelegate LeftListAdvancer
,
RightList RightList
,
RightListGetItemDelegate RightListInitializer
,
RightListGetItemDelegate RightListAdvancer
,
ComparerDelegate Comparer
)
{
this.LeftListInitializer = LeftListInitializer ;
this.LeftListAdvancer = LeftListAdvancer ;
this.RightListInitializer = RightListInitializer ;
this.RightListAdvancer = RightListAdvancer ;
this.Comparer = Comparer ;
this.PerformMatching ( LeftList , RightList ) ;
return ;
}
...and one that just takes the two lists, performs some checks, and gets on with it:
public virtual void
PerformMatching
(
LeftList LeftList
,
RightList RightList
)
{
if ( this.LeftListAdvancer == null )
{
throw ( new System.ArgumentNullException
( "LeftListAdvancer" , "LeftListAdvancer must not be null" ) ) ;
}
if ( this.RightListAdvancer == null )
{
throw ( new System.ArgumentNullException
( "RightListAdvancer" , "RightListAdvancer must not be null" ) ) ;
}
if ( this.Comparer == null )
{
throw ( new System.ArgumentNullException ( "Comparer" , "Comparer must not be null" ) ) ;
}
if ( this.LeftListInitializer == null )
{
this.LeftListInitializer = this.LeftListAdvancer ;
}
if ( this.RightListInitializer == null )
{
this.RightListInitializer = this.RightListAdvancer ;
}
...
}
Note that if the same method can be used for an Advancer
and an Initializer
, you need only provide it as the Advancer
and it will be copied to the Initializer
for you.
Algorithm
The basic pattern of the algorithm is stated above; here's how it's implemented in this class. There are fields to hold the states of the lists and the current items in the lists. These fields need to be initialized before the matching can commence.
DataExistsFor liststate = DataExistsFor.Neither ;
LeftItem leftitem = default(LeftItem) ;
if ( this.LeftListInitializer ( LeftList , out leftitem ) )
{
liststate |= DataExistsFor.Left ;
}
RightItem rightitem = default(RightItem) ;
if ( this.RightListInitializer ( RightList , out rightitem ) )
{
liststate |= DataExistsFor.Right ;
}
The comparison of values is only performed as long as values exist in both lists. The Comparer delegate is executed on the current values from the lists, one of the events is raised (if there is at least one handler attached), and then at least one Advancer
is executed. If a list no longer contains values, the liststate
will be altered to reflect this.
while ( liststate == DataExistsFor.Both )
{
int diff = this.Comparer ( leftitem , rightitem ) ;
if ( diff == 0 )
{
this.RaiseOnMatch ( leftitem , rightitem ) ;
if ( !this.LeftListAdvancer ( LeftList , out leftitem ) )
{
liststate &= ~DataExistsFor.Left ;
}
if ( !this.RightListAdvancer ( RightList , out rightitem ) )
{
liststate &= ~DataExistsFor.Right ;
}
}
else if ( diff < 0 )
{
this.RaiseOnUnmatchedLeft ( leftitem ) ;
if ( !this.LeftListAdvancer ( LeftList , out leftitem ) )
{
liststate &= ~DataExistsFor.Left ;
}
}
else
{
this.RaiseOnUnmatchedRight ( rightitem ) ;
if ( !this.RightListAdvancer ( RightList , out rightitem ) )
{
liststate &= ~DataExistsFor.Right ;
}
}
}
The matching ends when at least one list has no more items, but the other list may still have items, and you may need to process those items. The following will handle that situation.
if ( this.OnUnmatchedLeft != null )
{
while ( liststate == DataExistsFor.Left )
{
this.RaiseOnUnmatchedLeft ( leftitem ) ;
if ( !this.LeftListAdvancer ( LeftList , out leftitem ) )
{
liststate &= ~DataExistsFor.Left ;
}
}
}
if ( this.OnUnmatchedRight != null )
{
while ( liststate == DataExistsFor.Right )
{
this.RaiseOnUnmatchedRight ( rightitem ) ;
if ( !this.RightListAdvancer ( RightList , out rightitem ) )
{
liststate &= ~DataExistsFor.Right ;
}
}
}
Using the Code
MFUtester.cs
I have included MFUtester.cs as an example of how this can be used to perform matching on two lists of integers:
using ListType = System.Collections.Generic.List<int> ;
using MFU = PIEBALD.Types.MasterFileUpdater<System.Collections.Generic.List<int>,
int,System.Collections.Generic.List<int>,int> ;
For this demonstration, we need two lists, an index for each, and a suitable instance of MasterFileUpdater
:
int leftindex = 0 ;
ListType leftlist = new ListType() ;
int rightindex = 0 ;
ListType rightlist = new ListType() ;
for ( int i = 0 ; i < 10 ; i++ )
{
leftlist.Add ( i * 2 ) ;
rightlist.Add ( i * 3 ) ;
}
MFU mfu = new MFU() ;
The two lists created will have some values that match and some that don't.
I use anonymous methods as event handlers:
mfu.OnMatch += delegate ( int leftitem , int rightitem )
{
System.Console.WriteLine ( "Match {0}=={1}" , leftitem , rightitem ) ;
return ;
} ;
mfu.OnUnmatchedLeft += delegate ( int leftitem )
{
System.Console.WriteLine ( "Unmatched left {0}" , leftitem ) ;
return ;
} ;
mfu.OnUnmatchedRight += delegate ( int rightitem )
{
System.Console.WriteLine ( "Unmatched right {0}" , rightitem ) ;
return ;
} ;
The two Initializer
s are similar; here's the one for the left list:
mfu.LeftListInitializer = delegate ( ListType list , out int item )
{
item = leftindex = 0 ;
return ( list != null ) ;
} ;
Likewise, the Advancer
s are similar:
mfu.LeftListAdvancer = delegate ( ListType list , out int item )
{
bool advanced ;
if ( advanced = ( ( list != null ) && ( leftindex < list.Count - 1 ) ) )
{
item = list [ ++leftindex ] ;
}
else
{
item = 0 ;
}
return ( advanced ) ;
} ;
The Comparer
for integers is very simple:
mfu.Comparer = delegate ( int left , int right )
{
return ( left - right ) ;
} ;
Once everything is set up, you just call PerformMatching
:
mfu.PerformMatching
(
leftlist
,
rightlist
) ;
MFUtester.cs flexes the algorithm with the two lists various ways. It can be built at the command line with:
CSC MFUtester.cs MasterFileUpdater.cs
SQLcomparator.cs
The utility I wrote recently (not included with the article, but maybe in a future one) that uses this, uses IDataReaders
for the two lists. And, as luck would have it, the same IDataReaders
make suitable items as well:
using MFU = PIEBALD.Types.MasterFileUpdater
<
System.Data.IDataReader
,
System.Data.IDataReader
,
System.Data.IDataReader
,
System.Data.IDataReader
> ;
Without going into too much detail, the utility gets an IDataReader
from each of two data connections and matches the records based on the ItemID
fields.
using
(
System.Data.IDataReader
dr0 = db0.ExecuteReader ( sql , args [ 0 ] , args [ 1 ] , filter )
,
dr1 = db1.ExecuteReader ( sql , args [ 2 ] , args [ 3 ] , filter )
)
{
MFU mfu = new MFU() ;
mfu.Comparer = delegate ( System.Data.IDataReader left , System.Data.IDataReader right )
{
return ( System.StringComparer.InvariantCulture.Compare
( left [ "ItemID" ] , right [ "ItemID" ] ) ) ;
} ;
The event handlers write the differences between the two lists to the Console.
mfu.OnMatch += delegate ( System.Data.IDataReader leftitem , System.Data.IDataReader rightitem )
{
string left = (string) leftitem [ "ItemContent" ] ;
string right = (string) rightitem [ "ItemContent" ] ;
...
WriteLine ( args [ 0 ] , args [ 1 ] , left ) ;
WriteLine ( args [ 2 ] , args [ 3 ] , right ) ;
return ;
} ;
mfu.OnUnmatchedLeft += delegate ( System.Data.IDataReader leftitem )
{
WriteLine ( args [ 0 ] , args [ 1 ] , leftitem [ "ItemContent" ] ) ;
WriteLine ( null , null , null ) ;
return ;
} ;
mfu.OnUnmatchedRight += delegate ( System.Data.IDataReader rightitem )
{
WriteLine ( args [ 2 ] , args [ 3 ] , rightitem [ "ItemContent" ] ) ;
WriteLine ( null , null , null ) ;
return ;
} ;
The two lists can share a single method as its Initializer
and Advancer
.
mfu.LeftListInitializer =
mfu.LeftListAdvancer = Advancer ;
mfu.RightListInitializer =
mfu.RightListAdvancer = Advancer ;
private static bool
Advancer
(
System.Data.IDataReader list
,
out System.Data.IDataReader item
)
{
bool advanced = ( list != null ) && !list.IsClosed && list.Read() ;
if ( advanced )
{
item = list ;
}
else
{
item = null ;
}
return ( advanced ) ;
}
Then perform the matching and you're done.
mfu.PerformMatching
(
dr0
,
dr1
) ;
dr0.Close() ;
dr1.Close() ;
}
}
Afterthought
It occurred to me, as I was writing this article, that perhaps, rather than events, this might be better as an Enumerator, so you can do:
foreach ( something result in mfu.MatchResults ( left , right ) )
{
...
}
What are your thoughts?
Points of Interest
I found that...
using ListType = System.Collections.Generic.List<int> ;
using MFU = PIEBALD.Types.MasterFileUpdater<ListType,int,ListType,int> ;
...won't compile, but that...
using ListType = System.Collections.Generic.List<int> ;
namespace whatever
{
using MFU = PIEBALD.Types.MasterFileUpdater<ListType,int,ListType,int> ;
}
...will.
History
- 2011-11-25 First submitted