How One Can Reduce Code Maintenance in the Insurance/Finance/Bank Industry

This article takes a look at how code generators, .NET reflection and external files can reduce code maintenance in industries that deal with large amounts of data.

Introduction

In this article, I will show how code generators and .NET reflection can heavily reduce code maintenance in industries that work with large amounts of data and data testing. If you work as a developer at an insurance company, a finance company or a bank, you surely know the code maintenance issues these companies face. Chances are your company has many programs built around a single but huge data format that changes every once in a while, and whenever that happens, you know the following weeks will be painful. This article is meant to help you develop strategies against that process. In the end, you may have a single code generator that generates code, inserts it into your project and recompiles the project at the click of a button.

Background

At my work, most of my colleagues struggle with the lengthy task of maintaining historically grown code, without ever thinking about automating the process to become more efficient. In a recent project, I was given the task of comparing huge structured data files against data in Excel workbooks that results from Excel calculations. These files are around 8,000 lines long and contain over 1,000 properties to compare with those from the workbooks.

I thought that the process of finding a way to write this code so that it is as maintenance-free as possible could be interesting for many developers working in industries that handle this much data, which is why I decided to write this article.

The Problems

At first glance, this may seem like a simple task - but it is not, once you take a closer look at what actually has to be achieved:

  • High performance - we deal with around 5,000 dynamically created test cases (stored in files that are 8,000 lines long), each of which consists of 20-30 services. The services themselves are stored in separate files that are again around 8,000 lines long.
  • Maximum editability - the customers want to define the properties to be compared themselves, as the Excel workbooks and their formulas change almost weekly.
  • Maximum configurability - for testing purposes, they also want to define which type of file is compared against which type of Excel workbook.

Everyone knows that high performance and "maximum anything" are usually enemies: the more control you give the user, the more dynamic - and therefore slower - your code has to be. We are looking for a solution that does not hurt either side too much.

The Big Files with Data

Basically, we need to read these files and parse them into a C# structure with around 1,000 members. Three problems stand in the way:

  1. Firstly, writing such code by hand takes a long time, and eventually a human being loses track of that huge data monster.
  2. Secondly, the names of the members may change at any time (here, they do at least monthly), and new members may be introduced at any time.
  3. Thirdly, string comparisons do not perform well - in fact, they do not even work here, because the file has several headings that in turn contain several sub-members, and those members may have duplicate names at global file scope. That means at least amount-of-headings * amount-of-sub-properties string comparisons per line, for 8,000 lines - with over 1,000 properties, that quickly adds up to millions of comparisons per file. Nope.

The Solution: Code Generators

The solution to the problem is code generators. In this specific case, one reads the file, detects headings and sub-properties and generates several C# structures that represent the entire file. How you implement it is entirely up to you - I will not elaborate on that in this article, but maybe in another one, if requested. Just make sure that, along with the private members of each structure, you also generate C# properties that return the member values. Do this not only for the actual values but also for the headings! You will soon see that this is a key part of the magic that happens later in the article.

Now that we have a code generator that can regenerate the C# code whenever the files change, the maintenance problem is solved. But what about our possibly millions of string comparisons? I came up with a pretty clever solution to this. While iterating through each property in the file, calculate the hash code of the property name and the hash code of the heading that the property belongs to. Once you have done this, you can generate a switch statement that replaces the string comparisons with integer comparisons, which speeds up the code significantly. (Note: If you abs() and mod() the heading hashes accordingly (so that they do not collide), you can keep them in the range 0 to <num>; the compiler may then turn the switch into an actual jump table and you gain another bit of performance.)
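
One thing to watch out for: string.GetHashCode is not guaranteed to return the same values across runtime versions or processes, so a generator that bakes hash values into case labels should use its own deterministic hash. Here is a minimal sketch of what the generator and the parser could share - the function names, the choice of FNV-1a and the folding scheme are my own illustration, not part of the original tool:

C#
// A deterministic 32-bit FNV-1a hash - stable across runs and runtimes,
// unlike string.GetHashCode(), whose values may differ between processes.
static int StableHash(string s)
{
    unchecked
    {
        uint hash = 2166136261;
        foreach (char c in s)
        {
            hash ^= c;
            hash *= 16777619;
        }
        return (int)hash;
    }
}

// Folds a heading hash into the dense range 0..headingCount-1 so the generated
// switch can become a jump table. The generator must check for collisions after
// folding and fall back to a larger modulus if two headings clash.
static int FoldHeadingHash(string heading, int headingCount)
{
    return (StableHash(heading) & 0x7FFFFFFF) % headingCount;
}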

The switch-statement may look as follows:

C#
switch (headingHash)
{
    case {generator_heading_hash}:
    {
        if (propertyHash == {generator_prop_hash})
            myStructure.myHeading.[...].{generator_prop_name} = currentValue;
        else if (...)
        {
            // Do this for all properties in the current heading
        }
        break;
    }

    // Do this for all headings in the file
}

The Excel Workbooks

Our task is to read values from the workbooks and compare them to values from the big files mentioned earlier. The interface between those two formats must be chosen wisely: get it wrong, and you may suffer from huge maintenance or even performance problems. Before I was assigned to the task, several departments of my company had tried to find a suitable solution. All of their approaches, though, were hard to maintain and slow, since Excel interoperability has its costs and they handled the data cell by cell. The first approach of many of my colleagues looked something like this:

C#
excelInput.setCell(1, 1, bigFileStructure.someHeading.[...].someProperty);
excelInput.setCell(1, 2, bigFileStructure.someHeading.[...].anotherProperty);
// and a 100 more input parameters

if (excelOutput.cell(1, 1).CastTo<double>() == bigFileStructure.[...].property)
    // store result
// and a 1000 more output parameters

The Maintenance Solution: XML Files and .NET Reflection

Coming up with this solution required putting on my thinking cap. On the one hand, I had to provide reasonable performance for the tests. On the other hand, the customers wanted to manipulate as much as possible in order to control what is tested and how it is tested. Eventually, I found a solution that fulfills both requirements. To begin with, let us take a look at the XML file I designed. It is structured like this:

XML
<inputs type="{calculator type}">
    <input row="1" column="1" property="{property tree}"/>
</inputs>
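
Loading these definitions is straightforward. Here is a minimal sketch using System.Xml.Linq - the ConfigEntry type, the method name and the file path parameter are my own illustrative choices, not part of the actual tool:

C#
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public sealed class ConfigEntry
{
    public int Row { get; set; }
    public int Column { get; set; }
    public string PropertyTree { get; set; }
}

// Collects all <input> definitions that belong to the requested calculator type.
public static List<ConfigEntry> LoadInputs(string path, string calculatorType)
{
    return XDocument.Load(path)
        .Descendants("inputs")
        .Where(e => (string)e.Attribute("type") == calculatorType)
        .Descendants("input")
        .Select(e => new ConfigEntry
        {
            Row = (int)e.Attribute("row"),
            Column = (int)e.Attribute("column"),
            PropertyTree = (string)e.Attribute("property")
        })
        .ToList();
}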

The Property Attribute and Where .NET Reflection Shines

As some of you might know, .NET reflection lets you obtain method handles, property accessors and other fancy things by name. How can we use this to map a lousy string in an XML file to an actual property in the code? Imagine we have a property tree that is structured like this:

XML
<input ... property="Heading1.Property1"/>

and a C# structure that looks like this:

C#
// Important: We need a static object that remains the same for the entire test run!
// If you want to run the test on multiple cores, adjust the algorithm to your needs ;)
static MyData globalData = new MyData();

// Note: These are classes (reference types) on purpose. If they were structs, assigning
// them to an "object" variable in the reflection code below would box a copy, and the
// delegates would then read stale data instead of the values filled into globalData.
class MyData
{
    private Heading1 m_heading1 = new Heading1();
    public Heading1 Heading1 { get { return m_heading1; } }
}

class Heading1
{
    private double m_property1;
    public double Property1 { get { return m_property1; } }
}

Then you can finally parse the property tree as follows:

C#
object current = globalData;
string[] entries = tree.Split('.');

for (int i = 0; i < entries.Length - 1; i++)
{
    var nextEntry = entries[i];
    var propInfo = current.GetType().GetProperty(nextEntry);
    if (propInfo == null)
        throw new ArgumentException("The property tree '" + tree + "' is invalid.");

    // Gets the reference to e.g. a heading object and makes it the new current object.
    current = propInfo.GetValue(current);
}

var finalEntry = entries[entries.Length-1];
var finalProp = current.GetType().GetProperty(finalEntry);
var method = (Func<double>) Delegate.CreateDelegate(typeof(Func<double>), current, finalProp.GetMethod);

listOfMethods.Add(method);

// Note: It is also useful to store the minimum and maximum row/column. This is your task for now ;)
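
One way to handle that note is to aggregate the extents over all entries while parsing. A sketch, using the hypothetical ConfigEntry list from the XML example above - the variable names are illustrative:

C#
// The overall rectangle spanned by all referenced cells - this is the range
// we will later read from and write back to Excel in a single call.
int minRow = int.MaxValue, minCol = int.MaxValue;
int maxRow = int.MinValue, maxCol = int.MinValue;

foreach (var entry in configEntries)
{
    minRow = Math.Min(minRow, entry.Row);
    maxRow = Math.Max(maxRow, entry.Row);
    minCol = Math.Min(minCol, entry.Column);
    maxCol = Math.Max(maxCol, entry.Column);
}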

With the awesomeness of .NET reflection, we now have a list of delegates - essentially type-safe function pointers - that we can simply use during the test run:

C#
public class RunData
{
    public class RunEntry
    {
        public int Row { get; set; }
        public int Column { get; set; }
        public Func<double> Method { get; set; }

        // We will need those later!
        public int MinRow { get; set; }
        public int MaxRow { get; set; }
        public int MinColumn { get; set; }
        public int MaxColumn { get; set; }
    }

    public List<RunEntry> Entries { get; } = new List<RunEntry>();
}

public void FillExcel(RunData data)
{
    foreach (var entry in data.Entries)
    {
        excelInput.setCell(entry.Column, entry.Row, entry.Method());
    }
}

// For the output, it works pretty much the same; I will not elaborate on it in this article.
// if (excelOutput.cell(entry.Column, entry.Row).CastTo<double>() == entry.Method()) { }

The advantages of this method are obvious: we never have to recompile the code (unless we decide to update the interface provided by the code generator), so the customer does not have to call us whenever they want to add a new property to the comparison. They simply modify the underlying XML file to achieve what they want - a win-win situation.

The only drawback is that we have to use a static object in order to pre-parse the data at startup and retrieve the delegates. While that is not a problem in a single-threaded test run, it causes severe issues in multi-threaded runs. The ugly solution is to simply define multiple static instances of your data class and create delegates for all of them. Keep in mind that this can significantly increase startup time.

The Performance Solution: Range-based Reading and Writing

While our code is perfectly maintenance-free by now, we still read and write cell by cell, and that is a lengthy task that ate almost 8 seconds (!) of runtime per test case and service. The problem is that every cell we read or write requires an interop call into the Excel assembly, and the interop overhead costs far more than actually manipulating the cell. If we instead write an entire range at once, the overhead shrinks to a single interop call, which suits our needs perfectly. But range-based reading and writing is not as easy as it may sound: some of the input values are default values and must not be overwritten while writing to the workbook, and the input sheet does not fill its rows contiguously - there is blank space in between.

Solution: Reading the Default Values at Startup

How about we pre-fetch all the values at startup and store them in a multidimensional array, so that during the test run we only modify the array and then write it back into the Excel worksheet in one go?

C#
private object[,] m_inputValues;
private string m_a1ref;

public void Initialize()
{
    // Remember our "RunEntry" class? We need the min/max row/column now to convert R1C1 to A1!
    m_a1ref = app.ConvertFormula($"=R{minrow}C{mincol}:R{maxrow}C{maxcol}",
        XlReferenceStyle.xlR1C1, XlReferenceStyle.xlA1).ToString().Substring(1);

    // Now we have e.g. "AA1:BCE8".
    m_inputValues = (object[,]) worksheet.Range[m_a1ref].Value2;
}

public void FillExcel(RunData data)
{
    foreach (var entry in data.Entries)
    {
        // Value2 arrays are 1-based and indexed [row, column], relative to the
        // top-left cell of the range - hence the offset by minrow/mincol.
        m_inputValues[entry.Row - minrow + 1, entry.Column - mincol + 1] = entry.Method();
    }

    // Write the whole range back in a single interop call.
    worksheet.Range[m_a1ref].Value2 = m_inputValues;

    // Excel will immediately recalculate everything; read the output data next.
}

Explanation: At startup, we call Initialize. That function reads all default values into a private member variable in one go; blank cells simply end up as null. We also do not carelessly read an arbitrarily large range: while parsing the XML file, we remember the minimum and maximum row/column encountered in the entire file, and those values span exactly the range we need from Excel. When we insert the data in FillExcel, we iterate through every RunData entry and modify the multidimensional array we acquired at startup. (Note: You call a delegate like any other method in C#, using parentheses.)
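
Reading the results back works much the same way: one interop call fetches the whole output range, and the comparisons then run in memory. A sketch, assuming the output extents and A1 reference were collected just like the input ones (m_outputA1Ref, outMinRow and outMinCol are hypothetical names), and using a small tolerance instead of exact equality, which is usually safer for doubles read back from Excel:

C#
public void ReadResults(RunData data)
{
    // Single interop call; Value2 returns a 1-based object[,] indexed [row, column],
    // relative to the top-left cell of the requested range. Blank cells come back as null.
    var outputValues = (object[,]) worksheet.Range[m_outputA1Ref].Value2;

    foreach (var entry in data.Entries)
    {
        object raw = outputValues[entry.Row - outMinRow + 1, entry.Column - outMinCol + 1];
        double actual = raw == null ? 0.0 : Convert.ToDouble(raw);
        double expected = entry.Method();

        bool matches = Math.Abs(actual - expected) < 1e-9;
        // store the result...
    }
}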

With this solution, I have brought the runtime down to ~150 milliseconds per test case and service!

Closing Words

Writing code that is maintenance-free and performant at the same time is a hard task, especially if you work in the insurance, finance or banking industry, where data formats change continuously over time. I hope this article helps you counteract that by writing code generators, using .NET reflection in combination with delegates, and designing simple but efficient XML formats that allow the customer to maintain the tool's parameters themselves whenever needed.

If you have any questions, suggestions or criticism, let me know in the comments below. I will try to answer them as soon as possible.

History

  • 26th September, 2017: Initial version of the article

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

