Click here to Skip to main content
15,892,161 members
Articles / All Topics

Structuring Your Excel – “The Hidden Agenda”

Rate me:
Please Sign up or sign in to vote.
5.00/5 (2 votes)
5 Jan 2017CPOL7 min read 4.6K   1  
Most developers don't like Excel as a "development platform". Structuring your Excel makes it maintainable, and makes replacing it by some code possible.

Introduction

Most developers don’t like Excel as a “development platform”. Many projects are migration projects from an Excel solution to a “real application”. And often, this is worth the trouble. But in some cases, Excel has a lot of advantages.

An Excel workbook can be set up in many ways, and it usually starts off very small, to end in a big spaghetti where nobody knows what is going on. And nobody dares to change a thing. Sounds like a typical spaghetti .NET (or enter your favorite language) project. So let’s try to make our Excel project manageable.

Structuring Your Excel

Every workbook starts with the first sheet. And initially, there are some cells that we use as input, and then some cells are used as output. If you want to keep a nice overview of what is happening, it is worth creating separate sheets for separate concerns.

image

Input Sheet(s)

Use this sheet to gather all the input parameters from your users. This can be a simple sheet. In the end, this is the user interface to your workbook, so make it easy for users to enter data. Use data validation and lookups to limit errors. The input sheets should be the only thing that can be modified by the end-users.

Enriching the Input Sheet

image

You can also do some simple calculations that are local to the input sheet. For example, adding 2 input fields together to see their total may be handy here. This also gives immediate feedback to the user. Looking up the city that goes with a ZIP code is another good example. I know that most of us have a problem remembering the syntax of “lookup” in Excel, hence the link Winking smile.

Depending on the nature of your applications, there can be 1 or more input sheets. For example, a simulation using some parameters will typically contain 1 input sheet, where an accounting application (with cash books, bank books, …) may contain multiple input sheets.

Principle: All the calculations that only concern the input data can be done already here. Using some data from the datasheets (as static data) can also be done here. This will give us the first intermediate results.

Output Sheet(s)

image

These sheets will contain the output for your users. The sheets should not contain calculations, only formatting. They will present the results from the calculation sheet(s). Of course, formatting means changing fonts, colors, … and also cell formats.

Principle: This sheet contains no calculations, only formatting. Everything should be calculated in the calculations sheets already.

Data Sheet(s)

image

Your workbook will probably need some data, and if you’re lucky, this data is in a structured format. This data serves as static input for your calculations.

Often, you will want to calculate some results per data row. This can be done in a separate sheet per data sheet that will contain all the necessary calculations that are only using data from that data sheet. Eventually, you will want this data to be stored in a database to be able to access it from many applications (think reporting, for example).

To accommodate for this, you can create the data sheets to contain only raw data, and then

  • Add columns to the raw data that contain calculations on the data per row. Maybe you have some fields to be added already in the data that you’ll need later, some Lookups to do. All that does not involve the input data can be done here. Make sure you put the calculations away from the data (separate them by some empty columns for later expansion of the raw data). It may also be a good idea to use colors to indicate which are the calculated fields.
  • Add a new sheet that will contain all the calculations for this data sheet that depend on the input parameters. This sheet will calculate intermediate results that are needed later.

Principle: Separate data from calculations, either by adding calculated columns at the end of the raw data, or by adding dedicated sheets to calculate these intermediate results.

Calculation Sheet(s)

This is where you perform more complex calculations. Some simple calculations can be done on the input sheets and the data sheets already, but all the complex work should happen here. The calculation sheets will use the (calculated) data from many sheets and combine this data to become the final (unformatted) results. Because all the simple calculations have been done in the input- and datasheets, the calculation sheet is just consolidating this data.

Principle: This is where the sheets in the workbook are combined to produce the end result. The local calculations per sheet are done already, so that only the consolidation logic remains.

Keep Your Calculations As Local As Possible

In the proposed structure, it is easy to see that calculations are done in the input sheet over only the input data (and maybe some lookups using the data sheets). Calculations over the data is done in a separate sheet per data sheet.

It is only in the calculation sheets that data will be combined from different sheets. In this way, the workbook remains manageable, even when it grows bigger.

Structuring your workbook like this will also make intermediate results visible. These intermediate results will be calculated only once, and can be used everywhere. This has a couple of advantages:

  • The workbook is simpler because the intermediate results are calculated as close as possible to their data.
  • It is easier to create more complex formulas when the intermediate results are calculated already. Instead of trying to squeeze everything in 1 formula, there are now building blocks available to be used.
  • The results become more consistent. The formulas for calculating the intermediate results are not repeated all over the workbook. So there is less of a risk that in some instances a formula is (slightly) different. This will prevent hard to solve errors.
  • The results are easy to test. The intermediate results are simple, so verifying if they are correct is easy. The more complex formulas are using the intermediate results as building blocks, so they become much easier to test too. If something goes wrong it is easier to track back where it went wrong. This can be seen as a kind of unit testing.

Did you notice how I don’t care about performance in this case? That is because the other advantages outweigh this by far. But of course, there could be a performance gain as well.

Named Cells and Regions

This is an easy one. Excel allows to name cells or ranges of cells. Try to use a naming convention, for example:

  • Cells containing input fields will have a name starting with I_  (for example I_Zip code)
  • Ranges containing data for lookups can start with l_  (for example l_locations)
  • Cells containing intermediate results can start with c_ (for example c_TotalConsumption)

In this way, the purpose of each cell or range is clear.

An additional advantage is that you can now move the named cells and ranges elsewhere in the workbook if you need to. As long as the new zone gets the same name, all the formulas referring to it will still work.

Excel as a Functional Language

If you look closely at an Excel workbook, then you’ll notice that the cells either contain input values which can be modified by the users, or output values. The output values are obtained by calculating formulas in various sheets, and with many dependencies. But each formula contains functions that have no side effects (unless you start to use non-deterministic functions of course).

So calling a function will never do something bad to your input data. It will of course update the cell which contains the formula, in a deterministic way.

Conclusion – Our Hidden Agenda

Once the workbook is properly structured, it becomes easy to separate the data from the functions. The functions are now very clear and easy to implement in some programming language. The names ranges can be the names of variables in your program, making the match easy.

We use the Excel workbook here as input – processing – output, which can be perfectly done in a Functional language, such as F#. You will end up with data coming from the database, input parameters and functions to be applied. This will then give the end results. More on this in a later post.

Image 6 Image 7

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Architect Faq.be bvba
Belgium Belgium
Gaston Verelst is the owner of Faq.be, an IT consultancy company based in Belgium (the land of beer and chocolate!) He went through a variety of projects during his career so far. Starting with Clipper - the Summer '87 edition, he moved on to C and mainly C++ during the first 15 years of his career.

He quickly realized that teaching others is very rewarding. In 1995, he became one of the first MCT's in Belgium. He teaches courses on various topics:
• C, C++, MFC, ATL, VB6, JavaScript
• SQL Server (he is also an MSDBA)
• Object Oriented Analysis and Development
• He created courses on OMT and UML and trained hundreds of students in OO
• C# (from the first beta versions)
• Web development (from ASP, ASP.NET, ASP.NET MVC)
• Windows development (WPF, Windows Forms, WCF, Entity Framework, …)
• Much more

Of course, this is only possible with hands-on experience. Gaston worked on many large scale projects for the biggest banks in Belgium, Automotive, Printing, Government, NGOs. His latest and greatest project is all about extending an IoT gateway built in MS Azure.

"Everything should be as simple as it can be but not simpler!" – Albert Einstein

Gaston applies this in all his projects. Using frameworks in the best ways possible he manages to make code shorter, more stable and much more elegant. Obviously, he refuses to be paid by lines of code!

This led to the blog at https://msdev.pro. The articles of this blog are also available on https://www.codeproject.com/script/Articles/MemberArticles.aspx?amid=4423636, happy reading!

When he is not working or studying, Gaston can be found on the tatami in his dojo. He is the chief instructor of Ju-Jitsu club Zanshin near Antwerp and holds high degrees in many martial arts as well.

Gaston can best be reached via https://www.linkedin.com/in/gverelst/.


Comments and Discussions

 
-- There are no messages in this forum --