Click here to Skip to main content
15,881,173 members
Articles / Productivity Apps and Services / Microsoft Office / Microsoft Excel

A Silverlight control to parse Excel data into a flat collection

Rate me:
Please Sign up or sign in to vote.
4.31/5 (4 votes)
7 Apr 2014CPOL7 min read 8.9K   195   6  

Introduction

Many a times there is a need to import data from Excel files into a database or even just generally parse the Excel data into a typed collection in memory that can then be used for various purposes. Here, I describe a Silverlight user control that I have written that should make things easier for developers working on a similar scenario.

My primary use case for this control is to help me with building Excel data import functionalities into web applications rather quickly and that's going to be my angle to explain the workings of this control.

Background

In my work I have often had to provide a functionality in my web apps to upload data from Excel files into a database (SQL server mostly). One of the common pain points in doing this is that the Excel data is very difficult to standardise particularly when the Excel file format is not strictly in your control and the data is collated by someone else. Things like typos in column headers, lazy shortening of names like "Option" becomes "opt", variable order of columns etc. are more than common in trying to solve this problem.

One of the 16th century ways of doing this was to install JET ODBC engine and Excel package on the server, physically upload the file onto the server and then using JET to read the files out and squirt data in the database. Needless to say, this is an old fashioned approach and is not supported by Microsoft anymore, server admins don't like putting packages like Excel on servers and most importantly its not very scalable.

Another way is to use SSIS (SQL Server Integration Service) and set up import packages that will do this kinda like ETL but that would require users (who aren't necessarily technically aware) to know how to set these up and configure. They want something simple that allows them to import Excel file without knowing too much tech.

So the problem that I wrote this control for is exactly that, as mentioned earlier, it helps me build Excel data import functionality very quickly. There can be other use cases, like parsing the Excel data into a typed collection to work with instead of having to faf about with strings. Let's see...

Workflow

Let's say my Excel file looks like this which I want to import into a database:

Image 1

I copy the data from the above Excel and paste into the text box as shown below:

Image 2

Click "Next" and select a value in the dropdowns on the right hand side that represents the expected column name on the left hand side:

Image 3

Finally, I click "Import" and the pasted data is parsed out into a flat column value collection ready to be consumed by the client side code.

NB: The thing to notice here is that the drop downs in the mapping screen contain the Excel column headers in the same order as they appear in the Excel file above, so by selecting a value in the dropdowns and by reading in their SelectedIndex property, I am in effect selecting the index of the column in the Excel data. This is key to this whole concept of "copy, paste, map and import" process.

Architecture

This control is structured as shown below:

Image 4

Legend:

TDTP: TabDelimitedTextParser (the actual control that has 2 panels, one for accepting the raw data from excel into a text box and another to generate the column mapping screen into based on the expected columns sent by the client code)

CCM: ComboboxToColumnMapping(a helper class containing a Dictionary<int, string> that stores the number suffix in drop downs' names (for e.g. cmb1, cmb2, cmb3) as a key and the expected column name as the value. This is used to track which column on the left the selections in the dropdowns on the right are being made for. Its just much easier than storing the whole combo box's name. And since these dropdowns are being generated at runtime by the system, names are under my control so I can take a few liberties with them without having to worry someone else changing the names and breaking this tracking)

C#
public class ComboboxToColumnMapping
    {
        private Dictionary<int, string> comboBoxNumberToExpectedColumnMappings;
 
        public ComboboxToColumnMapping()
        {
            comboBoxNumberToExpectedColumnMappings = new Dictionary<int, string>();
        }
        
        public void MapComboboxNumberToExpectedColumn(int comboBoxNumberSuffix, string expectedColumn)
        {
            if (!comboBoxNumberToExpectedColumnMappings.ContainsKey(comboBoxNumberSuffix))
            {
                comboBoxNumberToExpectedColumnMappings.Add(comboBoxNumberSuffix, expectedColumn);
            }
        }
 
        public string GetExpectedColumnComboboxIsMappedTo(int comboBoxNumberSuffix)
        {
            string expectedColumn = String.Empty;
 
            if (comboBoxNumberToExpectedColumnMappings.ContainsKey(comboBoxNumberSuffix))
            {
                expectedColumn = comboBoxNumberToExpectedColumnMappings[comboBoxNumberSuffix];
            }
 
            return expectedColumn;
        }
    } 

CIM: ColumnToIndexMapping(a helper class containing a Dictionary<string, int> that stores the expected column name as the key and the SelectedIndex value from the dropdowns as the value. This essentially maps the expected columns on the left to the available columns in drop down on the right)

C#
public class ColumnToIndexMapping
    {
        private Dictionary<string, int> expectedColumnToIndexMappings;
 
        public ColumnToIndexMapping()
        {
            expectedColumnToIndexMappings = new Dictionary<string, int>();
        }
 
        public void SeedColumnToIndexMappingTable(List<string> expectedColumns)
        {
            for (int i = 0; i < expectedColumns.Count; i++)
            {
                //bug fix: check to prevent duplicate keys. 5-Apr-2014
                if (!expectedColumnToIndexMappings.ContainsKey(expectedColumns[i]))
                {
                    expectedColumnToIndexMappings.Add(expectedColumns[i], -1);
                }
            }
        } 
       
        public void MapColumnToIndex(string columnName, int selectedIndex)
        {
            if (expectedColumnToIndexMappings.ContainsKey(columnName))
            {
                expectedColumnToIndexMappings[columnName] = selectedIndex;
            }
        }
 
        public int GetMappedColumnIndex(string columnName)
        {
            int index = -1;
 
            if (expectedColumnToIndexMappings.ContainsKey(columnName))
            {
                index = expectedColumnToIndexMappings[columnName];
            }
 
            return index;
        }
    } 

FCB: FlatCollectionBuilder (the output builder)

C#
public class FlatCollectionBuilder
    {
        public static Dictionary<string, List<string>> PrepareFlatColumnValuesCollection(string rawPastedData, List<string> expectedColumns, ColumnToIndexMapping columnToIndexMapping)
        {
            ImportParser importDataParser = new ImportParser(rawPastedData);
            string[] allRows = importDataParser.GetRows();
            Dictionary<string, List<string>> flatColumnValues = new Dictionary<string, List<string>>();
 
            foreach (string expectedColumn in expectedColumns)
            {
                List<string> valuesForThisColumn = new List<string>();
                int expectedColumnIndex = columnToIndexMapping.GetMappedColumnIndex(expectedColumn);
 
                //don't include the ones that have been left on their default selections
                if (expectedColumnIndex >= 0)
                {
                    for (int i = 1; i < allRows.Length; i++)
                    {
                        string[] columnsInThisRow = importDataParser.GetColumns(allRows[i]);
                        valuesForThisColumn.Add(columnsInThisRow[expectedColumnIndex]);
                    }
 
                    flatColumnValues.Add(expectedColumn, valuesForThisColumn);
                }
            }
 
            return flatColumnValues;
        }
    } 

DDE (not shown): DelimitedDataExtractor (helper to parse the tab and newline delimited Excel data into string array using the Split(...) method of the String class, the actual splitting funcationality is wrapped into another DLL called ToolsNUtils.AgLight and is available with the source code)

C#
public class DelimitedDataExtractor
    {
        public static List<string> ExtractColumnsFromPastedData(string rawPastedData)
        {
            ImportParser importDataParser = new ImportParser(rawPastedData);
            string[] allRows = importDataParser.GetRows();
            string[] allColumns = importDataParser.GetColumns(allRows[0]);
            List<string> columnsInPastedData = allColumns.Where(x => !string.IsNullOrWhiteSpace(x)).Select(x => x).ToList();
            columnsInPastedData.Insert(columnsInPastedData.Count, Resource.ResourceManager.GetString("PleaseSelect"));
 
            return columnsInPastedData;
        }
    } 

The output of this control is a flat 1 dimensional column value table i.e. Dictionary<string, List<string>> that looks like this:

Image 5

Each key represents a column name and the associated value i.e. List represents a list of values for that column. There is a strict one to one correspondence between values under different keys and it represents one row of the Excel data as shown below for e.g. in the above image, John’s last name is Gostick and date of birth is 10/11/1982 similarly, Mike’s last name is Bennett and date of birth is 11/12/1983.

Image 6

This data is then ready to be transformed into a typed collection like an ObservableCollection<Student> in this case and imported into the database.

The most important thing that makes this control really flexible is that the dropdown mapping screen is entirely generated at runtime based on the list of expected columns sent to it. Those expected columns form the labels (column names) on the left and the headers from the Excel data pasted into this control get populated into the drop downs on the right.

Using the code

This control can be hosted in a Silverlight child window or a another usercontrol or a page etc. and the client code would need to pass it a List<string> which is a list of expected columns it expects to see in the flat collection and an Action<Dictionary<string, List<string>>> to execute once the output is ready, as shown below this control is hosted inside a Silverlight ChildWindow :

Image 7

XAML markup of the hosting ChildWindow:

C#
<controls:ChildWindow x:Class="SimpleTabDelimitedDataImporter.Popups.ImportPopup"
           xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation" 
           xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml" 
           xmlns:controls="clr-namespace:System.Windows.Controls;assembly=System.Windows.Controls"
           Width="Auto" Height="Auto"            
           xmlns:ImportControls="clr-namespace:SimpleTabDelimitedDataImporter.Controls" Loaded="ChildWindow_Loaded">
    <Grid x:Name="LayoutRoot" Margin="2">
        <Grid.RowDefinitions>            
            <RowDefinition Height="Auto" />
        </Grid.RowDefinitions>
 
        <ImportControls:TabDelimitedTextParser x:Name="copyPasteImporter" Grid.Row="0"></ImportControls:TabDelimitedTextParser>
    </Grid>
</controls:ChildWindow> 

Code behind of the pop-up:

C#
public partial class ImportPopup : ChildWindow
    {
        private List<string> expectedColumns;
        private Action<Dictionary<string, List<string>>> columnValuesReadyCallback;
 
        public ImportPopup()
        {
            InitializeComponent();
            expectedColumns = new List<string>();
        }
 
        public ImportPopup(List<string> expectedColumns, Action<Dictionary<string, List<string>>> columnValuesReadyCallback)
        {
            InitializeComponent();
 
            if (expectedColumns != null)
            {
                this.expectedColumns = expectedColumns;
            }
            else
            {
                this.expectedColumns = new List<string>();
            }
 
            this.columnValuesReadyCallback = columnValuesReadyCallback;
        }        
 
        private void ChildWindow_Loaded(object sender, RoutedEventArgs e)
        {
            SetLocalisedWindowTitle();
            copyPasteImporter.ExpectedColumns = this.expectedColumns;
            copyPasteImporter.ColumnValuesReadyCallback = this.columnValuesReadyCallback;
        }
 
        private void SetLocalisedWindowTitle()
        {
            this.Title = Resource.ResourceManager.GetString("ImportPopup");
        }
    } 

As can be seen, I prepare a list of columns that I want to map the Excel columns to (i.e. expect to see), define an Action that I want executed once the output is ready and finally load the popup with these input parameters:

C#
public void ImportData()
        {            
            List<string> expectedcolumns = new List<string>();
            expectedcolumns.Add(Helpers.Constants.FIRSTNAME);
            expectedcolumns.Add(Helpers.Constants.LASTNAME);
            expectedcolumns.Add(Helpers.Constants.DATEOFBIRTH);
            importPopup = new Popups.ImportPopup(expectedcolumns, new Action<Dictionary<string, List<string>>>(ColumnValuesReady));
            importPopup.Show();
        }
 
        private void ColumnValuesReady(Dictionary<string, List<string>> columnValues)
        {
            if (columnValues != null && columnValues.Count > 0)
            {
                importPopup.DialogResult = true;
                ImportStudentData(columnValues);
            }            
        } 

The ColumnValuesReady callback is executed when the results are ready and as you can see, I have chosen to import this data into the database. But how can I import this flat one dimensional data into a database table? Well, I convert it into a typed collection first that is representative of the table structure as shown below:

C#
private ObservableCollection<Services.StudentDTO> ConvertToDTOs(Dictionary<string, List<string>> flatColumnValues)
        {
            ObservableCollection<Services.StudentDTO> studentsToBeImported = new ObservableCollection<Services.StudentDTO>();
            //parsing logic. Since the number of values under each key would always be the same (Excel data would be x rows and y columns
            //but the number of cells under each column would always be same, you cannot select jagged data. With notepad its possible
            //so would need to handle that case)
            //get the iteration count from under any key here i have used the first key
            int numberOfStudents = flatColumnValues[Helpers.Constants.FIRSTNAME].Count;
 
            for (int i = 0; i < numberOfStudents; i++)
            {
                Services.StudentDTO student = new Services.StudentDTO();
                student.Id = Guid.NewGuid();
                student.FirstName = flatColumnValues[Helpers.Constants.FIRSTNAME][i];
                student.LastName = flatColumnValues[Helpers.Constants.LASTNAME][i];
                student.DateOfBirth = DateTime.Parse(flatColumnValues[Helpers.Constants.DATEOFBIRTH][i]);
                studentsToBeImported.Add(student);
            }                                   
            
            return studentsToBeImported;
        } 

The beauty here is that due to one to one correspondence of the values under different columns, I can easily find out how many student records I have to import and use that as the iteration limit numberOfStudents.

So by doing:

C#
int numberOfStudents = flatColumnValues[Helpers.Constants.FIRSTNAME].Count; 

I can find out the number of first names of students which are indicative of how many students there are (one student will only have one name). I could have just as well picked LASTNAME or DATEOFBIRTH, and that would have give me the same thing, 3 (look at screenshot of the Excel data, there are 3 students I am importing into database).

Points of Interest

Although, this pretty much does the job that I wanted it to I am sure its far from perfect which is why sharing on such sites helps because someone might find a bug and/or fix it or let me know about it. One of the cases that I would like to pursue is how to handle relational objects so for e.g. Customer-Order data is a 2 dimensonal data structure but this control only gives 1 dimensional structure back and all are string types so perhaps the client side code would have to do the shaping up of the data which doesn't sound too unfair. The control is doing what its designed to do no more, no less.

Please feel free to ask questions or send comments, criticisms in the comments section below and use the code in anyway you like. I hope you find it useful if you find yourself working on a similar problem from time to time.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer (Senior)
United Kingdom United Kingdom
A software developer in pursuit of craftsmanship and the zen of software.

Comments and Discussions

 
-- There are no messages in this forum --