Gurux Data Refinery

Gurux Ltd

Apr 22, 2010

GPL3

Retrieve data from a physical device, process it, and visualize the results

Download Gurux Data Refinery source code - 3.7 MB

Introduction

Gurux Data Refinery is an application for gathering data, processing it, and visualizing the results. Other applications can perform these operations too, but Data Refinery is an all-in-one solution that does them all through a single user interface.

GXDataCollector, the Data Refinery component for gathering data, retrieves data from a selected source and saves it in a data table. The source can be a physical device of any kind, for example a web server, production machinery, or metering equipment. The first column (column 0) of the table is always the time, so all collected data is timestamped. Gurux Data Refinery requires .NET Framework 3.5 or newer.

When you need to collect data from a remote physical device, the main problem is often that no single standard covers all the possible options. That is why most applications on the market suit only the devices of a particular manufacturer, built for a particular purpose and use. This article comes in handy when you have a device whose parameter values you need to collect, but you have not found an application that can do it, or one flexible enough for you to customize. The example in this article lets you define the required properties of the data collector component yourself.

Gurux Data Refinery is maintained as an Open Source project at SourceForge.net, including all released versions and the integrated application manual.

The latest versions of all the project's source code and documentation are available at http://gurux.svn.sourceforge.net/svnroot/gurux/GuruxDataRefinery/.

Background

Like most good ideas, this one was born to solve an existing problem. When Gurux Ltd went Open Source in autumn 2009 and we were setting up an Open Source community, we needed specific information about the traffic at our site and the download counts of our projects at SourceForge.net. We wanted to see how well the audience responds to our contributions (new product versions, e-mail campaigns, etc.) and to gain valuable insight into these relationships. The statistics provided by Drupal, SourceForge, and Google, good as they are, did not quite cover our needs, because we wanted to combine data from different sources into one display to see the big picture.

Our developers coded the components for:

retrieving the data (Drupal Collector and SourceForge.net Collector)
refining the retrieved data (Gaussian Filter and Data Joiner Processor)
visualizing the results (RawData View, Collage View, and Data Graph Visualizer)

We released the first version of the application, and the components we originally coded for our own use, as Open Source, to give the same possibilities to other communities that use Drupal-based web sites and/or SourceForge.net as a source code host. Soon after the first release came the idea of a data collector that the user could customize completely. This article is the result of developing that idea.

Main Components of DataRefinery

Repository is the basic unit in Data Refinery. It can be seen as a container that holds the data tables with the data collected from devices, together with the collectors, processors, and events. You can manage one or more Repositories in your Repositories Collection.

A Collector can collect vast amounts of data straight from any physical device, be it a production system, a web site server, moisture measurement equipment, an energy meter, a thermometer, or a device of any other kind. The Collector gathers the data into so-called RawData tables.

A Processor refines the collected data for different purposes. A processing component can be, for example, a filter, a series of calculations, or any other operation on the data. You can set several operations to run one after another.

With a Visualizer, you can display the collected and/or processed data in the way of your choice: as a numeric table (so-called raw data), curves, diagrams, etc.

Structure

A collection of Repositories, displayed in the Data Refinery Repository Tree (pictured on the left), can be seen as a bank vault, with separate safety deposit boxes.

Each deposit box (= Repository) holds content (= items), and the contents of separate boxes can be used either separately or together, as combined assets.

The number of items in a Repository is not limited, so you can group items as you see fit. Items in the same Repository (Collectors 01 and 02 in Repository 01, in the picture) need not have anything in common, but they can share a Visualizer if one is set for the Repository.

Repositories

Each Repository can include one or more GXDataCollectors and/or GXDataProcessors. Each Collector holds a RawDataTable, which can have one or more rows and one or more columns. All the data collected, processed, and visualized in DataRefinery is timestamped: in all DataRefinery projects, time is the primary variable, and all data is compared against it. Therefore, the first column of every data table in DataRefinery is always the time.
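As a rough illustration of this layout, the sketch below builds a table with the timestamp in column 0 and one value column per measured quantity. It uses a plain System.Data.DataTable as a stand-in, not Data Refinery's own GXDataTable, and the class and method names (TimestampedTableSketch, Create, AddSample) are hypothetical.

```csharp
using System;
using System.Data;

// Sketch only: a plain System.Data.DataTable standing in for a
// Data Refinery data table, with the timestamp always in column 0.
public static class TimestampedTableSketch
{
    // Creates a table whose first column is the timestamp, followed by
    // one double-valued column per measured quantity.
    public static DataTable Create(string tableName, params string[] valueColumns)
    {
        DataTable table = new DataTable(tableName);
        table.Columns.Add("TimeStamp", typeof(DateTime));
        foreach (string name in valueColumns)
        {
            table.Columns.Add(name, typeof(double));
        }
        return table;
    }

    // Appends one sample; the timestamp is always written to column 0.
    public static void AddSample(DataTable table, DateTime time, params double[] values)
    {
        if (values.Length != table.Columns.Count - 1)
        {
            throw new ArgumentException("Expected one value per value column.");
        }
        object[] row = new object[values.Length + 1];
        row[0] = time;
        for (int i = 0; i < values.Length; ++i)
        {
            row[i + 1] = values[i];
        }
        table.Rows.Add(row);
    }
}
```

For example, `Create("RawData", "Temperature")` followed by `AddSample(table, DateTime.Now, 21.5)` yields a one-row table whose first cell is the time of the sample.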

A Repository can even hold only processors and use the data of Collectors in another Repository. For example, Repository 02 in the picture holds Processor 02, which joins the data from Collectors 01 and 02 in Repository 01. Processor 02 has a child, Processor 03, which then filters the joined data.

Collectors

Though a collector can gather data from any physical device, we have coded and published collectors only for gathering:

download counts from SourceForge.net
traffic amounts at our Drupal-based Gurux web site

In addition, this article includes instructions and code samples for creating a collector of your own.

Processors

The data of a Collector can be processed by more than one Processor. When several processors are used, the first operation is specified in a parent processor. The second operation is set in a child of that processor, which in turn is the parent of the processor doing the third operation, and so on. The structure looks hierarchical but is not: it simply shows the order in which the processing operations are carried out.
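The parent-child chain can be pictured as a simple pipeline. In this sketch, ProcessorNode is a hypothetical class, not part of the Data Refinery API: each node applies its operation and passes the result to its child, so the chain encodes nothing but the order of the operations.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a processor; not the Data Refinery API.
// Each node transforms a series of values and passes the result to its child.
public sealed class ProcessorNode
{
    public string Name { get; private set; }
    public Func<IList<double>, IList<double>> Operation { get; private set; }
    public ProcessorNode Child { get; set; }

    public ProcessorNode(string name, Func<IList<double>, IList<double>> operation)
    {
        Name = name;
        Operation = operation;
    }

    // Applies this node's operation first, then the child's, and so on,
    // so the parent/child chain only determines the order of the operations.
    public IList<double> Run(IList<double> input)
    {
        IList<double> output = Operation(input);
        return Child == null ? output : Child.Run(output);
    }
}
```

With a parent that doubles each value and a child that adds one, the input {1, 2} becomes {3, 5}; swapping the two nodes would give {4, 6}.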

Gaussian Filter Processor smooths the highs and lows of the curve, so the trend is easier to see.
Data Joiner Processor joins data from separate sources so that it can be shown in the same visualization. You can select from four options:
- Combined displays the curves together, each as a separate curve.
- Sum adds up all the values of each date/time from the selected sources and displays the sum as a curve.
- Incremental sum is like Sum, but it also displays the share of each source as a layer.
- Total adds up all the values of each date/time from all sources and displays the total as a curve.
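To make the two processors more concrete, here is a minimal sketch of the underlying ideas: a Gaussian-weighted moving average, and a Sum-style join that adds up the values sharing the same timestamp. This is an assumption about how such processing can be done, not the code of the Gurux Gaussian Filter or Data Joiner Processor; the sigma and radius parameters and the edge handling (renormalizing the weights) are choices made for this example.

```csharp
using System;
using System.Collections.Generic;

public static class ProcessorSketches
{
    // Gaussian smoothing: each output point is a weighted average of its
    // neighbours within 'radius' samples, with weights exp(-k^2 / (2 sigma^2)).
    // At the edges, where fewer neighbours exist, the weights are renormalized.
    public static double[] GaussianSmooth(IList<double> values, double sigma, int radius)
    {
        double[] result = new double[values.Count];
        for (int i = 0; i < values.Count; ++i)
        {
            double weightedSum = 0, weightTotal = 0;
            for (int k = -radius; k <= radius; ++k)
            {
                int j = i + k;
                if (j < 0 || j >= values.Count)
                {
                    continue;
                }
                double w = Math.Exp(-(k * k) / (2 * sigma * sigma));
                weightedSum += w * values[j];
                weightTotal += w;
            }
            result[i] = weightedSum / weightTotal;
        }
        return result;
    }

    // Sum-style join: adds up the values that share the same timestamp
    // across all given sources; a timestamp missing from a source adds 0.
    public static SortedDictionary<DateTime, double> SumByTime(
        params IDictionary<DateTime, double>[] sources)
    {
        SortedDictionary<DateTime, double> sum = new SortedDictionary<DateTime, double>();
        foreach (IDictionary<DateTime, double> source in sources)
        {
            foreach (KeyValuePair<DateTime, double> pair in source)
            {
                double current;
                sum.TryGetValue(pair.Key, out current);
                sum[pair.Key] = current + pair.Value;
            }
        }
        return sum;
    }
}
```

A larger sigma (with a correspondingly larger radius) flattens the curve more; calling SumByTime with all available sources instead of a selection corresponds to the Total option.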

In addition, this article includes instructions and code samples for creating a processor of your own.

Visualizers

In Data Refinery, a visualization means a graphical presentation of the data. Every item in Data Refinery (each repository, collector, and processor) can have a visualizer of its own.

RawData View is the simplest Visualizer, which displays the data as a numeric data table.
Collage View allows you to show multiple types of data presentations in one view.
Data Graph Visualizer allows you to select:
- whether the curve is filled (from zero...to amount) or a single line
- GraphType: Linear, Logarithmic, Ordinal, or Exponential
- whether units are displayed or hidden
- whether the data is presented as a curve, horizontal/vertical bars, horizontal/vertical percent stacks, or as a pie
- whether a symbol marking the dates is displayed on the curve, and what kind of symbol to use (square, diamond, triangle pointing up/down, circle, XCross, plus sign, star, horizontal/vertical dash, or a user-defined pattern)

If the item does not have a Visualizer of its own, the visualization is inherited from the parenting item.
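The inheritance rule amounts to walking up the item tree until an item with its own visualizer is found. TreeItem and EffectiveVisualizer are hypothetical names used only for this sketch, and visualizers are represented simply by their names.

```csharp
// Hypothetical model of the Data Refinery item tree (repository,
// collector, processor); the names here are illustrative only.
public sealed class TreeItem
{
    public string Name { get; private set; }
    public TreeItem Parent { get; private set; }

    // Name of the item's own visualizer, or null if it has none.
    public string Visualizer { get; set; }

    public TreeItem(string name, TreeItem parent)
    {
        Name = name;
        Parent = parent;
    }

    // Walks up the tree until an item with its own visualizer is found.
    public string EffectiveVisualizer()
    {
        for (TreeItem item = this; item != null; item = item.Parent)
        {
            if (item.Visualizer != null)
            {
                return item.Visualizer;
            }
        }
        return null; // No visualizer anywhere on the path to the root.
    }
}
```

An item's own setting always wins; only items without one fall back to the nearest parent that has a visualizer.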

Create Your Own Collector

Because data sources vary a lot, we provide an illustrated walk-through for creating a data collector of your own with Visual Studio.

The following instructions give a simple example of creating a collector that gathers timestamped temperature data. The example is kept simple so that it is easy to adapt for any kind of data collector of your own. After the example, the most common problems and their solutions are listed under the heading Troubleshooting.

Work with Gurux DataRefinery Source Code

This article comes with the current (May 2010) version of the application's source code. In the future, check for the latest version at http://gurux.svn.sourceforge.net/viewvc/gurux/GuruxDataRefinery/.

The following example is included in the source code directory of this article, in the \Development\Collectors\GuruxFileCollectorSample directory as GXFileCollectorSample.cs.

1. Open the Gurux DataRefinery solution (GuruxDataRefinery.sln) in Visual Studio.
2. Create a ClassLibrary project in the Collectors directory.

   Note: It is vital to create the ClassLibrary project in the Collectors directory; if it is created elsewhere, the Post-build event (see the next step) will not work correctly.
3. Copy the Post-build event from an existing GXCollector (Gurux Drupal Collector or Gurux SourceForge.net Collector) and paste it into the Post-build event of the new collector. After the project is built, this copies the collector's dynamic link library (.dll file) to the relevant directory (for example, bin\debug\collectors).
4. Add the Gurux.DataRefineryAddIn project as a reference.

5. Derive the class from GXDataCollector and implement it.

namespace Gurux.FileCollectorSample
{
    public class GXFileCollectorSample : GXDataCollector
    {
        public override ShownValues DefaultValueType
        {
            get
            {
                return ShownValues.None;
            }
        }
    }
}

Then, implement the basic properties for the new collector:

TypeName
Description
DisplayUnit

Implement InitializeDefault() to:
- create a GXDataTable in the DataSet Data
- create the columns

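public override void InitializeDefault()
{
    //Color requires: using System.Drawing;
    GXDataTable table = new GXDataTable();
    this.Data.Tables.Add(table);
    //Column 0 is always the timestamp.
    table.Columns.Add(new GXDataColumn("TimeStamp", typeof(DateTime)));
    //Temperature values, with Color.Red assigned to the column.
    table.Columns.Add(new GXDataColumn("Temperature", typeof(double), Color.Red));
}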

Add the public properties that let the end user edit the options of the collector, for example IPAddress or FilePath.
```
//Path of the file that the temperature data is read from.
public string FilePath
{
    get;
    set;
}
```

Finally, implement CollectData(bool force), the code that actually collects the timestamped data.

Notes:

The gathered data is appended to the DataSet member Data.
Thrown exceptions are handled and displayed in the user interface.

//Requires: using System.Collections.Generic; using System.Globalization; using System.IO;
public override void CollectData(bool force)
{
    if (!File.Exists(FilePath))
    {
        throw new Exception("The specified file could not be found.");
    }
    try
    {
        string fileContent = File.ReadAllText(FilePath);
        List<string> valuePairs = new List<string>();
        valuePairs.AddRange(fileContent.Split(
            Environment.NewLine.ToCharArray(),
            StringSplitOptions.RemoveEmptyEntries));

        //Each line is expected to be "timestamp,temperature".
        foreach (string pair in valuePairs)
        {
            string[] fields = pair.Split(',');
            DateTime timeStamp = DateTime.Parse(fields[0], CultureInfo.InvariantCulture);
            double temperature = Double.Parse(fields[1], CultureInfo.InvariantCulture);
            //Column 0 is the timestamp, column 1 the temperature.
            Data.Tables[0].Rows.Add(new object[] { timeStamp, temperature });
        }
    }
    catch (Exception ex)
    {
        //Keep the original exception as the inner exception.
        throw new Exception("Possible file format error: " + ex.Message, ex);
    }
}

Build your project.
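Because CollectData depends on the Data Refinery types, it can help to test the file parsing on its own first. The following sketch extracts the same "timestamp,temperature" parsing into a standalone, hypothetical TemperatureFileParser class. Like the sample, it assumes one sample per line in invariant-culture formats, for example the line "2010-05-05 12:00:00,21.5".

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Sketch: the "timestamp,temperature" parsing of the sample collector,
// separated from GXDataCollector so it can be tried without Data Refinery.
public static class TemperatureFileParser
{
    // Parses lines such as "2010-05-05 12:00:00,21.5" (invariant culture).
    // Empty lines are skipped; a line without exactly two fields
    // raises a FormatException.
    public static List<KeyValuePair<DateTime, double>> Parse(string content)
    {
        List<KeyValuePair<DateTime, double>> samples = new List<KeyValuePair<DateTime, double>>();
        string[] lines = content.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string line in lines)
        {
            string[] fields = line.Split(',');
            if (fields.Length != 2)
            {
                throw new FormatException("Expected 'timestamp,temperature' but got: " + line);
            }
            DateTime time = DateTime.Parse(fields[0].Trim(), CultureInfo.InvariantCulture);
            double temperature = double.Parse(fields[1].Trim(), CultureInfo.InvariantCulture);
            samples.Add(new KeyValuePair<DateTime, double>(time, temperature));
        }
        return samples;
    }
}
```

Once the parser accepts your data file, each returned pair maps directly to a row of { timestamp, temperature } in the collector's table.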

Troubleshooting

If the GXCollector you have created does not appear in the list of available collectors in the DataRefinery Collector Settings dialog:

its dynamic link library may be misplaced, or
the class may not derive from GXDataCollector properly.

To make the GXCollector functional:

Make sure that the collector's .dll file exists in the Collectors directory of Data Refinery.
Make sure that the class derives from GXDataCollector and implements it correctly, as shown in step 5 of the example above.

Part II

Create Your Own Processor

If our processors do not meet your needs and you want to create a processor of your own, we also provide instructions and sample code for creating a processor with Visual Studio.

The following instructions give a simple example of creating a processor that checks each value in the data table and raises an alert if a preset maximum value is exceeded. The example is kept simple so that it is easy to adapt for any kind of data processor of your own. After the example, the most common problems and their solutions are listed under the heading Troubleshooting.

Work with Gurux DataRefinery Source Code

This article comes with the current, updated (May 2010) version of the application's source code. In the future, check for the latest version at http://gurux.svn.sourceforge.net/viewvc/gurux/GuruxDataRefinery/.

The following example is included in the source code directory of this article, in the \Development\Processors\GuruxAlertProcessorSample directory as GXAlertProcessorSample.cs.

1. Open the Gurux DataRefinery solution (GuruxDataRefinery.sln) in Visual Studio.
2. Create a ClassLibrary project in the Processors directory.

   Note: It is vital to create the ClassLibrary project in the Processors directory; if it is created elsewhere, the Post-build event (see the next step) will not work correctly.
3. Copy the Post-build event from an existing GXProcessor (Gurux Gaussian Filter Processor or Gurux Data Joiner Processor) and paste it into the Post-build event of the new processor. After the project is built, this copies the processor's dynamic link library (.dll file) to the relevant directory (for example, bin\debug\processors).
4. Add the Gurux.DataRefineryAddIn project as a reference.

5. Derive the class from GXDataProcessor and implement it. Also, initialize the Maximum value to zero.

namespace GuruxAlertProcessorSample
{
    public class GXAlertProcessorSample : GXDataProcessor
    {
        double m_Maximum = 0;
    }
}

Implement Maximum, the value to which the values of the data table are compared.
When the user changes the maximum value, all data must be checked again, so the ForceUpdate method is called.

public double Maximum
{
    get
    {
        return m_Maximum;
    }
    set
    {
        bool change = m_Maximum != value;
        m_Maximum = value;
        //Re-check all data only when the limit actually changes.
        if (change)
        {
            ForceUpdate();
        }
    }
}

Then, implement the basic properties for the new processor:
- TypeName
- Description

Next, implement ProcessData, the code that checks each cell of the data table against the maximum value and reports an error whenever the value is exceeded. Rows that contain an exceeding value are not copied to the processor's data.

public override void ProcessData(object sender, GXDataSet changedItems)
{
    foreach (GXDataTable table in changedItems.Tables)
    {
        GXDataTable newTable = this.Data.Tables.Find(table.TableName);
        //Add table columns and primary keys, but not rows.
        if (newTable == null)
        {
            newTable = table.Clone(false);
            this.Data.Tables.Add(newTable);
        }
        foreach (DataRow row in table.Rows)
        {
            bool withinLimits = true;
            //Check every numeric column of the row against the limit.
            for (int column = 0; column < table.Columns.Count; ++column)
            {
                if (!IsNumeric(table.Columns[column].DataType))
                {
                    continue;
                }
                try
                {
                    double val = Convert.ToDouble(row[column]);
                    //Report an error if value is exceeded.
                    if (val > Maximum)
                    {
                        throw new Exception("Value is out of limits.");
                    }
                }
                //Exception is handled and notified here,
                //because the checking is wanted to continue,
                //even if an exceeding value is found.
                catch (Exception Ex)
                {
                    withinLimits = false;
                    NotifyError(this, Ex);
                }
            }
            //Copy each row only once, and only if none of its values exceeded the limit.
            if (withinLimits)
            {
                DataRow dr = newTable.NewRow();
                dr.ItemArray = row.ItemArray;
                newTable.Rows.Add(dr);
            }
        }
    }
}

Build your project.
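The checking logic of the alert processor can likewise be tried out without Data Refinery. This sketch applies the same rule to a plain System.Data.DataTable: rows whose numeric values are all within the limit are copied, and each exceeding value produces a message instead of a NotifyError call. AlertCheckSketch and its IsNumericType helper are hypothetical stand-ins, not part of the Data Refinery API.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Sketch of the alert logic on a plain System.Data.DataTable, outside
// Data Refinery. IsNumericType is a simplified, hypothetical stand-in
// for the IsNumeric helper used in the processor sample.
public static class AlertCheckSketch
{
    static bool IsNumericType(Type type)
    {
        return type == typeof(double) || type == typeof(float) || type == typeof(decimal)
            || type == typeof(int) || type == typeof(long) || type == typeof(short);
    }

    // Returns a copy of 'source' holding only the rows whose numeric values
    // are all <= maximum; every exceeding value adds one message to 'errors'.
    public static DataTable Filter(DataTable source, double maximum, List<string> errors)
    {
        DataTable result = source.Clone(); // Same columns, no rows.
        foreach (DataRow row in source.Rows)
        {
            bool withinLimits = true;
            foreach (DataColumn column in source.Columns)
            {
                if (!IsNumericType(column.DataType) || row.IsNull(column))
                {
                    continue;
                }
                double value = Convert.ToDouble(row[column]);
                if (value > maximum)
                {
                    withinLimits = false;
                    errors.Add(column.ColumnName + " value " + value + " is out of limits.");
                }
            }
            if (withinLimits)
            {
                result.ImportRow(row);
            }
        }
        return result;
    }
}
```

Collecting the messages in a list, rather than stopping at the first exceeding value, mirrors the sample's intent that checking continues even after a limit violation is found.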

Troubleshooting

If the GXProcessor you have created does not appear in the list of available processors in the DataRefinery Processor Settings dialog:

its dynamic link library may be misplaced, or
the class may not derive from GXDataProcessor properly.

To make the GXProcessor functional:

Make sure that the processor's .dll file exists in the Processors directory of Data Refinery.
Make sure that the class derives from GXDataProcessor and implements it correctly, as shown in step 5 of the example above.

Improvements and Additions

This article was first posted on the 22nd of April 2010. At that point, it included an example of creating a data collector of your own.

In early May, we received an e-mail reporting a bug in the source code: the collector retrieved the data correctly, but passing it on to the processor did not work. The bug was fixed, and new source code was posted on the 5th of May.

We received a lot of positive feedback about Gurux DataRefinery, so we revised the article by adding Part II, which provides sample code and instructions for creating a data processor of your own. The revised article, together with yet another updated version of the source code directory (including the necessary sample code), was posted on the 11th of May 2010.