
Delphi CSV File and String Reader Classes

12 Mar 2015, licensed under CPOL
TnvvCSVFileReader and TnvvCSVStringReader are lightweight, fast classes that resemble a unidirectional data set

Introduction

The classes presented here are functionally identical to those described in the article "C# CSV File and String Reader Classes" and have the same set of public methods and properties, which are explained there in detail. Everything said in that article also applies here, so it is recommended to read it first. I will not repeat it all, and the minor differences (variable types and so on) follow directly from the differences between Delphi and C#. Below, I outline the CSV reader features and provide information specific to using the Delphi code.
Version 2.0 update: Version 2.0 significantly improves performance and adds encoding control to TnvvCSVFileReader. As a result, the public interfaces changed compared with version 1.0: slightly for the base TnvvCSVReader class and more significantly for the derived TnvvCSVFileReader class. The differences are explained in the section "Notable Difference between Delphi and C# CSV Reader Classes" below. Performance-related notes are in the "History" section below.

TnvvCSVFileReader and TnvvCSVStringReader are lightweight, fast classes that resemble a unidirectional data set. They are very simple to use and have properties that allow handling a number of existing variations of CSV and “CSV-like” formats.

Both classes are derived from the abstract TnvvCSVReader class, which does not specify a data source and instead works with an instance of the TTextReader class.

TnvvCSVFileReader and TnvvCSVStringReader accept a file and a string as data sources, respectively. They introduce additional properties related to the CSV source and override the abstract method that returns an instance of a specific TTextReader descendant:

Delphi
function CreateDataSourceReader: TTextReader; virtual; abstract;

Classes for other CSV data sources can be created in a similar way.
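As an illustration of that extension point, here is a minimal sketch of a descendant that reads CSV data from an arbitrary TStream. The class name TMyCSVStreamReader and its SourceStream property are hypothetical and not part of the download. The sketch also assumes that CreateDataSourceReader is visible to descendants and that the base class takes care of freeing the returned reader; check the unit's source before relying on either assumption.

```delphi
unit MyCSVStreamReader;

interface

uses
  SysUtils, Classes, Nvv.IO.CSV.Delphi.NvvCSVClasses;

type
  // Hypothetical descendant (not part of the article's download):
  // reads CSV data from a stream supplied by the caller.
  TMyCSVStreamReader = class(TnvvCSVReader)
  private
    FSourceStream: TStream; // owned by the caller, not by this class
  protected
    // Assumption: the base class declares this method so that
    // descendants can override it, as the article describes.
    function CreateDataSourceReader: TTextReader; override;
  public
    property SourceStream: TStream read FSourceStream write FSourceStream;
  end;

implementation

function TMyCSVStreamReader.CreateDataSourceReader: TTextReader;
begin
  if FSourceStream = nil then
    raise EInvalidOperation.Create('SourceStream is not assigned');
  // UTF-8 is an arbitrary choice for this sketch; pass the encoding
  // that matches the actual data.
  Result := TStreamReader.Create(FSourceStream, TEncoding.UTF8);
end;

end.
```

A property is used instead of a new constructor so that the sketch does not have to guess how the base class constructor should be redeclared.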

CSV Reader Features

  • Supports three kinds of line delimiters: <CR>, <CR><LF> and <LF>, all of which may appear in the same CSV file. Consequently, an <LF><CR> pair results in an empty line. This situation can be handled by setting the IgnoreEmptyLines property to true.
  • Whether the very first record of the file is a header is controlled by the boolean property HeaderPresent.
  • Empty lines can be ignored (by default, they are not).
  • The number of fields is auto-detected (by default) on the basis of the first record; if auto-detection is off, it must be set explicitly.
  • The field separator is a comma (0x2C) by default, but virtually any (Unicode) character can be used, for example TAB.
  • Field quoting allows multi-line field values and the presence of quote and field separator characters within a field. By default, it is assumed that a field may or may not be enclosed in quotes, but the reader can be instructed not to use field quoting.
  • The quote character is a double quote (0x22) by default, but virtually any (Unicode) character can be used. It is assumed that the quote character is also used as the escape character.
  • The Unicode range of character codes is assumed by default but can be limited to ASCII only by setting the corresponding property to true.
  • Characters with codes below 0x20 (and above 0x7E in the ASCII case) are considered “special characters” and by default must not appear in the file. This requirement does not apply to line delimiters, nor to the field separator and/or quote character if they are from this range. As an option, the reader can be instructed to simply ignore special characters.
  • The reader itself does not use buffering. It uses just enough memory to store the field names and the field values of the current record. If any buffering happens, standard Delphi classes such as TStreamReader and TStringReader are responsible for it.
    Version 2.0 update: Version 2.0 does use buffering in order to significantly improve performance. Performance-related notes are in the "History" section below.
  • The reader is intended to be fast: it reads characters from TTextReader and analyzes each character just once, i.e., it does one-pass parsing. The parser also uses minimal conditional logic.
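The quoting rules above (a quoted field may contain the separator, and a doubled quote character stands for one literal quote) can be sketched with a small standalone program. This is not the article's parser: SplitRecord is a hypothetical helper that handles a single record already held in a string and uses only the standard RTL.

```delphi
program QuoteRulesDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper (not from the article's unit): splits one CSV
// record into fields in a single pass over the characters.
procedure SplitRecord(const ARecord: string; AFields: TStrings;
  ASeparator: Char = ','; AQuote: Char = '"');
var
  i, n: Integer;
  inQuotes: Boolean;
  value: string;
  c: Char;
begin
  AFields.Clear;
  value := '';
  inQuotes := False;
  n := Length(ARecord);
  i := 1;
  while i <= n do
  begin
    c := ARecord[i];
    if inQuotes then
    begin
      if c = AQuote then
      begin
        if (i < n) and (ARecord[i + 1] = AQuote) then
        begin
          value := value + AQuote; // doubled quote -> one literal quote
          Inc(i);                  // skip the second quote of the pair
        end
        else
          inQuotes := False;       // closing quote
      end
      else
        value := value + c;        // separators and line breaks stay literal
    end
    else if c = AQuote then
      inQuotes := True             // opening quote
    else if c = ASeparator then
    begin
      AFields.Add(value);          // field complete
      value := '';
    end
    else
      value := value + c;
    Inc(i);
  end;
  AFields.Add(value);              // last field of the record
end;

var
  fields: TStringList;
  i: Integer;
begin
  fields := TStringList.Create;
  try
    SplitRecord('1,"Smith, John","said ""hi"""', fields);
    for i := 0 to fields.Count - 1 do
      Writeln(i, ': ', fields[i]);
    // Prints:
    // 0: 1
    // 1: Smith, John
    // 2: said "hi"
  finally
    fields.Free;
  end;
end.
```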

Using the Code

Use is straightforward. Create an instance of the corresponding class, specify the source of CSV data, modify some properties if necessary, call Open, and iterate through the records by calling Next. Within each record, iterate through the field values. Call Close when done.

Using TnvvCSVFileReader Class

Delphi
uses Nvv.IO.CSV.Delphi.NvvCSVClasses;

procedure ReadCSVFile(const ACSVFilePath: string);
var
  csvReader: TnvvCSVFileReader;
  i: Integer;
begin
  //Constructor can have parameter that, if >0 and <>512(default), sets buffer size in chars
  csvReader := TnvvCSVFileReader.Create;
  try
    //Specify source CSV data file using one of the three overloaded methods.
    //If, for example, it is ASCII file:
    csvReader.SetFile(ACSVFilePath, TEncoding.ASCII);
    // Modify values of other input properties if necessary. For example:
    csvReader.HeaderPresent := True;

    csvReader.Open;

    if (csvReader.HeaderPresent) then
      for i:=0 to csvReader.FieldCount-1 do
        DoSomethingWithFieldName(csvReader.Fields[i].Name);

    while (not csvReader.Eof) do
    begin
      for i:=0 to csvReader.FieldCount-1 do
        DoSomethingWithFieldValue(csvReader.Fields[i].Value);

      csvReader.Next;
    end;

    csvReader.Close;
  finally
    csvReader.Free;
  end;
end;

Using TnvvCSVStringReader Class

Delphi
uses Nvv.IO.CSV.Delphi.NvvCSVClasses;

procedure ReadCSVString(const ACSVString: string);
var
  csvReader: TnvvCSVStringReader;
  i: Integer;
begin
  //Constructor can have parameter that, if >0 and <>512(default), sets buffer size in chars
  csvReader := TnvvCSVStringReader.Create;
  try
    csvReader.DataString := ACSVString; // Assign string containing CSV data
    // Modify values of other input properties if necessary. For example:
    csvReader.HeaderPresent := True;

    csvReader.Open;

    if (csvReader.HeaderPresent) then
      for i:=0 to csvReader.FieldCount-1 do
        DoSomethingWithFieldName(csvReader.Fields[i].Name);

    while (not csvReader.Eof) do
    begin
      for i:=0 to csvReader.FieldCount-1 do
        DoSomethingWithFieldValue(csvReader.Fields[i].Value);

      csvReader.Next;
    end;
    csvReader.Close;
  finally
    csvReader.Free;
  end;
end;

Notable Difference between Delphi and C# CSV Reader Classes

Delphi’s counterpart defines an event in the following way:

Delphi
property OnFieldCountAutoDetectComplete : TNotifyEvent
{- This event fires from within Open if FieldCount_AutoDetect is true. Use of this event is
 optional since the auto-detected FieldCount is available upon completion of Open anyway.}
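A minimal sketch of hooking this event follows. TNotifyEvent requires a method of an object, so the sketch uses a hypothetical helper class, TCSVDemo. It assumes that FieldCount already holds the detected value when the event fires, which the description above suggests but does not state outright.

```delphi
program FieldCountEventDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes, Nvv.IO.CSV.Delphi.NvvCSVClasses;

type
  // Hypothetical helper class, used only to host the event handler.
  TCSVDemo = class
  private
    FReader: TnvvCSVStringReader;
    procedure FieldCountDetected(Sender: TObject);
  public
    procedure Run(const ACSVString: string);
  end;

procedure TCSVDemo.FieldCountDetected(Sender: TObject);
begin
  // Assumption: FieldCount is already set when the event fires.
  Writeln('Auto-detected field count: ', FReader.FieldCount);
end;

procedure TCSVDemo.Run(const ACSVString: string);
begin
  FReader := TnvvCSVStringReader.Create;
  try
    FReader.DataString := ACSVString;
    FReader.HeaderPresent := True;
    // Assign the handler before Open, because the event fires inside Open.
    FReader.OnFieldCountAutoDetectComplete := FieldCountDetected;
    FReader.Open;
    while not FReader.Eof do
      FReader.Next;
    FReader.Close;
  finally
    FreeAndNil(FReader);
  end;
end;

var
  demo: TCSVDemo;
begin
  demo := TCSVDemo.Create;
  try
    demo.Run('a,b,c'#13#10'1,2,3');
  finally
    demo.Free;
  end;
end.
```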

Starting with version 2.0:

  • The constructor of TnvvCSVReader, and consequently the constructors of TnvvCSVFileReader and TnvvCSVStringReader, has an optional parameter that defines the capacity, in chars, of the buffer between the CSV reader and the source stream. Experiments show that increasing the size beyond the default of 512 gives no visible performance improvement.
    Delphi
    constructor Create( ABufferReadFromStreamCapacityInChars: Integer = 512 ); override;
    
  • Instead of the read-write property FileName, TnvvCSVFileReader uses three overloaded SetFile methods to specify the source file. These methods correspond to three overloaded constructors of TStreamReader, with the same sets of parameters and the same parameter meanings. Calling a particular form of SetFile results in a call to the corresponding TStreamReader constructor when TnvvCSVFileReader internally instantiates TStreamReader to actually read the file. Note that Delphi's TStreamReader with default encoding settings, unlike .NET's StreamReader, is not always reliable at automatically detecting the source's encoding. Sometimes TStreamReader returns only part of the source data, causing the CSV reader to raise an error (such as "wrong number of fields"); sometimes it appears to simply hang. Therefore, if reading CSV data using SetFile with only the file name, or with some encoding parameters, produces an error, TStreamReader may need more, or more accurate, information about the source's encoding (the AEncoding and/or ADetectBOM parameters).
Delphi
procedure SetFile( const AFileName: string ); overload;
procedure SetFile( const AFileName: string; ADetectBOM: Boolean ); overload;
procedure SetFile( const AFileName: string; AEncoding: TEncoding;
  ADetectBOM: Boolean = False; AStreamReaderInternBufferSize: Integer = 1024 ); overload;
  • TnvvCSVFileReader has five read-only properties related to the source file. Their values are set by the SetFile methods mentioned above. The meaning of the first four properties is obvious. The property StreamReader_ConstructorKind has type TStreamReaderConstructorKind; its value shows which TStreamReader constructor (in terms of its parameter set) is called when the stream reader is instantiated.
Delphi
type
  TStreamReaderConstructorKind = ( srckFile,  srckFileBOM, srckFileEncodingBOMBuffsize );


    property FileName: string read FFileName;
    property StreamReader_Encoding: TEncoding read FStreamReader_Encoding;
    property StreamReader_DetectBOM: Boolean read FStreamReader_DetectBOM;
    property StreamReader_InternBufferSize: Integer read FStreamReader_InternBufferSize;
    property StreamReader_ConstructorKind: TStreamReaderConstructorKind
                                             read FStreamReader_ConstructorKind;
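The following sketch shows one way to pass explicit encoding information through SetFile and then inspect the read-only properties. It assumes a UTF-8 source file whose path is given as the first command-line argument; the choice of UTF-8 and the procedure name ReadUtf8CSV are illustrative only.

```delphi
program SetFileEncodingDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes, TypInfo, Nvv.IO.CSV.Delphi.NvvCSVClasses;

procedure ReadUtf8CSV(const AFileName: string);
var
  csvReader: TnvvCSVFileReader;
  recordCount: Integer;
begin
  csvReader := TnvvCSVFileReader.Create;
  try
    // Third overload: explicit encoding plus BOM detection.
    csvReader.SetFile(AFileName, TEncoding.UTF8, True);

    // The read-only properties reflect what SetFile stored.
    Writeln('File:        ', csvReader.FileName);
    Writeln('Detect BOM:  ', BoolToStr(csvReader.StreamReader_DetectBOM, True));
    Writeln('Buffer size: ', csvReader.StreamReader_InternBufferSize);
    Writeln('Constructor: ', GetEnumName(TypeInfo(TStreamReaderConstructorKind),
      Ord(csvReader.StreamReader_ConstructorKind)));

    csvReader.Open;
    recordCount := 0;
    while not csvReader.Eof do
    begin
      Inc(recordCount);
      csvReader.Next;
    end;
    csvReader.Close;
    Writeln('Records:     ', recordCount);
  finally
    csvReader.Free;
  end;
end;

begin
  if ParamCount < 1 then
    Writeln('Usage: SetFileEncodingDemo <file.csv>')
  else
    ReadUtf8CSV(ParamStr(1));
end.
```

If the reader reports errors such as a wrong number of fields on a file that looks valid, trying a different AEncoding or ADetectBOM value is the first thing to check, as described above.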

Downloading Source Code

The following source code, which should work with Delphi 2009 and later versions, is available for download above:

  • Unit "Nvv.IO.CSV.Delphi.NvvCSVClasses.pas" containing the classes TnvvCSVReader, TnvvCSVFileReader and TnvvCSVStringReader.
  • The code of the main form of a VCL Forms Application that tests both TnvvCSVFileReader and TnvvCSVStringReader is in the file "CSVReaderTest_MainForm.pas". Detailed instructions on how to quickly create the test application are provided at the beginning of the file.

History

Version 2.0 (2015-03-10)

  1. Significantly improved performance, by roughly seven times for TnvvCSVFileReader and four times for TnvvCSVStringReader, due to the following:
    • Use of a TCharArray buffer for reading chars from the stream object in bigger chunks, and then reading single chars from this new buffer. This was done mainly because Delphi's TStreamReader handles its internal buffer very inefficiently (see issue descriptions, for example, here and here).
    • Use of a dynamic TCharArray, instead of a String, as the buffer for the field value currently being assembled. The array grows dynamically in increments of 128 to accommodate the longest field value. Incidentally, a dynamic array is at least two times more efficient here than TStringBuilder.
    • Merging frequently called methods/procedures into one big procedure at the expense of code structure and readability. Apparently, the cost of a procedure call is significant.
  2. Added encoding control to TnvvCSVFileReader. See the "Notable Difference between Delphi and C# CSV Reader Classes" section above for details.
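The field-value buffer described in the second point can be sketched as follows. This is an illustration of the technique only, not the code from the unit: the record TFieldBuffer and its members are hypothetical names, and only the growth step of 128 comes from the description above.

```delphi
program GrowBufferDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils;

const
  GrowStep = 128; // growth increment in chars, as described in the article

type
  // Hypothetical buffer type: a dynamic char array that only grows and
  // is reused from one field value to the next.
  TFieldBuffer = record
    Chars: TCharArray;
    Count: Integer;
    procedure Clear;
    procedure Append(AChar: Char);
    function ToStr: string;
  end;

procedure TFieldBuffer.Clear;
begin
  Count := 0; // keep the allocated capacity for the next field value
end;

procedure TFieldBuffer.Append(AChar: Char);
begin
  if Count = Length(Chars) then
    SetLength(Chars, Length(Chars) + GrowStep); // grow by a fixed step
  Chars[Count] := AChar;
  Inc(Count);
end;

function TFieldBuffer.ToStr: string;
begin
  if Count = 0 then
    Result := ''
  else
    SetString(Result, PChar(@Chars[0]), Count); // copy only the used part
end;

var
  buf: TFieldBuffer;
  i: Integer;
begin
  buf.Clear;
  for i := 1 to 300 do
    buf.Append('x');
  // 300 chars need three growth steps: 128, 256, 384.
  Writeln('capacity=', Length(buf.Chars), ' count=', buf.Count);
  // Prints: capacity=384 count=300
  Writeln('value length=', Length(buf.ToStr));
  // Prints: value length=300
end.
```

Because Clear resets only the counter, the array is reallocated only when a field value is longer than any seen so far.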

Version 1.0 (2014-06-08)

  • First release

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Systems Engineer
United States
Extensive experience developing pure software and combined software-hardware systems using a variety of languages and tools.

Comments and Discussions

 
Question: Different Number of Fields in One Line
KevinBlack, 3-Nov-15 14:24

Answer: Re: Different Number of Fields in One Line
Vladimir Nikitenko, 4-Nov-15 12:07
Hi Kevin,

Thank you for a good question. The problem is that, because of the nature of the CSV format, it is practically impossible to come up with a general error recovery algorithm, even though the format is otherwise very “compact” and elegant.


Here are some examples that demonstrate the scale of the problem:
- The source has multiline field values, and an error occurs within a multiline value before its “internal” end of line. If the reader tries to recover and continue, that end of line will be treated as the end of the record, and everything breaks loose with a hard-to-imagine outcome.
- An error leads to reading an “unpaired” double quote (there are no more double quotes in the source). The algorithm will then read the rest of the source as a single value and will detect the error only at the end of the source.


The number of possible situations looks countless. Dealing with this problem would probably require developing some kind of Artificial Intelligence (AI), which is of course interesting but hardly practical.


Even in the simplest situations, when some records seemingly have an absolutely valid format and just have too few fields, it is still unclear how to interpret those records. Which fields are missing: leading, trailing, or a mixed set? Obviously those records should not be trusted and probably should be discarded. But then the whole file containing those records probably cannot be trusted either. And I am not even considering the situation when the number of fields is greater, because that complicates things even more. In other words, so far I do not see any “efficient” approach to dealing with CSV errors other than “manual” human intervention to fix the CSV source.


I actually already had plans to implement everything you proposed (indeed, I also live in the real world). I just need to find time to present it all in one “sane package”.


I should probably add to the next version an option to skip erroneous records, even though in general the result of “skipping” cannot be predicted. It will be the user's responsibility to deal with the consequences of that choice.


Support for variable record length (a different number of fields per record) is more realistic and makes more sense, though one can ask why the number of fields differs: because of an error or “by design”? Either way, it is a relatively frequent practical situation. It is not entirely correct to talk about “CSV format” here, though; it is just some kind of value-delimited format. The CSV format is about presenting data in table form. After all, we do not expect a variable number of fields in datasets returned by relational databases. The right approach for CSV data producers would be to add the necessary number of empty fields.


That said, I can offer you a couple of quick workarounds. They are “quick” indeed, since I did not have much time to make them nicer.


1. For the very particular situation when a record has a valid format but fewer (!!!) fields than required, you can modify the code to skip that record. (Note that when the number of fields is excessive, reading will still be terminated.) To achieve that, procedure DoEndOfLine should be modified in the following way:

Delphi
procedure TnvvCSVReader.DoEndOfLine;
begin
  if (FIndexOfLastProcessedField <> (FieldCount - 1)) then
  begin
    if (FIndexOfLastProcessedField < (FieldCount - 1)) then
      // Too few fields: silently skip this record and keep reading.
      Reset_for_NextRecord
    else
      // Too many fields: still an error.
      Throw_ErrorWithRecordAndFieldNum(MsgStr_WrongNumberOfFields);
  end
  else
  begin
    // Correct number of fields: record is complete.
    Reset_for_NextRecord;
    FFlag_ExitMainLoop := True;
  end;

  if (FieldCount_AutoDetect_InProgress) then
  begin
    FieldCount_AutoDetect_InProgress := False;
    OnFieldCountAutoDetectCompleted;
  end;
end;
Note that this makes the “IgnoreEmptyLines” option useless, since empty lines will always be ignored (“insufficient number of field values”).


2. To support a variable number of fields per record (including zero, i.e. empty lines), do the following:
- In procedure DoOpen, remove (or comment out) the line

Delphi
FieldCount_AutoDetect_InProgress := FieldCount_AutoDetect and (not Eof);

- In procedure Next, add the line
Delphi
FieldCount_AutoDetect_InProgress := true;
right before the line
Delphi
FFlag_ExitMainLoop := False;

Note:
- The reader’s FieldCount_AutoDetect property will have no effect.
- Do not store the field count; read the FieldCount property for every record.
- The “IgnoreEmptyLines” option will be useless, since an empty line will always be returned as a record with zero values.
- The “HeaderPresent” feature probably makes little sense with this “variable record length support”.
- I am not sure my test program can handle the situation where some record “down the road” is longer than the very first record. If no record is longer than the first one, the test program is definitely OK.
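With the second workaround applied to the unit, a reading loop might look like the sketch below. It assumes the modifications described above are in place; the procedure name and the pipe-delimited output are illustrative only. The key point is that FieldCount is read anew for every record.

```delphi
program VariableLengthDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes, Nvv.IO.CSV.Delphi.NvvCSVClasses;

// Assumes the unit has been modified as described in workaround 2.
procedure ReadVariableLengthCSV(const ACSVString: string);
var
  csvReader: TnvvCSVStringReader;
  i: Integer;
begin
  csvReader := TnvvCSVStringReader.Create;
  try
    csvReader.DataString := ACSVString;
    csvReader.HeaderPresent := False; // a header makes little sense here
    csvReader.Open;
    while not csvReader.Eof do
    begin
      // Do not cache FieldCount: it may differ from record to record.
      for i := 0 to csvReader.FieldCount - 1 do
        Write(csvReader.Fields[i].Value, '|');
      Writeln;
      csvReader.Next;
    end;
    csvReader.Close;
  finally
    csvReader.Free;
  end;
end;

begin
  // The first record is the longest one, matching the case the note
  // above calls safe.
  ReadVariableLengthCSV('a,b,c'#13#10'1,2'#13#10'x');
end.
```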


Let me know whether it works for you. Good luck.


Vladimir

