|
Sorry, but I have to answer no for the time being, both because of time constraints and the actual demand for it. Unless I am mistaken, the need to read a CSV file is much more common than the need to write one, and good write performance is easier to obtain (basically, do not copy data around, and write in chunks).
You can take a look at this library for inspiration ... https://sourceforge.net/projects/javacsv/
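To make the "do not copy data around and write in chunks" remark concrete, here is a minimal stdlib-only sketch of a CSV writer. It is not part of this library; the quoting rule is the usual quote-only-when-needed one, and the class name is made up for illustration:

```csharp
using System;
using System.IO;

static class SimpleCsvWriter
{
    // Quote a field only when it contains the delimiter, a quote or a line break.
    public static string Escape(string field, char delimiter = ',')
    {
        if (field.IndexOfAny(new[] { delimiter, '"', '\r', '\n' }) < 0)
            return field;

        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }

    // Write one record field by field; no per-record string is ever built,
    // and the TextWriter's own buffer takes care of chunking the output.
    public static void WriteRecord(TextWriter writer, string[] fields, char delimiter = ',')
    {
        for (int i = 0; i < fields.Length; i++)
        {
            if (i > 0)
                writer.Write(delimiter);

            writer.Write(Escape(fields[i], delimiter));
        }

        writer.WriteLine();
    }
}
```

Relying on StreamWriter's internal buffer (or wrapping the target FileStream in a BufferedStream) keeps the disk writes chunked without any extra copying in user code.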
|
Hi,
Great article.
I am trying to read a TAB-delimited file using the CsvReader. I am specifying "\t" as the optional delimiter parameter but this is not working. I am getting the error "Can't read -1 before 0".
using (CsvReader csv = new CsvReader(new StringReader(Data), false, '\t'))
Dan
Dan@AccessibleMedia
w. http://www.accessiblemedia.co.uk
|
Hello,
Just to update: below is a sample of the file I am trying to read. The delimiter parameter is actually being set correctly. The problem is the CurrentRecordIndex; its value is -1 after it has parsed the file.
advertiser language masterCategoryID masterCategory productID refNumber name brand promotionalText description deepLink imageURL delivery validFrom validTo currency displayPrice searchPrice
"OfficeGiant" "EN" 594 "Paper, Writing Pads etc." "3M00036" "" "POSTIT PININ Z NOTE DISP FOC 76X76 PAD" "" "15.91" "Post-it Promotion. Each promotional pack contains a Post-it Pininfarina Z-Note Dispenser with a FOC 76x76 Z-Note Pad." "http://www.awin1.com/pclick.php?p=349379&a=55703&m=1111" "http://www.ogimages.co.uk/products/3M97091.jpg" "" "0000-00-00" "0000-00-00" "GBP" "GBP11.14" 11.14
"OfficeGiant" "EN" 638 "Presentation, Planning & Training" "3M00088" "" "3M OHP TRANY FLIP FRAME PK100 RS7110" "" "91.86" "3M Flip Frame Transparency Protectors. Unique transparency protection and presentation wallets. Each valuable overhead is carried in its own plastic wallet, providing instant identification and total protection. Size: A4. Pack of 100" "http://www.awin1.com/pclick.php?p=364690&a=55703&m=1111" "http://www.ogimages.co.uk/products/3M00088.jpg" "" "0000-00-00" "0000-00-00" "GBP" "GBP64.30" 64.30
"OfficeGiant" "EN" 638 "Presentation, Planning & Training" "3M00089" "" "3M OHP TRANY FLIP FRME PK20 RS711020" "" "19.29" "3M Flip Frame Transparency Protectors. Unique transparency protection and presentation wallets. Each valuable overhead is carried in its own plastic wallet, providing instant identification and total protection. Size: A4. Pack of 20" "http://www.awin1.com/pclick.php?p=364691&a=55703&m=1111" "http://www.ogimages.co.uk/products/3M00088.jpg" "" "0000-00-00" "0000-00-00" "GBP" "GBP13.50" 13.50
Dan@AccessibleMedia
w. http://www.accessiblemedia.co.uk
|
Are you using the latest version? Because I have no problem with the sample you gave me:
string data = @"advertiser language masterCategoryID masterCategory productID refNumber name brand promotionalText description deepLink imageURL delivery validFrom validTo currency displayPrice searchPrice
""OfficeGiant"" ""EN"" 594 ""Paper, Writing Pads etc."" ""3M00036"" """" ""POSTIT PININ Z NOTE DISP FOC 76X76 PAD"" """" ""15.91"" ""Post-it Promotion. Each promotional pack contains a Post-it Pininfarina Z-Note Dispenser with a FOC 76x76 Z-Note Pad."" ""http://www.awin1.com/pclick.php?p=349379&a=55703&m=1111"" ""http://www.ogimages.co.uk/products/3M97091.jpg"" """" ""0000-00-00"" ""0000-00-00"" ""GBP"" ""GBP11.14"" 11.14
""OfficeGiant"" ""EN"" 638 ""Presentation, Planning & Training"" ""3M00088"" """" ""3M OHP TRANY FLIP FRAME PK100 RS7110"" """" ""91.86"" ""3M Flip Frame Transparency Protectors. Unique transparency protection and presentation wallets. Each valuable overhead is carried in its own plastic wallet, providing instant identification and total protection. Size: A4. Pack of 100"" ""http://www.awin1.com/pclick.php?p=364690&a=55703&m=1111"" ""http://www.ogimages.co.uk/products/3M00088.jpg"" """" ""0000-00-00"" ""0000-00-00"" ""GBP"" ""GBP64.30"" 64.30
""OfficeGiant"" ""EN"" 638 ""Presentation, Planning & Training"" ""3M00089"" """" ""3M OHP TRANY FLIP FRME PK20 RS711020"" """" ""19.29"" ""3M Flip Frame Transparency Protectors. Unique transparency protection and presentation wallets. Each valuable overhead is carried in its own plastic wallet, providing instant identification and total protection. Size: A4. Pack of 20"" ""http://www.awin1.com/pclick.php?p=364691&a=55703&m=1111"" ""http://www.ogimages.co.uk/products/3M00088.jpg"" """" ""0000-00-00"" ""0000-00-00"" ""GBP"" ""GBP13.50"" 13.50";
using (LumenWorks.Framework.IO.Csv.CsvReader csv = new LumenWorks.Framework.IO.Csv.CsvReader(new System.IO.StringReader(data), false, '\t'))
{
    int count = csv.FieldCount;

    while (csv.ReadNextRecord())
    {
        for (int i = 0; i < count; i++)
            Console.Write(csv[i]);

        Console.WriteLine();
    }
}
|
What if I have a file that has only tabs (\t) and no quotes? Can I just replace the " quote character with a space in the CsvReader constructor?
Thanks.
|
Is it possible to pass an argument to the constructor to skip the first line?
The data I am working with looks like this
"AMEX Securities as of 9/1/2006","","",""
"Name","Symbol","Market Value (millions)","Description (as filed with the SEC)"
"1% CONVERTICBLE SECURITIES EXCHANGABLE","MNV.A","N/A","Not Available"
"1.25% GE CO EXCHANGABLE NOTES","MGE.A","N/A","Not Available"
Right now I am opening the file after it's downloaded, skipping the first line, and writing the rest out to a new file, then passing the new file to CsvReader.
|
My first question would be why you need to skip the first line. If it is a comment, it should be marked with the commenting char at the beginning ('#' by default) and will then be skipped automatically. If it is a blank line, it will also be skipped automatically. If it is a record/header, you can simply call ReadNextRecord() once and ignore the result.
Otherwise, the way you are doing it now sounds like a fine solution ... I believe in modular programming and adding a centralized component when required. If I enabled scenarios like these directly in the CsvReader, it would quickly turn into the can-also-serve-you-coffee-in-the-morning type of class. But nothing prevents you from making a derived class or modifying the source code to fit your needs.
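If the first line is a plain title rather than a comment, one line of stock .NET code before constructing the CsvReader also does the trick. A sketch (the helper name and file name are made up; the skipping itself is plain TextReader usage, and the CsvReader call is shown commented out):

```csharp
using System.IO;

static class SkipFirstLineDemo
{
    // Consume the title line on the underlying TextReader so that the
    // CsvReader only ever sees the header line and the records.
    public static TextReader OpenWithoutTitle(TextReader reader)
    {
        reader.ReadLine(); // throw away "AMEX Securities as of 9/1/2006", ...
        return reader;
    }
}

// Typical use (hypothetical file name):
// using (CsvReader csv = new CsvReader(
//            SkipFirstLineDemo.OpenWithoutTitle(new StreamReader("amex.csv")), true))
// {
//     while (csv.ReadNextRecord()) { /* ... */ }
// }
```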
|
I am using your (fantastic) code to parse a CSV file returned from a web request and have come across an issue when using a 'streaming' stream, if you'll pardon the pun.
In the constructor where you check the stream length, you need to check whether the stream allows seeking. If not, an error is raised.
To reproduce in VB code:
Dim client As New WebClient
Dim data As Stream
Dim reader As StreamReader
Dim csv As CsvReader

data = client.OpenRead("http://www.slowdating.com/Affiliates/csv.aspx")
reader = New StreamReader(data)
csv = New CsvReader(reader, True, ","c, """"c, CsvReader.DefaultEscape, CsvReader.DefaultComment, True)
My quick fix which has had only minor testing is to add the following in the class constructor:
if (reader is StreamReader)
{
    long length;

    if (((StreamReader)reader).BaseStream.CanSeek)
        length = ((StreamReader)reader).BaseStream.Length;
    else
        length = 0;
}
Patrick
|
Thanks for pointing out this limitation to me! I will add the check to the existing ones.
And thanks for your appreciation.
|
First of all I want to thank you for providing this and improving on it every once in a while. It really is some great stuff.
I've got a 1.2 gig CSV file.
I'm using the CSV reader to convert this file into another text format using a streamwriter.
It started out taking 4 seconds to write every 100,000 records; after a while, the time to write 100,000 records gradually increases up to 5 minutes or so.
The Page File and memory usage displayed in the task manager increases as well.
Is there anything you can think of that would cause the memory usage to keep increasing like this?
It finally gets to the point where the box runs out of memory and dies.
Any ideas?
Justin
|
I'm converting the CSV to a |-delimited text file.
Friend Sub ConvertCSV2DelimitedTxt(ByVal sCSVFile As String, ByVal sNewFile As String, Optional ByRef sHeaderColumns As String() = Nothing)
    colLog.Add(Now & " - Start Converting CSV to TXT.")
    colLog.Add(Now & " - " & sCSVFile & " to " & sNewFile)
    RaiseEvent Status("Converting CSV File to Delimited Text File")

    File.Delete(sNewFile)

    Dim sw As StreamWriter = File.CreateText(sNewFile)
    Dim chDelimiter As Char = CChar(GetSetting("Delimiter", "|"))
    Dim csv As New CsvReader(New StreamReader(sCSVFile), True)

    csv.SupportsMultiline = True
    csv.MissingFieldAction = 0

    sHeaderColumns = csv.GetFieldHeaders()
    sw.WriteLine(Join(sHeaderColumns, chDelimiter))

    'Write the rest of the file
    Dim FakePercentDone As Double = 0.6
    Dim LastPercentDone As Integer = 0

    While csv.ReadNextRecord()
        'Progress bar logic
        If CInt(FakePercentDone) <> LastPercentDone Then
            RaiseEvent Progress(CInt(FakePercentDone))
            LastPercentDone = CInt(FakePercentDone)
        End If

        FakePercentDone += 0.05
        If CInt(FakePercentDone) > 100 Then FakePercentDone = 0.6

        Try
            Dim sRecord As String = String.Empty

            For i As Integer = 0 To csv.FieldCount - 1
                sRecord += csv.Item(i) & chDelimiter
            Next

            sw.WriteLine(sRecord.Substring(0, sRecord.Length - 1))
        Catch ex As Exception
            Throw 'rethrow without resetting the stack trace
        End Try
    End While

    sw.Close()
    csv.Dispose()

    colLog.Add(Now & " - Done Converting CSV to TXT.")
End Sub
|
My first question would be whether you tried only reading the CSV and doing nothing more. If so, do you still have memory issues? I use the CsvReader with very large files and have never had this problem, so your comment is a bit surprising. Could you send me a sample of your CSV file which I can use to generate a large one?
Thanks for your comment!
|
When I removed all the code except for the basic While and For loops, the problem did not present itself. I've also discovered that the problem seems to be .NET 1.1 related. I code and debug in VS 2005 against the 2.0 framework, but have to compile using the MSBee utility to target the 1.1 framework for my users. The process is quite speedy in the debugger in VS 2005 but it starts dragging when I run the 1.1 version.
I'm going to try to close and reopen the StreamWriter for my output file every 100,000 records or so to see if that will clear it up. Any other ideas you might have would be appreciated.
Thanks again for the CSVReader.
Justin
Here's a sample of the data if you still wanted to tinker with it.
Category Name,Fiscal Year / Period,Store Number,Sku Number,Sales Type,DIY Gross Sales QTY,DIY Damaged Returns,DIY Undamaged Returns QTY,DIY Gross Sales Retail,DIY Damaged Returns Retail,DIY Undamaged Returns Retail,VDP Gross Sales QTY,VDP Damaged Returns QTY,VPD Undamaged Returns QTY,VPD Gross Sales Retail,VPD Damaged Returns Retail,VPd Undamaged Returns Retail
OILATF,2005071,2,7395,0,6,0,-7,9.54,0.00,-11.13,0,0,0,0.00,0.00,0.00
OILATF,2005071,2,7396,0,70,0,-5,111.30,0.00,-7.95,0,0,0,0.00,0.00,0.00
OILATF,2005071,2,20224,0,156,0,0,201.24,0.00,0.00,0,0,0,0.00,0.00,0.00
OILATF,2005071,2,20233,0,132,0,0,170.28,0.00,0.00,0,0,0,0.00,0.00,0.00
OILATF,2005071,2,20234,0,2,0,0,2.58,0.00,0.00,0,0,0,0.00,0.00,0.00
OILATF,2005071,2,54771,0,4,0,0,14.57,0.00,0.00,0,0,0,0.00,0.00,0.00
OILATF,2005071,2,54805,0,1,0,0,4.39,0.00,0.00,0,0,0,0.00,0.00,0.00
OILATF,2005071,2,54806,0,11,0,0,57.09,0.00,0.00,0,0,0,0.00,0.00,0.00
OILATF,2005071,2,69375,0,23,0,0,41.17,0.00,0.00,0,0,0,0.00,0.00,0.00
OILATF,2005071,2,119580,0,6,0,0,15.54,0.00,0.00,0,0,0,0.00,0.00,0.00
OILATF,2005071,2,119879,0,18,0,0,46.62,0.00,0.00,0,0,0,0.00,0.00,0.00
OILATF,2005071,2,119880,0,5,0,0,9.95,0.00,0.00,0,0,0,0.00,0.00,0.00
OILATF,2005071,2,119881,0,39,0,0,77.61,0.00,0.00,0,0,0,0.00,0.00,0.00
|
Hmm, if the problem is not related to the CsvReader, then I would need to investigate your program to help you out, and honestly, I do not have much time.
That said, avoiding the string concatenations and flushing your data in chunks would certainly help:
Const ChunkSize As Integer = 4096 'Whatever suits you best
Dim chunk As Integer = 0

While csv.ReadNextRecord()
    For i As Integer = 0 To csv.FieldCount - 2
        sw.Write(csv.Item(i))
        sw.Write(chDelimiter)
    Next

    sw.WriteLine(csv.Item(csv.FieldCount - 1))

    chunk += 1

    If chunk > ChunkSize Then
        chunk = 0
        sw.Flush()
    End If
End While
|
I want to read CSV files like the one below.
A,F01,F02,F03,F04,F05
U,F11,F12,F13
A,F21,F22,F23,F24,F25
D,F31
A,F41,F42,F43,F44,F45
But the last fields of the 2nd and 3rd lines (in other words, F13 and F31) seem to be missing fields when I access them. So I added ',' after those fields as below, and then those fields appear.
A,F01,F02,F03,F04,F05
U,F11,F12,F13,
A,F21,F22,F23,F24,F25
D,F31,
A,F41,F42,F43,F44,F45
Why?
DataTable tbl = new DataTable();

StreamReader sr = new StreamReader(csvFileName);

CsvReader csv = new CsvReader(sr, false);
csv.MissingFieldAction = MissingFieldAction.ReturnNullValue;

tbl.Load(csv);

DataGridView testView = new DataGridView();
.
.
testView.DataSource = tbl;
Japanese C# primer I.K
|
You need to use the following line of code instead:
csv.MissingFieldAction = MissingFieldAction.ReturnPartiallyParsedValue;
The logic is as follows: in "U,F11,F12,F13", the reader does not know whether F13 is indeed the complete field value or whether it is corrupted and a part of it is missing. After all, the CSV is not well formed, so anything could have happened.
So to handle this situation, you have three choices: return an empty value instead of "F13", return a null value instead of "F13", or return the partial result "F13" (which, in your case, you know is valid).
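For illustration, the three behaviours can be simulated outside the library with a plain string split. The enum below only mirrors the names of the real MissingFieldAction values, and the handling of fields the record does not contain at all follows the behaviour described later in this thread (empty string, or null under ReturnNullValue); the CsvReader itself does much more than this:

```csharp
using System;

enum MissingFieldPolicy
{
    ReturnEmptyValue,
    ReturnNullValue,
    ReturnPartiallyParsedValue
}

static class MissingFieldDemo
{
    // Fetch column `index` out of `expected` columns from a record,
    // applying `policy` when the record turns out to be short.
    public static string GetField(string record, int index, int expected, MissingFieldPolicy policy)
    {
        string[] fields = record.Split(',');

        // A field followed by another field is certainly complete.
        if (fields.Length >= expected || index < fields.Length - 1)
            return fields[index];

        // A field the record does not contain at all is simply missing.
        if (index >= fields.Length)
            return policy == MissingFieldPolicy.ReturnNullValue ? null : "";

        // The last field of a short record may be truncated: the policy decides.
        switch (policy)
        {
            case MissingFieldPolicy.ReturnPartiallyParsedValue:
                return fields[index];
            case MissingFieldPolicy.ReturnNullValue:
                return null;
            default:
                return "";
        }
    }
}
```

For "U,F11,F12,F13" against a six-column header, index 3 yields "F13", null, or "" depending on the policy, while indexes 4 and 5 are plain missing fields.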
|
Thanks for your reply, Sébastien. I changed my code according to your recommendation as below.
csv.MissingFieldAction = MissingFieldAction.ReturnNullValue;
|
V
csv.MissingFieldAction = MissingFieldAction.ReturnPartiallyParsedValue;
This time, the values of F13 and F31 are displayed in the DataGridView just as I wanted, without adding ',' after those fields (in other words, by using the original data).
A,F01,F02,F03,F04,F05
U,F11,F12,F13 <--- 2 really missing fields exist
A,F21,F22,F23,F24,F25
D,F31 <--- 4 really missing fields exist
A,F41,F42,F43,F44,F45
Now in turn, the values of the really missing fields are empty strings instead of null, which I think is a valid result. And this result is good enough for now, but I think that in the ReturnPartiallyParsedValue case the really missing fields should be returned as in the ReturnNullValue case, too (i.e. null instead of empty string).
I like your library very much. Thank you!!
Japanese C# primer I.K
|
When I have this line:
Bike;£9,99;bell
the pound sign disappears.
Can you tell me why? Is this a bug?
And/or how can I fix it?
Thanks,
Bjorn
|
This is an encoding issue. You need to save your file as Unicode (UTF-8 or whatever) or otherwise provide the code page to the StreamReader (probably Windows-1252):
using (CsvReader csv = new CsvReader(new StreamReader("data.csv", Encoding.GetEncoding(1252)), false, ';'))
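The effect is easy to reproduce without the CsvReader at all. In a Windows-1252 (or Latin-1) file, '£' is the single byte 0xA3, which is not a valid UTF-8 sequence, so a UTF-8 reader decodes it to the replacement character U+FFFD. A stdlib-only sketch (Encoding.Latin1 requires .NET 5 or later; on older runtimes use Encoding.GetEncoding(1252) instead):

```csharp
using System;
using System.Text;

// "Bike;£9,99;bell" as a legacy Windows-1252/Latin-1 editor stores it:
byte[] raw = Encoding.Latin1.GetBytes("Bike;£9,99;bell");

string wrong = Encoding.UTF8.GetString(raw);   // '£' (0xA3) decodes to U+FFFD
string right = Encoding.Latin1.GetString(raw); // round-trips correctly

Console.WriteLine(wrong);
Console.WriteLine(right); // Bike;£9,99;bell
```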
|
Thanks, it worked.
Many thanks.
Bye
Bjorn
|
Is there any way of getting the current record in the same way as you can get the headers (i.e. as a string array)? I'm currently reading each field in the record and then putting it back together into an array, but I'd like to do something like:
while (csv.ReadNextRecord())
{
    string[] x = csv.CurrentRecord;
    // do something with x
}
Is this possible?
Eifion
|
Yes, you can do it as follows:
string[] x = new string[csv.FieldCount];

while (csv.ReadNextRecord())
{
    csv.CopyCurrentRecordTo(x);
    // do something with x
}
|
Found this by mistake: the app that generates our CSV file decided to create a file with just two blank lines in it. When I came to read the CSV file, I got a NullReferenceException in GetFieldHeaders(). I don't know whether that's what you're expecting (I wasn't), but I saw that version 3.0 came with "extensive support for malformed CSV files", so I thought you'd like to know about this kind of malformation.
Thanks for a great library.
Steve
|
You are right, I did not test this particular scenario. What would you be expecting? An exception, or that the method returns null (and should it also raise the ParseError event)?
|