|
Hi Vladimir,
Is it possible to use a field separator of more than one character?
Kind regards,
Jan
|
Nice code.
I have been trying to import the Transaction file from our new fuel system. Much coding, copying of others' code, and frustration. I thought I had found the answer with your code, but no luck.
Excel and Access both open the file. If I save with Excel as a CSV, the quotes are removed and I can easily open the file with your code and with my other efforts.
Below are two lines of the file. I see no way to attach a file.
I am using Delphi 10.3 Community for your code.
Thank you
Rupert
"Date","Account #","Customer Name","Card #","Holder Name","Status","Card","Fuel","Pump Name","Price","Units","Sub Total","Taxes","Sale Amount","Tail #"
"10/1/2020 9:02 AM","","","XXXXXXXXXXXX11234","N1234RM ","Closed","MasterCard","Jet A","Jet A","3.40000","180.00000","612.00","0","612.00",""
The error message I receive:
TnvvCSVfilereader;
Wrong format of field value.
Record #: 1; Field #: 1 (both start from 1)
Update: The file is encoded in UCS-2 Little Endian; if I change it to ANSI, your code, my code, and others' code all work. Mystery solved, but the problem is not.
|
Hi Rupert,
Thanks for trying it out. I am glad you managed to solve your task.
Possible encoding issues are described in the article above, in the second bullet of the "Notable Difference between Delphi and C# CSV Reader Classes" section.
If I got it right, you eventually saved the source file as ANSI and it went through. If that is the case, it would be interesting to see whether you can read the source file in its original encoding by explicitly setting the encoding parameters using the corresponding constructor.
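For example, something like this (a sketch only: the exact constructor overload is not shown here, so the parameter list below is an assumption; TEncoding.Unicode is Delphi's UTF-16 LE encoding, which reads UCS-2 Little Endian files):
var
  Reader: TnvvCSVFileReader; // class name as reported in your error message
begin
  // Assumption: a constructor overload accepting a TEncoding
  // (TEncoding lives in System.SysUtils).
  Reader := TnvvCSVFileReader.Create('Transactions.csv', TEncoding.Unicode);
  try
    Reader.Open;
    while not Reader.Eof do
      Reader.Next; // process the fields of the current record here
  finally
    Reader.Close;
    Reader.Free;
  end;
end;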
|
I was dreading the thought of writing a CSV parser. Then I found these file and string reader classes.
What I thought might take me several days was done in just half a day. 
|
Hello Vladimir,
The CSVReaderTest_MainForm.dfm is missing from the .zip I downloaded today. Would you include it? Thanks.
|
Sorry for the late response. For some reason I did not get a notification about the question you posted. Just in case you are still interested: detailed instructions on how to quickly create the test application are provided at the beginning of CSVReaderTest_MainForm.pas. There is a reason for this approach. On the CodeProject page with this CSVReader you can take a look at the question "Trying out" asked by DaveBoltman on 23-Mar-15 and the corresponding answers. Regards, Vladimir
|
Hello.
Let me say right away: a huge thank you for your work - it has helped me a lot in my job.
But now I have run into a problem: a 3 GB CSV file with 26 million rows. Plain reading works beautifully and fairly fast,
BUT the fields contain values both like ;"Вася"; and like ;"вася;петя";.
By setting the field separator to ';' I hoped that the quotes would be ignored, but that is not the case.
Could you tell me how to ignore quotes in fields?
Alexander
|
Assuming that you do not have multi-line field values or values with double quotes inside, you can ignore double quotes by setting UseFieldQuoting := false, but:
1. They will become part of the value:
"Вася"
"вася
петя"
2. More importantly, you will most likely end up in a situation where the file consists of records with different numbers of fields, which makes the file invalid, and processing stops on the first record with a different field count.
Therefore I cannot recommend that solution.
But, if I got it right, you have a situation where at least some field values are composite. To be precise, they are themselves encoded in a CSV-like format. By the way, the double quotes are absolutely necessary then.
In such a case it is impossible to decode the file in one pass. You should extract the values from the file using the double quotes and handle some (or all) of them as CSV-like-encoded values using TnvvCSVStringReader. If these "second-level" values are also composite, then handle them again the same way, and so on.
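A rough sketch of the second pass (hypothetical: the constructor parameters, the separator property, and the FieldValue[] accessor are assumptions; check the actual TnvvCSVStringReader interface):
var
  Inner: TnvvCSVStringReader;
  i: Integer;
begin
  // The first pass extracted the composite value, e.g. 'вася;петя'.
  // Decode it again as CSV-like data, assuming ';' is set as the
  // field separator via the corresponding property or constructor.
  Inner := TnvvCSVStringReader.Create('вася;петя');
  try
    Inner.Open;
    while not Inner.Eof do
    begin
      Inner.Next;
      for i := 0 to Inner.FieldCount - 1 do
        HandleSecondLevelValue(Inner.FieldValue[i]); // hypothetical handler
    end;
  finally
    Inner.Close;
    Inner.Free;
  end;
end;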
That's the general idea. If you have more questions and want to continue the discussion in Russian, please switch to an email exchange (click "Email" under this comment).
|
Hi Vladimir,
Apologies for bothering you again, but I have a perplexing issue. If I call CSVRead.Free (and don't handle the exception) I get an app crash - Runtime Error 216. FWIW, this code is in a Delphi DLL. If I single-step through the code, I get an Access Violation when executing the CSVRead.Free statement:
finally
  CSVRead.Close;
  CSVRead.Free;
end;
Unhandled, I get this, followed by the Runtime Error 216 message:
Quote: Problem signature:
Problem Event Name: APPCRASH
Application Name: DropboxDLLTest.exe
Application Version: 1.0.0.0
Application Timestamp: 563d3a3e
Fault Module Name: DbXConnect.dll
Fault Module Version: 1.0.0.0
Fault Module Timestamp: 563d47bc
Exception Code: c0000005
Exception Offset: 000089aa
OS Version: 6.1.7601.2.1.0.256.48
Locale ID: 3081
Additional Information 1: b7ed
Additional Information 2: b7ed4aa7e72b85eabb40a830464d513f
Additional Information 3: b886
Additional Information 4: b886cb1d92b0bb468eead179f9849cd9
This seems to have started after I put in your 'tweaks' to handle the FieldCount issue, which, apart from this problem, work brilliantly. I should add that if I don't have the CSVRead.Free statement, I also get a Runtime Error 216. Any ideas?
Kind regards,
Kevin
modified 6-Nov-15 18:55pm.
|
Hi Kevin,
I implemented exactly what I suggested to you in option "2" (support for a variable number of field values). I do not observe any exceptions. Everything works. I will try to send you that modified code via email. In the meantime, you can try to debug, Step-Into Free (Destroy), and see what exactly causes the exception.
Regards, Vladimir
|
Hi Vladimir,
I've used your components to decode a CSV string and for the most part it works fine.
There is a significant issue, though (and maybe I'm just doing it wrong): in the CSV file most lines have 10 fields, HOWEVER one line in the middle of the CSV string has only 7 fields. Your component just aborts when it detects a line that has a different number of fields than the count specified initially or auto-detected from the first line.
In a perfect world, all lines would have the same number of fields, but this is not a perfect world.
How can I either just process the line as-is or ignore it (perhaps logging the fact that there was a discrepancy in the field count)?
That is essentially:
If an error is detected in the field count, DO NOT ABORT; simply set an error variable (or whatever) and move on to the next line.
OR:
I know this would slow things down, but for me it would work: have a property that calculates the field count NOT from the first line, but for every line:
FieldCount_AutoDetect_Always := True;
So if this is True, it ignores any FieldCount_AutoDetect settings and calculates the field count on a line-by-line basis.
As you know, it is always the edge cases that cause the grief.
Thank you for an excellent unit, but with an issue for me.
Thanks,
Kevin
modified 3-Nov-15 21:15pm.
|
Hi Kevin,
Thank you for a good question. The problem is that, because of the nature of the CSV format, it is practically impossible to come up with a general error-recovery algorithm, even though otherwise the CSV format is very "compact" and elegant.
Here are some examples to demonstrate the scale of the problem:
- The source has multi-line field values, and an error occurs within a multi-line value before an "internal" end of line. If we try to recover and continue, that end of line will be treated as the end of the record, and everything breaks loose with a hard-to-predict outcome.
- An error leads to reading an "unpaired" double quote (with no more double quotes in the source). The algorithm will then read the rest of the source as a single value and detect the error only at the end of the source.
The number of possible situations looks countless. Dealing with such a problem would probably require developing some kind of Artificial Intelligence (AI), which, of course, is interesting but hardly practical.
Even in the simplest situations, when some records seemingly have an absolutely valid format and just an insufficient number of fields, it is still unclear how to interpret those records. Which fields are missing: leading, trailing, or a mixed set? Obviously those records should not be trusted and should probably be discarded. But then perhaps the whole file containing those records cannot be trusted either. And I am not even considering the situation when the number of fields is greater, because it complicates things even more. In other words, so far I do not see any "efficient" approach to dealing with CSV errors other than "manual" human intervention to fix the CSV source.
I actually already had plans to implement everything you proposed (indeed, I also live in the real world). I just need to find time to present it all in one "sane package".
I should probably add to the next version an option to skip erroneous records, even though in general the result of "skipping" cannot be predicted. It will be the user's responsibility to deal with the consequences of that choice.
More realistic, and making more sense, is support for variable record length (a different number of fields per record), though one can ask why the number of fields differs: because of an error or "by design"? Anyway, it is a relatively frequent practical situation. Though it is not entirely correct to talk about "CSV format" here; it is just some kind of delimiter-separated format. CSV is about presenting data in table form. After all, we do not expect a variable field count in datasets returned by relational databases. The right approach for CSV data producers would be to add the necessary number of empty fields.
That said, I can offer you a couple of quick workarounds. They are "quick" indeed, since I did not have much time to make them "nicer".
1. For the very particular situation when a record has a valid format but the number of fields is less (!!!) than required, you can try to modify the code to skip that record. (Note that when the number of fields is excessive, reading will still be terminated.) To achieve that, the procedure DoEndOfLine should be modified in the following way:
procedure TnvvCSVReader.DoEndOfLine;
begin
  if (FIndexOfLastProcessedField <> (FieldCount - 1)) then
  begin
    if (FIndexOfLastProcessedField < (FieldCount - 1)) then
      Reset_for_NextRecord // too few fields: skip this record and keep reading
    else
      Throw_ErrorWithRecordAndFieldNum(MsgStr_WrongNumberOfFields);
  end
  else
  begin
    Reset_for_NextRecord;
    FFlag_ExitMainLoop := True;
  end;
  if (FieldCount_AutoDetect_InProgress) then
  begin
    FieldCount_AutoDetect_InProgress := False;
    OnFieldCountAutoDetectCompleted;
  end;
end;
Note that this will make the option "IgnoreEmptyLines" useless, since empty lines will always be ignored (as having an "insufficient number of field values").
2. To support a variable number of fields per record (including zero, i.e. empty lines), the following should be done:
- In procedure DoOpen remove (or comment out) the line
FieldCount_AutoDetect_InProgress := FieldCount_AutoDetect and (not Eof);
- In procedure Next add the line
FieldCount_AutoDetect_InProgress := true;
right before line
FFlag_ExitMainLoop := False;
Note:
- The Reader's property FieldCount_AutoDetect will do nothing.
- Do not store the field count; for every record, read the FieldCount property (see the sketch after these notes).
- The option "IgnoreEmptyLines" will be useless, since an empty line will always be returned as a record with zero values.
- The "HeaderPresent" feature will probably not make much sense with this "variable record length" support.
- I am not sure my test program can handle a situation where, somewhere "down the road", some record is longer than the very first one. If no record is longer than the first, the test program is definitely OK.
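A hypothetical usage sketch of the reading loop after this modification (the constructor parameters, the Open/Next protocol, and the FieldValue[] accessor are assumptions; check the actual class interface):
var
  Reader: TnvvCSVFileReader;
  i: Integer;
begin
  Reader := TnvvCSVFileReader.Create('data.csv');
  try
    Reader.Open;
    while not Reader.Eof do
    begin
      Reader.Next;
      // With the tweak above, FieldCount is re-detected for every record,
      // so it must be read again for each record rather than stored once.
      for i := 0 to Reader.FieldCount - 1 do
        ProcessValue(i, Reader.FieldValue[i]); // ProcessValue is hypothetical
    end;
  finally
    Reader.Close;
    Reader.Free;
  end;
end;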
Let me know whether it works for you. Good luck.
Vladimir
|
Hi Vladimir,
Thank you for your very detailed response.
Quote: Even in the simplest situations, when some records seemingly have an absolutely valid format and just an insufficient number of fields, it is still unclear how to interpret those records. Which fields are missing: leading, trailing, or a mixed set? Obviously those records should not be trusted and should probably be discarded. But then perhaps the whole file containing those records cannot be trusted either.
I am dealing with a JSON return from Dropbox (a list of shared folders). I get the return parsed back to a CSV string (that's done elsewhere). There should be 9 fields, BUT one return does not have the owner details. I don't want to stop processing because of the erroneous return. Logging an error and continuing on is essentially what I want to do:
A normal return:
Quote: Emptor, owner, 303003839, all, email_verified: true, Kevin Black, 1442832,email: kevin.black@orexresearch.com, /emptor,
The erroneous return:
Quote: Fork It, editor, 51251486, all, null, null, /Fork It,
And, of course, the erroneous return is smack bang in the middle of the folder list. It would have been appropriate if Dropbox simply returned null in all fields that were not used, but they didn't, so I have to deal with the (unlikely) event.
Vladimir Nikitenko wrote: I actually already had plans to implement everything you proposed (indeed, I also live in the real world). I just need to find time to present it all in one "sane package".
I should probably add to the next version an option to skip erroneous records, even though in general the result of "skipping" cannot be predicted. It will be the user's responsibility to deal with the consequences of that choice.
That would be perfect for me.
Vladimir Nikitenko wrote: More realistic, and making more sense, is support for variable record length (a different number of fields per record), though one can ask why the number of fields differs: because of an error or "by design"?
In this instance it is possibly 'by design'. I suspect that the shared folder owner has not only removed himself from the share but from Dropbox altogether, so the folder is an orphan, yet it still plays the game for those who share it (other than the owner).
I will implement the changes and get back. Any additional suggestions very appreciated.
Again, thank you for your detailed response, I appreciate the time you have taken to both develop the unit in the first place and to respond to me personally.
UPDATE:
I have retrofitted your suggestions. After a bit of stuffing around with my code (I had nested 'while not CSVRead.eof do' loops, which didn't seem to get too far), it is now working perfectly (as far as I can tell). I find the errant line (with too few fields), log the error, and continue on. Brilliant.
Thanks again for your assistance, time, and effort.
Best regards,
kevin
modified 5-Nov-15 1:33am.
|
Hi Vladimir,
First of all, I think your library is great.
You write that you might want to add an option to allow a different number of fields per line. In our case only the trailing fields may be missing, as they are optional. Of course, we would prefer to have them provided as well, but we do not control all the files that have to be processed.
Is there any chance this option will become available? I used your suggestion in this thread and it works great, but it would be nicer to have this feature in the library itself.
Thank you!
Kind regards,
Jan
|
Hi
I am trying to find the position of a certain string in a CSV like this:
function FindValue(ColumnName, TextToFind: string): Integer;
This should return the row where the text was found.
I iterate through all rows using Next until I find the text. But after this, how do I reset the cursor back to the first row to do other searches?
So, I need a function like GoToStart or GotoRow(RowNo)
This won't work:
procedure TnvvCSVReader.GoToStart;
begin
  FIndexOfLastProcessedField := -1;
  FIndexOfLastProcessedRecord := 0;
  FEof := False;
end;
Many thanks
Cati

modified 5-Aug-15 4:31am.
|
As stated in the description, the CSV Reader is unidirectional. That's because CSV parsing starts from the beginning of the data/file, and I doubt it can be done in the reverse direction. Therefore the only way to return to the beginning is to simply Close the Reader and Open it again:
CSVReader.Close;
CSVReader.Open;
I am not sure what task you are solving, but parsing the same CSV data several times does not seem like an efficient approach. Did you consider the possibility of reading the CSV records one by one but doing all of the searches at once for each record? Or (especially if the CSV data/file is not big) reading all the data into memory, say a two-dimensional array, and after that doing your searches as many times as necessary?
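For the in-memory variant, something along these lines (a sketch only; FieldValue[] is an assumed accessor name, check the actual class interface):
uses System.Generics.Collections; // for TList<T>

var
  Table: TList<TArray<string>>;
  Row: TArray<string>;
  i: Integer;
begin
  Table := TList<TArray<string>>.Create;
  try
    CSVReader.Open;
    while not CSVReader.Eof do
    begin
      CSVReader.Next;
      SetLength(Row, CSVReader.FieldCount); // SetLength makes Row unique again
      for i := 0 to CSVReader.FieldCount - 1 do
        Row[i] := CSVReader.FieldValue[i];  // FieldValue[] is an assumption
      Table.Add(Row);
    end;
    CSVReader.Close;
    // Now search Table as many times as necessary without re-parsing the CSV.
  finally
    Table.Free;
  end;
end;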
|
An excellent CSV Parser!!
|
Hello Vladimir,
Your CSV Reader code and sample project are very impressive. The code was able to read a file I have been having issues with for the last 24 hours. The import into your grid was flawless and even beat out a product called CVSIt (you deal very nicely with CRLF inside quoted pairs (" "))...
Much appreciated!! Send me an email at the address below - I would like to gift you a disk analysis program I have written as a thank you.
Bradley MacDonald
brad_AT_timeacct_DOT_com
|
Great job!!!
I'm trying it out. Is it possible to include the Delphi form file CSVReaderTest_MainForm.dfm in the zip download? It goes together with CSVReaderTest_MainForm.pas, which is already in the download.
|
What I said about needing the Delphi form file (.dfm) is not true; I noticed this after reading the directions in CSVReaderTest_MainForm.pas. Thanks.
However, it would make it much easier to try out your code if we had a ready-made form. E.g., I would like to just double-click the "Select & Open" button to get to the code for the procedure buttonReadFromFileOnClick. Now it's much more difficult. Clever, but not that useful...
|
Thanks for trying it out. I understand your concern. I tried to make it as easy as possible within certain constraints. One reason is to avoid possible incompatibilities between different Delphi versions and, most importantly, to avoid possible legal issues. As it is now, everything posted can be prepared in an ordinary text editor.
|
I'll see how it goes when I use it later.
|
Great job, saved a lot of my time. Thank you.
|