If the whole file would fit in memory, I'd read it all in with File.ReadAllLines().
Make one pass through all of the lines and build a parallel array that holds the parsed numeric keys.
Then use Array.Sort<TKey, TValue>(TKey[] keys, TValue[] items) to do the sorting. (This sorts both arrays together, in the sort order of the keys array.) Then rewrite the file using the sorted array of lines.
If the whole thing will not fit in memory, you could read the file line by line, extract and parse each key into an array, and simultaneously build a parallel array of objects holding the start byte position in the file and the byte length of each record. Do the Array.Sort() as above, then write a new output file, copying each record from the input file by seeking to the record's position and copying its length in bytes to the output file.
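A minimal sketch of that two-pass idea, assuming '\n'-terminated lines in an ASCII-compatible encoding such as UTF-8 without a byte-order mark, with the integer before the first ',' as the key. OffsetSort, SortFile, RecordSpan and AddRecord are hypothetical names. Only the keys and the (position, length) pairs are kept in memory; the record bytes are re-read with FileStream.Seek during the second pass.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class OffsetSort
{
    // Byte position and byte length of one record (line terminator included).
    private struct RecordSpan
    {
        public long Start;
        public int Length;
    }

    // Hypothetical helper: sorts a file by the integer before the first ','
    // on each line without loading the record text into memory.
    public static void SortFile(string inputPath, string outputPath)
    {
        var keys = new List<int>();
        var spans = new List<RecordSpan>();

        // Pass 1: scan the bytes once, noting where each record starts and ends.
        using (var input = new FileStream(inputPath, FileMode.Open, FileAccess.Read))
        {
            var line = new List<byte>();
            long start = 0;
            int b;
            while ((b = input.ReadByte()) != -1)
            {
                line.Add((byte)b);
                if (b == '\n')
                {
                    AddRecord(line, start, keys, spans);
                    start += line.Count;
                    line.Clear();
                }
            }
            if (line.Count > 0)
                AddRecord(line, start, keys, spans); // last line without '\n'
        }

        int[] keyArray = keys.ToArray();
        RecordSpan[] spanArray = spans.ToArray();
        Array.Sort(keyArray, spanArray); // reorders the spans into key order

        // Pass 2: copy each record from its original position, in sorted order.
        using (var input = new FileStream(inputPath, FileMode.Open, FileAccess.Read))
        using (var output = new FileStream(outputPath, FileMode.Create, FileAccess.Write))
        {
            foreach (RecordSpan span in spanArray)
            {
                var buffer = new byte[span.Length];
                input.Seek(span.Start, SeekOrigin.Begin);
                int read = 0;
                while (read < buffer.Length)
                {
                    int n = input.Read(buffer, read, buffer.Length - read);
                    if (n == 0) throw new EndOfStreamException();
                    read += n;
                }
                output.Write(buffer, 0, buffer.Length);
                // A final record lacking '\n' may now land mid-file, so add one.
                if (buffer[buffer.Length - 1] != (byte)'\n')
                    output.WriteByte((byte)'\n');
            }
        }
    }

    // Parses the key from one record's bytes and remembers where the record lives.
    private static void AddRecord(List<byte> line, long start, List<int> keys, List<RecordSpan> spans)
    {
        string text = Encoding.UTF8.GetString(line.ToArray());
        string field = text.Split(',')[0].Trim();
        keys.Add(int.TryParse(field, out int key) ? key : int.MinValue);
        spans.Add(new RecordSpan { Start = start, Length = line.Count });
    }
}
```

Reading through FileStream.ReadByte keeps the sketch short; FileStream buffers internally, so this is not one system call per byte.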
[edit: Matt T Heffron]
From your comment, it looks like you can pull the whole file into memory, so here is a modified version of the code from your comment. I'm a little unclear on the nature of your sort key. The example in the question appears to have floating-point keys (shown with "," as the decimal separator). However, the code in your comment appears to use only the integer before the first comma as the sort key, while, again, the original example implies a secondary sort on the second number. I'll show this as sorting by the first integer column, ignoring the second column:
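// Requires: using System; using System.IO;
var lines = File.ReadAllLines(fileunordred);
int[] allCustomerIds = new int[lines.Length];
char[] splitter = new char[] { ',' };
for (int ix = 0; ix < lines.Length; ++ix)
{
    // Split off only the first field; the rest of the line is ignored
    var splitLine = lines[ix].Split(splitter, 2);
    int customerId;
    if (!int.TryParse(splitLine[0], out customerId))
    {
        customerId = -1; // sorts ahead of any non-negative id
    }
    allCustomerIds[ix] = customerId;
}
// Sorts the ids and rearranges lines the same way
Array.Sort(allCustomerIds, lines);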
Both arrays are now sorted in ascending numerical order by customer id. Just use File.WriteAllLines("filename", lines) to write the sorted file.
[Edit #2: Matt T Heffron]
This is very similar to the above, but now it compares first by the first integer and then, among lines with the same first integer, by the second integer.
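// Requires: using System; using System.IO;
var lines = File.ReadAllLines(fileunordred);
// One two-element array per line: [first integer, second integer]
int[][] allInfo = new int[lines.Length][];
char[] splitter = new char[] { ',' };
for (int ix = 0; ix < allInfo.Length; ++ix)
{
    var splitLine = lines[ix].Split(splitter);
    int[] pair = new int[2];
    allInfo[ix] = pair;
    int id;
    // Use -1 when a column is missing or is not an integer
    if (!int.TryParse(splitLine[0], out id))
    {
        id = -1;
    }
    pair[0] = id;
    if (splitLine.Length < 2 || !int.TryParse(splitLine[1], out id))
    {
        id = -1;
    }
    pair[1] = id;
}
// Compare by the first integer; break ties with the second
Array.Sort(allInfo, (a, b) => {
    int comp = a[0].CompareTo(b[0]);
    return comp == 0 ? a[1].CompareTo(b[1]) : comp;
});
// "out" is a C# keyword, so the writer needs a different name
using (var writer = new StreamWriter("outputfilename"))
{
    foreach (var pair in allInfo)
    {
        writer.WriteLine("{0},{1}", pair[0], pair[1]);
    }
}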
(I haven't actually tried this, but it should be close...)
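The second version writes back only the two parsed numbers, so any further columns on a line are lost. A hedged alternative, assuming the same leading two-integer format, keeps the original lines and sorts them alongside a value-tuple key: (int, int) tuples compare by the first item and then by the second, which is the same ordering as the lambda comparer. TwoKeySort, SortByFirstTwoInts and ParseOrDefault are hypothetical names.

```csharp
using System;

public static class TwoKeySort
{
    // Hypothetical helper: orders lines by the first integer column, then by
    // the second, keeping each original line intact. Missing or unparseable
    // fields become -1, matching the sentinel used in the code above.
    public static string[] SortByFirstTwoInts(string[] lines)
    {
        var sorted = (string[])lines.Clone();
        var keys = new (int First, int Second)[sorted.Length];

        for (int i = 0; i < sorted.Length; i++)
        {
            string[] fields = sorted[i].Split(',');
            keys[i] = (ParseOrDefault(fields, 0), ParseOrDefault(fields, 1));
        }

        // ValueTuple's default comparison is lexicographic: First, then Second.
        Array.Sort(keys, sorted);
        return sorted;
    }

    private static int ParseOrDefault(string[] fields, int index)
    {
        return (index < fields.Length && int.TryParse(fields[index], out int value)) ? value : -1;
    }
}
```

The result can go straight to File.WriteAllLines, as in the first version.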