Click here to Skip to main content
15,887,135 members
Please Sign up or sign in to vote.
3.29/5 (3 votes)
See more:
I have a file with 5 million records which are include of numbers
that they are out of sequence (irregular)
you can find file structure below :

SQL
for instance          desired Result
------------          ---------------
 723,80                1,4
 14,50                 1,5
 723,2                 10,8
 1,5                   14,50
 10,8                  723,2
 1,4                   723,80



This structure displays bad condition and optimum condition and I
expect to reach the optimum

The most important (the main) tip :
I didn't use any techniques such as linq, ....
I want to do it with available algorithms and arrange the file.

furthermore (more over) the time should be considered
so, we need to use a proper algorithm to put the numbers in order
under a minute

Thanks
Posted
Updated 28-Dec-13 23:34pm
v2
Comments
BillWoodruff 20-Dec-13 12:49pm    
CodeProject has lots of good articles on sorting:

http://www.codeproject.com/search.aspx?q=c%23+quicksort&doctypeid=1
Philippe Mori 20-Dec-13 18:27pm    
In 1 minute, you can do a lot more than sorting 5 millions items. You should be able to sort them in a few seconds without much effort.

If the whole file would fit in memory, I'd read it all in with File.ReadAllLines().
Make 1 pass through all of the lines and build a parallel array that has the parsed numeric keys.
Then use Array.Sort<TKey, TValue>(TKey[] keys, TValue[] items) to do the sorting.
(This will sort both arrays together by the sort order of the keys array.)
Then rewrite the file using the sorted array of the lines.

If the whole thing will not fit in memory, then you could read the file line-by-line, extract and parse the key into an array, and simultaneously construct a parallel array of objects with the start byte position in the file and the byte length of each record.
Do the Array.Sort() as above then write a new output file, copying each record from the input file by seeking to the record position and copying length bytes to the output file.

[edit: Matt T Heffron]
From your comment it looks like you can pull the whole thing into memory, so modifying the code in your comment. I'm a little unclear on the nature of your sort key. It appears from the example in the question that you have floating point keys (shown with the "," as the decimal separator). However in your comment's code, you appear to be using only the integer before the first comma as the sort key, but, again, the original example implies a secondary sort based on the second number. I'll show this as sorting by the first integer column case, ignoring the second column:
C#
var lines = File.ReadAllLines(fileunordred);
int[] allCustomerIds = new int[lines.Length];  // make it the same length as lines
char[] splitter = new char[]{','};
for (int ix = 0; ix < ix.Length; ++ix)
{
  var splitLine = lines[ix].Split(splitter, 2);
  int customerId;
  if (!int.TryParse(splitLine[0], out customerId)
  {
    // error parsing the data, do something "sensible"
    allCustomerIds[ix] = -1;  // some value to indicate a "bad" row, to sort together
  }
  allCustomerIds[ix] = customerId;
}
Array.Sort(allCustomerIds, lines);
// the <int,int> is wrong 
// and <int,string> is unnecessary since the compiler can figure it out.

both arrays are now sorted in ascending numerical order by the customer id.
just use File.WriteAllLines("filename", lines) to make the sorted file.

[Edit #2: Matt T Heffron]
Mostly very similar to the above, but now it compares first by the first integer and then by the second integer within the group that has the same first integer.
C#
var lines = File.ReadAllLines(fileunordred);
int[] allInfo = new int[lines.Length];  // make it the same length as lines
char[] splitter = new char[]{','};
for (int ix = 0; ix < allInfo.Length; ++ix)
{
  var splitLine = lines[ix].Split(splitter);
  int[] pair= new int[2];
  allInfo[ix] = pair;
  int id;
  if (!int.TryParse(splitLine[0], out id))
  {
    // error parsing the data, do something "sensible"
    id = -1;  // some value to indicate a "bad" row, to sort together
  }
  pair[0] = id;
  if (!int.TryParse(splitLine[1], out id))
  {
    // error parsing the data, do something "sensible"
    id = -1;  // some value to indicate a "bad" row, to sort together
  }
  pair[1] = id;
}
Array.Sort(allInfo, (a, b) => {
  int comp = a[0].CompareTo(b[0]);
  return comp == 0 ? a[1].CompareTo(b[1]) : comp;
});
//Now just rewrite the file from the integers, (the lines array IS NOT sorted)
using (var out = new StreamWriter("outputfilename"))
{
  foreach (var pair in allInfo)
  {
    out.WriteLine("{0},{1}", pair[0], pair[1]);
  }
}

(I haven't actually tried this but it should be close...)
 
Share this answer
 
v4
Comments
log988 20-Dec-13 15:20pm    
I use this code but it gives me an error
Error:argument type string is not assignable to parameter type int[]
This line gives the error: Array.Sort<int,int>(customerId, orderId);



var lines = File.ReadAllLines(fileunordred);
foreach (var line in lines)
{
var splitLine = line.Split(',');
var customerId = splitLine[0];
var orderId = splitLine[1];

Array.Sort<int,int>(customerId, orderId);
}
Matt T Heffron 20-Dec-13 17:52pm    
I updated the solution to address this. But consider the comments re: the sort key.
log988 21-Dec-13 5:29am    
You wrote the code that does the wrong sort.
https://www.dropbox.com/s/nexiis9tp4jlle4/Capture.JPG
If I have to spend a lot of time to read line by line
Is that true?
In fact, if you are sorting.
1,1835
1,1000410
1,2001721
1,3000687
1,4000107
1,5000407
Matt T Heffron 23-Dec-13 15:21pm    
Well that's the best I could do with the information provided.
Your attempted solution showed splitting on the comma and then trying to sort on the first column.
From the image, what you want is a much simpler sort with the Whole line as the "key" and sorting as a floating point number (with the "," as the decimal separator).
I can rework this in awhile....
Matt T Heffron 23-Dec-13 16:10pm    
Or, if they really are two independent integers, you could sort as strings, but then "2,..." would sort after "10,..." and "1,1835" would sort after "1,1000410". Is that acceptable?
Quote:
I didn't use any tecniques such as linq, ....
I want to do it with available algorithms and arrange the file.

One of best sorting algorithms is quicksort[^]. Happy coding.
 
Share this answer
 
Comments
Maciej Los 24-Dec-13 2:06am    
+5!
CPallini 24-Dec-13 3:04am    
Thank you.
5 million records? Do you want to torture yourself using flat file to store large amount of data?

Firstly, i would suggest to change the way you store the data. But in a meanwhile - use OleDb. It should be the easiest and the quickest way to sort data. Please see:
Other Text File Driver Programming Details[^]
Examples:
My past answers[^]
How to: Use OleDb to import text files (tab, csv, custom)[^]
 
Share this answer
 
Comments
log988 24-Dec-13 1:40am    
I am not allowed to use oledb
Maciej Los 24-Dec-13 1:52am    
Why?
log988 24-Dec-13 2:04am    
We are told that none of the existing facilities can not be used and should we sort our data (this is an exercise), please see Solution 4.

Thank you
Maciej Los 24-Dec-13 2:08am    
Where it is? I don't see it. I see solution: 1,2,3,5...
log988 24-Dec-13 2:15am    
please see Solution 6
The DOS SORT utility appears to sort that as specified; have you tried it?

Or is this homework?
 
Share this answer
 
Comments
log988 20-Dec-13 12:46pm    
no I dont use Dos sort also it is not my homework...
PIEBALDconsult 20-Dec-13 12:50pm    
Try it to at least get a baseline of how long it takes.
I wrote a code that will sort based on decimals. The quick sort algorithm. But my issue is the removal of the number 0 in some decimals. Eg 723.1000 becomes 723.1 Numbers are important to me because I want to be displayed and stored as

Quote:@
Quote:
Matt T Heffron
From the image, what you want is a much simpler sort with the Whole line as the "key" and sorting as a floating point number (with the "," as the decimal separator).



C#
internal class Sorter
    {
        public string[] QuickSort(string[] array, int low, int high)
        {
            if (low < high)
            {
                int p = Partition(array, low, high);
                QuickSort(array, low, p - 1);
                QuickSort(array, p + 1, high);
            }
 
            return array;
        }
 
        private static int Partition(string[] array, int low, int high)
        {
            int left = low + 1, right = high;
            string temp;
            double pivot = double.Parse(array[low]);
            int piv;
 
            while (left <= right)
            {
                while (left <= right && Convert.ToDouble(array[left]) <= pivot)
 
               
                    left++;
 
                while (left <= right && Convert.ToDouble(array[right]) >= pivot)
 
                
                    right--;
 
                if (left < right)
                {
                    temp = array[left];
                    array[left] = array[right];
                    array[right] = temp;
                }
            }
            if (right == high)
                piv = high;
            else if (right == low)
                piv = low;
            else
                piv = left - 1;
            array[low] = array[piv];
            array[piv] = pivot.ToString();
            return piv;
        }
    }
 
Share this answer
 
Comments
Maciej Los 24-Dec-13 2:33am    
723.1000 becomes 723.1
Values are equal, no matter how many zeros you'll add at the end!

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900