|
George_George wrote: How do you think of this solution.
I tested it with 16,000,000 integers and it took ~1 second.
"Normal is getting dressed in clothes that you buy for work and driving through traffic in a car that you are still paying for, in order to get to the job you need to pay for the clothes and the car and the house you leave vacant all day so you can afford to live in it." - Ellen Goodman
"To have a respect for ourselves guides our morals; to have deference for others governs our manners." - Laurence Sterne
|
|
|
|
|
Hi DavidCrow,
I can not believe your system is so fast -- looks like some super computers. What machine are you using?
regards,
George
|
|
|
|
|
George_George wrote: What machine are you using?
It is a Dell Optiplex 170L with a 2.4GHz CPU and 1GB of RAM. I also tried it on an IBM ThinkPad A21m with an 800MHz CPU and 512MB of RAM. It took ~4 seconds.
"Normal is getting dressed in clothes that you buy for work and driving through traffic in a car that you are still paying for, in order to get to the job you need to pay for the clothes and the car and the house you leave vacant all day so you can afford to live in it." - Ellen Goodman
"To have a respect for ourselves guides our morals; to have deference for others governs our manners." - Laurence Sterne
|
|
|
|
|
Thanks DavidCrow,
It is appreciated if we could go back to the topic that the size of data (tens of G, not several) is not enough to hold at one time in physical memory. Any good ideas are appreciated.
regards,
George
|
|
|
|
|
George_George wrote: Any good ideas are appreciated.
One has already been offered. See here.
"Normal is getting dressed in clothes that you buy for work and driving through traffic in a car that you are still paying for, in order to get to the job you need to pay for the clothes and the car and the house you leave vacant all day so you can afford to live in it." - Ellen Goodman
"To have a respect for ourselves guides our morals; to have deference for others governs our manners." - Laurence Sterne
|
|
|
|
|
Thanks DavidCrow,
I am stilling looking for more efficient solutions, how do you think using nth_element? I could save time to do sorting, since I only want the 1000 largest elements, no need to sort them.
regards,
George
|
|
|
|
|
George_George wrote: I am stilling looking for more efficient solutions...
Unless you have actual metrics from each solution, how do you know what is efficient and what is not?
George_George wrote: how do you think using nth_element?
How do you plan on using that with multiple files? At some point in time, you'll need the largest 1,000 from each file in memory, or in a separate file, so that the largest 1,000 from everything can be selected.
"Normal is getting dressed in clothes that you buy for work and driving through traffic in a car that you are still paying for, in order to get to the job you need to pay for the clothes and the car and the house you leave vacant all day so you can afford to live in it." - Ellen Goodman
"To have a respect for ourselves guides our morals; to have deference for others governs our manners." - Laurence Sterne
|
|
|
|
|
Thanks DavidCrow,
I appreciate your comments above. I must make my hands dirty to evaluate further.
regards,
George
|
|
|
|
|
So what have you discovered with this?
"Normal is getting dressed in clothes that you buy for work and driving through traffic in a car that you are still paying for, in order to get to the job you need to pay for the clothes and the car and the house you leave vacant all day so you can afford to live in it." - Ellen Goodman
"To have a respect for ourselves guides our morals; to have deference for others governs our manners." - Laurence Sterne
|
|
|
|
|
Hi,
I didn't check the correctness of the answers above. But you might use the auto sort mechanisme of std::map.
std::map<int, int> theMap;
int data[10];
data[0] = 1;
data[1] = 7;
data[2] = 3;
data[3] = 8;
data[4] = 1;
data[5] = 9;
data[6] = 12;
data[7] = 11;
data[8] = 13;
data[9] = 6;
for(int iIndex = 0; iIndex < 10; iIndex++)
{
if(theMap.size() < 4)
{
theMap[data[iIndex]] = data[iIndex];
}
else
{
if(data[iIndex] > theMap.begin()->second)
{
theMap[data[iIndex]] = data[iIndex];
theMap.erase(theMap.begin());
}
}
}
I don't know if this is performant on millions of data, but its a simple straightforward algorithm .
codito ergo sum
|
|
|
|
|
Thanks BadKarma,
Your solution works if we could contain all the data into memory (a Set instance), but in my situation the data size can not be afforded in memory. It is appreciated if you could share some insights with us.
regards,
George
|
|
|
|
|
In my solution i only contain 4 integer in memory.
The example does use an int array but that is just to provide me with input datan this data could alsoo be extracted from a file or any other means of input.
codito ergo sum
|
|
|
|
|
Thanks BadKarma,
I understand your solution now. Do you have any ideas to make improvements?
regards,
George
|
|
|
|
|
Hello!
I`m not that of an expert to provide scientific solutions, but having your problem and trying not to take into account what others have said before (nth_element and Map don`t tell anything to me, yet), I`m thinking at two solutions:
You say that you have the data into files. Well then, why not reading them line by line (or using a buffer, I don`t know the format of your file(s)) and by this way you will only work on several MBs of RAM, the rest is either read or will be read, but stored into that file.
--
For example, use a buffer of 5000 elements. Read the first 5000 elements. While reading them, you could already start building the Highest1000 vector (which I also suggest you sort it after adding one value, for efficiency). Read the next 5000 elements... modify the Highest1000 vector... and so on.
---
It might take some time, i know, but you don`t kill your computer and ... if you need the application to do something else while sorting, try using a multithreaded application.
--Even more effective it would be a thread to do reading/writing aspects, and another one to do the sorting and interplacing the new values (I`m afraid I cannot provide so much information on how to make the 2 treads work synchronous - global variables is all that comes to me)
The 2nd solution is actually adding to the first one, by saving your partial results into another file. So: if now let`s say you have 1G of data, save the highest 1000 highest values, and the next time you run your app and this time you have 2Gs of data, address the rest (which is 1G, again).
If timing is such a concern, you should think that actually this is the way Windows overcomes the lack of RAM, by writing data to HDD. It`s true that the system might get slow while reading/writing into files, but that`s the nature of your application. When working with such amounts of data, you cannot expect a pocket pc to do it while serving other applications and doing it fast. You say that you have 2G of memory, which is enough ...
Please provide feedback, I`ve been thinking to your problem and I`m interested in your result.
Best Wishes,
Shpid3r
PS: sorting is done the fastest by splitting in 2: let`s say the first position of Highest1000 vector is the lowest value, and that the last value (while adding, so not the 1000th) is the highest. Then, compare with the last one, then with the one at the half, and so on...
|
|
|
|
|
Thanks Shpid3r!
I like your multithread idea, really cool.
regards,
George
|
|
|
|
|
Adding threads to an application rarely makes it faster on a single CPU machine. In that scenario, context switching will be the bottleneck. Stick with solving the problem in the manner that is easiest for you rather than add the complexity of more threads.
"Normal is getting dressed in clothes that you buy for work and driving through traffic in a car that you are still paying for, in order to get to the job you need to pay for the clothes and the car and the house you leave vacant all day so you can afford to live in it." - Ellen Goodman
"To have a respect for ourselves guides our morals; to have deference for others governs our manners." - Laurence Sterne
|
|
|
|
|
Yes, DavidCrow, my desktop is single CPU. I will write a program to make a benchmark to see whether multithread is better. I am not sure whether in-memory sorting could be doing at the same time when I/O. Could I/O be done without control of CPU, so that CPU could be freed to do sorting? I am not sure.
regards,
George
|
|
|
|
|
hai george.
here i have worked it out using linked lists and iam posting the algorithm here.
it took 30 seconds (approx) to find the largest 1000 values from a chunk of 10 crore random values.
iam waiting for your comments to improvise the logic.
<small>and my sincere apologies to the members if my post is too big</small>.
struct node
{
struct node *prev;
unsigned long value;
struct node *next;
};
struct node *ll,*prevnode,*first,*last;
ll=(struct node *)malloc(sizeof(struct node));
ll->prev=NULL;
ll->value=0;
first=ll;
prevnode=ll;
//creating 1000 nodes and inserting 0s in all the nodes.
for(int i=1;i<1000;i++)
{
struct node *newnode;
newnode =(struct node *)malloc(sizeof(struct node));
prevnode->next=newnode;
newnode->prev=prevnode;
newnode->next=NULL;
newnode->value=0;
prevnode=newnode;
}
last=prevnode;
// reading the values from the file and inserting them at the appropriate slots.
FILE *fp;
fp=fopen("E:\\chandu\\values","rb");
for(;;)
{
unsigned long j;
fread(&j,4,1,fp);
if(feof(fp))
break;
if(j<=last->value)
{
//do nothing. just ignore it because, it nomore fits into the top 1000 list.
}//end if
else if(j>=first->value)
{
//then it is the most largest element sofar, and add it at the beginning
struct node *newnode;
newnode =(struct node *)malloc(sizeof(struct node));
newnode->prev=NULL;
newnode->value=j;
newnode->next=first;
first->prev=newnode;
first=newnode;
//deleting the last node, since a new element has entered the list.
struct node *temp;
temp=last;
last->prev->next=NULL;
last=last->prev;
free(temp);
}//end else if.
else
{
//traversing till the appropriate place and inserting it there.
struct node *curnode;
curnode=first;
while(curnode->value>j)
curnode=curnode->next;
struct node *newnode;
newnode =(struct node *)malloc(sizeof(struct node));
newnode->next=curnode;
newnode->value=j;
newnode->prev=curnode->prev;
curnode->prev->next=newnode;
curnode->prev=newnode;
//deleting the last node, since a new element has entered the list.
struct node *temp;
temp=last;
last->prev->next=NULL;
last=last->prev;
free(temp);
}//end else.
}//end for
--------------------------------------------
Suggestion to the members:
Please prefix your main thread subject with [SOLVED] if it is solved.
thanks.
chandu.
-- modified at 7:21 Saturday 10th November, 2007
|
|
|
|
|
chandu004 wrote: it took 30 seconds (approx)...
Which is pitifully slow. I created a linked list in C that found the largest 1,000 elements out of 1,000,000 possibilities in ~2 seconds. Changing that to 2,000,000 possibilities upped the time to ~3 seconds.
"Normal is getting dressed in clothes that you buy for work and driving through traffic in a car that you are still paying for, in order to get to the job you need to pay for the clothes and the car and the house you leave vacant all day so you can afford to live in it." - Ellen Goodman
"To have a respect for ourselves guides our morals; to have deference for others governs our manners." - Laurence Sterne
|
|
|
|
|
Hi chandu,
One minor comment to the implementation organization. I think you can merge the implementation logics of two blocks,
1. else if(j>=first->value)
2. else
into one block, right?
regards,
George
|
|
|
|
|
hi...
i have to get a string via text(edit)box.
then i have to convert it to TCHAR.
because my operation should be do in TCHAR.
then i have to do some operation.
last i to covert once again it to string and display in a edit box.
please anybody can help me?
regards,
Paulraj.G
paulraj
|
|
|
|
|
You already asked this question yesterday, what was the problem with the answers ?
Anyway, a very complete link about CString management (and how to convert them) can be found here[^]
|
|
|
|
|
I FOLLEWED YOUR WAY.
BUT THE ANSWER WILL BE LIKE "䚚秅嬸해㺌ຮ晗"
paulraj
|
|
|
|
|
No need to shout.
What did you do exactly ? Post some relevant snippet of code.
|
|
|
|
|
Probably you made mistake(s) on that way,
maybe posting your code snippet will help.
If the Lord God Almighty had consulted me before embarking upon the Creation, I would have recommended something simpler.
-- Alfonso the Wise, 13th Century King of Castile.
|
|
|
|
|