Hi, I want to ask: what is the best way to represent a huge matrix of digits that needs to be processed? It is too big to be processed in main memory all at once.
Updated 26-May-10 22:47pm
Comments
Moak 27-May-10 4:50am    
Updated the question. Perhaps narrow down which programming language you want to use.

1 solution

It all depends on what you want to do with your matrix and what sort of data it holds.

If it's a sparse matrix, you can represent it fairly efficiently as a std::map<std::pair<unsigned, unsigned>, double>, which stores only the non-zero entries, keyed by (row, column).
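
A minimal C++ sketch of the std::map approach, assuming unsigned row/column indices and double values as in the type above. `SparseMatrix` and its `set`/`get`/`nonZeros` members are illustrative names, not a library API. Cells that were never written cost nothing, so a 1,000,000 x 1,000,000 matrix with a handful of entries stays small.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>

// Stores only the non-zero cells of a rows x cols matrix, keyed by (row, column).
class SparseMatrix {
public:
    SparseMatrix(unsigned rows, unsigned cols) : rows_(rows), cols_(cols) {}

    // Writing 0.0 erases the cell so that the map stays sparse.
    void set(unsigned r, unsigned c, double value) {
        assert(r < rows_ && c < cols_);
        const auto key = std::make_pair(r, c);
        if (value == 0.0) {
            cells_.erase(key);
        } else {
            cells_[key] = value;
        }
    }

    // Cells that were never set (or were set to 0.0) read back as 0.0.
    double get(unsigned r, unsigned c) const {
        assert(r < rows_ && c < cols_);
        const auto it = cells_.find(std::make_pair(r, c));
        return it == cells_.end() ? 0.0 : it->second;
    }

    std::size_t nonZeros() const { return cells_.size(); }

private:
    unsigned rows_;
    unsigned cols_;
    std::map<std::pair<unsigned, unsigned>, double> cells_;
};
```

Each stored entry costs a tree node (key, value and bookkeeping pointers), so this only pays off when most cells really are zero.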

If you want sequential access, store your data in a file and read it with an fstream, or with an istream_iterator if you're feeling flash.
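
For sequential access, something along these lines would work, assuming (hypothetically) a text file with one matrix row per line and whitespace-separated numbers. `forEachRow` and `forEachRowInFile` are made-up helper names. Only the current row is held in memory; the istream_iterator does the number parsing.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <istream>
#include <iterator>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Calls handle(row) for each line of `in`, where a line is one matrix row of
// whitespace-separated numbers. Only the current row is kept in memory.
// Returns the number of rows processed.
template <typename RowHandler>
std::size_t forEachRow(std::istream& in, RowHandler handle) {
    std::size_t count = 0;
    std::string line;
    std::vector<double> row;  // reused buffer for the current row
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        // istream_iterator<double> parses numbers until the end of the line.
        row.assign(std::istream_iterator<double>(fields),
                   std::istream_iterator<double>());
        handle(row);
        ++count;
    }
    return count;
}

// Same, reading from a file on disk via std::ifstream.
template <typename RowHandler>
std::size_t forEachRowInFile(const std::string& path, RowHandler handle) {
    std::ifstream file(path);
    if (!file) throw std::runtime_error("cannot open " + path);
    return forEachRow(file, handle);
}
```

Usage might look like `forEachRowInFile("matrix.txt", [](const std::vector<double>& row) { /* process one row */ });`, where `matrix.txt` is a placeholder file name.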

If you're doing clustered access, have a look at memory-mapped files. These let you treat a chunk of a file as memory by mapping it into the address space of your process.
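
A minimal sketch of the memory-mapped approach using POSIX `mmap`, so Linux/macOS only (on Windows the corresponding calls are `CreateFileMapping` and `MapViewOfFile`). It assumes, hypothetically, that the matrix has already been written to a binary file as raw `double`s in row-major order; `MappedMatrix` is an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

#include <fcntl.h>     // open
#include <sys/mman.h>  // mmap, munmap
#include <sys/stat.h>  // fstat
#include <unistd.h>    // close

// Read-only view of a rows x cols matrix of doubles stored row-major in a
// binary file. The OS pages data in on demand, so only the regions you
// actually touch need to be resident in RAM.
class MappedMatrix {
public:
    MappedMatrix(const std::string& path, std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), bytes_(rows * cols * sizeof(double)) {
        const int fd = ::open(path.c_str(), O_RDONLY);
        if (fd < 0) throw std::runtime_error("cannot open " + path);
        struct stat st {};
        if (::fstat(fd, &st) != 0 ||
            static_cast<std::size_t>(st.st_size) < bytes_) {
            ::close(fd);
            throw std::runtime_error("file too small for matrix: " + path);
        }
        void* p = ::mmap(nullptr, bytes_, PROT_READ, MAP_SHARED, fd, 0);
        ::close(fd);  // the mapping remains valid after the descriptor is closed
        if (p == MAP_FAILED) throw std::runtime_error("mmap failed: " + path);
        data_ = static_cast<const double*>(p);
    }

    ~MappedMatrix() { ::munmap(const_cast<double*>(data_), bytes_); }

    // Copying would unmap the same region twice, so forbid it.
    MappedMatrix(const MappedMatrix&) = delete;
    MappedMatrix& operator=(const MappedMatrix&) = delete;

    // Element (r, c) lives at byte offset (r * cols + c) * sizeof(double).
    double at(std::size_t r, std::size_t c) const {
        assert(r < rows_ && c < cols_);
        return data_[r * cols_ + c];
    }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::size_t bytes_;
    const double* data_ = nullptr;
};
```

With a row-major layout, walking along a row touches consecutive pages, which is why clustered access suits this approach; jumping between distant rows brings in a fresh page each time.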

If you want random access with a working set bigger than your system's memory, then whatever data format you use you're going to be stuffed: it's going to be slow. You might be able to alleviate this with some form of database-style hashing or indexing, but it really depends on your application.

Cheers,

Ash
 
Comments
Member 7203335 27-May-10 6:34am    
Thank you Ash,

The data I am going to process are rows of data tuples from a database, and I need to measure the distance between their attributes to determine similarity. You may say it's better to do this in SQL, and that's right, but this process is only one part of a complete algorithm that will use the results at run time (for example, to do clustering). So it's not even a sparse matrix.

Thanks,
Ibrahim
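
To make the size problem concrete: n tuples give an n x n distance matrix, so 100,000 tuples already mean 10^10 entries, about 80 GB as doubles, even if the tuples themselves fit in memory. The sketch below assumes all attributes are numeric and uses Euclidean distance purely as a stand-in for whatever similarity measure is actually needed; `euclidean` and `distanceRows` are hypothetical helpers. Because a distance matrix is symmetric with a zero diagonal, only the upper triangle is produced, one row at a time, and each row is handed to a caller-supplied sink (for example, one that appends it to a file) instead of being kept.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Tuple = std::vector<double>;  // assumption: every attribute is numeric

// Euclidean distance between two tuples with the same number of attributes.
double euclidean(const Tuple& a, const Tuple& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t k = 0; k < a.size(); ++k) {
        const double d = a[k] - b[k];
        sum += d * d;
    }
    return std::sqrt(sum);
}

// Calls sink(i, row) for each tuple i, where row holds d(i, j) for j > i.
// Only one row of distances is in memory at any time.
template <typename RowSink>
void distanceRows(const std::vector<Tuple>& tuples, RowSink sink) {
    std::vector<double> row;
    for (std::size_t i = 0; i < tuples.size(); ++i) {
        row.clear();
        for (std::size_t j = i + 1; j < tuples.size(); ++j) {
            row.push_back(euclidean(tuples[i], tuples[j]));
        }
        sink(i, row);
    }
}
```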
Aescleal 27-May-10 7:14am    
BIG IF warning... this post contains several ifs that may not apply!

If access to the database is sequential, you could try wrapping an SQL cursor in a C++ input iterator. That'll save you a fair amount of space, as you don't read the entire record set at once. You can probably also use something like std::transform to write your results straight out to a file for further processing, avoiding yet more memory use.
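
A sketch of the cursor-as-input-iterator idea. The database side is hypothetical: `VectorCursor` is an in-memory stand-in that only assumes a `bool fetch(Row&)` member returning false when the result set is exhausted, and you would adapt it to your database client's real fetch call. `CursorIterator` (also an illustrative name) turns any such cursor into a single-pass input iterator, so standard algorithms like `std::transform` consume rows one at a time.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

using Row = std::vector<double>;

// Hypothetical stand-in for a database cursor: fetch() copies the next row
// into `out` and returns false once the result set is exhausted.
class VectorCursor {
public:
    explicit VectorCursor(std::vector<Row> rows) : rows_(std::move(rows)) {}
    bool fetch(Row& out) {
        if (next_ == rows_.size()) return false;
        out = rows_[next_++];
        return true;
    }
private:
    std::vector<Row> rows_;
    std::size_t next_ = 0;
};

// Single-pass input iterator over any cursor with `bool fetch(Row&)`.
// A default-constructed iterator acts as the end-of-results sentinel.
template <typename Cursor>
class CursorIterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = Row;
    using difference_type = std::ptrdiff_t;
    using pointer = const Row*;
    using reference = const Row&;

    CursorIterator() = default;
    explicit CursorIterator(Cursor& cursor) : cursor_(&cursor) { advance(); }

    reference operator*() const {
        assert(cursor_ != nullptr && "dereferencing end iterator");
        return current_;
    }
    pointer operator->() const { return &**this; }
    CursorIterator& operator++() { advance(); return *this; }
    CursorIterator operator++(int) { CursorIterator old = *this; advance(); return old; }

    // Equal when sharing a cursor, or when both are at the end.
    friend bool operator==(const CursorIterator& a, const CursorIterator& b) {
        return a.cursor_ == b.cursor_;
    }
    friend bool operator!=(const CursorIterator& a, const CursorIterator& b) {
        return !(a == b);
    }

private:
    // Fetch the next row; on exhaustion, become the end iterator.
    void advance() {
        if (cursor_ != nullptr && !cursor_->fetch(current_)) cursor_ = nullptr;
    }
    Cursor* cursor_ = nullptr;
    Row current_;
};

// Applies f to each row as it is fetched and writes the result through `out`,
// so the full record set is never held in memory.
template <typename Cursor, typename OutIt, typename F>
OutIt transformCursor(Cursor& cursor, OutIt out, F f) {
    return std::transform(CursorIterator<Cursor>(cursor), CursorIterator<Cursor>(), out, f);
}
```

To send results straight to a file, pass an ostream_iterator as the output, e.g. `std::ofstream out("results.txt"); transformCursor(cursor, std::ostream_iterator<double>(out, "\n"), rowToValue);`, where the file name and `rowToValue` are placeholders.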

No idea if this helps at all...

Cheers,

Ash

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


