
Re: 2,314 Exabytes
kalberts, 19-Dec-18 4:36
How much of it is unique information? (That's a rhetorical question - I don't expect anyone to know).

On every computer I have worked with (including my own home PC), a significant percentage of the disk is occupied by duplicated files, or duplicated parts of files. Open-source header files for C are notorious: they may have a 50+ line license text, followed by a couple of lines of declarations (I have seen cases with just one); the 50+ lines are identical in all the files. Some libraries put each function definition in a separate .c file, repeating the license text from the header file. The .c file has more useful information than the .h file, but very often fewer lines than the license.

If you do program development, chances are that you will find the same utilities in several directories. If you keep your photos on the PC, chances are that you have quite a few duplicates (unless you are a very orderly person :) ). How many backups of the same, unchanged file do you have on a multitude of USB sticks, CDs, external hard disks etc.? And so on and so on.

I will spend Christmas vacation completing my self-made deduplicating backup program. Deduplication is at the file level, not the disk page level, so it will save no space for those extensive license headings. But once I started looking around, I realized that at least half of my disk space is taken up by file duplicates. At work, we have some huge file servers running deduplication at page level; those responsible for them estimate that it reduces the requirements for real disk space to less than a third.
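A minimal sketch of what file-level duplicate detection can look like. This is not the backup program mentioned above, just an illustration under stated assumptions: files are grouped by size first, then by the SHA-256 of their contents, and any group with more than one member is a set of duplicates. The names `file_digest`, `find_duplicate_files` and `wasted_bytes`, and the 1 MiB read size, are made up for this example.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicate_files(root):
    """Return lists of paths under root whose contents are identical."""
    # Pass 1: group by size, which is cheap and rules out most files.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only the files that share a size with another file.
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups


def wasted_bytes(groups):
    """Bytes that could be reclaimed by keeping one copy per group."""
    return sum(os.path.getsize(g[0]) * (len(g) - 1) for g in groups)
```

Calling `wasted_bytes(find_duplicate_files(some_directory))` gives a rough figure for how much of a directory tree is pure file-level duplication. As noted above, this approach sees nothing of a license header repeated inside otherwise different files.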

For even more savings, a lot of information could be encoded more efficiently. Sound and video have come a long way with compression. Lots of medical information is still in uncompressed text form; much of it could be codified, and what must remain as text could be compressed using well-known methods. (This is on the way in, but far from completed.)

Database files are notoriously huge; they often compress quite well - if you compress them. And lots of database developers (of schemas, not code) have not been drilled in database normalization: the same attributes are repeated in two, three or more tables.
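A small illustration of why repetitive, denormalized data compresses so well. It is only a sketch: the rows are invented, `compression_ratio` is a hypothetical helper name, and zlib stands in for whatever compressor a real database or file system would use.

```python
import zlib


def compression_ratio(data: bytes, level: int = 6) -> float:
    """Return compressed size divided by original size (lower is better)."""
    if not data:
        return 1.0
    return len(zlib.compress(data, level)) / len(data)


# Hypothetical denormalized rows: the customer name and city repeat on every line.
rows = "".join(
    f"{order_id},Jane Doe,Springfield,widget,{order_id % 7}\n"
    for order_id in range(10_000)
).encode("utf-8")

print(f"compressed to {compression_ratio(rows):.1%} of original size")
```

The more often the same attribute values are repeated across rows, the more the compressor has to work with; a normalized schema would store those values once and leave less redundancy to squeeze out.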

Disk is so cheap nowadays that no one worries. Terabytes, petabytes, exabytes ... that's all "quite big", with little further distinction. Until you come back from vacation with ten packed 256 GB memory cards holding your HD movies, and start copying them to the hard disk (and maybe your card reader is a USB 2 device)...

Ten years ago, developers stopped being concerned about CPU load: Just buy a faster CPU! Who cares about O()? The same is now happening with disk space - who cares about duplication? I dislike both trends: algorithmic complexity is still essential, and efficient data structures with low redundancy are still essential.
