|
Hi,
Files can be stored on the drive in multiple ways... here are some examples that will cause differing 'Size on Disk'.
FileA.txt stored on Drive A:
If you store a 1KB file (such as a small source code file) on an NTFS partition with the default 4KB cluster size... then the 1KB file has a 4KB 'Size on Disk'. The cluster unfortunately contains 3KB of wasted space.
FileB.txt stored on Drive B:
If you copy that 1KB file and paste it onto an external drive... and that drive saves the file directly into the $MFT, then that 1KB file takes 1KB of space.
Right-clicking FileA.txt in Explorer will show a 'Size on Disk' of 4KB, while right-clicking FileB.txt will show 1KB.
Multiply that 3KB difference by a few thousand small source code files and your folders can show large differences in the reported size on disk.
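The arithmetic above can be sketched in a few lines of Python. This is an illustration only, not Explorer's code; the 4KB default cluster size matches the usual NTFS default mentioned above but is an assumption here:

```python
def size_on_disk(file_size: int, cluster_size: int = 4096) -> int:
    """Round a file's logical size up to a whole number of clusters."""
    if file_size == 0:
        return 0  # empty files allocate no clusters at all
    clusters = -(-file_size // cluster_size)  # ceiling division
    return clusters * cluster_size

# A 1KB file on a 4KB-cluster volume occupies one full cluster:
slack = size_on_disk(1024) - 1024
print(slack)         # 3072 bytes (3KB) wasted for this one file
print(slack * 5000)  # 15360000 bytes wasted across 5000 such files
```

The same function shows why a 16K allocation unit (as on some memory sticks) is so much worse for small files: `size_on_disk(1024, 16384)` comes out at 16384.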
Best Wishes,
-David Delaune
|
|
|
|
|
Randor wrote: FileB.txt stored on Drive B:
If you copy that 1KB file and paste it onto an external drive... and that drive saves the file directly into the $MFT then that 1KB file takes 1KB of space. That 1K in the MFT is spent in both cases, for the metadata. On disk A the total space consumption is 5K, but you won't notice it, because the MFT is allocated in huge chunks - one really huge one when the volume is formatted. That 1K entry is just a block within that already allocated file.
Furthermore: if the file metadata requires, say, 400 bytes, there are only 600 bytes left for file data, so a 1K file won't fit and requires a separate 4K disk page - or whatever the allocation size is for that disk. On my 32 GB memory stick, for some reason the allocation size is 16K. (I could reformat it, but if I fill it up with 32 GB, that will be with huge files, where an 8K space loss at the end, on average, is a drop in the ocean. The benefit of fewer units to retrieve outweighs the space loss.)
|
|
|
|
|
Member 7989122 wrote: That 1K in the MFT is spent in both cases, for the metadata.
You are almost correct.
Any and all files saved on an NTFS partition will result in a 1024-byte MFT entry. So it would be correct to say that all files saved on your NTFS partition have an additional overhead of 1024 bytes for the MFT entry. But to be more technically correct... it *could* result in an additional 2048 bytes if we consider the $MFTMirror.
Member 7989122 wrote: Furthermore: if the file metadata requires, say, 400 bytes, there are only 600 bytes left for file data, so a 1K file won't fit and requires a separate 4K disk page - or whatever the allocation size is for that disk.
Correct. What happens in this case is that the filesystem driver saves the file data into a regular cluster. If that cluster happens to be 16K, then you'll have 15K wasted, plus 1K for the MFT entry and possibly another 1K for the $MFTMirror entry.
Keep in mind that the topic here is 'Why does Explorer report different 'Size on Disk' sizes?'
The answer is that Explorer looks at the NTFS cluster size, and also at whether or not the file has been saved into a cluster. It correctly rounds up to the cluster size to account for the wasted space. If the file has not been saved into a disk cluster, it reports the size of a single MFT entry (only on NTFS). Explorer does not consider the $MFTMirror at all.
There are even more caveats: for a regular file saved into disk clusters, Explorer only reports the size of the clusters and does not include the 1K MFT overhead.
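A rough sketch of that reporting rule, as I read it from this thread - this is a model of the described behaviour, not Explorer's actual implementation:

```python
MFT_RECORD_SIZE = 1024  # every NTFS file carries a 1KB MFT entry

def reported_size_on_disk(file_size: int, cluster_size: int = 4096,
                          mft_resident: bool = False) -> int:
    """Approximate the 'Size on Disk' figure Explorer would show."""
    if mft_resident:
        # Data lives inside the MFT record, so Explorer reports
        # the size of a single MFT entry.
        return MFT_RECORD_SIZE
    # Otherwise: round up to whole clusters. Note that the 1K MFT
    # entry (and any $MFTMirror copy) is never added to this figure.
    return -(-file_size // cluster_size) * cluster_size

print(reported_size_on_disk(800, mft_resident=True))  # 1024
print(reported_size_on_disk(1024))                    # 4096
print(reported_size_on_disk(5000))                    # 8192
```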
Best Wishes,
-David Delaune
|
|
|
|
|
Not necessarily.
One possibility is that the cluster size of the disks differs. The cluster size is the smallest unit that Windows will allocate on a disk, and if this differs between the disks, the amount of disk space taken by a file may differ, too.
Another possibility is that the disks don't use the same file system. If one uses NTFS and the other uses FAT32, the metadata stored for each file differs in size, and so will the total space taken.
Lastly, your old disk may have unused areas within its metadata files: over time, files were created and deleted, and some of the metadata area was left unused. When copying the files to a new disk, any such gaps are eliminated.
If you are still worried, run CHKDSK on the disk. This should tell you if any errors exist. If they do, run CHKDSK /F, which will fix the errors.
Freedom is the freedom to say that two plus two make four. If that is granted, all else follows.
-- 6079 Smith W.
|
|
|
|
|
Daniel Pfeffer wrote: The cluster size is the smallest unit that Windows will allocate on a disk ... because the space for the directory is always the same size, no matter how many sectors the disk has. Things quickly become wasteful when your drive (or partition) is too large and you have many small files.
I hope I find a good balance when I get to implement my file system on the computer I'm building. 8-bit computers usually have small files, but the 'drives' will be up to 128 GB.
I have lived with several Zen masters - all of them were cats.
His last invention was an evil Lasagna. It didn't kill anyone, and it actually tasted pretty good.
|
|
|
|
|
As far as the file system is concerned, directories are only files with special contents. The only exception is the root directory, which is treated differently on FAT file systems. As such, these "files" are subject to the same growth limitations as other files.
Even the ext family of file systems treats directories as "files". They, of course, store most essential metadata in inodes.
Freedom is the freedom to say that two plus two make four. If that is granted, all else follows.
-- 6079 Smith W.
|
|
|
|
|
CodeWraith wrote: ... because the space for the directory is always the same size, no matter how many sectors the disk has. Usually.
When a disk is NTFS formatted, a sizable chunk is set aside for the directory, or "Master File Table" (MFT) - usually a lot more than you need. But it could overflow, and in that case, NTFS will extend the MFT, occupying more space. That doesn't happen every day, though.
|
|
|
|
|
Daniel Pfeffer wrote: The cluster size is the smallest unit that Windows will allocate on a disk, and if this differs between the disks, the amount of disk space taken by a file may differ, too.
Which is not completely true on NTFS partitions where thousands of small files can be stored directly into the MFT. Cluster size on NTFS can be 4KB and files <=1KB can be stored directly into the MFT.
However, your statement is true for FAT, FAT16, FAT32, exFAT and ReFS.
Best Wishes,
-David Delaune
|
|
|
|
|
True, but my message was already more than long enough, without going into the more esoteric details of NTFS.
Freedom is the freedom to say that two plus two make four. If that is granted, all else follows.
-- 6079 Smith W.
|
|
|
|
|
|
|
|
|
|
Well... files significantly less than 1K may be stored in the MFT. Each directory entry is allotted a 1K block, and a varying part of this block is used for file metadata. Whatever space is left after the metadata has taken its share can be used for file data, if it fits. If it doesn't fit, the data all goes to a 4K disk page.
If you have a lot of extended attributes for a file, the 1K block may overflow and a second one be linked to it. But in no case will you have a full 1024 bytes for file data in the MFT; just "the rest of the block".
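That fit test can be written out explicitly. The 1024-byte record size matches the figure used in this thread, while the 400-byte metadata share is only the illustrative value from the earlier post (in reality it varies with the file name length, attributes, and so on):

```python
MFT_RECORD_SIZE = 1024  # size of one MFT record, per this thread

def resident_capacity(metadata_bytes: int) -> int:
    """Bytes left for file data after metadata claims its share."""
    return max(0, MFT_RECORD_SIZE - metadata_bytes)

def can_be_resident(file_size: int, metadata_bytes: int = 400) -> bool:
    """True if the data fits inside the MFT record; otherwise it
    spills out to a separate cluster (4K, 16K, ...)."""
    return file_size <= resident_capacity(metadata_bytes)

print(can_be_resident(600))   # True: 600 bytes fit in the remaining 624
print(can_be_resident(1024))  # False: a full 1K never fits next to metadata
```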
|
|
|
|
|
Hi,
You are absolutely correct that the Microsoft NTFS filesystem driver requires the file to be significantly less than 1K to be stored in the MFT. The file attributes and metadata need a place to live!
Best Wishes,
-David Delaune
Edit:
I am the author of multiple filesystem filter drivers. Some of my cryptic statements are a result of having too much experience in this area. My reason for stating <= 1KB is from my experience of saving a 1MB file in 1000 MFT entries. Why do such a dumb thing? For science! (security research)
Best Wishes,
-David Delaune
modified 21-May-18 19:13pm.
|
|
|
|
|
Daniel Pfeffer wrote: Lastly, your old disk may have unused areas within the meta data files - over time, files were deleted, created, and some of the meta data area was left unused. When copying the files to a new disk, any such gaps are eliminated. That wouldn't affect the "Size on disk" for each file, though.
If you copy to an identically sized disk, it might give you more free space on the disk - but usually not from a better-packed MFT! The MFT, with the metadata, is allocated as one huge chunk when the disk is formatted. For identical disks (and using default formatting parameters) it should be the same size. The cleanup gathers the used records at the start of that file and the unused records at the end, but the MFT file keeps the same size. Only if you at one time had an excessive number of files on the old disk, so that the MFT had to be extended, may the new disk get away with the original MFT size, before it was extended.
There are, however, several other files that normally won't be copied over to the new disk, such as the recycle bin, possibly the indexing data if your original disk was indexed, desktop.ini files (holding icons used by Explorer) and several other hidden/system files. You may actually lose some data, too, without noticing: some files may have alternate data streams, and the copying method determines whether they are preserved. If you go via a FAT disk, they are certainly NOT preserved. (On the other hand, alternate streams are rarely used, and when they are, it is often supplementary data you can do without. It is not as it was on the original Mac OS, before they went Unix, where "file forks" were in common use - forks and alternate streams are similar.) If you copy via FAT, such as by a memory stick, you will also lose any extended metadata - but only in exceptional cases will that save you any disk space when copying the files from the memory stick to the new NTFS disk.
If the old disk had a lot of hidden files, was indexed, and the recycle bin was full, then your new disk may have hundreds of megabytes more free space. But each individual file will have the same on-disk size as long as the allocation size is the same.
|
|
|
|
|
Member 7989122 wrote: That wouldn't affect the "Size on disk" for each file, though.
No, it wouldn't. It would, however, affect the total used space on the disk. I doubt the OP compared file sizes for each of the (possibly millions) of files on his disk.
Member 7989122 wrote: The MFT, with the metadata, is allocated as one huge chunk when the disk is formatted.
Wrong. The MFT is allocated with a small initial size (256 entries, in my latest tests), and is expanded as necessary. It is never shrunk. Other metadata - the bitmap, the log, and possibly others are allocated at a fixed size.
Freedom is the freedom to say that two plus two make four. If that is granted, all else follows.
-- 6079 Smith W.
|
|
|
|
|
Funny that, less than two weeks ago, I had the same question myself: I copied a fairly large set of small files requiring 6 GBytes on my NTFS disk to a memory stick, where it required 10 GBytes of space.
It boiled down to the 32 GByte memory stick being FAT32 formatted with an allocation size of 16 KBytes. So even the smallest file requires 16 KB on the stick, while it might fit into the MFT entry on NTFS.
I have no memory of formatting the memory stick myself; I think it must have been formatted by the manufacturer. You might argue that if you need a 32 GB stick, you probably fill it with media files - sound, photo, video - where a 16 KB allocation size is "small" compared to the file sizes, and requires fewer write operations and less index handling. On average, that holds true for my use of the stick. Increasing the space use by 65% is a very special situation, and after all: why worry about a 4 MByte space loss on a 32 GByte medium?
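The 6-versus-10 GByte effect can be reproduced with a small aggregate calculation. The size mix below is made up for illustration (my real file set differed, so the ratio differs too); what matters is that the smaller the files, the more a 16K allocation unit hurts:

```python
def total_size_on_disk(sizes, cluster_size):
    """Sum each file's size rounded up to whole clusters."""
    return sum(-(-s // cluster_size) * cluster_size for s in sizes if s > 0)

# Hypothetical mix of one million smallish files, ~8 GB logical:
sizes = [700, 3_000, 6_000, 12_000, 20_000] * 200_000

mb = 1024 * 1024
print(sum(sizes) // mb)                            # 7953 MB logical
print(total_size_on_disk(sizes, 4 * 1024) // mb)   # 9375 MB on 4KB clusters
print(total_size_on_disk(sizes, 16 * 1024) // mb)  # 18750 MB on 16KB clusters
```

With this (admittedly extreme) mix the same data roughly doubles in on-disk size when moving from 4K to 16K clusters; with mostly large media files, the difference all but disappears.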
|
|
|
|
|
The software ideologies thread below got me thinking about the flexibility of some people and how well I work with them.
I just got out of a team who wouldn't want to start working on a user story until every little detail was crystal clear (but they still called it agile).
For example:
The story: As an admin I want to see the address of a user on the user overview page.
The questions: How do you want the address formatted? Where on the form do you want to see the address? Do you want address and house number in one field or in two separate fields? Do you want to see the address in that other page too?
That's a very short and simple story, but notice the number of questions that HAD to be answered before they could put it in a sprint.
Personally, I'd put it in the sprint and I'd ask the story owner about the first three questions. That last question is out of scope, this is about the user overview page, not the other one too.
My current job is the exact opposite.
They want an entire application, possibly with about 10 third party tools, but they don't even know which ones yet (this has been going on for about a year).
And now I can start building... Figuring out the requirements as I go (which is probably about 50% of the job).
For Juun Software, my own business, I got a job "here's a fairly complicated Excel spreadsheet, we want that in an application. Good luck."
Alright, I'll build something, advise on certain subjects, and rebuild it if it's not what the customer wants after all.
Of course, with everything clear up front you can make an estimate, you can build what is asked, and you'll never have a dispute about whether what you built is what they wanted.
With nothing clear, it's pretty much impossible to estimate, things will take a lot more time, and if your customer is an a**hole he'll give you a hard time for not building what he wanted or for taking too long. Without the right people, these projects tend to fail miserably.
And then there's everything in between.
Personally, I like it when there's some work left for me to figure out. The more the better, like my current project.
Some people really can't deal with that uncertainty though (and when working with them I often find myself thinking "just start building already!").
What's your ideal situation?
|
|
|
|
|
Sander Rossel wrote: How do you want the address formatted? It is often faster to simply add a setting for those than to go up the ladder and get opinions on how stuff should be formatted. "Just make it configurable" is a good answer to most of those questions, and a good way to teach them not to ask stuff they can answer themselves.
Sander Rossel wrote: Of course with everything clear up front you can make an estimate, you can build what is asked and you'll never have a dispute about if what you built is what they wanted. Even if the specs are 100% clear, that hardly means that the estimate is correct. I wouldn't wait for the specs to be at 100%; even when old-fashionably waterfalling that's a bit too much (and yes, I still do).
As long as your customer realizes that changes also mean that the estimates change, you don't need to pour your specs into concrete. Code is very malleable in that respect; cement isn't.
Bastard Programmer from Hell
If you can't read my code, try converting it here[^]
"If you just follow the bacon Eddy, wherever it leads you, then you won't have to think about politics." -- Some Bell.
|
|
|
|
|
Eddy Vluggen wrote: It is often faster to simply add a setting for those Only if something somewhere is already configurable
On my current project everything is configurable, as it should be.
I'm not even halfway and it already paid itself back when the customer said "oh yeah, we were thinking we also want this and that" and I was like "you can add it yourself, it's all configurable"
Eddy Vluggen wrote: Even if the specs are 100% clear, that hardly means that the estimate is correct. It will be when you add the manager modifier of "developer's estimate x 2"
|
|
|
|
|
80/20 rule.
Deliver the bulk of the requirements - the must-haves, the basic functionality - and put the details, the 'nice to haves', off till later. Don't let the second delay the first; that's the key for me.
And every plan can be changed; just make a start, make a decision and head in that direction. If it needs to change mid-course, that's OK, at least progress is made.
That's the way I design SW and run projects, and why I love the scientific method so much. Newton's laws aren't perfect, but they are good enough to take man to the moon. For sat nav we need Einstein's laws.
'Don't let lack of perfection hold you back' is my lesson from that!
|
|
|
|
|
Munchies_Matt wrote: 'Dont let lack of perfection hold you back' is my lesson from that! An ex-employer used to say "perfection is the enemy of good enough"
|
|
|
|
|
Let me add that to my list of favourite quotes!
|
|
|
|
|
Good enough begins to make money. Perfection just keeps costing money. This is a fundamental that separates businessmen from programmers.
I'm pretty sure I would not like to live in a world in which I would never be offended.
I am absolutely certain I don't want to live in a world in which you would never be offended.
Freedom doesn't mean the absence of things you don't like.
Dave
|
|
|
|
|
Mind you, those dealing with problems and failures, such as lawyers and doctors, make even more money than those who do good, such is the human condition.
|
|
|
|
|
DRHuff wrote: This is a fundamental that separates businessmen from programmers. Almost doesn't count*
Depends upon who decides it's good enough, doesn't it!
Something about pre-ordained third-rate quality goes against my grain.
Possibly because, although I love getting my check, it's never been about the money. The work you do is your art. Do you want to be proud of what you did or proud of what you got away with?
* (yeah, I know, except in horseshoes)
Ravings en masse^
"The difference between genius and stupidity is that genius has its limits." - Albert Einstein
"If you are searching for perfection in others, then you seek disappointment. If you seek perfection in yourself, then you will find failure." - Balboos HaGadol Mar 2010
|
|
|
|
|