|
Neither Windows nor Linux does well with too many files in a single folder. I've tried it with a million files, and it is very painful: even simple operations, like listing the directory or deleting the files, take absurdly long.
Some of the operations involved simply aren't designed for large numbers of files.
As mentioned, around 10,000 files in a folder is a reasonable maximum; I simply make it 1,000. So for a million files, spread them across 1,000 folders. There is a nice symmetry to it, and it works like a charm.
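As a sketch of that layout (the `bucket_path` helper and folder count are made up for illustration; any stable hash of the filename will spread files evenly):

```python
import hashlib
from pathlib import Path

def bucket_path(root: str, filename: str, buckets: int = 1000) -> Path:
    """Map a filename to one of `buckets` subfolders using a stable hash,
    so a million files end up roughly 1,000 per folder."""
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % buckets
    return Path(root) / f"{bucket:03d}" / filename
```

The same function is used for both writing and reading, so no extra lookup table is needed.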
|
|
|
|
|
A few years ago I worked on a system that generated around 50,000 to 100,000 files a day.
We ran into trouble right away.
Storing the files was not a problem, but retrieving them was nearly impossible.
A second problem was that we needed to search the contents of the files to find all files containing a certain string.
We eventually chose to store all the files in a database. This was quite easy because the files were small (less than 10 KB).
We chose an Oracle database because of its CLOB datatype, which allows for indexing and searching.
We have had no problems since, and now have more than 200 million files.
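To illustrate the idea in miniature (using Python's built-in SQLite with FTS5 as a stand-in for Oracle and its indexed CLOBs; the table and file names here are made up):

```python
import sqlite3

# In-memory database for the sketch; the original system used Oracle CLOBs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(name, body)")

# Store small text files as rows instead of on-disk files.
conn.execute("INSERT INTO docs VALUES (?, ?)", ("a.txt", "invoice 1001 paid"))
conn.execute("INSERT INTO docs VALUES (?, ?)", ("b.txt", "invoice 1002 open"))

# Full-text search replaces grepping through millions of files on disk.
hits = [row[0] for row in
        conn.execute("SELECT name FROM docs WHERE docs MATCH ?", ("paid",))]
```

Retrieval by key and search by content both become index lookups, which is exactly where file systems with huge directories struggle.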
|
|
|
|
|
I worked on a system that had to stream 1 MB images to disk at 75 fps. I found that once there were about 700 files in a directory, creating new files suddenly became slower and the required transfer rate was unachievable. I ended up creating a new subdirectory every 500 files.
Of course, this won't be a problem if your system is purely for archiving.
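A minimal sketch of that approach (the `RollingWriter` name and numbering scheme are hypothetical; only the 500-files-per-directory threshold comes from the post):

```python
import os

class RollingWriter:
    """Hand out file paths in numbered subdirectories, starting a new
    subdirectory every `per_dir` files so no directory grows too large."""

    def __init__(self, root: str, per_dir: int = 500):
        self.root, self.per_dir, self.count = root, per_dir, 0

    def path_for_next(self, filename: str) -> str:
        # Integer division gives the current subdirectory number.
        subdir = os.path.join(self.root, f"{self.count // self.per_dir:06d}")
        os.makedirs(subdir, exist_ok=True)
        self.count += 1
        return os.path.join(subdir, filename)
```

The writer never has to rename or move anything; it just rolls over to a fresh directory as it goes.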
|
|
|
|
|
I don't know about access issues for a large number of files in a directory, but you might also consider security issues.
If, for example, you have several different users whose files should not be accessible to the others, creating a subfolder for each user lets you secure them so that only that user (plus perhaps an 'admin' account that can see all directories) has access to their subfolder. There are obvious organizational advantages as well.
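A small sketch of the per-user-folder idea (the `make_user_dir` helper is made up; the `0o700` mode is POSIX permissions, so on Windows you would set an ACL instead):

```python
import os

def make_user_dir(root: str, username: str) -> str:
    """Create a per-user subfolder readable and writable only by its owner
    (POSIX mode 0700: no access for group or others)."""
    path = os.path.join(root, username)
    os.makedirs(path, mode=0o700, exist_ok=True)
    # makedirs' mode is subject to the process umask, so enforce it explicitly.
    os.chmod(path, 0o700)
    return path
```

Each user's files then inherit a natural security boundary from the directory itself.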
|
|
|
|
|
It really depends on your use case for accessing and managing these files. If you're going to be enumerating the files a lot (or portions of them), then everything in one directory/folder may not be the best choice; at a minimum you can "chunk up" the enumeration by subfolder if you create them.
Also, if you break the files into subfolders in some logical way, then managing those units and groupings becomes much easier, e.g. backups, restores, archiving, deleting.
If you are storing the path to each file in a database, then lookup performance will be the same either way (subdirectories or everything in one pool).
Can you explain a little more about the repository and how you'll be using it?
|
|
|
|
|
Consider drive corruption, backups, replication, file listeners, and aging/document-retention requirements, along with all of the other access aspects. A folder per day/month/year can help with some of those, as suggested in another post.
|
|
|
|
|
You might get by if you're using SSDs, or if the files are large, accessed directly and infrequently, and their number won't grow by orders of magnitude.
Better to spread them out.
Huge directories in NTFS:
* Accessing individual files is OK.
* Adding, removing, listing, and sorting get slow (consider EnumerateFiles instead of GetFiles).
* Reading metadata (e.g. modification dates) is slow, which makes Explorer's detail view slow.
* Network access is slower still.
* Defragmenting directories (with contig) helps some, as does moving large directories with robocopy /create.
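The EnumerateFiles-vs-GetFiles point is about lazy versus eager listing; the same idea in Python (a hypothetical `first_n_files` helper using `os.scandir`, which streams entries instead of materializing the whole directory):

```python
import os

def first_n_files(path: str, n: int) -> list[str]:
    """Return up to n file names from `path` without listing the
    entire directory first -- the EnumerateFiles idea."""
    names = []
    with os.scandir(path) as it:  # yields entries one at a time
        for entry in it:
            if entry.is_file():
                names.append(entry.name)
                if len(names) == n:
                    break  # stop early; the rest is never read
    return names
```

For a directory with hundreds of thousands of entries, stopping after the first few is dramatically cheaper than building the full list.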
Directories (and empty or tiny files) are stored in the MFT.
A massive number of MFT entries can be a problem.
The MFT's starting size is set when (and only when) you format the disk (controlled by a registry key). It will expand if needed (but fragment), and will contract (if possible) when space is low.
Defragmenting the MFT is possible but slow and difficult.
After a disk has been full of files, or has had its MFT filled by directories or tiny files, it may be best to reformat.
How to segment depends on how sparse the file IDs will be.
About 4K entries per directory is a good starting target.
If the files have numeric IDs, avoid bit shifts, for simplicity:
group the ID into 3 decimal digits = 1,000 files plus 1,000 subdirectories per level,
or 3 hex characters (0xFFF) = up to 4,096 files plus 4,096 subdirectories.
eg.
000/0.dat - 999.dat
001/1000.dat - 1999.dat
999/999000.dat - 999999.dat
...
000/001/1000000.dat - 1000999.dat
001/001/1001000.dat - 1001999.dat
123/987/987123000.dat - 987123999.dat
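That decimal layout can be generated mechanically. A sketch (the `id_to_path` name is made up; the logic reproduces the example paths above, with directory levels ordered from the least-significant 3-digit group outward):

```python
def id_to_path(file_id: int) -> str:
    """Map a numeric file ID to the nested 3-digit layout, e.g.
    1500 -> '001/1500.dat', 1000000 -> '000/001/1000000.dat'."""
    q = file_id // 1000  # strip the last 3 digits; the filename keeps them
    parts = []
    while True:
        parts.append(f"{q % 1000:03d}")  # next 3-digit group as a directory
        q //= 1000
        if q == 0:
            break
    return "/".join(parts) + f"/{file_id}.dat"
```

Each directory level adds a factor of 1,000 of capacity, so two levels already cover a billion files at 1,000 per leaf directory.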
|
|
|
|
|
Explorer does two things: it reads the directory entries and sorts them. Reading appears to be linear, and so does each sort insertion, so you get O(N²) behaviour. This hasn't changed in decades.
Some file systems let you access a file through a kind of pointer, bypassing the directory once you know the pointer. Nevertheless, adding and deleting files still has to touch the directory.
Looking up directory names has the same problem, so it's better to construct a directory "tree".
File size doesn't matter for name lookups.
|
|
|
|
|
A while ago I dropped half a mug of coffee over my cheap USB keyboard, and of course it died. I was given a Corsair K68 RGB, a mechanical keyboard (plus a non-spill mug). Sturdy keys, good for bulk typing. So, a few days ago, the unthinkable happened - I dropped a full mug of coffee on the new keyboard.
...and it still works, without flaws. There's an anti-spill rubber membrane between the keys and the internals, and a metal plate under that which protects the electronics. I'm convinced that nothing is idiot-proof and I will find a weak spot, but for now, I'm pretty impressed. It's easily taken apart and cleaned, and might actually last longer than a year.
YouTube showing rain on a keyboard[^]
So, what's your keyboard?
Bastard Programmer from Hell
If you can't read my code, try converting it here[^]
"If you just follow the bacon Eddy, wherever it leads you, then you won't have to think about politics." -- Some Bell.
|
|
|
|
|
Really boring - a MS 600 I bought after my Logitech died. And I only got the Logi because after 20 years of abuse the keytop legends had nearly all worn off and Herself complained that she didn't know where the letters were ...
"I have no idea what I did, but I'm taking full credit for it." - ThisOldTony
"Common sense is so rare these days, it should be classified as a super power" - Random T-shirt
AntiTwitter: @DalekDave is now a follower!
|
|
|
|
|
OriginalGriff wrote: after 20 years of abuse the keytop legends had nearly all worn off
What a wastrel! Don't you know that you can get the keyboard re-engraved?
Freedom is the freedom to say that two plus two make four. If that is granted, all else follows.
-- 6079 Smith W.
|
|
|
|
|
That would have cost more than the keyboard did!
It was a cheap one I bought with my first "fast" computer (to play Doom2 properly): a DX4/100 with ... gasp ... 16MB of RAM.
For the yoof, that's not a misprint: 16 megabytes of RAM, and a 100 megahertz processor (running an internal clock tripler from the 33MHz bus speed).
It was blindingly fast, for its time.
The keyboard survived many upgrades that nothing else from the original did ...
"I have no idea what I did, but I'm taking full credit for it." - ThisOldTony
"Common sense is so rare these days, it should be classified as a super power" - Random T-shirt
AntiTwitter: @DalekDave is now a follower!
|
|
|
|
|
Dude, I had the DX2/66 clock DOUBLER! Good times, state of the art MAG monitor; thought I was in the future...
|
|
|
|
|
Did your case have the jumpers so you could set the "speed" you wanted - especially to annoy the gullible when you set your Turbo switch (remember those as well!) to show 128 instead of 66 ...
"I have no idea what I did, but I'm taking full credit for it." - ThisOldTony
"Common sense is so rare these days, it should be classified as a super power" - Random T-shirt
AntiTwitter: @DalekDave is now a follower!
|
|
|
|
|
I honestly don't recall the jumpers. This was my very first PC and I wanted whatever the Computer Shopper said was cool. The Pentium had already peaked when these new "overclockers" were starting to come out. I knew nothing but that I was future-proofing by investing in the "next big thing". But I do remember the prominent turbo push button and wondering why you would ever not want it on - it wasn't as if running it at a blistering 66 made the lights dim.
|
|
|
|
|
Something from digital (DEC) probably made in the 90s. 101 keys, as the Maker intended.
|
|
|
|
|
Is the control key in the right place? I have to remap my keyboard to get the control key next to the 'A' key. I'm an old fart and still use WordStar control sequences for text editing.
|
|
|
|
|
Sadly, it's a pretty standard IBM-style 101-key keyboard. It can be difficult to use with DEC software, such as no proper "Gold" key, etc.
|
|
|
|
|
I used to have a programmable mechanical keyboard (can't remember the brand) that I got in the early '80s just to be able to reprogram the Control key. It eventually died, but I was able to write a TSR to remap the Control key. Eventually, with Windows, I found a keyboard-mapper utility (Ziff-Davis) that lets me do the remap. It still works under Windows 10.
|
|
|
|
|
Razer Black Widow.
Colorful and clicky!
The difficult we do right away...
...the impossible takes slightly longer.
|
|
|
|
|
Same!
Colorful, clicky, and loud!
|
|
|
|
|
Also mechanical, and cheaper - but not anti-spill.
|
|
|
|
|
I have a Razer BlackWidow too, but I'm not all that impressed. Oh, it's very pretty, but I've only had it a year or two and some of the key mechanisms have already become flaky. I'm currently suffering through an update to Synapse that ate my simple configuration file and makes it harder to create a new one. One other thing: I can type so fast on the Razer that I make mistakes I don't make on other keyboards. I'm not sure it's a net positive.
|
|
|
|
|
SeattleC++ wrote: I can type so fast on the Razer that I make mistakes
C'mon, that's hardly the keyboard's fault.
|
|
|
|
|
It isn't exactly the keyboard's fault, but if good typists make more mistakes on this keyboard than on a slower keyboard, is it the best tool?
I'm just reporting my experience. You can make of it what you will.
|
|
|
|