If You Give A Mouse A Cookie*
I'm on a path to do a weird thing: generate the SHA256 hash of every file on my computer (store filename and hash in db).
To do that, I started thinking about how I might set up multiple instances of the process to each do a portion of the work so it'll be (overall) faster. I figure I could write every directory on my system to a SQLite db, then let numerous processes each grab a folder and get all of its files, independently of each other.
That made me start wondering about how much data it would be to store every directory on my system in a sqlite database.
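As a baseline for the eventual goal, the whole hashing job can be sketched in shell. This is a sketch, not the C# program described below; it assumes GNU coreutils' sha256sum, and the starting-path default and the allHashes.dat file name are my own placeholders:

```shell
# Sketch: hash every file under a starting path and save
# "hash  filename" pairs to a list file.
# $1 (the starting path) defaults to the current directory.
START="${1:-.}"
find "$START" -type f -print0 |
  xargs -0 sha256sum > allHashes.dat
```

From there, the per-directory splitting idea amounts to running one such pipeline per folder.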
Wrote A Quick C# Program
I wrote a quick little C# program that:
1. user gives a starting path
2. program iterates through every directory
3. program writes each directory to the SQLite db
Fast Iteration, Slow Insert
I (of course) discovered:
1. It's super fast to iterate over the directories.
It can iterate over the 239,618 directories in my Linux user directory in a few seconds.
2. It's super slow to use Entity Framework to insert those dir names into the SQLite db.
Super slow means it takes more than 10 minutes.
Two Weird Parts (but maybe expected)
So, instead of inserting the records into the db directly, I write the data to a file (yes, it's 239,618 lines long).
Data looks like (pipe delimited):
239609|/home/fakepath/faker|2024-10-15 17:16:27
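For what it's worth, a rough shell equivalent of that dump can be produced with find and awk. This is a sketch: the row id comes from awk's NR, and capturing the timestamp once (rather than per directory) is an assumption about what the C# program does:

```shell
# Sketch: write "id|path|timestamp" lines for every directory
# under $HOME, matching the pipe-delimited format above.
STAMP="$(date '+%Y-%m-%d %H:%M:%S')"
find "$HOME" -type d |
  awk -v ts="$STAMP" '{ printf "%d|%s|%s\n", NR, $0, ts }' > allPaths.dat
```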
Weird 1
C# iterates all those directories and writes to file in 2-3 seconds
Weird 2 (also kind of wonderful)
sqlite imports that data (over 239 thousand rows) in less than 1 second on the SQLite command line:
> .import allPaths.dat finfo
The import command takes the name of the file with the data and the target table name (finfo).
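For a fully non-interactive version, the table creation and import can be scripted in one go. The finfo column names here are my guess at the schema, and the sample data line stands in for the real allPaths.dat; the sqlite3 shell's default list-mode separator is already |, but setting it explicitly doesn't hurt:

```shell
# Sample data (stand-in for the real allPaths.dat):
printf '1|/home/fakepath/faker|2024-10-15 17:16:27\n' > allPaths.dat

# Create the target table (guessed schema) and bulk-load the
# pipe-delimited file in a single sqlite3 run.
sqlite3 paths.db <<'EOF'
CREATE TABLE IF NOT EXISTS finfo (id INTEGER, path TEXT, created TEXT);
.separator |
.import allPaths.dat finfo
SELECT count(*) FROM finfo;
EOF
```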
I'm figuring this will make a lot of people say, "yeah, I figured so".
*In that book, one thing leads to another. Give a mouse a cookie, he'll want some milk. To get the milk you'll have to...
modified 9hrs 20mins ago.
raddevus wrote: *In that book, one thing leads to another. Give a mouse a cookie, he'll want some milk. To get the milk you'll have to...
Which leads to "If you give a Pig a Pancake" and "If you give a Moose a Muffin"
I’ve given up trying to be calm. However, I am open to feeling slightly less agitated.
I’m begging you for the benefit of everyone, don’t be STUPID.
Reading the book Fundamentals of Software Architecture: An Engineering Approach 1st Edition[^] by Neal Ford & Mark Richards, I stumbled upon the following in _Chapter 11. Pipeline Architecture Style_:
Quote: Donald Knuth was asked to write a program to solve this text handling problem: read a file of text, determine the n most frequently used words, and print out a sorted list of those words along with their frequencies. He wrote a program consisting of more than 10 pages of Pascal, designing (and documenting) a new algorithm along the way. Then, Doug McIlroy demonstrated a shell script that would easily fit within a Twitter post that solved the problem more simply, elegantly, and understandably (if you understand shell commands):
tr -cs A-Za-z '\n' |
tr A-Z a-z |
sort |
uniq -c |
sort -rn |
sed ${1}q
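For anyone who wants to try it, the pipeline drops straight into a small shell function; the single argument is n, the number of top words to print. This is a sketch around the quoted pipeline, not McIlroy's exact script:

```shell
# topwords N: print the N most frequent words on stdin,
# with counts, most frequent first.
topwords() {
  tr -cs 'A-Za-z' '\n' |   # split input into one word per line
  tr 'A-Z' 'a-z'       |   # lowercase everything
  sort                 |   # group identical words together
  uniq -c              |   # count each word
  sort -rn             |   # most frequent first
  sed "${1}q"              # keep only the top N lines
}

# Example:
# printf 'the cat and the dog and the bird\n' | topwords 2
```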
Software Developer: Architecture can cost you money
Software Architect: It's how I feed my family.
It is a funny story, but I'm sure you realize some details got swept under the rug. Besides, from the blurbs on the Amazon page you linked:
Quote: Everything in software architecture is a trade-off. First Law of Software Architecture
Some trade-offs:
1. The pipeline solution works only in *nix environments, while Knuth's algorithm can probably be implemented on many platforms.
2. tr, uniq, sort and sed were all written by someone; maybe some of their cost should be added to the cost of the pipeline solution. Or, if the problem of finding the most frequent words turns out to be important and frequently used, Knuth's program might become a new utility — dek. The new solution would then be "just use the dek command".
3. The anecdote doesn't say how big the text file is or how fast the solution should run. If you have to find the most frequent words in Encyclopedia Britannica in under 10 ms, I doubt the pipeline solution would win the day.
Very good points!
Everything is a trade-off.
Did you know there are Deconstructors in C#?
Did you know they were called that?
class Rectangle
{
    public readonly float Width, Height;

    public Rectangle(float width, float height)
    {
        Width = width;
        Height = height;
    }

    // Finalizer ("destructor") — note: no access modifier is allowed here
    ~Rectangle()
    {
        Console.WriteLine("In destructor...");
    }

    public void Deconstruct(out float width, out float height)
    {
        width = Width;
        height = Height;
    }
}
Here's a driver for the code:
var r = new Rectangle(15, 30);
var (width, height) = r;
Console.WriteLine($"{width} : {height}");
If you don't have the deconstructor in your class and you try the destructuring, you will get an error like:
C# Compiler: (1,6): error CS8130: Cannot infer the type of implicitly-typed deconstruction variable 'width'.
And, here is an example of Destructuring in JavaScript...
const person = {
    firstName: "John",
    lastName: "Doe",
    age: 50
};
let { firstName, lastName } = person;
console.log(firstName);
console.log(lastName);
FYI - The Rectangle example with the Deconstructor is from chapter 3 of
C# 12 in a Nutshell: The Definitive Reference[^] which I'm reading right now.
Hello, Software-Dev-Terms, could you be any more confusing? Could you, though?
Maybe add in the word Destructurizification.
modified 26-Sep-24 15:30pm.
Never had an opportunity to develop in C#, but I'm not dead yet.
In the past, I seem to recall that one of the "C# is better than C or C++" arguments was automatic destruction or memory management. What did I miss?
Charlie Gilley
“Microsoft is the virus..."
"the problem with socialism is that eventually you run out of other people's money"
Generally in C++ you had a constructor to build the object.
But then your object may have allocated memory for something it needed.
In that case it was necessary to deallocate that memory when the object was destroyed.
For that you could write a destructor, which had syntax exactly like the constructor but prefixed with a ~ (tilde):
~Rectangle() {}
That ensured that when the object was destroyed, the code in the destructor would run automatically.
Also, in C++ the destructor runs "deterministically" — when the object goes out of scope or is explicitly deleted.
In C# it runs non-deterministically (whenever the .NET CLR decides memory needs to be cleaned up).
In C# they followed this pattern so that if you had a file open or some other resource, you could ensure that the code to close the file would run when the object no longer had a reference.
Of course, C# manages memory (via the garbage collector — GC), and so the destructor only runs once the GC collects the object.
You can
1. set the object ref to null
2. call GC.Collect() manually (not recommended in production code)
and you'll see the destructor run in C#.
Destructors are obviously used less in the C# world than in the C++ world.
Charlie Gilley
“Microsoft is the virus..."
"the problem with socialism is that eventually you run out of other people's money"
raddevus wrote: Did you know there are Deconstructors in C#?
Yes, ever since I read the first spec back in 1999.
Never had a use for one though.
EDIT (NOTE):
I'm leaving the note below (I was going to delete it.)
I saw some people saying, "Oh, you could use the Deconstruct method back in C# 1.0" in some StackOverflow answers.
That is odd to me. Everything else I searched would reveal the C# 7.0 usage and nothing further back.
So, you may be entirely correct. And, you may have had a very deep understanding of this Deconstructor Pattern. I had never heard about it before C# 7.0 & modern JavaScript's use of destructuring.
FYI - I even asked Copilot about it and Copilot has no knowledge of it before C# 7.0 & everything it talks about references C# 7.0. I even tried just finding the "Deconstructor Pattern".
Anyways, you may be correct and the Internet is revising it so we forget about the old truths. It's crazy.
Hope you find this discussion interesting.
raddevus wrote: Did you know there are Deconstructors in C#?
PIEBALDconsult wrote: ever since I read the first spec back in 1999.
Well, errr... looks like you got caught on the word too.
You may have read about destructors back in 1999, but you didn't read about deconstructors in 1999.
Deconstructors were just added in C# 7.0:
Here's a snippet from "when things were added" section in C# 12 In a Nutshell:
C# Nutshell C# 7 introduced the deconstruction syntax for tuples (or any type with a Deconstruct method). C# 10 takes this syntax further, letting you mix assignment and declaration in the same deconstruction
This is my point. The terms are so close they muddle up everything.
I also went back through my O'Reilly online bookshelf and examined each C# In A Nutshell book (from C# 5 on...)
Deconstructor is not mentioned until --
Quote: C# 7.0 In A Nutshell
: Deconstructors
C# 7 introduces the deconstructor pattern. Whereas a constructor typically takes a set of values (as parameters) and assigns them to fields, a deconstructor does the reverse and assigns fields back to a set of variables.
modified 27-Sep-24 14:40pm.
Oh! That! OK, yes, -- mea culpa -- I misread and didn't make the connection to Tuples.
I try not to use new C# features, so I've been using (one or two) features of v7 (and v6) for only a few months now.
Marc Clifton pointed out something about them earlier this month -- The Lounge[^] -- and I quickly ran into an error, something along the lines of : "No Deconstruct method found for type ..." (maybe I'll try to reproduce the error later, but I probably won't).
My reaction is to avoid using that feature if it means additional invisible method calls which can easily be avoided by other techniques. I still prefer avoiding using tuples in favor of custom classes in most cases.
In fact, after Marc's post, I decided to review my uses of Tuples in code I've written over the past few months and found that I wasn't using any — I thought I was using Tuples in two particular situations, but I had already switched to classes.
For the most part, I use a Tuple as a "that's good enough for now, I'll fix it later" technique. But if I used Tuples more, I would probably use that feature and maybe, that feature will make Tuples more palatable in situations in which I might use a Tuple.
I'll also add that it's a case in which the method being called has to be written for it and then the caller has to abide by it. From what I can tell, in a case in which you're calling a third-party API, the caller has no say in whether or not to use the technique.
Thanks MS. My summary of this discussion is that it should not be this complicated.
Any system - compiler or otherwise - should have well defined behavior. Leaving ANYTHING to Microsoft to determine "well defined behavior" is a recipe for disaster.
Charlie Gilley
“Microsoft is the virus..."
"the problem with socialism is that eventually you run out of other people's money"
Purely from an English language perspective, to me:
destructor is something that destroys, i.e. smashes, obliterates...
deconstructor deconstructs, i.e. disassembles, takes apart, leaving the components presumably intact and usable
destructuring sounds like something politicians do (or talk about).
Software rusts. Simon Stephenson, ca 1994. So does this signature. me, 2012
raddevus wrote: If you don't have the deconstructor in your class and you try the destructuring you will get an error
Unless you have a deconstructor extension method, which can be handy for third-party classes that don't provide them:
public class Rectangle(float width, float height)
{
    public readonly float Width = width, Height = height;
}

public static class Shapes
{
    public static void Deconstruct(this Rectangle rectangle, out float width, out float height)
        => (width, height) = (rectangle.Width, rectangle.Height);
}
"These people looked deep within my soul and assigned me a number based on the order in which I joined."
- Homer
Apple or low budget, choose one
GCS/GE d--(d) s-/+ a C+++ U+++ P-- L+@ E-- W+++ N+ o+ K- w+++ O? M-- V? PS+ PE Y+ PGP t+ 5? X R+++ tv-- b+(+++) DI+++ D++ G e++ h--- r+++ y+++* Weapons extension: ma- k++ F+2 X
The shortest horror story: On Error Resume Next
That's cool.
The iPhone supports RAW video, so they can tweak and color correct in post-prod.
I assume they did test runs and it works for their workflow.
Remember that most movies have very short cuts, so there is no problem with overheating or storage or draining the batteries too much.
I see no issues with that.
CI/CD = Continuous Impediment/Continuous Despair
There is no problem with the quality of the video, the iPhone has a good reputation, but compared to video cameras with a lens mount the choice of lenses is very limited, I think.
But as the saying goes:
Quote: in the limitation the master shows himself
Of course.
I assume they use different cameras for specific shots.
but if it works for them ...
CI/CD = Continuous Impediment/Continuous Despair
Adding some cooling and keeping the phones charged with a battery pack is doable anyway. I like this kind of experimentation; it is really an artistic expression of the highest magnitude.
GCS/GE d--(d) s-/+ a C+++ U+++ P-- L+@ E-- W+++ N+ o+ K- w+++ O? M-- V? PS+ PE Y+ PGP t+ 5? X R+++ tv-- b+(+++) DI+++ D++ G e++ h--- r+++ y+++* Weapons extension: ma- k++ F+2 X
The shortest horror story: On Error Resume Next