|
David Crosby, legendary founder of the Byrds and Crosby, Stills & Nash, dead at 81. His personal life aside, he had a wonderful voice and was part of some great music.
"the debugger doesn't tell me anything because this code compiles just fine" - random QA comment
"Facebook is where you tell lies to your friends. Twitter is where you tell the truth to strangers." - chriselst
"I don't drink any more... then again, I don't drink any less." - Mike Mullikins uncle
|
Great LP, great song.
"the debugger doesn't tell me anything because this code compiles just fine" - random QA comment
"Facebook is where you tell lies to your friends. Twitter is where you tell the truth to strangers." - chriselst
"I don't drink any more... then again, I don't drink any less." - Mike Mullikins uncle
|
One of my all time favorite artists - a legend, gone too soon.
/ravi
|
Saw him live in Austin, TX, 198??
Saw him again live in Austin, TX, in 2012.
Every bit as good live as on record.
He lived longer than most expected.
"A little time, a little trouble, your better day"
Badfinger
|
turn, turn, turn ...
Crosby played rhythm guitar and did vocals on 'Turn! Turn! Turn!'
«The mind is not a vessel to be filled but a fire to be kindled» Plutarch
|
A wonderful arrangement of that song, the vocals were always great.
"the debugger doesn't tell me anything because this code compiles just fine" - random QA comment
"Facebook is where you tell lies to your friends. Twitter is where you tell the truth to strangers." - chriselst
"I don't drink any more... then again, I don't drink any less." - Mike Mullikins uncle
|
Cop: That's not how field sobriety tests work.
To err is human. Fortune favors the monsters.
|
From what I understand, there are several types of components in a processor. In some components the data is persistent in time: it survives one or more CPU pulses. My guess is that the registers fall into this category.
Another category is the transistors, whose data is flushed when the oscillator briefly cuts the power off.
In terms of a classical 8-bit processor (modern processors have all the bells and whistles, which makes them difficult to understand), it takes one current pulse to process one line of assembly code. When the transistor web is flooded with current, the math takes place and the result ends up in the registers; when the next flood takes place, the current flows through the transistors in a pattern dictated by the next line of ASM code, picking up the data saved in the registers during previous floods.
How accurate is this? I'm bringing an 8-bit processor into the discussion not because I'm sure how it works, but because it should be simple compared to the other ones.
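To make the question concrete, here is a minimal sketch of that model: a made-up three-instruction machine (not any real 8-bit CPU). The registers and program counter are the state that survives between pulses; everything inside one loop iteration is the combinational work redone on each pulse.

```c
/* Toy model of the classic fetch-decode-execute cycle. The machine,
 * opcodes, and program are hypothetical, invented for illustration.
 * Registers (a, pc) persist between "pulses"; each loop iteration is
 * one pulse's worth of combinational work. */
#include <stdint.h>
#include <stdio.h>

enum { OP_LDA = 0x01, OP_ADD = 0x02, OP_HLT = 0xFF };

int main(void) {
    uint8_t mem[16] = { OP_LDA, 7, OP_ADD, 5, OP_HLT };
    uint8_t a  = 0;  /* accumulator: persistent state */
    uint8_t pc = 0;  /* program counter: persistent state */

    for (;;) {
        uint8_t op = mem[pc++];               /* fetch */
        switch (op) {                         /* decode + execute */
        case OP_LDA: a  = mem[pc++]; break;   /* A  = immediate */
        case OP_ADD: a += mem[pc++]; break;   /* A += immediate */
        case OP_HLT: printf("A = %u\n", a); return 0;
        }
    }
}
```

Running it prints `A = 12`: the result of the first pulse (load 7) survives in the register and is picked up by the next pulse (add 5), which is exactly the "data saved in registers during previous floods" idea.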
|
I recommend reading "Code" by Charles Petzold
|
I wish I could help answer, but the best I can do is verify that one CPU cycle does equate to one machine instruction / ASM mnemonic. How registers store values across cycles is beyond me, but an interesting idea to think about.
Jeremy Falcon
|
Nope. With modern CPUs it's actually possible for an instruction to be "ignored" because the execution preprocessor, which runs in parallel to the actual instruction execution unit, realizes that the instruction is an effective NOP. An example of this would be PUSH AX, POP AX, which older processors would dutifully execute and newer processors would simply cut out of the execution stream. Also, the vast majority of processor instructions, even on a RISC machine, take more than one clock cycle.
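Real CPUs do this elimination in the hardware front end, which isn't something you can show in portable code. As a rough software analogy only, a peephole pass that drops an adjacent PUSH r / POP r pair from an instruction stream might look like this (the Insn type and the instruction names are made up for illustration):

```c
/* Rough software analogy of dropping an effective NOP such as
 * PUSH AX immediately followed by POP AX. Actual CPUs do this in the
 * decode/rename front end in hardware; this Insn type is invented. */
#include <stdio.h>
#include <string.h>

typedef struct { const char *op; const char *arg; } Insn;

/* Copy src to dst, skipping adjacent PUSH r / POP r pairs on the
 * same register; returns the number of instructions kept. */
static int peephole(const Insn *src, int n, Insn *dst) {
    int k = 0;
    for (int i = 0; i < n; i++) {
        if (i + 1 < n &&
            strcmp(src[i].op, "PUSH") == 0 &&
            strcmp(src[i + 1].op, "POP") == 0 &&
            strcmp(src[i].arg, src[i + 1].arg) == 0) {
            i++;            /* skip both halves of the pair */
            continue;
        }
        dst[k++] = src[i];
    }
    return k;
}

int main(void) {
    Insn prog[] = { {"INC","AX"}, {"PUSH","AX"}, {"POP","AX"}, {"DEC","AX"} };
    Insn out[4];
    int k = peephole(prog, 4, out);
    for (int i = 0; i < k; i++) printf("%s %s\n", out[i].op, out[i].arg);
    return 0;
}
```

It prints only `INC AX` and `DEC AX`; the PUSH/POP pair never reaches "execution".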
|
obermd wrote: Also, the vast majority of processor instructions, even on a RISC machine, take more than one clock cycle. But then again, thanks to pipelining, over time the average number of instructions executed per cycle is often close to one. Techniques such as 'hyperthreading' could even give you more than one instruction per cycle ... on average, over time.
One consequence of all these speedup techniques, from pipelining/hyperthreading to speculative execution and extensive operand prefetching, is that the cost of an interrupt goes up and up the fancier the CPUs become. Some of it is (semi)hidden: after interrupt handling, you may have to redo prefetching that was already done before the interrupt. The interrupt cost is more than the time from acknowledging the interrupt signal to the handler's return; you also have to count the total delays to the interrupted instruction stream, where several instructions might be affected.
At least the early ARM processors were much closer to a 'direct' clock-cycle-to-instruction relationship; their much more 'tidy' instruction set would allow it (and the gate-count restrictions wouldn't allow all those fancy speedup techniques). This is compared to the x86 architecture and its derivatives, where you have to spend a million transistors on such functions to make the CPU fast enough.
Since the early ARMs, that architecture has been extended and extended and extended and ... Now it is so multi-extended that I feel I can only scratch the surface of the architecture. It probably, and hopefully, isn't (yet?) as messy as the x86 derivatives, but I am not one to tell. I fear the worst ...
I have a nagging feeling that if it were possible to start completely from scratch, it would be possible to build equally fast processors that didn't take a few billion transistors to realize. (Yes, I know that a fair share of those few billion go into the quite regular CPU caches - but those are part of the speedup expenses, too!) ARM was sort of 'a fresh start', but that was long ago. Multicore 64-bit ARM CPUs are quite different from those meant to replace 8051s ... I haven't had time to look at RISC-V in detail yet; maybe that is another 'new start' - not for replacing the 8051, but aware of gigabyte RAM banks, 64-bit data, and other modern requirements. I am crossing my fingers.
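To put rough numbers on "close to one instruction per cycle": in an idealized k-stage pipeline with no stalls, n instructions complete in k + n - 1 cycles, so throughput approaches one instruction per cycle as n grows. A trivial sketch (the stage count and instruction count here are illustrative):

```c
/* Ideal k-stage pipeline, no stalls: n instructions finish in
 * k + n - 1 cycles, so throughput approaches 1 instruction/cycle. */
#include <stdio.h>

int main(void) {
    int  k = 5;                 /* pipeline stages (illustrative)  */
    long n = 1000;              /* instructions                    */
    long cycles = k + n - 1;    /* 1004 cycles total               */
    printf("IPC = %.3f\n", (double)n / cycles);  /* prints ~0.996  */
    return 0;
}
```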
|
As you can probably tell, my info on this is a bit dated. Good to know though.
obermd wrote: Also, the vast majority of processor instructions, even on a RISC machine, take more than one clock cycle.
Do you mean for just one core? I was under the impression a single core still executes instructions one at a time.
Jeremy Falcon
|
> even on a RISC machine take more than one clock cycle
Which means that either the data is split into two (or more) pieces and the pieces pass through the ALU one at a time, or, if the type of operation requires it, the result from one pass gets placed back at the ALU entry point and the data receives one more pass.
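One concrete case of "two passes through the ALU" is wide arithmetic on a narrow ALU: a 16-bit add on an 8-bit ALU takes two passes, with the carry out of the first pass feeding the second. This mirrors the ADD/ADC instruction pairs of classic 8-bit CPUs. A minimal sketch in C:

```c
/* 16-bit addition done as two 8-bit passes, carry linking them,
 * the way an 8-bit ALU handles operands wider than itself. */
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint16_t x = 0x12FF, y = 0x0001;

    uint16_t lo    = (uint16_t)((x & 0xFF) + (y & 0xFF));  /* pass 1: low bytes        */
    uint8_t  carry = lo > 0xFF;                            /* carry out of pass 1      */
    uint8_t  hi    = (uint8_t)((x >> 8) + (y >> 8) + carry); /* pass 2: high bytes + C */

    uint16_t sum = (uint16_t)(hi << 8) | (lo & 0xFF);
    printf("0x%04X\n", sum);   /* prints 0x1300 */
    return 0;
}
```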
|
I guess that obermd is essentially referring to various effects of pipelining. Each individual instruction may spend 5-6 (or more) clock cycles total in various stages of processing: instruction fetch, instruction decoding, operand fetch, *instruction execution*, storing results, ... The *instruction execution* (e.g. adding) is done in a single clock cycle, but that is only a part of the processing. In parallel with this instruction doing its add (say), the previous instruction is stuffing away its results, the next instruction is having operands fetched, the one two steps behind is being decoded and the one three steps behind is being fetched.
So the CPU may be doing one add (say) per cycle, and in the same cycle one of each of the other operations, on different instructions. For one given instruction, it takes a number of cycles to have all the different processing steps done, though.
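A small sketch of that overlap, assuming an idealized five-stage pipeline (fetch, decode, operand fetch, execute, write-back) with no stalls or hazards; it just prints which instruction sits in which stage on each cycle:

```c
/* Prints which instruction occupies which stage on each clock cycle
 * of an idealized 5-stage pipeline (no stalls or hazards assumed). */
#include <stdio.h>

int main(void) {
    const char *stage[] = { "IF", "ID", "OF", "EX", "WB" };
    int nstages = 5, ninsn = 4;

    for (int cyc = 0; cyc < ninsn + nstages - 1; cyc++) {
        printf("cycle %d:", cyc + 1);
        for (int s = 0; s < nstages; s++) {
            int i = cyc - s;            /* instruction in stage s this cycle */
            if (i >= 0 && i < ninsn)
                printf("  %s=I%d", stage[s], i + 1);
        }
        printf("\n");
    }
    return 0;
}
```

The output shows four instructions taking eight cycles end to end, yet once the pipeline is full, one instruction completes per cycle, which is both halves of the point above.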
|
Hi, nice to meet you.
> I guess that obermd is essentially referring to
You are not too sure, huh?
> clock cycles total
What I understand is that it takes one cycle to pass an instruction through an individual processing stage; since there are several processing stages, it takes more than one cycle to land the operation result in the destination register.
|
obermd wrote: An example of this would be PUSH AX, POP AX, which older processors would dutifully execute and newer processors would simply cut out of the execution stream. I wasn't aware of this optimization. What surprises me, though, is that compilers let anything like this through their optimization stages at the code-generating level. More specifically: that code generators let such things through so frequently that it justifies CPU mechanisms to compensate for the lack of compile-time optimization.
I guess it takes quite a handful of gates to analyze the instruction stream to identify such noop sequences and pick them out of the instruction stream. I was working (at instruction level, not hardware) on a machine whose first version had a few hardware optimizations for special cases, which were removed in later versions: the special cases occurred so rarely that, on a given gate budget (which you always have when designing a CPU), you could gain a much larger general speedup by spending your gates in other parts.
Do you have specific examples of CPUs that eliminate 'noop sequences' like the one you describe? (Preferably with a link to documentation.)
|
Way, way off when talking about modern CPUs.
|
There is a constant DC charge across the whole chip, so the clock is not like the tide going in and out.
The clock is more like the wave on top of a deep body of water.
Start with digital logic and logic gates.
There used to be some software “workbenches” where you could wire up circuits.
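In the spirit of those workbenches, here is about the smallest possible gate-level exercise: a half adder built from primitive gates (sum = a XOR b, carry = a AND b):

```c
/* Gate-level half adder: the first rung of "start with digital logic".
 * sum = a XOR b, carry = a AND b. */
#include <stdio.h>

static int AND(int a, int b) { return a & b; }
static int XOR(int a, int b) { return a ^ b; }

int main(void) {
    for (int a = 0; a <= 1; a++)
        for (int b = 0; b <= 1; b++)
            printf("a=%d b=%d  sum=%d carry=%d\n",
                   a, b, XOR(a, b), AND(a, b));
    return 0;
}
```

Chain two of these (plus an OR for the carries) and you have a full adder; chain eight full adders and you have the 8-bit ALU add path discussed above.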
|
Back in the day, the 1970s-80s, in grad school, we had to implement in software a simulation of the hardware that executes CPU instructions such as division and multiplication. Sort of microcode. RISC was the latest and greatest back then, so our exercise was for such a processor. It was a bear of a project at first, but we experienced compound learning, like compound interest. CPUs can be both complex and simple.
"A little time, a little trouble, your better day"
Badfinger
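For readers curious what such a "sort of microcode" routine looks like: multiplication on a CPU with no multiply instruction is classically done by shift-and-add, one conditional add per multiplier bit. A minimal sketch (not the actual course exercise, of course):

```c
/* Shift-and-add multiply, the way microcode (or software on a CPU
 * with no MUL instruction) typically does it: one add per set bit. */
#include <stdint.h>
#include <stdio.h>

static uint16_t mul8(uint8_t a, uint8_t b) {
    uint16_t product = 0, addend = a;
    while (b) {
        if (b & 1) product += addend;  /* add shifted multiplicand */
        addend <<= 1;                  /* shift left each step     */
        b >>= 1;                       /* next multiplier bit      */
    }
    return product;
}

int main(void) {
    printf("%u\n", mul8(13, 11));  /* prints 143 */
    return 0;
}
```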
|
Circa '87-'88 we used a software package to do the same.
We used it to build classics like the Mark I, ENIAC, etc.
Then you had to write a small assembly program to run on it.
Hats off to the people who wrote real programs on the real hardware with a handful of op codes.
I think it took about 30-40 op codes of self-modifying code to sum an array; see the sketch below.
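For anyone who hasn't seen self-modifying code in action: on a machine with no index registers, the usual trick was for the loop to patch the address field of its own load/add instruction. Here is a toy accumulator machine doing exactly that (the instruction set is made up for illustration, not the Mark I's or ENIAC's):

```c
/* Toy accumulator machine summing a 4-element array via self-modifying
 * code: the loop increments the address operand of its own ADD
 * instruction (memory cell 3) to step through the data. */
#include <stdio.h>

enum { LDA, ADD, STA, INC, DEC, JNZ, HLT };

int main(void) {
    int mem[32] = {
        /* 0*/ LDA, 21,   /* acc = sum                                   */
        /* 2*/ ADD, 24,   /* acc += data[i]; operand (cell 3) is patched */
        /* 4*/ STA, 21,   /* sum = acc                                   */
        /* 6*/ INC, 3,    /* self-modify: bump the ADD's address operand */
        /* 8*/ DEC, 20,   /* count--                                     */
        /*10*/ LDA, 20,   /* acc = count                                 */
        /*12*/ JNZ, 0,    /* loop while count != 0                       */
        /*14*/ HLT, 0,
    };
    mem[20] = 4;                                             /* count */
    mem[21] = 0;                                             /* sum   */
    mem[24] = 10; mem[25] = 20; mem[26] = 30; mem[27] = 40;  /* data  */

    int acc = 0, pc = 0;
    for (;;) {
        int op = mem[pc], a = mem[pc + 1];
        pc += 2;
        switch (op) {
        case LDA: acc  = mem[a]; break;
        case ADD: acc += mem[a]; break;
        case STA: mem[a] = acc;  break;
        case INC: mem[a]++;      break;
        case DEC: mem[a]--;      break;
        case JNZ: if (acc) pc = a; break;
        case HLT: printf("sum = %d\n", mem[21]); return 0;
        }
    }
}
```

It prints `sum = 100`. Note there is no indexed addressing anywhere: the program literally rewrites itself each pass, which is why summing an array cost dozens of op codes on those machines.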
|