|
|
I wish I could help answer, but the best I can do is verify that one CPU cycle does equate to one machine instruction / ASM mnemonic. How registers store values across cycles is beyond me, but an interesting idea to think about.
Jeremy Falcon
|
|
|
|
|
Nope. With modern CPUs it's actually possible for an instruction to be "ignored" because the execution preprocessor, which runs in parallel to the actual instruction execution unit, realizes that the instruction is an effective NOP. An example of this would be PUSH AX, POP AX, which older processors would dutifully execute and newer processors would simply cut out of the execution stream. Also, the vast majority of processor instructions, even on a RISC machine, take more than one clock cycle.
|
|
|
|
|
obermd wrote: Also, the vast majority of processor instructions, even on a RISC machine, take more than one clock cycle.
But then again, thanks to pipelining, over time the average number of instructions executed per cycle is often close to one. Techniques such as 'hyperthreading' could even give you more than one instruction per cycle ... on the average, over time.
One consequence of all these speedup techniques, from pipelining/hyperthreading to speculative execution and extensive operand prefetching, is that the cost of an interrupt goes up and up the fancier the CPUs become. Some of it is (semi)hidden, e.g. after interrupt handling you may have to redo prefetching that was already done before the interrupt. The interrupt cost is more than the time from acknowledgement of the interrupt signal to the handler's return; you also have to count the total delays in the interrupted instruction stream, where several instructions might be affected.
At least the early ARM processors were much closer to a 'direct' clock-cycle-to-instruction relationship; their much tidier instruction set allowed it (and the gate-count restrictions wouldn't allow all those fancy speedup techniques). Compare that to the x86 architecture and its derivatives, where you have to spend a million transistors on such functions to make the CPU fast enough.
Since the early ARMs, that architecture has been extended and extended and extended and ... Now it is so multi-extended that I feel I can only scratch the surface of the architecture. It probably, and hopefully, isn't (yet?) as messy as the x86 derivatives, but I am not one to tell. I fear the worst ...
I have a nagging feeling that if it were possible to start completely from scratch, it would be possible to build equally fast processors that didn't take a few billion transistors to realize. (Yes, I know that a fair share of those few billion go into the quite regular CPU caches - but those are part of the speedup expenses, too!) ARM was sort of 'a fresh start', but that is long ago. Multicore 64-bit ARM CPUs are quite different from those meant to replace 8051s ... I haven't had the time to look at RISC-V in detail yet; maybe that is another 'new start', not for replacing the 8051, but aware of gigabyte RAM banks, 64-bit data and other modern requirements. I am crossing my fingers ...
|
|
|
|
|
As you can probably tell, my info on this is a bit dated. Good to know though.
obermd wrote: Also, the vast majority of processor instructions, even on a RISC machine, take more than one clock cycle.
Do you mean for just one core? I was under the impression a single core still executes instructions one at a time.
Jeremy Falcon
|
|
|
|
|
> even on a RISC machine take more than one clock cycle
Which means that either the data is split into two (or more) pieces and the pieces pass through the ALU one at a time, or, if the type of operation requires it, the result from one pass is placed back at the ALU entry point and the data makes one more pass.
|
|
|
|
|
I guess that obermd is essentially referring to various effects of pipelining. Each individual instruction may spend 5-6 (or more) clock cycles total in various stages of processing: instruction fetch, instruction decoding, operand fetch, *instruction execution*, storing results, ... The *instruction execution* (e.g. adding) is done in a single clock cycle, but that is only a part of the processing. In parallel with this instruction doing its add (say), the previous instruction is stuffing away its results, the next instruction is having operands fetched, the one two steps behind is being decoded and the one three steps behind is being fetched.
So the CPU may be doing one add (say) per cycle, and in the same cycle one of each of the other operations, on different instructions. For one given instruction, it takes a number of cycles to have all the different processing steps done, though.
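To make the overlap concrete, here is a minimal sketch (Python) of a fixed five-stage pipeline - the stage names and instruction strings are just made up for illustration. Each instruction takes five cycles from fetch to write-back, yet once the pipeline is full one instruction completes per cycle:

```python
# Minimal five-stage pipeline sketch (illustrative only; real pipelines
# also have hazards, stalls and forwarding, which are ignored here).
STAGES = ["FETCH", "DECODE", "OPERANDS", "EXECUTE", "WRITEBACK"]
instructions = ["ADD r1,r2", "SUB r3,r4", "MUL r5,r6", "AND r7,r8", "OR r9,r10"]

# In cycle c, instruction i sits in stage (c - i), if that index is valid.
total_cycles = len(instructions) + len(STAGES) - 1
for cycle in range(total_cycles):
    active = []
    for i, instr in enumerate(instructions):
        stage = cycle - i
        if 0 <= stage < len(STAGES):
            active.append(f"{instr}: {STAGES[stage]}")
    print(f"cycle {cycle + 1}:  " + "   ".join(active))

# Five instructions, five cycles of latency each, but only nine cycles in
# total: once the pipeline is full, one instruction completes per cycle.
```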
|
|
|
|
|
Hi nice to meet you.
> I guess that obermd is essentially referring to
You are not too sure huh?
>clock cycles total
What I understand is that sometimes it takes one cycle to pass an instruction through an individual processing stage. There are several processing stages so it takes more than one cycle to land the operation result in the destination register.
|
|
|
|
|
obermd wrote: An example of this would be PUSH AX, POP AX, which older processors would dutifully execute and newer processors would simply cut out of the execution stream.
I wasn't aware of this optimization. What surprises me, though, is that compilers let anything like this through their optimization stages at the code-generating level. More specifically: that code generators let such things through so frequently that it justifies CPU mechanisms to compensate for the lack of compile-time optimization.
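For the compile-time side of it, a peephole pass that drops such a cancelling pair is only a few lines. A rough sketch (Python, over a made-up textual instruction list rather than any real compiler's IR):

```python
# Toy peephole pass: remove "PUSH reg" immediately followed by "POP reg"
# of the same register, which together are an effective no-op.
def remove_push_pop_pairs(instructions):
    out = []
    i = 0
    while i < len(instructions):
        if (i + 1 < len(instructions)
                and instructions[i].startswith("PUSH ")
                and instructions[i + 1].startswith("POP ")
                and instructions[i].split()[1] == instructions[i + 1].split()[1]):
            i += 2          # skip the cancelling pair
            continue
        out.append(instructions[i])
        i += 1
    return out

code = ["MOV AX,5", "PUSH AX", "POP AX", "ADD AX,BX"]
print(remove_push_pop_pairs(code))   # ['MOV AX,5', 'ADD AX,BX']
```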
I guess it takes quite a handful of gates to analyze the instruction stream to identify such noop sequences and pick them out of the instruction stream. I was working (at instruction level, not hardware) on a machine which in its first version had a few hardware optimizations for special cases, which were removed in later versions: the special cases occurred so rarely that on a given gate budget (which you always have when designing a CPU), you could gain a much larger general speedup by spending your gates in other parts.
Do you have specific examples of CPUs that eliminate 'noop sequences' like the one you describe? (Preferably with link to documentation.)
|
|
|
|
|
Way, way off when talking about modern CPUs.
|
|
|
|
|
|
There is a constant DC charge across the whole chip, so the clock is not like the tide going in and out.
The clock is more like the wave on top of a deep body of water.
Start with digital logic and logic gates.
There used to be some software “work benches” where you could wire up circuits.
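In the same spirit as those workbenches, you can 'wire up' gates in a few lines of code. A small sketch (Python; the half adder is the standard textbook wiring, not taken from any particular package):

```python
# Basic gates as functions on 0/1 values.
def AND(a, b): return a & b
def OR(a, b):  return a | b
def XOR(a, b): return a ^ b
def NOT(a):    return 1 - a

# A half adder wired up from the gates above: sum = a XOR b, carry = a AND b.
def half_adder(a, b):
    return XOR(a, b), AND(a, b)

for a in (0, 1):
    for b in (0, 1):
        s, c = half_adder(a, b)
        print(f"{a} + {b} -> sum={s} carry={c}")
```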
|
|
|
|
|
Back in the day, the 1970s-80s, in grad school we had to implement in software a simulation of the hardware that executes CPU instructions such as division and multiplication. Sort of microcode. RISC was the latest and greatest back then, so our exercise was for such a processor. It was a bear of a project at first, but we experienced compound learning, like compound interest. CPUs can be both complex and simple.
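For anyone curious what such an exercise looks like, here is a rough sketch (Python) of multiplication done the way simple hardware or microcode does it, as shift-and-add over the multiplier bits - not the actual code from that course, just the general idea:

```python
# Shift-and-add multiplication, the way a simple ALU/microcode routine
# might do it: inspect the multiplier one bit at a time and add the
# shifted multiplicand into an accumulator whenever the bit is set.
def multiply(a, b, width=8):
    acc = 0
    for bit in range(width):
        if (b >> bit) & 1:
            acc += a << bit
    return acc & ((1 << (2 * width)) - 1)   # keep the double-width result

print(multiply(13, 11))   # 143
```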
"A little time, a little trouble, your better day"
Badfinger
|
|
|
|
|
Circa 87-88 we used a software package to do the same.
We used it to build classics like the Mark I, Eniac, etc.
Then you had to write a small assembly program to run on it.
Hats off to some of the people that made real programs on the real hardware with a handful of op codes.
I think it took about 30-40 op codes of self modifying code to sum an array.
|
|
|
|
|
Yeah, we also had to write a simple compiler to create the homemade assembly. The course required 2 semesters. Loved it though. Learned so much.
"A little time, a little trouble, your better day"
Badfinger
|
|
|
|
|
Reminds me of a simulator I was in touch with (as the counsellor for a student project using it - the students did all the work), for the 8051. So it was not a general simulator but specific to this one controller. This had the great advantage that the simulator knew all the internal workings of the chip and could provide a graphical display of how a pulse flowed from one internal unit to the next as we single-stepped through the clock cycles.
The simulator was so well made that even when you single stepped by clock pulses, it managed to generate 'real' output on the PC's COM port, where another PC was hooked up for displaying it (and also provide character input). 8051 is a simple processor, but an actual, industrial level one that I believe is still in use (at least it was five years ago). It is not a toy, not historical (well, that may be argued), not experimental - that gives the experience some real value.
I am surprised that you were able to obtain the schematics for the Mark I and Eniac, though! The Mark I wasn't an electronic computer, but built from relays. So I guess you had to re-interpret the relay signals and mechanical equipment controlled by the relays as if they were logic gates ... Must have been fun! Out of curiosity: with the Eniac, did you model it valve by valve (there were 18,000+ of them!), or did you see them as groups performing a function (such as a flip-flop) and model that function as such, independent of the original valve realization? (Since both the Mark I and Eniac were decimal machines, not binary ones, I guess you couldn't consider adders etc. as logical units, simulating them by binary addition!)
Do you happen to have any links to this software package (I suspect that would be to a technical museum!), and to the Mark I and Eniac descriptions you based your implementation on?
|
|
|
|
|
I tried googling it but could not find it.
It was fairly simple.
I remember that you declared registers with bit width, something like:
ACC<1:12>
PC<1:10>
RAM<1:12>[0:1024]
And then you would define the op codes somehow. And how they affected the declared registers and memory.
I think the Mark I or II had a few op codes:
1. Load from memory address into ACC
2. Store inverse/negative from ACC to memory
3. Add from memory into ACC
4. Skip next address if ACC is zero
5. JMP / set PC
6. HALT
I do remember our final project was to create a single instruction multiple data grid computer.
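Just for the flavour of it, here is a rough sketch (Python) of a machine with roughly those op codes summing a four-element array by bumping the address field of its own ADD instruction - the 'self modifying code' part. The number encoding (opcode * 1000 + address), the exact 'store negative' semantics and the memory layout are my own assumptions for illustration, not the original package:

```python
# A tiny accumulator machine in the spirit of the op codes listed above.
# Instructions are stored as plain numbers, opcode * 1000 + address, so a
# program can modify its own address fields with ordinary arithmetic.
LOAD, STORE_NEG, ADD, SKIP_IF_ZERO, JMP, HALT = 1, 2, 3, 4, 5, 6

def run(mem):
    acc, pc = 0, 0
    while True:
        op, addr = divmod(mem[pc], 1000)
        pc += 1
        if op == LOAD:
            acc = mem[addr]
        elif op == STORE_NEG:
            mem[addr] = -acc          # "store negative/inverse of ACC"
        elif op == ADD:
            acc += mem[addr]
        elif op == SKIP_IF_ZERO:
            if acc == 0:
                pc += 1
        elif op == JMP:
            pc = addr
        elif op == HALT:
            return

def I(op, addr=0):
    return op * 1000 + addr

program = [
    I(LOAD, 20),       #  0: acc = running sum
    I(ADD, 30),        #  1: acc += next array element (address field bumped below)
    I(STORE_NEG, 24),  #  2: temp = -acc ...
    I(LOAD, 24),       #  3: ... load it back ...
    I(STORE_NEG, 20),  #  4: ... and negate again: a plain store of the new sum
    I(LOAD, 1),        #  5: fetch the ADD instruction word itself
    I(ADD, 22),        #  6: +1 bumps its address field to the next element
    I(STORE_NEG, 24),  #  7: same double-negation trick
    I(LOAD, 24),       #  8:
    I(STORE_NEG, 1),   #  9: the ADD at address 1 now points one element further
    I(LOAD, 21),       # 10: acc = counter
    I(ADD, 23),        # 11: counter - 1
    I(STORE_NEG, 24),  # 12:
    I(LOAD, 24),       # 13:
    I(STORE_NEG, 21),  # 14: counter = counter - 1 (acc still holds -(counter-1))
    I(SKIP_IF_ZERO),   # 15: when the counter hits zero, skip the jump back
    I(JMP, 0),         # 16: otherwise loop
    I(HALT),           # 17:
]

mem = [0] * 40
mem[:len(program)] = program
mem[20] = 0                   # running sum
mem[21] = 4                   # number of elements
mem[22] = 1                   # constant 1 (bumps an address field)
mem[23] = -1                  # constant -1
mem[30:34] = [5, 7, 11, 2]    # the array to sum

run(mem)
print(mem[20])                # 25
```

Even in a sketch like this, a plain store needs two instructions (store the negation, reload, store the negation again), which is a taste of why a real program on such a machine quickly ran to dozens of op codes.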
|
|
|
|
|
In my student days - this was in 1979 - one of the lab exercises was with an AMD 2901 Evaluation Kit. The 2901 was a 4-bit "bit slice" ALU, with carry in and out, so you could hook two of them together for an 8-bit machine, four for a 16-bit or 8 for a 32-bit. We had only a single one.
With the ALU came a memory for 64 words of 16 bit microcode: Flip 16 switches, press Deposit, flip again, press Deposit ... 64 times to fill the entire microcode memory. We hooked up each of the 16 bits to the control lines for the ALU: Load accumulator from bus, dump accumulator to bus ... actually, today I have only a vague memory of what the control signals were. The 'sequencer' was a separate chip that selected one word from microcode RAM, transferring it to the ALU control inputs. It had a microcode address counter; one of the control signals incremented this counter.
We did succeed in microcoding an instruction for reading four switches (the "bus") as data, adding another 4 bit value, and display the result on 4 LEDs (plus one for the carry line).
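The bit-slice idea itself is easy to mimic in software. A hedged sketch (Python) of a 4-bit slice with carry in and carry out, cascaded for an 8-bit add - the operation set here is a simplification, not the real 2901 function encoding:

```python
# A 4-bit ALU 'slice' with carry in and carry out; cascading slices widens
# the datapath, the way 2901s were chained.
def slice4(op, a, b, carry_in=0):
    a &= 0xF
    b &= 0xF
    if op == "ADD":
        total = a + b + carry_in
        return total & 0xF, total >> 4        # result, carry out
    if op == "AND":
        return a & b, 0
    if op == "OR":
        return a | b, 0
    raise ValueError(op)

def add8(a, b):
    lo, carry = slice4("ADD", a & 0xF, b & 0xF)                        # low slice
    hi, carry = slice4("ADD", (a >> 4) & 0xF, (b >> 4) & 0xF, carry)   # high slice
    return (hi << 4) | lo, carry

result, carry = add8(0x3C, 0x2B)
print(hex(result), carry)   # 0x67 0  ->  60 + 43 = 103
```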
This was an exceptionally valuable lab exercise to learn what (an extremely simplified) CPU is like in its very basic mechanisms. If 2901 Evaluation Kits were still on the market, I would recommend them to anyone who wants a true hands-on experience with a CPU. (If you happen to find one on eBay: be prepared to do some thorough studying of the ALU before trying to microcode it; microcoding is not to be done on intuition!)
Of course: anything like the 2901 kit can teach you only the basic techniques of simple, unsophisticated computers, the way they were built in the old days. I see other people refer to 'modern' CPUs as if they have little to do with what an evaluation kit can teach you - but you can immediately forget jumping directly onto a 'modern' CPU. It is so complex, and contains so many fancy tricks for speeding it up, that you will be completely overwhelmed. Better to start with something that you have a chance to really understand, and then add the fancy techniques one by one. If you get as far as to thoroughly understand even a third of them, you will be qualified as Chief Engineer at AMD or Intel. Or, to phrase it differently: don't expect to understand the fancy techniques. You may get as far as to understand what they want to achieve, but don't expect to understand how.
|
|
|
|
|
I remember bit-slice processing and the 2901 set. We also did some similar exercises. I agree, modern CPUs are much more complex, but the AMD 2901 exercises really help one to understand the basics that almost all CPUs have in common. I also agree, microcoding is not intuitive, and that was why we were getting a lesson in it.
WOW blast from past. Thanx for sharing that.
"A little time, a little trouble, your better day"
Badfinger
|
|
|
|
|
All your assumptions are both correct and over-simplifications of what modern CPUs are and do.
Maybe do some reading on von Neumann architecture [^], and what a Turing machine is [^].
«The mind is not a vessel to be filled but a fire to be kindled» Plutarch
|
|
|
|
|
|
Doesn't exactly answer your question, but Ben Eater has a series of videos where he builds an 8-bit computer on a breadboard, starting with a chip.
Ben Eater Build an 8-bit computer from scratch
Gives a lot of insight into the workings, there are a couple of specific videos below, but the whole thing is fascinating.
The first thing is that the CPU is powered - the microchip has a ground and a DC positive pin. If you think about it in terms of electronics, what is a "0" or a "1"? 0 is easy, it's ground voltage, but 1 needs to be a voltage close to a reference - which the +ve pin provides. This DC reference voltage also provides the power to keep the values in the registers. There isn't any flushing - the clock (which is just a square wave) just gets the CPU to cycle.
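As a tiny illustration of a register holding its value while the clock ticks, here is a rough sketch (Python, purely illustrative) of an edge-triggered register: it only captures new data on a rising clock edge and otherwise holds what it has, which is what the steady supply voltage lets the real circuit do:

```python
# Edge-triggered register sketch: the stored value only changes on a
# rising clock edge; between edges it simply holds its state.
class Register:
    def __init__(self):
        self.value = 0
        self._last_clock = 0

    def tick(self, clock, data_in):
        if clock == 1 and self._last_clock == 0:   # rising edge
            self.value = data_in
        self._last_clock = clock
        return self.value

reg = Register()
# Square-wave clock: 0,1,0,1,...  Data changes, but is only captured on edges.
for clock, data in [(0, 5), (1, 5), (0, 9), (1, 9), (0, 3), (0, 7)]:
    print(f"clock={clock} data_in={data} stored={reg.tick(clock, data)}")
```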
“Hello, world” from scratch on a 6502 — Part 1 - YouTube
This video tells you how CPUs actually execute machine code
How do CPUs read machine code? — 6502 part 2 - YouTube
Hope this helps - the videos are right on the edge of electronics / programming. Bill Woodruff's suggestion about the von Neumann architecture and Turing machines is excellent; it explains how we got to 8-bit chips.
|
|
|
|
|
Dolly Parton is 77 today.
I'm not really a fan of her music, but I adore her just the same.
Edit: I just found out:
She's getting into the Rock & Roll Hall of Fame.
Her reaction was "I guess I better actually make a Rock & Roll album". And she is.
Ha!
To err is human. Fortune favors the monsters.
|
|
|
|
|
She is a legend for sure. Love the music she made with Linda Ronstadt and Emmylou Harris.
"the debugger doesn't tell me anything because this code compiles just fine" - random QA comment
"Facebook is where you tell lies to your friends. Twitter is where you tell the truth to strangers." - chriselst
"I don't drink any more... then again, I don't drink any less." - Mike Mullikins uncle
|
|
|
|
|
She's an example of how celebrities should be.
Jeremy Falcon
|
|
|
|
|