| Instruction frequency is defined as the maximum number of instructions per second when every instruction depends on the previous one; the average instruction frequency is higher. |
The cycle frequency, however, is much lower, and it limits the jump
frequency. The jump frequency is not important: in most cases it will be
possible to unroll loops, and so to obtain an even higher jump frequency of
about n times the cycle frequency.
| The main trick is to change the design style so that it does not use registers and does not read or write data in a register file for each instruction. This allows much higher bandwidth. However, the design style stays compatible with current processors by adding a register file for long-term storage. |
Changing the design style allows a higher frequency because the processors do not need to store data in, or read data from, a register file. Instead they read the result directly from a processor.
Using dynamic logic results
in significantly higher speed because only n-channel transistors are used to
conduct current, and they are stronger and faster than p-FETs. While one
processor is precharging, the other processors are used.
It is the same idea as using several memory banks to obtain high speed from
dynamic memories.
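The interleaving of precharge and evaluation can be illustrated with a small timing model. This is only a sketch under assumed conditions, not the actual design: the function name `issue_times`, the round-robin assignment, and the time units are all hypothetical. Each operation is assumed to depend on the previous one, so it can start only when the previous result is available and its own unit has finished precharging.

```python
def issue_times(num_units, evaluate, precharge, count):
    """Return the start time of each of `count` dependent operations.

    Hypothetical model: operations are handed out round-robin. A unit
    evaluates for `evaluate` time units and must then precharge for
    `precharge` time units before it can be used again.
    """
    ready = [0] * num_units   # time at which each unit is precharged again
    starts = []
    now = 0                   # time at which the previous result is available
    for i in range(count):
        unit = i % num_units
        start = max(now, ready[unit])   # wait for operand and for precharge
        starts.append(start)
        now = start + evaluate
        ready[unit] = now + precharge
    return starts


if __name__ == "__main__":
    # One unit: every operation waits for the precharge.
    print(issue_times(1, evaluate=1, precharge=3, count=4))
    # Four units: the precharge is hidden behind the other units.
    print(issue_times(4, evaluate=1, precharge=3, count=6))
```

In this model the precharge time disappears from the critical path once there are enough units to cover it, which is the same effect as interleaving memory banks.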
| Only a few transistor delays are needed between instructions, and the processor does not need a clock. |
Several processors are used, and a result can be taken directly from a processor. Dynamic logic makes the forward delay in the processors very short, and with several processors some are precharged while others work.
Using more processors also improves performance, because more
processors work together, which gives a higher speed when there are no
dependencies. The idea works for both synchronous and asynchronous design
styles, but asynchronous logic makes it easier to obtain out-of-order
execution. Inserting waits is not needed.
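The difference between a fully dependent instruction stream and an independent one can be shown with a simple dataflow timing model. It is a sketch only: `finish_times`, the fixed `delay` per instruction, and the rule that each instruction goes to whichever processor is free first are assumptions made for illustration, and precharge time is ignored. An instruction starts as soon as its operands are ready and a processor is free, so no explicit waits are inserted.

```python
import heapq


def finish_times(deps, delay, num_units):
    """Return the finish time of every instruction.

    deps[i] lists the earlier instructions whose results instruction i
    needs. Hypothetical model: each instruction takes `delay` time units
    and is given to the processor that becomes free first.
    """
    free = [0] * num_units          # times at which processors become free
    heapq.heapify(free)
    done = []
    for needed in deps:
        operands_ready = max((done[j] for j in needed), default=0)
        unit_free = heapq.heappop(free)
        finish = max(operands_ready, unit_free) + delay
        heapq.heappush(free, finish)
        done.append(finish)
    return done


if __name__ == "__main__":
    chain = [[], [0], [1], [2]]          # every instruction needs the previous
    independent = [[], [], [], []]       # no dependencies at all
    print(finish_times(chain, delay=1, num_units=4))
    print(finish_times(independent, delay=1, num_units=4))
```

The dependent chain finishes one instruction per delay no matter how many processors there are, while the independent instructions finish together.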
| Using an asynchronous design style solves the problems. No clock is needed, out-of-order execution is obtained automatically, and there is no need for advanced logic to generate waits. The asynchronous design style only uses the time needed. |
The main problem with using asynchronous logic was to make it work with
register-based instruction sets. This needed a common register file for all
processors, which was not easy to make completely asynchronous. In fact, that
was not needed either: the register file is not asynchronous, but self-timed.
| To make it compatible with register-based processors, the instructions are converted by an instruction compiler before the code enters the cache. |
Because a self-timed register file is used, it is important not to store data
in the same cell at the same time, which would result in an internal
short circuit. Since several processors work at the same time, a space is
needed between writes. This is guaranteed by the preprocessing into the
cache, and does not limit the speed.
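The spacing rule for writes can be sketched as a small preprocessing pass. This is an illustration only, not the preprocessor described here: the function `space_writes`, the notion of a numbered time slot, and the `min_gap` parameter are hypothetical, and the sketch simply pushes a conflicting write later, which is just one conceivable way of enforcing the gap.

```python
def space_writes(writes, min_gap):
    """Separate writes to the same register by at least `min_gap` slots.

    writes: list of (time_slot, register) pairs in program order.
    Returns a new list in which a write is moved to a later slot whenever
    it would otherwise land too close to the previous write to the same
    register. Writes to different registers are left alone.
    """
    last_write = {}                 # register -> slot of its latest write
    spaced = []
    for slot, register in writes:
        if register in last_write:
            slot = max(slot, last_write[register] + min_gap)
        last_write[register] = slot
        spaced.append((slot, register))
    return spaced


if __name__ == "__main__":
    schedule = [(0, "r1"), (1, "r1"), (1, "r2"), (2, "r1")]
    print(space_writes(schedule, min_gap=2))
```

Because the check runs before the code enters the cache, none of this work is on the execution path.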
The next problem is the order of reading from the self-timed register file.
This is handled by the preprocessor too. It guarantees that all data is read
from the outputs of the processors or from a pipeline before it is read from
registers. This guarantees the order of execution. And because it is
done before the cache, there is no delay for organizing.
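The choice of operand source can be sketched as a pass over the instruction stream. It is a minimal illustration under assumptions that are not from the text: the instruction format `(dest, sources)`, the function `resolve_sources`, and the `window` parameter (how many instructions back a result is assumed to be still available at a processor output or in a pipeline) are all hypothetical.

```python
def resolve_sources(program, window):
    """Tag every source operand with where it should be read from.

    program: list of (dest_register, [source_registers]) in program order.
    A source is tagged ("forward", i) when instruction i produced it at most
    `window` instructions earlier, so the value can be taken from a
    processor output or pipeline. Otherwise it is tagged ("regfile", name).
    """
    last_writer = {}                # register -> index of latest producer
    resolved = []
    for index, (dest, sources) in enumerate(program):
        tags = []
        for register in sources:
            producer = last_writer.get(register)
            if producer is not None and index - producer <= window:
                tags.append(("forward", producer))
            else:
                tags.append(("regfile", register))
        resolved.append(tags)
        last_writer[dest] = index
    return resolved


if __name__ == "__main__":
    program = [
        ("r1", ["r2", "r3"]),
        ("r4", ["r1", "r2"]),
        ("r5", ["r6"]),
        ("r7", ["r1", "r4"]),
    ]
    for tags in resolve_sources(program, window=2):
        print(tags)
```

Recent results are always taken from the producing processor, and the register file is read only for values that are old enough to have been written there.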
| It was not possible to make a fast register file for several processors, and recompiling the instructions was developed to solve this. |
Instruction preprocessing is controlled by software. The idea of implementing it in software is to reduce the cost of development: no hardware development is needed, and software is less expensive. Software is easy to change or update, even after the computers are sold, which also reduces the cost of moving to new versions. Much more advanced decoding is possible in software, and advanced processor instructions can be emulated. The software can also be changed to work with a large number of processors, or made updatable by the user. It can be more complex, and may schedule much better than any hardware is able to. The processor may even be compatible with several processors, since the software can make the instruction set more complex and change it to match the processor being emulated.
In this case, the scheduling is very complex because it needs to