The question "what is the problem with the R10K (or R12K) in SGI O2 and Indigo2 workstations?" has been asked so often that I decided to put this page together. Here is the information I have collected over time by searching Google (why other people haven't done that for themselves is beyond me). One good explanation of the R10000 speculative execution issue is in a message from Jeff Smith to the NetBSD port-sgimips mailing list.
In short, the problem may be summarized as follows: the R10K may start executing load/store instructions that would never be reached by normal program flow (this can happen after a mispredicted branch). In a cache-coherent system the processor simply cancels the instruction later on, and no one notices anything. In non-cache-coherent systems like the O2 and I2, the effect of such loads or stores on the cache cannot be cancelled. That may result in a cache line being marked as dirty. If DMA is going on to the same location during that time, the DMA data will be lost when the line is written back. Speculative loads are not as much of a problem: they can be taken care of by an extra cache invalidation after each DMA.
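To make the failure sequence concrete, here is a toy model in C of one memory block and the single cache line that shadows it. It is only a sketch of the scenario described above, not R10K code: the `toy_*` names, the struct layout and the 32-byte line size are all made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 32 /* illustrative value, not a real R10K cache line size */

/* Toy model: one block of RAM and the one cache line that may shadow it. */
struct toy_system {
    uint8_t ram[LINE_SIZE];  /* backing memory, also the DMA target */
    uint8_t line[LINE_SIZE]; /* cached copy of the same block */
    bool valid;
    bool dirty;
};

void toy_init(struct toy_system *s)
{
    memset(s, 0, sizeof *s);
}

/* A speculative access pulls the block into the cache. A speculative store
 * additionally marks the line dirty, although the store never commits. */
void toy_speculative_access(struct toy_system *s, bool is_store)
{
    if (!s->valid) {
        memcpy(s->line, s->ram, LINE_SIZE);
        s->valid = true;
    }
    if (is_store)
        s->dirty = true;
}

/* Device-to-memory DMA writes RAM directly; without coherency the cache
 * never hears about it. */
void toy_dma_write(struct toy_system *s, uint8_t value)
{
    memset(s->ram, value, LINE_SIZE);
}

/* Eviction: a dirty line is written back over whatever is in RAM. */
void toy_evict(struct toy_system *s)
{
    if (s->valid && s->dirty)
        memcpy(s->ram, s->line, LINE_SIZE);
    s->valid = false;
    s->dirty = false;
}

/* Invalidate: drop the line without writing it back. */
void toy_invalidate(struct toy_system *s)
{
    s->valid = false;
    s->dirty = false;
}

/* Ordinary CPU load through the cache. */
uint8_t toy_load(struct toy_system *s, unsigned offset)
{
    assert(offset < LINE_SIZE);
    if (!s->valid) {
        memcpy(s->line, s->ram, LINE_SIZE);
        s->valid = true;
    }
    return s->line[offset];
}
```

In this model, a speculative store followed by `toy_dma_write()` and `toy_evict()` leaves the old contents in `ram`, which is the lost-data case. A speculative load followed by DMA makes `toy_load()` return stale data until `toy_invalidate()` is called, which is the cheaper case handled by invalidating after each DMA.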
The following is how I see a workaround for the O2 myself (it will not work on the I2, because it relies on special hardware features present only in the O2).
There is a feature in the O2 that causes any cached-coherent stores to all but the first 8M of KSEG0 addresses to fail. That allows us to configure KSEG0 as cached coherent, put our most important kernel code into the first 8M of RAM, and use HIGHMEM for the rest. Any memory that is the target of a DMA transfer will be removed from the TLB, and is thus inaccessible while the DMA is in progress (we still have to make sure it isn't brought back into the TLB by an accidental reference).
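The two rules of this scheme can be sketched in C as follows. This is my own illustration and not code from any kernel: the function and type names are hypothetical, and the KSEG0 constants assume the usual 32-bit MIPS layout (KSEG0 at 0x80000000, 512M long). A real implementation would flush the TLB entry where the comment says so.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KSEG0_BASE  0x80000000u /* start of KSEG0 on 32-bit MIPS */
#define KSEG0_SIZE  0x20000000u /* KSEG0 spans 512M */
#define SAFE_WINDOW (8u << 20)  /* the first 8M described in the text */

/* Is this virtual address inside KSEG0 at all? Unsigned wraparound makes
 * addresses below KSEG0_BASE compare as out of range. */
bool in_kseg0(uint32_t vaddr)
{
    return vaddr - KSEG0_BASE < KSEG0_SIZE;
}

/* Would a cached-coherent store through KSEG0 be allowed to succeed?
 * Only the first 8M qualifies; the hardware makes the rest fail. */
bool kseg0_store_succeeds(uint32_t vaddr)
{
    assert(in_kseg0(vaddr));
    return vaddr - KSEG0_BASE < SAFE_WINDOW;
}

/* Hypothetical per-page state for memory above the first 8M (HIGHMEM). */
enum page_state { PAGE_MAPPED, PAGE_DMA_BUSY };

struct toy_page {
    enum page_state state;
};

/* Before starting DMA: mark the page busy. A real kernel would also
 * remove the page's TLB entry at this point. */
void dma_begin(struct toy_page *p)
{
    p->state = PAGE_DMA_BUSY;
}

/* After DMA completes the page may be mapped again. */
void dma_end(struct toy_page *p)
{
    p->state = PAGE_MAPPED;
}

/* The TLB refill path must refuse pages with DMA in flight, so that an
 * accidental reference cannot bring the mapping back. */
bool may_fill_tlb(const struct toy_page *p)
{
    return p->state != PAGE_DMA_BUSY;
}
```

The point of `may_fill_tlb()` is the parenthetical above: removing the mapping once is not enough, the refill path also has to keep it out until `dma_end()`.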
Authoright © Total Knowledge: 1999-2013