Is there anything that can be done about the performance loss when executing from off-chip memory?
Most software spends roughly 90% of its execution time in a small number of small inner loops. These loops can be copied from off-chip memory into on-chip RAM after reset and then executed from there. This way the performance loss may be almost fully compensated for. In addition, some derivatives have bus controllers that implement countermeasures against the performance loss, e.g. a write buffer and burst mode.
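
The copy step after reset could look roughly like the sketch below. It is only an illustration, not vendor startup code: the function name copy_to_onchip_ram and the linker symbols mentioned in the comment are hypothetical, and how code is placed in a separate section with a load address in off-chip memory and a run address in on-chip RAM depends on the toolchain and linker script.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Copy a code image from its load address (off-chip memory) to its run
 * address (on-chip RAM), one 32-bit word at a time.
 *
 * run_start  : first word of the destination in on-chip RAM
 * load_start : first word of the image in off-chip memory
 * load_end   : one past the last word of the image
 *
 * Returns the number of bytes copied.
 */
size_t copy_to_onchip_ram(uint32_t *run_start,
                          const uint32_t *load_start,
                          const uint32_t *load_end)
{
    uint32_t *dst = run_start;
    const uint32_t *src = load_start;

    while (src < load_end) {
        *dst++ = *src++;
    }
    return (size_t)(load_end - load_start) * sizeof(uint32_t);
}

/*
 * Hypothetical use in startup code, before the inner loops are first called.
 * The three symbols are assumed to be defined by the linker script; their
 * names are made up for this example.
 *
 *   extern uint32_t       __ramcode_run_start[];
 *   extern const uint32_t __ramcode_load_start[];
 *   extern const uint32_t __ramcode_load_end[];
 *
 *   copy_to_onchip_ram(__ramcode_run_start,
 *                      __ramcode_load_start,
 *                      __ramcode_load_end);
 */
```

The routine itself still runs from off-chip memory, but it runs only once, so its cost is negligible compared with the time saved in the loops. The functions being copied must be linked for their run address in on-chip RAM, otherwise absolute jumps inside them would still point to off-chip memory.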