[v2] MIPS: Separate two consecutive loads in memset.S

Message ID 20101110134815.GA28312@metis
State Accepted
Delegated to: Ralf Baechle

Commit Message

Tony Wu Nov. 10, 2010, 1:48 p.m.
partial_fixup is used in a noreorder block, so the assembler does not insert a nop to fill the load delay slot.

Separating the two consecutive loads can save one cycle on processors with
a GPR interlock and fixes a load-use hazard on processors that require a load delay slot.

Also do so for fwd_fixup.

Signed-off-by: Tony Wu <tung7970@gmail.com>
---
 arch/mips/lib/memset.S |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

Comments

Ralf Baechle Nov. 10, 2010, 2:09 p.m. | #1
On Wed, Nov 10, 2010 at 09:48:15PM +0800, Tony Wu wrote:

This new version applies cleanly, so applied.

Only R2000/R3000 class processors lack the load-use interlock, and even
some of those got it retrofitted.  With R2000/R3000 being fairly
uncommon these days, the impact of this bug should be minor, but the last
R3000 DECstation user on this list may be interested ;-)

Thanks a lot!

  Ralf

Maciej W. Rozycki Nov. 10, 2010, 11:30 p.m. | #2
On Wed, 10 Nov 2010, Ralf Baechle wrote:

> Only R2000/R3000 class processors lack the load-use interlock, and even
> some of those got it retrofitted.  With R2000/R3000 being fairly
> uncommon these days, the impact of this bug should be minor, but the last
> R3000 DECstation user on this list may be interested ;-)

 Good catch, Tony, thanks!

  Maciej

Patch

diff --git a/arch/mips/lib/memset.S b/arch/mips/lib/memset.S
index 77dc3b2..606c8a9 100644
--- a/arch/mips/lib/memset.S
+++ b/arch/mips/lib/memset.S
@@ -161,16 +161,16 @@  FEXPORT(__bzero)
 
 .Lfwd_fixup:
 	PTR_L		t0, TI_TASK($28)
-	LONG_L		t0, THREAD_BUADDR(t0)
 	andi		a2, 0x3f
+	LONG_L		t0, THREAD_BUADDR(t0)
 	LONG_ADDU	a2, t1
 	jr		ra
 	 LONG_SUBU	a2, t0
 
 .Lpartial_fixup:
 	PTR_L		t0, TI_TASK($28)
-	LONG_L		t0, THREAD_BUADDR(t0)
 	andi		a2, LONGMASK
+	LONG_L		t0, THREAD_BUADDR(t0)
 	LONG_ADDU	a2, t1
 	jr		ra
 	 LONG_SUBU	a2, t0