Skip to content

Files

This branch is 60837 commits behind RobertCNelson/u-boot:master.

doc

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
Sep 2, 2012
Apr 16, 2012
Sep 2, 2012
Aug 9, 2010
Oct 18, 2008
Jul 28, 2006
Aug 1, 2005
Apr 13, 2010
Oct 18, 2010
Sep 1, 2008
Mar 27, 2012
Oct 18, 2008
May 21, 2010
Apr 13, 2010
Oct 21, 2010
Oct 18, 2008
Jun 27, 2003
Oct 21, 2011
Oct 21, 2011
Sep 15, 2009
Dec 3, 2005
Oct 22, 2010
Mar 27, 2012
Sep 2, 2012
Apr 15, 2004
Apr 22, 2012
Oct 25, 2011
May 19, 2011
Mar 27, 2012
Sep 11, 2011
Mar 13, 2012
Sep 1, 2012
Aug 6, 2008
Apr 21, 2010
Oct 27, 2009
Feb 5, 2008
Jul 26, 2009
Jan 18, 2011
Feb 14, 2012
Feb 23, 2004
Mar 27, 2012
Sep 23, 2011
Dec 24, 2011
Nov 3, 2011
Jul 12, 2010
Jul 23, 2009
May 3, 2010
Jul 11, 2012
Oct 26, 2011
Mar 27, 2012
Jul 11, 2011
Oct 18, 2008
Mar 11, 2010
Apr 5, 2003
Oct 8, 2008
Sep 4, 2011
Feb 12, 2012
Dec 7, 2008
Oct 27, 2009
Jul 10, 2012
Oct 18, 2008
Sep 1, 2012
Jul 28, 2011
Oct 18, 2008
Feb 12, 2012
Apr 24, 2008
Apr 21, 2010
Nov 2, 2002
Aug 10, 2007
Aug 5, 2005
Jul 22, 2012
Mar 27, 2012
Jul 25, 2005
Jul 20, 2012
Sep 1, 2012
May 18, 2012
May 9, 2008
Jan 9, 2005
Feb 26, 2004
Feb 12, 2012
Nov 4, 2011
Sep 23, 2010
Jun 21, 2012
Apr 3, 2010
Apr 13, 2008
Apr 5, 2003
Dec 12, 2011
Oct 18, 2008
Mar 28, 2008
Oct 18, 2008
Oct 18, 2008
Jul 7, 2012
Apr 25, 2012
Oct 21, 2011
Jul 7, 2012
Oct 19, 2010
Sep 11, 2011
Jul 26, 2011
Oct 21, 2011
Apr 28, 2011
Aug 9, 2012
Dec 5, 2011
Sep 1, 2012
Jul 7, 2012
Dec 23, 2011
AMCC suggested to set the PMU bit to 0 for best performace on the
PPC440 DDR controller. The 440er common DDR setup files (sdram.c &
spd_sdram.c) are changed accordingly. So all 440er boards using
these setup routines will automatically receive this performance
increase.

Please see below some benchmarks done by AMCC to demonstrate this
performance changes:


----------------------------------------
SDRAM0_CFG0[PMU] = 1 (U-boot default for Bamboo, Yosemite and Yellowstone)
----------------------------------------
Stream benchmark results
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 2000000, Offset = 0
Total memory required = 45.8 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 112345 microseconds.
   (= 112345 clock ticks)
Increase the size of the arrays if this shows that you are not getting
at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the precision of your system
timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:         256.7683       0.1248       0.1246       0.1250
Scale:        246.0157       0.1302       0.1301       0.1302
Add:          255.0316       0.1883       0.1882       0.1885
Triad:        253.1245       0.1897       0.1896       0.1899


TTCP Benchmark Results
ttcp-t: socket
ttcp-t: connect
ttcp-t: buflen=8192, nbuf=2048, align=16384/0, port=5000  tcp  ->
localhost
ttcp-t: 16777216 bytes in 0.28 real seconds = 454.29 Mbit/sec +++
ttcp-t: 2048 I/O calls, msec/call = 0.14, calls/sec = 7268.57
ttcp-t: 0.0user 0.1sys 0:00real 60% 0i+0d 0maxrss 0+2pf 3+1506csw

----------------------------------------
SDRAM0_CFG0[PMU] = 0 (Suggested modification)
Setting PMU = 0 provides a noticeable performance improvement *2% to
5% improvement in memory performance.
*Improves the Mbit/sec for TTCP benchmark by almost 76%.
----------------------------------------
Stream benchmark results
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 2000000, Offset = 0
Total memory required = 45.8 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 120066 microseconds.
   (= 120066 clock ticks)
Increase the size of the arrays if this shows that you are not getting
at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the precision of your system
timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:         262.5167       0.1221       0.1219       0.1223
Scale:        258.4856       0.1238       0.1238       0.1240
Add:          262.5404       0.1829       0.1828       0.1831
Triad:        266.8594       0.1800       0.1799       0.1802

TTCP Benchmark Results
ttcp-t: socket
ttcp-t: connect
ttcp-t: buflen=8192, nbuf=2048, align=16384/0, port=5000  tcp  ->
localhost
ttcp-t: 16777216 bytes in 0.16 real seconds = 804.06 Mbit/sec +++
ttcp-t: 2048 I/O calls, msec/call = 0.08, calls/sec = 12864.89
ttcp-t: 0.0user 0.0sys 0:00real 46% 0i+0d 0maxrss 0+2pf 120+1csw


2006-07-28, Stefan Roese <[email protected]>