Re: [ccp4bb]: Linking CCP4 against Altivec BLAS and LAPACK
It is important to run a profiler before doing optimizations.
Otherwise, you are likely to waste a lot of time changing parts of the
application that don't need it, possibly introducing hard-to-find bugs
and extra work for everyone, all out of "superstition". Before replacing
LAPACK or the FFT routines, it would be nice to see that these functions
are consuming a generous portion of the calculation time in CCP4.
The good news is that anyone can do this experiment to find out where
the CPU time is going. It is quick and easy to do.
1) Download and install CHUD
http://developer.apple.com/tools/debuggers.html
2) Go to /Developer/Applications and open up Shikari.
3) Start up your favorite CCP4 task/app and start it doing something
that you think is too slow.
4) As soon as the slow task starts crunching away, hit the "Start"
button at the bottom right of Shikari. Hit Stop if the task finishes
before Shikari is done sampling.
Shikari will record what the computer is doing for a while then stop on
its own and give a printout of what functions are using up the most CPU
time. It does this by stopping the CPU every so many milliseconds and
looking to see what function is executing. Do this for long enough and
you get a statistical sampling of where the CPU time is going. It will
look something like this (from Safari -- Apple's new web browser):
Total   Symbol Name
-----   -----------
 6.1%   objc_msgSend
 4.0%   vecCGSFill8b1
 2.6%   szone_malloc
 2.0%   ARGB32_image_RGB24
 1.8%   spin_unlock
 1.5%   dyld_stub_*
.....
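
To make the idea of statistical sampling concrete, here is a minimal
Python sketch of how a list of samples turns into a table like the one
above. It is not how Shikari is implemented; the function sample_report
and the symbol names fft_pass, matmul and read_input are made up for
illustration.

```python
from collections import Counter

def sample_report(samples, top=5):
    """Return (percent, symbol) rows, largest share of samples first."""
    total = len(samples)
    if total == 0:
        return []
    counts = Counter(samples)
    return [(100.0 * n / total, name) for name, n in counts.most_common(top)]

# Hypothetical samples: the function found executing at each interrupt.
samples = ["fft_pass"] * 6 + ["matmul"] * 3 + ["read_input"] * 1

for percent, name in sample_report(samples):
    print(f"{percent:5.1f}%  {name}")
```

With only ten samples the percentages are crude. The longer the
profiler samples, the closer each share gets to the real fraction of
CPU time spent in that function.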
You may need to set the Process popup menu to point to the CCP4 app, if
it isn't already. If the Symbol Names look more like "0x4215674a" than
"SomeFunctionName", you should recompile CCP4 with the -g flag so that
symbol names are included in the app.
The Symbol names are the names of the various functions that are
executing. The number next to them is the fraction of the total CPU
time used by the app that is consumed by that function. If we see FFT
or LAPACK/BLAS functionality in the top ten functions, then an obvious
solution would be to look for a faster version of these libraries.
Apple's vecLib is a great choice for that. If other functions are
dominating the output, then we can look at fixing those instead.
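
The "top ten" check can be sketched in a few lines of Python. Everything
here is an assumption for illustration: the name patterns are guesses
based on common BLAS/LAPACK routine names, and the rows are invented,
not measured from CCP4. The real symbol names depend on how CCP4 was
built.

```python
import re

# Assumed patterns for FFT and BLAS/LAPACK-style symbol names.
NUMERIC_PATTERN = re.compile(r"fft|gemm|gemv|getrf|potrf|axpy", re.IGNORECASE)

def numeric_hotspots(rows, top=10):
    """Return the FFT/BLAS/LAPACK-looking rows among the top entries."""
    ranked = sorted(rows, key=lambda row: row[0], reverse=True)[:top]
    return [(pct, name) for pct, name in ranked if NUMERIC_PATTERN.search(name)]

# Invented (percent, symbol) rows standing in for a profiler report.
rows = [
    (22.5, "dgemm_"),
    (14.0, "read_reflections"),
    (9.3, "fft3d_"),
    (1.2, "malloc"),
]

hits = numeric_hotspots(rows)
for pct, name in hits:
    print(f"{pct:5.1f}%  {name}")
```

If the list comes back empty, swapping in a faster FFT or LAPACK is
unlikely to help much, and the time is better spent on whatever is
actually at the top of the report.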
You can, by the way, do this experiment on any app. ;-)
Best Regards,
Ian Ollmann
Vector and Numerics Group
Core Operating Systems Division
Apple Computer