Re: [ccp4bb]: Linking CCP4 against Altivec BLAS and LAPACK

***  For details on how to be removed from this list visit the  ***
***          CCP4 home page http://www.ccp4.ac.uk         ***

It is important to run a profiler before doing any optimization. 
Otherwise, you are likely to waste a lot of time changing parts of the 
application that don't need it, possibly introducing hard-to-find bugs 
and creating extra work for everyone, all out of "superstition". Before 
replacing LAPACK or the FFT routines, it would be nice to see that these 
functions actually consume a large share of the calculation time in CCP4.
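
To put a number on why the share of time matters, here is a minimal 
Python sketch of Amdahl's law. The function name and the 5%, 60% and 4x 
figures are made up for illustration; they are not measurements from 
CCP4 or vecLib.

```python
def overall_speedup(fraction, routine_speedup):
    """Amdahl's law: whole-program speedup when a routine that takes
    `fraction` of the run time is made `routine_speedup` times faster."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if routine_speedup <= 0:
        raise ValueError("routine_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / routine_speedup)


if __name__ == "__main__":
    # Hypothetical numbers: a 4x faster library routine.
    for share in (0.05, 0.60):
        gain = overall_speedup(share, 4.0)
        print(f"routine is {share:.0%} of run time -> {gain:.2f}x overall")
    # prints 1.04x for the 5% case and 1.82x for the 60% case
```

If the routine is only a few percent of the run time, even a much 
faster library barely changes the total.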

The good news is that anyone can do this experiment to find out where 
the CPU time is going. It is quick and easy to do.

1) Download and install CHUD


2) Go to /Developer/Applications and open up Shikari.

3) Start up your favorite CCP4 task/app and start it doing something 
that you think is too slow.

4) As soon as the slow task starts crunching away, hit the "Start" 
button at the bottom right of Shikari. Hit Stop if the task finishes 
before Shikari is done sampling.

Shikari will record what the computer is doing for a while, then stop on 
its own and print out which functions are using the most CPU time. It 
does this by stopping the CPU every so many milliseconds and looking to 
see what function is executing. Do this for long enough and you get a 
statistical sampling of where the CPU time is going. It will look 
something like this (from Safari -- Apple's new web browser):

Total   Symbol Name
-----   ------------------
6.1%    objc_msgSend
4.0%    vecCGSFill8b1
2.6%    szone_malloc
2.0%    ARGB32_image_RGB24
1.8%    spin_unlock
1.5%    dyld_stub_*
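
The sampling idea itself can be sketched in a few lines of Python. This 
is only an illustration of the technique, not how Shikari is 
implemented: a helper thread wakes up every few milliseconds and notes 
which function the main thread is in. All function names here 
(sample_main_thread, slow_part, fast_part, workload) are hypothetical, 
and sys._current_frames() is specific to CPython.

```python
import collections
import sys
import threading


def sample_main_thread(work, interval=0.005):
    """Run work() while a helper thread periodically records the name of
    the function the calling thread is executing. Returns a Counter."""
    counts = collections.Counter()
    target_id = threading.get_ident()
    done = threading.Event()

    def sampler():
        # wait() returns False on timeout, i.e. once per sampling period.
        while not done.wait(interval):
            frame = sys._current_frames().get(target_id)
            if frame is not None:
                counts[frame.f_code.co_name] += 1

    thread = threading.Thread(target=sampler, daemon=True)
    thread.start()
    try:
        work()
    finally:
        done.set()
        thread.join()
    return counts


def slow_part():
    # Deliberately expensive loop standing in for a hot routine.
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total


def fast_part():
    return sum(range(1000))


def workload():
    for _ in range(3):
        fast_part()
        slow_part()


if __name__ == "__main__":
    counts = sample_main_thread(workload)
    total = sum(counts.values())
    print("Total\tSymbol Name")
    for name, hits in counts.most_common():
        print(f"{100.0 * hits / total:.1f}%\t{name}")
```

The exact percentages vary from run to run because this is a 
statistical sample, but slow_part should dominate the listing.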

You may need to set the Process popup menu to point to the CCP4 app, if 
it isn't already. If the Symbol Names look more like "0x4215674a" than 
"SomeFunctionName", you should recompile CCP4 with the -g flag so that 
symbol names are included in the app.

The symbol names are the names of the various functions that are 
executing. The number next to each is the fraction of the app's total 
CPU time that is consumed by that function. If we see FFT or 
LAPACK/BLAS functionality in the top ten functions, then an obvious 
solution would be to look for a faster version of these libraries. 
Apple's vecLib is a great choice for that. If other functions are 
dominating the output, then we can look at fixing those instead.
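
If you save the listing as text, a small script can add up the share 
that belongs to the routines you care about. The sketch below assumes 
the two-column "percent, symbol" layout shown earlier. The keyword list 
(fft, dgemm, dgetrf, daxpy) is only a guess at what FFT and BLAS/LAPACK 
symbols might look like; substitute whatever names actually show up in 
your own profile.

```python
SAMPLE_REPORT = """\
Total   Symbol Name
-----   ------------------
6.1%    objc_msgSend
4.0%    vecCGSFill8b1
2.6%    szone_malloc
2.0%    ARGB32_image_RGB24
1.8%    spin_unlock
1.5%    dyld_stub_*
"""

# Assumed substrings for FFT and BLAS/LAPACK symbols; adjust to taste.
KEYWORDS = ("fft", "dgemm", "dgetrf", "daxpy")


def parse_profile(text):
    """Return (percent, symbol) pairs from a two-column listing,
    skipping the header and separator lines."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].endswith("%"):
            try:
                entries.append((float(parts[0][:-1]), parts[1]))
            except ValueError:
                continue
    return entries


def share_of(entries, keywords):
    """Sum the percentages of symbols whose name contains any keyword."""
    return sum(pct for pct, name in entries
               if any(key in name.lower() for key in keywords))


if __name__ == "__main__":
    entries = parse_profile(SAMPLE_REPORT)
    print(f"FFT/BLAS/LAPACK share: {share_of(entries, KEYWORDS)}%")
```

For the Safari listing above the share is zero, as you would expect, 
since none of those symbols appear in it.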

You can, by the way, do this experiment on any app. ;-)

Best Regards,

Ian Ollmann
Vector and Numerics Group
Core Operating Systems Division
Apple Computer