Saturday, October 28, 2006

Which Profiler?

Jamie asks:
"Ted, what are you guys using for a profiler? If you have any advice on a profiler
or, more importantly, common areas a profiler flags, I'd love to hear them."

Well we use:

  • JProfiler - for exploratory work (suffers from time dilation)
  • com.tedogrady.profiler - for precise timings (basically println)
  • The TopLink profiler - for understanding the cache and gross database numbers
  • tkprof/explain plan - for understanding how queries behave in the database
  • Math and common sense - for figuring out how to solve the problem
Performance tuning is a highly empirical thing.
  1. Establish a target time.
  2. Make sure you have a test showing the slowness. Automate it.
  3. Develop a causal theory.
  4. Test the causal theory.
  5. Develop a potential solution.
  6. Experiment with the potential solution.
  7. If the experiment (6) doesn't help, go back to 3.
  8. Implement the solution.
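Step 2 of the loop above is worth automating. A minimal sketch of what such a test might look like, assuming a made-up workload and target (the names `SlownessTest` and `TARGET_MILLIS` are illustrative, not from any real project):

```java
// Hypothetical sketch of step 2: a test that fails while the code
// under test misses its target time, and passes once it's fixed.
public class SlownessTest {

    static final long TARGET_MILLIS = 500; // step 1: the target you established

    // Time a piece of work in milliseconds.
    static long timeMillis(Runnable work) {
        long start = System.nanoTime();
        work.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        long elapsed = timeMillis(() -> {
            // stand-in for the real slow operation
            for (int i = 0; i < 1_000_000; i++) {
                Math.sqrt(i);
            }
        });
        if (elapsed > TARGET_MILLIS) {
            throw new AssertionError("Too slow: " + elapsed + "ms > " + TARGET_MILLIS + "ms");
        }
        System.out.println("Within target: " + elapsed + "ms");
    }
}
```

Run it as part of the build; when it goes green and stays green, steps 3 through 8 are done.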
JProfiler helps a lot with the causal theory. You can use it to explore your code as a whole, looking for problem areas and testing out theories you may have developed about the code. But if you are working on a database application, DO NOT BELIEVE EVERYTHING IT TELLS YOU.

It's lying to you. Remember, it slows down your application but not the database. So the hot spots and relative timings it reports are lies where the database is concerned. For example, the current piece I am profiling is spending half its time in the database dealing with foreign keys. The profiler told me my problem was in navigating some hierarchy.

I use JProfiler for the following features:
  1. Execution stack trace - shows the order in which things were called
  2. Number of Invocations of a method - really useful (do quick math to see if the numbers make sense)
  3. Percentage time in a method - also really useful
  4. Hot spots - (rank methods by amount of CPU time/calls)
  5. Memory profiler - which objects are sucking back the memory

I tried the Eclipse one (TPTP) about six months ago, but it choked on simple stuff. Haven't tried it since.

com.tedogrady.profiler implements println with extra-special fancy doo-dad stuff. It allows you to verify what your profiler is telling you, and it lies to you a lot less frequently than JProfiler. It's about 40 lines of source code that I got tired of rewriting on every project, so it's now open source. It is in with my TopLink in-memory stuff, or just email me if you want a copy. Basically, it has a really small overhead, so you can use it to verify what you learned in JProfiler, or, if JProfiler is really lying to you, as a more painful exploration tool.

To profile a suspicious method you can do the following:
Profiler.start("Profiling myMethod()");
this.myMethod();
Profiler.stop("Profiling myMethod()");
Profiler.dump();
You can put as many starts and stops (each with unique tags) in your code as you want. It'll tell you nice things like the average time, cumulative time, invocation order, blah blah blah. Many of you are thinking map. You'd be right.
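This isn't the actual com.tedogrady.profiler source, but a sketch of how such a map-backed profiler might look, supporting the start/stop/dump calls shown above (the `invocationCount` accessor is my own addition for sanity checks):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of a map-backed profiler: each tag maps to its cumulative
// time and invocation count. Not the real 40-line original.
public class Profiler {

    private static class Timing {
        long startedAt;        // nanoTime of the open start() call
        long cumulativeNanos;  // total time between start/stop pairs
        int invocations;       // number of completed start/stop pairs
    }

    // LinkedHashMap preserves invocation order for dump()
    private static final Map<String, Timing> timings = new LinkedHashMap<>();

    public static void start(String tag) {
        Timing t = timings.computeIfAbsent(tag, k -> new Timing());
        t.startedAt = System.nanoTime();
    }

    public static void stop(String tag) {
        Timing t = timings.get(tag);
        t.cumulativeNanos += System.nanoTime() - t.startedAt;
        t.invocations++;
    }

    // My addition: expose the count so you can do that quick math.
    public static int invocationCount(String tag) {
        Timing t = timings.get(tag);
        return t == null ? 0 : t.invocations;
    }

    public static void dump() {
        for (Map.Entry<String, Timing> e : timings.entrySet()) {
            Timing t = e.getValue();
            System.out.printf("%s: %d calls, %.2f ms total, %.2f ms avg%n",
                    e.getKey(), t.invocations,
                    t.cumulativeNanos / 1e6,
                    t.cumulativeNanos / 1e6 / Math.max(1, t.invocations));
        }
    }
}
```

Because it adds only a map lookup and a nanoTime call per start/stop pair, the dilation is tiny compared to an instrumenting profiler.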

TopLink probably comes with a profiler because of all the people who complain that TopLink is slow. Very smart of them to give you the tools to see where the slowness/idiocy is. The tool usually tells me, again, how stupid I really am. As an idiot, I frequently forget I'm an idiot. It's part of the condition.

The setting that lets you show SQL in TopLink is also very useful. Dario has been using it to fine-tune the SQL we're throwing at Oracle. It lets you configure all those wacky options in TopLink and see what they do to the SQL (just what do batch write or batch read do? Gee, they make things a lot faster).

tkprof, explain plan, and the Oracle analyzer are useful when you have to think about the database. They tell you what the database is doing, like how long a query is taking, or the strategy the optimizer is using to evaluate your query. (Sadly, you do need to understand how Oracle implements things if you want to tune it.) asktom.oracle.com has all your answers pre-packaged.

I haven't gone looking for the Oracle analyzer in Oracle 10g, but I've used it in 9i. Basically, it is a way for you to ask the Oracle optimizer what will work best. Rather than you analyzing (guessing) what the best indexes are for a query, Oracle will recommend some. About 90% of its recommendations are optimal, and some actually slow things down. Remember that whole test thingamy.

Math and common sense are the best tools of all. This would be why you went to school. When you look at the number of invocations and the time, does it make sense to you? What do the numbers mean? A scrap of paper, a pencil, and some simple multiplication and division can tell you a lot more about what your program is doing than any profiler.

Just as a side note, I learned most of my techniques sitting across the hall from Steve Northover at OTI while he was writing the profiler for OTI/Smalltalk. Thanks Steve.

Wednesday, October 25, 2006

Performance tuning in earnest

Up until this point we have been performance tuning for testing. Our users have been testing the application all along, and our goal has been to keep them happy with the performance of the system (not ecstatic).

Now we are nearing the end of our mammoth, huge, horrifically large project (why did I agree to help run this?) and things are starting to stabilize. We're five months away and seriously testing our system. Data structures and core internal pieces are pretty clear, so we can optimize the living crap out of them. And optimize we shall.

Production accounting has a few unique attributes to it:

First: if you give us 7 numbers, we can very quickly turn those into 7,000. For example, processing 20 wells results in 72,000 numbers (or allocations) being created. Our biggest customer has 40,000 wells. (That's 144,000,000 allocations per month in the worst case - note this is the upper bound.) Huh.

Second: accountants have four (important) deadlines a month. For one of those deadlines (Oil/NGL Splits) we need to generate all of the data. Best of all, the data from all their systems will probably only be available the day before the deadline. Don't ask; the production accounting business is wacky. We will have to be able to do all our allocations in about 12 hours.

We need to write 3,400 numbers (allocations) a second.

We're out by about a factor of 10 (we do around 300-400/sec on a single Intel CPU box). That is not very far off, especially as we are just beginning optimization. Better yet, we can expect between 8 and 32 cores doing the work on the production boxes.
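The envelope math above is worth writing down. A quick sketch, using only the numbers already stated in this post (40,000 wells at 3,600 allocations each, a 12-hour window, and roughly 350/sec measured today):

```java
// Back-of-the-envelope check of the allocation-rate numbers.
public class Envelope {

    // How many allocations per second must we sustain to finish in time?
    static long requiredPerSecond(long allocations, long hours) {
        return allocations / (hours * 3600);
    }

    public static void main(String[] args) {
        // Worst case: 40,000 wells * 3,600 allocations per well
        long allocations = 144_000_000L;
        long required = requiredPerSecond(allocations, 12);
        System.out.println("Required rate: " + required + "/sec"); // ~3,333, i.e. the ~3,400 above

        long current = 350; // midpoint of the 300-400/sec measured on one CPU
        System.out.println("Gap factor: ~" + (required / current)); // "about a factor of 10"
    }
}
```

A scrap of paper does the same job, but the point stands: the gap is one order of magnitude, which more cores plus real optimization can plausibly close.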

So what did I miss on my envelope?

We'll have fun, fun, fun 'til daddy takes our profiler away....