Monday, June 8, 2009

DateTime performance hit

This is mainly in answer to a comment on my previous post on revisiting plog.

plog is a piece of code ready for copy+paste into whatever codebase you're working in at the moment, and almost completely agnostic. Yes, in code where I'm already using DateTime, I use those routines.

You could also create a small sub to hide the "nasty idiom".
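
Something like this, say (a sketch; the sub name is my own choice, and the format matches the timestamps used below):

sub timestamp {
    # The "nasty idiom": format the localtime() list by hand with sprintf.
    my @lt = localtime();
    return sprintf("%04d-%02d-%02d %02d:%02d:%02d",
        $lt[5] + 1900, $lt[4] + 1, $lt[3],   # year, month, day
        $lt[2], $lt[1], $lt[0]);             # hour, minute, second
}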

Or you could use Date::Format instead (ca. 10 kB codebase instead of 110 kB, and between one and two orders of magnitude quicker for this particular purpose):
require Date::Format;   # require() doesn't import, hence the fully qualified call below
my @lt = localtime();
my $dt = Date::Format::strftime("%Y-%m-%d %T", \@lt);

Of course, that leaves you with the problem of whether the TimeDate distribution is installed or not, and another test for that.
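
That test is simple enough, though; a sketch (the variable name is mine):

# True if Date::Format loaded; otherwise fall back to the sprintf idiom.
my $have_date_format = eval { require Date::Format; 1 };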

Besides, the bloat is not exactly insignificant if you have code that's running repeatedly.

The following example with 100,000 repetitions may seem ludicrous, but I actually have production code where timestamped log entries run into that order of magnitude. And yes, I would very much like to save that extra time.

Edit 2009-06-09 00:32 UTC: thanks to Dave Rolsky for the simplified testing code, and to Ilmari for reminding me of POSIX::strftime, which I'd previously rejected based on other people's claims that it was dog slow. I've removed the home-grown tests in favour of the results from a slightly modified version of Dave's sample test script.

Here are the numbers for 100k iterations, times derived from the rates:
Solution              Time       Rate
DateTime              490.2 s      204/s
DateTime (cached tz)   76.7 s    1 304/s
Date::Format            6.4 s   15 528/s
POSIX                   3.3 s   30 030/s
localtime               0.6 s  163 934/s
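
The POSIX variant amounts to something like this (a sketch, not the exact benchmark code; POSIX.pm's strftime takes the localtime list directly):

use POSIX ();
my $dt = POSIX::strftime("%Y-%m-%d %H:%M:%S", localtime());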

6 comments:

Anonymous said...

See http://urth.org/~autarch/dt-bench for a way to make the DateTime benchmark ten times faster.

Regardless, the original sprintf version is still nasty and disgusting. The Date::Format version is fine.

bakkushan said...

Ten times faster on your computer, twenty times faster on mine, but liable to break when you run your code around a DST change. I didn't mention this, even though I did consider it; it was the very reason I didn't use the TimeZone module.

But with that much time saved, I'm sure I could execute a bunch of code to check whether we're near a DST change or not.

"Nasty and disgusting" seems to be a feature here, as it's still around two orders of magnitude faster than the cached timezone, and one order of magnitude faster than Date::Format. And the difference increases non-linearly.

But as I also wrote, there are cases where the sheer feature set of DateTime is desirable.
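
For anyone who hasn't followed the link: the speedup comes from building the time zone object once and reusing it, presumably along these lines (a sketch, not the exact dt-bench code):

use DateTime;
use DateTime::TimeZone;

my $tz = DateTime::TimeZone->new( name => 'local' );   # built once, reused
my $dt = DateTime->now( time_zone => $tz );
my $stamp = $dt->strftime('%Y-%m-%d %T');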

bakkushan said...

Someone else made the comment that I could be using POSIX::strftime() instead.

Using the same test that Dave helpfully provided, it's only about twice as slow as the straight and ugly approach, and more aesthetically pleasing.

And considering that POSIX.pm is part of the core distribution, that makes for great availability as well.

I'm somewhat embarrassed that I rejected POSIX.pm out of hand, based on other people's reports of it being "slow".

bakkushan said...

I modified the main post with test data from dt-bench.pl and additional testing for POSIX.pm.

Anonymous said...

I think putting the call to localtime inside each of the subs, so that it runs on every iteration, means you're not testing the thing you want to test. As far as I can tell, you're trying to time how long it takes to format the time, not to retrieve it.

I recommend you cache the localtime result as well as the tz data. Each sub under test should also have its own data, so that later tests can't take advantage of memory optimizations.
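
Something like this is what I have in mind (a sketch; the names are mine):

use Benchmark qw(cmpthese);
use POSIX ();

# Fetch the time once per sub, outside the timed loop, so only the
# formatting is measured; each sub gets its own copy of the data.
my @lt_posix   = localtime();
my @lt_sprintf = localtime();

cmpthese( 100_000, {
    posix   => sub { POSIX::strftime("%Y-%m-%d %H:%M:%S", @lt_posix) },
    sprintf => sub {
        sprintf("%04d-%02d-%02d %02d:%02d:%02d",
            $lt_sprintf[5] + 1900, $lt_sprintf[4] + 1, $lt_sprintf[3],
            $lt_sprintf[2], $lt_sprintf[1], $lt_sprintf[0]);
    },
} );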

bakkushan said...

I do want to test the time it takes to retrieve the time; that will be done in production, every single time (see the blog entry on the print-and-log sub).

Since the results from Benchmark.pm are of the same order of magnitude as the ones I get by starting separate perl instances from bash, I don't think there is any particular reason to assume that in-perl memory optimization plays a huge role.