As some of you may know, I’m currently working on an ambitious project to audit WordPress events.
One of the features I eventually added, is the ability for the log viewer to update results in (almost) realtime as soon as they’re sent to the server.
I achieved this using the Comet (also known as “long polling”) method. In simple terms, it works like this:
- the client asks the server if their are any updates
- the server check for updates at intervals
- if changes are found, they’re sent to the client, otherwise the request is terminated after some time
- the client processes any received updates and performs a new request (restart this process)
To get this working, there are a few prerequisites, such as, the request should take as little memory and processing power as possible. Or, in other words, the impact on the server should be minimal. Here’s a small snippet on how it’s done:
// get the number of events the client knows about
$old = (int)$_REQUEST['logcount'];
// the number of tries (multiply this by the duration for usleep and make
// sure it is less than 30s since servers, firewalls, routers and some web
// browsers are known to drop HTTP connections taking more than 30s)
$max = 40; // 40 * 500msec = 20sec
// check for new events in db by monitoring event count
do {
// count the number of events in db - make sure this impacts the DB
// and server as little as possible.
$new = count_events_in_db();
// give the server some time to breath (500msec)
usleep(500000);
} while (($old == $new) && (--$max > 0));
// print out the new event count or 'false' if none
die($old == $new ? 'false' : $new);
Anyway, we’ve got it working after a few test runs and off into production. It worked well for a while.
And then, one fine morning, we started getting all sort of weird issues. Sluggish site performance, client requests timing out etc…
Worst of all, system metrics looked fine: cpu usage, memory, network, disk… and no error log entries. In short, everything looked ok.
Eventually, we found an indication of the culprit. Well, when I say “we” I really mean Robert Abela. 🙂
It seemed that a particular plugin seemed to be causing the issue, somehow. I installed it locally and ran it through the XDebug profiler.
After a couple of runs, a stack trace was caught when a time out occurred, finally shedding some light on the source of the problem: it was simply a call to session_start();
Admittedly, my experience with session in PHP is a bit limited. I know the basics, but that’s it. So, I did a quick search on the internal effects of this function. Guess what? PHP makes use of a session file which it locks to keep the data integrity for sessions intact. It’s actually a very good thing to have, but alas, in my case, I wasn’t even using sessions. The problem was the long-polling request, sometimes, locked this file and waited for updates for up to 30s. This means any subsequent request that enabled sessions had to wait for the comet request to finish.
The solution is relatively simple: at the start of the comet request, one simply had to call session_write_close();
it basically tells PHP to stop writing session data and consequently release the lock.
Props to Konr Ness and his excellent article on the matter.