Information on site outages this weekend

Concerns? Let us know by posting here.

Moderators: avij, Phaseolus, Fons, dserrano5

avij
Forum Moderator
Posts: 6120
Joined: Mon May 27, 2002 10:45 pm
Location: Helsinki Finland


Post by avij

One of the duties of the Tech WG is to keep server software up to date. We use lighttpd as our web server, and a new version of it was released about a week ago.

I installed the updated version on our development server on Friday morning. This is standard procedure for me: I use the development server to test all available updates before installing them on our main server. The update went smoothly, and the development server behaved as it should afterwards.

All times below are in CET, which suits the majority of our users. I installed the lighttpd update on our main server at around 23:01 on Friday evening. Everything seemed fine again, so I went to bed.

At around 13:00 on Saturday I left home to watch the finals of a local drytooling competition, held under the northern arch of Helsinki Olympic Stadium. At around 14:39 the web server process crashed. Between Friday 23:01 and Saturday 14:39 the web server had served 243,787 requests, so the crash was somewhat unexpected. Unfortunately I did not notice it immediately, as I was still busy watching the finals. I found out about the crash at 16:44, logged in to the server with my phone, and saw that the lighttpd process had died. I restarted it at 16:47.

The next crash occurred at 17:12. By then I was already on my way home (the finals had ended), and I restarted the web server process again with my phone at 17:35.
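As a small worked check, the times above give the length of each outage and the average load before the first crash: about 15 hours 38 minutes of uptime, roughly 4.3 requests per second, and outages of 128 and 23 minutes. The day-offset helper `at()` is purely illustrative; only the Friday/Saturday ordering from the post matters.

```python
from datetime import timedelta


def at(day: int, hh: int, mm: int) -> timedelta:
    """Illustrative offset from Friday 00:00 CET (day 0 = Friday, day 1 = Saturday)."""
    return timedelta(days=day, hours=hh, minutes=mm)


# Timeline taken from the post (all CET).
update_installed = at(0, 23, 1)
first_crash = at(1, 14, 39)
first_restart = at(1, 16, 47)
second_crash = at(1, 17, 12)
second_restart = at(1, 17, 35)
requests_before_crash = 243787


def minutes(delta: timedelta) -> int:
    """Whole minutes in a timedelta."""
    return int(delta.total_seconds()) // 60


uptime = first_crash - update_installed
avg_rate = requests_before_crash / uptime.total_seconds()
outages = [minutes(first_restart - first_crash), minutes(second_restart - second_crash)]

print(f"uptime before first crash: {uptime}")            # 15:38:00
print(f"average load: {avg_rate:.2f} requests/s")         # 4.33
print(f"outages (min): {outages}, total {sum(outages)}")  # [128, 23], total 151
```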

Once I was back home, I had time to take a closer look at the logs. It turned out to be some kind of memory allocation bug. I set up a script to restart lighttpd whenever it died, and arranged to collect more data for debugging the problem. I'm now working with the lighttpd developers to find and fix the bug.
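The restart script itself is not shown here. A minimal supervisor could look like the Python sketch below; the lighttpd command line (`-D` to stay in the foreground so the supervisor sees the exit, `-f` for the config file) and the config path are assumptions, not the setup actually used on EBT. The idea is to run the server as a child process, wait for it to exit, log the exit code, and start it again after a short pause. When a child is killed by a signal, `subprocess` reports a negative return code (for example -11 for SIGSEGV on Linux), which makes crashes easy to tell apart from clean shutdowns in the log.

```python
import subprocess
import sys
import time
from typing import Callable, List, Optional, Sequence

# Assumed command line; the config path is hypothetical.
LIGHTTPD_CMD = ["lighttpd", "-D", "-f", "/etc/lighttpd/lighttpd.conf"]


def _log_stderr(msg: str) -> None:
    print(msg, file=sys.stderr, flush=True)


def supervise(
    cmd: Sequence[str],
    restart_delay: float = 1.0,
    max_restarts: Optional[int] = None,
    log: Callable[[str], None] = _log_stderr,
) -> List[int]:
    """Run cmd in the foreground and restart it every time it exits.

    With max_restarts=None this loops forever; otherwise it returns the
    exit codes of the initial run plus max_restarts restarts.
    """
    exit_codes: List[int] = []
    while True:
        started = time.monotonic()
        result = subprocess.run(list(cmd))  # blocks until the child exits
        exit_codes.append(result.returncode)
        log(f"process exited with code {result.returncode} "
            f"after {time.monotonic() - started:.1f}s")
        if max_restarts is not None and len(exit_codes) > max_restarts:
            return exit_codes
        time.sleep(restart_delay)  # avoid a tight crash/restart loop
```

Calling `supervise(LIGHTTPD_CMD)` from a boot script would keep the server running indefinitely. On hosts that use systemd, `Restart=on-failure` in the service unit gives the same behaviour without a custom script.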

After the initial crashes, the web server process crashed 10 more times, but because the restart script was running, those crashes did not cause long outages. Six of them were triggered by someone trying to enter notes on de.eurobilltracker.com and one by someone trying to post on the forum; the remaining three happened before the request had been fully sent. All of these crashes pointed to the same general area of the code, so I saw no need to keep running the buggy new version of lighttpd just to collect more data. I downgraded to the previous version of lighttpd today at around 14:41, so there should be no more crashes.

When there's a potential fix to this bug, I may switch to using the newest version of lighttpd again. I'll let you know when this happens.
Money makes the world go round. We track how the money goes round the world.

Re: Information on site outages this weekend

Post by avij

The lighttpd bug has not been fixed yet, and it has become apparent that the problem is very difficult (if not impossible) to reproduce in a controlled development environment. I'm therefore going to try to pinpoint the exact location of the bug by running various test builds of lighttpd on EBT. This may cause brief outages of at most one minute each.

Sorry for the inconvenience.
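How the test builds are chosen is not described here. One common way to narrow a bug down with few production runs is to bisect between a known-good and a known-bad revision, testing the midpoint each time. In the sketch below, the revision list and the `is_bad` callback (standing in for "deploy this build and watch for crashes") are hypothetical.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(revisions: Sequence[T], is_bad: Callable[[T], bool]) -> T:
    """Return the first revision for which is_bad() is True.

    Assumes revisions[0] is known good, revisions[-1] is known bad, and
    that once a revision is bad, every later revision is bad too.
    """
    lo, hi = 0, len(revisions) - 1  # invariant: lo is good, hi is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid  # bug is at mid or earlier
        else:
            lo = mid  # bug was introduced after mid
    return revisions[hi]
```

Each round halves the range, so about 100 candidate revisions need only around seven test deployments. Because this crash was intermittent, a build that runs cleanly for a while is only probably good, and one wrong verdict sends the search into the wrong half.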

Re: Information on site outages this weekend

Post by avij

Status update: two of my test builds, #4 and #5, caused user-visible crashes on Friday. The problems with #2 and #3 initially went unnoticed and did not affect users. We're now running lighttpd with a patch that should fix the problem. If all goes well, there will be no more crashes.

Re: Information on site outages this weekend

Post by avij

The test patch appears to have worked, and a fixed version of lighttpd has since been officially released; that is the version we are running now. I don't expect any further updates on this issue.