Re: code4lib.org down

From: Ryan Ordway <ryan.ordway_at_nyob>
Date: Tue, 16 Oct 2007 10:28:35 -0700
To: CODE4LIB_at_listserv.nd.edu
On 10/16/07 7:22 AM, "Jeremy Frumkin" <jeremy.frumkin_at_OREGONSTATE.EDU>
spake:

> Hi Folks -
>
> Our apologies - code4lib.org is currently down due to a non-responding
> database cluster. We are aware of the problem and are working to resolve it.

And it gets better and better.

Higher-than-normal database traffic produced higher-than-normal amounts of
database logging, filling up our database transaction logging disk on both
nodes of our database cluster. A few milliseconds after that, MySQL
sputtered, coughed and cursed my name... and then hung, waiting for me to
clean up the logs.

After some poking, prodding and cursing of my own, the cluster is back up and
running.

Today I am going to be working on a better log cleanup script that purges
any logs already relayed to the other cluster node(s) and already written to
tape. That should help prevent the problem in the future. And I'm working on
some finer-grained monitoring to detect the problem a little sooner so I can
clean it up before it gets to this point.
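For illustration, here is a minimal sketch of the purge rule described above. It assumes the cluster uses MySQL binary-log replication: the binary logs are listed oldest first (as `SHOW BINARY LOGS` returns them), each replica reports the oldest log file it still needs (for example its `Relay_Master_Log_File` from `SHOW SLAVE STATUS`), and the set of logs already on tape comes from the backup system. The function names `safe_purge_target`, `purge_statement` and `disk_nearly_full`, as well as the 80% alert threshold, are hypothetical and are not part of the actual cleanup script. The sketch relies on MySQL's `PURGE BINARY LOGS TO 'name'`, which deletes every log *before* the named one.

```python
import shutil
from typing import Iterable, Optional, Sequence, Set


def safe_purge_target(
    logs: Sequence[str],
    replica_needs: Iterable[str],
    archived: Set[str],
) -> Optional[str]:
    """Return the log name to pass to PURGE BINARY LOGS TO, or None.

    logs          -- binary log file names, oldest first (hypothetical input)
    replica_needs -- for each replica, the oldest log file it still needs
    archived      -- log file names already written to tape
    Every log strictly before the returned name has been relayed to all
    replicas and archived, so deleting it is assumed to be safe.
    """
    position = {name: i for i, name in enumerate(logs)}
    # Never purge the newest log: the server is still writing to it.
    cutoff = len(logs) - 1
    # The oldest log any replica still needs bounds what can be purged.
    for name in replica_needs:
        cutoff = min(cutoff, position[name])
    # Stop at the first log that has not reached tape yet.
    for i in range(cutoff):
        if logs[i] not in archived:
            cutoff = i
            break
    return logs[cutoff] if cutoff > 0 else None


def purge_statement(target: str) -> str:
    """Build the SQL statement that deletes all logs before `target`."""
    return f"PURGE BINARY LOGS TO '{target}'"


def disk_nearly_full(path: str, threshold: float = 0.80) -> bool:
    """Hypothetical early warning: True if the filesystem holding `path`
    is more than `threshold` full (0.80 is an assumed value)."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total > threshold


if __name__ == "__main__":
    logs = ["mysql-bin.000001", "mysql-bin.000002",
            "mysql-bin.000003", "mysql-bin.000004"]
    # One replica still reads 000003, and 000001-000002 are on tape.
    target = safe_purge_target(logs, ["mysql-bin.000003"], set(logs[:2]))
    if target is not None:
        print(purge_statement(target))
    print("log disk nearly full:", disk_nearly_full("."))
```

Taking the minimum over both conditions keeps the script conservative. A log goes away only once every replica is past it *and* it is on tape, and the monitoring check can run more often than the purge so the disk never fills again unnoticed.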

Ryan

--
Ryan Ordway                          E-mail:   rordway_at_oregonstate.edu
Unix Systems Administrator             rordway_at_library.oregonstate.edu
OSU Libraries, Corvallis, OR 97370        Office: Valley Library #4657
Received on Tue Oct 16 2007 - 13:34:37 EDT