At work we recently had an intermittent bug that took us several weeks to find. We have multi-threaded application servers in Java™ behind Apache™ web servers. To maintain fast page turns, we do all I/O in small set of four separate threads that are different than the page-turning threads. Every once in a while these would apparently get ‘stuck’ and cease doing anything useful, so far as our logging allowed us to tell, for hours. Since we had four threads, this was not in itself a giant problem - unless all four got stuck. Then the queues emptied by these threads would quickly fill up all available memory and crash our server. It took us about a week to figure this much out, and we still didn't know what caused it, when it would happen, or even what the threads where doing when they got ‘stuck’.
0 commit comments