Thursday, May 28, 2009

Cure Of The Notorious Ubuntu futex_wait Bug

It all started two months back, when DC++ suddenly stopped working on my Ubuntu 8.10 (Intrepid Ibex). Each time I ran it, it would hang: the interface window would turn gray and stop responding. I'd have to use Force Quit or kill the process from the System Monitor's process list. I tried reinstalling DC++, but that didn't solve the problem. Soon other programs caught the infection and started exhibiting the same strange behaviour: Firefox, Gnome-do, EOG, python, and gvfs-fuse-daemon would hang at different times without any warning.

One day I noticed, in the Waiting Channel column of the System Monitor, that they all had a common waiting channel: “futex_wait”. I looked at the futex man page and learnt that futexes (Fast Userspace muTexes) are a Linux kernel mechanism on which userspace locks such as mutexes and semaphores are built, so that threads can coordinate while working concurrently. The implication seemed clear: these multi-threaded applications were waiting on a futex, and the fact that the wait never ended pointed to a deadlock.
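The same check can be done without the System Monitor. What follows is a minimal Python sketch, assuming a Linux /proc filesystem that exposes each process's wait channel in /proc/<pid>/wchan. The function name processes_waiting_on and its parameters are made up for illustration, and on some kernels the wait channel of processes you do not own may be hidden unless you run it as root.

```python
import os


def processes_waiting_on(channel_prefix="futex_wait", proc_root="/proc"):
    """Return sorted (pid, command, wait channel) tuples for processes
    whose wait channel starts with channel_prefix.

    Hypothetical helper: it assumes the Linux /proc layout, where
    <proc_root>/<pid>/wchan holds the kernel function the process sleeps in.
    """
    matches = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():          # only numeric entries are processes
            continue
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "wchan"), errors="replace") as f:
                wchan = f.read().strip()
            with open(os.path.join(base, "cmdline"), errors="replace") as f:
                # cmdline is NUL-separated; the first field is the program
                command = f.read().split("\0")[0] or "?"
        except OSError:
            continue                     # process exited or is not readable
        if wchan.startswith(channel_prefix):
            matches.append((int(entry), command, wchan))
    return sorted(matches)
```

Calling processes_waiting_on() with no arguments scans the live /proc. A process that shows up here is not necessarily stuck, since healthy multi-threaded programs sit in futex_wait all the time. It only becomes suspicious when the window is also frozen and the process never leaves that state.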

The book “Operating System Principles” by Silberschatz, Galvin and Gagne says that most operating systems handle deadlocks by ignoring them altogether and pretending that they never occur, and that both Windows and UNIX take this approach. I did some research on the Internet and found that this was indeed a bug in the latest Ubuntu 2.6.27-* kernels, and that the Ubuntu community was working on it so that newer kernels don't fall prey to the same problem.
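To make the deadlock idea concrete, here is a small Python sketch of the textbook case: two threads that take the same two locks in opposite order. It is only an illustration and not a reproduction of the kernel bug. The names opposite_order_demo, lock_a and lock_b are invented for the example, and the timeout exists purely so that the demo ends instead of hanging the way my applications did.

```python
import threading


def opposite_order_demo(timeout=0.2):
    """Two threads grab two locks in opposite order.

    Returns a dict saying whether each thread managed to get its
    second lock before the timeout ran out.
    """
    lock_a = threading.Lock()
    lock_b = threading.Lock()
    barrier = threading.Barrier(2)
    results = {}

    def worker(name, first, second):
        with first:
            barrier.wait()   # both threads now hold their first lock
            got = second.acquire(timeout=timeout)
            results[name] = got
            barrier.wait()   # keep holding `first` until both attempts end
            if got:
                second.release()

    t1 = threading.Thread(target=worker, args=("t1", lock_a, lock_b))
    t2 = threading.Thread(target=worker, args=("t2", lock_b, lock_a))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return results
```

Each thread holds the lock the other one needs, so neither second acquire can succeed and both entries in the result are False. Without the timeout the two threads would wait on each other forever, and nothing in the operating system would step in to break the tie, which is the "ignore it" policy the book describes.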

I had learnt to live with the shortcoming until today, when I luckily found a workaround for the problem. It turns out that switching off “Assistive Technologies” does the trick.

Go to System > Preferences > Assistive Technologies
and uncheck the "Enable Assistive Technologies" option.

5 comments:

  1. Fantastic, eog just crashed on me. I turned off assistive technologies. And restarted. And did a check trying to open an image with eog. Still wouldn't work. Checked System Monitor to find that eog was still on futex_wait. Killed eog and then attempted opening a picture. eog was back to life. Thanks for the tip!!

  2. The problem still lives for Ubuntu 10.04 running kernel: 2.6.32-25-generic, x86-64.

    In addition to OOo Writer (soffice.bin) having this Waiting Channel state, gvfs-fuse-daemon also shows the same futex_wait_queue_me state in the Waiting Channel column of System Monitor.

    I was NOT able to use the "End Process" button from System Monitor to end the process...it would not work. I was able to kill the process with a -9 from a terminal window just fine as superuser.

    kill -9 1888 where 1888 was the process id of soffice.bin.

    Then this caught my eye in the output of ps -ef:

    a8reason 1435 1 0 08:24 ? 00:00:00 /usr/lib/gvfs//gvfs-fuse-daemon /home/a8reason/.gvfs

    Note the double forward slash (//) in the path.

    I killed process 1435 as well and will reboot my Linux PC. I have not applied the fix mentioned above yet.

  3. Turning off Assistive Technologies does not eliminate the problem for me.

    From the reports of the environments in which the problem occurs, I think it's an incompatibility between Firefox and something particular to Ubuntu, such as gvfs or possibly the Nvidia drivers.

    Perhaps a silly question, but one that may strike at the heart of the problem: Why would it be appropriate for Firefox to go into futex_wait_queue_me under ANY circumstance?

    I do find it peculiar that this problem has been known continuously for two years and yet nothing has shown up in repositories in that time to fix the problem.

    Usual Ubuntu "fix the majority's problems" approach.

  4. Hi!

    Did you guys solve this? I seem to be experiencing something similar...

    Thanks,
    Jens

  5. I have noticed this too. It has something to do with Google tracking (the core of Firefox is reused from Google) and preventing pop-ups. It happens when the web browser goes for the analytics link in a page. Though I think the bug itself comes from the 2.6 kernel not working properly with old motherboards based on the SIS630ET chipset.

    My uneducated $0.02 from watching the hang-ups, router, etc.
