Allow wait_irq to be called in 32bit code.

If wait_irq() is called from 32bit code, jump to 16bit mode to perform
the wait.

Have wait_irq() check for threads, and have it yield instead if any
threads are pending.  This ensures threads aren't delayed when
something calls wait_irq().
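
The thread check described above can be sketched roughly as follows.
This is an illustration under stated assumptions, not the code in this
patch: every name here (sketch_wait_irq, threads_pending, sketch_yield,
sketch_halt_until_irq) and the counters are hypothetical stand-ins, and
the real wait involves a mode switch that is only stubbed out.

```c
#include <stdbool.h>

/* Hypothetical stand-ins: these names and counters are illustrative only. */
static int pending_threads;   /* background threads still running */
static int yield_calls;       /* times the sketch yielded to threads */
static int halt_calls;        /* times the sketch would have waited for an irq */

static bool threads_pending(void)
{
    return pending_threads > 0;
}

/* Stand-in for a scheduler yield: pretend one thread runs to completion. */
static void sketch_yield(void)
{
    yield_calls++;
    pending_threads--;
}

/* Stand-in for the real wait (switch to 16bit mode, wait for the next irq). */
static void sketch_halt_until_irq(void)
{
    halt_calls++;
}

/* Yield while threads are pending; only wait for an irq when none are. */
static void sketch_wait_irq(void)
{
    if (threads_pending()) {
        sketch_yield();
        return;
    }
    sketch_halt_until_irq();
}
```

With two pending threads, the first two calls to sketch_wait_irq()
yield and only the third one waits, so a caller looping on the
function never stalls a thread that still has work to do.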

Use wait_irq() instead of biosusleep() in the 32bit loop that runs
after a failed boot.
diff --git a/src/boot.c b/src/boot.c
index 36450f0..335522f 100644
--- a/src/boot.c
+++ b/src/boot.c
@@ -449,7 +449,7 @@
         printf("No bootable device.\n");
         // Loop with irqs enabled - this allows ctrl+alt+delete to work.
         for (;;)
-            biosusleep(1000000);
+            wait_irq();
     }
 
     /* Do the loading, and set up vector as a far pointer to the boot