zerox86 - Blog

Aug 3rd, 2014 - zerox86 development on a break

This is just a short blog post, letting you know that I don't have time to work on zerox86 for a while now. Today my summer vacation ends, and I also have some work to do for my commercial pax86 emulator project. Thus, I gave my GCW-Zero again to my nephew for some further testing of zerox86.

I will continue working on zerox86 after I get my device back from my nephew, probably in a couple of months. So, I hope you can manage with the current version of zerox86 until then! I know there is at least one game that I somehow broke in the latest version, Dangerous Dave Goes Nutz. Sorry about that, I will take a look at that game when I get back to working on zerox86. Thanks again for your interest in zerox86 and for keeping the compatibility wiki up to date!

July 27th, 2014 - zerox86 version 0.06 beta released!

This version has a couple of major architectural changes, a few configuration improvements, and several game-specific changes. I wrote about some of the changes in my previous blog post, so I will only mention those briefly in this post. I did some more work on zerox86 since my blog post from last weekend, and will write longer sections about these new changes. I like to order my change logs so that the new features are first, then the fixes that affect several games, and lastly the game-specific fixes. This is why the changes are not usually listed in chronological order.

Enabled Virtual Memory (paging) support in zerox86

This is the major architectural change in this version. I wrote a rather detailed blog post about this change last weekend, so see below for more information about this. Please note that this does not mean that every game that uses virtual memory will now work, it just means that support for those games can now be added.

Implemented a slowdown feature using key MHZ in zerox86.ini

This is a new configuration feature that you can use to slow down zerox86 for those very old games. The value of the new MHZ key should be an integer corresponding to the clock speed of the 80386 processor you wish to emulate. If you give any value outside the range of 1 .. 79, there will be no slowdown applied, so zerox86 runs at the approximate speed of an 80MHz 80386 (or a 40MHz 80486, as the 80486 CPU is about twice as fast per clock cycle as the 80386).

When running those old games, you should use rather low values, anything between 1 and 4 should be worth a try. In this version the MHz value needs to be an integer, but if you feel this is too coarse, let me know and I'll look into implementing finer control in the next version. Since the value is only an approximation, the actual slowdown will vary depending on how the game is programmed.
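The range rule for the MHZ key can be written down in a few lines of C. This is only a sketch of the behaviour described above, not code from zerox86; the names effective_mhz and FULL_SPEED_MHZ are made up for the illustration.

```c
/* Hypothetical helper, not taken from the zerox86 sources. */
#define FULL_SPEED_MHZ 80   /* approximate 80386 speed when no slowdown is applied */

/* Returns the 80386 clock speed (in MHz) that would be emulated for a
   given MHZ value read from zerox86.ini. */
int effective_mhz(int ini_value)
{
    if (ini_value >= 1 && ini_value <= 79)
        return ini_value;       /* inside 1..79: slowdown is applied */
    return FULL_SPEED_MHZ;      /* outside the range: run at full speed */
}
```

So a value of 4 would aim for roughly 4MHz 80386 speed, while 0, 80 or 500 would all leave the slowdown off.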

Implemented further optimizations to IRQ emulation

This is again something that I already mentioned in my previous blog post. This change should allow me to add support for the PC Speaker digitized music that some games use, but I did not have time to implement it into this version yet.

Increased max number of game configs in zerox86.ini to 512

I had only allowed for 128 game-specific configurations in the zerox86.ini. This turned out to be pretty low (some of you seem to really like your DOS games!), so I increased this limit to 512 in this version. The limit can be increased still further, but this is probably enough for now, I believe.

Fixed Jazz Jackrabbit audio crackling

The compatibility wiki mentioned Jazz Jackrabbit having an audio crackling problem in addition to the pause menu selector problem that I wrote about in the previous blog post. Since I got the menu fixed, I thought it would be a good idea to make this game fully supported and fix the audio quality as well. I had not listened to the game all that closely, and when I did, it was pretty obvious that the audio quality was not as good as it could be.

First I needed to find out what actually goes wrong in the audio rendering. I added a memory buffer where I wrote the last 16 audio blocks that my SBEmulation() routine creates. I did not add any file writing feature to my code, as it occurred to me that I could simply use GDB for that. This is what the new SDL audio_callback routine looked like, after I added my memory logging mechanism (with the JAZZDBG define) for the created audio samples:

#define	JAZZDBG 1
#if JAZZDBG
#define JAZZBUFSIZE	(256*16)
short jazzbuf[JAZZBUFSIZE];
int jazzbufpos = 0;
#endif

void audio_callback(void *userdata, Uint8 *stream, int len)
{
    // Write the Sound Blaster samples to the stream (or clear the stream if nothing to play).
    int Need_SB_IRQ = SBEmulation(stream);
#if JAZZDBG
    memcpy(jazzbuf + jazzbufpos, stream, 256*2);
    jazzbufpos = (jazzbufpos+256)&(JAZZBUFSIZE-1);
#endif
    // Write the AdLib emulation samples to the stream. 2 times 128 samples.
    AdlibEmulation(stream);
    AdlibEmulation(stream+256);
    if (Need_SB_IRQ)
        SendSBIRQ();
}
I then ran Jazz Jackrabbit, and attached GDB to the running zerox86.exe, looked up the address of my jazzbuf, and dumped that to a file using the GDB command dump binary memory jazzdump.raw 0x1ac8620 0x1ac8620+256*16*2. Next I started up Audacity and imported the raw sample data. The resulting waveform looked like this:

Aha, I thought, there are gaps in the audio sample! I then added an artificial full volume sample to the beginning of those blocks where Need_SB_IRQ gets set, and sure enough, the gaps happened always and only in those specific blocks. Looking at my SBEmulation code, I then realized that when I run out of the input buffer, I simply return (with the return value for IRQ set), instead of wrapping around and continuing until the output buffer is full.
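The wrap-around idea can be sketched in C roughly as follows. This is a simplified model and not the real SBEmulation routine (which lives in assembly): the sb_input structure, the fill_block name and the 8-bit to 16-bit conversion are all assumptions made for the example. The point is only that the loop keeps filling the output after the input wraps, instead of returning early.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an auto-init DMA playback buffer. */
typedef struct {
    const uint8_t *data;  /* 8-bit unsigned input samples */
    size_t size;          /* length of the DMA block, must be > 0 */
    size_t pos;           /* index of the next sample to read */
} sb_input;

/* Fills out[0..count-1] with samples and returns 1 if the end of the
   input block was reached (so an IRQ should be sent), else 0. */
int fill_block(sb_input *in, int16_t *out, size_t count)
{
    int need_irq = 0;
    for (size_t i = 0; i < count; i++) {
        /* Convert unsigned 8-bit to signed 16-bit. */
        out[i] = (int16_t)((in->data[in->pos] - 128) * 256);
        if (++in->pos == in->size) {
            in->pos = 0;    /* wrap around and keep going, do not return here */
            need_irq = 1;
        }
    }
    return need_irq;
}
```

Returning as soon as the input ran out would leave the rest of the output block unfilled, which is the kind of gap that showed up in the dumped waveform.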

I fixed this problem, thinking that now Jazz Jackrabbit should sound good, and was then pretty disappointed when there was not much of an improvement in the audio quality. I again dumped the waveform, and indeed the gaps were gone, but the audio still sounded crackly. Since such crackling is usually caused by single samples having incorrect values, I began to look for such in the waveform. After some studying I found several places where the sample values seemed to be suspiciously far away from their neighbours. Here below is one such sample (it is also visible in the above image if you look closely).

What was strange about these samples was that they always seemed to happen at the beginning of my 256-sample block. I could understand if they were at the end, as that would mean that my block end pointer was off by one sample. But for the error to happen in the beginning would mean that my start address was off, and I was pretty sure that was not the case.

To make sure my understanding of the problem was correct, I hacked my audio_callback to always copy the second sample of the block to the first sample, and that did indeed get rid of most of the crackling. There was still the occasional crackling, which was yet another mystery.

After running the sample copying in the debugger many many times and never seeing anything wrong, I was about ready to give up studying this problem. Then, almost by accident, I suddenly caught the problem! Here is the actual log from my debugger window of the situation. If you want to spot the problem yourself, you need these bits of information:

(gdb) x/8bx 0xa627b8
0xa627b8:       0x84    0x7f    0x90    0x8d    0x7c    0x8d    0x97    0x87
(gdb) stepi
1608    in /home/patrick/zerox86/source/ports_SB.S
(gdb) 
1609    in /home/patrick/zerox86/source/ports_SB.S
(gdb) 
1611    in /home/patrick/zerox86/source/ports_SB.S
(gdb) 
1613    in /home/patrick/zerox86/source/ports_SB.S
(gdb) 
.sb_13_irq_cont () at /home/patrick/zerox86/source/ports_SB.S:1616
1616    in /home/patrick/zerox86/source/ports_SB.S
(gdb) 
1617    in /home/patrick/zerox86/source/ports_SB.S
(gdb) 
1618    in /home/patrick/zerox86/source/ports_SB.S
(gdb) 
0x005d4b6c      1618    in /home/patrick/zerox86/source/ports_SB.S
(gdb) 
1620    in /home/patrick/zerox86/source/ports_SB.S
(gdb) 
sb_13_play_auto () at /home/patrick/zerox86/source/ports_SB.S:1608
1608    in /home/patrick/zerox86/source/ports_SB.S
(gdb) x/8bx 0xa627b8
0xa627b8:       0x96    0x7f    0x90    0x8d    0x7c    0x8d    0x97    0x87
(gdb)
If you looked at the log closely, you may have spotted that the input sample value at address 0xa627b8 was 0x84 when I started stepping through my code, but was 0x96 when I had copied it to my output buffer!

Since I was sure that my code never changes the input buffer contents, at first I just looked at this log with my mouth open. I couldn't understand what was going on! Until it suddenly hit me: since zerox86 is multithreaded and I was stepping through the audio thread code, it must have been one of the other threads (most likely the CPU emulation thread where the game was running) that changed that memory!

Okay, now the problem began to make sense. If I report to the game the wrong position for how far I have read the samples from the input buffer, the game may well write new data to the ring buffer before I have managed to read the old data. The game uses the DMA position register to keep track of where the SoundBlaster card is reading the sample data. I adjusted my DMA position register to not point to the next sample I am going to read, but instead to the last sample I actually did read. That finally got rid of the crackling! This same problem may have caused crackling in other games as well, and the fix for the first issue (audio gaps after sending an IRQ) may also improve audio in several other games.
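In C terms the position change amounts to something like the sketch below. It models the buffer as a simple ring indexed from zero, which is a simplification (the real DMA controller registers work differently), and the function name is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: `next` is the index of the sample the emulation
   will read next in a ring buffer of `size` samples (size > 0). */
size_t reported_dma_position(size_t next, size_t size)
{
    /* Report the last sample that was actually consumed, so the game
       does not overwrite a sample that has not been read yet. */
    return (next + size - 1) % size;
}
```

With an 8-sample ring and the next read at index 5, the game would be told position 4; at index 0 it would be told position 7.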

Fixed Jazz Jackrabbit pause menu selector drawing problem

I wrote about this problem extensively in my previous blog post, so read below for more information about this if you are interested.

Fixed unsupported situation in Corridor 7

After the previous blog post I still spent a day working on Windows 3.11 support, until I reached the same spot where I got stuck in DS2x86. It again seemed really difficult to get further from that point, so I spent some time improving my tracing functionality. I also changed my trace log format to be identical to what I get from tracing in my debug-enabled DOSBox version. After I got stuck in Windows 3.11, I then began to look at games that had problems where my new logging features might help. These are games that run into an unsupported situation almost immediately when they attempt to start up (so that it does not create gigabytes of log data to reach that spot).

The first such game I looked into was Corridor 7. I first used my "post mortem" logging system that logs the last 1024 opcodes that the game has executed before encountering an unsupported opcode, and found the following interesting section in the log:

EAX=00000000 EBX=00000024 ECX=00000000 EDX=00000000
ESP=0000FF82 EBP=00000000 ESI=00000001 EDI=00000000
ES=3D77 CS=02ED SS=2D98 DS=2D98 FS=0000 GS=0000
02ED:00004144 CB              retf
EAX=00000000 EBX=00000024 ECX=00000000 EDX=00000000
ESP=0000FF86 EBP=00000000 ESI=00000001 EDI=00000000
ES=3D77 CS=0000 SS=2D98 DS=2D98 FS=0000 GS=0000
0000:00000000 CA01ED          retf ED01
EAX=00000000 EBX=00000024 ECX=00000000 EDX=00000000
ESP=0000D990 EBP=00000000 ESI=00000001 EDI=00000000
ES=3D77 CS=B74D SS=2D98 DS=2D98 FS=0000 GS=0000
B74D:00006D4D 0000            add  [bx+si],al

The game returns from a far function to a zero address (where the interrupt vectors are located), and the interpretation of the INT 0 vector address as code is RETF ED01 (meaning a far return that also pops 0xED01 = 60673 bytes from the stack). This returns to the middle of the text mode screen buffer memory. No wonder things go wrong after that.
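For reference, this is how the three bytes CA 01 ED from the log decode into that pop count. The helper below is only an illustration written for this post (the name retf_imm16 is made up); it reads the 16-bit immediate in little-endian order, as the x86 encoding requires.

```c
#include <stdint.h>

/* Returns the immediate operand of a "RETF imm16" instruction (opcode 0xCA),
   or -1 if the bytes do not start with that opcode. */
int32_t retf_imm16(const uint8_t *code)
{
    if (code[0] != 0xCA)
        return -1;
    /* The immediate is stored low byte first. */
    return (int32_t)(code[1] | (code[2] << 8));
}
```

Feeding it the bytes 0xCA, 0x01, 0xED gives 0xED01, which is 60673 in decimal.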

So, the actual problem was that the stack had gotten filled with zeros before the routine returns. I logged everything that the game does in both zerox86 and in DOSBox, and then used WinMerge to compare these logs. Since the memory addresses differ between zerox86 and DOSBox (partly because of my using 4DOS as the command processor, partly because of other architectural differences) it is not simply a case of "look for first difference in the logs" to find the problem location. Practically every row containing a segment value will be different in the logs, so I need to go through every row even when using WinMerge. It is still much faster to find the differences this way than by simply debugging them both separately.

After some browsing through the log I found out that in zerox86 one memory allocation fails, while the same memory allocation succeeds in DOSBox. I then added logging to my DOS memory allocation routines, and found out that in zerox86 there was a curious 4-block allocated memory area that caused the memory allocation increase request from the game to fail. There was no proper error handling for this situation in the game code, so things went wrong after that.

It took me a while to find out what caused that extra 4-block memory allocation. I finally found that it was my recent implementation of the DOS 3.3+ - SET HANDLE COUNT call that allocated memory for the increased file handle table. I studied how DOSBox does this, and noticed that it does not allocate real DOS memory, but instead creates the table in the ROM area. I changed my code to perform similarly, use an otherwise unused BIOS ROM area for the memory buffer, and this change then allowed the game to start.

Fixed unsupported situation in Ultrabots

The behaviour in Ultrabots was pretty similar to the problem in Corridor 7, but this time the game attempted to load the sound driver overlay to memory address 0000:0000. The load itself succeeded (strangely), but obviously things went wrong pretty soon after that. I again found an interesting snippet from the log some time before the code used that invalid load address:

EAX=00040437 EBX=00000416 ECX=00001BEE EDX=00000000
ESP=0000F1C8 EBP=00000000 ESI=0000012F EDI=0000065B
ES=1BEE CS=12D3 SS=1BEE DS=1BEE FS=0000 GS=0000
12D3:0000467D FF1EB400        call far word [00B4]
EAX=00040437 EBX=00000416 ECX=00001BEE EDX=00000000
ESP=0000F1C4 EBP=00000000 ESI=0000012F EDI=0000065B
ES=1BEE CS=0000 SS=1BEE DS=1BEE FS=0000 GS=0000
0000:00000000 C3              ret
EAX=00040437 EBX=00000416 ECX=00001BEE EDX=00000000
ESP=0000F1C8 EBP=00000000 ESI=0000012F EDI=0000065B
ES=1BEE CS=12D3 SS=1BEE DS=1BEE FS=0000 GS=0000
12D3:00004681 2E8B1E8947      mov  bx,cs:[4789]

This was somewhat similar to the Corridor 7 behaviour: the game calls a zero address (null pointer), and by chance this time the opcode there was a simple RET opcode. So the game code continues forward, but this causes all sorts of problems later on, because the subroutine it fails to call was supposed to set up the correct load address for the sound driver overlay, among other things.

I again used WinMerge to compare the logs between zerox86 and DOSBox. I had to do this a few times, starting further and further into the game code, else the log files grew too large. Eventually I did find out that similarly to Corridor 7, there was a memory allocation call that fails in zerox86 but does not fail in DOSBox.

This memory allocation call was rather curious, though, as it happens like this:

  1. The game queries the amount of free memory.
  2. The game allocates all of that memory.
  3. The game queries the amount of free memory. This was zero in zerox86, since the game just allocated all the memory.
  4. The game allocates all the available memory. Here zerox86 returned an error (as there was no more memory available), which caused the game to skip important routines and then eventually crash.

What I thought was curious was that DOSBox still had 4 blocks of memory available at the last stage above even though the game had earlier allocated all the available memory.

Looking at how DOSBox initializes the DOS memory blocks I noticed that it purposefully leaves a small memory block free before allocating the DOS internal memory. I did not have such in zerox86, as I had not seen a need for it. I decided to check what happens if I do a similar extra memory block allocation, and sure enough, Ultrabots started up after that change! I am not sure if this is some DOS peculiarity that the game takes advantage of, or if this simply is a bug in the game, but at least the game starts up now.
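The four-step sequence can be played through with a toy model in C. This is not how DOS, DOSBox or zerox86 actually manage memory control blocks; it just treats memory as a short list of free block sizes, with names (arena, largest_free, allocate, ultrabots_startup) invented for the example, to show why one small spare free block changes the outcome.

```c
#include <stdint.h>

#define MAX_BLOCKS 4

/* Toy arena: each entry is the size of one free block (0 = used up). */
typedef struct {
    uint32_t free_size[MAX_BLOCKS];
    int count;
} arena;

/* "How much memory is free?": size of the largest free block. */
uint32_t largest_free(const arena *a)
{
    uint32_t best = 0;
    for (int i = 0; i < a->count; i++)
        if (a->free_size[i] > best)
            best = a->free_size[i];
    return best;
}

/* Takes `size` units from the first non-empty block that is large enough.
   Returns 0 on success, -1 if no memory is available. */
int allocate(arena *a, uint32_t size)
{
    for (int i = 0; i < a->count; i++) {
        if (a->free_size[i] > 0 && a->free_size[i] >= size) {
            a->free_size[i] -= size;
            return 0;
        }
    }
    return -1;
}

/* Runs the four steps the game performs; returns the result of step 4. */
int ultrabots_startup(arena *a)
{
    uint32_t first = largest_free(a);    /* step 1: query */
    if (allocate(a, first) != 0)         /* step 2: allocate all of it */
        return -1;
    uint32_t second = largest_free(a);   /* step 3: query again */
    return allocate(a, second);          /* step 4: allocate again */
}
```

With a single free block, step 4 fails. With an extra free block of 4 units in front of the main one, step 3 reports 4 and step 4 succeeds.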

Future work

There are still curious palette problems in a few games, for example in the main menu of 4th Generation and in the newscaster image in Ultrabots (as seen above). I have been studying these a bit, but the cause is still a mystery. In 4th Generation for example, it is a single palette index that gets a wrong value, but that value is read from a file at the same time as the other correct values, so I don't yet know what goes wrong.

My nephew just asked when he can continue testing the games in zerox86, so I will probably again give my GCW0 to him in a week or two. Hopefully this version allows a few more games to run again and does not cause any new major problems, so that I could again have my nephew spend some time (a month or two) testing the games, before I will then again continue working on zerox86.

Anyways, thanks for your interest in zerox86 again, and especially thank you for keeping the zerox86 Compatibility WIKI updated and alive!

July 20th, 2014 - zerox86 work continues

Sorry, no new version today, as there are not all that many useful enhancements yet in this version, and I have not had time to make sure my new changes have not broken any major features. I have been working on zerox86 quite a lot during the past week, though. Here is some information about the changes I have been working on.

Further optimizations to IRQ emulation

I started my work for the next version by rewriting the IRQ handling code yet again. The changes I made to the previous version already made it possible to run the emulated IRQs faster than the 250Hz task switch speed of GCW0, but I still had some thread synchronization code in my IRQ handling routine. However, now that I had moved my timer IRQ away from the asynchronous signals, I wanted to also move all the IRQ handling into the main emulation thread.

I was able to get rid of all the thread synchronization code, and also the whole signal broadcasting system I used to send IRQ requests from the UI and audio threads to the main emulation thread. Now the other threads simply set flags that the main thread reads. I still need to be careful when resetting the flag from the main thread, but other than that consideration there is no need for any extensive thread locking.
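One way to express the flag idea in portable C is with C11 atomics, as in the sketch below. To be clear, this is a swapped-in technique for illustration: the zerox86 core is written in MIPS assembly and nothing here is taken from it, and the flag names and bit values are hypothetical. The part that needs care is the reset, which is done here with a single exchange so that a flag set by another thread between the read and the clear cannot get lost.

```c
#include <stdatomic.h>

/* Hypothetical bit assignments for the pending IRQ sources. */
#define IRQ_FLAG_KEYBOARD 0x01u
#define IRQ_FLAG_SB       0x02u

/* Zero-initialized because it has static storage duration. */
static atomic_uint pending_irqs;

/* Called from the UI or audio thread to request an IRQ. */
void request_irq(unsigned flag)
{
    atomic_fetch_or(&pending_irqs, flag);
}

/* Called from the main emulation thread. Reads and clears all pending
   flags in one atomic step and returns the flags that were set. */
unsigned take_pending_irqs(void)
{
    return atomic_exchange(&pending_irqs, 0u);
}
```

The main loop would call take_pending_irqs() at a convenient point and handle whichever bits come back, with no locks or signals involved.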

While I did that optimization, I also rewrote the timer I/O port handling in ASM. It used to be done in C code (which is always slow to call from my main ASM code). When playing digitized sounds using the PC Speaker, the timer I/O ports are written to from the timer IRQ (at several thousand times per second), so to eventually handle that I need to have my timer I/O port handling pretty fast. Sadly this change caused Wizardry 6 to cause a division-by-zero error when using audio. This is probably caused by some timing problem, as the game seems to determine the speed of the PC using a very fast timing loop. I spent some time trying to adjust my timer handling, but was not able to fix this yet.

Fixed Jazz Jackrabbit menu selector on pause menu

Jazz Jackrabbit was reported on the compatibility list as having a problem in the game pause menu, where the selector bar is not visible. This was a rather curious issue, as the selection seems to work fine in all the other menus. It would be strange that the game would use two different techniques to handle menus. I had not played Jazz myself so far that I would have used the pause menu, so I had not noticed this problem, but going to the pause menu the problem was immediately visible. Finding the actual problem was considerably more difficult, though.

First I had to figure out the best way to debug this. I wanted to find out the opcode where the problem occurs, so I wanted to see which opcode Jazz uses to write the menu items to the screen. Since I did not know which opcode it was, it was difficult to figure out how and where to put a breakpoint. In the end it occurred to me that whatever opcode the game uses, it will always need to write something to the VGA registers before writing to screen (since the game runs in Mode-X). So I decided to let zerox86 run up to the pause menu, and then attached the GDB debugger to it, and set a breakpoint to the VGA port 0x3CE handler. Luckily the game does not draw anything else to the screen while on the pause menu, so my breakpoint got triggered only after I pressed cursor down.

When I got the code to stop at the breakpoint, I copied the current CS:IP register value and used that to then cause zerox86 to break at the corresponding game code. The result looked like this (curiously the game uses 16-bit protected mode instead of the much more common 32-bit flat address space protected mode):

0207:0ED3 BACE03          mov  dx,03CE		VGA Graphics Registers I/O port
0207:0ED6 B80318          mov  ax,1803
0207:0ED9 EF              out  dx,ax		03 = VGA Function Register = 0x18 (XOR)
0207:0EDA B8003F          mov  ax,3F00
0207:0EDD EF              out  dx,ax		00 = VGA Set/Reset Register = 0x3F
0207:0EDE B8080F          mov  ax,0F08
0207:0EE1 EF              out  dx,ax		08 = VGA Bit Mask Register = 0x0F
0207:0EE2 8E068615        mov  es,[1586]	ES = 014F (segment base address = 0xA0000)
0207:0EE6 8B7E08          mov  di,[bp+08]
0207:0EE9 268A4551        mov  al,es:[di+51]	Load the VGA latch from VGA VRAM contents
0207:0EED A10483          mov  ax,[8304]

There are a couple of interesting things in that code. First, the game sets the VGA Function Register to XOR instead of the normal MOV which simply writes the data as-is. The XOR function uses the VGA latch value (a sort of internal VGA register that keeps a copy of the most recent data that was read from the VGA VRAM) and performs a bitwise XOR operation with the data that is about to be written to the VGA VRAM, before actually writing the data. This is used for various tricks in the 16-color EGA graphics modes, but I had not seen this technique used in the 256-color modes until now. Perhaps this is because in the 256-color modes it is usually simpler to just use the XOR opcode to perform such operations, instead of programming the VGA registers and then making sure the VGA latch register contains the correct data.
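The latch logic can be modelled in C along these lines. This is a deliberately simplified sketch for a single byte on a single plane: it ignores data rotation, Set/Reset and the different write modes, and the function names are invented for the example rather than taken from zerox86. The logical operation is selected by bits 4..3 of the function register, so the value 0x18 written by the game selects XOR.

```c
#include <stdint.h>

/* Logical operation encoded in bits 4..3 of Graphics Controller register 3. */
enum { VGA_OP_REPLACE = 0, VGA_OP_AND = 1, VGA_OP_OR = 2, VGA_OP_XOR = 3 };

/* Extracts the logical operation from the function register value. */
int vga_logical_op(uint8_t function_reg)
{
    return (function_reg >> 3) & 3;
}

/* Computes the byte that ends up in VRAM for one plane. */
uint8_t vga_write_byte(uint8_t data, uint8_t latch,
                       uint8_t function_reg, uint8_t bit_mask)
{
    uint8_t result;
    switch (vga_logical_op(function_reg)) {
    case VGA_OP_AND: result = data & latch; break;
    case VGA_OP_OR:  result = data | latch; break;
    case VGA_OP_XOR: result = data ^ latch; break;
    default:         result = data;         break;  /* plain write */
    }
    /* Bits that are cleared in the bit mask come from the latch unchanged. */
    return (uint8_t)((result & bit_mask) | (latch & ~bit_mask));
}
```

This also shows why the latch has to hold sensible data before the write: with XOR selected, whatever was last read from VRAM becomes one of the two operands.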

The second interesting thing is that the code loads a byte from the VGA VRAM (at the code address 0207:0EE9), but then immediately overwrites the AL register (low 8 bits of AX) on the next code line. So, obviously the code has no need for the data at the VGA VRAM at that location, and the only possible use for that code is that it simply preloads the VGA latch register for later.

I had not coded any support yet for using the XOR function in the Mode-X graphics mode, so that was most likely the cause for the missing selector bar. Now I just needed to find out in which opcode the actual writing happens. Checking for the XOR function in all my Mode-X opcodes would cause a performance hit in all games, which I wanted to avoid since no other game has so far used this trick. Annoyingly, Jazz used a function pointer to call the actual screen writing code, so it took me a while longer to find it. It turned out that Jazz uses opcodes C6 (MOV r/m8, imm8) and C7 (MOV r/m16, imm16) to perform the write. That was very nice, as those are rather rare opcodes to use for screen writing (usually games write some dynamic values, not hardcoded single pixels as in these C6 and C7 opcodes). So, I added support for the XOR operation to only these two opcodes, which resulted in the menu selection appearing properly! Here below are some screenshots of the problem, before and after my fix.

 

Enabled Virtual memory support

I also began working on the Virtual memory support in zerox86. Since the code is based on my DS2x86 which does have (rather fragile) virtual memory support, much of the code was already in place, just disabled or commented out. I removed the comments and then began running some of my test games. I started with Warcraft II demo. The first problem was that zerox86 crashed with a SIGBUS error. After running zerox86 in GDB I realized that my TLB (Translation Lookaside Buffer) table clearing needed work. I use the highest bit of the memory address to determine whether the address is in physical RAM or in some special area (like EGA or ModeX graphics memory or in Virtual Memory). In DS2x86 all the physical addresses are in the range 0x80000000 .. 0x80FFFFFF while on GCW0 the RAM addresses are in the range 0x00000000 ... 0x7FFFFFFF. Thus I have the exact opposite handling of the highest bit in GCW0 than in my DS2x86 code. The SIGBUS was caused by my forgetting to initialize my TLB to a proper "uninitialized" value, which on DS2x86 was simply zero but on GCW0 is actually 0x80000000. So I needed to fill my full 4MB TLB with 0x80000000 when zerox86 starts.
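As a rough picture of that initialization, here is a C sketch. It assumes one 32-bit entry per 4K page of the 4GB linear address space, which is what a 4MB table works out to (1M entries of 4 bytes), but the actual layout of the zerox86 TLB is not shown in this post, so treat the structure and the names as guesses.

```c
#include <stdint.h>
#include <stdlib.h>

#define PAGE_SHIFT   12                         /* 4K pages */
#define TLB_ENTRIES  (1u << (32 - PAGE_SHIFT))  /* 1M entries * 4 bytes = 4MB */
#define TLB_UNMAPPED 0x80000000u                /* high bit set: not plain RAM */

/* Allocates the table and marks every entry as uninitialized.
   Returns NULL if the allocation fails. The caller frees the table. */
uint32_t *tlb_create(void)
{
    uint32_t *tlb = malloc(TLB_ENTRIES * sizeof *tlb);
    if (tlb == NULL)
        return NULL;
    for (uint32_t i = 0; i < TLB_ENTRIES; i++)
        tlb[i] = TLB_UNMAPPED;
    return tlb;
}

/* True if the entry for this linear address needs special handling,
   i.e. it is either uninitialized or points to a special area. */
int tlb_needs_slow_path(const uint32_t *tlb, uint32_t linear_address)
{
    return (tlb[linear_address >> PAGE_SHIFT] & 0x80000000u) != 0;
}
```

Leaving the table zero-filled would make every entry look like a valid RAM address on GCW0, which is the opposite of what an uninitialized entry should say.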

The next couple of problems were caused by my not noticing a few places in the code that needed more extensive checks, which were still commented out. The main problem with the paging (virtual memory) support is that a single opcode may span multiple pages, either with the opcode bytes, or with the memory address the opcode uses. DOSBox handles this by loading every byte separately, performing the memory access mapping for each byte. To speed up the emulation, I use a CS:EIP pointer in the host address space and load the opcode bytes without any memory checking. Similarly I also load the memory address that the opcode uses with just a single access to the TLB. This works fine and fast with linear memory access, not so when paging is on.

To handle the code segment paging, I changed my main opcode loop to test whether the current CS:EIP pointer is within the last 16 bytes of the 4K page, and if so, I relocate these 16 bytes together with the 16 first bytes of the next 4K page to a contiguous 32-byte temporary area. Accessing the next 4K page while relocating it may cause a Page Fault, which may thus not happen at exactly the same opcode as it should. This usually does not cause problems, though. Then when the CS:EIP pointer advances to within the first 16 bytes of the next page, I adjust the CS:EIP pointer to go back to the correct physical memory address. Since these checks add extra code to the innermost opcode interpreter loop, I use self-modifying code to add/remove these checks when paging gets enabled/disabled. This feature handles the code segment paging, but it still does not help in a situation where the opcode performs a memory access that spans multiple pages.
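The two pieces of that scheme, the page-end test and the 32-byte relocation, could look something like this in C. It is a sketch under assumptions: the real code is assembly inside the opcode loop, and the function names and the way the page pointers are passed in are made up here.

```c
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE  4096u
#define EDGE_BYTES 16u

/* True if the linear address lies within the last 16 bytes of its 4K page. */
int near_page_end(uint32_t linear_eip)
{
    return (linear_eip & (PAGE_SIZE - 1)) >= PAGE_SIZE - EDGE_BYTES;
}

/* Copies the last 16 bytes of the current page and the first 16 bytes of
   the next page into one contiguous 32-byte window. Both page pointers are
   host addresses of the start of the respective 4K page. */
void build_edge_window(uint8_t window[2 * EDGE_BYTES],
                       const uint8_t *this_page, const uint8_t *next_page)
{
    memcpy(window, this_page + PAGE_SIZE - EDGE_BYTES, EDGE_BYTES);
    memcpy(window + EDGE_BYTES, next_page, EDGE_BYTES);
}
```

An opcode that starts at page offset 0xFFF then has its following bytes available right behind it in the window, even when the next virtual page is somewhere else entirely in physical memory.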

For the memory access spanning multiple pages, again the perfect solution would be to load/store all data one byte at a time using a separate TLB lookup for each byte. This would slow down most of the opcodes quite noticeably, so I am not willing to do that. Instead, I have added checks to the opcodes that have had this problem in the games I have tested. What this means in practice is that games (using virtual memory) I haven't tested myself and added specific support for will most likely not run in zerox86. This is what I mean by "fragile support" for virtual memory.

One interesting thing happened when I was adding support for a 16-bit memory read page check. This code needs to check whether the address ends in 0xFFF, so that the low byte is in this page but the high byte needs to be read from a different virtual page (which may be physically far away from the first byte). I originally had written the check simply like this:

	.set noat
	andi	AT, memory_address, 0xfff	# AT contains the last 12 bits of memory_address
	sltiu	AT, AT, 0xfff			# AT = 1 if AT < 0xFFF, else AT = 0
	beqz	AT, spans_two_pages		# Jump if AT = 0 (last 12 bits of address are 0xFFF)
	.set at
When I was looking at the MIPS architecture reference, I realized that this might actually be a situation where I could finally use the somewhat weird nor (not or) opcode of the MIPS assembly language! So, I changed the above code to the following.
	.set noat
	li	AT, 0xFFFFF000			# AT = 0xFFFFF000
	nor	AT, memory_address, AT		# AT = 0 if last 12 bits of memory_address are 0xFFF, else AT != 0
	beqz	AT, spans_two_pages		# Jump if AT = 0 (last 12 bits of address are 0xFFF)
	.set at
In this code I NOR the memory address with 0xFFFFF000, where the result is zero only if the memory_address ends with 0xFFF. It is not any faster, but not any slower either, and finally uses the nor opcode for something! :-)
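To double-check that the two versions really test the same thing, both can be written out in C and compared over every page offset. The functions below are just C restatements of the two assembly snippets for this purpose; the names are made up.

```c
#include <stdint.h>

/* First version: andi + sltiu. */
int spans_two_pages_slt(uint32_t addr)
{
    uint32_t at = addr & 0xFFF;       /* andi  */
    at = (at < 0xFFF) ? 1 : 0;        /* sltiu */
    return at == 0;                   /* beqz  */
}

/* Second version: li + nor. */
int spans_two_pages_nor(uint32_t addr)
{
    uint32_t at = ~(addr | 0xFFFFF000u);  /* nor  */
    return at == 0;                       /* beqz */
}

/* Returns 1 if both checks agree for all 4096 offsets within the page
   that contains page_base, else 0. */
int checks_agree(uint32_t page_base)
{
    for (uint32_t off = 0; off <= 0xFFF; off++) {
        uint32_t addr = (page_base & 0xFFFFF000u) | off;
        if (spans_two_pages_slt(addr) != spans_two_pages_nor(addr))
            return 0;
    }
    return 1;
}
```

The NOR result is simply the inverted low 12 bits of the address, so it is zero exactly when all 12 of them are set.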

I got Warcraft II to start up pretty easily after a few such opcode fixes, and then I began to look into getting Grand Theft Auto running. This game needed a few more opcodes to be supported, but otherwise it was surprisingly easy to get running. You obviously need to run the 8bit graphics version, as that is all that zerox86 supports. It is also not very fast, as is to be expected with all the additional code needed for virtual memory support.

 

After I got GTA working, I could not resist testing how far I would get with Windows 3.11. Back when I worked on DS2x86, it was partly my frustration trying to make Windows 3.11 work in it that eventually resulted in my stopping work on it. However, now with GCW0 I have a lot better debugging tools on a much faster platform, so it is again interesting to work on Windows 3.11 support. If I get it working, the next step would be to try to get Windows 95 running, as that is practically my end goal with my emulator projects.

I had pretty good notes from my work on Windows 3.11 on DS2x86. I ran into many of the same issues on zerox86, and could use my notes to see what needs fixing. Many of these fixes were just uncommenting some code, but some code also needed actual changes, mostly because of the 0x80000000 bit significance change. I still have not managed to get quite as far as I got with DS2x86, but I am already past the middle point of my notes from DS2x86. :-)

When working on changing my opcodes to be paging-enabled, I keep running Doom timedemo occasionally to keep track of how my changes affect the overall performance of zerox86. If the performance drops, I then spend some time figuring out new performance enhancements to the new paging-enabled opcode versions. Here is a table of Doom timedemo results at various steps on the way. The Doom timedemo measures the number of timer ticks running the demo takes, so the smaller the number the faster the PC. The corresponding FPS can be calculated as 74690/realticks, for example 74690/6451 = about 11.58 fps.

realticks  Description of the change made
6451       zerox86 version 0.03 on a high-resolution-timer-enabled kernel.
7020       zerox86 version 0.05 (workaround for the lack of high-res timers).
6762       After further optimizations to IRQ emulation.
7408       With paging-enabled opcodes for GTA and WAR2.
6909       Performance enhancements to paging-enabled MOV opcodes.
6629       Performance enhancement to paging-enabled PUSH opcode.
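The conversion from realticks to frames per second is a one-liner; here it is as a small C function for anyone who wants to convert the other rows. Only the constant 74690 comes from the formula above, the function name is made up.

```c
/* Converts a Doom timedemo "realticks" result to frames per second,
   using the formula fps = 74690 / realticks. */
double timedemo_fps(int realticks)
{
    return 74690.0 / (double)realticks;
}
```

For example, the 7408-tick result works out to roughly 10.1 fps and the final 6629-tick result to roughly 11.3 fps.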

Future work

I probably am not able to resist working on Windows 3.11 a little while longer, even though there are still many games on the compatibility wiki that would need looking into. I do plan to make some game-specific fixes before releasing the next version, though. I also have an idea about how to make zerox86 slow down for the really old games that currently run too fast, so that is also something I want to experiment with.

July 13th, 2014 - zerox86 version 0.05 beta released!

This version has actually quite a large number of fixes, but the fixes themselves are not all that big or complex. There are a couple of new features (which I probably need to improve in the future versions), and quite a few game-specific fixes, mostly for games mentioned on the compatibility wiki as not working. I also noticed that my FAQ did not yet have a comprehensive list of options you can use in the zerox86.ini configuration file, so I added a section for those.

Implemented simple CD-ROM emulation

This version of zerox86 now includes a very simple CD-ROM emulation. There is no support for ISO images yet, but now you can configure a certain directory on your SD card to be used as the CD-ROM contents (which shows up as the D: drive in DOS). This can be configured per-game using the zerox86.ini, so you can have several games that need data from different CDs on your SD card. If you configure this in the default section, then you can access the D: drive also from the DOS prompt.

As an example, I copied two CD-ROM games, Fragile Allegiance and Realms of the Haunting, onto my SD card. I copied the games to directories GAMES/FragAlle and GAMES/ROTH respectively. ROTH has a separate subdirectory "CD" that should be used for the CD-ROM contents, while Fragile Allegiance wants to have the same contents on the CD-ROM as are in the game directory. Thus I added the following configuration sections to my zerox86.ini:

[roth]
CDROM=/media/sdcard/GAMES/ROTH/CD

[fragile]
CDROM=/media/sdcard/GAMES/FragAlle

Both of those games run in 640x480 SVGA resolution, and even play videos at that resolution, so they are not all that fast (or readable) on GCW-Zero. They do work and are playable in this version of zerox86, though.

 

Implemented joystick support

There is now a new configuration option ANALOG, which defaults to ANALOG=MOUSE if you do not specifically set this in the zerox86.ini file. If you set this to ANALOG=JOYSTICK, you can have zerox86 emulate an analog PC joystick with the GCW-Zero analog nub. If you set that for some game, you will probably also want to use the new key configuration options JOYB1 ... JOYB4, for joystick buttons 1..4. For example, to run LineWars II with joystick emulation and the GCW-Zero A button emulating the joystick fire button (and B the secondary joystick button), you could have the following in the zerox86.ini file:

[LW2R]
ANALOG=JOYSTICK
KEY_A=JOYB1
KEY_B=JOYB2

There is also a third option for the analog nub: you can configure it to emulate cursor keys by using ANALOG=CURSORKEYS. In this mode moving the analog nub to each edge sends a cursor key press code, and moving the nub back to center sends a corresponding key release code. This may be useful in some games that use many keys, so that you can use the D-Pad for additional key shortcuts.
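
To make the press/release behaviour a bit more concrete, here is a rough Python sketch of such a nub-to-cursor-key state machine. This is only an illustration and not the actual zerox86 code: the -1.0..1.0 axis range, the EDGE_THRESHOLD value and all the names are assumptions made up for this example.

```python
# Hypothetical threshold: how far the nub must move (on a -1.0..1.0 scale)
# before it counts as being "at the edge". The real zerox86 value is unknown.
EDGE_THRESHOLD = 0.8


def nub_to_key_events(x, y, held):
    """Return (events, new_held) for one analog nub sample.

    `held` is the set of cursor keys currently pressed. Each event is a
    ("press" | "release", key_name) tuple.
    """
    wanted = set()
    if x <= -EDGE_THRESHOLD:
        wanted.add("LEFT")
    elif x >= EDGE_THRESHOLD:
        wanted.add("RIGHT")
    if y <= -EDGE_THRESHOLD:
        wanted.add("UP")
    elif y >= EDGE_THRESHOLD:
        wanted.add("DOWN")

    # Release keys that are no longer wanted, then press the new ones.
    events = [("release", key) for key in sorted(held - wanted)]
    events += [("press", key) for key in sorted(wanted - held)]
    return events, wanted
```

Holding the nub at an edge produces a single press event, and further samples at the same position produce no events until the nub returns to the center, which is what a DOS game expects from a real keyboard.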

Fixed PC Timer 2 handling

On the zerox86 compatibility wiki there are several games that are reported as running too fast. Some of them (especially very old ones) will run too fast simply because they were coded for the original IBM PC, and have no synchronization mechanism built in. You may attempt to run such games by first running a PC slow down program like Mo'Slo or similar. However, it looked like some games that should in fact run at the correct speed would still run too fast.

I noticed that one of my old test programs, Solar Winds, had this problem. I debugged this game in DOSBox to remind myself about the method it uses for speed synchronization. I noticed it uses the PC timer 2, which is the least common of the three synchronization methods (syncing to screen vertical sync, syncing to timer IRQ (timer 0), and syncing to timer 2). So, it looked like I had something wrong with my timer 2 handling. Comparing the code in DOSBox and in zerox86, I noticed that I don't handle the timer roll-over properly in a certain timer mode. I fixed that, and Solar Winds began to run at the correct speed. It uses a very peculiar Sound Blaster detection method which seems to fail on zerox86, though, so the game is not yet fully working on zerox86. But at least it is playable now.

Added a new Power Off key to the virtual keyboard

I had been requested to add an alternate way to shut down zerox86, besides using the lock button on the side of the device, or awkwardly typing EXIT on the DOS prompt. Thanks to the idea from the forum user Gab1975, I added a "Power Off" key to the virtual keyboard. You can find it on the top right corner of the virtual keyboard, the button label is simply a red horizontal line. There was just enough room on the virtual keyboard to add such a wide but shallow key. There is still room for another similar key, in case a need for some feature comes up later on.

This power off key should be sufficiently difficult to reach so that you don't select it by accident, yet easier to select than typing EXIT on the prompt, in case you wish to avoid using the physical lock key of the device to exit zerox86.

Added a separate error message for invalid SoundBlaster configuration

In the previous version I added a separate error message for virtual memory, and it then occurred to me that it might be a good idea to have a separate error message also for the unsupported Sound Blaster DSP command error. Since this happens usually when the game is configured to use Sound Blaster 16 instead of the plain Sound Blaster that zerox86 supports, and the user can usually change the game configuration, it would be nice to let the user know. Thus, in this version, you may get the following error message if you have configured the audio in the game wrong:

I also painted the whole error message background red, not simply the background behind the characters. So this is sort of a "Red Screen of Death" of zerox86. :-) These error messages will show for 5 seconds, and then you get dropped back to the GCW-Zero menu.

Fixed several bugs causing Tempest 2000 not to run

I actually began my work for this version by trying to fix the crash in Tempest 2000. I had tried to find the problem already when working on the previous version, but could not find the cause. This was mainly because zerox86 crashed in the screen blitting routine, which is used for many games and is pretty robust. Also the illegal address that caused the crash was actually a legal address, so I did not understand what was going on.

After I released version 0.04, I then decided to hunt for this problem with a more brute-force method. Since the game crashes almost immediately after launch, I thought that I could perhaps log every single opcode after the game switches to protected mode (so that I can avoid logging 4DOS code). However, to my surprise, I got nothing into the log! I had assumed that there is something wrong in some of my protected mode functions, but it looked like the game actually crashed even before going to the protected mode! Actually, it was only an assumption that the game even uses protected mode (an assumption that later also turned out to be false).

Anyways, I next logged every single opcode after the game starts running, and got a very reasonable-sized (0.8 megabytes) log. So, it looked like the game actually crashed zerox86 very early in the initialization routines. The last opcode in the log was SETALC, which is actually my placeholder for DOS interrupt handling (calling C code from ASM code). The code that the game was running at that point looked like this:

02EE:8273 2E803EB08400    cmp  byte cs:[84B0],00
02EE:8279 751A            jne  8295 ($+1a)
02EE:827B FA              cli
02EE:827C 1E              push ds
02EE:827D B423            mov  ah,23
02EE:827F B061            mov  al,61
02EE:8281 CD21            int  21
02EE:8283 2E891EB184      mov  cs:[84B1],bx
02EE:8288 2E8C06B384      mov  cs:[84B3],es
02EE:828D 2EC606B08401    mov  byte cs:[84B0],01
02EE:8293 1F              pop  ds
02EE:8294 FB              sti
02EE:8295 C3              ret

Looking at my DOS interrupt handling code, I realized that I do not support DOS call Int 21/AH=23h: DOS 1+ - GET FILE SIZE FOR FCB at all. That is a practically obsolete DOS 1.0 call to get the size of a file. It gets a pointer to an FCB structure in DS:DX, and returns the status in AL (whether the file was found or not). Looking at the code above, it did not seem to make much sense. The code above sets AL to 0x61 (which is not actually used by the call), and after the call it saves the BX and ES registers (which the call never changes)! So, it looked like this code in Tempest 2000 is actually some sort of hack, perhaps attempting to call a Terminate And Stay Resident program that should have been loaded earlier, or perhaps this is actually some copy protection scheme that has later been hacked to do nothing, by changing the call to a harmless routine.

This routine was followed by a similar routine in the code, which also sets AL to 0x61 and then calls DOS function Int 21/AH=19h: DOS 1+ - GET CURRENT DEFAULT DRIVE, which only returns the drive in the AL register and has no need for the DS and DX registers that the code below sets before the call:

02EE:8296 2E803EB08400    cmp  byte cs:[84B0],00
02EE:829C 741A            je   82B8 ($+1a)
02EE:829E FA              cli
02EE:829F 1E              push ds
02EE:82A0 2E8E1EB384      mov  ds,cs:[84B3]
02EE:82A5 2E8B16B184      mov  dx,cs:[84B1]
02EE:82AA B419            mov  ah,19
02EE:82AC B061            mov  al,61
02EE:82AE CD21            int  21
02EE:82B0 2EC606B08400    mov  byte cs:[84B0],00
02EE:82B6 1F              pop  ds
02EE:82B7 FB              sti
02EE:82B8 C3              ret

Anyways, it seemed that the crash in zerox86 was caused by this unsupported INT call, which should just get reported as the yellow/red "Unsupported situation" message. After some debugging using gdb I found out that I had a bug in zerox86: the unsupported INT reporting corrupted a register I use to flag the type of lazy flags in use, and when I then used that register as a jump table index, it jumped to some operating system code, which then eventually caused the SIGSEGV crash.

Since the game does not seem to actually need the file size, I hacked my support for this DOS call to check if AL == 0x61, and simply return if that is the case. After that change I got Tempest 2000 to start up!

 

However, as soon as I hit any key in Tempest 2000, I got an unsupported opcode error. The crash log looked like this:

TEMPEST: Unsupported opcode!
CPU: REAL, 16, 0000FFFF
EAX=000019FE EBX=00000000 ECX=00000300 EDX=0000001C
ESP=000003EC EBP=0000EFF9 ESI=00000CFA EDI=0000756C
ES=162E CS=02EE SS=31E2 DS=162E FS=2FE4 GS=28F6
NV UP EI NG NZ NA PE CY VM=0 IOPL=0
02EE:0B43 636629          arpl [bp+29],si
Disassembly of code around the location:
02EE:0B39 C3              ret
02EE:0B3A 33C0            xor  ax,ax
02EE:0B3C C3              ret
02EE:0B3D E84900          call 0B89 ($+49)
02EE:0B40 66A10663        mov  eax,[6306]
02EE:0B44 6629060263      sub  [6302],eax
02EE:0B49 8B1EEFD2        mov  bx,[D2EF]
02EE:0B4D D1EB            shr  bx,1

That was pretty strange, it looked like the code had lost sync and was executing opcode bytes at the wrong position. There are two opcodes starting at 0B40 and 0B44, but no opcode begins at 0B43 where my emulator found an unsupported arpl opcode! So, the next step was to trace the execution backwards to find out where the code goes out of sync.

It took a while for me to do this backwards tracing, but finally I found the problem! The code segment in Tempest 2000 is pretty big, it uses almost all the 64KB area of this one segment starting at 0x02EE, and it keeps jumping up and down within this segment. The game uses 80386 features, including the long conditional jumps. I found out that my 16-bit long conditional jumps did not actually wrap around the segment properly, but instead they jumped forward, 65536 bytes too far! The same problem is even in my disassembly routine, where the jump target is shown as 0x10B39, when it should wrap around and go back to 0x0B39!

02EE:E3BB 8AE0            mov  ah,al
02EE:E3BD 80E40C          and  ah,0C
02EE:E3C0 0AE4            or   ah,ah
02EE:E3C2 0F847327        jz   00010B39 ($+2773)
02EE:E3C6 50              push ax
02EE:E3C7 52              push dx
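
The wrap-around for the jz in the listing above can be checked with a few lines of Python. This is just a sketch of the arithmetic and not the actual zerox86 code (which is MIPS assembly); the function name and the operand_size parameter are made up for this example.

```python
def near_jump_target(ip_after_insn, displacement, operand_size=16):
    """Target offset of a relative near jump inside the current code segment.

    With a 16-bit operand size the instruction pointer is only 16 bits wide,
    so the sum must wrap around at 64KB instead of leaving the segment.
    """
    mask = 0xFFFF if operand_size == 16 else 0xFFFFFFFF
    return (ip_after_insn + displacement) & mask


# The jz from the listing: opcode bytes 0F 84 73 27 at offset E3C2.
insn_offset = 0xE3C2
insn_length = 4
displacement = 0x2773

unwrapped = insn_offset + insn_length + displacement
wrapped = near_jump_target(insn_offset + insn_length, displacement)
print(f"without wrap: {unwrapped:05X}, with wrap: {wrapped:04X}")
```

The unwrapped sum is the 0x10B39 shown by the disassembler, while masking the result to 16 bits gives 0x0B39, which is the first ret instruction in the crash log listing.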

Fixing this problem allowed Tempest 2000 to run, and I was able to start a new game, watch a demo game, etc. So, all in all, I made the following fixes to zerox86 to make Tempest 2000 run properly:

Fixed a crash in California Games 2

California Games 2 is one of the games mentioned on the zerox86 compatibility wiki as not working, so I downloaded and tested it myself. It turned out that the game calls INT 15 AH=90: OS HOOK - DEVICE BUSY (AT,PS) BIOS call, which I had not yet supported. I added support for that call, and the game started up fine.

Fixed a crash in BAAL

The next game I selected from the compatibility list to test was BAAL. It used to crash zerox86 in 0.04 version, which I thought was probably caused by the bug in unsupported INT reporting. And sure enough, when I started it up I got a debug log entry about unsupported Int 21/AH=23h: DOS 1+ - GET FILE SIZE FOR FCB. This was the same DOS function as in Tempest 2000, but this time the game actually wanted to use that obsolete DOS 1.0 function. So, I had to implement that function properly after all. After that fix BAAL began to run.

Fixed a problem in Bio Menace

I also tested Bio Menace, which was reported as showing an Unsupported Situation a little while into the game. I found out that the game calls INT 4 (which is the CPU-generated Overflow Interrupt, which software should not call directly). The situation where the game calls this is a result of a function getting a null pointer as a parameter, so this may even be a bug in the game code. Since this interrupt most likely performs a simple IRET on a real PC, I decided to ignore this interrupt on zerox86 as well. It looks like Bio Menace does not crash with that problem any more, but I did not play it all that much to test this properly.

Fixed a problem in Galactic Conqueror

Next I downloaded and tested Galactic Conqueror. The problem in it was that it read from I/O port 0x3D9. I had that code commented out, and I can't remember why I had commented that out. I enabled that code again, and then the game tried to write to I/O port 0x24, which again was unsupported. There is nothing on this I/O port on a real PC, so I decided to simply ignore that.

After those fixes the game displayed the logo screen and played the game tune using PC Speaker digital music technique, which is not yet supported in zerox86, so the result is pretty horrible. Then the game seemed to hang, but it actually waits for the joystick firebutton press. I was not able to get the game to start even with the new joystick support, so I may need to still work on this game in the future.

The game sets the timer IRQ to happen at 16.5kHz, so it is possible that this is also too fast to run properly before I have had time to improve my timer interrupt handling still further.

Fixed a crash in Dark Castle

I still had time to look into a few games, so I chose Dark Castle to test next. It was also calling an unsupported DOS INT, this time the call was Int 21/AX=4406h: DOS 2+ - IOCTL - GET INPUT STATUS. I looked into DOSBox for information about how to support that call, and also debugged the game in DOSBox to easily see what the game actually wants, and then added support into zerox86 using a similar technique. This solved the crashing problem, and at least the demo game works fine now. The game menu seems to be somewhat awkward to use, though, and the game tries to play digitized sound effects via PC speaker, which is not yet supported in zerox86.

Fixed a crash in Dark Century

The last game I had time to test was Dark Century. It was reported to show an error when running in CGA mode, and to crash when running in EGA mode. I did not get an error in CGA mode, so I assume my prior changes have fixed the problem in the CGA mode of this game as well. However, starting the EGA mode game did in fact crash zerox86 with a Segmentation Violation (SIGSEGV).

I ran zerox86 in GDB, but again the actual crash location did not make much sense. I learned from debugging Tempest 2000 that GDB just does not seem to be able to indicate the correct crash location, so I added some code to save the actual emulated location to memory, and then checked this memory address after zerox86 had crashed. I found out that the crash happens in the first pushf opcode within the game code that looks like this:

1BB5:42EF FA              cli
1BB5:42F0 BB7100          mov  bx,0071
1BB5:42F3 8CD0            mov  ax,ss
1BB5:42F5 2E8907          mov  cs:[bx],ax
1BB5:42F8 8BC4            mov  ax,sp
1BB5:42FA 83C302          add  bx,0002
1BB5:42FD 2E8907          mov  cs:[bx],ax
1BB5:4300 8CD8            mov  ax,ds
1BB5:4302 8ED0            mov  ss,ax
1BB5:4304 83C728          add  di,0028
1BB5:4307 8BE7            mov  sp,di
1BB5:4309 9C              pushf 
1BB5:430A 9C              pushf 
1BB5:430B 9C              pushf 
1BB5:430C 9C              pushf 
1BB5:430D 9C              pushf 
1BB5:430E 9C              pushf 
1BB5:430F 9C              pushf 
1BB5:4310 9C              pushf 
1BB5:4311 9C              pushf 
1BB5:4312 9C              pushf 
1BB5:4313 9C              pushf 
1BB5:4314 9C              pushf 
1BB5:4315 9C              pushf 
1BB5:4316 9C              pushf 
1BB5:4317 9C              pushf 
1BB5:4318 9C              pushf 

What was interesting in this code was that at address 1BB5:4302 the ss (stack segment) register gets the value from ds (data segment), which at this point contains a value 0xA000 pointing to the EGA VRAM! That is, the game sets the stack to point to the EGA VRAM, and then stores the CPU flags (using the pushf opcode, "push flags to stack") to the EGA graphics memory! I can not understand why the game would use such strange trickery, but at least now I understood what goes wrong. I had not coded support to my pushf opcode handler to write to graphics memory. I added support for that, and then the game started into the actual play field fine. I could not figure out how to control the tank, though. I could not control it in DOSBox either.
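
The address arithmetic behind this is simple enough to show as a small Python sketch. This is not the zerox86 implementation, just the standard real-mode segment:offset calculation. The helper names are made up, the starting SP value of 0x0028 is a hypothetical example (the real value depends on DI), and I assume the 64KB graphics window at A000:0000 that EGA graphics modes use.

```python
VRAM_START = 0xA0000  # start of the EGA/VGA graphics window in real mode
VRAM_END = 0xB0000    # first address past the 64KB graphics window


def real_mode_address(segment, offset):
    """Linear address of segment:offset in real mode."""
    return (segment << 4) + (offset & 0xFFFF)


def pushf_target(ss, sp):
    """Return (new_sp, linear_address) for a 16-bit pushf."""
    new_sp = (sp - 2) & 0xFFFF
    return new_sp, real_mode_address(ss, new_sp)


def is_graphics_memory(address):
    """True if a linear address falls inside the graphics window."""
    return VRAM_START <= address < VRAM_END


# Hypothetical register values: SS = 0xA000 as in the game, SP = 0x0028.
sp, address = pushf_target(0xA000, 0x0028)
print(f"pushf writes to {address:05X}, graphics memory: "
      f"{is_graphics_memory(address)}")
```

An opcode handler that assumes the stack is always in normal RAM can skip a check like is_graphics_memory(), which is fine for nearly every program but breaks for a game like this one.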

Future work

I also tested Blood Money, but was not able to determine what causes the problem in zerox86. Curiously, I was not able to get this game running on DOSBox either. On my debug-enabled DOSBox it crashes with an access violation, and on the actual DOSBox 0.74 version it simply hangs at the title screen. So, it is possible that there is something wrong with the game itself, instead of zerox86.

This version of zerox86 also has some preliminary work done to eventually support running games that use virtual memory. I increased the size of my memory mapping table to contain the full 4GB of addressable RAM, and I also figured out a way to allow self-modifying code, so it looks like it will be possible to add the same support for virtual memory I had in DS2x86 into zerox86 as well. I would like to make the support more robust in zerox86, but adding similar level of support would be a good starting point.

I also want to improve the timer IRQ handling performance in the upcoming versions, and it would also be nice to have PC Speaker digital audio support. That is something I have not yet added to any of my emulators, so it would be an interesting new feature.

July 7th, 2014 - zerox86 Compatibility Wiki now available!

Thanks to some hard working GCW-Zero forum members, there is now a new Game Compatibility Wiki for zerox86, at http://wiki.gcw-zero.com/zerox86. Feel free to update that wiki as you test new games on zerox86! That wiki is the best way for you to help both me and the fellow GCW-Zero users. I can use that wiki for looking up new games to test when improving zerox86 compatibility. You can help reduce the number of "will game X work on zerox86?" questions, and give other GCW-Zero users ideas for games to play on zerox86. Big thanks to all of you who have already worked on the wiki!

July 6th, 2014 - zerox86 version 0.04 beta released!

This version has one major change and a few smaller fixes. Here is some additional info about the changes in this version.

No more high-resolution timer support needed in the kernel!

The major change in this version affects the timer interrupt handling. My original timer IRQ code in zerox86 was based on the code I use on Android and Raspberry Pi, where the Linux kernel has support for high-resolution timers. On those platforms I let the kernel interrupt my code at the correct timer interval, thus making my emulator timer interrupts happen in real time. This is a simple and easy system, but since the GCW-Zero kernel only runs timers at 250Hz (4ms interval), this is not sufficient to emulate games that need higher frequency timer interrupts.

I had already coded a different timer interrupt emulation system for my Windows Phone 8 version of pax86. The Windows Phone 8 task switch frequency was even less than that of GCW-Zero, so on that platform I had to use an internal counter and then periodically check the elapsed time using performance counters. This has the drawback of needing extra memory accesses for the counter in the main opcode interpreter loop, so it slows down all the emulation. However, luckily on zerox86 I had one MIPS register still unallocated, so instead of using a counter in memory, I was able to use this register for the counter. This meant that instead of loading a value from memory, decrementing it, saving it, and testing it for zero (4 opcodes), I was able to simply decrement the register and test it for zero (2 opcodes) on zerox86. This still causes a slowdown, but running Doom timedemo it looks like it only slows zerox86 down about 3% (meaning the new version runs at 97% of the speed of the previous version). I find this an acceptable trade-off for running games at their proper speed.

When the counter register reaches zero, I then call the C routine clock_gettime(CLOCK_MONOTONIC, &timestruct) to get the current time (with around a microsecond actual precision), and check whether it is time to start an emulated timer IRQ. I then reset the counter register to a suitable value (depending on the actual wanted IRQ frequency) for the next timer interval. Since I don't do cycle counting, this counter is rather approximate, which is why I try to have the counter reach zero several times during one timer IRQ period, and use the clock_gettime() call to determine the more accurate IRQ time.
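
The general idea of this countdown-plus-clock polling can be sketched in Python as follows. This is only an illustration of the technique, not the zerox86 code: the real thing keeps the counter in a MIPS register and calls clock_gettime() from assembly, while here time.monotonic_ns() stands in for that call. The class name, the opcodes_per_check parameter and the catch-up rule for a late IRQ are assumptions made for this example.

```python
import time


class TimerIrqPoller:
    """Countdown-based polling for an emulated timer IRQ."""

    def __init__(self, irq_hz, opcodes_per_check, clock=time.monotonic_ns):
        self.period_ns = 1_000_000_000 // irq_hz
        self.opcodes_per_check = opcodes_per_check  # hypothetical tuning value
        self.clock = clock
        self.counter = opcodes_per_check
        self.next_irq_ns = clock() + self.period_ns

    def after_opcode(self):
        """Call once per emulated opcode; returns True when an IRQ is due."""
        self.counter -= 1            # the cheap path: decrement and test
        if self.counter:
            return False
        self.counter = self.opcodes_per_check
        now = self.clock()           # the expensive path: ask the real clock
        if now < self.next_irq_ns:
            return False
        self.next_irq_ns += self.period_ns
        if self.next_irq_ns <= now:  # fell behind: do not fire a burst of IRQs
            self.next_irq_ns = now + self.period_ns
        return True
```

The smaller opcodes_per_check is relative to the number of opcodes executed during one IRQ period, the more often the real clock gets consulted and the more accurate the IRQ timing becomes, at the cost of more clock calls.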

I tested how much time the clock_gettime() call takes on GCW-Zero, and noticed that two adjacent calls return a time difference of about 1666 nanoseconds, and that I can call the routine over a million times in a loop that lasts for a single second. So the call is pretty fast, and it should not be a problem calling it at up to 100,000 times per second for a game that sets the timer IRQ to some very high value (like 33kHz, as in Wizardry 6). At those speeds this will of course still slow down the emulation quite a bit, but the normal timer IRQ speeds of less than 1000Hz should run quite fine, and even the games that play digitized audio using Direct DAC system (one sample of audio generated at every timer IRQ) should run reasonably OK. The actual routine that starts the timer IRQ handling in zerox86 is still not very fast, and I plan to still improve that in the future versions.

Here is a list of some of the games that will now run much better than before. Any game that used a higher timer speed than 250Hz used to run slower than normal.

Fixed screen corruption in Ishar: Legend of the Fortress logo screen

After I got the new timer system working, I wanted to take a look at the screen corruption problems in various games. I first started with Grand Prix 2, but since the corruption in that game happens only in the actual game and it takes a LONG time to get to that point in my debug-enabled DOSBox (which I use for comparisons), I decided to start with a different game. Ishar: Legend of the Fortress was a suitable test game, as in it the screen corruption happens in the title screen as soon as the game starts. I spent a full day debugging the game in both zerox86 and DOSBox, but was not able to find the problem. On the second day of debugging I was able to get closer to the problem location, and noticed that something strange happens after the game has read the logo image from disk to EMS memory, and before it is copied from EMS memory to the VGA screen VRAM.

I added full logging to this code, but I could find no differences between the code running on zerox86 and it running on DOSBox. However, when I instead added a memory watch to a pixel on the screen, and compared the operations that are performed on that pixel, I found that on zerox86 some operations were not performed at all! After some head scratching it finally occurred to me that my full logging code recalculates the CPU flags after every opcode, which does not happen when running the code without logging. I then decided to test what happens if I forcibly recalculate the CPU flags after every opcode, and that fixed the screen corruption! So, finally I found out how the screen gets corrupt, now I just had to find out which opcode uses my lazy flags incorrectly. I changed the full logging code to optionally not recalculate the flags, ran both versions, and then compared the logs. There I finally saw the problem: I had forgotten to adjust the zero flag testing algorithm for LOOPNE and LOOPE opcodes when I made a small performance fix to the zerox86 version of my code! After fixing that problem Ishar logo began to draw correctly. Here are some screen captures, first the problem screen and then the fixed screen in the new version of zerox86.
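
For readers not familiar with lazy flags, here is a generic Python sketch of the idea and of how LOOPNE and LOOPE depend on the zero flag. This shows the general technique only; the actual lazy flag representation in zerox86 is different (and in MIPS assembly), and the class and function names here are made up for the illustration.

```python
class LazyFlags:
    """Remember the last ALU result instead of computing every flag eagerly."""

    def __init__(self):
        self.result = 1   # any non-zero value means ZF starts out clear
        self.width = 16

    def record(self, result, width=16):
        """Called by arithmetic opcodes; stores the result truncated to width."""
        self.result = result & ((1 << width) - 1)
        self.width = width

    @property
    def zero(self):
        # Derived on demand, only when an opcode actually asks for ZF.
        return self.result == 0


def loopne(cx, flags):
    """LOOPNE: decrement CX, branch while CX != 0 and ZF is clear.

    Returns (new_cx, branch_taken). LOOPNE itself does not modify flags.
    """
    cx = (cx - 1) & 0xFFFF
    return cx, cx != 0 and not flags.zero


def loope(cx, flags):
    """LOOPE: decrement CX, branch while CX != 0 and ZF is set."""
    cx = (cx - 1) & 0xFFFF
    return cx, cx != 0 and flags.zero
```

Every opcode that reads a flag has to agree with the stored representation. If the representation changes and one reader (like the LOOPNE and LOOPE handlers here) is not updated, the loop terminates at the wrong time, while a logging build that recalculates all flags after every opcode hides the problem.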

 

Fixed Sound Blaster detection in Ishar: Legend of the Fortress

While I was working with Ishar, I decided to also implement the same SoundBlaster detection routine fix to zerox86 that I had coded to my rpix86 emulator last December. Luckily I had commented the code in rpix86 with the changes I had to make, so it was pretty easy to port the changes from ARM ASM to MIPS ASM for zerox86. Ishar plays the intro music using Direct DAC audio, at around 13kHz, so the timer fix was also required before it was possible to play SB audio in Ishar. The resulting audio is still not very clear, but that is mostly due to the screen blitting time, which causes loss of audio synchronization at 60 times per second.

If the Sound Blaster detection fails, the game attempts to play digitized audio using the PC speaker, which on zerox86 just causes horrible screeching sounds. At some point I might attempt to look into supporting this feature as well, but there are not all that many games that use that audio technique.

Fixed screen corruption in Bard's Tale and Cloud Kingdoms

Bard's Tale was one of the games mentioned in the list of problem games I got from my nephew, who has been testing zerox86 for the last couple of months. He reported that moving the mouse leaves black boxes on the screen. I tested this myself and confirmed the problem.

I began debugging the problem by checking when the top left pixel of the screen changes value (using a simple memory watch function I have in zerox86). I found the situation where the pixel went black even though it should have been grey, and noticed that the code uses MOVSW opcode to move the data from RAM to EGA VRAM. This opcode is rather common, so I doubted there is anything wrong with this opcode. To make sure, I debugged the value that was written, and indeed even the low byte of the value in RAM was zero, which did not seem correct. Next I checked where this memory address gets that invalid value, and found out that it uses the same MOVSW opcode, but this time to copy data from EGA VRAM to normal RAM. This is a much less common situation, and looking at my implementation of this opcode I found the problem: I had a simple typo in my code. When I meant to temporarily save the register value that contained the low byte while loading the high byte (using the sw MIPS opcode for Store Word), I had written lw for Load Word. Thus the low byte never got saved and was left as zero.

Fixing this opcode made Bard's Tale behave correctly. I also checked some other games where people have reported screen corruption, and noticed that this fix also fixes Cloud Kingdoms, which suffered from the same problem. Here below are some screen copies of the screen corruption. This kind of corruption (in the EGA graphics mode) will not happen any more in the new version of zerox86.

 

Added a separate error message for games using Virtual Memory

In the prior versions of zerox86, when the game attempted to turn on virtual memory (paging), this would either crash zerox86 completely or give the normal "unsupported situation" error message. I decided it might be better to have a dedicated error message for this situation, since virtual memory is not supported in zerox86 by design. Those unsupported situations usually mean there is a bug in my code, whereas the virtual memory support is just something that I haven't implemented and may look into in the future.

Future work

The timer fix is still not very optimal, so I plan to revisit this fix and make it faster in the upcoming versions. I will also continue debugging the Grand Prix 2 screen corruption problem, and also try to find the cause for the Segmentation Violation that happens in some games (like Tempest 2000).

There are also still many games reported with problems, so I will continue debugging them and trying to find the cause for the problems. It seems that the better zerox86 gets compatibility-wise, the harder the remaining problems are to track down. It can easily take several days to debug a single misbehaving game, so I may not be able to release fixes all that fast. In any case I will continue working on zerox86 for a while longer, at least.

Hopefully you find this version to run some games better than the previous one, and let me know of any new bugs you encounter in this version. Thanks again for your interest in zerox86!

June 15th, 2014 - zerox86 version 0.03 beta released!

Okay, after a very long hiatus I am back working on zerox86! I have been working on my other projects in the meantime, and also for the past couple of months my nephew has been using my GCW-Zero to test my library of over 300 DOS games on zerox86. He has not yet had time to test all of the games, so I will give my GCW0 back to him for a few weeks again.

Anyways, I released a new version of zerox86 today, mainly because I heard reports that zerox86 does not work on the latest GCW0 firmware from May 5th. It turned out that zerox86 did actually work on it, but since my directory defaults and error messages were not correct, it was not easy for new zerox86 users to figure out the potential installation problems. There are not all that many fixes in this version yet (as I only have worked on it for a couple of partial days), but here is the list of the changes and improvements I managed to implement.

Implemented mouse scaling

This has been one of the most requested missing features. In this version you can have new configuration keys MouseXScale and MouseYScale in the zerox86.ini. These should have a floating point value (like 0.5 or 2.0) to make the mouse pointer move slower or faster than the default speed. You can have these settings in either the default section or in the game-specific sections in the ini file.

Fixed a crash in Grand Prix 2

Grand Prix 2 is one of my newer test games, mainly because it is very pretty and runs in either 640x480 SVGA mode or in 320x200 MCGA mode. The menus are in the SVGA mode, so I can get nice screen copies from it :-)

The game crashed with an unsupported opcode

0180:00222585 FF1DAEAD6500    call far dword [0065ADAE]

when starting the actual game. I had fixed the same problem in my other emulator versions, and luckily I had remembered to comment this fix so it was easy to make the same fix to zerox86. The problem was caused by the memory address 0065ADAE having an invalid value, which in turn is caused by my mistake in handling the 0x66 and 0x67 prefixes for the LOOP opcodes (0xE0, 0xE1, 0xE2 and 0xE3). I had assumed that the operand size prefix 0x66 determines whether to use the 32-bit ECX or the 16-bit CX register count, but instead it is the address size prefix 0x67 that determines this.
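
The rule for the count register can be written down as a short Python sketch. This is an illustration of the x86 rule itself, not the zerox86 code; the function names and the way the prefixes are passed in are assumptions made for this example.

```python
def loop_counter_bits(default_32bit, prefixes):
    """Width of the LOOP/LOOPE/LOOPNE/JCXZ count register.

    `default_32bit` is the code segment's default size; `prefixes` holds the
    prefix bytes seen before the opcode. Only the address-size prefix 0x67
    toggles the counter width; the operand-size prefix 0x66 does not.
    """
    use_32bit = default_32bit
    if 0x67 in prefixes:
        use_32bit = not use_32bit
    return 32 if use_32bit else 16


def loop_step(ecx, bits):
    """Decrement CX or ECX; returns (new_ecx, branch_taken) for plain LOOP."""
    mask = (1 << bits) - 1
    count = (ecx - 1) & mask
    # With a 16-bit counter the upper half of ECX is left untouched.
    new_ecx = (ecx & ~mask & 0xFFFFFFFF) | count
    return new_ecx, count != 0
```

For example, with ECX = 0x00010001 a 16-bit LOOP counts CX down to zero and falls through, while a 32-bit LOOP still sees a count of 0x00010000 and keeps looping, so picking the wrong register width changes how many times the loop body runs.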

After fixing those loop opcodes the game starts up fine, but there is still some problem with the graphics. It seems like the screen is not cleared properly or some such. Also, the system requirements of the game mention a Pentium CPU, so it runs rather slowly unless you turn the graphics settings down close to their minimum values.

Improved error messaging and directory defaults

In this version of zerox86, the error messages about missing 4DOS.COM and such should point to the correct directories, so it should be easier for you to determine whether you have a typo in the ini file or whether the files are in the wrong directories. This should help new users. I also added a new section to my FAQ for the initial setup of zerox86.

Future plans

I plan to look into the high resolution timer issue during my summer vacation. It would be nice to make zerox86 work at the proper speed even when the Linux kernel does not have high resolution timers enabled. I still hope that the GCW0 firmware people will enable the high resolution timer support at some point, though.

I will also have a long list of misbehaving games as a result of the tests my nephew has done, so those will keep me busy during my summer vacation. Thanks again for your interest in zerox86!

Sep 1st, 2013 - zerox86 version 0.02 alpha released!

This version has the following fixes and improvements:

Many of those fixes I described in my previous blog post, so here is some info about the additional fixes I managed to implement during the last week.

Alone in the Dark

Alone in the Dark crashed at start if Sound Blaster audio was enabled. This was again a situation I thought I had encountered before, and sure enough, in my rpix86 blog Apr 28th, 2013 entry I had written about the exact same kind of crash in rpix86. Since the cause was the same (short SB IRQ buffer with no playing speed given), the same fix also helped in zerox86. After that fix the game seemed to run properly.

Command line parameter

In this version I also added the option to give the DOS program to run on the command line. That is, you can start zerox86 with a command line like "./zerox86 /boot/local/home/DOS/LW2/LW2.COM" and have it immediately launch LineWars II instead of the 4DOS prompt. In fact, when launching games this way you don't even need to have 4DOS.COM at all! I think this should make it possible to launch DOS executables directly using the gmenu2x file browser, but I think this will still need some MIME type changes or something, so it might not work properly in this version yet. Also note that launching .BAT files this way is not possible, as those would need 4DOS to process them.

You should use a full (not relative) path when launching executables this way. That is because of the way the C:\ root directory emulation is handled in this situation. Launching the executable in this way is a three-step process:

  1. The path except the last two parts is considered the C:\ root directory. If the parameter was "/boot/local/home/DOS/LW2/LW2.COM", this would make the emulated C:\ root be located at "/boot/local/home/DOS".
  2. Next zerox86 performs a CD to the second last part of the full path, in the example case this will be "cd LW2".
  3. Last, the executable is loaded without path from the current directory, in the example case the executable is "LW2.COM".
This is important to keep in mind if you install the program in DOSBox on your PC and then copy it to zerox86 for running. Some games need to be run from the exact same DOS directory they were installed in.
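
The three steps above can be sketched in C roughly like this. The function name, parameters and error handling are hypothetical and only illustrate the path splitting, not how zerox86 actually does it.

```c
#include <string.h>

/* Split a full host path into the emulated C:\ root, the directory to CD
   into, and the executable name. Returns 0 on success, or -1 if the path
   does not have at least two '/' separated parts or a part does not fit
   into its buffer. Each output buffer must have room for size bytes. */
int split_launch_path(const char *full, char *root, char *dir, char *exe, size_t size)
{
    const char *last = strrchr(full, '/');
    if (last == NULL || last == full)
        return -1;
    /* Walk back to the separator in front of the second last part. */
    const char *prev = last - 1;
    while (prev > full && *prev != '/')
        prev--;
    if (*prev != '/')
        return -1;  /* relative path like "LW2/LW2.COM" */
    size_t root_len = (size_t)(prev - full);
    size_t dir_len = (size_t)(last - prev - 1);
    if (root_len >= size || dir_len == 0 || dir_len >= size || strlen(last + 1) >= size)
        return -1;
    memcpy(root, full, root_len);
    root[root_len] = '\0';
    memcpy(dir, prev + 1, dir_len);
    dir[dir_len] = '\0';
    strcpy(exe, last + 1);
    return 0;
}
```

For "/boot/local/home/DOS/LW2/LW2.COM" this gives the root "/boot/local/home/DOS", the directory "LW2" and the executable "LW2.COM". A relative path is rejected, which matches the advice to always give a full path.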

Mouse support

Finally I quickly hacked together some preliminary mouse support. You can now move the mouse using the analog controller (there is no setting for the mouse sensitivity yet, so the movement may be either too slow or too fast depending on the game). For mouse clicks, you need to map a GCW0 button to emulate either the left or right mouse button using special scan codes LMB and RMB in the zerox86.ini file.

Future work

It looks like my time to work on zerox86 in the near future will be quite limited. I got a new project that I need to spend my free time on, so I will not be able to implement any major changes to zerox86 for a while. I will get back to working on it after the other project is finished.

Thanks again for your interest in my emulation projects!

Aug 25th, 2013 - zerox86 progress

For the past week I have been working on implementing various missing features and fixing various bugs in zerox86. I also got a new 4GB SDHC card where I could fit all my DOS test programs (of which I have about 3GB worth). I put that card into my GCW-Zero and thus now have easy access to all my DOS test programs. I then began to go thru my test programs (mostly alphabetically) and check the problems in them.

Duke Nukem 2

I had left the Sound Blaster ADPCM sound emulation so far unimplemented, so I started my improvements by implementing that. It took a couple of days, mostly because I had all sorts of other activities after work during the first half of the last week. I ported the ADPCM code from the rpix86 version, converting the ASM code from ARM to MIPS. It now plays something at least resembling the correct audio, and since very few programs actually use ADPCM audio, I think it is now sufficient. Duke Nukem 2 actually uses all three ADPCM formats, 4-bit, 2-bit and also the slightly weird 2.6-bit (8 divided by 3) version. Thus it is simple to test the ADPCM code using a single game.
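
The "2.6-bit" name comes from packing three samples into one byte, so each sample gets 8/3 bits on average. The C sketch below only shows the bit unpacking; the bit layout (3 + 3 + 2 bits, most significant bits first) is an assumption based on how open-source Sound Blaster emulations commonly handle this format, and the actual ADPCM step decoding is left out.

```c
#include <stdint.h>

/* Split one data byte of the "2.6-bit" format into three sample codes.
   Assumed layout: bits 7..5, then bits 4..2, then bits 1..0. The last code
   is shifted left by one so that all three use the same 3-bit scale. */
void unpack_adpcm_2_6(uint8_t data, uint8_t codes[3])
{
    codes[0] = (uint8_t)((data >> 5) & 7);
    codes[1] = (uint8_t)((data >> 2) & 7);
    codes[2] = (uint8_t)((data & 3) << 1);
}
```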

Heimdall

Heimdall is one of the few games that read data directly from disk to graphics VRAM. Jazz Jackrabbit does that into Mode-X VRAM, while Heimdall uses EGA 16-color graphics mode. I had thought that this was already working, but I decided to test it just to be sure. To my surprise it did not work, and after a little bit of debugging I realized that my EGA version did not restore the GP register properly when calling from C routine to my ASM routine.

Back when coding DS2x86 I noticed that the DSTwo firmware did not use the GP (global pointer) register for anything. MIPS convention is to use the GP register as a pointer into the middle of a 64KB "near data" block. This is because MIPS has a very simple memory addressing system: you can only use a register with a 16-bit signed offset to address memory. By having the GP register always pointing to the most commonly used data area, you can avoid calculating memory addresses into temporary registers most of the time. Thus, in my DS2x86 ASM code I used the GP register to point into my data area, and also made the actual emulation address calculations take advantage of the GP register pointing to a certain address which was also aligned in a certain way.
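
To make the 64KB window concrete, here is a small C sketch of the address arithmetic. The function names are hypothetical; the only fact used is that a MIPS load or store takes a signed 16-bit offset from a base register.

```c
#include <stdint.h>
#include <stdbool.h>

/* True if addr can be addressed as offset(gp) with a signed 16-bit offset. */
bool gp_reachable(uint32_t gp, uint32_t addr)
{
    int64_t offset = (int64_t)addr - (int64_t)gp;
    return offset >= -32768 && offset <= 32767;
}

/* GP value that puts a whole 64KB block starting at base within range. */
uint32_t gp_for_block(uint32_t base)
{
    return base + 0x8000u;  /* point into the middle of the block */
}
```

With a block at 0x10000000, GP becomes 0x10008000, and every address from 0x10000000 to 0x1000FFFF can then be reached with a single load or store instruction.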

In the GCW-Zero environment, however, all the libraries are linked in a way that has the linker decide the suitable value for the GP register, which differs from my needs. It would have slowed my emulation down quite a bit if I would have had to change my code to not rely on the GP register value, so I decided to let the C code use the linker's GP value, and have my ASM code use a different value that suits my purposes better. In my ASM code I load the GP register with the value I want after every call to or return from C code (which does not happen all that often, luckily). For the linker to allow that, I needed to have my ASM code use the -mno-abicalls compiler flag. This will make the linker give a lot of "linking abicalls files with non-abicalls files" warnings, but as long as I remember to handle the differing GP register values this does not seem to cause any problems. Not using ABI calls also has the advantage that I can call my own ASM subroutines using the simple JAL address (jump and link) opcode, instead of the complex way that the C code uses by first calculating the target address into the t9 register and finally calling the address using the JALR t9 (jump and link register) opcode.

Mahjong Fantasia

After those fixes, I then began going thru my games alphabetically. The first game I tested was Mahjong Fantasia, which for some reason I have in a directory called 98mj2. It is the only game I have encountered that goes to the EGA 640x200 mode, and then changes the EGA registers so that the actual graphics screen goes to 640x400 resolution. I noticed that I had not taken this register change into account in my EGA graphics blitting routines, and fixed that problem. This made the game start up and look correct.

A-Train

The next game I tested was A-Train. It seemed to work fine, except that the main game screen was mostly monochrome. I thought that the problem seemed somehow familiar, so I went through my old DSx86 blog posts, and found a DSx86 blog June 27th, 2010 entry where I fixed the exact same problem in DSx86. The problem was the EGA Register Interface Library handling. I checked my code in zerox86, and realized that when I had copied the C code from rpix86, I had not changed the call parameters to match the way my MIPS ASM code differs from the ARM ASM code. I fixed the parameters, and the game began to look correct.

Alien Legacy

Next I checked Alien Legacy. It seemed to hang with a black screen, which has usually been a difficult situation to track down. I can attach the gdb debugger to the running zerox86 process (which is a huge step forward from the DS2x86 times), but since it is much more likely that the x86 code is the one running in a loop instead of my emulation code, stopping the emulation code does not get me very close to the root problem. Since I am already using signals to handle various interrupts in my code, I thought that it might be a good idea to have a signal that would print a disassembly of the currently executing x86 code to the standard output. This way I could send that signal repeatedly to my zerox86, and see if the x86 code is running in a tight loop.

The problem with this in zerox86 is that my emulation code keeps the x86 registers in MIPS registers and not in any memory address, so when the code gets the signal, the x86 registers (including the program counter) are not in any variable that I could simply examine. But since the signal handler needs to return to the code that was running at the time of the signal, all the register values need to have been pushed on the stack.

Luckily I already had a memory variable that tells the stack pointer value of my emulation core, so I began experimenting with this by coding a SIGUSR1 handler that simply printed the current stack pointer value and my emulation stack pointer value. The difference in Alien Legacy hang situation seemed to be a bit over 700 bytes, which means around 175 words had been pushed on the stack. The problem was just to find where in that area the signal handler has pushed which register.
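
One way to locate saved registers in such a stack area is to search for a word whose value is known beforehand (the GP register is a natural candidate, as it has a fixed value while the emulation core is running) and then read the other registers at fixed offsets from it. The C sketch below shows the idea on a plain array of words. The function names are hypothetical, and the offsets have to be found by experimenting, so they are just a parameter here.

```c
#include <stddef.h>
#include <stdint.h>

/* Search a stack area for a word equal to the known marker value and return
   its index, or -1 if it is not found. */
long find_marker(const uint32_t *stack, size_t words, uint32_t marker)
{
    for (size_t i = 0; i < words; i++) {
        if (stack[i] == marker)
            return (long)i;
    }
    return -1;
}

/* Read a saved register at a word offset relative to the marker position.
   Returns 0 on success, -1 if the marker is missing or the offset falls
   outside the examined area. */
int read_saved_register(const uint32_t *stack, size_t words, uint32_t marker,
                        long offset, uint32_t *out)
{
    long at = find_marker(stack, words, marker);
    if (at < 0 || at + offset < 0 || (size_t)(at + offset) >= words)
        return -1;
    *out = stack[at + offset];
    return 0;
}
```

The bounds check matters here, since reading outside the pushed area inside a signal handler would only add a second crash on top of the problem being debugged.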

I added code into my signal handler to print out all the stack values, and then used gdb to break when the signal handler is started, and then again when the code returns back to my emulation loop. Many of the register values only appeared one time in the stack, so using those addresses together with the known GP register value (which the signal handler also needs to push into stack) I was able to make an educated guess as to how the registers are pushed. I added code to my signal handler that looks for the GP value, and gets the register values from the certain offsets in the stack relative to the GP value. These register values are then saved into the memory variables that my disassembly routine uses. After having my signal handler print the disassembly, I managed to find out where Alien Legacy hangs:

0180:001B03DA D9F8            fprem
0180:001B03DC 9B              fwait
0180:001B03DD DFE0            fstsw ax
0180:001B03DF 66A90004        test ax,0400
0180:001B03E3 75F5            jne  001B03DA ($-b)

Alien Legacy uses the floating point fprem (partial remainder) opcode, and then checks the floating point status flags to see if the C2 flag (value 0x400) is clear (meaning the remainder operation was finished), and runs the remainder again if not. I had not coded proper FPU support into zerox86 yet, so this test always failed, causing a never-ending loop.
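
A minimal C sketch of an FPREM emulation that lets such a loop terminate could look like the following. It is a simplification under stated assumptions, not the zerox86 code: the FpuState structure is hypothetical, the reduction is always done in one step, the quotient is assumed to fit into a 64-bit integer, and the quotient bits that the real opcode reports in C0, C1 and C3 are ignored.

```c
#include <stdint.h>

#define FPU_SW_C2 0x0400u  /* "reduction incomplete" flag in the status word */

/* Hypothetical FPU state, reduced to what the sketch needs. */
typedef struct {
    double st0, st1;
    uint16_t status;
} FpuState;

/* Emulate FPREM in a single step: ST0 = ST0 - trunc(ST0 / ST1) * ST1. */
void fpu_fprem(FpuState *fpu)
{
    int64_t quotient = (int64_t)(fpu->st0 / fpu->st1);  /* truncates toward zero */
    fpu->st0 -= (double)quotient * fpu->st1;
    fpu->status &= (uint16_t)~FPU_SW_C2;  /* clear C2 so that the guest loop exits */
}
```

The important part for the hang above is the last line: as long as C2 reads as clear after the operation, the "test ax,0400 / jne" pair falls through.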

So, I decided to finally start implementing the FPU operations, as it has been on my TODO list for a while. Since the MIPS processor in the GCW-Zero has proper hardware floating point support, I needed to look into how the floating point parameters are handled in function calls and so on. After some studying I realized that the floating point parameter passing convention is pretty easy to understand and suits my existing FPU emulation framework quite easily, so it only took a couple of hours to implement the missing floating point operations. After that Alien Legacy progressed to the main game screen quite fine. It seems to need mouse, though, so I think I need to implement mouse emulation next.

Thanks for your interest again, I will continue working on zerox86, adding the missing features and testing various games. There are still a lot of misbehaving games that I need to debug, but I hope at least some more games (including the ones mentioned above) will run better in the next version.

Aug 18th, 2013 - zerox86 version 0.01 alpha released!

Okay, here it is, the first public alpha version of zerox86! Note that a lot of features are still missing, and this can still only run a few games, but feel free to test it and report the misbehaving games to me. I can then improve the compatibility in the upcoming versions.

Since the last blog post I changed the default key mappings somewhat. It was requested that I follow the default SDL key mapping of GCW Zero as closely as possible, so now the default key mappings use those defaults. However, I think it is important to have easy file and directory selection when on 4DOS prompt, so I added a different default configuration for 4DOS. Here the A key is used for bringing up the file selection window, and B key is mapped to ESC key (to back out from the selection window, for example). If you feel these mappings are not good, you can add a 4DOS section to the zerox86.ini file and change the mappings to be whatever you like. The only hardcoded keys are LOCK for exiting the emulator and SELECT for toggling the virtual keyboard on/off.

The image above shows the current configuration display when running 4DOS (the image is zoomed 2x to make it bigger on computer displays). The bottom left area shows the currently running executable (which is used as the section name in the zerox86.ini file when loading game-specific settings), the config that is currently used, and the current graphics mode. On the right side is a key legend, so you don't need to remember all the key assignments. This shows the A and B buttons (red and blue colored) having 4DOS-specific assignments. The image below shows the default key mapping. In this example I was running Doom, so the graphics mode is 320x200 Mode-X.

If the game requires some unsupported low-level feature (like virtual memory support) or encounters some internal error that causes it to misbehave, you may get a message like below on the screen. Please send me the crash logs that get written in this situation, as those will help me in tracking and eventually fixing the problem in zerox86. This display stays on the screen for 5 seconds, after which you get dropped back to the GCW Zero main menu.

Have fun with this alpha version of zerox86! I will continue working on it and improving the game compatibility in the future versions. Thank you for your interest in my x86 emulators!

Aug 4th, 2013 - zerox86 progress

For the past weeks I have been on my summer vacation, and I decided to spend my vacation mostly not programming at all. Thus I have not made much progress with zerox86 either. However, I could not resist spending a few hours every few days working on either rpix86 or zerox86, whenever I got momentarily fed up with just being lazy. :-) For zerox86 I have worked on four specific issues: Fixing the annoying race condition, implementing Sound Blaster Direct DAC support, implementing a virtual keyboard, and finally I also began implementing the configuration display.

Fixing the race condition problem

I began working on the race condition problem by coding a logging system that logged all IRQ-related events into a ring buffer. I could not log them directly to the file, as that changed the timing of the events so that the race condition did not occur, at least not as often. When the program then stopped because of the race condition, I wrote the ring buffer contents to a log file.
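
A ring buffer logger of this kind can be sketched in C as below. The entry layout and the names are hypothetical, and the sketch ignores the question of several threads writing at the same time, which a real logger for this purpose would have to handle.

```c
#include <stddef.h>
#include <stdint.h>

#define LOG_SIZE 1024u  /* a power of two, so the index can wrap with a mask */

typedef struct {
    uint8_t  event;  /* hypothetical event code, for example "IRQ queued" */
    uint8_t  irq;    /* emulated IRQ line the event belongs to */
    uint32_t time;   /* any cheap timestamp or sequence number */
} LogEntry;

static LogEntry log_ring[LOG_SIZE];
static uint32_t log_count;  /* total number of entries ever written */

/* Record one event, overwriting the oldest entry once the buffer is full. */
void log_event(uint8_t event, uint8_t irq, uint32_t time)
{
    LogEntry *e = &log_ring[log_count & (LOG_SIZE - 1u)];
    e->event = event;
    e->irq = irq;
    e->time = time;
    log_count++;
}

/* Copy the stored entries to out (room for LOG_SIZE entries) in oldest-first
   order, for writing to a file after the problem has occurred. */
size_t log_snapshot(LogEntry *out)
{
    size_t n = log_count < LOG_SIZE ? log_count : LOG_SIZE;
    uint32_t first = log_count - (uint32_t)n;
    for (size_t i = 0; i < n; i++)
        out[i] = log_ring[(first + i) & (LOG_SIZE - 1u)];
    return n;
}
```

Writing an entry is just a few memory stores, so it disturbs the timing far less than a file write would, and the expensive file output only happens once, after the interesting events are already captured.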

After a couple of attempts at studying the log, I finally understood the race condition. Sometimes the timer signal interrupted the audio thread (which in turn has interrupted the main thread), and because of a complex interaction between the main thread turning interrupts on or off and marking an interrupt handled, and the other threads queuing interrupts, it was possible for the system to get out of sync. Rather than attempting to fix the already too complex thread locking system, I decided to try rewriting it all using Linux real-time asynchronous signals.

The timer IRQ emulation already used the asynchronous SIGALRM signal, and I coded additional signal handlers for the other emulated IRQ lines, and also added new routines that any thread could use to send a signal to the main thread:

static void TimerIRQ()      { IRQRequest(0); }
static void KeyboardIRQ()   { IRQRequest(1); }
static void MouseIRQ()      { IRQRequest(3); }
static void SBIRQ()         { IRQRequest(7); }
static void PS2IRQ()        { IRQRequest(12); }

void SendKeyboardIRQ()      { pthread_kill(maintid, SIGRTMIN+1); }
void SendSBIRQ()            { pthread_kill(maintid, SIGRTMIN+7); }
void SendMouseIRQ()         { pthread_kill(maintid, SIGRTMIN+3); }
void SendPS2IRQ()           { pthread_kill(maintid, SIGRTMIN+2); }
void SendSIGTERM()          { pthread_kill(maintid, SIGTERM); }
The pthread_kill functions send a signal to the thread ID given as the first parameter, so calling the Send... routines from whatever thread always sends the signal to the main thread (which is where the actual emulation happens). IRQRequest is the common routine that handles the various IRQs in the main thread, and the static methods above are the actual signal handlers. I also added the program exit to the same system, so I can send the SIGTERM signal from any thread and handle it cleanly in the main thread.

Now I was able to code the actual IRQRequest routine so that I did not need to use any thread synchronization mechanisms, as it will always get called within the main thread context. The only thing I needed to protect against was a new signal attempting to interrupt the handling of the previous signal. I decided to block all the used real-time signals in the beginning of the IRQRequest routine, and then later unblock them at the end, so that took care of that problem. In other routines in the main thread I simply used the ll and sc MIPS ASM opcodes to protect against interrupted execution. Since only the IRQRequest can interrupt the main thread, and not the other way around, this was sufficient locking for the variables shared between the IRQRequest code and the main thread. These changes seem to have fixed the race condition, as I have not encountered the problem any more.
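
Blocking and unblocking a set of signals around a critical routine can be sketched with the standard pthread_sigmask() call as below. The helper names are hypothetical, and the exact set of signals (SIGALRM plus SIGRTMIN+1 to SIGRTMIN+7) is an assumption derived from the routines listed above.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <signal.h>
#include <pthread.h>

/* Build the set of signals used for the IRQ emulation (assumed range). */
void irq_signal_set(sigset_t *set)
{
    sigemptyset(set);
    sigaddset(set, SIGALRM);
    for (int i = 1; i <= 7; i++)
        sigaddset(set, SIGRTMIN + i);
}

/* Block the IRQ signals for the calling thread, saving the previous mask. */
int irq_signals_block(sigset_t *saved)
{
    sigset_t set;
    irq_signal_set(&set);
    return pthread_sigmask(SIG_BLOCK, &set, saved);
}

/* Restore the mask that irq_signals_block() saved. */
int irq_signals_restore(const sigset_t *saved)
{
    return pthread_sigmask(SIG_SETMASK, saved, NULL);
}
```

A signal that arrives while it is blocked is not lost; it stays pending and gets delivered when the mask is restored, so the next IRQ is handled right after the current one finishes.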

Sound Blaster Direct DAC support

Next, it occurred to me that using interval timers for the TimerIRQ emulation allowed me to support very high timer speeds (like 8400Hz as used in Star Control 2) as long as the GCW0 kernel supports high-resolution timers. This makes it possible to have the timer interrupt routine play a single audio sample, which is what the Sound Blaster "Direct DAC" method does. So, I decided to enhance my Sound Blaster emulation so that it can play samples buffered by the timer interrupt routine. Because the actual audio playing rate I use in zerox86 is 22kHz, I still need to convert the samples from 8.4kHz (or whatever the game uses) up to 22kHz, and thus I need to buffer them before playing them. This buffering may cause some skips to the audio, but the audio still seems to sound mostly correct.
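
The rate conversion of the buffered samples can be illustrated with a simple nearest-sample resampler in C. This is only a sketch of the technique, not necessarily what zerox86 does; the function name is hypothetical, and the example rates of 8400Hz and 22050Hz are assumptions standing in for "8.4kHz" and "22kHz".

```c
#include <stddef.h>
#include <stdint.h>

/* Convert 8-bit unsigned samples from in_rate to out_rate by repeating the
   nearest earlier input sample. Returns the number of samples written. */
size_t resample_nearest(const uint8_t *in, size_t in_count, uint32_t in_rate,
                        uint8_t *out, size_t out_max, uint32_t out_rate)
{
    size_t produced = 0;
    uint64_t total = (uint64_t)in_count * out_rate / in_rate;  /* output length */
    while (produced < total && produced < out_max) {
        /* Position of this output sample in the input stream, rounded down. */
        size_t src = (size_t)((uint64_t)produced * in_rate / out_rate);
        out[produced++] = in[src];
    }
    return produced;
}
```

Four input samples at 8400Hz become ten output samples at 22050Hz, each input sample being repeated two or three times.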

Implemented preliminary virtual keyboard

I also implemented some code to handle the keyboard image you can see in the screen copies of the previous blog post. I decided to use the SELECT key to toggle between activating the virtual keyboard (which then uses the D-Pad and A keys) and disabling it (which leaves D-Pad to emulate cursor keys and A to emulate Enter). I also tested using the analog controller for the key selection, but noticed that it was too awkward to quickly select the correct key with that. Separate D-Pad key presses to move to adjacent virtual keyboard keys seem to work better.

Implemented preliminary config display

Finally, I also worked on displaying the current configuration (including a key mapping legend). This is something that I have wanted to do for a long time now, and I wanted to get it done before starting the actual configuration system, so that I can see that the configuration system works properly. Here below is a screen copy of the preliminary config and key legend display. It uses the lower 40 scanlines of the screen to display the configuration in use, and the currently mapped keys. Using the SELECT key you can toggle the screen bottom area to show either this display or the virtual keyboard. I will perhaps also add a third option that will show nothing, in case this display is distracting in some situations. I think I will need to still make this config screen somewhat cleaner, but in principle the visual layout is now in place. The values it shows are currently just placeholders for the actual values. I used those to determine the space I need to reserve for the key code names. The next step is to have it show the proper values currently in use.

My summer vacation ends today, so I will get back to the normal daily routine, and hopefully get some proper progress done to zerox86 as well.

June 16th, 2013 - zerox86 progress

During the past couple of weeks I have made quite a few enhancements to zerox86, but it still is not quite ready for release. There are some essential features still missing, and there is a very annoying bug or race condition that occasionally crashes the whole program. I have had a similar problem in DS2x86, and I am pretty certain that the problem is caused by lack of proper thread synchronization in the IRQ emulation. I have divided my time between trying to hunt down this bug, and adding some new features.

Jazz Jackrabbit support

For some reason I had thought that Jazz Jackrabbit uses virtual memory and thus will not run in zerox86, but a beta tester reported that he got it running by disabling all audio. There were some bad graphics glitches, though. So, I decided to test it myself and fix at least the graphics glitches. One graphics problem was quite obvious: I had not yet implemented the file data reading directly to VGA VRAM, which Jazz Jackrabbit does. It is the only game I have encountered that does that while in Mode-X graphics mode. After that the sprites began to work correctly, but still the bottom part of the screen flickered, and I found a bug in my Mode-X VGA Line Compare Register handling. Fixing that made the game run the demo game with no graphics glitches that I could see.

After those fixes I tried to get the audio support also working, but even after spending several hours with that I still could not find out the reason for the immediate crash. It does not actually crash zerox86, instead it causes a General Protection Fault when loading an invalid value to a segment register. I was able to determine that in DS2x86 it does not load this invalid value to the register, but I could not yet find out why it runs differently in zerox86 and what is the original cause for this difference. So, Jazz Jackrabbit does not yet run with audio enabled.

EMS memory and SuperVGA support

After I got frustrated with lack of progress on Jazz Jackrabbit support, I decided to work on some easier features, first by adding EMS support. This was pretty simple, as my EMS support is coded in C so I could port it from rpix86 with very few changes. Since 4DOS.COM by default swaps to EMS if it is available, it was easy to see that EMS memory support began to work. I also tested Wing Commander 2 digitized audio, which also needs EMS memory to work. I found a small bug in my SB digitized audio support: it played one more sample than it should have, which sometimes caused a click in the sound, so I fixed that problem as well.

After EMS memory I coded support for SuperVGA 640x400 and 640x480 screen modes. I decided to support these modes scaled to 320x200 and 320x240 resolution, which meant averaging every four input pixels to one output pixel. As this needs to be done with CPU (at least for now), it will slow down the emulation somewhat. I tested my LineWars II and Little Big Adventure, both of which use 640x480 SVGA mode. Below is a screen copy from Little Big Adventure running in zerox86.
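
Averaging four input pixels into one output pixel can be sketched in C as below. The sketch assumes that the pixels are already in 16-bit RGB565 format (the SVGA modes themselves are palette based, so the averaging has to happen after the palette lookup); the pixel format and the function names are assumptions.

```c
#include <stdint.h>

/* Average four RGB565 pixels channel by channel. */
uint16_t average4_rgb565(uint16_t a, uint16_t b, uint16_t c, uint16_t d)
{
    uint32_t r = ((a >> 11) & 31u) + ((b >> 11) & 31u) + ((c >> 11) & 31u) + ((d >> 11) & 31u);
    uint32_t g = ((a >> 5) & 63u) + ((b >> 5) & 63u) + ((c >> 5) & 63u) + ((d >> 5) & 63u);
    uint32_t bl = (a & 31u) + (b & 31u) + (c & 31u) + (d & 31u);
    return (uint16_t)(((r >> 2) << 11) | ((g >> 2) << 5) | (bl >> 2));
}

/* Scale a w x h image (both even) down to w/2 x h/2 using 2x2 averaging. */
void downscale_2x2(const uint16_t *src, int w, int h, uint16_t *dst)
{
    for (int y = 0; y + 1 < h; y += 2) {
        const uint16_t *top = src + y * w;
        const uint16_t *bottom = top + w;
        for (int x = 0; x + 1 < w; x += 2)
            *dst++ = average4_rgb565(top[x], top[x + 1], bottom[x], bottom[x + 1]);
    }
}
```

For a 640x480 frame this loop touches 307200 input pixels on every screen update, which is why doing the scaling with the CPU costs some emulation speed.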

80x25 scaled text mode

Next I changed the 80x25 text mode to use 4x8 pixel font, so that I could fit all 80 characters to the 320 pixel wide GCW0 screen. The font is not nearly as readable as my default 6x8 font, but using that gets rid of the need to scroll the screen. I plan to have an option to switch between scaled and scrollable screens, same as I did in DSx86, but for now I will use scaled mode in all text/graphics modes.

I also found and fixed a problem I had in the text mode cursor routines where extra cursor images were sometimes left around the screen, for example in Norton Sysinfo. This was caused by the software setting the cursor start scanline to a large value, like 63, with the ending scanline being the default 7. This should skip the cursor drawing completely, but my routine always drew at least one scanline of the cursor, which was wrong.

PC Speaker support

I also added support for PC Speaker sounds. The game I use to test these is the old CGA game Paratrooper. It plays some musical tones using the PC Speaker, so it is a good test bench for that, as well as CGA graphics. It was one of the first games I got working in DSx86, so I like to test that in all my emulators during their early development phase.

Virtual keyboard

The next step is to add a virtual keyboard to zerox86. I am thinking of having the virtual keyboard be visible always when in 80x25 (or 40x25) text mode, as these will only need 320x200 pixels, while the zerox86 screen is 320x240 pixels. So I have an area of 320x40 pixels to use for virtual keyboard. Since the standard PC keyboard has 6 rows of keys, I have 6 scanlines to use for each row, and still have 4 scanlines left over. I decided to use the 4x6 pixel font for the key labels, which allows me to fit the full 101-key PC keyboard image to the 320x40 area.

You can see the keyboard image in several of the screen copies above. I haven't yet decided how I will handle the key selection. It might perhaps be made to work using the analog controller for key selection and then either A or Start to actually press that key. Another option is simply using the D-Pad to move the focus from one key to the next, but since especially on DOS prompt the cursor keys are used while joystick is not, the analog joystick control would be a better choice. I will continue working on this feature, as well as trying to hunt down the annoying crashing bug during the upcoming weeks.

June 2nd, 2013 - EGA and audio work

Last week I worked on the high-resolution EGA graphics modes, first the 640x200 (which only needs horizontal scaling) and 640x480 (which needs 2-to-1 scaling in both directions). On Friday I then implemented also the 640x350 EGA mode, after I figured out that the easiest way to fit this vertically into 320x240 would probably be to alternately copy one scanline directly and then average the next two scanlines, so that every 3 input scanlines generate 2 output scanlines. This just leaves a few black lines on the bottom.
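
The 3-to-2 vertical scaling can be sketched in C as follows. To keep the averaging trivial, the sketch uses single 8-bit intensity values as pixels, which is an assumption; the function name is hypothetical as well.

```c
#include <stddef.h>
#include <stdint.h>

/* Scale in_lines scanlines of width pixels down by 3:2 vertically.
   Returns the number of output scanlines written. */
size_t scale_3_to_2(const uint8_t *in, size_t in_lines, size_t width, uint8_t *out)
{
    size_t out_lines = 0;
    for (size_t y = 0; y < in_lines; y += 3) {
        /* The first line of each group of three is copied as it is. */
        for (size_t x = 0; x < width; x++)
            out[out_lines * width + x] = in[y * width + x];
        out_lines++;
        if (y + 1 >= in_lines)
            break;
        /* The next two lines are averaged (or copied, if only one is left). */
        size_t a = y + 1;
        size_t b = (y + 2 < in_lines) ? y + 2 : a;
        for (size_t x = 0; x < width; x++)
            out[out_lines * width + x] =
                (uint8_t)((in[a * width + x] + in[b * width + x]) / 2);
        out_lines++;
    }
    return out_lines;
}
```

With 350 input scanlines this produces 234 output scanlines, which leaves 6 unused lines on a 240 line screen.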

The next step was to add audio support, which I then started working on Saturday morning. I began with the AdLib emulation, as that is easy and I could copy the code pretty much directly from an old version of DS2x86 (before I switched to my own transfer code and began using the ARM7 processor for audio). First I needed to look into how audio is supported on GCW0, and it looked like using SDL would be quite easy. I found a very simple tutorial and used that as an example, and it didn't take very long to have the audio framework in place. Next I just added a call to my AdLibEmulation routine from the audio callback code, and already at 9am I had AdLib audio working in Doom (which I used as my test bench). So, I decided to continue immediately with Sound Blaster support.

The AdLib support can manage with reasonably long latencies (longer latency just sounds like some inaccurate note timing, not as any distortion or other more distracting artifacts), but SoundBlaster digital audio support is rather timing-critical. For example Doom needs an IRQ after every 128 samples have been played (at 11.1kHz sample rate), and it will immediately cause audio problems if it does not get an IRQ when it expects one. I play audio at 22kHz, so that 128 samples translates pretty closely to 256 samples. I decreased the SDL sample buffer size from 512 to 256, and luckily this seemed to work fine without buffer underruns, and so Doom began to play also digital audio. I still need to do some adjustments to the audio routines, and all ADPCM audio playing code is still commented out, but I was pretty happy to get the audio code already mostly working, after basically just one day of work.

Next I will continue enhancing the emulation and adding the still missing features, hopefully I can release the first version publicly within a couple of weeks. The most important still missing features (in no specific order) are the following:

May 28th, 2013 - High Resolution Timers

During the past week I continued working on zerox86, mainly by adding the various graphics modes. A week ago I only had text mode support, and now I have CGA, MCGA, some VGA Mode-X and the 320x200 EGA modes supported. However, when testing Commander Keen 4 I noticed that its intro scroll runs strangely slowly. I compared the speed to that of DS2x86, and yes indeed, it ran at about half speed on zerox86! I began to suspect my timer routines, and added some debug to them. I learned that the game sets the PC timer to run at 560Hz speed. Since I use a separate thread to handle timing, I would need that thread to get woken up at 560Hz speed, but it turns out that the kernel in GCW0 uses only 250Hz timing for the process scheduler. That means that no thread will get woken up faster than at 250Hz, and that caused the half-speed problem in Commander Keen 4.

Since Linux can have timers that run faster than the scheduler interval (the so called "high resolution timers" feature), I checked whether the current kernel has support for that (by giving command cat /proc/timer_list on the shell prompt). It reported that the clock resolution was 4000000 nanoseconds (= 4 ms = 250Hz), which meant that the kernel did not support high-resolution timers. With some help from the GCW0 kernel programmers, I downloaded the kernel sources, compiled them with the CONFIG_HIGH_RES_TIMERS option, and began experimenting with them.
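
The same resolution can also be queried from a program with the POSIX clock_getres() call. The C sketch below assumes that the value reported for CLOCK_MONOTONIC follows the same high resolution timer status as /proc/timer_list, which is the usual behaviour on Linux but is not verified here for the GCW0 kernel. The function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Reported resolution of CLOCK_MONOTONIC in nanoseconds, or -1 on error. */
long long clock_resolution_ns(void)
{
    struct timespec res;
    if (clock_getres(CLOCK_MONOTONIC, &res) != 0)
        return -1;
    return (long long)res.tv_sec * 1000000000LL + res.tv_nsec;
}

/* Timer rate in Hz that corresponds to a resolution given in nanoseconds. */
long long resolution_to_hz(long long ns)
{
    return ns > 0 ? 1000000000LL / ns : 0;
}
```

A resolution of 4000000 nanoseconds converts to 250Hz, the scheduler tick rate mentioned above.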

Simply adding that config option did not help with the problem, the resolution was still only 250Hz. So, it looked like some feature of the hardware timer prevented the kernel from actually switching to the high resolution mode. I added some debug messages to the timer initialization routines into my kernel, rebooted, and then looked at the dmesg output to see where the problem might be. After several attempts I managed to get the debug output into the correct routine, so that I could see that the problem was caused by the hardware clock not having CLOCK_SOURCE_IS_CONTINUOUS flag set. I added that flag, but it still failed to go into high resolution mode, this time because the clock event did not have CLOCK_EVT_FEAT_ONESHOT flag set. Adding this flag meant that I also needed to add the actual support for one-shot behaviour. This finally did the trick, the timer_list reported that the clock has 1 nanosecond resolution! Well, it does not actually run that fast, it is just a way for the kernel to keep track of high-resolution timer support. Actually the timer in GCW0 runs at 750 kHz, which still should be fast enough.

Sure enough, with the high resolution timers active, Commander Keen 4 began to run at normal speed in zerox86! After these experiments I realized that it would actually be better to use the interval timers instead of a separate thread for the timer IRQ emulation, so I changed my code to use them. I also added my kernel version to be downloadable here, same as the changed linux/arch/mips/jz4770/time.c source module, and reported this to the actual GCW0 kernel developers, hoping that they would look into implementing this properly into the mainline kernel. I do not consider myself a Linux kernel developer, so making these changes was quite scary, and I am pretty proud that I actually managed to make it work without seemingly destroying anything. :-)

May 21st, 2013 - Norton SYSINFO runs!

After I got 4DOS.COM to start up in zerox86, the next step was to get Norton SYSINFO to run, so that I could see whether I am on track with the emulation speed. Even though that CPU speed test is not all that accurate, it does give me an idea about the relative speeds of all my emulator versions. Since all of them use the same architecture, their differences can easily be tested using that benchmark.

On top of the things that 4DOS needs, SYSINFO basically needs just working timer IRQ handling, and obviously a way of starting it up. For the timer IRQ, I copied the timer code from rpix86, which uses a separate thread that calls clock_nanosleep() to sleep for the interval that the current PC timer runs at. The default timer speed (which Norton SYSINFO also uses) is 18.2Hz, or a 55ms interval.
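
The general shape of such a timer loop could look like the sketch below. It is not the rpix86 code, just an illustration under a few assumptions: the loop sleeps until absolute deadlines (so that small wakeup delays do not accumulate), and the names PIT_CLOCK_HZ, pit_period_ns, timespec_add_ns and run_timer_ticks are made up here. In a real emulator the loop would run in its own thread and the callback would flag a pending IRQ0 for the CPU core.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for clock_nanosleep() in strict C modes */
#endif
#include <errno.h>
#include <stdint.h>
#include <time.h>

#define PIT_CLOCK_HZ 1193182ULL   /* input clock of the PC timer chip */

/* Timer period in nanoseconds for a PC timer divisor (0 means 65536). */
uint64_t pit_period_ns(uint32_t divisor)
{
    uint64_t d = (divisor == 0) ? 65536u : divisor;
    return d * 1000000000ULL / PIT_CLOCK_HZ;
}

/* Advance a timespec by 'ns' nanoseconds, keeping tv_nsec below one second. */
void timespec_add_ns(struct timespec *t, uint64_t ns)
{
    ns += (uint64_t)t->tv_nsec;
    t->tv_sec += (time_t)(ns / 1000000000ULL);
    t->tv_nsec = (long)(ns % 1000000000ULL);
}

/* Sleep until each successive deadline and call raise_irq0() 'ticks' times. */
void run_timer_ticks(uint32_t divisor, int ticks, void (*raise_irq0)(void))
{
    struct timespec next;
    int i;

    clock_gettime(CLOCK_MONOTONIC, &next);
    for (i = 0; i < ticks; i++) {
        timespec_add_ns(&next, pit_period_ns(divisor));
        /* Retry if a signal interrupts the sleep before the deadline. */
        while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL) == EINTR)
            ;
        raise_irq0();
    }
}
```

With the default divisor the period works out to a little under 55 ms, which matches the 18.2Hz rate.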

Being able to launch SYSINFO meant that I had to have some sort of keyboard handling, so I added a call to SDL_PollEvent() into my rendering thread, and mapped a couple of GCW0 buttons to PC keys, so that I could start it from the command prompt. I used the following key mapping:

	START	= Enter
	SELECT	= F7 (launch a directory/executable selection window)
	A	= Enter
	B	= ESC
	D-Pad	= Cursor Keys
	LOCK	= Exit zerox86
	L	= Scroll screen left
	R	= Scroll screen right

With such a minimal key configuration I was able to run SYSINFO, and go to the CPU benchmark page. The result was as follows (I photoshopped two screen copies side by side, as the GCW0 320x240 screen can only show 52 chars when using my 6x8 default text mode font):

So, the speed looked to be pretty good, at about an 80486/40MHz level. The MHz value that SYSINFO shows is based on the speed of a division opcode, which obviously runs faster on a 1GHz processor than on the original 486 processor, so that value can be ignored.

May 12th, 2013 - All unit tests work!

Okay, I just got all the unit tests to pass without errors. I had to implement many rare string operations that I had skipped in the original DS2x86, because no game has used them. However, the new unit test program framework also tests these rare variations, so I needed to implement them to make the test program continue further. And even if no game that I have come across has used them, it doesn't mean that there would be no game that does.

Anyways, now it is time to start working on the actual zerox86. I need to first port all the DOS emulation routines, and then the text mode graphics blitting code, as my first goal is to get 4DOS to start up within zerox86. I am porting the DOS routines from the rpix86 version, which has some slightly different interface to the ASM code compared to the DS2x86 version, so I will need to make some changes as I go.

May 5th, 2013 - GCW Zero port started

Last week I got my new GCW Zero "engineering sample" (or a so called "frankenzero") game console. I began porting my DSTwo version DS2x86 to that device, and that has taken pretty much all of my free time for the past week. I plan to continue working on this during the following weeks as well, so my work on rpix86 is on a small break until I get something running on the GCW0 as well.

Working on GCW Zero is interesting because it uses a processor that has a MIPS32r2 architecture. This Release 2 version of the architecture brings a couple of very powerful and useful new opcodes to the MIPS ASM language, namely EXT and INS. I can use these new opcodes in quite a few places in my code to speed up the emulation, so I am very interested to see how fast zerox86 (current working name of the project) will run on the device.
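
For those not familiar with these opcodes, here is what they do, written out as plain C. This is only an illustration of the semantics, and the function names mips_ext and mips_ins are made up for it. EXT extracts a bit field of a given size and position from a register, and INS inserts the low bits of one register into a bit field of another, each in a single instruction.

```c
#include <stdint.h>

/* Mask with the lowest 'size' bits set (size is 1..32). */
static uint32_t low_mask(unsigned size)
{
    return (size >= 32) ? 0xFFFFFFFFu : ((1u << size) - 1u);
}

/* EXT rt, rs, pos, size: take 'size' bits of rs starting at bit 'pos',
 * zero-extended. */
uint32_t mips_ext(uint32_t rs, unsigned pos, unsigned size)
{
    return (rs >> pos) & low_mask(size);
}

/* INS rt, rs, pos, size: replace 'size' bits of rt starting at bit 'pos'
 * with the low 'size' bits of rs, leaving the other bits of rt untouched. */
uint32_t mips_ins(uint32_t rt, uint32_t rs, unsigned pos, unsigned size)
{
    uint32_t mask = low_mask(size) << pos;
    return (rt & ~mask) | ((rs << pos) & mask);
}
```

As an example of why this suits x86 emulation: if EAX is kept in one 32-bit register, then AH is bits 8..15 of it, so mips_ext(eax, 8, 8) reads AH and mips_ins(eax, value, 8, 8) writes it. Without these opcodes each of those takes several shift and mask instructions.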

I began my porting project by first porting my unit test program. I then began changing the code to use the new EXT and INS opcodes where possible. After a few changes I run the unit test program to make sure I don't break anything, and then continue making the changes. Since I use the unit test program from the special 386-enabled DSx386 version, I have also needed to fix some incompatibilities between the unit test program and my old DS2x86 sources. The current status is that opcodes 0x00..0x0E work fine, and I am working on the 386-specific 0x0F group opcodes, mainly fixing those incompatibilities.