$Id: TODO,v 1.536 2007/06/15 22:30:17 debug Exp $

Some things, in no specific order, that I'd like to fix:
(Some items in this list are perhaps already fixed.)

M88K:
    o)  Neither NIP nor FIP valid in rte?
    o)  FIP != NIP + 4, in rte! (Simulate delayed branch stuff.)
    o)  cpu_dyntrans.c: MEMORY_USER_ACCESS implementation for M88K!
    o)  xmem: Set transaction registers!
    o)  CMMUs:
        o)  Translation invalidations, could be optimized.
        o)  Move initialization from dev_mvme187 to somewhere
            more reasonable?
    o)  Instruction trace by using bits of ??IP control regs.
    o)  Interrupts (these are machine dependent, though).
    o)  Implement devices etc. for one or more machine modes, to get
        some guest OS running. OpenBSD/mvme88k on MVME187 seems to be
        the smartest path to follow for now.
        o)  VME bus device
        o)  PCC2
        o)  Cirrus Logic serial port controller
    o)  Instruction disassembly, and implementation:
        o)  See http://www.panggih.staff.ugm.ac.id/download/GCC/info/gcc.i5
            for some strange cases of when "div" can fail (?)
        o)  Floating point stuff
        o)  "Graphics" instructions (M88110-specific)

MIPS:
    o)  Nicer MIPS status bits in register dumps.
    o)  Floating point exception correctness.
    o)  Fix this? Triggered by NetBSD/sgimips? Hm:
            to_be_translated(): TODO: unimplemented instruction:
            000000000065102c: 00200800 (d) rot_00 at,zr,0
    o)  Some more work on opcodes.
        x)  MIPS64 revision 2.
            o)  Find out which actual CPUs implement the rev2 ISA!
            o)  DINS, DINSM, DINSU etc
            o)  DROTR32 and similar MIPS64 rev 2 instructions,
                which have a rotation bit which differs from
                previous ISAs.
        x)  _MAYBE_ TX79 and R5900 actually differ in their
            opcodes? Check this carefully!
    o)  Dyntrans: Count register updates are probably not 100% correct yet.
    o)  Refactor code for performance and readability/maintainability.
    o)  (Re)implement 128-bit loads/stores for R5900.
    o)  Coprocessor 1x (i.e. 3) should cause cp1 exceptions, not 3?
        (See http://lists.gnu.org/archive/html/qemu-devel/2007-05/msg00005.html)
    o)  R4000 and others:
        x)  watchhi/watchlo exceptions, and other exception
            handling details
    o)  MIPS 5K* have 42 physical address bits, not 40/44?
    o)  R10000 and others:  (R12000, R14000 ?)
        x)  The code before the line
                /* reg[COP0_PAGEMASK] = cpu->cd.mips.coproc[0]->tlbs[0].mask & PAGEMASK_MASK; */
            in cpu_mips.c is not correct for R10000 according to
            Lemote's Godson patches for GXemul. TODO: Go through all
            register definitions according to
            http://techpubs.sgi.com/library/tpl/cgi-bin/getdoc.cgi/hdwr/bks/SGI_Developer/books/R10K_UM/sgi_html/t5.Ver.2.0.book_263.html#HEADING334
            and make sure everything works with R10000.
            Then test with OpenBSD/sgi?
        x)  Entry LO mask (as above).
        x)  memory space, exceptions, ...
        x)  use cop0 framemask for tlb lookups
            (http://techpubs.sgi.com/library/tpl/cgi-bin/getdoc.cgi/hdwr/bks/SGI_Developer/books/R10K_UM/sgi_html/t5.Ver.2.0.book_284.html)

SuperH:
    x)  Auto-generation of loads/stores! This should get rid of at least
        the endianness check in each load/store.
    x)  Experiment with whether or not correct ITLB emulation is
        actually needed. (20070522: I'm turning it off today.)
    x)  SH4 interrupt controller:
        x)  MASKING should be possible!
    x)  SH4 DMA (0xffa00000)
    x)  SH4 UBC (0xff200000)
    x)  Store queues can copy 32 bytes at a time, there's no need to
        copy individual 32-bit words. (Performance improvement.)
    x)  SH4 BSC (Bus State Controller)
    x)  Instruction tracing should include symbols for branch targets,
        and so on, to make the output more human readable.
    x)  SH3-specific devices: Pretty much everything!
    x)  NetBSD/evbsh3, hpcsh! Linux?
    x)  Floating point speed!
    x)  Floating point exception correctness.
    x)  NetBSD HEAD (as of April 2007) hangs during bootup, because it
        turns on/off interrupts in an unfortunately synchronized way
        with dyntrans. This needs to be fixed.
    x)  Exceptions for unaligned load/stores. OpenBSD/landisk uses
        this mechanism for its reboot code (machine_reset).
    x)  Think carefully about how to implement SH5/SH64 (for evbsh5).

Landisk SH4:
    x)  When NetBSD/landisk 4.0 has been released, make sure it works
        in the emulator. (Update documentation, etc.)

Dreamcast:
    x)  G2 DMA
    x)  LAN adapter (dev_mb8696x.c). NetBSD root-on-nfs.
    x)  PVR: Lots of stuff. See dev_pvr.c.
    x)  Better GDROM support
    x)  Modem
    x)  PCI bridge/bus?
    x)  Maple bus:
        x)  Correct controller input
        x)  Mouse input
    x)  Software emulation of BIOS calls:
        x)  GD-ROM emulation: Use the GDROM device.
        x)  Use the VGA font as a fake ROM font. (Better than
            nothing.)
    x)  Make as many as possible of the KOS examples run!
    x)  More homebrew demos/games.
    x)  SPU: Sound emulation (ARM cpu).
    x)  VMU processor emulation? "(Sanyo LC8670 "Potato")" according to
        Wikipedia, LC86K87 according to Comstedt's page. See
        http://www.maushammer.com/vmu.html for a good description of
        the differences between LC86104C and the one used in the VMU.

Alpha:
    x)  OSF1 PALcode, Virtual memory support.
    x)  PALcode replacement! PAL1E etc opcodes...?
    x)  Interrupt/exception/trap handling.
    x)  Floating point exception correctness.
    x)  More work on bootup memory and register contents.
    x)  More Alpha machine types, so it could work with OpenBSD,
        FreeBSD, and Linux too?

SPARC (both the ISA and the machines):
    o)  Implement Address space identifiers; load/stores etc.
    o)  Exception/trap/interrupt handling.
    o)  Save/restore register windows etc! Both v9 and pre-v9!
    o)  Finish the subcc and addcc flag computation code.
    o)  Add more registers (floating point, control regs etc)
    o)  Disassembly of some more instructions?
    o)  Are sll etc 32-bit sign-extending or zero-extending?
    o)  Floating point exception correctness.
    o)  SPARC v8, v7 etc?
    o)  More machine modes and devices.

POWER/PowerPC:
    x)  Fix DECR timer speed, so it matches the host.
    x)  NetBSD/prep 3.x triggers a possible bug in the emulator:
            <0x26c550(&ata_xfer_pool,2,0,8,..)>
            <0x35c71c(0x3f27000,0,52,8,..)>
            <__wdccommand_start(0xd005e4c8,0x3f27000,0,13,..)>
            [ wdc: write to SDH: 0xb0 (sectorsize 2, lba=1, drive 1, head 0) ]
            <0x198120(0xd005e4c8,72,64,0xbb8,..)>
        Note:
    x)  PPC optimizations; instr combs
    x)  64-bit stuff: either Linux on G5, or perhaps some hobbyist
        version of AIX? (if there exists such a thing)
    x)  macppc: adb controller; keyboard (for framebuffer mode)
    x)  make OpenBSD/macppc work (PCI controller stuff)
    x)  Floating point exception correctness.
    x)  Alignment exceptions.

PReP:
    x)  Clock time! ("Bad battery blah blah")

Algor:
    o)  Other models than the P5064?
    o)  PCI interrupts... needed for stuff like the tlp NIC?

BeBox:
    o)  Interrupts. There seems to be a problem with WDC interrupts
        "after a short while", although a few interrupts get through?
    o)  Perhaps find a copy of BeOS and try it?

HPCmips:
    x)  Mouse/pad support! :)
    x)  A NIC? (As a PCMCIA device?)

ARM:
    o)  See netwinder_reset() in NetBSD; the current
        "an internal error occured" message after reboot/halt is
        too ugly.
    o)  Generic ARM "wait"-like instruction?
    o)  try to get netbsd/evbarm 3.x or 4.x running (iq80321)
    o)  make the xscale counter registers (ccnt) work
    o)  make the ata controller usable for FreeBSD!
    o)  Debian/cats crashes because of unimplemented coproc stuff.
        fix this?

Test machines:
    o)  dev_fb block fill and copy
    o)  dev_fb draw characters (from the built-in font)?
    o)  dev_fb input device? mouse pointer coordinates and buttons
        (allow changes in these to cause interrupts as well?)
    o)  Redefine the halt() function so that it stops "sometimes
        soon", i.e. usage in demo code should be:
            for (;;) {
                halt();
            }

Debugger:
    o)  How does SMP debugging work? Does it simply use "threads"?
        What if the guest OS (running on an emulated SMP machine)
        has a usertask running, with userland threads?
    o)  Try to make the debugger more modular and, if possible,
        reentrant!
    o)  Remove the emul command?
        (But show network info if showing machines?)
    o)  Memory dumps should be able to dump both physical and
        virtual emulated memory.
    o)  Evaluate expressions within []? That would allow stuff
        like cpu[x] where x is an expression.
    o)  "pc = pc + 4" doesn't work! Bug. Should work.
        ("pc=pc+4" works.)
    o)  Settings:
        x)  Special handlers for Write!
            +)  MIPS coproc regs
            +)  Alpha/MIPS/SPARC zero registers
            +)  x86 64/32/16-bit registers
        x)  Value formatter for resulting output.
    o)  Call stack display (back-trace) of emulated programs.
    o)  Nicer looking output of register dumps, floating point
        registers, etc. Warn about weird/invalid register contents.
    o)  Ctrl-C doesn't enter the debugger on some OSes (HP-UX?)...

Dyntrans:
    x)  For 32-bit emulation modes, that have emulated TLBs: tlbindex
        arrays of mapped pages? Things to think about:
        x)  Only 32-bit mode! (64-bit => too much code)
        x)  One array for global pages, and one array _PER ASID_,
            for those archs that support that. On M88K, there
            should be one array for userspace, and one for
            supervisor, etc.
        x)  Larger-than-4K-pages must fill several bits in the
            array.
        x)  No TLB search will be necessary.
        x)  Total host space used, for 4 KB pages: 1 MB per table,
            i.e. 65 MB for 32-bit MIPS, 2 MB for M88K, if one byte
            is used as the tlb index.
        x)  (The index is actually +1, so that 0 means no hit.)
    x)  "Merge" the cur_physpage and cur_ic_page variables/pointers to
        one? I.e. change cur_ic_page to cur_physpage.ic_page or
        something.
    x)  Instruction combination collisions? How to avoid easily...
    x)  Think about how to do both SHmedia and SHcompact in a
        reasonable way! (Or AMD64 long/protected/real, for that
        matter.)
    x)  68K emulation; think about how to do variable instruction
        lengths across page boundaries.
    x)  Dyntrans with valgrind-inspired memory checker. (In memory_rw,
        it would be reasonably simple to add; in each individual fast
        load/store routine = a lot more work, and it would become
        kludgy very fast.)
    x)  Dyntrans with SMP... lots of work to be done here.
    x)  Dyntrans with cache emulation... lots of work here as well.
    x)  Remove the concept of base RAM completely; it would be more
        generic to allow RAM devices to be used "anywhere".
    o)  dev_mp doesn't work well with dyntrans yet
    o)  In general, IPIs, CAS, LL/SC etc must be made to work with
        dyntrans
    x)  Redesign/rethink the delay slot mechanism used for e.g. MIPS,
        so that it caches a translation (that is, an instruction
        word and the instr_call it was translated to the last
        time), so that it doesn't need to do slow
        to_be_translated for each end of page?
    x)  Program Counter statistics: Per machine? What about SMP? All
        data to the same file? A debugger command should be possible
        to use to enable/disable statistics gathering.
        Configuration file option!
    x)  Breakpoints:
        o)  Physical vs virtual addresses!
        o)  32-bit vs 64-bit sign extension for MIPS, and others?
    x)  INVALIDATION should cause translations in _all_ cpus to be
        invalidated, e.g. on a write to a write-protected page
        (containing code)
    x)  16-bit encodings? (MIPS16, ARM Thumb, 32-bit SH on SH64)
    x)  Lots of other stuff: see src/cpus/README_DYNTRANS
    x)  Native code generation backends:
        o)  calculate at runtime whether or not chunks of emulated
            (physical) memory are worth translating to native code
            (it is assumed that it has high overhead)
        o)  experiment with calling the host's cc and ld externally;
            extremely high overhead, but could be interesting
            nonetheless.
        o)  experiment with using LLVM, or GNU Lightning?
        o)  Important cases to think about:
            x)  loads/stores
            x)  delay branches
            x)  other kinds of calls, branches
        o)  branches to already translated code blocks can link
            the blocks together (block-chaining), although I'll
            probably want to wait with this until other things work.
        o)  The first tests should be done with "testm88k", because
            that does not affect other modes.
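
Rough sketch of the tlbindex idea from the Dyntrans list above (one byte per
4 KB page of a 32-bit address space, stored as index + 1 so that 0 means no
hit). Nothing like this exists in the tree yet; the struct and the
tlbindex_* names below are made up for illustration, and a real version
would need one table per ASID (or per user/supervisor space on M88K).

```c
#include <stdint.h>
#include <stdlib.h>

#define TLBINDEX_PAGE_SHIFT	12	/*  4 KB pages  */
#define TLBINDEX_N_PAGES	(1u << (32 - TLBINDEX_PAGE_SHIFT))  /*  2^20 slots  */

/*  Hypothetical layout: one byte per 4 KB page => 1 MB per table.  */
struct tlbindex_table {
	uint8_t slot[TLBINDEX_N_PAGES];	/*  0 = no hit, else TLB index + 1  */
};

/*  Allocates a table with every slot set to "no hit".  */
struct tlbindex_table *tlbindex_new(void)
{
	return calloc(1, sizeof(struct tlbindex_table));
}

/*  Fills every 4 KB slot covered by [vaddr, vaddr+size) with value.  */
static void tlbindex_fill(struct tlbindex_table *t, uint32_t vaddr,
	uint32_t size, uint8_t value)
{
	uint32_t first = vaddr >> TLBINDEX_PAGE_SHIFT;
	uint32_t n = size >> TLBINDEX_PAGE_SHIFT;
	uint32_t i;

	for (i = 0; i < n; i++)
		t->slot[(first + i) & (TLBINDEX_N_PAGES - 1)] = value;
}

/*  TLB entry tlb_index (0..254) now maps size bytes at vaddr. Pages
    larger than 4 KB simply occupy several consecutive slots.  */
void tlbindex_map(struct tlbindex_table *t, uint32_t vaddr, uint32_t size,
	unsigned int tlb_index)
{
	tlbindex_fill(t, vaddr, size, (uint8_t)(tlb_index + 1));
}

/*  Must be called when the TLB entry is overwritten or invalidated.  */
void tlbindex_unmap(struct tlbindex_table *t, uint32_t vaddr, uint32_t size)
{
	tlbindex_fill(t, vaddr, size, 0);
}

/*  Returns the TLB index for vaddr, or -1 if no entry maps it.
    This is a single array read; no TLB search is needed.  */
int tlbindex_lookup(const struct tlbindex_table *t, uint32_t vaddr)
{
	return (int)t->slot[vaddr >> TLBINDEX_PAGE_SHIFT] - 1;
}
```

The one-byte slot limits this to 255 TLB entries, which is the assumption
behind the "1 MB per table" estimate.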
-------------------------------------------------------------------------------

Performance comparison when emulating the QEMU_MIPS machine (QEMU's default
MIPS machine mode):

mips-test-0.2:
--------------

1.  while true; do ls -l > /dev/null; echo -n .; done, 80x36 dots
2.  while true; do /usr/bin/md5sum /usr/bin/* > /dev/null; echo -n .; done,
    80 dots
3.  while true; do grep hej lib/libc.so.6 > /dev/null; echo -n .; done,
    80 dots

                     Test 1          Test 2          Test 3
                     ------          ------          ------
QEMU 0.9.0:          2 min 20 sec    45 sec          4 min 41 sec
GXemul-20070608:     1 min 59 sec    3 min 18 sec    18 min 10 sec   [A]

[A] = Normal portable dyntrans, no native code generation.

-------------------------------------------------------------------------------

Simple Valgrind-like checks?
    o)  Mark every address with bits which tell whether or not the
        address has been written to.
    o)  What should happen when programs are loaded? Text/data, bss
        (zero filled). But stack space and heap are uninitialized.
    o)  Uninitialized local variables:
        A load from a place on the stack which has not previously
        been stored to => warning. Increasing the stack pointer
        using any available means should reset the memory to
        uninitialized.
    o)  If calls to malloc() and free() can be intercepted:
        o)  Access to a memory area after free() => warning.
        o)  Memory returned by malloc() is marked as
            not-initialized.
        o)  Non-passive, but good to have: Change the argument
            given to malloc, to return a slightly larger memory
            area, i.e. margin_before + size + margin_after, and
            return the pointer + margin_before. Any access to the
            margin_before or _after space results in warnings.
            (free() must be modified to free the actually
            allocated address.)

Better CD Image file support:
    x)  Support CD formats that contain more than 1 track, e.g.
        CDI files (?). These can then contain a mixture of e.g. sound
        and data tracks, and booting from an ISO filesystem path
        would boot from [by default] the first data track.
        (This would make sense for e.g.
        Dreamcast CD images, or possibly other live-CD formats.)

Networking:
    x)  Redesign of the networking subsystem, at least the NAT
        translation part. The current way of allowing raw ethernet
        frames to be transferred to/from the emulator via UDP should
        probably be extended to allow the frames to be transmitted
        other ways as well.
    x)  Also adding support for connecting ttys (either to xterms, or
        to pipes/sockets etc, or even to PPP->NAT or SLIP->NAT :-).
    x)  Documentation updates (!) are very important, making it
        easier to use the (already existing) network emulation
        features.
    x)  Fix performance problems caused by only allowing a
        single TCP packet to be unacked.
    x)  Don't hardcode offsets into packets!
    x)  Test with lower than 100 max tcp/udp connections, to make
        sure that reuse works!
    x)  Make OpenBSD work better as a guest OS!
    x)  DHCP? Debian doesn't actually send DHCP packets, even though
        it claims to? So it is hard to test.
    x)  Multiple networks per emulation, and let different NICs in
        machines connect to different networks.
    x)  Support VDE (vde.sf.net)? Easiest/cleanest (before a
        redesign of the network framework has been done) is probably
        to connect it using the current (udp) solution.
    x)  Allow SLIP connections, possibly PPP, in addition to
        ethernet?

Cache simulation:
    o)  Command line flags for:
        o)  CPU endianness?
        o)  Cache sizes? (multiple levels)
    o)  Separate from the CPU concept, so that multi-core CPUs sharing
        e.g. a L2 cache can be simulated (?)
    o)  Instruction cache emulation is easiest (if separate from the
        data cache); similar hack as the S;I; hack in cpu_dyntrans.c.
        NOTE: if the architecture has a delay slot, then an
        instruction slot can actually be executed as 2 instructions.
    o)  Data cache emulation = harder; each arch's load/store routines
        must include support? running one instruction at a time and
        having a cpu-dependent lookup function for each instruction
        is another option (easier to implement, but very very
        slow).
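
To make the instruction cache item above a bit more concrete, here is a
minimal sketch of the bookkeeping such a simulation needs: a direct-mapped
cache that only counts hits and misses per fetched address. This is an
assumption about how it could look, not existing code; struct icache_sim
and the icache_sim_* functions are hypothetical names, and associativity,
multiple levels and latencies are left out on purpose.

```c
#include <stdint.h>
#include <stdlib.h>

struct icache_sim {
	unsigned int	line_shift;	/*  log2(line size in bytes)  */
	uint32_t	n_lines;	/*  number of lines (power of two)  */
	uint64_t	*tag;		/*  line number held by each line  */
	unsigned char	*valid;		/*  1 if the line holds anything  */
	uint64_t	hits, misses;
};

/*  Both sizes must be powers of two, with size_bytes >= line_bytes.
    Returns 0 on success, -1 on bad parameters or out of memory.  */
int icache_sim_init(struct icache_sim *c, uint32_t size_bytes,
	uint32_t line_bytes)
{
	if (line_bytes == 0 || (line_bytes & (line_bytes - 1)) != 0 ||
	    size_bytes < line_bytes || (size_bytes & (size_bytes - 1)) != 0)
		return -1;

	c->line_shift = 0;
	while ((1u << c->line_shift) < line_bytes)
		c->line_shift++;

	c->n_lines = size_bytes >> c->line_shift;
	c->hits = c->misses = 0;
	c->tag = calloc(c->n_lines, sizeof(uint64_t));
	c->valid = calloc(c->n_lines, 1);
	if (c->tag == NULL || c->valid == NULL) {
		free(c->tag);
		free(c->valid);
		return -1;
	}
	return 0;
}

/*  Simulates one instruction fetch at physical address paddr.
    Returns 1 on a hit, 0 on a miss (the line is then filled).  */
int icache_sim_fetch(struct icache_sim *c, uint64_t paddr)
{
	uint64_t line_nr = paddr >> c->line_shift;
	uint32_t i = (uint32_t)(line_nr & (c->n_lines - 1));

	if (c->valid[i] && c->tag[i] == line_nr) {
		c->hits++;
		return 1;
	}
	c->valid[i] = 1;
	c->tag[i] = line_nr;
	c->misses++;
	return 0;
}

void icache_sim_free(struct icache_sim *c)
{
	free(c->tag);
	free(c->valid);
}
```

For an architecture with delay slots, the fetch function would have to be
called once for the branch and once for the delay slot instruction.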

Documentation:
    x)  Update the documentation regarding the testmachine interrupts.
    x)  Note about sandboxing/security:
        Not all emulated instructions fail in the way they would
        do on real hardware (e.g. a userspace program writing to
        a system register might work in GXemul, but it would
        fail on real hardware). Sandbox = contain from the
        host OS. But the emulated programs will run "less
        securely".
    x)  Try NetBSD/arc 4.x! (It seems to work with disk images!)
    x)  NetBSD/pmax 4 install instructions: xterm instead of vt100!
    x)  BETTER DEVICE EXAMPLES!
        o)  Move away from technical.html to somewhere new.
        o)  DEVICE_TICK
        o)  Implement example devices using interrupts, dyntrans
            memory access, etc.?
    x)  Document the dyntrans core?
    x)  Rewrite the section about experimental devices, after the
        framebuffer acceleration has been implemented, and demos
        written. (Symbolic names instead of numbers; example
        use cases, etc. Mention demo files that use the various
        features?)
    x)  "a very simple linear framebuffer device (for graphics output)"
        under "which machines does gxemul emulate" ==> better
        description?
    x)  Better description on how to set up a cross compiler?
        Example for MIPS64.
    o)  Automagic documentation generation?
        x)  machines, cpus, devices.
        x)  REMEMBER that several machines/devices can be in
            the same source file!
    o)  Try to rewrite the install instructions for those machines
        that use 3MAX into using CATS or hpcmips? (To remove the need
        to use a raw ffs partition, using up all of the disk image.)

The Device subsystem:
    x)  allow devices to be moved and/or changed in size (down to a
        minimum size, etc, or up to a max size); if there is a
        collision, return false. It is up to the caller to handle
        this situation!
    x)  NOTE: Translations must be invalidated, both for
        registering new devices, and for moving existing ones.
        cpu->invalidate translation caches, for all CPUs that
        are connected to a specific memory.
    x)  keep track of interrupts and busses?
        actually, allowing any device to be a bus might be a nice idea.
    x)  turn interrupt controllers into devices? :-)
    x)  refactor various clocks/nvram/cmos into one device?

PCI:
    x)  Pretty much everything related to runtime configuration, device
        slots, interrupts, etc must be redesigned/cleaned up. The
        current code is very hardcoded and ugly.
        o)  Allow cards to be added/removed during runtime more
            easily.
        o)  Allow cards to be enabled/disabled (i/o ports, etc, like
            NetBSD needs for disk controller detection).
        o)  Allow devices to be moved in memory during runtime.
        o)  Interrupts per PCI slot, etc. (A-D).
        o)  PCI interrupt controller logic... very hard to get right,
            because these differ a lot from one machine to the next.
    x)  last write was ffffffff ==> fix this, it should be used
        together with a mask to get the correct bits. also, not ALL
        bits are size bits! (lowest 4 vs lowest 2?)
    x)  add support for address fixups
    x)  generalize the interrupt routing stuff (lines etc)

Clocks and timers:
    x)  Fix the PowerPC DECR interrupt speed! (MacPPC and PReP speed,
        etc.)
    x)  DON'T HARDCODE 100 HZ IN cpu_mips_coproc.c!
    x)  NetWinder timeofday is incorrect! Huh? grep -R for ta_rtc_read
        in NetBSD sources; it doesn't seem to be initialized _AT ALL_?!
    x)  Cobalt TOD is incorrect!
    x)  Go through all other machines, one by one, and fix them.

Config file parser:
    o)  Rewrite it from scratch!
    o)  Usage of any expression available through the debugger
    o)  Allow interrupt controllers to be added! and interrupts
        to be used in more ways than before
    o)  Support for running debugger commands (like the -c command
        line option)

Floating point layer:
    o)  make it common enough to be used by _all_ emulation modes
    o)  implement correct error/exception handling and rounding modes
    o)  implement more helper functions (i.e. add, sub, mul...)
    o)  non-IEEE modes (i.e. x86)?

Userland emulation:
    x)  Try to prefix "/emul/mips/" or similar to all filenames,
        and only if that fails, try the given filename.
        Read this setting from an environment variable, and only
        if there is none, fall back to hardcoded string.
    x)  File descriptor (0,1,2) assumptions? Find and fix these?
    x)  Dynamic linking!
    x)  Lots of stuff; freebsd, netbsd, linux, ... syscalls.
    x)  Initial register/stack contents (environment, command line
        args).
    x)  Return value (from main).
    x)  mmap emulation layer
    x)  errno emulation layer
    x)  struct conversions for many syscalls

Sound:
    x)  generic sound framework
    x)  add one or more sound cards as devices; add a testmachine
        sound card first?
    x)  Dreamcast sound? Generic PCI sound cards?

ASC SCSI controller:
    x)  NetBSD/arc 2.0 uses the ASC controller in a way which GXemul
        cannot yet handle. (NetBSD 1.6.2 works ok.) (Possibly a problem
        in NetBSD itself,
        http://mail-index.netbsd.org/source-changes/2005/11/06/0024.html
        suggests that.) NetBSD 4.x seems to work? :)

Caches / memory hierarchies: (this is mostly MIPS-specific)
    o)  src/memory*.c: Implement correct cache emulation for
        all CPU types. (currently only R2000/R3000 is implemented)
        (per CPU, multiple levels should be possible, associativity
        etc!)
    o)  R2000/R3000 isn't _100%_ correct, just almost correct :)
    o)  Move the -S (fill mem with random) functionality into the
        memory.c subsystem, not machine.c or wherever it is now
    o)  ECC stuff, simulation of memory errors? (Machine dependent)
    o)  More than 4GB of emulated RAM, when run on a 32-bit host?
        (using manual swap-out of blocks to disk, ugly)
    o)  A global command line option should be used to turn
        cache emulation on or off. When off, caches should be
        faked like they are right now. When on, caches and
        memory latencies should be emulated as correctly as
        possible.

File/disk/symbol handling:
    o)  Make sure that disks can be added/removed during runtime!
        (Perhaps this needs a reasonably large re-write.)
    o)  Remove some of the complexity in file format guessing, for
        Ultrix kernels that are actually disk images?
    o)  Better handling of tape files
    o)  Read function argument count and types from binaries? (ELF?)
    o)  Better demangling of C++ names. Note: GNU's C++ differs from
        e.g. Microsoft's C++, so multiple schemes must be possible.
        See URL at top of src/symbol_demangle.c for more info.

Userland ABI emulation:
    o)  see src/useremul.c

Better framebuffer and X-windows functionality:
    o)  Generalize the update_x1y1x2y2 stuff to an extend-region()
        function...
    o)  -Yx sometimes causes crashes.
    o)  Simple device access to framebuffer_blockcopyfill() etc,
        and text output (using the built-in fonts), for dev_fb.
    o)  CLEAN UP the ugly event code
    o)  Mouse clicks can be "missed" in the current system; this is
        not good. They should be put on a stack of some kind.
    o)  More 2D and 3D framebuffer acceleration.
    o)  Non-resizable windows? Or choose scaledown depending
        on size (and center the image, with a black border).
    o)  Different scaledown on different windows?
    o)  Non-integral scale-up? (E.g. 640x480 -> 1024x768)
    o)  Switch scaledown during runtime? (Ala CTRL-ALT-plus/minus)
    o)  Bug reported by Elijah Rutschman on MacOS with weird
        keys (F5 = cursor down?).
    o)  Keyboard and mouse events:
        x)  Do this for more machines than just DECstation
        x)  more X11 cursor keycodes
        x)  Keys like CTRL, ALT, SHIFT do not get through
            by themselves (these are necessary for example
            to change the font of an xterm in X in the
            emulator)
    o)  Generalize the framebuffer stuff by moving _ALL_ X11
        specific code to src/x11.c!

-------------------------------------------------------------------------------
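
Possible shape of the extend-region() function mentioned in the framebuffer
list above: keep one "dirty" rectangle and grow it to cover each newly
written area, so that only that rectangle needs to be redrawn. This is a
guess at the interface, not the current dev_fb code; struct fb_region and
the fb_region_* names are hypothetical, and coordinates are inclusive.

```c
/*  A dirty rectangle, with inclusive corner coordinates.
    x2 < x1 (or y2 < y1) means "nothing needs to be redrawn".  */
struct fb_region {
	int	x1, y1, x2, y2;
};

/*  Marks the region as empty, e.g. after the host window was updated.  */
void fb_region_clear(struct fb_region *r)
{
	r->x1 = r->y1 = 0;
	r->x2 = r->y2 = -1;
}

int fb_region_is_empty(const struct fb_region *r)
{
	return r->x2 < r->x1 || r->y2 < r->y1;
}

/*  Grows r so that it also covers the rectangle (x1,y1)-(x2,y2).  */
void fb_region_extend(struct fb_region *r, int x1, int y1, int x2, int y2)
{
	if (x2 < x1 || y2 < y1)
		return;		/*  empty rectangle: nothing to add  */

	if (fb_region_is_empty(r)) {
		r->x1 = x1; r->y1 = y1;
		r->x2 = x2; r->y2 = y2;
		return;
	}

	if (x1 < r->x1)
		r->x1 = x1;
	if (y1 < r->y1)
		r->y1 = y1;
	if (x2 > r->x2)
		r->x2 = x2;
	if (y2 > r->y2)
		r->y2 = y2;
}
```

A single bounding rectangle can over-approximate badly when two small
areas far apart are written; a list of rectangles would fix that at the
cost of more bookkeeping.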