Routine-finding algorithm (was Re: [z-machine] je string vs. je number)

Amir Karger amirkargerweb@yahoo.com
Wed, 1 Oct 2003 07:22:51 -0700 (PDT)


--- David Kinder <d.kinder@btinternet.com> wrote:
> > It seems like this could lead to major problems. What if I code up
> > je local1 3543, which doesn't begin a string? then txd will (?) 
> > try to decode that string and come up with a whole bunch of
garbage.
> 
> As I understand it, txd looks to see if the number is a valid
> dictionary
> entry or in its list of the strings it has already found, and only
> tries
> to treat the number as a string address if it matches one of the two
> cases: so in your above example, you'd just get the number 3543 in
> the
> disassembly.

Aha. Interestingly, in the rare case where an address is a valid
dictionary entry AND packed string address, it writes something like

    je local1 "to" or s003

Which would be even MORE confusing to my txd-output parser. I was able
to ignore this problem while doing simple opcodes, because my test
cases didn't create a dictionary, but while working on object/property
opcodes, I realized the problem would get worse. So I decided to just 
bite the bullet and decode the hex codes myself.

Admittedly, I'm doing this in stages. Right now, I'm using txd -d,
which gives me the hex bytes, but nicely chunked up command by command.
Once that's working nicely, I'll switch to using txd only to tell me
each routine's address, and then read the Z-file myself. Then the only
remaining detail will be working out an algorithm to figure out routine
locations.

Regarding the suggestion to change the txd code, I believe I mentioned
in an earlier post that this project is part of a project that'll be
used by Parrot (Perl 6's core) which will be released under a license
not compatible with txd's (nonexistent one). So I'm really trying to
not look at the txd code at all (let alone change it). As such, though,
I'll need an algorithm for finding routine locations. And of course if
someone (other than the original author) tells me the algorithm that
txd uses, I'm going to get sued by SCO someday.

One option would be to start at main() and follow all possible paths.
That is, in pseudocode:

start with one path at main. 
rtn_addr[0] = address of main.
while numpaths > 0 {
  foreach path in paths {
    read opcode at path's PC.
    mark this address as having been read
    if (opcode == ret, print_ret) end this path
    else if (opcode == branch instruction or call instruction) {
      don't end this path
      if (opcode == call instruction) {
        rtn_addr[++i] = call address
        newPC = byte after subroutine header bytes
      } else
        newPC = branch address
      }
      start a new path at newPC**
    }
    else PC++
  }
}

The starred lines is the kicker. minizork.pl, for example, does "call
sp -> sp" and the like a number of times. My algorithm won't find
those.

I could assume that my algorithm is lucky enough to find the first and
last address, and just make sure to read any gaps in between. For
Inform games, I'm in even better shape, since main is guaranteed to be
the first sub. Sadly, those Infocom authors were inconsiderate...

I guess in low memory, you could do:

PC = end of dictionary + 1
while PC < first sub I found {
  while (byte at PC > 15) { PC++} ! can't be first byte of subroutine
  ! (Could also test that PC is a valid packed address)
  ! (read N words for versions 1-4) 
  try to parse an opcode at PC.
  if opcode is believable, start a path here
  end path if you get to a malformed byte
}

You could do the same in high memory after the last routine you found.
Although any time you find a malformed byte, you could instead assume
it's a string & read through the next word w/ the high bit set.

Or, I know! When I translate the Z-code into another language, I embed
in the program an entire Z-code interpreter and any unparsed bytes.
Then if we ever do a call sp that leads to those bytes, we just call
the Z-code interpreter on those bytes!

Yuck!

-Amir

__________________________________
Do you Yahoo!?
The New Yahoo! Shopping - with improved product search
http://shopping.yahoo.com