[z-machine] Replicating txd subroutine-finding functionality

Amir Karger amirkargerweb@yahoo.com
Thu, 20 May 2004 10:33:28 -0700 (PDT)


--- Mark J Musante <olorin@world.std.com> wrote:
> 
> On Wed, 19 May 2004, Amir Karger wrote:
> 
> > 2. Will space between end of one sub and beginning of the next (at
> > a packed address) always be zeroes? This will also help determine
> > whether I've finished a sub.
> 
> What I'd do would be a 'depth-first' search.  You know where main
> is, where the code starts, so begin there.  Whenever you come across
> a call, push your current location on the stack, and follow that
> call.  Eventually you'll hit a routine that doesn't make any calls
> whatsoever, and you can pop your way back up the stack.

I thought of it that way. I decided to go breadth-first, because I
think I can get better coverage that way. For example:

123: foo
126: call 250
129: call 260
132: ret
...
250: add
255: sub
258: ret

If I read all the way through 129 first, rather than jumping to 250
immediately, then I know that there's a sub at 260. That means when I
get to 258, I can say with confidence that I'm done with the routine
and shouldn't even try looking at 260 as if it's part of this routine.

Again, the big missing piece of data is where subs end, so we have to
optimize to find that.

> One tricky bit would be jumps, both conditional and unconditional.
> There's nothing in the z-machine spec which prevents a user from
> jumping from one routine to another, so it's something you'll have
> to be aware of.  The inform assembler does its best to prevent this,
> but I believe there are ways around it.

Well, in response to a question about this last  year, Matthew
Russotto wrote "There are, as far as I know, no Infocom or Inform
games which jump across subroutine boundaries." I *could* write my
program so that translates into one huge Perl program with no
subroutines in it, and just jump from label to label. But I thought it
would be more fun to use Perl subroutines. If people try to jump to a
different sub, they'll get an error.

Once I make that choice, though, jumps become useful: if I jump ahead
20 bytes, then I can guarantee that the sub doesn't end for another 20
bytes, so even if I hit a ret,  I know I can keep going.

> I wouldn't count on the padding between routines being zeros.
> Instead I'd create a bitmap of all the bytes in the code section of
> the z-file and use that to determine what areas may have been
> missed.

Bitmap? You're just saying a list that says which bytes I haven't
looked at yet, right? This works, kind of. If I find a sub from
100-150 and another from 180-200, then there's probably a sub from
150-180. Again, because compilers will probably write out all the subs
in a block. Btw, this is why I'm hoping the space between the end of a
ret and the nex t packed address will be zero - because why would
anyone put anything else there?

The problem wht the coverage map is that some bytes in high memory are
strings. And while I can parse (and mark) a lot of them, some strings
will also be called by object properties or whatever - and again, in
theory, strings and routines could be totally intermingled in high
memory. That doesn't even get into the issues of low memory, which has
a bunch of data tables that can be parsed neither as strings nor as
opcodes. I'm mostly planning on throwing up my hands on low memory
routines, unless I'm lucky and they're called explicitly.

> Also bear in mind that some routines will only be called via object
> properties - these won't be found using the depth-first search
> outlined above.  Instead you'll have to infer these by the gaps in
> the bitmap.

This is certainly a possibility. It's one reason for my question about
action routines, by the way, because they're a significant minority of
the routines I'm missing. And making a bitmap doesn't guarantee I'll
find these.

> Finally, don't forget that the various 'call' opcodes are not the
> only ones that call routines.  There's also 'sread', 'aread',
> 'sound_effect', and 'read_char'.  

Ah, thanks for those. I've never used those capabilities, but I guess
they're there.

> Also, there's the indirect call, but there's not much you can do
> about that.

Sigh. I spent about five minutes thinking about "semi-executing" the
code, in case someone does "push sp 123; call sp". But I think it'll
be a LOT of work, and don't know how much it'll help. I mean, what if
I do call local1 right after entering a sub? So I'm hoping the
coverage map will fix or at least alleviate this. If the code will do
most of the work, then I could maybe add a "hints" file that would
find the extra subs. (I could then give that hints file away with the
code, and a person who didn't have txd could still play the game --
um, assuming they wanted to play it in Perl for some reason. Have I
mentioned that this whole project is completely useless?)

> This is an ambitious project.  Good luck.

Heh, thanks. I may give up on this, because I've got so much else to
do with it - v4, v5, fixing up the IO (currently stolen from
Games::Rezrov and hacked at)... and then doing what was supposed to be
the whole point of the project, which is getting Parrot to read Z
files.

-Amir




	
		
__________________________________
Do you Yahoo!?
Yahoo! Domains – Claim yours for only $14.70/year
http://smallbusiness.promotions.yahoo.com/offer