[z-machine] Replicating txd subroutine-finding functionality
Amir Karger
amirkargerweb@yahoo.com
Thu, 27 May 2004 21:13:52 -0700 (PDT)
Thanks for your informative reply.
--- "Matthew T. Russotto" <mrussotto@speakeasy.net> wrote:
>
> On May 20, 2004, at 11:18 PM, Amir Karger wrote:
>
> This is txd's approach -- it assumes if you jump ahead to an
> address, it's within the subroutine. The other thing it does is
> examines the low area (below main, above the globals) and the high
> area (between the highest located subroutine and the strings).
Hm. Do you really mean "above the globals"? Ah, you must be referring
to the Spec 5 Remark: "Note that it is permissible for a routine to be
in dynamic memory. Marnix Klooster suggests this might be used for
compiling code at run time!" Yuck! I think that if neither Infocom
games nor Inform do this, then at least for now I'll ignore that
possibility.
Can I just start at the high memory mark, and get most subs I need? I'm
still worried about the Spec 1 Remark that "many Infocom games group
tables of static data just above the high memory mark, before routines
begin; some, such as 'Nord 'n' Bert...', interleave static data
between routines, so that static memory actually overlaps code; and a
few, such as 'Seastalker' release 15, even contain routines placed
below the high memory mark."
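
(For reference, a minimal Python sketch of where the high memory mark and
its neighbours live in the story file header. The byte offsets follow the
header table in the Z-Machine Standard; the function name read_header and
the dictionary keys are made up for illustration.)

```python
import struct

def read_header(story: bytes) -> dict:
    """Pull the memory-map words out of a Z-machine header (big-endian)."""
    if len(story) < 64:
        raise ValueError("story file is shorter than the 64-byte header")

    def word(offset: int) -> int:
        return struct.unpack_from(">H", story, offset)[0]

    return {
        "version": story[0x00],
        "high_memory_base": word(0x04),    # the "high memory mark"
        "initial_pc": word(0x06),          # packed address of main in v6
        "dictionary": word(0x08),
        "globals": word(0x0C),
        "static_memory_base": word(0x0E),  # end of dynamic memory
    }
```

As the Spec 1 Remark quoted above warns, high_memory_base is only a hint
about where routines begin, not a guarantee.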
> I think txd's approach is theoretically sound given the constraints
> of the compilers -- namely, that all subroutines are contiguous,
All subroutines are contiguous? That doesn't seem to fit Nord 'n' Bert
mentioned above. In fact, when I txd it (Masterpieces CD), I get:
=====================
Resident data ends at 7af0, program starts at 7c34, file ends at 2992c
Starting analysis pass at address 7af0
End of analysis pass, low address = 7af0, high address = 24ebd
1569 bytes of data in code from f04f to f670
=====================
As my two-year-old would say, "Yucka-bucka!"
Oh wait. I just realized that by "contiguous", you probably mean that
code *within one subroutine* is contiguous, i.e., they didn't put data
inside a sub that gets jumped over.
> all subroutines end, no subroutines contain unreachable code at the
> end (this is violated in one case but it is not really important)
> and there are not any cross subroutine jumps.
Well, I used the method I described in my emails, with just two simple
additions:
1) Try to read a sub after finishing a sub. A sub is finished either
when there's a zero byte where the next opcode belongs, or after a
return-ish opcode that you're not jumping past. (By the way, can I
confirm that "jump sp" is illegal? Inform won't let me code it, but
according to the spec it *seems* legal, because jump (unlike other
opcodes) takes its label as a regular arg, not an extra "label only"
arg.)
2) Stop reading when you get to the lowest string address referenced
in any subs you read in the game.
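
(The two additions can be sketched as a driver loop in Python. This is only
an outline under assumptions: try_decode is a hypothetical callback standing
in for the real routine decoder, returning the address just past the routine
plus any string byte addresses it prints by constant, or None if the bytes
don't decode. The align parameter is the packed-address granularity, 2 for
v1-3.)

```python
def scan_routines(start, file_end, try_decode, align=2):
    """Decode routines back to back until the string area is reached.

    try_decode(addr) is a hypothetical callback: it returns
    (end_addr, [string byte addresses referenced by constant])
    or None when no routine decodes at addr.
    """
    found = []
    string_floor = file_end   # lowered whenever a constant string address turns up
    addr = start
    while addr < string_floor:
        result = try_decode(addr)
        if result is None:
            break             # a "broken" sub: stop scanning here
        end, string_addrs = result
        if end <= addr:
            break             # decoder made no progress
        found.append((addr, end))
        string_floor = min([string_floor, *string_addrs])
        addr = -(-end // align) * align   # round up to the next packed boundary
    return found, string_floor
```

If no routine ever prints a string by constant address, string_floor never
drops below file_end and the loop only stops at the first decode failure.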
I'm happy to report that this almost works! In fact, it finds every
sub in Advent.z5, and I suspect would do pretty well in most
Inform-compiled games.
Unfortunately, it fares rather less well with Infocom. I've only tried
it on minizork and zork1. On the former, it finds all the subs, but
finds one extra "broken" sub, because minizork NEVER calls print_paddr
with a constant address, only with variables! So my code doesn't know
when the strings start (and only v6/7 have strings_offset, right?).
zork1 has a similar problem. In addition, a couple of subs that sit
lower than the first sub called are apparently reached only through
computed calls, so I don't find them either.
Still, I'm definitely making progress! I was shocked when I saw that
for Advent (and almost for zork) I got almost identical results to
txd. I guess the point is that even though the spec allows much
yuckier tricks, sane compilers wouldn't use them. And I'm not too
worried now about translating hand-coded Z-files. Maybe for version 2
:)
I was thinking maybe of looking at every packed address above the
dictionary, which seems to be the last thing in dynamic memory.
For any one whose first byte is 0-15 (a legal local-variable count, hence
possibly the start of a sub), I'll try to read a sub starting there. If
it overlaps with a known thing, then I know it's an error.
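
(A small Python sketch of that candidate search, assuming the story file is
loaded as bytes. The function names are illustrative. The packed-address
multipliers are the ones in the Standard: 2 for v1-3, 4 for v4-5, 8 for v8;
v6/7 also need the routines offset from the header, which this sketch does
not handle.)

```python
def packed_scale(version: int) -> int:
    """Bytes per packed-address unit for routine addresses."""
    if 1 <= version <= 3:
        return 2
    if version in (4, 5):
        return 4
    if version == 8:
        return 8
    raise ValueError("v6/v7 also need the header's routines offset")

def candidate_routine_starts(story: bytes, low: int, high: int, version: int):
    """List aligned addresses in [low, high) whose byte could be a local count."""
    step = packed_scale(version)
    first = -(-low // step) * step            # round low up to a packed boundary
    stop = min(high, len(story))
    return [a for a in range(first, stop, step) if story[a] <= 15]
```

Each address returned is only a candidate; it still has to decode as a
routine before it can be trusted.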
The slightly good news in all this is that even if I try to read a
subroutine where there isn't one, either (a) it'll break and I will
know it's not real, or (b) by some miracle it'll read as a real sub.
In the latter case, it's STILL OK, because when I translate the
program to Perl, I'll have an extra sub that never gets called, so the
game will still work!
The only problem is if I *think* it's a valid sub, and it overlaps the
beginning of an actual valid sub. To at least slightly alleviate this
problem, I'll do this only after reading all of the more obvious subs.
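
(One way to sketch the overlap check in Python, assuming each routine is
recorded as a half-open (start, end) byte range. The names overlaps and
filter_candidates are made up for illustration.)

```python
def overlaps(a, b):
    """True if half-open ranges a = (start, end) and b share any byte."""
    return a[0] < b[1] and b[0] < a[1]

def filter_candidates(known, candidates):
    """Keep only speculative routines that don't collide with known ones.

    known      -- ranges of routines found by following real calls
    candidates -- ranges of routines decoded from guessed start addresses
    """
    return [c for c in candidates
            if not any(overlaps(c, k) for k in known)]
```

Two surviving candidates can still overlap each other; this sketch makes
no attempt to pick between them.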
I guess in theory I could translate EVERYTHING that starts with a 0-15
at a packed address, which will yield lots of overlapping subs, but
delete all the ones that have illegal opcodes & things.
Anyway, I'd appreciate thoughts on Infocom-ish abuses of non-high
memory, interleaved data and subroutines, etc.
-Amir