04.01.2012, 14:58
(Last edited by kizla; 04/02/2012 at 10:00 AM.)
Abstract Machine eXecutor (AMX)
The first issue is: why an abstract machine at all? By compiling into the native machine language of the processor of your choice, the performance would be much better.
A few reasons (I will list just a few):
* Cross-platform compatibility of the compiled binary code. PAWN scripts now run on a variety of processors and microcontrollers (8-bit to 64-bit), with and without an operating system, and on different architectures.
* It is easier to design a language where a data object (an array) can contain P-code which is later executed. Modern operating systems separate code and data sections: you cannot write into a code section and you cannot execute data; that is, not without serious effort.
* It is far easier to keep a program running in an abstract machine inside its "sandbox". For example, an unbounded recursion in an abstract machine crashes the abstract machine itself, but not much else. If you run native machine code, the recursive routine may damage the system stack and crash the application. Although modern operating systems support multi-threading, with a separate stack per thread, the default action for an overrun of any stack is still to shut down the entire application.
Stack machines are certainly compact, flexible and simple to implement, but they are also more difficult to optimize for speed. To see why, let's analyze a specific example.
pawn Code:
a = b + 2; //where "a" and "b" are simple variables
Native code
In 32-bit assembler, this would be:
pawn Code:
mov eax, [b]
add eax, 2
mov [a], eax
Register-based abstract machine
Microprocessors have used registers since their theoretical inception by Von Neumann. Extending this architecture to an abstract machine is only natural. There are two advantages: the abstract machine instructions map better to the native instructions (you may actually use the processor's registers to implement the abstract machine's registers), and the number of virtual instructions needed to execute a simple expression can be reduced.
As an example, here is the code for the PAWN "AMX", a two-register abstract machine (AMX stands for "Abstract Machine eXecutor"):
pawn Code:
load.pri b ; "pri" is the primary register, i.e. the accumulator
const.alt 2 ; "alt" is the alternate register
add ; pri = pri + alt
stor.pri a ; store "pri" in variable "a"
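To make the four instructions above concrete, here is a minimal sketch in Python (the real AMX is written in C) of a two-register machine running them. The `run` function, the tuple encoding of the instructions and the dictionary used as variable storage are assumptions made purely for illustration; they are not part of the PAWN toolkit.

```python
# Minimal sketch of a two-register abstract machine (illustrative only).
# Instructions are (mnemonic, operand) tuples; variables live in a dict
# instead of a real data section addressed through DAT.

def run(program, variables):
    """Execute `program` and return an updated copy of `variables`."""
    mem = dict(variables)
    pri = 0  # primary register (accumulator)
    alt = 0  # alternate register
    for op, arg in program:
        if op == "load.pri":     # pri = [arg]
            pri = mem[arg]
        elif op == "const.alt":  # alt = arg
            alt = arg
        elif op == "add":        # pri = pri + alt
            pri = pri + alt
        elif op == "stor.pri":   # [arg] = pri
            mem[arg] = pri
        else:
            raise ValueError("unknown opcode: " + op)
    return mem

# a = b + 2
PROGRAM = [
    ("load.pri", "b"),
    ("const.alt", 2),
    ("add", None),
    ("stor.pri", "a"),
]

result = run(PROGRAM, {"a": 0, "b": 5})
```

With `b` set to 5, `result["a"]` ends up as 7 and `b` is left untouched. Note that the whole expression needs four virtual instructions, against three native ones.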
Register layout
Here is the list with the names and descriptions of all registers:
pawn Code:
PRI  primary register (ALU, general purpose).
ALT  alternate register (general purpose).
FRM  stack frame pointer; stack-relative memory reads and writes are relative to the address in this register.
CIP  code instruction pointer.
DAT  offset to the start of the data.
COD  offset to the start of the code.
STP  stack top.
STK  stack index; indicates the current position in the stack. The stack runs downwards from the STP register towards zero.
HEA  heap pointer. Dynamically allocated memory comes from the heap and the HEA register indicates the top of the heap.
Notably missing from the register set is a "flags" register. The abstract machine keeps no separate set of flags; instead, all conditional branches are taken depending on the contents of the PRI register.
Memory image
The heap and the stack share a memory block. The stack grows downwards from STP towards zero; the heap grows upwards. An exception occurs when the STK and the HEA registers collide. (An exception means that the abstract machine aborts with an error message. There is currently no exception trapping mechanism.)
Figure 1 is a proposed memory image layout, and one that the standard Abstract Machine assumes for a self-contained AMX "job". Alternative layouts are possible. For instance, when you "clone" an AMX job, the new job will share the Prefix and the Code sections with the original job, and have the Data/Heap/Stack sections in a different memory block. Specifically, an implementation may choose to keep the heap and the stack in a separate memory block next to the memory block for the code, the data and the prefix. The top of the figure represents the lowest address in memory.
The binary file (on disk) consists of the "prefix", and the code and data sections. The heap and stack sections are not stored in the binary file; the abstract machine can build them from information in the "prefix" section. The prefix also contains start-up information, and the definitions of native and public functions.
Symbolic (debug) information may follow the code and data sections in the file. This symbolic information is typically not read into memory (at least not by the abstract machine).
pawn Code:
|--------------------------------------------|
|                   Prefix                   |
|--------------------------------------------|
|                    Code                    |
|--------------------------------------------|
|                    Data                    |
|--------------------------------------------|
|                    Heap                    |
|--------------------------------------------|
|                                            |
|--------------------------------------------|
|                   Stack                    |
|--------------------------------------------|
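The collision rule for the shared heap/stack block (the heap grows upwards, the stack grows downwards, and the machine aborts when HEA and STK meet) can be sketched as follows. This is a Python illustration and not AMX source: the class `HeapStack`, its methods and the `AmxError` exception are hypothetical names, and the 4-byte cell is the "usual" size mentioned in this post.

```python
CELL = 4  # assumed cell size in bytes

class AmxError(Exception):
    """Stands in for the abstract machine aborting with an error."""

class HeapStack:
    def __init__(self, heap_start, stack_top):
        self.hea = heap_start  # HEA: top of the heap, grows upwards
        self.stp = stack_top   # STP: stack top, never changes
        self.stk = stack_top   # STK: current stack position, grows downwards

    def push_cell(self):
        """Reserve one cell on the stack."""
        if self.stk - CELL < self.hea:
            raise AmxError("stack/heap collision")
        self.stk -= CELL

    def heap_alloc(self, nbytes):
        """Reserve `nbytes` on the heap and return the old HEA."""
        if self.hea + nbytes > self.stk:
            raise AmxError("stack/heap collision")
        old = self.hea
        self.hea += nbytes
        return old
```

For example, with `HeapStack(0, 16)`, two pushes move STK to 8, a heap allocation of 8 bytes moves HEA to 8, and the next push raises `AmxError` because the two regions would overlap.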
Every instruction consists of an opcode followed by zero or one parameters. Each opcode is one byte in size; an instruction parameter has the size of a cell (usually four bytes). A few "debugging" instructions (at the end of the list) form an exception to these rules: they have two or more parameters and those parameters are not always cell sized.
Many instructions have implied registers as operands. This reduces the number of operands that are needed to decode an instruction and, hence, it reduces the time needed to decode an instruction. In several cases, the implied register is part of the name of the opcode. For example, PUSH.pri is the name of the opcode that stores the PRI register on the stack. This instruction has no parameters: its parameter (PRI) is implied in the opcode name.
The instruction reference is ordered by opcode. The description of two opcodes is sometimes combined in one row in the table, because the opcodes differ only in a source or a destination register. In these cases, the opcodes and the variants of the registers are separated by a "/".
The "semantics" column gives a brief description of what the opcode does. It uses the C language syntax for operators, which are the same as those of the PAWN language. An item between square brackets indicates a memory access (relative to the DAT register, except for jump and call instructions). So, PRI = [address] means that the value read from memory at location DAT + address is stored in PRI.
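As a rough illustration of this encoding (a one-byte opcode followed by at most one cell-sized parameter) and of DAT-relative memory access, here is a small Python sketch. It only knows four opcodes, with their numbers taken from the instruction table in this post. The little-endian byte order, the 4-byte cell and the helper names `decode` and `load` are assumptions for the example, not part of the AMX API.

```python
import struct

CELL = 4  # assumed cell size in bytes

# opcode number -> (mnemonic, has_parameter)
OPCODES = {
    1: ("LOAD.pri", True),
    10: ("CONST.alt", True),
    13: ("STOR", True),
    44: ("ADD", False),
}

def decode(code):
    """Return a list of (mnemonic, parameter-or-None) from a bytes object."""
    out = []
    pos = 0
    while pos < len(code):
        name, has_param = OPCODES[code[pos]]
        pos += 1  # the opcode itself is one byte
        param = None
        if has_param:
            # the parameter is one little-endian cell (assumption)
            (param,) = struct.unpack_from("<i", code, pos)
            pos += CELL
        out.append((name, param))
    return out

def load(memory, dat, address):
    """PRI = [address]: read one cell at DAT + address."""
    (value,) = struct.unpack_from("<i", memory, dat + address)
    return value

# LOAD.pri 8, CONST.alt 2, ADD, STOR 4
stream = (bytes([1]) + struct.pack("<i", 8) +
          bytes([10]) + struct.pack("<i", 2) +
          bytes([44]) +
          bytes([13]) + struct.pack("<i", 4))
decoded = decode(stream)
```

Because ADD has its registers implied in the opcode, it takes a single byte in the stream, while the other three instructions take five bytes each.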
#     | mnemonic       | operand | semantics
0     | NOP            |         | for code alignment
1/2   | LOAD.pri/alt   | address | PRI/ALT = [address]
3/4   | LOAD.S.pri/alt | offset  | PRI/ALT = [FRM + offset]
5/6   | LREF.S.pri/alt | offset  | PRI/ALT = [[FRM + offset]]
7     | LOAD.I         |         | PRI = [PRI] (full cell)
8     | LODB.I         | number  | PRI = "number" bytes from [PRI] (read 1/2/4 bytes)
9/10  | CONST.pri/alt  | value   | PRI/ALT = value
11/12 | ADDR.pri/alt   | offset  | PRI/ALT = FRM + offset
13    | STOR           | address | [address] = PRI
14    | STOR.S         | offset  | [FRM + offset] = PRI
15    | SREF.S         | offset  | [[FRM + offset]] = PRI
16    | STOR.I         |         | [ALT] = PRI (full cell)
17    | STRB.I         | number  | "number" bytes at [ALT] = PRI (write 1/2/4 bytes)
22/23 | PUSH.pri/alt   |         | [STK] = PRI/ALT, STK = STK - cell size
24    | PUSHR.pri      |         | [STK] = PRI + DAT, STK = STK - cell size
25/26 | POP.pri/alt    |         | STK = STK + cell size, PRI/ALT = [STK]
28    | STACK          | value   | ALT = STK, STK = STK + value
29    | HEAP           | value   | ALT = HEA, HEA = HEA + value
34    | JUMP           | offset  | CIP = CIP + offset (jump to the address relative to the current position)
38    | SHR            |         | PRI = PRI >> ALT (without sign extension)
42    | SMUL           |         | PRI = ALT * PRI (signed multiply)
43    | SDIV           |         | PRI = ALT / PRI (signed divide), ALT = ALT mod PRI
44    | ADD            |         | PRI = ALT + PRI
45    | SUB            |         | PRI = ALT - PRI
46    | AND            |         | PRI = ALT & PRI
47    | OR             |         | PRI = ALT | PRI
49    | NOT            |         | PRI = !PRI
50    | NEG            |         | PRI = -PRI
51    | INVERT         |         | PRI = ~PRI
66    | FILL           | number  | Fill memory at [ALT] with value in [PRI].
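The operand order in the arithmetic rows is easy to get wrong (SUB is ALT - PRI, not PRI - ALT), so here is a small Python sketch of a few of those rows. It assumes 32-bit cells; the `ALU` dictionary and the `to_signed` helper are made-up names for this example. SDIV is left out on purpose, since its rounding behaviour is not described in this post.

```python
MASK = 0xFFFFFFFF  # assumes 32-bit cells

def to_signed(value):
    """Wrap an integer into the signed 32-bit cell range."""
    value &= MASK
    return value - (1 << 32) if value & 0x80000000 else value

# Each entry maps a mnemonic to a function (pri, alt) -> (new_pri, new_alt).
ALU = {
    "SHR":    lambda pri, alt: (to_signed((pri & MASK) >> alt), alt),  # no sign extension
    "SMUL":   lambda pri, alt: (to_signed(alt * pri), alt),
    "ADD":    lambda pri, alt: (to_signed(alt + pri), alt),
    "SUB":    lambda pri, alt: (to_signed(alt - pri), alt),
    "AND":    lambda pri, alt: (alt & pri, alt),
    "OR":     lambda pri, alt: (alt | pri, alt),
    "NOT":    lambda pri, alt: (int(not pri), alt),  # logical not: 0 -> 1, non-zero -> 0
    "NEG":    lambda pri, alt: (to_signed(-pri), alt),
    "INVERT": lambda pri, alt: (to_signed(~pri), alt),  # bitwise not
}
```

For instance, `ALU["SUB"](3, 10)` gives PRI = 7, and `ALU["SHR"](-1, 28)` gives PRI = 15 because the shift does not copy the sign bit.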
Native Call opcodes
There are two opcodes that are not generated by the PAWN compiler: SYSREQ.D and SYSREQ.ND. These opcodes are direct-call variants of SYSREQ and SYSREQ.N respectively, and they are generated by the abstract machine itself, by runtime patching. When the script calls a native function, the PAWN compiler generates a SYSREQ opcode for the core instruction set and a SYSREQ.N opcode when macro instructions are enabled. Both these opcodes cause a jump out of the abstract machine to a callback routine. You can set this routine with amx_SetCallback, but there is also a default routine, called amx_Callback. The callback/dispatcher function must look up the native function from the parameter of the originating SYSREQ.* opcode and then call that native function with the function parameters forwarded. There is a double call in this chain: the SYSREQ.* opcode causes a call to the callback function, which then calls the requested native function.
The SYSREQ.D and SYSREQ.ND opcodes remove one call, and thereby improve the performance of the native call link. After the callback function has looked up the address of the native function, it patches this address right into the code stream of the compiled script, and it changes the SYSREQ.N opcode to SYSREQ.ND (or the SYSREQ opcode to SYSREQ.D, for older systems). The next time this native function is called, there is a new opcode, which calls the address of the native function directly, bypassing the callback.
This "trick" only works if you use the default callback, or if you implement similar patching functionality in your custom callback. It also requires that the P-code stream is writeable. If you store the code section of the compiled script in (Flash) ROM, the callback function will be unable to patch the opcodes.
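The patching idea can be sketched in a few lines of Python. This is a toy model, not the amx_Callback implementation: `ToyVM`, its `step` and `callback` methods and the `writable` flag are hypothetical, and a Python function object stands in for the native address that the real callback writes into the code stream.

```python
class ToyVM:
    def __init__(self, natives, code, writable=True):
        self.natives = natives    # native functions, indexed by the SYSREQ.N operand
        self.code = code          # list of [opcode, operand] pairs
        self.writable = writable  # False models a code section stored in ROM
        self.lookups = 0          # how often the callback had to look up a native

    def callback(self, pos, args):
        """Dispatcher: look up the native, patch the opcode if possible, call it."""
        self.lookups += 1
        func = self.natives[self.code[pos][1]]
        if self.writable:
            self.code[pos] = ["SYSREQ.ND", func]  # rewrite the instruction in place
        return func(*args)

    def step(self, pos, args):
        """Execute the native-call instruction at `pos`."""
        op, operand = self.code[pos]
        if op == "SYSREQ.N":
            return self.callback(pos, args)  # two calls: callback, then native
        if op == "SYSREQ.ND":
            return operand(*args)            # one call: straight to the native
        raise ValueError("unknown opcode: " + op)
```

Calling `step` twice on a writable VM goes through the callback only the first time; with `writable=False` the opcode stays SYSREQ.N and every call pays for the lookup again.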
OK, now that you have read this, you know some of the basics and you can move on to this one:
#emit
Special thanks
Slice
pawn_language
Y_Less