[Tutorial] PAWN Pre-Processor - Case study - Part 6/7

Contents

Part 1 - Introduces the pre-processor and covers some important points about writing function-like macros.
Part 2 - Explains exactly what the compiler searches for and looks at some common macro uses.
Part 3 - Describes the other directives available (beside "#define") and looks at definitions with no replacement value.
Part 4 - How to use strings with the pre-processor.
Part 5 - Alternatives to the pre-processor, multiple symbols and recursion.
Part 6 - Case Study (y_remote).
Part 7 - Macro issues and spaces.

Additional

Utilising tagof - By g_aSlice.
Future-proof string concatenation
String bug
Macros that do multiple things
Advanced "tag" macros

Case Study

For this tutorial, I am going to walk you through the development of one of the YSI libraries, designed to (IMHO) simplify the use of "CallRemoteFunction" and "CallLocalFunction" by removing the need for format specifiers ("si", "iii" etc) and adding compile-time parameter checking. How many times have you seen people complain that this isn't working:

pawn Code:
public OnPlayerConnect(playerid)
{
    CallRemoteFunction("Anything", "");
}

public Anything(playerid)
{
    printf("%d", playerid);
}
The problem is that as far as the compiler is concerned, the parameters to "CallRemoteFunction" are correct, and because "Anything" is a string, there's no way for it to associate the call with the function. Or how about these mistakes:

pawn Code:
public OnPlayerConnect(playerid)
{
    CallRemoteFunction("NEthing", "");
}

public Anything(playerid)
{
    printf("%d", playerid);
}
pawn Code:
public OnPlayerConnect(playerid)
{
    CallRemoteFunction("Anything", "s", playerid);
}

public Anything(Float:playerid)
{
    printf("%f", playerid);
}
The first one misspells the function name; the second gets the types badly wrong, passing an integer with the "s" (string) specifier to a function that expects a float. Again, the compiler gives no warnings or errors, producing hard-to-track bugs. This is how y_remote does "CallRemoteFunction":

pawn Code:
public OnPlayerConnect(playerid)
{
    broadcast Anything(playerid);
}

remote Anything(playerid)
{
    printf("%d", playerid);
}
Spell the function wrong - error!
Miss a parameter - warning!
Get the tag wrong - warning!
Pass a string instead of a variable - error!
Get the format specifier wrong - impossible!

Target

The underlying implementation looks something like this once compiled:

pawn Code:
public OnPlayerConnect(playerid)
{
    Anything_Call(playerid);
}

stock Anything_Call(playerid)
{
    CallRemoteFunction("Anything", "i", playerid);
}

forward Anything(playerid);

public Anything(playerid)
{
    printf("%d", playerid);
}
Because "Anything_Call" has the same parameters as "Anything", and that is what you really call, that is where the compile-time errors come from if you do anything wrong (including typos). Additionally, the "CallRemoteFunction" line is automatically generated so shouldn't ever be wrong.
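To make that concrete, here is a minimal sketch with the generated wrapper written out by hand and each mistake from the list above shown as a commented-out call. The diagnostics named in the comments are what the standard compiler is expected to report; treat the exact wording as an assumption:

pawn Code:
#include <a_samp>

forward Anything(playerid);

// Hand-written copy of the wrapper that "remote" would generate.
stock Anything_Call(playerid)
{
    CallRemoteFunction("Anything", "i", playerid);
}

public OnPlayerConnect(playerid)
{
    // Uncomment any one of these to see the compile-time diagnostic:
    // NEthing_Call(playerid);        // Typo: undefined symbol (error).
    // Anything_Call();               // Missing parameter (warning).
    // Anything_Call(Float:playerid); // Wrong tag: tag mismatch (warning).
    // Anything_Call("playerid");     // String for a variable (error).
    Anything_Call(playerid);
    return 1;
}

public Anything(playerid)
{
    printf("%d", playerid);
}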

As for the call site, if you do:

pawn Code:
public OnPlayerConnect(playerid)
{
    Anything(playerid);
}
This looks like a normal function call, and so should be treated as a normal function call. Calling the same function in many scripts at once is not the same thing at all and may be unexpected, which is why we use "broadcast". We could of course NOT use "broadcast" and instead generate code that looks something like this:

pawn Code:
public OnPlayerConnect(playerid)
{
    Anything(playerid);
}

stock Anything(playerid)
{
    CallRemoteFunction("Anything_Impl", "i", playerid);
}

forward Anything_Impl(playerid);

public Anything_Impl(playerid)
{
    printf("%d", playerid);
}
This is in fact what "y_master" does - uses normal looking function calls and redirects them based on run-time or compile-time information to either the current script or another script, allowing you to transparently write multi-script systems such as Anti-Cheat APIs (safe "GivePlayerMoney" for example) without ever worrying about modifying existing scripts.

Basics

So how is this done? The basic code is simple; it is only when we come to parameters that we have an issue:

pawn Code:
#define remote%0(%1) stock %0_Call(%1){CallRemoteFunction(#%0, /* ??? */);}forward %0(%1);public %0(%1)
#define broadcast%0(%1) %0_Call(%1)
How are the parameters done? How does the compiler know to use "a" for arrays and "s" for strings? And how does it know to not include the "[]" after arrays in "CallRemoteFunction"? Doing this is wrong:

pawn Code:
stock Anything(str[])
{
    CallRemoteFunction("Anything_Impl", "s", str[]);
}
It should be:

pawn Code:
stock Anything(str[])
{
    CallRemoteFunction("Anything_Impl", "s", str);
}
Or we get an error. This is where "tag macros" come in. I've touched on them briefly before but I'll cover them more now.

pawn Code:
#define TYPE(%0) _:TYPE_0:TYPE_1(%0)
#define TYPE_0:TYPE_1(%0[]) "array"
#define TYPE_1(%0) "variable"
In the code above there are three macros: one called "TYPE", one called "TYPE_0" and one called "TYPE_1". The second macro is NOT called "TYPE_0:TYPE_1" as the name ends at the first symbol character (except "@"). The macro "TYPE_0" only matches if it is followed by the text ":TYPE_1" (which it is) and if whatever comes between the round brackets ends in "[]". The contents of "%0" will be everything typed BEFORE the square brackets. This also makes the order important: the following will not work and will identify everything as "variable". Given what has been discussed in previous posts, I leave working out WHY to the reader:

pawn Code:
#define TYPE(%0) _:TYPE_0:TYPE_1(%0)
#define TYPE_0:TYPE_1(%0) "variable"
#define TYPE_1(%0[]) "array"
Now given the working version of the code, we can do:

pawn Code:
printf(TYPE(arr[]));
printf(TYPE(var));
And this will compile as:

pawn Code:
printf(_:"array");
printf(_:TYPE_0:"variable");
"_:" and "TYPE_0:" in the final code are tag overrides, the same as "Float:". The "_:" is always required in these cases as it converts the value from whatever tag it has back to a normal untagged value; the "TYPE_0:" will be parsed by the compiler, then ignored because the tag is changed again straight away (technically, tag overrides are right-associative, so this is the same as writing):

pawn Code:
printf(_:"array");
printf(_:(TYPE_0:"variable"));
The string "variable" is first converted to a string with tag "TYPE_0", then converted to a string with tag "_:", which is what "printf" takes ("_:" means no tag).
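The same right-to-left behaviour can be sketched with an ordinary value instead of a string. "MyTag" is a hypothetical tag name used only for illustration; an override changes the tag and not the value, so both variables should hold 42:

pawn Code:
#include <a_samp>

main()
{
    // Chained overrides read right to left: tag 42 as "MyTag", then untag it.
    new a = _:MyTag:42;
    // The same expression with the grouping written out.
    new b = _:(MyTag:42);
    printf("%d %d", a, b);
}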

So that is how we can determine the type of a variable, now instead of printing the type we can do:

pawn Code:
#define TYPE(%0) _:TYPE_0:TYPE_1(%0)
#define TYPE_0:TYPE_1(%0[%2]%3) "a", %0
#define TYPE_1(%0) "i", %0
Using the same code above again we get:

pawn Code:
printf(_:"a", arr);
printf(_:TYPE_0:"i", var);
Giving us both the format specifier ("a" for array or "i" for integer) and the variable itself with no additional syntax ("arr" instead of "arr[]" for the array variable, "var" as normal). The "TYPE_0" macro has been expanded slightly to "(%0[%2]%3)"; the original version would match "(arr[])", but not "(arr [ ] )" or "(arr [])", this new version will match the additional whitespace and discard it. Although we are using "printf" for now, we will move on from this as you will notice that none of the variables are actually printed - this is just a demonstration.
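As a sketch of where this is heading, the same three macros can feed a real call instead of "printf". "ShowValue" is a hypothetical public function invented for this example, and only the plain variable case is shown because an "a" parameter would also need a length after it:

pawn Code:
#include <a_samp>

#define TYPE(%0) _:TYPE_0:TYPE_1(%0)
#define TYPE_0:TYPE_1(%0[%2]%3) "a", %0
#define TYPE_1(%0) "i", %0

forward ShowValue(value);
public ShowValue(value)
{
    printf("value = %d", value);
}

main()
{
    new var = 42;
    // Expected to expand to:
    //   CallLocalFunction("ShowValue", _:TYPE_0:"i", var);
    CallLocalFunction("ShowValue", TYPE(var));
}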

Recursion

We have already looked at recursive macros in an earlier post, so this will be brief. Given:

pawn Code:
printf(TYPES(arr[], var));
We want:

pawn Code:
printf("ai", arr, var);
And vastly harder, given:

pawn Code:
printf(TYPES());
We want:

pawn Code:
printf("");
Again recall that macros are not evaluated twice once they have failed to match. If we do:

pawn Code:
#define A:B(%0) %0
#define C B

A:C("hi")
The output will NOT be:

pawn Code:
"hi"
Because "C" is converted to "B" AFTER "A" has been evaluated and the pre-processor will not return to those older characters which previously failed to match. This macro, even for basic detection, is hard. These are the requirements:
  • Detect 0 parameters and not add extra commas.
  • Detect the CURRENT parameter uniquely.
  • Detect the end of the parameter list.
  • Add all parameter types to the type string.
  • Strings may NOT have tags in the middle - only at the start.
These are the solutions I use:
  • Add an explicit tag for this case, detecting commas with nothing in between.
  • The current parameter has an unknown number of commas before it, so we need some unique symbol. I use "|||" as it isn't used anywhere else in the language.
  • If the current parameter has "|||" either side, then the end will look like "||||||" (no parameter between the two sets of three).
  • As in the example above.
  • This just means we need to pile all the tags up in the same place (and pile up they will)...
From this list the code is a bit of a leap, but it is also fairly standard now - it took a while to perfect. I have also switched to "TYPES:%0)" - that's not a typo as you'll see later:

pawn Code:
// There are TWO extra commas here.
#define TYPES:%0) _:TYPE_N:TYPE_M:##%0,,)

// Detect no parameters - nothing between the commas.
#define TYPE_N:TYPE_M:##,,) "")

// Detect one or more parameters - "%3" may CONTAIN commas (see earlier posts).
// Also select the CURRENT parameter.
#define TYPE_M:##%2,%3) TYPE_0:TYPE_1:TYPE_E:##|||%2|||%3)

// This macro detects an array surrounded by "|||"s as detailed above.  It also
// extracts all the text between the "#"s ("%0") and any previously parsed
// parameters ("%1").  The output is the same string with an additional "a",
// followed by the existing variables, then "%2" without the square brackets.
// Finally we select the next single parameter.
#define TYPE_0:TYPE_1:TYPE_E:#%0#%1|||%2[]|||%3,%4) TYPE_0:TYPE_1:TYPE_E:#%0a#%1,%2|||%3|||%4)

// This code detects normal variables.  As you can see, the output is now fairly
// standard.
#define TYPE_1:TYPE_E:#%0#%1|||%2|||%3,%4) TYPE_0:TYPE_1:TYPE_E:#%0i#%1,%2|||%3|||%4)

// Finally, this code detects the end of the parameters.
#define TYPE_E:#%0#%1||||||%3) #%0#%1)
Analysing the code above may take a few read-throughs, but something may stand out: surely detecting "||||||" should come BEFORE detecting "|||%2|||"? "%2" can legally match an empty string, so it will be empty at the end and we will get the wrong macro? This is not the case, thanks to the two extra commas initially added: "TYPE_0" and "TYPE_1" both require a comma after the second "|||", and by the time the parameters run out none remain. This is best shown by a step-by-step expansion:

0 parameters:

pawn Code:
TYPES:)
// Apply "TYPES".
_:TYPE_N:TYPE_M:##,,)
// Apply "TYPE_N".
_:"")
// Done.
1 parameter:

pawn Code:
TYPES:arr[])
// Apply "TYPES".
_:TYPE_N:TYPE_M:##arr[],,)
// Fail "TYPE_N".
// Apply "TYPE_M" (notice that "%3" is only ONE comma - the other was matched).
_:TYPE_N:TYPE_0:TYPE_1:TYPE_E:##|||arr[]|||,)
// Apply "TYPE_0" (one comma consumed, zero remain).
_:TYPE_N:TYPE_0:TYPE_1:TYPE_E:#a#,arr||||||)
// Fail "TYPE_0".
// Fail "TYPE_1" (no required comma to match).
// Apply "TYPE_E".
_:TYPE_N:TYPE_0:TYPE_1:#a#,arr)
// Done.
3 parameters:

pawn Code:
TYPES:arr[], var, other)
// Apply "TYPES".
_:TYPE_N:TYPE_M:##arr[], var, other,,)
// Fail "TYPE_N".
// Apply "TYPE_M".
_:TYPE_N:TYPE_0:TYPE_1:TYPE_E:##|||arr[]||| var, other,,)
// Apply "TYPE_0".
_:TYPE_N:TYPE_0:TYPE_1:TYPE_E:#a#,arr||| var||| other,,)
// Fail "TYPE_0".
// Apply "TYPE_1".
_:TYPE_N:TYPE_0:TYPE_0:TYPE_1:TYPE_E:#ai#,arr, var||| other|||,)
// Fail "TYPE_0".
// Apply "TYPE_1".
_:TYPE_N:TYPE_0:TYPE_0:TYPE_0:TYPE_1:TYPE_E:#aii#,arr, var, other||||||)
// Fail "TYPE_0".
// Fail "TYPE_1".
// Apply "TYPE_E".
_:TYPE_N:TYPE_0:TYPE_0:TYPE_0:TYPE_1:#aii#,arr, var, other)
// Done.
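1 parameter (plain variable), traced by hand with the same rules as a sketch rather than taken from compiler output:

pawn Code:
TYPES:var)
// Apply "TYPES".
_:TYPE_N:TYPE_M:##var,,)
// Fail "TYPE_N".
// Apply "TYPE_M".
_:TYPE_N:TYPE_0:TYPE_1:TYPE_E:##|||var|||,)
// Fail "TYPE_0" (no "[]").
// Apply "TYPE_1" (one comma consumed, zero remain).
_:TYPE_N:TYPE_0:TYPE_0:TYPE_1:TYPE_E:#i#,var||||||)
// Fail "TYPE_0".
// Fail "TYPE_1" (no required comma to match).
// Apply "TYPE_E".
_:TYPE_N:TYPE_0:TYPE_0:TYPE_1:#i#,var)
// Done.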
I said tags would pile up - every time a match fails a tag gets left behind. For this reason code inside YSI uses tag macros with short names like "@Ya:" to keep expanded lines as short as possible and stay within line-length limits.

More Parameters

So far this code supports arrays and variables, but we want strings and floats too. To the compiler strings and arrays look the same, so we need to tell them apart. Additionally, arrays MUST be followed by their length in "CallRemoteFunction"; strings don't need to be. The convention in YSI to solve this is to use "string:" before all strings:

pawn Code:
// We don't ACTUALLY want parameters to have this string, it is just used to
// detect strings not arrays.  Remove it in all other cases.
#define string:

#define TYPES:%0) _:TYPE_N:TYPE_M:##%0,,)
#define TYPE_N:TYPE_M:##,,) "")

// Now we have to detect more types.
#define TYPE_M:##%2,%3) TYPE_0:TYPE_1:TYPE_E:##|||%2|||%3)

// What type of array is this?
#define TYPE_0:TYPE_1:TYPE_E:#%0#%1|||%2[]|||%3,%4) TYPE_0a:TYPE_0b:#%0#%1|||%2|||%3,%4)

// String.
#define TYPE_0a:TYPE_0b:#%0#%1|||string:%2|||%3,%4) TYPE_0:TYPE_1:TYPE_E:#%0s#%1,%2|||%3|||%4)

// Other.  Note parameter "%5" to ENSURE that there is AT LEAST one more
// parameter after the array to contain the length.  Will give an error if it is
// not a variable because we do not strip the "[]"s.  We add TWO parameters here
// instead of 1 everywhere else.
#define TYPE_0b:#%0#%1|||%2|||%3,%4,%5) TYPE_0:TYPE_1:TYPE_E:#%0ai#%1,%2,%3|||%4|||%5)

// What type of variable is this:
#define TYPE_1:TYPE_E:#%0#%1|||%2|||%3,%4) TYPE_1a:TYPE_1b:#%0#%1|||%2|||%3,%4)

// Tagged (has ":"), we can still use "i" - trust me.
#define TYPE_1a:TYPE_1b:#%0#%1|||%6:%2|||%3,%4) TYPE_0:TYPE_1:TYPE_E:#%0i#%1,_:%2|||%3|||%4)

// Normal.
#define TYPE_1b:#%0#%1|||%2|||%3,%4) TYPE_0:TYPE_1:TYPE_E:#%0i#%1,%2|||%3|||%4)

// End.
#define TYPE_E:#%0#%1||||||%3) #%0#%1)
Now to use this macro we do the following (you can see here why we changed the macro from "TYPES(%0)" to "TYPES:%0)"):

pawn Code:
#define remote%0(%1) stock %0_Call(%1){CallRemoteFunction(#%0, TYPES:%1);}forward %0(%1);public %0(%1)
Now there is just one remaining snag - spacing! For one thing, the macro MUST be called as "remote Func(params)". This means that "%0" ALWAYS has a space at the start and MUST NOT have a space at the end. For another thing, all the other parameters MAY have spaces before and after that need ignoring. With this the final version is:

pawn Code:
#define string:

#define TYPES:%0) _:TYPE_N:TYPE_M:##%0,,)
#define TYPE_N:TYPE_M:##,,) "")

#define TYPE_M:##%2,%3) TYPE_0:TYPE_1:TYPE_E:##|||%2|||%3)

// [].
#define TYPE_0:TYPE_1:TYPE_E:#%0#%1|||%2[%8]%9|||%3,%4) TYPE_0a:TYPE_0b:#%0#%1|||%2|||%3,%4)

// String.
#define TYPE_0a:TYPE_0b:#%0#%1|||%9string:%2|||%3,%4) TYPE_0:TYPE_1:TYPE_E:#%0s#%1,%2|||%3|||%4)

// Other.
#define TYPE_0b:#%0#%1|||%2|||%3,%4,%5) TYPE_0:TYPE_1:TYPE_E:#%0ai#%1,%2,%3|||%4|||%5)

// Variable.
#define TYPE_1:TYPE_E:#%0#%1|||%2|||%3,%4) TYPE_1a:TYPE_1b:#%0#%1|||%2|||%3,%4)

// Tag.
#define TYPE_1a:TYPE_1b:#%0#%1|||%6:%2|||%3,%4) TYPE_0:TYPE_1:TYPE_E:#%0i#%1,_:%2|||%3|||%4)

// _.
#define TYPE_1b:#%0#%1|||%2|||%3,%4) TYPE_0:TYPE_1:TYPE_E:#%0i#%1,%2|||%3|||%4)

// End.
#define TYPE_E:#%0#%1||||||%3) #%0#%1)

#define remote%0(%1) stock%0_Call(%1){CallRemoteFunction(#%0, TYPES:%1);}forward%0(%1);public%0(%1)
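As a usage sketch, assuming the final macros above and the "broadcast" macro from the "Basics" section are both in scope, a remote function with a string and a plain variable could be declared and called as below. "Send" is a hypothetical function name; the intent is that the generated wrapper behaves like CallRemoteFunction("Send", "si", msg, playerid), but that expansion has not been checked against compiler output:

pawn Code:
// Declares "Send_Call" (the wrapper) and the public function "Send".
remote Send(string:msg[], playerid)
{
    printf("%s (%d)", msg, playerid);
}

public OnPlayerConnect(playerid)
{
    new msg[] = "connected";
    // Becomes a call to "Send_Call", so typos and wrong parameters are
    // reported by the compiler instead of failing silently at run-time.
    broadcast Send(msg, playerid);
    return 1;
}
Note that "string:" is only a marker for the macros; "#define string:" removes it everywhere else, so the compiler just sees "msg[]".
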
Challenge

Slice wrote a new version of "CallLocalFunction":

http://forum.sa-mp.com/showthread.ph...on#post1642880

See if you can add new elements to the code above to detect "&" and generate the "v" parameter for it. Hint: "TYPE_0:TYPE_1:TYPE_2:TYPE_E".