[Tutorial] PAWN Pre-Processor (Updated 22/04/13)
#1

Contents

Part 1 - Introduces the pre-processor and covers some important points about writing function-like macros.
Part 2 - Explains exactly what the compiler searches for and looks at some common macro uses.
Part 3 - Describes the other directives available (beside "#define") and looks at definitions with no replacement value.
Part 4 - How to use strings with the pre-processor.
Part 5 - Alternatives to the pre-processor, multiple symbols and recursion.
Part 6 - Case Study (y_remote).
Part 7 - Macro issues and spaces.

Additional

Utilising tagof - By g_aSlice.
Future-proof string concatenation
String bug
Macros that do multiple things
Advanced "tag" macros

Introduction

My coding recently has involved an awful lot of macro writing (for example see this file), and although it's not as good as, say, the one in GCC, PAWN's pre-processor is still very powerful! This post will go over what the PAWN compiler can do, getting into some quite advanced stuff. Don't worry: it builds up gradually, so just read as far as you feel comfortable (or maybe try going a little further to learn something new), and if you find some parts simple, skip ahead. Many macros are very simple, and many only need to be simple, but just occasionally you need something more, and that's what this post is here for.

Basic Replacements

First a quick overview of some simple macros and even simpler definitions. These are perfect for describing how the pre-processor works.
  • Definitions
A definition, like the one below, is just one word and the replacement text.

Code:
#define MAX_PLAYERS                     500
The classic - it defines the maximum number of players on your server. This is a perfect example to show how these things work. The pre-processor, as the name implies, is run before (pre-) the processor (the main compiler). The main compiler takes written code and converts it to an AMX; the pre-processor takes written code and generates new written code. Say you have the following code:

Code:
printf("%d", MAX_PLAYERS);
The pre-processor will run first and convert that to this:

Code:
printf("%d", 500);
This is the code that the compiler will see and convert into the AMX file. This is a purely text based replacement. All macros are of the form:

Code:
#define <search string><space(s)><replacement>
Note that the space is important - the FIRST space indicates the end of the search string, everything after the first space is part of the replacement string! In the example above the search string (the part the pre-processor will find and replace) is "MAX_PLAYERS" and the replacement (what the search string will be replaced with) is "500".
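As a further sketch of the same text replacement, a definition can be used anywhere its replacement text would be valid, for example as an array size and a loop bound. The names "MAX_SCORES" and "gScores" are made up for this illustration, and the include assumes a SA-MP setup where "printf" comes from a_samp.

Code:
#include <a_samp>

// Hypothetical definition, used only in this example.
#define MAX_SCORES                      10

new gScores[MAX_SCORES];

main()
{
    // Every "MAX_SCORES" below is replaced by "10" before compilation.
    for (new i = 0; i < MAX_SCORES; i++)
    {
        gScores[i] = i * 2;
    }
    printf("%d", gScores[MAX_SCORES - 1]);
}
The main compiler only ever sees "new gScores[10];", "i < 10" and "gScores[10 - 1]", so this prints "18".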
  • Macros
A macro is a bit like a function - it takes parameters. These are named from "%0" to "%9" - you can't name your own parameters. A function to return the maximum number of players multiplied by a number would be:

Code:
MaxPlayersTimesNumber(number)
{
    return MAX_PLAYERS * number;
    // Remember that "MAX_PLAYERS" is a definition so this will compile as:
    //  return 500 * number;
}
A macro to do the same thing would be:

Code:
#define MAX_PLAYERS_TIMES_NUMBER(%0)    MAX_PLAYERS * %0
This time the search string (the part before the first space) is "MAX_PLAYERS_TIMES_NUMBER(%0)" and the replacement is "MAX_PLAYERS * %0". "%0" is a special marker - it doesn't mean search for the string "%0", it means search for anything between two brackets. The "%0" in the replacement gets the same value as whatever was between the brackets in the string which was replaced. Example:

Code:
#define MAX_PLAYERS                     500
#define MAX_PLAYERS_TIMES_NUMBER(%0)    MAX_PLAYERS * %0

printf("%d", MAX_PLAYERS_TIMES_NUMBER(7));
After pre-processing (replacing the "%0" with the "7" from between the brackets) becomes:

Code:
#define MAX_PLAYERS                     500

printf("%d", MAX_PLAYERS * 7);
"MAX_PLAYERS" is also a macro so this finally becomes:

Code:
printf("%d", 500 * 7);
Note that the compiler is also quite clever - if it sees an expression like that, made up only of constants, it will work out the result for you at compile time. The string is left alone (the compiler can't format strings), so the code which finally gets compiled is:

Code:
printf("%d", 3500);
You could also do:

Code:
#define MAX_PLAYERS                     500
#define MAX_PLAYERS_TIMES_NUMBER(%0)    MAX_PLAYERS * %0

new value = 7;
printf("%d", MAX_PLAYERS_TIMES_NUMBER(value));
After pre-processing (replacing the "%0" with the "value" from between the brackets) becomes:

Code:
#define MAX_PLAYERS                     500

new value = 7;
printf("%d", MAX_PLAYERS * value);
"MAX_PLAYERS" is also a macro so this finally becomes:

Code:
new value = 7;
printf("%d", 500 * value);
Because this expression uses a variable (value) the compiler can't reduce this further (some compilers can, but don't worry about that) so this is the final code that gets compiled.

Why?

So why use macros instead of functions (or why use functions instead of macros)? Macros replace text, so wherever you put the macro, that is where the replacement will go. If you have a macro in your code 100 times, that code will get generated 100 times. On the other hand, if you call a function in your code 100 times, its code will only appear once, with 100 calls to it. The latter is probably preferable if you have a lot of code - large chunks of code appearing 100 times will make a very large AMX file! Macros tend to be used for very small bits of code: calling a function takes time, so for very tiny code it's not really worth the overhead of a function call, but this is not a rule! If you used a function instead of the macros above, the code compiled would look like:

Function:

Code:
MaxPlayersTimesNumber(number)
{
    return 500 * number;
}
Example 1:

Code:
printf("%d", MaxPlayersTimesNumber(7));
Example 2:

Code:
new value = 7;
printf("%d", MaxPlayersTimesNumber(value));
For both of these the PAWN compiler cannot reduce the code any further.

Conventions

One thing you may have already noticed is that the function in the examples above was called "MaxPlayersTimesNumber" whereas the equivalent macro was called "MAX_PLAYERS_TIMES_NUMBER". This is just convention - functions in this tutorial will have all words lower case except the first letter, macros will have all letters upper case with words separated by "_", this is so you can tell what something is in code just by looking at it where it is used, without having to look up the definition.

The other convention is that the replace part of a single line macro starts at column 40 (where possible) - this is just to make lots of macros next to each other more readable (IMHO) as it gives:

Code:
#define DEFINITION_1                    1
#define MY_DEF                          2
#define SOME_OTHER_LONG_NAME_DEFINITION 3
#define A_MACRO(%0)                     3 * %0
Instead of:

Code:
#define DEFINITION_1 1
#define MY_DEF 2
#define SOME_OTHER_LONG_NAME_DEFINITION 3
#define A_MACRO(%0) 3 * %0
Neither of these conventions is a rule, however, so feel free to ignore them if you disagree with their use - many macros later on ignore the naming convention above for good reasons. However, I would encourage you to have a good reason before ignoring them.

Syntax/Semantics

Just a quick word before the next section. "Syntax" is how code looks, "Semantics" is what code does. The "syntax" of a for loop is: "for (<initialiser>; <conditional>; <modifier>) {}", the "semantics" of a for loop is: loop some number of times based on passed parameters. The important thing to remember in the next section is the syntax and semantics of "printf". The syntax is: "printf(string[], ...);" - that is a string followed by anything else, the contents of the string do not affect the SYNTAX - "printf("%d", 6, 7);" is valid syntax, but the seven won't display as the string determines the SEMANTICS of the function (what it actually does). It will compile but won't work and this is a VERY important distinction to make. The pre-processor deals with code compilation, so if the output compiles it is valid.

Parameters

A macro can have multiple parameters:

Code:
#define MULTIPLY_TWO_NUMBERS(%0,%1)     %0 * %1
In fact it can have up to 10 (this macro is an example of a macro where the replacement can't be aligned to column 40, as the search parameter is too long):

Code:
#define MULTIPLY_NUMBERS(%0,%1,%2,%3,%4,%5,%6,%7,%8,%9) %0 * %1 * %2 * %3 * %4 * %5 * %6 * %7 * %8 * %9
Some people like to put spaces after the comma in function parameter lists, like so:

Code:
#define MULTIPLY_TWO_NUMBERS(%0, %1)     %0 * %1
You CANNOT do this in macros - as mentioned before the FIRST space is the end of the search string, so this will look for "MULTIPLY_TWO_NUMBERS(%0,", not "MULTIPLY_TWO_NUMBERS(%0, %1)", and replace it with "%1) %0 * %1".

Now that you know what a macro is and what the parameters are there are a few important differences to note between macro and function parameters.

Firstly - macro parameters are not the same as function parameters and should not be thought of in the same way, they just match text and are quite simplistic. Function parameters are separated by commas, macro parameters are separated by whatever you tell them to be separated by and this catches a lot of people out.

This code is wrong, there are too many parameters to the function:

Code:
MyFunc(a)
{
    return a;
}

main()
{
    printf("%d", MyFunc(1, 2));
}
This code is NOT wrong - there are not too many parameters to the macro:

Code:
#define MY_FUNC(%0)                     %0

main()
{
    printf("%d", MY_FUNC(1, 2));
}
In that example the "MY_FUNC" macro searches for something between two brackets after the letters "MY_FUNC" - that is what "MY_FUNC(%0)" means. In the printf there is something between brackets after the letters "MY_FUNC" - that thing just happens to be "1, 2". This contains a comma but that makes no difference to the macro at all. The code generated by the pre-processor for this macro, by replacing "MY_FUNC(%0)" with the contents of "%0", is thus:

Code:
main()
{
    printf("%d", 1, 2);
}
That is perfectly valid code (though the 2 will not get displayed by the printf function).

If parameters are not comma separated, how can you have more than one? That's not what I said - parameters are separated by what you tell them to be separated by:

Code:
#define MULTIPLY_TWO_NUMBERS(%0,%1)     %0 * %1
That code will search for "MULTIPLY_TWO_NUMBERS(", followed by anything not a comma, followed by a comma, followed by anything not a closing bracket, followed by a closing bracket:

Code:
printf("%d", MULTIPLY_TWO_NUMBERS(6, 7));
That code will match the "MULTIPLY_TWO_NUMBERS" macro (the space is allowed here, it's ONLY not allowed in the declaration) and will generate the following code:

Code:
printf("%d", 6 * 7);
However, a comma is not a closing bracket ("," is clearly not ")"), so this is also valid:

Code:
printf("%d", MULTIPLY_TWO_NUMBERS(6, 7, 8));
In this case the contents of "%0" (anything not a comma) is "6" and the contents of "%1" (anything not a closing bracket) is "7, 8". This will generate the following code:

Code:
printf("%d", 6 * 7, 8);
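To make "separated by what you tell them to be separated by" concrete, here is a small sketch that uses a colon instead of a comma between the two parameters. "ADD_PAIR" is a hypothetical name invented for this example, and the include assumes a SA-MP setup where "printf" comes from a_samp.

Code:
#include <a_samp>

// Hypothetical macro: the separator in the search string is ":" not ",".
#define ADD_PAIR(%0:%1)                 %0 + %1

main()
{
    // "%0" is everything up to the colon, "%1" everything up to ")".
    printf("%d", ADD_PAIR(6:7));
}
Following the matching rules described above, the pre-processor should turn that call into "printf("%d", 6 + 7);". A comma inside the brackets would no longer separate anything for this macro, because the search string never mentions one.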
Brackets

If the parameters are so flexible, how can you actually control what gets output? All the macros so far have been very bad - they didn't use brackets. Compare the following two macros:

Code:
// Without brackets (first).
#define MULTIPLY_TWO_A(%0,%1)           %0 * %1

// With brackets (second).
#define MULTIPLY_TWO_B(%0,%1)           ((%0) * (%1))

main()
{
    // Two with first.
    printf("%d", MULTIPLY_TWO_A(6, 7));
    
    // Two with second.
    printf("%d", MULTIPLY_TWO_B(6, 7));
    
    // Three with first.
    printf("%d", MULTIPLY_TWO_A(6, 7, 8));
    
    // Three with second.
    printf("%d", MULTIPLY_TWO_B(6, 7, 8));
}
The pre-processor will generate the following bits of code:

Code:
main()
{
    // VALID
    printf("%d", 6 * 7);
    
    // VALID
    printf("%d", ((6) * (7)));
    
    // VALID
    printf("%d", 6 * 7, 8);
    
    // INVALID!
    printf("%d", ((6) * (7, 8)));
}
The final piece of code shows the important difference - now that we've added the brackets the generated code is wrong (the syntax is wrong). The code is trying to multiply "6" by "7, 8" - which is not right, so the user will get an error and know that they have done something wrong.

The other use of brackets is in getting operator precedence correct. This is the order in which operations are done - "4 + 5 * 6" gives "34", not "54". Because * is stronger than + (binds more tightly) it is done first (imagine brackets around the "5 * 6" part) so "4 + 5 * 6" becomes "4 + 30" becomes "34". If the operators were done in order you would get "4 + 5 * 6" becomes "9 * 6" becomes "54".

Imagine the following code:

Code:
#define ADD_TWO(%0,%1)                  %0 + %1

main()
{
    printf("%d", ADD_TWO(3, 3) * 7);
}
3 plus 3 is 6, 6 times 7 is 42 right? Wrong! Look at the pre-processed code:

Code:
main()
{
    printf("%d", 3 + 3 * 7);
}
We already know * happens first, so we get "3 + 3 * 7" becomes "3 + 21" becomes "24" - not the correct answer. * happens before +, but brackets happen before *, so we can get the right answer by doing:

Code:
#define ADD_TWO(%0,%1)                  (%0 + %1)
It gets better though:

Code:
#define MUL_TWO(%0,%1)                  (%0 * %1)

main()
{
    printf("%d", MUL_TWO(3 + 3, 7));
}
We've added brackets here so this should give the right answer (42) now? No - this is still wrong! Again consider the generated code:

Code:
main()
{
    printf("%d", (3 + 3 * 7));
}
Yes the calculation is now in brackets, but so is the addition so again we get 24. This is the reason most macros will have the whole sum and all the parameters in brackets. The final version will always give the expected result:

Code:
#define MUL_TWO(%0,%1)                  ((%0) * (%1))
REMEMBER: Put macro parameters in brackets and put the whole macro in brackets (we will see later when this is NOT the case - but these are exceptions, not the norm).
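
The same rule can be checked side by side with a different operation. The two macro names below are hypothetical and exist only for this comparison; the include assumes a SA-MP setup where "printf" comes from a_samp.

Code:
#include <a_samp>

// Hypothetical macros: one without brackets, one fully bracketed.
#define SECONDS_BAD(%0)                 %0 * 1000
#define SECONDS_GOOD(%0)                ((%0) * 1000)

main()
{
    // Generated code: 2 + 3 * 1000
    printf("%d", SECONDS_BAD(2 + 3));

    // Generated code: ((2 + 3) * 1000)
    printf("%d", SECONDS_GOOD(2 + 3));
}
The first printf outputs "3002" because the multiplication binds more tightly than the addition; the second outputs the intended "5000".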

Multiple Lines

A macro can span multiple lines using "\" to denote continuation - that is, if a line of a macro has that at the end it continues on the next line. This continuation cannot be in the search parameter for the same reason as spaces can't be. Note also that the convention in this tutorial series for line continuations is to have them at column 80:

Code:
#define MUL_TWO(%0,%1)                                                          \
    ((%0) * (%1))
Code:
#define MUL_TWO(%0,%1)                                                          \
    (                                                                           \
        (%0)                                                                    \
        *                                                                       \
        (%1)                                                                    \
    )
Code:
#define MUL_TWO(%0,%1)                                                          \
    (                                                                           \
        (                                                                       \
            %0                                                                  \
        )                                                                       \
        *                                                                       \
        (                                                                       \
            %1                                                                  \
        )                                                                       \
    )
The line continuation operator only goes on lines when there is something on the next line - the last line does not have one.

One Pitfall

There is one very important issue to consider when using macros instead of functions:

Function version:

Code:
PrintSquare(var)
{
    printf("%d", var * var);
}

main()
{
    new
        var = 2;
    PrintSquare(var++);
    printf("%d", var);
}
This will output:

Code:
4
3
Define version:

Code:
#define PRINT_SQUARE(%0)                printf("%d", (%0) * (%0))

main()
{
    new
        var = 2;
    PRINT_SQUARE(var++);
    printf("%d", var);
}
This MAY output:

Code:
4
4
Or it MAY output:

Code:
6
4
This is because the parameter passed to the macro is incremented, so the increment is included in the output code:

Code:
main()
{
    new
        var = 2;
    printf("%d", (var++) * (var++));
    printf("%d", var);
}
This means that by the second printf "var" has been incremented TWICE - which is wrong and not what happened in the function version.

The order of evaluation for the postfix increment operator ("var++") is any time after the variable is used, so there are two possible execution orders:

Code:
temp1 = var;
temp2 = var;
var   = var + 1;
var   = var + 1;
printf("%d", temp1 * temp2);
OR:


Code:
temp1 = var;
var   = var + 1;
temp2 = var;
var   = var + 1;
printf("%d", temp1 * temp2);
EITHER is technically valid - in both cases the increment is done after the variable is used, and the only difference is how long after. It is likely that a given compiler will always do just one of these, but it could validly do the other. This is why the first output is either "4" or "6".

Be VERY careful when using macros with parameters which modify variables - this is why macros are often in all upper case letters, so that users know that it's a macro and can be careful with what parameters they pass.
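
One way to stay safe, sketched here with the same "PRINT_SQUARE" macro as above, is to pass only a plain variable to the macro and do the modification as a separate statement. The include is an assumption (a SA-MP setup where "printf" comes from a_samp).

Code:
#include <a_samp>

#define PRINT_SQUARE(%0)                printf("%d", (%0) * (%0))

main()
{
    new
        var = 2;
    // The macro parameter has no side effects, so repeating it is harmless.
    PRINT_SQUARE(var);
    // The increment now happens exactly once, after the macro.
    var++;
    printf("%d", var);
}
The generated code is "printf("%d", (var) * (var));" followed by a single "var++;", so this outputs "4" then "3", the same as the function version.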
#2

Nice. I was just thinking about the 2 threads that really hurt a lot.

This and code optimizations thread.
#3

Misiur if you are going to re-release all of Y_Less's stuff make sure that you keep a detailed table of contents in another thread.
#4

Yeah, though currently working against clock, as wayback machine lacks a lot of pages, and ****** cache can be purged any minute.
#5

Also the "part" redirection links need editing.
#6

Links dead, are gone forever?
#7

Yup, sorry, ****** didn't cache them so couldn't recover them.