Code optimisations
#1

Code optimisation


Contents
  • Contents
  • Introduction
  • String optimisation
  • State machines (automata)
  • Callback hooks
  • Interesting Macros
  • Testing
    • foreach
  • Minor optimisations
    • IEEE numbers
    • Rearranging
    • Data rearrangements
    • Know the language
    • Distance checks
    • Speed order
    • Know your values
    • Return values
    • Small snippets
      • Equivalence
      • Empty strings
      • Copying strings
  • Assumptions
    • Introduction
    • Solutions
      • Ignore it
      • Make modifications easy
      • Code defensively
    • Important
  • Memory reduction
    • All vehicles
    • All attributes
    • More that 32 values
    • Excess dimensions
  • CPU vs Memory
  • Lists
    • Types
    • Mixed lists
    • Code
  • Binary trees
    • Example
    • Balanced and unbalanced
    • Modification
      • Addition
      • Deletion
Introduction

These are just some techniques for making your code faster that I have picked up up on the way. Please note that I in no way pretend to be an authority on this subject, these are just what I know, others may know other methods, in which case please do share them as even if no-one else cares I would be interested in knowing them. Note also that many of these techniques apply to languages beside PAWN (the last one I wrote was accused of being stupid because PAWN does not have dynamic memory allocation (personally I think that's a good thing but there we go)), however a lot of other languages may incorporate bits of these into the compiler to do in-line optimisation.

This does not promise to make your code good, just hopefully better. Also note that some bits are fairly complex, it's aimed as a more advanced tutorial, so it does make some assumptions about knowledge in some areas.

String optimisation

For those of you who haven't read my thread on better string usage, it's here:

Why you shouldn't make your strings 256 cells big

State machines (automata)

For those of you who haven't read my thread on state machines (automata), it's here:

State machines (automata)

Callback hooks

For those of you who haven't read my thread on callback hooking for libraries, it's here:

Simpler library writing and usage

Interesting Macros

This topic gives some very interesting uses for macros to make complex code simpler:

Interesting macros

Testing

Firstly I'm going to explain my testing procedure. If I have two pieces of code which do the same thing but in different ways and I want to know which is faster I clock them:

pawn Код:
#define CODE_1 printf("%d", 42);
#define CODE_2 new str[4]; format(str, sizeof (str), "%d", 42); print(str);
#define ITERATIONS (10000)

Test()
{
    new
        t0,
        t1,
        t2,
        i;
    t0 = GetTickCount();
    for (i = 0; i < ITERATIONS; i++)
    {
        CODE_1
    }
    t1 = GetTickCount();
    for (i = 0; i < ITERATIONS; i++)
    {
        CODE_2
    }
    t2 = GetTickCount();
    printf("Time 1: %04d, time 2: %04d", t1 - t0, t2 - t1);
}
Clearly both the pieces of code will display the number "42" in the server console, however they both do it in different ways. Hopefully no-one will need to run this code to know which method will be faster, but it's a good simple example of testing to see which of two equivalent pieces of code is faster. The ITERATIONS loop is important, in all likelihood both of these pieces of code will take less than a millisecond each, so both will report their time taken as zero. Also, if you only do it once threading becomes a major issue, if one version is interrupted by the OS it can report itself as taking substantially longer when in fact it's faster. If both are done lots and lots of times then interrupts will hopefully negate each other and each loop will take more than one millisecond. The layout of the code is also important, all the variables are declared first to move their overhead outside of the loop (their execution time is so small it likely won't affect the outcome anyway, but just for consistency it's good).

It's also sometimes good to wrap the Test function in another loop, especially if the results are very close, to verify the results. Execution times can vary slightly due to threads, this is visible if you run multiple tests in the form of maybe a few milliseconds variation on each time. If you run close results repeatedly then you can check than one is consistently faster rather than faster just the once, as that may have been a fluke and 90% of the time the other is faster, just not that one time.

Sometimes you may need more advanced test code, e.g. to test more than two equivalents, this is fairly easy to expand:

pawn Код:
#define CODE_1 printf("%d", 42);
#define CODE_2 new str[4]; format(str, sizeof (str), "%d", 42); print(str);
#define CODE_3 print("42");
#define ITERATIONS (10000)

Test()
{
    new
        t0,
        t1,
        t2,
        t3,
        i;
    t0 = GetTickCount();
    for (i = 0; i < ITERATIONS; i++)
    {
        CODE_1
    }
    t1 = GetTickCount();
    for (i = 0; i < ITERATIONS; i++)
    {
        CODE_2
    }
    t2 = GetTickCount();
    for (i = 0; i < ITERATIONS; i++)
    {
        CODE_3
    }
    t3 = GetTickCount();
    printf("Time 1: %04d, time 2: %04d, time 3: %04d", t1 - t0, t2 - t1, t3 - t2);
}
Etcetera...
  • foreach
I recently clocked my foreach function against the default for/IsPlayerConnected (IPC) code. foreach uses a linked list of players so when you loop it ONLY loops through connected players, compared to IPC code which loops through ALL players and checks if they're connected. I knew that on a large server with not many players it was faster, but I wasn't sure about more full servers, so I clocked it. I didn't have 200 players to test with but I know that IsPlayerConnected takes pretty much the same time to run whether it returns true or false (it basically just returns a variable in the server which could be either), so this wasn't a problem as for a given number of players IPC would run at a constant speed regardless of whether they were on or not. foreach runs very differently depending on how many players are connected, taking next to no time with no players and a lot longer for a full server, I just wanted to know if this longest execution time was longer than the constant time IPC ran at. I actually wanted to profile it at all player counts, this meant faking player connections in foreach, which wasn't hard as it's my code and you just call a connect function. The test code for this ended up looking like:

pawn Код:
#define FAKE_MAX 200
#define SKIP 0
Iter_Create(P2, FAKE_MAX);
 
TestFunc()
{
    new
        fep = 0,
        fet = 0,
        fip = 0,
        fit = 0,
        i = 0;
    while (i < SKIP)
    {
        Itter_Add(P2, i++);
    }
    while (i <= FAKE_MAX)
    {
        new
            t0,
            t1,
            t2,
            j;
        t0 = GetTickCount();
        for (j = 0; j < 10000; j++)
        {
            for (new playerid = 0; playerid < FAKE_MAX; playerid++)
            {
                if (IsPlayerConnected(playerid))
                {
                    // Do something
                }
            }
        }
        t1 = GetTickCount();
        for (j = 0; j < 10000; j++)
        {
            foreach(P2, playerid)
            {
                // Do something
            }
        }
        t2 = GetTickCount();
        printf("Players: %04d, for: %04d, foreach: %04d", i, t1 - t0, t2 - t1);
        fit = fit + t1 - t0;
        fet = fet + t2 - t1;
        fip += FAKE_MAX;
        fep += i;
        if (i < FAKE_MAX)
        {
            Itter_Add(P2, i);
        }
        i++;
    }
    printf("for ms/p: %04d, foreach ms/p: %04d", (fit * 100) / fip, (fet * 100) / fep);
}
This ran the code 201 times, one for each player count (0-200) (with both foreach and IPC it doesn't matter WHICH players are connected in terms of speed). It also allowed me to fake more players, e.g. to test how this code would run on a 0.3 server with 500 players, IPC is actually slightly faster if the player you're testing doesn't exist, and it was STILL slower, so that was fairly conclusive. The one test I didn't run was:

pawn Код:
t0 = GetTickCount();
        for (j = 0; j < 10000; j++)
        {
            for (new playerid = 0; playerid < FAKE_MAX; playerid++)
            {
                Kick(playerid);
            }
        }
        t1 = GetTickCount();
        for (j = 0; j < 10000; j++)
        {
            foreach(P2, playerid)
            {
                Kick(playerid);
            }
        }
        t2 = GetTickCount();
Kick, and in fact all player functions, has an internal IsPlayerConnected check, so if you're only running one function in a loop it's more efficient to NOT call IsPlayerConnected and just call the function direct. If the player is connected you've saved a function call, if they're not connected you've not lost anything as the only code that's been executed is the same as if you called IsPlayerConnected. Unfortunately this example may affect the speed of foreach at high player counts as you will be using the foreach code AND the IPC code in the same loop. If you did:

pawn Код:
t0 = GetTickCount();
        for (j = 0; j < 10000; j++)
        {
            for (new playerid = 0; playerid < FAKE_MAX; playerid++)
            {
                if (IsPlayerConnected(playerid))
                {
                    Kick(playerid);
                }
            }
        }
        t1 = GetTickCount();
        for (j = 0; j < 10000; j++)
        {
            foreach(P2, playerid)
            {
                Kick(playerid);
            }
        }
        t2 = GetTickCount();
Then foreach would be faster, but that's not the most efficient way of doing it in the first instance.

I have actually looked into modifying a compiler so you can do something like:

Код:
eqiv
{
    {
        for (new playerid = 0; playerid < FAKE_MAX; playerid++)
        {
            if (IsPlayerConnected(playerid))
            {
                Kick(playerid);
            }
        }
    }
    {
        foreach(P2, playerid)
        {
            Kick(playerid);
        }
    }
}
Which will compile and optimise both versions of the code, then accurately clock both based on generated code and known OpCode clock cycles to see which is faster. However there are all sorts of problems involved regarding test sets and I've not done it yet so it's really a moot point (truthfully there's a number of improvements I'd like to make to various language compilers, I've just not yet).

Minor optimisations
  • IEEE numbers
There is a float representation for positive and negative infinity, and one for invalid numbers. I've seen people try all sorts of representations for large or invalid float numbers, take these examples:

pawn Код:
if (!IsPlayerConnected(playerid))
{
    return -1.0;
}
return GetDistance(playerid);
pawn Код:
new
    bool:first = true,
    Float:distance = 0.0;
foreach (Player, playerid)
{
    if (first)
    {
        first = false;
        distance = GetDistance(playerid);
    }
    else
    {
        new
            Float:temp = GetDistance(playerid);
        if (temp < distance)
        {
            distance = temp;
        }
    }
}
The first code could do anything, with a distance of -1.0 meaning the player isn't connected, so retuning an invalid distance. The second code finds the closet player to something (exact details aren't important). The second piece of code can be optimised by choosing a very large start number instead of the "first" variable:

pawn Код:
new
    Float:distance = 100000.0;
foreach (Player, playerid)
{
    new
        Float:temp = GetDistance(playerid);
    if (temp < distance)
    {
        distance = temp;
    }
}
But that misses the rare case when a player is over 100000 units away (note that in actual fact you would use squared values here, as detailed below, but this is for example only). Both of these examples have well defined solutions:

pawn Код:
#define FLOAT_INFINITY   (Float:0x7F800000)
#define FLOAT_NEG_INFINITY (Float:0xFF800000)
#define FLOAT_NAN     (Float:0xFFFFFFFF)
This would make the code above:

pawn Код:
new
    Float:distance = FLOAT_INFINITY;
foreach (Player, playerid)
{
    new
        Float:temp = GetDistance(playerid);
    if (temp < distance)
    {
        distance = temp;
    }
}
pawn Код:
if (!IsPlayerConnected(playerid))
{
    return FLOAT_NAN;
}
return GetDistance(playerid);
"NaN" is a very special number - comparing it to any other number (including itself) will return false. To check for NaN you would do:

pawn Код:
stock IsNaN(number)
{
    return !(number <= 0 || number > 0);
}
That function returns false if the number passed is less than, equal to, or greater than 0 - all numbers, including + and - infinity, match those criteria - NaN DOESN'T because it's not a number, so doesn't have a value. Using these values guarantee (they're defined in the IEEE floating point number spec) that you will never use numbers which could be confused with real values. Due to the unique properties of NaN this very odd looking code should also work:

pawn Код:
stock IsNaN(number)
{
    return (number != number);
}
DO NOT TRY:

pawn Код:
if (number == FLOAT_NAN)
That code will fail because, as previously mentioned, NaN is not equal even to itself.

Let's look at this when you want the distance to see if they're in range of something:

pawn Код:
if (DistanceWithConnectionCheck(playerid) < 100.0)
{
    // They're in range and connected
}
Returning infinity will mean the check fails, as will returning NaN (it's not less than, greater than or equal to 10) - compare this to the extra code you need here to check for "-1.0":

pawn Код:
new
    Float:distance = DistanceWithConnectionCheck(playerid);
if (distance == -1.0)
{
    // They're not connected
}
else if (distance < 100.0)
{
    // They're in range and connected
    // -1.0 is less than 100, so we need to check for that specially
}
See the YSI object streamer for a real usage.
  • Rearranging
The compiler can do constant maths, this means if you do:

pawn Код:
printf("%d", 4 + 5);
The compiler will do:

pawn Код:
printf("%d", 9);
It won't bother putting in code to do the maths as there's no point - it'll always be the same result. Often a simple formula rearrangement can help your code:

pawn Код:
new
    var = (4 + somevar) - 11;
That will compile to do two bits of maths, first to add 4 to a number, then to subtract 11 from the result (some compilers may actually be able to optimise this in the way I'm about to describe, but it's a simple example). If you rearrange this sum you get:

pawn Код:
new
    var = somevar + (4 - 11);
The compiler can very quickly optimise this to:

pawn Код:
new
    var = somevar - 7;
A more complex example with no compiler optimisations:

pawn Код:
new
    gLastTime[MAX_PLAYERS];
 
#define EXPIRY 1000
pawn Код:
new
    time = GetTickCount();
foreach (Player, playerid)
{
    if (time - gLastTime[playerid] > EXPIRY)
    {
        SendClientMessage(playerid, 0xFF0000AA, "Your time expired");
    }
}
That's a basic example which detects when a certain time has passed since a player last did something. No options for optimisation there you may think, and you may be right, but it's always better to try. The equation here is:

Код:
time - gLastTime[playerid] > EXPIRY
=, ==, >= etc can all be rearranged in the same way, so the above equation is the same as:

Код:
time > EXPIRY + gLastTime[playerid]
More importantly, it is also the same as:

Код:
time - EXPIRY > gLastTime[playerid]
Why is this important? In terms of the loop "time - EXPIRY" is now a constant as neither change in the loop. This means you can do:

pawn Код:
new
    time = GetTickCount() - EXPIRY;
foreach (Player, playerid)
{
    if (time > gLastTime[playerid])
    {
        SendClientMessage(playerid, 0xFF0000AA, "Your time expired");
    }
}
You have just cut out up to 200 repeated subtractions with basically no effort. The more constant, or pseudo-constant, elements you can get in a sum the better, especially when you're doing lots of them. If you were only checking one player I'd be tempted to put everything, including the GetTickCount() call, in the if statement, but not if you're doing it multiple times.
  • Data rearrangements
Another thing to consider is how your data is laid out and how you want to access it. Take the following data for example:

pawn Код:
#define MAX_OWNDED_VEHICLES 10

new gVehicleOwner[MAX_OWNDED_VEHICLES] = {0, 2, 4, 6, 8, 10, 12, 14, 16, 18};
Here you have 10 vehicles, each with a player who owns them (assume for this example one player can only own up to one vehicle). If you want to find out who owns a vehicle you can simply do:

pawn Код:
printf("The owner of vehicle %d is %d", vehicleid, gVehicleOwner[vehicleid]);
But what if you want to find out which vehicle a player owns? For that you would need to do something like:

pawn Код:
new i = 0;
while (i < MAX_OWNDED_VEHICLES)
{
    if (gVehicleOwner[i] == playerid)
    {
        printf("Player %d owns vehicle %d", playerid, i);
        break;
    }
    i++;
}
if (i == MAX_OWNDED_VEHICLES)
    printf("Player %d does not own a vehicle", playerid);
Now lets look at it a different way round:

pawn Код:
#define MAX_PLAYERS 20

new gPlayerVehicle[MAX_PLAYERS] = {0, -1, 1, -1, 2, -1, 3, -1, 4, -1, 5, -1, 6, -1, 7, -1, 8, -1, 9, -1};
Now if you want to find out which vehicle a player owns it's just a simple array lookup, but if you want to see who owns a vehicle it's a loop. The question you need to consider here is which do you want to know more? If you don't care who owns a given vehicle but do care which vehicle someone owns then use the second layout, and vice-versa the first. If you use both a lot you may actually want to consider mirroring the data in two different arrays. This is a trade-off between speed and memory, and is a VERY common trade-off you find people dealing with. Personally I think speed is more important than memory in modern 32 and 64 bit processors so would use two arrays, but you may disagree given the multiple gigahertz that they run at so would use one array and a loop.

Let's look at a far more clean-cut example that was actually in a topic I read recently. This example is VERY cut down, I'm only using 10 models here:

pawn Код:
new
    gCars[] = {400, 403, 404, 406, 408, 409},
    gHeavyVehicles[] = {400, 402, 408},
    gBoats[] = {401, 407},
    gFireEngines[] = {402, 405};
If you want to know if a model is a car you need to loop through "gCar" till you find that model or you reach the end. On the other hand with this code it's very easy to tell what model a car at a given position is, but this means nothing as is data you are unlikely to ever want to know. So the question is; why is it easy to get data you don't want and hard to get data you do want? That makes no sense at all... We know that for a given model we want to know if it's a car, so we need to change the code to use the model as the array index (offsetting by 400), same as we did above to find what vehicle a player owns:

pawn Код:
new
    gIsACar[] = {1, 0, 0, 1, 1, 0, 1, 0, 1, 1},
    gIsAHeavyVehicle[] = {1, 0, 1, 0, 0, 0, 0, 0, 1, 0},
    gIsABoat[] = {0, 1, 0, 0, 0, 0, 0, 1, 0, 0},
    gIsAFireEngine[] = {0, 0, 1, 0, 0, 1, 0, 0, 0, 0};
Now if you want to find out if a model is a car or not you simply do:

pawn Код:
if (gIsACar[model - 400])
Isn't that SO much simpler and faster than a loop?

Using an entire cell to store a boolean value (1 or 0) is also very inefficient, but we'll cover that later.
  • Know the language
And I mean WELL. As someone reminded me the other day (I had commented on it on IRC) a little while ago I found out that:

pawn Код:
if (2 <= a <= 4)
Works in PAWN (it doesn't in C), I had thought it was like in C, so had been doing:

pawn Код:
if (2 <= a && a <= 4)
Not a vast improvement, but most of these aren't, it's the combined and repeated effect that's important.

If, for example, you didn't know about the "&" operator and you wanted to test if the second bit of a number was set you would need to do something like:

pawn Код:
if ((a << 30) >>> 31)
or:

pawn Код:
if ((a % 4) >>> 1)
Both those pieces of code would ensure that only one bit of a number was set, and would do what you wanted, but it's far better to know about "&":

pawn Код:
if (a & 2)
It's clearly faster (shifts aren't too bad, but MOD is VERY slow, out of the other two versions always go for the first if you have to go for one of them), it's also a lot more obvious what you're trying to do. If you don't know what it does - go read pawn-lang.pdf.

This is clearly a very basic example but there are many many other examples. People tend to baulk when I tell them to read pawn-lang.pdf then wonder why I seem to know more about PAWN than they do, 2 + 2 = ... There's a reason I have it bookmarked, and it's not so I can quickly copy the link to post for people (although it is handy for that too).

As my example at the start of this section shows, there's always something new for you to learn, no matter how much you may already know. I can think of two huge parts of PAWN that I don't know at all and the fact that I don't know of any other areas just means I don't know them yet, it doesn't mean they don't exist (it's hard to know what you don't know).
  • Distance checks
0.3 now supports distance checks automatically, just use:

pawn Код:
IsPlayerInRangeOfPoint(playerid, Float:range, Float:x, Float:y, Float:z);
This is a very common thing for people do, and a very common thing for people to do wrong. Example:

pawn Код:
if (PlayerDistanceToPoint(playerid, 10.0, 20.0, 2.0) <= 5.0)
{
    // They're near 10.0, 20.0, 2.0 - do something
}
That code will work, will trigger when the player is in 5.0 units of a point, but getting the distance to a point is very slow. The equation to get the distance between two points (x1, y1, z1 and x2, y2, z2) is:

Код:
(((x1 - x2) ^ 2) + ((y1 - y2) ^ 2) + ((z1 - z2) ^ 2)) ^ 0.5
"^" in this case means power, not XOR, "^ 0.5" is square root (trust me - try it on a calculator). The most common implementation of this is:

pawn Код:
floatsqroot(floatadd(floatadd(floatpower(floatsub(x1, x2), 2), floatpower(floatsub(y1, y2), 2)), floatpower(floatsub(z1, z2), 2)));
Don't ask me why whoever originally wrote it didn't bother using standard operators, I have no idea, but simplified this code is:

pawn Код:
floatsqroot(floatpower(x1 - x2, 2) + floatpower(y1 - y2, 2) + floatpower(z1 - z2, 2));
Now, the first thing to note is that the code to raise something to a generic power is complicated, it doesn't optimise for simple ones like 2 it just uses the basic algorithm. We know that something to the power 2 is just that thing multiplied by itself (or you should do). 3^2 is the same as 3*3, 57^2 is the same as 57*57 etc. Multiplication is much simpler that power, so the code becomes:

pawn Код:
floatsqroot(((x1 - x2) * (x1 - x2)) + ((y1 - y2) * (y1 - y2)) + ((z1 - z2) * (z1 - z2)));
You could remove some brackets, but there's no point, this is just as fast as the reduced bracket version and more explicit. There is now more code here, but it's much faster. Now the quest for optimisation gets more interesting. We do each of the subtractions twice, you could export these to variables to only do them once, but then you've got additional variable writes which may not offset the gain in speed:

pawn Код:
x1 -= x2;
y1 -= y2;
z1 -= z2;
floatsqroot((x1 * x1) + (y1 * y1) + (z1 * z1));
We now have our nice efficient code for getting someone's distance from something, but there's still one major problem - all the optimisations done so far are nothing compared to floatsqroot, that is an insanely inefficient function (well, it's not inefficient, it's actually very efficient, but that doesn't make it fast because it's so complicated). Believe it or not most of the time you don't actually need to know exactly how far from the point someone is, just whether they're near the point or not. Now you should have read the part on rearranging (if not, go read it again), so let's apply that here:

Код:
((x * x) + (y * y) + (z * z)) ^ 0.5 <= 5.0
You can rearrange inequalities in exactly the same way as regular equations:

Код:
((x * x) + (y * y) + (z * z)) ^ 0.5 <= 5.0
(x * x) + (y * y) + (z * z) <= 5.0 ^ 2
(x * x) + (y * y) + (z * z) <= 5.0 * 5.0
Anyone familiar with maths should be able to vouch for that very simple rearrangement. We know how to quickly square something (as I just told you), and we know that square-rooting is very slow, so that's a vast improvement:

pawn Код:
if (PlayerDistanceToPointSquared(playerid, 10.0, 20.0, 2.0) <= 5.0 * 5.0)
{
    // They're near 10.0, 20.0, 2.0 - do something
}
Or:

pawn Код:
if (IsPlayerInRangeOfPoint(playerid, 5.0, 10.0, 20.0, 2.0))
{
    // They're near 10.0, 20.0, 2.0 - do something
}
pawn Код:
stock IsPlayerInRangeOfPoint(playerid, Float:range, Float:x, Float:y, Float:z)
{
    new
        Float:px,
        Float:py,
        Float:pz;
    GetPlayerPos(playerid, px, py, pz);
    px -= x;
    py -= y;
    pz -= z;
    return ((px * px) + (py * py) + (pz * pz)) < (range * range);
}
You can write your own code of course, but for reasons I can't go into I STRONGLY suggest you use that function name and parameter order.

Update:

I've found the original timing analysis I did on this area and the results are not what some people would expect. I compared a whole load of different distance analysis functions, including less accurate ones people use for speed, for example:

pawn Код:
Type1(Float:x1, Float:y1, Float:z1, Float:x2, Float:y2, Float:z2, Float:dist)
{
    x1 = (x1 > x2) ? x1 - x2 : x2 - x1;
    if (x1 > dist) return false;
    y1 = (y1 > y2) ? y1 - y2 : y2 - y1;
    if (y1 > dist) return false;
    z1 = (z1 > z2) ? z1 - z2 : z2 - z1;
    if (z1 > dist) return false;
    return true;
}
The results were that even these "faster" implementations were slower than the implementation outlined above, AND less accurate.

The results I got were:

Код:
1703 1781 1594 1641 2265 1782 2281 1891
1703 is the time for the "faster" version, 1594 was the time for my version. Conclusion - don't try to optimise distance checks by making them worse - the originals are both faster and more accurate.

Note that running the linked code will produce 9 values due to a bug - just ignore the last value (a fatal mistake I made).

My analysis code can be found here.
  • Speed order
Different language features take different times to execute, in general the order is (from fastest to slowest):
  • Nothing
  • Constants
  • Variables
  • Arrays
  • Native functions
  • Custom functions
  • Remote functions
So for example:

pawn Код:
for (new i = 0; i < MAX_PLAYERS; i++)
Is faster than:

pawn Код:
for (new i = 0, j = GetMaxPlayers(); i < j; i++)
As the main part of the loop in the first uses a constant, whereas the main part in the second uses a variable (the overhead of a single function call in a loop is negligible compared to the repeated check). This second version is itself faster than:

pawn Код:
for (new i = 0; i < GetMaxPlayers(); i++)
As this third version uses a repeated function call rather than a variable or constant.

I'm not sure where control structures fit in to the list, for example I'm not sure which of these is faster:

pawn Код:
new var = a ? 0 : 1;
printf("%d", var);
printf("%d", var);
printf("%d", var);
printf("%d", var);
printf("%d", var);
pawn Код:
printf("%d", a ? 0 : 1);
printf("%d", a ? 0 : 1);
printf("%d", a ? 0 : 1);
printf("%d", a ? 0 : 1);
printf("%d", a ? 0 : 1);
I suspect the first, unless you only have one print, in which case definitely the second, but again there are for more complex examples where it's less clear. This requires clocking but I've not done it yet (and there are loads of control structures, so I just apply my general rule (see below) to these).

So why is "nothing" in the list? Consider the following two bits of code:

pawn Код:
new var = random(10);
printf("%d", var);
pawn Код:
printf("%d", random(10));
Clearly the second is faster as there's no intermediary step. In actual fact most compilers will optimise out the variable but not all do.

Where the line starts getting blurry is with repeated calls. For example, which of these is faster:

pawn Код:
new var = gArr[10];
printf("%d %d", var, var);
pawn Код:
printf("%d %d", gArr[10], gArr[10]);
The first one has one array access, one variable write and two variable reads, the second has two array accesses. Truthfully I'm not sure which is faster but I have a general rule:

More than one function call - save it in a variable, more than two array reads save it in a variable, so for the above code I would probably use the second version, however a three element print would require three array accesses, which is more than two, thus I would use an intermediary variable:

pawn Код:
new var = gArr[10];
printf("%d %d %d", var, var, var);
And I never call the same function more than once when I don't have to (especially as this can have unintended results if the function has changing returns or side effects).
  • Know your values
Another common bit of code (I'm sure most of you will recognise it) is this:

pawn Код:
if (killerid == INVALID_PLAYER_ID)
    SendDeathMessage(INVALID_PLAYER_ID, playerid, reason);
else
    SendDeathMessage(killerid, playerid, reason);
Lets look at another example of similar code to try make it more obvious what exactly this is doing and why it's stupid:

pawn Код:
if (var == 1)
    printf("%d", 1);
else
    printf("%d", var);
If var is one this prints '1', if var isn't one this prints the value of var, either way the value of var gets printed, so what does the if do? This can just as easily be written:

pawn Код:
printf("%d", var);
For the same reason, this does the same as the code above:

pawn Код:
SendDeathMessage(killerid, playerid, reason);
The original code comes from the 0.2 version of LVDM, but in there other things were done too, making the check not pointless, but people took this, removed the other bits and didn't think about what the code was actually now doing. It doesn't matter if killerid isn't valid as as shown above INVALID_PLAYER_ID is a perfectly acceptable input to SendDeathMessage.
  • Return values
Contrary to what the wiki says, many function's return values are important as they indicate whether a (native) function succeeded or not. This can be utilised as we know from before that variables are faster than function calls, and doing things once is faster that doing things twice. An example:

pawn Код:
new Float:health;
for (new i = 0; i < MAX_PLAYERS: i++)
    if (IsPlayerConnected(i))
    {
        GetPlayerHealth(i, health);
        SetPlayerHealth(i, health + 10.0);
    }
Pretty self explanatory and typical code, but if we understand error returns this can be optimised. Due to bugs in 0.1 all player and vehicle functions now have checks to make sure the thing you’re operating on actually exists, so if you did GetPlayerHealth on a player that didn’t exist the function would fail and the health variable would have the same value as before. More importantly pretty much all functions without an important return value return 0 on failure and 1 on success. Internally the player connection check in GetPlayerHealth is more or less identical to the one in IsPlayerConnected, so we’re checking if a player is connected twice. If they’re not connected GetPlayerHealth will end instantly, so instead of the above code we can do:

pawn Код:
new Float:health;
for (new i = 0; i < MAX_PLAYERS: i++)
    if (GetPlayerHealth(i, health))
        SetPlayerHealth(i, health + 10.0);
This is no slower for players not connected and faster for players who are, so worst case you get no improvement, best case loads.

This can also be applied to other functions, even if they have return values:

pawn Код:
if (IsPlayerInAnyVehicle(playerid))
{
    new vehicleid = GetPlayerVehicleID(playerid);
    SetVehiclePos(vehicleid, 0.0, 0.0, 10.0);
}
Again, an example of very common code, but again we need to ask what do these functions actually return? GetPlayerVehicleID returns the ID of the vehicle the player is in, and if they’re not in a vehicle it returns 0 (as this is an invalid vehicle ID). So, if we’re going to get the ID of the vehicle they’re in and this knows if they’re in a vehicle or not and can tell you, why check if they’re in one?

pawn Код:
new vehicleid = GetPlayerVehicleID(playerid);
if (vehicleid)
    SetVehiclePos(vehicleid, 0.0, 0.0, 10.0);
Now, instead of two function calls you only have one and you check the return of that one is valid (i.e. not 0).

One other aspect of return values is how long they exist for. If you set a variable in an if statement (which, if you’re not careful, will give an unintended assignment warning) the value you just set is still in the if statement, so if you did:

pawn Код:
new
    a = 1,
    b = 0;
if ((b = a))
(Note the double brackets to avoid the unintended assignment warning)

Then "a" would be assigned to "b", so "b" would be 1, and that 1 would still be active in the if effectively as the return of the assignment, so this if is true, however:

pawn Код:
new
    a = 0,
    b = 1;
if ((b = a))
After this code "b" will be 0, as "a" has successfully been assigned to "b", but as the result of this assignment is 0, the if fails. Just to illustrate this, figure this code out (re-read the part on strings if you need to):

pawn Код:
stock strcpy(dest[], src[])
{
    new i = 0;
    while ((dest[i] = src[i])) i++;
}
This also means you can do:

pawn Код:
new vehicleid;
for (new i = 0; i < MAX_PLAYERS: i++)
    if ((vehicleid = GetPlayerVehicleID(i)))
        SetVehiclePos(vehicleid, 0.0, 0.0, 10.0);
Example of a player name check implementation in PAWN using this:

pawn Код:
NameCheck(name[])
{
    new
        i,
        ch;
    while ((ch = name[i++]) && ((ch == ']') || (ch == '[') || (ch == '_') || ('0' <= ch <= '9') || ((ch |= 0x20) && ('a' <= ch <= 'z')))) {}
    return !ch;
}
This function will return true if the name is OK (i.e. all 0-9, a-z, A-Z, [, ] or _) and false if not. The (ch |= 0x20) is another little trick to convert a character to lower case, whatever it's previous case, based on the fact that in ASCII upper and lower case characters are exactly 0x20 apart (A = 0x40, a = 0x60).
  • Small snippets
  • Equivalence
Multiple things can mean the same thing, for example:

pawn Код:
if (string[1] == 65)
Is the same as:

pawn Код:
if (string[1] == 0x41)
Is the same as:

pawn Код:
if (string[1] == 0b01000001)
Is the same as:

pawn Код:
if (string[1] == 'A')
Although they all mean the same thing and thus won't get any speed improvements the last one is more obvious as to what you're trying to do (check if a character is 'A'). However in other circumstances one of the other two may be more appropriate, i.e. you want to see if something is 65, and the fact that that's the same as 'A' is merely coincidence.

This can be taken a bit further with zero:

pawn Код:
if (string[1] == 0)
Is the same as:

pawn Код:
if (string[1] == 0x00)
Is the same as:

pawn Код:
if (string[1] == ((27 + 3) / 5) - 6)
Is the same as:

pawn Код:
if (string[1] == 0b00000000)
Is the same as:

pawn Код:
if (string[1] == '\0')
Is the same as:

pawn Код:
if (string[1] == false)
Is the same as:

pawn Код:
if (string[1] != true)
Is the same as:

pawn Код:
if (string[1] == 0.0)
Is the same as:

pawn Код:
if (!string[1])
In fact that can be taken a lot further (floatround_round, radian, SPECIAL_ACTION_NONE, seek_start, io_read and PLAYER_STATE_NONE are all also 0).
  • Empty strings
The standard way of checking if a string is empty is:

pawn Код:
if (strlen(string) == 0)
{
    // The string is empty because it's length is 0
}
This clearly does check if a string is empty, and is fast if the string is empty, but is slower if the string isn't empty. The way strlen works is by looping through the string until it finds the end (which, as you should know from the other topic on strings, is denoted by a NULL character), once the end is hit the length is returned, so if the string isn't empty it will take a while to find the end.

As we know that the end of a string is denoted by a NULL character all we need to do to see if a string is empty is to see if the first character is the end of the string or not:

pawn Код:
if (string[0] == '\0')
{
    // The string is empty because it's first character is the end
}
This can be further improved:

pawn Код:
if (!string[0])
{
    // The string is empty because it's first character doesn't exist
}
The only place this doesn't apply is in strings passed by CallRemoteFunction and CallLocalFunction. Due to the way the PAWN VM works these strings MUST NOT have 0 length, so empty strings are passed as "\1\0" (i.e. character 1 (SOH), character 0 (NULL)). To check for this do:

pawn Код:
if (string[0] == '\1' && string[1] == '\0')
{
    // The string is empty because it's specially marked as empty
}
Or, using isnull from YSI:

pawn Код:
#define isnull(%1) \
    ((!(%1[0])) || (((%1[0]) == '\1') && (!(%1[1]))))
pawn Код:
if (isnull(string))
{
    // The string is empty because isnull said so
}
Credit to Simon for the suggestion.
  • Copying strings
A lot of people tend to copy strings like this:

pawn Код:
format(dest, sizeof (dest), "%s", src);
This is one of the worst ways to do it! I did timings on six different methods of copying strings, in all cases "b" is the destination and "a" is the source. "strcpy" is a hand written PAWN function to copy strings:

pawn Код:
strmid(b, a, 0, strlen(a), sizeof (b));

format(b, sizeof (b), "%s", a);

b[0] = '\0';
strcat(b, a, sizeof (b));

memcpy(b, a, 0, strlen(a) * 4 + 4, sizeof (b)); // Length in bytes, not cells.

strcpy(b, a);

b = a;
Note that "b = a;" is the standard PAWN array copy and only works for arrays known at compile time to be the same size, or with a larger desination. Unfortunately I ran a range of tests and they do not point to a single best function. What they DO do is show quite clearly that both the hand coded PAWN version and format are very slow at copying strings:

For short strings in small arrays, "b = a;" is fastest when applicable, strcat with prior NULL termination (important) is second.

For short strings in large arrays, strcat is fastest.

For longer strings in longer arrays, "b = a;" is again fastest, with memcpy second.

For huge arrays "b = a;" seems to be fastest.

Where possible use standard array assignment, however this is not always possible, for example when a string of unknown size is passed to a function. In these cases I would suggest using strcat (if you're interested, note the bizzare syntax):

pawn Код:
#define strcpy(%0,%1,%2) \
    strcat((%0[0] = '\0', %0), %1, %2)
Use:

pawn Код:
strcpy(dest, src, sizeof (dest));
Credit to Simon for the suggestion.

Assumptions
  • Introduction
The bottom line is don't make assumptions!
Assumptions are saying you know what something always is just because it's often that. Example:

pawn Код:
public OnGameModeInit()
{
    AddPlayerClass(167, 0.0, 0.0, 5.0, 0.0, 0, 0, 0, 0, 0, 0); // Clucking Bell employee
    AddPlayerClass(179, 0.0, 0.0, 5.0, 0.0, 0, 0, 0, 0, 0, 0); // Ammunation employee
}

public OnPlayerRequestClass(playerid, classid)
{
    switch (classid)
    {
        case 0:
        {
            GameTextForPlayer(playerid, "Clucking Bell", 5000, 3);
        }
        case 1:
        {
            GameTextForPlayer(playerid, "Ammunation", 5000, 3);
        }
    }
    return 1;
}
That's probably very familiar looking code to almost every one of you - setting up your mode's skins and doing things based on the selected one (I've omitted setting cameras here as it's irrelevant to the point). On your own private server this may be fine, you know there's only two skins, you know what order they're added in, and you know they'll never be modified - fine. But if you're going to release a mode this is VERY risky. A few problems which can arise:
  • A person using your mode also has filterscripts which add skins. This will throw off your class values.
  • A person using your mode decides they want to add skins and add them to the start of the list. This again throws off your class values.
  • A SA:MP version change alters the way IDs are assigned. Entirely altering your class IDs.
I'm sure there are more but those are the basics. The problem here is that you're using constants and assuming that they'll never change, either through mode modification or through AddPlayerClass not returning what you expect. There are a few solutions to this problem.
  • Solutions
  • Ignore it
If people want to use your mode differently to how you intended, that's their own problem and they can modify the code accordingly. I'm sure everyone here has experience with people asking why, when they add new vehicles to GF, do all the other house and job cars mess up. This is because GF, which was coded for a single server with the intention of not being modified, made assumptions about what cars there were.

This is clearly not a solution at all.
  • Make modifications easy
The second solution, which balances efficiency and modification, is to make the assumptions, but to minimise their usage. You may have commands which rely on the chose skin, in which case you may have code like:

pawn Код:
dcmd_chicken(playerid, params[])
{
    if (gClass[playerid] != 0) return SendClientMessage(playerid, 0xFF0000AA, "Sorry, you're not a Clucking Bell employee");
    ...
}
Your assumption that the Clucking Bell class is class 0 is now in your mode twice - once in OnPlayerRequestClass and once in dcmd_chicken. Assuming your mode is more than 20 lines long you could end up with this appearing hundreds of times, making modification a nightmare as you have to ensure you get every instance. The simplest way to get round this is use a define (or an enum):

pawn Код:
#define CLUCKING_BELL_CLASS (0)
#define AMMUNATION_CLASS (1)

public OnGameModeInit()
{
    AddPlayerClass(167, 0.0, 0.0, 5.0, 0.0, 0, 0, 0, 0, 0, 0); // Clucking Bell employee
    AddPlayerClass(179, 0.0, 0.0, 5.0, 0.0, 0, 0, 0, 0, 0, 0); // Ammunation employee
}

public OnPlayerRequestClass(playerid, classid)
{
    switch (classid)
    {
        case CLUCKING_BELL_CLASS:
        {
            GameTextForPlayer(playerid, "Clucking Bell", 5000, 3);
        }
        case AMMUNATION_CLASS:
        {
            GameTextForPlayer(playerid, "Ammunation", 5000, 3);
        }
    }
    return 1;
}

dcmd_chicken(playerid, params[])
{
    if (gClass[playerid] != CLUCKING_BELL_CLASS) return SendClientMessage(playerid, 0xFF0000AA, "Sorry, you're not a Clucking Bell employee");
    ...
}
Now when you add a new skin to the beginning of the list, or when you run a filterscript with it's own skins, you need only modify one part of your mode to reflect this change. However this can still cause problems if you can't guarantee that a return will always be constant.
  • Code defensively
The only way to guarantee that you'll never run into problems is to not make any assumptions at all. Save return values and use those known saves, don't try to guess what it might be:

pawn Код:
new
    gCluckingBellSkin = -1,
    gAmmunationSkin = -1;

public OnGameModeInit()
{
    gCluckingBellSkin = AddPlayerClass(167, 0.0, 0.0, 5.0, 0.0, 0, 0, 0, 0, 0, 0); // Clucking Bell employee
    gAmmunationSkin = AddPlayerClass(179, 0.0, 0.0, 5.0, 0.0, 0, 0, 0, 0, 0, 0); // Ammunation employee
}

public OnPlayerRequestClass(playerid, classid)
{
    if (classid == gCluckingBellSkin)
    {
        GameTextForPlayer(playerid, "Clucking Bell", 5000, 3);
    }
    else if (classid == gAmmunationSkin)
    {
        GameTextForPlayer(playerid, "Ammunation", 5000, 3);
    }
    return 1;
}

dcmd_chicken(playerid, params[])
{
    if (gClass[playerid] != gCluckingBellSkin) return SendClientMessage(playerid, 0xFF0000AA, "Sorry, you're not a Clucking Bell employee");
    ...
}
It now doesn't matter what the return values may or may not be because there's no way they'll change between being saved and used.
  • Important
There are times when assumptions are OK, as I said if you know your mode won't be released and you know you yourself won't modify it, or are willing to accept all the extra work involved in modifying it, then go for it. But don't be surprised when, some time down the line, everything breaks because you missed something.

Memory reduction

Recall this code from earlier:

pawn Код:
new
    gIsACar[] = {1, 0, 0, 1, 1, 0, 1, 0, 1, 1},
    gIsAHeavyVehicle[] = {1, 0, 1, 0, 0, 0, 0, 0, 1, 0},
    gIsABoat[] = {0, 1, 0, 0, 0, 0, 0, 1, 0, 0},
    gIsAFireEngine[] = {0, 0, 1, 0, 0, 1, 0, 0, 0, 0};
In PAWN all variables are 32bits big, that means they can store up to 4294967296, this code is only storing 0 or 1 - you can do that in a single bit. There are ten vehicles, each with four pieces of information (ignoring mutually exclusive information like IsACar/IsABoat for now), that's only 40 pieces of binary information (i.e. 40 true/falses), at 32bits per cell that's two cells worth of data stored in 40 cells (5 bytes of data (bound to 8 bytes) stored in 160 bytes - a 32fold increase and VAST waste). There are a few ways to easily reduce this use (although none listed here will attain full compression). The first is to mark all vehicles with an attribute in a single variable, the second is to mark all vehicle attributes in a single variable.
  • All vehicles
Each vehicle is either something or it isn't; for this example they're either a car or they're not. There are also 10 vehicles, this means there are 10 bits of binary data which we'll be storing in a single variable, that's still a waste of 22 bits, but that's better than 310, and up to 32 vehicles will reduce waste, not increase it.

Код:
Bit: 0 1 2 3 4 5 6 7  8  9  10 ...     31
Val: 1 2 4 8 16 32 64 128 256 512 1024 ... 2147483648
1/0: 1 0 0 1 1 0 1  0  1  1  x ...     x
x means we don't care about the value of these bits as they don't represent vehicles (we could even in theory stick another bit of information in these wasted bits, but won't for now). Assuming all the other bits are 0 then this number is "857", unfortunately this doesn't mean a lot looking at it in decimal, so we need to read it in binary.

Let's say we want to find out if vehicle model 403 is a car. Firstly we subtract 400 to get the number in range then we need to test the fourth bit (0, 1, 2, 3). The fourth bit has a value of eight, so we need to test whether the number 857 has the eight bit set, this is done using bitwise AND:

pawn Код:
if (857 & 8)
{
    // Bit is set, the vehicle is a car
}
Or, using our array (which is now just a variable as we've reduced it to 1 cell):

pawn Код:
if (gIsACar & 8)
{
    // Bit is set, the vehicle is a car
}
That's great, but currently this will produce code like:

pawn Код:
switch (model - 400)
{
case 0: if (gIsACar & 1) ...
case 1: if (gIsACar & 2) ...
case 2: if (gIsACar & 4) ...
case 3: if (gIsACar & 8) ...
case 4: if (gIsACar & 16) ...
Which is pointless as if you were going to go to that trouble you could just do:

pawn Код:
switch (model - 400)
{
    case 0: return true;
    case 1: return false;
    case 2: return false;
    case 3: return true;
    case 4: return true;
}
We need some way to generate the bit from the model. 1 is 2^0 (where ^ is "to the power of", not XOR), 2 is 2^1, 4 is 2^2 and 8 is 2^3, so 8, the bit we want, is 2^3, where 3 is 403-400:

pawn Код:
new
    bit = 1;
model -= 400;
for (new i = 0; i < model; i++)
{
    bit *= 2;
}
That will get the correct result, but there's a far simpler way of doing two to the power of n, the left shift operator:

Код:
8 = 1 << 3
So now we can just do:

pawn Код:
if (gIsACar & (1 << (model - 400)))
{
    return true;
}
return false;
Better than that, if checks a boolean which when true we return true and when false we return false, so we can skip the check entirely:

pawn Код:
stock IsACar(model)
{
    return gIsACar & (1 << (model - 400));
}
That's now such a small function you could just make it a define:

pawn Код:
#define IsACar(%0) \
    (gIsACar & (1 << ((%0) - 400)))
  • All attributes
Every vehicle has a mark for if they're a car, a mark for if they're a boat, a mark for if they're a fire engine and a mark for if they're heavy. This is 4 bits of information per vehicle, and as we've just shown you can store multiple pieces of information in a single cell using bits, where each bit is a single true/false and we can access those bits individually. So what if, instead of using each bit for a different vehicle, we used each bit for a different attribute; so if bit 0 is set it's a car, if bit 1 is set it's a boat, bit 2 is heavy and bit 3 is fire engine? Let's look at an example:

Model 402 is a heavy fire engine, it's not a car or boat. So we have bits 2 and 3 set, that's 4 and 8, 4 + 8 (technically 4 | is 12, so we have:

pawn Код:
new gModel402 = 12;
Now we want to check if it's a boat, that's bit 2, or the number 2:

pawn Код:
return gModel402 & 2;
This will return false (if it's true it will return TWO, not ONE, as many people incorrectly check for).

This can be turned into an array of attributes for all models:

pawn Код:
new gModels[] = {x, x, 12, x, x, x, x, x, x, x};
(x means unknown other)

pawn Код:
return gModels[model - 400] & 2;
There is no way to simplify the 2, you would simply have functions like:

pawn Код:
#define IsACar(%0) \
    (gModels[(%0) - 400] & 1)

#define IsABoat(%0) \
    (gModels[(%0) - 400] & 2)

#define IsAHeavy(%0) \
    (gModels[(%0) - 400] & 4)
This can't be optimised any more, but it can be made more readable and more easily editable. If I told you that one model was type 10, you would probably have to look at the above lists to see that it was a fire boat, but if I told you that one was a heavy car then you would instantly know what it was. So let's do that instead:

pawn Код:
#define MODEL_CAR  (1)
#define MODEL_BOAT (2)
#define MODEL_HEAVY (4)
#define MODEL_FIRE (8)

new gModels[] =
    {
        x,
        x,
        MODEL_HEAVY | MODEL_FIRE,
        x,
        x,
        x,
        x,
        x,
        x,
        x
    };

#define IsACar(%0) \
    (gModels[(%0) - 400] & MODEL_CAR)

#define IsABoat(%0) \
    (gModels[(%0) - 400] & MODEL_BOAT)

#define IsAHeavy(%0) \
    (gModels[(%0) - 400] & MODEL_HEAVY)
You can now instantly tell what a vehicle is from it's array entry, but there's an even better way of doing this. I'm not going to go into too much detail on this as it's explained in pawn-lang.pdf but you can do:

pawn Код:
enum (<<= 1)
{
    MODEL_CAR = 1,
    MODEL_BOAT,
    MODEL_HEAVY,
    MODEL_FIRE
}
The rest of the code is exactly the same but this is less typing and means you can add something between MODEL_CAR and MODEL_BOAT without needing to update any values.
  • More that 32 values
What happens when you have more than 32 values that you want to store? In this case you need an array indexed by how many times over 32 the value is. So 1 would be cell 0 (as it's not over 32), bit 2, 32 would be 1,0 and 66 would be 2,3. There is code for the simplification of this in YSI in the form of YSI_bit. In this case your code would look something like:

pawn Код:
enum e_MODELS
{
    MODEL_CAR,
    MODEL_BOAT,
    MODEL_HEAVY,
    MODEL_FIRE,
    ...
    MODEL_SOMETHING
}

new Bit:gModels[410 - 400][Bit_Bits(e_MODELS)];

#define IsACar(%0) \
    (Bit_Get(gModels[(%0) - 400], MODEL_CAR))

#define IsSomething(%0) \
    (Bit_Get(gModels[(%0) - 400], MODEL_SOMETHING))
The latest code for this (I rewrote it while doing this tutorial as it made me revisit my own code) can be found here.
  • Excess dimensions
I don't know why people do this, I'm pretty sure they copy it from one of the more common modes, but that doesn't make it right and I don't know why it was done this way in the first place. If you have an array of values, don't waste dimensions. Example:

I have an array of 10 values, let's for the sake of argument call them weapon prices. So we have 10 weapons, each with a price, and we want to store them in an array. Each weapon has an ID, from 0 to 9, so to get that weapon's price you need to access that index in the array:

pawn Код:
new
    gPrices[10] = // 10 weapons, thus 10 prices
    {
        1000,
        2000,
        5000,
        2000,
        10000,
        500,
        3000,
        2000,
        100,
        750
    };
pawn Код:
new
    weaponPrice = gPrices[5];
That's all you need - it's BASIC array access, and yet for some reason people insist on doing the following:

pawn Код:
new
    gPrices[10][1] = // 10 weapons, thus 10 prices
    {
        {1000},
        {2000},
        {5000},
        {2000},
        {10000},
        {500},
        {3000},
        {2000},
        {100},
        {750}
    };
pawn Код:
new
    weaponPrice = gPrices[5][0];
What purpose does the extra dimension serve? None at all! If you had two prices per weapon then yes - you would need the extra dimension, but you don't so you don't - just don't do it, simple as! It's a waste of time - it's slower and a waste of space - it's bigger.

CPU vs Memory

This is such a common factor in writing code that it deserves a special mention. In almost everything you have a choice of how to do something, usually one way will be fast but use a lot of memory and the other will be slow but use very little memory. For example in the foreach example above the foreach code is faster, but uses a big array, IPC is slower but uses next to no memory as it has nothing to store. In all these cases you just have to make the decision based on circumstances. The previous section on memory reduction listed all sorts of ways to use less memory but they all used extra code to do it, making them slower than the original huge arrays. However in this case the reduction was so great (an approximately thirty-two fold reduction in memory use) that it more than offset the minor speed decrease (as stated above also bitwise operators are very fast).

There is, however, a third option - complexity.

In the foreach/IPC example foreach had the greater speed and IPC has the better memory footprint, but it's possible to combine the best of both worlds by writing more complex code. More complex code generally handles more situations explicitly, meaning you don't have to write generic code to handle all possibilities. In the player loop example this would manifest itself as something like:

pawn Код:
if (IsPlayerConnected(0))
{
    // Do something
}
if (IsPlayerConnected(1))
{
    // Do something
}
if (IsPlayerConnected(2))
{
    // Do something
}

...

if (IsPlayerConnected(199))
{
    // Do something
}
Clearly this is very inefficient looking code and very hard to maintain, but you've got rid of the loop and variable, making the code faster and giving it a 0 cell memory footprint. From the table we also know that constants (0, 1 etc) are faster than variables (i).. I've not clocked this version so I don't know which is faster for large numbers of players (there's no contest at low numbers) between it and foreach, but it is definitely faster than an IPC loop. Clearly this example is still better with more memory usage but there are circumstances where it may not be.

Lists

This and the next setion are basically ripped straight from the wiki as I wrote it there in the first place.

Lists are a very useful type of structure, they're basically an array where the next piece or relevant data is pointed to by the last piece.

Example:

Say you have the following array:

Код:
3, 1, 64, 2, 4, 786, 2, 9
If you wanted to sort the array you would end up with:

Код:
1, 2, 2, 3, 4, 9, 64, 786
If however you wanted to leave the data in the original order but still know the numbers in order for some reason (it's just an example), you have a problem, how are you meant to have numbers in two orders at once? This would be a good use of lists. To construct a list from this data you would need to make the array into a 2d array, where the second dimension was 2 cells big, one containing the original number, the other containing the index of the next largest number. You would also need a separate variable to hold the index of the lowest number, so your new array would look like:

Код:
start = 1
3, 1, 64, 2, 4, 786, 2, 9
4, 3, 5, 6, 7, -1, 0, 2
The next index associated with 786 is -1, this is an invalid array index and indicates the end of the list, i.e. there are no more numbers. The two 2's could obviously be either way round, the first one in the array is the first on in the list too as it's the more likely one to be encountered first.

The other advantage of this method of sorting the numbers is adding more numbers is a lot faster. If you wanted to add another number 3 to the sorted array you would need to first shift at least 4 numbers one slot to the right to make space, not terrible here but very slow in larger arrays. With the list version you could just append the 3 to the end of the array and modify a single value in the list;

Код:
start = 1
3, 1, 64, 2, 4, 786, 2, 9, 3
8, 3, 5, 6, 7, -1, 0, 2, 4
^ modify this value    ^ next highest slot
None of the other numbers have moved so none of the other indexes need updating, just make the next lowest number point to the new number and make the new number point the number the next lowest used to be pointing to. Removing a value is even easier:

Код:
start = 1
3, 1, 64, X, 4, 786, 2, 9, 3
8, 6, 5, 6, 7, -1, 0, 2, 4
  ^ Changed to jump over the removed value
Here the first 2 has been removed and the number which pointed to that number (the 1) has been updated to point to the number the removed number was pointing to. In this example neither the removed number's pointer nor number have been removed, but you cannot possibly get to that slot following the list so it doesn't matter, it is effectively removed.
  • Types
The lists in the examples above were just basic single lists, you can also have double lists where every value points to the next value and the last value, these tend to have a pointer to the end of the list too to go backwards (e.g. to get the numbers in descending order):

Код:
start = 1
end = 5
value: 3, 1, 64, 2, 4, 786, 2, 9, 3
next: 8, 3, 5, 6, 7, -1, 0, 2, 4
last: 6, -1, 7, 1, 8, 2,  3, 4, 0
You have to be careful with these, especially when you have more than one of any value, that the last pointer points to the number who's next pointer goes straight back again, e.g this is wrong:

Код:
2, 3, 3
1, 2, -1
-1, 2, 0
The 2's next pointer points to the 3 in slot one, but that 3's last pointer doesn't go back to the two, both lists are in order on their own (as the two threes can be either way round) but together they are wrong, the correct version would be:

Код:
2, 3, 3
1, 2, -1
-1, 0, 1
Both of those lists start and end on the end two numbers, the back list in the wrong example started on the middle number.

The other type of list is the looping one where the last value points back to the first. The obvious advantage to this is that you can get to any value from any other value without knowing in advance whether the target is before or after the start point, you just need to be careful not to get into an infinite loop as there's no explicit -1 end point. These lists do still have start points. You can also do double looping lists where you have a next and last list, both of which loop round:

Код:
start = 1
end = 5
3, 1, 64, 2, 4, 786, 2, 9, 3
8, 3, 5, 6, 7, 1,  0, 2, 4
6, 5, 7, 1, 8, 2,  3, 4, 0
  • Mixed lists
Mixed lists are arrays containing multiple lists at once. An example could be an array of values, sorted by a list, with another list linking all unused slots so you know where you can add a new value. Example (X means unused (free) slot):

Код:
sortedStart = 3
unusedStart = 1
value: 34, X, X, 6, 34, 46, X, 54, 23, 25, X, 75, X, 45
sort: 4,    8, 13, 7,   11, 9, 0,   -1,  5
free:   2, 6,      10,       12,   -1
Obviously the two lists never interact so both can use the same slot for their next value:

Код:
sortedStart = 3
unusedStart = 1
value: 34, X, X, 6, 34, 46, X, 54, 23, 25, X, 75, X, 45
next: 4, 2, 6, 8, 13, 7, 10, 11, 9, 0, 12, -1, -1, 5
  • Code
Before you start the code you need to decide what sort of list is best suited for your application, this is entirely based on application can't easily be covered here. All these examples are mixed lists, one list for the required values, one for unused slots.

This example shows how to write code for a list sorted numerically ascending.

pawn Код:
#define NUMBER_OF_VALUES (10)

enum E_DATA_LIST
{
    E_DATA_LIST_VALUE,
    E_DATA_LIST_NEXT
}

new
    gListData[NUMBER_OF_VALUES][E_DATA_LIST],
    gUnusedStart = 0,
    gListStart = -1; // Starts off with no list

// This function initializes the list
List_Setup()
{
    new
        i;
    size--;
    for (i = 0; i < size; i++)
    {
        // To start with all slots are unused
        gListData[i][E_DATA_LIST_NEXT] = i + 1;
    }
    // End the list
    gListData[size][E_DATA_LIST_NEXT] = -1;
}

// This function adds a value to the list (using basic sorting)
List_Add(value)
{
    // Check there are free slots in the array
    if (gUnusedStart == -1) return -1;
    new
        pointer = gListStart,
        last = -1
        slot = gUnusedStart;
    // Add the value to the array
    gListData[slot][E_DATA_LIST_VALUE] = value;
    // Update the empty list
    gUnusedStart = gListData[slot][E_DATA_LIST_NEXT];
    // Loop through the list till we get to bigger/same size number
    while (pointer != -1 && gListData[pointer][E_DATA_LIST_VALUE] < value)
    {
        // Save the position of the last value
        last = pointer
        // Move on to the next slot
        pointer = gListData[pointer][E_DATA_LIST_NEXT];
    }
    // If we got here we ran out of values or reached a larger one
    // Check if we checked any numbers
    if (last == -1)
    {
        // The first number was bigger or there is no list
        // Either way add the new value to the start of the list
        gListData[slot][E_DATA_LIST_NEXT] = gListStart;
        gListStart = slot;
    }
    else
    {
        // Place the new value in the list
        gListData[slot][E_DATA_LIST_NEXT] = pointer;
        gListData[last][E_DATA_LIST_NEXT] = slot;
    }
    return slot;
}

// This function removes a value from a given slot in the array (returned by List_Add)
List_Remove(slot)
{
    // Is this a valid slot
    if (slot < 0 || slot >= NUMBER_OF_VALUES) return 0;
    // First find the slot before
    new
        pointer = gListStart,
        last = -1;
    while (pointer != -1 && pointer != slot)
    {
        last = pointer;
        pointer = gListData[pointer][E_LIST_DATA_NEXT];
    }
    // Did we find the slot in the list
    if (pointer == -1) return 0;
    if (last == -1)
    {
        // The value is the first in the list
        // Skip over this slot in the list
        gListStart = gListData[slot][E_LIST_DATA_NEXT];
    }
    else
    {
        // The value is in the list
        // Skip over this slot in the list
        gListData[last][E_LIST_DATA_NEXT] = gListData[slot][E_LIST_DATA_NEXT];
    }
    // Add this slot to the unused list
    // The unused list isn't in any order so this doesn't matter
    gListData[slot][E_LIST_DATA_NEXT] = gUnusedStart;
    gUnusedStart = slot;
    return 1;
}
Binary trees

Binary trees are a very fast method of searching for data in an array by using a very special list system. The most well known binary tree is probably the 20 questions game, with just 20 yes/no questions you can have over 1048576 items. A binary tree, as it's name implies, is a type of tree, similar to a family tree, where every item has 0, 1 or 2 children. They are not used for ordering data like a list but sorting data for very efficient searching. Basically you start with an item somewhere near the middle of the ordered list of objects (e.g. the middle number in a sorted array) and compare that to the value you want to find. If it's the same you've found your item, if it's greater you move to the item to the right (not immediately to the right, the item to the right of the middle item would be the item at the three quarter mark), if it's less you move left, then repeat the process.
  • Example
Код:
1 2 5 6 7 9 12 14 17 19 23 25 28 33 38
You have the preceding ordered array and you want to find what slot the number 7 is in (if it's in at all), in this example it's probably more efficient to just loop straight through the array to find it but that's not the point, that method increases in time linearly with the size of the array, a binary search time increases linearly as the array increases exponentially in size. I.e. an array 128 big will take twice as long to search straight through as an array 64 big, but a binary search 128 big will only take one check more than a binary search 64 big, not a lot at all.

If we construct a binary tree from the data above we get:



If you read left to right, ignoring the vertical aspect you can see that the numbers are in order. Now we can try find the 7.

The start number is 14, 7 is less than 14 so we go to the slot pointed to by the left branch of 14. This brings us to 6, 7 is bigger than 6 so we go right to 9, then left again to 7. This method took 4 comparisons to find the number (including the final check to confirm that we are on 7), using a straight search would have taken 5.

Lets say there is no 7, we would end up with this binary tree:



This, unlike the example above, has a single child number (the 9), as well as 2 and 0 child numbers. You only get a perfect tree when there are (2^n)-1 numbers (0, 1, 3, 7, 15, 31 ...), any other numbers will give a not quite full tree. In this case when we get to the 9, where the 7 will be, we'll find there is no left branch, meaning the 7 doesn't exist (it cannot possibly be anywhere else in the tree, think about it), so we return -1 for invalid slot.
  • Balanced and unbalanced
The trees in the examples above are called balanced binary trees, this means as near as possible all the branches are the same length (obviously in the second there aren't enough numbers for this to be the case but it's as near as possible). Constructing balanced trees is not easy, the generally accepted method of constructing almost balanced trees is putting the numbers in in a random order, this may mean you end up with something like this:



Obviously this tree is still valid but the right side is much larger than the left, however finding 25 still only takes 7 comparisons in this compared to 12 in the straight list. Also, as long as you start with a fairly middle number the random insertion method should produced a fairly balanced tree. The worst possible thing you can do is put the numbers in in order as then there will be no left branches at all (or right branches if done the other way), however even in this worst case the binary tree will take no longer to search than the straight list.
  • Modification
  • Addition
Adding a value to a binary tree is relatively easy, you just follow the tree through, using the value you want to add as a reference untill you reach an empty branch and add the number there. E.g. if you wanted to add the number 15 to our original balanced tree it would end up on the left branch of the 17. If we wanted to add the number 8 to the second balanced tree (the one without the 7) it would end up in the 7's old slot on the left of the 9.
  • Deletion
Deleting a number from a binary tree can be hard or it can be easy. If the number is at the end of a branch (e.g. 1, 5, 7, 12 etc in the original tree) you simply remove them. If a number only has one child (e.g. the 9 in the second example) you simply move that child (e.g. the 12) up into their position (so 6's children would be 2 and 12 in the new second example with 9 removed). Deletion only gets interesting when a node has two children. There are at least four ways of doing this:

The first method is the simplest computationally. Basically you choose one of the branches (left or right, assume right for this explanation) and replace the node you've removed with the first node of that branch (i.e. the right child of the node you've removed). You then go left through than new branch till you reach the end and place the left branch there. E.g. if you removed the 14 from the original example you would end up with 25 taking it's place at the top of the tree and 6 attached to the left branch of 17. This method is fast but ends up with very unbalanced trees very quickly.

The second method is to get all the numbers which are children of the node you just removed and rebuild a new binary tree from them, then put the top of that tree into the node you've just removed. This keeps the tree fairly well balanced but is obviously slower.

The third method is to combine the two methods above and rebuild the tree in-line, this is more complex to code but keeps the tree balanced and is faster than the second method (though no-where near as fast as the first).

The final method listed here is to simply set a flag on a value saying it's not used any more, this is even faster than the first method and maintains the structure but means you can't re-use slots unless you can find a value to replace it with later.
Reply
#2

Very nice, is this a ****** restoration or your own?
Reply
#3

Restoration. There were additional posts, but ****** cache and wayback machine have problems with pages further down the road...
Reply
#4

Great work reposting this, I needed it.
But actually the first four topics refer to an external thread that don't exist... So can you ?
Reply
#5

Thank you i'll read it :3
Reply
#6

omg this is hugee! but yeah nice work with restoration
Reply
#7

I thought I remembered this topic being longer... :/

Anyways, I've been looking everywhere for this! Then I realised that different regions spelt 'optimization' differently. I decided to compare the usage of the two spellings in ****** trends...

https://www.******.com/trends/explor...=Etc%2FGMT%2B4

It seems 'optimisation' is mostly used in Tunisia, Cфte d’Ivoire, France, Morocco, Algeria; while 'optimization' is mostly used in... basically everywhere else. My way of spelling it, 'optimization' (which is why I couldn't find this thread, damned Y-Less), is used more than Y-Less's European way, 'optimisation'.
Reply
#8

That was such a fun read, great tutorial.
Reply
#9

Can some mod/admin pin these topic? Its very useful for everyone...
Reply
#10

Wow, i needed it so much, thanks for repost.
Reply
#11

I just want to say that a lot of the information in this topic is now obsolete. The general approach to optimisation is:

1) Just write your code, try make it good, but don't waste time making it excellent. A lot of these tips sacrifice readability for TINY improvements you will never notice.

2) If you actually notice some code running slowly, profile it with the profiler plugin. That will tell you exactly what code is truly running slowly.

3) Once you know which part is slow, you can improve the ALGORITHMS! You could write a bubble sort in hand-optimised x86 assembly, but it will still be beaten by a PAWN version of quicksort, because quicksort is just a better way of doing things!

4) Only then should you use the tips listed here to get the most out of the worst offending pieces of code.

You could spend hours making a function 50% faster, but if it is only called once an hour it doesn't matter. On the other hand, if you determine that another function is called 50 times a second, then just a 0.5% improvement will be far more useful!
Reply


Forum Jump:


Users browsing this thread: 1 Guest(s)