14.04.2015, 19:43
Aaaand credits to Y_Less..
Contents
First Post
Information on the internal implementation of various parts of "y_iterate".
Introduction - A brief word on what this tutorial is about and a recap.
Multi-Dimensional Iterators - A review of multi-dimensional iterators required for later.
Macros - How the "Iterator:" macro works.
Linked Lists - An explanation of the data type used by iterators.
Start Point - How "foreach" loops over arbitrary arrays.
Improvements - Why "foreach" is better than "for" in most cases.
Generic Loop Functions - Using iterators without "foreach".
Reverse Iteration - Going backwards through iterators.
Second Post
Function list.
Functions - A list of all the functions provided by "y_iterate".
Third Post
An introduction to "special iterators" - those that are defined by functions and
not by using "Iterator:".
Special Iterators - An introduction to writing special iterator functions.
Example 1 - Writing an iterator for every positive even integer.
Explanation - A more detailed look at the code for Example 1.
Example 2 - Writing an iterator for every positive even integer.
Example 3 - Writing an iterator for RCON admins, which can be done multiple ways.
Fourth Post
More advanced special iterators.
Iterators In Iterators - Using iterators to create new iterators.
Multi-Dimensional Special Iterators - Special iterators that look like multi-dimensional iterators.
More Parameters - Passing more than one parameter to a special iterator.
Function-Like Iterators - Special iterators that look like regular functions.
Fifth Post
A detailed look at the macros that make up "y_iterate" and make all these
different iterator types work together.
"foreach" Types - A review of all the "foreach" syntax forms.
Redefining "new" - "y_iterate"'s dirty little secret.
Y_FOREACH_SECOND - Detecting one syntax form.
Y_FOREACH_THIRD - Detecting a second syntax form.
Why Redefine "new"? - Why the "#define new" hack is required.
The Remainder - All the other syntax form detections.
Introduction
Here we will look at how y_iterate (aka "foreach") works internally - what code
is generated by the numerous macros and why this code is good. To begin with,
lets review declaring and using a standard simple iterator. This will be the
example used throughout this tutorial:
Output:
You can add any value less than the iterator limit to the iterator, so for this
example "7" cannot be added because it is not LESS THAN 7. We also can't add
negative numbers, so the following lines will not work (they will compile, but
do nothing):
Multi-Dimensional Iterators
From the original release topic it should be known that you can have arrays of
iterators (multi-dimensional, or MD, iterators):
Originally this used "IteratorArray:" instead of "Iterator:", but that has since
been changed so you can always use either (generally using just "Iterator:").
As a result, the only important difference is the use of "Iter_Init", the
reasons for which are addressed later. Everywhere you would normally use an
iterator you can instead use a selected dimension of a multi-dimensional
iterator (in exactly the same way as you can normally use a variable or array
slot in the same places).
As a side note. Currently everything but "Iter_Init" supports 3-dimensional
iterator arrays - this function must be called for all multi-dimensional
iterators, but can't be easily if there are more than two dimensions. You can
do:
You can also do:
But you can't currently call "Iter_Init" directly on 3d iterator arrays, despite
the fact that you need to. Instead, you have to do:
A possibly clearer way is:
Note that you can only do even this as of very recently. If you have a slightly
older version the code becomes:
Macros
The "Iterator:" macro generates two variables - one to store the main iterator
data (the array) and the other to store the number of items in the iterator (the
count):
Becomes:
"YSII" stands for "YSI Iterator", "C" stands for "Count", "A" stands for
"Array", and "g" stands for "global" (though this is inaccurate as iterators can
be local variables too). "MyIter@YSII_Cg" is the number of items currently
added to the iterator, and not the total size of the iterator. "MyIter@YSII_Ag"
is the values currently stored in the iterator.
Iterators are "linked lists", which means that each element points to the next
valid element. Before adding anything to the array it is declared using:
This declares the array as 7 cells big to store all the valid data, with an
extra cell to store the start point of the list. The initialisation data is
carefully crafted to be invalid. The use of "- 1" for the second value means
that the compiler attempts to infer a pattern to the array declaration using the
first two values (and determines that the pattern is that each slot is one less
than the previous slot). This means that the array starts off looking like:
Most of those values are greater than 7, so they are invalid. Our extra slot
(the last one) has a value of "7" which is ALMOST valid. The code is designed
so that the last slot ALWAYS contains its own index, which equals the declared
size of the iterator. This is important for the linked list as will be shown
later.
After adding the values "0", "4", and "6" the array will look like:
FOUR slots have changed value - 0, 4, and 6 (the values we added to the
iterator), and slot 7 (our extra slot added by the "Iterator:" macro). Slots
1, 2, 3, and 5 have not been touched so are all still invalid (as shown by the
fact that they are all greater than the maximum valid slot value).
Linked Lists
That's interesting, but why is it like that; and why is slot 0 "4", slot 4 "6"
etc? Each valid slot has a value LESS THAN OR EQUAL TO the iterator size. So
slot 0 is valid and has a value "<= 7" to prove it, as do slots 4, 6, and 7.
But wait, slot 7 CAN'T be valid because it is too big - only slots 0 to 6 are
within the declared range of the iterator, so why does it have a valid value
stored in it? More to the point, why is that last slot initially declared as
"7", when it shouldn't be valid?
Each valid slot's value is the index of the next valid slot. If we start at
slot 0 we get the value "4", jumping to slot 4 gives us the value "6", and slot
6 is the last item we initially added to the iterator:
Already it should be clear how "foreach" can loop over only values that exist,
unlike "for" which must loop over every possible value. From the information
above we can write:
That will loop forever and output:
We need an end condition for the loop - which is exactly what we have the extra
slot at the end for. Slot 6 has the value "7", which is a value we have already
determined should be invalid, and indeed it is. So we can modify our loop to:
New output:
Start Point
Excellent. The linked list works and ends correctly - but does it START
correctly? In this example yes, but in the more general case NO! What happens
if we never added slot 0 to the iterator?
The output SHOULD BE:
But if we use our loop above we get:
The problem is here:
After adding only the values "4" and "6", the array looks like:
But we start at index 0, which has the value "14". We print that value, then
try use that value as an index in to the array. The array has size "7 + 1", so
accessing slot "14" is an out-of-bounds error and won't work (or will crash or
do any other unpredictable things). So instead we need:
And if we never add ANY values to the array we still have the original:
And the resulting required code is below (note that this will end instantly
because the value of slot 7 is "7", so the condition will fail immediately):
In each case, the "new i" value must be initialised to the value stored in the
first valid slot - so we need to determine WHICH is the first valid slot. That
just so happens to be stored in the last slot of the array:
This is, in essence, how "foreach" works, and that loop will now correctly
print all the added values (if any) in all cases.
Improvements
Now we have a basic understanding of how "foreach" works under the hood, lets
look at how the macro itself works and generates significantly better code than
that presented above. The simplest way to present this is to show it:
Doing:
Gives:
This is significantly simplified from the original version, but still
equivalent. The only possibly confusing part is the assignment within the
condition, but this is equivalent to:
The first action in the loop is to get the NEXT item, then use that item. Thus
we have to start with an item that is invalid or else the very first element
would always get skipped because we would have the first item and instantly get
the second item.
Generic Loop Functions
It might help to think of the loop like this:
Or more generically:
But slightly more efficient. It just so happens that "START_SLOT" and
"END_VALUE" are always the same, and always equal to the declared size of the
iterator). We can define and use the generic values:
These macros already exist in y_iterate so you don't even need the "YSII_Ag"
suffixes:
That code is EXACTLY equivalent to the default "foreach" use, plus it gives you
more control over the looping should you so desire:
Using the alternate form:
Granted in this case the second version may be less appealing to look at, but it
does avoid the use of "break" and keeps all the loop end conditions in one
place. It also combines the efficiency of "foreach" with more code.
You may be tempted to "optimise" the loop by calling "Iter_End" just once with:
You can, and that will run, but "Iter_End" is actually a macro not a function
call and doing it this way is a rare case in which that will end up being LESS
efficient unfortunately.
Reverse Iteration
Using similar methods to those above you can also go BACKWARDS through an
iterator:
However, while this was added to "y_iterate" for completeness sake, it is
INCREDIBLY inefficient and you are MUCH better off using this where possible:
This is because iterators are one-way (and very good at doing things one way),
but when you try and go backwards they can still only go forwards so must go all
theway through the whole loop until they hit the current value and know where
they came from for EVERY iteration. Tracing the loop would give something like:
I did add efficient reverse iterators to the code once, but they doubled the
memory usage and were rarely (if ever) used so were basically worthless.
Contents
First Post
Information on the internal implementation of various parts of "y_iterate".
Introduction - A brief word on what this tutorial is about and a recap.
Multi-Dimensional Iterators - A review of multi-dimensional iterators required for later.
Macros - How the "Iterator:" macro works.
Linked Lists - An explanation of the data type used by iterators.
Start Point - How "foreach" loops over arbitrary arrays.
Improvements - Why "foreach" is better than "for" in most cases.
Generic Loop Functions - Using iterators without "foreach".
Reverse Iteration - Going backwards through iterators.
Second Post
Function list.
Functions - A list of all the functions provided by "y_iterate".
Third Post
An introduction to "special iterators" - those that are defined by functions and
not by using "Iterator:".
Special Iterators - An introduction to writing special iterator functions.
Example 1 - Writing an iterator for every positive even integer.
Explanation - A more detailed look at the code for Example 1.
Example 2 - Writing an iterator for every positive even integer.
Example 3 - Writing an iterator for RCON admins, which can be done multiple ways.
Fourth Post
More advanced special iterators.
Iterators In Iterators - Using iterators to create new iterators.
Multi-Dimensional Special Iterators - Special iterators that look like multi-dimensional iterators.
More Parameters - Passing more than one parameter to a special iterator.
Function-Like Iterators - Special iterators that look like regular functions.
Fifth Post
A detailed look at the macros that make up "y_iterate" and make all these
different iterator types work together.
"foreach" Types - A review of all the "foreach" syntax forms.
Redefining "new" - "y_iterate"'s dirty little secret.
Y_FOREACH_SECOND - Detecting one syntax form.
Y_FOREACH_THIRD - Detecting a second syntax form.
Why Redefine "new"? - Why the "#define new" hack is required.
The Remainder - All the other syntax form detections.
Introduction
Here we will look at how y_iterate (aka "foreach") works internally - what code
is generated by the numerous macros and why this code is good. To begin with,
lets review declaring and using a standard simple iterator. This will be the
example used throughout this tutorial:
pawn Code:
// Declare the iterator:
new
Iterator:MyIter<7>;
// Add any value between 0 and 6 to the iterator:
Iter_Add(MyIter, 0);
Iter_Add(MyIter, 4);
Iter_Add(MyIter, 6);
// Use the iterator:
foreach (new i : MyIter)
{
printf("%d", i);
}
pawn Code:
Code:
0
4
6
example "7" cannot be added because it is not LESS THAN 7. We also can't add
negative numbers, so the following lines will not work (they will compile, but
do nothing):
pawn Code:
Iter_Add(MyIter, -11);
Iter_Add(MyIter, 7);
Iter_Add(MyIter, 100);
From the original release topic it should be known that you can have arrays of
iterators (multi-dimensional, or MD, iterators):
pawn Code:
new
Iterator:OwnedVehicle[MAX_PLAYERS]<MAX_VEHICLES>;
Iter_Init(OwnedVehicle);
// Add vehicles to players here...
Iter_Add(OwnedVehicle[playerid], 42);
Iter_Add(OwnedVehicle[playerid], 43);
Iter_Add(OwnedVehicle[playerid], 44);
// Other code...
foreach (new vehicleid : OwnedVehicle[playerid])
{
printf("Player %d owns vehicle %d", playerid, vehicleid).
}
been changed so you can always use either (generally using just "Iterator:").
As a result, the only important difference is the use of "Iter_Init", the
reasons for which are addressed later. Everywhere you would normally use an
iterator you can instead use a selected dimension of a multi-dimensional
iterator (in exactly the same way as you can normally use a variable or array
slot in the same places).
As a side note. Currently everything but "Iter_Init" supports 3-dimensional
iterator arrays - this function must be called for all multi-dimensional
iterators, but can't be easily if there are more than two dimensions. You can
do:
pawn Code:
new
Iterator:Iter2[5]<10>;
Iter_Init(Iter2);
Iter_Add(Iter2[3], 7);
pawn Code:
new
Iterator:Iter3[5][8]<10>;
//Iter_Init(Iter3);
Iter_Add(Iter3[3][6], 7);
the fact that you need to. Instead, you have to do:
pawn Code:
new
Iterator:Iter3[5][8]<10>;
for (new i = 0; i != Iter_InternalSize(Iter3); ++i)
{
Iter_Init(Iter3[i]);
}
Iter_Add(Iter3[3][6], 7);
pawn Code:
#define SIZE 5
new
Iterator:Iter3[SIZE][8]<10>;
for (new i = 0; i != SIZE; ++i)
{
Iter_Init(Iter3[i]);
}
Iter_Add(Iter3[3][6], 7);
older version the code becomes:
pawn Code:
for (new i = 0; i != Iter_InternalSize(Iter3); ++i)
{
Iter_InitInternal(Iter_InternalArray(Iter3[i]), Iter_InternalSize(Iter3[]), Iter_InternalSize(Iter3[][]) - 1);
}
The "Iterator:" macro generates two variables - one to store the main iterator
data (the array) and the other to store the number of items in the iterator (the
count):
pawn Code:
new
Iterator:MyIter<7>;
pawn Code:
new
// Number of items currently in the iterator.
MyIter@YSII_Cg,
// The data in the iterator, plus internal data (all slots invalidated).
MyIter@YSII_Ag[7 + 1] = {14, 13, ...};
"Array", and "g" stands for "global" (though this is inaccurate as iterators can
be local variables too). "MyIter@YSII_Cg" is the number of items currently
added to the iterator, and not the total size of the iterator. "MyIter@YSII_Ag"
is the values currently stored in the iterator.
Iterators are "linked lists", which means that each element points to the next
valid element. Before adding anything to the array it is declared using:
pawn Code:
MyIter@YSII_Ag[7 + 1] = {7 * 2, 7 * 2 - 1, ...};
extra cell to store the start point of the list. The initialisation data is
carefully crafted to be invalid. The use of "- 1" for the second value means
that the compiler attempts to infer a pattern to the array declaration using the
first two values (and determines that the pattern is that each slot is one less
than the previous slot). This means that the array starts off looking like:
pawn Code:
14, 13, 12, 11, 10, 9, 8, 7
(the last one) has a value of "7" which is ALMOST valid. The code is designed
so that the last slot ALWAYS contains its own index, which equals the declared
size of the iterator. This is important for the linked list as will be shown
later.
After adding the values "0", "4", and "6" the array will look like:
pawn Code:
4, 13, 12, 11, 6, 9, 7, 0
iterator), and slot 7 (our extra slot added by the "Iterator:" macro). Slots
1, 2, 3, and 5 have not been touched so are all still invalid (as shown by the
fact that they are all greater than the maximum valid slot value).
Linked Lists
That's interesting, but why is it like that; and why is slot 0 "4", slot 4 "6"
etc? Each valid slot has a value LESS THAN OR EQUAL TO the iterator size. So
slot 0 is valid and has a value "<= 7" to prove it, as do slots 4, 6, and 7.
But wait, slot 7 CAN'T be valid because it is too big - only slots 0 to 6 are
within the declared range of the iterator, so why does it have a valid value
stored in it? More to the point, why is that last slot initially declared as
"7", when it shouldn't be valid?
Each valid slot's value is the index of the next valid slot. If we start at
slot 0 we get the value "4", jumping to slot 4 gives us the value "6", and slot
6 is the last item we initially added to the iterator:
pawn Code:
Iter_Add(MyIter, 0);
Iter_Add(MyIter, 4);
Iter_Add(MyIter, 6);
unlike "for" which must loop over every possible value. From the information
above we can write:
pawn Code:
for (new i = MyIter@YSII_Ag[0]; ; i = MyIter@YSII_Ag[i])
{
printf("%d", i);
}
pawn Code:
0
4
6
7
0
4
6
7
...
slot at the end for. Slot 6 has the value "7", which is a value we have already
determined should be invalid, and indeed it is. So we can modify our loop to:
pawn Code:
for (new i = MyIter@YSII_Ag[0]; i != 7; i = MyIter@YSII_Ag[i])
{
printf("%d", i);
}
pawn Code:
0
4
6
Excellent. The linked list works and ends correctly - but does it START
correctly? In this example yes, but in the more general case NO! What happens
if we never added slot 0 to the iterator?
pawn Code:
new
Iterator:MyIter<7>;
Iter_Add(MyIter, 4);
Iter_Add(MyIter, 6);
foreach (new i : MyIter)
{
printf("%d", i);
}
pawn Code:
4
6
pawn Code:
14
<crash>
pawn Code:
new i = MyIter@YSII_Ag[0];
pawn Code:
14, 13, 12, 11, 6, 9, 7, 4
try use that value as an index in to the array. The array has size "7 + 1", so
accessing slot "14" is an out-of-bounds error and won't work (or will crash or
do any other unpredictable things). So instead we need:
pawn Code:
for (new i = MyIter@YSII_Ag[4]; i != 7; i = MyIter@YSII_Ag[i])
{
printf("%d", i);
}
pawn Code:
14, 13, 12, 11, 10, 9, 8, 7
because the value of slot 7 is "7", so the condition will fail immediately):
pawn Code:
for (new i = MyIter@YSII_Ag[7]; i != 7; i = MyIter@YSII_Ag[i])
{
printf("%d", i);
}
first valid slot - so we need to determine WHICH is the first valid slot. That
just so happens to be stored in the last slot of the array:
pawn Code:
for (new first = MyIter@YSII_Ag[7], i = MyIter@YSII_Ag[first]; i != 7; i = MyIter@YSII_Ag[i])
{
printf("%d", i);
}
print all the added values (if any) in all cases.
Improvements
Now we have a basic understanding of how "foreach" works under the hood, lets
look at how the macro itself works and generates significantly better code than
that presented above. The simplest way to present this is to show it:
Doing:
pawn Code:
foreach (new i : MyIter)
{
printf("%d", i);
}
pawn Code:
for (new i = 7; (i = MyIter@YSII_Ag[i]) != 7; )
{
printf("%d", i);
}
equivalent. The only possibly confusing part is the assignment within the
condition, but this is equivalent to:
pawn Code:
i = MyIter@YSII_Ag[i];
if (i != 7)
{
printf("%d", i);
}
else break;
we have to start with an item that is invalid or else the very first element
would always get skipped because we would have the first item and instantly get
the second item.
Generic Loop Functions
It might help to think of the loop like this:
pawn Code:
for (new i = MyIter[7]; i != 7; i = MyIter[i])
{
}
pawn Code:
for (new i = MyIter[START_SLOT]; i != END_VALUE; i = MyIter[i])
{
}
"END_VALUE" are always the same, and always equal to the declared size of the
iterator). We can define and use the generic values:
pawn Code:
#define START_SLOT(%0) (sizeof (%0) - 1)
#define END_VALUE(%0) (sizeof (%0) - 1)
for (new i = START_SLOT(MyIter@YSII_Ag); (i = MyIter@YSII_Ag[i]) != END_VALUE(MyIter@YSII_Ag); )
{
printf("%d", i);
}
suffixes:
pawn Code:
for (new i = Iter_Begin(MyIter); (i = Iter_Next(MyIter, i)) != Iter_End(MyIter); )
{
}
more control over the looping should you so desire:
pawn Code:
stock PrintFivePlayers()
{
new
printed = 0;
foreach (new i : Player)
{
printf("%d", i);
// Limit the display.
++printed;
if (printed == 5) break;
}
}
pawn Code:
stock PrintFivePlayers()
{
for (new i = Iter_Begin(Player), printed = 0; printed != 5 && (i = Iter_Next(MyIter, i)) != Iter_End(MyIter); ++printed)
{
printf("%d", i);
}
}
does avoid the use of "break" and keeps all the loop end conditions in one
place. It also combines the efficiency of "foreach" with more code.
You may be tempted to "optimise" the loop by calling "Iter_End" just once with:
pawn Code:
stock PrintFivePlayers()
{
for (new i = Iter_Begin(Player), printed = 0, end = Iter_End(MyIter); printed != 5 && (i = Iter_Next(MyIter, i)) != end; ++printed)
{
printf("%d", i);
}
}
call and doing it this way is a rare case in which that will end up being LESS
efficient unfortunately.
Reverse Iteration
Using similar methods to those above you can also go BACKWARDS through an
iterator:
pawn Code:
stock PrintPlayersBackwards()
{
for (new i = Iter_End(Player); (i = Iter_Prev(MyIter, i)) != Iter_Begin(MyIter); )
{
printf("%d", i);
}
}
INCREDIBLY inefficient and you are MUCH better off using this where possible:
pawn Code:
stock PrintPlayersBackwards()
{
for (new i = MAX_PLAYERS; i--; )
{
if (IsPlayerConnected)
{
printf("%d", i);
}
}
}
but when you try and go backwards they can still only go forwards so must go all
theway through the whole loop until they hit the current value and know where
they came from for EVERY iteration. Tracing the loop would give something like:
pawn Code:
7, 0, 4, 6, 7 - Print 6
7, 0, 4, 6 - Print 4
7, 0, 4 - Print 0
7, 0 - End
memory usage and were rarely (if ever) used so were basically worthless.