Compiler Questions! - What does it do? Why does it take a while? Are there ways to speed it up?
#1

Now before someone posts "it translates human readable code into machine code", I know this already!
I want detailed technical answers!



I've always wondered about this. I have a basic understanding of what it does, but I'd like to know more.

Particularly because, since upgrading my PC, it still takes around the same amount of time to compile. Are there any ways of speeding the compile time up? Run parameters? Multicore?



Forgive my "noobish" terms, but that's why I'm posting: to learn more.
#2

The compiler is single-threaded, unfortunately, which means a faster processor clock speed and faster RAM would be the only ways to speed it up (hardware folks, feel free to correct me on this one).

The first thing the compiler does is pre-process your code (hence the name). Starting from the main file you chose to compile, it replaces every #include with the contents of the included file (unless that file was already included, and only up to any #endinput).
After that, it goes from top to bottom and replaces macros. It never goes back to replace macros in earlier text, and it doesn't make a second pass; it only scans again at the position where a replacement was just made.
You can see this output by compiling with the -l flag (the letter L, lowercase), which generates a single pre-processed file (this can be useful for macro debugging).
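
To make the top-to-bottom rule concrete, here is a minimal sketch with made-up macro names (DOUBLE and ADD_TEN are not from any real include). The comments show roughly what the line should look like after pre-processing, assuming the behaviour described above.

```pawn
// Hypothetical macros, for illustration only.
#define DOUBLE(%1)  ((%1) * 2)
#define ADD_TEN(%1) (DOUBLE(%1) + 10)

main()
{
    // ADD_TEN(5) is replaced first, giving (DOUBLE(5) + 10). The scan then
    // picks up again at that same position and replaces DOUBLE(5) too, so
    // the pre-processed line should be roughly:
    //     new value = (((5) * 2) + 10);
    new value = ADD_TEN(5);

    #pragma unused value
}
```

Compiling a file like this with -l and opening the generated file is a quick way to check what a macro really turned into.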

After that, it will work that single file from top to bottom, storing all enums, global variables, functions, and such.
If everything checks out, all used functions/variables will then be generated into AMX opcodes (those you write with #emit). Function names and variables become mere memory addresses. Constants become single values.
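
As a small illustration of "those you write with #emit", here is a sketch of a function that sets a local variable with two AMX instructions instead of a normal assignment. The function name is made up for the example; CONST.pri and STOR.S.pri are standard AMX instructions.

```pawn
// GetFive is a hypothetical helper, for illustration only.
stock GetFive()
{
    new value;

    // Load the constant 5 into the primary register.
    #emit CONST.pri 5
    // Store the primary register into the local variable "value".
    #emit STOR.S.pri value

    return value;
}
```

In the compiled file the name "value" is gone; the store instruction just refers to an offset in the function's stack frame.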

After the AMX opcodes are generated, it starts writing the names of public functions/variables/tags and their respective addresses into the AMX file, along with other stuff that I don't feel the need to go into detail on. You can read about this in the PAWN Implementer's Guide if interested.

After writing the initial info, the AMX opcodes will be written, as raw data (not in text like #emit).

If you compiled with -d2 or -d3, debug information will be written at the end of the file. The debug information basically contains the names of all functions/variables and the addresses they point to - this is what crashdetect uses to help you find errors.


I didn't go into depth because it'd probably be too overwhelming, and this information was just written off the top of my head so it might not be completely accurate. Ask some questions and I'll try to answer them!
#3

Thanks Slice, this is great!
It's just the right amount of information, and I understand it better now. And thanks for that "-l" compile param, I didn't know about that.
#4

If you want to see how macros are resolved one by one, have a look here. Select "Resolve macros" in the drop-down next to the run button.

Some demo macros:
pawn Code:
#define SOMETHING(%1,%2) FUNC_%1(%2)
#define FUNC_Foo(%1) FooFunc(%1 * 5)
#define FUNC_Bar(%1) BarFunc(50, %1 + 20)

SOMETHING(Foo, 20);
SOMETHING(Bar, 30);

#define DURATION(%1)        (DURATION_PT:%1,0)
#define DURATION_PT:%1,     (%1:DURATION)+_:DURATION_PT:

#define second%1:DURATION
#define seconds%1:DURATION
#define minute%1:DURATION   * DURATION_MINUTE
#define minutes%1:DURATION  * DURATION_MINUTE
#define hour%1:DURATION     * DURATION_HOUR
#define hours%1:DURATION    * DURATION_HOUR
#define day%1:DURATION      * DURATION_DAY
#define days%1:DURATION     * DURATION_DAY
#define week%1:DURATION     * DURATION_WEEK
#define weeks%1:DURATION    * DURATION_WEEK
#define month%1:DURATION    * DURATION_MONTH
#define months%1:DURATION   * DURATION_MONTH
#define year%1:DURATION     * DURATION_YEAR
#define years%1:DURATION    * DURATION_YEAR

// Scroll down or pull up the middle bar to see this
new
    g_SomeDuration = DURATION(20 minutes, 1 hour, 20 seconds)
;
#5

I came across a problem with macro order recently, while trying to make a simple macro for admin commands.

I had "#define LEVEL 2" at the top of the include containing the level 2 commands, and the commands themselves looked like this:

ACMD:someadmincommand[LEVEL](playerid, params[])

The problem is that it replaced the "ACMD:%0[%1](%2)" structure before the "LEVEL" definition, so the commands ended up like this:

public cmd_someadmincommand_LEVEL(...)

instead of:

public cmd_someadmincommand_2(...)
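
For anyone trying to reproduce this, here is a minimal sketch. The macro body is an assumption (only the "ACMD:%0[%1](%2)" pattern and the resulting function name are given above), so treat it as one plausible way to get the same result rather than the actual include.

```pawn
#define LEVEL 2

// Assumed macro body; only the pattern on the left comes from the post.
#define ACMD:%0[%1](%2) forward cmd_%0_%1(%2); public cmd_%0_%1(%2)

// %1 captures the text "LEVEL" exactly as written and pastes it into the
// function name. Inside cmd_someadmincommand_LEVEL it is no longer a
// separate symbol, so the LEVEL macro does not appear to be applied to it.
ACMD:someadmincommand[LEVEL](playerid, params[])
{
    #pragma unused playerid, params
    return 1;
}
```

Compiling this with -l should show which name the command actually ends up with.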


I will have a play around with that online editor, that's a really great tool!

Also, I made an odd discovery: after adding the "-l" flag you mentioned, compile times seemed to improve!
These are the parameters:

"-O1 -d3" = 9.6 seconds
"-O1 -d3 -l" = 2.4 seconds


Again, thanks for this!
#6

Well, if you add -l it only does pre-processing!
#7

Ah okay, I thought it compiled as normal as well as putting the pre-processed code into a file...

I guess I should have worked that out myself! Doing more stuff != taking less time...
#8

Interesting, good explanation ^^
#9

I did discover that very large global arrays can actually slow the compiler down a lot. Most of this speed loss comes from writing the blank arrays to the output file (file ops are generally quite slow). I recently wrote a script which needed an array of 128 MB, and the compiler slowed to a crawl, taking at least a minute to compile compared to seconds for the same script with a much smaller array.
#10

Perhaps you could put a large #pragma dynamic directive then steal half the heap and point the array to it!
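
A rough sketch of the two declarations being compared. The array name and the size are made up (128 MB at an assumed 4 bytes per cell), and the heap-stealing part is left out, since it depends on how the script manages its own memory.

```pawn
// Hypothetical size: 128 MB / 4 bytes per cell = 33,554,432 cells,
// assuming 32-bit cells.

// The slow version: a blank global array stored in the compiled file.
// new g_BigArray[33554432];

// The suggestion: ask for that much stack/heap space instead (the size is
// given in cells). As far as I know this only records a size for the host
// to allocate at load time, so the compiler has nothing that large to
// write out.
#pragma dynamic 33554432

main()
{
}
```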
#11

Possibly, though it would make the code slightly more awkward. Specifically, this was using y_malloc, and though I do have an API that would make this abstraction very possible, I often ignore that API and use the underlying arrays (though that is very bad practice).