14.05.2012, 20:30
Packed and unpacked strings
The PAWN language does not have variable types. All variables are "cells", which are typically 32 bits wide (there are implementations of PAWN that use 64-bit cells). A string is basically an array of cells that holds characters and is terminated with the special character '\0'.
However, in most character sets a character takes only a single byte, while a cell is typically a four-byte entity: storing a single character per cell then wastes 75% of the space. For the sake of compactness, PAWN supports packed strings, where each cell holds as many characters as fit. With 32-bit cells, one cell holds four characters and no space is wasted.
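To make the space saving concrete, here is a minimal sketch in C (not PAWN) that models the two layouts. It assumes 32-bit cells and one-byte characters, and it assumes the first character of a packed cell sits in the most significant byte. The names `cell`, `unpacked_cells`, `packed_cells` and `pack_string` are made up for this example and are not part of any PAWN API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t cell;                  /* assumption: 32-bit cells */
#define CHARS_PER_CELL (sizeof(cell))   /* four one-byte characters per cell */

/* Unpacked: one cell per character, plus one cell for the '\0' terminator. */
size_t unpacked_cells(const char *s)
{
    return strlen(s) + 1;
}

/* Packed: characters plus terminator, rounded up to whole cells. */
size_t packed_cells(const char *s)
{
    return (strlen(s) + 1 + CHARS_PER_CELL - 1) / CHARS_PER_CELL;
}

/* Pack s into dest, first character in the most significant byte of each
   cell (this byte order is an assumption of the sketch). Bytes that are not
   written stay zero, so the string is always terminated. */
void pack_string(cell *dest, size_t dest_cells, const char *s)
{
    assert(dest_cells >= packed_cells(s));
    memset(dest, 0, dest_cells * sizeof(cell));
    for (size_t i = 0; s[i] != '\0'; i++) {
        unsigned shift = 8 * (unsigned)(CHARS_PER_CELL - 1 - i % CHARS_PER_CELL);
        dest[i / CHARS_PER_CELL] |= (cell)(unsigned char)s[i] << shift;
    }
}
```

For the string "hello" this model needs 6 cells unpacked (24 bytes) but only 2 cells packed (8 bytes): the first cell holds "hell" and the second holds "o" followed by zero bytes.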
At the same time, PAWN also supports unpacked strings, where each cell holds only a single character, with the purpose of supporting Unicode or other wide-character sets. The Unicode character set is usually represented as a 16-bit character set holding the roughly 60,000 characters of the Basic Multilingual Plane (BMP), with access to the other "planes" through escape codes (surrogate pairs in UTF-16). A PAWN script can hold any character of any plane in a single cell, since a cell is typically at least 32 bits wide, without needing escape codes.
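A small sketch of that difference, again in C rather than PAWN and again assuming 32-bit cells: a character outside the BMP needs two 16-bit units in UTF-16, but fits in one cell as a plain number. The function names `utf16_encode` and `cell_encode` are hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t cell;   /* assumption: 32-bit cells */

/* Encode one code point as UTF-16.
   Returns the number of 16-bit units written: 1 inside the BMP, 2 outside. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    /* Valid code points only: at most U+10FFFF and not a surrogate value. */
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;                               /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate: bottom 10 bits */
    return 2;
}

/* In an unpacked string the code point is stored as is: always one cell. */
size_t cell_encode(uint32_t cp, cell out[1])
{
    out[0] = (cell)cp;
    return 1;
}
```

For U+1F600, `utf16_encode` writes the pair 0xD83D 0xDE00, while `cell_encode` simply stores 0x1F600 in one cell.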
Many programming languages handle ASCII/ANSI character sets versus Unicode through their type system. A function then works on one type of string or the other, but the types cannot be mixed. PAWN, on the other hand, does not have types or a type system, but it can check, at run time, whether a string is packed or unpacked. This also enables you to write a single function that operates on both packed and unpacked strings. The functions in the String Manipulation Library have been constructed so that they work on both packed and unpacked strings.
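One way such a run-time check can work is sketched below in C, as a model only. It rests on two assumptions that the text above does not state: packed characters are stored with the first character in the most significant byte of the cell, and a string counts as packed when its first cell is larger than the largest value an unpacked character can have (0x00FFFFFF for 32-bit cells). As far as I know this matches the reference implementation, but treat it as an assumption. The names `is_packed`, `packed_char` and `string_length` are made up for this sketch; in PAWN scripts the String Manipulation Library offers `ispacked` for the check itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t cell;                  /* assumption: 32-bit cells */
#define CHARS_PER_CELL (sizeof(cell))
#define UNPACKED_MAX   0x00FFFFFFu      /* assumed limit for an unpacked character */

/* A non-empty packed string has a non-zero top byte in its first cell,
   so the cell value exceeds UNPACKED_MAX. An empty string reads as unpacked. */
int is_packed(const cell *s)
{
    return s[0] > UNPACKED_MAX;
}

/* Fetch character i of a packed string (most significant byte first). */
static unsigned packed_char(const cell *s, size_t i)
{
    unsigned shift = 8 * (unsigned)(CHARS_PER_CELL - 1 - i % CHARS_PER_CELL);
    return (unsigned)((s[i / CHARS_PER_CELL] >> shift) & 0xFFu);
}

/* One length function for both layouts: it checks the layout at run time. */
size_t string_length(const cell *s)
{
    size_t n = 0;
    if (is_packed(s)) {
        while (packed_char(s, n) != 0)
            n++;
    } else {
        while (s[n] != 0)
            n++;
    }
    return n;
}
```

Called on "hello" stored unpacked (six cells) or packed (two cells), `string_length` returns 5 in both cases, which is the idea behind writing one function for both kinds of string.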
This tutorial is meant to help you get more familiar with how PAWN is structured and how it handles strings.