[Include] i_lstr - Length-prefixed backward-compatible strings
#1

i_lstr
Download
About
This include introduces a new type of string, internally called an "lstring". Alongside the character data, an lstring also stores its own length, which means it can contain the '\0' character and its length can be obtained in constant time. The include also provides functions for working with lstrings much like normal strings: deleting, inserting, appending, and reading from or writing to files. Because of the way the length is stored, an lstring can be passed to normal SA-MP natives without any issue (although the native will only see the text up to the first null character).

Introduction
Null-terminated strings (which use the '\0' character to mark the end of a string) are a relic from the old days when memory was scarce. Although they can store a potentially unlimited number of characters, the null character itself cannot appear in them, which can become a security issue when data is silently cut short.

Pawn arrays are made of cells, each 4 bytes wide. However, most of you use unpacked strings, which fill only the lowest 8 bits (1 byte) of each cell, so your string data takes 4 times as much space as it needs to. In contrast, packed strings fit 4 characters into a single cell (starting from the most significant byte of the cell). Prefix any string literal with '!' (like !"Hello world") to make it a packed string.
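To make the size difference concrete, here is a minimal sketch that declares the same text both ways and prints how many cells each array takes. The variable names are only for illustration.

```pawn
#include <a_samp>

main()
{
    new unpacked[] = "Hello";   // one character per cell, plus the terminator: 6 cells
    new packed[] = !"Hello";    // four characters per cell: 6 bytes rounded up to 2 cells

    // sizeof counts cells, not characters.
    printf("unpacked: %d cells, packed: %d cells", sizeof(unpacked), sizeof(packed));

    // Square brackets index whole cells, curly braces index single bytes.
    printf("%c %c", unpacked[0], packed{0});
}
```

Both accesses on the last line read the same 'H', just from a different place in memory.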

You can pass both packed and unpacked strings to functions in SA-MP and they will work as usual. How does the server know which type of string you have provided? By looking at the most significant byte of the first cell. Packed strings store character data starting from this position, so it is always non-zero and holds a valid character (unless the string is empty, in which case it holds the null character).
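The same check can be sketched in Pawn itself. LooksPacked is a hypothetical helper written for this post, not something from the include or the server; the ispacked native from SA-MP's string.inc answers the same question.

```pawn
#include <a_samp>

// Hypothetical helper: tests the top byte of the first cell.
stock bool:LooksPacked(const string[])
{
    // A packed string keeps its first character in the most significant byte,
    // so a non-zero top byte means "packed".
    return (string[0] & 0xFF000000) != 0;
}

main()
{
    // Prints 0 for the unpacked literal and 1 for the packed one.
    printf("%d %d", _:LooksPacked("Hello"), _:LooksPacked(!"Hello"));
}
```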

However, if the top byte is zero, the server recognizes an unpacked string and ignores the two bytes in the middle of the cell. This leaves us with 65536 possible additional "metadata" values for every character. I decided to use them for the length of the string measured from that character's position to the end, but other values could be placed there too. Because the server simply ignores the additional data when calling native functions, natives work normally when given an lstring (it is still terminated with a null character for compatibility reasons). Because the length is stored at every position, a substring passed to a function (by indexing the string) is also a valid lstring. Thanks to this, you can instantly find the end of the string from any place in it.
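A rough sketch of that cell layout follows. The three helpers are hypothetical and exist only for this illustration; they are not the include's actual code. The sketch assumes the character sits in the lowest byte, the remaining length in the two middle bytes, and the top byte stays zero.

```pawn
#include <a_samp>

// Assumed layout: bits 0-7 = character, bits 8-23 = remaining length, bits 24-31 = zero.
stock MakeLCell(ch, remaining)
{
    return ((remaining & 0xFFFF) << 8) | (ch & 0xFF);
}

// The part a native would see: only the lowest byte.
stock LCellChar(value)
{
    return value & 0xFF;
}

// The metadata stored in the two middle bytes.
stock LCellRemaining(value)
{
    return (value >> 8) & 0xFFFF;
}

main()
{
    // First cell of "Hello": character 'H', 5 characters left from here.
    new value = MakeLCell('H', 5);
    printf("%c %d", LCellChar(value), LCellRemaining(value)); // prints "H 5"
}
```

With 16 bits of metadata, a layout like this can describe at most 65535 characters, which would explain why lstrnew can fail on strings that are too long.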

Functions
Code:
lstrnew(string[], length=-1)
Initializes a new lstring from the specified (unpacked) string variable. If length is not specified, the length of the null-terminated string is used. Returns true on success, false otherwise (for example, if the string is packed or too long).
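A short sketch of why the length parameter matters. It assumes the include file is named i_lstr.inc and that lstrnew keeps embedded null characters when the length is passed explicitly, which is how I read the description above.

```pawn
#include <a_samp>
#include <i_lstr> // assumed file name

main()
{
    // Five characters (one of them '\0') plus the terminator.
    new data[] = "ab\0cd";

    // strlen stops at the embedded null character.
    printf("strlen: %d", strlen(data));

    // Passing the length explicitly should keep all five characters.
    if (lstrnew(data, 5))
    {
        printf("lstrlen: %d", lstrlen(data));
    }
}
```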

Code:
strc(string[], pos)
Gives access to the character at the specified position in a string (unpacked or lstring) as if it were a variable. Use code like strc(string, 3) = 'A'; or new c = strc(string, 1); to read or write individual characters in a string.

Code:
lstrgetc(const lstring[], pos)
Obtains the character at the specified position in an lstring. Returns -1 on error.

Code:
lstrsetc(lstring[], pos, value)
Sets the character value at the specified position in an lstring. Returns false on error, true otherwise.

Code:
lstrsetlen(lstring[], newlength, padding='\0')
Crops or pads an lstring to the specified length, padding with the specified character (the null character by default). Returns true on success, false otherwise.
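A minimal usage sketch, assuming the include file is named i_lstr.inc. Since lstrsetlen takes no maxlength argument, the example assumes the caller has to make sure the array is big enough before padding.

```pawn
#include <a_samp>
#include <i_lstr> // assumed file name

main()
{
    new text[16] = "Hello";     // extra room so the string can grow
    lstrnew(text);

    lstrsetlen(text, 8, '!');   // pad with '!' up to 8 characters
    printf("%d %s", lstrlen(text), text);

    lstrsetlen(text, 2);        // crop to 2 characters
    printf("%d %s", lstrlen(text), text);
}
```

Going by the description, the first line printed should show a length of 8 and the second a length of 2.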

Code:
strtype(const string[])
Detects the type of a string (STRING_TYPE_EMPTY, STRING_TYPE_UNPACKED, STRING_TYPE_PACKED or STRING_TYPE_LSTRING).

Code:
lstrlen(const lstring[])
Returns the length of an lstring, or -1 on error.

Code:
stranylen(const string[])
Detects the type of the string and calls the appropriate function (lstrlen or strlen) to get its length.
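A sketch of how the two functions might be used together on a string of unknown type. DescribeString is a hypothetical helper written for this post, and the include file name is an assumption.

```pawn
#include <a_samp>
#include <i_lstr> // assumed file name

// Hypothetical helper: prints the detected type and length of any string.
stock DescribeString(const string[])
{
    switch (strtype(string))
    {
        case STRING_TYPE_EMPTY: print("empty string");
        case STRING_TYPE_UNPACKED: print("unpacked string");
        case STRING_TYPE_PACKED: print("packed string");
        case STRING_TYPE_LSTRING: print("lstring");
    }
    // stranylen picks lstrlen or strlen depending on the detected type.
    printf("length: %d", stranylen(string));
}

main()
{
    new plain[] = "Hello";
    new packed[] = !"Hello";
    new ltext[] = "Hello";
    lstrnew(ltext);

    DescribeString(plain);
    DescribeString(packed);
    DescribeString(ltext);
}
```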

Code:
lstrdel(lstring[], start, end)
Deletes a part of an lstring (from start to end, inclusive). Returns the number of characters deleted, or -1 on error.

Code:
lstrins(lstring[], const substr[], pos, maxlength=sizeof(lstring))
Inserts a substring (of any type) at a given position in an lstring. Returns the number of inserted characters, or -1 on error.

Code:
lstrcat(ldest[], const source[], maxlength=sizeof(ldest))
Appends a string (of any type) to a given lstring (at the end). Returns the number of appended characters, or -1 on error.
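A minimal sketch of appending and inserting, assuming the include file is named i_lstr.inc and that both functions behave as described above.

```pawn
#include <a_samp>
#include <i_lstr> // assumed file name

main()
{
    new text[32] = "Hello";     // leave room for the added text
    lstrnew(text);

    lstrcat(text, " world");    // append at the end: "Hello world"
    lstrins(text, "big ", 6);   // insert before "world": "Hello big world"

    printf("%d %s", lstrlen(text), text);
}
```

The default maxlength works here because text is a fixed-size array, so sizeof can see its size. For an array received as a function parameter, pass the size explicitly.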

Code:
lfwrite(File:handle, lstring[])
Writes an lstring (in ANSI format) to a file.

Code:
lfread(File:handle, lstring[], size=sizeof(lstring))
Reads a line from a file and stores it in an lstring variable.
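A sketch of a write and read round trip. The file name is made up, the include file name is an assumption, and the descriptions above do not say what these two functions return, so the example does not check their results.

```pawn
#include <a_samp>
#include <i_lstr> // assumed file name

main()
{
    // Write one lstring to a file in scriptfiles.
    new File:f = fopen("lstr_test.txt", io_write);
    if (f)
    {
        new line[32] = "Hello";
        lstrnew(line);
        lfwrite(f, line);
        fclose(f);
    }

    // Read the line back into an lstring buffer.
    f = fopen("lstr_test.txt", io_read);
    if (f)
    {
        new buffer[64];
        lfread(f, buffer);
        printf("%d %s", stranylen(buffer), buffer);
        fclose(f);
    }
}
```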

Example
Code:
new text[] = "Hello 1 2 3";
lstrnew(text); //Initializes an lstring
strc(text, 5) = '\0'; //Changes the space at text[5] to '\0'
lstrdel(text, 0, 4); //Deletes the first 5 characters
lstrins(text, "Hi", 0); //Inserts "Hi" at their place
printf("%d %d %s", lstrlen(text), lstrgetc(text, 2), text);
This outputs "8 0 Hi", and text now holds "Hi", a null character, and "1 2 3" (8 characters in total).

Download
At the top of this topic.
#2

Very nice job!

Thanks for sharing it!
#3

What about text like привет?
#4

Quote:
Originally Posted by OneDay
What about text like привет?
SA-MP uses ANSI strings, which means characters with codes from 128 to 255 are displayed according to your system's language settings. The letters in "привет" come from the upper half of the character set, so players from different countries will probably see gibberish instead of these characters. But with ANSI and the correct codepage, they are supported by this include, if that's what you are asking.

I would love to be able to use characters like "č", "п" and "銭" in one string, but unfortunately neither the SA-MP client nor the server supports this. However, thanks to the technique I use in this include, they can be represented in Pawn using encodings like UTF-8, UTF-16 or UTF-32. I am going to create a plugin that can handle Unicode strings some day.