[Include] y_stringhash - Fast string comparisons.
#1

Introduction

A hash is a unique numerical representation of a string. I'm sure you all know numbers are easier to compare than strings, so this uses hashes to compare strings. Example:

Old code:

Код:
if (!strcmp(str, "first"))
{
    // Do first code.
}
else if (!strcmp(str, "second"))
{
    // Do second code.
}
// etc...
New code:

Код:
switch (YHash(str))
{
    case _H<first>:
    {
        // Do first code.
    }
    case _H<second>:
    {
        // Do second code.
    }
}
Download

This library is a part of YSI, which can be found here. Keep your eye on that topic and your server log for updates.

YSI Download Topic

Functions

There are three main functions (one function, two macros):
  • YHash - This takes a string, for example "cmdtext" in "OnPlayerCommandText" and converts it to a hash. This does runtime strings - that is, strings which are not known at compile time (things people type etc). This returns a value which can be compared to the compile time hashed strings.

    Код:
    YHash(str[], bool:sensitive = true, e_HASH_TYPE:type = bernstein);
    The first parameter is the string to hash. The second parameter is optional, if you set it to "false" the system will do a case-insensitive hash, so "Hello" will match "hello", "hELlo" etc...

    The third parameter is the algorithm used to hash the string. The default is, as before, bernstein. The other options (see below) are "fnv1" and "fnv1a".
  • _H - This is the macro to generate a compile time constant hash. This version is used for case sensitive comparisons. A compile time constant is something which is known when you compile the script. The old version of this had words in round brackets with every letter separated by commas. The new version uses angle brackets and no commas. The new version however is slightly less flexible - it doesn't allow quite as long strings and doesn't support spaces. If you have a space in your word only the first word will be hashed. Note that you can mix and match which version you use if you need longer words or spaces.

    Код:
    abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_
    That's all letters, all numbers, spaces and underscores. Example:

    Код:
    printf("The hash of \"hello\" is: %d", _H<hello>);
  • _I - This is the same as "_H" but gives case insensitive hashes. Note - you should use this whenever possible.
Example:

Код:
switch (YHash(cmd, false))
{
    case _I<ban>:
        // Do the "/ban" command.
    case _I<kick>:
        // Do the "/kick" command.
    case _I<spawn>:
        // Do the "/spawn" command.
}
Update:

This library now includes bernstein, FNV1 and FNV1a hashes. If you get a collision (which the compiler will signal by a "duplicate case label" error), try a different hash. Bernstein is "b", FNV1 is "f" and FNV1a is "a". These letters are placed after the "_H" or "_I" with an "@":

Код:
switch (YHash(cmd, true, fnv1))
{
    case _H@f<ban>:
        // Do the "/ban" command.
    case _H@f<kick>:
        // Do the "/kick" command.
    case _H@f<spawn>:
        // Do the "/spawn" command.
}
Make sure your YHash and _H@ or _I@ match algorithm, otherwise the checks may fail even if they're true.

Important note: If your compiler hangs, consider using the old style on longer strings. This is not well suited to hashing long strings. For example:

Код:
case _I<color22>:
That will not compile because the generated line will be too long (but don't worry, the lines get optimised well, it's just a problem in the intermediate stages). On the other hand this will compile and will give the same answer:

Код:
case _I(c,o,l,o,r,2,2):
This way just doesn't look quite as nice.

Credits
I am reposting this include (made by ******) with the thought of what he said to several members of the community. Everything that could help someone should not be deleted from the forums, which is something I agree with since I learned a lot by reading the tutorials on here. He made a lot of things that are used by lots of servers and that knowledge should not be lost for present and future developers.
Reply
#2

*deleted*
Reply
#3

This is even slower then using strcmp...
Код:
Comparing a string 1000 times in a row using switch().
Loop finished after: 6 milliseconds.
..
Comparing a string 1000 times in a row using strcmp.
Loop finished after: 2 milliseconds.
Код:
Comparing a string 100,000 times in a row using switch().
Loop finished after: 330 milliseconds.
..
Comparing a string 100,000 times in a row using strcmp.
Loop finished after: 134 milliseconds.
Код:
Comparing a string 500,000 times in a row using switch().
Loop finished after: 1636 milliseconds.
..
Comparing a string 500,000 times in a row using strcmp.
Loop finished after: 664 milliseconds.
Code used:
PHP код:
#include <a_samp>
#include <YSI\y_stringhash>
main() {
    
printf("Comparing a string 1000 times in a row using switch().");
    new 
count GetTickCount();
    for(new 
01001i++) {
        new 
string[144];
        
format(string144"something random (%i)"random(10000));
        switch(
YHash(string)) {
            case 
_H<dfsd>: {
                print(
"waow");
            }
            case 
_H<fghfggfhgf>: {
                print(
"waow");
            }
            case 
_H<fghfsdfsdfggfhgf>: {
                print(
"waow");
            }
        }
    }
    
printf("Loop finished after: %i milliseconds."GetTickCount()-count);
    print(
"..");
    
printf("Comparing a string 1000 times in a row using strcmp.");
    
count GetTickCount();
    for(new 
01001i++) {
        new 
string[144];
        
format(string144"something random (%i)"random(10000));
        if(!
strcmp(string"dfsd")) {
            print(
"waow");
        }
        else if(!
strcmp(string"fghfggfhgf")) {
            print(
"waow");
        }
        else if(!
strcmp(string"fghfsdfsdfggfhgf")) {
            print(
"waow");
        }
    }
    
printf("Loop finished after: %i milliseconds."GetTickCount()-count);

(Sorry if I missunderstod this include.)
Reply
#4

Amazing include.
Reply
#5

****** is right but I don't understand the purpose of comparing a lot of strings with this way
PHP код:
main()
{
    new 
count,
        
MAX_LOOP 100000,
        
something;
    print(
"Starting...");
    
count GetTickCount();
    for(new 
iMAX_LOOPi++)
    {
        switch(
YHash("Arrestation"))
        {
            case 
_H<Entreprise> : print("Oki");
            case 
_H<Restaurent> : print("Oki");
            case 
_H<Bar> : print("Oki");
            case 
_H<Vehicule> : print("Oki");
            case 
_H<Bar1> : print("Oki");
            case 
_H<Vehicule1> : print("Oki");
            case 
_H<Bar2> : print("Oki");
            case 
_H<Vehicule2> : print("Oki");
            case 
_H<Bar3> : print("Oki");
            case 
_H<Vehicule3> : print("Oki");
            case 
_H<Bar4> : print("Oki");
            case 
_H<Vehicule4> : print("Oki");
            case 
_H<Bar5> : print("Oki");
            case 
_H<Vehicule5> : print("Oki");
            case 
_H<Arrestation> : something++;
        }
    }
    
count GetTickCount() - count;
    
printf("[1] Finished in %ims"count);
    
count GetTickCount();
    for(new 
iMAX_LOOPi++)
    {
        if(!
strcmp("some_type""Entreprise"))
            print(
"Oki");
        else if(!
strcmp("some_type""Restaurent"))
            print(
"Oki");
        else if(!
strcmp("some_type""Bar"))
            print(
"Oki");
        else if(!
strcmp("some_type""Vehicule"))
            print(
"Oki");
        else if(!
strcmp("some_type""Bar"))
            print(
"Oki");
        else if(!
strcmp("some_type""Vehicule"))
            print(
"Oki");
        else if(!
strcmp("some_type""Bar"))
            print(
"Oki");
        else if(!
strcmp("some_type""Vehicule"))
            print(
"Oki");
        else if(!
strcmp("some_type""Bar"))
            print(
"Oki");
        else if(!
strcmp("some_type""Vehicule"))
            print(
"Oki");
        else if(!
strcmp("some_type""Bar"))
            print(
"Oki");
        else if(!
strcmp("some_type""Vehicule"))
            print(
"Oki");
        else if(!
strcmp("Arrestation""Arrestation"))
            
something++;
    }
    
count GetTickCount() - count;
    
printf("[2] Finished in %ims"count);

Код:
[21:00:16] [1] Finished in 65ms
[21:00:16] [2] Finished in 71ms
Reply
#6

I really like to see this include back on forum!
Reply
#7

Good job.
Reply
#8

What's the "YHash" function string limit? Seems like I can't go over 16 characters. I am trying to do: 'case _H<COLT45_CROUCHFIRE>:'
Reply
#9

Quote:
Originally Posted by Dayrion
Посмотреть сообщение
****** is right but I don't understand the purpose of comparing a lot of strings with this way

Код:
[21:00:16] [1] Finished in 65ms
[21:00:16] [2] Finished in 71ms
The purpose is simple - its faster in runtime.

As the first few lines of the thread say "its faster to compare integer against integer than string"

Hash by itself is a bit of slowdown, I am not sure why ****** didn't save the results (of _H<>) in an array (maybe because of proper code readability), but if you have lots of conditional strings stacked one after another in a linear manner, using y_stringhash can surely speed things up.

Your code does what Meller's one does and you can't really see the benefit of it that way, try to stack all of the conditional one after another without using loop and then benchmark it.


Quote:
Originally Posted by Riddick94
Посмотреть сообщение
What's the "YHash" function string limit? Seems like I can't go over 16 characters. I am trying to do: 'case _H<COLT45_CROUCHFIRE>:'
I guess the only solution is breaking them into two and adding them back together (?)
Reply
#10

I think there is some confusion here - YHash and _H are not the same thing. _H is a compile-time macro that converts constant strings in to constant numbers. The results are not cached in an array as you suggested because there is no need, they are literally just numbers in the compiled code, no strings or hashings are present at run-time. YHash is the run-time equivalent, it takes unknown strings and hashes them while the server is running, for comparison with the pre-computed hashes of _H.

As for the length limits, I suspect you are not actually asking about YHash, which has no length limit for the reasons I just said, but _H, whose length limits are dictated by the compiler's line length limits. If you aren't already using Zeex's compiler, you should be, so try that on instead - it has a vastly increased line length limit. If you are, you could just be trying to has a very complex string, in which case I point you to this quote:

Quote:
Originally Posted by corne
Посмотреть сообщение
Important note: If your compiler hangs, consider using the old style on longer strings. This is not well suited to hashing long strings. For example:

Код:
case _I<color22>:
That will not compile because the generated line will be too long (but don't worry, the lines get optimised well, it's just a problem in the intermediate stages). On the other hand this will compile and will give the same answer:

Код:
case _I(c,o,l,o,r,2,2):
This way just doesn't look quite as nice.
The same applies to _H<> and _H(). The () syntax was the original version before I figured out how to parse arbitrary strings, but I kept it because the macros were MUCH simpler and can handle vastly longer strings. That second syntax is also the only one that supports spaces - you can't have them in the <> syntax.

Also, adding two hashes probably will not give the same result as hashing the concatenation of the two strings (in fact I know it won't - that would totally break the security of hashes, though these are not cryptographically secure ones, just basic comparison ones).
Reply
#11

Quote:
Originally Posted by ******
Посмотреть сообщение
I think there is some confusion here - YHash and _H are not the same thing. _H is a compile-time macro that converts constant strings in to constant numbers. The results are not cached in an array as you suggested because there is no need, they are literally just numbers in the compiled code, no strings or hashings are present at run-time. YHash is the run-time equivalent, it takes unknown strings and hashes them while the server is running, for comparison with the pre-computed hashes of _H.

As for the length limits, I suspect you are not actually asking about YHash, which has no length limit for the reasons I just said, but _H, whose length limits are dictated by the compiler's line length limits. If you aren't already using Zeex's compiler, you should be, so try that on instead - it has a vastly increased line length limit. If you are, you could just be trying to has a very complex string, in which case I point you to this quote:



The same applies to _H<> and _H(). The () syntax was the original version before I figured out how to parse arbitrary strings, but I kept it because the macros were MUCH simpler and can handle vastly longer strings. That second syntax is also the only one that supports spaces - you can't have them in the <> syntax.

Also, adding two hashes probably will not give the same result as hashing the concatenation of the two strings (in fact I know it won't - that would totally break the security of hashes, though these are not cryptographically secure ones, just basic comparison ones).
Very interesting. I have never heard of Zeex's PAWN Compiler patches. Thanks for letting me know and explaining everything deeply. I have always liked you for that.

Everything works as it should.
Reply


Forum Jump:


Users browsing this thread: 2 Guest(s)