PawnScraper -
SyS - 12.11.2018
PawnScraper
Installing
Thanks to Southclaws, plugin installation is now much easier with sampctl:
PHP Code:
sampctl p install Sreyas-Sreelal/pawn-scraper
OR
- Download suitable binary files from releases for your operating system
- Add it to your plugins folder
- Add PawnScraper to the plugins line in server.cfg (PawnScraper.so on Linux)
- Add pawnscraper.inc to your includes folder
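For the manual route, the server.cfg entry might look like the line below. This is only a sketch that assumes PawnScraper is the sole plugin being loaded on a Linux server; if a plugins line already exists, the name would be appended to that line instead.

```
plugins PawnScraper.so
```

On Windows the same line would read plugins PawnScraper, without the extension.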
Building
- Clone the repo
PHP Code:
git clone https://github.com/Sreyas-Sreelal/pawn-scraper.git
- Compile the plugin using the nightly Rust toolchain
- Windows
PHP Code:
cargo +nightly-i686-pc-windows-msvc build --release
- Linux
PHP Code:
cargo +nightly-i686-unknown-linux-gnu build --release
API
- ParseHtmlDocument(document[])
- Params
- document[] - string of html document
- Returns
- Html document instance id
- INVALID_HTML_DOC if the document could not be parsed
- Example Usage
PHP Code:
new Html:doc = ParseHtmlDocument("\
<!DOCTYPE html>\
<meta charset=\"utf-8\">\
<title>Hello, world!</title>\
<h1 class=\"foo\">Hello, <i>world!</i></h1>\
");
ASSERT(doc != INVALID_HTML_DOC);
DeleteHtml(doc);
- ResponseParseHtml(Response:id)
- Params
- id - Http response id returned from HttpGet
- Returns
- Html document instance id
- INVALID_HTML_DOC if the document could not be parsed
- Example Usage
PHP Code:
new Response:response = HttpGet("https://www.sa-mp.com");
new Html:doc = ResponseParseHtml(response);
ASSERT(doc != INVALID_HTML_DOC);
DeleteHtml(doc);
- HttpGet(url[],Header:headerid=INVALID_HEADER)
- Params
- url[] - Url of a website
- headerid - id of a header object created using CreateHeader (optional)
- Returns
- Response id if successful
- INVALID_HTTP_RESPONSE if the request failed
- Example Usage
PHP Code:
new Response:response = HttpGet("https://www.sa-mp.com");
ASSERT(response != INVALID_HTTP_RESPONSE);
DeleteResponse(response);
- HttpGetThreaded(playerid,callback[],url[],Header:headerid=INVALID_HEADER)
- Params
- playerid - id of the player
- callback[] - name of the callback function to handle the response.
- url[] - Url of a website
- headerid - id of a header object created using CreateHeader (optional)
- Example Usage
PHP Code:
HttpGetThreaded(0,"MyHandler","https://sa-mp.com");
//********
forward MyHandler(playerid,Response:responseid);
public MyHandler(playerid,Response:responseid){
ASSERT(responseid != INVALID_HTTP_RESPONSE);
DeleteResponse(responseid);
}
- ParseSelector(string[])
- Params
- string[] - CSS selector string
- Returns
- Selector instance id if successful
- INVALID_SELECTOR if the selector could not be parsed
- Example Usage
PHP Code:
new Selector:selector = ParseSelector("h1 .foo");
ASSERT(selector != INVALID_SELECTOR);
DeleteSelector(selector);
- CreateHeader(…)
- Params
- key, value pairs of string type
- Returns
- Header instance id if successful
- INVALID_HEADER if the header could not be created
- Example Usage
PHP Code:
new Header:header = CreateHeader(
"User-Agent","Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
);
ASSERT(header != INVALID_HEADER);
new Response:response = HttpGet("https://sa-mp.com/",header);
ASSERT(response != INVALID_HTTP_RESPONSE);
ASSERT(DeleteHeader(header) == 1);
DeleteResponse(response);
- GetNthElementName(Html:docid,Selector:selectorid,idx,string[],size = sizeof(string))
- Params
- docid - Html instance id
- selectorid - CSS selector instance id
- idx - the n’th occurrence of the element in the document (starts from 0)
- string[] - buffer in which the element name is stored
- size - sizeof string
- Returns
- 1 if successful
- 0 if failed
- Example Usage
PHP Code:
new Html:doc = ParseHtmlDocument("\
<!DOCTYPE html>\
<meta charset=\"utf-8\">\
<title>Hello, world!</title>\
<h1 class=\"foo\">Hello, <i>world!</i></h1>\
");
ASSERT(doc != INVALID_HTML_DOC);
new Selector:selector = ParseSelector("i");
ASSERT(selector != INVALID_SELECTOR);
new i= -1,element_name[10];
while(GetNthElementName(doc,selector,++i,element_name)!=0){
ASSERT(strcmp(element_name,"i") == 0);
}
DeleteSelector(selector);
DeleteHtml(doc);
- GetNthElementText(Html:docid,Selector:selectorid,idx,string[],size = sizeof(string))
- Params
- docid - Html instance id
- selectorid - CSS selector instance id
- idx - the n’th occurrence of the element in the document (starts from 0)
- string[] - buffer in which the element text is stored
- size - sizeof string
- Returns
- 1 if successful
- 0 if failed
- Example Usage
PHP Code:
new Html:doc = ParseHtmlDocument("\
<!DOCTYPE html>\
<meta charset=\"utf-8\">\
<title>Hello, world!</title>\
<h1 class=\"foo\">Hello, <i>world!</i></h1>\
");
ASSERT(doc != INVALID_HTML_DOC);
new Selector:selector = ParseSelector("h1.foo");
ASSERT(selector != INVALID_SELECTOR);
new element_text[20];
ASSERT(GetNthElementText(doc,selector,0,element_text) == 1);
new check = strcmp(element_text,("Hello, world!"));
ASSERT(check == 0);
DeleteSelector(selector);
DeleteHtml(doc);
- GetNthElementAttrVal(Html:docid,Selector:selectorid,idx,attribute[],string[],size = sizeof(string))
- Params
- docid - Html instance id
- selectorid - CSS selector instance id
- idx - the n’th occurrence of the element in the document (starts from 0)
- attribute[] - the attribute of the element
- string[] - buffer in which the attribute value is stored
- size - sizeof string
- Returns
- 1 if successful
- 0 if failed
- Example Usage
PHP Code:
new Html:doc = ParseHtmlDocument("\
<!DOCTYPE html>\
<meta charset=\"utf-8\">\
<title>Hello, world!</title>\
<h1 class=\"foo\">Hello, <i>world!</i></h1>\
");
ASSERT(doc != INVALID_HTML_DOC);
new Selector:selector = ParseSelector("h1");
ASSERT(selector != INVALID_SELECTOR);
new element_attribute[20];
ASSERT(GetNthElementAttrVal(doc,selector,0,"class",element_attribute) == 1);
new check = strcmp(element_attribute,("foo"));
ASSERT(check == 0);
DeleteSelector(selector);
DeleteHtml(doc);
- DeleteHtml(Html:id)
- Params
- id - html instance to be deleted
- Returns
- 1 if successful
- 0 if failed
- DeleteSelector(Selector:id)
- Params
- id - selector instance to be deleted
- Returns
- 1 if successful
- 0 if failed
- DeleteResponse(Response:id)
- Params
- id - response instance to be deleted
- Returns
- 1 if successful
- 0 if failed
- DeleteHeader(Header:id)
- Params
- id - header instance to be deleted
- Returns
- 1 if successful
- 0 if failed
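Since the GetNth* functions take a zero-based idx and return 0 on failure, counting the elements that match a selector could be sketched roughly as below. CountMatches is a hypothetical helper name, not part of the plugin, and the sketch assumes that running past the last match is what makes GetNthElementName return 0 and that element names fit in a 32-cell buffer.

```pawn
#include <a_samp>
#include <pawnscraper>

// Hypothetical helper (not part of PawnScraper): counts the elements in doc
// that match selector by probing idx 0, 1, 2, ... until the lookup fails.
stock CountMatches(Html:doc, Selector:selector)
{
    new count, name[32];
    while (GetNthElementName(doc, selector, count, name))
    {
        count++;
    }
    return count;
}
```

The caller still owns doc and selector and is expected to delete them afterwards.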
Example Usage
A small example to fetch all links in wiki.sa-mp.com
PHP Code:
new Response:response = HttpGet("https://wiki.sa-mp.com");
if(response == INVALID_HTTP_RESPONSE){
printf("HTTP ERROR");
return;
}
new Html:html = ResponseParseHtml(response);
if(html == INVALID_HTML_DOC){
DeleteResponse(response);
return;
}
new Selector:selector = ParseSelector("a");
if(selector == INVALID_SELECTOR){
DeleteResponse(response);
DeleteHtml(html);
return;
}
new str[500],i;
while(GetNthElementAttrVal(html,selector,i,"href",str)){
printf("%s",str);
++i;
}
//delete created objects after the usage..
DeleteHtml(html);
DeleteResponse(response);
DeleteSelector(selector);
The same example with a threaded HTTP call would be:
PHP Code:
HttpGetThreaded(0,"MyHandler","https://wiki.sa-mp.com");
//...
forward MyHandler(playerid,Response:responseid);
public MyHandler(playerid,Response:responseid){
if(responseid == INVALID_HTTP_RESPONSE){
printf("HTTP ERROR");
return 0;
}
new Html:html = ResponseParseHtml(responseid);
if(html == INVALID_HTML_DOC){
DeleteResponse(responseid);
return 0;
}
new Selector:selector = ParseSelector("a");
if(selector == INVALID_SELECTOR){
DeleteResponse(responseid);
DeleteHtml(html);
return 0;
}
new str[500],i;
while(GetNthElementAttrVal(html,selector,i,"href",str)){
printf("%s",str);
++i;
}
DeleteHtml(html);
DeleteResponse(responseid);
DeleteSelector(selector);
return 1;
}
More examples can be found in examples
Repository
https://github.com/Sreyas-Sreelal/pawn-scraper
Note
The plugin is at an early stage, and more tests and features still need to be added. I’m open to any kind of contribution; just open a pull request if you have anything to improve or new features to add.
Special thanks
Re: PawnScraper -
Gabriel432135 - 12.11.2018
cool
Re: PawnScraper -
kristo - 12.11.2018
hot
.
Re: PawnScraper -
Ermanhaut - 12.11.2018
This is really good.
Re: PawnScraper -
Chaprnks - 15.11.2018
Amazing! Finally a well-rounded solution to the HTTP() function
Re: PawnScraper -
SyS - 24.11.2018
New version released!
https://github.com/Sreyas-Sreelal/pa...ases/tag/0.1.0
Changes
- Added HttpGetThreaded
- Changed reqwest to minihttp
- Smaller binary
Still might need more tests, but the basic functionality is working as expected. Big thanks to Eva, who patiently listened to my questions and doubts and gave me guidance in certain parts.
Usage of HttpGetThreaded
pawn Code:
HttpGetThreaded(0,"MyHandler","https://wiki.sa-mp.com");
//...
forward MyHandler(playerid,Response:responseid);
public MyHandler(playerid,Response:responseid){
if(responseid == INVALID_HTTP_RESPONSE){
printf("HTTP ERROR");
return 0;
}
new Html:html = ResponseParseHtml(responseid);
if(html == INVALID_HTML_DOC){
DeleteResponse(responseid);
return 0;
}
new Selector:selector = ParseSelector("a");
if(selector == INVALID_SELECTOR){
DeleteResponse(responseid);
DeleteHtml(html);
return 0;
}
new str[500],i;
while(GetNthElementAttrVal(html,selector,i,"href",str)){
printf("%s",str);
++i;
}
DeleteHtml(html);
DeleteResponse(responseid);
DeleteSelector(selector);
return 1;
}
Re: PawnScraper -
Infin1ty - 24.11.2018
no
no you didnt
:O
Re: PawnScraper -
AmirSavand - 26.11.2018
SAMP HTTP requests are known to fail without a reason, so do the HTTP calls here always succeed without bugs?
Re: PawnScraper -
SyS - 26.11.2018
Quote:
Originally Posted by AmirSavand
SAMP HTTP requests are known to fail without a reason, so do the HTTP calls here always succeed without bugs?
|
HTTP requests are working fine as per the tests; if you encounter any bugs, open an issue on GitHub. But do note that the main scope of this plugin is not sending HTTP requests (the plugin can only send GET requests) but parsing HTML documents and using CSS selectors. Southclaws' requests plugin already gives a better solution for HTTP requests.
Re: PawnScraper -
fiki574 - 26.11.2018
Nice work!
However, is there any way to send a HTTP request towards the SAMP server instead of only external URLs?
Re: PawnScraper -
IllidanS4 - 27.11.2018
Quote:
Originally Posted by fiki574
Nice work!
However, is there any way to send a HTTP request towards the SAMP server instead of only external URLs?
|
How are you supposed to send an HTTP request to a SA-MP server? You may try HttpGet("http://localhost"); if you have something listening on HTTP there.
Anyway, how does this plugin handle cleanup of created objects (responses, selectors etc.)?
Re: PawnScraper -
SyS - 27.11.2018
Quote:
Originally Posted by IllidanS4
Anyway, how does this plugin handle cleanup of created objects (responses, selectors etc.)?
|
Cleanup is done through the "Delete" functions. They are called automatically through destructors when a created variable goes out of scope. But that won't work for variables with global or static lifetime; users have to call these functions manually in those cases.
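A rough sketch of the manual case follows. It assumes the include is pulled in as <pawnscraper> and that OnGameModeInit and OnGameModeExit are suitable places to create and free the object; g_LinkSelector is only an illustrative name.

```pawn
#include <a_samp>
#include <pawnscraper>

// Global lifetime: no destructor runs for this variable,
// so it has to be deleted by hand.
new Selector:g_LinkSelector = INVALID_SELECTOR;

public OnGameModeInit()
{
    // Assign the result directly to the global instead of copying it
    // from a local variable.
    g_LinkSelector = ParseSelector("a");
    return 1;
}

public OnGameModeExit()
{
    if (g_LinkSelector != INVALID_SELECTOR)
    {
        DeleteSelector(g_LinkSelector);
        g_LinkSelector = INVALID_SELECTOR;
    }
    return 1;
}
```

Resetting the variable to INVALID_SELECTOR after deletion keeps a later check from treating a freed id as valid.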
Re: PawnScraper -
fiki574 - 27.11.2018
Quote:
Originally Posted by IllidanS4
How are you supposed to send an HTTP request to a SA-MP server? You may try HttpGet("http://localhost"); if you have something listening on HTTP there.
|
Maybe this plugin has an implementation for starting a HTTP listener with the SAMP server, so I could (for example) send GET requests from an external app towards that listener and parse some in-game stuff I want to the response.
Re: PawnScraper -
SyS - 27.11.2018
Quote:
Originally Posted by fiki574
Maybe this plugin has an implementation for starting a HTTP listener with the SAMP server, so I could (for example) send GET requests from an external app towards that listener and parse some in-game stuff I want to the response.
|
That's not what this plugin is about...
Re: PawnScraper -
IllidanS4 - 27.11.2018
Quote:
Originally Posted by SyS
Cleanup is done through the "Delete" functions. They are called automatically through destructors when a created variable goes out of scope. But that won't work for variables with global or static lifetime; users have to call these functions manually in those cases.
|
I am not sure using a destructor is safe in this case. First, you ignore the size parameter, so arrays of these objects will not be destroyed properly. Second, imagine this code:
pawn Code:
new Response:globalResp;
main()
{
new Response:resp = HttpGet("https://wiki.sa-mp.com");
if(...)
{
globalResp = resp;
}
}
When resp goes out of scope, globalResp will become invalid as well (and could potentially refer to a completely different response after a while, depending on your implementation).
Re: PawnScraper -
SyS - 27.11.2018
Quote:
Originally Posted by IllidanS4
I am not sure using a destructor is safe in this case. First, you ignore the size parameter, so arrays of these objects will not be destroyed properly. Second, imagine this code:
pawn Code:
new Response:globalResp;
main()
{
new Response:resp = HttpGet("https://wiki.sa-mp.com");
if(...)
{
globalResp = resp;
}
}
When resp goes out of scope, globalResp will become invalid as well (and could potentially refer to a completely different response after a while, depending on your implementation).
|
Yes, you are right, that will result in a fault. I think I should change my approach then. Something like a borrow check, or overloading the = operator to make a clone. I don't know whether either of them is possible, though.
Re: PawnScraper -
fiki574 - 27.11.2018
Quote:
Originally Posted by SyS
That's not what this plugin is about...
|
That's why I was asking this question.
Thanks for the clarification.
Re: PawnScraper -
fordawinzz - 01.12.2018
Can I get data from a tag that 'has' a class, like:
Code:
<span class="some_class_here">data_i_want_to_get</span>
using this plugin?
I'm not into HTML parsing so I don't know yet how to work with this. Thank you.
Re: PawnScraper -
SyS - 01.12.2018
Quote:
Originally Posted by fordawinzz
Can I get data from a selector that 'has' a class, like:
Code:
<span class="some_class_here">data_i_want_to_get</span>
using this plugin?
I'm not into HTML parsing so I don't know yet how to work with this. Thank you.
|
Use
PHP Code:
new Selector:selectclass = ParseSelector(".some_class_here");
new data[20];
GetNthElementText(your_html_doc,selectclass,0,data);//now data will have the text
DeleteSelector(selectclass);
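A fuller sketch of the same idea is below. It assumes the markup from your post is parsed directly with ParseHtmlDocument (in practice it would more likely come from HttpGet and ResponseParseHtml) and that a 32-cell buffer is large enough for the text.

```pawn
#include <a_samp>
#include <pawnscraper>

main()
{
    // The markup from the question, parsed as a standalone document.
    new Html:doc = ParseHtmlDocument("<span class=\"some_class_here\">data_i_want_to_get</span>");
    if (doc == INVALID_HTML_DOC)
    {
        return;
    }

    // "span.some_class_here" matches <span> elements carrying that class.
    new Selector:selector = ParseSelector("span.some_class_here");
    if (selector == INVALID_SELECTOR)
    {
        DeleteHtml(doc);
        return;
    }

    new data[32];
    if (GetNthElementText(doc, selector, 0, data))
    {
        printf("%s", data);
    }

    DeleteSelector(selector);
    DeleteHtml(doc);
}
```

Using ".some_class_here" alone would match any element with that class; prefixing the tag name narrows it to span elements.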
Re: PawnScraper -
Marshall32 - 01.12.2018
Thank you so much for this plugin! Also, big thanks for this example. Other ******* 2 mp3 solutions are not working anymore, but this plugin does it neatly.
Rep+=3;