This year’s Google Summer of Code brings us some new cool stuff.
One of the highlights is web crawling, which comes with the HTTP::UserAgent and IO::Socket::SSL modules!
You all know what web crawling is, so I'll describe it in just a few words:
Web crawling (as limited for the purposes of this article) is everything we do with websites from scripts by sending requests and receiving responses.
We can write a simple web crawler with just a few lines of code:
use HTTP::UserAgent;
my $ua = HTTP::UserAgent.new(useragent => 'firefox_linux');
my $response = $ua.get: "https://perl6advent.wordpress.com/";
Simple as that.
What might catch your attention is the second statement, where we pass the magic phrase ‘firefox_linux’.
It automatically generates the User-Agent header field for us. The rule is simple: just write ‘browser_system’ and the correct user agent will be used.
The list of predefined user agents can be found here, feel free to add more.
Next, we want to decode the content:
my $content = $response.decoded_content;
Now that the content is decoded, we can do whatever we want with it. It’s simple, isn’t it?
The most fascinating thing that comes with this project is SSL/TLS support, built on the OpenSSL library.
To use SSL/TLS we must install IO::Socket::SSL:
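For instance, one thing we might want from a fetched page is the list of links on it. Here is a minimal sketch of that idea. Note that extract-links is a hypothetical helper written for this example, not part of HTTP::UserAgent, and that the sample string merely stands in for the decoded content so the snippet runs without a network connection. A regex is also a rough shortcut that only catches double-quoted href attributes; it is no substitute for a real HTML parser.

```raku
# Hypothetical helper (not part of HTTP::UserAgent): collect the values of
# double-quoted href attributes found in an HTML string.
sub extract-links(Str $html) {
    # Find every href="..." and capture the text up to the closing quote.
    $html.match(/ 'href="' (<-["]>+) '"' /, :g).map({ $_[0].Str }).list
}

# Stand-in for the decoded content of a real response.
my $sample = '<a href="https://perl6.org/">Perl 6</a> <a href="/about">About</a>';

# Print each link on its own line.
.say for extract-links($sample);
```

In a real crawler we would pass the decoded content of the response to the helper instead of the sample string.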
panda install IO::Socket::SSL
And again, that’s it, simple.
All of that opens the door to modules like WWW::Mechanize, Net::GitHub and so on.
I really want these modules to be available in Perl 6. Do you want to help us write them? Join us. :)