perl自动获取网页上的信息

perl获取网页上的信息


perl自动上网,然后获取网页上的信息:



#!/usr/bin/perl -w
# Perl pragma to restrict unsafe constructs
use strict;
# use LWP::UserAgent model
use LWP::UserAgent;
 
# main function
sub main {
    # get params
    # @_  
    # Within a subroutine the array @_ contains the parameters passed to that subroutine. 
    # Inside a subroutine, @_ is the default array for the array operators push, pop, shift, and unshift.
    my $url = 'http://www.taobao.com';
    die "no url param!\n" unless $url;
 
    # create LWP::UserAgent object
    my $ua = LWP::UserAgent->new;
    # set connect timeout 
    $ua->timeout(20);
    # set User-Agent header
    $ua->agent("Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; SV1; .NET CLR 2.0.50727)");
    # send url use get mothed, and store response at var $resp
    my $resp = $ua->get($url);
 
    # check response
    if ($resp->is_success) {
        # get response content(html source code)
        my $content = $resp->decoded_content;
        # use Regex get page title from $content
        if ( $content =~ m{<title>(.*)</title>}si ) {
            # <title>(.+?)</title> (.+?) match title string, use () to store this str at a special variable $1 (this is a perl variable ),
            # The bracketing construct ( ... ) creates capture groups (also referred to as capture buffers). To refer to the current contents of a group later on, within the same pattern, use $1 for the first,$2 for the second, and so on.
            my $head = $1;
            print "find page title : $head\n";
        } else {
            print "no page title for url : $url\n";
        }
    } else {
#display status information and exit
        die $resp->status_line;
    }
}
 
# pass params to main function,
# @ARGV
# The array @ARGV contains the command-line arguments intended for the script.
 
main(@ARGV);


  • 发表于 2019-02-21 09:53
  • 阅读 ( 2498 )
  • 分类:perl

0 条评论

请先 登录 后评论
omicsgene
omicsgene

生物信息

702 篇文章

作家榜 »

  1. omicsgene 702 文章
  2. 安生水 351 文章
  3. Daitoue 167 文章
  4. 生物女学霸 120 文章
  5. xun 82 文章
  6. rzx 78 文章
  7. 红橙子 78 文章
  8. CORNERSTONE 72 文章