Go Crawler 1: Network Requests

Below are a few examples I found:

Example 1: fetch the HTML source of the Baidu home page:

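package main

import (
    "fmt"
    "io/ioutil"
    "net/http"
)

func main() {
    // Errors are ignored here for brevity. If the request fails, response
    // is nil and the deferred Close panics; Example 2 adds the checks.
    response, _ := http.Get("http://www.baidu.com")
    defer response.Body.Close()
    body, _ := ioutil.ReadAll(response.Body)
    fmt.Println(string(body))
}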

Example 2: the same request with some error checking added

Code from: https://gist.github.com/ijt/950790

package main

import (
    "fmt"
    "io/ioutil"
    "net/http"
    "os"
)

func main() {
    response, err := http.Get("http://www.baidu.com/")
    if err != nil {
        // The request itself failed (DNS, connection, and so on)
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
    defer response.Body.Close()

    contents, err := ioutil.ReadAll(response.Body)
    if err != nil {
        // Reading the body failed partway through
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
    fmt.Printf("%s\n", string(contents))
}

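The net/http package provides the functions Get, Post, and PostForm. All three go through the package's default client, so they work as a simple HTTP client with no further setup.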

The next simple example adds the log package (source: http://gameor.com/archives/178/golang%E5%8F%96%E9%93%BE%E6%8E%A5%E4%B8%8Ephp%E6%AF%94%E8%BE%83%E4%B8%8B/):

package main

import (
    "fmt"
    "io/ioutil"
    "log"
    "net/http"
)

func main() {
    res, err := http.Get("http://www.ghj1976.net/")
    if err != nil {
        log.Fatal(err)
    }
    defer res.Body.Close()
    robots, err := ioutil.ReadAll(res.Body)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("%s", robots)
}

Example 3: save the Baidu page to a local file:

http://david-je.iteye.com/blog/1602774

package main

import (
    "io"
    "log"
    "net/http"
    "os"
)

func main() {
    resp, err := http.Get("http://www.baidu.com")
    if err != nil {
        log.Fatal(err)
    }
    defer resp.Body.Close()
    if resp.StatusCode != http.StatusOK {
        log.Fatalf("unexpected status: %s", resp.Status)
    }

    // Create the file, truncating any copy left by an earlier run
    f, err := os.OpenFile("baidu.html", os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0644)
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    // Copy the body to the file in 1 KB chunks
    buf := make([]byte, 1024)
    for {
        n, err := resp.Body.Read(buf)
        if n > 0 {
            // Read may return data together with io.EOF, so write first
            if _, werr := f.Write(buf[:n]); werr != nil {
                log.Fatal(werr)
            }
        }
        if err == io.EOF {
            break
        }
        if err != nil {
            log.Fatal(err)
        }
    }
}
 
 
Other material worth borrowing from:

golang: batch-checking pages

 

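Besides using the Get, Post, and PostForm functions to build a simple client, you can also use http.Client and http.NewRequest to construct the request yourself and imitate a browser.

Example: requesting the Baidu page with common browser headers set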


package main

import (
    "fmt"
    "io/ioutil"
    "log"
    "net/http"
)

func main() {
    client := &http.Client{}
    request, err := http.NewRequest("GET", "http://www.baidu.com", nil)
    if err != nil {
        log.Fatal(err)
    }

    request.Header.Set("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8")
    request.Header.Set("Accept-Charset", "GBK,utf-8;q=0.7,*;q=0.3")
    // Accept-Encoding is left unset on purpose: the transport requests gzip
    // itself and decompresses transparently only when the caller does not set it.
    request.Header.Set("Accept-Language", "zh-CN,zh;q=0.8")
    request.Header.Set("Cache-Control", "max-age=0")
    request.Header.Set("Connection", "keep-alive")

    response, err := client.Do(request)
    if err != nil {
        log.Fatal(err)
    }
    defer response.Body.Close()

    if response.StatusCode == http.StatusOK {
        body, err := ioutil.ReadAll(response.Body)
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println(string(body))
    }
}

