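Let's first look at the example code. It starts at the first page of a gallery, saves the image shown on each page, and then follows the page's "next" link until no such link exists: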

```python
import os
import re
from time import sleep
from urllib.parse import urljoin

import faker
import requests
from lxml import etree

fake = faker.Faker()
base_url = "http://angelimg.spbeen.com"


def make_headers():
    # Random User-Agent plus a Referer, so the site serves pages and images normally
    return {"User-Agent": fake.user_agent(), "Referer": base_url + "/"}


def downloadHtml(url):
    # Fetch a page and return its HTML text
    response = requests.get(url, headers=make_headers(), timeout=10)
    return response.text


def get_next_link(url):
    # Find the "next page" anchor; return its absolute URL, or False on the last page
    html = etree.HTML(downloadHtml(url))
    next_url = html.xpath("//a[@class='ch next']/@href")
    if next_url:
        # urljoin copes with both relative and absolute hrefs
        return urljoin(url, next_url[0])
    return False


def getImgUrl(content):
    # Extract the image URL and the article title; None for whichever is missing
    html = etree.HTML(content)
    img_url = html.xpath('//*[@id="content"]/a/img/@src')
    title = html.xpath(".//div[contains(@class, 'article')]/h3/text()")
    return (img_url[0] if img_url else None), (title[0].strip() if title else None)


def saveImg(title, img_url):
    if img_url is None or title is None:
        return
    # Download first, so a failed request does not leave an empty .jpg behind
    content = requests.get(img_url, headers=make_headers(), timeout=10)
    # request_view(content)  # uncomment to inspect the response in a browser
    os.makedirs("txt", exist_ok=True)
    # Replace characters that are not allowed in file names
    safe_title = re.sub(r'[\\/:*?"<>|]', "_", str(title))
    with open(os.path.join("txt", safe_title + ".jpg"), "wb") as f:
        f.write(content.content)


def request_view(response):
    # Debug helper: save the response locally and open it in a browser.
    # The injected <base> tag makes relative links resolve against the original URL.
    import webbrowser
    from pathlib import Path

    base_tag = ('<head><base href="%s">' % response.url).encode()
    content = response.content.replace(b"<head>", base_tag)
    tmp_path = Path("tmp.html")
    tmp_path.write_bytes(content)
    webbrowser.open_new_tab(tmp_path.resolve().as_uri())


def crawl_img(url):
    img_url, title = getImgUrl(downloadHtml(url))
    saveImg(title, img_url)


if __name__ == "__main__":
    url = "http://angelimg.spbeen.com/ang/4968/1"
    while url:
        print(url)
        crawl_img(url)
        url = get_next_link(url)
        sleep(1)  # pause between requests so the server is not hammered
```
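The loop above stops only when a page has no "next" link, so a page that links back to an earlier one would keep it running forever. Below is a minimal, standard-library-only sketch of the same next-page traversal. It assumes that the "next" anchor carries the classes `ch` and `next`, as in the XPath above. It swaps lxml for `html.parser` and matches class tokens rather than the exact string `'ch next'`, so `class="next ch"` also counts. It resolves links with `urljoin` and adds a visited check and a page cap. `find_next_url`, `follow_pages` and the `fetch` callback are hypothetical names chosen for illustration, and the in-memory pages stand in for real HTTP responses:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class NextLinkParser(HTMLParser):
    """Record the href of the first <a> whose class list contains both 'ch' and 'next'."""

    def __init__(self):
        super().__init__()
        self.next_href = None

    def handle_starttag(self, tag, attrs):
        if tag != "a" or self.next_href is not None:
            return
        attr_map = dict(attrs)
        # Valueless attributes come through as None, hence the "or ''"
        classes = (attr_map.get("class") or "").split()
        if "ch" in classes and "next" in classes and attr_map.get("href"):
            self.next_href = attr_map["href"]


def find_next_url(page_url, html):
    """Return the absolute URL of the next page, or None on the last page."""
    parser = NextLinkParser()
    parser.feed(html)
    parser.close()
    if parser.next_href is None:
        return None
    # Resolve relative hrefs such as "/ang/1/2" against the current page URL
    return urljoin(page_url, parser.next_href)


def follow_pages(start_url, fetch, max_pages=100):
    """Follow next links from start_url; stop on the last page, a repeat, or max_pages."""
    visited = []
    url = start_url
    while url is not None and url not in visited and len(visited) < max_pages:
        visited.append(url)
        url = find_next_url(url, fetch(url))
    return visited


if __name__ == "__main__":
    # Hypothetical offline pages standing in for the real site
    demo_pages = {
        "http://example.com/ang/1/1": '<a class="ch next" href="/ang/1/2">next</a>',
        "http://example.com/ang/1/2": '<a class="ch next" href="/ang/1/3">next</a>',
        "http://example.com/ang/1/3": "<p>last page</p>",
    }
    print(follow_pages("http://example.com/ang/1/1", demo_pages.__getitem__))
    # prints ['http://example.com/ang/1/1', 'http://example.com/ang/1/2', 'http://example.com/ang/1/3']
```

Passing `fetch` in as a parameter keeps the traversal testable without network access. In the real crawler it would simply wrap `downloadHtml`, and each page would then be downloaded once instead of once per helper.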
Title: Python crawler code for fetching the next page - 創新互聯
Link: http://chinadenli.net/article34/shdse.html