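When you crawl a site, its server may notice that you keep hitting it and ban your IP. At that point you need to switch to a different IP, which is why building an IP pool matters so much. Many websites offer free proxy IPs; this time we use Xici proxy (西刺代理).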
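The idea of an IP pool can be sketched as a small container that hands out a random proxy for each request and drops any proxy the target site has banned. This is a minimal illustration, not part of the original post: the `ProxyPool` class, its `get`/`ban` methods and the sample addresses (taken from documentation-only IP ranges) are all hypothetical.

```python
import random


class ProxyPool:
    """Hypothetical minimal IP pool: random rotation plus removal of banned proxies."""

    def __init__(self, proxies):
        # Proxies are "ip:port" strings; dict.fromkeys drops duplicates but keeps order
        self.proxies = list(dict.fromkeys(proxies))

    def get(self):
        # A random pick spreads requests over different IPs
        if not self.proxies:
            raise RuntimeError("IP pool is empty")
        return random.choice(self.proxies)

    def ban(self, proxy):
        # Remove a proxy once the target site has blocked it
        if proxy in self.proxies:
            self.proxies.remove(proxy)

    def __len__(self):
        return len(self.proxies)


pool = ProxyPool(["203.0.113.1:8080", "203.0.113.2:3128", "203.0.113.1:8080"])
p = pool.get()   # one of the two unique proxies
pool.ban(p)      # pretend the site just banned it
print(len(pool))  # 1
```

Random selection is the simplest rotation policy; a round-robin or a score-based pool would work just as well under the same interface.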
# -*- coding: utf-8 -*-
"""
Created on Fri May 11 09:02:12 2018

author: JJ
"""
import urllib.request
import re


def get_proxy(n):
    # Page n of the Xici proxy listing
    url = 'http://www.xicidaili.com/nn/{}'.format(n)
    # Send a browser User-Agent so the request looks like a normal visit
    headers = ('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36')
    opener = urllib.request.build_opener()
    opener.addheaders = [headers]
    urllib.request.install_opener(opener)
    html = opener.open(url).read().decode('utf8')
    # Each proxy sits in its own <tr class=...> row; re.S lets .*? span newlines
    ip_port_list = re.findall(r'<tr class=(.*?)</tr>', html, re.S)
    proxy_list = []
    for i in ip_port_list:
        ip = re.findall(r'\d+\.\d+\.\d+\.\d+', i)[0]  # dotted IPv4 address
        port = re.findall(r'<td>(\d+)</td>', i)[0]    # first all-digit cell is the port
        proxy = '{}:{}'.format(ip, port)
        proxy_list.append(proxy)
    print(proxy_list)


if __name__ == '__main__':
    get_proxy(1)
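To see what the two regular expressions in `get_proxy()` actually extract, the parsing step can be run offline on a hand-written table row. The HTML below is a made-up example of the row shape the regexes expect, not the real xicidaili.com markup, and `parse_rows` is a hypothetical helper. Note that the patterns need the `+` quantifier: a bare `\d` matches only a single digit, so it would miss addresses like `203.0.113.5` and ports like `8080`.

```python
import re

# Hypothetical table row in the shape the regexes expect
sample_html = """
<tr class="odd">
  <td class="country">Cn</td>
  <td>203.0.113.5</td>
  <td>8080</td>
  <td>HTTP</td>
</tr>
"""


def parse_rows(html):
    """Extract "ip:port" strings from table rows, mirroring get_proxy()."""
    proxies = []
    # re.S lets "." match newlines, so one match covers a whole row
    for row in re.findall(r'<tr class=(.*?)</tr>', html, re.S):
        ips = re.findall(r'\d+\.\d+\.\d+\.\d+', row)
        # <td>203.0.113.5</td> does not match: the dots break the \d+ run
        ports = re.findall(r'<td>(\d+)</td>', row)
        if ips and ports:  # skip rows that carry no proxy data
            proxies.append('{}:{}'.format(ips[0], ports[0]))
    return proxies


print(parse_rows(sample_html))  # ['203.0.113.5:8080']
```

Unlike the original loop, this sketch skips rows without a match instead of raising `IndexError`, which makes it a little more tolerant of header or advertisement rows.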
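This code crawls only the first page, purely as a demonstration; you can add a for loop around it to crawl several more pages. Let's look at the output: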
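One way to write that for loop is sketched below. It assumes a page fetcher `fetch_page(n)` that returns a list of `"ip:port"` strings, for example a variant of `get_proxy()` that ends with `return proxy_list` instead of `print(proxy_list)`. The `collect_proxies` name, the `delay` parameter and the fake page data are hypothetical, chosen so the sketch runs without network access.

```python
import time


def collect_proxies(pages, fetch_page, delay=1.0):
    """Crawl pages 1..pages and merge their proxies without duplicates.

    fetch_page(n) is assumed to return a list of "ip:port" strings for page n.
    """
    seen = set()
    merged = []
    for n in range(1, pages + 1):
        for proxy in fetch_page(n):
            if proxy not in seen:  # the same proxy can show up on several pages
                seen.add(proxy)
                merged.append(proxy)
        time.sleep(delay)  # pause between pages so the site is not hammered
    return merged


# Offline usage: a dict lookup stands in for the real crawler
fake_pages = {
    1: ["203.0.113.1:80", "203.0.113.2:8080"],
    2: ["203.0.113.2:8080", "203.0.113.3:3128"],
}
print(collect_proxies(2, fake_pages.get, delay=0))
# ['203.0.113.1:80', '203.0.113.2:8080', '203.0.113.3:3128']
```

Passing the fetcher in as a parameter keeps the paging loop separate from the scraping details, so the same loop works whichever listing site or parser is used.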
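Next, let's check whether the IP pool we just built actually works, and how many of its proxies are usable. The test site is http://httpbin.org/ip. Without further ado, here is the code: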
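http://httpbin.org/ip answers with a small JSON document whose `origin` field holds the IP address the request arrived from. A request therefore really went through a proxy when that address is the proxy's own IP. The sketch below shows this check on a raw response body; `proxy_was_used` is a hypothetical helper, and the comma handling is an assumption for proxies that forward the client's address alongside their own.

```python
import json


def proxy_was_used(response_body, proxy):
    """Return True if httpbin.org/ip reports the proxy's IP as the origin.

    response_body: raw bytes from http://httpbin.org/ip, e.g. b'{"origin": "203.0.113.7"}'
    proxy:         the "ip:port" string used for the request
    """
    origin = json.loads(response_body.decode('utf-8'))['origin']
    proxy_ip = proxy.split(':')[0]
    # origin may list several comma-separated addresses, so check each one
    return proxy_ip in [part.strip() for part in origin.split(',')]


print(proxy_was_used(b'{"origin": "203.0.113.7"}', '203.0.113.7:8080'))   # True
print(proxy_was_used(b'{"origin": "198.51.100.9"}', '203.0.113.7:8080'))  # False
```

A check like this is stricter than "the request did not raise an exception": a misconfigured proxy can still return a response while the request actually leaves from your own IP.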
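# -*- coding: utf-8 -*-
"""
Created on Fri May 11 09:02:12 2018

author: JJ
"""
import urllib.request
import re
import time
import random


def get_proxy(n):
    # Page n of the Xici proxy listing
    url = 'http://www.xicidaili.com/nn/{}'.format(n)
    headers = ('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36')
    opener = urllib.request.build_opener()
    opener.addheaders = [headers]
    urllib.request.install_opener(opener)
    html = opener.open(url).read().decode('utf8')
    ip_port_list = re.findall(r'<tr class=(.*?)</tr>', html, re.S)
    proxy_list = []
    for i in ip_port_list:
        ip = re.findall(r'\d+\.\d+\.\d+\.\d+', i)[0]
        port = re.findall(r'<td>(\d+)</td>', i)[0]
        proxy = '{}:{}'.format(ip, port)
        proxy_list.append(proxy)
    return proxy_list


def proxy_read(proxy_list, i):
    proxy = proxy_list[i]
    print('Current IP: {}'.format(proxy))
    # Random pause of 1-3 seconds between tests
    sleep_time = random.randint(1, 3)
    print('Waiting {} seconds'.format(sleep_time))
    time.sleep(sleep_time)
    print('Starting test')
    # Route plain-HTTP requests through the proxy under test
    proxy_jj = urllib.request.ProxyHandler({'http': proxy})
    opener = urllib.request.build_opener(proxy_jj, urllib.request.HTTPHandler)
    urllib.request.install_opener(opener)
    try:
        # httpbin.org/ip echoes back the IP the request came from;
        # the timeout keeps a dead proxy from hanging the whole loop
        html = urllib.request.urlopen('http://httpbin.org/ip', timeout=10)
        rhtml = html.read()
        print(rhtml)
    except Exception as e:
        print(e)
        print('------- IP not usable -------')


if __name__ == '__main__':
    proxy_list = get_proxy(1)
    print('Starting test')
    # Test every proxy actually scraped instead of assuming exactly 100
    for i in range(len(proxy_list)):
        proxy_read(proxy_list, i)

The results are shown in the figure above. All in all, the code is quite simple. That's it for now; the next article is "Building a free IP pool with XPath". Stay tuned!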