

news 2026/5/2 17:08:13

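Python web crawler workflow:

1. Connect to the URL and fetch the page content (an HTML file);
2. Filter out the information you need (regular expressions) [steps 1 and 2 may be repeated];
3. Save the result to a local file.

I) Connecting and fetching page content

    # Connect and fetch page content
    import urllib.request as request  # standard-library HTTP client
    import urllib.error as error      # exceptions raised by urllib.request
    import requests                   # third-party alternative for HTTP requests

    headers = {
        'Connection': 'keep-alive',
        'Accept-Language': 'zh-CN,zh;q=0.9',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
    }

    # Plain request without custom headers (some sites may refuse it)
    def getHtml(url):
        try:
            req = request.Request(url)        # build the request object
            webpage = request.urlopen(req)    # open the page, option 1
            # webpage = request.urlopen(url)  # open the page, option 2
            html = webpage.read()             # read the page content (bytes)
            return html
        except error.HTTPError as e:          # HTTP status errors, e.g. 403
            print(str(e.code) + '\t' + str(e.reason))
            return None
        except error.URLError as e:           # other failures; URLError has no .code
            print(e.reason)
            return None

    def getXMLText(url):
        try:
            response = requests.get(url)      # optionally pass headers=headers
            response.raise_for_status()       # raise on 4xx/5xx responses
            response.encoding = 'utf-8'
            return response.text
        except requests.RequestException:
            return None

    # Request with custom headers
    def getHtmlWithHead(url):
        # The second positional argument of Request is data (the form body),
        # so the headers must be passed by keyword.
        req = request.Request(url, headers=headers)
        webpage = request.urlopen(req)
        html = webpage.read()                 # read the page content (bytes)
        return html

    def main():
        url = input('Enter URL: ')
        print(getHtml(url))
        print(getXMLText(url))

    #----------------------------------------------------------------
    if __name__ == '__main__':
        main()

Python libraries used for crawling: urllib, requests

urllib.request opens and reads URLs (request.urlopen).
urllib.error handles the exceptions raised by urllib.request (for example, 403 Forbidden).
urllib.parse parses URLs.

urlopen(url, data=None, timeout=socket._GLOBAL_DEFAULT_TIMEOUT, *, cafile=None, capath=None, cadefault=False, context=None). The cafile, capath and cadefault parameters were deprecated in Python 3.6 and removed in Python 3.13; use context instead.

II) Filtering, selecting, and replacing

1. BeautifulSoup

    from bs4 import BeautifulSoup as bs   # HTML parsing library; tidies the HTML so it is easier to process
    soup = bs(html, 'html.parser')        # or 'lxml'

    # Both calls return a list
    # All <div> tags whose class attribute is "add",
    # e.g. <div class="add">Current location: xxxx</div>
    info = soup.find_all('div', attrs={'class': 'add'})
    # All <p> tags; use soup.select('a') to get links instead,
    # e.g. <a href="https://www.xxx.com/xxx"></a>
    info = soup.select('p')

2. Regular expressions

    import re   # regular expressions

    # findall() returns a list of every field enclosed by <h2> and </h2> in str(info);
    # search() would return only the first match, as a match object.
    title = re.compile(r'<h2>(.*?)</h2>').findall(str(info))

3. String operations

    author = str(info).replace('<p>', '').replace('</p>', '').rstrip()   # or lstrip()

III) Local storage

    import os     # file and directory operations
    import time   # e.g. time.sleep(0.1)

    data_dir = 'D:\\Python\\Data\\'
    path = 'D:\\Python\\Data\\text.txt'

1. Create the directory

    isExists = os.path.exists(data_dir)
    if not isExists:
        os.makedirs(data_dir)   # create the directory, not the file path

2. Write: 'w', 'wb'

    # Write to the file at path as UTF-8; the file is created if it does not exist
    file = open(path, 'w', encoding='utf-8')
    file.write(content)
    file.close()   # remember to close the file when done

3. Read: 'r', 'rb'

    file = open(path, 'rb')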
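The three steps (fetch, filter, store) can be tied together roughly as follows. This is a minimal sketch using only the standard library, not the author's code: the names fetch_html, extract_titles, save_text and crawl, the shortened User-Agent, the UTF-8 decoding and the titles.txt file name are all assumptions made for illustration.

```python
import os
import re
import urllib.error
import urllib.request

# Hypothetical header set; any realistic User-Agent string would do.
HEADERS = {"User-Agent": "Mozilla/5.0"}


def fetch_html(url, timeout=10):
    """Step 1: download a page and decode it as UTF-8 (assumed encoding)."""
    req = urllib.request.Request(url, headers=HEADERS)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except urllib.error.URLError as e:  # also covers HTTPError
        print("request failed:", e)
        return None


def extract_titles(html):
    """Step 2: return the text between every <h2> and </h2> pair."""
    return re.findall(r"<h2>(.*?)</h2>", html, flags=re.S)


def save_text(content, data_dir, filename):
    """Step 3: write content to data_dir/filename, creating data_dir if needed."""
    os.makedirs(data_dir, exist_ok=True)
    path = os.path.join(data_dir, filename)
    with open(path, "w", encoding="utf-8") as f:  # closed automatically
        f.write(content)
    return path


def crawl(url, data_dir, filename="titles.txt"):
    """Run steps 1 to 3 once; returns the saved path, or None on failure."""
    html = fetch_html(url)
    if html is None:
        return None
    return save_text("\n".join(extract_titles(html)), data_dir, filename)
```

Calling crawl(url, 'D:\\Python\\Data') would save one heading per line. The with statements stand in for the explicit close() calls, so the response and the file are released even if an error occurs partway through.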
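The library notes mention urllib.parse without showing it in use. One common case when steps 1 and 2 are repeated is turning the relative links found on a page into absolute URLs before fetching them. The sketch below assumes a hypothetical helper named absolute_links and reuses the placeholder host www.xxx.com from the examples above.

```python
from urllib.parse import urljoin, urlparse


def absolute_links(base_url, hrefs):
    """Resolve each href against base_url and keep only http(s) links."""
    links = []
    for href in hrefs:
        full = urljoin(base_url, href)  # handles relative and absolute hrefs
        if urlparse(full).scheme in ("http", "https"):
            links.append(full)
    return links


# urlparse splits a URL into named components.
parts = urlparse("https://www.xxx.com/xxx/index.html?page=2")
print(parts.scheme, parts.netloc, parts.path, parts.query)

# Relative links are resolved against the page they were found on;
# non-HTTP links such as mailto: are dropped.
print(absolute_links("https://www.xxx.com/xxx/index.html",
                     ["a.html", "/b", "mailto:x@y.z"]))
```

The href values would typically come from the href attributes of the tags returned by soup.select('a').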
