

news 2026/4/22 1:49:58

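Python Stage 2 - Web Scraping Basics

🎯 Today's goals

- Learn the basic XPath syntax
- Parse HTML with lxml.etree and extract data
- Compare with BeautifulSoup: which is stronger?

📘 Learning content in detail

✅ Install the dependency

    pip install lxml

The examples also import requests; install it with pip install requests if it is not already available.

🧩 About XPath

XPath is a language for locating information in XML/HTML documents. It is powerful and can extract data from complex structures.

Common syntax

    XPath expression                  Meaning
    //tag                             all elements with the given tag
    //div[@class="quote"]             all div elements whose class is quote
    .//span[@class="text"]/text()     text of span.text inside the current element
    //a/@href                         the href attribute value of a elements

Example code

    from lxml import etree
    import requests

    url = "https://quotes.toscrape.com/"
    res = requests.get(url)
    tree = etree.HTML(res.text)

    # Each quote on the page sits in its own <div class="quote">
    quotes = tree.xpath('//div[@class="quote"]')

    for q in quotes:
        # The leading "." keeps each query relative to the current quote element
        text = q.xpath('.//span[@class="text"]/text()')[0]
        author = q.xpath('.//small[@class="author"]/text()')[0]
        tags = q.xpath('.//div[@class="tags"]/a[@class="tag"]/text()')
        print(f"{text} - {author} [Tags: {', '.join(tags)}]")

XPath vs BeautifulSoup

    Aspect            BeautifulSoup                   XPath (lxml)
    Learning curve    Simple                          Slightly steeper
    Power             Medium                          Strong
    Performance       Average                         Faster
    Selection style   Tag / class name / selector     Path expressions
    Best suited for   Beginners                       Developers familiar with HTML

Today's exercises

- Extract quotes, authors, and tags with XPath
- Fetch the data from every page (follow the pagination links)
- Count the authors and the number of distinct tags
- Save the data as a JSON file

Example code

    import requests
    from lxml import etree
    import json
    import time

    BASE_URL = "https://quotes.toscrape.com"
    HEADERS = {
        "User-Agent": "Mozilla/5.0"
    }

    def fetch_html(url):
        response = requests.get(url, headers=HEADERS)
        return response.text if response.status_code == 200 else None

    def parse_quotes(html):
        tree = etree.HTML(html)
        quotes = tree.xpath('//div[@class="quote"]')
        data = []
        for q in quotes:
            text = q.xpath('.//span[@class="text"]/text()')[0]
            author = q.xpath('.//small[@class="author"]/text()')[0]
            tags = q.xpath('.//div[@class="tags"]/a[@class="tag"]/text()')
            data.append({
                "text": text,
                "author": author,
                "tags": tags
            })
        return data

    def get_next_page(html):
        tree = etree.HTML(html)
        # The "next" link holds a relative path such as /page/2/
        next_page = tree.xpath('//li[@class="next"]/a/@href')
        return BASE_URL + next_page[0] if next_page else None

    def main():
        all_quotes = []
        url = BASE_URL
        while url:
            print(f"Fetching: {url}")
            html = fetch_html(url)
            if not html:
                print("Failed to load page")
                break
            quotes = parse_quotes(html)
            all_quotes.extend(quotes)
            url = get_next_page(html)
            time.sleep(0.5)  # pause between requests to mimic a human visitor and avoid being blocked

        # Report the result
        print(f"\nScraped {len(all_quotes)} quotes in total")

        # Save as JSON
        with open("quotes_xpath.json", "w", encoding="utf-8") as f:
            json.dump(all_quotes, f, ensure_ascii=False, indent=2)
        print("Saved to quotes_xpath.json")

    if __name__ == "__main__":
        main()

✍️ Today's summary

- Learned to pinpoint HTML elements with XPath
- Got comfortable with parsing via lxml.etree.HTML
- Compared the two mainstream ways of parsing web pages
- Laid the groundwork for more complex data extraction later on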
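
The exercise list also asks for a count of authors and of distinct tags, which the example script does not compute. Below is a minimal sketch of one way to do it, assuming records shaped like the ones the script collects (dicts with text, author, and tags keys) and reading "count of authors" as the number of distinct authors. The function name summarize_quotes and the sample records are made up for illustration; they are not scraped data.

```python
def summarize_quotes(quotes):
    """Return (number of distinct authors, number of distinct tags)."""
    # Sets drop duplicates, so each author and each tag is counted once.
    authors = {q["author"] for q in quotes}
    tags = {tag for q in quotes for tag in q["tags"]}
    return len(authors), len(tags)


# Placeholder records for illustration only, not real scraped data.
sample = [
    {"text": "quote one", "author": "Author A", "tags": ["life", "humor"]},
    {"text": "quote two", "author": "Author B", "tags": ["life"]},
    {"text": "quote three", "author": "Author A", "tags": []},
]

author_count, tag_count = summarize_quotes(sample)
print(f"Distinct authors: {author_count}, distinct tags: {tag_count}")
```

In the crawler script, the same function could presumably be called on all_quotes inside main() just before the JSON file is written, or on the list loaded back from quotes_xpath.json with json.load.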


http://www.hkea.cn/news/14362018/