

news 2026/4/20 5:18:40

Python Stage 2 - Web Scraping Basics

Today's goals

- Learn to parse HTML pages with BeautifulSoup.
- Master the common tag selectors and attribute-extraction methods.
- Lay the groundwork for extracting structured data in later scraping work.

Lesson details

Installing BeautifulSoup and related libraries

    pip install beautifulsoup4
    pip install lxml

lxml is a parser and is fast. You can also use html.parser, which ships with Python.

Basic usage

    from bs4 import BeautifulSoup
    import requests

    url = "https://example.com"
    html = requests.get(url).text

    soup = BeautifulSoup(html, "lxml")  # "html.parser" also works

Common ways to find tags

| Operation | Example | Notes |
| Get the title text | soup.title.text | Returns the text of the <title> tag |
| Get the first tag of a kind | soup.p | The first <p> tag |
| Get all tags of a kind | soup.find_all("p") | Returns a list |
| Find tags with a given class | soup.find_all(class_="info") | class is a Python keyword, so the argument is class_ |
| Find a tag by id | soup.find(id="main") | |
| Select with a CSS selector | soup.select("div p.info") | Returns a list |

Getting attributes and text

    tag = soup.find("a")
    print(tag["href"])      # get an attribute (raises KeyError if it is missing)
    print(tag.get("href"))  # same attribute, but returns None if it is missing
    print(tag.text)         # get the text content

Today's exercise

1. Fetch the page https://quotes.toscrape.com/ with requests.
2. Use BeautifulSoup to:
   - extract the text of every quote (located in <span class="text">)
   - extract the author's name
   - extract the tags of each quote (class="tag")
   - output each quote as a dictionary:

    {
        "quote": "...",
        "author": "...",
        "tags": ["tag1", "tag2"]
    }

Exercise script

    # quotes_scraper.py
    import requests
    from bs4 import BeautifulSoup


    def fetch_quotes():
        url = "https://quotes.toscrape.com/"
        headers = {"User-Agent": "Mozilla/5.0"}
        response = requests.get(url, headers=headers, timeout=10)
        if response.status_code != 200:
            print(f"Request failed, status code: {response.status_code}")
            return []  # return an empty list so the caller can still iterate

        soup = BeautifulSoup(response.text, "lxml")

        # Find every quote block
        quote_blocks = soup.find_all("div", class_="quote")
        quotes_data = []
        for block in quote_blocks:
            quote_text = block.find("span", class_="text").text.strip()
            author = block.find("small", class_="author").text.strip()
            tags = [tag.text.strip() for tag in block.find_all("a", class_="tag")]
            quotes_data.append({
                "quote": quote_text,
                "author": author,
                "tags": tags,
            })
        return quotes_data


    if __name__ == "__main__":
        data = fetch_quotes()
        for i, quote in enumerate(data, start=1):
            print(f"\nQuote {i}")
            print(f"Text: {quote['quote']}")
            print(f"Author: {quote['author']}")
            print(f"Tags: {', '.join(quote['tags'])}")

The output is:

    Quote 1
    Text: “The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”
    Author: Albert Einstein
    Tags: change, deep-thoughts, thinking, world

    Quote 2
    Text: “It is our choices, Harry, that show what we truly are, far more than our abilities.”
    Author: J.K. Rowling
    Tags: abilities, choices

    Quote 3
    Text: “There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”
    Author: Albert Einstein
    Tags: inspirational, life, live, miracle, miracles

    Quote 4
    Text: “The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”
    Author: Jane Austen
    Tags: aliteracy, books, classic, humor

    Quote 5
    Text: “Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.”
    Author: Marilyn Monroe
    Tags: be-yourself, inspirational

    Quote 6
    Text: “Try not to become a man of success. Rather become a man of value.”
    Author: Albert Einstein
    Tags: adulthood, success, value

    Quote 7
    Text: “It is better to be hated for what you are than to be loved for what you are not.”
    Author: André Gide
    Tags: life, love

    Quote 8
    Text: “I have not failed. I've just found 10,000 ways that won't work.”
    Author: Thomas A. Edison
    Tags: edison, failure, inspirational, paraphrased

    Quote 9
    Text: “A woman is like a tea bag; you never know how strong it is until it's in hot water.”
    Author: Eleanor Roosevelt
    Tags: misattributed-eleanor-roosevelt

    Quote 10
    Text: “A day without sunshine is like, you know, night.”
    Author: Steve Martin
    Tags: humor, obvious, simile

Tips

- For a complex page, inspect its structure first with the browser's developer tools (F12).
- Try select() with a CSS selector: soup.select(".quote span.text")

Today's summary

- Got a first grasp of how BeautifulSoup parses HTML and how to use it.
- Can extract text and attributes, and locate elements by class and id.
- Can do basic structured data extraction, which paves the way for storing scraped data.
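The tip about select() can be tried without touching the network. The following is a minimal sketch, not a replacement for the exercise script: SAMPLE_HTML is hypothetical markup that is only assumed to mirror one quote block on quotes.toscrape.com, extract_quotes is a name introduced here for illustration, and html.parser is used so that no extra parser has to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, assumed to mirror the structure of one quote
# block on quotes.toscrape.com; it is not fetched from the site.
SAMPLE_HTML = """
<div class="quote">
  <span class="text">“A day without sunshine is like, you know, night.”</span>
  <span>by <small class="author">Steve Martin</small></span>
  <div class="tags">
    <a class="tag" href="/tag/humor/page/1/">humor</a>
    <a class="tag" href="/tag/obvious/page/1/">obvious</a>
  </div>
</div>
"""


def extract_quotes(html):
    """Return one dict per quote block, using CSS selectors only."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for block in soup.select("div.quote"):
        # select_one() returns the first match, or None if nothing matches
        text = block.select_one("span.text")
        author = block.select_one("small.author")
        tag_links = block.select("a.tag")
        results.append({
            "quote": text.get_text(strip=True) if text else None,
            "author": author.get_text(strip=True) if author else None,
            "tags": [a.get_text(strip=True) for a in tag_links],
            # .get() returns None instead of raising KeyError when href is absent
            "tag_links": [a.get("href") for a in tag_links],
        })
    return results


if __name__ == "__main__":
    for item in extract_quotes(SAMPLE_HTML):
        print(item)
```

Compared with chained find() calls, select_one() makes the missing-element case explicit: it returns None, so the sketch checks for that before reading the text instead of failing with an AttributeError on a page whose layout differs.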


http://www.hkea.cn/news/14337137/