
Website-building steps: self-service site-builder source code

news 2026/5/6 15:53:53

Contents
Column guide
Background
Result preview
1. Analyzing the target page
2. The response suggests using lxml + XPath
3. Analyzing the Novels, Movies, TV Series, Cars and Games boards
4. Complete code
Summary

Column guide
This article is part of the column "Python Basics: Web Scraping" (《Python基础篇爬虫》). The column is a set of introductory lessons for readers building a foundation in scraping, and it aims to make Python web scraping easy to master. Subscriptions are welcome. If you already work and want to use Python for common office tasks, see the "Python Office Automation" (《Python办公自动化》) column. The "Python: From Beginner to Proficient in 30 Days" (《Python30天从入门到熟练》) column is also now available.

Background
I wanted to use a scraper to collect every list on the Baidu Hot Search page, namely:
1. Hot Search (热搜榜)
2. Novels (小说榜)
3. Movies (电影榜)
4. TV Series (电视剧榜)
5. Cars (汽车榜)
6. Games (游戏榜)

Result preview
[screenshot]

1. Analyzing the target page
URL: https://top.baidu.com/board?
Method: GET
Response: the entire page (as text)
Code:

```python
# -*- coding: UTF-8 -*-
"""
Project: <project name>
File:    程序.py
IDE:     PyCharm
Author:  一晌小贪欢
Date:    2024/05/27 11:27
"""
import requests

url = "https://top.baidu.com/board?"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36",
    # Paste your own browser cookie here. The raw cookie string belongs in the
    # Cookie header; requests' cookies= argument expects name -> value pairs.
    "Cookie": "<your cookie>",
}
params = {
    "platform": "pc",
    "tab": "homepage",
    "sa": "pc_index_homepage_all",
}

res_data = requests.get(url=url, params=params, headers=headers)
print(res_data.text)
```

Request result:
[screenshot]

2. The response suggests using lxml + XPath
The response is the whole web page. Every list on it (Hot Search, Novels, Movies, TV Series, Cars, Games) sits inside the div shown below:
[screenshot]
To pull the Hot Search entries out of that div with lxml + XPath, the analysis gives:
//div[@id="sanRoot"]//div[@class="list_1EDla"]//a//div[@class="c-single-text-ellipsis"]
The XPath works, but every value comes back twice. Iterating with range(0, len(hot_search), 2) keeps just one copy of each.

3. Analyzing the Novels, Movies, TV Series, Cars and Games boards
It turns out that all five of these boards can be captured with a single XPath. For the titles, the analysis gives:
//div[@id="sanRoot"]//div[@class="list_1s-Px"]//a[@class="title_ZsyAw"]
For the hot-search index (热搜指数):
//div[@id="sanRoot"]//div[@class="list_1s-Px"]//div[@class="exponent_QjyjZ"]//span
For the hot-search category (热搜分类):
//div[@id="sanRoot"]//div[@class="list_1s-Px"]//div[@class="desc_2YkQx"]
Each of these three results has length 50, which is 10 entries for each of the five boards. The code therefore combines them into one list and splits it into chunks of 10 elements. It then writes each chunk to its own Excel sheet.

4. Complete code

```python
# -*- coding: UTF-8 -*-
"""
Project: Baidu Hot Search scraper
File:    程序.py
IDE:     PyCharm
Author:  一晌小贪欢
Date:    2024/05/27 11:27
"""
import openpyxl
import requests
from lxml import etree

# Sheets for the five boards that share one XPath, in page order
OTHER_BOARDS = ["小说榜", "电影榜", "电视剧榜", "汽车榜", "游戏榜"]

wb = openpyxl.Workbook()
ws = wb.active
ws.title = "热搜榜"      # rename the default sheet
ws.append(["热搜榜"])    # the first row of every sheet holds the board name
other_sheets = []
for name in OTHER_BOARDS:
    sheet = wb.create_sheet(name)
    sheet.append([name])
    other_sheets.append(sheet)

url = "https://top.baidu.com/board?"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36",
    "Cookie": "<your cookie>",  # paste your own raw cookie string here
}
params = {
    "platform": "pc",
    "tab": "homepage",
    "sa": "pc_index_homepage_all",
}
res_data = requests.get(url=url, params=params, headers=headers)
tree = etree.HTML(res_data.text)

# Hot Search board: every title is matched twice, so take every second element
hot_search = tree.xpath('//div[@id="sanRoot"]//div[@class="list_1EDla"]//a//div[@class="c-single-text-ellipsis"]')
print(len(hot_search))
for i in range(0, len(hot_search), 2):
    print(hot_search[i].text)
    ws.append([hot_search[i].text])

# Novels, Movies, TV Series, Cars, Games: titles, hot-search index, category
hot_search2 = tree.xpath('//div[@id="sanRoot"]//div[@class="list_1s-Px"]//a[@class="title_ZsyAw"]')
hot_search3 = tree.xpath('//div[@id="sanRoot"]//div[@class="list_1s-Px"]//div[@class="exponent_QjyjZ"]//span')
type_ = tree.xpath('//div[@id="sanRoot"]//div[@class="list_1s-Px"]//div[@class="desc_2YkQx"]')

a_list = []
for title, index, category in zip(hot_search2, hot_search3, type_):
    a_list.append(title.text + index.text + category.text)

# Split a_list into sub-lists of 10 elements, one per board
a_list = [a_list[i:i + 10] for i in range(0, len(a_list), 10)]

# Chunk 1 -> 小说榜, chunk 2 -> 电影榜, ..., chunk 5 -> 游戏榜
for sheet, chunk in zip(other_sheets, a_list):
    for j in chunk:
        sheet.append([j])

wb.save("./整体热搜榜.xlsx")
```

Summary
I hope this helps beginners. I'm a programmer focused on office automation, and a free follow would be much appreciated. I also run the "Python Office Automation", "Python Scraping Basics" and "Python Basics" columns. Subscriptions, likes ❤️ and bookmarks are all welcome.
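The request in section 1 passes its query parameters through `params=` rather than writing them into the URL. You can assemble the query string with the standard library to check which address is actually requested. This is a minimal sketch: the helper name `build_board_url` is hypothetical. For these plain ASCII parameters, `requests` should build an equivalent URL on its own.

```python
from urllib.parse import urlencode

# Base URL and query parameters from section 1 (without the trailing "?")
BASE_URL = "https://top.baidu.com/board"
PARAMS = {"platform": "pc", "tab": "homepage", "sa": "pc_index_homepage_all"}


def build_board_url(base: str, params: dict) -> str:
    """Hypothetical helper: append the URL-encoded query string to the base URL."""
    return f"{base}?{urlencode(params)}"


if __name__ == "__main__":
    # Dicts keep insertion order, so the parameters appear in the order defined above
    print(build_board_url(BASE_URL, PARAMS))
```

Running it prints `https://top.baidu.com/board?platform=pc&tab=homepage&sa=pc_index_homepage_all`.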

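The scraper needs your own browser cookie, and `requests` accepts it in two forms:

- as a raw `Cookie` request header;
- through `cookies=`, which expects a mapping of individual cookie names to values.

Wrapping the whole browser string as `{"Cookie": "..."}` and passing it to `cookies=` creates one cookie named `Cookie` rather than the individual cookies. If you prefer `cookies=`, the string has to be split first. Below is a minimal sketch that assumes the usual `name=value; name2=value2` format. The helper `cookie_header_to_dict` is hypothetical and the cookie names in the example are made up.

```python
from typing import Dict


def cookie_header_to_dict(raw: str) -> Dict[str, str]:
    """Hypothetical helper: turn 'a=1; b=2' into {'a': '1', 'b': '2'}."""
    cookies: Dict[str, str] = {}
    for part in raw.split(";"):
        # partition() splits on the first "=" only, so values may contain "="
        name, sep, value = part.partition("=")
        name = name.strip()
        if sep and name:  # skip empty fragments and pieces without "="
            cookies[name] = value.strip()
    return cookies


if __name__ == "__main__":
    raw_cookie = "EXAMPLE_ID=abc:FG=1; OTHER=xyz"  # made-up example values
    # The resulting dict could be passed as requests.get(..., cookies=...)
    print(cookie_header_to_dict(raw_cookie))
```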

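Sections 2 and 3 rely on two assumptions:

- each Hot Search title is matched twice;
- the three XPath results for the other boards line up one-to-one in page order, giving 50 items split into five chunks of 10.

Below is a minimal sketch of that post-processing in plain Python. It assumes the text has already been pulled out of the lxml elements into lists of strings. The function names and the `BOARD_NAMES` order are hypothetical, and the length checks are an addition the article's code does not have.

```python
from typing import Dict, List, Sequence, Tuple

# Board order assumed to match the page order used in section 3
BOARD_NAMES = ("小说榜", "电影榜", "电视剧榜", "汽车榜", "游戏榜")

Row = Tuple[str, str, str]  # (title, hot-search index, category)


def dedupe_every_other(items: Sequence[str]) -> List[str]:
    """Keep indices 0, 2, 4, ..., the same elements as range(0, len(items), 2)."""
    return list(items[::2])


def chunk(items: Sequence[Row], size: int) -> List[List[Row]]:
    """Split items into consecutive sub-lists of at most `size` elements."""
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def group_boards(titles: Sequence[str], indexes: Sequence[str],
                 categories: Sequence[str], size: int = 10) -> Dict[str, List[Row]]:
    """Pair the three XPath results and assign each chunk of `size` rows to a board."""
    if not len(titles) == len(indexes) == len(categories):
        raise ValueError("XPath results differ in length; the page layout may have changed")
    rows = list(zip(titles, indexes, categories))
    chunks = chunk(rows, size)
    if len(chunks) != len(BOARD_NAMES):
        raise ValueError(f"expected {len(BOARD_NAMES)} chunks, got {len(chunks)}")
    return dict(zip(BOARD_NAMES, chunks))
```

The sketch returns (title, index, category) tuples instead of one concatenated string. That would let each field land in its own spreadsheet column, for example via openpyxl's `sheet.append(row)`; this is a design alternative, not what the article does. Raising an error on a length mismatch makes a changed page layout fail loudly, instead of silently shifting rows between boards.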