Purpose
In the Scrapy framework, the settings.py file plays a central role: it configures and controls the behavior, performance, and features of the entire Scrapy crawler project.
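Scrapy treats settings.py as an ordinary Python module and picks up its uppercase module-level names as settings. The following is a minimal, standard-library-only sketch of that idea, not Scrapy's own code; `collect_settings` and the `demo_settings` module are hypothetical names introduced only for illustration.

```python
import types


def collect_settings(module):
    """Return the uppercase module-level names of `module` as a dict.

    Hypothetical helper that imitates the convention Scrapy follows when
    it reads a settings module: only ALL-UPPERCASE names count as settings.
    """
    return {
        name: getattr(module, name)
        for name in dir(module)
        if name.isupper()
    }


# Hypothetical stand-in for a project's settings.py module.
demo = types.ModuleType("demo_settings")
demo.BOT_NAME = "spidername"
demo.DOWNLOAD_DELAY = 3
demo.helper_value = "ignored"  # lowercase, so it is not treated as a setting

settings = collect_settings(demo)
print(settings)
```

Because only uppercase names are collected, helper variables or imports in settings.py do not leak into the crawler configuration as long as they are written in lowercase.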
Introduction to the settings.py file
# Scrapy settings for haodaifu project
#
# For simplicity, this file contains only settings considered important or
# commonly used. You can find more settings consulting the documentation:
#
# https://docs.scrapy.org/en/latest/topics/settings.html
# https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
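# https://docs.scrapy.org/en/latest/topics/spider-middleware.html

# Name of the crawler (bot)
BOT_NAME = "spidername"

# Modules that contain the spider code
SPIDER_MODULES = ["spidername.spiders"]
NEWSPIDER_MODULE = "spidername.spiders"

# User agent, used to mimic a browser or to identify a specific crawler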
# Crawl responsibly by identifying yourself (and your website) on the user-agent
USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"

# Whether to obey robots.txt rules (disabled here)
ROBOTSTXT_OBEY = False

# Configure maximum concurrent requests performed by Scrapy (default: 16)
CONCURRENT_REQUESTS = 64

# Configure a delay, in seconds, between requests to the same website (default: 0),
# to avoid putting excessive load on the target server
# See https://docs.scrapy.org/en/latest/topics/settings.html#download-delay
# See also autothrottle settings and docs
DOWNLOAD_DELAY = 3

# The download delay setting will honor only one of the following.
# Maximum concurrent requests per domain
CONCURRENT_REQUESTS_PER_DOMAIN = 16
# Maximum concurrent requests per IP address; when non-zero, it is used
# instead of CONCURRENT_REQUESTS_PER_DOMAIN
CONCURRENT_REQUESTS_PER_IP = 16

# Disable cookies (enabled by default)
COOKIES_ENABLED = False

# Disable Telnet Console (enabled by default)
TELNETCONSOLE_ENABLED = False

# Override the default request headers:
DEFAULT_REQUEST_HEADERS = {
    "Referer": "https://www.xxx.com/",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36",
}

# Enable or disable spider middlewares
# See https://docs.scrapy.org/en/latest/topics/spider-middleware.html
# Spider middlewares and their order values
SPIDER_MIDDLEWARES = {
    "spidername.middlewares.SpidernameSpiderMiddleware": 543,
}

# Enable or disable downloader middlewares
# See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
# Downloader middlewares and their order values
DOWNLOADER_MIDDLEWARES = {
    "spidername.middlewares.SpidernameDownloaderMiddleware": 543,
}

# Enable or disable extensions
# See https://docs.scrapy.org/en/latest/topics/extensions.html
# A value of None disables the extension
EXTENSIONS = {
    "scrapy.extensions.telnet.TelnetConsole": None,
}

# Configure item pipelines
# See https://docs.scrapy.org/en/latest/topics/item-pipeline.html
# Item pipelines and their order values
ITEM_PIPELINES = {
    "spidername.pipelines.spidernamePipeline": 300,
}

# Enable and configure the AutoThrottle extension (disabled by default)
# See https://docs.scrapy.org/en/latest/topics/autothrottle.html
# Whether to enable automatic throttling
AUTOTHROTTLE_ENABLED = True

# The initial download delay (seconds)
AUTOTHROTTLE_START_DELAY = 5

# The maximum download delay (seconds) to be set in case of high latencies
AUTOTHROTTLE_MAX_DELAY = 60

# The average number of requests Scrapy should be sending in parallel to
# each remote server (the target concurrency)
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0

# Enable showing throttling stats for every response received
# (AutoThrottle debug mode)
AUTOTHROTTLE_DEBUG = False

# Enable and configure HTTP caching (disabled by default)
# See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings
# Whether to enable the HTTP cache
HTTPCACHE_ENABLED = True
# Cache expiration time in seconds (0 means cached responses never expire)
HTTPCACHE_EXPIRATION_SECS = 0
# Cache directory
HTTPCACHE_DIR = "httpcache"
# HTTP status codes whose responses should not be cached
HTTPCACHE_IGNORE_HTTP_CODES = []
HTTPCACHE_STORAGE = "scrapy.extensions.httpcache.FilesystemCacheStorage"

# Set settings whose default value is deprecated to a future-proof value
REQUEST_FINGERPRINTER_IMPLEMENTATION = "2.7"
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

# Encoding of the exported feed
FEED_EXPORT_ENCODING = "utf-8"
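
The AutoThrottle settings in the listing above (AUTOTHROTTLE_START_DELAY, AUTOTHROTTLE_MAX_DELAY, AUTOTHROTTLE_TARGET_CONCURRENCY, together with DOWNLOAD_DELAY as the lower bound) interact roughly as sketched below. This is a simplified illustration of the throttling algorithm as described in the Scrapy AutoThrottle documentation, not Scrapy's actual implementation; the function name `next_autothrottle_delay` and its parameters are hypothetical.

```python
def next_autothrottle_delay(current_delay, latency, status=200,
                            target_concurrency=1.0,
                            min_delay=3.0, max_delay=60.0):
    """Return the delay (seconds) to use after observing one response.

    Hypothetical helper; the defaults mirror the values in the listing:
    min_delay ~ DOWNLOAD_DELAY, max_delay ~ AUTOTHROTTLE_MAX_DELAY,
    target_concurrency ~ AUTOTHROTTLE_TARGET_CONCURRENCY.
    """
    # Delay that would yield the target number of parallel requests.
    target_delay = latency / target_concurrency

    # Move halfway from the current delay toward the target delay.
    new_delay = (current_delay + target_delay) / 2.0

    # Never go below the target delay itself.
    new_delay = max(target_delay, new_delay)

    # Clamp into the configured [min_delay, max_delay] range.
    new_delay = min(max(min_delay, new_delay), max_delay)

    # Non-200 responses are not allowed to decrease the delay.
    if status != 200 and new_delay <= current_delay:
        return current_delay
    return new_delay


# Start from AUTOTHROTTLE_START_DELAY = 5 and feed a few observed latencies.
delay = 5.0
for latency in (1.0, 10.0, 200.0):
    delay = next_autothrottle_delay(delay, latency)
    print(f"latency={latency:>6.1f}s -> delay={delay:.1f}s")
```

With a target concurrency of 1.0 the delay tracks the observed latency, so a slow server is automatically crawled more gently, while DOWNLOAD_DELAY keeps a fast server from being hit more often than every 3 seconds.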