Scraping WeChat Official Account Data with the Scrapy Framework: A Clever Way Around the Anti-Crawler CAPTCHA (Part 2)
get_cookies.py
# coding=utf-8
from IP.free_ip import get_random_proxy
from fake_useragent import UserAgent
import requests

ua = UserAgent().random


def get_new_headers():
    headers = {
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
        "Accept-Encoding": "gzip, deflate, br",
        "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8",
        "User-Agent": ua
    }
    return headers


def get_new_cookies():
    url = ';query='  # URL and the rest of this function are cut off in the source

free_ip.py
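# coding=utf-8
import requests

proxypool_url = ''


def get_random_proxy():
    # Return one proxy from the pool as text, or None if the pool is unavailable.
    try:
        # The request itself must sit inside the try block, otherwise a
        # connection failure is never caught.
        response = requests.get(proxypool_url)
        if response.status_code == 200:
            return response.text.strip()
    except requests.exceptions.ConnectionError:
        return None
    return None


if __name__ == '__main__':
    print(get_random_proxy())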
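A proxy string returned by get_random_proxy() still has to be passed to requests in the form it expects, which is a dict mapping URL scheme to proxy URL. The following is a minimal sketch, assuming the pool returns a bare host:port string (the response format is not shown in this article) and that the proxies are plain HTTP proxies; build_proxies is a hypothetical helper name, not part of the original code.

```python
def build_proxies(proxy):
    """Map a 'host:port' string to the dict accepted by requests' proxies= argument."""
    if not proxy:
        # get_random_proxy() may return None when the pool is unreachable.
        return None
    # Assumption: the pool hands out plain HTTP proxies as 'host:port'.
    address = "http://" + proxy
    return {"http": address, "https": address}
```

It could then be used as requests.get(url, headers=get_new_headers(), proxies=build_proxies(get_random_proxy())). Passing proxies=None simply sends the request without a proxy.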
setting.py
BOT_NAME = 'sougou_search_spider'

SPIDER_MODULES = ['sougou_search_spider.spiders']
NEWSPIDER_MODULE = 'sougou_search_spider.spiders'

REDIRECT_ENABLED = False
HTTPERROR_ALLOWED_CODES = [302]
ROBOTSTXT_OBEY = False
COOKIES_ENABLED = True

ITEM_PIPELINES = {
    'sougou_search_spider.pipelines.CsvSougouSearchSpiderPipeline': 300,
}
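ITEM_PIPELINES refers to a CsvSougouSearchSpiderPipeline class whose code is not listed here. The sketch below shows one way such a pipeline might look, using only the standard csv module and the open_spider / process_item / close_spider hooks that Scrapy calls on pipeline objects. The column names and the output file name are hypothetical, since the item fields are not shown in this article.

```python
import csv


class CsvSougouSearchSpiderPipeline:
    """Write each scraped item to a CSV file as one row."""

    # Hypothetical column names; the real item fields are not shown here.
    fieldnames = ["title", "url"]

    def __init__(self, path="results.csv"):
        self.path = path  # hypothetical output file name
        self.file = None
        self.writer = None

    def open_spider(self, spider):
        # Called by Scrapy once when the spider starts.
        self.file = open(self.path, "w", newline="", encoding="utf-8")
        self.writer = csv.DictWriter(
            self.file, fieldnames=self.fieldnames, extrasaction="ignore"
        )
        self.writer.writeheader()

    def process_item(self, item, spider):
        # Called for every item; return it so later pipelines still see it.
        self.writer.writerow(dict(item))
        return item

    def close_spider(self, spider):
        # Called by Scrapy once when the spider finishes.
        self.file.close()
```

The priority 300 in ITEM_PIPELINES only matters when several pipelines are enabled: lower numbers run first.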