Capturing Requests and Parsing JSON with a Python Crawler! A Must-Know for Scraping

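When writing a Python crawler, you often capture a request URL from the browser. Opening that URL directly may show a page of raw data instead of the normal web page, which makes it look as though the crawl cannot continue.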
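For example, to scrape the data on page 2 of a listing, press F12 → Network → XHR. It is best to click the Clear button first, so that only new requests appear in the list.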
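Clicking "page 2" on the web page triggers a POST request (sometimes it is a GET request instead). Click that request's URL in the list to inspect it. This article uses a POST request as the example.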
Then copy the request parameters (the request headers and the request payload).
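Copying the headers by hand into a dict is error-prone, so here is a minimal sketch of one way to convert them. It assumes the headers were copied from DevTools as plain "name: value" lines; the function name parse_raw_headers and the shortened sample text are hypothetical and only for illustration.

```python
def parse_raw_headers(raw: str) -> dict:
    """Turn 'name: value' lines copied from DevTools into a headers dict."""
    headers = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        # HTTP/2 pseudo-headers such as ":authority" describe the request
        # itself and are not sent as ordinary headers, so skip them.
        if line.startswith(":"):
            continue
        # Split on the first colon only, so values containing ":" stay intact.
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a "name: value" line
        headers[name.strip().lower()] = value.strip()
    return headers


# Abbreviated sample text (an assumption about how the copied headers look).
raw = """
:authority: m.ctrip.com
:method: POST
accept: */*
content-type: application/json
user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64)
"""
headers = parse_raw_headers(raw)
print(headers)
```

The resulting dict can be passed as the headers argument of requests.post.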
The code:
import requests
import json

# Request URL copied from DevTools (left empty in the original article)
url = ''

# Request headers copied from DevTools. The first four entries come from the
# :authority, :method, :path and :scheme rows; they describe the request itself
# and are not needed as headers.
header = {
    'authority': 'm.ctrip.com',
    'method': 'POST',
    'path': '/restapi/soa2/13444/json/getCommentCollapseList?_fxpcqlniredt=09031130211378497389',
    'scheme': 'https',
    'accept': '*/*',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'zh-CN,zh;q=0.9',
    'cache-control': 'no-cache',
    'content-length': '278',
    'content-type': 'application/json',
    'cookie': '__utma=1.1986366783.1601607319.1601607319.1601607319.1; __utmz=1.1601607319.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); _RSG=blqD1d4mGX0BA_amPD3t29; _RDG=286710759c35f221c000cbec6169743cac; _RGUID=0850c049-c137-4be5-90b7-0cd67093f28b; MKT_CKID=1601607321903.rzptk.lbzh; _ga=GA1.2.1986366783.1601607319; nfes_isSupportWebP=1; appFloatCnt=8; _gcl_dc=GCL.1601638857.CKzg58XqlewCFQITvAodioIJWw; Session=SmartLinkCode=U155952 Union=OUID=index MKT_OrderClick=ASID=4897155952 MKT_Pagesource=PC; _RF1=218.58.59.72; _bfa=1.1601607319353.3cid9z.1.1602506738089.1602680023977.4.25; _bfi=p1%3D290510%26p2%3D290510%26v1%3D25%26v2%3D24; MKT_CKID_LMT=1602680029515; __zpspc=9.5.1602680029.1602680029.1%232%7Cwww.baidu.com%7C%7C%7C%25E6%2590%25BA%25E7%25A8%258B%7C%23; _gid=GA1.2.1363667416.1602680030; _jzqco=%7C%7C%7C%7C1602680029668%7C1.672451398.1601607321899.1602506755440.1602680029526.1602506755440.1602680029526.undefined.0.0.16.16',
    'cookieorigin': '',
    'origin': '',
    'pragma': 'no-cache',
    'referer': '/',
    'sec-fetch-dest': 'empty',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'same-site',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36',
}

# Request payload copied from DevTools
dat = {
    "arg": {
        'channelType': 2,
        'collapseType': 0,
        'commentTagId': 0,
        'pageIndex': 1,
        'pageSize': 10,
        'poiId': 75648,
        'sortType': 3,
        'sourceType': 1,
        'starType': 0,
    },
    "head": {
        'auth': "",
        'cid': "09031117213661657011",
        'ctok': "",
        'cver': "1.0",
        'extension': [],
        'lang': "01",
        'sid': "8888",
        'syscode': "09",
        'xsid': "",
    },
}

# Send the payload as a JSON string and decode the JSON response
r = requests.post(url, data=json.dumps(dat), headers=header)
s = r.json()
print(s)

Output:
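To fetch several pages, the same payload can be rebuilt with a different page number each time. The following is a minimal sketch of that idea; it assumes that the pageIndex field in the captured payload selects the page, which the capture suggests but does not prove. The helper name build_payload is hypothetical.

```python
import json


def build_payload(page_index: int, page_size: int = 10) -> dict:
    """Return the request body for one page, mirroring the captured payload."""
    return {
        "arg": {
            "channelType": 2,
            "collapseType": 0,
            "commentTagId": 0,
            "pageIndex": page_index,  # assumed to be the page selector
            "pageSize": page_size,
            "poiId": 75648,
            "sortType": 3,
            "sourceType": 1,
            "starType": 0,
        },
        "head": {
            "auth": "",
            "cid": "09031117213661657011",
            "ctok": "",
            "cver": "1.0",
            "extension": [],
            "lang": "01",
            "sid": "8888",
            "syscode": "09",
            "xsid": "",
        },
    }


# Serialize the bodies for pages 1 to 3; each string can be sent as a POST body.
bodies = [json.dumps(build_payload(page)) for page in range(1, 4)]
for body in bodies:
    print(json.loads(body)["arg"]["pageIndex"])
```

Each string in bodies can be passed as the data argument of requests.post. As an alternative, requests.post also accepts the dict directly through its json argument and serializes it for you.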
Then right-click the output and click Show as JSON.
Finally, the response data from the target URL is displayed, and you can start scraping it!
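Once the response has been decoded into Python dicts and lists, the fields of interest still have to be pulled out of the nested structure. Below is a minimal sketch of a generic helper for that. The sample response and its field names (result, items, commentId, content) are hypothetical; the real names must be read from the actual output.

```python
def find_values(obj, key):
    """Yield every value stored under `key` anywhere in a decoded JSON object."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                yield v
            # Keep descending, since the key may also appear deeper down.
            yield from find_values(v, key)
    elif isinstance(obj, list):
        for item in obj:
            yield from find_values(item, key)


# Hypothetical response, standing in for the dict returned by r.json().
sample = {
    "result": {
        "items": [
            {"commentId": 1, "content": "Great view"},
            {"commentId": 2, "content": "Too crowded"},
        ]
    }
}
comments = list(find_values(sample, "content"))
print(comments)
```

When the structure of the response is known and stable, indexing it directly (for example sample["result"]["items"]) is simpler; the recursive search is mainly useful while exploring an unfamiliar response.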
Did you get it?