Python Crawler Practice: Scraping Video Data from 酷燃网 (Kuran)

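Preface
The text and images in this article come from the internet and are provided for learning and discussion only. They serve no commercial purpose, and copyright remains with the original authors. If there is any problem, please contact us promptly so we can address it.

Project goal
Scrape video data from 酷燃网.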

Environment
Python 3.6
PyCharm
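Crawler code

    import re

    import requests


    def download_video(title, url):
        # Save the video stream as <title>.mp4 in the output folder.
        filename_video = 'C:\\Users\\Administrator\\Desktop\\酷燃网\\' + title + '.mp4'
        response_video = requests.get(url=url)
        with open(filename_video, mode='wb') as f:
            f.write(response_video.content)


    def download_mp3(title, url):
        # Save the audio stream as <title>.mp3 in the output folder.
        filename_mp3 = 'C:\\Users\\Administrator\\Desktop\\酷燃网\\' + title + '.mp3'
        response_mp3 = requests.get(url=url)
        with open(filename_mp3, mode='wb') as f:
            f.write(response_mp3.content)


    # Step through the hot list 20 items at a time: 0, 20, ..., 100.
    for page in range(0, 101, 20):
        # NOTE: this span is damaged in the source. The scheme and host of the
        # URL, the rest of its query string (presumably where `page` was used),
        # and the start of the User-Agent string are missing. The surviving
        # fragments are kept verbatim; the 'User-Agent' key is inferred.
        url = 'aj/hot/loadingmore?ajwvr=62020102014'
        headers = {'User-Agent': 'WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'}
        response = requests.get(url=url, headers=headers)

        # The response carries escaped text; turn the escapes into real characters.
        html_data = response.text.encode('utf-8').decode('unicode_escape')

        # Video ids follow "vid=" and end at the next double quote.
        urls = re.findall('vid=(.*?)"', html_data, re.S)
        # The title pattern appears truncated in the source (its leading part
        # was probably an HTML tag); it is kept as found.
        titles = re.findall('(.*?)<', html_data, re.S)

        data = zip(urls, titles)
        for vid, title in data:
            # The host of this URL is also missing in the source.
            page_url = 'aj/dash/media?media_ids={}&protocols=dash&watermarks=krcom'.format(vid)
            response_2 = requests.get(url=page_url, headers=headers)
            html_json = response_2.json()

            # The second entry in "details" is used for video, the last one for audio.
            details = html_json['data']['list'][0]['details']
            video_url = details[1]['play_info']['url']
            mp3_url = details[-1]['play_info']['url']

            download_video(title, video_url)
            download_mp3(title, mp3_url)
            print(title)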
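The crawler above uses each title directly as a file name and indexes into the JSON response without checking its shape, so a title containing "/" or "?" or an unexpected response would stop the run. The following is a minimal, offline sketch of how those steps could be made more defensive. The helper names (safe_filename, decode_escaped, extract_vids, pick_media_urls) and the sample data are hypothetical and are not part of the original script; only the regular expression for "vid=" and the JSON indices are taken from it.

```python
import re

# Characters that Windows does not allow in file names, plus control whitespace.
_ILLEGAL = re.compile(r'[\\/:*?"<>|\r\n\t]')


def safe_filename(title, fallback='untitled'):
    """Replace characters that are invalid in Windows file names with '_'."""
    cleaned = _ILLEGAL.sub('_', title).strip().rstrip('. ')
    return cleaned or fallback


def decode_escaped(text):
    """Decode backslash escapes the same way the crawler does.

    Only reliable when the input is ASCII apart from the escape sequences.
    """
    return text.encode('utf-8').decode('unicode_escape')


def extract_vids(html):
    """Collect every value that follows 'vid=' up to the next double quote."""
    return re.findall(r'vid=(.*?)"', html, re.S)


def pick_media_urls(payload):
    """Return (video_url, audio_url) using the same indices as the crawler.

    Returns None when the payload does not have the expected shape.
    """
    try:
        details = payload['data']['list'][0]['details']
        return details[1]['play_info']['url'], details[-1]['play_info']['url']
    except (KeyError, IndexError, TypeError):
        return None


# Hypothetical sample data, invented for illustration only.
sample_html = '<a href=\\"/v?vid=111:aaa\\">\\u9177\\u71c3</a>'
sample_payload = {
    'data': {'list': [{'details': [
        {'play_info': {'url': 'v0'}},
        {'play_info': {'url': 'v1'}},
        {'play_info': {'url': 'a0'}},
    ]}]}
}

decoded = decode_escaped(sample_html)
print(extract_vids(decoded))
print(pick_media_urls(sample_payload))
print(safe_filename('a/b:c?'))
```

In the crawler, safe_filename(title) could be applied before building the output path, and a None result from pick_media_urls could be used to skip an item instead of raising. Whether the real responses match the sample shape is an assumption that would need checking against the live site.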