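YouTube recently restructured its site, and a number of open-source tools that parse YouTube have stopped working correctly as a result. Pytube, the one I use most, is among them. The incompatibilities I have seen so far are:
* `videos` and `video_urls` on Channel both return an empty list. The call: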
from pytube import Channel
channel = Channel("https://www.youtube.com/channel/CHANNEL_ID")
print(channel.video_urls)
print(channel.videos)
Output:
[]
[]
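* Some Channel properties raise an error when accessed. The call:
# Username of the channel owner
owner_name = channel.owner
# Total number of videos in the channel
video_total = channel.length
# Total view count
views_total = channel.views
Output: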
Traceback (most recent call last):
  File "~/channel_scan.py", line 5, in <module>
    print(channel.views)
  File "/Users/yeyu/opt/anaconda3/lib/python3.9/site-packages/pytube/contrib/playlist.py", line 379, in views
    views_text = self.sidebar_info[0]['playlistSidebarPrimaryInfoRenderer'][
  File "/Users/yeyu/opt/anaconda3/lib/python3.9/site-packages/pytube/contrib/playlist.py", line 93, in sidebar_info
    self._sidebar_info = self.initial_data['sidebar'][
KeyError: 'sidebar'
* The new @-style channel URLs are not supported. The call:
channel = Channel("https://www.youtube.com/@NBAHighlightsYT")
Output:
Traceback (most recent call last):
  File "~/channel_scan.py", line 4, in <module>
    channel = Channel("https://www.youtube.com/@NBAHighlightsYT")
  File "/Users/yeyu/opt/anaconda3/lib/python3.9/site-packages/pytube/contrib/channel.py", line 24, in __init__
    self.channel_uri = extract.channel_name(url)
  File "/Users/yeyu/opt/anaconda3/lib/python3.9/site-packages/pytube/extract.py", line 185, in channel_name
    raise RegexMatchError(
pytube.exceptions.RegexMatchError: channel_name: could not find match for patterns
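The last two failures can be reproduced in miniature without pytube or network access. The sketch below is only an illustration under stated assumptions: `initial_data` is a stand-in dictionary rather than real YouTube data, `ASSUMED_PATTERNS` is a simplified guess at prefix-based URL patterns and is not copied from pytube's source, and `sidebar_or_none` and `match_channel_url` are hypothetical helper names.

```python
import re

# Stand-in for the parsed page data after the redesign: the 'sidebar' key
# that the traceback shows being read is absent. The key that is present
# here is illustrative only.
initial_data = {"contents": {}}


def sidebar_or_none(data):
    """Return data['sidebar'] if present, otherwise None instead of raising."""
    try:
        return data["sidebar"]
    except KeyError:
        return None


# ASSUMPTION: simplified prefix-based patterns, not pytube's actual ones.
ASSUMED_PATTERNS = [
    r"/(c)/([%\w-]+)",
    r"/(channel)/([%\w-]+)",
    r"/(u)/([%\w-]+)",
    r"/(user)/([%\w-]+)",
]


def match_channel_url(url, patterns=ASSUMED_PATTERNS):
    """Return (kind, name) for the first pattern that matches, else None."""
    for pattern in patterns:
        m = re.search(pattern, url)
        if m:
            return m.group(1), m.group(2)
    return None
```

With patterns of this shape, a URL containing `/channel/` matches, while `https://www.youtube.com/@NBAHighlightsYT` matches nothing, which is consistent with the "could not find match for patterns" error above.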
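Pytube's main branch has not fixed these problems yet. My project depends on the video list from Channel, so I urgently needed to solve the first one. Rather than wait for someone else, I analyzed the dictionary that pytube builds from the HTTP response and found that fixing it myself was actually quite easy. Here is the solution: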
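1) Locate channel.py
channel.py lives in the pytube source directory. pytube is a Python package managed by pip, so we can ask pip directly where it is installed and work out the package path from that. On the command line, enter: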
$ pip -V
On my Mac this prints pip's path:
pip 21.2.4 from /Users/yeyu/opt/anaconda3/lib/python3.9/site-packages/pip (python 3.9)
Remove the trailing "pip" from that path and replace it with "pytube/contrib/channel.py". The result, "/Users/yeyu/opt/anaconda3/lib/python3.9/site-packages/pytube/contrib/channel.py", is the location of channel.py.
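The same path derivation can be sketched in Python. This is a minimal sketch, not part of pytube or pip: `channel_py_from_pip_output` and `file_inside_package` are hypothetical helper names, the first assumes the `pip -V` output has the shape shown above with a POSIX-style path, and the second assumes the package is installed in the interpreter running the script.

```python
import importlib.util
import re
from pathlib import Path, PurePosixPath


def channel_py_from_pip_output(pip_line):
    """Derive the channel.py path from one line of `pip -V` output."""
    m = re.match(r"pip \S+ from (.+) \(python [^)]*\)\s*$", pip_line)
    if m is None:
        raise ValueError(f"unrecognized pip -V output: {pip_line!r}")
    # Drop the trailing "pip" directory to get site-packages.
    site_packages = PurePosixPath(m.group(1)).parent
    return str(site_packages / "pytube" / "contrib" / "channel.py")


def file_inside_package(package, *parts):
    """Locate a file inside an installed package without importing it."""
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError(f"package {package!r} is not installed")
    # spec.origin points at the package's __init__.py; step up one level.
    return Path(spec.origin).parent.joinpath(*parts)


# Example call (requires pytube to be installed):
# file_inside_package("pytube", "contrib", "channel.py")
```

The second helper skips pip entirely and asks the interpreter where the package lives, which may be more reliable when several Python environments are installed side by side.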
2) Edit channel.py
First change:
Lines 154-156
Second change:
Line 194
Save channel.py after making the changes, then try the call again:
from pytube import Channel
channel = Channel("https://www.youtube.com/channel/CHANNEL_ID")
print(channel.video_urls)
print(channel.videos)
Both calls now work without problems.