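The pytube project
Recently the foreign teacher at my daughter's kindergarten needed the complete set of YouTube teaching songs from "Singing Walrus Music". Once a call for help went out in the parents' group chat, it fell to me, the programmer dad, to get the job done properly.
GitHub repository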
https://github.com/nficano/pytube
Documentation
https://python-pytube.readthedocs.io
Installation
Install from PyPI with `pip install pytube`.
Quick start
```python
from pytube import YouTube

YouTube('http://youtube.com/watch?v=9bZkp7q19f0').streams.first().download()
```
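- According to the author, pytube's `first()` method picks the highest-resolution video for download, but when I tried it the results were disappointing.
- YouTube uses DASH (Dynamic Adaptive Streaming over HTTP), which splits video and audio into separate streams: for example 480p video and 720p video, plus audio sampled at 44100 Hz or at 22050 Hz. The following code lists every available stream, including the DASH Representations: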
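```python
from pytube import YouTube

yt = YouTube('http://youtube.com/watch?v=9bZkp7q19f0')
print(yt.streams.all())  # newer pytube releases deprecate .all(); iterate yt.streams instead
```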
```
[<Stream: itag="22" mime_type="video/mp4" res="720p" fps="30fps" vcodec="avc1.64001F" acodec="mp4a.40.2">,
<Stream: itag="43" mime_type="video/webm" res="360p" fps="30fps" vcodec="vp8.0" acodec="vorbis">,
<Stream: itag="18" mime_type="video/mp4" res="360p" fps="30fps" vcodec="avc1.42001E" acodec="mp4a.40.2">,
<Stream: itag="36" mime_type="video/3gpp" res="240p" fps="30fps" vcodec="mp4v.20.3" acodec="mp4a.40.2">,
<Stream: itag="17" mime_type="video/3gpp" res="144p" fps="30fps" vcodec="mp4v.20.3" acodec="mp4a.40.2">,
<Stream: itag="137" mime_type="video/mp4" res="1080p" fps="30fps" vcodec="avc1.640028">,
<Stream: itag="248" mime_type="video/webm" res="1080p" fps="30fps" vcodec="vp9">,
<Stream: itag="136" mime_type="video/mp4" res="720p" fps="30fps" vcodec="avc1.4d401f">,
<Stream: itag="247" mime_type="video/webm" res="720p" fps="30fps" vcodec="vp9">,
<Stream: itag="135" mime_type="video/mp4" res="480p" fps="30fps" vcodec="avc1.4d401e">,
<Stream: itag="244" mime_type="video/webm" res="480p" fps="30fps" vcodec="vp9">,
<Stream: itag="134" mime_type="video/mp4" res="360p" fps="30fps" vcodec="avc1.4d401e">,
<Stream: itag="243" mime_type="video/webm" res="360p" fps="30fps" vcodec="vp9">,
<Stream: itag="133" mime_type="video/mp4" res="240p" fps="30fps" vcodec="avc1.4d4015">,
<Stream: itag="242" mime_type="video/webm" res="240p" fps="30fps" vcodec="vp9">,
<Stream: itag="160" mime_type="video/mp4" res="144p" fps="30fps" vcodec="avc1.4d400c">,
<Stream: itag="278" mime_type="video/webm" res="144p" fps="30fps" vcodec="vp9">,
<Stream: itag="140" mime_type="audio/mp4" abr="128kbps" acodec="mp4a.40.2">,
<Stream: itag="171" mime_type="audio/webm" abr="128kbps" acodec="vorbis">,
<Stream: itag="249" mime_type="audio/webm" abr="50kbps" acodec="opus">,
<Stream: itag="250" mime_type="audio/webm" abr="70kbps" acodec="opus">,
<Stream: itag="251" mime_type="audio/webm" abr="160kbps" acodec="opus">]
```
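To make the DASH split concrete, here is a small self-contained sketch (no network, no pytube required) that parses a few of the `<Stream: ...>` lines printed above and sorts them into muxed, video-only and audio-only groups. The regex parser and the `classify` helper are illustrative assumptions, not part of pytube; on live Stream objects pytube already exposes this distinction, for example through `is_progressive`.

```python
import re

# A few stream descriptions copied from the listing above
LISTING = [
    '<Stream: itag="22" mime_type="video/mp4" res="720p" fps="30fps" vcodec="avc1.64001F" acodec="mp4a.40.2">',
    '<Stream: itag="136" mime_type="video/mp4" res="720p" fps="30fps" vcodec="avc1.4d401f">',
    '<Stream: itag="137" mime_type="video/mp4" res="1080p" fps="30fps" vcodec="avc1.640028">',
    '<Stream: itag="140" mime_type="audio/mp4" abr="128kbps" acodec="mp4a.40.2">',
    '<Stream: itag="251" mime_type="audio/webm" abr="160kbps" acodec="opus">',
]

# Matches every key="value" pair inside one <Stream: ...> line
ATTR = re.compile(r'(\w+)="([^"]*)"')


def parse(line):
    """Turn one <Stream: ...> line into a dict of its attributes."""
    return dict(ATTR.findall(line))


def classify(stream):
    """Muxed = video and audio in one file; otherwise a DASH single-track stream."""
    has_video = "vcodec" in stream
    has_audio = "acodec" in stream
    if has_video and has_audio:
        return "muxed"
    if has_video:
        return "video-only"
    return "audio-only"


groups = {}
for line in LISTING:
    s = parse(line)
    groups.setdefault(classify(s), []).append(s["itag"])

print(groups)  # {'muxed': ['22'], 'video-only': ['136', '137'], 'audio-only': ['140', '251']}
```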
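- In this list, `itag="22"` is a 720p file that carries audio (`acodec="mp4a.40.2"`), whereas `itag="136"` is also 720p but has no sound.
- Back to pytube's `first()`: it ranks streams with muxed audio ahead of video-only streams. In the extreme case, `first()` bluntly picks a low-resolution muxed stream and ignores the high-definition ones.
- I rewrote the stream-selection logic myself; the details come later.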
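The author's own selection code is described later; as an illustration only, here is a hypothetical sketch of the idea: rank every video stream by resolution instead of taking the first one, and if the winner is video-only, pair it with the best audio stream so the two can be merged afterwards. Streams are plain dicts mirroring the attributes in the listing above; `pick_streams`, `height`, `kbps` and the tie-breaking rule are my assumptions, not pytube API.

```python
def height(stream):
    """'720p' -> 720, '1080p60' -> 1080; audio streams (no res) rank as 0."""
    res = stream.get("res", "")
    return int(res.split("p")[0]) if res else 0


def kbps(stream):
    """'128kbps' -> 128; streams without abr rank as 0."""
    abr = stream.get("abr", "")
    return int(abr.replace("kbps", "")) if abr else 0


def pick_streams(streams, subtype="mp4"):
    """Return (video, audio); audio is None when the chosen video already has sound."""
    videos = [s for s in streams if s["mime_type"] == "video/" + subtype]
    # Highest resolution wins; at equal resolution prefer a muxed stream (True > False)
    video = max(videos, key=lambda s: (height(s), "acodec" in s))
    if "acodec" in video:
        return video, None  # muxed: nothing to merge
    audios = [s for s in streams if s["mime_type"] == "audio/" + subtype]
    return video, max(audios, key=kbps)


streams = [
    {"itag": "22", "mime_type": "video/mp4", "res": "720p", "vcodec": "avc1.64001F", "acodec": "mp4a.40.2"},
    {"itag": "18", "mime_type": "video/mp4", "res": "360p", "vcodec": "avc1.42001E", "acodec": "mp4a.40.2"},
    {"itag": "137", "mime_type": "video/mp4", "res": "1080p", "vcodec": "avc1.640028"},
    {"itag": "136", "mime_type": "video/mp4", "res": "720p", "vcodec": "avc1.4d401f"},
    {"itag": "140", "mime_type": "audio/mp4", "abr": "128kbps", "acodec": "mp4a.40.2"},
]
video, audio = pick_streams(streams)
print(video["itag"], audio["itag"])  # 137 140
```

Preferring the muxed stream on a resolution tie means itag 22 beats itag 136 at 720p, which skips the merge step whenever a muxed file is just as sharp.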
Filtering streams
1. Traditional streams with muxed audio (progressive)
```python
yt.streams.filter(progressive=True).all()
```
```
[<Stream: itag="22" mime_type="video/mp4" res="720p" fps="30fps" vcodec="avc1.64001F" acodec="mp4a.40.2">,
<Stream: itag="43" mime_type="video/webm" res="360p" fps="30fps" vcodec="vp8.0" acodec="vorbis">,
<Stream: itag="18" mime_type="video/mp4" res="360p" fps="30fps" vcodec="avc1.42001E" acodec="mp4a.40.2">,
<Stream: itag="36" mime_type="video/3gpp" res="240p" fps="30fps" vcodec="mp4v.20.3" acodec="mp4a.40.2">,
<Stream: itag="17" mime_type="video/3gpp" res="144p" fps="30fps" vcodec="mp4v.20.3" acodec="mp4a.40.2">]
```
2. DASH (adaptive) streams
```python
yt.streams.filter(adaptive=True).all()
```
```
[<Stream: itag="137" mime_type="video/mp4" res="1080p" fps="30fps" vcodec="avc1.640028">,
<Stream: itag="248" mime_type="video/webm" res="1080p" fps="30fps" vcodec="vp9">,
<Stream: itag="136" mime_type="video/mp4" res="720p" fps="30fps" vcodec="avc1.4d401f">,
<Stream: itag="247" mime_type="video/webm" res="720p" fps="30fps" vcodec="vp9">,
<Stream: itag="135" mime_type="video/mp4" res="480p" fps="30fps" vcodec="avc1.4d401e">,
<Stream: itag="244" mime_type="video/webm" res="480p" fps="30fps" vcodec="vp9">,
<Stream: itag="134" mime_type="video/mp4" res="360p" fps="30fps" vcodec="avc1.4d401e">,
<Stream: itag="243" mime_type="video/webm" res="360p" fps="30fps" vcodec="vp9">,
<Stream: itag="133" mime_type="video/mp4" res="240p" fps="30fps" vcodec="avc1.4d4015">,
<Stream: itag="242" mime_type="video/webm" res="240p" fps="30fps" vcodec="vp9">,
<Stream: itag="160" mime_type="video/mp4" res="144p" fps="30fps" vcodec="avc1.4d400c">,
<Stream: itag="278" mime_type="video/webm" res="144p" fps="30fps" vcodec="vp9">,
<Stream: itag="140" mime_type="audio/mp4" abr="128kbps" acodec="mp4a.40.2">,
<Stream: itag="171" mime_type="audio/webm" abr="128kbps" acodec="vorbis">,
<Stream: itag="249" mime_type="audio/webm" abr="50kbps" acodec="opus">,
<Stream: itag="250" mime_type="audio/webm" abr="70kbps" acodec="opus">,
<Stream: itag="251" mime_type="audio/webm" abr="160kbps" acodec="opus">]
```
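3. Other filter criteria
- `only_audio=True`: audio streams only
- `only_video=True`: video streams only
- `subtype='mp4'`: streams whose file extension is mp4, audio and video alike
- `res="720p"`: 720p video
- `abr="64kbps"`: audio with an average bitrate of 64kbps
- `video_codec="vp9"`: video encoded with VP9
- `audio_codec="vorbis"`: audio encoded with Vorbis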
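When several criteria are passed to `filter` at once, a stream has to satisfy all of them. The sketch below only emulates that AND behaviour on plain dicts so it runs offline; `filter_streams` is a hypothetical helper, and the dict keys simply reuse the keyword names listed above.

```python
def filter_streams(streams, **criteria):
    """Keep the streams whose attributes equal every given criterion (logical AND)."""
    return [s for s in streams if all(s.get(k) == v for k, v in criteria.items())]


streams = [
    {"itag": "22", "subtype": "mp4", "res": "720p", "video_codec": "avc1.64001F", "audio_codec": "mp4a.40.2"},
    {"itag": "136", "subtype": "mp4", "res": "720p", "video_codec": "avc1.4d401f"},
    {"itag": "247", "subtype": "webm", "res": "720p", "video_codec": "vp9"},
    {"itag": "171", "subtype": "webm", "abr": "128kbps", "audio_codec": "vorbis"},
]

print([s["itag"] for s in filter_streams(streams, subtype="mp4", res="720p")])  # ['22', '136']
print([s["itag"] for s in filter_streams(streams, audio_codec="vorbis")])  # ['171']
```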
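Downloading by itag
- YouTube gives every stream format its own numeric id, called the itag. This covers progressive streams such as 22 as well as DASH streams
- `get_by_itag` returns the stream with a given itag, which you can then download: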
```python
stream = yt.streams.get_by_itag(22)  # None if this video doesn't offer itag 22
stream.download()
```
itag Code | Container | Content | Resolution | Bitrate | Range | VR / 3D |
---|---|---|---|---|---|---|
5 | flv | audio/video | 240p | - | - | - |
6 | flv | audio/video | 270p | - | - | - |
17 | 3gp | audio/video | 144p | - | - | - |
18 | mp4 | audio/video | 360p | - | - | - |
22 | mp4 | audio/video | 720p | - | - | - |
34 | flv | audio/video | 360p | - | - | - |
35 | flv | audio/video | 480p | - | - | - |
36 | 3gp | audio/video | 180p | - | - | - |
37 | mp4 | audio/video | 1080p | - | - | - |
38 | mp4 | audio/video | 3072p | - | - | - |
43 | webm | audio/video | 360p | - | - | - |
44 | webm | audio/video | 480p | - | - | - |
45 | webm | audio/video | 720p | - | - | - |
46 | webm | audio/video | 1080p | - | - | - |
82 | mp4 | audio/video | 360p | - | - | 3D |
83 | mp4 | audio/video | 480p | - | - | 3D |
84 | mp4 | audio/video | 720p | - | - | 3D |
85 | mp4 | audio/video | 1080p | - | - | 3D |
92 | hls | audio/video | 240p | - | - | 3D |
93 | hls | audio/video | 360p | - | - | 3D |
94 | hls | audio/video | 480p | - | - | 3D |
95 | hls | audio/video | 720p | - | - | 3D |
96 | hls | audio/video | 1080p | - | - | - |
100 | webm | audio/video | 360p | - | - | 3D |
101 | webm | audio/video | 480p | - | - | 3D |
102 | webm | audio/video | 720p | - | - | 3D |
132 | hls | audio/video | 240p | - | - | |
133 | mp4 | video | 240p | - | - | |
134 | mp4 | video | 360p | - | - | |
135 | mp4 | video | 480p | - | - | |
136 | mp4 | video | 720p | - | - | |
137 | mp4 | video | 1080p | - | - | |
138 | mp4 | video | 2160p60 | - | - | |
139 | m4a | audio | - | 48k | - | |
140 | m4a | audio | - | 128k | - | |
141 | m4a | audio | - | 256k | - | |
151 | hls | audio/video | 72p | - | - | |
160 | mp4 | video | 144p | - | - | |
167 | webm | video | 360p | - | - | |
168 | webm | video | 480p | - | - | |
169 | webm | video | 1080p | - | - | |
171 | webm | audio | - | 128k | - | |
218 | webm | video | 480p | - | - | |
219 | webm | video | 144p | - | - | |
242 | webm | video | 240p | - | - | |
243 | webm | video | 360p | - | - | |
244 | webm | video | 480p | - | - | |
245 | webm | video | 480p | - | - | |
246 | webm | video | 480p | - | - | |
247 | webm | video | 720p | - | - | |
248 | webm | video | 1080p | - | - | |
249 | webm | audio | - | 50k | - | |
250 | webm | audio | - | 70k | - | |
251 | webm | audio | - | 160k | - | |
264 | mp4 | video | 1440p | - | - | |
266 | mp4 | video | 2160p60 | - | - | |
271 | webm | video | 1440p | - | - | |
272 | webm | video | 4320p | - | - | |
278 | webm | video | 144p | - | - | |
298 | mp4 | video | 720p60 | - | - | |
299 | mp4 | video | 1080p60 | - | - | |
302 | webm | video | 720p60 | - | - | |
303 | webm | video | 1080p60 | - | - | |
308 | webm | video | 1440p60 | - | - | |
313 | webm | video | 2160p | - | - | |
315 | webm | video | 2160p60 | - | - | |
330 | webm | video | 144p60 | - | hdr | |
331 | webm | video | 240p60 | - | hdr | |
332 | webm | video | 360p60 | - | hdr | |
333 | webm | video | 480p60 | - | hdr | |
334 | webm | video | 720p60 | - | hdr | |
335 | webm | video | 1080p60 | - | hdr | |
336 | webm | video | 1440p60 | - | hdr | |
337 | webm | video | 2160p60 | - | hdr | |
394 | mp4 | video | 144p | - | - | |
395 | mp4 | video | 240p | - | - | |
396 | mp4 | video | 360p | - | - | |
397 | mp4 | video | 480p | - | - | |
398 | mp4 | video | 720p | - | - | |
399 | mp4 | video | 1080p | - | - | |
400 | mp4 | video | 1440p | - | - | |
401 | mp4 | video | 2160p | - | - | |
402 | mp4 | video | 2880p | - | - | |
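Before passing an itag to `get_by_itag`, it helps to know whether it is a muxed file or a single DASH track. The lookup below holds only a handful of rows copied from the table above, and `describe` is a hypothetical helper; for a real video you would still check which itags `yt.streams` actually offers.

```python
# A few rows copied from the itag table: itag -> (container, content, resolution or bitrate)
ITAGS = {
    18: ("mp4", "audio/video", "360p"),
    22: ("mp4", "audio/video", "720p"),
    136: ("mp4", "video", "720p"),
    137: ("mp4", "video", "1080p"),
    140: ("m4a", "audio", "128k"),
    251: ("webm", "audio", "160k"),
}


def describe(itag):
    """One-line summary of an itag, marking muxed vs DASH single-track streams."""
    if itag not in ITAGS:
        return f"itag {itag}: not in this table"
    container, content, quality = ITAGS[itag]
    kind = "muxed" if content == "audio/video" else f"DASH {content}-only"
    return f"itag {itag}: {container}, {quality}, {kind}"


for itag in (22, 137, 140):
    print(describe(itag))
```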
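About the network
- Because we must be shielded from the poison of Western capitalist ideology, network requests here are often unstable
- The two most common errors are `HTTPError` and `URLError`
- PyCharm users will also run into `ConnectionResetError`
- The first two must be imported with `from urllib.error import HTTPError, URLError`; `ConnectionResetError` is a Python built-in
- Then wrap the request in a `while` loop with `try…except…` so it is retried: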
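```python
import logging
from urllib.error import HTTPError, URLError

from pytube import YouTube

logger = logging.getLogger(__name__)
url = 'http://youtube.com/watch?v=9bZkp7q19f0'

yt = None
while True:  # keep retrying until the page is fetched
    try:
        yt = YouTube(url)
        break
    except HTTPError:
        logger.error("Request failed once: HTTPError")
    except URLError:
        logger.error("Request failed once: URLError")
    except ConnectionResetError:  # built-in, no import needed
        logger.error("Request failed once: ConnectionResetError")
streams = yt.streams.filter(subtype='mp4').all()
```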
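The `while True` loop above retries forever, which can hang a batch job if the connection never recovers. A bounded alternative is sketched below; `retry`, its parameters and the doubling backoff schedule are my own assumptions, not part of pytube. The `sleep` parameter is injectable so the function can be tested without actually waiting.

```python
import time
from urllib.error import HTTPError, URLError


def retry(fn, exceptions=(HTTPError, URLError, ConnectionResetError),
          attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn() until it succeeds, waiting base_delay * 2**n after failure n.

    After `attempts` failures the last exception is re-raised instead of looping forever.
    """
    for n in range(attempts):
        try:
            return fn()
        except exceptions as exc:
            if n == attempts - 1:
                raise
            print(f"Attempt {n + 1} failed: {type(exc).__name__}, retrying")
            sleep(base_delay * 2 ** n)
```

In the article's setting this would be called as `yt = retry(lambda: YouTube(url))`.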
Downloading the video
- Once you have found a stream that meets your criteria, download it directly with `download`:
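```python
from pytube import YouTube

yt = YouTube('http://youtube.com/watch?v=9bZkp7q19f0')
mp4 = yt.streams.first()
# Example values: target directory, name without extension, prefix to tell tracks apart
mp4.download(output_path='downloads', filename='FilmTitle', filename_prefix='video_')
```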
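- `download` takes three optional parameters:
  - `output_path`: the directory to write the file to
  - `filename`: the output name, which defaults to the video title; leave off the extension
  - `filename_prefix`: a prefix for the file name. Its main use is telling audio and video apart: after download both get the same name and the same format, so whichever is saved second overwrites the first. A prefix keeps them separate, e.g. "audio_FilmTitle.mp4" for the audio and "video_FilmTitle.mp4" for the video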
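The overwrite problem is easy to check before downloading anything. This sketch computes the target path for each track under the naming described above (prefix, then name, then the stream's format as extension) and refuses to proceed if two paths collide. `output_file` and `plan_downloads` are hypothetical helpers, not pytube functions.

```python
import os


def output_file(output_path, filename, subtype, filename_prefix=""):
    """Target path under the naming described above: <prefix><filename>.<subtype>."""
    return os.path.join(output_path, f"{filename_prefix}{filename}.{subtype}")


def plan_downloads(output_path, title, tracks):
    """tracks: (prefix, subtype) pairs. Return one path per track, failing on any collision."""
    paths = [output_file(output_path, title, subtype, prefix) for prefix, subtype in tracks]
    if len(set(paths)) != len(paths):
        raise ValueError("downloads would overwrite each other: " + ", ".join(paths))
    return paths


# DASH audio and video both come back as .mp4, so only the prefix keeps them apart
print(plan_downloads("downloads", "FilmTitle", [("video_", "mp4"), ("audio_", "mp4")]))
```

Once the plan passes, each stream is downloaded with its own prefix, e.g. `video_stream.download("downloads", filename="FilmTitle", filename_prefix="video_")`.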
Merging audio and video