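The pytube project
Recently the foreign teacher at my daughter's kindergarten needed the complete set of teaching songs from the YouTube channel "Singing Walrus Music". After the call for help went out in the parents' group chat, I, as the programmer dad, naturally had to get this sorted out properly.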
 GitHub repository
https://github.com/nficano/pytube
 Documentation
https://python-pytube.readthedocs.io
 Installation
pytube is published on PyPI and can be installed with pip: pip install pytube
 Quick start
```python
from pytube import YouTube

YouTube('http://youtube.com/watch?v=9bZkp7q19f0').streams.first().download()
```
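- According to the author, pytube's first() method picks the highest-resolution video for download, but in my own testing the result was not ideal.
- YouTube uses DASH (Dynamic Adaptive Streaming over HTTP) streams, which split video and audio into separate tracks. A single video may offer, for example, 480p and 720p video tracks plus 44100 Hz and 22050 Hz audio tracks. The following code prints the description of every available stream, including the DASH representations: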
```python
yt = YouTube('http://youtube.com/watch?v=9bZkp7q19f0')
yt.streams.all()
```
```
[<Stream: itag="22" mime_type="video/mp4" res="720p" fps="30fps" vcodec="avc1.64001F" acodec="mp4a.40.2">,
 <Stream: itag="43" mime_type="video/webm" res="360p" fps="30fps" vcodec="vp8.0" acodec="vorbis">,
 <Stream: itag="18" mime_type="video/mp4" res="360p" fps="30fps" vcodec="avc1.42001E" acodec="mp4a.40.2">,
 <Stream: itag="36" mime_type="video/3gpp" res="240p" fps="30fps" vcodec="mp4v.20.3" acodec="mp4a.40.2">,
 <Stream: itag="17" mime_type="video/3gpp" res="144p" fps="30fps" vcodec="mp4v.20.3" acodec="mp4a.40.2">,
 <Stream: itag="137" mime_type="video/mp4" res="1080p" fps="30fps" vcodec="avc1.640028">,
 <Stream: itag="248" mime_type="video/webm" res="1080p" fps="30fps" vcodec="vp9">,
 <Stream: itag="136" mime_type="video/mp4" res="720p" fps="30fps" vcodec="avc1.4d401f">,
 <Stream: itag="247" mime_type="video/webm" res="720p" fps="30fps" vcodec="vp9">,
 <Stream: itag="135" mime_type="video/mp4" res="480p" fps="30fps" vcodec="avc1.4d401e">,
 <Stream: itag="244" mime_type="video/webm" res="480p" fps="30fps" vcodec="vp9">,
 <Stream: itag="134" mime_type="video/mp4" res="360p" fps="30fps" vcodec="avc1.4d401e">,
 <Stream: itag="243" mime_type="video/webm" res="360p" fps="30fps" vcodec="vp9">,
 <Stream: itag="133" mime_type="video/mp4" res="240p" fps="30fps" vcodec="avc1.4d4015">,
 <Stream: itag="242" mime_type="video/webm" res="240p" fps="30fps" vcodec="vp9">,
 <Stream: itag="160" mime_type="video/mp4" res="144p" fps="30fps" vcodec="avc1.4d400c">,
 <Stream: itag="278" mime_type="video/webm" res="144p" fps="30fps" vcodec="vp9">,
 <Stream: itag="140" mime_type="audio/mp4" abr="128kbps" acodec="mp4a.40.2">,
 <Stream: itag="171" mime_type="audio/webm" abr="128kbps" acodec="vorbis">,
 <Stream: itag="249" mime_type="audio/webm" abr="50kbps" acodec="opus">,
 <Stream: itag="250" mime_type="audio/webm" abr="70kbps" acodec="opus">,
 <Stream: itag="251" mime_type="audio/webm" abr="160kbps" acodec="opus">]
```
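- Here the itag="22" stream is a 720p file that carries audio (acodec="mp4a.40.2"), whereas itag="136" is also 720p but is a silent, video-only file.
- Back to pytube's first() method: it prefers streams with audio muxed in over video-only streams. In the extreme case, first() bluntly picks a low-resolution muxed stream and ignores the high-definition ones. In the listing above, for example, the first entry is the 720p itag="22" stream even though a 1080p stream (itag="137") is available.
- I rewrote the stream-selection logic myself; it is explained later.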
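To make the problem with first() concrete, here is a minimal sketch of choosing by resolution instead of by list order. It is not the author's rewritten logic and does not call pytube: FakeStream is a hypothetical stand-in that only mimics the itag, resolution and is_progressive attributes of a stream, and the sample values are taken from the listing above.

```python
import re
from collections import namedtuple

# Hypothetical stand-in for a stream object; only the attributes used below.
FakeStream = namedtuple("FakeStream", "itag resolution is_progressive")


def resolution_height(stream):
    """Turn a resolution string such as '720p' into 720 (0 when there is none)."""
    match = re.match(r"(\d+)p", stream.resolution or "")
    return int(match.group(1)) if match else 0


def pick_highest_resolution(streams):
    """Return the tallest video stream, preferring a muxed one on a tie."""
    video_streams = [s for s in streams if resolution_height(s) > 0]
    if not video_streams:
        return None
    return max(video_streams,
               key=lambda s: (resolution_height(s), s.is_progressive))


# A few entries from the listing above (140 is audio-only, so no resolution).
streams = [
    FakeStream(22, "720p", True),
    FakeStream(18, "360p", True),
    FakeStream(137, "1080p", False),
    FakeStream(136, "720p", False),
    FakeStream(140, None, False),
]

best = pick_highest_resolution(streams)
print(best.itag)  # 137
```

Picking itag 137 this way yields a silent file, so a matching audio stream still has to be downloaded and merged separately.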
 Filtering streams
 1. Progressive streams (video with audio muxed in)
```python
yt.streams.filter(progressive=True).all()
```
```
[<Stream: itag="22" mime_type="video/mp4" res="720p" fps="30fps" vcodec="avc1.64001F" acodec="mp4a.40.2">,
 <Stream: itag="43" mime_type="video/webm" res="360p" fps="30fps" vcodec="vp8.0" acodec="vorbis">,
 <Stream: itag="18" mime_type="video/mp4" res="360p" fps="30fps" vcodec="avc1.42001E" acodec="mp4a.40.2">,
 <Stream: itag="36" mime_type="video/3gpp" res="240p" fps="30fps" vcodec="mp4v.20.3" acodec="mp4a.40.2">,
 <Stream: itag="17" mime_type="video/3gpp" res="144p" fps="30fps" vcodec="mp4v.20.3" acodec="mp4a.40.2">]
```
 2. DASH (adaptive) streams
```python
yt.streams.filter(adaptive=True).all()
```
```
[<Stream: itag="137" mime_type="video/mp4" res="1080p" fps="30fps" vcodec="avc1.640028">,
 <Stream: itag="248" mime_type="video/webm" res="1080p" fps="30fps" vcodec="vp9">,
 <Stream: itag="136" mime_type="video/mp4" res="720p" fps="30fps" vcodec="avc1.4d401f">,
 <Stream: itag="247" mime_type="video/webm" res="720p" fps="30fps" vcodec="vp9">,
 <Stream: itag="135" mime_type="video/mp4" res="480p" fps="30fps" vcodec="avc1.4d401e">,
 <Stream: itag="244" mime_type="video/webm" res="480p" fps="30fps" vcodec="vp9">,
 <Stream: itag="134" mime_type="video/mp4" res="360p" fps="30fps" vcodec="avc1.4d401e">,
 <Stream: itag="243" mime_type="video/webm" res="360p" fps="30fps" vcodec="vp9">,
 <Stream: itag="133" mime_type="video/mp4" res="240p" fps="30fps" vcodec="avc1.4d4015">,
 <Stream: itag="242" mime_type="video/webm" res="240p" fps="30fps" vcodec="vp9">,
 <Stream: itag="160" mime_type="video/mp4" res="144p" fps="30fps" vcodec="avc1.4d400c">,
 <Stream: itag="278" mime_type="video/webm" res="144p" fps="30fps" vcodec="vp9">,
 <Stream: itag="140" mime_type="audio/mp4" abr="128kbps" acodec="mp4a.40.2">,
 <Stream: itag="171" mime_type="audio/webm" abr="128kbps" acodec="vorbis">,
 <Stream: itag="249" mime_type="audio/webm" abr="50kbps" acodec="opus">,
 <Stream: itag="250" mime_type="audio/webm" abr="70kbps" acodec="opus">,
 <Stream: itag="251" mime_type="audio/webm" abr="160kbps" acodec="opus">]
```
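 3. Other filter conditions
- only_audio=True: select audio-only streams
- only_video=True: select video-only streams
- subtype='mp4': select streams whose file extension is "mp4", both audio and video
- res="720p": select video streams at 720p
- abr="64kbps": select audio streams with a bitrate of 64 kbps
- video_codec="vp9": select video streams encoded with vp9
- audio_codec="vorbis": select audio streams encoded with vorbis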
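Once the audio-only streams have been filtered out, one still has to pick among several bitrates. The sketch below shows one way to choose the highest abr. It works on hypothetical FakeStream tuples rather than real pytube objects, using the audio entries from the adaptive listing above; with pytube the candidates would come from something like yt.streams.filter(only_audio=True).

```python
from collections import namedtuple

# Hypothetical stand-in for a stream object; only the attributes used below.
FakeStream = namedtuple("FakeStream", "itag mime_type abr")


def abr_kbps(stream):
    """Turn an abr string such as '128kbps' into 128 (0 when there is none)."""
    digits = "".join(ch for ch in (stream.abr or "") if ch.isdigit())
    return int(digits) if digits else 0


def best_audio(streams, subtype=None):
    """Return the audio stream with the highest bitrate, optionally by subtype."""
    candidates = [s for s in streams if s.mime_type.startswith("audio/")]
    if subtype is not None:
        candidates = [s for s in candidates if s.mime_type == "audio/" + subtype]
    return max(candidates, key=abr_kbps, default=None)


# Audio entries from the adaptive listing above, plus one video-only entry.
streams = [
    FakeStream(137, "video/mp4", None),
    FakeStream(140, "audio/mp4", "128kbps"),
    FakeStream(171, "audio/webm", "128kbps"),
    FakeStream(249, "audio/webm", "50kbps"),
    FakeStream(250, "audio/webm", "70kbps"),
    FakeStream(251, "audio/webm", "160kbps"),
]

print(best_audio(streams).itag)                 # 251
print(best_audio(streams, subtype="mp4").itag)  # 140
```

Restricting the subtype to "mp4" is useful when the audio is later merged with an mp4 video track.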
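 Downloading a video by itag
- YouTube assigns every stream format its own numeric ID, called the itag
- The get_by_itag method returns the stream with the given itag, which can then be downloaded with download()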
```python
yt.streams.get_by_itag(22)
```
| itag Code | Container | Content | Resolution | Bitrate | Range | VR / 3D | 
|---|---|---|---|---|---|---|
| 5 | flv | audio/video | 240p | - | - | - | 
| 6 | flv | audio/video | 270p | - | - | - | 
| 17 | 3gp | audio/video | 144p | - | - | - | 
| 18 | mp4 | audio/video | 360p | - | - | - | 
| 22 | mp4 | audio/video | 720p | - | - | - | 
| 34 | flv | audio/video | 360p | - | - | - | 
| 35 | flv | audio/video | 480p | - | - | - | 
| 36 | 3gp | audio/video | 180p | - | - | - | 
| 37 | mp4 | audio/video | 1080p | - | - | - | 
| 38 | mp4 | audio/video | 3072p | - | - | - | 
| 43 | webm | audio/video | 360p | - | - | - | 
| 44 | webm | audio/video | 480p | - | - | - | 
| 45 | webm | audio/video | 720p | - | - | - | 
| 46 | webm | audio/video | 1080p | - | - | - | 
| 82 | mp4 | audio/video | 360p | - | - | 3D | 
| 83 | mp4 | audio/video | 480p | - | - | 3D | 
| 84 | mp4 | audio/video | 720p | - | - | 3D | 
| 85 | mp4 | audio/video | 1080p | - | - | 3D | 
| 92 | hls | audio/video | 240p | - | - | 3D | 
| 93 | hls | audio/video | 360p | - | - | 3D | 
| 94 | hls | audio/video | 480p | - | - | 3D | 
| 95 | hls | audio/video | 720p | - | - | 3D | 
| 96 | hls | audio/video | 1080p | - | - | - | 
| 100 | webm | audio/video | 360p | - | - | 3D | 
| 101 | webm | audio/video | 480p | - | - | 3D | 
| 102 | webm | audio/video | 720p | - | - | 3D | 
| 132 | hls | audio/video | 240p | - | - |  | 
| 133 | mp4 | video | 240p | - | - |  | 
| 134 | mp4 | video | 360p | - | - |  | 
| 135 | mp4 | video | 480p | - | - |  | 
| 136 | mp4 | video | 720p | - | - |  | 
| 137 | mp4 | video | 1080p | - | - |  | 
| 138 | mp4 | video | 2160p60 | - | - |  | 
| 139 | m4a | audio | - | 48k | - |  | 
| 140 | m4a | audio | - | 128k | - |  | 
| 141 | m4a | audio | - | 256k | - |  | 
| 151 | hls | audio/video | 72p | - | - |  | 
| 160 | mp4 | video | 144p | - | - |  | 
| 167 | webm | video | 360p | - | - |  | 
| 168 | webm | video | 480p | - | - |  | 
| 169 | webm | video | 1080p | - | - |  | 
| 171 | webm | audio | - | 128k | - |  | 
| 218 | webm | video | 480p | - | - |  | 
| 219 | webm | video | 144p | - | - |  | 
| 242 | webm | video | 240p | - | - |  | 
| 243 | webm | video | 360p | - | - |  | 
| 244 | webm | video | 480p | - | - |  | 
| 245 | webm | video | 480p | - | - |  | 
| 246 | webm | video | 480p | - | - |  | 
| 247 | webm | video | 720p | - | - |  | 
| 248 | webm | video | 1080p | - | - |  | 
| 249 | webm | audio | - | 50k | - |  | 
| 250 | webm | audio | - | 70k | - |  | 
| 251 | webm | audio | - | 160k | - |  | 
| 264 | mp4 | video | 1440p | - | - |  | 
| 266 | mp4 | video | 2160p60 | - | - |  | 
| 271 | webm | video | 1440p | - | - |  | 
| 272 | webm | video | 4320p | - | - |  | 
| 278 | webm | video | 144p | - | - |  | 
| 298 | mp4 | video | 720p60 | - | - |  | 
| 299 | mp4 | video | 1080p60 | - | - |  | 
| 302 | webm | video | 720p60 | - | - |  | 
| 303 | webm | video | 1080p60 | - | - |  | 
| 308 | webm | video | 1440p60 | - | - |  | 
| 313 | webm | video | 2160p | - | - |  | 
| 315 | webm | video | 2160p60 | - | - |  | 
| 330 | webm | video | 144p60 | - | hdr |  | 
| 331 | webm | video | 240p60 | - | hdr |  | 
| 332 | webm | video | 360p60 | - | hdr |  | 
| 333 | webm | video | 480p60 | - | hdr |  | 
| 334 | webm | video | 720p60 | - | hdr |  | 
| 335 | webm | video | 1080p60 | - | hdr |  | 
| 336 | webm | video | 1440p60 | - | hdr |  | 
| 337 | webm | video | 2160p60 | - | hdr |  | 
| 394 | mp4 | video | 144p | - | - |  | 
| 395 | mp4 | video | 240p | - | - |  | 
| 396 | mp4 | video | 360p | - | - |  | 
| 397 | mp4 | video | 480p | - | - |  | 
| 398 | mp4 | video | 720p | - | - |  | 
| 399 | mp4 | video | 1080p | - | - |  | 
| 400 | mp4 | video | 1440p | - | - |  | 
| 401 | mp4 | video | 2160p | - | - |  | 
| 402 | mp4 | video | 2880p | - | - |  | 
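The Content column of the table tells you whether an itag carries sound. As a small illustration, the sketch below copies a handful of rows into a dictionary and checks whether a given itag would need a separate audio download. ITAG_TABLE and both helper functions are hypothetical names for this example, not part of pytube.

```python
# A few rows copied from the table above: itag -> (container, content, resolution)
ITAG_TABLE = {
    18: ("mp4", "audio/video", "360p"),
    22: ("mp4", "audio/video", "720p"),
    136: ("mp4", "video", "720p"),
    137: ("mp4", "video", "1080p"),
    140: ("m4a", "audio", None),
}


def needs_separate_audio(itag):
    """True when the itag is video-only and so has no sound of its own."""
    return ITAG_TABLE[itag][1] == "video"


def describe_itag(itag):
    """Build a short human-readable description of a known itag."""
    container, content, resolution = ITAG_TABLE[itag]
    if resolution is None:
        return "{} {}".format(container, content)
    return "{} {} {}".format(container, content, resolution)


print(describe_itag(22))          # mp4 audio/video 720p
print(needs_separate_audio(22))   # False
print(needs_separate_audio(136))  # True
```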
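 About the network
- Because we need to be protected from the corrupting influence of Western capitalist ideas, requests to YouTube are often unstable
- Two errors are common:
  - HTTPError
  - URLError
- PyCharm users may also run into ConnectionResetError
- The first two errors need to be imported: from urllib.error import HTTPError, URLError
- Then wrap the request in a while loop with try…except… to retry it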
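```python
import logging
from urllib.error import HTTPError, URLError

from pytube import YouTube

logger = logging.getLogger(__name__)  # the original snippet used self.logger inside a class

url = 'http://youtube.com/watch?v=9bZkp7q19f0'
yt = None
while True:
    try:
        yt = YouTube(url)
        break
    except HTTPError:  # must come first: HTTPError is a subclass of URLError
        logger.error("Request failed once: HTTPError")
        continue
    except URLError:
        logger.error("Request failed once: URLError")
        continue
streams = yt.streams.filter(subtype='mp4').all()
```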
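The loop above retries forever, which can hang if the network never recovers. A bounded variant is sketched below. The retry helper and its max_attempts and delay parameters are assumptions for this example, not something pytube provides. Catching URLError also covers HTTPError, since the latter is a subclass.

```python
import time
from urllib.error import URLError


def retry(func, max_attempts=5, delay=1.0,
          exceptions=(URLError, ConnectionResetError)):
    """Call func() until it succeeds or max_attempts is reached.

    Waits `delay` seconds between attempts and re-raises the last error
    when every attempt has failed.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except exceptions as error:
            last_error = error
            if attempt < max_attempts:
                time.sleep(delay)
    raise last_error


# Simulated flaky request: fails twice, then succeeds.
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise URLError("simulated network failure")
    return "ok"


result = retry(flaky, max_attempts=5, delay=0)
print(result, calls["count"])  # ok 3
```

With pytube the call would look like yt = retry(lambda: YouTube(url)), so the retry policy lives in one place instead of being repeated around every request.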
 Downloading the video
- Once a suitable stream has been identified, it can be downloaded directly with download()
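```python
from pytube import YouTube

output_path = './videos'    # placeholder output directory
filename = 'FilmTitle'      # placeholder name, without extension
filename_prefix = 'video_'  # placeholder prefix
yt = YouTube('http://youtube.com/watch?v=9bZkp7q19f0')
mp4 = yt.streams.first()
mp4.download(output_path, filename, filename_prefix)
```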
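- download accepts three parameters:
  - output_path: the directory the file is saved to;
  - filename: the output file name, defaulting to the video title; it should not include the extension;
  - filename_prefix: a prefix for the file name, mainly used to tell audio and video apart. An audio file and a video file downloaded from the same video get the same name and the same extension, so the later download overwrites the earlier one. A prefix keeps them separate, for example "audio_FilmTitle.mp4" for the audio and "video_FilmTitle.mp4" for the video.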
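The overwrite problem and the prefix fix can be shown without touching the network. The sketch below only builds the file paths. The prefixed_path helper is hypothetical and simply mirrors the naming scheme described above (prefix + title + extension).

```python
import os


def prefixed_path(output_path, title, prefix="", extension="mp4"):
    """Build the path a download would be saved to under this naming scheme."""
    return os.path.join(output_path, "{}{}.{}".format(prefix, title, extension))


# Without a prefix, the audio and video files collide on the same path.
video_plain = prefixed_path("videos", "FilmTitle")
audio_plain = prefixed_path("videos", "FilmTitle")
print(video_plain == audio_plain)  # True

# With prefixes, the two files end up side by side.
video_file = prefixed_path("videos", "FilmTitle", prefix="video_")
audio_file = prefixed_path("videos", "FilmTitle", prefix="audio_")
print(os.path.basename(video_file))  # video_FilmTitle.mp4
print(os.path.basename(audio_file))  # audio_FilmTitle.mp4
```

With pytube this corresponds to passing filename_prefix='video_' for the video stream and filename_prefix='audio_' for the audio stream.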
 
 Merging audio and video