Short-video batch download tool: source code and approach
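1: Development environment
The tool is written in C# and targets the .NET Framework 4.5.
The core idea of the code is that one small set of core routines can be reused to derive different crawler and extraction tools.
At present the tool searches for videos on the Douyin short-video platform by keyword, then downloads the search results.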
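2: Overview
This article explains and records the methods and functions the code uses, walks through the corresponding workflow, and provides part of the core code.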
3: Functions and methods used
3.1: String functions
Used to check the retrieved tags and to extract content from them.
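As an illustration of this kind of string handling, here is a minimal sketch that extracts the text between two markers using only `IndexOf` and `Substring`. The class name `StringExtract` and the method `Between` are hypothetical helpers introduced for this example; they are not taken from the tool's source.

```csharp
using System;

public static class StringExtract
{
    // Returns the text between the first occurrence of 'start' and the next
    // occurrence of 'end' after it, or null when either marker is missing.
    public static string Between(string source, string start, string end)
    {
        int from = source.IndexOf(start, StringComparison.Ordinal);
        if (from < 0) return null;
        from += start.Length;

        int to = source.IndexOf(end, from, StringComparison.Ordinal);
        if (to < 0) return null;

        return source.Substring(from, to - from);
    }
}
```

For example, `StringExtract.Between(html, "<span class=\"a\">", "</span>")` would return the inner text of the first such span, assuming the markup is written exactly that way.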
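3.2: Regular expressions
Regular expressions are used to check whether a given tag is present, to extract the text between an opening tag and its closing tag, and to loop over repeated matches when the same tag occurs many times.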
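The looping extraction described above can be sketched as follows. This is a minimal example under stated assumptions: `TagExtractor` and `ExtractAll` are hypothetical names, and the non-greedy pattern only works when the tag carries exactly one `class` attribute and does not contain nested tags of the same name.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagExtractor
{
    // Collects the inner text of every <tag class="cssClass">...</tag> in html.
    // Singleline lets '.' match line breaks inside the tag body.
    public static List<string> ExtractAll(string html, string tag, string cssClass)
    {
        string pattern = "<" + Regex.Escape(tag) + " class=\"" + Regex.Escape(cssClass) + "\">"
                       + "(.*?)"
                       + "</" + Regex.Escape(tag) + ">";

        var results = new List<string>();
        foreach (Match m in Regex.Matches(html, pattern, RegexOptions.Singleline))
        {
            results.Add(m.Groups[1].Value);
        }
        return results;
    }
}
```

The number of items returned tells you how many matching tags were found, and each item can then be searched again with a narrower pattern.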
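3.3: UI automation for extracting the page
UI automation simulates a human user so that the short-video platform's JS runs. The goal is to make the platform finish loading all of the data that needs to be extracted.
3.4: Getting the relevant tag markup from the dynamic page
Once the complete rendered front-end source has been obtained, the data is filtered out of it.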
3.5: Resolving a URL redirect to the real download address
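The article does not show how the redirect is resolved, so the following is only one possible sketch, assuming the server answers with an HTTP 3xx status and a `Location` header. `RedirectResolver`, `ResolveLocation`, and `FollowOneRedirect` are hypothetical names; the calls themselves (`HttpWebRequest.AllowAutoRedirect`, `WebResponse.Headers`) are standard .NET Framework APIs.

```csharp
using System;
using System.Net;

public static class RedirectResolver
{
    // Combines a Location header value (absolute or relative) with the URL
    // that returned it, giving an absolute target address.
    public static Uri ResolveLocation(Uri requestUri, string location)
    {
        return new Uri(requestUri, location);
    }

    // Sends one request without following redirects automatically and returns
    // the redirect target, or the original URI when no Location header is sent.
    public static Uri FollowOneRedirect(Uri uri)
    {
        var request = (HttpWebRequest)WebRequest.Create(uri);
        request.AllowAutoRedirect = false;

        using (var response = (HttpWebResponse)request.GetResponse())
        {
            string location = response.Headers["Location"];
            return string.IsNullOrEmpty(location) ? uri : ResolveLocation(uri, location);
        }
    }
}
```

Calling `FollowOneRedirect` repeatedly until the returned address stops changing would walk a chain of redirects; leaving `AllowAutoRedirect` at its default and reading `response.ResponseUri` is a simpler alternative when the intermediate hops do not matter.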
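4: Logic of the keyword-search batch download
4.1: Get the keyword entered by the user, then concatenate it into a valid search URL for the short-video platform.
Concatenated address:

    string url = "https://www.douyin.com/root/search/" + t_mess.Text + "?type=video";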
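Keywords that contain spaces, `?`, `#`, or `/` would break a plain concatenation. A minimal sketch of a safer variant, assuming the same URL layout as above, percent-encodes the keyword first. `SearchUrlBuilder` and `Build` are hypothetical names introduced here.

```csharp
using System;

public static class SearchUrlBuilder
{
    // Builds the search URL used in step 4.1, percent-encoding the keyword so
    // that reserved characters cannot change the structure of the URL.
    public static string Build(string keyword)
    {
        if (string.IsNullOrWhiteSpace(keyword))
        {
            throw new ArgumentException("keyword must not be empty", "keyword");
        }

        return "https://www.douyin.com/root/search/"
             + Uri.EscapeDataString(keyword.Trim())
             + "?type=video";
    }
}
```

In the form code this would be called as `SearchUrlBuilder.Build(t_mess.Text)`.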
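4.2: After the URL has been built, load it. Loading the address makes the video data to be extracted available, such as the video title, author, and publish date.

    if (jiazai == 0)
    {
        // First search: create the embedded browser and add it to the panel.
        chromeBrowser = new ChromiumWebBrowser(url.Trim());
        chromeBrowser.Dock = DockStyle.Fill;
        panel2.Invoke(new MethodInvoker(() =>
        {
            panel2.Controls.Add(chromeBrowser);
        }));
        t_rizhi.Text += "Loading video search......." + "\r\n";
        //panel2.Controls.Add(chromeBrowser);
    }
    else
    {
        if (jiazai1 == 0)
        {
            // Browser already exists: just load the new search address.
            t_rizhi.Text += "Loading video search......." + "\r\n";
            chromeBrowser.Load("https://www.douyin.com/root/search/" + t_mess.Text + "?type=video");
        }
    }
    if (!chromeBrowser.IsBrowserInitialized || chromeBrowser.IsLoading)
    {
        // The control has not finished loading yet.
        t_rizhi.Text += "Loading video search......." + "\r\n";
        jiazai = 1;
        jiazai1 = 1;
    }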
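4.3: Once the videos have loaded, start the simulated UI automation, for example scrolling down automatically and executing JS.
The code below executes JS that hovers over the filter control so that the filter options appear, then clicks the required option once the JS has run.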
```csharp
t_rizhi.Text = "Hovering over the filter control.....";
if (htmlContent.Contains("jjU9T0dQ")) // the hover filter control is present
{
    t_rizhi.Text += "Filter control found, starting hover" + "\r\n";
    // Filtering is available: simulate a mouse hover to open the filter menu.
    shaixuan_xuanting = 1;
    string script = "var element = document.querySelector('.jjU9T0dQ');" +
                    "var event = new MouseEvent('mouseover', { bubbles: true });" +
                    "element.dispatchEvent(event);";
    chromeBrowser.ExecuteScriptAsync(script);
    Thread.Sleep(3000);
    t_rizhi.Text += "Hover succeeded, clicking the filter option" + "\r\n";

    // The option text comes from the UI: "一天内" = within one day,
    // "一周内" = within one week, "半年内" = within six months.
    if (t_shaixuan.Text.Trim() == "一天内")
    {
        shaixuan_dianji_yitiannei();
        Thread.Sleep(3000);
    }
    if (t_shaixuan.Text.Trim() == "一周内")
    {
        shaixuan_dianji_yizhou();
        Thread.Sleep(3000);
    }
    if (t_shaixuan.Text.Trim() == "半年内")
    {
        shaixuan_dianji_bannian();
        Thread.Sleep(3000);
    }
}
```
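4.4: After the page has loaded, scroll down by executing JS
Keep executing the scroll JS until no new videos appear; at that point the video list can be collected.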
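The stop condition "no new videos appear" can be sketched as a loop that counts items after each scroll and gives up after a few rounds without growth. This is only an illustration: `ScrollLoader` and `ScrollUntilStable` are hypothetical names, and the two delegates stand in for the real browser calls so that the sketch does not depend on the browser control.

```csharp
using System;

public static class ScrollLoader
{
    // scrollOnce : performs one scroll and waits for the page to react.
    // countItems : returns how many video items the page currently contains.
    // Stops after maxIdleRounds consecutive scrolls without new items, or
    // after maxRounds scrolls in total, and returns the final item count.
    public static int ScrollUntilStable(Action scrollOnce, Func<int> countItems,
                                        int maxIdleRounds, int maxRounds)
    {
        int lastCount = countItems();
        int idleRounds = 0;

        for (int round = 0; round < maxRounds && idleRounds < maxIdleRounds; round++)
        {
            scrollOnce();
            int current = countItems();
            if (current > lastCount)
            {
                lastCount = current;   // new videos arrived, reset the idle counter
                idleRounds = 0;
            }
            else
            {
                idleRounds++;          // nothing new after this scroll
            }
        }
        return lastCount;
    }
}
```

In the tool, `scrollOnce` could run something like `chromeBrowser.ExecuteScriptAsync("window.scrollTo(0, document.body.scrollHeight);")` followed by a short `Thread.Sleep`, and `countItems` could count the list-item matches in the page source; both are assumptions about how the delegates would be wired up.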
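4.5: Getting the video list
On the short-video platform, the videos returned by a keyword search can be treated as a table to loop over, which makes the code straightforward to write.
The logic for getting the video list: first find the outer "table row" tags on the DY (Douyin) search results page, then find the inner tags for each video.
Start with the outer tags: the number of outer tags found equals the number of videos.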
The code is as follows:
```csharp
int video_id_data_count = 0;
string htmlContent = chromeBrowser.GetSourceAsync().Result;
jieshu = htmlContent;

// Each search result is one <li> element. Alternative class names are kept as comments.
string pattern = "<li class=\"SwZLHMKk SEbmeLLH\">(.*?)</li>";
// string pattern = "<li class=\"HN50D2ec Z3LKqldT\">(.*?)</li>";
// string pattern = "<li class=\"MgWTwktU B9KMVC9A\">(.*?)</li>";
// string pattern = "<li class=\"MgWTwktU search-result-card B9KMVC9A\">(.*?)</li>";
MatchCollection matches = Regex.Matches(htmlContent, pattern);
foreach (Match match in matches)
{
    t_rizhi.Text += "Reading video list entry" + "\r\n";
    string itemHtml = match.Groups[1].Value;

    // Video ID, title, author, and publish date are each matched inside the item.
    Match match1 = Regex.Match(itemHtml, @"\/video\/(\d+)");
    Match match_biaoti = Regex.Match(itemHtml, @"<div class=""VDYK8Xd7"">(.*?)</div>");
    Match match_zuozhe = Regex.Match(itemHtml, @"<span class=""MZNczJmS"">(.*?)</span>");
    Match match_shipin_dates = Regex.Match(itemHtml, @"<span class=""faDtinfi"">(.*?)</span>");

    string biaoti = "";
    string zuozhe = "";
    string shipin_dates = "";
    string shipin_dates_add = "";
    string day = "";

    if (match_biaoti.Success) // title
    {
        t_rizhi.Text += "Got video title" + "\r\n";
        biaoti = match_biaoti.Groups[1].Value;
    }
    if (match_zuozhe.Success) // author
    {
        t_rizhi.Text += "Got video author" + "\r\n";
        zuozhe = match_zuozhe.Groups[1].Value;
    }
    if (match_shipin_dates.Success) // publish date, shown as a relative time
    {
        t_rizhi.Text += "Got video date" + "\r\n";
        shipin_dates = match_shipin_dates.Groups[1].Value;
        if (shipin_dates.Contains("刚刚")) // "just now"
        {
            shipin_dates_add = DateTime.Now.Date.ToShortDateString();
            t_rizhi.Text += "Parsed video date" + "\r\n";
        }
        if (shipin_dates.Contains("天")) // N days ago
        {
            day = shipin_dates.Substring(0, shipin_dates.IndexOf("天"));
            DateTime dt = DateTime.Now.Date.AddDays(-Convert.ToInt32(day));
            shipin_dates_add = dt.ToShortDateString();
            t_rizhi.Text += "Parsed video date" + "\r\n";
        }
        if (shipin_dates.Contains("月")) // N months ago
        {
            day = shipin_dates.Substring(0, shipin_dates.IndexOf("月"));
            DateTime dt = DateTime.Now.Date.AddMonths(-Convert.ToInt32(day));
            shipin_dates_add = dt.ToShortDateString();
            t_rizhi.Text += "Parsed video date" + "\r\n";
        }
        if (shipin_dates.Contains("小时")) // N hours ago
        {
            day = shipin_dates.Substring(0, shipin_dates.IndexOf("小时"));
            // Fixed: subtract from the current time, not from midnight.
            DateTime dt = DateTime.Now.AddHours(-Convert.ToInt32(day));
            shipin_dates_add = dt.ToString();
            t_rizhi.Text += "Parsed video date" + "\r\n";
        }
        if (shipin_dates.Contains("分钟")) // N minutes ago
        {
            day = shipin_dates.Substring(0, shipin_dates.IndexOf("分钟"));
            // Fixed: subtract from the current time, not from midnight.
            DateTime dt = DateTime.Now.AddMinutes(-Convert.ToInt32(day));
            shipin_dates_add = dt.ToString();
            t_rizhi.Text += "Parsed video date" + "\r\n";
        }
        if (shipin_dates.Contains("周")) // N weeks ago
        {
            day = shipin_dates.Substring(0, shipin_dates.IndexOf("周"));
            int week = Convert.ToInt32(day) * 7;
            DateTime dt = DateTime.Now.Date.AddDays(-week);
            shipin_dates_add = dt.ToShortDateString();
            t_rizhi.Text += "Parsed video date" + "\r\n";
        }
        if (shipin_dates.Contains("年")) // N years ago
        {
            day = shipin_dates.Substring(0, shipin_dates.IndexOf("年"));
            DateTime dt = DateTime.Now.AddYears(-Convert.ToInt32(day));
            shipin_dates_add = dt.ToShortDateString();
            t_rizhi.Text += "Parsed video date" + "\r\n";
        }
    }
    if (match1.Success)
    {
        string id = match1.Groups[1].Value; // video ID
        t_rizhi.Text += "Parsed video ID" + "\r\n";
        int li_count = 0; // set to 1 when the ID is already in the list
        int i = 0;
        string shipin_id = ""; // ID of a video already stored in the list
        if (c_quchong.Text == "是") // deduplication enabled ("是" = yes)
        {
            while (i < list_view_shipin.Items.Count)
            {
                t_rizhi.Text += "Checking whether the video ID was already collected" + "\r\n";
                shipin_id = list_view_shipin.Items[i].SubItems[3].Text;
                if (id.Trim() == shipin_id.Trim())
                {
                    li_count = 1;
                    break; // already present, stop searching
                }
                i = i + 1;
            }
        }
        if (li_count == 0)
        {
            list_view_shipin.Invoke(new MethodInvoker(() =>
            {
                System.Windows.Forms.ListViewItem item1 = new System.Windows.Forms.ListViewItem("item1", 0);
                item1.Text = biaoti.Trim();              // video title
                item1.SubItems.Add(zuozhe.Trim());       // video author
                item1.SubItems.Add(shipin_dates_add);    // video date
                item1.SubItems.Add(id.Trim());           // video ID
                string video_url = "https://douyin.com/video/" + id;
                item1.SubItems.Add(video_url.Trim());    // video address
                list_view_shipin.Items.Add(item1);
                t_rizhi.Text += "Stored video data" + "\r\n";
                list_video_count.Text = list_view_shipin.Items.Count.ToString();
            }));
        }
        // listBox1.Items.Add(id); // add to listview
        // add_video_sousuo(id);
    }
}
```
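The relative-date handling in the listing can also be expressed as one small pure function, which makes it easier to test. The sketch below is not the author's implementation: `RelativeDateParser` and `Parse` are hypothetical names, and it assumes the page shows the date as a number directly followed by one of the units 分钟, 小时, 天, 周, 月, or 年, or as the text 刚刚.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RelativeDateParser
{
    // Converts a relative time such as "3天前" into an absolute DateTime,
    // measured back from 'now'. Returns null when the text is not recognised.
    public static DateTime? Parse(string text, DateTime now)
    {
        if (string.IsNullOrEmpty(text)) return null;
        if (text.Contains("刚刚")) return now; // "just now"

        Match m = Regex.Match(text, @"([0-9]{1,4})\s*(分钟|小时|天|周|月|年)");
        if (!m.Success) return null;

        int n = int.Parse(m.Groups[1].Value);
        switch (m.Groups[2].Value)
        {
            case "分钟": return now.AddMinutes(-n);  // minutes
            case "小时": return now.AddHours(-n);    // hours
            case "天":   return now.AddDays(-n);     // days
            case "周":   return now.AddDays(-7 * n); // weeks
            case "月":   return now.AddMonths(-n);   // months
            case "年":   return now.AddYears(-n);    // years
            default:     return null;
        }
    }
}
```

Passing `now` in as a parameter, rather than reading `DateTime.Now` inside the function, keeps the result reproducible; the caller can format the returned value with `ToShortDateString()` as the listing does.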