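Extract the video ID from a YouTube URL in .NET
Keywords: extract, video, ID, url, .net, youtube | Updated: 2023-09-27 18:09:45
I'm having a hard time extracting the video ID from a YouTube URL with a regular expression.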
@"(?:.+?)?(?:\/v\/|watch\/|\?v=|\&v=|youtu\.be\/|\/v=|^youtu\.be\/)([a-zA-Z0-9_-]{11})+";
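It works, in the sense that it matches the video ID, but I want to restrict it to the YouTube domains: if the domain is anything other than youtube.com or youtu.be, it should not match the ID. Unfortunately, I don't understand this regex well enough to add that restriction.
I only want to match the ID when the domain is one of: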
- www.youtube.com
- youtube.com
- youtu.be
- www.youtu.be
with http or https in front (or no scheme at all).
The regex above successfully extracts the YouTube ID from the following examples:
"http://youtu.be/AAAAAAAAA01"
"http://www.youtube.com/embed/watch?feature=player_embedded&v=AAAAAAAAA02"
"http://www.youtube.com/embed/watch?v=AAAAAAAAA03"
"http://www.youtube.com/embed/v=AAAAAAAAA04"
"http://www.youtube.com/watch?feature=player_embedded&v=AAAAAAAAA05"
"http://www.youtube.com/watch?v=AAAAAAAAA06"
"http://www.youtube.com/v/AAAAAAAAA07"
"www.youtu.be/AAAAAAAAA08"
"youtu.be/AAAAAAAAA09"
"http://www.youtube.com/watch?v=i-AAAAAAA14&feature=related"
"http://www.youtube.com/attribution_link?u=/watch?v=AAAAAAAAA15&feature=share&a=9QlmP1yvjcllp0h3l0NwuA"
"http://www.youtube.com/attribution_link?a=fF1CWYwxCQ4&u=/watch?v=AAAAAAAAA16&feature=em-uploademail"
"http://www.youtube.com/attribution_link?a=fF1CWYwxCQ4&feature=em-uploademail&u=/watch?v=AAAAAAAAA17"
"http://www.youtube.com/v/A-AAAAAAA18?fs=1&rel=0"
"http://www.youtube.com/watch/AAAAAAAAA11"
The code that currently checks the URL is:
private const string YoutubeLinkRegex = @"(?:.+?)?(?:\/v\/|watch\/|\?v=|\&v=|youtu\.be\/|\/v=|^youtu\.be\/)([a-zA-Z0-9_-]{11})+";
private static Regex regexExtractId = new Regex(YoutubeLinkRegex, RegexOptions.Compiled);

public string ExtractVideoIdFromUrl(string url)
{
    // extract the id
    var regRes = regexExtractId.Match(url);
    if (regRes.Success)
    {
        return regRes.Groups[1].Value;
    }
    return null;
}
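To make the problem concrete, the sketch below runs the question's pattern against a non-YouTube URL taken from the test list further down. It assumes the doubled quotes in the page were originally backslashes and writes the pattern as a verbatim string; the class name QuestionRegexDemo is hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class QuestionRegexDemo
{
    // The question's pattern (backslashes restored). It only looks for an ID marker
    // such as "/v/" or "?v=", so nothing ties the match to a YouTube host.
    private static readonly Regex IdRegex = new Regex(
        @"(?:.+?)?(?:\/v\/|watch\/|\?v=|\&v=|youtu\.be\/|\/v=|^youtu\.be\/)([a-zA-Z0-9_-]{11})+",
        RegexOptions.Compiled);

    // Returns the captured ID, or null when the pattern does not match.
    public static string Extract(string url)
    {
        Match m = IdRegex.Match(url);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

Both http://www.youtube.com/v/AAAAAAAAA07 and http://www.vimeo.com/v/AAAAAAAAA13 yield an ID with this pattern, which is exactly the behavior the question wants to rule out.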
You don't need a regular expression here:
var url = @"https://www.youtube.com/watch?v=6QlW4m9xVZY";
var uri = new Uri(url);
// you can check the host here, e.g. uri.Host == "www.youtube.com"
var query = HttpUtility.ParseQueryString(uri.Query); // HttpUtility lives in System.Web
var videoId = query["v"];
// videoId = 6QlW4m9xVZY
OK, the example above works when the video ID is passed as the v=videoId query parameter. If the video ID is a path segment instead, you can use this:
var url = "http://youtu.be/AAAAAAAAA09";
var uri = new Uri(url);
var videoid = uri.Segments.Last(); // AAAAAAAAA09
Putting it all together, we get:
var url = @"https://www.youtube.com/watch?v=Lvcyj1GfpGY&list=PLolZLFndMkSIYef2O64OLgT-njaPYDXqy";
var uri = new Uri(url);
// you can check the host here, e.g. uri.Host == "www.youtube.com"
var query = HttpUtility.ParseQueryString(uri.Query);
var videoId = string.Empty;
if (query.AllKeys.Contains("v")) // Contains() and Last() come from System.Linq
{
    // watch?v=ID form
    videoId = query["v"];
}
else
{
    // youtu.be/ID form: the ID is the last path segment
    videoId = uri.Segments.Last();
}
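The comments above leave the host check as "you can check host here". A minimal sketch that fills it in, assuming the four hosts listed in the question and absolute input URLs (like the answer's new Uri(url), it does not accept scheme-less strings); the class and method names are hypothetical:

```csharp
using System;
using System.Linq;
using System.Web;

public static class QueryOrSegmentSketch
{
    // Hosts accepted in this sketch, taken from the question's list.
    private static readonly string[] YouTubeHosts =
        { "www.youtube.com", "youtube.com", "youtu.be", "www.youtu.be" };

    public static string ExtractVideoId(string url)
    {
        // Only absolute URLs (with a scheme) are handled here.
        if (!Uri.TryCreate(url, UriKind.Absolute, out Uri uri))
            return null;

        // Reject anything that is not one of the YouTube hosts.
        if (!YouTubeHosts.Contains(uri.Host))
            return null;

        // watch?v=ID form: NameValueCollection returns null for a missing key.
        var query = HttpUtility.ParseQueryString(uri.Query);
        string v = query["v"];
        if (v != null)
            return v;

        // youtu.be/ID form: the ID is the last path segment.
        return uri.Segments.Last().TrimEnd('/');
    }
}
```

The host check runs before any extraction, so a vimeo.com URL returns null even though its path looks like a YouTube one.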
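Of course, I don't know your exact requirements, but I hope this helps.
The problem is that a regular expression cannot check for a required string before the extraction step while also using that same string as the anchor of the extraction itself.
For example, take "http://www.youtu.be/v/AAAAAAAAA07":
youtu.be is required at the start of the URL, but the extraction pattern is "/v/(11 chars)".
In "http://www.youtu.be/AAAAAAAAA07",
on the other hand, the extraction pattern is "youtu.be/(11 chars)".
This cannot be done in one regular expression, which is why we can't check the domain and extract the ID with the same regex.
So I decided to first check the domain against a list of valid authorities, and then extract the ID from the URL.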
private const string YoutubeLinkRegex = @"(?:.+?)?(?:\/v\/|watch\/|\?v=|\&v=|youtu\.be\/|\/v=|^youtu\.be\/)([a-zA-Z0-9_-]{11})+";
private static readonly Regex regexExtractId = new Regex(YoutubeLinkRegex, RegexOptions.Compiled);
private static readonly string[] validAuthorities = { "youtube.com", "www.youtube.com", "youtu.be", "www.youtu.be" };

// Takes a string so that UriBuilder can also parse URLs without a scheme.
public string ExtractVideoIdFromUrl(string url)
{
    if (string.IsNullOrEmpty(url))
        return null;
    try
    {
        // UriBuilder assumes "http" when the string has no scheme
        string authority = new UriBuilder(url).Uri.Authority.ToLowerInvariant();
        // check that the url is a youtube url (Contains() comes from System.Linq)
        if (validAuthorities.Contains(authority))
        {
            // and extract the id
            var regRes = regexExtractId.Match(url);
            if (regRes.Success)
            {
                return regRes.Groups[1].Value;
            }
        }
    }
    catch (UriFormatException)
    {
        // not a parseable URL
    }
    return null;
}
UriBuilder is preferred because it understands a wider range of URLs than the Uri class: it can create a Uri from a URL that has no scheme, such as "youtube.com".
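A small sketch of the difference described above: Uri.TryCreate with UriKind.Absolute rejects a scheme-less string, while the UriBuilder(string) constructor falls back to the http scheme. The class UriBuilderDemo and its two helpers are hypothetical names used only for illustration.

```csharp
using System;

public static class UriBuilderDemo
{
    // Authority (host, plus the port only when it is not the scheme's default)
    // of the Uri that UriBuilder builds from the string.
    public static string AuthorityOf(string url) => new UriBuilder(url).Uri.Authority;

    // True only when the Uri class alone accepts the string as an absolute URI.
    public static bool UriAccepts(string url) => Uri.TryCreate(url, UriKind.Absolute, out _);
}
```

For "youtube.com/v/AAAAAAAAA07", UriAccepts returns false, while AuthorityOf returns "youtube.com", which is what the authority check in the answer relies on.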
For the following test URLs the function returns null (correctly):
"ww.youtube.com/v/AAAAAAAAA13"
"http:/www.youtube.com/v/AAAAAAAAA13"
"http://www.youtub1e.com/v/AAAAAAAAA13"
"http://www.vimeo.com/v/AAAAAAAAA13"
"www.youtube.com/b/AAAAAAAAA13"
"www.youtube.com/v/AAAAAAAAA1"
"www.youtube.com/v/AAAAAAAAA1&"
"www.youtube.com/v/AAAAAAAAA1/"
".youtube.com/v/AAAAAAAAA13"
tym32167's answer throws an exception at var uri = new Uri(url); when the url has no scheme, such as "www.youtu.be/AAAAAAAAA08".
In addition, it returns the wrong videoId for some URLs:
- "http://www.youtube.com/embed/v=AAAAAAAAA04" -> "v=AAAAAAAAA04"
- "http://www.youtube.com/attribution_link?u=/watch?v=AAAAAAAAA15&feature=share&a=9QlmP1yvjcllp0h3l0NwuA" -> "attribution_link"
- "http://www.youtube.com/attribution_link?a=fF1CWYwxCQ4&u=/watch?v=AAAAAAAAA16&feature=em-uploademail" -> "attribution_link"
- "http://www.youtube.com/attribution_link?a=fF1CWYwxCQ4&feature=em-uploademail&u=/watch?v=AAAAAAAAA17" -> "attribution_link"
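The wrong results listed above can be reproduced with a condensed copy of the combined query-or-segment logic. This is a sketch for illustration; SegmentFallbackDemo and QueryOrLastSegment are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Web;

public static class SegmentFallbackDemo
{
    // Same decision as the combined answer: use the "v" query parameter
    // if present, otherwise fall back to the last path segment.
    public static string QueryOrLastSegment(string url)
    {
        var uri = new Uri(url);
        var query = HttpUtility.ParseQueryString(uri.Query);
        return query.AllKeys.Contains("v") ? query["v"] : uri.Segments.Last();
    }
}
```

For /embed/v=AAAAAAAAA04 the "v=" sits in the path rather than the query, so the whole last segment is returned. For attribution_link URLs the real ID is nested inside the "u" parameter, so there is no top-level "v" key and the fallback returns "attribution_link".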
Here is my code, based on tym32167's.
// requires: using System; using System.Linq; using System.Text.RegularExpressions; using System.Web;
private static string GetYouTubeVideoIdFromUrl(string url)
{
    Uri uri = null;
    if (!Uri.TryCreate(url, UriKind.Absolute, out uri))
    {
        try
        {
            uri = new UriBuilder("http", url).Uri;
        }
        catch
        {
            // invalid url
            return "";
        }
    }
    string host = uri.Host;
    string[] youTubeHosts = { "www.youtube.com", "youtube.com", "youtu.be", "www.youtu.be" };
    if (!youTubeHosts.Contains(host))
        return "";
    var query = HttpUtility.ParseQueryString(uri.Query);
    if (query.AllKeys.Contains("v"))
    {
        return Regex.Match(query["v"], @"^[a-zA-Z0-9_-]{11}$").Value;
    }
    else if (query.AllKeys.Contains("u"))
    {
        // some urls have something like "u=/watch?v=AAAAAAAAA16"
        return Regex.Match(query["u"], @"/watch\?v=([a-zA-Z0-9_-]{11})").Groups[1].Value;
    }
    else
    {
        // remove a trailing forward slash
        var last = uri.Segments.Last().Replace("/", "");
        if (Regex.IsMatch(last, @"^v=[a-zA-Z0-9_-]{11}$"))
            return last.Replace("v=", "");
        string[] segments = uri.Segments;
        // accept a bare ID only as the first segment or right after "v/" or "watch/"
        if (segments.Length > 2 && segments[segments.Length - 2] != "v/" && segments[segments.Length - 2] != "watch/")
            return "";
        return Regex.Match(last, @"^[a-zA-Z0-9_-]{11}$").Value;
    }
}
Let's test it.
string[] urls = {"http://youtu.be/AAAAAAAAA01",
"http://www.youtube.com/embed/watch?feature=player_embedded&v=AAAAAAAAA02",
"http://www.youtube.com/embed/watch?v=AAAAAAAAA03",
"http://www.youtube.com/embed/v=AAAAAAAAA04",
"http://www.youtube.com/watch?feature=player_embedded&v=AAAAAAAAA05",
"http://www.youtube.com/watch?v=AAAAAAAAA06",
"http://www.youtube.com/v/AAAAAAAAA07",
"www.youtu.be/AAAAAAAAA08",
"youtu.be/AAAAAAAAA09",
"http://www.youtube.com/watch?v=i-AAAAAAA14&feature=related",
"http://www.youtube.com/attribution_link?u=/watch?v=AAAAAAAAA15&feature=share&a=9QlmP1yvjcllp0h3l0NwuA",
"http://www.youtube.com/attribution_link?a=fF1CWYwxCQ4&u=/watch?v=AAAAAAAAA16&feature=em-uploademail",
"http://www.youtube.com/attribution_link?a=fF1CWYwxCQ4&feature=em-uploademail&u=/watch?v=AAAAAAAAA17",
"http://www.youtube.com/v/A-AAAAAAA18?fs=1&rel=0",
"http://www.youtube.com/watch/AAAAAAAAA11",};
Console.WriteLine("***Youtube urls***");
foreach (string url in urls)
{
    Console.WriteLine("{0}\n-> {1}", url, GetYouTubeVideoIdFromUrl(url));
}
string[] invalidUrls = {
"ww.youtube.com/v/AAAAAAAAA13",
"http:/www.youtube.com/v/AAAAAAAAA13",
"http://www.youtub1e.com/v/AAAAAAAAA13",
"http://www.vimeo.com/v/AAAAAAAAA13",
"www.youtube.com/b/AAAAAAAAA13",
"www.youtube.com/v/AAAAAAAAA1",
"www.youtube.com/v/AAAAAAAAA1&",
"www.youtube.com/v/AAAAAAAAA1/",
".youtube.com/v/AAAAAAAAA13"};
Console.WriteLine("***Invalid youtube urls***");
foreach (string url in invalidUrls)
{
    Console.WriteLine("{0}\n-> {1}", url, GetYouTubeVideoIdFromUrl(url));
}
Results (everything works as expected):
***Youtube urls***
http://youtu.be/AAAAAAAAA01
-> AAAAAAAAA01
http://www.youtube.com/embed/watch?feature=player_embedded&v=AAAAAAAAA02
-> AAAAAAAAA02
http://www.youtube.com/embed/watch?v=AAAAAAAAA03
-> AAAAAAAAA03
http://www.youtube.com/embed/v=AAAAAAAAA04
-> AAAAAAAAA04
http://www.youtube.com/watch?feature=player_embedded&v=AAAAAAAAA05
-> AAAAAAAAA05
http://www.youtube.com/watch?v=AAAAAAAAA06
-> AAAAAAAAA06
http://www.youtube.com/v/AAAAAAAAA07
-> AAAAAAAAA07
www.youtu.be/AAAAAAAAA08
-> AAAAAAAAA08
youtu.be/AAAAAAAAA09
-> AAAAAAAAA09
http://www.youtube.com/watch?v=i-AAAAAAA14&feature=related
-> i-AAAAAAA14
http://www.youtube.com/attribution_link?u=/watch?v=AAAAAAAAA15&feature=share&a=9QlmP1yvjcllp0h3l0NwuA
-> AAAAAAAAA15
http://www.youtube.com/attribution_link?a=fF1CWYwxCQ4&u=/watch?v=AAAAAAAAA16&feature=em-uploademail
-> AAAAAAAAA16
http://www.youtube.com/attribution_link?a=fF1CWYwxCQ4&feature=em-uploademail&u=/watch?v=AAAAAAAAA17
-> AAAAAAAAA17
http://www.youtube.com/v/A-AAAAAAA18?fs=1&rel=0
-> A-AAAAAAA18
http://www.youtube.com/watch/AAAAAAAAA11
-> AAAAAAAAA11
***Invalid youtube urls***
ww.youtube.com/v/AAAAAAAAA13
->
http:/www.youtube.com/v/AAAAAAAAA13
->
http://www.youtub1e.com/v/AAAAAAAAA13
->
http://www.vimeo.com/v/AAAAAAAAA13
->
www.youtube.com/b/AAAAAAAAA13
->
www.youtube.com/v/AAAAAAAAA1
->
www.youtube.com/v/AAAAAAAAA1&
->
www.youtube.com/v/AAAAAAAAA1/
->
.youtube.com/v/AAAAAAAAA13
->
As septih said here:
I played around with the examples and came up with the following:
Youtube:
youtu(?:\.be|be\.com)/(?:.*v(?:/|=)|(?:.*/)?)([a-zA-Z0-9-_]+)
They should match all of the given examples. (?:...) means that nothing inside the parentheses is captured, so only the ID is captured.
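A minimal C# wrapper for septih's pattern, written as a verbatim string; the class and method names are hypothetical. Group 1 is the only capturing group, so it holds the ID.

```csharp
using System.Text.RegularExpressions;

public static class SeptihRegexSketch
{
    // septih's pattern. The (?:...) groups are non-capturing, so group 1
    // is the run of ID characters after the last "v/" or "v=", or after the final "/".
    private static readonly Regex Pattern = new Regex(
        @"youtu(?:\.be|be\.com)/(?:.*v(?:/|=)|(?:.*/)?)([a-zA-Z0-9-_]+)");

    public static string Extract(string url)
    {
        Match m = Pattern.Match(url);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

Note that the pattern is not anchored to the start of the host, so it also matches when "youtube.com/" appears inside a longer host name such as notyoutube.com; that matters for the domain restriction the question asks about.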
This should do it:
public static string GetYouTubeId(string url)
{
    var regex = @"(?:youtube\.com\/(?:[^\/]+\/.+\/|(?:v|e(?:mbed)?|watch)\/|.*[?&]v=)|youtu\.be\/)([^""&?\/ ]{11})";
    var match = Regex.Match(url, regex);
    if (match.Success)
    {
        return match.Groups[1].Value;
    }
    // no ID found: the input is returned unchanged
    return url;
}
Here are my two cents on the previous answer: I added safety checks so that you don't run into a "Length cannot be less than zero" error with certain edge-case inputs.
public static string LinkifyYoutube(this string url)
{
    if (!url.Contains("data-linkified"))
    {
        return "";
    }
    int pos1 = url.IndexOf("<a target='_blank' data-linkified href='", StringComparison.Ordinal);
    int pos2 = url.IndexOf("</a>", StringComparison.Ordinal);
    // guard against "Length cannot be less than zero" in Substring
    if (pos1 <= -1 || pos2 - pos1 <= 0)
    {
        return "";
    }
    url = url.Substring(pos1, pos2 - pos1);
    url = url.Replace("<a target='_blank' data-linkified href='", "");
    url = url.Replace("'>", "");
    url = url.Replace("</a>", "");
    // href and link text are now concatenated; cutting at the last "https"
    // keeps only the href (this assumes the link text itself starts with "https")
    var zh = url.LastIndexOf("https", StringComparison.Ordinal);
    if (zh <= 0)
    {
        return "";
    }
    url = url.Substring(0, zh);
    Uri uri = null;
    if (!Uri.TryCreate(url, UriKind.Absolute, out uri))
    {
        try
        {
            uri = new UriBuilder("http", url).Uri;
        }
        catch
        {
            return "";
        }
    }
    string host = uri.Host;
    string[] youTubeHosts = { "www.youtube.com", "youtube.com", "youtu.be", "www.youtu.be" };
    if (!youTubeHosts.Contains(host))
    {
        return "";
    }
    var query = HttpUtility.ParseQueryString(uri.Query);
    if (query.AllKeys.Contains("v"))
    {
        return Regex.Match(query["v"], @"^[a-zA-Z0-9_-]{11}$").Value;
    }
    else if (query.AllKeys.Contains("u"))
    {
        return Regex.Match(query["u"], @"/watch\?v=([a-zA-Z0-9_-]{11})").Groups[1].Value;
    }
    else
    {
        var last = uri.Segments.Last().Replace("/", "");
        if (Regex.IsMatch(last, @"^v=[a-zA-Z0-9_-]{11}$"))
        {
            return last.Replace("v=", "");
        }
        string[] segments = uri.Segments;
        if (segments.Length > 2 && segments[segments.Length - 2] != "v/" && segments[segments.Length - 2] != "watch/")
        {
            return "";
        }
        return Regex.Match(last, @"^[a-zA-Z0-9_-]{11}$").Value;
    }
}
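The extra checks in LinkifyYoutube all protect calls to Substring, which throws ArgumentOutOfRangeException ("Length cannot be less than zero") when the end index lies before the start index. The sketch below shows an alternative way to get the same guarantee: search for the end marker only after the start marker. SafeBetween is a hypothetical helper, not part of the answer.

```csharp
using System;

public static class SafeSubstringSketch
{
    // Returns the text between the first 'start' marker and the first 'end'
    // marker that follows it, or null when either marker is missing.
    // Because 'end' is searched for only after 'start', the length passed
    // to Substring can never be negative.
    public static string SafeBetween(string text, string start, string end)
    {
        if (text == null) return null;
        int from = text.IndexOf(start, StringComparison.Ordinal);
        if (from < 0) return null;
        from += start.Length;
        int to = text.IndexOf(end, from, StringComparison.Ordinal);
        if (to < 0) return null;
        return text.Substring(from, to - from);
    }
}
```

With input such as "</a> <a href='x", where the closing tag appears before the opening one, the helper returns null instead of throwing.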