Extracting the video ID from a YouTube URL in .NET

Keywords: extract, video, ID, url, net, youtube | Last updated: 2023-09-27 18:09:45

I'm struggling to use a regular expression to extract the video ID from a YouTube URL.

"(?:.+?)?(?:\\/v\\/|watch\\/|\\?v=|\\&v=|youtu\\.be\\/|\\/v=|^youtu\\.be\\/)([a-zA-Z0-9_-]{11})+";

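It works in the sense that it matches the video ID, but I want to restrict it to the YouTube domains: if the domain is anything other than youtube.com or youtu.be, I don't want it to match the ID. Unfortunately, I don't understand this regex well enough to apply that restriction.

I only want to match the ID when the domain is one of: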

  • www.youtube.com
  • youtube.com
  • youtu.be
  • www.youtu.be

with http or https in front (or with no scheme at all).

The regex above successfully matches the YouTube ID in the following examples:

"http://youtu.be/AAAAAAAAA01"
"http://www.youtube.com/embed/watch?feature=player_embedded&v=AAAAAAAAA02"
"http://www.youtube.com/embed/watch?v=AAAAAAAAA03"
"http://www.youtube.com/embed/v=AAAAAAAAA04"
"http://www.youtube.com/watch?feature=player_embedded&v=AAAAAAAAA05"
"http://www.youtube.com/watch?v=AAAAAAAAA06"
"http://www.youtube.com/v/AAAAAAAAA07"
"www.youtu.be/AAAAAAAAA08"
"youtu.be/AAAAAAAAA09"
"http://www.youtube.com/watch?v=i-AAAAAAA14&feature=related"
"http://www.youtube.com/attribution_link?u=/watch?v=AAAAAAAAA15&feature=share&a=9QlmP1yvjcllp0h3l0NwuA"
"http://www.youtube.com/attribution_link?a=fF1CWYwxCQ4&u=/watch?v=AAAAAAAAA16&feature=em-uploademail"
"http://www.youtube.com/attribution_link?a=fF1CWYwxCQ4&feature=em-uploademail&u=/watch?v=AAAAAAAAA17"
"http://www.youtube.com/v/A-AAAAAAA18?fs=1&rel=0"
"http://www.youtube.com/watch/AAAAAAAAA11"

The current code that checks the URL is:

    private const string YoutubeLinkRegex = "(?:.+?)?(?:\\/v\\/|watch\\/|\\?v=|\\&v=|youtu\\.be\\/|\\/v=|^youtu\\.be\\/)([a-zA-Z0-9_-]{11})+";
    private static Regex regexExtractId = new Regex(YoutubeLinkRegex, RegexOptions.Compiled);

    public string ExtractVideoIdFromUrl(string url)
    {
        //extract the id
        var regRes = regexExtractId.Match(url);
        if (regRes.Success)
        {
            return regRes.Groups[1].Value;
        }
        return null;
    }

You don't need a regular expression here.

var url = @"https://www.youtube.com/watch?v=6QlW4m9xVZY";
var uri = new Uri(url);
// you can check host here => uri.Host <= "www.youtube.com"
var query = HttpUtility.ParseQueryString(uri.Query);
var videoId = query["v"];
// videoId = 6QlW4m9xVZY

OK, the example above works when the video ID is passed as the v=videoId query parameter. If the video ID is a path segment instead, you can use this:

var url = "http://youtu.be/AAAAAAAAA09";
var uri = new Uri(url);
var videoid = uri.Segments.Last(); // AAAAAAAAA09

Putting the two together, we get:

var url = @"https://www.youtube.com/watch?v=Lvcyj1GfpGY&list=PLolZLFndMkSIYef2O64OLgT-njaPYDXqy";
var uri = new Uri(url);
// you can check host here => uri.Host <= "www.youtube.com"
var query = HttpUtility.ParseQueryString(uri.Query);
var videoId = string.Empty;
if (query.AllKeys.Contains("v"))
{
    videoId = query["v"];
}
else
{
    videoId = uri.Segments.Last();
}

Of course, I don't know anything about your requirements, but I hope this helps.

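The problem is that the regex cannot require a string to be present before the extraction step and, at the same time, use that same string as part of the extraction step itself.

For example, take "http://www.youtu.be/v/AAAAAAAAA07": youtu.be is mandatory at the start of the URL, but the extraction pattern is "/v/(11 chars)".

For "http://www.youtu.be/AAAAAAAAA07", the extraction pattern is "youtu.be/(11 chars)".

These two cannot be combined in this regex, which is why the domain check and the ID extraction are not done in the same regular expression.

I decided to check the domain authority against a list of valid domains first, and then extract the ID from the URL.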

 private const string YoutubeLinkRegex = "(?:.+?)?(?:\\/v\\/|watch\\/|\\?v=|\\&v=|youtu\\.be\\/|\\/v=|^youtu\\.be\\/)([a-zA-Z0-9_-]{11})+";
 private static Regex regexExtractId = new Regex(YoutubeLinkRegex, RegexOptions.Compiled);
 private static string[] validAuthorities = { "youtube.com", "www.youtube.com", "youtu.be", "www.youtu.be" };
 public string ExtractVideoIdFromUri(Uri uri)
 {
     try
     {
        string authority = new UriBuilder(uri).Uri.Authority.ToLower();
        //check if the url is a youtube url
        if (validAuthorities.Contains(authority))
        {
            //and extract the id
            var regRes = regexExtractId.Match(uri.ToString());
            if (regRes.Success)
            {
                return regRes.Groups[1].Value;
            }
        }
     }
     catch
     {
         // Any failure while parsing is treated as "not a YouTube URL"; fall through and return null.
     }

     return null;
 }

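UriBuilder is preferred because it understands a wider range of URLs than the Uri class. It can create a Uri from a URL that does not contain a scheme (such as "youtube.com").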
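To illustrate the point about scheme-less input, here is a minimal sketch. It relies on the string constructor of UriBuilder, which defaults to the http scheme when the input has none. The class and method names (SchemeLessUrlDemo, GetHostOrNull) are hypothetical and are not part of the answer above.

```csharp
using System;

public static class SchemeLessUrlDemo
{
    // Hypothetical helper: returns the lower-cased host of a URL that may or
    // may not carry a scheme, or null when the input cannot be parsed.
    public static string GetHostOrNull(string url)
    {
        if (string.IsNullOrWhiteSpace(url))
        {
            return null;
        }

        try
        {
            // UriBuilder(string) assumes "http://" when the string has no scheme,
            // so "youtu.be/AAAAAAAAA09" is accepted here.
            return new UriBuilder(url).Uri.Host.ToLowerInvariant();
        }
        catch (UriFormatException)
        {
            // The input could not be interpreted as a URL at all.
            return null;
        }
    }
}
```

GetHostOrNull("youtu.be/AAAAAAAAA09") yields youtu.be, whereas new Uri("youtu.be/AAAAAAAAA09") throws a UriFormatException because the string is not an absolute URI.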

With the following test URLs the function returns null (correctly):

"ww.youtube.com/v/AAAAAAAAA13"
"http:/www.youtube.com/v/AAAAAAAAA13"
"http://www.youtub1e.com/v/AAAAAAAAA13"
"http://www.vimeo.com/v/AAAAAAAAA13"
"www.youtube.com/b/AAAAAAAAA13"
"www.youtube.com/v/AAAAAAAAA1"
"www.youtube.com/v/AAAAAAAAA1&"
"www.youtube.com/v/AAAAAAAAA1/"
".youtube.com/v/AAAAAAAAA13"

tym32167's answer throws an exception at var uri = new Uri(url); when the URL has no scheme, as in "www.youtu.be/AAAAAAAAA08".

In addition, it returns the wrong videoId for some URLs:

  • "http://www.youtube.com/embed/v=AAAAAAAAA04" -> "v=AAAAAAAAA04"
  • "http://www.youtube.com/attribution_link?u=/watch?v=AAAAAAAAA15&feature=share&a=9QlmP1yvjcllp0h3l0NwuA" -> "attribution_link"
  • "http://www.youtube.com/attribution_link?a=fF1CWYwxCQ4&u=/watch?v=AAAAAAAAA16&feature=em-uploademail" -> "attribution_link"
  • "http://www.youtube.com/attribution_link?a=fF1CWYwxCQ4&feature=em-uploademail&u=/watch?v=AAAAAAAAA17" -> "attribution_link"

Here is my code, based on tym32167's.

    static private string GetYouTubeVideoIdFromUrl(string url)
    {
        Uri uri = null;
        if (!Uri.TryCreate(url, UriKind.Absolute, out uri))
        {
            try
            {
                uri = new UriBuilder("http", url).Uri;
            }
            catch
            {
                // invalid url
                return "";
            }
        }
        string host = uri.Host;
        string[] youTubeHosts = { "www.youtube.com", "youtube.com", "youtu.be", "www.youtu.be" };
        if (!youTubeHosts.Contains(host))
            return "";
        var query = HttpUtility.ParseQueryString(uri.Query);
        if (query.AllKeys.Contains("v"))
        {
            return Regex.Match(query["v"], @"^[a-zA-Z0-9_-]{11}$").Value;
        }
        else if (query.AllKeys.Contains("u"))
        {
            // some urls have something like "u=/watch?v=AAAAAAAAA16"
            return Regex.Match(query["u"], @"/watch\?v=([a-zA-Z0-9_-]{11})").Groups[1].Value;
        }
        else
        {
            // remove a trailing forward slash
            var last = uri.Segments.Last().Replace("/", "");
            if (Regex.IsMatch(last, @"^v=[a-zA-Z0-9_-]{11}$"))
                return last.Replace("v=", "");
            string[] segments = uri.Segments;
            if (segments.Length > 2 && segments[segments.Length - 2] != "v/" && segments[segments.Length - 2] != "watch/")
                return "";
            return Regex.Match(last, @"^[a-zA-Z0-9_-]{11}$").Value;
        }
    }

Let's test it.

        string[] urls = {"http://youtu.be/AAAAAAAAA01",
            "http://www.youtube.com/embed/watch?feature=player_embedded&v=AAAAAAAAA02",
            "http://www.youtube.com/embed/watch?v=AAAAAAAAA03",
            "http://www.youtube.com/embed/v=AAAAAAAAA04",
            "http://www.youtube.com/watch?feature=player_embedded&v=AAAAAAAAA05",
            "http://www.youtube.com/watch?v=AAAAAAAAA06",
            "http://www.youtube.com/v/AAAAAAAAA07",
            "www.youtu.be/AAAAAAAAA08",
            "youtu.be/AAAAAAAAA09",
            "http://www.youtube.com/watch?v=i-AAAAAAA14&feature=related",
            "http://www.youtube.com/attribution_link?u=/watch?v=AAAAAAAAA15&feature=share&a=9QlmP1yvjcllp0h3l0NwuA",
            "http://www.youtube.com/attribution_link?a=fF1CWYwxCQ4&u=/watch?v=AAAAAAAAA16&feature=em-uploademail",
            "http://www.youtube.com/attribution_link?a=fF1CWYwxCQ4&feature=em-uploademail&u=/watch?v=AAAAAAAAA17",
            "http://www.youtube.com/v/A-AAAAAAA18?fs=1&rel=0",
            "http://www.youtube.com/watch/AAAAAAAAA11",};
        Console.WriteLine("***Youtube urls***");
        foreach (string url in urls)
        {
            Console.WriteLine("{0}\n-> {1}", url, GetYouTubeVideoIdFromUrl(url));
        }
        string[] invalidUrls = {
            "ww.youtube.com/v/AAAAAAAAA13",
            "http:/www.youtube.com/v/AAAAAAAAA13",
            "http://www.youtub1e.com/v/AAAAAAAAA13",
            "http://www.vimeo.com/v/AAAAAAAAA13",
            "www.youtube.com/b/AAAAAAAAA13",
            "www.youtube.com/v/AAAAAAAAA1",
            "www.youtube.com/v/AAAAAAAAA1&",
            "www.youtube.com/v/AAAAAAAAA1/",
            ".youtube.com/v/AAAAAAAAA13"};
        Console.WriteLine("***Invalid youtube urls***");
        foreach (string url in invalidUrls)
        {
            Console.WriteLine("{0}\n-> {1}", url, GetYouTubeVideoIdFromUrl(url));
        }

Results (everything works):

***Youtube urls***
http://youtu.be/AAAAAAAAA01
-> AAAAAAAAA01
http://www.youtube.com/embed/watch?feature=player_embedded&v=AAAAAAAAA02
-> AAAAAAAAA02
http://www.youtube.com/embed/watch?v=AAAAAAAAA03
-> AAAAAAAAA03
http://www.youtube.com/embed/v=AAAAAAAAA04
-> AAAAAAAAA04
http://www.youtube.com/watch?feature=player_embedded&v=AAAAAAAAA05
-> AAAAAAAAA05
http://www.youtube.com/watch?v=AAAAAAAAA06
-> AAAAAAAAA06
http://www.youtube.com/v/AAAAAAAAA07
-> AAAAAAAAA07
www.youtu.be/AAAAAAAAA08
-> AAAAAAAAA08
youtu.be/AAAAAAAAA09
-> AAAAAAAAA09
http://www.youtube.com/watch?v=i-AAAAAAA14&feature=related
-> i-AAAAAAA14
http://www.youtube.com/attribution_link?u=/watch?v=AAAAAAAAA15&feature=share&a=9QlmP1yvjcllp0h3l0NwuA
-> AAAAAAAAA15
http://www.youtube.com/attribution_link?a=fF1CWYwxCQ4&u=/watch?v=AAAAAAAAA16&feature=em-uploademail
-> AAAAAAAAA16
http://www.youtube.com/attribution_link?a=fF1CWYwxCQ4&feature=em-uploademail&u=/watch?v=AAAAAAAAA17
-> AAAAAAAAA17
http://www.youtube.com/v/A-AAAAAAA18?fs=1&rel=0
-> A-AAAAAAA18
http://www.youtube.com/watch/AAAAAAAAA11
-> AAAAAAAAA11

***Invalid youtube urls***
ww.youtube.com/v/AAAAAAAAA13
-> 
http:/www.youtube.com/v/AAAAAAAAA13
-> 
http://www.youtub1e.com/v/AAAAAAAAA13
-> 
http://www.vimeo.com/v/AAAAAAAAA13
-> 
www.youtube.com/b/AAAAAAAAA13
-> 
www.youtube.com/v/AAAAAAAAA1
-> 
www.youtube.com/v/AAAAAAAAA1&
-> 
www.youtube.com/v/AAAAAAAAA1/
-> 
.youtube.com/v/AAAAAAAAA13
-> 

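As septih said here:

I had a play around with the examples and came up with this:

YouTube: youtu(?:\.be|be\.com)/(?:.*v(?:/|=)|(?:.*/)?)([a-zA-Z0-9-_]+)

It should match all of the examples given. The (?:…) syntax means that everything inside those parentheses is not captured, so only the ID should be obtained.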
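As a rough sketch of how the non-capturing groups play out in .NET, the pattern quoted above can be wrapped as follows. The class and method names (SeptihPatternDemo, ExtractId) are made up for illustration. Since every other group is written as (?:…), the ID is the only numbered capture and sits in Groups[1].

```csharp
using System.Text.RegularExpressions;

public static class SeptihPatternDemo
{
    // The pattern quoted above, unchanged. It contains a single capturing group.
    private static readonly Regex IdPattern = new Regex(
        @"youtu(?:\.be|be\.com)/(?:.*v(?:/|=)|(?:.*/)?)([a-zA-Z0-9-_]+)");

    // Hypothetical helper: returns the captured ID, or null when nothing matches.
    public static string ExtractId(string url)
    {
        Match match = IdPattern.Match(url);
        // Groups[0] is the whole match; Groups[1] is the only captured group.
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

Note that the pattern is not anchored to the start of the input and does not limit the ID to 11 characters, so on its own it does not verify that the host really is YouTube; a separate host check is still needed if that matters.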

This should do it:

public static string GetYouTubeId(string url) {
    var regex = @"(?:youtube\.com\/(?:[^\/]+\/.+\/|(?:v|e(?:mbed)?|watch)\/|.*[?&]v=)|youtu\.be\/)([^""&?\/ ]{11})";
    var match = Regex.Match(url, regex);
    if (match.Success)
    {
        return match.Groups[1].Value;
    }
    return url;
}

Here is my $0.02 on the previous answer, with added safety checks so that you don't run into the "Length cannot be less than zero" error on some edge-case inputs.

    public static string LinkifyYoutube(this string url)
    {
        if (!url.Contains("data-linkified"))
        {
            return "";
        }
        int pos1 = url.IndexOf("<a target='_blank' data-linkified href='", StringComparison.Ordinal);
        int pos2 = url.IndexOf("</a>", StringComparison.Ordinal);
        if (pos1 <= -1 || pos2 - pos1 <= 0)
        {
            return "";
        }
        url = url.Substring(pos1, pos2 - pos1);
        url = url.Replace("<a target='_blank' data-linkified href='", "");
        url = url.Replace("'>", "");
        url = url.Replace("</a>", "");
        var zh = url.LastIndexOf("https", StringComparison.Ordinal);
        if (zh <= 0)
        {
            return "";
        }
        url = url.Substring(0, zh);
        Uri uri = null;
        if (!Uri.TryCreate(url, UriKind.Absolute, out uri))
        {
            try
            {
                uri = new UriBuilder("http", url).Uri;
            }
            catch
            {
                return "";
            }
        }
        string host = uri.Host;
        string[] youTubeHosts = { "www.youtube.com", "youtube.com", "youtu.be", "www.youtu.be" };
        if (!youTubeHosts.Contains(host))
        {
            return "";
        }
        var query = HttpUtility.ParseQueryString(uri.Query);
        if (query.AllKeys.Contains("v"))
        {
            return Regex.Match(query["v"], @"^[a-zA-Z0-9_-]{11}$").Value;
        }
        else if (query.AllKeys.Contains("u"))
        {
            return Regex.Match(query["u"], @"/watch'?v=([a-zA-Z0-9_-]{11})").Groups[1].Value;
        }
        else
        {
            var last = uri.Segments.Last().Replace("/", "");
            if (Regex.IsMatch(last, @"^v=[a-zA-Z0-9_-]{11}$"))
            {
                return last.Replace("v=", "");
            }
            string[] segments = uri.Segments;
            if (segments.Length > 2 && segments[segments.Length - 2] != "v/" && segments[segments.Length - 2] != "watch/")
            {
                return "";
            }
            return Regex.Match(last, @"^[a-zA-Z0-9_-]{11}$").Value;
        }
    }