ASP.net: parsing HTML to make sure it is safe. Is this OK?

Updated: 2023-09-27 18:29:54

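I'm sure this has been asked many times, but I'm having trouble finding anything that matches what I want. I want to be able to safely render HTML in my web page, but only allow links, `<p>` and `<br/>` tags.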

I've come up with the following, but I want to make sure I'm not missing anything. If there is a better way, please let me know.

Code:

    using System;
    using System.Linq;
    using System.Text.RegularExpressions;

    private string RemoveEvilTags(string value)
    {
        // Tags (or tag prefixes) that survive the final clean-up pass.
        string[] allowed = { "<br/>", "<p>", "</p>", "</a>", "<a href" };
        // Matches <a ... href="...">text</a>. The link text may not contain '<'; if it does
        // (e.g. an inner <script>), backtracking extends ".*?>" past the inner tags and only
        // the text after the last inner tag is captured as fileName.
        string anchorPattern = @"<a[\s]+[^>]*?href[\s]?=[\s\""\']+(?<href>.*?)[\""\']+.*?>(?<fileName>[^<]+)?<\/a>";
        string safeText = value;
        MatchCollection matches = Regex.Matches(value, anchorPattern, RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.Compiled);
        foreach (Match m in matches)
        {
            string url = m.Groups["href"].Value;
            string linkText = m.Groups["fileName"].Value;
            // Keep the link only if it is an absolute http(s) URL; otherwise keep just its text.
            if (Uri.TryCreate(url, UriKind.Absolute, out Uri testUri) && testUri.AbsoluteUri.StartsWith("http"))
            {
                safeText = safeText.Replace(m.Groups[0].Value, string.Format("<a href=\"{0}\" >{1}</a>", testUri.AbsoluteUri, linkText));
            }
            else
            {
                safeText = safeText.Replace(m.Groups[0].Value, linkText);
            }
        }
        // Remove every remaining tag that is not allow-listed; any tag starting with "<a href" is kept as-is.
        safeText = Regex.Replace(safeText, @"<[a-zA-Z\/][^>]*>", m => allowed.Contains(m.Value) || m.Value.StartsWith("<a href") ? m.Value : string.Empty);
        return safeText;
    }

Tests:

    [Test]
    public void EvilTagTest()
    {
        var safeText = RemoveEvilTags("this is a test <p>ok</p>");
        Assert.AreEqual("this is a test <p>ok</p>", safeText);
        safeText = RemoveEvilTags("this is a test <script>ok</script>");
        Assert.AreEqual("this is a test ok", safeText);
        safeText = RemoveEvilTags("this is a test <script><script>ok</script></script>");
        Assert.AreEqual("this is a test ok", safeText);
        // Check relative link
        safeText = RemoveEvilTags("this is a test <a href=\"bob\" >click here</a>");
        Assert.AreEqual("this is a test click here", safeText);
        // Check full HTTP link
        safeText = RemoveEvilTags("this is a test <a href=\"http://test.com/\" >click here</a>");
        Assert.AreEqual("this is a test <a href=\"http://test.com/\" >click here</a>", safeText);
        // Check full HTTPS link
        safeText = RemoveEvilTags("this is a test <a href=\"https://test.com/\" >click here</a>");
        Assert.AreEqual("this is a test <a href=\"https://test.com/\" >click here</a>", safeText);
        // javascript: link
        safeText = RemoveEvilTags("this is a test <a href=\"javascript:evil()\" >click here</a>");
        Assert.AreEqual("this is a test click here", safeText);
        // Script inside the link text
        safeText = RemoveEvilTags("this is a test <a href=\"https://test.com/\" ><script>evil();</script>click here</a>");
        Assert.AreEqual("this is a test <a href=\"https://test.com/\" >click here</a>", safeText);
    }

All the tests pass, but what am I missing?

Thanks.


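As a best practice, you should not write your own "RemoveEvilTags" routine. Malicious users have many ways to carry out an XSS attack, and ASP.NET already provides an Anti-XSS library: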

http://msdn.microsoft.com/en-us/library/aa973813.aspx
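To make the "many ways" point concrete, here is a hedged sketch. `NaiveFilterDemo` and `FinalPass` are hypothetical names; the method is a reduced copy of only the last step of the question's `RemoveEvilTags` (the allow-list `Regex.Replace`). Going by the question's anchor regex, which only rewrites `<a>` tags that have a closing `</a>`, an unclosed anchor carrying an event-handler attribute is never rebuilt, so it reaches this step untouched and is kept because it starts with `<a href`.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical demo class: a reduced copy of the final clean-up pass in the question's RemoveEvilTags.
public static class NaiveFilterDemo
{
    // Same allow-list as the question.
    private static readonly string[] Allowed = { "<br/>", "<p>", "</p>", "</a>", "<a href" };

    // Keeps allow-listed tags and any tag that starts with "<a href"; removes every other tag.
    public static string FinalPass(string input) =>
        Regex.Replace(input, @"<[a-zA-Z\/][^>]*>",
            m => Allowed.Contains(m.Value) || m.Value.StartsWith("<a href", StringComparison.Ordinal)
                ? m.Value
                : string.Empty);
}
```

For example, `FinalPass("this is a test <script>ok</script>")` strips the script tags as intended, but `"this is a test <a href=\"#\" onmouseover=\"alert(1)\">hover me"` comes back unchanged, event handler included. None of the question's tests exercises an extra attribute on an unclosed tag, which is exactly the kind of corner case a maintained library is built to cover.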

Since you are using ASP.NET, Pluralsight has a good video on XSS. It focuses more on MVC, but it still applies in this case.

http://www.pluralsight-training.net/microsoft/players/PSODPlayer?author=scott-allen&name=mvc3-building-security&mode=live&clip=0&course=aspdotnet-mvc3-intro

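Rather than writing code like this, I suggest using an HTML parser such as HTML Agility Pack.

Hand-written parsing code like yours is likely to run into many unhandled corner cases; a proper parser should handle most of them for you. After parsing, you can either reject invalid input or keep only the tags you allow, depending on what you need.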
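A minimal sketch of the parse-then-allow-list idea, under stated assumptions: HTML Agility Pack is a separate NuGet package, so this uses the built-in `System.Xml.Linq` parser as a stand-in. That means it only accepts well-formed, XHTML-style input (closed tags, no HTML-only entities such as `&nbsp;`), which real user HTML often is not. `WhitelistSanitizerSketch` and `Sanitize` are hypothetical names. It takes the "reject invalid input" route: anything that fails to parse, any element other than `p`, `br` or `a`, and any attribute other than an absolute `http`/`https` `href` on `a` makes it return `null`.

```csharp
#nullable enable
using System;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper: System.Xml.Linq stands in for an HTML parser such as
// HTML Agility Pack, so only well-formed (XHTML-style) input is accepted.
public static class WhitelistSanitizerSketch
{
    // The only elements allowed, matching the question's requirements.
    private static readonly string[] AllowedElements = { "p", "br", "a" };

    // Returns the re-serialized markup, or null if the input is rejected.
    public static string? Sanitize(string fragment)
    {
        XElement root;
        try
        {
            // Wrap the fragment in a synthetic root so mixed text and tags parse as one element.
            root = XElement.Parse("<root>" + fragment + "</root>", LoadOptions.PreserveWhitespace);
        }
        catch (XmlException)
        {
            return null; // Not well-formed (unclosed tags, HTML-only entities, ...): reject.
        }

        foreach (XNode node in root.DescendantNodes())
        {
            if (node is XCData) return null;              // CDATA sections: reject.
            if (node is XText) continue;                  // Plain text is escaped again on output.
            if (!(node is XElement element)) return null; // Comments, processing instructions: reject.

            if (element.Name.Namespace != XNamespace.None ||
                !AllowedElements.Contains(element.Name.LocalName))
                return null;

            foreach (XAttribute attribute in element.Attributes())
            {
                bool isSafeHref =
                    element.Name.LocalName == "a" &&
                    attribute.Name == "href" &&
                    Uri.TryCreate(attribute.Value, UriKind.Absolute, out Uri? uri) &&
                    (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);
                if (!isSafeHref) return null; // Any other attribute (onclick, style, ...) is rejected.
            }
        }

        // Serialize the children of the synthetic root; the XML writer escapes text and attribute values.
        return string.Concat(root.Nodes().Select(n => n.ToString(SaveOptions.DisableFormatting)));
    }
}
```

For example, `Sanitize("this is a test <p>ok</p>")` gives back the same string, while the question's `<script>` and `javascript:` inputs return `null`. Because the output is re-serialized from the parsed tree instead of being patched with string replacements, nothing the parser did not understand can leak through. With HTML Agility Pack the loop would look similar, but check its documentation for the exact node and attribute APIs.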