ASP.NET: parsing HTML to make it safe. Is this OK?
Last updated: 2023-09-27 18:29:54
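I'm sure this question has been asked many times, but I'm struggling to find anything that matches what I want. I want to be able to render HTML safely in my web page, but only allow links, <br/> and <p> tags.
I've come up with the following, but I want to make sure I haven't done anything daft. If there is a better way, please let me know.
Code: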
using System;
using System.Linq;
using System.Text.RegularExpressions;

private string RemoveEvilTags(string value)
{
    string[] allowed = { "<br/>", "<p>", "</p>", "</a>", "<a href" };
    string anchorPattern = @"<a[\s]+[^>]*?href[\s]?=[\s\""\']+(?<href>.*?)[\""\']+.*?>(?<fileName>[^<]+|.*?)?<\/a>";
    string safeText = value;
    MatchCollection matches = Regex.Matches(value, anchorPattern, RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.Compiled);
    if (matches.Count > 0)
    {
        foreach (Match m in matches)
        {
            string url = m.Groups["href"].Value;
            string linkText = m.Groups["fileName"].Value;
            Uri testUri = null;
            // Keep the link only if it is an absolute http(s) URL; otherwise keep just its text.
            if (Uri.TryCreate(url, UriKind.Absolute, out testUri) && testUri.AbsoluteUri.StartsWith("http"))
            {
                safeText = safeText.Replace(m.Groups[0].Value, string.Format("<a href=\"{0}\" >{1}</a>", testUri.AbsoluteUri, linkText));
            }
            else
            {
                safeText = safeText.Replace(m.Groups[0].Value, linkText);
            }
        }
    }
    //Remove everything.
    safeText = Regex.Replace(safeText, @"<[a-zA-Z\/][^>]*>", m => (m != null && allowed.Contains(m.Value)) || m.Value.StartsWith("<a href") ? m.Value : String.Empty);
    //Now add them back in.
    return safeText;
}
Test:
[Test]
public void EvilTagTest()
{
    var safeText = RemoveEvilTags("this is a test <p>ok</p>");
    Assert.AreEqual("this is a test <p>ok</p>", safeText);
    safeText = RemoveEvilTags("this is a test <script>ok</script>");
    Assert.AreEqual("this is a test ok", safeText);
    safeText = RemoveEvilTags("this is a test <script><script>ok</script></script>");
    Assert.AreEqual("this is a test ok", safeText);
    //Check relative link
    safeText = RemoveEvilTags("this is a test <a href=\"bob\" >click here</a>");
    Assert.AreEqual("this is a test click here", safeText);
    //Check full link
    safeText = RemoveEvilTags("this is a test <a href=\"http://test.com/\" >click here</a>");
    Assert.AreEqual("this is a test <a href=\"http://test.com/\" >click here</a>", safeText);
    //Check full https link
    safeText = RemoveEvilTags("this is a test <a href=\"https://test.com/\" >click here</a>");
    Assert.AreEqual("this is a test <a href=\"https://test.com/\" >click here</a>", safeText);
    //javascript link
    safeText = RemoveEvilTags("this is a test <a href=\"javascript:evil()\" >click here</a>");
    Assert.AreEqual("this is a test click here", safeText);
    safeText = RemoveEvilTags("this is a test <a href=\"https://test.com/\" ><script>evil();</script>click here</a>");
    Assert.AreEqual("this is a test <a href=\"https://test.com/\" >click here</a>", safeText);
}
All the tests pass, but what am I missing?
Thanks.
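As a best practice, you shouldn't write your own sanitizing library along the lines of "RemoveEvilTags". A malicious user has many ways to carry out an XSS attack. Microsoft already provides an Anti-XSS library for ASP.NET: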
http://msdn.microsoft.com/en-us/library/aa973813.aspx
Since you're using ASP.NET, Pluralsight has a good video on XSS. It focuses more on MVC, but it still applies in this case.
http://www.pluralsight-training.net/microsoft/players/PSODPlayer?author=scott-allen&name=mvc3-building-security&mode=live&clip=0&course=aspdotnet-mvc3-intro
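Rather than writing code like this, I would suggest using an HTML parser such as the HTML Agility Pack.
Your hand-written parsing code is likely to run into many unhandled corner cases, and a parser will hopefully handle most of them. After parsing, you can either reject invalid input or allow only the valid tags, whichever you need.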
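One such corner case in the question's code: an anchor with no closing </a> never matches the anchor pattern, yet the final filter keeps any tag that starts with "<a href", so extra attributes such as onmouseover survive. As a rough illustration of the parse-then-whitelist idea, here is a minimal sketch. It assumes the HtmlAgilityPack NuGet package is referenced; the HtmlWhitelist class and its Sanitize method are hypothetical names invented for this example, not part of any library. Instead of editing the input, it walks the parsed tree and writes out only text, <p>, <br/> and http(s) links.

```csharp
using System;
using System.Net;
using System.Text;
using HtmlAgilityPack; // third-party NuGet package; assumed to be referenced

// Hypothetical helper for illustration only; not part of HtmlAgilityPack.
public static class HtmlWhitelist
{
    public static string Sanitize(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html ?? string.Empty);

        var output = new StringBuilder();
        WriteNode(doc.DocumentNode, output);
        return output.ToString();
    }

    private static void WriteNode(HtmlNode node, StringBuilder output)
    {
        switch (node.NodeType)
        {
            case HtmlNodeType.Comment:
                return; // drop comments entirely
            case HtmlNodeType.Text:
                // Decode, then re-encode, so stray '<' or '>' cannot form a tag.
                output.Append(WebUtility.HtmlEncode(WebUtility.HtmlDecode(node.InnerText)));
                return;
        }

        switch (node.Name.ToLowerInvariant())
        {
            case "script":
            case "style":
                return; // assumption: drop these together with their content
            case "br":
                output.Append("<br/>");
                return;
            case "p":
                output.Append("<p>");
                WriteChildren(node, output);
                output.Append("</p>");
                return;
            case "a":
                WriteAnchor(node, output);
                return;
            default:
                // Document root and any disallowed tag: keep only the children.
                WriteChildren(node, output);
                return;
        }
    }

    private static void WriteAnchor(HtmlNode node, StringBuilder output)
    {
        string href = WebUtility.HtmlDecode(node.GetAttributeValue("href", string.Empty));
        Uri uri;
        bool isSafe = Uri.TryCreate(href, UriKind.Absolute, out uri)
            && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);

        // Only the validated href is written; every other attribute is discarded.
        if (isSafe)
        {
            output.Append("<a href=\"").Append(WebUtility.HtmlEncode(uri.AbsoluteUri)).Append("\">");
        }
        WriteChildren(node, output);
        if (isSafe)
        {
            output.Append("</a>");
        }
    }

    private static void WriteChildren(HtmlNode node, StringBuilder output)
    {
        foreach (HtmlNode child in node.ChildNodes)
        {
            WriteNode(child, output);
        }
    }
}
```

Because the output is rebuilt from scratch, nothing from the input reaches the page unless the code writes it explicitly. Note one deliberate difference from the question's tests: this sketch discards the text inside <script> rather than keeping it, so those assertions would need adjusting. Treat it as a starting point to test against your own inputs, not as a vetted sanitizer.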