How do I strip any and all HTML tags from a string?
Keywords: HTML, tags, any, string, strip | Updated: 2023-09-27 18:03:14
I have a string defined like this:
private const String REFER_TO_BUSINESS = "<pre> (Refer to business office for guidance and explain below the circumstances for exception to policy or attach a copy of request)</pre>";
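...which, as you can see, has "pre" tags to preserve the space prepended to the verbiage. There is a case, though, where I want to reference this string without the "pre" tags. Searching for "<pre>" and "</pre>" and removing them is easy enough, but doing that for every HTML tag type would quickly get tedious.
How can I, in C#, strip all tags from a string, regardless of whether they are "<pre>" tags or any other kind of tag?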
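Try a regex replace. This pattern matches HTML tags in a string, including tags that carry attributes. The pattern comes from another source.
// Requires: using System.Text.RegularExpressions;
var pattern = @"</?\w+((\s+\w+(\s*=\s*(?:"".*?""|'.*?'|[^'"">\s]+))?)+\s*|\s*)/?>";
var source = "<pre> (Refer to business office for guidance and explain below the circumstances for exception to policy or attach a copy of request)</pre>";
// Regex.Replace returns a new string, so keep the result.
var stripped = Regex.Replace(source, pattern, string.Empty);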
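To make the answer above easier to try out, here is a minimal console sketch. The class name TagPatternDemo, the method name StripTags, and the second sample string are made up for illustration; only the pattern and the Regex.Replace call come from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagPatternDemo
{
    // Attribute-aware tag pattern from the answer above.
    // In a verbatim string, "" stands for a single double quote.
    private const string TagPattern =
        @"</?\w+((\s+\w+(\s*=\s*(?:"".*?""|'.*?'|[^'"">\s]+))?)+\s*|\s*)/?>";

    // Hypothetical helper: returns the input with every matched tag removed.
    public static string StripTags(string source)
    {
        return Regex.Replace(source, TagPattern, string.Empty);
    }

    public static void Main()
    {
        // The leading space that <pre> was protecting is kept.
        Console.WriteLine("[" + StripTags("<pre> (Refer to business office)</pre>") + "]");

        // Illustrative sample: a tag with quoted attributes is also removed.
        Console.WriteLine("[" + StripTags("<a href=\"x.html\" class='c'>link</a>") + "]");
    }
}
```

Note that this pattern only removes things that look like well-formed tags; text such as a stray "<" followed by a space is left alone.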
This should do what you need:
// stripMeOfHTML is an existing string variable that holds the markup.
stripMeOfHTML = Regex.Replace(stripMeOfHTML, @"<[^>]+>", "").Trim();
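One thing worth checking with the one-liner above is the call to Trim(): it also removes the leading space that the "pre" tags in the question were there to preserve. The sketch below compares both behaviors; the class SimpleStripDemo, the method StripTags, and its trim parameter are hypothetical names introduced only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SimpleStripDemo
{
    // Hypothetical helper: removes anything of the form <...> and
    // optionally trims surrounding whitespace afterwards.
    public static string StripTags(string html, bool trim)
    {
        string stripped = Regex.Replace(html, @"<[^>]+>", "");
        return trim ? stripped.Trim() : stripped;
    }

    public static void Main()
    {
        const string sample = "<pre> (Refer to business office for guidance)</pre>";

        // With Trim(): the leading space is gone.
        Console.WriteLine("[" + StripTags(sample, true) + "]");

        // Without Trim(): the leading space survives.
        Console.WriteLine("[" + StripTags(sample, false) + "]");
    }
}
```

If the leading space matters where the string is rendered, dropping the Trim() call is probably the safer choice.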
This works:
// For strings that have embedded HTML tags for presentation on the form (such as "<pre>" and such), but need to be rendered free of these (such as on the PDF)
private String RemoveHTMLTags(String stringContainingHTMLTags)
{
String regexified = Regex.Replace(stringContainingHTMLTags, "<.*?>", string.Empty);
return regexified;
}
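A caveat on the "<.*?>" pattern used above: by default "." does not match a newline in .NET, so a tag that is broken across lines would not be removed. The sketch below assumes that case matters for your input and passes RegexOptions.Singleline to handle it; the class MultiLineTagDemo, its method names, and the sample string are invented for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MultiLineTagDemo
{
    // Same pattern as the answer above, default options.
    public static string StripDefault(string html)
    {
        return Regex.Replace(html, "<.*?>", string.Empty);
    }

    // Same pattern, but Singleline lets "." match newlines too.
    public static string StripSingleline(string html)
    {
        return Regex.Replace(html, "<.*?>", string.Empty, RegexOptions.Singleline);
    }

    public static void Main()
    {
        // Illustrative input: the opening tag spans two lines.
        string sample = "<a\nhref='x'>text</a>";

        // Default options: only the closing </a> is removed.
        Console.WriteLine(StripDefault(sample) == "<a\nhref='x'>text");

        // Singleline: both tags are removed.
        Console.WriteLine(StripSingleline(sample) == "text");
    }
}
```

For strings like the one in the question, where every tag sits on a single line, both variants behave the same.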