How can I strip any and all HTML tags from a string?

Keywords: HTML, tags, any, string, strip | Updated: 2023-09-27 18:03:14

I have a string defined like this:

private const String REFER_TO_BUSINESS = "<pre> (Refer to business office for guidance and explain below the circumstances for exception to policy or attach a copy of request)</pre>";

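As you can see, it has "pre" tags to preserve the space prepended to the verbiage. However, I want to reference this string without the "pre" tags. Searching for "<pre>" and "</pre>" and removing them is easy enough, but doing that for every type of HTML tag would quickly get tedious.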

How can I, in C#, strip all tags from a string, whatever kind of tag they are?


Try a regex replace. This pattern matches HTML tags within a string (taken from here):

        // Requires: using System.Text.RegularExpressions;
        var pattern = @"</?\w+((\s+\w+(\s*=\s*(?:"".*?""|'.*?'|[^'"">\s]+))?)+\s*|\s*)/?>";
        var source = "<pre> (Refer to business office for guidance and explain below the circumstances for exception to policy or attach a copy of request)</pre>";
        // Regex.Replace returns a new string, so keep the result.
        var stripped = Regex.Replace(source, pattern, string.Empty);

This should do what you need:

 string stripped = Regex.Replace(stripMeOfHTML, @"<[^>]+>", "").Trim();

This works:

// For strings that have embedded HTML tags for presentation on the form (such as "<pre>" and such), but need to be rendered free of these (such as on the PDF)
private String RemoveHTMLTags(String stringContainingHTMLTags)
{
    String regexified = Regex.Replace(stringContainingHTMLTags, "<.*?>", string.Empty);
    return regexified;
}
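
To tie the answers together, here is a minimal, self-contained sketch that runs the simple `<[^>]+>` pattern against the string from the question. The class name `HtmlTagStripper` and the method name `StripTags` are made up for illustration and do not come from any of the answers. Treat it as a rough approach rather than a real HTML parser: a regex like this removes anything that looks like a tag, so ordinary text such as "a < b > c" would lose the "< b >" part as well.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class; the names are illustrative only.
public static class HtmlTagStripper
{
    // '<', then one or more characters that are not '>', then '>'.
    // [^>] also matches newlines, so tags split across lines are removed too.
    private static readonly Regex TagPattern = new Regex(@"<[^>]+>", RegexOptions.Compiled);

    public static string StripTags(string input)
    {
        // Nothing to strip from null or empty input.
        if (string.IsNullOrEmpty(input))
        {
            return input;
        }
        return TagPattern.Replace(input, string.Empty);
    }

    public static void Main()
    {
        const string referToBusiness = "<pre> (Refer to business office for guidance and explain below the circumstances for exception to policy or attach a copy of request)</pre>";

        // Brackets make the preserved leading space visible in the output.
        Console.WriteLine("[" + StripTags(referToBusiness) + "]");
    }
}
```

Unlike the second answer, this sketch does not call `Trim()`, so the leading space that the "pre" tags were protecting survives. One difference worth knowing if you pick the `<.*?>` pattern instead: `.` does not match a newline by default, so a tag that spans lines is only caught when `RegexOptions.Singleline` is passed.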