What is a clean way to parse out multiple parts of a string in C#?

Updated: 2023-09-27 18:33:52

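I have a string that may contain XML mixed with plain text. I need to parse out every instance of <math>...</math> in the string. How can I extract these multiple parts (each running from <math> to </math>) from the string?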

Here is some content <math
xmlns="http://www.w3.org/1998/Math/MathML">  
<mi>a</mi><mo>&#x2260;</mo><mn>0</mn> </math>, that is mixed in with
this other content <math xmlns="http://www.w3.org/1998/Math/MathML">  
<mi>a</mi><msup><mi>x</mi><mn>2</mn></msup>   <mo>+</mo>
<mi>b</mi><mi>x</mi>   <mo>+</mo> <mi>c</mi> <mo>=</mo> <mn>0</mn>
</math> we want to be able to separate this string

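Background: I tried to keep this question generic. The specific thing I am trying to do concerns encoding versus Raw in MVC3. By default it encodes everything. I don't want it to encode the MathML, but I do want it to encode everything else. So I want to render part of the string with Html.Raw (the MathML parts) and render the rest as a normally encoded string.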

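If you can generally expect the XML to be well formed, or at least consistently formed, you should be able to use a regular expression to strip out the XML.

You could try Expresso to build your expression.

If you then want to parse the XML you stripped out, that is a job for the .NET XML parser.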
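As a rough illustration of the two steps above (pull the blocks out with a regular expression, then hand each one to an XML parser), here is a minimal sketch. The class and method names (MathMlExtractor, ExtractMath) are made up for this example, and it assumes every <math> block is well-formed XML and that blocks are not nested.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical helper, not part of any library.
public static class MathMlExtractor
{
    // Lazy match from an opening <math ...> tag to the nearest </math>.
    // Singleline lets '.' match newlines, so blocks may span several lines.
    private static readonly Regex MathBlock =
        new Regex(@"<math\b[^>]*>.*?</math>", RegexOptions.Singleline);

    // Returns every <math> block in the input, parsed as an XElement.
    public static List<XElement> ExtractMath(string input)
    {
        var result = new List<XElement>();
        foreach (Match m in MathBlock.Matches(input))
        {
            // XElement.Parse throws XmlException if a block is not well formed.
            result.Add(XElement.Parse(m.Value));
        }
        return result;
    }
}
```

Calling MathMlExtractor.ExtractMath(text) on the sample from the question should give one XElement per <math> block. If the markup cannot be trusted to be well formed, wrapping the XElement.Parse call in a try/catch for XmlException is probably the safer choice.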

I'm no regex boffin, but this is what I tried, and it gave me the correct result. Please use it as a base and modify it where necessary.

I got it from this post on Stack Overflow.

using System;
using System.Text.RegularExpressions;

string yourstring = "<math xmlns=\"http://www.w3.org/1998/Math/MathML\">   <mi>a</mi><mo>&#x2260;</mo><mn>0</mn> </math>, that is mixed in with this other content <math xmlns=\"http://www.w3.org/1998/Math/MathML\">   <mi>a</mi><msup><mi>x</mi><mn>2</mn></msup>   <mo>+</mo> <mi>b</mi><mi>x</mi>   <mo>+</mo> <mi>c</mi> <mo>=</mo> <mn>0</mn> </math>";
try
{
     // Remove every <math ...>...</math> block. Singleline lets '.' match
     // newlines, so blocks that span several lines are removed as well.
     yourstring = Regex.Replace(yourstring, @"<math\b[^>]*>.+?</math>", "", RegexOptions.Singleline);
}
catch (ArgumentException)
{
     // Syntax error in the regular expression
}

The resulting string is:

, that is mixed in with this other content
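The replacement above throws the MathML away, whereas the question asks to keep it unencoded and encode everything else. One possible way to do that with the same pattern is sketched below. The names MathMlEncoder and EncodeExceptMath are invented for this example, and WebUtility.HtmlEncode is used here as a stand-in for the encoding that MVC applies by default. It assumes <math> blocks are not nested.

```csharp
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of MVC or any library.
public static class MathMlEncoder
{
    // Same idea as the pattern above: lazy match up to the nearest </math>.
    private static readonly Regex MathBlock =
        new Regex(@"<math\b[^>]*>.*?</math>", RegexOptions.Singleline);

    // Copies <math> blocks through verbatim and HTML-encodes all other text.
    public static string EncodeExceptMath(string input)
    {
        var sb = new StringBuilder();
        int pos = 0;
        foreach (Match m in MathBlock.Matches(input))
        {
            // Plain text between the previous block and this one: encode it.
            sb.Append(WebUtility.HtmlEncode(input.Substring(pos, m.Index - pos)));
            // The MathML block itself: leave it untouched.
            sb.Append(m.Value);
            pos = m.Index + m.Length;
        }
        // Whatever follows the last block is plain text as well.
        sb.Append(WebUtility.HtmlEncode(input.Substring(pos)));
        return sb.ToString();
    }
}
```

In a Razor view the whole result could then be passed to Html.Raw, since the non-MathML parts have already been encoded. Note that the <math> blocks are emitted exactly as they arrive, so this is only reasonable when the MathML comes from a trusted source.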