Getting the text between <> tags that contain a dynamic number

Keywords: dynamic, number, lt, gt, get | Updated: 2023-09-27 18:28:00

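I am working on a text summarization method. To test it, I have a benchmark called doc 2007, which contains many XML files that I need to clean.

For example, one of the XML files looks like this: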

<sentence id='s0'>
 The nature of the proceeding 
1 The principal issue in this proceeding is whether the Victorian Arts Centre falls within the category of 'premises of State Government Departments and Instrumentalities', for the purposes of provisions in industrial awards relating to rates of payment for persons employed in cleaning those premises.</sentence>
<sentence id='s1'>In turn, this depends upon whether the Victorian Arts Centre Trust, a statutory corporation established by the Victorian Arts Centre Act 1979 (Vic) ('the VAC Act'), is properly described as a State Government department or instrumentality, for the purposes of the award provisions.</sentence>

I need to extract the strings between <sentence id='s0'> and </sentence> and between <sentence id='s1'> and </sentence>. In other words, the result should look like this:

The nature of the proceeding 
     1 The principal issue in this proceeding is whether the Victorian Arts Centre falls within the category of 'premises of State Government Departments and Instrumentalities', for the purposes of provisions in industrial awards relating to rates of payment for persons employed in cleaning those premises.
In turn, this depends upon whether the Victorian Arts Centre Trust, a statutory corporation established by the Victorian Arts Centre Act 1979 (Vic) ('the VAC Act'), is properly described as a State Government department or instrumentality, for the purposes of the award provisions.

I found something like this:

Regex.Match("User name (sales)", @"\(([^)]*)\)").Groups[1].Value

It uses Regex, but it does not work for my case. Can you give me a quick solution?


This should be easier with LINQ to XML:

var res = XElement.Parse(xml)
                  .Descendants("sentence").Where(e => e.Attribute("id").Value == "s0")
                  .FirstOrDefault().Value;

Or, as Yeldar suggested, a cleaner way is:

var s0 = XElement.Parse(xml)
                 .Descendants("sentence").FirstOrDefault(e => e.Attribute("id").Value == "s0")
                 .Value;
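
Both snippets above throw a NullReferenceException if no sentence has the requested id, or if some sentence element has no id attribute at all. A minimal null-tolerant sketch follows; the class and method names (SentenceLookup, FindSentence) are made up for illustration, and the input is assumed to already have a single root element:

```csharp
using System.Linq;
using System.Xml.Linq;

static class SentenceLookup
{
    // Returns the text of the <sentence> whose id matches, or null if none does.
    // SentenceLookup and FindSentence are hypothetical names, not part of any library.
    public static string FindSentence(string xml, string id)
    {
        return XElement.Parse(xml)
            .Descendants("sentence")
            // Casting an XAttribute to string yields null when the attribute
            // is missing, instead of throwing like .Value would.
            .FirstOrDefault(e => (string)e.Attribute("id") == id)
            // ?. propagates null when no element matched.
            ?.Value;
    }
}
```

For example, SentenceLookup.FindSentence("<doc><sentence id='s0'>First.</sentence><sentence>No id.</sentence></doc>", "s9") returns null rather than throwing, so the caller can decide how to handle a missing sentence.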

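XElement.Parse only works on a string that has a single root node. The sample you posted has two <sentence> nodes but no single root node. You can add a root node like this: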

xml = "<root>" + xml + "</root>";
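
Since the question asks for the text of every sentence, whatever number its id carries, the two ideas above can be combined. This is only a sketch under a few assumptions: SentenceExtractor and ExtractSentences are hypothetical names, the "root" wrapper is arbitrary, the fragment is otherwise well-formed XML (XElement.Parse throws an XmlException if it is not), and trimming the surrounding whitespace is my own choice rather than something the question requires:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class SentenceExtractor
{
    // Wraps a rootless fragment in a <root> element and returns the text
    // of every direct <sentence> child, in document order.
    public static List<string> ExtractSentences(string fragment)
    {
        XElement root = XElement.Parse("<root>" + fragment + "</root>");
        return root.Elements("sentence")
                   // Value is the element's concatenated text content;
                   // Trim() only strips leading and trailing whitespace.
                   .Select(e => e.Value.Trim())
                   .ToList();
    }
}
```

Because no id is tested, the numbering of the sentences does not matter. Calling SentenceExtractor.ExtractSentences("<sentence id='s0'> A </sentence><sentence id='s1'>B</sentence>") gives a list containing "A" and "B".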