Parsing HTML with Regex and C#
Keywords: parsing, HTML, Regex | Updated: 2023-09-27 18:00:29
I have HTML like this:
<tr class="discussion r0"><td class="topic starter"><a href="SITE?d=6638">Test di matematica</a></td>
I only need to select "Test di matematica", and I would like to do it with a regular expression. I tried:
string pattern = "<tr class=\"discussion r0\"><td class=\"topic starter\"><a href=\"" + site + "=d{1,4}\"" + ">\\s*(.+?)\\s*</a></td>";
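But it doesn't work. How can I select the text that comes after one expression and before another?
Edit: could you show me how to parse this string with HtmlAgilityPack? Thanks.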
This regex makes sure the text we capture is inside an <a tag, which is inside a <td tag, which is in turn inside a <tr tag.
using System;
using System.Text.RegularExpressions;
class Program {
    static void Main() {
        string s1 = "<tr class=\"discussion r0\"><td class=\"topic starter\"><a href=\"SITE?d=6638\">Test di matematica</a></td>";
        // Group 1 captures the text between the opening <a ...> and the next '<'.
        var r = new Regex(@"(?i)<tr[^>]*?>\s*<td[^>]*?>\s*<a[^>]*?>([^<]*)<", RegexOptions.IgnoreCase);
        string capture = r.Match(s1).Groups[1].Value;
        Console.WriteLine(capture);
        Console.WriteLine("\nPress Any Key to Exit.");
        Console.ReadKey();
    } // END Main
} // END Program
Output: Test di matematica
Try this:
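As for why the pattern in the question fails, two things stand out. `d{1,4}` matches the letter "d" one to four times rather than digits, so it needs to be `\d{1,4}`. And if `site` contains a regex metacharacter such as `?`, it has to be escaped before being concatenated into the pattern. The sketch below assumes `site` holds something like "SITE?d" (the question does not show its value); the class name `TopicTitle` and the method `Extract` are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class TopicTitle
{
    // `site` is a hypothetical stand-in for the asker's variable, e.g. "SITE?d".
    public static string Extract(string html, string site)
    {
        string pattern =
            "<tr class=\"discussion r0\"><td class=\"topic starter\"><a href=\""
            + Regex.Escape(site)                     // "SITE?d" becomes "SITE\?d"
            + @"=\d{1,4}"">\s*(.+?)\s*</a></td>";    // \d matches digits; "" is a literal quote

        Match m = Regex.Match(html, pattern);
        // Return an empty string when the row does not match.
        return m.Success ? m.Groups[1].Value : string.Empty;
    }
}

class Program
{
    static void Main()
    {
        string html = "<tr class=\"discussion r0\"><td class=\"topic starter\"><a href=\"SITE?d=6638\">Test di matematica</a></td>";
        Console.WriteLine(TopicTitle.Extract(html, "SITE?d")); // prints: Test di matematica
    }
}
```

Unlike the looser patterns in the answers, this one still depends on the exact attribute order and quoting of the markup, so any change to the row's HTML will make it stop matching.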
string myString = "<tr class=\"discussion r0\"><td class=\"topic starter\"><a href=\"SITE?d=6638\">Test di matematica</a></td>";
Regex rx = new Regex(@"<a.*?>(.*?)</a>");
MatchCollection matches = rx.Matches(myString);
if (matches.Count > 0)
{
    Match match = matches[0]; // only one match in this case
    GroupCollection groupCollection = match.Groups;
    Console.WriteLine(groupCollection[1].ToString());
}
DEMO
http://ideone.com/nFY6aw
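The question's edit also asks about HtmlAgilityPack, which neither answer covers. Here is a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced. The class name `TopicParser`, the method `ExtractTitle`, and the XPath expression are illustrative choices, and the sample row is wrapped in `<table>` and closed with `</tr>` so that the fragment is well formed.

```csharp
using System;
using HtmlAgilityPack; // third-party NuGet package, not part of the .NET base library

static class TopicParser
{
    // Returns the link text of the first matching topic row, or an empty string.
    public static string ExtractTitle(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Select the <a> inside the "topic starter" cell of a "discussion r0" row.
        HtmlNode link = doc.DocumentNode.SelectSingleNode(
            "//tr[@class='discussion r0']/td[@class='topic starter']/a");

        // SelectSingleNode returns null when nothing matches.
        return link == null
            ? string.Empty
            : HtmlEntity.DeEntitize(link.InnerText).Trim();
    }
}

class Program
{
    static void Main()
    {
        string html = "<table><tr class=\"discussion r0\"><td class=\"topic starter\">"
                    + "<a href=\"SITE?d=6638\">Test di matematica</a></td></tr></table>";
        Console.WriteLine(TopicParser.ExtractTitle(html));
    }
}
```

An HTML parser does not care about attribute order, quoting style, or whitespace between tags, which is the usual reason to prefer it over a regex once the markup is not fully under your control.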