Regular expression to extract part of this HTML file
Updated: 2023-09-27 18:36:42
I have an HTML page that contains this script:
flashembed("header_container",
{"src": "http://****.swf?__cv=3cfd4cc0ac1fad53803ff73629e93d00",
"version": [8,0],
"expressInstall": "http://****.swf?__cv=87411ea96ce42429f52b28683e7af400",
"width": 860,"height": 229,"wmode": "opaque","id": "flashHeader",
"onFail":
function(){onFailFlashembed();}},
{"cdn": "http://*****/","nosid": "1","lol": "89,04","isGuestUser": "",
"navPoint": "1","eventItemEnabled": "",
"supporturl":"indexInternal.es%3Faction%3Dsupport%26back%3DinternalStart",
"***ouser***": "817",
"serverdesc": "Italia 3","server_code": "1","lang": "it","coBrandImgUrl": "",
"coBrandHref": "","customSkinURL": "","messaging": "1"});
hackEmailInviteDialog();
jQuery('#emailInviteCloseButton').click(function()
{
.....
}
I need to extract the "ouser" field from this page. I tried:
string pattern= @"""ouser"": "".*?,""serverdesc""";
string output = Regex.Replace(ConnectionAPI.responseFromServer, pattern, "");
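But the output contains the whole page...
I updated the regex to match whatever sits between the second pair of quotes, in case the value is not always numeric.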
Match match =
Regex.Match(
ConnectionAPI.responseFromServer,
"\"\\**?ouser\\**?\":\\s*\"([^\"]*)\",",
RegexOptions.IgnoreCase);
String output = String.Empty;
// Here we check the Match instance.
if (match.Success)
{
// Finally, we get the Group value and display it.
output = match.Groups[1].Value;
Console.WriteLine(output);
}
"\**?ouser\**?":\s*"(\d\w+)
Group 1 matches the 817 in that document.
That said, if you are doing a lot of HTML parsing on arbitrary tags, you would be better off with a SAX or DOM parser. Andrew Finnell also mentioned using JSON or WebKit.
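As a rough illustration of the JSON suggestion, the sketch below pulls out the second argument of flashembed(...) and reads the field from it with System.Text.Json. It assumes that this object literal is valid JSON, starts with the "cdn" key, and contains no nested braces, which holds for the page shown but may not hold in general. The class name FlashVarsJson, the method ReadField, and the shortened sample page are hypothetical names introduced only for this example.

```csharp
using System;
using System.Text.Json;
using System.Text.RegularExpressions;

public static class FlashVarsJson
{
    // Returns the value stored under `key` in the {"cdn": ...} object, or "" if it is not found.
    public static string ReadField(string html, string key)
    {
        // Assumption: the object starts with "cdn" and has no nested { } inside it.
        Match m = Regex.Match(html, @"\{\s*""cdn""[^{}]*\}");
        if (!m.Success) return string.Empty;

        using JsonDocument doc = JsonDocument.Parse(m.Value);
        return doc.RootElement.TryGetProperty(key, out JsonElement value)
            ? value.ToString()
            : string.Empty;
    }

    public static void Main()
    {
        // Shortened stand-in for the real page source (hypothetical data).
        string page =
            "flashembed(\"header_container\", {\"src\": \"a.swf\"},\n" +
            "{\"cdn\": \"http://example.invalid/\",\"nosid\": \"1\",\n" +
            "\"***ouser***\": \"817\",\n" +
            "\"serverdesc\": \"Italia 3\"});";

        Console.WriteLine(ReadField(page, "***ouser***"));
    }
}
```

The key is looked up exactly as it appears in the page, asterisks included. If those asterisks are only redaction in the question, the plain key would be passed instead.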
As merlin2011 pointed out, Regex.Replace
replaces the text you are trying to pull out instead of capturing it for you.
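To make the difference concrete, here is a minimal sketch that runs both calls on a shortened, hypothetical stand-in for the page (OuserDemo, Page, ReplaceAttempt and ExtractOuser are names made up for this example). With the page as posted, the original pattern does not match at all, because the key is written as "***ouser***" and a line break separates the value from "serverdesc", so Regex.Replace hands back the input unchanged. Regex.Match with a capture group returns only the value.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OuserDemo
{
    // Shortened stand-in for ConnectionAPI.responseFromServer (hypothetical data).
    public const string Page =
        "\"supporturl\":\"x\",\n\"***ouser***\": \"817\",\n\"serverdesc\": \"Italia 3\",";

    // The attempt from the question: Replace returns the input with every match removed.
    public static string ReplaceAttempt(string html) =>
        Regex.Replace(html, @"""ouser"": "".*?,""serverdesc""", "");

    // Match plus a capture group returns just the text between the second pair of quotes.
    public static string ExtractOuser(string html)
    {
        Match m = Regex.Match(
            html,
            @"""\**?ouser\**?"":\s*""([^""]*)""",
            RegexOptions.IgnoreCase);
        return m.Success ? m.Groups[1].Value : string.Empty;
    }

    public static void Main()
    {
        // True: nothing matched, so the whole page comes back.
        Console.WriteLine(ReplaceAttempt(Page) == Page);
        // The captured value.
        Console.WriteLine(ExtractOuser(Page));
    }
}
```

Even if the original pattern had matched, Replace would have returned the page minus the matched span, which is the opposite of what the question asks for.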