Using Regex to remove part of the text in a hyperlink
Keywords: text, remove, hyperlink, Regex | Updated: 2023-09-27 18:08:47
click <a href="javascript:validate('http://www.google.com');">here</a> to open google.com
I need to replace the sentence above with the following:
click <a href="http://www.google.com">here</a> to open google.com
Please help me do this with a regular expression in C#.
Regex regex = new Regex(@"href="".+?'([^']+)'",
    RegexOptions.IgnoreCase);
MatchCollection matches = regex.Matches(text);
Then extract group #1 from each match:
matches[0].Groups[1].Value
This is the new value to assign to the href.
Here you go:
Regular expression:
(?<=href=")javascript:validate\('(?<URL>[^"']*)'\);
Code:
string url = "click <a href=\"javascript:validate('http://www.google.com');\">here</a> to open google.com";
Regex regex = new Regex("(?<=href=\")javascript:validate\\('(?<URL>[^\"']*)'\\);");
string output = regex.Replace(url, "${URL}");
Output: click <a href="http://www.google.com">here</a> to open google.com
No regex needed:
var s =
inputString.Replace(
"javascript:validate('http://www.google.com');",
"http://www.google.com" );
HtmlAgilityPack: http://htmlagilitypack.codeplex.com
This is the preferred way to parse HTML.
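For illustration, here is a minimal sketch of what the HtmlAgilityPack route might look like. It assumes the HtmlAgilityPack NuGet package is installed; the class name LinkCleaner, the method RewriteLinks, and the Wrapper pattern are made up for this example and are not part of the library. It parses the markup, visits every <a> element that has an href, and unwraps the javascript:validate('...') call if one is present.

```csharp
using System;
using System.Text.RegularExpressions;
using HtmlAgilityPack; // third-party NuGet package, not part of .NET itself

static class LinkCleaner
{
    // Hypothetical helper pattern: matches javascript:validate('URL'); and captures URL.
    static readonly Regex Wrapper = new Regex(
        @"^javascript:validate\('([^']*)'\);?$", RegexOptions.IgnoreCase);

    public static string RewriteLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
            return html;

        foreach (HtmlNode a in anchors)
        {
            string href = a.GetAttributeValue("href", "");
            Match m = Wrapper.Match(href);
            if (m.Success)
                a.SetAttributeValue("href", m.Groups[1].Value);
        }

        // Serialize the modified document back to a string.
        return doc.DocumentNode.OuterHtml;
    }
}
```

Calling LinkCleaner.RewriteLinks on the sentence from the question should leave the surrounding text alone and change only the href value. The regex here is applied to a single attribute value rather than to the whole HTML string, which is the main reason this approach tends to be more robust.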
Parsing the HTML as Austin suggests is the more robust approach, but if you absolutely must use a regex, try this (adapted from the MSDN sample for the System.Text.RegularExpressions namespace):
using System;
using System.Text.RegularExpressions;

class MyClass
{
    static void Main(string[] args)
    {
        // Group 1 captures the URL passed to the JavaScript call.
        string pattern = @"<a href=""[^(]*\('([^']+)'\);"">";
        Regex r = new Regex(pattern, RegexOptions.IgnoreCase);
        string sInput = "click <a href=\"javascript:validate('http://www.google.com');\">here</a> to open google.com";
        MyClass c = new MyClass();
        // Assign the replace method to the MatchEvaluator delegate.
        MatchEvaluator myEvaluator = new MatchEvaluator(c.ReplaceCC);
        // Write out the original string.
        Console.WriteLine(sInput);
        // Replace each matched opening tag using the delegate method.
        sInput = r.Replace(sInput, myEvaluator);
        // Write out the modified string.
        Console.WriteLine(sInput);
    }

    // Rebuild the opening <a> tag around the captured URL (group 1).
    public string ReplaceCC(Match m)
    {
        return "<a href=\"" + m.Groups[1].Value + "\">";
    }
}