使用正则表达式从SQL查询中提取表和列
本文关键字:提取 查询 正则表达式 SQL | 更新日期: 2023-09-27 18:15:52
我试图为这个任务创建一个正则表达式,但除了非常简单的情况外,我真的无法掌握正则表达式的理解:-(:
问题:我有这个("SQL like")查询:SELECT tcmcs003.*, tccom130.nama, tccom705.dsca, tcmcs052.dsca, tccom100.nama
FROM tcmcs003, tccom130,tccom705,tcmcs052,tccom100
WHERE tcmcs003.cadr REFERS TO tccom130
AND tcmcs003.casi REFERS TO tccom705
AND tcmcs003.cprj REFERS TO tcmcs052
AND tcmcs003.bpid REFERS TO tccom100
ORDER BY tcmcs003._index1
我想"提取"所有的表名和列名,然后我想简单地添加我的字符给他们…例如:
SELECT tcmcs003.*, tccom130.nama
:
SELECT tcmcs003XXX.*, tccom130XXX.namaYYY
到目前为止,我拥有的"最好的"正则表达式是:
(?<gselect>SELECT's+)*(?<tname>'w{5}'d{3})*(?<spaces>['.','s])+(?<colname>'w{4})*
和替换模式:
${gselect}${tname}XXX${spaces}${colname}YYY
输出真的很糟糕:-(
)SELECT tcmcs003.
m130
.nama
m705
.dsca
s052
.dsca
m100
.nama
FROM
s003
m130
,m705
,s052
,m100
WHER
s003
.cadr
REFE
m130
s003
如何写正则表达式?
我想重复捕捉一些东西,比如
[(any string)(table name)('.a dot or not)(column name)(any string) ] (repeat N times)
编辑
我是用c#写的
模式应该更通用一点:' b (tc (?: mcs | com) ' d {3} XXX。' w +) ' b
表示表名是5个字符(第一个总是t
,后面跟着4个随机字符),后面跟着3个随机数字
我不尝试匹配整个命令,而是简单地独立匹配每个表或列。由于表的名称中有数字,因此它几乎不可能匹配其他内容。
-
用:
匹配列名'b(t'w{4}'d{3}'.'w{4})'b
-
用:
匹配表名'b(t'w{4}'d{3})'b
然后,我们可以将它们分别替换为所需的值:"$1YYY"
和"$1XXX"
。模式使用这些结构:
-
'b
匹配字边界(一边是字字符,另一边不是字字符)。 -
'w{4}
匹配4个字符([A-Za-z0-9_]
)。 -
'd{3}
匹配3位数字([0-9]
)。
代码:
string input = @"SELECT tcmcs003.*, tccom130.nama, tccom705.dsca, tcmcs052.dsca, tccom100.nama
FROM tcmcs003, tccom130,tccom705,tcmcs052,tccom100
WHERE tcmcs003.cadr REFERS TO tccom130
AND tcmcs003.casi REFERS TO tccom705
AND tcmcs003.cprj REFERS TO tcmcs052
AND tcmcs003.bpid REFERS TO tccom100
ORDER BY tcmcs003._index1";
string Pattern1 = @"'b(t'w{4}'d{3}'.'w{4})'b";
string Pattern2 = @"'b(t'w{4}'d{3})'b";
Regex r1 = new Regex(Pattern1);
Regex r2 = new Regex(Pattern2);
string replacement1 = "YYY";
string replacement2 = "XXX";
string result = "";
result = r1.Replace(input, "$1" + replacement1);
result = r2.Replace(result, "$1" + replacement2);
Console.WriteLine(result);
<<p> ideone演示/kbd>