How to decode Javascript Unicode escapes into a C# string

Keywords: string, decode, Unicode, Javascript | Updated: 2023-09-27 18:16:03

For example, this is the JSON callback we get from Google's autocomplete search:

window.google.td && window.google.td('tljp1322487273527014', 4,{e:"HY7TTtmRFZPe8QPCvf30Dw",c:1,u:"http://www.google.co.uk/s?hl\x3den\x26cp\x3d5\x26gs_id\x3d17\x26xhr\x3dt\x26q\x3dowasp\x26pf\x3dp\x26sclient\x3dpsy-ab\x26source\x3dhp\x26pbx\x3d1\x26oq\x3d\x26aq\x3d\x26aqi\x3d\x26aql\x3d\x26gs_sm\x3d\x26gs_upl\x3d\x26bav\x3don.2,or.r_gc.r_pw.,cf.osb\x26fp\x3dbd20912ccdf288ab\x26biw\x3d387\x26bih\x3d362\x26tch\x3d4\x26ech\x3d15\x26psi\x3d5o3TTqCqCsnD0QXA7sUI.1322487273527.1\x26wrapid\x3dtljp1322487273527014",d:"[\x22owasp\x22,[[\x22owasp\x22,0,\x220\x22],[\x22owasp\\u003Cb\\u003E top 10\\u003C\\/b\\u003E\x22,0,\x221\x22],[\x22owasp\\u003Cb\\u003E top 10 2011\\u003C\\/b\\u003E\x22,0,\x222\x22],[\x22owasp\\u003Cb\\u003E zap\\u003C\\/b\\u003E\x22,0,\x223\x22]],{\x22j\x22:\x2217\x22}]"});window.google.td && window.google.td('tljp1322487273527014', 4,{e:"HY7TTtmRFZPe8QPCvf30Dw",c:0,u:"http://www.google.co.uk/s?hl\x3den\x26cp\x3d5\x26gs_id\x3d17\x26xhr\x3dt\x26q\x3dowasp\x26pf\x3dp\x26sclient\x3dpsy-ab\x26source\x3dhp\x26pbx\x3d1\x26oq\x3d\x26aq\x3d\x26aqi\x3d\x26aql\x3d\x26gs_sm\x3d\x26gs_upl\x3d\x26bav\x3don.2,or.r_gc.r_pw.,cf.osb\x26fp\x3dbd20912ccdf288ab\x26biw\x3d387\x26bih\x3d362\x26tch\x3d4\x26ech\x3d15\x26psi\x3d5o3TTqCqCsnD0QXA7sUI.1322487273527.1\x26wrapid\x3dtljp1322487273527014",d:""});

More specifically, how do I go from:

"\x22te\\u003Cb\\u003Esco\\u003C\\/b\\u003E\x22,0,\x220\x22"

to:

"te\u003Cb\u003Esco\u003C\/b\u003E",0,"0"

and finally to:

"te<b>sco</b>"

Note that UrlDecode and HtmlDecode from System.Web cannot handle this.

Interestingly, AntiX does almost the opposite, since it can go from:

"te<b>sco</b>"

to:

te\00003Cb\00003Esco\00003C\00002Fb\00003E

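Security angle

These decodings have a number of security implications, because the decoded result will be rendered by the browser. For example, suppose that in Javascript/jQuery we have a variable holding a payload:

var xss = "te\u003Cscript\u003Ealert\u002812\u0029\u003C\u002Fscript\u003E"

The payload will fire if it is assigned to a div's html: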

$("#header").html(xss)
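On the C# side, one possible mitigation is to HTML-encode the decoded string before it is written into markup. The sketch below is only an illustration under that assumption: `SafeOutput` and `ForHtml` are hypothetical names, and `System.Net.WebUtility.HtmlEncode` stands in for whatever encoder the application already uses.

```csharp
using System;
using System.Net;

public static class SafeOutput
{
    // Hypothetical helper: HTML-encodes text that has already been decoded,
    // so that angle brackets reach the browser as entities instead of markup.
    public static string ForHtml(string decoded)
    {
        return WebUtility.HtmlEncode(decoded);
    }
}

public static class SafeOutputDemo
{
    public static void Main()
    {
        // The payload from the example above, after its \uNNNN escapes are decoded.
        string decoded = "te<script>alert(12)</script>";

        // Prints: te&lt;script&gt;alert(12)&lt;/script&gt;
        Console.WriteLine(SafeOutput.ForHtml(decoded));
    }
}
```

Encoding at the point of output keeps the decoded value intact for other uses, such as logging or comparison, while preventing it from being interpreted as HTML.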


\x....

WTF? It doesn't matter, though. Building on the previous answer:

// Requires: using System.Globalization; using System.Text.RegularExpressions;
string str = @"P\u003e\u003cp\u003e Notes \u003cstrong\u003e Разработчик: \u003c/STRONG\u003e \u003cbr /\u003eЕсли игра Безразлично";
// Match a literal backslash, 'u', then exactly four hex digits.
Regex regex = new Regex(@"\\u([0-9a-f]{4})", RegexOptions.IgnoreCase);
// Cast to char rather than calling char.ConvertFromUtf32, which throws on surrogate halves such as \uD83D.
str = regex.Replace(str, match => ((char)int.Parse(match.Groups[1].Value, NumberStyles.HexNumber)).ToString());

"\x22te\\u003Cb\\u003Esco\\u003C\\/b\\u003E\x22,0,\x220\x22" appears to be hex-encoded. Nothing available out of the box decodes this string, but the following should work:

var regex = new Regex(@"\\x([a-fA-F0-9]{2})");
var replaced = regex.Replace(input, match => char.ConvertFromUtf32(Int32.Parse(match.Groups[1].Value, System.Globalization.NumberStyles.HexNumber)));
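The two regex replacements above can be folded into a single pass that is then applied twice, once for the Javascript string-literal layer and once for the JSON layer inside it. The following is a minimal sketch, not a full Javascript unescaper: `JsEscapeDecoder` and `DecodeOnce` are hypothetical names, only `\xNN`, `\uNNNN`, `\\`, `\/`, `\"` and `\'` are handled, and other escapes such as `\n` are deliberately left untouched.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class JsEscapeDecoder
{
    // Group 1: two hex digits after \x. Group 2: four hex digits after \u.
    // Group 3: a backslash-escaped backslash, slash, double quote or single quote.
    private static readonly Regex EscapeRegex =
        new Regex(@"\\(?:x([0-9a-fA-F]{2})|u([0-9a-fA-F]{4})|([\\/""']))");

    // Removes exactly one level of escaping from the input.
    public static string DecodeOnce(string input)
    {
        return EscapeRegex.Replace(input, m =>
        {
            if (m.Groups[3].Success)
            {
                return m.Groups[3].Value; // \\ -> \, \/ -> /, \" -> ", \' -> '
            }
            string hex = m.Groups[1].Success ? m.Groups[1].Value : m.Groups[2].Value;
            int code = int.Parse(hex, NumberStyles.HexNumber, CultureInfo.InvariantCulture);
            // A plain cast keeps lone surrogate halves instead of throwing.
            return ((char)code).ToString();
        });
    }
}

public static class JsEscapeDecoderDemo
{
    public static void Main()
    {
        string raw = @"\x22te\\u003Cb\\u003Esco\\u003C\\/b\\u003E\x22,0,\x220\x22";

        string json = JsEscapeDecoder.DecodeOnce(raw);  // Javascript literal layer
        string text = JsEscapeDecoder.DecodeOnce(json); // JSON layer

        Console.WriteLine(json); // "te\u003Cb\u003Esco\u003C\/b\u003E",0,"0"
        Console.WriteLine(text); // "te<b>sco</b>",0,"0"
    }
}
```

Decoding one level per call keeps the intermediate JSON available, which matters if that layer is to be handed to a JSON parser rather than unescaped by hand.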