获取.net 1.1中的Encoding列表

本文关键字:列表 Encoding 中的 获取 net | 更新日期: 2023-09-27 18:07:36

我需要检索支持的编码列表,但我使用。. NET 1.1,因此以下调用不可用:

using System;
using System.Text;
public class SamplesEncoding
{
   public static void Main()
   {
      // For every encoding, get the property values.
      foreach( EncodingInfo ei in Encoding.GetEncodings() )
      {
         Encoding e = ei.GetEncoding();
         Console.Write("{0,-6} {1,-25} ", ei.CodePage, ei.Name);
         Console.Write("{0,-8} {1,-8} ", e.IsBrowserDisplay, e.IsBrowserSave);
         Console.Write("{0,-8} {1,-8} ", e.IsMailNewsDisplay, e.IsMailNewsSave);
         Console.WriteLine("{0,-8} {1,-8} ", e.IsSingleByte, e.IsReadOnly);
      }
   }
}

调用Encoding.GetEncodings()在。net 1.1中不可用。你知道有什么别的方法可以得到那张清单吗?

获取.net 1.1中的Encoding列表

很简单:. net 1.1是"fixed":它不会改变。您使用2.0的编码并测试它们是否已经出现在1.1中。例如:

string[] encs = new string[] {
    "IBM037", "IBM437", "IBM500", "ASMO-708", "DOS-720", "ibm737",
    "ibm775", "ibm850", "ibm852", "IBM855", "ibm857", "IBM00858",
    "IBM860", "ibm861", "DOS-862", "IBM863", "IBM864", "IBM865",
    "cp866", "ibm869", "IBM870", "windows-874", "cp875",
    "shift_jis", "gb2312", "ks_c_5601-1987", "big5", "IBM1026",
    "IBM01047", "IBM01140", "IBM01141", "IBM01142", "IBM01143",
    "IBM01144", "IBM01145", "IBM01146", "IBM01147", "IBM01148",
    "IBM01149", "utf-16", "utf-16BE", "windows-1250",
    "windows-1251", "Windows-1252", "windows-1253", "windows-1254",
    "windows-1255", "windows-1256", "windows-1257", "windows-1258",
    "Johab", "macintosh", "x-mac-japanese", "x-mac-chinesetrad",
    "x-mac-korean", "x-mac-arabic", "x-mac-hebrew", "x-mac-greek",
    "x-mac-cyrillic", "x-mac-chinesesimp", "x-mac-romanian",
    "x-mac-ukrainian", "x-mac-thai", "x-mac-ce", "x-mac-icelandic",
    "x-mac-turkish", "x-mac-croatian", "utf-32", "utf-32BE",
    "x-Chinese-CNS", "x-cp20001", "x-Chinese-Eten", "x-cp20003",
    "x-cp20004", "x-cp20005", "x-IA5", "x-IA5-German",
    "x-IA5-Swedish", "x-IA5-Norwegian", "us-ascii", "x-cp20261",
    "x-cp20269", "IBM273", "IBM277", "IBM278", "IBM280", "IBM284",
    "IBM285", "IBM290", "IBM297", "IBM420", "IBM423", "IBM424",
    "x-EBCDIC-KoreanExtended", "IBM-Thai", "koi8-r", "IBM871",
    "IBM880", "IBM905", "IBM00924", "EUC-JP", "x-cp20936",
    "x-cp20949", "cp1025", "koi8-u", "iso-8859-1", "iso-8859-2",
    "iso-8859-3", "iso-8859-4", "iso-8859-5", "iso-8859-6",
    "iso-8859-7", "iso-8859-8", "iso-8859-9", "iso-8859-13",
    "iso-8859-15", "x-Europa", "iso-8859-8-i", "iso-2022-jp",
    "csISO2022JP", "iso-2022-jp", "iso-2022-kr", "x-cp50227",
    "euc-jp", "EUC-CN", "euc-kr", "hz-gb-2312", "GB18030",
    "x-iscii-de", "x-iscii-be", "x-iscii-ta", "x-iscii-te",
    "x-iscii-as", "x-iscii-or", "x-iscii-ka", "x-iscii-ma",
    "x-iscii-gu", "x-iscii-pa", "utf-7", "utf-8"
};

foreach (string enc in encs)
{
    try {
        Encoding.GetEncoding(enc);
    } catch {
        Console.WriteLine("Missing {0}", enc);
    }
}

(是的,这是。net 4.0中存在的编码的完整列表…如果您需要它们的"数值"值,这将非常容易做到)。然后你把那些不起作用的从列表中删除。

生成它们(最少。net 2.0和c# 3.0):

var encs = Encoding.GetEncodings().Select(p => p.Name);
//var encs = Encoding.GetEncodings().Select(p => p.CodePage);
var sb = new StringBuilder("var encs = new[] {");
foreach (var enc in encs) {
    sb.Append(" '"" + enc + "'",");
    //sb.Append(" " + enc + ",");
}
sb.Length--;
sb.Append(" };");
var str = sb.ToString();
Console.WriteLine(str);

假设我没有完全误解这种情况(很有可能),如果您有一两分钟的等待时间,下面的操作应该可以工作。它真的很慢,更不用说有点丑了。

public static Encoding[] GetList()
{
    ArrayList arrayList;
    arrayList = new ArrayList();
    for ( int i = 0; i < 65535; i++ )
    {
        try
        {
            arrayList.Add( Encoding.GetEncoding( i ) );
        }
        catch(Exception ex)
        {
        }
    }
    return (Encoding[])arrayList.ToArray( typeof( Encoding ) );
}

文档说您应该使用Encoding.GetEnconding(int CodePage),其中CodePage应该是:

存在以下Windows代码页:

874 — Thai
932 — Japanese
936 — Chinese (simplified) (PRC, Singapore)
949 — Korean
950 — Chinese (traditional) (Taiwan, Hong Kong)
1200 — Unicode (BMP of ISO 10646, UTF-16LE)
1201 — Unicode (BMP of ISO 10646, UTF-16BE)
1250 — Latin (Central European languages)
1251 — Cyrillic
1252 — Latin (Western European languages)
1253 — Greek
1254 — Turkish
1255 — Hebrew
1256 — Arabic
1257 — Latin (Baltic languages)
1258 — Vietnamese
65000 — Unicode (BMP of ISO 10646, UTF-7)
65001 — Unicode (BMP of ISO 10646, UTF-8)

摘自维基百科