Reading a little-endian UTF-16 encoded string

Keywords: string, UTF-16, reading encoded text | Updated: 2023-09-27 18:16:03

I am following this specification of the file format: https://github.com/rouault/dump_gdbtable/wiki/FGDB-Spec

utf16: string in little-endian UTF-16 encoding

How do I read this? I tried BinaryReader.ReadString(), but it returns something like this:

"\0e\0y\0w\0o\0r\0d\0\0 \0\0\0\0\rP\0a\0r\0a\0m\0e\0t\0e\0r\0N\0a\0m\0e\0\0 \0\0\0\0\fC\0o\0n\0f\0i\0g\0S\0t\0r\0"

That is clearly not right.

From the spec:

ubyte: number of UTF-16 characters (not bytes) of the name of the field
utf16: name of the field
ubyte: number of UTF-16 characters (not bytes) of the alias of the field. Might be 0
utf16: alias of the field (omitted if previous field is 0)
ubyte: field type ( 0 = int16, 1 = int32, 2 = float32, 3 = float64, 4 = string, 5 = datetime, 6 = objectid, 7 = geometry, 8 = binary, 9=raster, 10/11 = UUID, 12 = XML )

Can I somehow use the number of UTF-16 characters to read the name of the field?

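BinaryReader's ReadString() method has no overload that lets you specify the string length. Instead, it expects the string to be preceded by its own encoded length prefix, which does not match the format in the spec you linked.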

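So you cannot use ReadString() directly, but you can:

use ReadByte() to get the string length (in characters),
multiply it by 2 to get the length in bytes,
use ReadBytes(count) to read that many bytes,
use Encoding.Unicode.GetString(bytes) to decode them.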
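
The steps above can be sketched roughly as follows. This is a minimal sketch, not code from the spec: the helper name ReadUtf16String and the sample bytes are made up for illustration, and it assumes the reader is positioned on the ubyte count that precedes the string.

```csharp
using System;
using System.IO;
using System.Text;

static class Utf16Demo
{
    // Hypothetical helper: reads a ubyte character count, then that many
    // little-endian UTF-16 code units, and decodes them into a string.
    public static string ReadUtf16String(BinaryReader reader)
    {
        int charCount = reader.ReadByte();   // length in UTF-16 characters
        int byteCount = charCount * 2;       // two bytes per UTF-16 code unit
        byte[] bytes = reader.ReadBytes(byteCount);
        if (bytes.Length != byteCount)
            throw new EndOfStreamException("String is truncated.");
        return Encoding.Unicode.GetString(bytes);
    }

    static void Main()
    {
        // Made-up sample: count 4, then "Name" in little-endian UTF-16.
        byte[] sample = { 4, 0x4E, 0x00, 0x61, 0x00, 0x6D, 0x00, 0x65, 0x00 };
        using (var reader = new BinaryReader(new MemoryStream(sample)))
        {
            Console.WriteLine(ReadUtf16String(reader)); // prints "Name"
        }
    }
}
```

Because the count is in characters rather than bytes, the multiplication by 2 is what keeps the reader aligned with the next field.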

It should be:

BinaryReader br = new BinaryReader(File.Open("C:\\florida.gdb\\a00000002.gdbtable",
                                   FileMode.Open,
                                   FileAccess.Read,
                                   FileShare.Read | FileShare.Delete),
                      Encoding.Unicode);

where Encoding is System.Text.Encoding.
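
With a reader that uses Encoding.Unicode, one possible way to read a counted string is to read the ubyte count and then call ReadChars(count), which decodes that many characters with the reader's encoding. The sketch below is an illustration, not part of the original answer: the helper name ReadCountedString is hypothetical, and an in-memory stream with made-up bytes stands in for the .gdbtable file so the example runs on its own.

```csharp
using System;
using System.IO;
using System.Text;

static class ReadCharsDemo
{
    // Hypothetical helper: assumes the reader was created with Encoding.Unicode
    // and is positioned on the ubyte count that precedes the string.
    public static string ReadCountedString(BinaryReader br)
    {
        int charCount = br.ReadByte();           // number of UTF-16 characters
        char[] chars = br.ReadChars(charCount);  // decoded with the reader's encoding
        return new string(chars);
    }

    static void Main()
    {
        // Made-up sample: count 5, then "Alias" in little-endian UTF-16.
        byte[] sample = { 5, 0x41, 0x00, 0x6C, 0x00, 0x69, 0x00, 0x61, 0x00, 0x73, 0x00 };
        using (var stream = new MemoryStream(sample))
        using (var br = new BinaryReader(stream, Encoding.Unicode))
        {
            Console.WriteLine(ReadCountedString(br)); // prints "Alias"
        }
    }
}
```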

For various historical reasons, Microsoft/Windows refers to UTF-16 (specifically the little-endian variant) as "Unicode" rather than UTF-16.