How to read a file that mixes bytes and strings
Updated: 2023-09-27 18:05:57
I have a mixed file that contains many lines of text plus some sections of byte-encoded (binary) data. Example:
--Begin Attach
Content-Info: /Format=TIF
Content-Description: 30085949.tif (TIF File)
Content-Transfer-Encoding: binary; Length=220096
II*II* Îh ÿÿÿÿÿÿü³küìpsMg›Êq™Æ™Ôd™‡–h7ÃAøAú áùõ=6?Eã½/ô|û ƒú7z:>„Çÿý<þ¯úýúßj?å¿þÇéöûþ“«ÿ¾ÁøKøÈ%ŠdOÿÞÈ<,Wþ‡ÿ·ƒïüúCÿß%Ï$sŸÿÃÿ÷‡þåiò>GÈù#ä|‘ò:#ä|Š":#¢:;ˆèŽˆèʤV‘ÑÑÑÑÑÑÑÑÑçIþ×o(¿zHDDDDDFp'.Ñ:ˆR:aAràÁ¬LˆÈù!ÿÿï[ÿ¯Äàiƒ"VƒDÇ)Ê6PáÈê$9C”9C†‡CD¡pE@¦œÖ{i~Úý¯kköDœ4ÉU”8`ƒt!l2G
--End Attach--
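I tried reading the file with a StreamReader:
string[] lines = System.IO.File.ReadAllLines(@"C:\Users\Davide\Desktop\20041230000D.xmm");
I read the file line by line, and when a line equals "Content-Transfer-Encoding: binary; Length=220096", I read all the lines that follow and write them to a file named after the attachment (30085949.tif in this example). But I am reading strings rather than byte data, so the resulting file is corrupted (for now I am testing with TIFF files). Any suggestions?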
Solution
Thanks for your replies. I went with this solution: I built a LineReader that extends BinaryReader:
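using System.IO;
using System.Text;

public class LineReader : BinaryReader
{
    public LineReader(Stream stream, Encoding encoding)
        : base(stream, encoding)
    {
    }

    // Number of characters consumed by the last ReadLine call, including the line feed.
    // This equals the number of bytes only for single-byte encodings.
    public int currentPos;
    private StringBuilder stringBuffer;

    public string ReadLine()
    {
        currentPos = 0;
        char[] buf = new char[1];
        stringBuffer = new StringBuilder();
        // Read one character at a time so that the binary data after the line is left unread.
        while (base.Read(buf, 0, 1) > 0)
        {
            currentPos++;
            if (buf[0] == '\n') // line feed (code 10), same as Microsoft.VisualBasic.Strings.ChrW(10)
            {
                return stringBuffer.ToString();
            }
            stringBuffer.Append(buf[0]);
        }
        // End of stream reached without a trailing line feed.
        return stringBuffer.ToString();
    }
}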
Here Microsoft.VisualBasic.Strings.ChrW(10) is the line feed character (the same as '\n' in C#). When I parse my file:
// beginNextPart, attachLength, BEGINATTACH, ENDATTACH and ByteArrayToFile are declared elsewhere (not shown).
using (LineReader b = new LineReader(File.OpenRead(path), Encoding.Default))
{
    int pos = 0;
    int length = (int)b.BaseStream.Length;
    while (pos < length)
    {
        string line = b.ReadLine();
        pos += b.currentPos;
        if (!beginNextPart)
        {
            if (line.StartsWith(BEGINATTACH))
            {
                beginNextPart = true;
            }
        }
        else if (line.StartsWith(ENDATTACH))
        {
            beginNextPart = false;
        }
        else if (line.StartsWith("Content-Transfer-Encoding: binary; Length="))
        {
            // The header gives the payload size; read exactly that many raw bytes.
            attachLength = Convert.ToInt32(line.Replace("Content-Transfer-Encoding: binary; Length=", ""));
            byte[] attachData = b.ReadBytes(attachLength);
            pos += attachLength;
            ByteArrayToFile(@"C:\users\davide\desktop\files.tif", attachData);
        }
    }
}
I read the byte length from the header line in the file and then read the following n bytes.
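The ByteArrayToFile helper called above is not shown in the answer. A minimal sketch of what it might look like follows; the class name AttachmentFiles and the WriteAttachment method are hypothetical additions for illustration, not the author's code. The length check is there because BinaryReader.ReadBytes returns fewer bytes than requested when the stream ends early.

```csharp
using System;
using System.IO;

public static class AttachmentFiles
{
    // Hypothetical stand-in for the ByteArrayToFile helper: writes the raw bytes unchanged.
    public static void ByteArrayToFile(string path, byte[] data)
    {
        File.WriteAllBytes(path, data);
    }

    // Hypothetical variant that refuses to write a truncated attachment.
    // expectedLength is the value parsed from the "Length=" header.
    public static void WriteAttachment(string path, byte[] data, int expectedLength)
    {
        if (data.Length != expectedLength)
        {
            throw new InvalidDataException(
                "Expected " + expectedLength + " bytes but read " + data.Length + ".");
        }
        ByteArrayToFile(path, data);
    }
}
```

In the parsing loop, `AttachmentFiles.WriteAttachment(targetPath, attachData, attachLength)` could replace the direct ByteArrayToFile call if a truncated file should be treated as an error.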
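The problem here is that StreamReader assumes it is the only thing reading the file, so it reads ahead. The best approach is to read the file as binary and use the appropriate text encoding to decode the string data from your own buffer.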
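The read-ahead can be observed directly. The following is a small sketch, assuming a tiny in-memory sample; the class ReadAheadDemo and the 13-byte sample (a 9-byte header line followed by 4 payload bytes) are made up for illustration.

```csharp
using System;
using System.IO;
using System.Text;

public static class ReadAheadDemo
{
    // Builds a hypothetical sample: one ASCII header line followed by 4 binary bytes.
    public static byte[] SampleData()
    {
        byte[] header = Encoding.ASCII.GetBytes("Length=4\n");
        byte[] payload = { 0x49, 0x49, 0x2A, 0x00 };
        byte[] all = new byte[header.Length + payload.Length];
        Buffer.BlockCopy(header, 0, all, 0, header.Length);
        Buffer.BlockCopy(payload, 0, all, header.Length, payload.Length);
        return all;
    }

    // Returns the position of the underlying stream after StreamReader has read one line.
    public static long PositionAfterFirstLine(byte[] data)
    {
        using (var stream = new MemoryStream(data))
        using (var reader = new StreamReader(stream, Encoding.ASCII))
        {
            reader.ReadLine(); // returns "Length=4", but fills its internal buffer first
            return stream.Position;
        }
    }
}
```

For this sample the stream position after the first ReadLine is 13 (the end of the data) rather than 9 (the end of the header line), so a subsequent raw read from the stream would miss the payload.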
Since you evidently don't mind reading the whole file into memory, you can start with:
byte[] buf = System.IO.File.ReadAllBytes(@"C:\Users\Davide\Desktop\20041230000D.xmm");
Then, assuming your text data uses UTF-8:
int offset = 0;
int binaryLength = 0;
while (binaryLength == 0 && offset < buf.Length) {
    // In a UTF-8 stream, byte 10 always represents a line feed
    var eolIdx = Array.IndexOf(buf, (byte)10, offset);
    if (eolIdx < 0) break; // no further line ending in the buffer
    string line = System.Text.Encoding.UTF8.GetString(buf, offset, eolIdx - offset);
    // Process your line appropriately here, and set binaryLength if you expect binary data to follow
    offset = eolIdx + 1;
}
// You don't necessarily need to copy binary data out, but just to show where it is:
var binary = new byte[binaryLength];
Buffer.BlockCopy(buf, offset, binary, 0, binaryLength);
If you expect Windows-style line endings, you may also want to use line.TrimEnd('\r').
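Putting the buffer approach together, here is a sketch of a complete scan over an in-memory file. It assumes the header lines are plain ASCII and that every binary section is announced by a "Content-Transfer-Encoding: binary; Length=" line, as in the question's sample; the class MixedFileParser and the method ExtractAttachments are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class MixedFileParser
{
    private const string LengthPrefix = "Content-Transfer-Encoding: binary; Length=";

    // Walks the buffer line by line and copies out each binary payload
    // whose size is announced by a Length= header line.
    public static List<byte[]> ExtractAttachments(byte[] buf)
    {
        var attachments = new List<byte[]>();
        int offset = 0;
        while (offset < buf.Length)
        {
            int eolIdx = Array.IndexOf(buf, (byte)'\n', offset);
            if (eolIdx < 0) break; // trailing text without a line feed

            // Assumption: header lines are ASCII; TrimEnd handles Windows-style endings.
            string line = Encoding.ASCII.GetString(buf, offset, eolIdx - offset).TrimEnd('\r');
            offset = eolIdx + 1;

            if (!line.StartsWith(LengthPrefix, StringComparison.Ordinal)) continue;

            int length = int.Parse(line.Substring(LengthPrefix.Length), CultureInfo.InvariantCulture);
            if (length < 0 || length > buf.Length - offset)
            {
                throw new FormatException("Declared length exceeds the remaining data.");
            }

            // The payload starts right after the header's line feed; skip over it
            // as raw bytes so that line feeds inside it are not treated as line ends.
            var data = new byte[length];
            Buffer.BlockCopy(buf, offset, data, 0, length);
            attachments.Add(data);
            offset += length;
        }
        return attachments;
    }
}
```

Each returned array can then be written with File.WriteAllBytes. Because the payload is skipped by its declared length, bytes inside it that happen to equal 10 or 13 do not affect the line scan.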