What is the proper way to read partial content from a file?
Last updated: 2023-09-27 18:18:42
I want to read part of the content of a text file. The file has the following content, including the line breaks:
Content-Type: multipart/signed;
boundary="boundarytext"
--<boundarytext>
Content-Type: text/plain
{1:A}{2:B N}{4:
:10:C123
:11:124
:43E:
test.txt
2010-03-20/09:37:45
Accepted
-}
--<boundarytext>
I expect the following result:
{1:A}{2:B N}{4:
:10:C123
:11:124
:43E:
test.txt
2010-03-20/09:37:45
Accepted
-}
At the moment I am reading the file line by line.
Is there a better way to get the expected result?
This is the code I am using:
using (var fileRead = new StreamReader(@"c:\temp\testfile.txt"))
{
var blockIdentifier = "{1:";
var textBlockIdentifier = "-}";
var fileContent = fileRead.ReadToEnd();
var startPos = fileContent.LastIndexOf(blockIdentifier);
var length = (fileContent.IndexOf(textBlockIdentifier) + 2) - startPos;
var newContent = fileContent.Substring(startPos, length);
}
Thanks.
You can use LINQ (this needs the System.IO and System.Linq namespaces):
string[] relevantLines = File.ReadLines(path)
.SkipWhile(l => !l.StartsWith("--<boundarytext>"))
.Skip(3)
.TakeWhile(l => !l.StartsWith("--<boundarytext>"))
.ToArray();
It seems that the boundarytext changes from file to file, so you need to detect it first:
string boundaryTextLine = File.ReadLines(path)
.FirstOrDefault(l => l.IndexOf("boundary=", StringComparison.InvariantCultureIgnoreCase) >= 0);
if(boundaryTextLine != null)
{
string boundaryText = boundaryTextLine
.Substring(boundaryTextLine.IndexOf("boundary=", StringComparison.InvariantCultureIgnoreCase) + "boundary=".Length)
.Trim(' ', '"');
}
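The two snippets above can be combined into one step: detect the boundary, then take the lines between the first pair of delimiters. The following is only a sketch under assumptions, not the answerer's code. The class and method names (BoundaryReader, ExtractFirstPart) are hypothetical, and instead of a fixed Skip(3) it assumes that the part headers are blank lines or lines starting with "Content-".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BoundaryReader
{
    // Hypothetical helper: returns the body lines of the first part,
    // or an empty list when no "boundary=" declaration is found.
    public static List<string> ExtractFirstPart(IEnumerable<string> lines)
    {
        const string key = "boundary=";
        var all = lines.ToList();

        // Step 1: find the line that declares the boundary.
        string boundaryLine = all.FirstOrDefault(
            l => l.IndexOf(key, StringComparison.OrdinalIgnoreCase) >= 0);
        if (boundaryLine == null)
            return new List<string>();

        // Step 2: cut out the boundary value and strip quotes, spaces and a trailing ';'.
        int start = boundaryLine.IndexOf(key, StringComparison.OrdinalIgnoreCase) + key.Length;
        string boundary = boundaryLine.Substring(start).Trim(' ', '"', ';');
        string delimiter = "--<" + boundary + ">";

        // Step 3: take everything between the first two delimiter lines.
        return all
            .SkipWhile(l => !l.StartsWith(delimiter, StringComparison.Ordinal))
            .Skip(1) // the opening delimiter line itself
            // Assumption: part headers are blank lines or start with "Content-".
            .SkipWhile(l => string.IsNullOrWhiteSpace(l)
                         || l.StartsWith("Content-", StringComparison.OrdinalIgnoreCase))
            .TakeWhile(l => !l.StartsWith(delimiter, StringComparison.Ordinal))
            .ToList();
    }
}
```

With a file on disk it could be called as BoundaryReader.ExtractFirstPart(File.ReadLines(path)), which additionally needs System.IO.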
Personally, I would use a regular expression:
/(?<=\-\-<boundarytext>)\n(.+)\n(?=\-\-<boundarytext>)/gsU
See: https://regex101.com/r/rB1tC8/1
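The pattern above is written with PCRE-style flags (g, s, U). As a rough C# equivalent, the sketch below uses RegexOptions.Singleline for the s flag, a lazy quantifier (.+?) in place of the U flag, and Regex.Matches in place of g. The class and method names (BoundaryRegex, ExtractBlocks) are hypothetical, and the boundary is assumed to be known already. Like the original pattern, it returns everything between two delimiters, so the part's own Content-Type header is still included in each block.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BoundaryRegex
{
    // Hypothetical helper: returns the text between each pair of "--<boundary>" delimiters.
    public static List<string> ExtractBlocks(string text, string boundary)
    {
        // Escape the delimiter so a boundary containing regex metacharacters is matched literally.
        string delimiter = Regex.Escape("--<" + boundary + ">");

        // Lookbehind and lookahead keep the delimiters out of the match,
        // so a closing delimiter can also open the next block.
        var regex = new Regex(
            "(?<=" + delimiter + @")\r?\n(.+?)\r?\n(?=" + delimiter + ")",
            RegexOptions.Singleline);

        var blocks = new List<string>();
        foreach (Match match in regex.Matches(text))
        {
            blocks.Add(match.Groups[1].Value);
        }
        return blocks;
    }
}
```

Because the lookahead does not consume the closing delimiter, consecutive parts that share a delimiter line are each returned as a separate block.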
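I would even parse the content_type out of the message. For convenience, here is a working test case in C#:
// Requires System.Text.RegularExpressions and Microsoft.VisualStudio.TestTools.UnitTesting (MSTest).
[TestMethod]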
public void TestMethod1()
{
var input = @"Content-Type: multipart/signed;
boundary='boundarytext1'
--<boundarytext1>
Content-Type: text/plain
{1:A}{2:B N}{4:
:10:C123
:11:124
:43E:
test.txt
2010-03-20/09:37:45
Accepted
-}
--<boundarytext1>
Content-Type: multipart/signed;
boundary='boundarytext2'
--<boundarytext2>
Content-Type: text/plain
{1:A}{2:B N}{4:
:10:C123
:11:124
:43E:
test.txt
2010-03-20/09:37:45
Accepted
-}
--<boundarytext2>".Replace("'", "\"");
var pattern = @"
boundary='(.+)' # capture the boundary delimiter in \1
.+
(--<\1>) # every message starts with --<boundary>
.+
Content-Type:\s
(?<content_type>[\w/]+) # capture the content_type
(\r?\n)+
(?<content>.+?) # capture the content
(\r?\n)+
(--<\1>) # every message ends with --<boundary>
".Replace("'", "\"");
var regex = new Regex(pattern,
RegexOptions.Singleline | RegexOptions.IgnorePatternWhitespace);
var matches = regex.Matches(input);
Assert.AreEqual(2, matches.Count);
foreach (Match match in matches)
{
var content_type = match.Groups["content_type"].Value;
var content = match.Groups["content"].Value;
Assert.AreEqual("text/plain", content_type);
Assert.IsTrue(content.StartsWith("{1") && content.EndsWith("-}"));
}
}