Parse a text file and remove commas inside double quotes
Keywords: delete, text, file | Updated: 2023-09-27 18:29:19
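I have a text file that I need to convert to a CSV file. My plan is to:
- parse the file line by line
- find commas inside double quotes and replace them with a space
- then remove all double quotes
- append the line to a new CSV file
The problem: I need a function that recognizes a comma inside double quotes and replaces it.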
Here is a sample line:
"MRS Brown","4611 BEAUMONT ST",","WARRIOR RUN,PA"
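The plan in the question can be sketched in C# with a single "inside quotes" flag that flips on every double quote. This is a minimal sketch, assuming no quoted field spans more than one line and that any embedded quote is written as `""` (which simply flips the flag twice). `QuotedCommaCleaner`, `CleanLine` and `ConvertFile` are hypothetical names, not part of any library.

```csharp
using System.IO;
using System.Text;

public static class QuotedCommaCleaner
{
    // Steps 2 and 3 of the plan: a comma inside a quoted section becomes a space,
    // and the double quotes themselves are dropped.
    public static string CleanLine(string line)
    {
        var output = new StringBuilder(line.Length);
        bool insideQuotes = false;

        foreach (char c in line)
        {
            if (c == '"')
            {
                // Each quote toggles the state; the quote is not copied.
                insideQuotes = !insideQuotes;
            }
            else if (c == ',' && insideQuotes)
            {
                output.Append(' ');
            }
            else
            {
                output.Append(c);
            }
        }
        return output.ToString();
    }

    // Steps 1 and 4: read the input line by line and write each cleaned line to a new file.
    public static void ConvertFile(string inputPath, string outputPath)
    {
        using (var reader = new StreamReader(inputPath))
        using (var writer = new StreamWriter(outputPath))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                writer.WriteLine(CleanLine(line));
            }
        }
    }
}
```

With a simplified version of the sample line, `CleanLine("\"MRS Brown\",\"4611 BEAUMONT ST\",\"WARRIOR RUN,PA\"")` returns `MRS Brown,4611 BEAUMONT ST,WARRIOR RUN PA`: the commas between fields survive and the one inside the quotes becomes a space.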
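Your file already appears to be in CSV-compliant format, so any good CSV reader should read it correctly.
If your real problem is just reading the field values correctly, then you need to read the file with a parser that understands quoted fields.
Here is one way, using TextFieldParser: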
using Microsoft.VisualBasic.FileIO;

private void button1_Click(object sender, EventArgs e)
{
    TextFieldParser tfp = new TextFieldParser("C:\\Temp\\Test.csv");
    tfp.Delimiters = new string[] { "," };
    tfp.HasFieldsEnclosedInQuotes = true;
    while (!tfp.EndOfData)
    {
        string[] fields = tfp.ReadFields();
        // do whatever you want to do with the fields now...
        // e.g. remove the commas and double-quotes from the fields.
        for (int i = 0; i < fields.Length; i++)
        {
            fields[i] = fields[i].Replace(",", " ").Replace("\"", "");
        }
        // this is to show what we got as the output
        textBox1.AppendText(String.Join("\t", fields) + "\n");
    }
    tfp.Close();
}
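The handler above only shows the cleaned fields in a text box. To cover the question's last step, writing each cleaned line to a new CSV file, the same parser settings could feed an output file instead. This is a sketch, assuming the fields never need to be quoted again once their commas are gone. `CsvRewriter` and `CleanRecords` are hypothetical names; `TextFieldParser` (including its constructor that takes a `TextReader`) is the real class from `Microsoft.VisualBasic.FileIO`, which on .NET Framework needs a project reference to Microsoft.VisualBasic.

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class CsvRewriter
{
    // Parses quoted CSV and yields one cleaned line per record: commas inside a
    // field become spaces, any leftover quotes are removed, and the fields are
    // joined again with plain commas.
    public static IEnumerable<string> CleanRecords(TextReader input)
    {
        using (var parser = new TextFieldParser(input))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;

            while (!parser.EndOfData)
            {
                string[] fields = parser.ReadFields();
                for (int i = 0; i < fields.Length; i++)
                {
                    fields[i] = fields[i].Replace(",", " ").Replace("\"", "");
                }
                yield return string.Join(",", fields);
            }
        }
    }
}
```

Something like `File.WriteAllLines(outputPath, CsvRewriter.CleanRecords(File.OpenText(inputPath)))` would then produce the new file, with `inputPath` and `outputPath` standing in for your own paths. Letting the parser split the fields first means the comma replacement only ever sees the contents of one field.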
Edit:
I just noticed that this question was filed under both C# and VB.NET-2010. Here is the VB.NET version, in case you are coding in VB:
Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
    Dim tfp As New FileIO.TextFieldParser("C:\Temp\Test.csv")
    tfp.Delimiters = New String() {","}
    tfp.HasFieldsEnclosedInQuotes = True
    While Not tfp.EndOfData
        Dim fields() As String = tfp.ReadFields
        ' do whatever you want to do with the fields now...
        ' e.g. remove the commas and double-quotes from the fields.
        For i As Integer = 0 To fields.Length - 1
            fields(i) = fields(i).Replace(",", " ").Replace("""", "")
        Next
        ' this is to show what we got as the output
        TextBox1.AppendText(Join(fields, vbTab) & vbCrLf)
    End While
    tfp.Close()
End Sub
Thanks to Baz, The Glockster's answer in VB: I just converted it to C# and it works very well. With this code you don't need any third-party parser.
using System;
using System.Text;

// reader: an existing StreamReader over the input file
string line = reader.ReadLine();
line = ParseCommasInQuotes(line);

// Drops every comma that appears between a pair of double quotes (it is removed,
// not replaced with a space) and removes the double quotes themselves.
private string ParseCommasInQuotes(string arg)
{
    bool foundEndQuote = false;
    bool foundStartQuote = false;
    StringBuilder output = new StringBuilder();
    //44 = comma
    //34 = double quote
    foreach (char element in arg)
    {
        if (foundEndQuote)
        {
            foundStartQuote = false;
            foundEndQuote = false;
        }
        if (element.Equals((Char)34) && !foundEndQuote && foundStartQuote)
        {
            foundEndQuote = true;
            continue;
        }
        if (element.Equals((Char)34) && !foundStartQuote)
        {
            foundStartQuote = true;
            continue;
        }
        if (element.Equals((Char)44) && foundStartQuote)
        {
            //skip the comma...it's between double quotes
        }
        else
        {
            output.Append(element);
        }
    }
    return output.ToString();
}
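Here is a simple function that removes the commas found between a pair of double quotes in a string. You can pass in a long string with many occurrences such as "abc,123", 10/13/12, "some description", and so on. It also removes the double quotes.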
Imports System.Text

Private Function ParseCommasInQuotes(ByVal arg As String) As String
    Dim foundEndQuote As Boolean = False
    Dim foundStartQuote As Boolean = False
    Dim output As New StringBuilder()
    '44 = comma
    '34 = double quote
    For Each element As Char In arg
        If foundEndQuote Then
            foundStartQuote = False
            foundEndQuote = False
        End If
        If element.Equals(Chr(34)) AndAlso (Not foundEndQuote) AndAlso foundStartQuote Then
            foundEndQuote = True
            Continue For
        End If
        If element.Equals(Chr(34)) AndAlso Not foundStartQuote Then
            foundStartQuote = True
            Continue For
        End If
        If element.Equals(Chr(44)) AndAlso foundStartQuote Then
            'skip the comma...it's between double quotes
        Else
            output.Append(element)
        End If
    Next
    Return output.ToString()
End Function
I misunderstood your question before. Now I'm sure I've got it right:
using Microsoft.VisualBasic.FileIO;

TextFieldParser parser = new TextFieldParser(@"c:\file.csv");
parser.TextFieldType = FieldType.Delimited;
parser.SetDelimiters(",");
while (!parser.EndOfData)
{
    //Processing row
    string[] fields = parser.ReadFields();
    foreach (string field in fields)
    {
        //TODO: Do whatever you need
    }
}
parser.Close();
// match each complete quoted field ("...") and replace the commas inside it with spaces
var result = Regex.Replace(input,
    @"""[^""]*""",
    m => m.Value.Replace(",", " "));
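It sounds like what you're describing won't end up being a CSV file, but to answer your question, this is what I would do.
First, you need to load the text file into something you can loop over, like this: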
using System;
using System.Collections.Generic;

public static List<String> GetTextListFromDiskFile(String fileName)
{
    List<String> list = new List<String>();
    try
    {
        //load the file into the streamreader
        System.IO.StreamReader sr = new System.IO.StreamReader(fileName);
        //loop through each line of the file
        while (sr.Peek() >= 0)
        {
            list.Add(sr.ReadLine());
        }
        sr.Close();
    }
    catch (Exception ex)
    {
        list.Add("Error: Could not read file from disk. Original error: " + ex.Message);
    }
    return list;
}
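If you don't need the error message stored in the list, `File.ReadAllLines` from `System.IO` returns the same one-entry-per-line result in a single call. This swaps in a different technique from the answer's `StreamReader` loop, and `LineLoader`/`LoadLines` are hypothetical names. The sketch deliberately lets I/O exceptions reach the caller, so an error message cannot end up in the output file as if it were a line of data.

```csharp
using System.Collections.Generic;
using System.IO;

public static class LineLoader
{
    // One list entry per line of the file. Unlike GetTextListFromDiskFile,
    // a read failure throws instead of being added to the list.
    public static List<string> LoadLines(string fileName)
    {
        return new List<string>(File.ReadAllLines(fileName));
    }
}
```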
Then loop through the list with a simple foreach loop and run the replacements on each item, like this:
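foreach (String item in list)
{
    // turn the "," separator between two quoted fields into " "
    String x = item.Replace("\",\"", "\" \"");
    // then strip every remaining double quote
    x = x.Replace("\"", "");
    // x is the finished line; collect it for the output (for example with sb.AppendLine(x))
}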
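Once that's done, you need to build the CSV file line by line. I would use a StringBuilder for that, calling sb.AppendLine(x) to build up the string that will become the text file, and then write it to disk with something like this: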
using System;
using System.IO;

public static void SaveFileToDisk(String filePathName, String fileText)
{
    using (StreamWriter outfile = new StreamWriter(filePathName))
    {
        outfile.Write(fileText);
    }
}
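Put together, the steps of this answer (load the lines, run the two `Replace` calls, collect with `StringBuilder.AppendLine`, write once) look roughly like this. It is a sketch only: `TextFileRewriter`, `BuildOutput` and `Rewrite` are hypothetical names, and `File.WriteAllText` stands in for `SaveFileToDisk`. As the answer warns, the result is not CSV: the first `Replace` turns the `","` separator between quoted fields into `" "`, so the fields end up joined by spaces.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class TextFileRewriter
{
    // Applies the answer's two Replace calls to every line and collects
    // the results with StringBuilder.AppendLine.
    public static string BuildOutput(IEnumerable<string> lines)
    {
        var sb = new StringBuilder();
        foreach (string item in lines)
        {
            // "," (the separator between two quoted fields) becomes " "
            string x = item.Replace("\",\"", "\" \"");
            // then every remaining double quote is removed
            x = x.Replace("\"", "");
            sb.AppendLine(x);
        }
        return sb.ToString();
    }

    // Reads the input lines, builds the whole output string, and writes it in one go.
    public static void Rewrite(string inputPath, string outputPath)
    {
        File.WriteAllText(outputPath, BuildOutput(File.ReadLines(inputPath)));
    }
}
```

For example, the input line `"MRS Brown","4611 BEAUMONT ST"` becomes `MRS Brown 4611 BEAUMONT ST`.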
This worked for me. I hope it helps someone else.
Private Sub Command1_Click()
    Dim c As Long
    Open "c:\dir\file.csv" For Input As #1
    Open "c:\dir\file2.csv" For Output As #2
    Do Until EOF(1)
        Line Input #1, test$
        ' remove every pair of consecutive double quotes ("") from the line
        c = InStr(test$, """""")
        Do While c > 0
            test$ = Left$(test$, c - 1) + Right$(test$, Len(test$) - (c + 1))
            c = InStr(test$, """""")
        Loop
        Print #2, test$
    Loop
    Close #2
    Close #1
End Sub
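I would do all of this before you start processing line by line. Also, take a look at CsvHelper; it's fast and easy to use. Just put the result into a TextReader and pass it to a CsvReader.
Here is your replacer for commas inside double quotes, followed by the double-quote stripper: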
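using System.IO;
using System.Text.RegularExpressions;

using (TextReader reader = File.OpenText(file))
{
    // replace the commas inside each quoted field ("...") with spaces,
    // then remove the double quotes from the whole text
    var pattern = @"""[^""]*""";
    var results = Regex.Replace(reader.ReadToEnd(), pattern, match => match.Value.Replace(",", " "));
    results = results.Replace("\"", "");
    // results now holds the cleaned text; write it out or pass it on from here
}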
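A self-contained sketch of the whole-file approach, including writing the result to a new file: the pattern `"[^"]*"` matches exactly one quoted field, from an opening quote to the next quote, so commas between fields are never touched. Because `[^"]` also matches line breaks, a quoted field that spans several lines is handled too, which a line-by-line pass cannot do. `WholeFileCleaner`, `Clean` and `ConvertFile` are hypothetical names, and the sketch assumes the file fits comfortably in memory.

```csharp
using System.IO;
using System.Text.RegularExpressions;

public static class WholeFileCleaner
{
    // One complete quoted field: a quote, any run of non-quote characters
    // (line breaks included), and the closing quote.
    private static readonly Regex QuotedField = new Regex("\"[^\"]*\"");

    // Replaces commas inside quoted fields with spaces, then removes all quotes.
    public static string Clean(string text)
    {
        string result = QuotedField.Replace(text, m => m.Value.Replace(",", " "));
        return result.Replace("\"", "");
    }

    // Reads the whole input file at once, cleans it, and writes a new file.
    public static void ConvertFile(string inputPath, string outputPath)
    {
        File.WriteAllText(outputPath, Clean(File.ReadAllText(inputPath)));
    }
}
```

For example, `Clean("\"a,b\",c")` returns `a b,c`.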