Parse a text file and remove commas inside double quotes

Keywords: delete, text, file | Updated: 2023-09-27 18:29:19

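I have a text file that needs to be converted to a CSV file. My plan is:

  • Parse the file line by line
  • Find the commas inside double quotes and replace them with spaces
  • Then delete all the double quotes
  • Append the line to a new CSV file

Question: I need a function that identifies the commas inside double quotes and replaces them.

Here is a sample line: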

"MRS Brown","4611 BEAUMONT ST",","WARRIOR RUN,PA"

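Your file already appears to be in a CSV-compliant format. Any good CSV reader should be able to read it correctly.

If your problem is just reading the field values correctly, then you need to read the file the right way.

Here is one way to do it: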

using Microsoft.VisualBasic.FileIO; 

    private void button1_Click(object sender, EventArgs e)
    {
        TextFieldParser tfp = new TextFieldParser("C:\\Temp\\Test.csv");
        tfp.Delimiters = new string[] { "," };
        tfp.HasFieldsEnclosedInQuotes = true;
        while (!tfp.EndOfData)
        {
            string[] fields = tfp.ReadFields();
            // do whatever you want to do with the fields now...
            // e.g. remove the commas and double-quotes from the fields.
            for (int i = 0; i < fields.Length;i++ )
            {
                fields[i] = fields[i].Replace(",", " ").Replace("\"", "");
            }
            // this is to show what we got as the output
            textBox1.AppendText(String.Join("\t", fields) + "\n");
        }
        tfp.Close();
    }

Edit:

I just noticed that the question is tagged C#, VB.NET-2010. Here is the VB.NET version, in case you are coding in VB.

Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
    Dim tfp As New FileIO.TextFieldParser("C:\Temp\Test.csv")
    tfp.Delimiters = New String() {","}
    tfp.HasFieldsEnclosedInQuotes = True
    While Not tfp.EndOfData
        Dim fields() As String = tfp.ReadFields
        ' do whatever you want to do with the fields now...
        ' e.g. remove the commas and double-quotes from the fields.
        For i As Integer = 0 To fields.Length - 1
            fields(i) = fields(i).Replace(",", " ").Replace("""", "")
        Next
        ' this is to show what we got as the output
        TextBox1.AppendText(Join(fields, vbTab) & vbCrLf)
    End While
    tfp.Close()
End Sub
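
The examples above only display the parsed fields. To cover the last step of the question's plan (appending each cleaned line to a new CSV file), the same loop could write to a StreamWriter instead. This is a minimal sketch, assuming a .NET project where the Microsoft.VisualBasic assembly is available; the class name `QuotedCsvConverter`, the method `ConvertFile`, and both path parameters are hypothetical and not part of the answer.

```csharp
using System;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class QuotedCsvConverter
{
    // Hypothetical helper: reads a quoted CSV file and writes an unquoted one,
    // with commas inside fields replaced by spaces.
    public static void ConvertFile(string inputPath, string outputPath)
    {
        using (var parser = new TextFieldParser(inputPath))
        using (var writer = new StreamWriter(outputPath))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;

            while (!parser.EndOfData)
            {
                // ReadFields already strips the enclosing double quotes.
                string[] fields = parser.ReadFields();
                for (int i = 0; i < fields.Length; i++)
                {
                    fields[i] = fields[i].Replace(",", " ");
                }
                // Re-join with plain commas; safe because no field contains one now.
                writer.WriteLine(string.Join(",", fields));
            }
        }
    }
}
```

Because every embedded comma has been replaced before the fields are joined, the output can be split on commas without any quote handling. Note that TextFieldParser skips blank lines, so they will not appear in the output file.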

Thanks to Baz and to The Glockster's answer in VB: I just converted it to C# and it works well. With this code you don't need any third-party parser.

// Usage: clean each line as it is read.
string line = reader.ReadLine();
line = ParseCommasInQuotes(line);

private string ParseCommasInQuotes(string arg)
{
  bool foundEndQuote = false;
  bool foundStartQuote = false;
  StringBuilder output = new StringBuilder();
  //44 = comma
  //34 = double quote
  foreach (char element in arg)
  {
    if (foundEndQuote)
    {
      foundStartQuote = false;
      foundEndQuote = false;
    }
    if (element.Equals((Char)34) & (!foundEndQuote) & foundStartQuote)
    {
      foundEndQuote = true;
      continue;
    }
    if (element.Equals((Char)34) & !foundStartQuote)
    {
      foundStartQuote = true;
      continue;
    }
    if ((element.Equals((Char)44) & foundStartQuote))
    {
      //skip the comma...its between double quotes
    }
    else
    {
      output.Append(element);
    }
  }
  return output.ToString();
}

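Here is a simple function that removes commas embedded between two double quotes in a string. You can pass in a long string that contains many occurrences such as "abc,123",10/13/12,"some description"... and so on. It also removes the double quotes.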

Private Function ParseCommasInQuotes(ByVal arg As String) As String
    Dim foundEndQuote As Boolean = False
    Dim foundStartQuote As Boolean = False
    Dim output As New StringBuilder()
    '44 = comma
    '34 = double quote
    For Each element As Char In arg
        If foundEndQuote Then
            foundStartQuote = False
            foundEndQuote = False
        End If
        If element.Equals(Chr(34)) And (Not foundEndQuote) And foundStartQuote Then
            foundEndQuote = True
            Continue For
        End If

        If element.Equals(Chr(34)) And Not foundStartQuote Then
            foundStartQuote = True
            Continue For
        End If

        If (element.Equals(Chr(44)) And foundStartQuote) Then
            'skip the comma...its between double quotes
        Else
            output.Append(element)
        End If
    Next
    Return output.ToString()
End Function

I misunderstood your question before. Now I'm fairly sure I've got it right:

TextFieldParser parser = new TextFieldParser(@"c:\file.csv");
parser.TextFieldType = FieldType.Delimited;
parser.SetDelimiters(",");
while (!parser.EndOfData) 
{
    //Processing row
    string[] fields = parser.ReadFields();
    foreach (string field in fields) 
    {
        //TODO: Do whatever you need
    }
}
parser.Close();
var result = Regex.Replace(input,
                           @"[^\""]([^\""])*[^\""]", 
                           m => m.Value.Replace(",", " ") );

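It sounds like what you're describing won't end up being a CSV file, but to answer your question, here is what I would do.

First, you need to load the text file into something you can loop over, like this: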

    public static List<String> GetTextListFromDiskFile(String fileName)
    {
        List<String> list = new List<String>();
        try
        {
            //load the file into the streamreader 
            System.IO.StreamReader sr = new System.IO.StreamReader(fileName);
            //loop through each line of the file
            while (sr.Peek() >= 0)
            {
                list.Add(sr.ReadLine());
            }
            sr.Close();
        }
        catch (Exception ex)
        {
            list.Add("Error: Could not read file from disk. Original error: " + ex.Message);
        }
        return list;
    }

Then loop through the list with a simple foreach loop and run Replace on each item, like this:

        foreach (String item in list)
        {
            String x = item.Replace("\",\"", "\" \"");
            x = x.Replace("\"", "");
        }

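Once that's done, you need to build the CSV file line by line. I would use a StringBuilder again and simply call sb.AppendLine(x) to build the string that will become the text file, then write it to disk with something like this.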
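
One way the read, clean, and write steps could fit together is sketched below, assuming the goal stated in the question (turn commas inside quotes into spaces, then drop the quotes). The names `CsvCleaner`, `CleanLine`, and `ConvertFile` are hypothetical and not part of this answer, and the quote handling is a simple toggle that does not understand escaped quotes ("").

```csharp
using System;
using System.IO;
using System.Text;

public static class CsvCleaner
{
    // Hypothetical helper: replaces commas that sit between double quotes
    // with a space and drops the double quotes themselves.
    public static string CleanLine(string line)
    {
        var sb = new StringBuilder(line.Length);
        bool inQuotes = false;
        foreach (char c in line)
        {
            if (c == '"')
            {
                inQuotes = !inQuotes;   // toggle state, do not emit the quote
            }
            else if (c == ',' && inQuotes)
            {
                sb.Append(' ');         // comma inside quotes becomes a space
            }
            else
            {
                sb.Append(c);           // everything else is copied unchanged
            }
        }
        return sb.ToString();
    }

    // Hypothetical driver: cleans every line of inputPath and writes
    // the accumulated text to outputPath in one go.
    public static void ConvertFile(string inputPath, string outputPath)
    {
        var sb = new StringBuilder();
        foreach (string line in File.ReadLines(inputPath))
        {
            sb.AppendLine(CleanLine(line));
        }
        File.WriteAllText(outputPath, sb.ToString());
    }
}
```

For example, `CsvCleaner.CleanLine("\"WARRIOR RUN,PA\",\"X\"")` returns `WARRIOR RUN PA,X`: the comma inside the quotes becomes a space, while the separator between the two fields is kept.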

    public static void SaveFileToDisk(String filePathName, String fileText)
    {
        using (StreamWriter outfile = new StreamWriter(filePathName))
        {
            outfile.Write(fileText);
        }
    }

This worked for me. I hope it helps someone else.

Private Sub Command1_Click()
Open "c:\dir\file.csv" For Input As #1
Open "c:\dir\file2.csv" For Output As #2
Do Until EOF(1)
Line Input #1, test$
99
c = InStr(test$, """""")
If c > 0 Then
test$ = Left$(test$, c - 1) + Right$(test$, Len(test$) - (c + 1))
GoTo 99
End If
Print #2, test$
Loop
End Sub

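I would do all of this before you start processing line by line. Also, take a look at CsvHelper. It is quick and easy. Just put the results into a TextReader and pass that to the CsvReader.

Here is your stripper for commas inside double quotes, followed by the double-quote stripper.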

        using (TextReader reader = File.OpenText(file))
        {
            // replace commas inside each quoted field ("...") with spaces;
            // [^""]* keeps a match from running past the field's closing quote
            var pattern = @"""[^""]*""";
            var results = Regex.Replace(reader.ReadToEnd(), pattern, match => match.Value.Replace(",", " "));
            results = results.Replace("\"", "");
         }