Regex split while reading from a file

Keywords: Regex, split, read, file | Updated: 2023-09-27 18:06:16

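I have a text file that I am reading line by line.

I want to split each line on ",".

But I want to skip any comma that sits inside double quotes.

I tried using a regex, but it does not work correctly.

How can I do this?

The file contents are: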

"Mobile","Custom1","Custom2","Custom3","First Name"
"61402818083","service","in Portsmith","is","First Name"
"61402818083","service","in Parramatta Park","is","First Name"
"61402818083","services","in postcodes 3000, 4000","are","First Name"
"61402818083","services","in postcodes 3000, 4000, 5000","are","First Name"
"61402818083","services",,"are","First Name"

The regular expression is as follows:

,(?=([^\"]*\"[^\"]*\")*[^\"]*$)

For line 5 of the file, this regex produces the following output:

"61402818083"
,"First Name"
"services"
,"First Name"
"in postcodes 3000, 4000, 5000"
,"First Name"
"are"
"First Name"
"First Name"

The result should be:

"61402818083"
"services"
"in postcodes 3000, 4000, 5000"
"are"
"First Name"

Don't reinvent the wheel. It looks like you are trying to parse a comma-separated file (even if its extension is not .csv). Try this:

using Microsoft.VisualBasic.FileIO; // namespace that contains TextFieldParser
using (TextFieldParser reader = new TextFieldParser(@"c:\yourpath\file.csv"))
{
    reader.TextFieldType = FieldType.Delimited;
    reader.SetDelimiters(",");
    while (!reader.EndOfData) 
    {
        //Processing a line of the file
        string[] fields = reader.ReadFields();
        // now fields contains 5 elements, e.g.
        // fields[0] = "61402818083"
        // fields[1] = "services"
        // fields[2] = "in postcodes 3000, 4000, 5000"
        // fields[3] = "are"
        // fields[4] = "First Name"
    }
}

Note

You need to add a reference to Microsoft.VisualBasic to your project.

using System;
using System.Text.RegularExpressions;
public class Program
{
    public static void Main()
    {
        string line = "\"61402818083\",\"services\",\"in postcodes 3000, 4000\",\"are\",\"First Name\"";
        var reg = new Regex("\".*?\"");
        var matches = reg.Matches(line);
        foreach (var item in matches)
        {
            Console.WriteLine(item.ToString());
        }
    }
}

Output:

"61402818083"
"services"
"in postcodes 3000, 4000"
"are"
"First Name"

https://dotnetfiddle.net/5GxxIo

Another possible solution:

using System;
using System.Text.RegularExpressions;
public class Program
{
    public static void Main()
    {
        string line = "\"61402818083\",\"services\",\"in postcodes 3000, 4000\",\"are\",\"First Name\"";
        Console.WriteLine(line);
        var reg = new Regex("(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)", RegexOptions.Compiled);
        var matches = reg.Matches(line);
        foreach (Match match in matches)
        {
            Console.WriteLine(match.Value.TrimStart(','));
        }
    }
}

https://dotnetfiddle.net/rRml2D

I think you can do this by concatenating the strings one by one.

Example (untested):

using System.IO;
using System.Text;
string line = String.Empty;
StringBuilder newString = new StringBuilder();
StreamReader file = new StreamReader("c:\\test.txt");
// Append every line of the file, separated by commas.
while((line = file.ReadLine()) != null)
{
    newString.Append(line + ",");
}
file.Close();
// TrimEnd returns a new string, so the result has to be stored.
string result = newString.ToString().TrimEnd(',');

,(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)
     ^^

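Your regex is correct. It just has an unnecessary capturing group, and that group turns out to be the culprit: Regex.Split includes any text captured by a group in the resulting array, which is where the extra ,"First Name" entries come from. Making the group non-capturing with ?: removes them. See the demo.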

https://regex101.com/r/fM9lY3/10
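
For completeness, here is a minimal sketch of how the corrected pattern might be used with Regex.Split in C#. The class name CsvSplitDemo and the helper SplitLine are hypothetical names chosen for illustration, and the sample string is line 5 of the file from the question. It assumes that quoted fields never contain an escaped double quote.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; the names are not from the original thread.
public static class CsvSplitDemo
{
    // Matches a comma only when an even number of double quotes follows it,
    // i.e. when the comma is outside every quoted field. The group is
    // non-capturing (?:...), so Regex.Split adds no captured text to the result.
    private static readonly Regex Separator =
        new Regex(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

    public static string[] SplitLine(string line)
    {
        return Separator.Split(line);
    }

    public static void Main()
    {
        string line = "\"61402818083\",\"services\",\"in postcodes 3000, 4000, 5000\",\"are\",\"First Name\"";
        // Prints one field per line, with the surrounding quotes kept.
        foreach (string field in SplitLine(line))
        {
            Console.WriteLine(field);
        }
    }
}
```

The fields keep their surrounding quotes, as in the expected result in the question; calling Trim('"') on each field would strip them if that is not wanted.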