循环使用双引号,但忽略单引号中的双引号

本文关键字:单引号 循环 | 更新日期: 2023-09-27 18:36:31

我遇到了我猜想的逻辑问题。我用 C# 编码,但欢迎使用通用伪代码解决方案。

我有这个文本文件,例如,包含这个文本:

blah "hello john"
blah 'the code is "flower" '
blah "good night"
我想遍历双引号

并对它们做一些事情,但我想忽略单引号中包含的双引号。我得到了开头双引号和结尾双引号的位置(string data包含文本文件的内容):

C#

// Start searching from beginning
int quotestart = 0, quoteend = 0;
while (data.IndexOf('"', quotestart) != -1)
{
  // Get opening double quote
  quotestart = data.IndexOf('"', quotestart);
  // Get ending double quote
  quoteend = data.IndexOf('"', quotestart + 1);
  string sub = data.Substring(quotestart + 1, quoteend - quotestart - 1);
  Console.WriteLine(sub);
  // Set the start position for the next round
  quotestart = quoteend + 1;
}

使用我的代码,输出将是:

hello john
flower
good night

因为"花"在单引号内,我希望我的输出是:

hello john
good night

编辑

我目前正在研究一种方法,首先用"A"填充单引号之间的所有数据。这样,当我遍历双引号时,单引号之间的任何数据都将被忽略。不确定这是否是正确的方法。

循环使用双引号,但忽略单引号中的双引号

尝试在谷歌上搜索有限状态机,但没有接受过计算机工程方面的正式培训,我必须承认我有点迷茫。您还有其他指示吗?

FSM 是最简单的计算机形式之一。这个想法是,您拥有有限数量的"状态"信息和稳定的输入流。每个输入都会导致状态以可预测的方式更改,仅基于当前状态和当前输入,并导致发生可预测的输出

因此,假设您的输入是单个字符,而输出是单个字符或"空"字符。下面是一个 FSM,它可以执行您想要的操作:

  • 国家是OUTSIDEINSIDEDOUBLEINSIDESINGLE
  • 输入"'x。 (WOLOG让x代表任何其他角色。

我们有三个状态和三个输入,所以有九种可能的组合。

  • 如果我们OUTSIDE并得到x,请保持OUTSIDE并发出null.
  • 如果我们OUTSIDE并得到",请INSIDEDOUBLE并发出null.
  • 如果我们OUTSIDE并得到',请INSIDESINGLE并发出null.
  • 如果我们INSIDEDOUBLE并得到x,请保持INSIDEDOUBLE并发出x
  • 如果我们INSIDEDOUBLE并得到",请OUTSIDE并发出null
  • 如果我们INSIDEDOUBLE并得到',请保持INSIDEDOUBLE并发出'
  • 如果我们INSIDESINGLE并得到x,请保持INSIDESINGLE并发出null
  • 如果我们INSIDESINGLE并得到",请保持INSIDESINGLE并发出null
  • 如果我们INSIDESINGLE并得到',请OUTSIDE并发出null

唯一剩下的就是说开始状态是OUTSIDE

因此,假设输入是1 " 2 " 3 ' 4 " 5 " ' 6。状态和输出包括:

  • OUTSIDE 1,发出null,保持OUTSIDE
  • OUTSIDE得到",发出nullINSIDEDOUBLE
  • INSIDEDOUBLE 2,发出2,保持INSIDEDOUBLE
  • INSIDEDOUBLE ",发出nullOUTSIDE
  • OUTSIDE 3,发出null,保持OUTSIDE
  • OUTSIDE ',发出nullINSIDESINGLE

。自己填写其余部分。

这足以让你写代码吗?

不错的解决方案;对于小型FSM,使用switch语句是一种传统的方法,但是当状态和输入的数量变得庞大而复杂时,它变得笨拙。下面是一个更易于扩展的替代解决方案:表驱动解决方案。 也就是说,将有关转换和操作的事实放入数组中,然后 FSM 只不过是一系列数组查找:

// States
const int Outside = 0;
const int InDouble = 1;
const int InSingle = 2;
// Inputs
const int Other = 0;
const int DoubleQuote = 1;
const int SingleQuote = 2;
static readonly int[,] stateTransitions =
{   /*               Other     DoubleQ   SingleQ */
    /* Outside */  { Outside,  InDouble, InSingle },
    /* InDouble */ { InDouble, Outside,  InDouble },
    /* InSingle */ { InSingle, InSingle, Outside }
};
// Do we emit the character or ignore it?
static readonly bool[,] actions =
{   /*              Other   DoubleQ SingleQ */
    /* Outside */ { false,  false,  false },
    /* InDouble */{ true,   false,  true  },
    /* InSingle */{ false,  false,  false }
};
static int Classify(char c)
{
    switch (c)
    {
        case '''': return SingleQuote;
        case ''"': return DoubleQuote;
        default: return Other;
    }
}
static IEnumerable<char> FSM(IEnumerable<char> inputs)
{
    int state = Outside;
    foreach (char input in inputs)
    {
        int kind = Classify(input);
        if (actions[state, kind]) 
            yield return input;
        state = stateTransitions[state, kind];
    }
}

现在我们可以得到结果

string.Join("", FSM(@"1""2'3""4""5'6""7'8""9""A'B"))

非常感谢 Eric Lippert 提供此解决方案背后的逻辑。我在下面提供我的 C# 解决方案,以防有人需要它。它有一些不必要的重新分配,我只是为了清楚起见而留下的。

string state = "outside";
for (int i = 0; i < data.Length; i++)
{
    c = data[i];
    switch (state)
    {
        case "outside":
            switch (c)
            {
                case '''':
                    state = "insidesingle";
                    break;
                case '"':
                    state = "insidedouble";
                    break;
                default:
                    state = "outside";
                    break;
            }
            break;
        case "insidedouble":
            switch (c)
            {
                case '''':
                    state = "insidedouble";
                    Console.Write(c);
                    break;
                case '"':
                    state = "outside";
                    break;
                default:
                    state = "insidedouble";
                    Console.Write(c);
                    break;
            }
            break;  
        case "insidesingle":
            switch (c)
            {
                case '''':
                    state = "outside";
                    break;
                case '"':
                    state = "insidesingle";
                    break;
                default:
                    state = "insidesingle";
                    break;
            }
            break;
    }
}

只是为了好玩,我决定使用一个名为无状态的非常轻量级的 FSM 库来解决这个问题。

下面是使用此库时代码的外观。

就像 Eric 的解决方案一样,下面的代码可以轻松更改以满足新的要求。

void Main()
{
    Console.WriteLine(string.Join("", GetCharacters(@"1""2'3""4""5'6""7'8""9""A'B")));  
}
public enum CharacterType
{
    Other,
    SingleQuote,
    DoubleQuote
}
public enum State
{
    OutsideQuote,
    InsideSingleQuote,
    InsideDoubleQuote
}
public IEnumerable<char> GetCharacters(string input)
{
    //Initial state of the machine is OutSideQuote.
    var sm = new StateMachine<State, CharacterType>(State.OutsideQuote);
    //Below, we configure state transitions.
    //For a given state, we configure how CharacterType 
    //transitions state machine to a new state.
    sm.Configure(State.OutsideQuote)
        .Ignore(CharacterType.Other)        
        //If you are outside quote and you receive a double quote, 
        //state transitions to InsideDoubleQuote.
        .Permit(CharacterType.DoubleQuote, State.InsideDoubleQuote)
        //If you are outside quote and you receive a single quote,
        //state transitions to InsideSingleQuote.
        //Same logic applies for other state transitions below.
        .Permit(CharacterType.SingleQuote, State.InsideSingleQuote);
    sm.Configure(State.InsideDoubleQuote)
        .Ignore(CharacterType.Other)
        .Ignore(CharacterType.SingleQuote)
        .Permit(CharacterType.DoubleQuote, State.OutsideQuote);
    sm.Configure(State.InsideSingleQuote)
        .Ignore(CharacterType.Other)
        .Ignore(CharacterType.DoubleQuote)
        .Permit(CharacterType.SingleQuote, State.OutsideQuote);
    foreach (var character in input)
    {
        var characterType = GetCharacterType(character);
        sm.Fire(characterType);
        if(sm.IsInState(State.InsideDoubleQuote) && characterType != CharacterType.DoubleQuote)
            yield return character;
    }       
}
public CharacterType GetCharacterType(char input)
{
    switch (input)
    {
        case '''': return CharacterType.SingleQuote;
        case ''"': return CharacterType.DoubleQuote;
        default: return CharacterType.Other;
    }
}