How can I compare two strings and indicate which part differs?

Last updated: 2023-09-27 18:03:14

For example, if I have…

string a = "personil";
string b = "personal";

I want to get…

string c = "person[i]l";

But the difference is not necessarily a single character. I could also have something like this…

string a = "disfuncshunal";
string b = "dysfunctional";

In this case, I would like to get…

string c = "d[isfuncshu]nal";

Another example (note that the two words have different lengths):

string a = "parralele";
string b = "parallel";
string c = "par[ralele]";

Another example…

string a = "ato";
string b = "auto";
string c = "a[]to";

How can I do this?

Edit: The two strings can have different lengths.

Edit: Added extra examples. Thanks to Nenad for asking.

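I must be very bored today, but I actually wrote a unit test that passes all 4 cases from the question (unless you have added more in the meantime).

Edit: Added 2 edge cases and fixed them.

Edit 2: Handles a letter that repeats several times in a row.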

[Test]
[TestCase("parralele", "parallel", "par[ralele]")]
[TestCase("personil", "personal", "person[i]l")]
[TestCase("disfuncshunal", "dysfunctional", "d[isfuncshu]nal")]
[TestCase("ato", "auto", "a[]to")]
[TestCase("inactioned", "inaction", "inaction[ed]")]
[TestCase("refraction", "fraction", "[re]fraction")]
[TestCase("adiction", "addiction", "ad[]iction")]
public void CompareStringsTest(string attempted, string correct, string expectedResult)
{
    int first = -1, last = -1;
    string result = null;
    int shorterLength = (attempted.Length < correct.Length ? attempted.Length : correct.Length);
    // First - [
    for (int i = 0; i < shorterLength; i++)
    {
        if (correct[i] != attempted[i])
        {
            first = i;
            break;
        }
    }
    // Last - ]
    var a = correct.Reverse().ToArray();
    var b = attempted.Reverse().ToArray();
    for (int i = 0; i < shorterLength; i++)
    {
        if (a[i] != b[i])
        {
            last = i;
            break;
        }
    }
    if (first == -1 && last == -1)
        result = attempted;
    else
    {
        var sb = new StringBuilder();
        if (first == -1)
            first = shorterLength;
        if (last == -1)
            last = shorterLength;
        // If same letter repeats multiple times (ex: addition)
        // and error is on that letter, we have to trim trail.
        if (first + last > shorterLength)
            last = shorterLength - first;
        if (first > 0)
            sb.Append(attempted.Substring(0, first));
        sb.Append("[");
        if (last > -1 && last + first < attempted.Length)
            sb.Append(attempted.Substring(first, attempted.Length - last - first));
        sb.Append("]");
        if (last > 0)
            sb.Append(attempted.Substring(attempted.Length - last, last));
        result = sb.ToString();
    }
    Assert.AreEqual(expectedResult, result);
}

Have you tried my DiffLib?

With that library and the following code (run in LINQPad):

void Main()
{
    string a = "disfuncshunal";
    string b = "dysfunctional";
    var diff = new Diff<char>(a, b);
    var result = new StringBuilder();
    int index1 = 0;
    int index2 = 0;
    foreach (var part in diff)
    {
        if (part.Equal)
            result.Append(a.Substring(index1, part.Length1));
        else
            result.Append("[" + a.Substring(index1, part.Length1) + "]");
        index1 += part.Length1;
        index2 += part.Length2;
    }
    result.ToString().Dump();
}

you get the following output:

d[i]sfunc[shu]nal

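To be honest, I don't understand what this gives you, since you seem to ignore the changed parts of string b completely and only dump the relevant parts of string a.

Here is a complete console application intended for the first two examples you gave, where both strings have the same length: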

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace ConsoleApplication2
{
    class Program
    {
        static void Main(string[] args)
        {
            string a = "disfuncshunal";
            string b = "dysfunctional";
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < a.Length; i++)
            {
                if (a[i] != b[i])
                {
                    sb.Append("[");
                    sb.Append(a[i]);
                    sb.Append("]");
                    continue;
                }
                sb.Append(a[i]);
            }
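            var str = sb.ToString();
            var startIndex = str.IndexOf("[");
            var endIndex = str.LastIndexOf("]");
            if (startIndex < 0)
            {
                // No differences were found: print the string unchanged.
                Console.WriteLine(str);
                return;
            }
            // Keep the outermost pair of brackets and strip the inner ones.
            var start = str.Substring(0, startIndex + 1);
            // Substring takes a length, not an end index.
            var mid = str.Substring(startIndex + 1, endIndex - startIndex - 1);
            var end = str.Substring(endIndex);
            Console.WriteLine(start + mid.Replace("[", "").Replace("]", "") + end);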
        }
    }
}

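It will not work if you want to show several separate mismatched sections of the word.

You did not specify what to do when the strings differ in length, but here is a solution for the case where they have the same length: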

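private string Compare(string string1, string string2) {
    // This only works if the two strings are the same length.
    var output = new System.Text.StringBuilder();
    bool mismatch = false;
    for (int i = 0; i < string1.Length; i++) {
        char c1 = string1[i];
        char c2 = string2[i];
        if (c1 == c2) {
            // Close the bracket when a run of mismatches ends.
            if (mismatch) {
                output.Append(']');
                mismatch = false;
            }
            output.Append(c1);
        } else {
            // Open a bracket when a run of mismatches starts.
            if (!mismatch) {
                output.Append('[');
                mismatch = true;
            }
            output.Append(c1);
        }
    }
    // Close a run of mismatches that reaches the end of the string.
    if (mismatch) {
        output.Append(']');
    }
    return output.ToString();
}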

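Not a great approach, but as an exercise in using LINQ: the task seems to be to find the matching prefix and suffix of the 2 strings and return "prefix + [ + middle of the first string + ] + suffix".

So you can match the prefix (Zip + TakeWhile(a==b)), then repeat the same operation for the suffix by reversing both strings and reversing the result.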

var first = "disfuncshunal";
var second = "dysfunctional";
// Prefix
var zipped = first.ToCharArray().Zip(second.ToCharArray(), (f,s)=> new {f,s});
var prefix = string.Join("", 
    zipped.TakeWhile(c => c.f==c.s).Select(c => c.f));
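// Suffix: skip the matched prefix first so that prefix and suffix cannot overlap
// (otherwise "adiction" vs "addiction" gives a negative middle length).
var zippedReverse = first.Skip(prefix.Length).Reverse()
   .Zip(second.Skip(prefix.Length).Reverse(), (f,s)=> new {f,s});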
var suffix = string.Join("", 
    zippedReverse.TakeWhile(c => c.f==c.s).Reverse().Select(c => c.f));
// Cut and combine.
var middle = first.Substring(prefix.Length,
      first.Length - prefix.Length - suffix.Length);
var result = prefix + "[" + middle + "]" + suffix;

An easier and faster way is to use 2 for loops (one from the start towards the end, one from the end towards the start).
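
The two-loop idea can be sketched roughly as follows. This is only one possible reading of it, not code from any of the answers: the class name StringDiff and the method name MarkDifference are made up for illustration, and the loops are written as while loops for brevity. The second loop is capped so that the suffix never reaches into the prefix already matched, which is what keeps cases like "adiction" vs "addiction" from producing a negative length.

```csharp
using System;

public static class StringDiff
{
    // Hypothetical helper: brackets the part of `attempted` that lies between
    // the common prefix and the (non-overlapping) common suffix of both strings.
    public static string MarkDifference(string attempted, string correct)
    {
        if (attempted == correct)
            return attempted;

        int shorter = Math.Min(attempted.Length, correct.Length);

        // Loop 1: walk forward while the characters match.
        int prefix = 0;
        while (prefix < shorter && attempted[prefix] == correct[prefix])
            prefix++;

        // Loop 2: walk backward while the characters match,
        // but never into the prefix that was already matched.
        int suffix = 0;
        while (suffix < shorter - prefix
               && attempted[attempted.Length - 1 - suffix] == correct[correct.Length - 1 - suffix])
            suffix++;

        string middle = attempted.Substring(prefix, attempted.Length - prefix - suffix);
        return attempted.Substring(0, prefix)
             + "[" + middle + "]"
             + attempted.Substring(attempted.Length - suffix);
    }
}
```

With this sketch, StringDiff.MarkDifference("disfuncshunal", "dysfunctional") should return "d[isfuncshu]nal", and identical inputs are returned without brackets. Like the other prefix/suffix answers, it marks a single differing region only.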