Finding the unique rows of a jagged array

Keywords: unique, array, find | Updated: 2023-09-27 17:49:32

I have a jagged array of strings, and I need to find all the rows that are unique. For example,

[ 
 ["A","B"] , 
 ["C","D","E"], 
 ["B", "A"],
 ["E","A"] 
]

This should return rows 1 and 3, because rows 0 and 2 are duplicates of each other. How can I do this?


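First of all, as arrays, rows 0 and 2 are not duplicates. They merely contain the same set of elements. However, if you simply want to drop such rows, you can do it like this: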

using System.Collections.Generic;
using System.Linq;

string[][] GetNonDuplicates(string[][] jagged)
{
  //not a hashset, but a dictionary. A value of false means that the row 
  //has no duplicate so far, a value of true means that at least one duplicate was found
  Dictionary<string[], bool> dict = 
          new Dictionary<string[], bool>(new RowEqualityComparer());
  foreach(string[] row in jagged)
  {
    //if a duplicate is found - using the hash and the compare method
    if (dict.ContainsKey(row)) 
    {
       dict[row] = true;  //set value to true
    }
    else
    {
      dict.Add(row, false);  //first time we see this row, add it
    }
  }
  //just pop out all the keys which have a value of false
  string[][] result = dict.Where(item => !item.Value)
                          .Select(item => item.Key)
                          .ToArray();
  return result;
}
...
string[][] jagged = new []{new []{"A","B"} , 
                           new []{"C","D","E"}, 
                           new []{"B", "A"},
                           new []{"E","A"}};
string[][] nonDuplicates = GetNonDuplicates(jagged);

where RowEqualityComparer is:

class RowEqualityComparer : IEqualityComparer<string[]>
{
    public bool Equals(string[] first, string[] second)
    {
        // different lengths - different rows
        if (first.Length != second.Length)
          return false;
        //we need to copy the arrays because Array.Sort 
        //will change the original rows
        var flist = first.ToList();
        flist.Sort();
        var slist = second.ToList();
        slist.Sort();
        //loop and compare one by one
        for (int i=0; i < flist.Count; i++)
        {
            if (flist[i]!=slist[i])
              return false;
        }
        return true;
    }
    public int GetHashCode(string[] row)
    {
       //XOR of the element hashes: order-independent, so rows that hold the
       //same elements in a different order get the same hash code
       if (row.Length == 0)
         return 0;
       int hash = row[0].GetHashCode();
       for (int i = 1; i < row.Length; i++)
         hash ^= row[i].GetHashCode();
       return hash;
    }
}

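This assumes that you want to ignore both order and duplicates within a row (since you mentioned a HashSet), and that the result should contain only those arrays that have no duplicate.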

You can implement a custom IEqualityComparer<String[]> for Enumerable.GroupBy and select only the arrays that are unique (group count == 1):

class IgnoreOrderComparer : IEqualityComparer<string[]>
{
    public bool Equals(string[] x, string[] y)
    {
        if (x == null || y == null) return false;
        // true only when both arrays contain exactly the same set of strings
        return new HashSet<string>(x).SetEquals(y);
    }
    public int GetHashCode(string[] arr)
    {
        if (arr == null) return int.MinValue;
        int hash = 19;
        foreach (string s in arr.Distinct())
        {
            hash = hash + s.GetHashCode();
        }
        return hash;
    }
}

The rest is simple:

String[][] uniques = arrays.GroupBy(arr => arr, new IgnoreOrderComparer())
                           .Where(g => g.Count() == 1)
                           .Select(g => g.First())
                           .ToArray();

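Edit: here is another version that uses the same comparer and may be more efficient for small inputs. Note that it compares every row against every other row, so its cost grows quadratically with the number of rows: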

IEqualityComparer<string[]> comparer = new IgnoreOrderComparer();
String[][] uniques = arrays.Where(a1 =>
    !arrays.Any(a2 => a1 != a2 && comparer.Equals(a1, a2)))
                           .ToArray();

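As far as an algorithmic solution goes, I would:

  1. Sort your rows (you can use any ordering you like, as long as it distinguishes any two different rows).
  2. Select the rows that are not identical to either of their adjacent rows.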

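If you do this, you should be able to do what you ask in O(m*n*lg(n)), where m is the row length and n is the number of rows, since each comparison between two rows costs up to O(m).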

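Given that equal sets of values mean equal rows, you can first sort the cells within each row, which helps when sorting the list of rows. This leads to O(n * m * lg(m) + m * n * lg(n)).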
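The two steps above can be sketched roughly as follows. This is only a minimal illustration under some assumptions: the class name `SortedUniqueRows` and the helpers `Find` and `CompareKeys` are hypothetical, strings are compared ordinally, and repeated cells within a row are ignored (the "set of values" reading).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SortedUniqueRows
{
    // Hypothetical helper: lexicographic, ordinal comparison of two canonical rows.
    static int CompareKeys(string[] a, string[] b)
    {
        int common = Math.Min(a.Length, b.Length);
        for (int i = 0; i < common; i++)
        {
            int c = string.CompareOrdinal(a[i], b[i]);
            if (c != 0) return c;
        }
        return a.Length.CompareTo(b.Length);
    }

    public static string[][] Find(string[][] rows)
    {
        // Step 0 (assumption): canonical form = distinct cells in ordinal order.
        var keyed = rows
            .Select(row => (Row: row,
                            Key: row.Distinct()
                                    .OrderBy(s => s, StringComparer.Ordinal)
                                    .ToArray()))
            .ToList();

        // Step 1: sort the rows so that equal rows end up next to each other.
        keyed.Sort((a, b) => CompareKeys(a.Key, b.Key));

        // Step 2: keep a row only if it differs from both of its neighbours.
        var result = new List<string[]>();
        for (int i = 0; i < keyed.Count; i++)
        {
            bool sameAsPrev = i > 0
                && CompareKeys(keyed[i].Key, keyed[i - 1].Key) == 0;
            bool sameAsNext = i + 1 < keyed.Count
                && CompareKeys(keyed[i].Key, keyed[i + 1].Key) == 0;
            if (!sameAsPrev && !sameAsNext) result.Add(keyed[i].Row);
        }
        return result.ToArray();
    }
}
```

Note that the rows come back in sorted order rather than in their original order; if the original positions matter, the index of each row would have to be carried along with its key.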

I would compute a hash value for each row, along these lines:

[ 
 ["A","B"] , // hash of this row :10 as example 
 ["C","D","E"], // hash of this row  : 20
 ["B", "A"], // hash of this row would be 10 as well
 ["E","A"] 
]

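Since they are all strings, you can compute a hash for each element and combine these into a hash per row. The combination has to be order-independent so that ["A","B"] and ["B","A"] get the same value, and rows with equal hashes still need an element-by-element check, because different rows can collide.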
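One way to sketch this idea, with a small substitution: instead of a hand-rolled numeric hash, each row is turned into a canonical string key and a `Dictionary` does the hashing, which sidesteps collisions. The names `KeyedUniqueRows`, `KeyOf` and `Find` are hypothetical, and the separator character `'\u0001'` is assumed never to occur in the data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class KeyedUniqueRows
{
    // Canonical key: distinct cells in ordinal order, joined by a separator
    // that is assumed not to appear in any cell.
    public static string KeyOf(string[] row) =>
        string.Join("\u0001", row.Distinct().OrderBy(s => s, StringComparer.Ordinal));

    public static string[][] Find(string[][] rows)
    {
        // First pass: count how many rows share each key.
        var counts = new Dictionary<string, int>();
        foreach (var row in rows)
        {
            string key = KeyOf(row);
            counts.TryGetValue(key, out int seen); // seen stays 0 for a new key
            counts[key] = seen + 1;
        }

        // Second pass: keep the rows whose key was seen exactly once,
        // in their original order.
        return rows.Where(row => counts[KeyOf(row)] == 1).ToArray();
    }
}
```

For the example data this keeps `["C","D","E"]` and `["E","A"]`, that is rows 1 and 3, in their original order.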

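You can use HashSet for this: create one HashSet per row, then take the difference between one row and each of the others. If the difference is empty in both directions, the two rows are the same.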

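You can also use the intersection: if the intersection of two rows has as many elements as each of the two sets, the rows are the same, so neither of them is unique. A merely non-empty intersection is not enough, since ["A","B"] and ["E","A"] share "A" without being duplicates.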
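A minimal sketch of the HashSet approach could look like this. It uses `HashSet<T>.SetEquals`, which checks the difference in both directions in one call; the class name `SetUniqueRows` and the method `Find` are hypothetical, and the pairwise comparison makes it quadratic in the number of rows.

```csharp
using System.Collections.Generic;
using System.Linq;

static class SetUniqueRows
{
    public static string[][] Find(string[][] rows)
    {
        // One set per row; order and repeated cells are discarded here.
        HashSet<string>[] sets = rows.Select(r => new HashSet<string>(r)).ToArray();

        var result = new List<string[]>();
        for (int i = 0; i < rows.Length; i++)
        {
            bool hasTwin = false;
            for (int j = 0; j < rows.Length && !hasTwin; j++)
            {
                // Skip the row itself; SetEquals is true only when both sets
                // contain exactly the same elements.
                if (j != i && sets[i].SetEquals(sets[j])) hasTwin = true;
            }
            if (!hasTwin) result.Add(rows[i]);
        }
        return result.ToArray();
    }
}
```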