Comparing two ArrayLists of a custom class to find duplicates

Keywords: list, array, two, custom, compare | Updated: 2023-09-27 18:06:53

I have two ArrayLists:

public class ProductDetails
{
    public string id;
    public string description;
    public float rate;
}
ArrayList products1 = new ArrayList();
ArrayList products2 = new ArrayList();
ArrayList duplicateProducts = new ArrayList();

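What I want is to get all products (with all the fields of the ProductDetails class) in products1 and products2 that have a duplicate description.

I could run two nested for/while loops in the traditional way, but that would be very slow, especially if the lists hold more than 10k elements.

So maybe something can be done with LINQ.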


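If you want to use LINQ, you need to write your own equality comparer that implements IEqualityComparer<ProductDetails>, providing both the Equals and GetHashCode() methods: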

public class ProductDetails
{
    public string id { get; set; }
    public string description { get; set; }
    public float rate { get; set; }
}

public class ProductComparer : IEqualityComparer<ProductDetails>
{
    public bool Equals(ProductDetails x, ProductDetails y)
    {
        // Check whether the objects are the same object.
        if (Object.ReferenceEquals(x, y)) return true;
        // Check whether the products' properties are equal.
        // The == operator on strings compares values and tolerates null.
        return x != null && y != null && x.id == y.id && x.description == y.description;
    }

    public int GetHashCode(ProductDetails obj)
    {
        // Get hash code for the description field if it is not null.
        int hashProductDesc = obj.description == null ? 0 : obj.description.GetHashCode();
        // Get hash code for the id field if it is not null.
        int hashProductId = obj.id == null ? 0 : obj.id.GetHashCode();
        // Calculate the hash code for the product.
        return hashProductDesc ^ hashProductId;
    }
}

Now, suppose you have objects like these:

ProductDetails[] items1 = { new ProductDetails { description = "aa", id = "9", rate = 2.0f },
                            new ProductDetails { description = "b", id = "4", rate = 2.0f } };
ProductDetails[] items2 = { new ProductDetails { description = "aa", id = "9", rate = 1.0f },
                            new ProductDetails { description = "c", id = "12", rate = 2.0f } };

IEnumerable<ProductDetails> duplicates =
    items1.Intersect(items2, new ProductComparer());
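
The comparer above treats two products as equal only when both id and description match, while the question asks about duplicate descriptions alone. A minimal sketch of a description-only variant follows. The names DescriptionComparer and IntersectDemo are made up for this illustration and are not part of the original answer. Note that Intersect returns the distinct matching elements from the first sequence only, so the matching entries of the second list are not included in the result.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class ProductDetails
{
    public string id { get; set; }
    public string description { get; set; }
    public float rate { get; set; }
}

// Hypothetical comparer: products are "equal" when their descriptions match.
public class DescriptionComparer : IEqualityComparer<ProductDetails>
{
    public bool Equals(ProductDetails x, ProductDetails y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (ReferenceEquals(x, null) || ReferenceEquals(y, null)) return false;
        // Ordinal comparison; two null descriptions count as equal here.
        return string.Equals(x.description, y.description, StringComparison.Ordinal);
    }

    public int GetHashCode(ProductDetails obj)
    {
        // Must agree with Equals: only the description contributes to the hash.
        if (obj == null || obj.description == null) return 0;
        return obj.description.GetHashCode();
    }
}

public static class IntersectDemo
{
    // Returns the products of 'first' whose description also occurs in 'second'.
    public static List<ProductDetails> DuplicatesByDescription(
        IEnumerable<ProductDetails> first, IEnumerable<ProductDetails> second)
    {
        return first.Intersect(second, new DescriptionComparer()).ToList();
    }
}
```

With this comparer, a product with description "aa" in the first list is reported even if its id or rate differs from the matching product in the second list.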

Consider overriding the System.Object.Equals method.

public class ProductDetails
{
    public string id;
    public string description;
    public float rate;

    public override bool Equals(object obj)
    {
        // Anything that is not a ProductDetails (including null) is not equal.
        if (!(obj is ProductDetails))
            return false;
        if (ReferenceEquals(obj, this))
            return true;
        ProductDetails p = (ProductDetails)obj;
        return description == p.description;
    }
}

Filtering is then as simple as this:

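// ArrayList is non-generic, so Cast<ProductDetails>() is needed before LINQ operators.
var result = products1.Cast<ProductDetails>().Where(product => products2.Contains(product));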

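Edit:

Please note that this implementation is not optimal: Contains scans products2 once for every element of products1.

Also, in the comments on your question, someone suggested using a database. That may improve performance, depending on the database implementation; in any case, the overhead would not be yours.

However, you can use a Dictionary or HashSet to optimize this code. Override the System.Object.GetHashCode method: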

public override int GetHashCode()
{
  return description.GetHashCode();
}

You can now do this:

var hashSet = new HashSet<ProductDetails>(products1.Cast<ProductDetails>());
var result = products2.Cast<ProductDetails>().Where(product => hashSet.Contains(product));

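This will improve your performance, because each lookup becomes cheaper: a HashSet lookup takes roughly constant time on average, instead of a linear scan of the list.

10k elements is nothing, but make sure you use the right collection type. ArrayList is no longer recommended; use List<ProductDetails> instead.

The next step is to implement proper Equals and GetHashCode overrides for your class. The assumption here is that description is the key, since that is what you care about when looking for duplicates: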

public class ProductDetails
{
    public string id;
    public string description;
    public float rate;

    public override bool Equals(object obj)
    {
        var p = obj as ProductDetails;
        // Compare against the cast instance p; obj is typed as object and has no description member.
        return ReferenceEquals(p, null) ? false : description == p.description;
    }

    public override int GetHashCode() => description.GetHashCode();
}

Now we have options. A simple and efficient one is to use a hash set:

var set = new HashSet<ProductDetails>();
var products1 = new List<ProductDetails>();  // fill it
var products2 = new List<ProductDetails>();  // fill it
// shove everything in the first list in the set
foreach(var item in products1)
    set.Add(item);
// and simply test the elements in the second list
foreach(var item in products2)
    if(set.Contains(item))
    {
        // item.description was already used in products1, handle it here
    }

This gives you linear (O(n)) time complexity, which is the best you can get.
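
The question asks for the products from both lists that share a description. One way to sketch that, without overriding Equals on the class at all, is to hash the description strings themselves. The helper name DuplicateFinder.FindDuplicates is hypothetical, and the sketch assumes the data is already held in List<ProductDetails> as recommended above. Products whose description is null in both lists would be treated as duplicates of each other here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class ProductDetails
{
    public string id;
    public string description;
    public float rate;
}

public static class DuplicateFinder
{
    // Returns every product, from either list, whose description occurs in both lists.
    public static List<ProductDetails> FindDuplicates(
        List<ProductDetails> products1, List<ProductDetails> products2)
    {
        // Collect the distinct descriptions of each list.
        var shared = new HashSet<string>(products1.Select(p => p.description));
        var second = new HashSet<string>(products2.Select(p => p.description));

        // Keep only the descriptions present in both lists.
        shared.IntersectWith(second);

        // One pass over both lists; each lookup is O(1) on average.
        return products1.Concat(products2)
                        .Where(p => shared.Contains(p.description))
                        .ToList();
    }
}
```

Building the two sets and filtering are each a single pass, so the overall cost stays linear in the combined size of the lists, and the result keeps the elements of products1 first, followed by those of products2.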