如何通过多个键实现快速查找

本文关键字:查找 实现 何通过 | 更新日期: 2023-09-27 18:00:58

我想创建一个带有可选键的快速查找(dictionary?),例如,假设我有3个键:"first_name"、"last_name"answers"zipcode"

所以我希望能够做以下(伪代码):

GetValue(first_name) -- would return a list of everyone with that first name
GetValue(first_name, last_name) -- would return a list of everyone with that first name & last name
GetValue(zipcode, first_name) -- would return a list of everyone with that first_name in the specified zipcode

我应该能够查询出那些键的所有排列。您将使用什么数据结构?你将如何实现这一点?

如何通过多个键实现快速查找

您仍然可以使用常规字典,其中键可以是这样的自定义类型:

public class CompositeKey
{
     public CompositeKey(string firstName, string lastName, string zipCode)
     {
           FirstName = firstName;
           LastName = lastName;
           ZipCode = zipCode;
     }
     public string FirstName { get; }
     public string LastName { get; }
     public string ZipCode { get; }
}

现在,我将覆盖CompositeKey上的EqualsGetHashCode,以提供使复合密钥唯一的原因,从而使Dictionary<TKey, TValue>能够存储唯一的复合密钥。

最后,我可以这样查询字典:

var value = dict[new CompositeKey(firstName: "Matías", lastName: "Fidemraizer" )];

OP在一些评论中提出了这个问题:

我考虑过这种方法,但你会如何查询词典仅适用于"FirstName="Matias"?

由于您覆盖了EqualsGetHashCode,因此可以将所有组合作为关键字添加到整个字典中,并且它们都可以共存:

Person person = new Person { /* Set members here */ }
// Note that I'll add many keys that have the same value
dict.Add(new CompositeKey(name: "Matías"), person);
dict.Add(new CompositeKey(lastName: "Fidemraizer"), person);
dict.Add(new CompositeKey(firstName: "Matías", lastName: "Fidemraizer"), person);

每个键都会产生不同的哈希代码,这样它们就可以共存于同一个字典中,并且它们将提供一个强大的工具来通过许多标准和条件组合进行查询。

另一种方法

另一种方法可以是使用多个字典,其中它们的键是使用某种约定的整个值的级联,并且这些值是整个类的实例:

Dictionary<string, Person> names = new Dictionary<string, Person>();
names.Add("matias", new Person { /* Set members here */ });
Dictionary<string, Person> names = new Dictionary<string, Person>();
names.Add("matias:fidemraizer", new Person { /* Set members here */ });
// And so on, for every criteria you want to search...

稍后,您将实现一个代理,以根据给定的条件确定要查询的字典。

Redis呢

实际上,你应该看看Redis,它是一个键值存储,具有复杂的数据结构,如散列、集合、排序集等。也就是说,您可以使用Redis集中和分发缓存,并且您的缓存可能会被许多应用程序使用。

它的使用和安装非常简单(它是一个小于10MB的可执行文件…)

@狗仔队对字典法提出了质疑

他说:

另一个名字相同的人呢?

如果OP需要考虑这种情况(是的,这不是例外情况,所以值得考虑!),那么OP似乎需要将数据存储在字典中,其中键是整个组合键,值应该是List<Person>HashSet<Person>甚至LinkedList<Person>

此外,这意味着一个密钥(slot)将能够存储许多人,并且像get a person with first name"Matías"这样的查询将始终返回IEnumerable<Person>的实现(list、hash、linkedlist…),其中返回的整个集合将是finded persons:

KeyValuePair<CompositeKey, ISet<Person>> result;
if(dictionary.TryGetValue(new CompositeKey(firstName: "Matías"), out result))
{
    // I've got either one or many results and I'll decide what to do in
    // that case!
}

此外,这种增强的方法还有另一个可能的问题。当您使用像new CompositeKey(firstName: "Matías")这样的复合关键字进行查询,并且整个字典存储区可能存储了多个名字为"Matías"的人时,您将得到一个ISet<Person>IList<Person>LinkedList<Person>

获得一个或多个结果的第一次搜索具有复杂度O(1)(恒定时间),因为整个复合密钥是基于其哈希代码存储的,但第一次搜索的返回结果不再是字典,对它们的任何搜索都将是O(N)(获得的项目越多,查找结果所需的时间就越多

顺便说一句,如果你试图用名字来寻找一个人,这是因为你知道你可以得到的不仅仅是一个结果,除非字典中只存储了一个完整名字的人,否则你不能指望找到一个。

因此,如果结果的计数大于1,您似乎需要消除歧义,这可以通过提供一个超过名字的组合键来执行另一个O(1)搜索,也可以通过一些人类用户(或人工智能…)手动选择其中一个结果来完成。

总结:

  • 如果你通过提供一个组成部分来寻找一个人,你就冒着得到更多结果的风险。然后,如果是一个带有UI或某种人工智能的应用程序,则根本不应该进行搜索,而是直接从结果中选择一个项目(这是一个复杂度为O(1)的操作):
KeyValuePair<CompositeKey, ISet<Person>> result;
if(dictionary.TryGetValue(new CompositeKey(firstName: "Matías"), out result))
{
    if(result.Value.Count > 1)
    {
         // Here you would show the user what you've found in the UI
         // and the whole user would choose one of the results directly,
         // which is an operation with O(1) complexity 
    }
    else if(result.Value.Count <= 1)
    {
         // OK, I got 0 or 1 result, this is easier than I thought! ;)
    }
}
  • 如果你通过提供一个组件来寻找一个人,一旦你的应用程序意识到它得到了不止一个结果,它就可以自动提供另一个组件,你不会对结果执行搜索,但你会提供一个新的组合键,为主字典提供更多的组件,幸运的是,你会得到一个结果
public KeyValuePair<CompositeKey, ISet<Person>> SearchPerson(CompositeKey key)
{
    KeyValuePair<CompositeKey, ISet<Person>> result;
    if(dictionary.TryGetValue(new CompositeKey(firstName: "Matías"), out result))
    {
        if(result.Value.Count > 1)
        {
            // Oops! More than one result..... BUT I already know another 
            // component that will make the whole key absolutely unique, so
            // I'll call this method recursively to specialize the search even
            // more. Obviously, I've hardcoded the ZIP code as a sample, but
            // in a real-world case, who knows from where I would get this 
            // ZIP code... Maybe from some geolocalization query based on current
            // user's location?
            // Wait, it might happen that a person called Matías could live
            // in a location near be so this other person would have stored
            // the same ZIP code... Well, this goes outside the scope of this
            // Q&A. It's just an example of what to do, in an actual application
            // there should be many other choices to disambiguate persons
            // automatically...
            return SearchPerson(new CompositeKey(firstName: key.FirstName, zipCode: "03984"));
        }
        else if(result.Value.Count <= 1)
        {
             // OK, I got 0 or 1 result, this is easier than I thought! ;)
        }
    }
}

您可以使用3个Lookup s:

var FirstNamesLookup = data.ToLookup(x => Tuple.Create(x.FirstName), x => x);
var FirstAndLastLookup = data.ToLookup(x => Tuple.Create(x.FirstName, x.LastName), x => x);
var FirstAndZipLookup = data.ToLookup(x => Tuple.Create(x.FirstName, x.zipCode), x => x);

具有特定名字的所有记录:

var matches = FirstNamesLookup[Tuple.Create("SomeName")].ToList();

具有特定名字和姓氏的所有记录:

var matches = FirstAndLastLookup[Tuple.Create("SomeName", "SomeLastName")].ToList();

第三种情况也是如此。

您应该只使用任何通用集合类型,并使用LINQ进行查找:

var addresses = Enumerable.Empty<Address>();
// would return a list of everyone with that first name
addresses.Where(x => x.FirstName == "firstname");
// would return a list of everyone with that first name & last name
addresses.Where(x => x.FirstName == "firstname" && x.LastName == "lastname");
// would return a list of everyone with that first_name in the specified zipcode
addresses.Where(x => x.FirstName == "firstname" && x.ZipCode == "zipcode");

基于这里的所有想法和我被限制使用的事实。NET 2.0我是这样做的。包括它只是为了完整性,以防有人再次遇到这个问题。[不会将其标记为答案,因为它基于上面@Matias的许多想法,因此将他的答案标记为对最终解决方案贡献最大的答案]:

public class PersonKey {
    public string FirstName { get; private set; }
    public string LastName { get; private set; }
    public int Zipcode { get; private set; }
    public PersonKey() {
        FirstName = null;
        LastName = null;
        Zipcode = int.MinValue;
    }
    public PersonKey(int Zipcode, string FirstName) : this() {
        this.FirstName = FirstName;
        this.Zipcode = Zipcode;
    }
    public PersonKey(string LastName, string FirstName) : this() {
        this.FirstName = FirstName;
        this.LastName = LastName;
    }
    public PersonKey(int Zipcode, string LastName, string FirstName) {
        this.Zipcode = Zipcode;
        this.LastName = LastName;
        this.FirstName = FirstName;
    }
    public List<string> KeyList {
        get {
            var keyLst = new List<string>();
            if (!String.IsNullOrEmpty(FirstName))
                keyLst.Add("FirstName:" + FirstName);
            if (!String.IsNullOrEmpty(LastName))
                keyLst.Add("LastName:" + LastName);
            if (Zipcode != int.MinValue)
                keyLst.Add("Zipcode:" + Zipcode);
            return keyLst;
        }
    }
    public string Key {
        get {
            return MakeKey(KeyList.ToArray());
        }
    }
    public List<string[]> AllPossibleKeys {
        get {
            return CreateSubsets(KeyList.ToArray());
        }
    }
    List<T[]> CreateSubsets<T>(T[] originalArray) {
        List<T[]> subsets = new List<T[]>();
        for (int i = 0; i < originalArray.Length; i++) {
            int subsetCount = subsets.Count;
            subsets.Add(new T[] { originalArray[i] });
            for (int j = 0; j < subsetCount; j++) {
                T[] newSubset = new T[subsets[j].Length + 1];
                subsets[j].CopyTo(newSubset, 0);
                newSubset[newSubset.Length - 1] = originalArray[i];
                subsets.Add(newSubset);
            }
        }
        return subsets;
    }
    internal string MakeKey(string[] possKey) {
        return String.Join(",", possKey);
    }

然后我创建一个像这样的缓存:

//declare the cache
        private Dictionary<string, List<Person>> _lookup = new Dictionary<string, List<Person>>();

当我从数据库中读取记录时,我将其存储在缓存中,如下所示:

  var key = new PersonKey(person.ZipCode, person.LastName, person.FirstName);
  List<Person> lst;
  foreach (var possKey in key.AllPossibleKeys) {
      var k = key.MakeKey(possKey);
      if (!_lookup.TryGetValue(k, out lst)) {
           lst = new List<Person>();
           _lookup.Add(k, lst);
      }
      lst.Add(person);
   }

从缓存中查找非常简单:

  List<Person> lst;
  var key = new PersonKey(lastName, firstName);  //more constructors can be added in PersonKey
   _lookup.TryGetValue(key.Key, out lst);