如何获取SQL字符串';CLR函数中的排序规则

本文关键字:函数 CLR 规则 排序 何获取 获取 字符串 SQL | 更新日期: 2023-09-27 18:29:38

我正在C#中编写一个Levenstein Distance函数来计算两个字符串之间的编辑距离。问题是,我想用不同的排序规则多次调用该方法,但在SQL到CLR接口上只有一个排序规则,这是数据库的默认排序规则。

以下是CLR函数的代码:

[SqlFunction(IsDeterministic = true, Name = "LevenshteinDistance")]
public static SqlInt64 Distance(SqlString textA, SqlString textB)
{
    // get a collation-aware comparer so string/character comparisons 
    // will match the inputs' specified collation
    var aCompareInfo = textA.CompareInfo;
    var compareOptions = ConvertCompareOptions(textA.SqlCompareOptions);
    var aLength = textA.Value.Length;
    var bLength = textB.Value.Length;
    // degenerate cases
    if (aCompareInfo.Compare(textA.Value, 0, aLength, textB.Value, 0, bLength, compareOptions) == 0) { return 0; }
    if (aLength == 0) { return bLength; }
    if (bLength == 0) { return aLength; }
    // create two work vectors of integer distances
    var previousDistances = new SqlInt64[Maximum(aLength, bLength) + 1];
    var currentDistances = new SqlInt64[Maximum(aLength, bLength) + 1];
    // initialize previousDistances (the previous row of distances)
    // this row is A[0][i]: edit distance for an empty textA
    // the distance is just the number of characters to delete from textB
    for (var i = 0; i < previousDistances.Length; i++)
    {
        previousDistances[i] = i;
    }
    for (var i = 0; i < aLength; i++)
    {
        // calculate currentDistances from the previous row previousDistances
        // first element of currentDistances is A[i+1][0]
        //   edit distance is delete (i+1) chars from textA to match empty textB
        currentDistances[0] = i + 1;
        // use formula to fill in the rest of the row
        for (var j = 0; j < bLength; j++)
        {
            var cost = (aCompareInfo.Compare(textA.Value, i, 1, textB.Value, j, 1, compareOptions) == 0) ? 0 : 1;
            currentDistances[j + 1] = Minimum(currentDistances[j] + 1, previousDistances[j + 1] + 1, previousDistances[j] + cost);
        }
        // copy currentDistances to previousDistances for next iteration
        for (var j = 0; j < previousDistances.Length; j++)
        {
            previousDistances[j] = currentDistances[j];
        }
    }
    return currentDistances[bLength];
}

将CLR程序集部署到SQL Server(2008 R2)并像这样调用它之后:

print dbo.LevenshteinDistance('abc' collate Latin1_General_CI_AI, 'ABC' collate Latin1_General_CI_AI)
print dbo.LevenshteinDistance('abc' collate Latin1_General_CS_AS_KS_WS, N'ABC' collate Latin1_General_CS_AS_KS_WS)

两个调用都返回零(0)。因为我为第二次调用指定了区分大小写的排序规则,所以我希望第二次呼叫返回三(3)。

使用SQL Server中的CLR函数,是否可以指定数据库默认值以外的排序规则,并在CLR函数中使用它们?如果是,如何?

如何获取SQL字符串';CLR函数中的排序规则

如何在CLR函数中获取SQL字符串的排序规则?

不幸的是,你不能。根据TechNet的排序规则和CLR集成数据类型页面,在"参数排序规则"部分:

当创建公共语言运行时(CLR)例程,并且绑定到该例程的CLR方法的参数的类型为SqlString时,SQL Server将使用包含调用例程的数据库的默认排序规则创建该参数的实例。如果参数不是SqlType(例如,String而不是SqlString

因此,您看到的关于textA输入参数的CompareInfoSqlCompareOptions属性的行为虽然令人遗憾/沮丧/无法理解,但至少与文档中所说的系统应该如何工作一致。

因此,通过单独的输入参数传入属性的解决方案是可行的(尽管实际上应该使用SqlInt32SqlBoolean的SqlTypes;-)。

如果您的解决方案不涉及大于4K的字符串,有些人可能会认为还有另一种方法更好。将数据类型设为"object",而不是SqlString。这相当于SQL_VARIANT。尽管变体比标准类型会产生更多的开销,但它们可以使用任意排序规则来保存字符串。

SELECT dbo.ClrCollationTest(N'Anything' collate latin1_general_cs_as),
       dbo.ClrCollationTest(N'Anything' collate SQL_Latin1_General_CP1_CI_AS);

当CLR被编码时,上面分别返回0和1:

public static SqlBoolean ClrCollationTest(object anything)
{
    if (anything is SqlString)
        return new SqlBoolean(((SqlString)anything).SqlCompareOptions.HasFlag(SqlCompareOptions.IgnoreCase));
    else throw new ArgumentException(anything.GetType().Name + " is not a valid parameter data type.  SqlString is required.");
}

在互联网上没有看到任何替代方案,也没有对此问题的回答,我决定将所需的排序规则属性指定为函数参数,并根据输入或从数据库传递的默认排序规则选择CultureInfo对象和CompareOptions

[SqlFunction(IsDeterministic = true, Name = "LevenshteinDistance")]
public static SqlInt64 Distance(SqlString textA, SqlString textB, int? lcid, bool? caseInsensitive, bool? accentInsensitive, bool? kanaInsensitive, bool? widthInsensitive)
{
    // get a collation-aware comparer so string/character comparisons 
    // will match the inputs' specified collation
    //var aCompareInfo = textA.CompareInfo;
    var aCompareInfo = CultureInfo.GetCultureInfo(lcid ?? textA.LCID).CompareInfo;
    //var compareOptions = ConvertCompareOptions(textA.SqlCompareOptions);
    var compareOptions = GetCompareOptions(caseInsensitive, accentInsensitive, kanaInsensitive, widthInsensitive);
    // ...  more code ...
    // first comparison
    if (aCompareInfo.Compare(textA.Value, 0, aLength, textB.Value, 0, bLength, compareOptions) == 0) { return 0; }
    // ...  more code ...
    var cost = (aCompareInfo.Compare(textA.Value, i, 1, textB.Value, j, 1, compareOptions) == 0) ? 0 : 1;
    // ...  more code ...
}
private static CompareOptions GetCompareOptions(bool? caseInsensitive, bool? accentInsensitive, bool? kanaInsensitive, bool? widthInsensitive)
{
    var compareOptions = CompareOptions.None;
    compareOptions |= (caseInsensitive ?? false) ? CompareOptions.IgnoreCase : CompareOptions.None;
    compareOptions |= (accentInsensitive ?? false) ? CompareOptions.IgnoreNonSpace : CompareOptions.None;
    compareOptions |= (kanaInsensitive ?? false) ? CompareOptions.IgnoreKanaType : CompareOptions.None;
    compareOptions |= (widthInsensitive ?? false) ? CompareOptions.IgnoreWidth : CompareOptions.None;
    return compareOptions;
}

在更新了我的程序集和UDF声明之后,我可以调用这样的函数:

print dbo.LevenshteinDistance('abc', 'ABC', null, 1, 1, 1, 1)
print dbo.LevenshteinDistance('abc', 'ABC', null, 0, 0, 0, 0)

现在,第一个调用返回0(数据库默认区域性,一切不敏感),而第二个调用返回3(数据库默认地区性,一切敏感)。