如何在循环矢量化中消除数组边界检查?

本文关键字：数组边界检查循环矢量化 | 更新日期: 2023-09-27 18:04:54

我有一个任务，在二进制字面值0x0上分割数据库表中varbinary(8000)列的多个运行。但是，这可能会改变，所以我想保留这个变量。我希望使用SQLCLR作为流表值函数来快速完成此操作。我知道每个字符串至少有几千字节。

编辑:我已经更新了我的算法。以避免内部循环展开的肮脏。但是要说服CLR对寄存器分配做出正确的选择是极其困难的。如果有一种简单的方法可以让CLR相信j和i实际上是一样的，那就太棒了。但它却做了一些愚蠢的事情。最好是优化第一个路径循环。但是你不能在循环中使用goto。

我决定采用C函数memchr的64位实现。基本上，我不是一次扫描一个字节并进行比较，而是一次扫描8个字节，使用一些位旋转。作为参考，Array.IndexOf<Byte>对一个答案做了类似的4字节扫描，我只是想继续这样做。有几件事需要注意:

内存压力是SQLCLR函数中一个非常现实的问题。String.Split是out的，因为它在前面分配了很多内存，这是我想避免的。它也适用于UCS-2字符串，这将要求我将ascii字符串转换为unicode字符串，从而在返回时将数据视为lob数据类型。(SqlChars/SqlString在转换为lob类型之前只能返回4000字节)
我想流媒体。避免String.Split的另一个原因是它会一次完成所有工作，从而产生大量内存压力。对于带有大量分隔符的代码，纯T-SQL方法将开始击败它。
我想让它"安全"。所以一切都成功了。在安全检查中似乎有一笔很大的罚款。

Buffer.BlockCopy是非常快的，它似乎是更好的支付成本前期一次比不断支付的成本为BitConverter。这也比将输入转换为字符串并保留该引用更便宜。

代码相当快，但似乎我支付了相当多的绑定检查，在初始循环和临界区，当我找到一个匹配。因此，对于具有大量分隔符的代码，我倾向于使用更简单的c#枚举器，它只进行字节比较。

这是我的代码，

class SplitBytesEnumeratorA : IEnumerator
{
    // Fields
    private readonly byte[] _bytes;
    private readonly ulong[] _longs;
    private readonly ulong _comparer;
    private readonly Record _record = new Record();
    private int _start;
    private readonly int _length;
    // Methods
    internal SplitBytesEnumeratorA(byte[] bytes, byte delimiter)
    {
        this._bytes = bytes;
        this._length = bytes.Length;
        // we do this so that we can avoid a spillover scan near the end.
        // in unsafe implementation this would be dangerous as we potentially
        // will be reading more bytes than we should.
        this._longs = new ulong[(_length + 7) / 8];
        Buffer.BlockCopy(bytes, 0, _longs, 0, _length);
        var c = (((ulong)delimiter << 8) + (ulong)delimiter);
        c = (c << 16) + c;
        // comparer is now 8 copies of the original delimiter.
        c |= (c << 32);
        this._comparer = c;
    }
    public bool MoveNext()
    {
        if (this._start >= this._length) return false;
        int i = this._start;
        var longs = this._longs;
        var comparer = this._comparer;
        var record = this._record;
        record.id++;
        // handle the case where start is not divisible by eight.
        for (; (i & 7) != 0; i++)
        {
            if (i == _length || _bytes[i] == (comparer & 0xFF))
            {
                record.item = new byte[(i - _start)];
                Buffer.BlockCopy(_bytes, _start, record.item, 0, i - _start);
                _start = i + 1;
                return true;
            }
        }
        // main loop. We crawl the array 8 bytes at a time.
        for (int j=i/8; j < longs.Length; j++)
        {
            ulong t1 = longs[j];
            unchecked
            {
                t1 ^= comparer;
                ulong t2 = (t1 - 0x0101010101010101) & ~t1;
                if ((t2 & 0x8080808080808080) != 0)
                {
                    i =j*8;
                    // make every case 3 comparison instead of n. Potentially better. 
                    // This is an unrolled binary search.
                    if ((t2 & 0x80808080) == 0)
                    {
                        i += 4;
                        t2 >>= 32;
                    }
                    if ((t2 & 0x8080) == 0)
                    {
                        i += 2;
                        t2 >>= 16;
                    }
                    if ((t2 & 0x80) == 0)
                {
                i++;
                }
                record.item = new byte[(i - _start)];
                // improve cache locality by not switching collections.
                Buffer.BlockCopy(longs, _start, record.item, 0, i - _start);                _start = i + 1;
                return true;
            }
        }
        // no matches found increment by 8
    }
    // no matches left. Let's return the remaining buffer.
    record.item = new byte[(_length - _start)];
    Buffer.BlockCopy(longs, _start, record.item, 0, (_length - _start));
    _start = _bytes.Length;
    return true;
    }
    void IEnumerator.Reset()
    {
        throw new NotImplementedException();
    }
    public object Current
    {
        get
        {
            return this._record;
        }
    }
}
// We use a class to avoid boxing .
class Record
{
    internal int id;
    internal byte[] item;
}

如何在循环矢量化中消除数组边界检查?

考虑跳出框框，您是否考虑过将字符串转换为XML并使用XQuery进行拆分?

例如，您可以传入分隔符和(air code):

DECLARE @xml as xml
DECLARE @str as varchar(max)
SET @str = (SELECT CAST(t.YourBinaryColumn AS varchar(max) FROM [tableName] t) 
SET @xml = cast(('<X>'+replace(@str,@delimiter,'</X><X>')+'</X>') as xml)

这将二进制转换为字符串，并用XML标记替换分隔符。然后:

SELECT N.value('.', 'varchar(10)') as value FROM @xml.nodes('X') as T(N)

将获得各个"元素"，即每个分隔符之间的数据。

也许这个想法可能是有用的，或者作为一个催化剂，你可以在此基础上发展。

相关文章：

最新信息: