知道从c#调用SQL Server时何时重试或失败

本文关键字:何时 重试 失败 Server SQL 调用 | 更新日期: 2023-09-27 18:06:47

我有一个c#应用程序,它从托管在一个有点不稳定的环境中的SQL Server获取数据。对于环境问题,我无能为力,所以我需要尽可能优雅地处理它们。

要做到这一点,我想重试操作是基础设施故障的结果,如网络故障,SQL服务器脱机,因为他们正在重新启动,查询超时等。同时,如果查询因逻辑错误而失败,我不想重试查询。我只希望那些将异常冒泡到客户端。

我的问题是:区分环境问题(失去连接,超时)和其他类型的异常(即使环境稳定也会发生的逻辑错误)的最佳方法是什么?

在c#中有一个常用的模式来处理这样的事情吗?例如,是否有一个属性可以检查SqlConnection对象以检测失败的连接?如果没有,解决这个问题的最佳方法是什么?

无论如何,我的代码并没有什么特别之处:

using (SqlConnection connection = new SqlConnection(myConnectionString))
using (SqlCommand command = connection.CreateCommand())
{
  command.CommandText = mySelectCommand;
  connection.Open();
  using (SqlDataReader reader = command.ExecuteReader())
  {
    while (reader.Read())
    {
      // Do something with the returned data.
    }
  }
}

知道从c#调用SQL Server时何时重试或失败

单个SqlException(可能)封装多个SQL Server错误。你可以用Errors属性来遍历它们。每个错误是SqlError:

foreach (SqlError error in exception.Errors)

每个SqlError都有一个Class属性,您可以使用它来大致确定您是否可以重试(如果您重试,如果您也必须重新创建连接)。从MSDN:

  • Class <10是你传递的信息错误,然后(可能)你不能重试,如果你先不纠正输入。
  • Class从11到16是"由用户生成的",那么如果用户首先不纠正他的输入,你可能又不能做任何事情。请注意,第16类包含许多临时错误,第13类用于死锁(感谢EvZ),所以如果您逐个处理它们,您可以排除这些类。
  • Class从17到24是一般的硬件/软件错误,你可以重试。当Class为20或更高时,您必须重新创建连接。22和23可能是严重的硬件/软件错误,24表示媒体错误(用户应该被警告,但如果它只是一个"临时"错误,你可以重试)。

你可以在这里找到每个类的更详细的描述。

一般来说,如果你用他们的类处理错误,你不需要确切地知道每个错误(使用error.Number属性或exception.Number,这只是该列表中第一个SqlError的快捷方式)。这样做的缺点是,当它没有用时(或者错误无法恢复),您可能会重试。我建议采用两步方法:

  • 检查已知错误代码(用SELECT * FROM master.sys.messages列出错误代码),看看你想处理什么(知道如何处理)。该视图包含所有支持的语言的消息,因此您可能需要通过msglangid列(例如英语的1033)过滤它们。
  • 对于其他所有依赖于错误类,当Class为13或高于16时重试(如果为20或更高则重新连接)。
  • 严重程度高于21(22、23和24)的错误是严重的错误,很少的等待不会修复这些问题(数据库本身也可能损坏)。

关于高等阶级的一句话。如何处理这些错误并不简单,它取决于许多因素(包括应用程序的风险管理)。作为简单的第一步,当尝试写操作时,我不会重试22,23,24:如果数据库,文件系统或介质严重损坏,那么写新数据可能会进一步破坏数据完整性(SQL Server非常小心,即使在关键情况下也不会为了查询而损害DB)。损坏的服务器(取决于您的数据库网络体系结构)甚至可能被热插拔(自动地,在指定的时间之后,或者在触发指定的触发器时)。始终咨询和接近您的DBA。

重试策略取决于您正在处理的错误:释放资源,等待挂起的操作完成,采取替代操作等。一般来说,只有当所有错误都是"可重试的"时,才应该重试:

bool rebuildConnection = true; // First try connection must be open
for (int i=0; i < MaximumNumberOfRetries; ++i) {
    try {
        // (Re)Create connection to SQL Server
        if (rebuildConnection) {
            if (connection != null)
                connection.Dispose();
            // Create connection and open it...
        }
        // Perform your task
        // No exceptions, task has been completed
        break;
    }
    catch (SqlException e) {
        if (e.Errors.Cast<SqlError>().All(x => CanRetry(x))) {
            // What to do? Handle that here, also checking Number property.
            // For Class < 20 you may simply Thread.Sleep(DelayOnError);
            rebuildConnection = e.Errors
                .Cast<SqlError>()
                .Any(x => x.Class >= 20);
            continue; 
        }
        throw;
    }
}

try/finally中包装所有内容以正确处理连接。使用这个简单的假装朴素的CanRetry()函数:

private static readonly int[] RetriableClasses = { 13, 16, 17, 18, 19, 20, 21, 22, 24 };
private static bool CanRetry(SqlError error) {
    // Use this switch if you want to handle only well-known errors,
    // remove it if you want to always retry. A "blacklist" approach may
    // also work: return false when you're sure you can't recover from one
    // error and rely on Class for anything else.
    switch (error.Number) {
        // Handle well-known error codes, 
    }
    // Handle unknown errors with severity 21 or less. 22 or more
    // indicates a serious error that need to be manually fixed.
    // 24 indicates media errors. They're serious errors (that should
    // be also notified) but we may retry...
    return RetriableClasses.Contains(error.Class); // LINQ...
}

查找非关键错误列表的一些相当棘手的方法。

通常我将所有这些(模板)代码嵌入到一个方法中(我可以隐藏为创建/处置/重新创建连接所做的所有脏东西),并使用以下签名:
public static void Try(
    Func<SqlConnection> connectionFactory,
    Action<SqlCommand> performer);

可以这样使用:

Try(
    () => new SqlConnection(connectionString),
    cmd => {
             cmd.CommandText = "SELECT * FROM master.sys.messages";
             using (var reader = cmd.ExecuteReader()) {
                 // Do stuff
         }
    });

请注意,骨架(重试错误)也可以使用,当你不使用SQL Server(实际上它可以用于许多其他操作,如I/O和网络相关的东西,所以我建议写一个通用的函数,并广泛地重用它)。

您可以简单地使用SqlConnectionStringBuilder属性来重试sql连接。

var conBuilder = new SqlConnectionStringBuilder(Configuration["Database:Connection"]); conBuilder.ConnectTimeout = 90; conBuilder.ConnectRetryInterval = 15; conBuilder.ConnectRetryCount = 6;

注意:-需要。net 4.5或更高版本。

我不知道任何标准,但这里有一个Sql-Server例外列表,我通常认为是可检索的,也有DTC的味道:

catch (SqlException sqlEx)
{
    canRetry = ((sqlEx.Number == 1205) // 1205 = Deadlock
        || (sqlEx.Number == -2) // -2 = TimeOut
        || (sqlEx.Number == 3989) // 3989 = New request is not allowed to start because it should come with valid transaction descriptor
        || (sqlEx.Number == 3965) // 3965 = The PROMOTE TRANSACTION request failed because there is no local transaction active.
        || (sqlEx.Number == 3919) // 3919 Cannot enlist in the transaction because the transaction has already been committed or rolled back
        || (sqlEx.Number == 3903)); // The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION.
}

关于重试,建议在重试之间添加随机延迟,以减少例如相同的两个事务再次死锁的机会。

对于一些DTC相关的错误,删除连接可能是必要的(或者在最坏的情况下,删除SqlClient.SqlConnection.ClearAllPools())——否则一个无用的连接将返回到池中。

本着分离关注点的精神,我在本例中设想了三个逻辑层…

  1. 应用层,调用"片状依赖处理程序"层
  2. "片状依赖处理程序"层,它调用数据访问层
  3. 数据访问层,不知道flakiness

所有用于重试的逻辑都将在该处理程序层中,这样除了与数据库通信之外,不会用逻辑污染数据访问层。(因此,您的数据访问代码不需要更改。如果逻辑上需要更改新功能,则无需担心"片状"。

重新尝试的模式可以基于捕获计数器循环中的特定异常。(计数器只是为了防止无限重试。)像这样:

public SomeReturnValue GetSomeData(someIdentifier)
{
    var tries = 0;
    while (tries < someConfiguredMaximum)
    {
        try
        {
            tries++;
            return someDataAccessObject.GetSomeData(someIdentifier);
        }
        catch (SqlException e)
        {
            someLogger.LogError(e);
            // maybe wait for some number of milliseconds?  make the method async if possible
        }
    }
    throw new CustomException("Maximum number of tries has been reached.");
}

这将循环一些配置的次数,重新尝试,直到它工作或达到最大值。在这个最大数目之后,将抛出一个自定义异常供应用程序处理。您可以通过检查捕获的特定SqlException进一步微调异常处理。也许根据错误消息,您可能希望继续循环或抛出CustomException .

您可以通过捕获其他异常类型、检查这些异常类型等来进一步细化此逻辑。这里的要点是,此职责与应用程序中的特定逻辑层保持隔离,对其他层尽可能透明。理想情况下,处理程序层和数据访问层实现相同的接口。这样,如果您将代码移到更稳定的环境中,并且不再需要处理程序层,则无需对应用层进行任何更改即可删除它。

我不知道有什么真正的标准。您可以尝试查看瞬时故障处理应用程序块。它非常健壮,但对一些用户来说可能有点太"企业"了。另一种方法可能是使用方面框架来捕获错误。否则,try/catch老方法就可以了。

在确定要重试什么时,通常需要查看异常。SqlException提供了关于问题来源的相当多的信息,但是解析它可能会很痛苦。我编写了一些代码来将它们分开,并试图确定哪些是可重试的,哪些是不可重试的。这已经有一段时间没有维护了,所以你应该把它当作一个起点,而不是一个成品。此外,这是针对SQL Azure的,所以它可能不完全适用于您的情况(例如,资源限制是Azure特有的功能,IIRC)。

/// <summary>
/// Helps to extract useful information from SQLExceptions, particularly in SQL Azure
/// </summary>
public class SqlExceptionDetails
{
    public ResourcesThrottled SeriouslyExceededResources { get; private set; }
    public ResourcesThrottled SlightlyExceededResources { get; private set; }
    public OperationsThrottled OperationsThrottled { get; private set; }
    public IList<SqlErrorCode> Errors { get; private set; }
    public string ThrottlingMessage { get; private set; }
    public bool ShouldRetry { get; private set; }
    public bool ShouldRetryImmediately { get; private set; }
    private SqlExceptionDetails()
    {
        this.ShouldRetryImmediately = false;
        this.ShouldRetry = true;
        this.SeriouslyExceededResources = ResourcesThrottled.None;
        this.SlightlyExceededResources = ResourcesThrottled.None;
        this.OperationsThrottled = OperationsThrottled.None;
        Errors = new List<SqlErrorCode>();
    }
    public SqlExceptionDetails(SqlException exception) :this(exception.Errors.Cast<SqlError>())
    {
    }
    public SqlExceptionDetails(IEnumerable<SqlError> errors) : this()
    {
        List<ISqlError> errorWrappers = (from err in errors
                                         select new SqlErrorWrapper(err)).Cast<ISqlError>().ToList();
        this.ParseErrors(errorWrappers);
    }
    public SqlExceptionDetails(IEnumerable<ISqlError> errors) : this()
    {
        ParseErrors(errors);
    }
    private void ParseErrors(IEnumerable<ISqlError> errors)
    {
        foreach (ISqlError error in errors)
        {
            SqlErrorCode code = GetSqlErrorCodeFromInt(error.Number);
            this.Errors.Add(code);
            switch (code)
            {
                case SqlErrorCode.ServerBusy:
                    ParseServerBusyError(error);
                    break;
                case SqlErrorCode.ConnectionFailed:
                    //This is a very non-specific error, can happen for almost any reason
                    //so we can't make any conclusions from it
                    break;
                case SqlErrorCode.DatabaseUnavailable:
                    ShouldRetryImmediately = false;
                    break;
                case SqlErrorCode.EncryptionNotSupported:
                    //this error code is sometimes sent by the client when it shouldn't be
                    //Therefore we need to retry it, even though it seems this problem wouldn't fix itself
                    ShouldRetry = true;
                    ShouldRetryImmediately = true;
                    break;
                case SqlErrorCode.DatabaseWorkerThreadThrottling:
                case SqlErrorCode.ServerWorkerThreadThrottling:
                    ShouldRetry = true;
                    ShouldRetryImmediately = false;
                    break;

                //The following errors are probably not going to resolved in 10 seconds
                //They're mostly related to poor query design, broken DB configuration, or too much data
                case SqlErrorCode.ExceededDatabaseSizeQuota:
                case SqlErrorCode.TransactionRanTooLong:
                case SqlErrorCode.TooManyLocks:
                case SqlErrorCode.ExcessiveTempDBUsage:
                case SqlErrorCode.ExcessiveMemoryUsage:
                case SqlErrorCode.ExcessiveTransactionLogUsage:
                case SqlErrorCode.BlockedByFirewall:
                case SqlErrorCode.TooManyFirewallRules:
                case SqlErrorCode.CannotOpenServer:
                case SqlErrorCode.LoginFailed:
                case SqlErrorCode.FeatureNotSupported:
                case SqlErrorCode.StoredProcedureNotFound:
                case SqlErrorCode.StringOrBinaryDataWouldBeTruncated:
                    this.ShouldRetry = false;
                    break;
            }
        }
        if (this.ShouldRetry && Errors.Count == 1)
        {
            SqlErrorCode code = this.Errors[0];
            if (code == SqlErrorCode.TransientServerError)
            {
                this.ShouldRetryImmediately = true;
            }
        }
        if (IsResourceThrottled(ResourcesThrottled.Quota) ||
            IsResourceThrottled(ResourcesThrottled.Disabled))
        {
            this.ShouldRetry = false;
        }
        if (!this.ShouldRetry)
        {
            this.ShouldRetryImmediately = false;
        }
        SetThrottlingMessage();
    }
    private void SetThrottlingMessage()
    {
        if (OperationsThrottled == Sql.OperationsThrottled.None)
        {
            ThrottlingMessage = "No throttling";
        }
        else
        {
            string opsThrottled = OperationsThrottled.ToString();
            string seriousExceeded = SeriouslyExceededResources.ToString();
            string slightlyExceeded = SlightlyExceededResources.ToString();
            ThrottlingMessage = "SQL Server throttling encountered. Operations throttled: " + opsThrottled
                        + ", Resources Seriously Exceeded: " + seriousExceeded
                        + ", Resources Slightly Exceeded: " + slightlyExceeded;
        }
    }
    private bool IsResourceThrottled(ResourcesThrottled resource)
    {
        return ((this.SeriouslyExceededResources & resource) > 0 ||
                (this.SlightlyExceededResources & resource) > 0);
    }
    private SqlErrorCode GetSqlErrorCodeFromInt(int p)
    {
        switch (p)
        {
            case 40014:
            case 40054:
            case 40133:
            case 40506:
            case 40507:
            case 40508:
            case 40512:
            case 40516:
            case 40520:
            case 40521:
            case 40522:
            case 40523:
            case 40524:
            case 40525:
            case 40526:
            case 40527:
            case 40528:
            case 40606:
            case 40607:
            case 40636:
                return SqlErrorCode.FeatureNotSupported;
        }
        try
        {
            return (SqlErrorCode)p;
        }
        catch
        {
            return SqlErrorCode.Unknown;
        }
    }
    /// <summary>
    /// Parse out the reason code from a ServerBusy error. 
    /// </summary>
    /// <remarks>Basic idea extracted from http://msdn.microsoft.com/en-us/library/gg491230.aspx
    /// </remarks>
    /// <param name="error"></param>
    private void ParseServerBusyError(ISqlError error)
    {
        int idx = error.Message.LastIndexOf("Code:");
        if (idx < 0)
        {
            return;
        }
        string reasonCodeString = error.Message.Substring(idx + "Code:".Length);
        int reasonCode;
        if (!int.TryParse(reasonCodeString, out reasonCode))
        {
            return;
        }
        int opsThrottledInt = (reasonCode & 3);
        this.OperationsThrottled = (OperationsThrottled)(Math.Max((int)OperationsThrottled, opsThrottledInt));

        int slightResourcesMask = reasonCode >> 8;
        int seriousResourcesMask = reasonCode >> 16;
        foreach (ResourcesThrottled resourceType in Enum.GetValues(typeof(ResourcesThrottled)))
        {
            if ((seriousResourcesMask & (int)resourceType) > 0)
            {
                this.SeriouslyExceededResources |= resourceType;
            }
            if ((slightResourcesMask & (int)resourceType) > 0)
            {
                this.SlightlyExceededResources |= resourceType;
            }
        }
    }
}
public interface ISqlError
{
    int Number { get; }
    string Message { get; }
}
public class SqlErrorWrapper : ISqlError
{
    public SqlErrorWrapper(SqlError error)
    {
        this.Number = error.Number;
        this.Message = error.Message;
    }
    public SqlErrorWrapper()
    {
    }
    public int Number { get; set; }
    public string Message { get; set; }
}
/// <summary>
/// Documents some of the ErrorCodes from SQL/SQL Azure. 
/// I have not included all possible errors, only the ones I thought useful for modifying runtime behaviors
/// </summary>
/// <remarks>
/// Comments come from: http://social.technet.microsoft.com/wiki/contents/articles/sql-azure-connection-management-in-sql-azure.aspx
/// </remarks>
public enum SqlErrorCode : int
{
    /// <summary>
    /// We don't recognize the error code returned
    /// </summary>
    Unknown = 0,
    /// <summary>
    /// A SQL feature/function used in the query is not supported. You must fix the query before it will work.
    /// This is a rollup of many more-specific SQL errors
    /// </summary>
    FeatureNotSupported = 1,
    /// <summary>
    /// Probable cause is server maintenance/upgrade. Retry connection immediately.
    /// </summary>
    TransientServerError = 40197,
    /// <summary>
    /// The server is throttling one or more resources. Reasons may be available from other properties
    /// </summary>
    ServerBusy = 40501,
    /// <summary>
    /// You have reached the per-database cap on worker threads. Investigate long running transactions and reduce server load. 
    /// http://social.technet.microsoft.com/wiki/contents/articles/1541.windows-azure-sql-database-connection-management.aspx#Throttling_Limits
    /// </summary>
    DatabaseWorkerThreadThrottling = 10928,
    /// <summary>
    /// The per-server worker thread cap has been reached. This may be partially due to load from other databases in a shared hosting environment (eg, SQL Azure).
    /// You may be able to alleviate the problem by reducing long running transactions.
    /// http://social.technet.microsoft.com/wiki/contents/articles/1541.windows-azure-sql-database-connection-management.aspx#Throttling_Limits
    /// </summary>
    ServerWorkerThreadThrottling = 10929,
    ExcessiveMemoryUsage = 40553,
    BlockedByFirewall = 40615,
    /// <summary>
    /// The database has reached the maximum size configured in SQL Azure
    /// </summary>
    ExceededDatabaseSizeQuota = 40544,
    /// <summary>
    /// A transaction ran for too long. This timeout seems to be 24 hours.
    /// </summary>
    /// <remarks>
    /// 24 hour limit taken from http://social.technet.microsoft.com/wiki/contents/articles/sql-azure-connection-management-in-sql-azure.aspx
    /// </remarks>
    TransactionRanTooLong = 40549,
    TooManyLocks = 40550,
    ExcessiveTempDBUsage = 40551,
    ExcessiveTransactionLogUsage = 40552,
    DatabaseUnavailable = 40613,
    CannotOpenServer = 40532,
    /// <summary>
    /// SQL Azure databases can have at most 128 firewall rules defined
    /// </summary>
    TooManyFirewallRules = 40611,
    /// <summary>
    /// Theoretically means the DB doesn't support encryption. However, this can be indicated incorrectly due to an error in the client library. 
    /// Therefore, even though this seems like an error that won't fix itself, it's actually a retryable error.
    /// </summary>
    /// <remarks>
    /// http://social.msdn.microsoft.com/Forums/en/ssdsgetstarted/thread/e7cbe094-5b55-4b4a-8975-162d899f1d52
    /// </remarks>
    EncryptionNotSupported = 20,
    /// <summary>
    /// User failed to connect to the database. This is probably not recoverable.
    /// </summary>
    /// <remarks>
    /// Some good info on more-specific debugging: http://blogs.msdn.com/b/sql_protocols/archive/2006/02/21/536201.aspx
    /// </remarks>
    LoginFailed = 18456,
    /// <summary>
    /// Failed to connect to the database. Could be due to configuration issues, network issues, bad login... hard to tell
    /// </summary>
    ConnectionFailed = 4060,
    /// <summary>
    /// Client tried to call a stored procedure that doesn't exist
    /// </summary>
    StoredProcedureNotFound = 2812,
    /// <summary>
    /// The data supplied is too large for the column
    /// </summary>
    StringOrBinaryDataWouldBeTruncated = 8152
}

参考此文档:所有自定义实现处理大多数发生的问题。

// Define the retry logic parameters
var options = new SqlRetryLogicOption()
{
    // Tries 5 times before throwing an exception
    NumberOfTries = 5,
    // Preferred gap time to delay before retry
    DeltaTime = TimeSpan.FromSeconds(1),
    // Maximum gap time for each delay time before retry
    MaxTimeInterval = TimeSpan.FromSeconds(20),
    // SqlException retriable error numbers
    TransientErrors = new int[] { 4060, 1024, 1025}
};

// Create a custom retry logic provider
SqlRetryLogicBaseProvider provider = CustomRetry.CreateCustomProvider(options);

    // Assumes that connection is a valid SqlConnection object 
// Set the retry logic provider on the connection instance
connection.RetryLogicProvider = provider;
// Establishing the connection will trigger retry if one of the given transient failure occurs.
connection.Open();
文档女士