SQL 服务器暂时性异常编号
本文关键字:编号 异常 暂时性 服务器 SQL | 更新日期: 2023-09-27 18:36:08
我想为我的数据库调用编写一些包装代码(使用 C# 和用于访问数据库的Microsoft技术),在"暂时性"异常时自动重试。通过瞬态,我的意思是很有可能最终解决的问题(相对于永远不起作用的逻辑错误)。我能想到的例子包括:
- 僵局
- 连接超时
- 命令超时
我曾计划使用SqlException的错误号来发现这些错误。所以例如:
List<RunStoredProcedureResultType> resultSet = null;
int limit = 3;
for (int i = 0; i < limit; ++i)
{
bool isLast = i == limit - 1;
try
{
using (var db = /* ... */)
{
resultSet = db.RunStoredProcedure(param1, param2).ToList();
}
//if it gets here it was successful
break;
}
catch (SqlException ex)
{
if (isLast)
{
//3 transient errors in a row. So just kill it
throw;
}
switch (ex.Number)
{
case 1205: //deadlock
case -2: //timeout (command timeout?)
case 11: //timeout (connection timeout?)
// do nothing - continue the loop
break;
default:
//a non-transient error. Just throw the exception on
throw;
}
}
Thread.Sleep(TimeSpan.FromSeconds(1)); //some kind of delay - might not use Sleep
}
return resultSet;
(请原谅我的任何错误 - 我只是在飞行中写的。我也意识到我可以很好地包装它...
所以关键问题是:我应该认为哪些数字是"瞬态的"(我意识到我认为瞬态的数字可能与其他人认为瞬态的数字不同)。我在这里找到了一个不错的列表:
https://msdn.microsoft.com/en-us/library/cc645603.aspx
但它的庞大和笔记非常有用。有没有人建立了一个他们用于类似事情的列表?
更新
最后,我们选择了一个"错误列表"——如果错误是已知的"非瞬态错误"列表之一——这通常是程序员的错误。我列出了我们用作答案的数字列表。
sql Azure 中有一个类[SqlDatabaseTransientErrorDetectionStrategy.cs]
用于暂时性故障处理。它涵盖了几乎所有可被视为瞬态的异常代码类型。它也是Retry strategy
的完整实现。
在此处添加代码段以供将来参考:
/// <summary>
/// Error codes reported by the DBNETLIB module.
/// </summary>
private enum ProcessNetLibErrorCode
{
ZeroBytes = -3,
Timeout = -2,
/* Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding. */
Unknown = -1,
InsufficientMemory = 1,
AccessDenied = 2,
ConnectionBusy = 3,
ConnectionBroken = 4,
ConnectionLimit = 5,
ServerNotFound = 6,
NetworkNotFound = 7,
InsufficientResources = 8,
NetworkBusy = 9,
NetworkAccessDenied = 10,
GeneralError = 11,
IncorrectMode = 12,
NameNotFound = 13,
InvalidConnection = 14,
ReadWriteError = 15,
TooManyHandles = 16,
ServerError = 17,
SSLError = 18,
EncryptionError = 19,
EncryptionNotSupported = 20
}
进一步切换案例以检查 sql 异常中是否返回错误号:
switch (err.Number)
{
// SQL Error Code: 40501
// The service is currently busy. Retry the request after 10 seconds. Code: (reason code to be decoded).
case ThrottlingCondition.ThrottlingErrorNumber:
// Decode the reason code from the error message to determine the grounds for throttling.
var condition = ThrottlingCondition.FromError(err);
// Attach the decoded values as additional attributes to the original SQL exception.
sqlException.Data[condition.ThrottlingMode.GetType().Name] =
condition.ThrottlingMode.ToString();
sqlException.Data[condition.GetType().Name] = condition;
return true;
// SQL Error Code: 10928
// Resource ID: %d. The %s limit for the database is %d and has been reached.
case 10928:
// SQL Error Code: 10929
// Resource ID: %d. The %s minimum guarantee is %d, maximum limit is %d and the current usage for the database is %d.
// However, the server is currently too busy to support requests greater than %d for this database.
case 10929:
// SQL Error Code: 10053
// A transport-level error has occurred when receiving results from the server.
// An established connection was aborted by the software in your host machine.
case 10053:
// SQL Error Code: 10054
// A transport-level error has occurred when sending the request to the server.
// (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.)
case 10054:
// SQL Error Code: 10060
// A network-related or instance-specific error occurred while establishing a connection to SQL Server.
// The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server
// is configured to allow remote connections. (provider: TCP Provider, error: 0 - A connection attempt failed
// because the connected party did not properly respond after a period of time, or established connection failed
// because connected host has failed to respond.)"}
case 10060:
// SQL Error Code: 40197
// The service has encountered an error processing your request. Please try again.
case 40197:
// SQL Error Code: 40540
// The service has encountered an error processing your request. Please try again.
case 40540:
// SQL Error Code: 40613
// Database XXXX on server YYYY is not currently available. Please retry the connection later. If the problem persists, contact customer
// support, and provide them the session tracing ID of ZZZZZ.
case 40613:
// SQL Error Code: 40143
// The service has encountered an error processing your request. Please try again.
case 40143:
// SQL Error Code: 233
// The client was unable to establish a connection because of an error during connection initialization process before login.
// Possible causes include the following: the client tried to connect to an unsupported version of SQL Server; the server was too busy
// to accept new connections; or there was a resource limitation (insufficient memory or maximum allowed connections) on the server.
// (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.)
case 233:
// SQL Error Code: 64
// A connection was successfully established with the server, but then an error occurred during the login process.
// (provider: TCP Provider, error: 0 - The specified network name is no longer available.)
case 64:
// DBNETLIB Error Code: 20
// The instance of SQL Server you attempted to connect to does not support encryption.
case (int)ProcessNetLibErrorCode.EncryptionNotSupported:
return true;
}
在此处查看完整来源。
很抱歉回答我自己的问题,但如果有人仍然感兴趣,我们刚刚开始建立自己的错误代码列表。不理想,但我们认为这种情况不应该经常发生。
我们选择了"坏清单"方法,而不是问题中暗示的"好清单"。到目前为止,我们拥有的 ID 是:
PARAMETER_NOT_SUPPLIED = 201;
CANNOT_INSERT_NULL_INTO_NON_NULL = 515;
FOREGIN_KEY_VIOLATION = 547;
PRIMARY_KEY_VIOLATION = 2627;
MEMORY_ALLOCATION_FAILED = 4846;
ERROR_CONVERTING_NUMERIC_TO_DECIMAL = 8114;
TOO_MANY_ARGUMENTS = 8144;
ARGUMENT_IS_NOT_A_PARAMETER = 8145;
ARGS_SUPPLIED_FOR_PROCEDURE_WITHOUT_PARAMETERS = 8146;
STRING_OR_BINARY_TRUNCATED = 8152;
INVALID_POINTER = 10006;
WRONG_NUMBER_OF_PARAMETERS = 18751;
我们注意到的另一件事是,如果连接池超时,您不会得到 SqlException - 而是收到一个 InvalidOperationException 报告"超时已过期"。很遗憾它不是 SqlException,但非常值得一看。
我会尽量通过任何补充来保持最新状态。
没有可重试代码的规范列表。其他团队以前也遇到过这个问题。EF 团队制定了重试策略。你可能想突袭他们的代码。但该列表并不完整。我在 GitHub 上看到了 EF 提交,他们修改了列表。
我也有这个问题。我添加了一些明显的错误代码,这些代码是从SELECT * FROM sys.messages WHERE language_id = 1033 AND text LIKE '%...%'
中挖出来的。然后,我在应用程序遇到代码时添加了代码。
您还需要重试超时和网络错误的特殊错误号。服务器无法生成该号码,因为连接已断开。我认为这个数字是-2,但你需要确定。
SQL Server 定义的错误级别对于此目的毫无用处(并且主要是一般情况下)。
我们正在使用 vendettamit 的列表,并在我们偶然发现另一个我们确定是暂时的错误代码时不断扩展它。重要的是要注意,瞬态的定义是"值得重试,也许下次会起作用",而不是必要的SQL Server问题。到目前为止,我们已经添加了以下代码:
- 53: 建立与服务器的连接时出错。
- 109:管道已结束(仅当通过命名管道连接到本地 SQL 服务器时才需要)
- 11004:无法连接(通常表示客户端网络尚未准备就绪)
- 17142:服务器暂停(方便在SQL服务器"消失"一段时间时测试行为)
当您知道您的连接字符串很好时(即因为您最近设法建立了连接),您可以添加以下代码。同样,不应使用这些参数重试建立初始连接,如果连接字符串中的参数错误,这些参数也会弹出。
- 11001:找不到主机(可能出现在网络更改时)
- 1326:错误的用户名或密码(我们在 VM 上拉动虚拟以太网电缆时看到此弹出窗口)
- 258:找不到服务器(没有 TCP 应答)