知道从c#调用SQL Server时何时重试或失败
本文关键字:何时 重试 失败 Server SQL 调用 | 更新日期: 2023-09-27 18:06:47
我有一个c#应用程序,它从托管在一个有点不稳定的环境中的SQL Server获取数据。对于环境问题,我无能为力,所以我需要尽可能优雅地处理它们。
要做到这一点,我想重试操作是基础设施故障的结果,如网络故障,SQL服务器脱机,因为他们正在重新启动,查询超时等。同时,如果查询因逻辑错误而失败,我不想重试查询。我只希望那些将异常冒泡到客户端。
我的问题是:区分环境问题(失去连接,超时)和其他类型的异常(即使环境稳定也会发生的逻辑错误)的最佳方法是什么?
在c#中有一个常用的模式来处理这样的事情吗?例如,是否有一个属性可以检查SqlConnection对象以检测失败的连接?如果没有,解决这个问题的最佳方法是什么?无论如何,我的代码并没有什么特别之处:
using (SqlConnection connection = new SqlConnection(myConnectionString))
using (SqlCommand command = connection.CreateCommand())
{
command.CommandText = mySelectCommand;
connection.Open();
using (SqlDataReader reader = command.ExecuteReader())
{
while (reader.Read())
{
// Do something with the returned data.
}
}
}
单个SqlException
(可能)封装多个SQL Server错误。你可以用Errors
属性来遍历它们。每个错误是SqlError
:
foreach (SqlError error in exception.Errors)
每个SqlError
都有一个Class
属性,您可以使用它来大致确定您是否可以重试(如果您重试,如果您也必须重新创建连接)。从MSDN:
-
Class
<10是你传递的信息错误,然后(可能)你不能重试,如果你先不纠正输入。 -
Class
从11到16是"由用户生成的",那么如果用户首先不纠正他的输入,你可能又不能做任何事情。请注意,第16类包含许多临时错误,第13类用于死锁(感谢EvZ),所以如果您逐个处理它们,您可以排除这些类。 -
Class
从17到24是一般的硬件/软件错误,你可以重试。当Class
为20或更高时,您必须重新创建连接。22和23可能是严重的硬件/软件错误,24表示媒体错误(用户应该被警告,但如果它只是一个"临时"错误,你可以重试)。
你可以在这里找到每个类的更详细的描述。
一般来说,如果你用他们的类处理错误,你不需要确切地知道每个错误(使用error.Number
属性或exception.Number
,这只是该列表中第一个SqlError
的快捷方式)。这样做的缺点是,当它没有用时(或者错误无法恢复),您可能会重试。我建议采用两步方法:
- 检查已知错误代码(用
SELECT * FROM master.sys.messages
列出错误代码),看看你想处理什么(知道如何处理)。该视图包含所有支持的语言的消息,因此您可能需要通过msglangid
列(例如英语的1033)过滤它们。 - 对于其他所有依赖于错误类,当
Class
为13或高于16时重试(如果为20或更高则重新连接)。 - 严重程度高于21(22、23和24)的错误是严重的错误,很少的等待不会修复这些问题(数据库本身也可能损坏)。
关于高等阶级的一句话。如何处理这些错误并不简单,它取决于许多因素(包括应用程序的风险管理)。作为简单的第一步,当尝试写操作时,我不会重试22,23,24:如果数据库,文件系统或介质严重损坏,那么写新数据可能会进一步破坏数据完整性(SQL Server非常小心,即使在关键情况下也不会为了查询而损害DB)。损坏的服务器(取决于您的数据库网络体系结构)甚至可能被热插拔(自动地,在指定的时间之后,或者在触发指定的触发器时)。始终咨询和接近您的DBA。
重试策略取决于您正在处理的错误:释放资源,等待挂起的操作完成,采取替代操作等。一般来说,只有当所有错误都是"可重试的"时,才应该重试:
bool rebuildConnection = true; // First try connection must be open
for (int i=0; i < MaximumNumberOfRetries; ++i) {
try {
// (Re)Create connection to SQL Server
if (rebuildConnection) {
if (connection != null)
connection.Dispose();
// Create connection and open it...
}
// Perform your task
// No exceptions, task has been completed
break;
}
catch (SqlException e) {
if (e.Errors.Cast<SqlError>().All(x => CanRetry(x))) {
// What to do? Handle that here, also checking Number property.
// For Class < 20 you may simply Thread.Sleep(DelayOnError);
rebuildConnection = e.Errors
.Cast<SqlError>()
.Any(x => x.Class >= 20);
continue;
}
throw;
}
}
在try
/finally
中包装所有内容以正确处理连接。使用这个简单的假装朴素的CanRetry()
函数:
private static readonly int[] RetriableClasses = { 13, 16, 17, 18, 19, 20, 21, 22, 24 };
private static bool CanRetry(SqlError error) {
// Use this switch if you want to handle only well-known errors,
// remove it if you want to always retry. A "blacklist" approach may
// also work: return false when you're sure you can't recover from one
// error and rely on Class for anything else.
switch (error.Number) {
// Handle well-known error codes,
}
// Handle unknown errors with severity 21 or less. 22 or more
// indicates a serious error that need to be manually fixed.
// 24 indicates media errors. They're serious errors (that should
// be also notified) but we may retry...
return RetriableClasses.Contains(error.Class); // LINQ...
}
查找非关键错误列表的一些相当棘手的方法。
通常我将所有这些(模板)代码嵌入到一个方法中(我可以隐藏为创建/处置/重新创建连接所做的所有脏东西),并使用以下签名:public static void Try(
Func<SqlConnection> connectionFactory,
Action<SqlCommand> performer);
可以这样使用:
Try(
() => new SqlConnection(connectionString),
cmd => {
cmd.CommandText = "SELECT * FROM master.sys.messages";
using (var reader = cmd.ExecuteReader()) {
// Do stuff
}
});
请注意,骨架(重试错误)也可以使用,当你不使用SQL Server(实际上它可以用于许多其他操作,如I/O和网络相关的东西,所以我建议写一个通用的函数,并广泛地重用它)。
您可以简单地使用SqlConnectionStringBuilder属性来重试sql连接。
var conBuilder = new SqlConnectionStringBuilder(Configuration["Database:Connection"]);
conBuilder.ConnectTimeout = 90;
conBuilder.ConnectRetryInterval = 15;
conBuilder.ConnectRetryCount = 6;
注意:-需要。net 4.5或更高版本。
我不知道任何标准,但这里有一个Sql-Server
例外列表,我通常认为是可检索的,也有DTC的味道:
catch (SqlException sqlEx)
{
canRetry = ((sqlEx.Number == 1205) // 1205 = Deadlock
|| (sqlEx.Number == -2) // -2 = TimeOut
|| (sqlEx.Number == 3989) // 3989 = New request is not allowed to start because it should come with valid transaction descriptor
|| (sqlEx.Number == 3965) // 3965 = The PROMOTE TRANSACTION request failed because there is no local transaction active.
|| (sqlEx.Number == 3919) // 3919 Cannot enlist in the transaction because the transaction has already been committed or rolled back
|| (sqlEx.Number == 3903)); // The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION.
}
关于重试,建议在重试之间添加随机延迟,以减少例如相同的两个事务再次死锁的机会。
对于一些DTC
相关的错误,删除连接可能是必要的(或者在最坏的情况下,删除SqlClient.SqlConnection.ClearAllPools()
)——否则一个无用的连接将返回到池中。
本着分离关注点的精神,我在本例中设想了三个逻辑层…
- 应用层,调用"片状依赖处理程序"层
- "片状依赖处理程序"层,它调用数据访问层
- 数据访问层,不知道flakiness
所有用于重试的逻辑都将在该处理程序层中,这样除了与数据库通信之外,不会用逻辑污染数据访问层。(因此,您的数据访问代码不需要更改。如果逻辑上需要更改新功能,则无需担心"片状"。
重新尝试的模式可以基于捕获计数器循环中的特定异常。(计数器只是为了防止无限重试。)像这样:
public SomeReturnValue GetSomeData(someIdentifier)
{
var tries = 0;
while (tries < someConfiguredMaximum)
{
try
{
tries++;
return someDataAccessObject.GetSomeData(someIdentifier);
}
catch (SqlException e)
{
someLogger.LogError(e);
// maybe wait for some number of milliseconds? make the method async if possible
}
}
throw new CustomException("Maximum number of tries has been reached.");
}
这将循环一些配置的次数,重新尝试,直到它工作或达到最大值。在这个最大数目之后,将抛出一个自定义异常供应用程序处理。您可以通过检查捕获的特定SqlException
进一步微调异常处理。也许根据错误消息,您可能希望继续循环或抛出CustomException
.
您可以通过捕获其他异常类型、检查这些异常类型等来进一步细化此逻辑。这里的要点是,此职责与应用程序中的特定逻辑层保持隔离,对其他层尽可能透明。理想情况下,处理程序层和数据访问层实现相同的接口。这样,如果您将代码移到更稳定的环境中,并且不再需要处理程序层,则无需对应用层进行任何更改即可删除它。
我不知道有什么真正的标准。您可以尝试查看瞬时故障处理应用程序块。它非常健壮,但对一些用户来说可能有点太"企业"了。另一种方法可能是使用方面框架来捕获错误。否则,try/catch老方法就可以了。
在确定要重试什么时,通常需要查看异常。SqlException
提供了关于问题来源的相当多的信息,但是解析它可能会很痛苦。我编写了一些代码来将它们分开,并试图确定哪些是可重试的,哪些是不可重试的。这已经有一段时间没有维护了,所以你应该把它当作一个起点,而不是一个成品。此外,这是针对SQL Azure的,所以它可能不完全适用于您的情况(例如,资源限制是Azure特有的功能,IIRC)。
/// <summary>
/// Helps to extract useful information from SQLExceptions, particularly in SQL Azure
/// </summary>
public class SqlExceptionDetails
{
public ResourcesThrottled SeriouslyExceededResources { get; private set; }
public ResourcesThrottled SlightlyExceededResources { get; private set; }
public OperationsThrottled OperationsThrottled { get; private set; }
public IList<SqlErrorCode> Errors { get; private set; }
public string ThrottlingMessage { get; private set; }
public bool ShouldRetry { get; private set; }
public bool ShouldRetryImmediately { get; private set; }
private SqlExceptionDetails()
{
this.ShouldRetryImmediately = false;
this.ShouldRetry = true;
this.SeriouslyExceededResources = ResourcesThrottled.None;
this.SlightlyExceededResources = ResourcesThrottled.None;
this.OperationsThrottled = OperationsThrottled.None;
Errors = new List<SqlErrorCode>();
}
public SqlExceptionDetails(SqlException exception) :this(exception.Errors.Cast<SqlError>())
{
}
public SqlExceptionDetails(IEnumerable<SqlError> errors) : this()
{
List<ISqlError> errorWrappers = (from err in errors
select new SqlErrorWrapper(err)).Cast<ISqlError>().ToList();
this.ParseErrors(errorWrappers);
}
public SqlExceptionDetails(IEnumerable<ISqlError> errors) : this()
{
ParseErrors(errors);
}
private void ParseErrors(IEnumerable<ISqlError> errors)
{
foreach (ISqlError error in errors)
{
SqlErrorCode code = GetSqlErrorCodeFromInt(error.Number);
this.Errors.Add(code);
switch (code)
{
case SqlErrorCode.ServerBusy:
ParseServerBusyError(error);
break;
case SqlErrorCode.ConnectionFailed:
//This is a very non-specific error, can happen for almost any reason
//so we can't make any conclusions from it
break;
case SqlErrorCode.DatabaseUnavailable:
ShouldRetryImmediately = false;
break;
case SqlErrorCode.EncryptionNotSupported:
//this error code is sometimes sent by the client when it shouldn't be
//Therefore we need to retry it, even though it seems this problem wouldn't fix itself
ShouldRetry = true;
ShouldRetryImmediately = true;
break;
case SqlErrorCode.DatabaseWorkerThreadThrottling:
case SqlErrorCode.ServerWorkerThreadThrottling:
ShouldRetry = true;
ShouldRetryImmediately = false;
break;
//The following errors are probably not going to resolved in 10 seconds
//They're mostly related to poor query design, broken DB configuration, or too much data
case SqlErrorCode.ExceededDatabaseSizeQuota:
case SqlErrorCode.TransactionRanTooLong:
case SqlErrorCode.TooManyLocks:
case SqlErrorCode.ExcessiveTempDBUsage:
case SqlErrorCode.ExcessiveMemoryUsage:
case SqlErrorCode.ExcessiveTransactionLogUsage:
case SqlErrorCode.BlockedByFirewall:
case SqlErrorCode.TooManyFirewallRules:
case SqlErrorCode.CannotOpenServer:
case SqlErrorCode.LoginFailed:
case SqlErrorCode.FeatureNotSupported:
case SqlErrorCode.StoredProcedureNotFound:
case SqlErrorCode.StringOrBinaryDataWouldBeTruncated:
this.ShouldRetry = false;
break;
}
}
if (this.ShouldRetry && Errors.Count == 1)
{
SqlErrorCode code = this.Errors[0];
if (code == SqlErrorCode.TransientServerError)
{
this.ShouldRetryImmediately = true;
}
}
if (IsResourceThrottled(ResourcesThrottled.Quota) ||
IsResourceThrottled(ResourcesThrottled.Disabled))
{
this.ShouldRetry = false;
}
if (!this.ShouldRetry)
{
this.ShouldRetryImmediately = false;
}
SetThrottlingMessage();
}
private void SetThrottlingMessage()
{
if (OperationsThrottled == Sql.OperationsThrottled.None)
{
ThrottlingMessage = "No throttling";
}
else
{
string opsThrottled = OperationsThrottled.ToString();
string seriousExceeded = SeriouslyExceededResources.ToString();
string slightlyExceeded = SlightlyExceededResources.ToString();
ThrottlingMessage = "SQL Server throttling encountered. Operations throttled: " + opsThrottled
+ ", Resources Seriously Exceeded: " + seriousExceeded
+ ", Resources Slightly Exceeded: " + slightlyExceeded;
}
}
private bool IsResourceThrottled(ResourcesThrottled resource)
{
return ((this.SeriouslyExceededResources & resource) > 0 ||
(this.SlightlyExceededResources & resource) > 0);
}
private SqlErrorCode GetSqlErrorCodeFromInt(int p)
{
switch (p)
{
case 40014:
case 40054:
case 40133:
case 40506:
case 40507:
case 40508:
case 40512:
case 40516:
case 40520:
case 40521:
case 40522:
case 40523:
case 40524:
case 40525:
case 40526:
case 40527:
case 40528:
case 40606:
case 40607:
case 40636:
return SqlErrorCode.FeatureNotSupported;
}
try
{
return (SqlErrorCode)p;
}
catch
{
return SqlErrorCode.Unknown;
}
}
/// <summary>
/// Parse out the reason code from a ServerBusy error.
/// </summary>
/// <remarks>Basic idea extracted from http://msdn.microsoft.com/en-us/library/gg491230.aspx
/// </remarks>
/// <param name="error"></param>
private void ParseServerBusyError(ISqlError error)
{
int idx = error.Message.LastIndexOf("Code:");
if (idx < 0)
{
return;
}
string reasonCodeString = error.Message.Substring(idx + "Code:".Length);
int reasonCode;
if (!int.TryParse(reasonCodeString, out reasonCode))
{
return;
}
int opsThrottledInt = (reasonCode & 3);
this.OperationsThrottled = (OperationsThrottled)(Math.Max((int)OperationsThrottled, opsThrottledInt));
int slightResourcesMask = reasonCode >> 8;
int seriousResourcesMask = reasonCode >> 16;
foreach (ResourcesThrottled resourceType in Enum.GetValues(typeof(ResourcesThrottled)))
{
if ((seriousResourcesMask & (int)resourceType) > 0)
{
this.SeriouslyExceededResources |= resourceType;
}
if ((slightResourcesMask & (int)resourceType) > 0)
{
this.SlightlyExceededResources |= resourceType;
}
}
}
}
public interface ISqlError
{
int Number { get; }
string Message { get; }
}
public class SqlErrorWrapper : ISqlError
{
public SqlErrorWrapper(SqlError error)
{
this.Number = error.Number;
this.Message = error.Message;
}
public SqlErrorWrapper()
{
}
public int Number { get; set; }
public string Message { get; set; }
}
/// <summary>
/// Documents some of the ErrorCodes from SQL/SQL Azure.
/// I have not included all possible errors, only the ones I thought useful for modifying runtime behaviors
/// </summary>
/// <remarks>
/// Comments come from: http://social.technet.microsoft.com/wiki/contents/articles/sql-azure-connection-management-in-sql-azure.aspx
/// </remarks>
public enum SqlErrorCode : int
{
/// <summary>
/// We don't recognize the error code returned
/// </summary>
Unknown = 0,
/// <summary>
/// A SQL feature/function used in the query is not supported. You must fix the query before it will work.
/// This is a rollup of many more-specific SQL errors
/// </summary>
FeatureNotSupported = 1,
/// <summary>
/// Probable cause is server maintenance/upgrade. Retry connection immediately.
/// </summary>
TransientServerError = 40197,
/// <summary>
/// The server is throttling one or more resources. Reasons may be available from other properties
/// </summary>
ServerBusy = 40501,
/// <summary>
/// You have reached the per-database cap on worker threads. Investigate long running transactions and reduce server load.
/// http://social.technet.microsoft.com/wiki/contents/articles/1541.windows-azure-sql-database-connection-management.aspx#Throttling_Limits
/// </summary>
DatabaseWorkerThreadThrottling = 10928,
/// <summary>
/// The per-server worker thread cap has been reached. This may be partially due to load from other databases in a shared hosting environment (eg, SQL Azure).
/// You may be able to alleviate the problem by reducing long running transactions.
/// http://social.technet.microsoft.com/wiki/contents/articles/1541.windows-azure-sql-database-connection-management.aspx#Throttling_Limits
/// </summary>
ServerWorkerThreadThrottling = 10929,
ExcessiveMemoryUsage = 40553,
BlockedByFirewall = 40615,
/// <summary>
/// The database has reached the maximum size configured in SQL Azure
/// </summary>
ExceededDatabaseSizeQuota = 40544,
/// <summary>
/// A transaction ran for too long. This timeout seems to be 24 hours.
/// </summary>
/// <remarks>
/// 24 hour limit taken from http://social.technet.microsoft.com/wiki/contents/articles/sql-azure-connection-management-in-sql-azure.aspx
/// </remarks>
TransactionRanTooLong = 40549,
TooManyLocks = 40550,
ExcessiveTempDBUsage = 40551,
ExcessiveTransactionLogUsage = 40552,
DatabaseUnavailable = 40613,
CannotOpenServer = 40532,
/// <summary>
/// SQL Azure databases can have at most 128 firewall rules defined
/// </summary>
TooManyFirewallRules = 40611,
/// <summary>
/// Theoretically means the DB doesn't support encryption. However, this can be indicated incorrectly due to an error in the client library.
/// Therefore, even though this seems like an error that won't fix itself, it's actually a retryable error.
/// </summary>
/// <remarks>
/// http://social.msdn.microsoft.com/Forums/en/ssdsgetstarted/thread/e7cbe094-5b55-4b4a-8975-162d899f1d52
/// </remarks>
EncryptionNotSupported = 20,
/// <summary>
/// User failed to connect to the database. This is probably not recoverable.
/// </summary>
/// <remarks>
/// Some good info on more-specific debugging: http://blogs.msdn.com/b/sql_protocols/archive/2006/02/21/536201.aspx
/// </remarks>
LoginFailed = 18456,
/// <summary>
/// Failed to connect to the database. Could be due to configuration issues, network issues, bad login... hard to tell
/// </summary>
ConnectionFailed = 4060,
/// <summary>
/// Client tried to call a stored procedure that doesn't exist
/// </summary>
StoredProcedureNotFound = 2812,
/// <summary>
/// The data supplied is too large for the column
/// </summary>
StringOrBinaryDataWouldBeTruncated = 8152
}
参考此文档:所有自定义实现处理大多数发生的问题。
// Define the retry logic parameters
var options = new SqlRetryLogicOption()
{
// Tries 5 times before throwing an exception
NumberOfTries = 5,
// Preferred gap time to delay before retry
DeltaTime = TimeSpan.FromSeconds(1),
// Maximum gap time for each delay time before retry
MaxTimeInterval = TimeSpan.FromSeconds(20),
// SqlException retriable error numbers
TransientErrors = new int[] { 4060, 1024, 1025}
};
// Create a custom retry logic provider
SqlRetryLogicBaseProvider provider = CustomRetry.CreateCustomProvider(options);
// Assumes that connection is a valid SqlConnection object
// Set the retry logic provider on the connection instance
connection.RetryLogicProvider = provider;
// Establishing the connection will trigger retry if one of the given transient failure occurs.
connection.Open();
文档女士。