WCF application hangs (WinDBG output attached)
Updated: 2023-09-27 18:28:24
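We have a problem with our WCF service that we cannot reproduce. The service sometimes stops responding to client calls. This usually happens on Mondays, after a period of inactivity.
The WCF service is self-hosted in a Windows service. The instance context mode is per call. It uses NetTcpBinding without security, and the entire WCF configuration is done in code, with no XML configuration. We have set the ServiceThrottle parameters for sessions, calls, and instances to 1024. Here is the complete ServiceHost configuration: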
ServiceThrottlingBehavior throttle;
throttle = _svcHost.Description.Behaviors.Find<ServiceThrottlingBehavior>();
if (throttle == null)
{
    // No throttling behavior registered yet: create one with all limits at 1024.
    // Note: if a behavior already exists, its limits are left unchanged.
    throttle = new ServiceThrottlingBehavior();
    throttle.MaxConcurrentCalls = 1024;
    throttle.MaxConcurrentSessions = 1024;
    throttle.MaxConcurrentInstances = 1024;
    _svcHost.Description.Behaviors.Add(throttle);
}
...
// All four binding timeouts are set to 5 seconds.
TimeSpan timeout = new TimeSpan(0, 0, 5);
NetTcpBinding binding = new NetTcpBinding(SecurityMode.None);
binding.OpenTimeout = timeout;
binding.CloseTimeout = timeout;
binding.ReceiveTimeout = timeout;
binding.SendTimeout = timeout;
// Message and buffer limits: 10485760 bytes = 10 MiB.
binding.MaxBufferSize = 10485760;
binding.MaxReceivedMessageSize = 10485760;
XmlDictionaryReaderQuotas quotas = new XmlDictionaryReaderQuotas();
binding.ReaderQuotas = quotas;
binding.ReaderQuotas.MaxStringContentLength = 10485760;
binding.ReaderQuotas.MaxArrayLength = 10000;
binding.Security.Message.ClientCredentialType = MessageCredentialType.None;
...
ServiceEndpoint endpoint = _svcHost.AddServiceEndpoint(intfType, binding, serviceBaseAddress + "/" + intfType.Name);
endpoint.Behaviors.Add(new ClientTrackerEndpointBehavior());
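The problem shows up as an exception thrown to the clients. There are 5 to 10 clients connected to the service, and every one of them gets an exception of this type (even clients running on the same machine as the service itself):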
System.ServiceModel.CommunicationException
The socket connection was aborted. This could be caused by an error processing your message or a receive timeout being exceeded by the remote host, or an underlying network resource issue. Local socket timeout was '00:00:29.9969997'.
An existing connection was forcibly closed by the remote host
StackTrace:
at System.Net.Sockets.Socket.Send(Byte[] buffer, Int32 offset, Int32 size, SocketFlags socketFlags)
at System.ServiceModel.Channels.SocketConnection.Write(Byte[] buffer, Int32 offset, Int32 size, Boolean immediate, TimeSpan timeout)
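After the exception was thrown, we tried to connect to the service manually using telnet, but that connection attempt appeared to be refused as well. Because we hit the problem on the production systems of 3 customers this Monday and could not attach the Visual Studio debugger to those systems, we created user-mode minidumps and analyzed them with WinDBG. The first thing we checked was the current values of the ServiceThrottle (only one dump is shown here, but the output is equivalent to the output from the other dumps):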
0:032> !dumpheap -type ServiceThrottle -short
01fb65a4
0:032> !do 01fb65a4
Name: System.ServiceModel.Dispatcher.ServiceThrottle
MethodTable: 70cc56f0
EEClass: 70a07ce4
Size: 40(0x28) bytes
File: C:\Windows\Microsoft.Net\assembly\GAC_MSIL\System.ServiceModel\v4.0_4.0.0.0__b77a5c561934e089\System.ServiceModel.dll
Fields:
MT Field Offset Type VT Attr Value Name
70cc572c 400314f 4 ...cher.FlowThrottle 0 instance 01fb6660 calls
70cc572c 4003150 8 ...cher.FlowThrottle 0 instance 01fb6760 sessions
71725030 4003151 c ...her.QuotaThrottle 0 instance 00000000 dynamic
70cc572c 4003152 10 ...cher.FlowThrottle 0 instance 02013110 instanceContexts
70cb653c 4003153 14 ...l.ServiceHostBase 0 instance 01fadd84 host
70cc7cf8 4003154 18 ...manceCountersBase 0 instance 02003590 servicePerformanceCounters
74246788 4003155 20 System.Boolean 1 instance 1 isActive
7423f744 4003156 1c System.Object 0 instance 01fb65cc thisLock
74242ad4 400314d 1134 System.Int32 1 static 128 DefaultMaxConcurrentCallsCpuCount
74242ad4 400314e 1138 System.Int32 1 static 800 DefaultMaxConcurrentSessionsCpuCount
0:032> !do 01fb6660
Name: System.ServiceModel.Dispatcher.FlowThrottle
MethodTable: 70cc572c
EEClass: 70a07d24
Size: 52(0x34) bytes
File: C:\Windows\Microsoft.Net\assembly\GAC_MSIL\System.ServiceModel\v4.0_4.0.0.0__b77a5c561934e089\System.ServiceModel.dll
Fields:
MT Field Offset Type VT Attr Value Name
74242ad4 4002ede 20 System.Int32 1 instance 1024 capacity
74242ad4 4002edf 24 System.Int32 1 instance 0 count
74246788 4002ee0 2c System.Boolean 1 instance 0 warningIssued
74242ad4 4002ee1 28 System.Int32 1 instance 89 warningRestoreLimit
7423f744 4002ee2 4 System.Object 0 instance 01fb6694 mutex
74232914 4002ee3 8 ...ding.WaitCallback 0 instance 01fb6640 release
00000000 4002ee4 c 0 instance 01fb66a0 waiters
7423fb08 4002ee5 10 System.String 0 instance 01fb65d8 propertyName
7423fb08 4002ee6 14 System.String 0 instance 01fb660c configName
74230f78 4002ee7 18 System.Action 0 instance 02007670 acquired
74230f78 4002ee8 1c System.Action 0 instance 02007690 released
0:032> !do 01fb6760
Name: System.ServiceModel.Dispatcher.FlowThrottle
MethodTable: 70cc572c
EEClass: 70a07d24
Size: 52(0x34) bytes
File: C:\Windows\Microsoft.Net\assembly\GAC_MSIL\System.ServiceModel\v4.0_4.0.0.0__b77a5c561934e089\System.ServiceModel.dll
Fields:
MT Field Offset Type VT Attr Value Name
74242ad4 4002ede 20 System.Int32 1 instance 1024 capacity
74242ad4 4002edf 24 System.Int32 1 instance 0 count
74246788 4002ee0 2c System.Boolean 1 instance 0 warningIssued
74242ad4 4002ee1 28 System.Int32 1 instance 560 warningRestoreLimit
7423f744 4002ee2 4 System.Object 0 instance 01fb6794 mutex
74232914 4002ee3 8 ...ding.WaitCallback 0 instance 01fb6740 release
00000000 4002ee4 c 0 instance 01fb67a0 waiters
7423fb08 4002ee5 10 System.String 0 instance 01fb66d0 propertyName
7423fb08 4002ee6 14 System.String 0 instance 01fb6708 configName
74230f78 4002ee7 18 System.Action 0 instance 020076b0 acquired
74230f78 4002ee8 1c System.Action 0 instance 020076d0 released
All of these values look fine. We then checked the thread pool and the threads:
0:032> !threadpool
CPU utilization: 0%
Worker Thread: Total: 1023 Running: 1017 Idle: 6 MaxLimit: 1023 MinLimit: 1000
Work Request in Queue: 0
--------------------------------------
Number of Timers: 4
--------------------------------------
Completion Port Thread:Total: 32 Free: 0 MaxFree: 16 CurrentLimit: 33 MaxLimit: 1000 MinLimit: 1000
0:032> !threads -special
ThreadCount: 1027
UnstartedThread: 997
BackgroundThread: 28
PendingThread: 997
DeadThread: 1
Hosted Runtime: no
PreEmptive GC Alloc Lock
ID OSID ThreadOBJ State GC Context Domain Count APT Exception
0 1 1d80 006b6518 a020 Enabled 00000000:00000000 006ac0b0 0 MTA
2 2 238c 006c1840 b220 Enabled 00000000:00000000 006ac0b0 0 MTA (Finalizer)
XXXX 4 00704f58 1019820 Enabled 00000000:00000000 006ac0b0 0 Ukn (Threadpool Worker)
4 5 21a4 00706480 3009220 Enabled 0bedaf78:0bedb5c8 006ac0b0 0 MTA (Threadpool Worker)
5 6 e8c 03de9428 100a220 Enabled 00000000:00000000 006ac0b0 0 MTA (Threadpool Worker)
7 7 634 03e0d318 3009220 Enabled 00000000:00000000 006ac0b0 0 MTA (Threadpool Worker)
8 8 1d38 03ebeb08 3009220 Enabled 00000000:00000000 006ac0b0 0 MTA (Threadpool Worker)
9 9 1808 03e4fd70 3009220 Enabled 00000000:00000000 006ac0b0 0 MTA (Threadpool Worker)
10 a 1c48 03e50d70 3009220 Enabled 00000000:00000000 006ac0b0 0 MTA (Threadpool Worker)
11 b 1d88 073be2b0 3009220 Enabled 00000000:00000000 006ac0b0 0 MTA (Threadpool Worker)
12 c 1c74 073bf2b8 3009220 Enabled 00000000:00000000 006ac0b0 0 MTA (Threadpool Worker)
13 d 1ae4 073c0dc8 3009220 Enabled 00000000:00000000 006ac0b0 0 MTA (Threadpool Worker)
14 e 1818 073c1598 3009220 Enabled 00000000:00000000 006ac0b0 0 MTA (Threadpool Worker)
15 f 1a58 073c1fa8 3009220 Enabled 00000000:00000000 006ac0b0 0 MTA (Threadpool Worker)
16 10 13e0 073c4bb8 3009220 Enabled 00000000:00000000 006ac0b0 0 MTA (Threadpool Worker)
17 11 1a3c 073c5388 3009220 Enabled 00000000:00000000 006ac0b0 0 MTA (Threadpool Worker)
18 12 1b5c 03e9ffe0 3009220 Enabled 00000000:00000000 006ac0b0 0 MTA (Threadpool Worker)
19 13 1b80 03ea04e8 3009220 Enabled 00000000:00000000 006ac0b0 0 MTA (Threadpool Worker)
20 14 900 03ea09f0 3009220 Enabled 00000000:00000000 006ac0b0 0 MTA (Threadpool Worker)
21 15 1c84 03ea0ef8 3009220 Enabled 00000000:00000000 006ac0b0 0 MTA (Threadpool Worker)
22 16 da0 03ea1400 3009220 Enabled 00000000:00000000 006ac0b0 0 MTA (Threadpool Worker)
23 17 13b0 03ea1908 3009220 Enabled 00000000:00000000 006ac0b0 0 MTA (Threadpool Worker)
24 18 18cc 03ea1e10 3009220 Enabled 00000000:00000000 006ac0b0 0 MTA (Threadpool Worker)
25 1a 1008 03ea2820 3009220 Enabled 0bfe03b8:0bfe21c4 006ac0b0 0 MTA (Threadpool Worker)
27 29 bc4 0b041588 1009220 Enabled 0bdd7c3c:0bdd99f8 006ac0b0 0 MTA (Threadpool Worker)
28 3b ad8 0af7d740 1009220 Enabled 0bf8c0d8:0bf8c1c4 006ac0b0 0 MTA (Threadpool Worker)
29 64 dd8 0ae54890 1009220 Enabled 0bdfbb9c:0bdfd9f8 006ac0b0 0 MTA (Threadpool Worker)
30 2f 440 0b03c010 1009220 Enabled 0be03bf0:0be059f8 006ac0b0 0 MTA (Threadpool Worker)
31 25 2198 0b03f080 1009220 Enabled 0bd6d5b4:0bd6f410 006ac0b0 0 MTA (Threadpool Worker)
32 63 1b9c 0ae41388 1009220 Enabled 0bdb5b14:0bdb79f8 006ac0b0 0 MTA (Threadpool Worker)
XXXX 33 2270 0b09cd20 1400 Enabled 00000000:00000000 006ac0b0 0 Ukn
XXXX 5b 1554 0ae54388 1400 Enabled 00000000:00000000 006ac0b0 0 MTA
XXXX 31 1098 0ae53978 1400 Enabled 00000000:00000000 006ac0b0 0 Ukn
XXXX 34 15c 0af7be18 1400 Enabled 00000000:00000000 006ac0b0 0 Ukn
... -> many more threads here
XXXX 403 24e4 0d85b578 1400 Enabled 00000000:00000000 006ac0b0 0 Ukn
XXXX 404 24d8 0d85ba80 1400 Enabled 00000000:00000000 006ac0b0 0 Ukn
XXXX 405 24e0 0d85bf88 1400 Enabled 00000000:00000000 006ac0b0 0 Ukn
OSID Special thread type
1 a88 DbgHelper
2 238c Finalizer
4 21a4 ThreadpoolWorker
5 e8c Timer
7 634 ThreadpoolWorker
8 1d38 ThreadpoolWorker
9 1808 ThreadpoolWorker
10 1c48 ThreadpoolWorker
11 1d88 ThreadpoolWorker
12 1c74 ThreadpoolWorker
13 1ae4 ThreadpoolWorker
14 1818 ThreadpoolWorker
15 1a58 ThreadpoolWorker
16 13e0 ThreadpoolWorker
17 1a3c ThreadpoolWorker
18 1b5c ThreadpoolWorker
19 1b80 ThreadpoolWorker
20 900 ThreadpoolWorker
21 1c84 ThreadpoolWorker
22 da0 ThreadpoolWorker
23 13b0 ThreadpoolWorker
24 18cc ThreadpoolWorker
25 1008 ThreadpoolWorker
26 1b08 Gate
27 bc4 ThreadpoolWorker
28 ad8 ThreadpoolWorker
29 dd8 ThreadpoolWorker
30 440 ThreadpoolWorker
31 2198 ThreadpoolWorker
32 1b9c ThreadpoolWorker
We were shocked when we saw this number of threads. We now suspect that our problem has something to do with it. If anybody can answer the following questions, that would be much appreciated:
- Could our assumption be correct? Or are we on the wrong track investigating the threads?
- The !threadpool command reports a very high number of running worker threads (1017 of 1023), and !threads reports a ThreadCount of 1027. Looking at the !threads output, however, it seems that only 28 threads are actually working. How can this difference be explained?
- We have a very high number of unstarted / pending threads (997 each). What is the difference between an unstarted and a pending thread? We tried to reproduce this behavior, but even when setting the minimum and maximum threads in the thread pool we do not get numbers this high. Moreover, a dump created after 2 hours of process uptime shows no unstarted or pending threads (the service was still working at that time). The process uptime of the original dump was about 14 days.
- What does the completion port thread free value of 0 mean?
- Are there any other methods / commands in WinDBG we could use to better understand our problem?
We are very unhappy with the current state of our software. Most of the information we found on this topic attributes the symptom to unclosed sessions or exhausted concurrent-call limits, but as shown above this does not seem to be our problem. Any help is very much appreciated!
We experienced a similar issue with a self-hosted WCF interface that provided a synchronous request/response web service in front of an asynchronous backend request (two one-way service calls). Early in our testing, we noticed that after a somewhat variable number of days, our service became unresponsive to new requests. After some research, we discovered that whenever the backend service (which is out of our control) did not send a response, we kept waiting indefinitely and therefore kept our client connection open.
We fixed the issue by providing a “time-to-wait” configuration value, so that we were sure to respond to the client and close the connection. We used something like the following:
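// Run the backend call on a thread pool thread.
Task processTask = Task.Factory.StartNew(() => Process(message));
// Wait returns false if the task has not completed within the configured wait time.
bool isProcessSuccess = processTask.Wait(shared.ConfigReader.SyncWebServiceWaitTime);
if (!isProcessSuccess)
{
    // handle error …
}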
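The time-to-wait idea above can be sketched in a self-contained form. This is only an illustration under assumptions: the class name BoundedWait and the helper TryRun are hypothetical and not part of our service, and the cancellation token is an addition that lets cooperative work stop once the wait has expired (Task.Wait alone does not stop the task).

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedWait
{
    // Hypothetical helper: runs 'work' on a thread pool thread and waits at
    // most 'timeout' for it. Returns true if the work finished in time.
    // If the work throws, Task.Wait rethrows it as an AggregateException.
    public static bool TryRun(Action<CancellationToken> work, TimeSpan timeout)
    {
        using (var cts = new CancellationTokenSource())
        {
            CancellationToken token = cts.Token;
            Task task = Task.Factory.StartNew(() => work(token));
            if (task.Wait(timeout))
            {
                return true;
            }
            // Timed out: the task keeps running, so ask it to stop.
            cts.Cancel();
            return false;
        }
    }

    public static void Main()
    {
        // Finishes well within the limit.
        bool fast = TryRun(_ => Thread.Sleep(10), TimeSpan.FromSeconds(2));

        // Never finishes on its own; only stops when cancellation is requested.
        bool slow = TryRun(
            token => { while (!token.IsCancellationRequested) Thread.Sleep(10); },
            TimeSpan.FromMilliseconds(100));

        Console.WriteLine("fast completed: " + fast);
        Console.WriteLine("slow completed: " + slow);
    }
}
```

Whether cancellation is useful depends on the work being able to observe the token. A blocked call that ignores it will continue to occupy its thread, which is the situation a timeout on the caller's side cannot fix by itself.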
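The following link provides information about WCF service performance counters, which may help to further determine whether calls are being closed as expected. http://blogs.microsoft.co.il/blogs/idof/archive/2011/08/11/wcf-scaling-check-your-counters.aspx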
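Alongside the performance counters, it may help to log the thread pool limits and usage from inside the process at regular intervals, so that growth over days of uptime becomes visible before the service stops responding. The following is a minimal sketch; the class name ThreadPoolSnapshot and the output format are assumptions made for illustration, while the ThreadPool methods are the standard .NET ones.

```csharp
using System;
using System.Threading;

public static class ThreadPoolSnapshot
{
    // Returns a one-line summary of thread pool limits and current usage.
    // "busy" is derived as max minus available, for worker and I/O threads.
    public static string Describe()
    {
        int minWorker, minIo, maxWorker, maxIo, freeWorker, freeIo;
        ThreadPool.GetMinThreads(out minWorker, out minIo);
        ThreadPool.GetMaxThreads(out maxWorker, out maxIo);
        ThreadPool.GetAvailableThreads(out freeWorker, out freeIo);

        return string.Format(
            "worker: busy={0} min={1} max={2}; io: busy={3} min={4} max={5}",
            maxWorker - freeWorker, minWorker, maxWorker,
            maxIo - freeIo, minIo, maxIo);
    }

    public static void Main()
    {
        // In a service this line would go to the regular log, e.g. from a timer.
        Console.WriteLine(Describe());
    }
}
```

The numbers are not identical to what !threadpool prints, but a busy worker count that approaches the maximum would point in the same direction as the dump above.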