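I. Background
1. The story
A friend on WeChat reached out to me, saying that his program deployed on Windows runs fine in Debug mode but crashes in Release mode, and that everything works again when the program is switched to .NET 6. He was puzzled and asked me to take a look. Haha, for this kind of problem, just capture a dump and analyze it.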
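II. Crash analysis
1. Why did it crash
Anyone who has analyzed crashing programs knows that, whether the crash is managed or unmanaged, the first step is the !analyze -v command. The simplified output is as follows: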
0:000> !analyze -v
*******************************************************************************
*                                                                             *
*                        Exception Analysis                                   *
*                                                                             *
*******************************************************************************

CONTEXT:  (.ecxr)
rax=0000000000000004 rbx=000001e34b283ec0 rcx=0000000000000228
rdx=0000000000000000 rsi=000001e34ac2f4e0 rdi=000001e34ab58e70
rip=00007ff95ac53659 rsp=0000007735d7e1c0 rbp=0000007735d7e1e0
 r8=0000000000000000  r9=000001e3464ba1c0 r10=0000000000000228
r11=0000000000000228 r12=0000000000000000 r13=000001e34880eae8
r14=000001e34ab58e70 r15=0000000000000008
iopl=0 nv up di pl nz na pe nc
cs=0000 ss=0000 ds=0000 es=0000 fs=0000 gs=0000 efl=00000000
System_Private_CoreLib!System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw+0x39:
00007ff9`5ac53659 cc int 3
Resetting default scope

EXCEPTION_RECORD:  (.exr -1)
ExceptionAddress: 00007ff95ac53659 (System_Private_CoreLib!System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw+0x0000000000000039)
   ExceptionCode: e0434f4d (CLR exception)
  ExceptionFlags: 00000000
NumberParameters: 0
...
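As a side note on the ExceptionCode field in the output above: the value e0434f4d is a 0xE0 prefix byte followed by three bytes that happen to be printable ASCII. The sketch below only does that byte arithmetic; the function name decode_exception_code is a hypothetical helper for illustration, not part of WinDbg or SOS.

```python
def decode_exception_code(code: int) -> tuple[int, str]:
    """Split a 32-bit exception code into its top byte and a 3-character ASCII tag."""
    prefix = (code >> 24) & 0xFF                    # top byte of the code
    tag_bytes = (code & 0xFFFFFF).to_bytes(3, "big")  # remaining three bytes
    tag = tag_bytes.decode("ascii", errors="replace")
    return prefix, tag


# The code reported by !analyze -v in this dump.
prefix, tag = decode_exception_code(0xE0434F4D)
print(f"prefix=0x{prefix:02X}, tag={tag!r}")  # prefix=0xE0, tag='COM'
```

Seeing the 'COM' tag is a quick sanity check that the record is the one WinDbg labels as a CLR exception, before moving on to the managed-side commands.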
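From ExceptionCode: e0434f4d (CLR exception) in the output, this is a classic managed exception. Since it is a managed exception, the problem is fairly simple: use !t to find out which managed thread threw it. The output is as follows: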
0:000> !t
ThreadCount: 15
UnstartedThread: 0
BackgroundThread: 11
PendingThread: 0
DeadThread: 3
Hosted Runtime: no
                                                                                                            Lock
 DBG   ID     OSID ThreadOBJ           State GC Mode     GC Alloc Context                  Domain           Count Apt Exception
   0    1     81d8 000001E3464BA1C0    a6028 Preemptive  000001E34ABA2340:000001E34ABA3D30 000001e347fc40b0 -00001 STA Prism.Ioc.ContainerResolutionException 000001e34a986608
   6    2     448c 000001E34803A440    2b228 Preemptive  000001E34A876980:000001E34A8784B0 000001e347fc40b0 -00001 MTA (Finalizer)
...
From Prism.Ioc.ContainerResolutionException in the output, this appears to be related to Prism. Next, use the !pe command to look at the call stack details.
0:000> !pe
Exception object: 000001e34a986608
Exception type: Prism.Ioc.ContainerResolutionException
Message: An unexpected error occurred while resolving 'xxx.Views.LoginWindow'
InnerException: Unity.ResolutionFailedException, Use !PrintException 000001E34A986228 to see more.
StackTrace (generated):
    SP               IP               Function
    0000007735D668E0 00007FF95A64DEC8 Prism_Unity_Wpf!Prism.Unity.UnityContainerExtension.Resolve(System.Type, System.ValueTuple`2<System.Type,System.Object>[])+0x2a8
    0000007735D7DC60 00007FF95A64DBFD Prism_Unity_Wpf!Prism.Unity.UnityContainerExtension.Resolve(System.Type)+0x3d
    0000007735D7DCA0 00007FF95A64DB88 Prism!Prism.Ioc.IContainerProviderExtensions.Resolve[[System.__Canon, System.Private.CoreLib]](Prism.Ioc.IContainerProvider)+0x48
    0000007735D7DCF0 00007FF95A956742 xxx!xxx.App.InitializeShell(System.Windows.Window)+0x42
    0000007735D7DD40 00007FF959B21148 Prism_Wpf!Prism.PrismApplicationBase.Initialize()+0x208
    0000007735D7DDA0 00007FF959B20F17 xxx!xxx.App.<>n__0()+0x17
    ....
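From the output, this is not the original exception; in other words, there are inner exceptions beneath it, and only the innermost one reveals the root cause. After drilling down layer by layer, I finally found the original exception: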
0:000> !PrintException /d 000001E34A97E940
Exception object: 000001e34a97e940
Exception type: System.PlatformNotSupportedException
Message: System.IO.Ports is currently only supported on Windows.
InnerException: <none>
StackTrace (generated):
    SP               IP               Function
    0000007735D7B580 00007FF95A9588E7 System_IO_Ports!System.IO.Ports.SerialPort.GetPortNames()+0x47
    0000007735D7B5C0 00007FF95A958859 xxx!xxx.ViewModels.LoginWindowViewModel.RefreshComs()+0x19
    0000007735D7B600 00007FF95A957FBC xxx!xxx.ViewModels.LoginWindowViewModel..ctor()+0x14c
    0000007735D7B9D0 0000000000000000 System_Private_CoreLib!System.RuntimeMethodHandle.InvokeMethod(System.Object, Void**, System.Signature, Boolean)+0x46a770b0
    0000007735D7B9D0 00007FF9B8C03106 System_Private_CoreLib!System.Reflection.MethodBaseInvoker.InvokeWithNoArgs(System.Object, System.Reflection.BindingFlags)+0x36

StackTraceString: <none>
HResult: 80131539
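From the output, the platform-not-supported exception is thrown by the GetPortNames() method, which is puzzling, since the program is running on Windows.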
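The layer-by-layer drill-down from the outer exception to the innermost one is mechanical: read the InnerException line of each !pe result and feed its address to the next !PrintException. A minimal sketch of that one step, assuming the output has been saved as text in the format shown earlier; next_inner_exception is a hypothetical helper, not an SOS API.

```python
import re
from typing import Optional, Tuple

# Matches the "InnerException:" line as it appears in the !pe output shown earlier.
_INNER_RE = re.compile(
    r"InnerException:\s+(?P<type>\S+?),\s+Use !PrintException\s+(?P<addr>[0-9A-Fa-f]+)"
)


def next_inner_exception(pe_output: str) -> Optional[Tuple[str, str]]:
    """Return (type name, object address) of the inner exception, or None if there is none."""
    match = _INNER_RE.search(pe_output)
    if match is None:
        # Covers "InnerException: <none>", i.e. the innermost exception was reached.
        return None
    return match.group("type"), match.group("addr")


sample = """Exception type: Prism.Ioc.ContainerResolutionException
InnerException: Unity.ResolutionFailedException, Use !PrintException 000001E34A986228 to see more."""

inner = next_inner_exception(sample)
if inner is not None:
    # Build the next command to run in the debugger.
    print(f"!PrintException /d {inner[1]}")
```

Repeating this until the helper returns None lands on the exception whose InnerException is <none>, which in this dump is the System.PlatformNotSupportedException.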
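2. Why is the platform not supported
To understand the PlatformNotSupportedException, the only option is to look at the relevant source code, so I opened the method in dnSpy.
[Screenshot: dnSpy decompilation of SerialPort.GetPortNames()]
From the decompiled code, the method is essentially an empty stub. Searching online for this exception suggests that this friend needs to upgrade or downgrade the System.IO.Ports package version.
[Screenshot: the Microsoft Q&A thread]
Full link: https://learn.microsoft.com/en-us/answers/questions/1621393/system-io-ports-only-availble-on-windows-but-im-us
I was quite excited at first, expecting something like multiple threads touching a non-volatile variable and causing Debug and Release to behave differently, but it turned out to be just this. Sigh!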
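III. Summary
This failure was relatively simple; for us veterans it is as easy as 1+1. But we all started out as beginners, so this article makes good practice material for newcomers.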
