Bugs in the NT DDK Packet Protocol Driver Sample

Programmers wishing to write NDIS protocol drivers for Windows NT have long used the Packet sample in the NT DDK as a learning tool. Some programmers have used the Packet sample as an armature to build their drivers around, adding code but leaving the structure intact. Unfortunately, the sample has bugs. This page attempts to collect warnings about the dangers lurking in the Packet sample. This information in this page is offered in the spirit of cooperation among programmers. The author hopes that he will be contacted by those who find additional bugs in the Packet sample or wish to correct inaccuracies in this page.

The latest news: This page may be obsolete

Eliyas Yakub of Microsoft has reworked the NT 4.0 Packet sample and he believes he has addressed all the bugs reported on this page. You can download the updated sample from MS Support.

Eliyas reports on his work:

I think I took care of all the bugs. I took out the cleanup routine because it was completely wrong. In the first place, it's not a good idea to reset an adapter just because an application doesn't want to use the adapter any more. Resets are expensive operation. An adapter is not a resource of one application or protocol in the system. Secondly, there was a race condition between the protocol and miniport during freeing of packet. There is no reliable way to do this in the existing architecture. There are plans to introduce new APIs to cancel a pending Recv.
I also don't like the idea of using FsContext to save the BindApapterContext. This is not generally recommended. I modified it in the W2K version but I left it as is on NT4.0.

I would like to hear about anyone's experience with the new version of the sample.

What isn't a bug

We won't list as bugs the design choices in the Packet sample that could have been done better and probably would have been done better in a production driver unless those choices harm the robustness or throughput of the driver. Under this heading come:

ways in which the application could have been better isolated from the hardware;
debatable issues of whether buffering should be done by driver or application;
trivial errors in the documentation.

The bugs themselves

Failure to complete an I/O: When a dispatch routine in the Packet sample detects a problem that will make the I/O impossible it tends to return an error status without completing the I/O, i.e. without calling IoCompleteRequest(). This happens in PacketOpen(), PacketRead(), PacketWrite() and PacketIoControl().
Misuse of stack for asynchronous operations: When a call is made to the NDIS wrapper which may not be completed before the calling function returns, it's extremely dangerous to ask the wrapper function to write into the calling function's stack. But that's what PacketOpen() does with its stack variable Medium when calling NdisOpenAdapter().
Marking an IRP pending twice: In the Cleanup dispatch routing PacketCleanup() two calls are made to IoMarkIrpPending() against the Cleanup IRP FlushIrp.
Macro misuse: In PacketCleanup() an attempt is made to go from a receive packet's list entry in the packet's ProtocolReserved to the packet itself with the use of one invocation of CONTAINING_RECORD. This wrongly telescopes the 2 enclosing levels into one, which happens to work just because the ListElement entry in PACKET_RESERVED comes first and hence has the same address as PacketReserved. If the order of structure members in PACKET_RESERVED were changed, the driver would blow up if there were any pending reads at cleanup time. The correct way to achieve this is to invoke CONTAINING_RECORD twice: the first time to get a pointer to the reserved area of the packet and the second time to get a pointer to the packet itself. The correct way can be seen in PacketReceiveIndicate(), by the way. (Thanks to Alf Ludwig for clarifications on this item.)
Inability to query OIDs: In PacketIoControl() there's a typo in which IOCTL_PROTOCOL_SET_OID is tested for twice, the test results being ORed together; one of the tests should have been for IOCTL_PROTOCOL_QUERY_OID. Because of this the driver will never ask the netcard driver for the value of an OID. (Thanks to Victor Ishikeev for noticing this bug.)
Wrong return code in ProtocolReceive: When PacketReceiveIndicate has no use for a packet it returns NDIS_STATUS_SUCCESS when it should return NDIS_STATUS_NOT_ACCEPTED. According to the chapter on NDIS Driver Lower-Edge Functions in the DDK's NetworkReference, this will slow down overall network throughput. Presumably this would be especially serious in a multi-protocol setting.
Calling app is always told WriteFile() wrote zero bytes: Due to a bug in PacketSendComplete() noticed by Skip Hanson, Irp->IoStatus.Information is zeroed. Skip recommends NdisQueryPacket(pPacket, NULL, NULL, NULL, &(pIrp->IoStatus.Information)) to fix this. In my NT protocol driver I use MmGetMdlByteCount() to show the calling app the number of bytes sent by the app (as opposed to the grand total including the framing added by the driver.) You can see this in BgpsSendCompleteHandler() in ntbgps.c in the billgPC zipfile.
No IRPs are canceled: No provision is made for cancellation of IRPs (as opposed to the cleanup dispatch routine.) This means that a thread that issues a CloseHandle(), or terminates, or whose process terminates, while connected to the driver with a read outstanding is likely to hang.
But why is this so? This is a point that isn't covered adequately by what might be called the standard sources (the DDK's documentation and Art Baker's The Windows NT Device Driver Book), so we'll try to be thorough here. In what follows, we are indebted to Jamie Hanrahan's postings in comp.os.ms-windows.programmer.nt.kernel-mode as well as the article on IRP cancel operations in the November-December 1997 issue of The NT Insider. (Responsibility for any mistakes lies of course with this document's author.)
When all goes well, a driver's cleanup routine will be entered when a process's connection to the device's file object is severed. There are three ways to begin the process of severing the process's connection to the file object:
- CloseHandle() is issued against the last or only handle to the file object;
- The thread owning the last or only handle terminates;
- The process that created the thread owning the last or only handle terminates.
For the Packet sample's cleanup processing to be executed, it is not enough to begin the process of disconnecting the process from the file object. The cleanup routine will only be entered if all references to the corresponding file object have been eliminated. An IRP created by an I/O request that has not been completed has just such a reference. And the only way NT provides for a driver to clear a thread's references when Win32 tries to disconnect the file object is through IRP cancel routines.
The likely consequences of an application severing its connection to the Packet sample driver with an outstanding I/O (probably a read) include the following:
- the application hanging for five minutes while the I/O Manager waits for the application's references to the file object to be cleared;
- system error processing, possibly including a popup message, when the I/O Manager stops waiting;
- the loss of system memory which the I/O Manager believes cannot safely be recycled, including outstanding IRPs and data buffers.
Rick Rigby was kind enough to contribute a solution to the I/O cancel bug. You can find his code, together with some commentary warning of a danger in it, here.
Another solution to the cancel problem which complements Rigby's but has its own flaws, is in the NT driver for billgPC, the free bootp client by this page's author. One flaw is that the use of the global cancel spinlock where a private spinlock could improve overall system performance. (The normal lifespan of the driver, and the application it supports, is only a few seconds.) The second flaw is a matter of debate. In the first of the two illuminating articles on I/O cancel operations in OSR's The NT Insider the reader is strongly advised to ignore the cancel routine's IRP pointer argument and search the driver's own queue for the first IRP with a cancel flag set. This, the article claims, is the one to dequeue and cancel; the reasons for this are promised in the second installment. Unfortunately, the second part of the article lacks an explanation for this algorithm, and billgPC's driver does heed the IRP pointer argument. To see billgPC's implementation, search for BgpsCancelReadIrp in ntbgps.c in the billgPC zipfile.

Please contribute to this page!

There's no point in trying to convince you that reading this page is good for you; presumably you can decide that for yourself. But if you know something that would improve this page there are reasons for contributing to it that you may not have though of:

Exposing a bit of your knowledge here might provoke one of your colleagues into publishing an insight that's new and valuable to you.
If you're worried about helping your competitors you might want to consider whether this is a realistic fear in the realm of kernel-mode protocol drivers; how many people do this kind of thing, really?
The possibility exists that this page might influence Microsoft to improve the Packet sample. In the first week after this page was published it was downloaded to IP addresses within microsoft.com seven times.

<perin@acm.org>: home page / email

Last modified: Tue Sep 07 17:35:01 1999