use strict;

=head1 NAME

dlldoc.pl - Report which Microsoft DLL entrypoints are undocumented, and which of those are imported by EXEs

=head1 SYNOPSIS

 dlldoc.pl [options]            sends undocumented entrypoint report to STDOUT
 dlldoc.pl [options] exe-files  sends undocumented import report to STDOUT

The output of the first form is used as the -u option for the second form.

 Options:
   -v  Just print the version number of the program
   -B  Borland utility for dumping (one of B or M is mandatory)
   -M  Microsoft utility for dumping (one of B or M is mandatory)
   -u  undocumented entrypoints report file (mandatory for EXE import report)
   -d  directory for dump utility if not in path
   -h  comma-separated list of top level header directories covering both
       SDK and DDK (mandatory for undocumented entrypoint report)

=head1 REQUIRES

Win32::API, Getopt::Long, Pod::Usage, IO::File, File::Find, Fatal, Env

=head1 DESCRIPTION

B<dlldoc.pl> is a means for measuring the extent to which certain
executables that run on Win32 operating systems import undocumented
entrypoints in DLLs belonging to the OS.  To be candid, it is motivated
by questions about whether applications written by Microsoft use OS
facilities unknown to programmers who use publicly available
documentation.

Running the program to generate the undocumented entrypoint report (see
L<"SYNOPSIS">) means surveying the program development environment on a
machine and comparing it to the DLLs that are part of the OS on that
machine.

Running the program to generate the undocumented import report means
surveying specified application executables (EXEs and DLLs) on a machine
(not necessarily the same machine as the one where the entrypoint report
was run) and comparing the executables with the output of the entrypoint
report.

For the purpose of this program a function is documented if it can be
found in a Microsoft header file, which is a much weaker standard than
e.g. inclusion in help files.

For the entrypoint output to be interesting, it must come from a machine
with both SDK and DDK header files that are compatible with the OS DLLs
on the same machine.

For the import report to be interesting, the executables it covers must
not be significantly newer than the development environment; otherwise
what the report finds may be due to the documentation having improved in
the meantime.

It is also pointless to use this program to examine the imports of
executables that are really parts of the OS.  (Of course, where to draw
that line is a matter of debate.)

=head1 BUGS

The fact that this program does not find an executable using a given
undocumented entrypoint is not I<proof> that the executable does not use
the entrypoint.  This is because it is always possible for an executable
to bind to the entrypoint on the fly using C<GetProcAddress>.

This program has not been tested with any versions of the Microsoft
utility dumpbin more recent than VC++ 4.2.

=head1 AUTHOR, LICENSE

Copyright (c) Lew Perin E<lt>perin@acm.orgE<gt>.
Released under the GNU Public License.

=cut

use Win32::API;
use Getopt::Long;
use Pod::Usage;
use Env qw(SystemRoot OS);
use IO::File;
use File::Find;
use Fatal qw(IO::File::new Win32::API::new);

my %opts;                       # command-line options
my %dump;                       # aspects of dumping DLLs and EXEs
my (%documentedFunctions, %undocumentedFunctions);
my ($GetFileVersionInfoSize, $GetFileVersionInfo, $VerQueryValue); # MS APIs
my $version = '1.0';

&handleArgs;
eval {
    if (scalar @ARGV) {         # Remaining args are exes to check
        &readUndocumentedFunctions;
        for (map { glob } @ARGV) { &checkExe($_); }
    } else {
        my $dllDir = ($OS eq 'Windows_NT')
            ? "$SystemRoot\\system32" : "$SystemRoot\\system";
        &connectToAPI;
        find(sub { checkHeaderFile($File::Find::name) if -f and /\.h$/i },
             split(',', $opts{h}));
        find(sub { checkDll($File::Find::name) if -f and /\.dll$/i }, $dllDir);
    }
};
print STDERR "Aborting program:\n$@" if $@;
exit 0;                         # Done!

=head1 checkDll

We want to avoid false positives; there are certainly enough true
positives.  So we discount sightings of undocumented functions that
arise merely because of the ASCII/Unicode suffix (A/W).  What makes
things a bit more complex is the MS habit of doubling the entry points
(at least in ntdll.dll) by calling the same function NtXxx and ZwXxx.
This makes it necessary to cache the candidate undocumented function
names in C<%candidates> rather than just putting them out on the spot.

=cut

sub checkDll {
    my $dll = shift;
    $dll =~ tr{/}{\\};
    return unless &isOsDll($dll);
    my %candidates = ();
  LINE:
    for (`$dump{cmd} $dump{export}->{option} $dll`) {
        if (/$dump{export}->{pattern}/) {
            my ($ordinal, $name) = ($1, $2);
            # Skip the standard DLL housekeeping entrypoints.
            for (qw(DllEntryPoint DllMain LibMain WEP)) {
                next LINE if $name eq $_
            }
            unless (defined $documentedFunctions{$name}) {
                # Discount names that differ from a documented one
                # only by the A/W suffix.
                if ($name =~ /^(.*)[AW]$/) {
                    next if defined $documentedFunctions{$1};
                } else {
                    next if (defined $documentedFunctions{$name . 'A'} ||
                             defined $documentedFunctions{$name . 'W'});
                }
                # Discount NtXxx if ZwXxx is documented, and vice versa.
                if ($name =~ /^Nt(.+)$/) {
                    next if defined $documentedFunctions{'Zw' . $1};
                } elsif ($name =~ /^Zw(.+)$/) {
                    next if defined $documentedFunctions{'Nt' . $1};
                }
                $candidates{$name} = sprintf('%d', $ordinal); # trim leading zeroes
            }
        }
    }
    my $nwinners = 0;
    for (sort keys %candidates) {
        unless ((/^Nt(.+)$/ && defined $candidates{'Zw' . $1}) ||
                (/^Zw(.+)$/ && defined $candidates{'Nt' . $1})) {
            print "$dll\n" unless $nwinners++;   # DLL name once, before first hit
            print "\t$_\t$candidates{$_}\n";
        }
    }
}

=head2 isOsDll

We do not want to belabor Microsoft with the lack of documentation of
functions in DLLs MS did not designate as part of the operating system,
so we look here at the version information in the DLL and report
accordingly.

=cut

sub isOsDll {
    my $dllName = shift;
    return 0 if lc($dllName) =~ /psxdll\.dll$/; # MS need not document POSIX!
    my $cdllName = pack("a" .
                        length $dllName, $dllName);
    my $cversionValueLen = pack('L', 0);
    my $versionInfoLen = $GetFileVersionInfoSize->Call($cdllName, pack('L', 0));
    my $cversionInfo = ' ' x ($versionInfoLen + 1);
    if ($GetFileVersionInfo->Call($cdllName, 0, $versionInfoLen, $cversionInfo)) {
        my $cpversionValue = pack('L', 0);
        # Find the language/charset pair of the first translation.
        if ($VerQueryValue->Call($cversionInfo,
                                 pack('a*', "\\VarFileInfo\\Translation"),
                                 $cpversionValue, $cversionValueLen) &&
            (unpack('L', $cversionValueLen) >= 4)) {
            my $langAndCharset =
                sprintf("%04x%04x",
                        unpack('SS', unpack('p', $cpversionValue)));
            # Look up the ProductName string for that translation.
            if ($VerQueryValue->Call($cversionInfo,
                                     pack('a*',
                                          "\\StringFileInfo\\$langAndCharset" .
                                          "\\ProductName"),
                                     $cpversionValue, $cversionValueLen)) {
                my $unicodeProdName =
                    unpack("a" . unpack('L', $cversionValueLen),
                           unpack('p', $cpversionValue));
                return 1 if sprintf("%s", $unicodeProdName) =~
                    /Microsoft\(R\) Windows/;
            }
        }
    }
    0;
}

=head1 checkHeaderFile

We do not want to look at every last file that comes with a C++
compiler.  We really just want to look at the DKs: DDK, SDK.
Unfortunately that would cause lots of false positives for undocumented
functions, as there are functions in MS Win32 OS DLLs that mimic
standard C library functions.  (Arguably these would not be false
positives, but they are not the positives we are after.)  So we fudge
when it comes to the variable C<$msHdr>, the switch that should really
mean "this header file is from Microsoft": it is also switched on for
any header whose file or directory name begins with C<str>, C<std>, or
C<ctype>, optionally preceded by an underscore.

We separate each line into code and comments so we can check the latter
for the Microsoft signature, but we recombine the code and comments
before we search for the pattern that signifies function documentation,
for sometimes the documentation is only in the comments.

=cut

sub checkHeaderFile {
    my $file = shift;
    $file =~ tr{/}{\\};
    my $msHdr = ($file =~ /^.*\\_?(str|std|ctype)/i) ? 1 : 0;
    my $inComment = 0;          # currently in a multiline comment?
    my ($lineCode, $lineComments);
    for (new IO::File($file, 'r')->getlines) {
        &threshCppLine($_, \$inComment, \$lineComments, \$lineCode);
        $msHdr = 1 if (!$msHdr &&
                       ($lineComments =~ /Copyright.*Microsoft|WINSOCK/i));
        my $lineDocumentation = "$lineCode $lineComments";
        next if $lineDocumentation =~ /^\s*$/;
        if ($msHdr) {
            # Anything that looks like "name(" counts as documentation.
            while ($lineDocumentation =~ /_*(\w+)\s*\(/) {
                $documentedFunctions{$1} = 1;
                $lineDocumentation = $';
            }
        }
    }
}

=head2 threshCppLine

Separate a line of C[++] code into code and comments, maintaining a
state variable that tells whether we are within a multiline comment.

=cut

sub threshCppLine {
    my ($line, $inCommentRef, $commentsRef, $codeRef) = @_;
    $$commentsRef = '';
    $$codeRef = '';
    until ($line =~ /^\s*$/) {
        if ($$inCommentRef) {
            if ($line =~ m{^(.*?)(\*\/)}) {     # comment closes on this line
                $$commentsRef .= " $1";
                $line = $';
                $$inCommentRef = 0;
            } else {                            # whole rest is comment
                $$commentsRef .= " $line";
                $line = '';
            }
        } else {
            if ($line =~ m{^(.*?)(\/\/|\/\*)}) {
                $$codeRef .= " $1";
                if ($2 eq '//') {               # comment runs to end of line
                    $$commentsRef .= " $'";
                    $line = '';
                } else {                        # multiline comment opens
                    $line = $';
                    $$inCommentRef = 1;
                }
            } else {                            # whole rest is code
                $$codeRef .= " $line";
                $line = '';
            }
        }
    }
}

=head2 readUndocumentedFunctions

Bring the undocumented function output of a previous run into
C<%undocumentedFunctions>.

=cut

sub readUndocumentedFunctions {
    my $currDllRef;
    for (new IO::File($opts{u}, 'r')->getlines) {
        if (/^\S+\\(\w+)\.dll$/i) {             # DLL path line
            my $key = lc $1;
            $undocumentedFunctions{$key} = {}
                unless defined $undocumentedFunctions{$key};
            $currDllRef = $undocumentedFunctions{$key};
        } elsif (/^\t(\w+)\t(\d+)$/) {          # "\tname\tordinal" line
            $currDllRef->{$1} = $2
        }
    }
}

=head2 connectToAPI

We need to connect to some Win32 API functions to do our work.

=cut

sub connectToAPI {
    $GetFileVersionInfoSize =
        new Win32::API('version', 'GetFileVersionInfoSizeA', ['P', 'P'], 'N');
    $GetFileVersionInfo =
        new Win32::API('version', 'GetFileVersionInfoA',
                       ['P', 'N', 'N', 'P'], 'I');
    $VerQueryValue =
        new Win32::API('version', 'VerQueryValueA', ['P', 'P', 'P', 'P'], 'I');
}

=head2 handleArgs

Extract the information that depends on how we are invoked.

=cut

sub handleArgs {
    GetOptions(\%opts, "B", "M", "u=s", "d=s", "h=s", "v") or pod2usage(1);
    if ($opts{B}) {             # BC5.5
        $dump{cmd} = 'tdump';
        $dump{export}->{option} = '-ee';
        # The optional third group swallows a stdcall suffix such as @12.
        $dump{export}->{pattern} =
            qr{EXPORT ord:(\d{4})='_?([^'@]+)(@[^']+)?'};
        $dump{import}->{option} = '-em';
        $dump{import}->{combinedFuncPattern} =
            qr{IMPORT:\s+(\w+)\.dll.*'(\w+)'};
        $dump{import}->{combinedOrdinalPattern} =
            qr{IMPORT:\s+(\w+)\.dll(\d+)};
    } elsif ($opts{M}) {        # Microsoft
        $dump{cmd} = 'dumpbin';
        $dump{export}->{option} = '/exports';
        $dump{export}->{pattern} = qr{^\s+(\d+)\s+\w+\s+(\w+)\s+\(.*\)};
        $dump{import}->{option} = '/imports';
        $dump{import}->{funcNameOnlyPattern} =
            qr{^\s*[0-9a-fA-F]+\s*[0-9a-fA-F]+\s+(\w+)};
        # Anchor at end of line; the original \$ demanded a literal dollar sign.
        $dump{import}->{dllNameOnlyPattern} = qr{^\s*(\w+)\.dll\s*$}i;
        $dump{import}->{ordinalOnlyPattern} = qr{^\s*Ordinal\s+(\d+)};
    } elsif ($opts{v}) {
        print "dlldoc.pl version $version\n";
        exit 0;
    } else {
        pod2usage(1);
    }
    pod2usage(1) if !$opts{u} and (scalar(@ARGV) or !$opts{h});
    $dump{cmd} = $opts{d} . "\\" . $dump{cmd} if defined $opts{d};
}

=head1 checkExe

Here we see which among the functions imported by an exe can be found
in our store of undocumented ones.  The only complicated case is when
the exe imports functions from a particular DLL by ordinal rather than
name; in this case we make a hash equivalent to the subhash of
undocumented functions for that DLL, but with ordinal rather than name
as the key.
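
As a minimal sketch of that reversal (the DLL name, function names, and
ordinals below are made up for illustration, and C<%undoc> merely stands
in for C<%undocumentedFunctions>):

    use strict;
    use warnings;

    # Hypothetical subhash for one DLL: function name => ordinal.
    my %undoc = (
        foodll => { FooSecretA => 12, FooSecretB => 40 },
    );

    # Reverse it so that an import by ordinal can be resolved to a name.
    my %byOrdinal = reverse %{ $undoc{foodll} };

    print "$byOrdinal{40}\n";    # prints "FooSecretB"

If two names ever shared an ordinal, only one of them would survive the
reversal; the same holds for the loop in C<buildOrdinalsHash>.
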

=cut

sub checkExe {
    my $exeName = shift;
    print "$exeName\n";
    my ($dllName, %undocDllOrdinals) = ('', ());
    my ($importOrdinal, $importFunc, $importDll);
    for (`$dump{cmd} $dump{import}->{option} $exeName`) {
        if ($opts{B}) {         # BC5.5
            if (/$dump{import}->{combinedFuncPattern}/i) {
                # Keys of %undocumentedFunctions are lowercase DLL names.
                ($importDll, $importFunc) = (lc($1), $2);
                if (defined $undocumentedFunctions{$importDll}->{$importFunc}) {
                    print "\t$importDll\t$importFunc\n";
                }
            } elsif (/$dump{import}->{combinedOrdinalPattern}/i) {
                ($importDll, $importOrdinal) = (lc($1), $2);
                if ($importDll ne $dllName) {
                    $dllName = $importDll;
                    &buildOrdinalsHash(\%undocDllOrdinals, $dllName);
                }
                print "\t$importDll\t$undocDllOrdinals{$importOrdinal}\n"
                    if (defined $undocDllOrdinals{$importOrdinal});
            }
        } else {                # MS
            if (/$dump{import}->{dllNameOnlyPattern}/i) {
                $importDll = $dllName = lc $1;
                &buildOrdinalsHash(\%undocDllOrdinals, $dllName);
            } elsif (/$dump{import}->{funcNameOnlyPattern}/i) {
                print "\t$importDll\t$1\n"
                    if (defined $undocumentedFunctions{$importDll}->{$1});
            } elsif (/$dump{import}->{ordinalOnlyPattern}/i) {
                # Imports by ordinal are resolved through the reversed hash.
                print "\t$importDll\t$undocDllOrdinals{$1}\n"
                    if (defined $undocDllOrdinals{$1});
            }
        }
    }
}

=head2 buildOrdinalsHash

Here we build the reversed hash yielding the function names for
undocumented ordinals.

=cut

sub buildOrdinalsHash {
    my ($undocDllOrdinalsRef, $dllName) = @_;
    %$undocDllOrdinalsRef = ();
    while (my ($undocName, $undocOrdinal) =
               each %{$undocumentedFunctions{$dllName}}) {
        $undocDllOrdinalsRef->{$undocOrdinal} = $undocName;
    }
}
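
=head1 REPORT FORMAT

The entrypoint report that C<checkDll> prints and that
C<readUndocumentedFunctions> reads back is plain text: a line holding a
DLL path, followed by one tab-indented line per undocumented function
giving its name and ordinal.  The following is only a sketch of parsing
such a report, not the program's own code path; the sample DLL path,
function names, and ordinals are invented for illustration, and
C<%undoc> stands in for C<%undocumentedFunctions>.

    use strict;
    use warnings;

    # Invented sample report, one element per line of the file.
    my @report = (
        "C:\\WINNT\\system32\\foodll.dll\n",
        "\tFooSecretA\t12\n",
        "\tFooSecretB\t40\n",
    );

    my (%undoc, $current);
    for (@report) {
        if (/^\S+\\(\w+)\.dll$/i) {                  # DLL path line
            $current = $undoc{ lc $1 } ||= {};
        }
        elsif (/^\t(\w+)\t(\d+)$/ and $current) {    # "\tname\tordinal"
            $current->{$1} = $2;
        }
    }

    # prints "FooSecretA,FooSecretB"
    print join(',', sort keys %{ $undoc{foodll} }), "\n";

Keying the outer hash on the lowercased base name of the DLL lets the
import pass look a DLL up regardless of how the dump utility spells it.

=cut
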