Compiler Testing Bibliography for “Practical Testing of a C99 Compiler Using Output Comparison”

In May 2024,’s URL redirection for life is ending, so this page is moving to

This is an updated version of the bibliography from my article, “Practical Testing of a C99 Compiler Using Output Comparison,” published in Software: Practice and Experience; a pre-print is also available. The journal version of the bibliography is in reference order, and contains items discussed in the article but not directly relevant to compiler testing. The online version is sorted alphabetically, with items on topics other than compiler testing separated; it will be updated if more articles on compiler testing appear. The emphasis, as the title of the article indicates, is on practical testing of C/C++ compilers.

The literature on compiler testing is surprisingly scant. There is substantial literature on the theoretical design of compilers that would provably not need testing, but the audience for such work is largely disjoint from the audience for work on testing compilers for widely used languages with a substantial user base. There are also a number of articles on the automated generation of test code, but now that there is a substantial base of real Open Source software, this is less useful than it formerly was.

This article, and PalmSource’s testing, were firmly directed towards shipping a high-quality, but imperfect, compiler which would be of practical use to the developer community. Producing an inherently bug-free compiler for a theoretically desirable language was not an option. The goal was to catch as high a proportion of serious bugs as possible in a useful compiler for two widely used languages, C99 and C++98.
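As a rough illustration of the output-comparison idea named in the article's title (compare [McKeeman] on differential testing), the sketch below flags builds of one test program whose output differs from the majority. It is not the procedure used in the article; the function name `find_disagreements`, the build names, and the sample outputs are all hypothetical.

```python
from collections import Counter


def find_disagreements(outputs):
    """Return the names of builds whose output differs from the majority.

    `outputs` maps a hypothetical build name (a compiler plus its flags)
    to the text printed by the same test program built that way.
    """
    if not outputs:
        return []
    # Treat the most common output as the reference. On a tie the first
    # output seen wins, so tied results still need human inspection.
    reference, _ = Counter(outputs.values()).most_common(1)[0]
    return sorted(name for name, text in outputs.items() if text != reference)


# Hypothetical captured outputs; in practice these would come from
# compiling and running one test program under each configuration.
sample = {
    "reference-compiler -O0": "sum=42\n",
    "compiler-under-test -O0": "sum=42\n",
    "compiler-under-test -O2": "sum=41\n",
}
suspects = find_disagreements(sample)
print(suspects)
```

A disagreement only marks a test case for investigation: the minority output may reflect a compiler bug, but it may equally reflect undefined or implementation-defined behavior in the test program.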

The best available bibliography was over a decade old, by Dr. C.J. Burgess of the University of Bristol; it was a posting to the comp.compilers Usenet newsgroup, listed below as [Burgess]. (See now also the bibliography in [Chen, Patra, et al.].) Bailey & Davidson [Bailey & Davidson] is an academic article on the testing of function calls, somewhat similar to Lindig’s Quest [Lindig]; it contains the interesting observations that “the state-of-the-art in compiler testing is inadequate” (p. 1040), and that in their experience, the ratio of failed tests to bugs was approximately one thousand to one (p. 1041). The standard work on compiler theory is Compilers: Principles, Techniques and Tools [Aho et al.], commonly known as the Dragon book. It is a good general introduction, but had little direct relevance to our testing, except for some extra caution in including examples of spaghetti code; other standard compiler texts which were consulted, but did not have significant sections on testing, are omitted from the bibliography. A Retargetable C Compiler: Design and Implementation [Fraser & Hanson] contains a brief section on the authors’ experience with testing their compiler, with some practical advice on the importance of regression test cases; difficulties in using lcc’s regression tests for other compilers are discussed above, in the section on emulated-execution output correctness testing. An updated and alphabetized version of this bibliography will be made available online.

Compiler Testing

  1. Bailey, Mark W. and Davidson, Jack W., “Automatic Detection and Diagnosis of Faults in Generated Code for Procedure Calls”, IEEE Transactions on Software Engineering, volume 29, issue 11, 2003. An abstract is available online, as is an earlier version of the full paper.
  2. Bhattacharya, Soumyabrata, “ANSI C Test suites,” comp.compilers, 1994.
  3. Burgess, C.J., “Bibliography for Automatic Test Data Generation for Compilers,” comp.compilers, 1993.
  4. Junjie Chen, Yanwei Bai, Dan Hao, Yingfei Xiong, Hongyu Zhang, Bing Xie. “Learning to Prioritize Test Programs for Compiler Testing,” ICSE'17: 39th International Conference on Software Engineering, Buenos Aires, Argentina, May 2017.
  5. Junjie Chen, Jibesh Patra, Michael Pradel, Yingfei Xiong, Hongyu Zhang, Dan Hao, and Lu Zhang. 2019. “A Survey of Compiler Testing.” ACM Computing Surveys, to appear, 2020. (But note that the preprint refers to itself as “ACM Comput. Surv. 1, 1, Article 1 (January 2019),” with an invalid DOI.)
  6. Schloss Dagstuhl, Testing and Verification of Compilers
  7. DejaGnu, 1993-.
  8. Delta, a tool for test failure minimization, Wilkerson, Daniel and McPeak, Scott, 2003-5. Based on [Zeller]. See also [Open Source Quality Project].
  9. Alex Denisov, “System Under Test: LLVM,” 2016
  10. Dziubinski, Matt P. “C++ links: compilers - correctness”.
  11. Eide, Eric and Regehr, John, “Volatiles are miscompiled, and what to do about it,” in Proceedings of the 7th ACM international conference on Embedded software, ISBN 978-1-60558-468-3, Association for Computing Machinery 2008. A preprint is available online.
  12. Ellison, Chucky and Rosu, Grigore “Defining the Undefinedness of C,” University of Illinois technical report, 2012.
  13. Equivalence Modulo Inputs Compiler Validation Project, UC Davis
  14. Fernandez, Mary and Ramsey, Norman, “Automatic Checking of Instruction Specifications,” in Proceedings of the 19th International Conference on Software Engineering, ISBN: 0-89791-914-9, Association for Computing Machinery 1997. A preprint is available online.
  15. Fraser, Christopher and Hanson, David, A Retargetable C Compiler: Design and Implementation, ISBN: 0-8053-1670-1, Benjamin/Cummings Publishing, 1995, §19.5 pp. 531–3.
  16. Niranjan Hasabnis, Rui Qiao, and R. Sekar, 2015. “Checking correctness of code generator architecture specifications.” In Proceedings of the 13th Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO '15). IEEE Computer Society, Washington, DC, USA, 167-178.
  17. H. Jiang, Z. Zhou, Z. Ren, J. Zhang, and X. Li, “CTOS: Compiler Testing for Optimization Sequences of LLVM,” in IEEE Transactions on Software Engineering, 2021, doi: 10.1109/TSE.2021.3058671.
  18. Jones, Derek, “Who Guards the Guardians?” (a study of the coverage of the Perennial Validation Suite), 1993.
  19. Kahan, William and Sumner, Thos, et al., Paranoia Floating Point Test, 1983-5.
  20. P. Kreutzer, S. Kraus and M. Philippsen, “Language-Agnostic Generation of Compilable Test Programs,” 2020 IEEE 13th International Conference on Software Testing, Validation and Verification (ICST), Porto, Portugal, 2020, pp. 39-50, doi: 10.1109/ICST46399.2020.00015.
  21. lcc, A Retargetable Compiler for ANSI C; described in A Retargetable C Compiler: Design and Implementation, Hanson, David R. and Fraser, Christopher W., ISBN: 0-8053-1670-1, Benjamin/Cummings Publishing 1995.
  22. Lindig, Christian, “Random Testing of the Translation of C Function Calls”, Proceedings of the Sixth International Workshop on Automated Debugging, ISBN 1-59593-050-7, Association for Computing Machinery 2005.
  23. Nuno P. Lopes, Juneyoung Lee, Chung-Kil Hur, Zhengyang Liu, and John Regehr. 2021. “Alive2: Bounded Translation Validation for LLVM.” Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation (PLDI ’21). (An archived copy is available; the web server sometimes fails to respond.) See also the related blog posts by John Regehr and Nuno P. Lopes.
  24. Haoyang Ma, “A Survey of Modern Compiler Fuzzing,”
  25. McKeeman, William M. “Differential Testing for Software,” Digital Technical Journal, Vol. 10 No. 1, 1998.
  26. Modena Test++ Suite
  27. mttd (Reddit), “How do you test compiler projects?”, 2023
  28. George C. Necula, “Translation Validation for an Optimizing Compiler”
  29. Open Source Quality Project
  30. Perennial Validation Suites
  31. Plum Hall C and C++ Validation Test Suites
  32. Regehr, John, “Embedded in Academia: A Critical Look at the SCADE Compiler Verification Kit,” blog posting, 2011.
  33. Regehr, John “Are Compilers Getting More or Less Reliable?” blog posting, 2013.
  34. Regehr, John “Guidelines for Research on Finding Bugs” blog posting, 2013.
  35. John Regehr, Yang Chen, Pascal Cuoq, Eric Eide, Chucky Ellison, and Xuejun Yang, “Test-Case Reduction for C Compiler Bugs” (C-Reduce), in Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2012), Beijing, China, June 2012.
  36. Richard Schumi and Jun Sun, “SpecTest: Specification-Based Compiler Testing.” In: Guerra, E., Stoelinga, M. (eds) Fundamental Approaches to Software Engineering. FASE 2021. Lecture Notes in Computer Science, vol 12649. Springer, Cham. 2021
  37. Sheridan, Flash, “Practical Testing of a C99 Compiler Using Output Comparison,” Software: Practice and Experience, 2007. A pre-print is available online, as is a list of bugs discovered using the techniques in the article.
  38. Small Device C Compiler (SDCC), Dutta, Sandeep et al., 1999-.
  39. Zhendong Su et al., Equivalence Modulo Inputs Compiler Validation Project, University of California, Davis, 2014–.
  40. Chengnian Sun, Vu Le, Qirun Zhang, and Zhendong Su, “Toward Understanding Compiler Bugs in GCC and LLVM,” Proceedings of ISSTA 2016, Saarbrücken, Germany, 2016. The source code and dataset, and a pre-print, are available online.
  41. Y. Tang, H. Jiang, Z. Zhou, X. Li, Z. Ren and W. Kong, “Detecting Compiler Warning Defects Via Diversity-Guided Program Mutation,” in IEEE Transactions on Software Engineering, doi: 10.1109/TSE.2021.3119186.
  42. Yixuan Tang, Zhilei Ren, Weiqiang Kong, He Jiang, “Compiler Testing: A Systematic Literature Analysis,” Frontiers of Computer Science 14 (2020). (2018 preprint).
  43. Tydeman, Fred, C99 FPCE Test Suite, 1995-2006.
  44. Vallat, Miod “Compilers in OpenBSD,” openbsd-misc posting 2013.
  45. Xi Wang, Nickolai Zeldovich, M. Frans Kaashoek, and Armando Solar-Lezama, “Towards optimization-safe systems,” SOSP'13: The 24th ACM Symposium on Operating Systems Principles, 2013.
  46. Xuejun Yang, Random Testing of Open Source C Compilers, doctoral thesis, The University of Utah, December 2014.
  47. Xuejun Yang, Yang Chen, Eric Eide, and John Regehr, “Finding and Understanding Bugs in C Compilers,” Proceedings of the 2011 ACM SIGPLAN Conference on Programming Language Design and Implementation. A preprint is available online.
  48. Zeller, A.: “Yesterday, my program worked. Today, it does not. Why?”, Software Engineering - ESEC/FSE'99: 7th European Software Engineering Conference, ISSN 0302-9743, volume 1687 of Lecture Notes in Computer Science, pp. 253-267, 1999.
  49. Qirun Zhang, Chengnian Sun, and Zhendong Su, “Skeletal Program Enumeration for Rigorous Compiler Testing,” in Proceedings of PLDI, Barcelona, Spain, June 2017.

Source Code Useful for Compiler Testing (Primarily C/C++)


Copyright © 2002-2007, Access Systems Americas, Inc. PalmSource, Palm OS and Palm Powered, and certain other trade names, trademarks and logos are trademarks which may be registered in the United States, France, Germany, Japan, the United Kingdom and other countries, and are either owned by PalmSource, Inc. or its affiliates, or are licensed exclusively to PalmSource, Inc. by Palm Trademark Holding Company, LLC. All other brands, trademarks and service marks used herein are or may be trademarks of, and are used to identify other products or services of, their respective owners. All rights reserved. Copyright © 2008-2017, Flash (K.J.) Sheridan.