@setfilename cppinternals.info
@settitle The GNU C Preprocessor Internals
+@include gcc-common.texi
+
@ifinfo
-@dircategory Programming
+@dircategory Software development
@direntry
* Cpplib: (cppinternals). Cpplib internals.
@end direntry
@ifinfo
This file documents the internals of the GNU C Preprocessor.
-Copyright 2000, 2001, 2002 Free Software Foundation, Inc.
+Copyright 2000, 2001, 2002, 2004, 2005, 2006, 2007 Free Software
+Foundation, Inc.
Permission is granted to make and distribute verbatim copies of
this manual provided the copyright notice and this permission notice
@end ifinfo
@titlepage
-@c @finalout
@title Cpplib Internals
-@subtitle Last revised January 2002
-@subtitle for GCC version 3.1
+@versionsubtitle
@author Neil Booth
@page
@vskip 0pt plus 1filll
@c man begin COPYRIGHT
-Copyright @copyright{} 2000, 2001, 2002
+Copyright @copyright{} 2000, 2001, 2002, 2004, 2005
Free Software Foundation, Inc.
Permission is granted to make and distribute verbatim copies of
@top
@chapter Cpplib---the GNU C Preprocessor
-The GNU C preprocessor in GCC 3.x has been completely rewritten. It is
-now implemented as a library, @dfn{cpplib}, so it can be easily shared between
+The GNU C preprocessor is
+implemented as a library, @dfn{cpplib}, so it can be easily shared between
a stand-alone preprocessor, and a preprocessor integrated with the C,
C++ and Objective-C front ends. It is also available for use by other
programs, though this is not recommended as its exposed interface has
* Line Numbering:: Tracking location within files.
* Guard Macros:: Optimizing header files with guard macros.
* Files:: File handling.
-* Index:: Index.
+* Concept Index:: Index.
@end menu
@node Conventions
The convention is that functions and types that are exposed to multiple
files internally are prefixed with @samp{_cpp_}, and are to be found in
-the file @file{cpphash.h}. Functions and types exposed to external
+the file @file{internal.h}. Functions and types exposed to external
clients are in @file{cpplib.h}, and prefixed with @samp{cpp_}. For
historical reasons this is no longer quite true, but we should strive to
stick to it.
@cindex escaped newlines
@section Overview
-The lexer is contained in the file @file{cpplex.c}. It is a hand-coded
+The lexer is contained in the file @file{lex.c}. It is a hand-coded
lexer, and not implemented as a state machine. It can understand C, C++
and Objective-C source code, and has been extended to allow reasonably
successful preprocessing of assembly language. The lexer does not make
@end smallexample
This is a good example of the subtlety of getting token spacing correct
-in the preprocessor; there are plenty of tests in the test suite for
+in the preprocessor; there are plenty of tests in the testsuite for
corner cases like this.
The lexer is written to treat each of @samp{\r}, @samp{\n}, @samp{\r\n}
The tokens forming a macro's replacement list are collected by the
@code{#define} handler, and placed in storage that is only freed by
-@code{cpp_destroy}. So if a macro is expanded in our line of tokens,
-the pointers to the tokens of its expansion that we return will always
+@code{cpp_destroy}. So if a macro is expanded in the line of tokens,
+the pointers to the tokens of its expansion that are returned will always
remain valid. However, macros are a little trickier than that, since
they give rise to three sources of fresh tokens. They are the built-in
macros like @code{__LINE__}, and the @samp{#} and @samp{##} operators
@cindex spacing
@cindex token spacing
-First, let's look at an issue that only concerns the stand-alone
-preprocessor: we want to guarantee that re-reading its preprocessed
+First, consider an issue that only concerns the stand-alone
+preprocessor: it must guarantee that re-reading its preprocessed
output results in an identical token stream. Without taking special
measures, this might not be the case because of macro substitution.
For example:
and after each macro replacement, each argument replacement, and
additionally each token created by the @samp{#} and @samp{##} operators.
-Let's look at how the preprocessor gets whitespace output correct
+Now consider how the preprocessor gets whitespace output correct
normally. The @code{cpp_token} structure contains a flags byte, and one
of those flags is @code{PREV_WHITE}. This is flagged by the lexer, and
indicates that the token was preceded by whitespace of some form other
Here, two padding tokens are generated with sources the @samp{foo} token
between the brackets, and the @samp{bar} token from foo's replacement
-list, respectively. Clearly the first padding token is the one we
-should use, so our output code should contain a rule that the first
+list, respectively. Clearly the first padding token is the one to
+use, so the output code should contain a rule that the first
padding token in a sequence is the one that matters.
-But what if we happen to leave a macro expansion? Adjusting the above
+But what happens when the preprocessor leaves a macro expansion? Adjusting the above
example slightly:
@smallexample
C-style comments. For example:
@smallexample
-foo /* A long
-comment */ bar \
+foo /* @r{A long
+comment} */ bar \
baz
@result{}
foo bar baz
on.
Note that whilst we are inside the conditional block, @code{mi_valid} is
-likely to be reset to @code{false}, but this does not matter since the
+likely to be reset to @code{false}, but this does not matter since
the closing @code{#endif} restores it to @code{true} if appropriate.
Finally, since @code{_cpp_lex_direct} pops the file off the buffer stack
@cindex files
Fairly obviously, the file handling code of cpplib resides in the file
-@file{cppfiles.c}. It takes care of the details of file searching,
+@file{files.c}. It takes care of the details of file searching,
opening, reading and caching, for both the main source file and all the
headers it recursively includes.
applies. This may be higher up the directory tree than the full path to
the file minus the base name.
-@node Index
-@unnumbered Index
+@node Concept Index
+@unnumbered Concept Index
@printindex cp
@bye