
The ANSI standard has made many small changes and additions to basic types and
expressions. There are now `signed` and `unsigned` forms of all
integer types, and notations for unsigned constants and hexadecimal character
constants. Floating-point operations may be done in single precision; there is
also a `long double` type for extended precision. String constants may
be concatenated at compile time. Enumerations have become part of the language,
formalizing a feature of long standing. Objects may be declared `const`,
which prevents them from being changed. The rules for automatic coercions
among arithmetic types have been augmented to handle the richer set of types.

At least the first 31 characters of an internal name are significant. For
function names and external variables, the number may be less than 31, because
external names may be used by assemblers and loaders over which the language
has no control. For external names, the standard guarantees uniqueness only for
6 characters and a single case. Keywords like `if`, `else`,
`int`, `float`, etc., are reserved: you can't use them as
variable names. They must be in lower case.

It's wise to choose variable names that are related to the purpose of the variable, and that are unlikely to get mixed up typographically. We tend to use short names for local variables, especially loop indices, and longer names for external variables.

Type | Meaning |
---|---|
`char` | a single byte, capable of holding one character in the local character set |
`int` | an integer, typically reflecting the natural size of integers on the host machine |
`float` | single-precision floating point |
`double` | double-precision floating point |

In addition, there are a number of qualifiers that can be applied to these
basic types. `short` and `long` apply to integers:

    short int sh;
    long int counter;

The word `int` can be omitted in such declarations.

The intent is that `short` and `long` should provide different
lengths of integers where practical; `int` will normally be the natural
size for a particular machine. `short` is often 16 bits long, and
`int` either 16 or 32 bits. Each compiler is free to choose appropriate sizes
for its own hardware, subject only to the restriction that `short`s
and `int`s are at least 16 bits, `long`s are at least 32 bits, and
`short` is no longer than `int`, which is no longer than `long`.
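The size rules above can be checked against the constants in `<limits.h>`. The following is a minimal sketch; the function name `integer_sizes_ok` is a hypothetical helper introduced here for illustration, not part of any library.

```c
#include <limits.h>

/* integer_sizes_ok: return 1 if this implementation satisfies the
   minimum sizes and the ordering described above, 0 otherwise.
   (Hypothetical helper, for illustration only.) */
int integer_sizes_ok(void)
{
    return SHRT_MAX >= 32767              /* short: at least 16 bits */
        && INT_MAX >= 32767               /* int: at least 16 bits */
        && LONG_MAX >= 2147483647L        /* long: at least 32 bits */
        && sizeof(short) <= sizeof(int)   /* short no longer than int */
        && sizeof(int) <= sizeof(long);   /* int no longer than long */
}
```

On a conforming implementation the function should return 1; the actual sizes behind that answer still vary from machine to machine.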

The qualifier `signed` or `unsigned` may be applied to `char` or
any integer. `unsigned` numbers are always positive or zero, and obey the
laws of arithmetic modulo $2^n$, where *n* is the number of bits in the type.

The type `long double` specifies extended-precision floating point. As
with integers, the sizes of floating-point objects are implementation-defined;
`float`, `double` and `long double` could represent one,
two or three distinct sizes.

The standard headers `<limits.h>` and `<float.h>` contain symbolic
constants for all of these sizes, along with other properties of the machine
and compiler. These are discussed in Appendix B.

**Exercise 2-1.** Write a program to determine the ranges of `char`,
`short`, `int`, and `long` variables, both `signed` and
`unsigned`, by printing appropriate values from standard headers and by direct
computation. Harder if you compute them: determine the ranges of the various
floating-point types.

Floating-point constants contain a decimal point (`123.4`) or an exponent
(`1e-2`) or both; their type is `double`, unless suffixed. The
suffixes `f` or `F` indicate a `float` constant; `l`
or `L` indicate a `long double`.

The value of an integer can be specified in octal or hexadecimal instead of
decimal. A leading `0` (zero) on an integer constant means octal; a
leading `0x` or `0X` means hexadecimal. For example, decimal 31
can be written as `037` in octal and `0x1f` or `0x1F` in
hex. Octal and hexadecimal constants may also be followed by `L` to make
them `long` and `U` to make them `unsigned`: `0XFUL`
is an *unsigned long* constant with value 15 decimal.

A `character constant` is an integer, written as one character within
single quotes, such as `'x'`. The value of a character constant is the
numeric value of the character in the machine's character set. For example,
in the ASCII character set the character constant `'0'` has the value 48,
which is unrelated to the numeric value 0. If we write `'0'` instead of
a numeric value like 48 that depends on the character set, the program is
independent of the particular value and easier to read. Character constants
participate in numeric operations just as any other integers, although they
are most often used in comparisons with other characters.

Certain characters can be represented in character and string constants by
escape sequences like `\n` (newline); these sequences look like two
characters, but represent only one. In addition, an arbitrary byte-sized bit
pattern can be specified by

    '\ooo'

where *ooo* is one to three octal digits, or by

    '\xhh'

where *hh* is one or more hexadecimal digits. For example,

    #define VTAB '\013' /* ASCII vertical tab */
    #define BELL '\007' /* ASCII bell character */

or, in hexadecimal,

    #define VTAB '\xb' /* ASCII vertical tab */
    #define BELL '\x7' /* ASCII bell character */

The complete set of escape sequences is

Sequence | Meaning | Sequence | Meaning |
---|---|---|---|
`\a` | alert (bell) character | `\\` | backslash |
`\b` | backspace | `\?` | question mark |
`\f` | formfeed | `\'` | single quote |
`\n` | newline | `\"` | double quote |
`\r` | carriage return | `\ooo` | octal number |
`\t` | horizontal tab | `\xhh` | hexadecimal number |
`\v` | vertical tab | | |

The character constant `'\0'` represents the character with value zero,
the null character. `'\0'` is often written instead of `0` to
emphasize the character nature of some expression, but the numeric value is
just 0.

A *constant expression* is an expression that involves only constants.
Such expressions may be evaluated during compilation rather than at run-time,
and accordingly may be used in any place that a constant can occur, as in

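    #define MAXLINE 1000
    char line[MAXLINE+1];

or

    #define LEAP 1 /* in leap years */
    int days[31+28+LEAP+31+30+31+30+31+31+30+31+30+31];

A *string constant*, or *string literal*, is a sequence of zero or more characters surrounded by double quotes, as in

    "I am a string"

or

    "" /* the empty string */

The quotes are not part of the string, but serve only to delimit it. The same escape sequences used in character constants apply in strings; `\"` represents the double-quote character. String constants can be concatenated at compile time:

    "hello, " "world"

is equivalent to

    "hello, world"

This is useful for splitting up long strings across several source lines.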

Technically, a string constant is an array of characters. The internal
representation of a string has a null character `'\0'` at the end, so
the physical storage required is one more than the number of characters
written between the quotes. This representation means that there is no limit
to how long a string can be, but programs must scan a string completely to
determine its length. The standard library function `strlen(s)` returns
the length of its character string argument `s`, excluding the terminal
`'\0'`. Here is our version:

    /* strlen: return length of s */
    int strlen(char s[])
    {
        int i;

        i = 0;      /* start counting at the first character */
        while (s[i] != '\0')
            ++i;
        return i;
    }

Be careful to distinguish between a character constant and a string that
contains a single character: `'x'` is not the same as `"x"`. The
former is an integer, used to produce the numeric value of the letter *x* in
the machine's character set. The latter is an array of characters that
contains one character (the letter *x*) and a `'\0'`.
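The difference can be made concrete with `sizeof`. This is a minimal sketch; the three function names are hypothetical and exist only for this example.

```c
#include <stddef.h>

/* string_x_size: storage for the string "x", which is the letter
   plus the terminating '\0'. */
size_t string_x_size(void)
{
    return sizeof "x";
}

/* char_x_size: a character constant is an int in C, so this is
   the same as sizeof(int). */
size_t char_x_size(void)
{
    return sizeof 'x';
}

/* string_x_is_terminated: return 1 if "x" holds the letter x
   followed by a null character. */
int string_x_is_terminated(void)
{
    const char s[] = "x";

    return s[0] == 'x' && s[1] == '\0';
}
```

Here `string_x_size()` is 2 on every implementation, while `char_x_size()` equals `sizeof(int)`, whatever that happens to be on the machine at hand.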

There is one other kind of constant, the *enumeration constant*. An
enumeration is a list of constant integer values, as in

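    enum boolean { NO, YES };

The first name in an `enum` has value 0, the next 1, and so on, unless explicit values are specified. If not all values are specified, unspecified values continue the progression from the last specified value, as in the second of these examples:

    enum escapes { BELL = '\a', BACKSPACE = '\b', TAB = '\t',
                   NEWLINE = '\n', VTAB = '\v', RETURN = '\r' };

    enum months { JAN = 1, FEB, MAR, APR, MAY, JUN,
                  JUL, AUG, SEP, OCT, NOV, DEC };
                        /* FEB = 2, MAR = 3, etc. */

Names in different enumerations must be distinct. Values need not be distinct in the same enumeration.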

Enumerations provide a convenient way to associate constant values with
names, an alternative to `#define` with the advantage that the values
can be generated for you. Although variables of `enum` types may be
declared, compilers need not check that what you store in such a variable is
a valid value for the enumeration. Nevertheless, enumeration variables offer
the chance of checking and so are often better than `#define`s. In
addition, a debugger may be able to print values of enumeration variables in
their symbolic form.
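As a sketch of how an enumeration might be used in practice, the function below takes the `months` enumeration from this section as its argument type. The function `days_in_month` is a hypothetical example, not a library routine.

```c
enum months { JAN = 1, FEB, MAR, APR, MAY, JUN,
              JUL, AUG, SEP, OCT, NOV, DEC };

/* days_in_month: days in month m of a non-leap year;
   returns 0 if m is not a valid month (hypothetical helper) */
int days_in_month(enum months m)
{
    switch (m) {
    case FEB:
        return 28;
    case APR: case JUN: case SEP: case NOV:
        return 30;
    case JAN: case MAR: case MAY: case JUL:
    case AUG: case OCT: case DEC:
        return 31;
    }
    return 0;   /* reached only for values outside the enumeration */
}
```

Because every enumerator appears in the `switch`, a compiler that chooses to check enumerations has the opportunity to warn if one is later added and forgotten; the final `return 0` covers the case where an out-of-range value is stored anyway.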

    int lower, upper, step;
    char c, line[1000];

Variables can be distributed among declarations in any fashion; the lists above could well be written as

    int lower;
    int upper;
    int step;
    char c;
    char line[1000];

The latter form takes more space, but is convenient for adding a comment to each declaration for subsequent modifications.

A variable may also be initialized in its declaration. If the name is followed by an equals sign and an expression, the expression serves as an initializer, as in

    char esc = '\\';
    int i = 0;
    int limit = MAXLINE+1;
    float eps = 1.0e-5;

If the variable in question is not automatic, the initialization is done once only, conceptually before the program starts executing, and the initializer must be a constant expression. An explicitly initialized automatic variable is initialized each time the function or block it is in is entered; the initializer may be any expression. External and static variables are initialized to zero by default. Automatic variables for which there is no explicit initializer have undefined (i.e., garbage) values.

The qualifier `const` can be applied to the declaration of any variable
to specify that its value will not be changed. For an array, the `const`
qualifier says that the elements will not be altered.

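    const double e = 2.71828182845905;
    const char msg[] = "warning: ";

The `const` declaration can also be used with array arguments, to indicate that the function does not change that array:

    int strlen(const char[]);

The result is implementation-defined if an attempt is made to change a `const`.

The expression

    x % y

produces the remainder when `x` is divided by `y`, and thus is zero when `y` divides `x` exactly. For example, the leap-year rule can be tested with

    if ((year % 4 == 0 && year % 100 != 0) || year % 400 == 0)
        printf("%d is a leap year\n", year);
    else
        printf("%d is not a leap year\n", year);

The `%` operator cannot be applied to a `float` or `double`.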

The binary `+` and `-` operators have the same precedence, which
is lower than the precedence of `*`, `/` and `%`, which is
in turn lower than unary `+` and `-`. Arithmetic operators
associate left to right.

Table 2.1 at the end of this chapter summarizes precedence and associativity for all operators.

The relational operators are

    > >= < <=

They all have the same precedence. Just below them in precedence are the equality operators:

    == !=

Relational operators have lower precedence than arithmetic operators, so an expression like `i < lim-1` is taken as `i < (lim-1)`, as would be expected.

More interesting are the logical operators `&&` and `||`.
Expressions connected by `&&` or `||` are evaluated
left to right, and evaluation stops as soon as the truth or falsehood of the
result is known. Most C programs rely on these properties. For example, here
is a loop from the input function `getline` that we wrote in
Chapter 1:

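    for (i=0; i < lim-1 && (c=getchar()) != '\n' && c != EOF; ++i)
        s[i] = c;

Before reading a new character it is necessary to check that there is room to store it in the array `s`, so the test `i < lim-1` must be made first. Moreover, if this test fails, we must not go on and read another character.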

Similarly, it would be unfortunate if `c` were tested against
`EOF` before `getchar` is called; therefore the call and
assignment must occur before the character in `c` is tested.
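The same left-to-right guarantee is what makes a bounds test safe when it is placed before an array access. The following is a minimal sketch; `find` is a hypothetical function written for this illustration.

```c
/* find: return the index of the first element of a[0..n-1] equal to
   key, or -1 if there is none. The test i < n is evaluated first, so
   a[i] is never read when i has run off the end of the array. */
int find(const int a[], int n, int key)
{
    int i;

    for (i = 0; i < n && a[i] != key; i++)
        ;
    return (i < n) ? i : -1;
}
```

If the two tests were written in the opposite order, the loop would read `a[n]` before discovering that `i` was out of range.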

The precedence of `&&` is higher than that of `||`, and
both are lower than relational and equality operators, so expressions like

    i < lim-1 && (c=getchar()) != '\n' && c != EOF

need no extra parentheses. But since the precedence of `!=` is higher than assignment, parentheses are needed in

    (c=getchar()) != '\n'

to achieve the desired result of assignment to `c` and then comparison with `'\n'`.

By definition, the numeric value of a relational or logical expression is 1 if the relation is true, and 0 if the relation is false.

The unary negation operator `!` converts a non-zero operand into 0, and
a zero operand into 1. A common use of `!` is in constructions like

    if (!valid)

rather than

    if (valid == 0)

It's hard to generalize about which form is better. Constructions like `!valid` read nicely ("if not valid"), but more complicated ones can be hard to understand.

**Exercise 2-2.** Write a loop equivalent to the `for`
loop above without using `&&` or `||`.

A `char` is just a small integer, so `char`s may be freely used
in arithmetic expressions. This permits considerable flexibility in certain
kinds of character transformations. One is exemplified by this naive
implementation of the function `atoi`, which converts a string of digits
into its numeric equivalent.

    /* atoi: convert s to integer */
    int atoi(char s[])
    {
        int i, n;

        n = 0;
        for (i = 0; s[i] >= '0' && s[i] <= '9'; ++i)
            n = 10 * n + (s[i] - '0');
        return n;
    }

As we discussed in Chapter 1, the expression

    s[i] - '0'

gives the numeric value of the character stored in `s[i]`.

Another example of `char` to `int` conversion is the function
`lower`, which maps a single character to lower case *for the ASCII
character set*. If the character is not an upper case letter,
`lower` returns it unchanged.

    /* lower: convert c to lower case; ASCII only */
    int lower(int c)
    {
        if (c >= 'A' && c <= 'Z')
            return c + 'a' - 'A';
        else
            return c;
    }

This works for ASCII because corresponding upper case and lower case letters are a fixed distance apart as numeric values and each alphabet is contiguous: there is nothing but letters between `A` and `Z`.

The standard header `<ctype.h>`, described in Appendix B, defines a family of functions that provide
tests and conversions that are independent of character set. For example, the
function `tolower` is a portable replacement for the function
`lower` shown above. Similarly, the test

    c >= '0' && c <= '9'

can be replaced by

    isdigit(c)

We will use the `<ctype.h>` functions from now on.

There is one subtle point about the conversion of characters to integers. The
language does not specify whether variables of type `char` are signed
or unsigned quantities. When a `char` is converted to an `int`,
can it ever produce a negative integer? The answer varies from machine to
machine, reflecting differences in architecture. On some machines a
`char` whose leftmost bit is 1 will be converted to a negative integer
(``sign extension''). On others, a `char` is promoted to an int by
adding zeros at the left end, and thus is always positive.

The definition of C guarantees that any character in the machine's standard
printing character set will never be negative, so these characters will
always be positive quantities in expressions. But arbitrary bit patterns
stored in character variables may appear to be negative on some machines, yet
positive on others. For portability, specify `signed` or `unsigned`
if non-character data is to be stored in `char` variables.
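When the signedness is stated explicitly, the result of widening to `int` no longer depends on the machine. A minimal sketch, using two hypothetical helper functions:

```c
/* widen_signed: a signed char keeps its sign when converted to int */
int widen_signed(signed char c)
{
    return c;
}

/* widen_unsigned: an unsigned char always converts to a
   non-negative int */
int widen_unsigned(unsigned char c)
{
    return c;
}
```

Given these definitions, `widen_signed(-1)` is -1 and `widen_unsigned(255)` is 255 everywhere, whereas the same experiment with a plain `char` could go either way.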

Relational expressions like `i > j` and logical expressions
connected by `&&` and `||` are defined to have value 1
if true, and 0 if false. Thus the assignment

    d = c >= '0' && c <= '9'

sets `d` to 1 if `c` is a digit, and 0 if not.

Implicit arithmetic conversions work much as expected. In general, if an
operator like `+` or `*` that takes two operands (a binary operator)
has operands of different types, the ``lower'' type is *promoted* to the
``higher'' type before the operation proceeds. The result is of the higher
type. Section 6 of Appendix A
states the conversion rules precisely. If there are no `unsigned`
operands, however, the following informal set of rules will suffice:

- If either operand is `long double`, convert the other to `long double`.
- Otherwise, if either operand is `double`, convert the other to `double`.
- Otherwise, if either operand is `float`, convert the other to `float`.
- Otherwise, convert `char` and `short` to `int`.
- Then, if either operand is `long`, convert the other to `long`.
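The effect of these rules shows up most visibly in division. The sketch below uses three hypothetical functions to contrast an all-`int` expression with mixed ones.

```c
/* int_quotient: both operands are int, so the division is carried
   out in int and any fractional part is lost. */
int int_quotient(int a, int b)
{
    return a / b;
}

/* mixed_quotient: b is double, so a is converted to double before
   the division and the result is double. */
double mixed_quotient(int a, double b)
{
    return a / b;
}

/* char_plus_one: c is converted to int before the addition,
   so the result is an int. */
int char_plus_one(char c)
{
    return c + 1;
}
```

For example, `int_quotient(7, 2)` is 3, while `mixed_quotient(7, 2.0)` is 3.5.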

Conversion rules are more complicated when `unsigned` operands are
involved. The problem is that comparisons between signed and unsigned values
are machine-dependent, because they depend on the sizes of the various integer
types. For example, suppose that `int` is 16 bits and `long` is 32
bits. Then `-1L < 1U`, because `1U`, which is an `unsigned int`, is
promoted to a `signed long`. But `-1L > 1UL` because `-1L`
is promoted to `unsigned long` and thus appears to be a large positive
number.
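The two comparisons can be wrapped in functions to see what a particular compiler does. This is only a sketch, and the function names are hypothetical; the second result depends on the relative sizes of `int` and `long`.

```c
/* minus_one_exceeds_ulong_one: -1L is converted to unsigned long,
   where it becomes the largest value of that type, so the
   comparison is true on every implementation. */
int minus_one_exceeds_ulong_one(void)
{
    return -1L > 1UL;
}

/* minus_one_below_uint_one: machine-dependent. True when long can
   hold every unsigned int value (1U becomes a long); false when
   both operands end up as unsigned long. */
int minus_one_below_uint_one(void)
{
    return -1L < 1U;
}
```

Some compilers can be asked to warn about comparisons that mix signed and unsigned operands, which is a useful check for code of this kind.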

Conversions take place across assignments; the value of the right side is converted to the type of the left, which is the type of the result.

A character is converted to an integer, either by sign extension or not, as described above.

Longer integers are converted to shorter ones or to `char`s by dropping
the excess high-order bits. Thus in

    int i;
    char c;

    i = c;
    c = i;

the value of `c` is unchanged.

If `x` is `float` and `i` is `int`, then `x = i`
and `i = x` both cause conversions; `float` to `int` causes
truncation of any fractional part. When a `double` is converted to `float`,
whether the value is rounded or truncated is implementation dependent.
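Truncation on conversion from floating point to `int` is easy to observe. A minimal sketch follows; `to_int` is a hypothetical helper.

```c
/* to_int: convert x to int by assignment; the fractional part is
   discarded, for negative values as well as positive ones. */
int to_int(double x)
{
    int i;

    i = x;      /* conversion takes place across the assignment */
    return i;
}
```

So `to_int(3.99)` is 3 and `to_int(-3.99)` is -3; the value is not rounded to the nearest integer. The sketch assumes the value fits in an `int`.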

Since an argument of a function call is an expression, type conversion also
takes place when arguments are passed to functions. In the absence of a
function prototype, `char` and `short` become `int`, and `float`
becomes `double`. This is why we have declared function arguments to be
`int` and `double` even when the function is called with `char`
and `float`.

Finally, explicit type conversions can be forced (``coerced'') in any
expression, with a unary operator called a `cast`. In the construction

(*type name*) *expression*

the *expression* is converted to the named type by the conversion rules
above. The precise meaning of a cast is as if the *expression* were
assigned to a variable of the specified type, which is then used in place of
the whole construction. For example, the library routine `sqrt` expects
a `double` argument, and will produce nonsense if inadvertently handed
something else. (`sqrt` is declared in `<math.h>`.) So if
`n` is an integer, we can use

    sqrt((double) n)

to convert the value of `n` to `double` before passing it to `sqrt`.

If arguments are declared by a function prototype, as they normally should be,
the declaration causes automatic coercion of any arguments when the function
is called. Thus, given a function prototype for `sqrt`:

    double sqrt(double)

the call

    root2 = sqrt(2)

coerces the integer `2` into the `double` value `2.0` without any need for a cast.

The standard library includes a portable implementation of a pseudo-random number generator and a function for initializing the seed; the former illustrates a cast:

    unsigned long int next = 1;

    /* rand: return pseudo-random integer on 0..32767 */
    int rand(void)
    {
        next = next * 1103515245 + 12345;
        return (unsigned int)(next/65536) % 32768;
    }

    /* srand: set seed for rand() */
    void srand(unsigned int seed)
    {
        next = seed;
    }
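A cast is also the natural way to turn the integer result of `rand` into a floating-point value. The sketch below uses the standard library's `rand` and `RAND_MAX` from `<stdlib.h>`; the name `frand` is a hypothetical helper chosen for this example.

```c
#include <stdlib.h>

/* frand: return a pseudo-random double in the range [0, 1).
   The cast forces the division to be done in floating point;
   RAND_MAX + 1.0 is computed in double, so it cannot overflow. */
double frand(void)
{
    return (double) rand() / (RAND_MAX + 1.0);
}
```

Without the cast (and with an integer divisor) the division would be done in integer arithmetic and the result would almost always be zero.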

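C provides two unusual operators for incrementing and decrementing variables: `++` adds 1 to its operand, and `--` subtracts 1, as in

    if (c == '\n')
        ++nl;

The unusual aspect is that `++` and `--` may be used either as prefix operators (before the variable, as in `++n`) or as postfix operators (after the variable, as in `n++`). In both cases the effect is to increment `n`, but `++n` increments `n` before its value is used, while `n++` increments `n` after its value has been used. Thus

    x = n++;

sets `x` to the value `n` had before the increment, but

    x = ++n;

sets `x` to the value of `n` after the increment.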

In a context where no value is wanted, just the incrementing effect, as in

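    if (c == '\n')
        nl++;

prefix and postfix are the same. But there are situations where one or the other is specifically called for. For instance, consider the function `squeeze(s,c)`, which removes all occurrences of the character `c` from the string `s`.

    /* squeeze: delete all c from s */
    void squeeze(char s[], int c)
    {
        int i, j;

        for (i = j = 0; s[i] != '\0'; i++)
            if (s[i] != c)
                s[j++] = s[i];
        s[j] = '\0';
    }

Each time a non-`c` occurs, it is copied into the current `j` position, and only then is `j` incremented to be ready for the next character. This is exactly equivalent to

    if (s[i] != c) {
        s[j] = s[i];
        j++;
    }

Another example of a similar construction comes from the `getline` function that we wrote in Chapter 1, where we can replace

    if (c == '\n') {
        s[i] = c;
        ++i;
    }

by the more compact

    if (c == '\n')
        s[i++] = c;

As a third example, consider the standard function `strcat(s,t)`, which concatenates the string `t` to the end of the string `s`.

    /* strcat: concatenate t to end of s; s must be big enough */
    void strcat(char s[], char t[])
    {
        int i, j;

        i = j = 0;
        while (s[i] != '\0') /* find end of s */
            i++;
        while ((s[i++] = t[j++]) != '\0') /* copy t */
            ;
    }

As each member is copied from `t` to `s`, the postfix `++` is applied to both `i` and `j` to make sure that they are in position for the next pass through the loop.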

**Exercise 2-4.** Write an alternative version of `squeeze(s1,s2)` that
deletes each character in `s1` that matches any character in the
*string* `s2`.

**Exercise 2-5.** Write the function `any(s1,s2)`, which returns the
first location in a string `s1` where any character from the string
`s2` occurs, or `-1` if `s1` contains no characters from `s2`.
(The standard library function `strpbrk` does the same job but returns a
pointer to the location.)

Operator | Meaning |
---|---|
`&` | bitwise AND |
`\|` | bitwise inclusive OR |
`^` | bitwise exclusive OR |
`<<` | left shift |
`>>` | right shift |
`~` | one's complement (unary) |

The bitwise AND operator `&` is often used to mask off some set of bits,
for example

    n = n & 0177;

sets to zero all but the low-order 7 bits of `n`.

The bitwise OR operator `|` is used to turn bits on:

    x = x | SET_ON;

sets to one in `x` the bits that are set to one in `SET_ON`.

The bitwise exclusive OR operator `^` sets a one in each bit position
where its operands have different bits, and zero where they are the same.

One must distinguish the bitwise operators `&` and `|` from the
logical operators `&&` and `||`, which imply left-to-right
evaluation of a truth value. For example, if `x` is 1 and `y` is 2,
then `x & y` is zero while `x && y` is one.
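The following sketch gathers the three binary bitwise operators into small helpers and contrasts `&` with `&&`. The flag names and function names are hypothetical, introduced only for this illustration.

```c
#define FLAG_READ  01u      /* hypothetical flag bits */
#define FLAG_WRITE 02u

/* set_flag: turn on the given bit(s) with bitwise OR */
unsigned set_flag(unsigned flags, unsigned bit)
{
    return flags | bit;
}

/* toggle_flag: flip the given bit(s) with exclusive OR */
unsigned toggle_flag(unsigned flags, unsigned bit)
{
    return flags ^ bit;
}

/* has_flag: 1 if any of the given bit(s) are on, 0 otherwise */
int has_flag(unsigned flags, unsigned bit)
{
    return (flags & bit) != 0;
}

/* bit_and: combines the operands bit by bit */
int bit_and(int x, int y)
{
    return x & y;
}

/* logical_and: 1 if both operands are non-zero, 0 otherwise */
int logical_and(int x, int y)
{
    return x && y;
}
```

With `x` equal to 1 and `y` equal to 2, `bit_and` gives 0 because the operands share no bits, while `logical_and` gives 1 because both are non-zero.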

The shift operators `<<` and `>>` perform left and right shifts of
their left operand by the number of bit positions given by the right operand,
which must be non-negative. Thus `x << 2` shifts the value of
`x` by two positions, filling vacated bits with zero; this is equivalent to
multiplication by 4. Right shifting an `unsigned` quantity always fills
the vacated bits with zero. Right shifting a signed quantity will fill with
sign bits (``arithmetic shift'') on some machines and with 0-bits (``logical
shift'') on others.

The unary operator `~` yields the one's complement of an integer; that
is, it converts each 1-bit into a 0-bit and vice versa. For example

    x = x & ~077

sets the last six bits of `x` to zero.

As an illustration of some of the bit operators, consider the function
`getbits(x,p,n)` that returns the (right adjusted) `n`-bit
field of `x` that begins at position `p`. We assume that bit
position 0 is at the right end and that `n` and `p` are
sensible positive values. For example, `getbits(x,4,3)` returns the
three bits in positions 4, 3 and 2, right-adjusted.

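    /* getbits: get n bits from position p */
    unsigned getbits(unsigned x, int p, int n)
    {
        return (x >> (p+1-n)) & ~(~0 << n);
    }

The expression `x >> (p+1-n)` moves the desired field to the right end of the word. `~0` is all 1-bits; shifting it left `n` positions with `~0 << n` places zeros in the rightmost `n` bits; complementing that with `~` makes a mask with ones in the rightmost `n` bits.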

**Exercise 2-6.** Write a function `setbits(x,p,n,y)` that returns
`x` with the `n` bits that begin at position `p` set to the rightmost
`n` bits of `y`, leaving the other bits unchanged.

**Exercise 2-7.** Write a function `invert(x,p,n)` that returns `x`
with the `n` bits that begin at position `p` inverted (i.e., 1
changed into 0 and vice versa), leaving the others unchanged.

**Exercise 2-8.** Write a function `rightrot(x,n)` that returns the
value of the integer `x` rotated to the right by `n` positions.

An expression such as

    i = i + 2

in which the variable on the left side is repeated immediately on the right, can be written in the compressed form

    i += 2

The operator `+=` is called an *assignment operator*.

Most binary operators (operators like `+` that have a left and right
operand) have a corresponding assignment operator *op*`=`, where
*op* is one of

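    + - * / % << >> & ^ |

If $\mathit{expr}_1$ and $\mathit{expr}_2$ are expressions, then

$$\mathit{expr}_1 \;\mathit{op}{=}\; \mathit{expr}_2$$

is equivalent to

$$\mathit{expr}_1 = (\mathit{expr}_1) \;\mathit{op}\; (\mathit{expr}_2)$$

except that $\mathit{expr}_1$ is computed only once. Notice the parentheses around $\mathit{expr}_2$:

    x *= y + 1

means

    x = x * (y + 1)

rather than

    x = x * y + 1

As an example, the function `bitcount` counts the number of 1-bits in its integer argument.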

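    /* bitcount: count 1 bits in x */
    int bitcount(unsigned x)
    {
        int b;

        for (b = 0; x != 0; x >>= 1)
            if (x & 01)
                b++;
        return b;
    }

Declaring the argument `x` to be `unsigned` ensures that when it is right-shifted, vacated bits will be filled with zeros, not sign bits, regardless of the machine the program is run on.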

Quite apart from conciseness, assignment operators have the advantage that
they correspond better to the way people think. We say ``add 2 to `i`''
or ``increment `i` by 2'', not ``take `i`, add 2, then put the result
back in `i`''. Thus the expression `i += 2` is preferable to
`i = i+2`. In addition, for a complicated expression like

    yyval[yypv[p3+p4] + yypv[p1]] += 2

the assignment operator makes the code easier to understand, since the reader doesn't have to check painstakingly that two long expressions are indeed the same, or to wonder why they're not. And an assignment operator may even help a compiler to produce efficient code.
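The grouping of the right-hand side can be checked with a small function. This is a sketch; `scale` and `sum` are hypothetical names.

```c
/* scale: multiply x by (y + 1) using an assignment operator */
int scale(int x, int y)
{
    x *= y + 1;     /* same as x = x * (y + 1), not x = x * y + 1 */
    return x;
}

/* sum: add up the elements of a[0..n-1] using += */
int sum(const int a[], int n)
{
    int i, total;

    total = 0;
    for (i = 0; i < n; i++)
        total += a[i];
    return total;
}
```

For instance, `scale(2, 3)` is 8; had the right side not been treated as a unit, the result would have been 7.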

We have already seen that the assignment statement has a value and can occur in expressions; the most common example is

    while ((c = getchar()) != EOF)
        ...

The other assignment operators (`+=`, `-=`, etc.) can also occur in expressions, although this is less common.

In all such expressions, the type of an assignment expression is the type of its left operand, and the value is the value after the assignment.

**Exercise 2-9.** In a two's complement number system, `x &= (x-1)`
deletes the rightmost 1-bit in `x`. Explain why. Use this observation to
write a faster version of `bitcount`.

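The statements

    if (a > b)
        z = a;
    else
        z = b;

compute in `z` the maximum of `a` and `b`. The *conditional expression*, written with the ternary operator `?:`, provides an alternate way to write this and similar constructions. In the expression

$$\mathit{expr}_1 \;?\; \mathit{expr}_2 : \mathit{expr}_3$$

the expression $\mathit{expr}_1$ is evaluated first. If it is non-zero (true), then $\mathit{expr}_2$ is evaluated, and that is the value of the conditional expression. Otherwise $\mathit{expr}_3$ is evaluated, and that is the value. Thus to set `z` to the maximum of `a` and `b`,

    z = (a > b) ? a : b; /* z = max(a, b) */

It should be noted that the conditional expression is indeed an expression, and it can be used wherever any other expression can be. If `f` is a `float` and `n` an `int`, then the expression

    (n > 0) ? f : n

is of type `float` regardless of whether `n` is positive.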

Parentheses are not necessary around the first expression of a conditional
expression, since the precedence of `?:` is very low, just above
assignment. They are advisable anyway, however, since they make the condition
part of the expression easier to see.

The conditional expression often leads to succinct code. For example, this
loop prints `n` elements of an array, 10 per line, with each column
separated by one blank, and with each line (including the last) terminated by
a newline.

    for (i = 0; i < n; i++)
        printf("%6d%c", a[i], (i%10==9 || i==n-1) ? '\n' : ' ');

A newline is printed after every tenth element, and after the `n`-th. All other elements are followed by one blank. Another example is

    printf("You have %d items%s.\n", n, n==1 ? "" : "s");
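Because a conditional expression yields a value, it fits naturally in a `return` statement. A minimal sketch with two hypothetical helpers:

```c
/* max_int: return the larger of a and b */
int max_int(int a, int b)
{
    return (a > b) ? a : b;
}

/* plural_suffix: return "" when n is 1, "s" otherwise */
const char *plural_suffix(int n)
{
    return (n == 1) ? "" : "s";
}
```

A call such as `printf("You have %d item%s.\n", n, plural_suffix(n))` would then choose between the singular and plural forms.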

Operators | Associativity |
---|---|
`()` `[]` `->` `.` | left to right |
`!` `~` `++` `--` `+` `-` `*` `&` `(type)` `sizeof` | right to left |
`*` `/` `%` | left to right |
`+` `-` | left to right |
`<<` `>>` | left to right |
`<` `<=` `>` `>=` | left to right |
`==` `!=` | left to right |
`&` | left to right |
`^` | left to right |
`\|` | left to right |
`&&` | left to right |
`\|\|` | left to right |
`?:` | right to left |
`=` `+=` `-=` `*=` `/=` `%=` `&=` `^=` `\|=` `<<=` `>>=` | right to left |
`,` | left to right |

Unary `&`, `+`, `-`, and `*` have higher precedence than the binary forms.

**Table 2.1:** Precedence and Associativity of Operators

Note that the precedence of the bitwise operators `&`, `^`,
and `|` falls below `==` and `!=`. This implies that
bit-testing expressions like

    if ((x & MASK) == 0) ...

must be fully parenthesized to give proper results.
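To see why the parentheses matter, the sketch below writes out both groupings explicitly. `MASK` is given a hypothetical value here, and the function names are made up for the example.

```c
#define MASK 01u    /* hypothetical mask: the low-order bit */

/* mask_bits_clear: the intended test, 1 if no MASK bit of x is on */
int mask_bits_clear(unsigned x)
{
    return (x & MASK) == 0;
}

/* parsed_without_parens: what x & MASK == 0 actually means.
   == binds tighter than &, so x is ANDed with the result of
   MASK == 0, which is 0 for any non-zero MASK. */
int parsed_without_parens(unsigned x)
{
    return x & (MASK == 0);
}
```

With `x` equal to 4, the intended test gives 1 (the low-order bit is clear), but the unparenthesized reading gives 0.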

C, like most languages, does not specify the order in which the operands of
an operator are evaluated. (The exceptions are `&&`,
`||`, `?:`, and ``,`'.) For example, in a statement like

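    x = f() + g();

`f` may be evaluated before `g` or vice versa; thus if either `f` or `g` alters a variable on which the other depends, `x` can depend on the order of evaluation.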

Similarly, the order in which function arguments are evaluated is not specified, so the statement

    printf("%d %d\n", ++n, power(2, n)); /* WRONG */

can produce different results with different compilers, depending on whether `n` is incremented before `power` is called. The solution, of course, is to write

    ++n;
    printf("%d %d\n", n, power(2, n));

Function calls, nested assignment statements, and increment and decrement operators cause ``side effects'': some variable is changed as a by-product of the evaluation of an expression. In any expression involving side effects, there can be subtle dependencies on the order in which variables taking part in the expression are updated. One unhappy situation is typified by the statement

    a[i] = i++;

The question is whether the subscript is the old value of `i` or the new. Compilers can interpret this in different ways, and generate different answers depending on their interpretation.

The moral is that writing code that depends on order of evaluation is a bad
programming practice in any language. Naturally, it is necessary to know what
things to avoid, but if you don't know *how* they are done on various
machines, you won't be tempted to take advantage of a particular
implementation.
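When the order does matter, intermediate results can be stored in temporary variables so that separate statements fix the sequence. The sketch below assumes two hypothetical functions `f` and `g` that record the order in which they are called.

```c
static char order[3];   /* records which function ran first */
static int calls;       /* number of calls recorded so far */

/* f and g: hypothetical functions with a visible side effect */
static int f(void) { order[calls++] = 'f'; return 1; }
static int g(void) { order[calls++] = 'g'; return 2; }

/* sum_in_order: call f, then g, and return the sum of the results */
int sum_in_order(void)
{
    int a, b;

    calls = 0;
    a = f();            /* separate statements guarantee that  */
    b = g();            /* f is called before g                */
    order[calls] = '\0';
    return a + b;
}
```

After a call to `sum_in_order`, the array `order` holds the string `"fg"` on every implementation, which could not be promised for the single expression `f() + g()`.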
