18.6. Interpreting and Writing Rewrite Rules

Arguably the most powerful feature of sendmail is the rewrite rule. Rewrite rules are used by sendmail to determine how to process a received mail message. sendmail passes the addresses from the headers of a mail message through collections of rewrite rules called rulesets The rewrite rules transform a mail address from one form to another and you can think of them as being similar to a command in your editor that replaces all text matching a specified pattern with another.

Each rule has a lefthand side and a righthand side, separated by at least one tab character. When sendmail is processing mail it scans through the rewriting rules looking for a match on the lefthand side. If an address matches the lefthand side of a rewrite rule, the address is replaced by the righthand side and processed again.

18.6.1. sendmail.cf R and S Commands

In the sendmail.cf file, the rulesets are defined using commands coded as Sn, where n specifies the ruleset that is considered the current one.

The rules themselves appear in commands coded as R. As each R command is read, it is added to the current ruleset.

If you're dealing only with the sendmail.mc file, you won't need to worry about S commands at all, as the macros will build those for you. You will need to manually code your R rules.

A sendmail ruleset therefore looks like:
Sn
Rlhs rhs
Rlhs2 rhs2

18.6.2. Some Useful Macro Definitions

sendmail uses a number of standard macro definitions internally. The most useful of these in writing rulesets are:

$j

The fully qualified domain name of this host.

$w

The hostname component of the FQDN.

$m

The domain name component of the FQDN.

We can incorporate these macro definitions in our rewrite rules. Our Virtual Brewery configuration uses the $m macro.

18.6.3. The Lefthand Side

In the lefthand side of a rewriting rule, you specify a pattern that will match an address you wish to transform. Most characters are matched literally, but there are a number of characters that have special meaning; these are described in the following list. The rewrite rules for the lefthand side are:

$@

Match exactly zero tokens

$*

Match zero or more tokens

$+

Match one or more tokens

$-

Match exactly one token

$=x

Match any phrase in class x

$~x

Match any word not in class x

A token is a string of characters delimited by spaces. There is no way to include spaces in a token, nor is it necessary, as the expression patterns are flexible enough to work around this need. When a rule matches an address, the text matched by each of the patterns in the expression will be assigned to special variables that we'll use in the righthand side. The only exception to this is the $@, which matches no tokens and therefore will never generate text to be used on the righthand side.

18.6.4. The Righthand Side

When the lefthand side of a rewrite rule matches an address, the original text is deleted and replaced by the righthand side of the rule. All tokens in the righthand side are copied literally, unless they begin with a dollar sign. Just as for the lefthand side, a number of metasymbols may be used on the righthand side. These are described in the following list. The rewrite rules for the righthand side are:

$n

This metasymbol is replaced with the n'th expression from the lefthand side.

$[name$]

This metasymbol resolves hostname to canonical name. It is replaced by the canonical form of the host name supplied.

$(map key $@arguments $:default $)

This is the more general form of lookup. The output is the result of looking up key in the map named map passing arguments as arguments. The map can be any of the maps that sendmail supports such as the virtusertable that we describe a little later. If the lookup is unsuccessful, default will be output. If a default is not supplied and lookup fails, the input is unchanged and key is output.

$>n

This will cause the rest of this line to be parsed and then given to ruleset n to evaluate. The output of the called ruleset will be written as output to this rule. This is the mechanism that allows rules to call other rulesets.

$#mailer

This metasymbol causes ruleset evaluation to halt and specifies the mailer that should be used to transport this message in the next step of its delivery. This metasymbol should be called only from ruleset 0 or one of its subroutines. This is the final stage of address parsing and should be accompanied by the next two metasymbols.

$@host

This metasymbol specifies the host that this message will be forwarded to. If the destination host is the local host, it may be omitted. The host may be a colon-separated list of destination hosts that will be tried in sequence to deliver the message.

$:user

This metasymbol specifies the target user for the mail message.

A rewrite rule that matches is normally tried repeatedly until it fails to match, then parsing moves on to the next rule. This behavior can be changed by preceding the righthand side with one of two special righthand side metaymbols described in the following list. The rewrite rules for a righthand side loop control metasymbols are:

$@

This metasymbol causes the ruleset to return with the remainder of the righthand side as the value. No other rules in the ruleset are evaluated.

$:

This metasymbol causes this rule to terminate immediately, but the rest of the current ruleset is evaluated.

18.6.5. A Simple Rule Pattern Example

To better see how the macro substitution patterns operate, consider the following rule lefthand side:
$* < $+ >

This rule matches “Zero or more tokens, followed by the < character, followed by one or more tokens, followed by the > character.”

If this rule were applied to brewer@vbrew.com or Head Brewer < >, the rule would not match. The first string would not match because it does not include a < character, and the second would fail because $+ matches one or more tokens and there are no tokens between the <> characters. In any case in which a rule does not match, the righthand side of the rule is not used.

If the rule were applied to Head Brewer < brewer@vbrew.com >, the rule would match, and on the righthand side $1 would be substituted with Head Brewer and $2 would be substituted with brewer@vbrew.com.

If the rule were applied to < brewer@vbrew.com > the rule would match because $* matches zero or more tokens, and on the righthand side $1 would be substituted with the empty string.

18.6.6. Ruleset Semantics

Each of the sendmail rulesets is called upon to perform a different task in mail processing. When you are writing rules, it is important to understand what each of the rulesets are expected to do. We'll look at each of the rulesets that the m4 configuration scripts allow us to modify:

LOCAL_RULE_3

Ruleset 3 is responsible for converting an address in an arbitrary format into a common format that sendmail will then process. The output format expected is the familiar looking local-part@host-domain-spec.

Ruleset 3 should place the hostname part of the converted address inside the < and > characters to make parsing by later rulesets easier. Ruleset 3 is applied before sendmail does any other processing of an email address, so if you want sendmail to gateway mail from some system that uses some unusual address format, you should add a rule using the LOCAL_RULE_3 macro to convert addresses into the common format.

LOCAL_RULE_0 and LOCAL_NET_CONFIG

Ruleset 0 is applied to recipient addresses by sendmail after Ruleset 3. The LOCAL_NET_CONFIG macro causes rules to be inserted into the bottom half of Ruleset 0.

Ruleset 0 is expected to perform the delivery of the message to the recipient, so it must resolve to a triple that specifies each of the mailer, host, and user. The rules will be placed before any smart host definition you may include, so if you add rules that resolve addresses appropriately, any address that matches a rule will not be handled by the smart host. This is how we handle the direct smtp for the users on our local LAN in our example.

LOCAL_RULE_1 and LOCAL_RULE_2

Ruleset 1 is applied to all sender addresses and Ruleset 2 is applied to all recipient addresses. They are both usually empty.

18.6.6.1. Interpreting the rule in our example

Our sample in Example 18-3 uses the LOCAL_NET_CONFIG macro to declare a local rule that ensures that any mail within our domain is delivered directly using the smtp mailer. Now that we've looked at how rewrite rules are constructed, we will be able to understand how this rule works. Let's take another look at it.

Example 18-3. Rewrite Rule from vstout.uucpsmtp.m4

LOCAL_NET_CONFIG
# This rule ensures that all local mail is delivered using the
# smtp transport, everything else will go via the smart host.
R$* < @ $* .$m. > $*  $#smtp $@ $2.$m. $: $1 < @ $2.$m. > $3

We know that the LOCAL_NET_CONFIG macro will cause the rule to be inserted somewhere near the end of ruleset 0, but before any smart host definition. We also know that ruleset 0 is the last ruleset to be executed and that it should resolve to a three-tuple specifying the mailer, user, and host.

We can ignore the two comment lines; they don't do anything useful. The rule itself is the line beginning with R. We know that the R is a sendmail command and that it adds this rule to the current ruleset, in this case ruleset 0. Let's look at the lefthand side and the righthand side in turn.

The lefthand side looks like: $* < @ $* .$m. > $*.

Ruleset 0 expects < and > characters because it is fed by ruleset 3. Ruleset 3 converts addresses into a common form and to make parsing easier, it also places the host part of the mail address inside <>s.

This rule matches any mail address that looks like: 'DestUser < @ somehost.ourdomain. > Some Text'. That is, it matches mail for any user at any host within our domain.

You will remember that the text matched by metasymbols on the lefthand side of a rewrite rule is assigned to macro definitions for use on the righthand side. In our example, the first $* matches all text from the start of the address until the < character. All of this text is assigned to $1 for use on the righthand side. Similarly the second $* in our rewrite rule is assigned to $2, and the last is assigned to $3.

We now have enough to understand the lefthand side. This rule matches mail for any user at any host within our domain. It assigns the username to $1, the hostname to $2, and any trailing text to $3. The righthand side is then invoked to process these.

Let's now look at what we're expecting to see outputed. The righthand side of our example rewrite rule looks like: $#smtp $@ $2.$m. $: $1 < @ $2.$m. > $3.

When the righthand side of our ruleset is processed, each of the metasymbols are interpreted and relevant substitutions are made.

The $# metasymbol causes this rule to resolve to a specific mailer, smtp in our case.

The $@ resolves the target host. In our example, the target host is specified as $2.$m., which is the fully qualified domain name of the host on in our domain. The FQDN is constructed of the hostname component assigned to $2 from our lefthand side with our domain name (.$m.) appended.

The $: metasymbol specifies the target user, which we again captured from the lefthand side and had stored in $1.

We preserve the contents of the <> section, and any trailing text, using the data we collected from the lefthand side of the rule.

Since this rule resolves to a mailer, the message is forwarded to the mailer for delivery. In our example, the message would be forwarded to the destination host using the SMTP protocol.