Automatic SLIP link maintenance.
Tom Jennings <tomj@wps.com>
18 Aug 93

This is a shell script system to automatically keep a SLIP link alive.
It assumes you have a "Hayes type" modem (ZyXEL, Telebit, US Robotics,
etc.), and that you have KERMIT, SLATTACH, cron, etc. available.  This
was coded for Jolitz's 386BSD and moved to BSDI's BSD/386 with no change.

BRIEFLY:

slip-check works by testing the SLIP link every (suggested) five
minutes, and if it does not respond, assumes the link is dead, and
reestablishes it. By default, when the link dies and requires
restarting, you get mail.

The kermit script assumes you have a "normal" login/password prompt
pair. If you do not, you'll have to hack the kermit.dial script. 


WHAT ERRORS IT RECOVERS FROM:

Modem carrier loss from any cause: momentary power outage, unplugged
modem, etc.

WHAT IT DOES NOT RECOVER FROM:

Everything else.


FILES:

Here are the files involved. My installation assumes the unique files
are in /etc/slip, created for the occasion. The pathname is buried in
some of the scripts (don't worry, they're small).

README				duh

slip-check*			what does all the work, via cron
slip-parms			oh, stuff like phone number, password, etc
slip-restart*			startup for your /etc/netstart

kermit.dial*			a nice, reasonably smart dialer
kermit.doc			some kermit dox

test*				handy script tester

Files you'll need to edit:

/etc/netstart			...to invoke slip-restart
crontab				...to make cron run the slip-check script


HOW TO INSTALL IT:

0. Make sure the modem is initialized the way WE like it! (See below.)

1. Edit slip-parms; it contains the tty port, phone number, login,
password, etc.

2. Edit slip-check and slip-restart; make sure the pathnames are
correct, etc.

3. TEST the script with the test script included. When it works, you'll
get no results -- a single ping (likely viewable on the modem's lights)
and nothing else. (Check for that.) Now break the link -- disconnect the
phone line, kill -9 slattach, whatever. This time test should take a long
time. You'll likely see errors from ping. Not to worry. Eventually
slip-check should attempt to kill slattach, then run kermit, dial,
connect, run slattach and quit.

HEY! NOTE! If you're testing, and Control-C the test script, you must
manually rm /tmp/slip-check.lock, else the script will not run. It's a
lock file created and checked for by slip-check.

4. Put the slip-check invocation into your cron. For example:

0-59/5  *  *  *  *	/etc/slip/slip-check `/bin/cat /etc/slip/slip-parms` 

5. Add to /etc/netstart:

/etc/slip/slip-restart `/bin/cat /etc/slip/slip-parms`
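
How the backquoted cat works, as a sketch: the shell splits the contents
of slip-parms on whitespace and hands the pieces to the script as $1, $2,
and so on. The field order and values below are made up for illustration
(the real ones are whatever your slip-parms and scripts agree on), and
describe_parms is a hypothetical stand-in for the script.

```sh
#!/bin/sh
# Hypothetical stand-in for slip-check: report what it was handed.
describe_parms() {
    echo "$# parameters; port=$1 remote=$5"
}

# A made-up slip-parms: port, phone number, login, password, remote address.
parms="${TMPDIR:-/tmp}/slip-parms.example.$$"
echo 'tty01 5551234 mylogin mypassword 192.0.2.1' > "$parms"

# Same invocation style as the cron and netstart lines.
describe_parms `cat "$parms"`
rm -f "$parms"
```

One consequence of this style: a field containing a space (say, in a
password) would be split into two parameters.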


MODEM INITIALIZATION:

Modem initialization is tricky enough to do realtime (I know, I wrote
BBS software) that I would just rather palm it off on you, and tell you
to save it in NVRAM. 


IGNORE DTR IS REQUIRED. After kermit establishes the connection, it
terminates, not suspends, and since it owned the tty link, DTR goes
false. When slattach is run, it raises DTR. If you don't set ignore-DTR,
then after kermit makes the connection, it will be lost as soon as
kermit terminates! The alternative is to run slattach under kermit,
which seems like such a waste to me.

These settings are REQUIRED:

(All are prefixed with AT, of course)

			ZyXEL
No command echo		E0
Fixed DTE rate		&b1
IGNORE DTR!!!		&d0	REQUIRED!
Hardware handshake	&h3
enable "+++"		S2=43
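
For the record, here is one way the settings in the table might be strung
into a single command line. This is only a sketch: the trailing &W (store
the active profile in NVRAM) is the usual Hayes-style command and is my
assumption here, not something taken from the table, so check your modem's
manual. modem_init_string is a hypothetical helper name.

```sh
#!/bin/sh
# Build one AT command line from the required ZyXEL settings above.
# The final &W (save to NVRAM) is an assumption; verify it for your modem.
modem_init_string() {
    echo 'ATE0&b1&d0&h3S2=43&W'
}

modem_init_string
```

Type the resulting line at the modem by hand (from KERMIT or TIP) once,
and the scripts never have to initialize anything.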

HOW IT WORKS:

Assume initially the SLIP link is installed, running, etc.

cron runs slip-check every N minutes. Parameters come from a separate
file, slip-parms, so that you don't have to embed login and password
into the script. slip-check tests the link by sending a single
ping packet to the other end of your slip link (whose address is
contained in slip-parms). If the ping succeeds, the test ends there.
If it fails, the test is repeated in two seconds; then four seconds;
then eight seconds. If any one ping succeeds, the link is assumed
to be OK.
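
The retry schedule above can be sketched roughly like this. It is not the
actual slip-check code; check_link and RETRY_DELAYS are hypothetical names,
and the probe command is passed in so you can try the logic without a
modem.

```sh
#!/bin/sh
# check_link PROBE...: run PROBE once, then retry after 2, 4 and 8 seconds.
# Succeeds as soon as any one probe succeeds; fails if all four fail.
check_link() {
    for delay in ${RETRY_DELAYS:-0 2 4 8}; do
        if [ "$delay" -gt 0 ]; then
            sleep "$delay"
        fi
        if "$@"; then
            return 0        # one good ping is enough
        fi
    done
    return 1                # every probe failed: link assumed dead
}
```

In real use the probe would be a single ping to the far end, something
like "check_link ping -c 1 ADDRESS"; the exact ping options depend on your
system, so treat that invocation as an assumption too.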

If all the pings fail, the link is assumed to be dead. While of course
"the SLIP link is dead" could mean anything, in fact it is assumed that
it once worked; you have the software installation correct, your
roommates have not borrowed your serial cable to tie up a package, etc.

slip-check uses ps to find any running slattach program, and kills it.
(Did I forget to tell you your cron must run as root?) It then invokes
KERMIT with its script. KERMIT does all the nasty work of disconnecting
the link (if still connected), dialing, looking for dead modems, etc. 
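
Finding slattach with ps might look something like the sketch below. It
assumes the BSD "ps -ax" column layout (PID TT STAT TIME COMMAND), which
may not match your system, and slattach_pids is a hypothetical helper, not
a function from the scripts.

```sh
#!/bin/sh
# slattach_pids: read "ps -ax" style output on stdin, print the PID of
# every process whose command (5th column) is slattach, with or without
# a leading path. The header line is skipped.
slattach_pids() {
    awk 'NR > 1 && $5 ~ /(^|\/)slattach$/ { print $1 }'
}

# Intended use (must be root to kill it):
#   ps -ax | slattach_pids | while read pid; do kill "$pid"; done
```

Matching on the command column rather than the whole line keeps it from
picking up things like "grep slattach".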

slip-check uses a lock file in /tmp to prevent more than one slip-check
from running at once. For example, slip-check may be run at 10:05 to
start up the link, and at 10:10 still be working at dealing with a
worst-case modem or noisy line.
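
A lock file of this kind can be sketched as follows. This is one common
way to do it, not necessarily how slip-check does; acquire_lock and
release_lock are hypothetical names, and only the path
/tmp/slip-check.lock comes from the scripts.

```sh
#!/bin/sh
LOCK=${LOCK:-/tmp/slip-check.lock}

# acquire_lock: succeed only if we created the lock file ourselves.
# With noclobber (set -C), ">" fails when the file already exists, so the
# test and the creation happen in a single step.
acquire_lock() {
    ( set -C; echo "$$" > "$LOCK" ) 2>/dev/null
}

# release_lock: remove the lock so the next run can proceed.
release_lock() {
    rm -f "$LOCK"
}
```

A script would call acquire_lock first and quietly exit if it fails, then
call release_lock when done. Adding "trap release_lock EXIT" (plus a trap
on INT that exits) right after acquiring would also clean up after a
Control-C.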


slip-restart clears the lock file, then launches slattach in case the
modem link is still up, likely if you rebooted and your modem is set up
correctly. If not, no harm done; slip-check will bring the link up. 

Except for the lock file clearing, this is optional; if slattach is not
run initially, slip-check will discover it and restart the link anyways.



THINGS THAT LOOK LIKE ACCIDENTS OF CODING BUT PROBABLY AREN'T:

The kermit script does a limited number of retries, rather than camp out
on the modem forever. This is an intentional part of the error recovery
process. If the hardware system is flat out dead, camping out will not
repair it. If it is some externally-caused but hopefully temporary
problem, such as you're moving phone lines, telco is fraying wires up on
a pole, etc, checking every N minutes is more than adequate.

And finally and most importantly, it gives you the chance to manually get
in with KERMIT or TIP to check out the modem (though the scripts report
if the modem is alive) etc.


After kermit terminates, for any reason, slattach is run, even if the
link is dead. It's simply easier than checking, and nothing better
happens if you DON'T run slattach if the modem link is dead. Big deal.
Also, if the real problem is "downstream" this system won't catch that,
and if the link comes alive you'll be all set.
