Discussion:
[rancid] F5 backups are only working for some hosts via cron, but always manually
Stephan Seitz
2013-12-04 16:22:50 UTC
Hi!

I have two F5 clusters, bigip2a/bigip2b and bigip3a/bigip3b.
If I start rancid-run for these four hosts manually, all four hosts are
backed up without problems. But if rancid-run is launched via cron I get
error messages for the two b systems (bigip2b and bigip3b).

The rancid log shows messages like:
bigip3b: found unexpected command - "bigpipe list"
bigip2b: missed cmd(s): ls --full-time --color=never /config/ssl/ssl.crt,ls --full-time --color=never /config/ssl/ssl.key
bigip3b: missed cmd(s): ls --full-time --color=never /config/ssl/ssl.crt,ls --full-time --color=never /config/ssl/ssl.key
bigip3b: End of run not found

Why don’t I get the same errors when I run "rancid-run bigip3b"? And how
can I fix it?

Shade and sweet water!

Stephan
--
| Stephan Seitz E-Mail: ***@fsing.rootsland.net |
| Public Keys: http://fsing.rootsland.net/~stse/keys.html |
Alan McKinnon
2013-12-04 20:05:51 UTC
Post by Stephan Seitz
Hi!
I have two F5 clusters, bigip2a/bigip2b and bigip3a/bigip3b.
If I start rancid-run for these four hosts manually, all four hosts are
backed up without problems. But if rancid-run is launched via cron I get
error messages for the two b systems (bigip2b and bigip3b).
bigip3b: found unexpected command - "bigpipe list"
bigip2b: missed cmd(s): ls --full-time --color=never
/config/ssl/ssl.crt,ls --full-time --color=never /config/ssl/ssl.key
bigip3b: missed cmd(s): ls --full-time --color=never
/config/ssl/ssl.crt,ls --full-time --color=never /config/ssl/ssl.key
bigip3b: End of run not found
Why don’t I have the same errors if I run „rancid-run bigip3b”? And how
can I fix it?
Shade and sweet water!
Stephan
This is a common problem with cron and has little to do with the program
being run and everything to do with the environment. Cron does not run
out of a login shell, so the environment is not set up at all. As a
human user you get used to this being there and forget it is entirely
absent from cron.
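
To reproduce this kind of failure without waiting for cron, one option is
to start the command by hand with a stripped-down environment. The
following is only a minimal sketch and not part of rancid: the function
name run_like_cron is made up for illustration, and the variable values
are assumptions modelled on a typical cron default, so check your own
cron documentation for the real ones.

```sh
# Hypothetical helper: run a command line roughly the way cron would.
# env -i starts from an empty environment; the values set here are
# assumptions, not taken from any particular cron implementation.
run_like_cron() {
    env -i \
        HOME="${HOME:-/}" \
        LOGNAME="${LOGNAME:-$(id -un)}" \
        PATH=/usr/bin:/bin \
        SHELL=/bin/sh \
        /bin/sh -c "$1" </dev/null   # no terminal on stdin either
}
```

Calling run_like_cron 'env | sort' shows what the command would see; the
same command line that the crontab entry uses can then be passed in its
place.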

Look in the .raw files generated by f5rancid and see what's going on
around the end of 'bigpipe route static show' and the beginning of
'ls --full-time --color=never /config/ssl/ssl.crt'.

With luck you'll find clues as to an environment-related cause.

Unfortunately there's no easy answer to your question as there are 100s
of possible causes. You need to look closely at your own unique results.
--
Alan McKinnon
***@gmail.com
Stephan Seitz
2013-12-04 21:14:02 UTC
Post by Alan McKinnon
Look in the .raw files generated by f5rancid and see what's going on
around the end of 'bigpipe route static show' and the beginning of
'ls --full-time --color=never /config/ssl/ssl.crt'.
Thanks, I will analyse the environment and the .raw files.

Shade and sweet water!

Stephan
--
| Stephan Seitz E-Mail: ***@fsing.rootsland.net |
| Public Keys: http://fsing.rootsland.net/~stse/keys.html |
Stephan Seitz
2013-12-05 15:22:36 UTC
Post by Alan McKinnon
This is a common problem with cron and has little to do with the program
being run and everything to do with the environment. Cron does not run
Well, I can say after several tests that I have probably found the
problem, but I don’t understand it.

The command:
clogin -t 90 -c"bigpipe version;bigpipe platform;cat /config/bigip.license;bigpipe monitor list all;bigpipe profile list;bigpipe base list;bigpipe db show;bigpipe route static show;ls --full-time --color=never /config/ssl/ssl.crt;ls --full-time --color=never /config/ssl/ssl.key;bigpipe list" bigip2b

a) called directly from the rancid user shell
b) called from the root shell via „su - rancid -c <command>”

Looking at the different outputs of the problematic lines I can see the
following:
a)
[***@bigip2b:Standby] config # ls --full-time --color=never /config/ssl/ssl.crt^M
<output>
[***@bigip2b:Standby] config # ls --full-time --color=never /config/ssl/ssl.key^M
<output>
[***@bigip2b:Standby] config # bigpipe list^M
<output>

b)
[***@bigip2b:Standby] config # ls --full-time --color=never /config/ssl/ssl.crt ^M^[[A[***@bigip2b:Standby] config # ls --full-time --color=never /config/ssl/ssl.cr^[[Kt^M
<output>
ls --full-time --color=never /config/ssl/ssl.key^M
[***@bigip2b:Standby] config # ls --full-time --color=never /config/ssl/ssl.key ^M^[[A[***@bigip2b:Standby] config #total 64^M
<output>
e[***@bigip2b:Standby] config # bigpipe list^M
<output>

So besides the CR characters (^M), the second output contains additional
control sequences (^[[A and ^[[K), which probably confuse the parser in
the end.
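
For comparing two captures like a) and b), it can help to strip the
terminal noise first. This is a minimal sketch, assuming each capture has
been saved to a plain file; strip_term_noise is a hypothetical helper
name, and the pattern only covers simple CSI sequences such as ESC [ A
(cursor up) and ESC [ K (erase to end of line). It does not replay what a
terminal would have displayed, so it is only useful for spotting where
the two runs diverge.

```sh
# Hypothetical helper: remove carriage returns and simple ANSI CSI
# sequences (ESC [ <digits/semicolons> <letter>) from standard input.
strip_term_noise() {
    esc=$(printf '\033')              # a literal ESC byte, portable
    tr -d '\r' | sed "s/${esc}\[[0-9;]*[A-Za-z]//g"
}
```

Used as a filter, for example strip_term_noise <capture-b.txt, where
capture-b.txt is a placeholder file name.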

Is this the right conclusion? But why do I get these additional control
characters in the second case?

Shade and sweet water!

Stephan
--
| Stephan Seitz E-Mail: ***@fsing.rootsland.net |
| Public Keys: http://fsing.rootsland.net/~stse/keys.html |
Alan McKinnon
2013-12-05 19:35:18 UTC
Post by Stephan Seitz
Post by Alan McKinnon
This is a common problem with cron and has little to do with the program
being run and everything to do with the environment. Cron does not run
Well, I can say after several tests that I have probably found the
problem, but I don’t understand it.
clogin -t 90 -c"bigpipe version;bigpipe platform;cat
/config/bigip.license;bigpipe monitor list all;bigpipe profile
list;bigpipe base list;bigpipe db show;bigpipe route static show;ls
--full-time --color=never /config/ssl/ssl.crt;ls --full-time
--color=never /config/ssl/ssl.key;bigpipe list" bigip2b
a) called directly from the rancid user shell
b) called from the root shell via „su - rancid -c <command>”
Looking at the different outputs of the problematic lines I can see the
a)
/config/ssl/ssl.crt^M
<output>
/config/ssl/ssl.key^M
<output>
<output>
b)
--color=never /config/ssl/ssl.cr^[[Kt^M
<output>
ls --full-time --color=never /config/ssl/ssl.key^M
<output>
<output>
So besides the CR characters the second output shows some other control
characters which probably confuse the parser in the end.
Is this the right conclusion? But why do I get these additional control
characters in the second case?
Those ANSI escape sequences containing "[" are never supposed to be
echoed to the screen at all, they are controls to the terminal emulator
to take some action or other.

To see why you get them in case b), look at the man page for su under
option -c:

-c, --command COMMAND
Specify a command that will be invoked by the shell using its -c.

The executed command will have no controlling terminal. This
option cannot be used to execute interactive programs which need a
controlling TTY.

You disable the terminal emulator with -c, so the escape sequences are
passed through and not acted on.
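
A quick way to see whether a given invocation has a terminal on standard
input is the shell's -t test. This is a rough sketch: describe_tty is a
hypothetical helper, and a terminal on stdin is only a proxy for having a
controlling terminal, which is what the man page text is about.

```sh
# Hypothetical helper: report whether stdin is attached to a terminal.
describe_tty() {
    if [ -t 0 ]; then
        echo "stdin is a terminal: $(tty)"
    else
        echo "stdin is not a terminal"
    fi
}
```

Putting the same [ -t 0 ] test into a small script and starting it once
from a login shell, once via su -c, and once from a cron job shows how
the three cases differ.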

We now need to check what actually happens in your cron jobs. What is
the content of rancid.conf, especially the settings NOPIPE, PATH and TERM?
--
Alan McKinnon
***@gmail.com
Stephan Seitz
2013-12-06 13:09:20 UTC
Post by Alan McKinnon
Those ANSI escape sequences containing "[" are never supposed to be
echoed to the screen at all, they are controls to the terminal emulator
to take some action or other.
To see why you get them in case b) look at the man page for su under
-c, --command COMMAND
Specify a command that will be invoked by the shell using its -c.
The executed command will have no controlling terminal. This
option cannot be used to
execute interractive programs which need a controlling TTY.
Ah, thank you very much for the explanation. So I had better test via
cron.
Post by Alan McKinnon
We now need to check what actually happens in your cron jobs. What is
the content of rancid.conf, especially the settings NOPIPE, PATH and TERM?
TERM=xterm;export TERM
LC_COLLATE="POSIX"; export LC_COLLATE
umask 027
TMPDIR=/tmp; export TMPDIR
BASEDIR=/var/lib/rancid; export BASEDIR
PATH=/usr/lib/rancid/bin:/usr/bin:/usr/sbin:/bin:/usr/local/bin:/usr/bin; export PATH
CVSROOT=$BASEDIR/CVS; export CVSROOT
LOGDIR=$BASEDIR/logs; export LOGDIR
RCSSYS=svn; export RCSSYS
ACLSORT=YES; export ACLSORT
FILTER_PWDS=NO; export FILTER_PWDS
NOCOMMSTR=NO; export NOCOMMSTR
LIST_OF_GROUPS="networking"

The default TERM setting after installation (Debian package) was
"network", but no such terminal type exists. So I changed it over the
last few days while I was trying to analyse the problem.

I tested TERM with the values xterm, linux, and screen, each with
NOPIPE=yes and NOPIPE=no, but the results are always the same. All Cisco
devices and the two active F5s work, even with the nonexistent TERM
setting network. The two standby F5s only work when run manually.

The environment variables for the rancid user via cron are:
HOME=/var/lib/rancid
LOGNAME=rancid
PATH=/usr/bin:/bin
LANG=en_US.UTF-8
SHELL=/bin/sh
PWD=/var/lib/rancid

The environment variables for the rancid user (bash) are:
SHELL=/bin/bash
TERM=xterm
XDG_SESSION_COOKIE=fe5755edd7b665ab56f270925278ef8f-1386334922.264463-273332059
USER=rancid
MAIL=/var/mail/rancid
PATH=/usr/local/bin:/usr/bin:/bin:/usr/games
PWD=/var/lib/rancid
LANG=en_US.UTF-8
SHLVL=1
HOME=/var/lib/rancid
LOGNAME=rancid
DISPLAY=localhost:10.0
_=/usr/bin/env
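
The two listings above can also be compared mechanically. The following
is a minimal sketch, assuming each environment has been dumped to a file
with env; the helper name vars_missing_from and the file names in the
usage note are made up for illustration.

```sh
# Hypothetical helper: print the names of variables that are set in the
# env dump $2 but absent from the env dump $1.
vars_missing_from() {
    tmp=$(mktemp -d) || return 1
    cut -d= -f1 "$1" | sort -u >"$tmp/a"   # variable names in first dump
    cut -d= -f1 "$2" | sort -u >"$tmp/b"   # variable names in second dump
    comm -13 "$tmp/a" "$tmp/b"             # lines unique to the second
    rm -rf "$tmp"
}
```

For example, after a temporary crontab entry has written env output to
cron.env and a login shell has written its own to login.env, running
vars_missing_from cron.env login.env lists what the cron job lacks. It
compares names only, so variables that exist in both but with different
values (such as PATH and SHELL here) still need to be checked by eye.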

I simply don’t understand why the two F5 systems are failing. Since they
are part of a cluster both sides have the same configuration and the same
OS version.

Shade and sweet water!

Stephan
--
| Stephan Seitz E-Mail: ***@fsing.rootsland.net |
| Public Keys: http://fsing.rootsland.net/~stse/keys.html |