Discussion:
[rancid] xrrancid vs. IOS-XR on Cisco ASR9000 (ASR9K)
Buchan Milne
2014-10-09 08:01:15 UTC
Permalink
We have been using rancid successfully with IOS- and IOS-XE-based
devices for some time, and we haven't had any real problems.

However, we have replaced some of these devices with Cisco ASR9000s, and
have had trouble with rancid on these. At present we are running IOS-XR
4.3.1 on most, but we have a few running 5.1.2.

The problem seems to be that xrrancid doesn't realise it has retrieved
all the results of the commands, and thus times out waiting for more
data.

If I manually run xrrancid, this still happens:

$ time xrrancid -d $device
executing clogin -t 90 -c"terminal no-timestamp;terminal exec prompt
no-timestamp;admin show version;admin show install summary;admin show
license;admin show variables boot;show redundancy secondary;show install
active;admin show env all;dir /all nvram:;dir /all bootflash:;dir /all
compactflash:;dir /all compactflasha:;dir /all slot0:;dir /all
disk0:;dir /all disk0a:;dir /all slot1:;dir /all disk1:;dir /all
disk1a:;dir /all slot2:;dir /all disk2:;dir /all harddisk:;dir /all
harddiska:;dir /all harddiskb:;dir /all slavenvram:;dir /all
slavebootflash:;dir /all slaveslot0:;dir /all slavedisk0:;dir /all
slaveslot1:;dir /all slavedisk1:;dir /all slaveslot2:;dir /all
slavedisk2:;dir /all sec-nvram:;dir /all sec-bootflash:;dir /all
sec-slot0:;dir /all sec-disk0:;dir /all sec-slot1:;dir /all
sec-disk1:;dir /all sec-slot2:;dir /all sec-disk2:;show
controllers;admin show running;admin show diag;admin show inventory
raw;show vlan;show debug;show rpl maximum;show running-config" $device

...

$device: missed cmd(s): show running-config,dir /all slot0:,admin show
env all,show debug,show vlan,show rpl maximum
$device: missed cmd(s): show running-config,dir /all slot0:,admin show
env all,show debug,show vlan,show rpl maximum
$device: End of run not found
$device: End of run not found
!

real 3m46.481s
user 0m0.900s
sys 0m0.120s


Naturally, if I use clogin and run the commands manually, they complete.

How can I troubleshoot this further?

This far I have been using the distro-supplied tcl/expect packages in
production, but I have applied the patches on my workstation, but they
made no difference.

On my workstation I am using the 2.3.8-1 package I created for Mageia:
http://svnweb.mageia.org/packages/cauldron/rancid/current/

(I have a 3.1 build, but need to consider whether I should add
post-upgrade scripts to change : to ; in router.db files - the behaviour
above was the same with this package)

Incidentally, I have in the past done some terminal-scraping with
IPC::Run, is there a reason why rancid still uses expect, when it might
be more reliable to handle this in the rancid perl script?

Regards,
Buchan
Alan McKinnon
2014-10-09 09:09:59 UTC
Permalink
Post by Buchan Milne
We have been using rancid successfully with IOS- and IOS-XE-based
devices for some time, and we haven't had any real problems.
However, we have replaced some of these devices with Cisco ASR9000s, and
have had trouble with rancid on these. At present we are running IOS-XR
4.3.1 on most, but we have a few running 5.1.2.
The problem seems to be that xrrancid doesn't realise it has retrieved
all the results of the commands, and thus times out waiting for more
data.
$ time xrrancid -d $device
executing clogin -t 90 -c"terminal no-timestamp;terminal exec prompt
no-timestamp;admin show version;admin show install summary;admin show
license;admin show variables boot;show redundancy secondary;show install
active;admin show env all;dir /all nvram:;dir /all bootflash:;dir /all
compactflash:;dir /all compactflasha:;dir /all slot0:;dir /all
disk0:;dir /all disk0a:;dir /all slot1:;dir /all disk1:;dir /all
disk1a:;dir /all slot2:;dir /all disk2:;dir /all harddisk:;dir /all
harddiska:;dir /all harddiskb:;dir /all slavenvram:;dir /all
slavebootflash:;dir /all slaveslot0:;dir /all slavedisk0:;dir /all
slaveslot1:;dir /all slavedisk1:;dir /all slaveslot2:;dir /all
slavedisk2:;dir /all sec-nvram:;dir /all sec-bootflash:;dir /all
sec-slot0:;dir /all sec-disk0:;dir /all sec-slot1:;dir /all
sec-disk1:;dir /all sec-slot2:;dir /all sec-disk2:;show
controllers;admin show running;admin show diag;admin show inventory
raw;show vlan;show debug;show rpl maximum;show running-config" $device
...
$device: missed cmd(s): show running-config,dir /all slot0:,admin show
env all,show debug,show vlan,show rpl maximum
$device: missed cmd(s): show running-config,dir /all slot0:,admin show
env all,show debug,show vlan,show rpl maximum
$device: End of run not found
$device: End of run not found
!
real 3m46.481s
user 0m0.900s
sys 0m0.120s
Naturally, if I use clogin and run the commands manually, they complete.
How can I troubleshoot this further?
This far I have been using the distro-supplied tcl/expect packages in
production, but I have applied the patches on my workstation, but they
made no difference.
http://svnweb.mageia.org/packages/cauldron/rancid/current/
(I have a 3.1 build, but need to consider whether I should add
post-upgrade scripts to change : to ; in router.db files - the behaviour
above was the same with this package)
Incidentally, I have in the past done some terminal-scraping with
IPC::Run, is there a reason why rancid still uses expect, when it might
be more reliable to handle this in the rancid perl script?
Regards,
Buchan
Hi Buchan

[ it's been a while :-) ]

You're on the right track, rancid is not detecting the start of a
command in the output (or perhaps not detecting the end of the previous
command).

Ideally I'd need the entire output of an xrrancid -d run on a failing
device, but that's not usually feasible, let's try this instead:

For each failing command, grab 5 or so lines of text from the output of
xrrancid -d above and below where clogin runs that command. There will
be something there that confuses the prompt detection code:

There's a regex that searches for /^$prompt/ and it gets easily
confused. I well recall one case with IS's XRs where one of the admin
show commands output wasn't ending properly with a newline making the
regex fail. Those are the kinds of oddities you need to look for.

There's also the old classic problem of having "#" or ">" characters in
your login banner, that one causes the entire run to fail.
--
Alan McKinnon
***@gmail.com
Buchan Milne
2014-10-14 08:04:24 UTC
Permalink
Post by Buchan Milne
We have been using rancid successfully with IOS- and IOS-XE-based
devices for some time, and we haven't had any real problems.
However, we have replaced some of these devices with Cisco ASR9000s, and
have had trouble with rancid on these. At present we are running IOS-XR
4.3.1 on most, but we have a few running 5.1.2.
The problem seems to be that xrrancid doesn't realise it has retrieved
all the results of the commands, and thus times out waiting for more
data.
$ time xrrancid -d $device
executing clogin -t 90 -c"terminal no-timestamp;terminal exec prompt
no-timestamp;admin show version;admin show install summary;admin show
license;admin show variables boot;show redundancy secondary;show install
active;admin show env all;dir /all nvram:;dir /all bootflash:;dir /all
compactflash:;dir /all compactflasha:;dir /all slot0:;dir /all
disk0:;dir /all disk0a:;dir /all slot1:;dir /all disk1:;dir /all
disk1a:;dir /all slot2:;dir /all disk2:;dir /all harddisk:;dir /all
harddiska:;dir /all harddiskb:;dir /all slavenvram:;dir /all
slavebootflash:;dir /all slaveslot0:;dir /all slavedisk0:;dir /all
slaveslot1:;dir /all slavedisk1:;dir /all slaveslot2:;dir /all
slavedisk2:;dir /all sec-nvram:;dir /all sec-bootflash:;dir /all
sec-slot0:;dir /all sec-disk0:;dir /all sec-slot1:;dir /all
sec-disk1:;dir /all sec-slot2:;dir /all sec-disk2:;show
controllers;admin show running;admin show diag;admin show inventory
raw;show vlan;show debug;show rpl maximum;show running-config" $device
...
$device: missed cmd(s): show running-config,dir /all slot0:,admin show
env all,show debug,show vlan,show rpl maximum
$device: missed cmd(s): show running-config,dir /all slot0:,admin show
env all,show debug,show vlan,show rpl maximum
$device: End of run not found
$device: End of run not found
!
real 3m46.481s
user 0m0.900s
sys 0m0.120s
Naturally, if I use clogin and run the commands manually, they complete.
The problem seems to stem from the following configuration we had on our
IOS-XR boxes:

line default
cli whitespace completion

(which completes commands when you hit space, rather than requiring tab)


This seems to be confusing xrrancid, as I was getting lines like this
from the debug output:

HIT COMMAND:RP/0/RSP0/CPU0:ROUTERNAME#terminalRP/0/RSP0/CPU0:ROUTERNAME#terminal execRP/0/RSP0/CPU0:ROUTERNAME#terminal exec promptRP/0/RSP0/CPU0:ROUTERNAME#terminal exec prompt no-timestamp


Having compared the output to a router running IOS, I saw that the prompt wasn't repeating.

After disabling 'cli whitespace completion', the debug logs look like this:

HIT COMMAND:RP/0/RSP0/CPU0:NBSC-TI-PR-OMICRON-00#terminal exec prompt no-timestamp
In FilterAll: RP/0/RSP0/CPU0:NBSC-TI-PR-OMICRON-00#terminal exec prompt no-timestamp


And now the run completes with all commands and output in 22s.

Unfortunately, it seems Cisco professional services likes to enable this feature by default. It would be nice if xrrancid could be improved to work with this feature enabled.

Regards,
Buchan
Buchan Milne
2014-10-14 15:04:59 UTC
Permalink
Post by Buchan Milne
Post by Buchan Milne
We have been using rancid successfully with IOS- and IOS-XE-based
devices for some time, and we haven't had any real problems.
However, we have replaced some of these devices with Cisco ASR9000s, and
have had trouble with rancid on these. At present we are running IOS-XR
4.3.1 on most, but we have a few running 5.1.2.
The problem seems to be that xrrancid doesn't realise it has retrieved
all the results of the commands, and thus times out waiting for more
data.
$ time xrrancid -d $device
executing clogin -t 90 -c"terminal no-timestamp;terminal exec prompt
no-timestamp;admin show version;admin show install summary;admin show
license;admin show variables boot;show redundancy secondary;show install
active;admin show env all;dir /all nvram:;dir /all bootflash:;dir /all
compactflash:;dir /all compactflasha:;dir /all slot0:;dir /all
disk0:;dir /all disk0a:;dir /all slot1:;dir /all disk1:;dir /all
disk1a:;dir /all slot2:;dir /all disk2:;dir /all harddisk:;dir /all
harddiska:;dir /all harddiskb:;dir /all slavenvram:;dir /all
slavebootflash:;dir /all slaveslot0:;dir /all slavedisk0:;dir /all
slaveslot1:;dir /all slavedisk1:;dir /all slaveslot2:;dir /all
slavedisk2:;dir /all sec-nvram:;dir /all sec-bootflash:;dir /all
sec-slot0:;dir /all sec-disk0:;dir /all sec-slot1:;dir /all
sec-disk1:;dir /all sec-slot2:;dir /all sec-disk2:;show
controllers;admin show running;admin show diag;admin show inventory
raw;show vlan;show debug;show rpl maximum;show running-config" $device
...
$device: missed cmd(s): show running-config,dir /all slot0:,admin show
env all,show debug,show vlan,show rpl maximum
$device: missed cmd(s): show running-config,dir /all slot0:,admin show
env all,show debug,show vlan,show rpl maximum
$device: End of run not found
$device: End of run not found
!
real 3m46.481s
user 0m0.900s
sys 0m0.120s
Naturally, if I use clogin and run the commands manually, they complete.
The problem seems to stem from the following configuration we had on our
line default
cli whitespace completion
(which completes commands when you hit space, rather than requiring tab)
This seems to be confusing xrrancid, as I was getting lines like this
HIT COMMAND:RP/0/RSP0/CPU0:ROUTERNAME#terminalRP/0/RSP0/CPU0:ROUTERNAME#terminal execRP/0/RSP0/CPU0:ROUTERNAME#terminal exec promptRP/0/RSP0/CPU0:ROUTERNAME#terminal exec prompt no-timestamp
Having compared the output to a router running IOS, I saw that the prompt wasn't repeating.
HIT COMMAND:RP/0/RSP0/CPU0:NBSC-TI-PR-OMICRON-00#terminal exec prompt no-timestamp
In FilterAll: RP/0/RSP0/CPU0:NBSC-TI-PR-OMICRON-00#terminal exec prompt no-timestamp
And now the run completes with all commands and output in 22s.
Last remaining problem is that the file harddisk:ce_switch.log seems to
change very regularly, with contents such as:

Oct 14 14:31:15.340::ce_switch_set_port_led() line:2675 set port:1 to LED state:3


I have made the following change to get rid of it:

@@ -581,8 +581,8 @@
return(-1) if (/command authorization failed/i);
return(1) if (/(Open device \S+ failed|Error opening \S+:)/);
# filter frequently changing files from IOX bootflash / hardiska
- if ($dev =~ /(bootflash|harddiska)/) {
- if (/(temp_cont|uptime_cont|volt_cont)\s*$/) {
+ if ($dev =~ /(bootflash|harddiska|harddisk)/) {
+ if (/(temp_cont|uptime_cont|volt_cont|ce_switch.log)\s*$/) {
if (/(\s*\d+\s+\S+\s+)(\d+)(\s+)(\w+ \w+\s+\d+ \d+:\d+:\d+ \d+)/) {
my($a, $sz, $c, $dt, $rem) = ($1, $2, $3, $4, $');
my($szl, $dtl) = (length($sz), length($dt));
@@ -1342,6 +1342,7 @@


Regards,
Buchan
heasley
2014-10-14 15:05:56 UTC
Permalink
Post by Buchan Milne
The problem seems to stem from the following configuration we had on our
line default
cli whitespace completion
(which completes commands when you hit space, rather than requiring tab)
This seems to be confusing xrrancid, as I was getting lines like this
HIT COMMAND:RP/0/RSP0/CPU0:ROUTERNAME#terminalRP/0/RSP0/CPU0:ROUTERNAME#terminal execRP/0/RSP0/CPU0:ROUTERNAME#terminal exec promptRP/0/RSP0/CPU0:ROUTERNAME#terminal exec prompt no-timestamp
Having compared the output to a router running IOS, I saw that the prompt wasn't repeating.
HIT COMMAND:RP/0/RSP0/CPU0:NBSC-TI-PR-OMICRON-00#terminal exec prompt no-timestamp
In FilterAll: RP/0/RSP0/CPU0:NBSC-TI-PR-OMICRON-00#terminal exec prompt no-timestamp
And now the run completes with all commands and output in 22s.
Unfortunately, it seems Cisco professional services likes to enable this feature by default. It would be nice if xrrancid could be improved to work with this feature enabled.
given that the entire command is there, with no completion required,
it should not have output/printed anything. This would be difficult to
filter. if they love it so much, perhaps they will open a DDTS to have
it fixed? or, you, as the customer, can forbid them from enabling it.
Loading...