Discussion:
[rancid] Failure logging in to HP Procurve switches
Meyers, Dan
2012-05-30 16:19:44 UTC
I'm having trouble getting rancid to play nicely with some HP Procurve switches we've got on our network. The error I'm getting in the logs is a timeout. If I run hlogin manually with debugging turned on I can see that when it connects to the HPs it does not receive the 'correct' prompt back.

If I specify a command to run using -c "sh run" (for example) rancid is expecting to see "Press any key to continue". What it actually sees varies depending on the device in question. Normally it is something like "Prekey any key to continue" or "Press any key to ctntieue". Neither of these match the expected regexp, so of course a timeout occurs. The odd thing is that if you don't specify a command to run with -c both of these prompts always display and are parsed correctly. These prompts are always displayed correctly when logging into the switch myself from the same server using the same username, password and method.

However rancid running via the rancid-run command still has issues with these switches, so I am not sure if it is doing the equivalent of a -c "<command>" or not. Running hlogin manually with no command gives the following when it hits the prompt:

----------
H2h2hroEP1 drop
expect: does " \r\r\n\u001b1H\u001b[2h\u001b[?[2h\u001b[2?[2h\u001b[241 droEP1 drop\u001b \u001b" (spawn_id exp6) match regular expression "[\r\n]+"? (No Gate, RE only) gate=yes re=yes
expect: set expect_out(0,string) "\r\r\n"
expect: set expect_out(spawn_id) "exp6"
expect: set expect_out(buffer) " \r\r\n"
expect: continuing expect

expect: does "\u001b1H\u001b[2h\u001b[?[2h\u001b[2?[2h\u001b[241 droEP1 drop\u001b \u001b" (spawn_id exp6) match regular expression "[\r\n]+"? (No Gate, RE only) gate=yes re=no
"^.+#"? Gate "*#"? gate=no
expect: timed out

Error: TIMEOUT reached
----------

What that prompt *actually* reads, if you log into it manually on the command line, is 'RNEP1 drop# '. I'm not sure where all the control characters in the debug output are coming from, or whether they are actually a problem. The log for the rancid-run run contains stuff like this:

----------
Trying to get all of the configs.
172.29.1.18 clogin error: Error: TIMEOUT reached
172.29.1.18: missed cmd(s): show stack,show module,show flash,show version,show system-information,write term,show system information
172.29.1.18: End of run not found
;
=====================================
Getting missed routers: round 1.
172.29.1.18 clogin error: Error: TIMEOUT reached
172.29.1.18: missed cmd(s): show stack,show module,show flash,show version,show system-information,write term,show system information
172.29.1.18: End of run not found
;
=====================================
Getting missed routers: round 2.
172.29.1.18 clogin error: Error: TIMEOUT reached
172.29.1.18: missed cmd(s): show stack,show module,show flash,show version,show system-information,write term,show system information
172.29.1.18: End of run not found
;
=====================================
Getting missed routers: round 3.
172.29.1.18 clogin error: Error: TIMEOUT reached
172.29.1.18: missed cmd(s): show stack,show module,show flash,show version,show system-information,write term,show system information
172.29.1.18: End of run not found
;
=====================================
Getting missed routers: round 4.
172.29.1.18 clogin error: Error: TIMEOUT reached
172.29.1.18: missed cmd(s): show stack,show module,show flash,show version,show system-information,write term,show system information
172.29.1.18: End of run not found
;
----------

I note that the log says 'clogin error'. Does that mean it's trying to use clogin instead of hlogin, even though I've specified in the router.db file that these devices are of type 'hp'? If that is the case, how do I get it to use hlogin?
If I have more than 1 router specified in my router.db file then I get some extra output in the log from some of those, as follows:

----------
Getting missed routers: round 1.
couldn't compile regular expression pattern: parentheses () not balanced
while executing
"expect {
-re $reprompt {}
-re "\[\n\r]+" { exp_continue }
}"
(procedure "run_commands" line 10)
invoked from within
"run_commands $prompt $command"
("foreach" body line 142)
invoked from within
"foreach router [lrange $argv $i end] {
set router [string tolower $router]
send_user "$router\n"

# Figure out prompt.
# Since autoena..."
(file "/usr/lib/rancid/bin/hlogin" line 595)^M
172.29.4.18: missed cmd(s): show stack,show module,show flash,show version,show system-information,write term,show system information
172.29.4.18: End of run not found
----------

All this is running on an Ubuntu 10.04 LTS 64 bit box installed a few months back. If there is a known issue with the rancid release contained therein (2.3.2) I'm happy to upgrade to 12.04 as it's now out, but I figured it was more likely to be something I was doing wrong so I'd ask on here. All the Ciscos and Junipers we have on the network back up fine, it's just all the HPs I'm having a problem with...
--
Dan Meyers
Network Support Specialist, Lancaster University
David Byers
2012-05-30 17:26:11 UTC
Post by Meyers, Dan
I'm having trouble getting rancid to play nicely with some HP Procurve switches we've got on our network. The error I'm getting in the logs is a timeout. If I run hlogin manually with debugging turned on I can see that when it connects to the HPs it does not receive the 'correct' prompt back.
If I specify a command to run using -c "sh run" (for example) rancid is expecting to see "Press any key to continue". What it actually sees varies depending on the device in question. Normally it is something like "Prekey any key to continue" or "Press any key to ctntieue". Neither of these match the expected regexp, so of course a timeout occurs. The odd thing is that if you don't specify a command to run with -c both of these prompts always display and are parsed correctly. These prompts are always displayed correctly when logging into the switch myself from the same server using the same username, password and method.
You're probably running into a bug in hpuifilter that manifests on
certain versions of glibc on 64-bit Linux.

Have a look at this message in February for a patch:

http://www.gossamer-threads.com/lists/rancid/users/6202

(Though I accidentally reversed the patch.)
--
David Byers
Linköping University
Meyers, Dan
2012-05-31 10:10:12 UTC
You're probably running into a bug in hpuifilter that manifests on certain
versions of glibc on 64-bit Linux.
http://www.gossamer-threads.com/lists/rancid/users/6202
(Though I accidentally reversed the patch.)
Thanks for this, it does look like my issue :) However as I'm running Ubuntu 10.04 LTS with rancid from packages I'm only on 2.3.2, not whatever version that patch was written for. The code in those areas of hpuifilter.c is somewhat different, making use of strcpy not memcpy/memmove, with surrounding lines having changes as well. For the time being, as it's only a few (5) switches that are not often altered, I'll do manual backups. When 12.04 has its first point release in July I'll upgrade to that, which will give me rancid 2.3.6, then have another look at your patch if I'm still having issues.

Dan
heasley
2012-05-31 14:40:01 UTC
Post by Meyers, Dan
Thanks for this, it does look like my issue :) However as I'm running Ubuntu 10.04 LTS with rancid from packages I'm only on 2.3.2, not whatever version that patch was written for. The code in those areas of hpuifilter.c is somewhat different, making use of strcpy not memcpy/memmove, with surrounding lines having changes as well. For the time being, as it's only a few (5) switches that are not often altered, I'll do manual backups. When 12.04 has its first point release in July I'll upgrade to that, which will give me rancid 2.3.6, then have another look at your patch if I'm still having issues.
just download 2.3.8, build it, and copy hpuifilter over whatever your
package system installed. or stop using the package system for rancid.
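
For anyone wondering why a copy routine would turn "Press any key to continue" into something like "Prekey any key to continue": the thread mentions strcpy versus memcpy/memmove in hpuifilter.c, which suggests (this is an assumption, not something confirmed above) that the filter removes escape sequences by shifting the rest of the buffer left over them. When source and destination overlap like that, strcpy and memcpy have undefined behaviour in C, and an optimised libc may scramble the bytes; memmove is the call that is specified to handle overlap. A minimal sketch of the safe form, where drop_bytes is a hypothetical helper and not code taken from rancid:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, NOT taken from hpuifilter.c: delete n bytes starting
 * at offset pos from the NUL-terminated string buf by shifting the tail of
 * the string left over them.
 */
static void drop_bytes(char *buf, size_t pos, size_t n)
{
    size_t len = strlen(buf);

    /* The span to delete must lie inside the string. */
    assert(pos + n <= len);

    /*
     * Source (buf + pos + n) and destination (buf + pos) overlap, so
     * strcpy() or memcpy() would be undefined behaviour here. memmove()
     * is defined for overlapping regions. The +1 moves the terminating
     * NUL along with the tail.
     */
    memmove(buf + pos, buf + pos + n, len - pos - n + 1);
}
```

Called on a buffer holding "Press ", then a four-byte escape sequence, then "any key to continue", with pos 6 and n 4, this leaves the clean prompt text that hlogin's regexp is waiting for.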