Discussion:
[rancid] clogin + ssh: stuck at fingerprint verification
Patrik Lundin
2017-05-18 14:43:57 UTC
Hello,

I have been trying to figure out an odd problem related to clogin when
using ssh that appeared the other day. Basically clogin will (sometimes)
get stuck when the ssh client prompts for fingerprint verification.

OS version: Ubuntu 16.04.2 LTS
RANCID package version: 3.3.0-1
Expect package version: 5.45-7
OpenSSH version: 1:7.2p2-4ubuntu2.2

The .cloginrc looks like this:
===
add autoenable * 1
add method * {ssh}
add user * test
add password * secret
===

The output of running clogin looks like this when it hangs (and
eventually times out):
===
# clogin switch01.example.com
switch01.example.com
spawn ssh -c 3des-cbc -x -l test switch01.example.com
The authenticity of host 'switch01.example.com (10.0.0.10)' can't be established.
RSA key fingerprint is SHA256:<fingerprint>.
Are you sure you want to continue connecting (yes/no)?
Error: TIMEOUT reached
===

The problem is that clogin fails to successfully parse the ssh output in
order to send the "yes" needed to continue.

What makes this problem tricky is that it seems to be timing related.
Here is an attempt that initially works and then fails on the second
attempt after removing the fingerprint again:
===
# clogin switch01.example.com
switch01.example.com
spawn ssh -c 3des-cbc -x -l test switch01.example.com
The authenticity of host 'switch01.example.com (10.0.0.10)' can't be established.
RSA key fingerprint is SHA256:<fingerprint>.
Are you sure you want to continue connecting (yes/no)?
Host switch01.example.com added to the list of known hosts.
yes
Warning: Permanently added 'switch01.example.com,10.0.0.10' (RSA) to the list of known hosts.

Password:
[...]

# ssh-keygen -R switch01.example.com
# Host switch01.example.com found: line 1
/root/.ssh/known_hosts updated.
Original contents retained as /root/.ssh/known_hosts.old

# clogin switch01.example.com
switch01.example.com
spawn ssh -c 3des-cbc -x -l test switch01.example.com
The authenticity of host 'switch01.example.com (10.0.0.10)' can't be established.
RSA key fingerprint is SHA256:<fingerprint>.
Are you sure you want to continue connecting (yes/no)?
Error: TIMEOUT reached
===

The regex in clogin that is responsible for answering the question looks like this:
===
-re "(Host key not found |The authenticity of host .* be established).* \\(yes/no\\)\\?" {
send "yes\r"
send_user "\nHost $router added to the list of known hosts.\n"
exp_continue
}
===

It requires that all three lines of output are present in the buffer as a
single chunk (starting with "The authenticity of host" and ending with
"(yes/no)?"). When things work, this is indeed what happens (heavily
trimmed output):
===
# clogin -d switch01.example.com
[...]
The authenticity of host 'switch01.example.com (10.0.0.10)' can't be established.
RSA key fingerprint is SHA256:<fingerprint>.
Are you sure you want to continue connecting (yes/no)?
[...]
expect: does "The authenticity of host 'switch01.example.com (10.0.0.10)' can't be established.\r\nRSA key fingerprint is SHA256:<fingerprint>.\r\nAre you sure you want to continue connecting (yes/no)? " (spawn_id exp4) match glob pattern "Host is unreachable"? no
"No address associated with name"? no
"(Host key not found |The authenticity of host .* be established).* \(yes/no\)\?"? (No Gate, RE only) gate=yes re=yes
expect: set expect_out(0,string) "The authenticity of host 'switch01.example.com (10.0.0.10)' can't be established.\r\nRSA key fingerprint is SHA256:<fingerprint>.\r\nAre you sure you want to continue connecting (yes/no)?"
expect: set expect_out(1,string) "The authenticity of host 'switch01.example.com (10.0.0.10)' can't be established"
expect: set expect_out(spawn_id) "exp4"
expect: set expect_out(buffer) "The authenticity of host 'switch01.example.com (10.0.0.10)' can't be established.\r\nRSA key fingerprint is SHA256:<fingerprint>.\r\nAre you sure you want to continue connecting (yes/no)?"
send: sending "yes\r" to { exp4 }
===

On the specific host where the above output was collected, running clogin
without debug mostly hangs, while running with -d always manages to send a
"yes" (I'm guessing because printing the debug output gives the ssh binary
more time to present its output).

Here is how it can look on a host where running with -d fails, heavily
trimmed:
===
expect: does "The authenticity of host 'switch01.example.com (10.0.0.10)' can't be established." (spawn_id exp4) match regular expression [...]
[...]
expect: does "The authenticity of host 'switch01.example.com (10.0.0.10)' can't be established.\r\n" (spawn_id exp4) match regular expression [...]
[...]
expect: does "" (spawn_id exp4) match regular expression [...]
[...]
expect: does "RSA key fingerprint is SHA256:<fingerprint>." (spawn_id exp4) match regular expression [...]
[...]
expect: does "RSA key fingerprint is SHA256:<fingerprint>.\r\nAre you sure you want to continue connecting (yes/no)? " (spawn_id exp4) match regular expression [...]
[...]
expect: does "Are you sure you want to continue connecting (yes/no)? " (spawn_id exp4) match regular expression [...]
===

As can be seen, instead of arriving as a single chunk, the output is
handled in pieces, which means the regex that is supposed to send a "yes"
never matches.
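
The effect can be reproduced outside Expect. The following is a minimal Python sketch, not clogin code: it applies a Python translation of the clogin pattern first to the complete prompt and then to buffer contents like those in the failing debug trace. `re.DOTALL` is used on the assumption that it approximates Tcl's default behaviour, where `.` also matches newlines; the fingerprint is the same placeholder as in the output above.

```python
import re

# Python translation of the clogin pattern (an approximation: Tcl regexes
# let "." match newlines by default, which re.DOTALL mimics here).
PROMPT_RE = re.compile(
    r"(Host key not found |The authenticity of host .* be established)"
    r".* \(yes/no\)\?",
    re.DOTALL,
)

FULL = (
    "The authenticity of host 'switch01.example.com (10.0.0.10)' "
    "can't be established.\r\n"
    "RSA key fingerprint is SHA256:<fingerprint>.\r\n"
    "Are you sure you want to continue connecting (yes/no)? "
)

# Buffer contents similar to those in the failing debug trace.
FRAGMENTS = [
    "The authenticity of host 'switch01.example.com (10.0.0.10)' "
    "can't be established.\r\n",
    "RSA key fingerprint is SHA256:<fingerprint>.\r\n"
    "Are you sure you want to continue connecting (yes/no)? ",
    "Are you sure you want to continue connecting (yes/no)? ",
]


def matches(buffer: str) -> bool:
    """Return True if the prompt pattern matches anywhere in the buffer."""
    return PROMPT_RE.search(buffer) is not None


whole_buffer_matches = matches(FULL)
fragment_matches = [matches(chunk) for chunk in FRAGMENTS]

print(whole_buffer_matches)  # True
print(fragment_matches)      # [False, False, False]
```

No fragment contains both the "authenticity" line and the "(yes/no)?" question, so the combined pattern cannot match once the first line has been dropped from the buffer.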

It appears I can get around this by increasing the magic "sleep 0.3" in
clogin to something like "sleep 5", but that seems like a pretty brittle
workaround.

Has anyone struggled with something like this before?
--
Patrik Lundin
Charles T. Brooks
2017-05-18 15:26:25 UTC
Whenever you change a host key, put the new key in the known_hosts file on the rancid server. Don't use rancid to defeat a reasonable security measure. Silently deactivating the SSH warning is bad policy.
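
One way to follow this advice is to obtain the device's public key over a trusted path, compare its fingerprint with the one ssh displays, and only then add the key to known_hosts. Below is a minimal Python sketch of the comparison step, assuming the key is available as an unhashed known_hosts-style line without a leading marker. The function name is hypothetical and the sample key blob is a dummy value, not a real key. It relies on the fact that OpenSSH's SHA256 fingerprint is the unpadded base64 encoding of the SHA-256 digest of the decoded key blob.

```python
import base64
import hashlib


def sha256_fingerprint(known_hosts_line: str) -> str:
    """Return the OpenSSH-style SHA256 fingerprint for a known_hosts line.

    Expects the unhashed form "<hosts> <keytype> <base64-blob> [comment]"
    with no leading marker such as @cert-authority.
    """
    fields = known_hosts_line.split()
    blob = base64.b64decode(fields[2])
    digest = hashlib.sha256(blob).digest()
    # OpenSSH prints the digest base64-encoded with the "=" padding removed.
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


# Dummy blob for illustration only; a real entry carries the device's key.
sample = (
    "switch01.example.com,10.0.0.10 ssh-rsa "
    + base64.b64encode(b"not a real key").decode("ascii")
)
fingerprint = sha256_fingerprint(sample)
print(fingerprint)
```

The printed value can be compared by eye with the "RSA key fingerprint is SHA256:..." line that ssh shows on first connect.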

--Charlie

_______________________________________________
Rancid-discuss mailing list
Rancid-***@shrubbery.net
http://www.shrubbery.net/mailman/listinfo/rancid-discuss
Patrik Lundin
2017-05-19 08:05:38 UTC
> [...] Has anyone struggled with something like this before?
> If the risk of man in the middle attacks is acceptable, you could remove
> add method * {ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no}

Thank you for the input, but I prefer to use fingerprint verification
whenever I can.

> Whenever you change a host key, put the new key in the known_hosts file on
> the rancid server. Don't use rancid to defeat a reasonable security measure.
> Silently deactivating the SSH warning is bad policy.

Right, I agree with this position in general, but managing the host key
separately only hides what I perceive as the bigger issue.

Actually my question is not so much "how do I avoid/fix this specific problem"
as it is "is it possible that assumptions made in the clogin code no longer
hold true", which could potentially undermine its operation in general.

The pattern matching in the code evidently assumes that all of the text ends
up in the buffer at once. I have seen that on the affected systems this is
not always true.

Maybe someone more well versed in expect internals could chime in :).
--
Patrik Lundin
heasley
2017-05-19 21:47:21 UTC
Post by Patrik Lundin
> Actually my question is not so much "how do I avoid/fix this specific problem"
> as it is "is it possible assumptions made in the clogin code no longer hold
> true" which potentially could undermine its operation in general.

I think there was another change that caused this to surface. Anyway, I
believe I have already fixed this, and the fix is included in rancid-3.6:

*login: change handling of ssh key-related prompts to one line at a time
to eliminate timing-related problem.
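
A rough sketch of what handling the prompt one line at a time means, written in Python rather than Expect and not taken from the rancid-3.6 source: the pattern covers only the question line, so the answer is sent as soon as that line is seen, regardless of how the preceding lines were split or consumed. The function name and the pattern are assumptions made for illustration.

```python
import re

# Hypothetical pattern for the question line only; the real rancid
# pattern may differ.
QUESTION_RE = re.compile(r"\(yes/no\)\? ?$")


def replies_for(chunks):
    """Feed output chunks as they arrive; return the replies that would be sent."""
    sent = []
    pending = ""  # unterminated tail of the output seen so far
    for chunk in chunks:
        pending += chunk
        # Complete lines are informational here and are simply consumed;
        # only the text after the last newline is kept.
        *_consumed, pending = pending.split("\n")
        # ssh prints the question without a trailing newline, so it shows
        # up in the unterminated tail.
        if QUESTION_RE.search(pending):
            sent.append("yes\r")
            pending = ""
    return sent


# Output arriving in pieces, similar to the failing debug trace.
pieces = [
    "The authenticity of host 'switch01.example.com (10.0.0.10)' "
    "can't be established.",
    "\r\n",
    "RSA key fingerprint is SHA256:<fingerprint>.",
    "\r\nAre you sure you want to continue connecting (yes/no)? ",
]
fragmented_result = replies_for(pieces)
single_chunk_result = replies_for(["".join(pieces)])

print(fragmented_result)    # ['yes\r']
print(single_chunk_result)  # ['yes\r']
```

Because the match no longer depends on the "authenticity" line still being in the buffer, the result is the same whether the text arrives in one read or several.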
Patrik Lundin
2017-05-20 08:13:23 UTC
Post by heasley
> I think there was another change that caused this to surface. [...]
> *login: change handling of ssh key-related prompts to one line at a time
> to eliminate timing-related problem.

Ah, that is great. I did look at the CHANGES page but obviously missed
that. Thanks for pointing it out :).
--
Patrik Lundin