Discussion:
[rancid] High CPU Utilization on routers during Rancid capture
shane Haslem
2008-01-27 12:00:03 UTC
Hi all,
Can anyone advise whether they have experienced high CPU utilization on routers during config capture? I am using SSH to log in; would this be a factor?
Regards


Chris Gauthier
2008-01-27 17:00:18 UTC
Can you be more specific? There are many different brands and models of
routers and switches out there.

Thanks,

Chris
_______________________________________________
Rancid-discuss mailing list
http://www.shrubbery.net/mailman/listinfo.cgi/rancid-discuss
--
Chris Gauthier, CCNA, Network+, A+
Network Administration Team
Portland Community College
Portland, Oregon

"For once you have tasted flight you will walk the earth with your eyes turned skywards, for there you have been and there you will long to return."
--Leonardo da Vinci
Justin Shore
2008-01-27 21:11:27 UTC
Of course. I have 2 3660s and one 7206 (G1) that spike at 100% every
hour on the hour. It's not RANCID's fault. It happens anytime I do a
sh run. The 7206 has about 13k lines in its config. One 3660 has just
under 6k lines. The other 3660 has over 17k config lines. That 3660's
load stays at 100% for well over a minute. A high load is expected
given the sheer size of the config. SSH has a higher load than telnet,
of course, but that's no reason not to use SSH.

Justin
Frank Bulk - iNAME
2008-01-27 22:04:16 UTC
I'm guessing you're terminating PPPoX on there. Have you looked into the range
command to slim down the config a bit? Or is that not possible with your
requirements?

Frank

Justin Shore
2008-01-28 00:31:35 UTC
Frank,

No PPPoE here but you're thinking along the right track. I have about
1200 PVCs configured for RBE DSL termination on the 3660. The best
design I can think of would have been VTIs or some other template
mechanism, one per speed package we offer. Unfortunately this is what I
inherited. ADSL is being phased out and being replaced with FTTH and
ADSL2+ on distributed IP DSLAMs instead of centralized routers in the
core. These routers will breathe easier when the DSL load is taken off
of them.

Slightly off-topic but still related is a problem I first encountered a
couple of years ago. RANCID can help alert you to a low-memory problem if
you know what signs to look for. This same 3660 started generating
RANCID diffs every day or two. A PVC or 2 would disappear and then
reappear the next time RANCID ran. It was always there when I checked
by hand (sh run int ATMa/b.xyz). I figured it was a fluke, that perhaps
RANCID couldn't handle configs this big. I ignored the diffs for
months, even setting up Outlook to mark diffs related to that router as
read. Over time the number of PVCs disappearing and reappearing grew
larger, up to hundreds at a time. The time between occurrences also
shortened until it happened on every RANCID run. The router was running
fine so we never gave it a second thought. One day the router was
reported as down in RANCID. I checked and the router was still up.
However I could not do a sh run; it just returned me to the command
prompt. I figured out then what was going on. The router was running
out of RAM. I tried all sorts of methods of getting the config, dumping
it to tftp, etc before our scheduled maintenance window (just in case).
Nothing worked. About 4 hours before the window the router went
offline. Once onsite I consoled in and found that OSPF had died (not
enough RAM). I rebooted without writing (which I was sure would jack
the config if I wrote it). It came up and ran ok. I diffed the current
config against one a few months back and found I was missing about 12k
lines of config. Woo! I spent the rest of the morning pasting in
config from a RANCID diff over a year old (before the problem first
showed up). It worked but seriously screwed up our carrier system. The
field techs spent most of the day driving around and resetting cards
manually.

I've since seen this exact problem come up twice now with 2 completely
unrelated pieces of equipment. Both had a memory leak. I managed to
reboot them without incident since I caught the problem so quickly. So,
to make a long story short, if you see anything like what I describe
above DO NOT WRITE THE CONFIG and schedule a maintenance window for a
reboot ASAP. Learn from my mistake.
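The warning sign described above (interfaces vanishing from one RANCID capture and coming back in the next) can be checked mechanically. A minimal Python sketch, assuming two captures saved as plain text and IOS-style configs where each subinterface begins with a non-indented `interface ...` line; the function names and the default threshold of 5 are hypothetical, not anything RANCID provides:

```python
"""Flag bulk disappearance of interface stanzas between two config captures.

Heuristic sketch only: any non-indented "interface ..." line is treated as a
stanza header. The threshold is a hypothetical value, not a RANCID setting.
"""


def interface_headers(config_text: str) -> set[str]:
    """Return the set of top-level 'interface ...' lines in an IOS-style config."""
    headers = set()
    for line in config_text.splitlines():
        # Top-level lines in IOS configs are not indented.
        if line.startswith("interface "):
            headers.add(line.rstrip())
    return headers


def vanished_interfaces(old_text: str, new_text: str) -> list[str]:
    """Interfaces present in the old capture but absent from the new one, sorted."""
    return sorted(interface_headers(old_text) - interface_headers(new_text))


def looks_truncated(old_text: str, new_text: str, threshold: int = 5) -> bool:
    """True if at least `threshold` interfaces vanished between the two captures."""
    return len(vanished_interfaces(old_text, new_text)) >= threshold


if __name__ == "__main__":
    # Hypothetical captures: ten ATM subinterfaces, then only three of them.
    old = "\n".join(f"interface ATM1/0.{n} point-to-point\n pvc 1/{n}" for n in range(100, 110))
    new = "\n".join(f"interface ATM1/0.{n} point-to-point\n pvc 1/{n}" for n in range(100, 103))
    print(len(vanished_interfaces(old, new)))  # 7
    print(looks_truncated(old, new))  # True
```

Run against the two most recent revisions of a router's config after each collection, a check like this could raise an alert on the first bad capture instead of leaving it to someone reading diffs.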

Justin
john heasley
2008-01-28 18:11:15 UTC
Post by Justin Shore
the config if I wrote it). It came up and ran ok. I diffed the current
config against one a few months back and found I was missing about 12k
lines of config. Woo! I spent the rest of the morning pasting in
config from a RANCID diff over a year old (before the problem first
showed up). It worked but seriously screwed up our carrier system. The
field techs spent most of the day driving around and resetting cards
manually.
if you know when the last good config was collected, then you can make
rancid & cvs do a lot of this work for you. for example, if you know it
was last successfully (and with a proper config) collected on thursday at
5pm, then you can look at 'cvs log configfile' for that date (also see
cvs's -D option for many of the cvs commands).
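cvs's `-D` option handles date-based selection directly. If you would rather see the revision number first, here is a minimal Python sketch that scans `cvs log configfile` output for the newest revision committed at or before a cutoff. It assumes the usual `revision N.N` / `date: ...` line pairs (depending on the CVS version, dates look like `2008/01/24 17:00:12` or `2008-01-24 17:00:12 +0000`), ignores any timezone offset, and uses a made-up sample log; the function names are hypothetical:

```python
import re
from datetime import datetime

# Matches "revision 1.42" lines (a trailing "locked by: ..." is ignored).
REV_RE = re.compile(r"^revision (\d+(?:\.\d+)+)")
# Matches "date: 2008/01/24 17:00:12;" and "date: 2008-01-24 17:00:12 +0000;".
DATE_RE = re.compile(r"^date: (\d{4})[/-](\d{2})[/-](\d{2}) (\d{2}):(\d{2}):(\d{2})")

# Made-up sample in the shape of `cvs log` output for a RANCID config file.
SAMPLE_LOG = """\
----------------------------
revision 1.3
date: 2008/01/26 17:00:09;  author: rancid;  state: Exp;  lines: +1 -40
----------------------------
revision 1.2
date: 2008/01/24 17:00:12;  author: rancid;  state: Exp;  lines: +2 -1
----------------------------
revision 1.1
date: 2008/01/20 17:00:05;  author: rancid;  state: Exp;
"""


def revisions(log_text):
    """Yield (revision, datetime) pairs in the order they appear in the log."""
    current = None
    for line in log_text.splitlines():
        m = REV_RE.match(line)
        if m:
            current = m.group(1)
            continue
        m = DATE_RE.match(line)
        if m and current is not None:
            yield current, datetime(*map(int, m.groups()))
            current = None


def last_revision_before(log_text, cutoff):
    """Return the newest revision committed at or before `cutoff`, or None."""
    candidates = [(when, rev) for rev, when in revisions(log_text) if when <= cutoff]
    return max(candidates)[1] if candidates else None


if __name__ == "__main__":
    # Thursday 2008-01-24 at 17:30: newest revision at or before then.
    print(last_revision_before(SAMPLE_LOG, datetime(2008, 1, 24, 17, 30)))  # 1.2
```

CVS generally records these times in UTC, so the cutoff may need adjusting from local time before comparing.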

you can then checkout that version
cvs co -p -r rev <group>/configs/configfile > /tftpboot/configfile

edit it for passwords removed and so forth, then load it directly to the
device's start-up config and reload the box (without saving).
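Before loading the checked-out file, it helps to list every line RANCID sanitized. A minimal sketch, assuming your RANCID setup filters passwords and community strings and marks them with the string `<removed>` (check what your version actually writes); the function name and sample config are hypothetical:

```python
# Assumed placeholder; confirm what your RANCID version writes for filtered secrets.
PLACEHOLDER = "<removed>"


def lines_needing_edit(config_text, placeholder=PLACEHOLDER):
    """Return (line_number, line) pairs that still contain the placeholder."""
    return [
        (number, line)
        for number, line in enumerate(config_text.splitlines(), start=1)
        if placeholder in line
    ]


if __name__ == "__main__":
    # Hypothetical fragment of a checked-out config.
    sample = (
        "hostname rtr1\n"
        "enable secret <removed>\n"
        "snmp-server community <removed> RO\n"
    )
    for number, line in lines_needing_edit(sample):
        print(f"line {number}: {line}")
```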

if you know that changes may have been applied between the last successful
collection and the reboot, then run rancid against the device
rancid-run -r devicename_from_router.db

and diff the two files
cvs diff -r rev -r HEAD configfile

this can probably easily be grepped/awked/edited into something that you
can load like
copy tftp: running
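As a rough illustration of the grep/awk idea, here is a minimal Python sketch that works from two full config files (for example the old revision and HEAD, each written out with `cvs co -p`) rather than from the diff itself. It assumes IOS-style indentation, treats each non-indented line plus its indented children as one stanza, and emits every stanza from the old file that does not appear verbatim in the new one; all names are hypothetical:

```python
def stanzas(config_text):
    """Split an IOS-style config into top-level stanzas.

    A stanza is a non-indented line plus the indented lines that follow it.
    Blank lines and '!' comment/separator lines are dropped.
    """
    result = []
    for raw in config_text.splitlines():
        line = raw.rstrip()
        if not line or line.lstrip().startswith("!"):
            continue
        if line[0].isspace() and result:
            result[-1].append(line)  # child line: attach to the current stanza
        else:
            result.append([line])  # top-level line: start a new stanza
    return [tuple(stanza) for stanza in result]


def missing_stanzas(old_text, new_text):
    """Stanzas from the old config that do not appear verbatim in the new one."""
    present = set(stanzas(new_text))
    return [stanza for stanza in stanzas(old_text) if stanza not in present]


def merge_snippet(old_text, new_text):
    """Render the missing stanzas as a file to load with `copy tftp: running`."""
    lines = [
        line
        for stanza in missing_stanzas(old_text, new_text)
        if stanza != ("end",)  # a single "end" is appended below
        for line in stanza
    ]
    return "\n".join(lines + ["end"]) + "\n"


if __name__ == "__main__":
    # Hypothetical old and new captures: one ATM subinterface went missing.
    OLD = (
        "hostname rtr1\n!\n"
        "interface ATM1/0.100 point-to-point\n pvc 1/100\n!\n"
        "interface ATM1/0.101 point-to-point\n pvc 1/101\n!\nend\n"
    )
    NEW = "hostname rtr1\n!\ninterface ATM1/0.100 point-to-point\n pvc 1/100\n!\nend\n"
    print(merge_snippet(OLD, NEW), end="")
```

Because copying a file to the running config merges rather than replaces, this only re-adds missing or altered stanzas: anything added since the old revision stays, and no `no ...` lines are generated. Comment lines starting with `!` are dropped, so multi-line banners may still need hand editing.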


not that I expect you did it differently, but others might get the idea
from your "applying diffs" note that that arduous task would be manual.
Justin Shore
2008-01-29 14:34:18 UTC
Post by john heasley
if you know when the last good config was collected, then you can make
rancid & cvs do a lot of this work for you.
not that I expect you did it differently, but others might get the idea
from your "applying diffs" note that that arduous task would be manual.
Right. I don't want them to think that they have to do it the hard way
too. My situation though required me to go back over a year to get a
complete working config. After spending an hour trying to find a
version of the config from the weeks and months of diffs prior to the
failure that didn't have missing PVCs I finally said to hell with it. I
went back to shortly after I set up RANCID for that router and used that
config. I hadn't added any PVCs so the only thing I was going to lose
was UBR changes that I could recreate later. Recreating the PVCs at
least let me get all my users back online even if their speeds weren't
accurate. It was not a fun morning.

For the other 99% of the time I can simply pull the latest RANCID config
for rapid recovery of a failed stick of flash or CF card. Works like a
champ.

Justin
