[rancid] router.db and up/down notifications

Discussion:

Sean

2016-09-13 23:54:30 UTC

I'm curious about the default behavior/relationship between router.db and routers.down.

I have rancid integrated with Observium, which auto-generates router.db once an hour based on the status of each device. If a router is down in Observium, it will modify router.db and mark it 'down'. rancid-run is configured to run five minutes later.

In this case, would rancid send an "unreachable" e-mail the next time it's run? When they switched from "up" to "down" in router.db, I didn't receive an "unreachable" e-mail. When they were fixed and returned to "up," I received a "routers changed to up" email. This is the first time such a transition has taken place, i.e. first time any devices have been unreachable for any period of time since rancid was first run.

What sort of behavior should I expect in this case? Might have just been timing, but I'd like to be sure. Searches have come up with little to go on other than formatting and the like.

I have another rancid installation that sends both up and down notifications, but it's a stand-alone install without any automated router.db generation. I don't see any major differences in configuration, cron jobs, etc. other than who/what generates router.db.

P.S. Right list this time.

Alan McKinnon

2016-09-14 09:37:44 UTC


Post by Sean
I'm curious about the default behavior/relationship between router.db and routers.down.
I have rancid integrated with Observium, which auto-generates router.db
once an hour based on the status of each device. If a router is down in
Observium, it will modify router.db and mark it 'down'. rancid-run is
configured to run five minutes later.

router.db is in cvs IIRC. Does your data push from observium do a commit?

Post by Sean
In this case, would rancid send an “unreachable” e-mail the next time
it’s run? When they switched from "up" to "down" in router.db, I didn't
receive an "unreachable" e-mail. When they were fixed and returned to
"up," I received a "routers changed to up" email. This is the first time
such a transition has taken place, i.e. first time any devices have been
unreachable for any period of time since rancid was first run.
What sort of behavior should I expect in this case? Might have just been
timing, but I’d like to be sure. Searches have come up with little to go
on other than formatting and the like.
I have another rancid installation that sends both up and down
notifications, but it's a stand-alone install without any automated
router.db generation. I don't see any major differences in
configuration, cron jobs, etc. other than who/what generates router.db.
P.S. Right list this time.
_______________________________________________
Rancid-discuss mailing list
http://www.shrubbery.net/mailman/listinfo/rancid-discuss

--
Alan McKinnon
***@gmail.com

Sean

2016-09-14 13:20:07 UTC


It’s actually started to send down notifications again. The replaced device generated an SSH key mismatch, which caused it to become unreachable from rancid’s perspective. Had to clear known_hosts.

Must have been a timing issue originally.



heasley

2016-09-14 23:37:55 UTC


Post by Sean
I'm curious about the default behavior/relationship between router.db and routers.down.

routers.down is solely for keeping track of what was 'down' in router.db.

Post by Sean
I have rancid integrated with Observium, which auto-generates router.db once an hour based on the status of each device. If a router is down in Observium, it will modify router.db and mark it 'down'. rancid-run is configured to run five minutes later.
In this case, would rancid send an “unreachable” e-mail the next time it’s run? When they switched from "up" to "down" in router.db, I didn't receive an "unreachable" e-mail. When they were fixed and returned to "up," I received a "routers changed to up" email. This is the first time such a transition has taken place, i.e. first time any devices have been unreachable for any period of time since rancid was first run.

Yes, that's right. If it's 'down', rancid will not attempt to contact it and
there will be no 'not reached' email.
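
To make the states concrete, here is a minimal Python sketch (not RANCID's own code) of how the up/down lists and the transitions between two versions of router.db could be derived. It assumes the RANCID 3.x field layout `name;type;state` (older releases separate fields with `:`); the function names and the sample hosts are made up for illustration.

```python
def parse_router_db(text, sep=";"):
    """Return {device_name: state} from router.db-style text.

    Assumes each line is name<sep>type<sep>state; blank lines and
    '#' comments are skipped. The state is lower-cased.
    """
    states = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(sep)
        if len(fields) < 3:
            continue  # malformed line; ignored in this sketch
        states[fields[0]] = fields[2].lower()
    return states


def transitions(old, new):
    """Compare two parsed snapshots and list the devices that changed state."""
    to_up = sorted(n for n, s in new.items() if s == "up" and old.get(n) == "down")
    to_down = sorted(n for n, s in new.items() if s == "down" and old.get(n) == "up")
    return to_up, to_down


def devices_to_poll(states):
    """Only devices marked 'up' are contacted, so only these could ever
    appear in a 'not reached' report."""
    return sorted(n for n, s in states.items() if s == "up")


before = parse_router_db("core1.example.net;cisco;up\nedge1.example.net;juniper;up\n")
after = parse_router_db("core1.example.net;cisco;up\nedge1.example.net;juniper;down\n")

changed_up, changed_down = transitions(before, after)
print("changed to down:", changed_down)
print("will be polled:", devices_to_poll(after))
```

In this model, a device that an external tool has already flipped to 'down' drops out of the poll list, so its outage can only surface as a state change, never as a failed collection.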

I wouldn't expect a device to be allowed to remain down for too long. Maybe
don't change it to 'down', and instead adjust OLDTIME and MAX_ROUNDS in
rancid.conf so that rancid doesn't complain too much about 'not reachable',
doesn't send an up/down email, and doesn't waste too much time trying to
collect from devices that are down.
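
As a rough illustration of how the two settings interact, the sketch below models them in Python. It is not RANCID's implementation. It only assumes that MAX_ROUNDS bounds how many collection attempts a failing device gets within one run (treated here as the total number of attempts, which is a simplification) and that OLDTIME is the number of hours a device may go without a successful collection before it is reported. `collect` is a hypothetical callback standing in for a real collection attempt, and the numeric values are illustrative, not recommended settings.

```python
from datetime import datetime, timedelta


def collect_with_retries(device, collect, max_rounds=4):
    """Try a device up to max_rounds times in one run; return True on success."""
    for _ in range(max(1, max_rounds)):
        if collect(device):
            return True
    return False


def should_complain(last_success, now, oldtime_hours=24):
    """Report a device only once its last good collection is older than OLDTIME."""
    return now - last_success > timedelta(hours=oldtime_hours)


now = datetime(2016, 9, 14, 12, 0)
last_good = datetime(2016, 9, 13, 23, 0)  # 13 hours before 'now'


def always_fails(device):
    """Stand-in for a device that cannot be reached."""
    return False


reached = collect_with_retries("edge1.example.net", always_fails, max_rounds=2)

print(reached)                                            # False
print(should_complain(last_good, now, oldtime_hours=24))  # False: still within OLDTIME
print(should_complain(last_good, now, oldtime_hours=4))   # True: overdue
```

Under these assumptions, a larger OLDTIME keeps a short outage from generating mail, and a smaller MAX_ROUNDS limits the time spent retrying devices that are down.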
