Reality is harsh… issues and unexpected behaviors happen. Even if we know it, most times we end up telling ourselves “I should have been more prepared!”. Often, we start troubleshooting and realize we are missing some data about the network status prior to the issue. Similarly, we open a case and soon see that we do not have all the needed information.

Being 100% prepared for any issue is impossible. A routing anomaly and a hardware one require different analysis. Anyhow, there are a few things we can always do in order to be a bit more prepared when bad things happen.

First, we have to be sure we are ready to collect information from our devices.

Devices continuously generate syslog messages containing information about everything going on inside them. We can configure Junos to send those messages to an external server. There are tons of log receiver products and solutions. For example, we might build a FIG stack where Fluentd receives logs, InfluxDB stores them and Grafana lets the end user visualize them. Using such a stack allows the network operator to catch any event taking place on the device and to have a user-friendly GUI to browse through them.

Fluentd is also able to receive Junos telemetry data, both Protobuf over UDP and OpenConfig key-value pairs over gRPC. This means that the same stack might potentially be used for both syslog messages and telemetry data.

Another important source of information is SNMP, which means polling (a system asking devices for data) and traps (devices sending data about an event that has just taken place). Nowadays, SNMP is not the best solution: polling is control plane intensive (the routing engine is involved), so the SNMP system must be designed to avoid excessive polling. Compared to other solutions like Junos Telemetry, which allows data to be sent directly by the PFE, SNMP is not ideal performance wise. However, it is still a useful tool, especially if we look at traps.

For this reason, having an “SNMP station” is highly recommended. InfluxDB, through its Telegraf agent, supports SNMP polling and traps. This would allow us to use the same InfluxDB solution for both syslog/telemetry and SNMP.

Last, we always have a good old jump host server from which we connect to the devices and do anything on them. With these basic elements we should be sure to have all the historical data about the events in our network.

Is this enough? Mmm…no 🙂 We can add more, and we can do it by leveraging the newest and hottest networking trends: automation, devops, …

We add a new element: a server acting as automation host. This server might run automation scripts. One possible choice is to leverage Ansible and Python. This choice makes sense as Ansible has its own Juniper modules which, internally, rely on the Juniper Python library PyEZ.

With large networks, it is important to have an updated view of the network. This means always having a clear picture of the installed base: release, hardware, interfaces, power, … Having this information always available can help in case of an issue, as we can provide it very quickly to the case representative along with the historic changes of a given device.

This task can be easily achieved with a Python script using the PyEZ library. The idea is to periodically run a script that gathers device facts (hostname, version, …). The core of this script will be represented by running RPCs:

hw=_chassis_information()

Then, we can process the RPC reply and produce the desired output (CSV, prettytable, JSON, …); it is just a matter of programming.

Optionally, we might leverage PyEZ tables & views to get only the data we need, in an organized way (Python dictionary):

IfaceTerse:
item: physical-interface/logical-interface
family: address-family/address-family-name
add: address-family/interface-address/ifa-local

For example, the table above collects some specific interface information (status, configured family, aggregated bundle if any, IP address).

Additionally, we might compare the current snapshot with the previous one in order to highlight any change (e.g. new card, different release, interfaces went down).
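The snapshot comparison mentioned in this post can be sketched in plain Python. This is only a minimal sketch: it assumes each snapshot has already been collected and stored as a (possibly nested) dictionary, and the helper names (flatten, diff_snapshots) as well as the sample values are made up for illustration; they are not part of PyEZ.

```python
from typing import Any, Dict


def flatten(snapshot: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Turn a nested snapshot into a flat {"path/to/key": value} dictionary."""
    flat: Dict[str, Any] = {}
    for key, value in snapshot.items():
        path = f"{prefix}/{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def diff_snapshots(previous: Dict[str, Any], current: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Report what was added, removed or changed between two snapshots."""
    old, new = flatten(previous), flatten(current)
    return {
        "added": {k: new[k] for k in new.keys() - old.keys()},
        "removed": {k: old[k] for k in old.keys() - new.keys()},
        "changed": {
            k: (old[k], new[k])
            for k in old.keys() & new.keys()
            if old[k] != new[k]
        },
    }


# Hypothetical snapshots of the same device, taken by two consecutive runs.
previous = {
    "version": "18.4R1",
    "interfaces": {"ge-0/0/0": "up", "ge-0/0/1": "up"},
}
current = {
    "version": "19.1R1",
    "interfaces": {"ge-0/0/0": "up", "ge-0/0/1": "down"},
    "hardware": {"FPC 1": "MPC7E"},
}

changes = diff_snapshots(previous, current)
for kind, entries in changes.items():
    for path, value in sorted(entries.items()):
        print(f"{kind}: {path} -> {value}")
```

Flattening the snapshots first keeps the comparison simple: a new card, a different release or an interface going down all show up as an added or changed path, whatever the nesting of the collected data looks like.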