Large Multi-Location Federal Agency
A major branch of the U.S. government automates audit compliance and code upgrades for 800 locations.
The communications team at a large federal agency has its work cut out for it. With a multi-building campus in Washington, D.C., and 800 locations nationwide, the agency faces constant network change, and a 10-person group manages more than 2,200 network devices.
In the past, as network complexity increased, so did the number of problems. Configuration errors disrupted traffic flow and application availability, hurting employee productivity. Over the years, the agency tried various network-management solutions but still found itself in reactive mode much of the time. Complicating matters further, as a government body the agency also had to be prepared for internal audits by its inspector general (IG).
“The agency has good processes and a competent staff, but when we’re talking about that many devices and sites, there are plenty of opportunities for things to go awry,” says Marty Adkins, senior consultant, Chesapeake NetCraftsmen, a mid-Atlantic-area network consulting firm that has been helping the agency cope with these problems. “They needed a way to audit continuously—not just when someone could catch their breath and search.”
Working with Chesapeake NetCraftsmen, the agency was an early adopter of Infoblox (then Netcordia) NetMRI, which automates network management by correlating network-configuration issues with their impact on network health and identifying problems early. NetMRI helps organizations take control of configurations and changes, making hard-to-find problems easier to identify.
“I have not seen any other tools that analyze a live network, that actually monitor the live network 24×7,” Adkins says. “NetMRI finds things that are not yet service-impacting and alerts you to them. This solution allows us to continually monitor the impact of network changes on correctness, compliance, and availability.”
Without any customization, NetMRI alerts the agency to issues that may impact network health and provides remediation options. The solution takes automation a step beyond just checking configurations; it actually logs into devices and issues scripted commands as a staff member would. “It’s quite extensible with the scripting capability,” Adkins adds.
With NetMRI, the team groups devices into logical categories by geography, function, and sphere of control. Because a device can belong to more than one group, staff can quickly isolate a specific set of devices. The solution's discovery engine quickly assesses devices and the relationships between them.
NetMRI constantly monitors for change, immediately alerting operators to what was changed, where, and by whom. It then assesses the resulting configurations to proactively identify inconsistent or incorrect settings and supports remediation so issues are resolved quickly. When IT pushes changes to hundreds of offices nationwide, it can do so quickly and verify that the changes are correct.
Acting as a constant monitor, NetMRI found specific issues within the first few hours of its deployment at the agency, including:
Configuration errors before going live
Redundant power-supply disconnects
Redundant link outages
Unstable or marginal WAN links and VPN connections
Spanning tree instability
Device crashes in remote offices
NetMRI not only finds issues but also gives the communications staff post-mortem analysis to help them understand why an event occurred.
NetMRI also simplifies annual IG compliance audits for the communications team. In a recent audit, the IG team asked for the past year's configuration changes on a specific device. NetMRI went further, providing that device's full configuration-change history all the way back to its initial installation, which impressed the IG team.
With NetMRI, the agency operates at a level that would otherwise require more staff. For code upgrades, NetMRI helped the communications team quickly work through what-if scenarios, check available space for new images, and pinpoint which files to delete, giving them confidence that everything was in place before reloading devices. Afterward, the logs showed any problems and any devices that still needed a reload.
“We used NetMRI to do code upgrades on Cisco devices in a rather bulletproof fashion, all at an extremely high rate,” Adkins says. “In just a few hours, we can do hundreds of devices and have complete details and logs.” Ultimately, that means fiscal efficiency for the government agency, with fewer outages and greater productivity for employees nationwide.
“NetMRI allows the agency to be much more productive with the same staff,” Adkins says. “It ensures that everything stays up with best practices without needing a person to do that.”