407: Old School Outages
Description
Jim shares his Nagios tips and Wes chimes in with some modern tools as we chat monitoring in the wake of some high-profile outages. Plus we turn our eye to hardware and get excited about the latest Ryzen line from AMD.

Links:
Third parties confirm AMD’s outstanding Ryzen 3000 numbers | Ars Technica - AMD debuted its new Ryzen 3000 desktop CPU line a few weeks ago at E3, and it looked fantastic. For the first time in 20 years, it looked like AMD could go head to head with Intel's desktop CPU line-up across the board. The question: would independent, third-party testing back up AMD's assertions?
The Internet broke today: Facebook, Verizon, and more see major outages | Ars Technica - Last week, Verizon caused a major BGP misroute that took large chunks of the Internet, including CDN company Cloudflare, partially down for a day. This week, the rest of the Internet has apparently asked Verizon to hold its beer.
It was a really bad month for the internet | TechCrunch - In the past month there were several major internet outages affecting millions of users across the world. Sites buckled, services broke, images wouldn’t load, direct messages ground to a halt and calendars and email were unavailable for hours at a time.
Cloudflare outage caused by bad software deploy (updated) - For about 30 minutes today, visitors to Cloudflare sites received 502 errors caused by a massive spike in CPU utilization on our network. This CPU spike was caused by a bad software deploy that was rolled back.
How Verizon and a BGP Optimizer Knocked Large Parts of the Internet Offline Today - Today at 10:30 UTC, the Internet had a small heart attack. A small company in Northern Pennsylvania became a preferred path of many Internet routes through Verizon (AS701), a major Internet transit provider.
Getting started | Prometheus - This guide is a "Hello World"-style tutorial which shows how to install, configure, and use Prometheus in a simple example setup.
prometheus/node_exporter - Prometheus exporter for hardware and OS metrics exposed by *NIX kernels, written in Go with pluggable metric collectors.
Using netdata with Prometheus - Prometheus is a distributed monitoring system which offers a very simple setup along with a robust data model. Recently netdata added support for Prometheus.
prometheus/nagios_plugins - Nagios plugin for alerting on Prometheus query results.
RobustPerception/nrpe_exporter - The NRPE exporter exposes metrics on commands sent to a running NRPE daemon.
m-lab/prometheus-nagios-exporter - The Prometheus Nagios exporter reads status and performance data from Nagios plugins via the MK Livestatus Nagios plugin and publishes this in a form that can be scraped by Prometheus.
Comparison to alternatives | Prometheus - Prometheus is a full monitoring and trending system that includes built-in and active scraping, storing, querying, graphing, and alerting based on time series data.
Quality server monitoring solution using NetData/Prometheus/Grafana - I’m going to quickly show you how to install both netdata and Prometheus on the client and server. We can then use Grafana pointed at Prometheus to obtain long-term metrics netdata offers.
Monitoring stack by using Grafana + Prometheus + Netdata - With this monitoring stack you can monitor in real time with Netdata and see the history using Grafana.
Monitoring Agent · NCPA - New to NCPA? See some of the awesome features present in the Web GUI and API, available on any operating system.
Nagios 101: Understanding the Fundamentals - Nagios
Nagios Documentation