Every Dev Has Their Day

In this week's episode, we discuss a host of OurCompose developments, the ins and outs of Rundeck, and how monitoring relates to future plans for the OurCompose Suite.

Intro

News / Community Updates

OurCompose Developments

Instance Features

Split Up Services into Roles
- All compositional/tasks/<service>.yml tasks have been put into /<service>/tasks/present.yml
- New built-in playbooks (but cannot be included by project due to variable inheritance)
- Split into present/stopped/absent
- Removed all of the checking if the mariadb container was instantiated yet in the roles, since that timeout issue has been solved.
- Added the check to the bottom of the mariadb present.yml in the role, so it’ll get checked first as a dependency of anything that needs it.
- Playbooks now use local defaults for role variables rather than including the compositional role’s variable file, as this is gone now.
- Portal present waits for c_r.service to go into stopped or inactive before restarting its socket service
Commands Receivable Updates
- Choosing which project branch to use (We did this before, but only for determining whether to build the cache or not. Now we do it for that and for the Dockerfile which builds the container.)
- Handling for when c_r is run from the master branch (hardcode project to stable-3 for now)
- Set the ansible version to install in the Dockerfile based on that major version
- Clean up the docker_build directory
- Remove the containers as we run them. They were taking up space.
- Echoing localhost into an inventory file. This allows us to pick up the group_vars/ directory that is inside of the environment/ dir.
Akaunting Fixes
- Built-in Setup and Additional Vars
- Healthchecks
- Bind Mounts
Portal and CC
- Starting Timeouts
- Fix routing for running compositional role, accepting service parameter
- Init system

Service Resiliency

Split Project into stable-3/stable-4
CI/CD had to point to the new file path (in the portal role)
New no-impact (to other services) restart of services
Secure Remote Callback Support

Engagement

Containers, Zombie Processes, and Init Systems

Administrative

Integration Discussion - Rundeck - Activity, Nodes and Commands

Grab Bag - Monitoring

Without monitoring, you have no way to tell whether the service is even working; absent a thoughtfully designed monitoring infrastructure, you’re flying blind. Maybe everyone who tries to use the website gets an error, maybe not—but you want to be aware of problems before your users notice them.

Monitoring - collecting, processing, aggregating, and displaying real-time quantitative data about a system, such as query counts and types, error counts and types, processing times, and server lifetimes.

White-box monitoring - Monitoring based on metrics exposed by the internals of the system, including logs, interfaces like the Java Virtual Machine Profiling Interface, or an HTTP handler that emits internal statistics. (cause oriented, focused more on the innards)

Black-box monitoring - Testing externally visible behavior as a user would see it. (symptom oriented)
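
To make the two definitions concrete, here is a minimal sketch in Python. The `TinyService` class, its counters, and the `black_box_probe` helper are all hypothetical names invented for illustration; a real service would expose its internals over something like an HTTP statistics endpoint, and a real probe would make a network request.

```python
from dataclasses import dataclass


@dataclass
class TinyService:
    """Hypothetical in-process service, used only for illustration."""

    requests_total: int = 0
    errors_total: int = 0

    def handle(self, path: str) -> int:
        """Return an HTTP-like status code and update internal counters."""
        self.requests_total += 1
        if path != "/":
            self.errors_total += 1
            return 404
        return 200

    def stats(self) -> dict:
        """White-box view: counters the service exposes about its own internals."""
        return {
            "requests_total": self.requests_total,
            "errors_total": self.errors_total,
        }


def black_box_probe(service: TinyService) -> bool:
    """Black-box view: behave like a user and judge only the visible result."""
    return service.handle("/") == 200
```

The probe can only say whether the front page works right now (the symptom), while `stats()` shows how many requests have failed so far, which is a step closer to the cause.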

Benefits of Monitoring:

Analyzing long-term trends
Comparing over time or experiment groups
Alerting
Building Dashboards
Ad hoc retrospective analysis (debugging)

Monitoring should address two questions: what’s broken, and why?

The “what’s broken” indicates the symptom; the “why” indicates a (possibly intermediate) cause.

“What” versus “why” is one of the most important distinctions in writing good monitoring with maximum signal and minimum noise.

The Four Golden Signals of Monitoring:

Latency - the time it takes to service a request
Traffic - A measure of how much demand is being placed on a system, measured in a high-level system-specific metric
Errors - The rate of requests that fail, either explicitly, implicitly, or by policy
Saturation - how “full” your service is.

If you measure all four golden signals and page a human when one signal is problematic (or, in the case of saturation, nearly problematic), your service will be at least decently covered by monitoring.
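
As a rough sketch of what measuring the four signals could look like, the Python below derives them from a window of request records. The `Request` record, the `in_flight`/`capacity` inputs used for saturation, and the paging thresholds are assumptions made up for this example; what counts as traffic or saturation is system-specific.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical record of one served request."""

    duration_ms: float
    ok: bool


def golden_signals(requests, window_s, in_flight, capacity):
    """Summarize one observation window into the four golden signals."""
    count = len(requests)
    durations = sorted(r.duration_ms for r in requests)
    # Nearest-rank 99th percentile, using integer math for the rank.
    rank = (99 * count + 99) // 100
    p99 = durations[rank - 1] if count else 0.0
    errors = sum(1 for r in requests if not r.ok)
    return {
        "latency_p99_ms": p99,                           # Latency
        "traffic_rps": count / window_s,                 # Traffic
        "error_rate": errors / count if count else 0.0,  # Errors
        "saturation": in_flight / capacity,              # Saturation
    }


def signals_to_page_on(signals, thresholds):
    """Return the names of signals that have reached their threshold."""
    return [name for name, limit in thresholds.items() if signals[name] >= limit]


# Illustrative thresholds only; saturation pages early, before it is "full".
THRESHOLDS = {"latency_p99_ms": 1000, "error_rate": 0.05, "saturation": 0.8}
```

With 100 requests in a 10 second window, one failure, and 8 of 10 worker slots busy, only `saturation` would trigger a page under these example thresholds.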

Worrying about your tail:

It’s tempting to design a system based upon the mean of some quantity: the mean latency, the mean CPU usage of your nodes, or the mean fullness of your databases, but this presents a problem. If you run a web service with an average latency of 100 ms at 1,000 requests per second, 1% of requests might easily take 5 seconds. If your users depend on several such web services to render their page, the 99th percentile of one backend can easily become the median response of your frontend.
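
The arithmetic behind this can be sketched in a few lines of Python. The latency numbers below are invented to match the example (a mean near 100 ms with 1% of requests at 5 seconds), and the fan-out calculation assumes each backend call is slow independently with the same probability, which real systems will only approximate.

```python
from statistics import mean

# 99 fast requests and 1 slow one: the mean hides the 5 second outlier.
latencies_ms = [50.0] * 99 + [5000.0]
average_ms = mean(latencies_ms)  # about 100 ms
worst_ms = max(latencies_ms)     # 5000 ms


def chance_of_slow_page(backend_calls: int, slow_fraction: float = 0.01) -> float:
    """Probability that at least one of the backend calls lands in the slow tail.

    Assumes the calls are independent and equally likely to be slow.
    """
    return 1.0 - (1.0 - slow_fraction) ** backend_calls
```

Under these assumptions, a page that makes one backend call is slow 1% of the time, but a page that makes 69 calls is slow more than half the time, so the backend's 99th percentile has become the frontend's median.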

Piling requirements on top of each other can add up to a very complex system. The sources of potential complexity are never-ending. Like all software systems, monitoring can become so complex that it’s fragile, complicated to change, and a maintenance burden.

The tools we use to get it done

At Compositional Enterprises, we value our time as much as you do. That's why we only use the best Free, Libre, and Open Source tools to produce our quality content and products.

Take action and start using the same secure and convenient tools that we use by signing up for your OurCompose instance today! Invest in your community by donating directly to the podcast! Every bit (and byte) goes back into growing and spreading the show. Otherwise, to stay updated with the show and all future developments, find us on reddit or sign up for our mailing list using the form below!

Episode #38

Monday, December 06, 2021