Every Dev Has Their Day
In this week's episode, we discuss a host of OurCompose developments, the ins and outs of Rundeck, and monitoring as it relates to future plans for the OurCompose Suite.
Intro
News / Community Updates
OurCompose Developments
Instance Features
- Split Up Services into Roles
- All compositional/tasks/<service>.yml tasks have been put into /<service>/tasks/present.yml
- New built-in playbooks (but they cannot be included by the project due to variable inheritance)
- Split into present/stopped/absent
- Removed all of the checks in the roles for whether the mariadb container had been instantiated yet, since that timeout issue has been solved.
- Added the check to the bottom of the mariadb present.yml in the role, so it’ll get checked first as a dependency of anything that needs it.
- Playbooks now use local defaults for role variables rather than including the compositional role’s variable file, as this is gone now.
- Portal present waits for c_r.service to go into stopped or inactive before restarting its socket service
- Commands Receivable Updates
- Choosing which project branch to use (We did this before, but only for determining whether to build the cache or not. Now we do it for that and for the Dockerfile which builds the container.)
- Handling for when c_r gets run from the master branch (hardcode project to stable-3 for now)
- Set the ansible version to install in the Dockerfile based on that major version
- Clean up the docker_build directory
- Remove the containers as we run them. They were taking up space.
- Echoing localhost into an inventory file. This allows us to pick up the group_vars/ directory that is inside of the environment/ dir.
- Akaunting Fixes
- Built-in Setup and Additional Vars
- Healthchecks
- Bind Mounts
- Portal and CC
- Starting Timeouts
- Fix routing for running compositional role, accepting service parameter
- Init system
Service Resiliency
- Split Project into stable-3/stable-4
- CI/CD had to point to new path to file (in portal role)
- New no-impact (to other services) restart of services
- Secure Remote Callback Support
Engagement
Administrative
Integration Discussion - Rundeck - Activity, Nodes and Commands
Grab Bag - Monitoring
Without monitoring, you have no way to tell whether the service is even working; absent a thoughtfully designed monitoring infrastructure, you’re flying blind. Maybe everyone who tries to use the website gets an error, maybe not—but you want to be aware of problems before your users notice them.
Monitoring - collecting, processing, aggregating, and displaying real-time quantitative data about a system, such as query counts and types, error counts and types, processing times, and server lifetimes.
White-box monitoring - Monitoring based on metrics exposed by the internals of the system, including logs, interfaces like the Java Virtual Machine Profiling Interface, or an HTTP handler that emits internal statistics. (cause oriented, focused more on the innards)
Black-box monitoring - Testing externally visible behavior as a user would see it. (symptom oriented)
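To make the two definitions concrete, here is a minimal sketch using only the Python standard library. The tiny service, its /stats path, and the counter names are all hypothetical and made up for illustration; none of this is OurCompose code. The service exposes its internal counters over HTTP (white-box), while the probe only checks whether the page loads the way a user would see it (black-box).

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Internal counters the (hypothetical) service keeps about itself.
STATS = {"requests": 0}

# Opener that ignores any proxy settings so local URLs are fetched directly.
OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/stats":
            # White-box: an HTTP handler that emits internal statistics.
            body = json.dumps(STATS).encode()
        else:
            STATS["requests"] += 1
            body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def start_server():
    """Start the toy service on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def black_box_probe(url, timeout=2.0):
    """Symptom check: does the page load with HTTP 200, as a user would see it?"""
    try:
        with OPENER.open(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # connection errors and HTTP error statuses both land here
        return False


def white_box_scrape(url, timeout=2.0):
    """Cause-oriented check: read the counters the service exposes about itself."""
    with OPENER.open(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    server = start_server()
    base = "http://127.0.0.1:%d" % server.server_address[1]
    print("page loads:", black_box_probe(base + "/"))
    print("internal stats:", white_box_scrape(base + "/stats"))
    server.shutdown()
    server.server_close()
```

The probe can only say that something is wrong; the scraped counters are what help explain why.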
Benefits of Monitoring:
- Analyzing long-term trends
- Comparing over time or experiment groups
- Alerting
- Building Dashboards
- Ad hoc retrospective analysis (debugging)
Monitoring should address two questions: what’s broken, and why?
The “what’s broken” indicates the symptom; the “why” indicates a (possibly intermediate) cause.
“What” versus “why” is one of the most important distinctions in writing good monitoring with maximum signal and minimum noise.
The Four Golden Signals of Monitoring:
- Latency - The time it takes to service a request
- Traffic - A measure of how much demand is being placed on a system, measured in a high-level system-specific metric
- Errors - The rate of requests that fail, either explicitly, implicitly, or by policy
- Saturation - How “full” your service is
If you measure all four golden signals and page a human when one signal is problematic (or, in the case of saturation, nearly problematic), your service will be at least decently covered by monitoring.
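As a rough sketch of what that could look like, the snippet below summarizes one window of request records into the four signals and lists which ones would page someone. The record fields, function names, and every threshold in LIMITS are assumptions chosen for illustration, not recommended values. Note that the saturation limit sits below 1.0 so that it fires when the service is nearly full rather than already full.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    latency_ms: float  # time taken to service the request
    ok: bool           # False if the request failed


def golden_signals(requests: List[Request], window_s: float,
                   in_use: int, capacity: int) -> Dict[str, float]:
    """Summarize one time window of requests into the four golden signals."""
    count = len(requests)
    latencies = sorted(r.latency_ms for r in requests)
    failures = sum(1 for r in requests if not r.ok)
    return {
        "latency_ms": latencies[count // 2] if count else 0.0,  # upper median
        "traffic_rps": count / window_s,
        "error_rate": failures / count if count else 0.0,
        "saturation": in_use / capacity,  # e.g. connections in use / pool size
    }


# Hypothetical paging thresholds, for illustration only.
LIMITS = {
    "latency_ms": 500.0,
    "traffic_rps": 1000.0,
    "error_rate": 0.01,
    "saturation": 0.8,  # page while "nearly" full, before hitting 1.0
}


def signals_to_page_on(signals: Dict[str, float],
                       limits: Dict[str, float]) -> List[str]:
    """Return the names of the signals that exceed their limit."""
    return [name for name, limit in limits.items() if signals[name] > limit]


if __name__ == "__main__":
    window = [Request(100.0, True)] * 98 + [Request(900.0, False)] * 2
    signals = golden_signals(window, window_s=10.0, in_use=85, capacity=100)
    print(signals)
    print("page on:", signals_to_page_on(signals, LIMITS))
```

A real setup would more likely watch a latency percentile than the median, for the reasons covered under "Worrying about your tail".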
Worrying about your tail:
- It’s tempting to design a system based upon the mean of some quantity: the mean latency, the mean CPU usage of your nodes, or the mean fullness of your databases, but this presents a problem. If you run a web service with an average latency of 100 ms at 1,000 requests per second, 1% of requests might easily take 5 seconds. If your users depend on several such web services to render their page, the 99th percentile of one backend can easily become the median response of your frontend.
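The arithmetic behind that point can be sketched in a few lines of Python. The sample below is synthetic (989 fast requests and 11 slow ones, so roughly 1% slow), and the second function assumes backend calls are independent, which real systems only approximate. The helper names are made up for this example.

```python
import statistics


def percentile(values, pct):
    """Nearest-rank percentile for a whole-number pct between 1 and 100."""
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100  # ceiling of pct% of the sample size
    return ordered[rank - 1]


def chance_of_slow_page(slow_fraction, backend_calls):
    """Probability that at least one of several independent backend calls is slow."""
    return 1 - (1 - slow_fraction) ** backend_calls


if __name__ == "__main__":
    # Synthetic latencies in milliseconds: most requests are fast, about 1% take 5 s.
    latencies = [45] * 989 + [5000] * 11

    print("mean:", statistics.mean(latencies))   # close to 100 ms
    print("median:", percentile(latencies, 50))  # what a typical request sees
    print("p99:", percentile(latencies, 99))     # the tail the mean hides

    # The more backend calls a page needs, the more often it hits the tail.
    for calls in (1, 10, 69):
        print(calls, "calls ->", round(chance_of_slow_page(0.01, calls), 3))
```

In this sample the mean looks healthy at roughly 100 ms while the 99th percentile is 5 seconds. Under the independence assumption, a page that makes 10 such backend calls hits a slow one a little under 10% of the time, and at around 69 calls more than half of all page loads do.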
Piling requirements on top of each other can add up to a very complex system. The sources of potential complexity are never-ending. Like all software systems, monitoring can become so complex that it’s fragile, complicated to change, and a maintenance burden.
The tools we use to get it done
At Compositional Enterprises, we value our time as much as you do. That's why we only use the best Free, Libre, and Open Source tools to produce our quality content and products.
Take action and start using the same secure and convenient tools that we use by signing up for your OurCompose instance today! Invest in your community by donating directly to the podcast! Every bit (and byte) goes back into growing and spreading the show. Otherwise, to stay updated with the show and all future developments, find us on reddit or sign up for our mailing list!