AAA-DBA.COM

Why Consistency Is the Secret to Automation Success

When I spoke at the Washington/Oregon SQL Saturday this year about moving from a reactive DBA to a proactive DBRE, one of the biggest topics was automation. But you cannot talk about automation without talking about consistency. In every large environment I’ve worked in, the two are inseparable.

One of the biggest obstacles to automating infrastructure is inconsistency. If your environments are snowflakes, your automation will fail. I didn’t learn that from a book. I learned it the hard way.

In a previous role, a very inspirational leader stepped into the CTO role and called an all-hands meeting. He announced a massive initiative. We were going to automate our entire environment.

He said, “We do hard things every day. This is just another thing to prove that we can do this too.”

He called it the Moonshot.

The goal was simple to say and hard to execute: spin up an entire environment with a single button click. At first, most of us thought it was impossible. When we looked at what we were dealing with, it made sense why.

We had 17 development environments, one stage environment, one production environment, and one main environment that existed for historical reasons no one could explain. Consistency didn’t exist. Every team had its own setup, slightly different configurations, and its own assumptions. That inconsistency was the reason that, in nearly every release, something would break.

Through determination and a lot of uncomfortable work, we made it happen. By the end of the project, we had consolidated everything down to one dev, one stage, and one prod environment.

The impact was immediate. Releases that used to take days dropped to hours, and eventually minutes. Far fewer releases broke because environment drift was gone.

It even prepared us for disasters we never planned for. When servers were accidentally deleted in production, our spin-up automation handled recovery cleanly and predictably. That experience changed my life and the way I look at any environment.

Consistency is not just an individual task. It is a culture.

From a DBRE perspective, consistency is what turns databases into systems instead of pets. Reliability engineering is about reducing unknowns, reducing blast radius, and making failure predictable. None of that is possible when every environment behaves differently.

Automation doesn’t create reliability on its own. Consistency does. Automation just enforces it at scale.

What Consistency Actually Looks Like 

Some people may come after me with pitchforks for being this rigid, and that’s fine. I’ve paid for these lessons in outages and failed releases. If you want to eliminate toil, you have to standardize the basics. All these things can be automated once consistency is reached.

In DBRE terms, these standards are guardrails. They limit inconsistency, making incidents easier to diagnose, recoveries faster, and automation deterministic rather than conditional.

Standardize Drive Letters

It sounds trivial, but it matters for scripting. It doesn’t matter which letters you choose, whether that is C for system, D for data, or L for logs. What matters is that they are the same everywhere. Scripts shouldn’t have to guess.
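To make the point concrete, here is a minimal sketch of what a script gains from a fixed layout. The drive letters come from the example above; the `SQL` folder, the file suffixes, and the `file_path` helper are hypothetical names used only for illustration.

```python
from pathlib import PureWindowsPath

# One layout, defined once. D and L follow the example in the text;
# the "SQL" folder and file suffixes are assumptions for illustration.
DRIVE_LAYOUT = {"data": "D:", "log": "L:"}
SUFFIXES = {"data": ".mdf", "log": "_log.ldf"}

def file_path(kind, database, layout=DRIVE_LAYOUT):
    """Build the expected file path for a database without probing the server."""
    return str(PureWindowsPath(layout[kind] + "\\", "SQL", database + SUFFIXES[kind]))

print(file_path("data", "Sales"))
print(file_path("log", "Sales"))
```

Because the layout is identical everywhere, the same function works against any server with no per-server branches.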

Job Ownership

No job or automated process should ever be owned by a specific person. I’ve seen critical jobs fail simply because the creator left the company, and their account was disabled. Always use service accounts.
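A check like this is easy to automate. The sketch below assumes the job names and owner logins have already been collected into pairs (for example, from `msdb.dbo.sysjobs`). The approved account names and the sample jobs are made up.

```python
# Hypothetical allow-list of service accounts; replace with your own standard.
APPROVED_OWNERS = {"sa", "corp\\svc_sqlagent"}

def find_personal_job_owners(jobs, approved=APPROVED_OWNERS):
    """Return names of jobs whose owner is not an approved service account."""
    approved_lower = {owner.lower() for owner in approved}
    return [name for name, owner in jobs if owner.lower() not in approved_lower]

# Illustrative data only: (job_name, owner_login) pairs.
jobs = [
    ("NightlyBackup", "CORP\\svc_sqlagent"),
    ("IndexMaintenance", "CORP\\jsmith"),
]
print(find_personal_job_owners(jobs))  # ['IndexMaintenance']
```

Run on a schedule, a report like this catches personally owned jobs before the owner's account is disabled.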

Server Configurations

If Dev has “Optimize for Ad Hoc Workloads” enabled and Prod does not, you are in for a treat. Settings like MAXDOP and Max Server Memory need a standard. A one-off is fine, but it should be the exception, not the rule.
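One way to catch this kind of drift is to compare each server against a single baseline. The sketch below assumes the settings have already been read into dictionaries keyed by their `sp_configure` names. The values are placeholders, not recommendations.

```python
def config_drift(baseline, actual):
    """Return {setting: (expected, found)} for every setting that differs."""
    drift = {}
    for setting, expected in baseline.items():
        found = actual.get(setting)  # None means the setting was not collected
        if found != expected:
            drift[setting] = (expected, found)
    return drift

# Hypothetical standard; the numbers are placeholders only.
baseline = {
    "optimize for ad hoc workloads": 1,
    "max degree of parallelism": 8,
    "max server memory (MB)": 57344,
}
# A made-up Prod server that differs in one setting.
prod = dict(baseline, **{"optimize for ad hoc workloads": 0})

print(config_drift(baseline, prod))  # {'optimize for ad hoc workloads': (1, 0)}
```

An approved one-off can be handled by giving that server its own documented baseline rather than by ignoring the report.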

Service Accounts and Permissions

If I had a penny for every “Access Denied” error I’ve seen, I’d be rich. If developers have elevated rights in Dev but not in Prod, their code will break. The great thing about sending changes through a pipeline is that developers can take ownership of those errors. Certain things should never have admin rights, and consistency helps enforce that.

Autogrowth

Never rely on defaults. I’ve seen production databases created with 1MB autogrowth settings that later caused massive issues. Automate file sizes and growth to eliminate that risk.
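As a rough illustration, the check below flags files whose growth increment falls under a chosen floor. It assumes growth settings have already been collected and converted to megabytes. The 64 MB floor and the sample files are hypothetical.

```python
MIN_GROWTH_MB = 64  # hypothetical floor; choose a value that fits your standard

def risky_growth_settings(files, min_growth_mb=MIN_GROWTH_MB):
    """Return names of files whose autogrowth increment is below the floor."""
    return [f["name"] for f in files if f["growth_mb"] < min_growth_mb]

# Illustrative data only.
files = [
    {"name": "Sales_data", "growth_mb": 256},
    {"name": "Sales_log", "growth_mb": 1},
]
print(risky_growth_settings(files))  # ['Sales_log']
```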

Schema and Stored Procedures

All changes must go through a pipeline. This makes failures visible and reduces drift. The only acceptable exception is stopping a production fire, and even then, the change must be checked in immediately, even if the official release date is later. What matters is traceability.

Explicit Naming

Never let SQL Server name your constraints. Randomly generated system names make automation and troubleshooting painful. Always explicitly name primary keys, foreign keys, and default constraints.
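A pipeline can enforce this with a simple name check. The sketch below assumes that system-generated names follow the familiar shape of a type prefix, double underscores, and a hexadecimal suffix. That pattern is a heuristic, not a guarantee. The catalog views (for example, `sys.key_constraints` and `sys.default_constraints`) also expose an `is_system_named` flag, which is the more reliable source when you can query the server.

```python
import re

# Assumed shape of system-generated names, e.g. DF__Orders__Status__4BAC3F29:
# a type prefix, double underscores, and a hex suffix. Heuristic only.
SYSTEM_NAME = re.compile(r"^(PK|FK|DF|UQ|CK)__.+__[0-9A-F]{8,16}$")

def system_named(constraints):
    """Return the constraint names that look system-generated."""
    return [name for name in constraints if SYSTEM_NAME.match(name)]

# Illustrative names only.
names = ["PK_Orders", "DF__Orders__Status__4BAC3F29", "FK_Orders_Customers"]
print(system_named(names))  # ['DF__Orders__Status__4BAC3F29']
```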

Audit Columns

In healthcare or finance systems, most tables need an audit footprint. Standardize columns like CreatedDate and ModifiedDate with default constraints so developers don’t have to remember to add them.
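The same idea applies here: a small check can report tables that are missing the standard columns. The sketch assumes table definitions are available as lists of column names. The table names and columns are invented for illustration.

```python
REQUIRED_AUDIT_COLUMNS = {"CreatedDate", "ModifiedDate"}

def missing_audit_columns(tables, required=REQUIRED_AUDIT_COLUMNS):
    """Map each table to the audit columns it lacks; complete tables are omitted."""
    report = {}
    for table, columns in tables.items():
        missing = required - set(columns)
        if missing:
            report[table] = sorted(missing)
    return report

# Illustrative schema only.
tables = {
    "Patient": ["PatientId", "Name", "CreatedDate", "ModifiedDate"],
    "Claim": ["ClaimId", "Amount", "CreatedDate"],
}
print(missing_audit_columns(tables))  # {'Claim': ['ModifiedDate']}
```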

Automated Patching

Manually patching hundreds of servers does not scale. I’ve worked in environments with thousands of servers, none of them cloud-managed, that were patched every month because patching was automated.

Why This Matters

We don’t automate just to be tidy. We automate to survive. Consistent automation:

  • Reduces human error because scripts don’t get tired or skip steps.
  • Eliminates click fatigue and repetitive manual work.
  • Enables ticket automation, allowing routine tasks to be handled by less expensive people.
  • Prevents schema drift, ensuring code that works in Dev also works in Prod.
  • Speeds up recovery by turning incidents into execution rather than investigation.
  • Enables scale, because snowflake servers don’t scale well.
  • Reduces cognitive load, so engineers can focus on real problems rather than tribal knowledge.
  • Improves security by reducing unknown settings and surprises.
  • Shortens onboarding, letting new engineers contribute faster.

From a reliability engineering standpoint, consistency reduces mean time to recovery more than almost any tool you can buy.

The Path to Scalability

We have to do better. Scalability is the next hard problem in our environments. If we cannot be consistent, we cannot eliminate toil.

You cannot automate chaos.

You have to tame it first.

Database Reliability is not just about preventing failure. It’s about making failure boring, repeatable, and recoverable.

If you are interested in starting this journey, I made a simple checklist of what we went through here: https://aaa-dba.com/2025/12/23/dbre-consistency-automation-checklist/
