Bottom Line
A Berlin-based global specialist for building automation, energy management, and data-driven building services is working with Scalytics on a practical question: how do you introduce AI agents into information security and governance without weakening auditability?
The answer is not another policy document. It is an operating model where governance is executable, skills are signed, memory is controlled, and every agent action can be reconstructed.
This is early field work, not a packaged success story. That is exactly why it matters. The hard part of agent adoption in regulated environments is not proving that an agent can be useful. The hard part is proving that the agent can act inside a control environment and leave behind evidence that an auditor, security lead, or board member can trust.
The real test is not speed
AI agents are already useful enough to draft, search, summarize, classify, route work, and prepare evidence. That is not the hard problem anymore.
The hard problem is whether a company can answer simple questions after the agent has acted:
- Who asked the agent to do this?
- Which data was in scope?
- Which data was out of scope?
- Which tools were allowed?
- Which skill version ran?
- Which policy governed the action?
- Which human reviewed the result?
- Why was the action allowed?
Those questions matter more in information security and governance than in almost any other business function. A sales team can test an agent on low-risk work and improve the process over time. A governance function cannot quietly experiment with ungoverned tools while telling the rest of the company to follow control procedures.
That is the pressure point in this engagement. The customer is a German building-technology company headquartered in Berlin, operating in a sector where buildings, energy systems, automation, and operational data increasingly converge.
Building automation is not abstract software. It touches heating, ventilation, energy efficiency, operations, comfort, maintenance, and long-lived infrastructure.
That makes AI governance more than an IT concern. When agents enter security and governance work in this kind of environment, the architecture has to protect evidence, accountability, and operational trust from the first step.
A governance function cannot use ungoverned agents
Many AI projects start with a prototype. Someone connects a model to documents, adds a few tools, shows a faster workflow, and governance follows later.
That order is risky in regulated work. In a control function, it is the wrong order.
A security and governance team has to be able to explain its own tools before it can approve tools for everyone else. If the team uses an agent to inspect controls, summarize findings, classify risks, or prepare evidence, the agent itself becomes part of the control environment. It cannot sit outside the same discipline it is supposed to support.
That changes the design question.
The question is not: can we make the agent useful first?
The question is: can we make the agent governable before it becomes useful?
This is where the work with the Berlin-based building automation company becomes interesting. The engagement starts small on purpose. The aim is not to push a large platform into production on day one. The aim is to prove the governance method with a narrow operating surface:
- One controlled environment
- Explicit rules
- Versioned skills
- Reviewable memory
- A traceable record of what happened
Small is not a compromise here. Small is the right architecture phase. It keeps the surface area visible while the controls are still being designed.
Governance as code, not governance as a PDF
A written AI policy is necessary, but it does not govern an agent at runtime.
A PDF can describe acceptable use. It cannot stop an agent from calling the wrong tool. It cannot check whether a skill has been changed. It cannot force a review step when the task crosses a risk boundary. It cannot prove which version of the rules was active when a decision was made.
The operating model treats governance as something the agent must execute.
The first artifact is an executable constitution. It is a plain-text governance file that the agent reads at the start of every interaction. It defines the operating frame:
- What the agent is allowed to do
- What the agent is not allowed to do
- Which data classes are in scope
- Which data classes are out of scope
- Which skills are released for this context
- Which actions require human review
- Which output standards apply
- Which evidence must be recorded
This is not meant to replace legal, security, or compliance policy. It translates the relevant parts of those policies into runtime instructions the agent has to follow.
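To make the idea concrete, here is a minimal sketch of how a plain-text constitution could be evaluated before each action. It is an illustration under stated assumptions, not the format used in the engagement: the JSON layout, the field names, and the `evaluate` function are all hypothetical.

```python
import json
from dataclasses import dataclass

# Hypothetical constitution content. The JSON layout and every name in it
# are assumptions made for this sketch.
CONSTITUTION_TEXT = """
{
  "version": "demo-1",
  "allowed_actions": ["summarize", "classify", "export"],
  "data_in_scope": ["internal_control_description"],
  "released_skills": ["summarize_control_v1"],
  "review_required": ["export"]
}
"""


@dataclass(frozen=True)
class Decision:
    allowed: bool
    needs_review: bool
    reason: str
    constitution_version: str  # recorded so the active rules can be proven later


def evaluate(constitution: dict, action: str, data_class: str, skill: str) -> Decision:
    """Default-deny check of one requested action against the constitution."""
    version = constitution["version"]
    if action not in constitution["allowed_actions"]:
        return Decision(False, False, f"action '{action}' is not allowed", version)
    if data_class not in constitution["data_in_scope"]:
        return Decision(False, False, f"data class '{data_class}' is out of scope", version)
    if skill not in constitution["released_skills"]:
        return Decision(False, False, f"skill '{skill}' is not released", version)
    needs_review = action in constitution["review_required"]
    return Decision(True, needs_review, "within the constitution", version)


constitution = json.loads(CONSTITUTION_TEXT)
```

Anything not listed is denied, and every decision carries the constitution version, so a reviewer can later see which rules were active when the decision was made.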
The second artifact is the skill.
A skill is not just a prompt. It is a versioned capability with a defined purpose, scope, input, output, and acceptance criteria. It lives in version control. It can be reviewed. It can be tested. It can be signed. If the skill changes, the change is visible.
That matters because agent governance fails quickly when capabilities are vague. A vague tool called “analyze documents” is hard to control. A specific skill called “summarize an internal control description without extracting personal data” can be reviewed, tested, and released under a known policy.
The control that matters is integrity. A signed skill gives the system a way to check whether the capability being loaded is the capability that was approved. If a person changes it without review, or if a prompt-injection attempt tries to alter behavior, the integrity check fails. The skill should not run.
That is the difference between a policy aspiration and an engineering control.
A policy says approved skills should not be modified.
A signed skill makes unauthorized modification detectable before execution.
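A minimal sketch of such an integrity check follows. It uses an HMAC from the Python standard library as a stand-in for a signature; a production setup would more likely use asymmetric signatures so that the verifier cannot also sign. The function names and the key handling are assumptions for illustration only.

```python
import hashlib
import hmac


def sign_skill(skill_text: str, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag over the approved skill text."""
    return hmac.new(key, skill_text.encode("utf-8"), hashlib.sha256).hexdigest()


def load_skill(skill_text: str, signature: str, key: bytes) -> str:
    """Return the skill text only if it matches the approved signature."""
    expected = sign_skill(skill_text, key)
    # compare_digest avoids leaking information through comparison timing.
    if not hmac.compare_digest(expected, signature):
        raise PermissionError("skill integrity check failed; refusing to run")
    return skill_text


# Hypothetical usage: the key and skill text are placeholders.
demo_key = b"demo-key-not-for-production"
approved = "Summarize an internal control description without extracting personal data."
approved_signature = sign_skill(approved, demo_key)
```

Calling `load_skill(approved, approved_signature, demo_key)` returns the skill text. Any edit to the text after approval, even a single character, raises `PermissionError` before the skill can run.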
The same instinct extends to the record itself, where a tamper-evident audit trail matters more than a log that a privileged user can quietly edit.
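One common way to make a record tamper-evident is a hash chain, where each entry commits to the one before it. The sketch below is an assumption about how that could look, not a description of the system used in the engagement. The record fields are hypothetical and mirror the questions listed earlier: who asked, which skill version ran, which policy applied, who reviewed.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def _entry_hash(prev_hash: str, record: dict) -> str:
    # sort_keys gives a stable serialization, so the hash is reproducible.
    payload = json.dumps(record, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(log: list, record: dict) -> None:
    """Append a record whose hash covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({
        "record": record,
        "prev_hash": prev_hash,
        "hash": _entry_hash(prev_hash, record),
    })


def verify_chain(log: list) -> bool:
    """Recompute every hash; any edited or removed entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True
```

A chain like this only shows that entries are consistent with each other. To stop someone from rewriting the whole chain, the latest hash would also need to be stored somewhere the log's administrators cannot change.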
Three lines, adapted for agent operations
Governance also needs a structure, and the model borrows one that audit and risk professionals already trust. The Institute of Internal Auditors updated its long-standing “Three Lines of Defense” to the Three Lines Model in 2020, specifically to move away from defensive silos.
For agent operations, the model translates cleanly.
The first line is governance. This is the constitution, the approved skill catalog, the data boundaries, the review rules, and the operating intent.
The second line is operations. This is the agent performing approved work inside the defined frame, producing task results and journals.
The third line is review and assurance. This is the review of what happened against what was supposed to happen, with findings fed back into the constitution, skills, tests, and release process.
The pattern is intentionally self-similar. It works at every level:
- One operator with a personal constitution
- A team with a shared charter
- A department with a formal operating model
- A company-wide AI management system
The artifacts grow, but the pattern does not need to be reinvented at every level.
There is one important caveat. In a one-person or early-stage setup, separation of duties is limited. The same person may define the rule, run the task, and review the outcome. That can be useful for learning, but it is not true independence.
The model has to say that plainly. Simulated separation is not the same as independent assurance. As the system scales, the third line has to become a genuinely separate review function. Otherwise the company has a workflow, not a control structure.
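One way to keep that caveat honest is to record it with every task rather than leave it implicit. The sketch below is a hypothetical check, with assumed field names and labels, that flags when the reviewer is the same person who wrote the rule or ran the task. A different name is a necessary condition for independence, not a sufficient one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskRecord:
    rule_author: str  # who defined the governing rule
    operator: str     # who ran the agent task
    reviewer: str     # who reviewed the outcome


def review_separation(task: TaskRecord) -> str:
    """Label a review so reports never present self-review as assurance."""
    if task.reviewer in (task.rule_author, task.operator):
        return "self-review"
    return "separate-reviewer"
```

A report can then state how many reviews were self-reviews, which makes the limits of an early-stage setup visible instead of hiding them.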
Memory is a control surface
Agent memory is often treated as a convenience feature. That is too casual for regulated work.
Memory decides what the agent carries forward. It shapes future answers. It can preserve useful context, but it can also preserve stale assumptions, sensitive details, or unreviewed interpretations.
In this model, memory is governed deliberately.
The early pattern uses different memory layers across daily, weekly, and monthly horizons:
- Daily working memory is short-lived and supports the immediate task.
- Weekly memory captures operational learning.
- Monthly memory preserves higher-level decisions, changes to rules, and durable context.
The point is not to keep everything. The point is to decide what should be kept, why it should be kept, who reviewed it, and when it should expire or be revised.
That gives reviewers a cleaner question to ask. Not “what did the model remember?” but “which memory layer informed this action, and who allowed that context to persist?”
That distinction matters. A black-box memory feature creates risk. A governed memory practice creates evidence.
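A governed memory practice can be sketched as a record that carries its own justification, reviewer, and expiry. Everything below is an assumption for illustration: the retention periods, the field names, and the rule that weekly and monthly items need a named reviewer before they may inform an action.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Hypothetical retention periods per memory layer.
RETENTION = {
    "daily": timedelta(days=1),
    "weekly": timedelta(days=7),
    "monthly": timedelta(days=30),
}


@dataclass(frozen=True)
class MemoryItem:
    layer: str                  # "daily", "weekly", or "monthly"
    content: str
    reason: str                 # why the item is kept
    reviewed_by: Optional[str]  # who allowed it to persist, if anyone
    created: date


def usable(item: MemoryItem, today: date) -> bool:
    """An item may inform an action only if it is unexpired and, beyond
    the daily layer, has a named reviewer."""
    expired = (today - item.created) > RETENTION[item.layer]
    unreviewed = item.layer != "daily" and item.reviewed_by is None
    return not expired and not unreviewed
```

With a structure like this, a reviewer can see which memory layer informed an action and who allowed that context to persist, because the layer and the reviewer's name are stored with the item itself.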
Auditability is where this meets the standards
This is not governance for its own sake. The constructs map onto the frameworks a regulated company is already measured against.
An executable constitution becomes the documented policy and transparency layer.
Signed, tested skills become system-impact and robustness controls.
The three-lines model becomes the performance-evaluation and independence layer.
Memory rotation and the audit trail become record-keeping.
The regulation makes this concrete. The EU AI Act entered into force in August 2024, and its obligations are phasing in. Its AI-literacy duty under Article 4 has applied to providers and deployers of AI systems since 2 February 2025, with supervision by national authorities from August 2026. Its record-keeping and human-oversight obligations sit at the center of how higher-risk systems are expected to operate.
Alongside the law, ISO/IEC 42001, published in 2023, is the first international standard for AI management systems. The NIST AI Risk Management Framework gives a widely used functional vocabulary of govern, map, measure, and manage. In Germany, BSI IT-Grundschutz provides the baseline controls security teams already apply.
This is not about earning a certificate. An executable constitution, signed skills, a three-lines structure, and a real audit trail produce the documented information, the human-oversight evidence, and the record-keeping these frameworks ask for as a by-product of operating.
That is the important shift. The team does not reconstruct governance under deadline. The system produces evidence because that is how it runs.
Architecture follows practice, not the other way around
The working model starts deliberately small.
The early stage uses standard tools and version control:
- The executable constitution
- Signed skills
- Version history as the first audit trail
- Human review at defined decision points
There is no heavy data infrastructure at this point, by design. The governance discipline has to be established before the platform scales, not after.
Scaled across the company, the same model holds without breaking its structure. A persistent, tamper-evident audit trail replaces the version history. Federated execution through Apache Wayang lets agents work where the data already lives, rather than copying sensitive data into a central store.
An intent-based access layer makes denied access visible rather than silently filtered. That matters. An agent and its reviewer should both know what the agent could not see, why it could not see it, and how access could be requested deliberately.
That layer runs on Lascaris, the sovereign decision fabric.
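The difference between visible and silently filtered denials can be shown in a few lines. The sketch below is a generic illustration and does not represent the Lascaris API; the `AccessResult` type, the `request_access` function, and the grant table are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessResult:
    resource: str
    granted: bool
    reason: str
    how_to_request: str  # empty when access was granted


def request_access(intent: str, resources: list, grants: dict) -> list:
    """Return one result per requested resource, including denials.

    A silently filtering layer would return only the granted resources.
    Here every denial is reported with a reason and a next step.
    """
    results = []
    for resource in resources:
        if intent in grants.get(resource, set()):
            results.append(AccessResult(resource, True, f"intent '{intent}' is granted", ""))
        else:
            results.append(AccessResult(
                resource,
                False,
                f"intent '{intent}' is not granted for this resource",
                f"ask the data owner to grant '{intent}' on '{resource}'",
            ))
    return results
```

Because the denied resource appears in the result, both the agent's journal and the reviewer can see what was out of reach and why.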
Across every scale, the constants hold:
- The constitution is the unit of governance.
- Skills are always signed.
- The audit trail records intent, action, and review.
- Access denials are visible.
- The three-lines pattern does not change.
The agent should not become a reason to centralize sensitive information. It should become a governed interface to distributed information, with visible boundaries and reviewable decisions.
What this means for the customer
For the Berlin-based building automation company, the impact is not a flashy agent demo.
The impact is a safer path to adoption.
The security and governance function can test agents without giving up the practices that make the function credible. The company can start with a narrow control surface and expand only when the evidence supports expansion.
Skills can be approved one by one. Rules can be changed through versioned review. Memory can be kept only where it serves a purpose. Audit evidence can be produced as part of normal work.
That is a more serious outcome than faster reporting.
It means the organization can learn how agents behave inside its own governance environment before exposing them to broader operational complexity.
It also means the company can have a better internal conversation. Instead of debating AI in abstract terms, teams can inspect concrete artifacts:
- Here is the constitution.
- Here is the skill.
- Here is the hash.
- Here is the journal.
- Here is the review.
- Here is the denied action.
- Here is the standard this maps to.
That changes the conversation from opinion to evidence.
What a team should require
The first generation of agent projects asked whether an agent could be made useful. The next generation has to ask whether the decision environment around the agent can be made accountable.
That is the higher standard.
A team putting agents into work should require evidence on five points:
- Executable policy the agent reads on every interaction, not a document filed away from runtime.
- Signed, inspectable skills whose integrity is checked before they run.
- A real audit trail of intents, actions, and decisions, not a transcript of a prompt and a reply.
- Named human review, with genuine independence as the model scales.
- A clear mapping to the standards and regulations the company will actually be assessed against.
The win for a regulated team is not a faster agent. It is an agent whose every action the team’s own auditors can reconstruct.
If the answer to “why did the agent do that?” is “the model decided,” the governance has already failed.
About Scalytics
Our founding team created Apache Wayang, the federated execution framework that lets computation run where the data lives and dramatically reduces unnecessary data movement.
We also built and maintain kafSCALE, a high-performance, Kafka-compatible streaming platform designed for Kubernetes and object storage. It delivers elastic scale without broker complexity or lock-in.
Our mission: Keep data in place. Bring compute to the data. Enable secure, sovereign, and production-ready AI operations.