I’m excited to share a new open-source project I’ve been working on: the Christian Theological Triage Alignment Framework (CTTAF) Benchmark.
As AI models become more integrated into daily life, education, counseling, and even ministry contexts, it’s increasingly important to understand how they handle deep questions of faith. Do they reason faithfully across Christian traditions? Can they distinguish between core gospel truths and areas where faithful believers legitimately disagree?
That’s where CTTAF comes in.
What is the CTTAF Benchmark?
The CTTAF Benchmark is a comprehensive evaluation framework designed to test how well large language models align with Christian theological reasoning. It draws on the well-established concept of theological triage — the practice of categorizing doctrines by their relative importance (inspired by thinkers like Albert Mohler) — to create a nuanced, pluralistic assessment tool.
The benchmark currently includes:
- 900 theological questions covering foundational, secondary, and tertiary doctrines
- Rich metadata for each question: triage level, doctrinal dimension, and denominational context
- A dual-judge evaluation system in which two independent LLM judges score each response
- Geometric mean scoring, which penalizes judge disagreement more sharply than a simple average would
- Tier-weighted penalties and inter-judge agreement metrics (like Cohen’s kappa)
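To make the scoring ideas above concrete, here is a minimal sketch in Python. All names, tier weights, and formulas are illustrative assumptions for this post, not the benchmark's actual implementation:

```python
import math
from collections import Counter

# Hypothetical tier weights: misses on foundational (tier 1) doctrines
# cost more than misses on secondary or tertiary ones. Values are
# illustrative only.
TIER_WEIGHTS = {1: 1.0, 2: 0.6, 3: 0.3}

def combined_score(judge_a: float, judge_b: float) -> float:
    """Geometric mean of two judge scores, each in [0, 1].

    Unlike an arithmetic mean, the geometric mean drops sharply when
    the judges disagree: 1.0 and 0.0 average to 0.5 arithmetically
    but collapse to 0.0 geometrically.
    """
    return math.sqrt(judge_a * judge_b)

def tier_weighted_penalty(score: float, tier: int) -> float:
    """Scale the shortfall (1 - score) by the tier weight, so errors
    on essential doctrines are penalized more heavily."""
    return (1.0 - score) * TIER_WEIGHTS[tier]

def cohens_kappa(labels_a: list, labels_b: list) -> float:
    """Inter-judge agreement corrected for chance:
    kappa = (p_observed - p_expected) / (1 - p_expected)."""
    n = len(labels_a)
    p_o = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)
```

The geometric mean is what makes disagreement costly: judges scoring 0.9 and 0.4 combine to 0.6, while perfect disagreement (1.0 and 0.0) yields 0.0 rather than a misleading 0.5.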
This isn’t about forcing AI into a single denominational box. Instead, it evaluates alignment across the diversity of Christian thought while maintaining clear boundaries around essential Christian orthodoxy.
Key Features
- Pluralistic yet grounded: Tests responses across various Christian traditions while anchoring in core doctrines.
- Reproducible and transparent: Full dataset, prompts, rubrics, and evaluation code are open source.
- Practical tooling: Python scripts that work with major LLM providers (OpenAI, Anthropic, and more).
- Sample datasets for quick testing (including a 100-question starter set).
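One way to keep the tooling provider-agnostic, as the feature list above suggests, is a small interface that any backend can satisfy. This is a hypothetical sketch, not the repository's actual API; the stub model stands in for a real OpenAI or Anthropic client:

```python
from typing import Protocol

class ChatModel(Protocol):
    """Anything that turns a prompt into a completion."""
    def complete(self, prompt: str) -> str: ...

class EchoModel:
    """Stand-in model for testing the pipeline without API calls."""
    def complete(self, prompt: str) -> str:
        return f"[echo] {prompt}"

def answer_question(model: ChatModel, question: str) -> str:
    # In a real run, this response would then be sent to the judges.
    return model.complete(question)
```

Swapping providers then means swapping one object, leaving the evaluation loop untouched.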
The full project structure includes dedicated folders for the whitepaper, data, prompts, rubric, evaluation scripts, and appendices for reproducibility.
Project Status: Under Active Development
Important note: CTTAF is still in its early stages. No official releases have been published yet. We’re refining questions, expanding the whitepaper, improving documentation, and hardening the evaluation pipeline.
This is very much a living project — and that’s where you come in.
Call for Contributors
We’re actively looking for collaborators from a variety of backgrounds:
- Theologians and pastors for doctrinal review and perspective
- AI/ML engineers to improve evaluation methods
- Software developers for tooling and automation
- Researchers interested in faith-aligned AI
- Writers and editors for documentation and the whitepaper
If you’re passionate about responsible AI development and the intersection of technology with Christian faith, we’d love your help. Check out the CONTRIBUTING.md file for details on how to get involved.
How to Get Started
- Visit the repository: https://github.com/thebytebar/cttaf-benchmark/
- Read the README for setup instructions
- Try the sample dataset
- Join the conversation in Issues or Discussions
The project is licensed under CC-BY-SA-4.0, encouraging broad use and sharing with attribution.
Why This Matters
In an era where AI is increasingly asked to reason about morality, purpose, and ultimate truth, benchmarks like CTTAF can help developers, researchers, and users better understand model behavior on matters of eternal significance.
Whether you’re building AI applications for church use, studying AI ethics through a faith lens, or simply curious about how today’s models handle theology — this benchmark aims to provide valuable insights.
I’d love to hear your thoughts. Drop a comment, star the repo, or reach out if you’re interested in contributing.
Let’s build AI that better understands — and respects — Christian theological depth.

