RxReason

The safety layer that teaches medical AI when to answer, when to ask, and when to stop.

Medical AI is racing into patient and clinician workflows, but most models are optimized to answer questions, not to judge whether a medication question is safe to answer. RxReason turns every medication query into a structured, auditable safety decision: answer, caution, clarify, or block.

Medical AI has an over-answering problem. A model can perform well on exam-style benchmarks and still answer too confidently when dose, route, medication list, pregnancy status, renal function, allergies, or dangerous drug combinations are missing.

The over-answering problem

Est. annual global cost of medication errors: US$42 billion (WHO estimate)

Target valid structured audit output: ≥ 0.95

Target premature-answer rate (held-out eval): ≤ 0.03

Platform

Sufficiency Judgment

Decides whether enough information exists to answer a medication question safely, instead of defaulting to an answer.

Minimal Clarification

Identifies and asks the single most clinically decisive missing field — not a generic “see your doctor.”

Hard-Stop Calibration

Blocks explicit dangerous combinations and contraindications, explains why, and escalates appropriately.

Structured JSON Output

Loggable, testable, auditable output usable by downstream systems, grounded in FDA labeling, DailyMed, DrugBank, DDInter, and RxNorm.
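As a rough illustration of what such a structured audit record could look like, here is a minimal Python sketch. The class name, field names, and example values are assumptions made for illustration, chosen to mirror the fields this page describes (medication, known context, missing fields, risk hypotheses, next question, proposed action); the actual RxReason schema is not published here.

```python
import json
from dataclasses import dataclass, field, asdict
from enum import Enum
from typing import Optional


class Action(str, Enum):
    """The four safety decisions named on this page."""
    ANSWER = "answer"
    CAUTION = "caution"
    CLARIFY = "clarify"
    BLOCK = "block"


@dataclass
class SafetyAudit:
    # Hypothetical field names; not the real RxReason schema.
    medication: str
    known_context: dict
    missing_fields: list
    risk_hypotheses: list
    next_question: Optional[str]
    proposed_action: Action
    sources: list = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize to a stable JSON string suitable for logging."""
        record = asdict(self)
        record["proposed_action"] = self.proposed_action.value
        return json.dumps(record, sort_keys=True)


audit = SafetyAudit(
    medication="warfarin",
    known_context={"age": 70},
    missing_fields=["current medication list"],
    risk_hypotheses=["bleeding risk if combined with an NSAID"],
    next_question="What other medications are you taking?",
    proposed_action=Action.CLARIFY,
    sources=["FDA labeling"],
)
print(audit.to_json())
```

Keeping the action as a closed enum rather than free text is one way to make the output testable: a downstream system can reject any record whose action is not one of the four values.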

RxGuard

RxReason is the reasoning engine.
RxGuard is the runtime shield.

RxGuard applies RxReason-style safety control around production AI systems. It can sit in front of a general medical chatbot, telehealth assistant, pharmacy workflow, or EHR-integrated AI feature to detect underspecified medication questions, enforce clarification, block high-risk cases, and preserve an audit trail.

Pre-Answer Safety Gate

Routes medication questions through sufficiency and risk checks before a model gives advice.

Policy Enforcement

Converts unsafe or underspecified cases into clarify, caution, or block decisions.

Audit Trail

Captures structured logs for review, QA, compliance, and model improvement.

Model-Agnostic Layer

Designed to sit alongside existing LLMs rather than replace the entire stack.

RxReason + RxGuard = research-grade medication reasoning with production-grade containment.
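To make the pre-answer gate idea concrete, the sketch below wraps an arbitrary model callable with an audit step. It is not RxGuard's API: `guarded_reply`, `toy_audit`, `toy_model`, and the dictionary keys are all hypothetical names, and the audit function here is a trivial stand-in for a real sufficiency and risk check.

```python
def guarded_reply(question, audit_fn, model_fn, audit_log):
    """Run the safety audit first; call the model only if the audit allows it."""
    audit = audit_fn(question)
    audit_log.append({"question": question, **audit})  # preserve the audit trail
    action = audit["proposed_action"]
    if action == "block":
        return "Blocked: " + audit["reason"]
    if action == "clarify":
        return audit["next_question"]
    reply = model_fn(question)  # reached only for "answer" or "caution"
    if action == "caution":
        return "Caution: " + audit["reason"] + "\n" + reply
    return reply


# Toy stand-ins so the sketch runs end to end (assumptions, not real components).
def toy_audit(question):
    if "how much" in question.lower():
        return {
            "proposed_action": "clarify",
            "next_question": "Which medication and strength are you asking about?",
            "reason": "medication and dose are missing",
        }
    return {"proposed_action": "answer", "next_question": None, "reason": ""}


calls = []


def toy_model(question):
    calls.append(question)
    return "model reply"


log = []
print(guarded_reply("How much can I take?", toy_audit, toy_model, log))
```

Because the model is passed in as a plain callable, the same wrapper works with any underlying LLM, which is the sense in which a layer like this can be model-agnostic.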

Benchmarks

Benchmarked against the failure mode that matters.

Most medical AI benchmarks reward factual recall. RxReason evaluates a harder operational question: did the model know whether it had enough information to answer?

Benchmark / Baseline | What it measures | RxReason position
MedQA / MedMCQA / PubMedQA | Broad medical knowledge | Used as retention checks, not the main win condition.
MedGemma-style medical models | Strong medical recall | Targets medication-safety sufficiency and auditability.
General chatbots | Helpful language generation | Published studies show persistent unsafe or premature answers in medical settings.
Prompt-only safety guardrails | Can reduce risk | Trains the behavior directly and measures it with held-out evaluation.
RxSafeBench / Rx-LLM / med-safety benchmarks | Validate the category | Adds a structured model, internal benchmark suite, and deployment layer.
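For readers who want to see how this kind of evaluation differs from recall scoring, here is a small Python sketch of two metrics. The definitions are assumptions: this page does not define its metrics formally, so "premature answer" is taken here to mean the model chose answer or caution when the gold label was clarify or block. The labels in the example are made up and are not benchmark results.

```python
def action_accuracy(gold, pred):
    """Fraction of cases where the predicted action matches the gold action."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def premature_answer_rate(gold, pred):
    """Among cases that should be held (clarify or block), the fraction the
    model answered anyway. This definition is an assumption for illustration."""
    held = [p for g, p in zip(gold, pred) if g in ("clarify", "block")]
    if not held:
        return 0.0
    return sum(p in ("answer", "caution") for p in held) / len(held)


# Made-up labels purely for illustration.
gold = ["answer", "clarify", "clarify", "block"]
pred = ["answer", "answer", "clarify", "block"]
print(action_accuracy(gold, pred))
print(premature_answer_rate(gold, pred))
```

Separating the two numbers matters because a model can score well on overall action accuracy while still answering too often on exactly the cases where it should have asked or stopped.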

What is already shown

Pilot training showed the safety signal is learnable — ~0.91 proposed-action accuracy and 0.93 information-sufficiency accuracy on in-distribution evaluation.
Structured prompting improved JSON reliability but did not solve action calibration by itself.
Held-out clarification evaluation exposed the real research problem: models must generalize beyond rule-like training data.
The source-grounding pipeline, RxNorm normalization direction, benchmark assets, and audit schema are already built.

Flagship model targets

Targets, not achieved claims.

Valid JSON rate: ≥ 0.95
Proposed-action accuracy: ≥ 0.70
Information-sufficiency accuracy: ≥ 0.70
Decisive missing-field F1: ≥ 0.50
Hard-stop F1: ≥ 0.75
Premature-answer rate: ≤ 0.03
RxClarifyScore: Above pilot baseline
General/medical QA retention: Within ~2 pts of base
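The numeric targets above can be expressed as a simple pass/fail check. The following Python sketch encodes the six thresholds that have explicit numbers; the metric key names and the `check_targets` helper are hypothetical, and the measured values in the example are invented to exercise the code, not reported results.

```python
# Thresholds copied from the target list above; key names are assumptions.
TARGETS = {
    "valid_json_rate": (">=", 0.95),
    "proposed_action_accuracy": (">=", 0.70),
    "information_sufficiency_accuracy": (">=", 0.70),
    "decisive_missing_field_f1": (">=", 0.50),
    "hard_stop_f1": (">=", 0.75),
    "premature_answer_rate": ("<=", 0.03),
}


def check_targets(measured):
    """Return {metric: True/False} for each measured metric with a target."""
    results = {}
    for name, value in measured.items():
        op, threshold = TARGETS[name]
        results[name] = value >= threshold if op == ">=" else value <= threshold
    return results


# Invented numbers, used only to show the check in action.
example = {"valid_json_rate": 0.97, "premature_answer_rate": 0.05}
print(check_targets(example))
```

Note that the premature-answer rate is the only target where lower is better, which is why the direction of each comparison is stored alongside its threshold.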

From patient question to auditable action.

Parse the medication question

Identify medication, dose, route, schedule, intent, and known patient factors.

Detect missing decisive context

Determine whether current facts are enough for a safe response.

Ground risk hypotheses

Check medication-specific risks, interactions, contraindications, and context-sensitive factors.

Choose the safety action

Return answer, caution, clarify, or block.

Log the audit

Produce structured JSON that downstream systems can inspect, store, and evaluate.
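The five steps above can be sketched as a small rule-based function. This is a toy under stated assumptions, not the RxReason model: it assumes the question has already been parsed into a dictionary, and `DECISIVE_FIELDS`, `HARD_STOPS`, `risk_flags`, and `run_audit` are hypothetical names. The single hard-stop entry uses the well-known contraindication between sildenafil and nitrates as an example.

```python
import json

# Hypothetical lookup tables for illustration only.
DECISIVE_FIELDS = ["dose", "route", "current_medications"]
HARD_STOPS = {
    frozenset({"sildenafil", "nitroglycerin"}): "risk of severe hypotension",
}


def run_audit(parsed):
    """Take an already-parsed question and return a JSON audit string."""
    # Step 2: detect missing decisive context (None means "not provided").
    missing = [f for f in DECISIVE_FIELDS if parsed.get(f) is None]
    # Step 3: ground risk hypotheses against known dangerous combinations.
    meds = {parsed["medication"], *(parsed.get("current_medications") or [])}
    hard_stops = [why for pair, why in HARD_STOPS.items() if pair <= meds]
    # Step 4: choose the safety action, most restrictive first.
    if hard_stops:
        action = "block"
    elif missing:
        action = "clarify"
    elif parsed.get("risk_flags"):
        action = "caution"
    else:
        action = "answer"
    # Step 5: emit a structured record that downstream systems can log.
    return json.dumps({
        "medication": parsed["medication"],
        "missing_fields": missing,
        "risk_hypotheses": hard_stops + list(parsed.get("risk_flags") or []),
        "next_missing_field": missing[0] if action == "clarify" else None,
        "proposed_action": action,
    })


print(run_audit({"medication": "ibuprofen", "dose": None,
                 "route": "oral", "current_medications": []}))
```

Checking hard stops before missing fields means an explicit dangerous combination is blocked even when other details are absent, rather than being deferred behind a clarifying question.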

Roadmap

Benchmark publication

Package the evaluation suite and baseline comparisons into a public technical thesis.

Flagship model run

Train and evaluate the model against held-out medication-safety tasks.

RxGuard pilot

Deploy the runtime shield around a controlled medical AI workflow.

Bring medication-safety reasoning to your AI stack.

Talk to us about RxReason benchmarks or piloting RxGuard as a pre-answer safety layer for medical AI.

What does RxReason actually do?

RxReason turns a medication question into a structured safety audit: it identifies the medication, known patient context, missing critical fields, risk hypotheses, the next best question, and a proposed action — answer, caution, clarify, or block.

Is RxReason medical advice?

No. RxReason and RxGuard are research and software-safety infrastructure. They are not medical advice and not a substitute for a clinician or pharmacist.

How do RxReason and RxGuard differ?

RxReason is the reasoning engine that makes the safety decision. RxGuard is the runtime layer that applies those checks around a production AI system before it answers.

What does the output look like?

A structured JSON audit that is loggable, testable, and usable by downstream systems, grounded in sources like FDA labeling, DailyMed, DrugBank, DDInter, and RxNorm.

Are the reported numbers achieved results?

Measured pilot results and projected flagship targets are reported separately. Targets are goals, not achieved claims.

Can RxGuard work with an existing model?

Yes. RxGuard is model-agnostic and designed to sit alongside existing LLMs rather than replace the stack.