Technical architecture

This page documents the technical machinery behind every signal in the Whel substrate. It covers the six active data pipelines and the sources each one queries; the five-dimension rubric Whel applies to every signal; the female-applicability multiplier and the four confidence tiers that scores map into; the arm-specific admission standards enforced by each of the three evidence arms; the descriptive regulatory & development-status layer reported beside each score; and the documented limitations of the methodology as currently shipped.

Architecture · Where each layer stands

The three-layer substrate, and what is built today

Whel's public architecture describes three layers: a corrected, sex-aware substrate; a retrieval-and-validation layer; and a hypothesis-from-signal layer. That is the target architecture, and it is only partially built. The six-condition database documented on the rest of this page runs on the scored-signals engine that the layers are progressively replacing, so this section gives an honest status report for each layer.

Layer 01

The substrate

Foundation live · graph live (Open Targets conditions)

The corrected, sex-aware knowledge base. Its grounding is live: every condition resolves to a MONDO or EFO disease identifier and every drug to canonical ChEMBL and RxNorm identifiers, so entities are matched by identity rather than by name string. The graph itself (the drug-to-target-to-disease edges drawn from Open Targets) is now built over the conditions Open Targets covers, and it surfaces a graph-supports or graph-silent cross-check beside each signal in the gated view. Where Open Targets has no entry, the graph stays silent, and that silence is shown rather than hidden. The sex-aware extension has two parts. Sex-specific pharmacokinetics is seeded for an initial set of compounds, each sourced to an FDA drug label or the curated sex-PK literature (Zucker & Prendergast 2020; Soldin & Mattison 2009) and shown beside the signal. Cyclical hormonal phase is seeded for the strongest-evidence PMDD cases (luteal-phase SSRI dosing; drospirenone cycle suppression) and shown beside the relevant signals; broader coverage is in progress.
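The graph-supports / graph-silent cross-check can be pictured as a path lookup over drug-to-target and target-to-disease edges. The sketch below is a minimal illustration under stated assumptions: graph_cross_check is a hypothetical name, the identifiers are placeholders for ChEMBL and MONDO/EFO IDs, and it assumes the check returns only the two labels named above, treating both a missing Open Targets entry and a missing path as silence.

```python
# Hypothetical sketch of the graph cross-check. Identifiers are placeholders;
# in the substrate they would be ChEMBL drug IDs and MONDO/EFO disease IDs.
# Assumption: the check has exactly two outcomes, matching the two labels in the text.


def graph_cross_check(
    drug_id: str,
    disease_id: str,
    drug_targets: dict[str, set[str]],
    target_diseases: dict[str, set[str]],
) -> str:
    """Return "graph-supports" if some target of the drug is linked to the disease."""
    for target in drug_targets.get(drug_id, set()):
        if disease_id in target_diseases.get(target, set()):
            return "graph-supports"
    # No drug -> target -> disease path, or no Open Targets entry at all:
    # the graph stays silent, and the label says so explicitly.
    return "graph-silent"


# Toy edges standing in for Open Targets data.
drug_targets = {"drug:A": {"target:T1", "target:T2"}}
target_diseases = {"target:T2": {"disease:D1"}}

print(graph_cross_check("drug:A", "disease:D1", drug_targets, target_diseases))  # graph-supports
print(graph_cross_check("drug:A", "disease:D9", drug_targets, target_diseases))  # graph-silent
```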

Layer 02

Retrieval and validation

Built as a flagship (PMDD, PMS)

Provenance-preserving extraction: each atomic claim is tied to a verbatim source span, checked for entailment against that span, and contradictions in the literature are surfaced rather than averaged. This is built and running, but seeded for PMDD and PMS only; it is not yet extended across all six conditions or wired into the main signal index.
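The verbatim-span requirement reduces to a deterministic check: the stored character offsets must select exactly the stored quote from the source text. A minimal sketch, assuming a hypothetical Claim shape with start and end offsets; the entailment check, which is model-based, is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """An atomic claim pinned to a verbatim quote by character offsets (hypothetical shape)."""

    text: str   # the claim as extracted
    quote: str  # verbatim supporting span from the source
    start: int  # offset of the first character of the quote
    end: int    # exclusive end offset


def span_is_verbatim(document: str, claim: Claim) -> bool:
    """True only if the stored offsets select exactly the stored quote."""
    if not (0 <= claim.start < claim.end <= len(document)):
        return False
    return document[claim.start:claim.end] == claim.quote


source = "Participants reported fewer luteal-phase symptoms at week 12."
quote = "fewer luteal-phase symptoms"
start = source.find(quote)
good = Claim("Symptoms fell in the luteal phase.", quote, start, start + len(quote))
shifted = Claim(good.text, quote, start + 1, start + 1 + len(quote))

print(span_is_verbatim(source, good))     # True
print(span_is_verbatim(source, shifted))  # False
```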

Layer 03

Hypothesis from signal

Intake live · validation loop flagship

Off-label and patient-community signals serve as hypothesis generation. The community arm is live across all six conditions, with the Reddit pipeline and off-label patterns feeding the index. Formal downstream validation against mechanistic and clinical evidence runs through the same PMDD flagship as Layer 02.

The candidate index and condition pages live today are produced by the substrate engine described below: the six data pipelines, the per-arm five-dimension rubric, the female-applicability multiplier, and the four confidence tiers. Every signal traces to verbatim-verified claims; the full sex-aware extension is still being built out across every condition.

Around that scored substrate sit several descriptive layers that are reported beside each signal but never folded into the score: the independent MATRIX cross-reference, the sex-specific pharmacokinetics and cyclical-phase reads, and a regulatory & development-status layer. The regulatory layer grounds each candidate in the external US record. It reads three authoritative public sources (the FDA-approved drug label via DailyMed, the FDA Orange Book, and ClinicalTrials.gov) into reviewed, committed snapshots, so the panel is reproducible and can be checked against the upstream source. Each source is read conservatively: label categories are limited to NDA, ANDA, and BLA; the Orange Book to single-ingredient products; and trials to interventional studies of the drug as a therapy. The layer therefore maps the landscape a 505(b)(2) route would build on without ever becoming a scoring input or regulatory advice. The full read is detailed in the scoring framework below, and every source is listed on the external references page.

Figure 1 · Pipeline register

The data pipelines

Whel runs six active data pipelines that populate the substrate on demand. A seventh (EudraVigilance) is implemented but not yet contributing signals to the current snapshot. The regulatory & development-status sources (the FDA-approved label via DailyMed and the FDA Orange Book) sit outside this register: they are read offline into reviewed, committed snapshots and reported beside the score, not ingested as on-demand pipelines.

Pipeline	Evidence arm	API	Status
PubMed	Direct Research	NCBI Entrez	● Active
ClinicalTrials.gov	Direct Research	REST API v2	● Active
FDA AEMS	Pathway Insights	OpenFDA	● Active
Open Targets Platform	Pathway Insights	GraphQL	● Active
SIDER	Pathway Insights	Bulk TSV	● Active
Reddit	Community Forum Reports	OAuth JSON	● Active
EudraVigilance EVDAS	Pathway Insights	Oracle BI API	● In development

PubMed

Queries the NCBI Entrez API for published studies directly investigating each condition. Searches are condition-specific and filtered for relevance. Results are parsed for study type, date, and abstract, then passed to Claude Opus for signal extraction and evidence-strength classification.
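For orientation, a condition-specific PubMed query through NCBI E-utilities might look like the sketch below. The ESearch endpoint and its db, term, retmode, retmax, and sort parameters are part of the public E-utilities API; the helper name and the choice to restrict the phrase to [Title/Abstract] are assumptions, not a description of Whel's actual query.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(condition_term: str, retmax: int = 100) -> str:
    """Build an ESearch URL for titles/abstracts mentioning the condition (hypothetical helper)."""
    params = {
        "db": "pubmed",
        "term": f'"{condition_term}"[Title/Abstract]',  # exact phrase, field-restricted
        "retmode": "json",   # JSON list of PMIDs instead of XML
        "retmax": retmax,    # cap on PMIDs returned per request
        "sort": "relevance",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


print(pubmed_search_url("premenstrual dysphoric disorder", retmax=20))
```

ESearch returns PMIDs only; abstracts would be retrieved in a second call to EFetch. NCBI also asks clients to identify themselves with tool and email parameters, and an API key raises the rate limit from 3 to 10 requests per second.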

ClinicalTrials.gov

Queries the ClinicalTrials.gov REST API v2 for active, completed, and recruiting trials targeting each condition. Trial phase, status, and intervention type are captured and stored alongside the primary signal.
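A minimal sketch of the ClinicalTrials.gov v2 request and field extraction. The /api/v2/studies endpoint with query.cond, filter.overallStatus, pageSize, and format reflects our reading of the v2 API, as do the protocolSection field paths; reading "active" as ACTIVE_NOT_RECRUITING, the helper names, and the placeholder NCT ID are assumptions.

```python
from urllib.parse import urlencode

CTGOV_STUDIES_URL = "https://clinicaltrials.gov/api/v2/studies"
# Assumption: "active" in the text is read as ACTIVE_NOT_RECRUITING.
STATUSES = ("ACTIVE_NOT_RECRUITING", "COMPLETED", "RECRUITING")


def ctgov_search_url(condition: str, page_size: int = 100) -> str:
    """Build a v2 studies query for one condition and the three statuses."""
    params = {
        "query.cond": condition,
        "filter.overallStatus": ",".join(STATUSES),
        "pageSize": page_size,
        "format": "json",
    }
    return f"{CTGOV_STUDIES_URL}?{urlencode(params)}"


def trial_fields(study: dict) -> dict:
    """Pull phase, status, and intervention types out of one v2 study record."""
    proto = study.get("protocolSection", {})
    interventions = proto.get("armsInterventionsModule", {}).get("interventions", [])
    return {
        "nct_id": proto.get("identificationModule", {}).get("nctId"),
        "status": proto.get("statusModule", {}).get("overallStatus"),
        "phases": proto.get("designModule", {}).get("phases", []),
        "intervention_types": [i.get("type") for i in interventions],
    }


# Placeholder record in the v2 shape (hypothetical NCT ID).
sample = {
    "protocolSection": {
        "identificationModule": {"nctId": "NCT00000000"},
        "statusModule": {"overallStatus": "COMPLETED"},
        "designModule": {"phases": ["PHASE2"]},
        "armsInterventionsModule": {"interventions": [{"type": "DRUG"}]},
    }
}
print(ctgov_search_url("PMDD"))
print(trial_fields(sample))
```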

FDA Adverse Event Reporting System (AEMS) [Formerly FAERS]

Queries the FDA adverse-event public API (OpenFDA), which serves the reporting system formerly known as FAERS, for condition-aware reaction counts. Each record is given a dual read: first as a safety caveat, then, only where the pharmacology supports it, as a hedged mechanistic lead within the Pathway arm, never presented as efficacy. Every record carries a mandatory caveat that spontaneous reports cannot establish causation or incidence.
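The query side and the mandatory caveat can be sketched against the openFDA drug event endpoint. The search/count syntax and the patient.drug.openfda.generic_name, patient.drug.drugindication, patient.patientsex, and patient.reaction.reactionmeddrapt fields are part of openFDA; using the indication field to make counts condition-aware, restricting to female reports, and the dual_read record shape are assumptions for illustration.

```python
from typing import Optional
from urllib.parse import urlencode

OPENFDA_EVENT_URL = "https://api.fda.gov/drug/event.json"
CAUSATION_CAVEAT = (
    "Spontaneous adverse-event reports cannot establish causation or incidence."
)


def reaction_count_url(generic_name: str, indication: str, limit: int = 25) -> str:
    """Count MedDRA reaction terms in reports for a drug given for an indication."""
    search = (
        f'patient.drug.openfda.generic_name:"{generic_name}"'
        f' AND patient.drug.drugindication:"{indication}"'  # assumption: condition-aware via indication
        " AND patient.patientsex:2"  # openFDA codes female as 2 (restriction is an assumption)
    )
    params = {
        "search": search,
        "count": "patient.reaction.reactionmeddrapt.exact",
        "limit": limit,
    }
    return f"{OPENFDA_EVENT_URL}?{urlencode(params)}"


def dual_read(reaction: str, count: int, pathway_lead: Optional[str] = None) -> dict:
    """Safety read first; a hedged pathway lead only if one was supplied (hypothetical shape)."""
    return {
        "reaction": reaction,
        "report_count": count,
        "safety_read": f"{count} reports mention {reaction}.",
        "pathway_lead": pathway_lead,  # None unless the pharmacology supports a lead
        "caveat": CAUSATION_CAVEAT,    # attached to every record, unconditionally
    }
```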

Open Targets Platform

Queries the Open Targets Platform GraphQL API (platform.opentargets.org) for each condition using standardized EFO and MONDO disease ontology identifiers. It retrieves drug candidates, mechanistic associations, and biological target scores aggregated from genetic association data, known drug-target interactions, Reactome pathway analysis, and differential gene expression. Results are analyzed by Claude Opus for pathway-level repurposing hypotheses. No authentication is required.

SIDER

Loads the SIDER side-effect resource (sideeffects.embl.de), which pairs marketed drugs with label-documented side effects and their reported frequencies. Like AEMS, each record is rendered into a fixed verbatim sentence and read as a safety caveat first, a hedged mechanistic lead second. Carries a vintage caveat (2015 label snapshot).
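Rendering each SIDER pair through a fixed template is what lets a downstream claim quote it verbatim: the same record always produces the same sentence. A minimal sketch; the sentence wording and render_sider_record are hypothetical, not Whel's actual template.

```python
from typing import Optional

VINTAGE_CAVEAT = "Source: SIDER, based on a 2015 snapshot of drug labels."


def render_sider_record(drug: str, side_effect: str, frequency: Optional[str]) -> str:
    """Render one SIDER pair into a fixed sentence so claims can quote it verbatim."""
    if frequency:
        freq = f"at a reported frequency of {frequency}"
    else:
        freq = "with no reported frequency"
    return f"The label for {drug} lists {side_effect} as a side effect {freq}. {VINTAGE_CAVEAT}"


print(render_sider_record("drug X", "nausea", "1-10%"))
```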

Reddit

Queries condition-specific subreddits (r/Endo, r/PCOS, r/PMDD, r/Menopause, r/adenomyosis, r/vulvodynia) for posts and comments. Individual permalinks, thread IDs, authors, and timestamps are stored so independence can be judged deterministically. The pipeline looks for consistent patterns across many independent accounts, not individual anecdotes, and never scores them on clinical-trial criteria.
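Storing permalinks, thread IDs, and authors makes independence countable rather than judged. The sketch below assumes a hypothetical Mention record and illustrative thresholds (five distinct authors across three distinct threads); Whel's actual thresholds are not stated on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    permalink: str
    thread_id: str
    author: str
    timestamp: int  # Unix seconds


def independence(mentions: list[Mention]) -> tuple[int, int]:
    """Distinct authors and distinct threads, after dropping repeated permalinks."""
    unique = {m.permalink: m for m in mentions}.values()
    authors = {m.author for m in unique if m.author not in ("", "[deleted]")}
    threads = {m.thread_id for m in unique}
    return len(authors), len(threads)


def is_community_pattern(
    mentions: list[Mention], min_authors: int = 5, min_threads: int = 3
) -> bool:
    """Hypothetical thresholds: many voices in several threads, not one loud account."""
    n_authors, n_threads = independence(mentions)
    return n_authors >= min_authors and n_threads >= min_threads


one_voice = [Mention(f"/r/PMDD/c{i}", "t1", "same_user", 1_700_000_000 + i) for i in range(10)]
many_voices = [Mention(f"/r/PMDD/c{i}", f"t{i % 4}", f"user{i}", 1_700_000_000 + i) for i in range(6)]
print(is_community_pattern(one_voice), is_community_pattern(many_voices))  # False True
```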

EudraVigilance EVDAS (in development, not yet contributing signals)

Queries the European Medicines Agency adverse event database (dap.ema.europa.eu) via the Oracle BI Analytics API. Substance codes are resolved via the public adrreports.eu substance table. Female patient reaction data is filtered and grouped by condition. Requires a free registered EMA account for session authentication. This pipeline is implemented but has not yet been ingested into the current database snapshot.

Figure 2 · Scoring framework

How evidence is scored

Whel applies a structured, multidimensional inclusion framework to every signal before it enters the database. The goal is a tiered evidence framework with minimum standards for reliability, reproducibility, and actionability, rather than a single universal cutoff. The framework was informed by published research on evidence synthesis and pharmacovigilance methodology, drawing on established practices in systematic review design and drug repurposing research.

The arm score is produced by the five-dimension rubric applied to the extracted evidence, then discounted by the female-applicability multiplier. The MATRIX cross-reference is an independent external benchmark: it is computed separately and reported beside the score; the rubric never reads it as an input. Keeping it out of the scoring pipeline means an outside benchmark cannot raise or lower the rubric score, and any agreement between an independent benchmark and the score carries real information.

A regulatory & development-status layer is reported beside the score on the same terms: it is descriptive landscape context, never a scoring input. It is built offline from three public US sources into reviewed, committed snapshots and records three things per candidate. First, whether the target condition is an FDA-approved (on-label) or off-label use, read from the FDA-approved drug label via DailyMed and counting only NDA, ANDA, and BLA marketing categories so that supplements and homeopathics are excluded. Second, whether the molecule is a generic or a single-source brand still under patent, read from the FDA Orange Book using single-ingredient products only so that patents on novel branded combination formulations are never attributed to the base molecule. Third, how far the drug has been studied as a therapy for the condition, read from ClinicalTrials.gov and excluding mechanistic, drug-interaction, and Phase-4 post-marketing studies. The layer sketches the landscape a 505(b)(2) filing would build on, but it is explicitly not a viability assessment or regulatory advice. It is live across all six conditions.
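The three conservative reads can be expressed as plain filters. A sketch under stated assumptions: the field names are hypothetical, the Orange Book check assumes combination products list ingredients separated by semicolons, a TREATMENT primary purpose stands in for "studied as a therapy" (so BASIC_SCIENCE, the closest match to mechanistic studies, is excluded), and drug-interaction studies are assumed to be flagged upstream.

```python
LABEL_CATEGORIES = {"NDA", "ANDA", "BLA"}  # supplements, homeopathics, etc. fall outside


def label_counts(marketing_category: str) -> bool:
    """Only NDA, ANDA, and BLA labels count toward the on-label read."""
    return marketing_category.strip().upper() in LABEL_CATEGORIES


def is_single_ingredient(orange_book_ingredient: str) -> bool:
    # Assumption: combination products list ingredients separated by ";".
    return ";" not in orange_book_ingredient


def is_therapy_trial(
    study_type: str, phases: list[str], primary_purpose: str, is_interaction_study: bool
) -> bool:
    """Interventional treatment studies only; Phase 4 and drug-interaction studies excluded."""
    if study_type != "INTERVENTIONAL" or "PHASE4" in phases:
        return False
    if is_interaction_study:  # hypothetical flag set upstream
        return False
    # BASIC_SCIENCE (a stand-in for mechanistic studies) and other purposes are excluded.
    return primary_purpose == "TREATMENT"
```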

Model Selection: Claude Opus 4.8

The dimension scoring is performed using Claude Opus 4.8 (claude-opus-4-8), released in May 2026; the deterministic steps (the female-applicability multiplier, imprecision caps, and tier assignment) are computed in code, not by the model. Opus 4.8 was selected for its performance on complex multicriteria reasoning, where each arm's five dimensions must be assessed against the source content in a single analytical pass. In our internal testing, smaller and faster models produced flatter, less discriminating scores on plausibility and consistency.

On clinical text specifically, Anthropic reports that Opus 4.8 scores 55.8% on HealthBench Professional, an external, physician-authored benchmark of real clinical tasks, up from 51.9% for the previous Opus release. No frontier model is close to the ceiling on these tasks. Opus 4.8 has not been evaluated on any women's-health-specific benchmark; the most recent such evaluation, WHBench (March 2026), tested the earlier Opus 4.6 and found it the strongest of the models studied at 72.1%, while still flagging meaningful safety and completeness gaps. We read these results as evidence the model handles clinical text well. They are not a guarantee that any individual score is correct, which is why every score is shown beside its verbatim source and the model's written rationale.

The Five-Dimension Scoring Framework

Every signal is scored from 0 to 2 on five dimensions, summing to a 0-10 arm strength. The five dimension scores and structured facts are proposed by Claude Opus 4.8 from the full source content; the deterministic parts (the female-applicability multiplier, the imprecision caps, and the confidence tier) are then computed in code, never by the model. The full per-arm criteria are documented on the signal types and scoring page.

Dimension	Score 0	Score 1	Score 2
Corroboration: independent corroboration, kept distinct from rigor and consistency so the same fact is never scored twice.	A single source (a lone review or primary study).	A single synthesis, or two independent sources.	Three or more genuinely independent, consistent sources (or one large, low-bias pivotal trial).
Rigor: study design / risk of bias for Direct; model strength for Pathway; report specificity for Community.	Case report, preclinical, in-vitro, or vague report.	Observational, small trial, or partial detail.	RCT, meta-analysis, active guideline, or human-relevant model.
Specificity: whether the evidence speaks to this exact drug and this exact condition or outcome.	Proxy only; drug or outcome vague.	One side named, the other adjacent.	Both named directly and linked.
Plausibility: whether a credible biological mechanism connects the drug to the condition.	Mechanism asserted or unexplained.	Plausible mechanism.	Evidenced in relevant biology / directly fits known pharmacology.
Consistency: whether the sources agree in direction; contradictions cap this and are shown, not averaged.	Conflicting findings.	Mostly one direction.	Unanimous (a single study is scored neutral, not penalized).
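Because the rubric is a fixed sum, the code side of the split between model and code is easy to make strict: the model proposes five integers, and deterministic code validates and sums them. A minimal sketch; the dimension keys and arm_strength are hypothetical names.

```python
DIMENSIONS = ("corroboration", "rigor", "specificity", "plausibility", "consistency")


def arm_strength(scores: dict[str, int]) -> int:
    """Sum five 0-2 dimension scores into a 0-10 arm strength, rejecting malformed input."""
    if set(scores) != set(DIMENSIONS):
        raise ValueError(f"expected exactly these dimensions: {DIMENSIONS}")
    for dim in DIMENSIONS:
        if scores[dim] not in (0, 1, 2):
            raise ValueError(f"{dim} must be 0, 1, or 2, got {scores[dim]!r}")
    return sum(scores[dim] for dim in DIMENSIONS)


# Scores as the model might propose them for one arm (illustrative values).
proposed = {"corroboration": 1, "rigor": 2, "specificity": 2, "plausibility": 1, "consistency": 2}
print(arm_strength(proposed))  # 8
```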

Pre-registered validation

The benchmark that tests whether this rubric separates high-confidence signals from lower ones is pre-registered: the sample, external comparators, and reporting rules are all fixed before the run.

Read the methodology →

Figure 3 · Tier mapping

Confidence tiers

Each arm's five dimensions sum to a strength of 0-10, which is then multiplied by a female-applicability factor. The resulting arm score maps to four confidence tiers, whose cutoffs were set against the observed score distribution and then frozen:

Strong

Composite ≥ 8.0

Guideline-grade: independently replicated, low-bias evidence generated in women.

Moderate

Composite 6.0 to 7.9

Good evidence with solid rationale, not yet definitive.

Emerging

Composite 3.5 to 5.9

A real early lead worth watching: some corroboration or mechanistic support.

Exploratory

Composite < 3.5

Thin or single-source signals, surfaced with heavy caveat for hypothesis generation.
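The multiplier and tier steps are deterministic, so they can be sketched directly from the cutoffs above. Assumptions: the multiplier lies in 0-1 (the text describes it as a discount), the mapping from female-applicability band to multiplier is not shown because it is not published here, and the imprecision caps are omitted. Function names are hypothetical.

```python
def arm_score(strength: int, female_multiplier: float) -> float:
    """Discount a 0-10 arm strength by the female-applicability multiplier."""
    if not 0 <= strength <= 10:
        raise ValueError("arm strength must lie in 0-10")
    if not 0.0 <= female_multiplier <= 1.0:  # assumption: the multiplier only discounts
        raise ValueError("multiplier must lie in 0-1")
    return strength * female_multiplier


def confidence_tier(score: float) -> str:
    """Map an arm score onto the four published tiers."""
    if score >= 8.0:
        return "Strong"
    if score >= 6.0:
        return "Moderate"
    if score >= 3.5:
        return "Emerging"
    return "Exploratory"


for strength, multiplier in [(10, 1.0), (8, 0.9), (7, 0.5), (6, 0.5)]:
    s = arm_score(strength, multiplier)
    print(strength, multiplier, s, confidence_tier(s))
```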

Figure 4 · Category standards

Category-specific minimum standards

Each of the three evidence arms carries its own scoring bar, because a published trial, a mechanistic link, and a community report each demand a different kind of corroboration before they count. The full per-arm criteria, with their sources and worked examples, are documented on the signal types and scoring page.

The three evidence arms in depth →

Figure 5 · Reliability rules

Cross-cutting reliability rules

For every signal across all three evidence arms, Whel applies five cross-cutting reliability checks:

Outcome specificity

"Improved" is insufficient. Qualifying outcomes include pelvic pain, heavy bleeding, cycle regularity, mood lability in the luteal phase, vulvar burning, and similar condition-specific clinical endpoints.

Effect directionality

Every signal must be classified as one of: improves, worsens, mixed, or unclear.

Corroboration

One source is interesting. Two or more independent sources start to constitute a signal.

Confounding assessment

Known confounders are flagged: drugs with multiple indications where symptom improvement may be indirect, forum populations reporting multiple concurrent therapies, and adverse event data that may reflect reporting bias rather than true incidence.

Denominator awareness

FDA AEMS and community data do not provide true incidence rates. They are signal generating sources, not causal datasets. All signals from these sources are labeled accordingly and require corroboration before elevation above Emerging.
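The denominator rule acts as a ceiling applied after tier assignment. A minimal sketch, assuming source names match the pipeline register and that whether a signal is corroborated has been decided upstream; apply_denominator_rule is a hypothetical name.

```python
TIERS = ("Exploratory", "Emerging", "Moderate", "Strong")  # lowest to highest
SIGNAL_GENERATING = {"FDA AEMS", "Reddit"}  # sources without true denominators


def apply_denominator_rule(tier: str, sources: set[str], corroborated: bool) -> str:
    """Hold signals from signal-generating sources at Emerging until corroborated."""
    if sources & SIGNAL_GENERATING and not corroborated:
        if TIERS.index(tier) > TIERS.index("Emerging"):
            return "Emerging"
    return tier
```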

One Guiding Principle

Frequency is not truth.

A rare but repeatedly observed, highly specific signal from a single credible source may carry more evidential weight than 500 vague forum mentions. Whel's scoring framework is designed to privilege specificity, reproducibility, and triangulation over raw volume.

Figure 6 · Infrastructure

Database and infrastructure

Database

Supabase (PostgreSQL). The substrate's core tables are entities (ontology-grounded drugs and conditions), documents and source_spans (the immutable source text), claims (each atomic claim pinned to a verbatim quote with verified character offsets), contradictions, and substrate_signals. Each substrate_signals row is one evidence arm's reading of an (intervention, condition, aspect): the five dimension scores and rationales, the female-applicability band and multiplier, the generated arm strength and arm score, the confidence tier, and back-references to the claims behind it. Rows are unique per (intervention, condition, aspect, arm).
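The per-(intervention, condition, aspect, arm) uniqueness behaves like a keyed upsert: re-scoring an arm replaces its row rather than adding a second one. In PostgreSQL this would typically be a UNIQUE constraint paired with INSERT ... ON CONFLICT ... DO UPDATE; the in-memory Python sketch below shows the same semantics, with field names that are assumptions rather than the actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class SubstrateSignal:
    """One evidence arm's reading of an (intervention, condition, aspect); hypothetical fields."""

    intervention: str
    condition: str
    aspect: str
    arm: str
    arm_strength: int
    arm_score: float
    tier: str
    claim_ids: list[str] = field(default_factory=list)  # back-references to claims

    @property
    def key(self) -> tuple[str, str, str, str]:
        return (self.intervention, self.condition, self.aspect, self.arm)


def upsert(table: dict[tuple[str, str, str, str], SubstrateSignal], row: SubstrateSignal) -> None:
    """Replace any existing row with the same (intervention, condition, aspect, arm)."""
    table[row.key] = row


signals: dict[tuple[str, str, str, str], SubstrateSignal] = {}
upsert(signals, SubstrateSignal("drug:A", "PMDD", "symptoms", "Direct", 6, 5.4, "Emerging", ["c1"]))
upsert(signals, SubstrateSignal("drug:A", "PMDD", "symptoms", "Direct", 8, 7.2, "Moderate", ["c1", "c2"]))
print(len(signals))  # 1
```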

Frontend

Next.js (TypeScript) with Tailwind CSS, hosted on Vercel. Analytics via Vercel Analytics.

Source deduplication

Sources are deduplicated by URL before storage. The frontend applies additional normalization so that the same compound does not appear more than once in the same evidence bucket.
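URL deduplication only works as well as the URL normalization behind it. A sketch, assuming normalization rules that this page does not specify (lower-case scheme and host; trailing slash, fragment, and utm_ parameters dropped; remaining query parameters sorted); the PMID in the example is illustrative.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Canonical form used as the dedup key (normalization rules are assumptions)."""
    parts = urlsplit(url.strip())
    query = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if not k.lower().startswith("utm_")  # drop tracking parameters
    ]
    path = parts.path.rstrip("/") or "/"
    # Scheme and host are case-insensitive; the path is left as-is. Fragment dropped.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, urlencode(sorted(query)), ""))


def dedupe_by_url(sources: list[dict]) -> list[dict]:
    """Keep the first source seen for each normalized URL, preserving order."""
    seen: set[str] = set()
    kept = []
    for source in sources:
        key = normalize_url(source["url"])
        if key not in seen:
            seen.add(key)
            kept.append(source)
    return kept


print(normalize_url("https://PubMed.ncbi.nlm.nih.gov/12345/?utm_source=feed#abstract"))
```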

Figure 7 · Limitations

Documented limitations

Whel is a signal aggregator rather than a clinical recommendation engine. Evidence strength classifications are generated by a language model against a published five-dimension rubric and should be treated as a starting point for further investigation, not a definitive assessment. Community Forum Reports reflect patient-reported patterns and are explicitly not clinical evidence. The absence of signals in the Direct Research arm for a given condition is itself data; it reflects the current state of published research rather than a gap in the tool. The full list of known limitations is documented below, grouped for navigation.

Planned work against these limitations, including external cross-reference to Every Cure's MATRIX scores, a cross-arm concordance flag, a compound-level synthesis score, and a dedicated audit log, is tracked on the Roadmap.
