Whel · Women's Health Evidence Lab

External references.

Whel is built on top of public databases that other research groups also use, and it sits within a broader ecosystem of drug-repurposing work being done by other teams. This page lays out those relationships explicitly. The intent is to be honest about what Whel is doing that no one else does, what it is using from other projects, what it is comparing itself against, and what kinds of work are intentionally out of scope.

At a glance
Independent layer
Cross-reference
  • Every Cure's MATRIX dataset, displayed where it has coverage
  • Machine-learned biological-plausibility scores from a biomedical knowledge graph
  • Shown alongside Whel's own grades rather than blended into them
Primary data
Live sources
  • Five primary sources, each running an active pipeline
  • Every cited record reachable upstream to its source
  • Public, free, and built on stable identifiers wherever possible
Out of scope
Intentional exclusions
  • Secondary consumer-health portals and Wikipedia summaries
  • Closed or paywalled patient-experience platforms
  • Generative AI outputs treated as evidence
01 · Underlying data

The primary sources Whel runs on

Every signal in the database traces back to one of a small set of primary sources. The table below lists each source, its role in the pipelines, and the current integration status. Each name links straight to the source.

Source | Role | Status
PubMed | Published literature; the spine of the Direct Research arm | Live
ClinicalTrials.gov | Trial registry; both Direct Research and Cross-Condition arms | Live
FDA openFDA / AEMS | Adverse-event data underlying the Cross-Condition arm | Live
Open Targets | Genetic-target and pathway evidence behind Pathway Insights | Live
Reddit communities | Curated condition-specific subreddits feeding Community Forum Reports | Live
Monarch Initiative · MONDO [Identifier resolution only] | Disease ontology used to align condition names with the external biomedical knowledge graph | Live
Every Cure MATRIX [Disclosure layer; not blended into Whel grades] | Independent biological-plausibility layer; displayed where MATRIX has coverage | Live
Named society guidelines (ESHRE, ISSWSH, NAMS) [Three bodies curated to date (ESHRE 2022, ISSWSH 2021, NAMS 2020); expansion ongoing] | Published clinical guidelines from named society bodies, human-curated into strength × certainty pairs that corroborate the Direct evidence arm and anchor the clinical validation status where a named recommendation covers a compound–condition pair | Live
EudraVigilance | European adverse-event data; under review for parity with openFDA | Planned
DrugBank | Drug-target and indication data; licensing model under review | Planned
SIDER | Drug side-effect reference; under review for retention or formal retirement | Planned
DRKG (Drug Repurposing Knowledge Graph) [Validation cross-reference; not integrated into the core architecture] | Open, multi-source repurposing knowledge graph. Planned as an independent validation cross-reference shown beside a signal, not merged into Whel's own graph, because it carries the field's male-default coverage that Whel exists to correct | Planned
PrimeKG (Precision Medicine Knowledge Graph) [Validation cross-reference; not integrated into the core architecture] | Open precision-medicine graph across drugs, diseases, phenotypes, and pathways. Planned as a second independent cross-reference to widen the graph-supports-or-silent disclosure beyond one source | Planned
TxGNN (graph foundation model) [Validation cross-reference; not integrated into the core architecture] | Open, zero-shot drug-repurposing model. Planned as a benchmark and hypothesis cross-reference whose predictions Whel would validate rather than adopt, since the model inherits the same male-default training data | Planned

Every cited record in Whel is reachable upstream to one of the live sources listed above.

02 · Structured grounding

Two structured grounding layers on top of LLM extraction

Whel's evidence extraction and scoring layer runs on a large language model. Documented LLM failure modes motivate grounding the pipeline in structured external knowledge rather than relying on LLM output alone: universal social-determinants blind spots, reported by WHBench in 2026, and high reference-fabrication rates, reported by Bhattacharyya et al. 2023 in Cureus (47 percent of ChatGPT-generated medical references fully fabricated and 46 percent authentic but with bibliographic errors). Two such layers are recorded in the methodology version log at v3.4: ontology-grounded entity resolution (Path A) and knowledge-graph grounding (Path B). Both are now in place in their first form. Path A canonicalizes extracted entities to standard identifiers and enriches them with structured metadata, and is applied across the corpus with ambiguous cases held for human review. Path B builds a domain-restricted graph over Open Targets and surfaces a 'graph supports' or 'graph silent' layer beside each signal, in the same shape as the MATRIX coverage block in section 05. Two extensions stay planned and are called out below: feeding the graph into LLM scoring at prompt time, and a deeper property-graph version alongside an independent open-knowledge-graph validation track. These are architectural additions, not post-hoc checks.

Path A: Ontology-grounded entity resolution
Live · Canonical IDs resolved; audit numbers to follow
ChEMBL · MONDO · EFO
What the layer does

This layer serves three functions, not one. First, it canonicalizes: every compound and condition the LLM extracts is resolved against a canonical biomedical registry and rewritten with that registry's standard identifier before being written to Whel's database. Compounds resolve against ChEMBL or DrugBank; conditions resolve against MONDO (the same ontology Whel already uses for the MATRIX cross-reference above). Second, it enriches: the resolution call returns structured metadata (generic name, drug class, ATC code, known targets for a compound; ontology lineage for a condition) that travels with the signal into the database, changing the shape of the data Whel stores. Third, it gates: entities that fail to resolve are flagged for human review rather than silently stored, which catches the structured-output hallucination class of error documented in the LLM literature.
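The three functions can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Whel's implementation: the in-memory `REGISTRY`, the `Resolution` record, and the `resolve_entity` helper are hypothetical, and the identifiers are placeholders rather than real ChEMBL or MONDO IDs.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical stand-in for a canonical registry lookup.
# Identifiers are placeholders, not real registry IDs.
REGISTRY = {
    "metformin": {"id": "EXAMPLE:0001", "drug_class": "biguanide"},
}


@dataclass
class Resolution:
    text: str                            # the string the LLM extracted
    canonical_id: Optional[str] = None   # set only when resolution succeeds
    metadata: dict = field(default_factory=dict)
    needs_review: bool = False
    reason: str = ""


def resolve_entity(text: str) -> Resolution:
    """Canonicalize, enrich, and gate one extracted entity."""
    hit = REGISTRY.get(text.strip().lower())
    if hit is None:
        # Gate: hold the entity for human review instead of storing it.
        return Resolution(text, needs_review=True, reason="no matching identifier")
    # Enrich: carry every structured field except the identifier itself.
    metadata = {k: v for k, v in hit.items() if k != "id"}
    return Resolution(text, canonical_id=hit["id"], metadata=metadata)
```

A resolved entity carries its identifier and metadata forward; an unresolved one carries only the review flag and the reason.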

What the audit disclosure will show once surfaced
  • Per-pipeline entity resolution rate. Percentage of LLM-extracted entities that resolved against the canonical ontology, broken down by pipeline (PubMed, ClinicalTrials.gov, FDA AEMS, Open Targets, Reddit). A pipeline with a noticeably lower resolution rate is a pipeline whose extraction prompt is producing more hallucinated entities.
  • Per-condition resolution rate. So resolution quality differences across the six conditions are visible rather than averaged away. A condition whose extracted compounds resolve at a lower rate is a condition where extraction is less trustworthy.
  • Count of entities currently flagged for human review. With the pipeline and condition each was extracted from, and the reason resolution failed (no matching identifier, ambiguous match across multiple registered compounds, deprecated identifier).
  • Sample of unresolved entities from the most recent run. So the failure mode is concrete rather than abstract. A reader can see the actual text the LLM produced and judge for themselves whether the rejection is a true positive or whether the canonical ontology is incomplete.
  • Enrichment summary. Average number of structured metadata fields attached to each resolved entity (drug class, ATC code, known targets for compounds; ontology lineage for conditions), so the data-shape change is visible rather than implicit.
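The per-pipeline and per-condition rates listed above reduce to the same grouped percentage. A minimal sketch, assuming each extracted entity is logged as a `(group, resolved)` pair; the function name and the sample rows are illustrative only and are not Whel audit numbers.

```python
from collections import defaultdict


def resolution_rates(records):
    """Return {group: percent of entities resolved} from (group, resolved) pairs."""
    totals = defaultdict(int)
    resolved = defaultdict(int)
    for group, ok in records:
        totals[group] += 1
        resolved[group] += int(ok)
    return {g: round(100 * resolved[g] / totals[g], 1) for g in totals}


# Illustrative rows only; the group can be a pipeline or a condition.
sample = [
    ("PubMed", True), ("PubMed", True), ("PubMed", False),
    ("Reddit", True), ("Reddit", False),
]
rates = resolution_rates(sample)
```

Grouping by condition instead of pipeline only changes the first element of each pair.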
Literature anchor

Bhattacharyya et al. 2023 (Cureus, doi:10.7759/cureus.39238) examined 115 references across 30 ChatGPT-generated medical papers and found 47 percent fully fabricated, 46 percent authentic but with bibliographic errors, and only 7 percent authentic and accurate. WHBench (Maurya, Saboo & Kumar 2026, arXiv:2604.00024) documents a 35.5 percent fully-correct rate for the top frontier LLM on women's health clinical questions, with systematic gaps in safety, completeness, and the social-determinants criterion. Resolution and enrichment against canonical ontologies addresses the structured-output failure mode that both papers describe and also moves the data Whel stores from free-text strings to canonical identifiers with structured metadata.

Where this lives in the project

Recorded in the methodology revision history at v3.4 (see the methodology changelog). Listed on the Roadmap under the technical-architecture track as “Ontology-grounded entity resolution (Path A).” The resolution, enrichment, and human-review gate are now applied across the corpus, so the canonical identifiers and structured metadata travel with every signal. The per-pipeline and per-condition audit numbers above are the remaining piece: they populate this block once the resolution-rate disclosure is surfaced.

Path B: Knowledge-graph grounding (graph supports / graph silent)
Live · Graph supports / silent shipped over Open Targets
Open Targets · gated view · BioCypher version planned
What the layer does

This layer is both a grounding mechanism and a disclosure layer, and it arrives in stages. The stage live today is relational: a domain-restricted graph of drug, target, and condition relationships built over Open Targets, restricted to Whel's six conditions and the compounds attached to active signals. From it, each signal carries a ‘graph supports’ or ‘graph silent’ tag beside its grade, in the same shape as the MATRIX score row above. ‘Graph supports, via target X’ means the drug acts on a target that Open Targets associates with the condition; ‘graph silent’ means no such shared target is present, which can reflect either a real biological gap or a limit of the source data. Two of the six conditions show this plainly: vulvodynia and PMDD return no disease entry in Open Targets at all, so every signal under them is graph silent by construction, an absence the page shows rather than hides. The planned stage deepens this: a property-graph version built with the BioCypher framework (Lobentanzer et al., Nature Biotechnology 2023), grounded in canonical ontologies, and a feed of the relevant subgraph into the LLM at prompt time so the model relies less on parametric memory alone. Open repurposing graphs and models such as DRKG, PrimeKG, and TxGNN sit in a separate validation track, checked against rather than merged in, because they carry the field's male-default coverage that Whel exists to correct.
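The relational tag described above amounts to a shared-target check. A minimal sketch, assuming the drug's targets and the condition's associated targets are already available as lists; `graph_tag` and the target names are hypothetical, and passing `None` models a condition with no disease entry in the source.

```python
def graph_tag(drug_targets, condition_targets):
    """Tag one signal from the overlap between drug and condition targets."""
    if condition_targets is None:
        # No disease entry in the source: silent by construction.
        return "graph silent"
    shared = sorted(set(drug_targets) & set(condition_targets))
    if shared:
        return "graph supports, via " + ", ".join(shared)
    return "graph silent"


# Placeholder target names, not real gene symbols.
print(graph_tag(["T1", "T2"], ["T2", "T3"]))
```

Note that 'graph silent' is returned both for a missing disease entry and for an empty overlap, which matches the two readings of silence given above.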

What the disclosure covers

The signal-level ‘graph supports / graph silent’ tag is live in the gated view today. The aggregate audit views below (graph size, per-condition coverage, tier cross-tabulation) are computed from the same data and are being surfaced as reporting.

  • Knowledge graph size and shape. Number of nodes by type (drug, condition, gene, pathway, adverse event) and edges by relationship type (targets, treats, interacts, associated-with, etc.). Reported as a snapshot table with the data-source SHAs each edge type was drawn from, identical in shape to the MATRIX dataset snapshot above.
  • Per-condition graph coverage. For each of the six conditions, the count of Whel compounds that have at least one graph-supported mechanistic path to that condition, and the count that have none. A condition with low graph coverage is a condition where the Whel grades stand alone without a graph cross-reference, and that fact is made visible.
  • Signal-level graph support. Each individual signal carries a 'graph supports' or 'graph silent' tag. 'Graph supports' means at least one mechanistic path exists in the KG that connects the compound to the condition through known targets, pathways, or co-occurring annotations. 'Graph silent' is not the same as 'graph contradicts'; it means the open KGs do not contain a relevant edge, which can reflect either a real biological gap or a known limitation of the source data.
  • Cross-tabulation against Whel tiers. How Whel's four confidence tiers (Strong, Moderate, Emerging, Exploratory) cross with graph support. A graph-supported Strong-tier signal is the strongest combined evidence the platform can present. A graph-silent Strong-tier signal is a signal where the literature replicates but the open knowledge graphs have not yet caught up; that pattern is also informative.
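The tier cross-tabulation in the last item can be sketched as a simple count. This assumes each signal is a record with a `tier` and a `graph` field; both field names and the sample rows are illustrative.

```python
from collections import Counter

TIERS = ["Strong", "Moderate", "Emerging", "Exploratory"]


def cross_tab(signals):
    """Count signals per confidence tier, split by graph support."""
    counts = Counter((s["tier"], s["graph"]) for s in signals)
    return {
        tier: {"supports": counts[(tier, "supports")],
               "silent": counts[(tier, "silent")]}
        for tier in TIERS
    }


# Illustrative rows only.
table = cross_tab([
    {"tier": "Strong", "graph": "supports"},
    {"tier": "Strong", "graph": "silent"},
    {"tier": "Emerging", "graph": "silent"},
])
```

Tiers with no signals still appear with zero counts, so an empty cell is visible rather than omitted.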
Literature anchor

BioCypher is the peer-reviewed, EU-funded biomedical knowledge graph framework introduced in Lobentanzer et al., Nature Biotechnology 2023, and actively maintained. The architectural pattern of layering structured knowledge graphs on top of LLM extraction, rather than replacing the LLM with classical ML, is the direction argued by Zong, Lv, Xue, Zheng, Wan & Zhang 2026 (“Building evidence-based knowledge bases from full-text literature for disease-specific biomedical reasoning,” arXiv:2603.28325, which introduces the EvidenceNet dataset). Every Cure's MATRIX builds on the KGML-xDTD framework (Ma, Zhou, Liu & Koslicki, GigaScience 2023) and demonstrates that knowledge graph plus machine learning systems outperform LLM-only approaches for the separate problem of global drug repurposing prediction, which Whel does not attempt.

Where this lives in the project

Recorded in the methodology revision history at v3.4 (see the methodology changelog). Listed on the Roadmap under the technical-architecture track as “Knowledge-graph grounding,” now live for the relational Open Targets version with the per-signal tag in the gated view. The BioCypher property graph and the prompt-time scoring feed remain planned, and the open knowledge graphs and models are tracked separately under the Roadmap's validation layer. Whel will not train a custom graph neural network; the platform consumes machine learning (Claude Opus 4.8 for extraction and scoring, MATRIX scores as the existing cross-reference) but does not develop its own ML models.

03 · Output validation in progress

A three-part pipeline for LLM output validation

Whel's structured grounding layers (Path A and Path B, documented in section 02 above) constrain what data the LLM works with. A separate set of failure modes applies to what the LLM produces as output: per-source extraction misclassification, summary drift beyond the source, and fabricated or mis-attributed citations in long-form prose. Path C is a three-part pipeline that validates the LLM's outputs against external authoritative sources before publication, recorded in methodology v3.6. Phase 1, citation validation, is live; phases 2 and 3 are planned. This section sets out what each phase does, or will do, and what its disclosure looks like.

Path C: Citation validation and summary grounding
Phase 1 live · Phases 2 and 3 pending
NCBI E-utilities · Crossref REST API · Sentence-BERT
Phase 1: citation validation

Every PMID in Whel's database, and every reference in any prose Whel publishes (featured signal walkthroughs, the methods PDF, written drafts), is resolved against NCBI E-utilities; DOIs are resolved against the Crossref REST API. Each lookup returns canonical title, authors, journal, and year, which are compared against the LLM-claimed metadata. References that fail to resolve, or whose returned metadata does not match the LLM's claims, are flagged for human review and blocked from publication. This addresses the citation-fabrication and citation-misattribution failure modes directly.

Phase 2: sentence-level summary grounding

For every signal row in the database, a sentence-transformer model (the Sentence-BERT family of models published on Hugging Face) computes the cosine similarity between each sentence in the LLM-generated summary and the source abstract. Sentences that fall below a calibrated similarity threshold are flagged as “not directly supported by the source” and either suppressed or surfaced with that marker on the signal card. The threshold is tuned against a held-out human-validation set rather than picked by intuition. This addresses the summary-drift failure mode documented in the medical LLM literature (Bhattacharyya et al. 2023 in Cureus, doi:10.7759/cureus.39238) applied to Whel's specific extraction task.

Phase 3: prompt hardening for published prose

Any LLM-generated long-form prose that ships to users (featured walkthroughs, methods PDF text, future Substack drafts written through Whel's tooling) is generated under a hardened prompt that forbids citation generation outside a pre-verified reference list provided to the model, forbids numerical claims (prevalence rates, effect sizes) unless they appear verbatim in the input context, and requires the model to produce, alongside the text, a sentence-by-sentence list of which input sources support each sentence. The list is then checked by Phase 1 before the prose is published.
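Two of the Phase 3 constraints lend themselves to a mechanical post-check on the generated draft. A minimal sketch, assuming the draft, the input context, and the cited identifiers are available as plain strings and lists; the function names and the number pattern are illustrative choices, not the planned implementation.

```python
import re

# Matches integers and decimals with an optional percent sign.
NUMBER = re.compile(r"\d+(?:\.\d+)?%?")


def unsupported_numbers(draft, context):
    """Numbers in the draft that never appear verbatim in the input context."""
    allowed = set(NUMBER.findall(context))
    return [n for n in NUMBER.findall(draft) if n not in allowed]


def unlisted_citations(cited_ids, verified_ids):
    """Citation identifiers used in the draft but absent from the verified list."""
    return sorted(set(cited_ids) - set(verified_ids))


# Illustrative strings only.
context = "Prevalence was 12.5% in 340 participants."
draft = "Prevalence was 12.5% across 350 participants."
problems = unsupported_numbers(draft, context)
```

The verbatim rule is deliberately strict in this sketch: `12.5` and `12.5%` count as different tokens, so a reformatted number is flagged for review rather than silently accepted.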

Phase 1 audit · live as of June 8, 2026

Path C Phase 1 ships as scripts/verify-citations.py and reads the pre-verified reference list at lib/whel-citations.json. The script resolves every PMID against NCBI E-utilities, every DOI against the Crossref REST API, and every arXiv ID against the arXiv API, then compares returned canonical metadata (title, first-author surname, container title, year) against the claims in the manifest using fuzzy match with calibrated thresholds. Output is written to scripts/audit-output/citation-audit-report.json and mirrored to lib/citation-audit-snapshot.json, which this page reads from. --strict mode exits non-zero on any unresolved or mismatched entry and is wired for pre-publish use.
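The comparison step can be sketched without any network calls. This is an illustration under stated assumptions, not the contents of scripts/verify-citations.py: the `audit_citation` helper, the field names, and the 0.9 title threshold are hypothetical, and `canonical` stands for whatever metadata a resolver returned (`None` when the identifier did not resolve).

```python
from difflib import SequenceMatcher


def similar(a, b):
    """Case-insensitive similarity ratio between two strings, in [0, 1]."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def audit_citation(claimed, canonical, title_threshold=0.9):
    """Classify one manifest entry against resolver-returned metadata."""
    if canonical is None:
        return "unresolved"
    title_ok = similar(claimed["title"], canonical["title"]) >= title_threshold
    author_ok = claimed["first_author"].lower() == canonical["first_author"].lower()
    year_ok = claimed["year"] == canonical["year"]
    if title_ok and author_ok and year_ok:
        return "resolved_match"
    return "resolved_mismatch"


# Placeholder metadata, not a real reference.
claimed = {"title": "An example title", "first_author": "Doe", "year": 2023}
wrong_author = dict(claimed, first_author="Roe")
```

A strict pre-publish mode would then exit non-zero whenever any entry is classified as anything other than `resolved_match`.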

Total citations: 22
Resolved + match: 22
Resolved + mismatch: 0
Unresolved: 0

The expanded manifest on June 7, 2026 (v3.9) added the eight hand-written featured-page references. The verifier caught a real author misattribution among them: a paper attributed to one author group resolved, at the cited PMC link, to a paper by an entirely different group. The featured page and the manifest were both corrected before the disclosure published. The full run log is at scripts/audit-output/citation-audit-report.json.

Database-sources audit · live as of June 8, 2026

The much larger surface is the live sources table: every PMID from PubMed, every NCT ID from ClinicalTrials.gov, every Open Targets identifier, and every AEMS / Reddit URL that the LLM extraction pipeline attached to an active signal and rendered on a drug card.

Total rows audited: 2166
Resolved match: 180
Format-only pass: 1986

Zero resolved_mismatch entries on the first run: 113 of 113 PubMed PMIDs clean against NCBI E-utilities, 19 of 19 ClinicalTrials.gov NCT IDs clean against the ClinicalTrials.gov API v2, and 38 of 38 canonical Open Targets identifiers clean against the Open Targets GraphQL search. The 10 unresolved rows are all Open Targets rows storing a synthetic OT-{DRUGNAME} shorthand in the external_id column instead of a canonical CHEMBL identifier; the URL on those rows still points at a real platform.opentargets.org page, so what users see on the drug card is a valid citation. The failure is at the identifier-storage layer rather than the user-visible content layer. The backfill is recorded on the roadmap under “Backfill canonical Open Targets identifiers on signals using OT-DRUGNAME shorthand”; methodology v3.10 has the full finding write-up. The run log is at scripts/audit-output/database-sources-audit-report.json.

Phase 2a (summary grounding) · tooling shipped, awaiting first run

Sentence-level grounding for free-text sources (PubMed, ClinicalTrials.gov, Reddit). Each LLM-generated finding sentence on sources.key_finding_excerpt will be embedded via Sentence-BERT (all-MiniLM-L6-v2) and compared against canonical source sentences using max cosine similarity. Sentences scoring below 0.40 will be flagged as not directly supported by the source. Tooling shipped as scripts/verify-summary-grounding.py; runs after the export script is re-run to populate the key_finding_excerpt field on the sources snapshot and after pip install sentence-transformers on the host. This block switches to live numbers the moment lib/summary-grounding-audit-snapshot.json is populated.
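The flagging rule is a max-cosine-similarity test against the 0.40 threshold. A minimal sketch in pure Python, assuming the sentence embeddings have already been computed; here toy two-dimensional vectors stand in for Sentence-BERT output, and the function names are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def flag_unsupported(summary_vecs, source_vecs, threshold=0.40):
    """Indices of summary sentences whose best source match is below threshold."""
    flagged = []
    for i, s in enumerate(summary_vecs):
        best = max((cosine(s, t) for t in source_vecs), default=0.0)
        if best < threshold:
            flagged.append(i)
    return flagged


# Toy vectors: the first summary sentence aligns with the source, the second does not.
flagged = flag_unsupported([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.1]])
```

Taking the maximum over source sentences means one well-matched source sentence is enough to support a summary sentence.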

Phase 2b (structured-source verification) · tooling shipped, awaiting first run

Field-by-field verification for structured sources (AEMS reaction counts; Open Targets target attributions). AEMS counts will be re-queried against the openFDA drug/event endpoint and compared to the count in the LLM-extracted title. Open Targets target claims will be verified through the OT GraphQL drug(chemblId) query against the linkedTargets list. Tooling shipped as scripts/verify-structured-sources.py; this block switches to live numbers the moment lib/structured-sources-audit-snapshot.json is populated.

What Phase 3 will add later
  • Count of references blocked at publish time. Once Phase 3 prompt hardening lands, this disclosure surfaces how many references the LLM proposed that failed the Phase 1 manifest check and were therefore stripped before publish, instead of the audit reporting on the pre-verified manifest only.
Literature anchor

Bhattacharyya et al. 2023 (Cureus, doi:10.7759/cureus.39238) examined 115 references across 30 ChatGPT-generated medical papers and found 47 percent fully fabricated, 46 percent authentic but with bibliographic errors, and only 7 percent authentic and accurate; Gravel, D'Amours-Gravel & Osmanlliu 2023 (Mayo Clin Proc Digit Health, doi:10.1016/j.mcpdig.2023.05.004) reported the same pattern across a separate medical question set. WHBench (Maurya, Saboo & Kumar 2026, arXiv:2604.00024) documents the broader pattern of frontier LLMs producing confident structured output with systematic failure modes. Path C's three phases each target a specific failure surface within Whel's pipeline rather than attempting to address the LLM gap in aggregate.

Where this lives in the project

Recorded in the methodology revision history at v3.6 (definition), v3.7 (manual audit prototype), v3.8 (Phase 1 live for the hand-written manifest), v3.9 (Phase 1 tooling for the database-sources audit), v3.10 (Phase 1 first database-sources run), and v3.11 (OT-DRUGNAME backfill closing the v3.10 architectural-debt finding). Full revision history at the methodology changelog. Phase 1 is now a Live register row on the Roadmap as “Citation validation and summary grounding (Path C)”; Phase 2 (sentence-level summary grounding via Sentence-BERT) and Phase 3 (prompt hardening that forbids citation generation outside the Phase 1 manifest) remain Planned. The structured fields above carry real audit numbers for Phase 1 and will populate the remaining fields when Phase 2 and Phase 3 ship. Path C is distinct from Path A (ontology-grounded entity resolution) and Path B (knowledge-graph grounding), which are documented in section 02 above. Path A and Path B ground the LLM's inputs; Path C validates the LLM's outputs.

04 · Independent cross-references

Every Cure and the MATRIX dataset

Whel's most direct counterpart in drug repurposing is Every Cure, a nonprofit dedicated to systematic re-evaluation of approved drugs across all of disease. Its core dataset, MATRIX, is the largest public source of machine-learned biological-plausibility scores in the field, and Whel surfaces those scores as an independent layer beside its own grades wherever they have coverage.

Featured · Every Cure

Predicted treatment probabilities for around 39.5 million drug-disease pairs.

Every Cure is a nonprofit founded in 2022 by physicians David Fajgenbaum and Grant Mitchell, with Tracey Sikora, to identify new uses for already-approved drugs. Its core work is funded by an ARPA-H program also called MATRIX, which awarded Every Cure $48.3M in Phase 1 and up to $76M in Phase 2, by a TED Audacious Project grant, and through a research collaboration with Google Cloud.

The matrix-scores dataset is a public release of those predictions on Hugging Face. It covers roughly 1,800 drugs paired against roughly 22,000 diseases, with a machine-learned treatment-probability score for each pair derived from a biomedical knowledge graph.

MATRIX and Whel are not doing the same thing. MATRIX provides a model-based estimate of how plausible a drug-disease link looks given the structure of biomedical knowledge. Whel reads the current clinical literature, trial registries, adverse-event reports, target databases, and named community forums for a specific set of under-researched women's health conditions, and grades the evidence it finds. Where MATRIX has coverage of a Whel pair, the MATRIX score will appear alongside the Whel grade so a reader can see both.

One thing worth noting about the difference: MATRIX collapses many evidence streams into a single plausibility score. Whel does not. Each Whel signal stays labeled with the source arm it came from, and where two or more arms independently support the same compound-condition pair, that overlap is surfaced as a cross-arm concordance flag. The flag is currently a planned display element rather than a tier change. The roadmap has the current status.

At a glance
Founded: 2022
Founders: Fajgenbaum, Mitchell, Sikora
ARPA-H Phase 1: $48.3M · Feb 2024
ARPA-H Phase 2: up to $76M
TED Audacious Project: grantee
MATRIX coverage: ~1,800 drugs × ~22,000 diseases
Pair count: ~39.5M
Hosting: Hugging Face, public
What Whel does that MATRIX does not

Whel reads the actual clinical evidence base for specific compound-condition pairs in women's hormonal and reproductive health.

MATRIX is a model that predicts treatment probability from a knowledge graph. It does not read the clinical literature for any specific condition, and it is not condition-specific. Whel does the opposite: a narrow set of women's health conditions, each one read closely across published research, trials, adverse-event data, target evidence, and named patient communities, with every signal scored individually. The two outputs are different enough that Whel shows MATRIX scores beside its own grades wherever MATRIX has coverage, instead of folding them together.

Coverage audited 2026-06-06. Headline numbers and per-condition breakdown below.

05 · Coverage disclosure

How much of Whel sits inside MATRIX

Because Whel surfaces MATRIX scores as an independent layer rather than blending them into its own grades, the honest question is how much of Whel's universe MATRIX actually covers. The numbers below come from an audit script that joins Whel's active compound–condition pairs against the published MATRIX dataset. Raw, adjusted, and per-condition figures are all shown so readers can decide for themselves which denominator is fair. Per-pair scores from the same audit are also surfaced on each condition page beside the arm scores, so a reader can see MATRIX's biological-plausibility score for any individual compound–condition pair where MATRIX has coverage, not just the aggregate.
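The join the audit performs can be sketched as a keyed lookup. This is a hedged illustration, not the audit script itself: `match_pairs`, the `brand_to_generic` and `excluded` arguments, and the toy compound names and scores are all hypothetical, and MATRIX is assumed to be keyed by (generic drug name, MONDO identifier).

```python
def match_pairs(whel_pairs, matrix_scores, brand_to_generic, excluded):
    """Split Whel (compound, condition_id) pairs into scored and unscored."""
    scored, unscored = {}, []
    for compound, condition_id in whel_pairs:
        name = compound.lower()
        if name in excluded:
            # Class labels and non-drug items leave the denominator.
            continue
        # Translate a brand name to its generic where the crosswalk has one.
        generic = brand_to_generic.get(name, name)
        key = (generic, condition_id)
        if key in matrix_scores:
            scored[(compound, condition_id)] = matrix_scores[key]
        else:
            unscored.append((compound, condition_id))
    return scored, unscored


# Toy data: placeholder compound names and scores.
pairs = [("BrandX", "MONDO:0008487"), ("genericy", "MONDO:0008487"),
         ("Some class label", "MONDO:0008487")]
matrix = {("genericx", "MONDO:0008487"): 3.9}
scored, unscored = match_pairs(
    pairs, matrix, {"brandx": "genericx"}, {"some class label"})
```

Here the adjusted pair rate would be `len(scored) / (len(scored) + len(unscored))`, which is the sense in which exclusions change the denominator.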

How to read these numbers

MATRIX returns two values per scored drug–disease pair, and Whel surfaces both. Per Every Cure's own dataset documentation on Hugging Face, both values are model predictions of treatment probability, not clinical claims.

The transformed score is MATRIX's prediction of treatment probability. Every Cure trained their model on a biomedical knowledge graph (drug targets, disease pathways, gene associations, and the set of already-validated drug–disease treatments) and the model learned to recognize the structural features that distinguish pairs that are real treatments from pairs that are not. The transformed score combines that raw treatment-probability with the pair's rank inside the drug's other predictions and inside the disease's other predictions, so it surfaces pairs that look like treatments both globally and in context. In the audit run summarized below it ranges roughly 3.0 to 4.5, with higher meaning MATRIX gives the pair a higher treatment-probability.

The quantile rank, shown on each signal card as “Top N%”, is MATRIX's own percentile across all of its predictions (roughly 39.5 million drug–disease pairs). A pair shown as “Top 8%” means MATRIX assigned this drug–disease pair a higher treatment-probability than ninety-two percent of every drug–disease pair its model has scored across the biomedical knowledge graph.
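The “Top N%” label is simple arithmetic on the percentile. A minimal sketch, assuming the rank is available as the fraction of scored pairs that fall below the pair in question; how MATRIX stores the value is not specified here, so `fraction_below` and `top_label` are illustrative names.

```python
def top_label(fraction_below):
    """Format a percentile as a 'Top N%' label.

    fraction_below is the share of all scored pairs with a lower
    treatment-probability than this pair, between 0 and 1.
    """
    top = (1.0 - fraction_below) * 100
    return f"Top {top:.0f}%"


label = top_label(0.92)
```

With ninety-two percent of pairs scoring lower, the label reads “Top 8%”, matching the example above.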

What “Top N%” does and does not say. A high MATRIX score is a graph-based plausibility signal: MATRIX's model thinks the structural features of this drug–disease pair look like the structural features of pairs that turned out to be real treatments. It is not a clinical recommendation. It is not a confirmation that the drug treats the disease. It is not a statement that the pair is being investigated or that the pair is rare. Every Cure makes the limit explicit in their own dataset card, and we quote them verbatim: “These scores are the output of a computational research pipeline and do not constitute medical advice, clinical recommendations, or endorsement of any drug for any use. All findings require independent scientific and clinical validation before any clinical application.”

Why Whel surfaces this anyway. MATRIX is independent of Whel's literature pipeline. MATRIX reads no papers; it predicts treatment plausibility from the structure of a biomedical knowledge graph. Whel reads published literature, trial registries, adverse-event data, target databases, and patient communities, and scores each signal against a five-dimension rubric. When the two layers agree on a pair (Whel finds literature support for the use, AND MATRIX assigns the pair a high treatment-probability), that is two methodologically different approaches arriving at the same hypothesis. The fact that most of Whel's matched pairs land in MATRIX's top eight percent or so is the kind of structural agreement an independent disclosure layer is supposed to provide.

Raw transformed scores are kept in the chip tooltip and in the full audit numbers below for readers who want the underlying value.

Compound match rate (adjusted): 85.7% (84 of 98 eligible compounds)
Pairs with a MATRIX score: 176 (83.0% of 212 eligible pairs)
Active Whel pairs (raw): 271 (before excluding class labels and non-drug items)
Rescued by brand-name dictionary: 24 (compounds matched only via the Whel brand→generic crosswalk)
Per condition
Condition | MONDO | Official MATRIX filter¹ | Predictions in audit
Endometriosis | MONDO:0005133 | False | 38
PCOS | MONDO:0008487 | True | 38
Adenomyosis | MONDO:0010888 | True | 34
PMDD | MONDO:1010182 | True | 0
  Note: MATRIX scores the parent term 'premenstrual tension' (MONDO:0004169) rather than PMDD directly. Reported direct coverage is zero; parent-term coverage is captured separately for context.
Vulvodynia | MONDO:0021722 | True | 18
  Note: Not indexed in Open Targets Platform; the OT pipeline falls back to target search.
Perimenopause & Menopause | MONDO:0001119 | True | 48
  Note: MONDO has no general 'menopause' or 'perimenopause' disease term. Coverage is computed against the closest disease phenotypes (premature menopause, postmenopausal osteoporosis, postmenopausal atrophic vaginitis) and will under-represent the lived experience of natural menopause. The Open Targets pipeline previously used GO:0042697 (a Gene Ontology process term, not a disease term) for this condition and should migrate to MONDO:0001119 with the same candidate cluster.

¹ Whether the condition sits inside MATRIX's own official disease filter. This flag is intent, not a gate: predictions can still be present for conditions outside the official filter, and absent for conditions inside it.

Reading the numbers

What we take from this

Whel covers a small set of women's health conditions; MATRIX is a general-purpose drug-repurposing graph trained across the whole disease space. An 85.7% adjusted compound match rate is high for that kind of cross-reference, and when both sides of a Whel pair exist in MATRIX, MATRIX has a published score for that pair 83.0% of the time.

The asymmetries in the per-condition table are the most informative result. MATRIX's official disease filter and what its model actually produces don't line up cleanly, in both directions. That mismatch is the central reason the two layers stay separated rather than blended:

  • PMDD: 0 predictions despite being inside MATRIX's filter. MATRIX lists PMDD as in scope, but its model produced no scores for it. A blended grade would silently penalise every PMDD compound for a MATRIX gap that has nothing to do with the evidence.
  • Endometriosis: 38 predictions despite being outside MATRIX's filter. MATRIX returns useful scores here even though Endometriosis isn't in its official disease list. Surfacing those scores is exactly what the disclosure layer is for.
  • 24 compounds matched only via the brand and synonym dictionary. Without the brand-to-generic, INN-variant, salt-form, and formulation-qualifier translations the crosswalk applies (section 05), roughly 29% of matched compounds would have been missed. The translation step is explicit and auditable.
  • 17 class labels and 21 non-drug entries excluded from the denominator. Umbrella categories like 'GLP-1 RAs' and supplements like 'Magnesium' can't be looked up in MATRIX the way individual drugs can. Excluding them is what the word 'adjusted' is doing in the 85.7% headline.

Net: MATRIX gives Whel a strong, audited cross-reference for most of its universe, with two informative gaps and a documented set of exclusions. That's the right shape for an independent layer.
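
For readers who want to check the headline percentages, the arithmetic can be reproduced from the counts reported on this page. This is a minimal sketch: the `rate` helper and the variable names are assumptions for illustration, and the "direct-only" figure simply assumes that all 24 crosswalk-rescued compounds would otherwise have gone unmatched, as the text states.

```python
def rate(numerator: int, denominator: int) -> float:
    """Return a percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


# Counts reported on this page.
matched_compounds = 84
eligible_compounds = 98
scored_pairs = 176
eligible_pairs = 212
rescued_by_crosswalk = 24

compound_match_rate = rate(matched_compounds, eligible_compounds)  # 85.7
pair_score_rate = rate(scored_pairs, eligible_pairs)               # 83.0
crosswalk_share = rate(rescued_by_crosswalk, matched_compounds)    # 28.6, "roughly 29%"

# Assumption: without the crosswalk, the 24 rescued compounds are simply missed.
direct_only_rate = rate(matched_compounds - rescued_by_crosswalk,
                        eligible_compounds)                        # 61.2

print(compound_match_rate, pair_score_rate, crosswalk_share, direct_only_rate)
```

The last figure is a derived illustration rather than a reported audit number: it shows how far the adjusted match rate would fall if the translation step were removed.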

The other direction

What this says about Whel

The audit isn't only a measurement of MATRIX. It's also a clean test of whether Whel's conditions and compounds speak the standard biomedical language the rest of the field uses, and where Whel covers clinical ground a general-purpose drug-repurposing model doesn't. Four specific things the comparison reinforces:

  • All six conditions confirmed as exact MONDO matches. Whel's disease definitions resolve to standard ontology entries the broader literature already indexes: PCOS, PMDD, Adenomyosis, Endometriosis, Vulvodynia, Perimenopause & Menopause. No in-house labels, no silent drift.
  • Whel's compound vocabulary maps cleanly into standard CURIE space. 85.7% of eligible compounds resolve into the CHEBI / UNII / DrugBank identifier system MATRIX uses. For a niche women's-health subset that is a high crosswalk rate, and it indicates Whel's drug layer is not running on a parallel vocabulary from the rest of pharmacology.
  • Whel covers conditions where MATRIX is silent. PMDD sits inside MATRIX's official disease filter, yet MATRIX's model produced no predictions for it. Whel has graded, source-anchored signals for PMDD that cover ground a general-purpose model did not.
  • The brand and synonym dictionary is Whel's contribution back. 24 of the 84 matched compounds were recoverable only via Whel's brand-to-generic and INN-variant translations (section 05). The crosswalk is versioned and auditable, and its size and composition are reported on this page; the individual entries are shared on request, since they name compounds in the gated index.

Scope of the claim: the audit measures structural alignment between Whel and MATRIX, not the validity of Whel's grades. MATRIX scores are model probabilities and Whel grades are literature tiers; the two are not interchangeable as ground truth. The numbers above establish that Whel's identifiers resolve into the standard biomedical ontologies MATRIX uses, and that Whel's coverage includes condition–compound pairs MATRIX leaves unscored.

Score distribution
n: 160 · min: 3.114 · p25: 3.333 · median: 3.485 · p75: 3.692 · max: 4.554 · mean: 3.526
Dataset snapshot
matrix-scores · 8b09a71239d9 · 2026-04-09 12:23:48 UTC
matrix-disease · 33575e20ffb9 · 2026-04-08 10:46:29 UTC
matrix-drug · 7c51b34e5b49 · 2026-05-12 14:46:58 UTC

Audit run: 2026-06-06 · Snapshot label: audit run. Raw report and reproducible script live at scripts/check-matrix-coverage.py.

06 · Female-biology layer

Sex-specific pharmacokinetics and cycle phase, grounded and cross-checked

The differentiating layer of the substrate holds two kinds of female-biology fact as first-class, sourced rows rather than averaging them away: how a drug behaves differently in women (sex-specific pharmacokinetics), and how a treatment's effect depends on the menstrual-cycle phase. Both are seeded today for an initial set of compounds and surface beside the relevant signals in the gated view, never folded into the grade.

What grounds it

Every row carries a primary source. Per-drug facts come from the regulatory record, the FDA drug labels, including the mirabegron (Myrbetriq) and duloxetine (Cymbalta) labels. The curated literature provides the backbone and the cross-check: Zucker and Prendergast (Biology of Sex Differences, 2020), a dataset of 86 approved drugs of which 76 showed higher exposure or slower elimination in women, and the review by Soldin and Mattison (Clinical Pharmacokinetics, 2009). The sertraline entry additionally rests on Ronfeld et al. (Clinical Pharmacokinetics, 1997).

What the cross-check found

The layer is seeded for an initial set of eight compounds. Of the three rows first drawn from FDA labels, two (sertraline and mirabegron) are independently corroborated as women-higher in the Zucker and Prendergast dataset. The third, duloxetine, is not in that dataset, so it rests on its FDA label alone; nothing in the dataset contradicts it. Citalopram appears in the dataset but is left out on purpose, because other pharmacokinetic studies disagree on its direction and magnitude. The remaining five seeded compounds (fluoxetine, paroxetine, gabapentin, diazepam, and bupropion) are each listed as women-higher in that same curated source.

How it is shown

Each sex-PK fact is held as a structured row with its parameter, direction, magnitude, and source, and surfaces as a sex-PK marker beside the candidate with the underlying facts in the evidence trail. It is shown beside the signal, never folded into its grade, the same posture as the MATRIX and knowledge-graph layers above.
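
A minimal sketch of what such a row and its display might look like, assuming nothing about Whel's actual schema beyond the four fields named above. The class names, the `attach_marker` helper, the placeholder grade, and the placeholder parameter text are all hypothetical; no magnitude is stated because this page does not report one.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class SexPkFact:
    compound: str
    parameter: str            # which PK parameter the fact describes
    direction: str            # e.g. "women-higher"
    magnitude: Optional[str]  # None when no value is stated
    source: str


@dataclass(frozen=True)
class Signal:
    compound: str
    condition: str
    grade: str                                 # literature tier, assigned elsewhere
    sex_pk_markers: Tuple[SexPkFact, ...] = ()


def attach_marker(signal: Signal, fact: SexPkFact) -> Signal:
    """Return a copy with the fact shown beside the signal; the grade is untouched."""
    if fact.compound != signal.compound:
        raise ValueError("fact and signal refer to different compounds")
    return replace(signal, sex_pk_markers=signal.sex_pk_markers + (fact,))


# Illustrative values only: the grade and parameter strings are placeholders.
signal = Signal(compound="sertraline", condition="PMDD", grade="placeholder-tier")
fact = SexPkFact(
    compound="sertraline",
    parameter="placeholder (see source)",
    direction="women-higher",
    magnitude=None,
    source="Ronfeld et al., Clinical Pharmacokinetics 1997",
)
annotated = attach_marker(signal, fact)
print(annotated.grade, len(annotated.sex_pk_markers))
```

The design point the sketch tries to capture is that attaching a marker returns a new record whose `grade` field is identical to the original, so the fact travels with the signal without altering it.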

Cycle-phase dependence

The twin layer records where a treatment’s effect depends on the menstrual-cycle phase, which matters most for PMDD, a condition defined by cyclical timing. It is seeded with the strongest-evidence cases. The first is luteal-phase (intermittent) dosing of SSRIs for PMDD: fluoxetine, sertraline, and paroxetine CR (ACOG Clinical Practice Guideline on PMDD, 2023; each FDA-approved for PMDD), plus escitalopram, supported by a placebo-controlled RCT (Eriksson et al., 2008). The second is drospirenone/ethinyl estradiol, whose continuous 24/4 regimen suppresses the luteal-phase symptom pathophysiology (FDA YAZ label). The phase vocabulary follows the standard ovarian model (follicular, ovulatory, luteal, menstrual). When validation work begins, its basis will be the Daily Record of Severity of Problems (DRSP), the FDA-recognized phase-tagged outcome instrument for PMDD, and the ISPMD methodological consensus.
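
The phase vocabulary and the seeded SSRI rows could be represented roughly as follows. This is a sketch under stated assumptions: `CyclePhase`, `CyclePhaseRow`, and `treatments_for_phase` are hypothetical names, and the drospirenone/ethinyl estradiol row is left out because this page does not say how its regimen is encoded against the phase vocabulary.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, List


class CyclePhase(Enum):
    """The four phases of the standard ovarian model named on this page."""
    FOLLICULAR = "follicular"
    OVULATORY = "ovulatory"
    LUTEAL = "luteal"
    MENSTRUAL = "menstrual"


@dataclass(frozen=True)
class CyclePhaseRow:
    treatment: str
    condition: str
    regimen: str
    phases: FrozenSet[CyclePhase]
    source: str


LUTEAL_ONLY = frozenset({CyclePhase.LUTEAL})
ACOG = "ACOG Clinical Practice Guideline on PMDD, 2023"
REGIMEN = "luteal-phase (intermittent) dosing"

# Seeded SSRI cases as described in the text above.
CYCLE_ROWS = [
    CyclePhaseRow("fluoxetine", "PMDD", REGIMEN, LUTEAL_ONLY, ACOG),
    CyclePhaseRow("sertraline", "PMDD", REGIMEN, LUTEAL_ONLY, ACOG),
    CyclePhaseRow("paroxetine CR", "PMDD", REGIMEN, LUTEAL_ONLY, ACOG),
    CyclePhaseRow("escitalopram", "PMDD", REGIMEN, LUTEAL_ONLY,
                  "Eriksson et al., 2008"),
]


def treatments_for_phase(phase: CyclePhase) -> List[str]:
    """List treatments whose recorded effect depends on the given phase."""
    return [row.treatment for row in CYCLE_ROWS if phase in row.phases]


print(treatments_for_phase(CyclePhase.LUTEAL))
print(treatments_for_phase(CyclePhase.FOLLICULAR))
```

Using a closed enumeration for the phase keeps free-text phase labels out of the rows, which is one way to make phase-tagged rows comparable against a phase-tagged instrument such as the DRSP.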

07 · Under review

Where the external layer expands

A second tier of resources sits under active review. Two kinds appear here. Some are data sources under review for inclusion in the pipelines, filling a gap the current arms either cannot reach (European adverse-event data, structured drug-target indications) or only reach indirectly. Others are independent validation layers shown beside a signal rather than built into it: the open biomedical knowledge graphs and models that lead the drug-repurposing field. Those validation layers are marked as not integrated into the core architecture, because they carry the field's male-default coverage that Whel exists to correct. Inclusion of any source is conditional on stable licensing, citable provenance, and the ability to round-trip every record from Whel back to its origin. The roadmap sets out the same split as a technical-architecture track and a validation-layer track.
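
The three inclusion conditions lend themselves to a simple checklist. The sketch below assumes each condition can be recorded as a yes/no judgement per source; the `CandidateSource` type, the function names, and the example source are hypothetical and do not describe the review status of any resource listed here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CandidateSource:
    name: str
    stable_licensing: bool
    citable_provenance: bool
    round_trips_to_origin: bool  # every record traceable from Whel back to its origin


def unmet_conditions(source: CandidateSource) -> List[str]:
    """Return the inclusion conditions a candidate source does not yet meet."""
    checks = [
        ("stable licensing", source.stable_licensing),
        ("citable provenance", source.citable_provenance),
        ("round-trip to origin", source.round_trips_to_origin),
    ]
    return [label for label, ok in checks if not ok]


def eligible(source: CandidateSource) -> bool:
    """A source is eligible only when all three conditions hold."""
    return not unmet_conditions(source)


# Hypothetical candidate, for illustration only.
example = CandidateSource(
    name="example-source",
    stable_licensing=False,
    citable_provenance=True,
    round_trips_to_origin=True,
)
print(eligible(example), unmet_conditions(example))
```

Returning the list of unmet conditions, rather than a bare boolean, keeps the reason for a deferral visible in the register.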

Planned

EudraVigilance

European adverse-event data; under review for parity with openFDA

Planned

DrugBank

Drug-target and indication data; licensing model under review

Planned

SIDER

Drug side-effect reference; under review for retention or formal retirement

Planned

DRKG (Drug Repurposing Knowledge Graph)

Open, multi-source repurposing knowledge graph. Planned as an independent validation cross-reference shown beside a signal, not merged into Whel's own graph, because it carries the field's male-default coverage that Whel exists to correct

Validation cross-reference; not integrated into the core architecture

Planned

PrimeKG (Precision Medicine Knowledge Graph)

Open precision-medicine graph across drugs, diseases, phenotypes, and pathways. Planned as a second independent cross-reference to widen the graph-supports-or-silent disclosure beyond one source

Validation cross-reference; not integrated into the core architecture

Planned

TxGNN (graph foundation model)

Open, zero-shot drug-repurposing model. Planned as a benchmark and hypothesis cross-reference whose predictions Whel would validate rather than adopt, since the model inherits the same male-default training data

Validation cross-reference; not integrated into the core architecture

08 · Out of scope

What Whel does not draw from

Whel only cites records that are reachable upstream to a primary source. That single requirement rules out several categories of material an evidence aggregator could otherwise pull from. The list below explains which categories and why.

Consumer health portals

Wikipedia drug pages, WebMD, Healthline, and Mayo Clinic are useful background reading but are not source-of-record. Whel only cites primary literature, trial registries, regulatory databases, structured knowledge bases, and named community forums, all reachable upstream.

Closed patient-experience platforms

Commercial patient-experience platforms behind logins or paywalls are excluded. Their provenance cannot be independently verified, and their records cannot be cited in a form that survives outside the platform. Whel's community arm is restricted to publicly readable, condition-specific forums.

Generative AI outputs as evidence

Summaries produced by general-purpose AI assistants are not cited as evidence in Whel. They are not stable, not retrievable at a fixed address, and cannot be reproduced. The same applies to drug-information chatbots layered over closed knowledge bases.

General social media

Whel does not pull from X, Facebook, or general TikTok content. Community signal is restricted to focused, condition-specific subreddits with persistent moderation and stable URLs, surfaced under the Community Forum Reports arm with explicit labeling.

09 · Crosswalk transparency

Brand and synonym dictionary

Whel's compound list sometimes uses brand strings, alternate INN spellings, salt forms, formulation or route qualifiers, or multi-ingredient combination strings, whereas the MATRIX drug list keys on a single canonical name. A small brand-to-generic crosswalk is the only translation step the match applies; every other match is a direct name or synonym lookup against MATRIX. The crosswalk is versioned and reviewed, and the counts below record its size and composition by entry kind.

32 active mappings · schema v2 · last reviewed 2026-06-01
13 brands · 5 INN variants · 1 abbreviation · 1 salt form · 10 formulation variants · 2 combos

The crosswalk maps brand strings, INN spelling variants, salt forms, route and formulation qualifiers, and combination strings onto canonical compound names. Because those names are the candidates in the gated index, the individual entries are shared with researchers and partners on request rather than published here. The counts above record the crosswalk's size and composition.
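
The two-step match described in this section, a direct lookup first and the crosswalk as the only translation step, can be sketched as follows. The function names and data structures are assumptions for illustration. The two example mappings reuse brand and generic names that already appear on this page; they are not claimed to be entries in the gated dictionary.

```python
from typing import Dict, Optional, Set, Tuple


def normalise(name: str) -> str:
    """Lower-case and collapse whitespace so lookups are case-insensitive."""
    return " ".join(name.lower().split())


def resolve(
    name: str,
    matrix_names: Set[str],
    crosswalk: Dict[str, str],
) -> Tuple[Optional[str], str]:
    """Return (canonical name, how it matched) for a Whel compound string.

    matrix_names stands in for MATRIX's canonical names and synonyms.
    """
    key = normalise(name)
    if key in matrix_names:
        return key, "direct"
    canonical = crosswalk.get(key)
    if canonical is not None and canonical in matrix_names:
        return canonical, "crosswalk"
    return None, "unmatched"


# Illustrative data only; hypothetical stand-ins for the real lists.
matrix_names = {"mirabegron", "duloxetine"}
crosswalk = {"myrbetriq": "mirabegron", "cymbalta": "duloxetine"}

print(resolve("Duloxetine", matrix_names, crosswalk))
print(resolve("Myrbetriq", matrix_names, crosswalk))
print(resolve("Magnesium", matrix_names, crosswalk))
```

Returning the match route alongside the canonical name is what would let an audit count how many compounds were matched only via the crosswalk, separately from direct matches.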

10 · This page

A living register

The external reference register is dated and will change. New resources are added when they meet the same standards as the existing ones: open or appropriately licensed, citable at a stable address, and capable of being round-tripped from Whel back to the source. Suggestions are welcome on the contact page.

Register revised June 2026