Whel · Women's Health Evidence Lab
The platform

The corrected knowledge graph for female biology.

Whel is built in three layers: a substrate that holds female biology as first-class structure, a retrieval layer that ties every claim to its source and surfaces disagreement, and a signal layer that turns off-label practice into hypotheses worth testing.

Layer 01
The substrate

A corrected knowledge graph built to capture sex-specific pharmacokinetics, cyclical hormonal state, and the cross-condition mechanisms general platforms miss because they were trained on male-default data. Grounded today in MONDO, EFO, RxNorm, and ChEMBL with a live mechanistic graph over Open Targets. The sex-specific pharmacokinetics and cycle-phase layers are now seeded for an initial set of compounds, each sourced and shown beside the signal; broader population is ongoing.

Postgres-native · Ontology-grounded · Sex-specific PK

Layer 02
Retrieval & validation

Provenance-preserving extraction tuned for biomedical literature. Every claim ties to a verbatim source span, every synthesis is marked as a synthesis, and every contradiction in the underlying literature is surfaced explicitly rather than averaged away.

Per-claim provenance · Marked synthesis · Contradiction surfacing

Layer 03
Hypothesis from signal

Patient-community signal, including off-label prescribing patterns, community reports, and structured patient-reported data, enters as hypothesis generation and is validated downstream against mechanistic and clinical evidence, never equated with the result of a controlled trial. Formal advocacy-organization partnerships are planned.

Off-label patterns · Community reports · Validated downstream

Beside every signal
Independent validation

The three layers above build a signal and score it. Each finished signal then carries an independent reading layer, shown beside the score rather than folded into it: Every Cure’s MATRIX treatment-probability cross-reference, and, where the substrate covers the drug or pair, its documented sex-specific pharmacokinetics and cycle-phase dependence. These are kept separate from Layer 02’s internal validation and are not one of the three build layers. The full set of readings is detailed below.

MATRIX cross-reference · Sex-specific PK · Cycle-phase reading

Layer 01 · The substrate

A knowledge graph built for female biology.

A drug-repurposing engine is only as good as the map it reasons over. That map is a biomedical knowledge graph: a network in which the nodes are entities such as diseases, drugs, genes, pathways, and phenotypes, and the edges are the relationships between them, such as a compound binding a target or a gene participating in a pathway. Graphs of this kind already exist and work. Hetionet integrates dozens of public resources into roughly 47,000 nodes and 2.25 million relationships, and newer graphs such as PrimeKG extend the same idea to millions of relationships.
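
As a rough illustration of the node-and-edge structure described above, here is a minimal in-memory sketch. It is not Whel's schema: the `Node` and `KnowledgeGraph` classes, the relation names, and the identifiers are all hypothetical placeholders.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    id: str    # hypothetical identifier, not a real ontology ID
    kind: str  # e.g. "drug", "gene", "pathway", "disease", "phenotype"


class KnowledgeGraph:
    """Typed nodes joined by labeled, directed edges."""

    def __init__(self) -> None:
        self.nodes: dict[str, Node] = {}
        # node id -> list of (relation, target id)
        self._out: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.id] = node

    def add_edge(self, source: str, relation: str, target: str) -> None:
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self._out[source].append((relation, target))

    def neighbors(self, source: str, relation: str) -> list[str]:
        """Targets reachable from `source` over edges labeled `relation`."""
        return [t for r, t in self._out.get(source, []) if r == relation]


graph = KnowledgeGraph()
graph.add_node(Node("drug:example", "drug"))
graph.add_node(Node("gene:example", "gene"))
graph.add_node(Node("pathway:example", "pathway"))
graph.add_edge("drug:example", "binds", "gene:example")
graph.add_edge("gene:example", "participates_in", "pathway:example")
```

Calling `graph.neighbors("drug:example", "binds")` returns the target the compound binds, which is the kind of hop a repurposing query chains together.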

What those graphs share is that the meaning of every node is fixed to a standard vocabulary, a discipline called ontology grounding. We ground entities in the standard biomedical ontologies: MONDO and EFO for diseases, and RxNorm and ChEMBL for drugs and compound bioactivity. Grounding is what lets the platform know that “paracetamol” and “acetaminophen” are the same drug and that a study of one disease subtype belongs under its parent. Without it, the same fact written two ways counts as two facts, or as none.
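
A small sketch of what grounding buys, assuming a hand-written synonym table and parent map. The identifiers `DRUG:0001`, `DISEASE:0001`, and `DISEASE:0002` are placeholders invented for illustration, not real RxNorm, ChEMBL, MONDO, or EFO identifiers, and the function names are hypothetical.

```python
from typing import Iterable, List, Optional

# Hypothetical synonym table: two surface names, one grounded identifier.
SYNONYMS = {
    "paracetamol": "DRUG:0001",
    "acetaminophen": "DRUG:0001",
}

# Hypothetical subtype -> parent relation between disease terms.
PARENT = {"DISEASE:0002": "DISEASE:0001"}


def ground(name: str) -> Optional[str]:
    """Map a free-text drug name to its identifier, or None if unknown."""
    return SYNONYMS.get(name.strip().lower())


def count_distinct_drugs(mentions: Iterable[str]) -> int:
    """Count grounded drugs, so one drug written two ways counts once."""
    grounded = {ground(m) for m in mentions}
    grounded.discard(None)  # ungrounded mentions are not counted
    return len(grounded)


def ancestors(term: str) -> List[str]:
    """Walk from a subtype up through its parents."""
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain
```

With this table, `count_distinct_drugs(["paracetamol", "Acetaminophen"])` counts one drug rather than two, and `ancestors("DISEASE:0002")` places a subtype study under its parent term.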

A general-purpose graph is not enough, because the data underneath it was built largely on the male body. In pharmacology research, male animals outnumber female ones by roughly five to one, and most studies still use males only, so the basic pharmacology of many drugs was characterized in male tissue. A substrate that inherits that record uncritically inherits its blind spots. Ours is built to correct for them, which means it carries sex-specific pharmacokinetics and cyclical hormonal state as first-class structure rather than as an afterthought.

The differences are real and measurable. In one analysis of 86 approved drugs, 76 reached higher concentrations or cleared more slowly in women, who also experience adverse drug reactions nearly twice as often as men. CYP3A4, the enzyme that processes a large share of prescription drugs, is more active in women. The clearest case is zolpidem: women metabolize it so much more slowly that blood levels run about 50 percent higher, and in 2013 the FDA halved the recommended dose for women, a rare instance of a unisex dose being openly corrected. A model that treats a drug as one number across all bodies cannot represent any of this.

Layer 01 · in practice

Hormonal state held as structured data.

Drug response shifts across the menstrual cycle, so the substrate holds hormonal state as structured data, and a luteal-phase signal is read in its phase instead of being averaged into a flat number. This layer is seeded for the strongest-evidence PMDD cases and shown beside the relevant signals; broader population is ongoing.
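
To make the difference between a phase-aware reading and a flat average concrete, here is a minimal sketch. The `Phase` enum, the `Observation` record, and the effect values are assumptions made up for illustration; they are not results and not the substrate's actual representation.

```python
from dataclasses import dataclass
from enum import Enum
from statistics import mean


class Phase(Enum):
    FOLLICULAR = "follicular"
    LUTEAL = "luteal"


@dataclass(frozen=True)
class Observation:
    phase: Phase
    effect: float  # made-up effect value, for illustration only


def effect_by_phase(observations):
    """Average the effect within each cycle phase, keeping phases apart."""
    grouped = {}
    for obs in observations:
        grouped.setdefault(obs.phase, []).append(obs.effect)
    return {phase: mean(values) for phase, values in grouped.items()}


observations = [
    Observation(Phase.LUTEAL, 1.0),
    Observation(Phase.LUTEAL, 0.5),
    Observation(Phase.FOLLICULAR, 0.0),
    Observation(Phase.FOLLICULAR, 0.0),
]

by_phase = effect_by_phase(observations)     # luteal and follicular kept apart
flat = mean(o.effect for o in observations)  # the flat number that hides the phase
```

In this toy data the effect exists only in the luteal phase; the flat average halves it and drops the timing, which is the information a luteal dosing window depends on.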

Estradiol · Progesterone · LH · FSH · Luteal dosing window

Layer 02 · Retrieval & validation

Every claim is traceable to its source sentence.

A repurposing hypothesis is only useful if its evidence can be checked, so the second layer is built so that every assertion carries its source. We retrieve over the biomedical literature and extract claims with provenance, which means each claim is tied to the verbatim span of text that supports it. The point is auditability: a statement that a drug reduced a symptom is shown together with the exact sentence, study, year, and design it came from, so a reader can see at a glance whether it rests on a small observational report or a large randomized trial. This is the discipline that makes retrieval-grounded systems trustworthy rather than merely fluent.

When the platform combines several sources into one statement, that synthesis is marked as a synthesis rather than presented as if it were a single finding. The line between what a study said and what the system inferred stays visible at all times.
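
One way to picture per-claim provenance and marked synthesis is as two distinct record types. The sketch below is an assumption about shape only: the class names, fields, and example source text are hypothetical, and the traceability check is a plain substring match rather than the platform's extraction method.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Source:
    study: str
    year: int
    design: str  # e.g. "randomized trial", "case report"
    text: str    # the passage the claim was extracted from


@dataclass(frozen=True)
class Claim:
    statement: str
    source: Source
    span: str  # verbatim supporting text copied from source.text

    def is_traceable(self) -> bool:
        """True only if the span appears word for word in the source."""
        return bool(self.span) and self.span in self.source.text


@dataclass(frozen=True)
class Synthesis:
    statement: str
    claims: Tuple[Claim, ...]
    kind: str = "synthesis"  # always labeled, never shown as one finding


# Placeholder source, invented for illustration.
source = Source(
    study="Example study",
    year=2020,
    design="randomized trial",
    text="Drug X reduced symptom Y in the treatment group.",
)
quoted = Claim("Drug X reduces symptom Y", source, "Drug X reduced symptom Y")
paraphrased = Claim("Drug X reduces symptom Y", source, "Drug X lowered symptom Y")
combined = Synthesis("Drug X may reduce symptom Y", (quoted,))
```

Here `quoted.is_traceable()` holds and `paraphrased.is_traceable()` does not, because a paraphrase is not a verbatim span. Keeping `Synthesis` as its own type is one way to keep the line between what a study said and what the system inferred visible.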

The literature disagrees with itself constantly, and the easy thing to do is average the disagreement away into one confident-looking score. We do the opposite. Using natural-language inference, the system detects when one study's finding contradicts another's and surfaces the disagreement with both sources and their populations attached, because the fact that the evidence conflicts is itself information a researcher needs. A result that holds in healthy adults but not in renal impairment is not noise to be smoothed over; it is the finding.
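
The surfacing step can be sketched independently of the inference model. The example below does not run natural-language inference: it assumes an upstream step has already labeled each finding as supporting or refuting, and only shows how conflicting findings might be paired with their populations attached instead of averaged. All names and records are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Finding:
    drug: str
    outcome: str
    direction: str   # "supports" or "refutes"; assumed output of an upstream NLI step
    population: str
    source: str


def surface_contradictions(findings):
    """Return pairs of findings on the same drug and outcome that disagree."""
    conflicts = []
    for a, b in combinations(findings, 2):
        same_question = (a.drug, a.outcome) == (b.drug, b.outcome)
        if same_question and a.direction != b.direction:
            conflicts.append((a, b))
    return conflicts


# Placeholder findings, invented for illustration.
findings = [
    Finding("drug:example", "outcome:example", "supports", "healthy adults", "study A"),
    Finding("drug:example", "outcome:example", "refutes", "renal impairment", "study B"),
    Finding("drug:example", "outcome:other", "supports", "healthy adults", "study C"),
]
conflicts = surface_contradictions(findings)
```

Each returned pair keeps both sources and both populations, so a reader sees that the result holds in one group and not the other rather than a single blended score.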

Evidence is graded rather than collapsed into a single weight. The discipline is the one clinical guidelines already use under frameworks such as GRADE: strong and weak evidence are kept distinct, with the basis for each grade visible, so a case report and a randomized trial never carry the same authority simply because they point the same way.

Layer 03 · Hypothesis from signal

Off-label practice is an untapped clinical experiment.

The conditions Whel works on are managed off label as a matter of routine, which means the real-world record of what helps is large and largely unread. Off-label prescribing is standard practice across women's health, and that practice is a signal: when clinicians and patients converge on a drug approved for something else, they are running an informal experiment whose result is worth recovering.

We treat that signal the way the field treats real-world evidence generally, as hypothesis-generating rather than confirmatory. Off-label patterns, community reports, and structured patient reports can tell you where to look; they cannot, on their own, tell you that a drug works, because observational signal carries confounding, selection effects, and placebo response that only a controlled comparison can separate out.

So every signal is validated downstream against mechanistic and clinical evidence. The platform asks whether the drug's known mechanism plausibly explains the effect, then whether clinical data supports it, and because the signal usually originates with women, the validation is run in women or reported sex-stratified rather than assumed to transfer from a male-dominant sample. A community observation that survives this becomes a hypothesis worth a researcher's time; one that does not is set aside with its reasons recorded.

The differentiation

What a general platform misses.

A fair question: could a general-purpose biomedical platform simply query these conditions in the graph it already has? On three counts it would look in the wrong places. The sources: platforms like Causaly read PubMed, the trial registries, and patent filings, not the patient communities where the earliest signal for off-label-managed conditions lives. The variables: a general graph holds a drug and disease as a fixed link, while female pharmacology moves across the hormonal cycle, so a candidate whose insight is its timing reads as noise. The candidates: a platform serving oncology and immunology budgets ranks for ownable molecules, and the cheap generics that manage women’s conditions are financial orphans it has no reason to surface.

Low-dose naltrexone for endometriosis sits at the intersection of all three. At low doses it appears to quiet the glial inflammation behind chronic pain, by a route unrelated to the addiction treatment it was approved for, and women have tracked its effects in endometriosis communities for years. The institutional record is thin: the one randomized endometriosis trial was terminated with nine patients enrolled, and it is a generic no sponsor will fund to confirmation. A general platform ranks it near the bottom, or never reads the signal. We surface it, contradictions and uncertainty attached, because the signal and the mechanism are both real.

Reading the evidence markers

Several independent readings on every candidate, shown side by side.

Each candidate carries several independent readings, recorded separately rather than combined into a single number, so the basis for each reading stays visible. The confidence tier is our own score: each evidence arm is graded on five dimensions (corroboration, rigor, specificity, plausibility, and consistency), which sum to a strength out of ten. That strength is then discounted by a female-applicability multiplier reflecting how far the evidence was generated in women, which places the candidate in one of four tiers from exploratory to strong.
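
The arithmetic of the confidence tier can be sketched as follows. Only the five dimension names, the total out of ten, the multiplier, and the endpoints "exploratory" and "strong" come from the description above. The 0 to 2 range per dimension, the 0 to 1 range for the multiplier, the tier cutoffs, and the two middle tier names are assumptions chosen for illustration, and the sketch scores a single evidence arm without saying how arms combine.

```python
DIMENSIONS = ("corroboration", "rigor", "specificity", "plausibility", "consistency")
MAX_PER_DIMENSION = 2  # assumption: five dimensions at 0-2 each sum to ten

# Assumed cutoffs and middle tier names; only the endpoints are from the text.
TIERS = ((7.5, "strong"), (5.0, "moderate"), (2.5, "emerging"), (0.0, "exploratory"))


def arm_strength(grades: dict) -> int:
    """Sum the five dimension grades into a strength out of ten."""
    if set(grades) != set(DIMENSIONS):
        raise ValueError("expected exactly the five dimensions")
    if any(not 0 <= grades[d] <= MAX_PER_DIMENSION for d in DIMENSIONS):
        raise ValueError("each grade must be between 0 and 2")
    return sum(grades[d] for d in DIMENSIONS)


def discounted(strength: float, female_applicability: float) -> float:
    """Discount strength by how far the evidence was generated in women."""
    if not 0.0 <= female_applicability <= 1.0:
        raise ValueError("multiplier must be between 0 and 1")
    return strength * female_applicability


def tier(score: float) -> str:
    """Place a discounted score in one of four tiers."""
    for cutoff, name in TIERS:
        if score >= cutoff:
            return name
    return "exploratory"


grades = {"corroboration": 2, "rigor": 1, "specificity": 2,
          "plausibility": 2, "consistency": 1}
strength = arm_strength(grades)     # 2 + 1 + 2 + 2 + 1
score = discounted(strength, 0.5)   # half the evidence weight carries over
```

Under these assumed cutoffs, the same strength lands in a lower tier once the multiplier is applied, which is the intended effect of discounting evidence generated mostly outside women.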

How far the published record independently backs a pair is no longer a separate grade: it is carried inside the score itself, by the rigor dimension, which weights registered trials and peer-reviewed work above weaker sources, and traces to the verbatim quotes on the signal. Whether the biology connects the drug to the condition is likewise carried by the Pathway evidence arm rather than shown as a separate graph chip. What remains beside the score are the readings below, which inform how a result should be interpreted rather than how strong it is.

The MATRIX marker appears where Every Cure’s MATRIX model has a score for the same pair. MATRIX is a machine-learned treatment-probability estimate drawn from a biomedical knowledge graph across roughly 1,800 drugs and 22,000 diseases, and it predicts how plausible a drug and disease link looks given the structure of biomedical knowledge. We read the actual evidence for a narrow set of women’s health conditions, so the two are doing different things, and we show the MATRIX percentile beside our own score rather than folding them together, so a reader can weigh a model’s prior against the evidence on the ground.

The sex-PK marker appears where the substrate holds documented sex-specific pharmacokinetics for the drug, that is, the way its exposure or clearance differs in women. Each fact carries its source, an FDA label or the curated sex-PK literature, and is shown beside the signal rather than folded into the grade, because it informs how a result should be read rather than how strong the evidence is.

The phase marker appears where a treatment’s effect depends on the menstrual-cycle phase, which matters most for a cyclical condition like PMDD. It records the phase the relationship holds in, for example luteal-phase dosing of an SSRI, with its source, and like the others it is shown beside the signal rather than folded into the grade.

The open data sources and tools we build on, the independent MATRIX cross-reference, and the checks we run against model error are documented on the external references page; the full scoring method, the five-dimension rubric, and the documented limitations are on the technical architecture page.

Regulatory posture

A research-support tool, by design.

Whel sits under the 21st Century Cures Act §3060 research-support exemption and stays there. Because every claim is tied to a source a clinician can independently review, the platform meets the exemption's transparency bar by architecture rather than by accident, which is the same property that makes the output worth trusting in the first place.

WHAT WE NEVER DO

· Display “treat patient X with drug Y” without surfaceable provenance.
· Auto-generate treatment plans without clinician review of each citation.
· Make claims about specific identified patients.
· Ship a patient mode that returns recommendations without a clinician in the loop.

The honest version

What the platform does, and its limits.

Whel generates evidenced repurposing hypotheses. It does not diagnose, it does not replace a clinical trial, and it does not return a recommendation that a clinician cannot trace back to its basis. The methods underneath it are real but imperfect: claim extraction misses nuance, contradiction detection is sensitive to how things are phrased, and provenance is best-effort rather than absolute.

That is exactly why the platform leaves every claim checkable instead of presenting it as settled. We are integrating mature pieces, including knowledge graphs, ontology grounding, retrieval, and evidence scoring, and pointing them at the part of biology medicine left understudied. The work that earns a clinician's trust is making the underlying source visible.

The three layers are also at different stages, and we are explicit about that. The retrieval-and-validation layer runs today as a flagship on PMDD and PMS rather than across every condition; the substrate’s grounding is live while its graph is still being built; and the six-condition index you can browse now is produced by the scored-signals engine these layers are progressively replacing. Where each layer actually stands is set out on the technical architecture page.

Whel · Women's Health Evidence Lab

Finding what already works for women.