Technical architecture.
The data pipelines, the five-dimension scoring framework, the infrastructure, and the documented limitations behind every signal.
The data pipelines
Whel runs five active data pipelines that populate the database on demand. A sixth pipeline (EudraVigilance) is implemented but not yet contributing signals to the current snapshot.
| Pipeline | Evidence arm | API | Status |
|---|---|---|---|
| PubMed | Direct Research | NCBI Entrez | ● Active |
| ClinicalTrials.gov | Direct Research | REST API v2 | ● Active |
| FDA AEMS | Cross-Condition Signals | OpenFDA | ● Active |
| Open Targets Platform | Pathway Insights | GraphQL | ● Active |
| Reddit | Community Forum Reports | Public JSON | ● Active |
| EudraVigilance EVDAS | Cross-Condition Signals | Oracle BI API | ● In development |
PubMed
Queries the NCBI Entrez API for published studies that directly investigate each condition. Searches are condition-specific and filtered for relevance. Results are parsed for study type, date, and abstract, then passed to Claude Opus for signal extraction and evidence strength classification.
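As a rough illustration of a condition-specific PubMed search, the sketch below builds an ESearch request URL against the public NCBI E-utilities endpoint. The function name, the `[Title/Abstract]` field restriction, and the default `retmax` are assumptions for illustration, not Whel's actual query construction.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_search_url(condition: str, extra_terms: str = "", retmax: int = 50) -> str:
    """Build an ESearch URL for PubMed records that mention a condition.

    The field tag and filters here are illustrative assumptions.
    """
    term = f'"{condition}"[Title/Abstract]'
    if extra_terms:
        # Narrow the search with an additional boolean clause.
        term = f"({term}) AND ({extra_terms})"
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


url = build_pubmed_search_url("endometriosis", extra_terms="drug repositioning")
```

The returned PMIDs would then be fetched (for example with EFetch) and parsed for study type, date, and abstract before any model-based classification.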
ClinicalTrials.gov
Queries the ClinicalTrials.gov REST API v2 for active, completed, and recruiting trials targeting each condition. Trial phase, status, and intervention type are captured and stored alongside the primary signal.
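A minimal sketch of how such a request might be assembled against the v2 `studies` endpoint follows. The mapping of "active, completed, and recruiting" onto the status values shown, the page size, and the function name are assumptions; the real pipeline may filter differently.

```python
from urllib.parse import urlencode

CTGOV_STUDIES_URL = "https://clinicaltrials.gov/api/v2/studies"

# Assumed mapping of "active, completed, and recruiting" to API status values.
DEFAULT_STATUSES = ("RECRUITING", "ACTIVE_NOT_RECRUITING", "COMPLETED")


def build_trials_url(condition, statuses=DEFAULT_STATUSES, page_size=100, page_token=None):
    """Build a ClinicalTrials.gov v2 search URL for one condition."""
    params = {
        "query.cond": condition,
        "filter.overallStatus": ",".join(statuses),
        "pageSize": page_size,
    }
    if page_token:
        # The token for the next page comes from the previous response.
        params["pageToken"] = page_token
    return f"{CTGOV_STUDIES_URL}?{urlencode(params)}"


first_page = build_trials_url("polycystic ovary syndrome")
```

Trial phase, status, and intervention type would then be read from each returned study record.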
FDA Adverse Event Monitoring System (AEMS) [Formerly FAERS]
Queries the FDA adverse event database (AEMS, formerly FAERS) through the public openFDA API using a two-pass approach: first targeting gynecological and hormonal terms, then broadening to general reaction terms. Reports from female patients are filtered and analyzed for signals suggesting off-label benefit. URL encoding and pagination are handled to maximize coverage across all six conditions.
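The encoding and pagination concerns can be sketched as follows. This is an assumption-laden illustration: the helper names, the example reaction term, and the page size are hypothetical, and the same builder would simply be called once with the targeted term list and once with the broader one. The field names `patient.patientsex` (where `2` denotes female) and `patient.reaction.reactionmeddrapt` are openFDA drug event fields.

```python
from urllib.parse import urlencode

OPENFDA_EVENT_URL = "https://api.fda.gov/drug/event.json"


def build_event_url(reaction_terms, limit=100, skip=0):
    """Build one openFDA drug event query restricted to female patient reports."""
    # Quote each term so multi-word reaction names stay intact.
    reactions = " OR ".join(f'"{term}"' for term in reaction_terms)
    search = f"patient.patientsex:2 AND patient.reaction.reactionmeddrapt:({reactions})"
    # urlencode percent-escapes quotes, colons, and parentheses and turns spaces into "+".
    return f"{OPENFDA_EVENT_URL}?{urlencode({'search': search, 'limit': limit, 'skip': skip})}"


def paged_urls(reaction_terms, pages, limit=100):
    """Return URLs for consecutive result pages using the skip parameter."""
    return [build_event_url(reaction_terms, limit, page * limit) for page in range(pages)]


# Illustrative term only; the real term lists are not given in this document.
targeted_pass = paged_urls(["Dysmenorrhoea"], pages=2)
```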
Open Targets Platform
Queries the Open Targets Platform GraphQL API (platform.opentargets.org) for each condition using standardized EFO and MONDO disease ontology identifiers. Retrieves drug candidates, mechanistic associations, and biological target scores aggregated from genetic association data, known drug-target interactions, Reactome pathway analysis, and differential gene expression. Results are analyzed by Claude Opus for pathway-level repurposing hypotheses. No authentication is required.
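For orientation, a request body for such a lookup might look like the sketch below. The query shape follows the publicly documented Open Targets schema as best understood here and should be checked against the current schema before use; the function name, page size, and the placeholder disease identifier are assumptions.

```python
import json

OPEN_TARGETS_GRAPHQL_URL = "https://api.platform.opentargets.org/api/v4/graphql"

# Assumed query shape: targets associated with one disease, with overall scores.
ASSOCIATED_TARGETS_QUERY = """
query AssociatedTargets($efoId: String!, $size: Int!) {
  disease(efoId: $efoId) {
    id
    name
    associatedTargets(page: {index: 0, size: $size}) {
      rows {
        target { id approvedSymbol }
        score
      }
    }
  }
}
"""


def build_request_body(disease_id: str, size: int = 25) -> str:
    """Serialize the GraphQL query and variables as a JSON POST body."""
    return json.dumps(
        {"query": ASSOCIATED_TARGETS_QUERY, "variables": {"efoId": disease_id, "size": size}}
    )


# "EFO_0000000" is a placeholder, not a real condition identifier.
body = build_request_body("EFO_0000000", size=10)
```

The body would be sent as an unauthenticated POST with a JSON content type to the endpoint above.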
Reddit
Queries condition-specific subreddits (r/Endo, r/PCOS, r/PMDD, r/Menopause, r/adenomyosis, r/vulvodynia) using eight treatment-focused search queries per subreddit. Individual post permalinks are stored and validated; URLs must contain /comments/ to confirm they are post-level rather than subreddit-level. Posts are grouped by subreddit in citation display. The pipeline looks for consistent patterns across many posts, not individual anecdotes.
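The permalink check and the grouping step could be sketched as below. Only the `/comments/` requirement comes from the description above; the host check, the path parsing, and the function names are illustrative assumptions.

```python
from collections import defaultdict
from urllib.parse import urlparse


def is_post_permalink(url: str) -> bool:
    """Return True when a URL points at an individual post, not a subreddit."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    # Assumption: accept reddit.com and its subdomains only.
    host_ok = host == "reddit.com" or host.endswith(".reddit.com")
    return host_ok and "/comments/" in parsed.path


def group_by_subreddit(urls):
    """Group valid post permalinks by subreddit name for citation display."""
    grouped = defaultdict(list)
    for url in urls:
        if not is_post_permalink(url):
            continue
        parts = urlparse(url).path.strip("/").split("/")
        # Expected path shape: r/<subreddit>/comments/<post_id>/...
        if len(parts) >= 2 and parts[0] == "r":
            grouped[parts[1]].append(url)
    return dict(grouped)
```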
EudraVigilance EVDAS (in development, not yet contributing signals)
Queries the European Medicines Agency adverse event database (dap.ema.europa.eu) via the Oracle BI Analytics API. Substance codes are resolved via the public adrreports.eu substance table. Female patient reaction data is filtered and grouped by condition. Requires a free registered EMA account for session authentication. This pipeline is implemented but has not yet been ingested into the current database snapshot.
How evidence is scored
Whel applies a structured, multidimensional inclusion framework to every signal before it enters the database. The goal is a tiered evidence framework with minimum standards for reliability, reproducibility, and actionability, rather than a single universal cutoff. The framework was developed in consultation with published research on evidence synthesis and pharmacovigilance methodology, drawing on established practices in systematic review design and drug repurposing research.
Model Selection: Claude Opus 4.6
All signal analysis and scoring is performed using Claude Opus 4.6 (claude-opus-4-6), Anthropic's most capable model. Opus 4.6 was selected specifically for its performance on complex multicriteria reasoning tasks. Independent benchmarks consistently place Opus at the top of evaluations requiring simultaneous assessment across multiple analytical dimensions, precisely what evidence scoring requires. Smaller or faster models were evaluated and found to produce flatter, less discriminating scores, particularly on biological plausibility and confounding risk assessment. For a tool where the quality of the evidence evaluation is the core product, model selection is not a minor implementation detail.
The Five-Dimension Scoring Framework
Every signal is independently scored from 0 to 2 on five dimensions, for a maximum total score of 10. Scores are assigned by Claude Opus 4.6 based on the full source content, not just metadata.
| Dimension | Score 0 | Score 1 | Score 2 |
|---|---|---|---|
| Replication: Whether the finding has been independently observed across separate sources. | Single source only. | Two independent sources. | Three or more independent sources pointing in the same direction. |
| Source Quality: The evidentiary weight of the underlying data. | Forum or anecdotal data only. | Observational, registry, or pharmacovigilance data. | Peer-reviewed human study or clinical trial. |
| Specificity: Whether the outcome is clearly defined and clinically relevant to the condition. | Vague outcome (“improved,” “felt better”). | Symptom-specific outcome (pelvic pain, cycle regularity, mood lability). | Clearly defined condition-specific clinical endpoint. |
| Biological Plausibility: The strength and specificity of the mechanistic rationale. | Unclear or absent mechanism. | Broad but plausible mechanism. | Well-characterized drug-target-pathway-disease fit (e.g., COX-2 inhibition and prostaglandin dysregulation in endometriosis). |
| Consistency of Direction: Whether the effect direction is consistent across sources. | Mixed or conflicting findings. | Mostly consistent direction. | Clearly consistent direction across all sources. |
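One way to represent a scored signal is a small record that enforces the 0 to 2 range on each dimension and sums to a total out of 10. The class and field names are hypothetical; only the five dimensions and the scoring range are taken from the framework above.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class SignalScore:
    """Five-dimension score for one signal; each dimension is 0, 1, or 2."""

    replication: int
    source_quality: int
    specificity: int
    biological_plausibility: int
    consistency_of_direction: int

    def __post_init__(self):
        # Reject any dimension outside the rubric's 0-2 range.
        for field in fields(self):
            value = getattr(self, field.name)
            if value not in (0, 1, 2):
                raise ValueError(f"{field.name} must be 0, 1, or 2, got {value!r}")

    @property
    def total(self) -> int:
        """Sum of the five dimensions, from 0 to 10."""
        return sum(getattr(self, field.name) for field in fields(self))


example = SignalScore(
    replication=1,
    source_quality=2,
    specificity=2,
    biological_plausibility=1,
    consistency_of_direction=1,
)
```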
Confidence tiers
Total scores map to four confidence tiers displayed on every signal card:
Strong
Highly replicated, well-characterized signals with consistent direction across multiple evidence types.
Moderate
Replicated findings with solid mechanistic rationale.
Emerging
Early-stage evidence with some corroboration or mechanistic support.
Exploratory
Single-source, mechanistic, or low-specificity signals included for hypothesis generation only.
Category-specific minimum standards
Direct Research
The highest-confidence category carries the highest bar. Minimum requirements: at least one peer-reviewed human study with a clearly identified population, drug, outcome, and effect direction. Signals are excluded if they are mechanistic only, with no human data. Preferred: at least one prospective study, trial, or meta-analysis. Quality criteria prioritize replication and outcome relevance over citation count; a highly cited older paper with no replication is not equivalent to two recent independent studies with similar findings.
Cross-Condition Signals
These signals are hypothesis-generating by nature. Minimum requirements: the signal must appear in at least two independent evidence domains (published literature plus FDA AEMS, adverse event data plus community reports, or similar cross-domain corroboration), with the same direction of effect and a plausible shared biological mechanism. Three or more formal source mentions pointing in the same direction also qualify. Vague similarity between conditions is not sufficient; a documented shared pathway is required.
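The minimum standard above can be read as a simple predicate. The sketch below is one interpretation: it assumes the shared-pathway requirement applies to both qualifying routes and that only "improves" and "worsens" count as a shared direction. The `Evidence` record and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    domain: str      # e.g. "literature", "adverse_events", "community"
    direction: str   # "improves", "worsens", "mixed", or "unclear"
    formal: bool     # assumption: True for literature and pharmacovigilance sources


def meets_cross_condition_minimum(evidence, has_shared_pathway: bool) -> bool:
    """Check the cross-condition minimum under the assumptions stated above."""
    if not has_shared_pathway:
        return False
    for direction in ("improves", "worsens"):
        same_direction = [item for item in evidence if item.direction == direction]
        domains = {item.domain for item in same_direction}
        formal_mentions = sum(1 for item in same_direction if item.formal)
        # Route 1: two or more independent domains. Route 2: three or more formal mentions.
        if len(domains) >= 2 or formal_mentions >= 3:
            return True
    return False
```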
Pathway Insights
Pathway signals are powerful but easy to overinterpret. Minimum requirements: a specific named mechanism (mast cell activation, prostaglandin signaling, or androgen receptor modulation, not generic "inflammation"), at least one known drug target link, and at least one disease pathway link. Pathway-only signals with no human or pharmacovigilance corroboration are classified Exploratory and displayed with explicit framing. Pathway signals paired with human observation are classified Emerging or Moderate. Pathway signals with human observation plus independent replication are classified Strong.
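These rules amount to a small decision procedure, sketched below. Because the text does not say how Emerging and Moderate are separated for pathway signals with human observation, the function returns the list of eligible tiers rather than a single tier. All names are illustrative.

```python
def eligible_pathway_tiers(
    has_named_mechanism: bool,
    has_drug_target_link: bool,
    has_disease_pathway_link: bool,
    has_human_corroboration: bool,
    has_independent_replication: bool,
) -> list:
    """Return the confidence tiers a pathway signal is eligible for.

    An empty list means the signal fails the minimum requirements.
    """
    if not (has_named_mechanism and has_drug_target_link and has_disease_pathway_link):
        return []
    if not has_human_corroboration:
        # Pathway-only signals are shown with explicit exploratory framing.
        return ["Exploratory"]
    if has_independent_replication:
        return ["Strong"]
    return ["Emerging", "Moderate"]
```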
Community Forum Reports
This category requires the clearest guardrails. Minimum requirements: 5 or more distinct posts with specific exposure-outcome language from unique users. Raw volume alone is insufficient; the framework still requires specificity (not "metformin changed things" but "after starting metformin, my cycles shortened and acne improved"), directionality (improvement, worsening, or no change), and unique-user diversity across threads. Obvious reposts, promotional content, and low-content comments are excluded. Replication is graded on a 0–2 scale (0 = 5–7 posts, 1 = 8–14 posts, 2 = 15 or more posts). Signals with 15 or more qualifying mentions and consistent directional language are eligible for Moderate classification, particularly when triangulated with a formal source. Whel also tracks which forums a signal appears in, the time period of discussion, and whether the signal persists over time or reflects a temporary spike.
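The post-count thresholds translate directly into a lookup. In this sketch the function name is hypothetical, and the input is assumed to be the count of distinct qualifying posts after reposts, promotional content, and low-content comments have already been excluded.

```python
from typing import Optional


def community_replication_score(distinct_qualifying_posts: int) -> Optional[int]:
    """Map a qualifying post count to the 0-2 replication grade.

    Returns None when the count is below the five-post minimum.
    """
    if distinct_qualifying_posts < 5:
        return None
    if distinct_qualifying_posts <= 7:
        return 0
    if distinct_qualifying_posts <= 14:
        return 1
    return 2
```

For example, a signal backed by 9 qualifying posts would receive a replication grade of 1.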
Cross-cutting reliability rules
For every signal across all four categories, Whel applies five cross-cutting reliability checks:
Outcome specificity
"Improved" is insufficient. Qualifying outcomes include pelvic pain, heavy bleeding, cycle regularity, mood lability in luteal phase, vulvar burning, and similar condition specific clinical endpoints.
Effect directionality
Every signal must be classified as one of: improves, worsens, mixed, or unclear.
Replication
One source is interesting. Two or more independent sources start to constitute a signal.
Confounding assessment
Known confounders are flagged: drugs with multiple indications where symptom improvement may be indirect, forum populations reporting multiple concurrent therapies, and adverse event data that may reflect reporting bias rather than true incidence.
Denominator awareness
FDA AEMS and community data do not provide true incidence rates. They are signal-generating sources, not causal datasets. All signals from these sources are labeled accordingly and require corroboration before elevation above Emerging.
Frequency is not truth.
A rare but repeatedly observed, highly specific signal from a single credible source may carry more evidential weight than 500 vague forum mentions. Whel's scoring framework is designed to privilege specificity, reproducibility, and triangulation over raw volume.
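The denominator-awareness rule can be expressed as a cap applied after scoring. This is a sketch under assumptions: the tier order is inferred from how the tiers are described elsewhere on this page, and the source labels and function name are hypothetical.

```python
# Assumed ordering from lowest to highest confidence.
TIERS = ["Exploratory", "Emerging", "Moderate", "Strong"]

# Sources that generate signals but carry no true incidence denominator.
SIGNAL_GENERATING_SOURCES = {"FDA AEMS", "Community Forum Reports"}


def cap_tier(proposed_tier: str, source: str, corroborated: bool) -> str:
    """Hold uncorroborated signals from denominator-free sources at Emerging."""
    if source in SIGNAL_GENERATING_SOURCES and not corroborated:
        cap = TIERS.index("Emerging")
        return TIERS[min(TIERS.index(proposed_tier), cap)]
    return proposed_tier
```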
Database and infrastructure
Documented limitations
Whel is a signal aggregator rather than a clinical recommendation engine. Evidence strength classifications are generated by a language model against a published five-dimension rubric and should be treated as a starting point for further investigation, not a definitive assessment. Community Forum Reports reflect patient-reported patterns and are explicitly not clinical evidence. The absence of signals in the Direct Research arm for a given condition is itself data; it reflects the current state of published research rather than a gap in the tool. The full list of known limitations is documented below, grouped for navigation.