Whel

Technical architecture.

The data pipelines, the five-dimension scoring framework, the infrastructure, and the documented limitations behind every signal.

Figure 1 · Pipeline register

The data pipelines

Whel runs five active data pipelines that populate the database on demand. A sixth pipeline (EudraVigilance) is implemented but not yet contributing signals to the current snapshot.

| Pipeline | Evidence arm | API | Status |
| --- | --- | --- | --- |
| PubMed | Direct Research | NCBI Entrez | Active |
| ClinicalTrials.gov | Direct Research | REST API v2 | Active |
| FDA AEMS | Cross-Condition Signals | OpenFDA | Active |
| Open Targets Platform | Pathway Insights | GraphQL | Active |
| Reddit | Community Forum Reports | Public JSON | Active |
| EudraVigilance EVDAS | Cross-Condition Signals | Oracle BI API | In development |

PubMed

Queries the NCBI Entrez API for published studies that directly investigate each condition. Searches are condition-specific and filtered for relevance. Results are parsed for study type, date, and abstract, then passed to Claude Opus for signal extraction and evidence-strength classification.
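
As a rough illustration of the query step, the sketch below builds an Entrez `esearch` URL for one condition. The function name, the `[Title/Abstract]` field restriction, and the default `retmax` are assumptions made for this example, not Whel's actual search strategy.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_search_url(condition: str, extra_terms: str = "", retmax: int = 50) -> str:
    """Build an esearch URL for one condition (hypothetical helper).

    The Title/Abstract restriction and relevance sort are assumptions;
    the real pipeline's filters are not documented here.
    """
    term = f'"{condition}"[Title/Abstract]'
    if extra_terms:
        term = f"{term} AND ({extra_terms})"
    params = {
        "db": "pubmed",
        "term": term,
        "retmax": retmax,
        "retmode": "json",
        "sort": "relevance",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_pubmed_search_url("endometriosis", "drug therapy"))
```

The returned IDs would then be fetched and parsed in a separate step before any model-based classification.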

ClinicalTrials.gov

Queries the ClinicalTrials.gov REST API v2 for active, completed, and recruiting trials targeting each condition. Trial phase, status, and intervention type are captured and stored alongside the primary signal.
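
The request and the captured fields might look roughly like the following sketch. The status list (in particular mapping "active" to `ACTIVE_NOT_RECRUITING`), the helper names, and the page size are assumptions; the field paths follow the v2 study JSON layout as commonly documented and should be checked against the live API.

```python
from typing import Optional
from urllib.parse import urlencode

CTGOV_STUDIES_URL = "https://clinicaltrials.gov/api/v2/studies"

# Assumed mapping of "active, completed, and recruiting" to v2 status values.
STATUSES = ("RECRUITING", "ACTIVE_NOT_RECRUITING", "COMPLETED")


def build_trials_url(condition: str, page_size: int = 100,
                     page_token: Optional[str] = None) -> str:
    """Build a studies query for one condition (hypothetical helper)."""
    params = {
        "query.cond": condition,
        "filter.overallStatus": ",".join(STATUSES),
        "pageSize": page_size,
    }
    if page_token:
        # Token returned by the previous page, used to request the next one.
        params["pageToken"] = page_token
    return f"{CTGOV_STUDIES_URL}?{urlencode(params)}"


def summarize_study(study: dict) -> dict:
    """Pull phase, status, and intervention types out of one study record."""
    protocol = study.get("protocolSection", {})
    interventions = protocol.get("armsInterventionsModule", {}).get("interventions", [])
    return {
        "nct_id": protocol.get("identificationModule", {}).get("nctId"),
        "status": protocol.get("statusModule", {}).get("overallStatus"),
        "phases": protocol.get("designModule", {}).get("phases", []),
        "intervention_types": sorted({i["type"] for i in interventions if i.get("type")}),
    }
```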

FDA Adverse Event Monitoring System (AEMS) [Formerly FAERS]

Queries the OpenFDA public API, which exposes reports from the FDA adverse-event system (AEMS, formerly FAERS), using a two-pass approach: the first pass targets gynecological and hormonal reaction terms, and the second broadens to general reaction terms. Reports are filtered to female patients and analyzed for signals suggesting off-label benefit. URL encoding and pagination are handled to maximize coverage across all six conditions.
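
A minimal sketch of the two-pass idea, assuming the OpenFDA drug event endpoint and its `search`, `limit`, and `skip` parameters. The term lists are illustrative placeholders and the helper names are hypothetical; the pipeline's real term lists are not given here.

```python
from typing import Iterator, List, Optional
from urllib.parse import urlencode

OPENFDA_EVENT_URL = "https://api.fda.gov/drug/event.json"

# Illustrative placeholders only, not Whel's actual term lists.
PASS_1_TERMS = ["Pelvic pain", "Hot flush"]   # targeted gynecological/hormonal terms
PASS_2_TERMS = ["Pain", "Fatigue"]            # broader general reaction terms


def build_search(reaction_terms: List[str], drug: Optional[str] = None) -> str:
    """Compose an OpenFDA search expression restricted to female patients."""
    clauses = ["patient.patientsex:2"]  # 2 = female in OpenFDA's coding
    if drug:
        clauses.append(f'patient.drug.medicinalproduct:"{drug}"')
    reactions = " OR ".join(
        f'patient.reaction.reactionmeddrapt:"{t}"' for t in reaction_terms
    )
    clauses.append(f"({reactions})")
    return " AND ".join(clauses)


def page_urls(search: str, total: int, limit: int = 100) -> Iterator[str]:
    """Yield one URL per page; urlencode handles quoting and spaces."""
    for skip in range(0, total, limit):
        yield f"{OPENFDA_EVENT_URL}?{urlencode({'search': search, 'limit': limit, 'skip': skip})}"


if __name__ == "__main__":
    for terms in (PASS_1_TERMS, PASS_2_TERMS):
        for url in page_urls(build_search(terms, drug="metformin"), total=250):
            print(url)
```

Running the targeted pass first keeps the most relevant reports from being crowded out by the much larger general pass.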

Open Targets Platform

Queries the Open Targets Platform GraphQL API (platform.opentargets.org) for each condition using standardized EFO and MONDO disease ontology identifiers. Retrieves drug candidates, mechanistic associations, and biological target scores aggregated from genetic association data, known drug-target interactions, Reactome pathway analysis, and differential gene expression. Results are analyzed by Claude Opus for pathway-level repurposing hypotheses. No authentication is required.
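
For orientation, a GraphQL request for target associations could be assembled as below. This is a sketch under assumptions: the endpoint URL and the `disease` / `associatedTargets` fields reflect the public Open Targets schema as generally documented and may change between releases, the helper name is hypothetical, and the disease identifier in the usage line is a placeholder to be replaced with a verified ontology ID.

```python
import json
import urllib.request

OPEN_TARGETS_GRAPHQL_URL = "https://api.platform.opentargets.org/api/v4/graphql"

# Assumed query shape; verify field names against the current schema.
ASSOCIATED_TARGETS_QUERY = """
query associatedTargets($efoId: String!, $size: Int!) {
  disease(efoId: $efoId) {
    id
    name
    associatedTargets(page: {index: 0, size: $size}) {
      rows {
        target { id approvedSymbol }
        score
      }
    }
  }
}
"""


def build_request(disease_id: str, size: int = 25) -> urllib.request.Request:
    """Prepare (but do not send) an unauthenticated GraphQL POST request."""
    payload = json.dumps({
        "query": ASSOCIATED_TARGETS_QUERY,
        "variables": {"efoId": disease_id, "size": size},
    }).encode("utf-8")
    return urllib.request.Request(
        OPEN_TARGETS_GRAPHQL_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # "EFO_XXXXXXX" is a placeholder, not a real identifier.
    req = build_request("EFO_XXXXXXX")
    print(req.get_method(), req.full_url)
```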

Reddit

Queries condition-specific subreddits (r/Endo, r/PCOS, r/PMDD, r/Menopause, r/adenomyosis, r/vulvodynia) using eight treatment-focused search queries per subreddit. Individual post permalinks are stored and validated; URLs must contain /comments/ to confirm they are post-level rather than subreddit-level. Posts are grouped by subreddit in citation display. The pipeline looks for consistent patterns across many posts, not individual anecdotes.
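
The permalink check and subreddit grouping can be sketched as follows. The source only states that URLs must contain /comments/; the stricter path-shape and host checks here, along with the function names and sample URLs, are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional
from urllib.parse import urlparse


def post_subreddit(url: str) -> Optional[str]:
    """Return the subreddit name if the URL is a post-level permalink, else None.

    Expects a path shaped like /r/<subreddit>/comments/<post_id>/...
    """
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host != "reddit.com" and not host.endswith(".reddit.com"):
        return None
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) >= 4 and parts[0] == "r" and parts[2] == "comments":
        return parts[1]
    return None


def group_by_subreddit(urls: Iterable[str]) -> Dict[str, List[str]]:
    """Keep only valid post permalinks, grouped by subreddit for display."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for url in urls:
        subreddit = post_subreddit(url)
        if subreddit is not None:
            grouped[subreddit].append(url)
    return dict(grouped)


if __name__ == "__main__":
    sample = [
        "https://www.reddit.com/r/PCOS/comments/abc123/example_title/",  # hypothetical post
        "https://www.reddit.com/r/PCOS/",                                # subreddit-level, rejected
    ]
    print(group_by_subreddit(sample))
```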

EudraVigilance EVDAS (in development, not yet contributing signals)

Queries the European Medicines Agency adverse event database (dap.ema.europa.eu) via the Oracle BI Analytics API. Substance codes are resolved via the public adrreports.eu substance table. Female patient reaction data is filtered and grouped by condition. Session authentication requires a free registered EMA account. The pipeline is implemented, but its output has not yet been ingested into the current database snapshot.

Figure 2 · Scoring framework

How evidence is scored

Whel applies a structured, multidimensional inclusion framework to every signal before it enters the database. The goal is a tiered evidence framework with minimum standards for reliability, reproducibility, and actionability, rather than a single universal cutoff. The framework was developed in consultation with published research on evidence synthesis and pharmacovigilance methodology, drawing on established practices in systematic review design and drug repurposing research.

Model Selection: Claude Opus 4.6

All signal analysis and scoring is performed using Claude Opus 4.6 (claude-opus-4-6), Anthropic's most capable model. Opus 4.6 was selected specifically for its performance on complex multicriteria reasoning tasks. Independent benchmarks consistently place Opus at the top of evaluations requiring simultaneous assessment across multiple analytical dimensions, precisely what evidence scoring requires. Smaller or faster models were evaluated and found to produce flatter, less discriminating scores, particularly on biological plausibility and confounding risk assessment. For a tool where the quality of the evidence evaluation is the core product, model selection is not a minor implementation detail.

The Five-Dimension Scoring Framework

Every signal is independently scored from 0 to 2 on five dimensions, for a maximum total score of 10. Scores are assigned by Claude Opus 4.6 based on the full source content, not just metadata.

| Dimension | What it measures | Score 0 | Score 1 | Score 2 |
| --- | --- | --- | --- | --- |
| Replication | Whether the finding has been independently observed across separate sources. | Single source only. | Two independent sources. | Three or more independent sources pointing in the same direction. |
| Source Quality | The evidentiary weight of the underlying data. | Forum or anecdotal data only. | Observational, registry, or pharmacovigilance data. | Peer-reviewed human study or clinical trial. |
| Specificity | Whether the outcome is clearly defined and clinically relevant to the condition. | Vague outcome (“improved,” “felt better”). | Symptom-specific outcome (pelvic pain, cycle regularity, mood lability). | Clearly defined, condition-specific clinical endpoint. |
| Biological Plausibility | The strength and specificity of the mechanistic rationale. | Unclear or absent mechanism. | Broad but plausible mechanism. | Well-characterized fit between drug target, pathway, and disease (e.g., COX-2 inhibition and prostaglandin dysregulation in endometriosis). |
| Consistency of Direction | Whether the effect direction is consistent across sources. | Mixed or conflicting findings. | Mostly consistent direction. | Clearly consistent direction across all sources. |
Figure 3 · Tier mapping

Confidence tiers

Total scores map to four confidence tiers displayed on every signal card:

Strong
Composite 9 to 10

Highly replicated, well-characterized signals with consistent direction across multiple evidence types.

Moderate
Composite 7 to 8

Replicated findings with solid mechanistic rationale.

Emerging
Composite 4 to 6

Early-stage evidence with some corroboration or mechanistic support.

Exploratory
Composite 0 to 3

Single-source, mechanistic, or low-specificity signals included for hypothesis generation only.
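
The arithmetic behind the composite score and tier mapping is simple enough to sketch directly. The class and function names are hypothetical; the 0 to 2 range per dimension and the tier boundaries are taken from the framework described above.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class SignalScores:
    """Five dimension scores, each an integer from 0 to 2."""
    replication: int
    source_quality: int
    specificity: int
    plausibility: int
    direction: int

    def __post_init__(self) -> None:
        for value in astuple(self):
            if value not in (0, 1, 2):
                raise ValueError(f"dimension scores must be 0, 1, or 2; got {value!r}")

    @property
    def total(self) -> int:
        # Composite score: simple sum, maximum 10.
        return sum(astuple(self))


def tier_for(total: int) -> str:
    """Map a composite score (0 to 10) to its confidence tier."""
    if not 0 <= total <= 10:
        raise ValueError("composite score must be between 0 and 10")
    if total >= 9:
        return "Strong"
    if total >= 7:
        return "Moderate"
    if total >= 4:
        return "Emerging"
    return "Exploratory"


if __name__ == "__main__":
    scores = SignalScores(replication=2, source_quality=2, specificity=1,
                          plausibility=2, direction=1)
    print(scores.total, tier_for(scores.total))  # 8 Moderate
```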

Figure 4 · Category standards

Category-specific minimum standards

Direct Research

The highest-confidence category carries the highest bar. Minimum requirements: at least one peer-reviewed human study with a clearly identified population, drug, outcome, and effect direction. Signals are excluded if they are mechanistic only, with no human data. Preferred: at least one prospective study, trial, or meta-analysis. Quality criteria prioritize replication and outcome relevance over citation count; a highly cited older paper with no replication is not equivalent to two recent independent studies with similar findings.

Cross-Condition Signals

These signals are hypothesis generating by nature. Minimum requirements: the signal must appear in at least two independent evidence domains (published literature plus FDA AEMS, adverse event data plus community reports, or similar cross-domain corroboration), with the same direction of effect and a plausible shared biological mechanism. Three or more formal source mentions pointing in the same direction also qualify. Vague similarity between conditions is not sufficient; a documented shared pathway is required.
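
One way to read the minimum standard above as a check is sketched below. The data model, the domain labels, and the decision to require a documented shared pathway on both qualifying routes are interpretations for illustration, not a statement of how Whel implements the rule.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Evidence:
    domain: str      # assumed labels, e.g. "literature", "adverse_events", "community"
    direction: str   # "improves", "worsens", "mixed", or "unclear"
    formal: bool     # True for literature or pharmacovigilance, False for forum posts


def meets_cross_condition_minimum(evidence: Iterable[Evidence],
                                  has_documented_shared_pathway: bool) -> bool:
    """Return True if the evidence clears the cross-condition minimum bar."""
    if not has_documented_shared_pathway:
        return False  # vague similarity between conditions is not enough
    items = list(evidence)
    for direction in ("improves", "worsens"):
        same_direction = [e for e in items if e.direction == direction]
        domains = {e.domain for e in same_direction}
        formal_mentions = sum(1 for e in same_direction if e.formal)
        # Route 1: two or more independent evidence domains agree.
        # Route 2: three or more formal source mentions agree.
        if len(domains) >= 2 or formal_mentions >= 3:
            return True
    return False
```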

Pathway Insights

Pathway signals are powerful but easy to overinterpret. Minimum requirements: a specific named mechanism (mast cell activation, prostaglandin signaling, or androgen receptor modulation, not generic "inflammation"), at least one known drug target link, and at least one disease pathway link. Pathway-only signals with no human or pharmacovigilance corroboration are classified Exploratory and displayed with explicit framing. Pathway signals paired with human observation are classified Emerging or Moderate. Pathway signals with human observation plus independent replication are classified Strong.
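
The tier rules for pathway signals can be written as a small decision function. This is a sketch: the parameter names are hypothetical, and since the text leaves the choice between Emerging and Moderate to the composite score, the function returns the set of tiers the rules allow rather than a single tier.

```python
from typing import Tuple


def meets_pathway_minimum(named_mechanism: str, drug_target_links: int,
                          disease_pathway_links: int) -> bool:
    """Minimum bar: a specific named mechanism plus at least one link of each kind.

    Judging whether a mechanism name is specific (rather than generic
    "inflammation") is left to the caller; only presence is checked here.
    """
    return bool(named_mechanism.strip()) and drug_target_links >= 1 and disease_pathway_links >= 1


def allowed_pathway_tiers(has_human_or_pv_corroboration: bool,
                          has_independent_replication: bool) -> Tuple[str, ...]:
    """Return the tiers a pathway signal may receive under the stated rules."""
    if not has_human_or_pv_corroboration:
        return ("Exploratory",)          # pathway-only signal
    if has_independent_replication:
        return ("Strong",)               # human observation plus replication
    return ("Emerging", "Moderate")      # human observation without replication
```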

Community Forum Reports

This category requires the clearest guardrails. Minimum requirements: 5 or more distinct posts with specific exposure-outcome language from unique users. Raw volume alone is insufficient; the framework still requires specificity (not "metformin changed things" but "after starting metformin, my cycles shortened and acne improved"), directionality (improvement, worsening, or no change), and unique-user diversity across threads. Obvious reposts, promotional content, and low-content comments are excluded. Replication is graded on a 0–2 scale (0 = 5–7 posts, 1 = 8–14 posts, 2 = 15 or more posts). Signals with 15 or more qualifying mentions and consistent directional language are eligible for Moderate classification, particularly when triangulated with a formal source. Whel also tracks which forums a signal appears in, the time period of discussion, and whether the signal persists over time or reflects a temporary spike.
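
The replication grading for forum signals reduces to a threshold lookup, sketched here. The function names and the `(author, post_id)` input shape are assumptions; the thresholds come from the scale above, and filtering for specificity and directionality is assumed to happen before counting.

```python
from typing import Iterable, Optional, Tuple


def count_qualifying_posts(posts: Iterable[Tuple[str, str]]) -> int:
    """Count posts from unique users, keeping one post per author.

    Each item is an (author, post_id) pair for a post that already passed
    the specificity and directionality filters.
    """
    return len({author for author, _post_id in posts})


def community_replication_score(qualifying_posts: int) -> Optional[int]:
    """Grade replication on the 0-2 scale; None means below the 5-post minimum."""
    if qualifying_posts < 5:
        return None
    if qualifying_posts <= 7:
        return 0
    if qualifying_posts <= 14:
        return 1
    return 2
```

Returning `None` below the minimum keeps "does not qualify" distinct from a qualifying signal that scores 0.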

Figure 5 · Reliability rules

Cross-cutting reliability rules

For every signal across all four categories, Whel applies five cross-cutting reliability checks:

Outcome specificity

"Improved" is insufficient. Qualifying outcomes include pelvic pain, heavy bleeding, cycle regularity, mood lability in the luteal phase, vulvar burning, and similar condition-specific clinical endpoints.

Effect directionality

Every signal must be classified as one of: improves, worsens, mixed, or unclear.

Replication

One source is interesting. Two or more independent sources start to constitute a signal.

Confounding assessment

Known confounders are flagged: drugs with multiple indications where symptom improvement may be indirect, forum populations reporting multiple concurrent therapies, and adverse event data that may reflect reporting bias rather than true incidence.

Denominator awareness

FDA AEMS and community data do not provide true incidence rates. They are signal-generating sources, not causal datasets. All signals from these sources are labeled accordingly and require corroboration before elevation above Emerging.
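
The elevation rule can be expressed as a cap on the tier, as in this sketch. The source labels and function name are hypothetical, and treating any source outside the two denominator-free types as corroboration is an assumption.

```python
from typing import Set

# Ordered from lowest to highest confidence.
TIER_ORDER = ["Exploratory", "Emerging", "Moderate", "Strong"]

# Assumed labels for sources that lack true incidence denominators.
DENOMINATOR_FREE_SOURCES = {"fda_aems", "community"}


def apply_denominator_cap(tier: str, source_types: Set[str]) -> str:
    """Cap uncorroborated adverse-event or forum signals at Emerging."""
    corroborated = bool(source_types - DENOMINATOR_FREE_SOURCES)
    if corroborated:
        return tier
    # Pick whichever of the two tiers ranks lower.
    return min(tier, "Emerging", key=TIER_ORDER.index)


if __name__ == "__main__":
    print(apply_denominator_cap("Moderate", {"fda_aems"}))            # Emerging
    print(apply_denominator_cap("Moderate", {"fda_aems", "pubmed"}))  # Moderate
```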

One Guiding Principle

Frequency is not truth.

A rare but repeatedly observed, highly specific signal from a single credible source may carry more evidential weight than 500 vague forum mentions. Whel's scoring framework is designed to privilege specificity, reproducibility, and triangulation over raw volume.

Figure 6 · Infrastructure

Database and infrastructure

Database
Supabase (PostgreSQL) with Row Level Security. Core tables include conditions, compounds, repurposing_signals, and sources. The repurposing_signals table stores the five scoring dimensions (replication, source quality, specificity, plausibility, direction), a computed total evidence score, confidence tier, effect direction, and human-readable level labels. Signals are deduplicated both at the pipeline level (by post ID for Reddit, and by compound-condition pair for all sources) and via database constraints.
Frontend
Next.js (TypeScript) with Tailwind CSS, hosted on Vercel. Analytics via Vercel Analytics.
Source deduplication
Sources are deduplicated by URL before storage. The frontend applies additional normalization to prevent the same compound appearing multiple times in the same evidence bucket.
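
URL-based deduplication before storage might look like the sketch below. The normalization steps (lowercasing scheme and host, dropping the fragment and trailing slash) and the `url` key on each source record are assumptions; the text only states that sources are deduplicated by URL.

```python
from typing import Dict, Iterable, List
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Reduce trivially different spellings of the same URL to one key."""
    parts = urlsplit(url.strip())
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path.rstrip("/"),
        parts.query,
        "",  # drop the fragment
    ))


def dedupe_sources(sources: Iterable[Dict]) -> List[Dict]:
    """Keep the first source seen for each normalized URL, preserving order."""
    seen = set()
    unique: List[Dict] = []
    for source in sources:
        key = normalize_url(source["url"])
        if key in seen:
            continue
        seen.add(key)
        unique.append(source)
    return unique
```

A unique constraint on the stored URL column would then act as the database-level backstop for anything the pipeline misses.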
Figure 7 · Limitations

Documented limitations

Whel is a signal aggregator rather than a clinical recommendation engine. Evidence strength classifications are generated by a language model against a published five-dimension rubric and should be treated as a starting point for further investigation, not a definitive assessment. Community Forum Reports reflect patient-reported patterns and are explicitly not clinical evidence. The absence of signals in the Direct Research arm for a given condition is itself data; it reflects the current state of published research rather than a gap in the tool. The full list of known limitations is documented below, grouped for navigation.
