Engine Internals

Technical Methodology

Every formula, gate, and pipeline step that turns a YouTube transcript into a published score. No hand-waving — this is what actually runs. For the high-level overview, see Methodology.

01

Video ingestion

Every 6 hours we poll the YouTube Data API for new uploads across our curated channel list. Each channel has a stable channel_id and an admin-tuned reliability weight (1–100) that reflects the creator's track record.

New videos land in the videos table with metadata only. Transcript fetching and mention extraction run as separate stages, so a failure in one stage doesn't block ingestion of the next channel.

02

Transcript pipeline

Two-pass strategy, every 5 minutes, batched per video:

  1. Pass 1 — native captions. YouTube auto-captions (or creator-provided) via youtube-transcript. Free, fast, ~80% coverage.
  2. Pass 2 — AI speech-to-text. For the ~20% with no captions, Google Gemini transcribes the audio directly. Slower and metered, but no video is left behind.

Both paths produce a single transcript column with the source tagged (youtube / ai / none) so we can audit quality later.
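The two-pass fallback can be sketched roughly as follows. This is a minimal illustration, not the production code: fetch_captions and transcribe_audio are hypothetical callables standing in for the youtube-transcript call and the Gemini transcription call, and the TranscriptResult type is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# A fetcher takes a video ID and returns transcript text, or None if unavailable.
# Both fetchers are hypothetical stand-ins for the real caption / speech-to-text calls.
Fetcher = Callable[[str], Optional[str]]


@dataclass
class TranscriptResult:
    text: Optional[str]
    source: str  # "youtube", "ai", or "none"


def fetch_transcript(
    video_id: str, fetch_captions: Fetcher, transcribe_audio: Fetcher
) -> TranscriptResult:
    """Try native captions first, then fall back to AI speech-to-text."""
    for source, fetcher in (("youtube", fetch_captions), ("ai", transcribe_audio)):
        try:
            text = fetcher(video_id)
        except Exception:
            # A failed pass should not abort the video; try the next pass.
            text = None
        if text and text.strip():
            return TranscriptResult(text=text, source=source)
    # Neither pass produced usable text.
    return TranscriptResult(text=None, source="none")
```

Tagging the source on the result, rather than inferring it later, is what makes the quality audit mentioned above possible.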

03

Mention extraction

Each transcript is passed to Gemini with a structured-output schema. The model returns a JSON array of mentions, each carrying:

{
  "ticker": "NVDA",
  "stance": "bullish" | "neutral" | "bearish",
  "confidence": 0.0 - 1.0,
  "timestamp_seconds": 412,
  "excerpt": "...verbatim quote..."
}

Stance reflects what the creator said about the ticker — not whether we agree. Confidence is the model's certainty that the ticker was actually being discussed (vs. a passing reference or a misheard word).
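Because the model's output is untrusted, it helps to check each item against the schema before it moves on. The sketch below shows one possible way to do that; the Mention dataclass and parse_mentions function are illustrative names, and dropping malformed items silently is an assumption rather than documented behavior.

```python
import json
from dataclasses import dataclass

VALID_STANCES = {"bullish", "neutral", "bearish"}


@dataclass(frozen=True)
class Mention:
    ticker: str
    stance: str
    confidence: float
    timestamp_seconds: int
    excerpt: str


def parse_mentions(raw_json: str) -> list[Mention]:
    """Parse the model's JSON array, skipping items that violate the schema."""
    mentions = []
    for item in json.loads(raw_json):
        try:
            mention = Mention(
                ticker=str(item["ticker"]),
                stance=str(item["stance"]),
                confidence=float(item["confidence"]),
                timestamp_seconds=int(item["timestamp_seconds"]),
                excerpt=str(item["excerpt"]),
            )
        except (KeyError, TypeError, ValueError):
            continue  # missing field or wrong type
        if mention.stance not in VALID_STANCES:
            continue  # stance outside the three allowed values
        if not 0.0 <= mention.confidence <= 1.0:
            continue  # confidence outside the documented 0.0 to 1.0 range
        mentions.append(mention)
    return mentions
```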

04

Ticker validation

Models hallucinate. Before any mention can move forward, the ticker symbol is checked against Yahoo Finance — if the symbol doesn't resolve to a real listed instrument it's quarantined and removed by a weekly cleanup-invalid-tickers cron job. Validated symbols are cached in the tickers table with exchange, name, and logo.
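A rough sketch of the validate-then-cache flow follows. The lookup callable is a hypothetical stand-in for the Yahoo Finance check (no real Yahoo Finance API is called here), and the in-memory dict and set merely stand in for the tickers table and the quarantine that the weekly job clears.

```python
from typing import Callable, Optional

# Hypothetical lookup: returns {"exchange", "name", "logo"} for a real symbol,
# or None when the symbol does not resolve to a listed instrument.
Lookup = Callable[[str], Optional[dict]]


class TickerValidator:
    def __init__(self, lookup: Lookup):
        self._lookup = lookup
        self.cache: dict[str, dict] = {}    # stands in for the tickers table
        self.quarantined: set[str] = set()  # awaiting the weekly cleanup job

    def validate(self, symbol: str) -> bool:
        """Return True if the symbol is known-good, consulting the lookup at most once."""
        if symbol in self.cache:
            return True
        if symbol in self.quarantined:
            return False
        info = self._lookup(symbol)
        if info is None:
            self.quarantined.add(symbol)
            return False
        self.cache[symbol] = info
        return True
```

Caching both outcomes keeps repeated mentions of the same symbol from triggering repeated external lookups.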

05

Scoring formula

This is the real math, run on every recompute (auto-chained after every new transcript batch). For each ticker T across a rolling 14-day window:

Step 1 — sign each mention
sign = +1   if stance == "bullish"
       -1   if stance == "bearish"
       +0.2 if stance == "neutral"

mention_value = sign × confidence

Neutral mentions count, but barely — they nudge the score without dominating it. A creator saying "I'm watching NVDA" shouldn't move the needle the same as "I'm buying NVDA."

Step 2 — collapse per creator, clamp
for each creator C who mentioned T:
  creator_avg[C] = mean(mention_value over C's mentions of T)
  creator_avg[C] = clamp(creator_avg[C], -1, +1)

Averaging collapses each creator to a single value, so a creator with many high-confidence bullish mentions can't run away with the score. The clamp is a safeguard that keeps that value within [-1, +1], so one creator can move the raw sum by at most ± their weight.

Step 3 — weight and sum
raw = Σ ( weight[C] × creator_avg[C] )   for all creators C of T

theoretical_max = max_weight × creator_count
consensus_bonus = 1.15 if creator_count >= 3 else 1.00

Step 4 — normalize to 0–100
deviation = (raw / theoretical_max) × 50 × consensus_bonus
score     = round( clamp( 50 + deviation, 0, 100 ) )

50 is neutral. Pure bullish consensus across high-weight creators pushes toward 100; bearish pushes toward 0. The 15% consensus bonus rewards independent corroboration — three creators agreeing is meaningfully different from one creator shouting.
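Steps 1 through 4 can be put together in a short Python sketch. Two things here are assumptions, not taken from the text: max_weight defaults to 100 (the top of the 1–100 weight range), and an empty mention list returns the neutral 50 to avoid dividing by zero. Python's round() also rounds exact halves to the nearest even integer, which may differ from the production rounding.

```python
from collections import defaultdict
from statistics import mean

# Step 1: sign per stance.
SIGN = {"bullish": 1.0, "bearish": -1.0, "neutral": 0.2}


def clamp(x, lo, hi):
    return max(lo, min(hi, x))


def score_ticker(mentions, weights, max_weight=100):
    """mentions: iterable of (creator, stance, confidence) for one ticker.
    weights: dict mapping creator -> reliability weight.
    max_weight=100 is an assumption based on the 1-100 weight range."""
    per_creator = defaultdict(list)
    for creator, stance, confidence in mentions:
        per_creator[creator].append(SIGN[stance] * confidence)
    if not per_creator:
        return 50  # assumption: no mentions means neutral

    # Step 2: collapse to one clamped average per creator.
    # Step 3: weight and sum.
    raw = sum(
        weights[c] * clamp(mean(values), -1.0, 1.0)
        for c, values in per_creator.items()
    )
    creator_count = len(per_creator)
    theoretical_max = max_weight * creator_count
    consensus_bonus = 1.15 if creator_count >= 3 else 1.00

    # Step 4: normalize to 0-100 around a neutral 50.
    deviation = (raw / theoretical_max) * 50 * consensus_bonus
    return round(clamp(50 + deviation, 0, 100))


example = [
    ("A", "bullish", 0.9),
    ("A", "bullish", 0.7),
    ("B", "neutral", 0.5),
    ("C", "bearish", 0.6),
]
print(score_ticker(example, {"A": 80, "B": 50, "C": 100}))  # prints 52
```

In the example, creator A averages 0.8 (contributing 64), B contributes 0.1 × 50 = 5, and C contributes -0.6 × 100 = -60, so raw = 9. With three creators, theoretical_max = 300 and the bonus applies: 9 / 300 × 50 × 1.15 = 1.725, giving a score of 52.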

06

Coverage gates

A score alone doesn't get published. The pick must clear:

  • ≥ 2 distinct creators, OR
  • ≥ 3 total mentions from the same creator

This kills single-mention noise. A ticker name-dropped once by one creator never becomes a published recommendation, regardless of how confidently the model extracted it.
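The gate reduces to a small predicate. The sketch below assumes each mention can be represented by the ID of the creator who made it; the function name is illustrative.

```python
from collections import Counter


def passes_coverage_gate(mention_creators):
    """mention_creators: one creator ID per mention of the ticker in the window."""
    counts = Counter(mention_creators)
    two_or_more_creators = len(counts) >= 2
    three_from_one_creator = any(n >= 3 for n in counts.values())
    return two_or_more_creators or three_from_one_creator


print(passes_coverage_gate(["alice"]))                    # False
print(passes_coverage_gate(["alice", "alice", "alice"]))  # True
print(passes_coverage_gate(["alice", "bob"]))             # True
```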

The previously published score remains live until the next recompute changes it by ≥ 5 points or the contributing mentions change, at which point the AI thesis is invalidated and regenerated.
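One way to express that invalidation rule is sketched below. It assumes mentions carry stable IDs so that "the contributing mentions change" can be tested as a set comparison; that representation, and the function name, are assumptions for illustration.

```python
def thesis_is_stale(published_score, new_score,
                    published_mention_ids, new_mention_ids, threshold=5):
    """Return True when the published thesis should be regenerated."""
    score_moved = abs(new_score - published_score) >= threshold
    mentions_changed = set(published_mention_ids) != set(new_mention_ids)
    return score_moved or mentions_changed


print(thesis_is_stale(72, 74, [1, 2, 3], [1, 2, 3]))     # False: small move, same mentions
print(thesis_is_stale(72, 77, [1, 2, 3], [1, 2, 3]))     # True: moved by 5 points
print(thesis_is_stale(72, 72, [1, 2, 3], [1, 2, 3, 4]))  # True: new mention added
```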

07

AI reasoning regeneration

Every published pick carries a written thesis — the "why" you see on the stock page. It's generated by a separate Gemini pass that ingests:

  • The ticker's verbatim mention excerpts (with timestamps)
  • Each contributing creator's name and weight
  • The final score and stance distribution

When the underlying mentions or score shift materially, the thesis is marked stale and re-queued. A worker drains the queue every 5 minutes, so the public site never shows a thesis that contradicts the current score.

08

Backtest engine

We measure ourselves. Every published pick with score ≥ 70 is auto-enrolled into a hypothetical $100 position, captured at the next trading day's open after publication.

entry_date  = first trading day after published_at
entry_price = open[entry_date]
spy_entry   = open[entry_date] for SPY

window prices captured at +1M, +3M, +6M, +1Y:
  pick_return  = (price[window] / entry_price) - 1
  spy_return   = (spy[window]   / spy_entry)   - 1
  alpha        = pick_return - spy_return

Snapshots fill in automatically as windows mature (daily cron). We aggregate results per creator and site-wide into a leaderboard. The dashboard is currently admin-only until the sample size is large enough to publish honestly: back-filling old picks with synthetic entry prices would distort the record, so we're letting it accumulate forward-only.

Assumes no fees, slippage, taxes, or dividends. The alpha figure is purely illustrative; past performance doesn't guarantee future results.
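The entry-date and return arithmetic can be sketched as below. The sorted trading_days list is a hypothetical input (a real system would take it from a market calendar), and treating published_at as a calendar date is an assumption. Prices in the example are made up for illustration.

```python
from bisect import bisect_right
from datetime import date


def entry_date(published: date, trading_days: list[date]) -> date:
    """First trading day strictly after the publication date.
    trading_days must be sorted ascending (hypothetical calendar input)."""
    i = bisect_right(trading_days, published)
    if i == len(trading_days):
        raise ValueError("no trading day after publication in the calendar")
    return trading_days[i]


def window_alpha(entry_price, window_price, spy_entry, spy_window):
    """Return (pick_return, spy_return, alpha) for one matured window."""
    pick_return = window_price / entry_price - 1
    spy_return = spy_window / spy_entry - 1
    return pick_return, spy_return, pick_return - spy_return


# Published on a Friday: entry falls on the following Monday.
days = [date(2024, 3, 1), date(2024, 3, 4), date(2024, 3, 5)]
print(entry_date(date(2024, 3, 1), days))  # 2024-03-04

# Illustrative prices: pick 100 -> 112, SPY 400 -> 420.
pick, spy, alpha = window_alpha(100.0, 112.0, 400.0, 420.0)
print(round(pick, 4), round(spy, 4), round(alpha, 4))  # 0.12 0.05 0.07
```

On the hypothetical $100 position, a pick_return of 0.12 corresponds to a value of $112 at that window.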

09

Limits & honesty

What we explicitly do not do:

  • No price targets. We don't predict where a stock will go — we surface what high-conviction creators are saying, weighted by their reliability.
  • No recency decay yet. Within the 14-day window all mentions count equally. A future revision may weight more recent mentions higher.
  • No sector or macro weighting. A 100 score on a microcap means the same thing as a 100 score on a megacap. Position-sizing is on you.
  • No emphasis on short signals. Bearish stances are tracked symmetrically, but in practice most viewers act on long ideas.
  • Not investment advice. This is a signal aggregator. Read the disclaimer.