
Phoenix-style scoring vs likes-prediction: what we actually measure

7 min read · by Isaiah Dupree

When people ask "how good is this tweet?", they usually mean "will it get a lot of likes?". That framing is wrong in a useful way: Twitter's ranker doesn't weight likes evenly with other actions, and four of the actions it tracks actively hurt your tweet's reach.

TweetSim's phoenix-style scorer predicts probabilities for 18 distinct user actions, weighted by the coefficients published in Twitter's 2023 open-source ranker release. Here's what each action is worth and what the score is actually telling you.

The 18 actions, by category

Positive engagement (boosts reach)

  • favorite (like) — weight ~0.5. Cheap signal, easy to give, abundant.
  • reply — weight ~30, roughly 60 likes' worth per event. A reply means the user spent energy responding. The most heavily weighted positive action.
  • retweet — ~1.0, about twice a like. The user vouches with their own audience.
  • quote tweet — ~1.0. A retweet plus added context.
  • profile_click — ~12. Surprisingly heavy, because it indicates the tweet made the reader curious about the author. Strong follow predictor.
  • follow_author — ~12. The action the platform most wants to drive.
  • dwell_time — measured in seconds, not probability. A tweet with a 4-second average dwell was read; one at 0.3s was scrolled past.
  • click (link) — ~0.4. Small positive signal that the tweet drove an action.
  • share via DM / copy link — ~0.04. Quiet sharing, minor positive.
  • photo_expand · video_view — ~0.0002. Almost zero weight per event, but high-frequency, so they add up for media tweets.

Negative engagement (throttles reach, hard)

These are the ones most tools ignore. They're the reason engagement-bait tweets often get a brief spike then collapse.
  • not_interested — −74. Each "not interested" click costs you ~150 likes' worth of weight.
  • mute_author — −74. Same magnitude. User chose to stop seeing you.
  • block_author — −74. Same magnitude. Strong permanent signal.
  • report — −369. Five times as bad as a block. Catastrophic if it happens at scale.
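All of these actions feed one weighted sum. A minimal Python sketch, using the approximate weights quoted in this post; the action names and the helper function are illustrative, not TweetSim's actual code:

```python
# Approximate per-action weights as quoted in this post (illustrative).
PHOENIX_WEIGHTS = {
    "favorite": 0.5,
    "reply": 30.0,
    "retweet": 1.0,
    "quote": 1.0,
    "profile_click": 12.0,
    "follow_author": 12.0,
    "click": 0.4,
    "share": 0.04,
    "photo_expand": 0.0002,
    "video_view": 0.0002,
    "not_interested": -74.0,
    "mute_author": -74.0,
    "block_author": -74.0,
    "report": -369.0,
}

def weighted_engagement(probs: dict) -> float:
    """Sum of predicted action probabilities times their phoenix weights."""
    return sum(PHOENIX_WEIGHTS[action] * p for action, p in probs.items())
```

The asymmetry is visible immediately: a 5% not-interested probability (−3.7) wipes out a 40% favorite probability (+0.2) many times over.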

Why this changes how you should write

If you only care about likes, here's the optimal strategy: post pleasant agreeable content that's safe to like, post a lot, ask for likes occasionally. You'll accumulate likes.

If you care about weighted engagement (the thing the algorithm actually uses to decide whether to promote you), the strategy is different:

1. Engagement bait actively hurts you

"RT if you agree" gets retweets but it also gets not_interested clicks from people who recognize the pattern. Net weight: usually negative.

2. Profile click is undervalued

Most tools never tell you to optimize for profile-click probability. Phoenix weights it ~12, roughly 24 likes' worth per event. Tweets that make the reader curious about the author (specific receipts, contrarian takes, "I shipped X this week") drive profile clicks. Vague tweets don't.

3. Replies dominate likes

A tweet that gets 5 replies and 50 likes outscores a tweet that gets 100 likes and 0 replies in phoenix. Reply-trigger writing — questions, contrarian claims, missing-information hooks — is high-leverage.
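The arithmetic behind that claim, using the approximate weights quoted above (favorite ~0.5, reply ~30); a hypothetical worked example, not TweetSim output:

```python
# Illustrative weights from this post, not TweetSim's actual coefficients.
FAV_W, REPLY_W = 0.5, 30.0

tweet_a = 5 * REPLY_W + 50 * FAV_W   # 5 replies, 50 likes  -> 175.0
tweet_b = 100 * FAV_W                # 100 likes, 0 replies ->  50.0
```

Tweet A wins by more than 3×, despite having half the likes.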

4. Annoyance is asymmetric

Each "not interested" cancels ~150 likes' worth of weight (74 ÷ 0.5 ≈ 148). A tweet that gets 200 likes and 5 not-interested clicks is firmly net-negative: +100 from the likes against −370 from the clicks. This is why hype tweets that briefly spike often die fast: the negative actions swamp the positives.

What you actually see in the score

The phoenix breakdown in the TweetSim UI shows the top contributors — usually reply, favorite, profile_click, plus any negative actions that fired. Each row has the predicted probability and its weighted contribution.

reply           18.2%   +5.46
favorite        31.1%   +0.16
profile_click    7.4%   +0.91
follow_author    4.3%   +0.52
not_interested   3.1%   -2.30
              ───────   ─────
              phoenix score   55.1

That's a publishable tweet: the contributions sum to +4.75, net positive with no big negative signals. If not_interested climbed to around 10%, its −74 weight would cancel all of the positive contributions combined, no matter how high the favorite probability climbed.
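To sanity-check the breakdown, here is that example recomputed from the approximate weights quoted earlier; small differences from the UI figures come from rounding, and the numbers are illustrative only:

```python
# Approximate weights from this post; probabilities from the sample breakdown.
weights = {"reply": 30.0, "favorite": 0.5, "profile_click": 12.0,
           "follow_author": 12.0, "not_interested": -74.0}
probs = {"reply": 0.182, "favorite": 0.311, "profile_click": 0.074,
         "follow_author": 0.043, "not_interested": 0.031}

# Per-action weighted contribution, as shown in the UI rows.
contributions = {a: weights[a] * probs[a] for a in probs}
net = sum(contributions.values())  # net weighted engagement, ~ +4.7
```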

Where this comes from

The weights are derived from Twitter's 2023 open-source ranker release (the "phoenix" scoring). The probabilities are predicted from text features — word count, hook punch, reply-trigger detection, sales markers, AI-tell phrases, link presence, etc.

The probability model is calibratable. As you publish posts and actual engagement comes in, the calibration loop fits a per-action regression: predicted probability → observed action rate. Coefficients update only when sample size is large enough and Pearson r is above 0.25 (safety check). This is the "does the simulator stop trusting itself" loop — when r drifts below the floor, the scorecard turns red.
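A minimal sketch of that safety check, assuming a simple least-squares fit per action; the Pearson-r floor of 0.25 is the one quoted above, while the sample-size floor and all names are illustrative, not TweetSim's actual calibration code:

```python
import math

MIN_SAMPLES = 50       # assumed sample-size floor (illustrative)
MIN_PEARSON_R = 0.25   # safety floor quoted in this post

def pearson_r(xs, ys):
    """Pearson correlation; 0.0 when either series has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy) if vx and vy else 0.0

def calibrate(predicted, observed):
    """Fit observed ~ slope * predicted + intercept for one action.
    Returns (slope, intercept), or None when the safety check fails."""
    if len(predicted) < MIN_SAMPLES:
        return None
    if pearson_r(predicted, observed) < MIN_PEARSON_R:
        return None  # scorecard turns red: stop trusting the simulator
    mx = sum(predicted) / len(predicted)
    my = sum(observed) / len(observed)
    cov = sum((x - mx) * (y - my) for x, y in zip(predicted, observed))
    var = sum((x - mx) ** 2 for x in predicted)
    slope = cov / var
    return slope, my - slope * mx
```

The point of returning None rather than a weak fit is exactly the "does the simulator stop trusting itself" behavior: a noisy or undersized sample leaves the existing coefficients untouched.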

What we don't model

Phoenix scoring is one signal among many that the For-You ranker uses. We don't model:

  • Topic clustering (when your tweet lands in trending topics)
  • Author reputation (your account's 7-day baseline engagement)
  • Reciprocity graph (whether you reply to people who reply to you)
  • Bot detection (low-quality account heuristics)
  • Out-of-network distribution mechanics beyond the velocity gate

These are real factors. We're honest about not modeling them. The phoenix score is "will the algorithm want to promote this if it shows it to people?" — a necessary but not sufficient condition.

Bottom line

Stop optimizing for likes. They're the lowest-weighted positive action and the most vulnerable to engagement-bait anti-patterns. Optimize for replies (highest weight per event), profile clicks (most undervalued), and dwell time (signals quality consumption). Avoid the four negative actions like the plague: a not-interested, mute, or block cancels ~150 likes' worth of weight, and a report cancels ~740.

That's the framework. The phoenix score in TweetSim is a single number that captures all of it. Score a tweet and look at which positive actions are driving the score — that's the part of your writing the algorithm is rewarding.