PR #1019 · March 2026

Autopsy of PR #1019's 27M-Parameter GPT

I trained a 27M-parameter GPT to 1.134 BPB on FineWeb, then dissected it. Three findings. First, MLP matrices need more quantization bits than attention, even though Q and Out have 3,000× higher condition numbers. Second, a single layer (L7) changes logit-lens loss by −4.35 bits/token, more than any other single layer after L0. Third, calibration is nearly perfect at 0.24% ECE, so temperature scaling is not worth pursuing.

Final BPB: 1.134
Parameters: 27.1M
Steps: 7,000
Wall Time: 40 min
Hardware: 2×H100 SXM


MLP Needs More Bits Than Attention

Each matrix quantized individually to int6, all others held at full precision

Quantization replaces precise weights with rounded approximations to make models smaller and faster. To find which weights matter most, I quantize each matrix individually to int6 while keeping everything else at full precision — like loosening one bolt at a time to find which ones hold the structure together.
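The one-bolt-at-a-time sweep can be sketched as follows. This is a minimal illustration, not the author's harness: it assumes symmetric per-row int6 rounding, and `fake_quant_int6`, `sensitivity_sweep`, and the `eval_bpb` callback are hypothetical names introduced here.

```python
import numpy as np

def fake_quant_int6(w: np.ndarray) -> np.ndarray:
    """Round w to a symmetric int6 grid (levels -31..31) per row, then dequantize.

    Assumption: per-row symmetric scaling; the original quantizer is not shown.
    """
    qmax = 31  # 2**(6 - 1) - 1
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # guard all-zero rows
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale

def sensitivity_sweep(weights: dict, eval_bpb) -> dict:
    """Quantize one matrix at a time; return the BPB change versus full precision.

    eval_bpb is a hypothetical callback: dict of matrices -> BPB on held-out data.
    """
    baseline = eval_bpb(weights)
    deltas = {}
    for name, w in weights.items():
        trial = dict(weights)  # shallow copy: every other matrix stays full precision
        trial[name] = fake_quant_int6(w)
        deltas[name] = eval_bpb(trial) - baseline
    return deltas
```

Summing the returned deltas per matrix type (MLP versus Q, K, V, Out) would give totals of the kind reported in this section.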

Each cell: quantize only that one matrix to int6, measure BPB degradation. Baseline: 1.134307 BPB. Full int6: +8.3×10⁻³.
All 6 matrices of each layer quantized to int6. Decoder layers 9–10 are most sensitive.

Key Insight

MLP matrices account for 6,247 × 10⁻⁶ BPB of summed sensitivity. All four attention matrices (Q, K, V, Out) together account for 1,600 × 10⁻⁶, roughly 3.9× less. For mixed-precision GPTQ, allocate more bits to MLP. Layer 10 is the most sensitive single layer (+1,057 × 10⁻⁶).

Why Condition Number Is Misleading

Q matrices have condition numbers up to 54,000 — yet MLP is 4× more sensitive than all attention combined

Every weight matrix can be decomposed (via SVD) into independent channels ranked by importance. Condition number is the ratio between the strongest and weakest channels — Q's ratio of 54,000× means it amplifies some directions 54,000× more than others, which should make rounding errors catastrophic. It doesn't.
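For reference, the condition number described above can be computed directly from the singular values. This is a small sketch using NumPy; the function name and the toy matrix are illustrative, not taken from the original analysis.

```python
import numpy as np

def condition_number(w: np.ndarray) -> float:
    """Ratio of the largest to the smallest singular value of w.

    Note: a rank-deficient matrix has a smallest singular value of zero,
    so the ratio becomes infinite.
    """
    s = np.linalg.svd(w, compute_uv=False)  # singular values, sorted descending
    return float(s[0] / s[-1])

# Toy example: singular values 54000 and 1, mirroring the ratio quoted for Q.
toy = np.diag([54000.0, 1.0])
print(condition_number(toy))
```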

Higher condition number = expected to be more fragile. Q matrices top the chart — yet are least sensitive to quantization.
Bottom = high readability improvement. Right = high condition number. Top-right = encoder layers preparing skip features.

Stable Rank Reveals What Matters

Effective utilization, not condition number, predicts quantization sensitivity

The singular value curves show why: Q concentrates its work in just 10% of its channels (its “stable rank”). The other 90% carry near-zero signal, so quantization errors there do nothing. MLP uses 33% of its channels — 3× more capacity in active use, so rounding errors have 3× more ways to cause damage.
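A sketch of the utilization measure, assuming the standard definition of stable rank (squared Frobenius norm divided by squared spectral norm). The text does not spell out which definition was used, so treat this as one plausible reading; `stable_rank` and `utilization` are names chosen here.

```python
import numpy as np

def stable_rank(w: np.ndarray) -> float:
    """Sum of squared singular values divided by the largest squared singular value."""
    s = np.linalg.svd(w, compute_uv=False)
    return float((s ** 2).sum() / s[0] ** 2)

def utilization(w: np.ndarray) -> float:
    """Stable rank as a fraction of the maximum possible rank, min(rows, cols)."""
    return stable_rank(w) / min(w.shape)
```

Under this definition a matrix with one dominant direction has utilization near 1/min(rows, cols), while a matrix with equal singular values has utilization 1.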

Each line is one layer. Steep decay = low effective rank. Flat = capacity fully used.

Key Insight

Condition number alone is a poor predictor of quantization damage. Effective utilization (stable rank / full rank) combined with dimensionality is what matters. This is directly measured, not theoretical.

Layer 7 Does Most of the Work

Projecting each layer's residual stream through the unembedding matrix

The logit lens probes what the model “believes” at each layer by projecting the hidden state through the final prediction head. If loss drops after a layer, that layer moved the model closer to the right answer. If loss rises, the layer sacrificed intermediate readability — reorganizing representations for downstream layers to use via skip connections.
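The probe can be sketched in a few lines of NumPy. This is a simplified illustration: it assumes the hidden states have already been captured per layer and it omits any final normalization the real model may apply before the prediction head. All names are hypothetical.

```python
import numpy as np

def logit_lens_bits(hidden_by_layer, w_unembed, targets):
    """Mean next-token loss in bits after each layer.

    hidden_by_layer: list of [T, d] arrays (residual stream after each layer)
    w_unembed:       [d, V] unembedding matrix
    targets:         [T] integer token ids
    """
    targets = np.asarray(targets)
    rows = np.arange(len(targets))
    losses = []
    for h in hidden_by_layer:
        logits = h @ w_unembed
        logits = logits - logits.max(axis=1, keepdims=True)  # stabilize softmax
        logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        nll_nats = -logp[rows, targets]
        losses.append(float(nll_nats.mean() / np.log(2)))  # nats -> bits
    return losses

def per_layer_contribution(losses):
    """Change in loss across each layer; negative means the layer improved readability."""
    return [after - before for before, after in zip(losses[:-1], losses[1:])]
```

A per-layer figure such as −4.35 bits/token corresponds to one entry of `per_layer_contribution` in this framing.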

Legend: Loss (nats) · Top-1 accuracy (scaled) · Readability dropped

Key Insight

Layer 7 contributes −4.35 bits/token — more than any other single layer after L0's initial embedding projection (−15.80). Layers 3–5 show increased loss (residual stream becomes less readable). Hypothesis: encoder layers 3–4 are reorganizing representations for the skip connections (enc 3→dec 6, enc 4→dec 5), and decoder layer 5 pays the cost of integrating the enc 4 skip, sacrificing intermediate readability. This is consistent with the architecture but not proven causally. Layer 10 adds only −0.47 bits/token — a candidate for narrowing, pending empirical validation.

Calibration Is Not the Bottleneck

ECE = 0.24% across 62M tokens

Calibration measures whether a model's confidence matches reality: when it says it is 80% confident, is it right 80% of the time? Expected Calibration Error (ECE) quantifies this gap across all confidence levels. Low ECE combined with high loss means the model knows what it doesn't know; the bottleneck is prediction accuracy, not misplaced confidence.
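A minimal sketch of ECE with equal-width confidence bins. The bin count of 15 is an assumption (the text does not state how many bins were used), and the function name is illustrative.

```python
import numpy as np

def expected_calibration_error(confidence, correct, n_bins=15):
    """Weighted mean of |accuracy - confidence| over equal-width confidence bins.

    confidence: top-1 probability per token, in [0, 1]
    correct:    1 if the top-1 prediction was right, else 0
    n_bins:     assumed value; not given in the source
    """
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Interior edges only, so indices run 0..n_bins-1 and confidence 1.0 lands in the last bin.
    idx = np.clip(np.digitize(confidence, edges[1:-1]), 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            gap = abs(correct[mask].mean() - confidence[mask].mean())
            ece += mask.mean() * gap  # weight by the fraction of tokens in the bin
    return float(ece)
```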

Accuracy − confidence per bin. Zero = perfectly calibrated. Model is overconfident on 79% of tokens.
Where loss comes from. Low P(correct) = model assigned little probability to the right answer — these tokens dominate loss.

Key Insight

Do not pursue calibration tuning. 70% of total loss comes from tokens where P(correct) < 5% — the model is well-calibrated but often wrong. The bottleneck is accuracy, not confidence. Temperature scaling and label smoothing would yield negligible gains.
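The loss-share figure above can be reproduced from per-token probabilities of the correct token. A small sketch, with `loss_share_below` as a hypothetical helper name:

```python
import math

def loss_share_below(p_correct, threshold=0.05):
    """Fraction of total cross-entropy contributed by tokens with P(correct) < threshold."""
    losses = [-math.log(p) for p in p_correct]
    total = sum(losses)
    low = sum(loss for p, loss in zip(p_correct, losses) if p < threshold)
    return low / total
```

Because loss is −log P(correct), a handful of very low-probability tokens can dominate the total even when most tokens are predicted well.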

Further Exploration

What Each Head Learns

Classifying 88 attention heads by function (Olsson et al. 2022)

Each attention head learns a different pattern for routing information between tokens. By measuring what each head actually attends to — the previous token, repeated sequences (induction), or absolute positions — we classify heads by function. This reveals whether the model is doing sophisticated in-context learning or relying on simpler n-gram statistics.
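One way to score heads is sketched below. It assumes a single head's attention pattern is available as a square matrix and, for the induction score, that the input is a block of tokens repeated twice. The threshold and all function names are assumptions for illustration; the source does not give the criteria it used.

```python
import numpy as np

def previous_token_score(attn: np.ndarray) -> float:
    """Mean attention weight from each position i to position i - 1. attn: [T, T]."""
    t = attn.shape[0]
    return float(attn[np.arange(1, t), np.arange(0, t - 1)].mean())

def induction_score(attn: np.ndarray, period: int) -> float:
    """Mean attention from position i in the second copy to position i - period + 1.

    For a block of `period` tokens repeated twice, i - period is the earlier
    occurrence of the current token, so i - period + 1 is the token that
    followed it (the A B ... A -> B pattern).
    """
    rows = np.arange(period, attn.shape[0])
    return float(attn[rows, rows - period + 1].mean())

def classify_head(attn: np.ndarray, period: int, threshold: float = 0.5) -> str:
    """Label a head by whichever score clears the (assumed) threshold."""
    if induction_score(attn, period) > threshold:
        return "induction"
    if previous_token_score(attn) > threshold:
        return "previous-token"
    return "other"
```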

2 induction, 22 previous-token, 5 positional, 59 other
Higher values indicate stronger A B … A → B copying behavior

Key Insight

22 of 88 heads are previous-token heads, concentrated in the encoder (14 vs 8 decoder). Only 2 show induction behavior (L0H5, L3H2) with marginal scores (~0.02). At this scale, the model relies on n-gram statistics, not in-context copying — induction-head-based interventions are not worth pursuing.

Reading the Model's Mind

Token-level loss, top-k predictions at failure points, and generation vs. reality

The most direct form of interpretability: look at individual tokens, see where the model fails, examine what it predicted instead, and compare its free-form generation against reality. This turns aggregate statistics into concrete examples you can read and reason about.

Loss Heatmap

Light = predicted well (low loss), dark = surprised (high loss).
<s>Insurance Company Declares Living Man Dead George Johannesen is very much alive. Which is why it was so surprising when the Canadian man received a letter addressedTo the Estate of George Johannesen.” Even more surprising is that it came from his insurance company, who should really be on top of such things. Now this wouldnt have been so terrible if Manitoba Public Insurance was giving Johannesens estate a fat check for his passing away. But thats not what happened. Instead the letter was to inform the estate that, since George was dead, his driver license and auto insurance had been cancelled in October. This poses a problem for Johannesen because, being alive, he continues to drive his car.I dont understand how this could have happened, he told the Toronto Sun.For me to be declared dead, someone would have to present a death certificate. For someone to get that, I guess I must have died sometime in October.” Now the 59-year-old worries that he will stop getting his pension and other government benefits. The Manitoba Public Insurance Company says they are trying to resolve the issue. They also claim they werent the ones who determined Johannesen was dead, but cryptically cant reveal the source of the confusion for confidentially reasons. Perhaps a pesky ghost is behind the mix up?<s>Exhausted and euphoric. Those are the words to describe me right now. Six days after boarding the "Good Morning America" Whistle-Stop '08 Tour train, and beginning the adventure of a lifetime, it's over. We just wrapped the last show of our little Odyssey from the Newseum in the nation's capitol, Washington D.C., and for me, it was an appropriate but bittersweet ending to our tale. Appropriate because the point of our tour was to go out and ask real people what was on their minds, to hear straight from them their concerns about our nation. 
By ending in Washington, D.C., we brought their thoughts, problems and hopes to the doorstep of the government -- to the people that can do something about them. But it was also bittersweet because I honestly didn't want the trip to end. I'm not going to lie to you and say that I loved everything about it (3:00 AM wake up calls being the main offender), but I was consistently surprised to find that even in the tougher times, when we had been blearily working for 18 hours straight, something or someone would come along to pick everyone up. From the absolute chaos of pre-show preparations, to the fleeting sparkle of pride in the production team's eyes when a show went just as planned, life on the train was crazy, grueling and complicated, but most of all, fun. Some moments I'll never forget. Like the celebration in Massachusetts after we pulled off what had never been done before -- the first live network television broadcast from a moving train. Or when Diane, Robin and Chris teamed up -- using Rick Klein and me as props -- to convince Sam that he was supposed to share his tiny room on the train with two roomates. Or when Chris put his life on a very secure line at Niagara Falls to dramatically bring the news from the brink of watery doom. Or, my personal favorite moment, when Sam, Chris and two producers played the most ridiculous game of Monopoly I've ever seen for four hours and a few of us, Sam included, cried from laughing so hard. But far more moving than the obvious and endearing camaraderie between the anchors was their care for the American people to whom they bring the news every morning. Never was this more obvious than yesterday, when I accidentally stumbled into an anchors' meeting where they discuss the content of the next day's show and, for some reason, I was allowed to stay. 
As an aspiring journalist myself, I can't express how inspiring it was to listen in on this discussion and know firsthand that whatever goes on the air, it's the fairest, most accurate and most informative report possible. Though they have their fun, when it comes to the news Diane, Robin, Chris and Sam are professionals in every sense of the word. But now I have to go -- have to return to "normal" life, and I don't want to. I have to shave my rail-beard, the result of a production-wide pact to not shave for the duration of the trip. I have to wash some extremely smelly clothes. I also have a feeling that the spontaneous dance parties that erupted on the train will be for some reason looked down upon in the office. These are all reasons to miss dragging myself aboard that cramped studio on rails well before the sun comes up. But maybe we'll be able to do it again sometime. We were told to get back to New York however we wanted. I think I'll take a train.<s>By registering on amnesty.org you can join in on the human rights conversation and ensure your contributions are combined with ours. If you come from a country that doesn't have an office you have the option to become an International member. Here, you will receive emails about human rights campaigns that are targeted to your interests and opportunities to take action for human rights impact. You can also become a volunteer, and lead on activism initiatives in your community. Furthermore, you will have full use of the Amnesty International online communities. We cant do it without you! Where do you live? We have 50 international offices. This information will help us ensure that you receive the appropriate services. If you don't have an office in your country, you can

Top-k Predictions at High-Loss Tokens

185 tokens above the 90th-percentile loss threshold. Sorted by loss (highest first).
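A listing like the one below could be produced as follows. This is a sketch that assumes per-position log-probabilities over the vocabulary are available; the function name and the output layout are illustrative.

```python
import numpy as np

def high_loss_report(logp, targets, percentile=90.0, k=5):
    """Top-k predictions at positions whose loss exceeds a percentile cutoff.

    logp:    [T, V] log-probabilities in nats
    targets: [T] integer ids of the actual next tokens
    """
    targets = np.asarray(targets)
    loss_nats = -logp[np.arange(len(targets)), targets]
    cutoff = np.percentile(loss_nats, percentile)
    picked = np.flatnonzero(loss_nats > cutoff)
    picked = picked[np.argsort(-loss_nats[picked])]  # highest loss first
    report = []
    for pos in picked:
        topk = np.argsort(-logp[pos])[:k]
        report.append({
            "position": int(pos),
            "loss_nats": float(loss_nats[pos]),
            "loss_bits": float(loss_nats[pos] / np.log(2)),
            "p_correct": float(np.exp(-loss_nats[pos])),
            "topk": [(int(v), float(np.exp(logp[pos, v]))) for v in topk],
        })
    return report
```

The bits figure on each entry is the nats figure divided by ln 2.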

... of us, Sam included, cried...
Position 1284 · Loss: 10.795 nats (15.573 bits) · P(correct): 0.00%
Rank · Predicted · Prob
1 · " and" · 51.2%
2 · "," · 31.1%
3 · "'" · 2.9%
4 · " was" · 1.1%
5 · " -" · 0.6%
Actual: " includ" (rank >5, prob 0.00%)
... back to New York however we wanted...
Position 1764 · Loss: 10.510 nats (15.163 bits) · P(correct): 0.00%
Rank · Predicted · Prob
1 · "," · 11.8%
2 · " C" · 9.2%
3 · " and" · 9.2%
4 · " on" · 7.2%
5 · " in" · 7.2%
Actual: " how" (rank >5, prob 0.00%)
...in with two roomates. Or when Ch...
Position 1166 · Loss: 10.409 nats (15.017 bits) · P(correct): 0.00%
Rank · Predicted · Prob
1 · "m" · 85.3%
2 · "s" · 2.3%
3 · "ies" · 2.3%
4 · "ie" · 1.8%
5 · "-" · 1.6%
Actual: "ates" (rank >5, prob 0.00%)
... be for some reason looked down upon in...
Position 1679 · Loss: 10.335 nats (14.910 bits) · P(correct): 0.00%
Rank · Predicted · Prob
1 · " the" · 14.1%
2 · " a" · 8.6%
3 · " more" · 5.9%
4 · "," · 2.8%
5 · " my" · 2.8%
Actual: " look" (rank >5, prob 0.00%)
... registering on amnesty.or...
Position 1790 · Loss: 9.841 nats (14.198 bits) · P(correct): 0.01%
Rank · Predicted · Prob
1 · "line" · 29.6%
2 · " this" · 18.0%
3 · " our" · 14.0%
4 · " the" · 9.6%
5 · " F" · 4.5%
Actual: " am" (rank >5, prob 0.01%)
... on rails well before the sun...
Position 1722 · Loss: 9.685 nats (13.972 bits) · P(correct): 0.01%
Rank · Predicted · Prob
1 · "." · 30.6%
2 · "," · 14.4%
3 · " and" · 14.4%
4 · " -" · 4.7%
5 · " in" · 2.8%
Actual: " well" (rank >5, prob 0.01%)
...es to the news Diane, Rob...
Position 1521 · Loss: 9.508 nats (13.718 bits) · P(correct): 0.01%
Rank · Predicted · Prob
1 · "," · 53.1%
2 · " and" · 5.6%
3 · " that" · 3.8%
4 · " -" · 2.6%
5 · " of" · 2.6%
Actual: " D" (rank >5, prob 0.01%)
...-year-old worries that he will...
Position 380 · Loss: 9.415 nats (13.582 bits) · P(correct): 0.01%
Rank · Predicted · Prob
1 · " J" · 14.7%
2 · " is" · 14.7%
3 · " has" · 7.9%
4 · "," · 6.1%
5 · " man" · 4.8%
Actual: " wor" (rank >5, prob 0.01%)
...haps a pesky ghost...
Position 508 · Loss: 9.077 nats (13.096 bits) · P(correct): 0.01%
Rank · Predicted · Prob
1 · "erson" · 18.2%
2 · "oss" · 8.6%
3 · "ol" · 8.6%
4 · "ub" · 8.6%
5 · "ot" · 6.7%
Actual: "es" (rank >5, prob 0.01%)
...For me to be declared dead...
Position 320 · Loss: 9.025 nats (13.020 bits) · P(correct): 0.01%
Rank · Predicted · Prob
1 · "l" · 24.7%
2 · " ab" · 9.1%
3 · " a" · 5.5%
4 · " al" · 4.9%
5 · " in" · 4.3%
Actual: " dec" (rank >5, prob 0.01%)
...sey from the Newseum in the nation...
Position 634 · Loss: 8.885 nats (12.819 bits) · P(correct): 0.01%
Rank · Predicted · Prob
1 · " Y" · 60.0%
2 · " E" · 4.9%
3 · " O" · 4.3%
4 · " W" · 4.3%
5 · " " · 3.0%
Actual: "se" (rank >5, prob 0.01%)
...'08 Tour train, and beg...
Position 588 · Loss: 8.866 nats (12.790 bits) · P(correct): 0.01%
Rank · Predicted · Prob
1 · "," · 54.0%
2 · " in" · 5.0%
3 · " of" · 3.9%
4 · "." · 3.5%
5 · "n" · 2.4%
Actual: " tra" (rank >5, prob 0.01%)
...p. I have to wash some extre...
Position 1629 · Loss: 8.602 nats (12.410 bits) · P(correct): 0.02%
Rank · Predicted · Prob
1 · " sh" · 9.5%
2 · " go" · 8.4%
3 · " be" · 4.5%
4 · " re" · 4.0%
5 · " s" · 3.1%
Actual: " was" (rank >5, prob 0.02%)
...ction-wide pact to not shave...
Position 1609 · Loss: 8.589 nats (12.391 bits) · P(correct): 0.02%
Rank · Predicted · Prob
1 · "an" · 15.9%
2 · "ub" · 10.9%
3 · "ol" · 10.9%
4 · "us" · 7.5%
5 · "and" · 5.2%
Actual: "act" (rank >5, prob 0.02%)
... America" Whistle-St...
Position 575 · Loss: 8.581 nats (12.380 bits) · P(correct): 0.02%
Rank · Predicted · Prob
1 · " t" · 16.0%
2 · " c" · 4.6%
3 · " tra" · 4.0%
4 · " fl" · 4.0%
5 · " r" · 4.0%
Actual: " Wh" (rank >5, prob 0.02%)
... Chris and two producers played the...
Position 1242 · Loss: 8.459 nats (12.204 bits) · P(correct): 0.02%
Rank · Predicted · Prob
1 · " of" · 26.4%
2 · " other" · 23.3%
3 · " fr" · 9.7%
4 · " ro" · 5.9%
5 · " R" · 3.1%
Actual: " produ" (rank >5, prob 0.02%)
...terday, when I accidentally stumb...
Position 1374 · Loss: 8.323 nats (12.008 bits) · P(correct): 0.02%
Rank · Predicted · Prob
1 · " was" · 18.3%
2 · " w" · 6.7%
3 · "'" · 4.6%
4 · " sa" · 3.6%
5 · " had" · 3.2%
Actual: " acc" (rank >5, prob 0.02%)
... confusion for confidentially re...
Position 492 · Loss: 8.221 nats (11.861 bits) · P(correct): 0.03%
Rank · Predicted · Prob
1 · " the" · 29.5%
2 · " them" · 15.8%
3 · " their" · 5.1%
4 · " J" · 4.0%
5 · " a" · 3.1%
Actual: " con" (rank >5, prob 0.03%)
...ie between the anchors was their c...
Position 1329 · Loss: 8.120 nats (11.715 bits) · P(correct): 0.03%
Rank · Predicted · Prob
1 · " two" · 41.9%
2 · " produ" · 4.4%
3 · " c" · 3.9%
4 · " m" · 2.7%
5 · " people" · 2.1%
Actual: " an" (rank >5, prob 0.03%)
...in.<s>By registering on am...
Position 1785 · Loss: 8.084 nats (11.663 bits) · P(correct): 0.03%
Rank · Predicted · Prob
1 · " J" · 8.6%
2 · ":" · 6.7%
3 · " M" · 6.7%
4 · " S" · 5.9%
5 · " D" · 5.2%
Actual: " reg" (rank >5, prob 0.03%)
...tter addressedTo the Est...
Position 70 · Loss: 8.079 nats (11.655 bits) · P(correct): 0.03%
Rank · Predicted · Prob
1 · " to" · 81.6%
2 · " for" · 5.2%
3 · " as" · 2.8%
4 · " in" · 1.5%
5 · " the" · 1.0%
Actual: "" (rank >5, prob 0.03%)
... also claim they werent the on...
Position 445 · Loss: 7.969 nats (11.497 bits) · P(correct): 0.03%
Rank · Predicted · Prob
1 · " are" · 20.3%
2 · " have" · 17.9%
3 · " will" · 12.3%
4 · " can" · 8.5%
5 · "" · 6.6%
Actual: " we" (rank >5, prob 0.03%)
..., Chris and two producers played...
Position 1241 · Loss: 7.953 nats (11.474 bits) · P(correct): 0.04%
Rank · Predicted · Prob
1 · " I" · 34.0%
2 · " R" · 20.6%
3 · " Ch" · 11.1%
4 · " me" · 7.6%
5 · " D" · 3.6%
Actual: " two" (rank >5, prob 0.04%)
...ance company, who should really be on to...
Position 113 · Loss: 7.910 nats (11.412 bits) · P(correct): 0.04%
Rank · Predicted · Prob
1 · "se" · 6.2%
2 · " had" · 6.2%
3 · " was" · 4.2%
4 · " is" · 4.2%
5 · " dec" · 3.3%
Actual: " should" (rank >5, prob 0.04%)
... human rights impact. You can also...
Position 1918 · Loss: 7.903 nats (11.401 bits) · P(correct): 0.04%
Rank · Predicted · Prob
1 · "." · 75.8%
2 · " in" · 3.8%
3 · " and" · 2.9%
4 · "," · 2.6%
5 · " is" · 1.6%
Actual: " imp" (rank >5, prob 0.04%)
... obvious than yesterday, when...
Position 1367 · Loss: 7.870 nats (11.355 bits) · P(correct): 0.04%
Rank · Predicted · Prob
1 · " the" · 22.4%
2 · " when" · 10.6%
3 · " it" · 9.3%
4 · " in" · 8.3%
5 · " this" · 5.0%
Actual: " y" (rank >5, prob 0.04%)
...ributions are combined with our...
Position 1824 · Loss: 7.854 nats (11.331 bits) · P(correct): 0.04%
Rank · Predicted · Prob
1 · " pro" · 22.8%
2 · " t" · 5.8%
3 · " not" · 3.5%
4 · " c" · 3.5%
5 · " p" · 3.1%
Actual: " com" (rank >5, prob 0.04%)
... dead, but cryptically can...
Position 471 · Loss: 7.796 nats (11.248 bits) · P(correct): 0.04%
Rank · Predicted · Prob
1 · "ame" · 18.8%
2 · "ert" · 16.6%
3 · "all" · 12.9%
4 · "red" · 6.9%
5 · "le" · 5.4%
Actual: "ry" (rank >5, prob 0.04%)
... being the main offender),...
Position 868 · Loss: 7.785 nats (11.231 bits) · P(correct): 0.04%
Rank · Predicted · Prob
1 · " re" · 19.0%
2 · " th" · 9.0%
3 · " p" · 4.8%
4 · " c" · 4.8%
5 · " e" · 4.2%
Actual: " of" (rank >5, prob 0.04%)
...thing about it (3:00 AM...
Position 851 · Loss: 7.731 nats (11.153 bits) · P(correct): 0.04%
Rank · Predicted · Prob
1 · "and" · 12.2%
2 · "I" · 9.5%
3 · "w" · 6.5%
4 · "t" · 5.1%
5 · "th" · 5.1%
Actual: "3" (rank >5, prob 0.04%)
... appropriate but bittersweet...
Position 669 · Loss: 7.712 nats (11.127 bits) · P(correct): 0.04%
Rank · Predicted · Prob
1 · "ly" · 7.5%
2 · " time" · 7.5%
3 · " day" · 6.6%
4 · " m" · 6.6%
5 · " t" · 4.6%
Actual: " but" (rank >5, prob 0.04%)
...annesen is very much alive. Wh...
Position 33 · Loss: 7.639 nats (11.021 bits) · P(correct): 0.05%
Rank · Predicted · Prob
1 · " a" · 13.4%
2 · " the" · 6.3%
3 · " one" · 3.8%
4 · " de" · 3.0%
5 · " an" · 3.0%
Actual: " very" (rank >5, prob 0.05%)
...), but I was consistently sur...
Position 877 · Loss: 7.620 nats (10.993 bits) · P(correct): 0.05%
Rank · Predicted · Prob
1 · "n" · 6.4%
2 · " s" · 4.4%
3 · " just" · 3.9%
4 · " also" · 3.9%
5 · " h" · 3.4%
Actual: " cons" (rank >5, prob 0.05%)
...fidentially reasons. Perh...
Position 498 · Loss: 7.619 nats (10.992 bits) · P(correct): 0.05%
Rank · Predicted · Prob
1 · "port" · 22.4%
2 · "ce" · 19.8%
3 · "qu" · 12.0%
4 · "p" · 9.4%
5 · "v" · 6.4%
Actual: "as" (rank >5, prob 0.05%)
... cryptically cant reveal...
Position 475 · Loss: 7.503 nats (10.824 bits) · P(correct): 0.06%
Rank · Predicted · Prob
1 · "," · 19.6%
2 · " they" · 9.3%
3 · "." · 4.4%
4 · " said" · 4.4%
5 · " s" · 3.9%
Actual: " can" (rank >5, prob 0.06%)
...ging myself aboard that cr...
Position 1707 · Loss: 7.444 nats (10.739 bits) · P(correct): 0.06%
Rank · Predicted · Prob
1 · " into" · 12.6%
2 · " out" · 12.6%
3 · " to" · 12.6%
4 · " on" · 6.8%
5 · " in" · 6.8%
Actual: " ab" (rank >5, prob 0.06%)
...rapped the last show of our little...
Position 620 · Loss: 7.370 nats (10.632 bits) · P(correct): 0.06%
Rank · Predicted · Prob
1 · " of" · 12.0%
2 · " " · 7.3%
3 · " two" · 5.0%
4 · " part" · 5.0%
5 · " fe" · 5.0%
Actual: " show" (rank >5, prob 0.06%)
... along to pick everyone up. Fr...
Position 937 · Loss: 7.293 nats (10.521 bits) · P(correct): 0.07%
Rank · Predicted · Prob
1 · " us" · 51.3%
2 · " up" · 21.4%
3 · " me" · 16.7%
4 · " our" · 2.5%
5 · " the" · 1.1%
Actual: " every" (rank >5, prob 0.07%)
... of pride in the production team's...
Position 978 · Loss: 7.292 nats (10.520 bits) · P(correct): 0.07%
Rank · Predicted · Prob
1 · " c" · 5.4%
2 · " m" · 4.8%
3 · " f" · 4.2%
4 · " s" · 4.2%
5 · " l" · 3.7%
Actual: " produ" (rank >5, prob 0.07%)
... people to whom they bring the news every...
Position 1347 · Loss: 7.285 nats (10.510 bits) · P(correct): 0.07%
Rank · Predicted · Prob
1 · " were" · 19.0%
2 · " had" · 7.9%
3 · " l" · 4.2%
4 · "'" · 3.7%
5 · " c" · 3.7%
Actual: " br" (rank >5, prob 0.07%)
...in and Chris teamed up --...
Position 1114 · Loss: 7.246 nats (10.454 bits) · P(correct): 0.07%
Rank · Predicted · Prob
1 · " were" · 13.6%
2 · " had" · 5.0%
3 · "," · 3.9%
4 · " b" · 2.7%
5 · " c" · 2.4%
Actual: " te" (rank >5, prob 0.07%)
...:00 AM wake up calls...
Position 857 · Loss: 7.184 nats (10.365 bits) · P(correct): 0.08%
Rank · Predicted · Prob
1 · ")" · 23.8%
2 · "," · 14.4%
3 · ")." · 6.8%
4 · " to" · 6.0%
5 · " E" · 4.7%
Actual: " w" (rank >5, prob 0.08%)
...ink of watery doom. Or,...
Position 1215 · Loss: 7.125 nats (10.279 bits) · P(correct): 0.08%
Rank · Predicted · Prob
1 · "," · 4.4%
2 · " w" · 3.9%
3 · " m" · 3.4%
4 · " s" · 3.4%
5 · " t" · 3.4%
Actual: " do" (rank >5, prob 0.08%)
... was to go out and ask real people what...
Position 703 · Loss: 7.094 nats (10.234 bits) · P(correct): 0.08%
Rank · Predicted · Prob
1 · " see" · 20.3%
2 · " get" · 5.8%
3 · " ex" · 5.1%
4 · " exper" · 4.5%
5 · " w" · 4.5%
Actual: " as" (rank >5, prob 0.08%)
... the news from the brink of watery...
Position 1209 · Loss: 7.079 nats (10.213 bits) · P(correct): 0.08%
Rank · Predicted · Prob
1 · " c" · 5.2%
2 · " W" · 4.1%
3 · " N" · 3.6%
4 · " M" · 2.8%
5 · " m" · 2.8%
Actual: " br" (rank >5, prob 0.08%)
...ussion and know firsthand that whate...
Position 1467 · Loss: 7.078 nats (10.212 bits) · P(correct): 0.08%
Rank · Predicted · Prob
1 · " that" · 43.7%
2 · "ing" · 9.8%
3 · " how" · 6.7%
4 · " what" · 5.9%
5 · " the" · 5.2%
Actual: " first" (rank >5, prob 0.08%)
...ing alive, he continues to dri...
Position 275 · Loss: 7.019 nats (10.127 bits) · P(correct): 0.09%
Rank · Predicted · Prob
1 · " was" · 15.0%
2 · "" · 11.7%
3 · " is" · 10.3%
4 · " had" · 8.1%
5 · " has" · 8.1%
Actual: " cont" (rank >5, prob 0.09%)
...hind the mix up?<s>Exh...
Position 521 · Loss: 7.013 nats (10.117 bits) · P(correct): 0.09%
Rank · Predicted · Prob
1 · "." · 41.2%
2 · " of" · 15.1%
3 · "?" · 11.8%
4 · "," · 8.1%
5 · "ed" · 6.3%
Actual: " up" (rank >5, prob 0.09%)
.... From the absolute ch...
Position 945 · Loss: 7.011 nats (10.115 bits) · P(correct): 0.09%
Rank · Predicted · Prob
1 · " m" · 11.8%
2 · " first" · 4.9%
3 · " be" · 4.9%
4 · " time" · 4.3%
5 · " t" · 4.3%
Actual: " ab" (rank >5, prob 0.09%)
... planned, life on the train...
Position 999 · Loss: 6.998 nats (10.097 bits) · P(correct): 0.09%
Rank · Predicted · Prob
1 · " to" · 53.6%
2 · " I" · 8.2%
3 · " it" · 6.4%
4 · " the" · 5.7%
5 · " we" · 3.4%
Actual: " l" (rank >5, prob 0.09%)
...ve ever seen for four hours...
Position 1269 · Loss: 6.998 nats (10.097 bits) · P(correct): 0.09%
Rank · Predicted · Prob
1 · "." · 41.8%
2 · "," · 12.0%
3 · " in" · 12.0%
4 · " -" · 10.6%
5 · " on" · 4.4%
Actual: " for" (rank >5, prob 0.09%)
... for confidentially reasons. P...
Position 496 · Loss: 6.986 nats (10.079 bits) · P(correct): 0.09%
Rank · Predicted · Prob
1 · "ity" · 42.3%
2 · " information" · 12.1%
3 · " d" · 6.5%
4 · " re" · 3.1%
5 · " p" · 2.7%
Actual: "ly" (rank >5, prob 0.09%)
..., we brought their thoughts, pro...
Position 756 · Loss: 6.966 nats (10.050 bits) · P(correct): 0.09%
Rank · Predicted · Prob
1 · " the" · 14.0%
2 · " back" · 10.9%
3 · " to" · 9.6%
4 · " a" · 9.6%
5 · " our" · 7.5%
Actual: " their" (rank >5, prob 0.09%)
... you will have full use of the Amn...
Position 1964 · Loss: 6.939 nats (10.010 bits) · P(correct): 0.10%
Rank · Predicted · Prob
1 · " acc" · 64.5%
2 · " cont" · 11.2%
3 · " f" · 3.2%
4 · " p" · 2.2%
5 · " time" · 1.7%
Actual: " use" (rank >5, prob 0.10%)
... letter was to inform the estate...
Position 205 · Loss: 6.931 nats (9.999 bits) · P(correct): 0.10%
Rank · Predicted · Prob
1 · " the" · 30.7%
2 · " J" · 6.9%
3 · " M" · 6.9%
4 · " G" · 5.3%
5 · "ld" · 3.7%
Actual: " in" (rank >5, prob 0.10%)
... informative report possible. Though...
Position 1501 · Loss: 6.914 nats (9.975 bits) · P(correct): 0.10%
Rank · Predicted · Prob
1 · "ing" · 58.3%
2 · "." · 6.2%
3 · " of" · 6.2%
4 · " on" · 4.8%
5 · " that" · 2.9%
Actual: " p" (rank >5, prob 0.10%)
... when a show went just as planned...
Position 992 · Loss: 6.897 nats (9.951 bits) · P(correct): 0.10%
Rank · Predicted · Prob
1 · " out" · 15.0%
2 · " on" · 13.2%
3 · " off" · 8.0%
4 · " down" · 8.0%
5 · " l" · 6.3%
Actual: " just" (rank >5, prob 0.10%)
...amed up -- using Rick Kle...
Position 1120 · Loss: 6.873 nats (9.916 bits) · P(correct): 0.10%
Rank · Predicted · Prob
1 · " and" · 10.6%
2 · " they" · 9.3%
3 · " the" · 8.2%
4 · " to" · 3.9%
5 · " I" · 3.4%
Actual: " us" (rank >5, prob 0.10%)
...cribe me right now. Six d...
Position 551 · Loss: 6.865 nats (9.904 bits) · P(correct): 0.10%
Rank · Predicted · Prob
1 · "." · 47.7%
2 · "," · 13.7%
3 · ":" · 5.0%
4 · " and" · 5.0%
5 · " in" · 2.1%
Actual: " right" (rank >5, prob 0.10%)
..., something or someone would come...
Position 926 · Loss: 6.827 nats (9.849 bits) · P(correct): 0.11%
Rank · Predicted · Prob
1 · " was" · 23.4%
2 · " had" · 5.2%
3 · " about" · 4.6%
4 · " st" · 3.6%
5 · " d" · 3.2%
Actual: " or" (rank >5, prob 0.11%)
...estate a fat check for his pass...
Position 172 · Loss: 6.770 nats (9.767 bits) · P(correct): 0.11%
Rank · Predicted · Prob
1 · "al" · 67.3%
2 · "her" · 11.7%
3 · " t" · 1.6%
4 · "ally" · 1.4%
5 · " c" · 1.2%
Actual: " che" (rank >5, prob 0.11%)
...ing sparkle of pride in the production...
Position 974 · Loss: 6.767 nats (9.763 bits) · P(correct): 0.12%
Rank · Predicted · Prob
1 · " the" · 24.9%
2 · " a" · 8.1%
3 · " our" · 5.5%
4 · " an" · 2.3%
5 · " s" · 2.3%
Actual: " pr" (rank >5, prob 0.12%)
... boarding the "Good Morning...
Position 564 · Loss: 6.746 nats (9.732 bits) · P(correct): 0.12%
Rank · Predicted · Prob
1 · " bus" · 5.7%
2 · " pl" · 5.7%
3 · " a" · 5.7%
4 · " " · 4.4%
5 · " M" · 3.9%
Actual: " "" (rank >5, prob 0.12%)
... the fleeting sparkle of pride...
Position 970 · Loss: 6.704 nats (9.672 bits) · P(correct): 0.12%
Rank · Predicted · Prob
1 · " m" · 9.7%
2 · " s" · 5.9%
3 · " d" · 5.2%
4 · " h" · 3.6%
5 · " f" · 2.8%
Actual: " sp" (rank >5, prob 0.12%)
...and that whatever goes on the air...
Position 1474 · Loss: 6.644 nats (9.585 bits) · P(correct): 0.13%
Rank · Predicted · Prob
1 · " the" · 11.7%
2 · " it" · 8.1%
3 · " ha" · 7.1%
4 · " I" · 7.1%
5 · " we" · 5.5%
Actual: " go" (rank >5, prob 0.13%)
... straight from them their concerns...
Position 724 · Loss: 6.632 nats (9.569 bits) · P(correct): 0.13%
Rank · Predicted · Prob
1 · "." · 36.5%
2 · "," · 25.1%
3 · " what" · 5.6%
4 · " and" · 4.9%
5 · " about" · 4.4%
Actual: " their" (rank >5, prob 0.13%)
... Odyssey from the Newseum in...
Position 631 · Loss: 6.620 nats (9.551 bits) · P(correct): 0.13%
Rank · Predicted · Prob
1 · " T" · 9.3%
2 · " " · 7.3%
3 · "," · 6.4%
4 · " t" · 6.4%
5 · "." · 4.4%
Actual: " from" (rank >5, prob 0.13%)
...e chaos of pre-show pre...
Position 954 · Loss: 6.620 nats (9.551 bits) · P(correct): 0.13%
Rank · Predicted · Prob
1 · " the" · 28.8%
2 · " our" · 8.3%
3 · " a" · 3.9%
4 · " my" · 2.7%
5 · " b" · 2.4%
Actual: " pre" (rank >5, prob 0.13%)
...ing Man Dead George Jo...
Position 21 · Loss: 6.586 nats (9.502 bits) · P(correct): 0.14%
Rank · Predicted · Prob
1 · " In" · 11.0%
2 · "," · 8.5%
3 · " in" · 7.5%
4 · " A" · 6.7%
5 · " at" · 4.6%
Actual: " G" (rank >5, prob 0.14%)
...ared dead, someone would have to p...
Position 327 · Loss: 6.551 nats (9.451 bits) · P(correct): 0.14%
Rank · Predicted · Prob
1 · " I" · 44.9%
2 · " it" · 12.9%
3 · " is" · 3.7%
4 · " and" · 3.3%
5 · " you" · 2.5%
Actual: " some" (rank >5, prob 0.14%)
..., to hear straight from them their con...
Position 720 · Loss: 6.548 nats (9.447 bits) · P(correct): 0.14%
Rank · Predicted · Prob
1 · "or" · 95.3%
2 · "u" · 1.8%
3 · "r" · 1.1%
4 · "ory" · 0.6%
5 · "ate" · 0.3%
Actual: "ra" (rank >5, prob 0.14%)
... life on a very secure line at N...
Position 1182 · Loss: 6.492 nats (9.366 bits) · P(correct): 0.15%
Rank · Predicted · Prob
1 · " b" · 8.3%
2 · " l" · 7.3%
3 · " t" · 7.3%
4 · " long" · 5.7%
5 · " sm" · 5.7%
Actual: " sec" (rank >5, prob 0.15%)
... the result of a production-wide p...
Position 1603 · Loss: 6.480 nats (9.348 bits) · P(correct): 0.15%
Rank · Predicted · Prob
1 · " l" · 7.4%
2 · " m" · 4.5%
3 · " s" · 4.5%
4 · " c" · 4.0%
5 · " t" · 4.0%
Actual: " produ" (rank >5, prob 0.15%)
... to shave my rail-be...
Position 1590 · Loss: 6.459 nats (9.318 bits) · P(correct): 0.16%
Rank · Predicted · Prob
1 · " he" · 38.3%
2 · " ha" · 20.5%
3 · " f" · 9.7%
4 · " te" · 5.2%
5 · " s" · 4.0%
Actual: " " (rank >5, prob 0.16%)
... hopes to the doorstep of the...
Position 771 · Loss: 6.444 nats (9.297 bits) · P(correct): 0.16%
Rank · Predicted · Prob
1 · " f" · 8.7%
2 · " p" · 6.0%
3 · " wor" · 5.3%
4 · " people" · 5.3%
5 · " re" · 4.6%
Actual: " do" (rank >5, prob 0.16%)
...18 hours straight, som...
Position 918 · Loss: 6.443 nats (9.295 bits) · P(correct): 0.16%
Rank · Predicted · Prob
1 · "," · 34.4%
2 · " a" · 9.8%
3 · " on" · 6.0%
4 · " to" · 6.0%
5 · " in" · 5.3%
Actual: " st" (rank >5, prob 0.16%)
... ours. If you come from a count...
Position 1834 · Loss: 6.436 nats (9.285 bits) · P(correct): 0.16%
Rank · Predicted · Prob
1 · " are" · 21.0%
2 · " have" · 12.7%
3 · "" · 8.8%
4 · "'" · 7.7%
5 · " do" · 6.0%
Actual: " com" (rank >5, prob 0.16%)
...beard, the result of a production...
Position 1599 · Loss: 6.423 nats (9.267 bits) · P(correct): 0.16%
Rank · Predicted · Prob
1 · " s" · 4.2%
2 · " l" · 3.7%
3 · " p" · 3.7%
4 · " f" · 3.3%
5 · " c" · 3.3%
Actual: " res" (rank >5, prob 0.16%)
...om. Or, my personal fav...
Position 1221 · Loss: 6.419 nats (9.260 bits) · P(correct): 0.16%
Rank · Predicted · Prob
1 · " when" · 39.9%
2 · " as" · 6.1%
3 · " in" · 4.8%
4 · " on" · 2.5%
5 · " at" · 2.3%
Actual: " my" (rank >5, prob 0.16%)
... me, it was an appropriate but...
Position 664 · Loss: 6.401 nats (9.235 bits) · P(correct): 0.17%
Rank · Predicted · Prob
1 · " am" · 9.1%
2 · " ex" · 9.1%
3 · "other" · 7.1%
4 · " exper" · 7.1%
5 · " e" · 7.1%
Actual: " app" (rank >5, prob 0.17%)
... beginning the adventure of a l...
Position 597 · Loss: 6.382 nats (9.207 bits) · P(correct): 0.17%
Rank · Predicted · Prob
1 · " day" · 10.5%
2 · " " · 7.2%
3 · " "" · 4.4%
4 · " new" · 4.4%
5 · " t" · 3.4%
Actual: " ad" (rank >5, prob 0.17%)
...orstep of the government --...
Position 777 · Loss: 6.366 nats (9.184 bits) · P(correct): 0.17%
Rank · Predicted · Prob
1 · " wor" · 5.7%
2 · " c" · 4.4%
3 · " Wh" · 3.0%
4 · " m" · 3.0%
5 · " p" · 3.0%
Actual: " go" (rank >5, prob 0.17%)
... go out and ask real people what was on...
Position 705 · Loss: 6.359 nats (9.174 bits) · P(correct): 0.17%
Rank · Predicted · Prob
1 · " the" · 20.0%
2 · " for" · 8.3%
3 · " people" · 6.5%
4 · " qu" · 6.5%
5 · " a" · 6.5%
Actual: " re" (rank >5, prob 0.17%)
... down upon in the office. These are...
Position 1686 · Loss: 6.341 nats (9.148 bits) · P(correct): 0.18%
Rank · Predicted · Prob
1 · " m" · 8.5%
2 · " f" · 6.6%
3 · " p" · 4.5%
4 · " c" · 4.5%
5 · " s" · 4.5%
Actual: " off" (rank >5, prob 0.18%)
... be on top of such things. N...
Position 121 · Loss: 6.336 nats (9.141 bits) · P(correct): 0.18%
Rank · Predicted · Prob
1 · " the" · 26.3%
2 · " his" · 18.1%
3 · " it" · 11.0%
4 · " their" · 9.7%
5 · " that" · 5.2%
Actual: " su" (rank >5, prob 0.18%)
...clares Living Man Dead...
Position 15 · Loss: 6.309 nats (9.102 bits) · P(correct): 0.18%
Rank · Predicted · Prob
1 · "e" · 16.4%
2 · "i" · 12.8%
3 · "oss" · 9.9%
4 · "aw" · 7.7%
5 · "ic" · 4.1%
Actual: "iv" (rank >5, prob 0.18%)
... now I have to go -- have to ret...
Position 1558 · Loss: 6.298 nats (9.086 bits) · P(correct): 0.18%
Rank · Predicted · Prob
1 · " back" · 27.3%
2 · " through" · 11.4%
3 · " to" · 8.9%
4 · " out" · 6.1%
5 · " on" · 6.1%
Actual: " -" (rank >5, prob 0.18%)
... absolute chaos of pre-...
Position 950 · Loss: 6.289 nats (9.073 bits) · P(correct): 0.19%
Rank · Predicted · Prob
1 · " h" · 7.9%
2 · " wor" · 7.9%
3 · " best" · 7.0%
4 · " t" · 7.0%
5 · "ly" · 6.2%
Actual: " ch" (rank >5, prob 0.19%)
... have their fun, when it comes to the...
Position 1513 · Loss: 6.274 nats (9.051 bits) · P(correct): 0.19%
Rank · Predicted · Prob
1 · " they" · 28.0%
2 · " I" · 11.7%
3 · " their" · 10.3%
4 · " it" · 9.1%
5 · " the" · 3.8%
Actual: " when" (rank >5, prob 0.19%)
...ime, it's over. We just wra...
Position 610 · Loss: 6.245 nats (9.010 bits) · P(correct): 0.19%
Rank · Predicted · Prob
1 · " time" · 32.6%
2 · " been" · 6.4%
3 · " h" · 4.4%
4 · " al" · 2.7%
5 · " e" · 2.7%
Actual: " over" (rank >5, prob 0.19%)
...ations, to the fleeting sparkle...
Position 967 · Loss: 6.244 nats (9.008 bits) · P(correct): 0.19%
Rank · Predicted · Prob
1 · "act" · 47.5%
2 · "ear" · 9.4%
3 · "ail" · 5.7%
4 · "un" · 3.0%
5 · "re" · 2.7%
Actual: "le" (rank >5, prob 0.19%)
....C., and for me, it was an...
Position 658 · Loss: 6.235 nats (8.995 bits) · P(correct): 0.20%
Rank  Predicted  Prob
1.  " we"  9.4%
2.  " it"  5.1%
3.  " he"  4.5%
4.  " w"  4.5%
5.  " the"  4.5%
Actual: " for" (rank >5, prob 0.20%)
... every morning. Never was this more o...
Position 1357 · Loss: 6.233 nats (8.992 bits) · P(correct): 0.20%
Rank  Predicted  Prob
1.  " The"  13.8%
2.  " And"  9.5%
3.  " I"  8.3%
4.  " S"  5.7%
5.  " O"  5.1%
Actual: " Ne" (rank >5, prob 0.20%)
... word. But now I have to go -...
Position 1553 · Loss: 6.140 nats (8.858 bits) · P(correct): 0.22%
Rank  Predicted  Prob
1.  " they"  13.3%
2.  " I"  8.1%
3.  " the"  6.3%
4.  ","  4.9%
5.  " it"  3.8%
Actual: " now" (rank >5, prob 0.22%)
... Living Man Dead George...
Position 19 · Loss: 6.100 nats (8.800 bits) · P(correct): 0.22%
Rank  Predicted  Prob
1.  "age"  13.9%
2.  "ag"  9.5%
3.  "u"  8.4%
4.  ""  5.1%
5.  " in"  3.5%
Actual: " De" (rank >5, prob 0.22%)
... have to wash some extremely...
Position 1632 · Loss: 6.092 nats (8.789 bits) · P(correct): 0.23%
Rank  Predicted  Prob
1.  " of"  18.0%
2.  " cl"  5.1%
3.  " ha"  4.0%
4.  "one"  3.5%
5.  " s"  3.5%
Actual: " ex" (rank >5, prob 0.23%)
...asons to miss dragging my...
Position 1699 · Loss: 6.081 nats (8.773 bits) · P(correct): 0.23%
Rank  Predicted  Prob
1.  " the"  38.5%
2.  " this"  16.0%
3.  " it"  3.6%
4.  " my"  3.2%
5.  " a"  2.8%
Actual: " d" (rank >5, prob 0.23%)
... have to go -- have to return to...
Position 1560 · Loss: 6.066 nats (8.751 bits) · P(correct): 0.23%
Rank  Predicted  Prob
1.  " and"  12.7%
2.  " I"  12.7%
3.  " to"  3.6%
4.  " even"  2.5%
5.  " the"  2.5%
Actual: " have" (rank >5, prob 0.23%)
... the train with two roomates. Or...
Position 1164 · Loss: 6.053 nats (8.732 bits) · P(correct): 0.24%
Rank  Predicted  Prob
1.  " of"  16.5%
2.  " other"  14.5%
3.  " people"  5.3%
4.  " fr"  5.3%
5.  " m"  3.3%
Actual: " ro" (rank >5, prob 0.24%)
...ulous game of Monopoly I...
Position 1257 · Loss: 6.035 nats (8.707 bits) · P(correct): 0.24%
Rank  Predicted  Prob
1.  " the"  40.2%
2.  " all"  21.5%
3.  " their"  11.5%
4.  " his"  1.8%
5.  " a"  1.4%
Actual: " M" (rank >5, prob 0.24%)
... Monopoly I've ever se...
Position 1262 · Loss: 6.027 nats (8.694 bits) · P(correct): 0.24%
Rank  Predicted  Prob
1.  "."  13.2%
2.  ","  10.3%
3.  " in"  10.3%
4.  " -"  9.1%
5.  " on"  9.1%
Actual: " I" (rank >5, prob 0.24%)
...ed a letter addressedTo...
Position 67 · Loss: 6.017 nats (8.681 bits) · P(correct): 0.24%
Rank  Predicted  Prob
1.  " from"  41.0%
2.  " of"  17.1%
3.  " in"  4.3%
4.  " s"  3.4%
5.  " that"  3.0%
Actual: " add" (rank >5, prob 0.24%)
... I'll take a train.<s>By...
Position 1779 · Loss: 6.010 nats (8.671 bits) · P(correct): 0.25%
Rank  Predicted  Prob
1.  " b"  11.8%
2.  " fe"  8.1%
3.  " m"  8.1%
4.  " l"  6.3%
5.  " look"  5.6%
Actual: " tra" (rank >5, prob 0.25%)
...ome an International member. Here...
Position 1864 · Loss: 5.995 nats (8.648 bits) · P(correct): 0.25%
Rank  Predicted  Prob
1.  " A"  15.4%
2.  " M"  9.3%
3.  " C"  9.3%
4.  " P"  4.4%
5.  " E"  3.9%
Actual: " m" (rank >5, prob 0.25%)
...ick Klein and me as props -...
Position 1128 · Loss: 5.992 nats (8.645 bits) · P(correct): 0.25%
Rank  Predicted  Prob
1.  " J"  8.3%
2.  " M"  6.4%
3.  " the"  6.4%
4.  " R"  5.7%
5.  " B"  5.0%
Actual: " me" (rank >5, prob 0.25%)
...ake up calls being the main of...
Position 863 · Loss: 5.988 nats (8.638 bits) · P(correct): 0.25%
Rank  Predicted  Prob
1.  ","  37.2%
2.  ")"  10.7%
3.  " and"  9.4%
4.  ")."  5.0%
5.  " to"  3.5%
Actual: " be" (rank >5, prob 0.25%)
... on the train with two roomates. O...
Position 1163 · Loss: 5.980 nats (8.627 bits) · P(correct): 0.25%
Rank  Predicted  Prob
1.  " the"  17.7%
2.  " us"  12.2%
3.  " them"  5.8%
4.  " his"  4.5%
5.  " a"  4.5%
Actual: " two" (rank >5, prob 0.25%)
...yssey from the Newseum in the n...
Position 633 · Loss: 5.974 nats (8.618 bits) · P(correct): 0.25%
Rank  Predicted  Prob
1.  " "  4.5%
2.  " b"  3.5%
3.  " G"  2.7%
4.  " p"  2.7%
5.  " s"  2.4%
Actual: " New" (rank >5, prob 0.25%)
...ons to miss dragging mys...
Position 1700 · Loss: 5.937 nats (8.565 bits) · P(correct): 0.26%
Rank  Predicted  Prob
1.  "anc"  21.0%
2.  "r"  14.4%
3.  "in"  12.7%
4.  "uring"  8.7%
5.  "ance"  7.7%
Actual: "ra" (rank >5, prob 0.26%)
... of our little Odyssey from...
Position 626 · Loss: 5.934 nats (8.560 bits) · P(correct): 0.26%
Rank  Predicted  Prob
1.  " t"  7.7%
2.  " b"  6.0%
3.  " ad"  4.1%
4.  " s"  4.1%
5.  " c"  3.2%
Actual: " O" (rank >5, prob 0.26%)
... their thoughts, problems and h...
Position 761 · Loss: 5.932 nats (8.557 bits) · P(correct): 0.27%
Rank  Predicted  Prob
1.  " their"  16.4%
2.  " and"  11.3%
3.  " fe"  7.8%
4.  " con"  3.2%
5.  " o"  3.2%
Actual: " pro" (rank >5, prob 0.27%)
... Manitoba Public Insur...
Position 146 · Loss: 5.925 nats (8.548 bits) · P(correct): 0.27%
Rank  Predicted  Prob
1.  " had"  35.0%
2.  " was"  16.5%
3.  ""  7.8%
4.  " were"  3.7%
5.  " d"  3.3%
Actual: " P" (rank >5, prob 0.27%)
... the Canadian man received a...
Position 58 · Loss: 5.916 nats (8.535 bits) · P(correct): 0.27%
Rank  Predicted  Prob
1.  " go"  16.7%
2.  " F"  3.7%
3.  " P"  3.7%
4.  " C"  3.3%
5.  " p"  3.3%
Actual: " man" (rank >5, prob 0.27%)
...ometime. We were told to get back...
Position 1754 · Loss: 5.909 nats (8.525 bits) · P(correct): 0.27%
Rank  Predicted  Prob
1.  "'"  51.7%
2.  " can"  6.2%
3.  " will"  4.8%
4.  " have"  4.8%
5.  " m"  4.3%
Actual: " were" (rank >5, prob 0.27%)
... behind the mix up?<s>Ex...
Position 520 · Loss: 5.866 nats (8.462 bits) · P(correct): 0.28%
Rank  Predicted  Prob
1.  "ur"  28.9%
2.  "ind"  12.0%
3.  "at"  8.3%
4.  "ess"  4.4%
5.  "is"  4.4%
Actual: "ix" (rank >5, prob 0.28%)
... been blearily working for 18...
Position 909 · Loss: 5.848 nats (8.436 bits) · P(correct): 0.29%
Rank  Predicted  Prob
1.  " s"  5.1%
2.  " in"  4.5%
3.  " d"  3.5%
4.  " b"  3.5%
5.  " w"  3.5%
Actual: " work" (rank >5, prob 0.29%)
... hard. But far more moving...
Position 1302 · Loss: 5.846 nats (8.433 bits) · P(correct): 0.29%
Rank  Predicted  Prob
1.  " it"  8.5%
2.  " I"  8.5%
3.  " the"  8.5%
4.  ","  4.0%
5.  " we"  3.5%
Actual: " f" (rank >5, prob 0.29%)
... pension and other government ben...
Position 396 · Loss: 5.839 nats (8.423 bits) · P(correct): 0.29%
Rank  Predicted  Prob
1.  " p"  7.5%
2.  " l"  6.6%
3.  " ins"  5.9%
4.  " d"  5.2%
5.  " b"  4.6%
Actual: " go" (rank >5, prob 0.29%)
...merican people to whom they bring the...
Position 1344 · Loss: 5.838 nats (8.423 bits) · P(correct): 0.29%
Rank  Predicted  Prob
1.  "day"  14.0%
2.  " the"  6.6%
3.  " be"  5.2%
4.  " make"  4.0%
5.  " see"  3.5%
Actual: " wh" (rank >5, prob 0.29%)
...wide pact to not shave for the...
Position 1611 · Loss: 5.814 nats (8.387 bits) · P(correct): 0.30%
Rank  Predicted  Prob
1.  " c"  4.7%
2.  " br"  4.7%
3.  " make"  4.1%
4.  " re"  4.1%
5.  " p"  3.2%
Actual: " not" (rank >5, prob 0.30%)
... to not shave for the duration of...
Position 1615 · Loss: 5.804 nats (8.373 bits) · P(correct): 0.30%
Rank  Predicted  Prob
1.  " my"  65.1%
2.  " the"  6.9%
3.  " a"  2.9%
4.  " your"  2.0%
5.  " it"  1.7%
Actual: " for" (rank >5, prob 0.30%)
... dance parties that erupted on...
Position 1664 · Loss: 5.799 nats (8.366 bits) · P(correct): 0.30%
Rank  Predicted  Prob
1.  " I"  12.9%
2.  " are"  8.9%
3.  " we"  5.4%
4.  " have"  5.4%
5.  " were"  4.7%
Actual: " " (rank >5, prob 0.30%)
...unities. We cant do it without...
Position 1983 · Loss: 5.786 nats (8.347 bits) · P(correct): 0.31%
Rank  Predicted  Prob
1.  " also"  14.8%
2.  " help"  14.8%
3.  " prov"  5.4%
4.  " off"  3.3%
5.  "n"  2.9%
Actual: "" (rank >5, prob 0.31%)
... the confusion for confidentially...
Position 491 · Loss: 5.777 nats (8.335 bits) · P(correct): 0.31%
Rank  Predicted  Prob
1.  "."  75.8%
2.  ","  2.6%
3.  " that"  2.3%
4.  " in"  1.6%
5.  " and"  1.6%
Actual: " for" (rank >5, prob 0.31%)
...is and Sam are professionals in...
Position 1536 · Loss: 5.764 nats (8.316 bits) · P(correct): 0.31%
Rank  Predicted  Prob
1.  " the"  9.2%
2.  " all"  5.6%
3.  "n"  3.8%
4.  " not"  3.4%
5.  " a"  3.0%
Actual: " pro" (rank >5, prob 0.31%)
...urance Company Declares L...
Position 9 · Loss: 5.758 nats (8.307 bits) · P(correct): 0.32%
Rank  Predicted  Prob
1.  " of"  17.2%
2.  ","  4.4%
3.  " L"  4.4%
4.  " A"  3.9%
5.  " S"  3.0%
Actual: " De" (rank >5, prob 0.32%)
... feeling that the spontaneous d...
Position 1654 · Loss: 5.727 nats (8.262 bits) · P(correct): 0.33%
Rank  Predicted  Prob
1.  " t"  5.8%
2.  " new"  5.1%
3.  " p"  3.5%
4.  " m"  3.1%
5.  " f"  3.1%
Actual: " sp" (rank >5, prob 0.33%)
... me right now. Six days after bo...
Position 555 · Loss: 5.711 nats (8.239 bits) · P(correct): 0.33%
Rank  Predicted  Prob
1.  "o"  26.3%
2.  "ome"  16.0%
3.  "om"  14.1%
4.  "he"  3.6%
5.  "in"  2.8%
Actual: "ix" (rank >5, prob 0.33%)
... to return to "normal"...
Position 1566 · Loss: 5.697 nats (8.219 bits) · P(correct): 0.34%
Rank  Predicted  Prob
1.  " the"  34.2%
2.  " my"  7.6%
3.  " work"  5.3%
4.  " W"  4.6%
5.  " a"  3.6%
Actual: " "" (rank >5, prob 0.34%)
... to do it again sometime. We...
Position 1748 · Loss: 5.684 nats (8.200 bits) · P(correct): 0.34%
Rank  Predicted  Prob
1.  "."  23.8%
2.  ","  9.9%
3.  " in"  9.9%
4.  " ne"  9.9%
5.  " so"  6.8%
Actual: " s" (rank >5, prob 0.34%)
...es up. But maybe we'll...
Position 1734 · Loss: 5.663 nats (8.171 bits) · P(correct): 0.35%
Rank  Predicted  Prob
1.  " I"  18.9%
2.  " the"  6.2%
3.  " it"  5.4%
4.  ","  4.2%
5.  " that"  3.7%
Actual: " may" (rank >5, prob 0.35%)
... Tour train, and beginning the...
Position 591 · Loss: 5.656 nats (8.160 bits) · P(correct): 0.35%
Rank  Predicted  Prob
1.  " I"  51.9%
2.  " we"  6.2%
3.  " my"  4.8%
4.  " the"  4.8%
5.  " a"  3.3%
Actual: " and" (rank >5, prob 0.35%)
... they discuss the content of the ne...
Position 1397 · Loss: 5.640 nats (8.137 bits) · P(correct): 0.36%
Rank  Predicted  Prob
1.  "ed"  98.5%
2.  " the" ← actual  0.4%
3.  "ing"  0.2%
4.  " a"  0.1%
5.  " their"  0.1%
... the anchors was their care for the A...
Position 1333 · Loss: 5.605 nats (8.086 bits) · P(correct): 0.37%
Rank  Predicted  Prob
1.  " the"  33.1%
2.  " a"  7.4%
3.  " that"  4.0%
4.  " S"  2.7%
5.  " not"  2.1%
Actual: " their" (rank >5, prob 0.37%)
...in. Or when Diane, Rob...
Position 1103 · Loss: 5.582 nats (8.053 bits) · P(correct): 0.38%
Rank  Predicted  Prob
1.  " we"  26.4%
2.  " the"  20.6%
3.  " I"  14.1%
4.  " it"  5.2%
5.  " a"  3.1%
Actual: " D" (rank >5, prob 0.38%)
... his life on a very secure line at...
Position 1181 · Loss: 5.575 nats (8.043 bits) · P(correct): 0.38%
Rank  Predicted  Prob
1.  " m"  11.1%
2.  " tra"  8.6%
3.  " t"  5.2%
4.  " ro"  4.6%
5.  " b"  3.2%
Actual: " very" (rank >5, prob 0.38%)
...C., and for me, it was an app...
Position 659 · Loss: 5.571 nats (8.038 bits) · P(correct): 0.38%
Rank  Predicted  Prob
1.  " the"  23.5%
2.  "m"  8.7%
3.  " a"  8.7%
4.  "t"  7.6%
5.  "g"  5.9%
Actual: " me" (rank >5, prob 0.38%)
... when we had been blearily working for...
Position 906 · Loss: 5.542 nats (7.995 bits) · P(correct): 0.39%
Rank  Predicted  Prob
1.  "o"  18.9%
2.  "rough"  7.9%
3.  "om"  7.0%
4.  "itt"  6.1%
5.  "ook"  5.4%
Actual: "le" (rank >5, prob 0.39%)
... crazy, grueling and comp...
Position 1011 · Loss: 5.540 nats (7.993 bits) · P(correct): 0.39%
Rank  Predicted  Prob
1.  " and"  24.3%
2.  " but"  5.4%
3.  " to"  2.9%
4.  " s"  2.9%
5.  " the"  2.0%
Actual: " g" (rank >5, prob 0.39%)
... their care for the American people to...
Position 1338 · Loss: 5.538 nats (7.989 bits) · P(correct): 0.39%
Rank  Predicted  Prob
1.  " c"  4.8%
2.  " people"  4.8%
3.  " m"  4.2%
4.  " t"  4.2%
5.  " p"  3.3%
Actual: " A" (rank >5, prob 0.39%)
... bring the news every morning. Ne...
Position 1352 · Loss: 5.533 nats (7.982 bits) · P(correct): 0.40%
Rank  Predicted  Prob
1.  "."  35.6%
2.  ","  13.1%
3.  " and"  10.2%
4.  " to"  10.2%
5.  " -"  3.3%
Actual: " every" (rank >5, prob 0.40%)
... that cramped studio on rail...
Position 1715 · Loss: 5.532 nats (7.981 bits) · P(correct): 0.40%
Rank  Predicted  Prob
1.  ","  9.0%
2.  " tra"  8.0%
3.  " l"  5.5%
4.  " ro"  4.8%
5.  " t"  3.8%
Actual: " stud" (rank >5, prob 0.40%)
... to share his tiny room on the...
Position 1154 · Loss: 5.525 nats (7.971 bits) · P(correct): 0.40%
Rank  Predicted  Prob
1.  "al"  35.9%
2.  "ri"  27.9%
3.  "our"  11.6%
4.  "ear"  2.3%
5.  "re"  1.8%
Actual: "in" (rank >5, prob 0.40%)
...ersweet because I honestly did...
Position 808 · Loss: 5.511 nats (7.951 bits) · P(correct): 0.40%
Rank  Predicted  Prob
1.  " was"  22.1%
2.  "'"  8.1%
3.  " had"  6.3%
4.  " th"  4.9%
5.  " f"  4.3%
Actual: " h" (rank >5, prob 0.40%)
...'s over. We just wrapped the...
Position 613 · Loss: 5.508 nats (7.946 bits) · P(correct): 0.41%
Rank  Predicted  Prob
1.  "'"  32.2%
2.  " are"  5.6%
3.  " have"  4.9%
4.  " were"  3.9%
5.  " had"  3.9%
Actual: " just" (rank >5, prob 0.41%)
...s straight, something or some...
Position 922 · Loss: 5.507 nats (7.945 bits) · P(correct): 0.41%
Rank  Predicted  Prob
1.  " I"  28.4%
2.  " we"  17.3%
3.  " the"  13.4%
4.  " it"  9.2%
5.  " there"  4.4%
Actual: " s" (rank >5, prob 0.41%)
...cussion and know firsthand that wh...
Position 1466 · Loss: 5.500 nats (7.936 bits) · P(correct): 0.41%
Rank  Predicted  Prob
1.  " to"  22.3%
2.  " I"  3.0%
3.  " the"  3.0%
4.  ","  2.4%
5.  " how"  2.4%
Actual: " know" (rank >5, prob 0.41%)
....” Now the 59-year-...
Position 373 · Loss: 5.495 nats (7.928 bits) · P(correct): 0.41%
Rank  Predicted  Prob
1.  "est"  88.7%
2.  "2"  1.4%
3.  "1"  1.1%
4.  "ide"  0.9%
5.  "ve"  0.8%
Actual: "5" (rank >5, prob 0.41%)
...isten in on this discussion and...
Position 1460 · Loss: 5.485 nats (7.913 bits) · P(correct): 0.41%
Rank  Predicted  Prob
1.  " m"  7.3%
2.  " show"  5.7%
3.  "."  4.5%
4.  " t"  4.5%
5.  " p"  3.1%
Actual: " dis" (rank >5, prob 0.41%)
... was dead, but cryptically can...
Position 470 · Loss: 5.477 nats (7.901 bits) · P(correct): 0.42%
Rank  Predicted  Prob
1.  " they"  37.7%
2.  " the"  10.8%
3.  " it"  4.0%
4.  " that"  4.0%
5.  " he"  2.7%
Actual: " c" (rank >5, prob 0.42%)
...ive emails about human rights c...
Position 1880 · Loss: 5.474 nats (7.897 bits) · P(correct): 0.42%
Rank  Predicted  Prob
1.  " the"  17.8%
2.  " how"  8.4%
3.  " your"  8.4%
4.  " our"  6.6%
5.  " up"  4.5%
Actual: " h" (rank >5, prob 0.42%)
... discuss the content of the next...
Position 1398 · Loss: 5.465 nats (7.885 bits) · P(correct): 0.42%
Rank  Predicted  Prob
1.  " new"  7.5%
2.  " l"  4.0%
3.  " p"  3.5%
4.  " f"  3.5%
5.  " c"  3.5%
Actual: " cont" (rank >5, prob 0.42%)
... clothes. I also have a feeling...
Position 1646 · Loss: 5.445 nats (7.856 bits) · P(correct): 0.43%
Rank  Predicted  Prob
1.  " have"  64.1%
2.  "'"  12.6%
3.  " don"  3.6%
4.  " can"  3.2%
5.  " am"  1.7%
Actual: " also" (rank >5, prob 0.43%)
... we pulled off what had never been d...
Position 1063 · Loss: 5.440 nats (7.848 bits) · P(correct): 0.43%
Rank  Predicted  Prob
1.  " the"  39.1%
2.  " our"  30.4%
3.  " a"  12.7%
4.  " an"  2.5%
5.  " some"  1.3%
Actual: " what" (rank >5, prob 0.43%)
...ve my rail-beard, the...
Position 1593 · Loss: 5.423 nats (7.824 bits) · P(correct): 0.44%
Rank  Predicted  Prob
1.  "ro"  45.0%
2.  "s"  24.1%
3.  "ing"  7.8%
4.  " sh"  1.8%
5.  " and"  1.5%
Actual: "-" (rank >5, prob 0.44%)
...lein and me as props -- to...
Position 1130 · Loss: 5.407 nats (7.801 bits) · P(correct): 0.45%
Rank  Predicted  Prob
1.  " the"  16.8%
2.  " their"  11.6%
3.  " a"  9.0%
4.  " our"  7.0%
5.  " gu"  4.3%
Actual: " pro" (rank >5, prob 0.45%)
... you will receive emails about hum...
Position 1876 · Loss: 5.400 nats (7.790 bits) · P(correct): 0.45%
Rank  Predicted  Prob
1.  " a"  28.0%
2.  " an"  13.2%
3.  " your"  7.1%
4.  " the"  7.1%
5.  ":"  4.9%
Actual: " em" (rank >5, prob 0.45%)
... Insurance was giving Joh...
Position 154 · Loss: 5.392 nats (7.778 bits) · P(correct): 0.46%
Rank  Predicted  Prob
1.  "n"  31.9%
2.  " not"  9.2%
3.  " a"  3.8%
4.  " the"  3.0%
5.  " st"  2.6%
Actual: " g" (rank >5, prob 0.46%)
... up -- using Rick Klein and...
Position 1122 · Loss: 5.388 nats (7.773 bits) · P(correct): 0.46%
Rank  Predicted  Prob
1.  " the"  28.3%
2.  " their"  15.1%
3.  " a"  11.8%
4.  " our"  2.3%
5.  " all"  1.8%
Actual: " R" (rank >5, prob 0.46%)
...vince Sam that he was suppos...
Position 1142 · Loss: 5.382 nats (7.764 bits) · P(correct): 0.46%
Rank  Predicted  Prob
1.  " and"  19.6%
2.  "ant"  7.2%
3.  " to"  4.9%
4.  " R"  4.4%
5.  " S"  4.4%
Actual: " that" (rank >5, prob 0.46%)
... Sam that he was supposed to sh...
Position 1145 · Loss: 5.381 nats (7.764 bits) · P(correct): 0.46%
Rank  Predicted  Prob
1.  " the"  10.5%
2.  "n"  9.2%
3.  " go"  7.2%
4.  " a"  5.0%
5.  " in"  3.4%
Actual: " su" (rank >5, prob 0.46%)
...austed and euphoric. Th...
Position 532 · Loss: 5.379 nats (7.761 bits) · P(correct): 0.46%
Rank  Predicted  Prob
1.  "ating"  25.2%
2.  "ag"  9.3%
3.  "l"  8.2%
4.  "nd"  7.2%
5.  "c"  6.4%
Actual: "up" (rank >5, prob 0.46%)
...t express how inspiring it was...
Position 1447 · Loss: 5.375 nats (7.754 bits) · P(correct): 0.46%
Rank  Predicted  Prob
1.  " much"  22.3%
2.  " h"  6.4%
3.  " g"  5.6%
4.  " s"  5.6%
5.  " ex"  3.4%
Actual: " ins" (rank >5, prob 0.46%)
... train. Or when Diane, R...
Position 1102 · Loss: 5.372 nats (7.750 bits) · P(correct): 0.46%
Rank  Predicted  Prob
1.  " the"  47.4%
2.  ","  9.3%
3.  " may"  5.7%
4.  "ig"  3.0%
5.  " a"  2.7%
Actual: " when" (rank >5, prob 0.46%)
... opportunities to take action for hum...
Position 1909 · Loss: 5.369 nats (7.745 bits) · P(correct): 0.47%
Rank  Predicted  Prob
1.  " help"  5.7%
2.  " en"  5.0%
3.  " su"  5.0%
4.  " work"  4.4%
5.  " re"  3.9%
Actual: " take" (rank >5, prob 0.47%)
... Newseum in the nation's cap...
Position 638 · Loss: 5.362 nats (7.736 bits) · P(correct): 0.47%
Rank  Predicted  Prob
1.  " S"  5.7%
2.  " m"  5.7%
3.  " P"  3.9%
4.  " C"  3.5%
5.  " c"  3.5%
Actual: " n" (rank >5, prob 0.47%)
...bration in Massachusetts after...
Position 1052 · Loss: 5.352 nats (7.721 bits) · P(correct): 0.47%
Rank  Predicted  Prob
1.  "an"  12.2%
2.  "ay"  6.5%
3.  "ont"  5.8%
4.  "i"  5.1%
5.  "ad"  4.5%
Actual: "ass" (rank >5, prob 0.47%)
... was to listen in on this discus...
Position 1457 · Loss: 5.347 nats (7.715 bits) · P(correct): 0.48%
Rank  Predicted  Prob
1.  " to"  90.7%
2.  " and"  2.7%
3.  "."  1.9%
4.  ","  0.9%
5.  " in" ← actual  0.5%
... complicated, but most of all, fun...
Position 1021 · Loss: 5.319 nats (7.674 bits) · P(correct): 0.49%
Rank  Predicted  Prob
1.  " it"  11.2%
2.  " I"  7.7%
3.  " the"  7.7%
4.  " not"  3.2%
5.  " also"  2.8%
Actual: " most" (rank >5, prob 0.49%)
... pre-show preparations, to...
Position 959 · Loss: 5.306 nats (7.654 bits) · P(correct): 0.50%
Rank  Predicted  Prob
1.  "s"  12.8%
2.  " t"  6.9%
3.  " p"  3.7%
4.  " to"  3.2%
5.  " m"  2.2%
Actual: " pre" (rank >5, prob 0.50%)
...ramped studio on rails well...
Position 1717 · Loss: 5.304 nats (7.652 bits) · P(correct): 0.50%
Rank  Predicted  Prob
1.  " a"  7.8%
2.  " b"  6.1%
3.  " tra"  5.3%
4.  " l"  5.3%
5.  " t"  5.3%
Actual: " on" (rank >5, prob 0.50%)
... an anchors' meeting where they dis...
Position 1388 · Loss: 5.281 nats (7.619 bits) · P(correct): 0.51%
Rank  Predicted  Prob
1.  " off"  11.6%
2.  " ro"  8.0%
3.  " b"  8.0%
4.  " h"  6.2%
5.  " l"  4.8%
Actual: " me" (rank >5, prob 0.51%)
... supposed to share his tiny...
Position 1150 · Loss: 5.275 nats (7.611 bits) · P(correct): 0.51%
Rank  Predicted  Prob
1.  " be"  46.1%
2.  " have"  4.9%
3.  " go"  3.3%
4.  " get"  3.3%
5.  " do"  2.6%
Actual: " sh" (rank >5, prob 0.51%)
...one would have to present a death c...
Position 333 · Loss: 5.268 nats (7.601 bits) · P(correct): 0.52%
Rank  Predicted  Prob
1.  "ay"  67.5%
2.  "ut"  13.3%
3.  "ass"  9.1%
4.  "ull"  2.3%
5.  "ick"  1.8%
Actual: "res" (rank >5, prob 0.52%)
...ed, cried from laughing...
Position 1290 · Loss: 5.240 nats (7.559 bits) · P(correct): 0.53%
Rank  Predicted  Prob
1.  " out"  22.5%
2.  " and"  12.1%
3.  "."  10.7%
4.  ","  7.3%
5.  " in"  3.9%
Actual: " from" (rank >5, prob 0.53%)
... fun. Some moments I'll...
Position 1030 · Loss: 5.223 nats (7.535 bits) · P(correct): 0.54%
Rank  Predicted  Prob
1.  " of"  26.0%
2.  "one"  22.9%
3.  "h"  20.2%
4.  "w"  5.8%
5.  " people"  5.1%
Actual: " m" (rank >5, prob 0.54%)
...raight from them their concerns about...
Position 725 · Loss: 5.221 nats (7.533 bits) · P(correct): 0.54%
Rank  Predicted  Prob
1.  " own"  7.5%
2.  " m"  6.6%
3.  " th"  5.1%
4.  " p"  5.1%
5.  " st"  4.5%
Actual: " con" (rank >5, prob 0.54%)
... capitol, Washington D...
Position 647 · Loss: 5.219 nats (7.529 bits) · P(correct): 0.54%
Rank  Predicted  Prob
1.  " and"  33.5%
2.  " where"  4.0%
3.  " the"  2.8%
4.  " in"  2.4%
5.  " a"  2.4%
Actual: " W" (rank >5, prob 0.54%)
... it came from his insurance company,...
Position 106 · Loss: 5.216 nats (7.525 bits) · P(correct): 0.54%
Rank  Predicted  Prob
1.  " p"  9.6%
2.  " m"  7.5%
3.  " w"  7.5%
4.  " f"  5.8%
5.  " s"  5.1%
Actual: " ins" (rank >5, prob 0.54%)
...amped studio on rails well be...
Position 1718 · Loss: 5.204 nats (7.507 bits) · P(correct): 0.55%
Rank  Predicted  Prob
1.  " the"  43.7%
2.  " a"  12.5%
3.  "ce"  7.6%
4.  " my"  4.1%
5.  " S"  2.2%
Actual: " " (rank >5, prob 0.55%)
... nation. By ending in Was...
Position 737 · Loss: 5.203 nats (7.506 bits) · P(correct): 0.55%
Rank  Predicted  Prob
1.  " the"  43.7%
2.  " this"  2.5%
3.  " t"  2.2%
4.  " now"  1.9%
5.  " that"  1.7%
Actual: " e" (rank >5, prob 0.55%)
...ing the main offender), but...
Position 869 · Loss: 5.198 nats (7.499 bits) · P(correct): 0.55%
Rank  Predicted  Prob
1.  " the"  34.2%
2.  " my"  30.2%
3.  " our"  8.6%
4.  " this"  3.2%
5.  " it"  3.2%
Actual: "fe" (rank >5, prob 0.55%)
... secure line at Niagara F...
Position 1187 · Loss: 5.195 nats (7.495 bits) · P(correct): 0.55%
Rank  Predicted  Prob
1.  " the"  44.0%
2.  " a"  18.4%
3.  " his"  3.2%
4.  " an"  2.8%
5.  " M"  1.9%
Actual: " N" (rank >5, prob 0.55%)
...- to the people that can do something...
Position 787 · Loss: 5.173 nats (7.463 bits) · P(correct): 0.57%
Rank  Predicted  Prob
1.  " were"  8.9%
2.  " we"  7.8%
3.  " are"  6.9%
4.  " have"  6.1%
5.  " they"  4.2%
Actual: " can" (rank >5, prob 0.57%)
...nding to our tale. Appro...
Position 681 · Loss: 5.172 nats (7.462 bits) · P(correct): 0.57%
Rank  Predicted  Prob
1.  "ri"  57.9%
2.  "our"  35.1%
3.  "re"  2.9%
4.  "en"  0.9%
5.  "al" ← actual  0.6%
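
The three numbers in each card header are one quantity in different units. As a quick check, assuming the reported loss is per-token cross-entropy in nats, the bits figure is the loss divided by ln 2 and P(correct) is exp(-loss). The helper name `describe_loss` is made up for this illustration:

```python
import math


def describe_loss(loss_nats: float) -> tuple[float, float]:
    """Return (loss in bits, probability assigned to the correct token)."""
    loss_bits = loss_nats / math.log(2)  # 1 nat = 1/ln(2) bits
    p_correct = math.exp(-loss_nats)     # cross-entropy is -ln p(correct)
    return loss_bits, p_correct


# First card in the list: Position 705, loss 6.359 nats.
bits, p = describe_loss(6.359)
print(f"{bits:.3f} bits, P(correct) = {p:.2%}")
# prints: 9.174 bits, P(correct) = 0.17%
```

Small last-digit differences on other cards are expected, since the displayed nats are already rounded.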

Side-by-Side Generation

50-token prompt, 200-token continuation (temp=0.8, seed=42). Highlighted = token matches at same position.

Position matches: 3/200 (1.5%) · Unique token overlap: 48 (39% of real)
Prompt (50 tokens)
<s>Insurance Company Declares Living Man Dead George Johannesen is very much alive. Which is why it was so surpr
Real continuation
ising when the Canadian man received a letter addressedTo the Estate of George Johannesen.” Even more surprising is that it came from his insurance company, who should really be on top of such things. Now this wouldnt have been so terrible if Manitoba Public Insurance was giving Johannesens estate a fat check for his passing away. But thats not what happened. Instead the letter was to inform the estate that, since George was dead, his driver license and auto insurance had been cancelled in October
Model generation
ising to be insurance company. Living Man? Maybe. All Living Man? It was a miracle. However, right now, Living Man is still dead. Well, why is he not living? Living man Rachel Drobles is a person whose life is tested by the techniques of the car industry. Like all techniques, the life of Rachel is tested by the astounding things she has become. Therefore, whether it is a mistake, or in-residence, or a mistake happening, is perpetuated. Therefore, Lyon Man is dead. Can Living Man H
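
The two summary metrics can be reproduced in a few lines. This is a sketch, not the script behind the numbers above: `generation_overlap` is a hypothetical helper, and it assumes "39% of real" means shared unique tokens divided by the number of unique tokens in the real continuation.

```python
def generation_overlap(real: list[int], generated: list[int]) -> dict[str, float]:
    """Compare two token-id sequences position by position and as sets."""
    n = min(len(real), len(generated))
    # Tokens that agree at the same index in both continuations.
    position_matches = sum(r == g for r, g in zip(real[:n], generated[:n]))
    real_vocab, gen_vocab = set(real), set(generated)
    shared = real_vocab & gen_vocab
    return {
        "position_matches": position_matches,
        "position_match_rate": position_matches / n if n else 0.0,
        "unique_overlap": len(shared),
        # Assumption: overlap is normalized by the real continuation's vocabulary.
        "unique_overlap_of_real": len(shared) / len(real_vocab) if real_vocab else 0.0,
    }


# Toy token ids, only to show the call shape.
stats = generation_overlap([1, 2, 3, 4], [1, 5, 3, 2])
print(stats)
```

Position matching is a harsh metric for sampled text at temp=0.8, which is why the set-based overlap is reported alongside it.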

Next Steps

How to turn these findings into experiments

1. Mixed-precision GPTQ guided by sensitivity data

The per-matrix sensitivity map translates directly into a bit-budget recipe. As a starting hypothesis, MLP matrices (6,247 × 10⁻⁶ total) get 7–8 bits, while attention matrices (1,600 × 10⁻⁶ total) drop to 4–5 bits. The goal is to recover most of the 0.0083 BPB full-model int6 penalty while keeping the model smaller than it would be at uniformly high precision.
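
To see what such a recipe costs in size, the average bit width can be weighted by parameter count. The sketch below is only arithmetic: the parameter counts are placeholders (the real MLP/attention split of this model is not stated here), the helper names are made up, and quantization scales and zero-points are ignored.

```python
def average_bits(param_counts: dict[str, int], bits: dict[str, int]) -> float:
    """Parameter-weighted mean bit width across matrix groups."""
    total_params = sum(param_counts.values())
    total_bits = sum(param_counts[name] * bits[name] for name in param_counts)
    return total_bits / total_params


def weight_size_mb(param_counts: dict[str, int], bits: dict[str, int]) -> float:
    """Packed weight size in megabytes, ignoring scales and zero-points."""
    total_bits = sum(param_counts[name] * bits[name] for name in param_counts)
    return total_bits / 8 / 1e6


# Placeholder parameter counts, NOT the model's real split.
param_counts = {"mlp": 15_000_000, "attention": 10_000_000}

recipes = {
    "uniform int6": {"mlp": 6, "attention": 6},
    "mixed 7/5": {"mlp": 7, "attention": 5},
    "uniform int8": {"mlp": 8, "attention": 8},
}
for name, bits in recipes.items():
    print(f"{name}: {average_bits(param_counts, bits):.2f} bits/weight, "
          f"{weight_size_mb(param_counts, bits):.2f} MB")
```

With these placeholder counts the mixed recipe lands between uniform int6 and uniform int8; where it actually lands depends on the true parameter split, so this check is worth running on the real shapes before committing to a budget.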

2. Stable-rank-guided bit allocation

Generalize beyond hand-tuned bit budgets: for each matrix, set bits proportional to effective utilization (stable rank / full rank) × dimensionality. Matrices with more active channels need finer precision. This can be computed from a single SVD pass before GPTQ calibration begins.
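
A minimal sketch of the utilization measure, assuming the usual definition of stable rank (squared Frobenius norm over squared spectral norm). The linear mapping from utilization to a bit width in `bits_for` is a hypothetical placeholder, and it leaves out the dimensionality factor mentioned above; the real allocation rule would need tuning.

```python
import numpy as np


def stable_rank(w: np.ndarray) -> float:
    """Squared Frobenius norm divided by squared spectral norm."""
    s = np.linalg.svd(w, compute_uv=False)  # singular values, descending
    return float(np.sum(s ** 2) / s[0] ** 2)


def utilization(w: np.ndarray) -> float:
    """Stable rank as a fraction of the maximum possible rank."""
    return stable_rank(w) / min(w.shape)


def bits_for(w: np.ndarray, min_bits: int = 4, max_bits: int = 8) -> int:
    """Hypothetical linear map from utilization in [0, 1] to a bit width."""
    return int(round(min_bits + utilization(w) * (max_bits - min_bits)))


rng = np.random.default_rng(0)
dense = rng.standard_normal((64, 64))  # many active directions
low_rank = np.outer(rng.standard_normal(64), rng.standard_normal(64))  # one direction
print(bits_for(dense), bits_for(low_rank))
```

Stable rank is at most the true rank and is dominated by the largest singular values, so it ignores the tiny tail directions that inflate condition number.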

3. Layer 10 narrowing experiment

Layer 10 is the most quantization-sensitive layer (+1,057 × 10⁻⁶) but contributes only −0.47 bits/token via logit lens — the least of any layer. Experiment: reduce L10's hidden dimension and measure whether the freed parameter budget can be reallocated to more productive layers. This requires retraining, not just post-hoc pruning.

4. Skip connection value audit

The logit lens shows encoder layers 3–4 sacrifice readability to prepare representations for their skip connections (enc 3→dec 6, enc 4→dec 5). This cost is visible but the benefit is only inferred. Ablation: zero out individual skip connections during eval to directly measure each one's contribution to final BPB. If any skip adds less than the readability it costs, it's a candidate for removal.

5. Scale-conditional architecture decisions

The attention head analysis shows this 27M-parameter model has essentially no in-context learning — only 2 marginal induction heads. This means interventions that rely on in-context copying (retrieval augmentation, few-shot prompting strategies) are premature at this scale. When scaling up, monitor induction head emergence to know when those techniques become viable.
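
One common way to monitor induction heads, which may differ from the analysis used here, is to feed a random token sequence repeated twice and measure how much each head attends from a token in the second copy to the token that followed its first occurrence. A minimal sketch, assuming you can extract one head's attention matrix as nested lists (`induction_score` is a made-up name):

```python
def induction_score(attn: list[list[float]], period: int) -> float:
    """Mean attention from second-copy queries to their induction targets.

    attn[q][k] is the weight from query position q to key position k for a
    sequence of length 2 * period made of random tokens repeated twice.
    """
    n = len(attn)
    if n != 2 * period:
        raise ValueError("expected a sequence of length 2 * period")
    # Query q repeats the token at q - period; the induction target is the
    # position right after that earlier occurrence.
    weights = [attn[q][q - period + 1] for q in range(period, n)]
    return sum(weights) / len(weights)


def uniform_causal(n: int) -> list[list[float]]:
    """Baseline head that spreads attention evenly over visible positions."""
    return [[1.0 / (q + 1) if k <= q else 0.0 for k in range(n)] for q in range(n)]


print(induction_score(uniform_causal(8), period=4))
```

A score near 1 indicates a clean induction head, while a score near the uniform baseline indicates none. Tracking the maximum score over heads across model scales gives a simple emergence curve.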

What to Do Differently

  1. Give MLP more GPTQ bits. MLP is 4× more sensitive than all attention matrices combined (6,247 vs 1,600 × 10⁻⁶), despite Q and Out having up to 3,000× higher condition numbers.
  2. Focus on word prediction. Words account for 94% of bytes and 90% of loss. Rare token types are harder per byte but make up only 10% of total loss.
  3. Don't bother with calibration tuning. ECE = 0.24% across 62M tokens. The bottleneck is accuracy, not confidence.
  4. Investigate narrowing layer 10. It contributes only −0.47 bits/token via logit lens but is the most quantization-sensitive layer. Needs empirical validation.
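
For reference, the ECE figure in item 3 is conventionally a bin-weighted gap between confidence and accuracy. A minimal sketch assuming equal-width confidence bins; the bin count behind the 0.24% figure is not stated, so `n_bins=10` is an assumption:

```python
def expected_calibration_error(
    confidences: list[float],
    correct: list[bool],
    n_bins: int = 10,
) -> float:
    """Weighted mean of |accuracy - confidence| over equal-width bins."""
    counts = [0] * n_bins
    conf_sums = [0.0] * n_bins
    hit_sums = [0] * n_bins
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        counts[idx] += 1
        conf_sums[idx] += conf
        hit_sums[idx] += int(hit)
    total = len(confidences)
    ece = 0.0
    for count, conf_sum, hits in zip(counts, conf_sums, hit_sums):
        if count:
            ece += (count / total) * abs(hits / count - conf_sum / count)
    return ece


# Four predictions at 80% confidence, half of them right: a gap of 0.3.
print(expected_calibration_error([0.8] * 4, [True, True, False, False]))
```

Here `confidences` would be the top-1 probability at each position and `correct` whether the top-1 token matched the target.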

11L GPT, 512d, XSA-all, BigramHash 3072×112, Parallel Muon

Trained seed 314, 2×H100 SXM, 7000 steps · Post-EMA val BPB: 1.1338