
Conversation

@2015aroras (Contributor) commented Sep 15, 2025

This PR adds the upcoming Olmo 3. The main architectural differences from Olmo 2 are:

  • Sliding window attention is used for 3 out of 4 layers. RoPE scaling is not applied to sliding window attention layers.

Since the architecture is very similar to Olmo 2, this PR opts to merge Olmo 3 changes into the Olmo 2 implementation (similar to vllm-project/vllm#24534). I can create a separate Olmo 3 implementation instead if preferred.
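
To make the sliding window pattern concrete, here is a minimal sketch of the per-layer decision (my own illustration, not code from this PR; the helper name, the 0-based indexing, and the assumption that every 4th layer is the full-attention one are all hypothetical):

```cpp
// Sketch only: a repeating 4-layer period in which the first three layers use
// sliding window attention and the fourth uses full attention. The real
// pattern/index convention in the model config may differ.
static bool olmo3_layer_is_swa(int il) {
    return (il % 4) < 3; // layers 0,1,2 -> SWA; layer 3 -> full attention; repeat
}

// RoPE scaling (e.g. YaRN) would then be applied only on full-attention layers:
//   float freq_scale_l = olmo3_layer_is_swa(il) ? 1.0f : freq_scale;
```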

@github-actions github-actions bot added the python python script changes label Sep 15, 2025
@2015aroras 2015aroras marked this pull request as ready for review September 15, 2025 20:08
@2015aroras (Contributor Author) commented Sep 15, 2025

I used the model conversion example for testing. I got the following results using bf16 on shanearora/2025-sep-a-base-model, modified to have YaRN RoPE scaling enabled.

📈 METRICS
==============================
MSE (Mean Squared Error):     1.592396e-02
Reference Variance:           6.831117e+00
NMSE:                         2.331092e-03
Max Absolute Error:           0.438750
Mean Absolute Error:          0.116665
NMSE (dB):                    -26.32 dB

🎯 INTERPRETATION
==============================
👍 Good match

📋 GUIDANCE
==============================
👍 GOOD: Your GGML conversion is working well.
   Small differences are likely due to precision/quantization.

📚 NMSE BENCHMARKS
==============================
✅ RESULT: PASS (NMSE = 2.33e-03)

Also, below are the results for allenai/OLMo-2-0425-1B with fp32.

📈 METRICS
==============================
MSE (Mean Squared Error):     1.594746e-03
Reference Variance:           9.219801e+00
NMSE:                         1.729697e-04
Max Absolute Error:           0.168732
Mean Absolute Error:          0.033951
NMSE (dB):                    -37.62 dB

🎯 INTERPRETATION
==============================
👍 Very good match

📋 GUIDANCE
==============================
✅ EXCELLENT: Your GGML conversion is working very well!
   The differences are negligible for practical use.

📚 NMSE BENCHMARKS
==============================
✅ RESULT: PASS (NMSE = 1.73e-04)
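
For reference, the NMSE reported by the example is the MSE divided by the variance of the reference outputs, and the dB figure is 10·log10(NMSE); that matches the numbers above (1.592396e-02 / 6.831117e+00 ≈ 2.33e-03, i.e. about -26.3 dB). A minimal sketch of that computation (hypothetical helper, not the conversion example's actual code):

```cpp
#include <cmath>
#include <vector>

// Sketch: NMSE = MSE(ref, test) / Var(ref); NMSE_dB = 10 * log10(NMSE).
static double nmse(const std::vector<double> & ref, const std::vector<double> & test) {
    double mse = 0.0, mean = 0.0;
    for (size_t i = 0; i < ref.size(); ++i) {
        const double d = ref[i] - test[i];
        mse  += d * d;
        mean += ref[i];
    }
    mse  /= ref.size();
    mean /= ref.size();

    double var = 0.0;
    for (const double r : ref) {
        var += (r - mean) * (r - mean);
    }
    var /= ref.size();

    return mse / var; // in dB: 10.0 * std::log10(mse / var)
}
```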

Comment on lines 12218 to 12234
```cpp
if (is_swa) {
    // For sliding window layers, Olmo3 does not use rope scaling.
    // This is achieved here by setting freq_scale and attn_factor to 1.
    // We also set ext_factor to 0 to avoid a few unnecessary computations.
    Qcur = ggml_rope_ext(
            ctx0, Qcur, inp_pos, nullptr,
            n_rot, rope_type, n_ctx_orig, freq_base, 1.0,
            0.0, 1.0, beta_fast, beta_slow
            );

    Kcur = ggml_rope_ext(
            ctx0, Kcur, inp_pos, nullptr,
            n_rot, rope_type, n_ctx_orig, freq_base, 1.0,
            0.0, 1.0, beta_fast, beta_slow
            );
}
else {
```
Collaborator commented:
Suggested change

```diff
-if (is_swa) {
-    // For sliding window layers, Olmo3 does not use rope scaling.
-    // This is achieved here by setting freq_scale and attn_factor to 1.
-    // We also set ext_factor to 0 to avoid a few unnecessary computations.
-    Qcur = ggml_rope_ext(
-            ctx0, Qcur, inp_pos, nullptr,
-            n_rot, rope_type, n_ctx_orig, freq_base, 1.0,
-            0.0, 1.0, beta_fast, beta_slow
-            );
-    Kcur = ggml_rope_ext(
-            ctx0, Kcur, inp_pos, nullptr,
-            n_rot, rope_type, n_ctx_orig, freq_base, 1.0,
-            0.0, 1.0, beta_fast, beta_slow
-            );
-}
-else {
+if (!is_swa) {
```

@2015aroras (Contributor Author) replied:
This if block is needed. For SWA layers, Olmo2 uses standard RoPE. Removing that if block would remove RoPE on SWA layers entirely.
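
For readers following this exchange, the positional arguments in those ggml_rope_ext calls map onto ggml's rope parameters roughly as below; the annotation is my reading of the API (not part of the PR), but it shows why the hard-coded values disable scaling while still applying standard RoPE:

```cpp
Qcur = ggml_rope_ext(
    ctx0, Qcur, inp_pos, nullptr,
    n_rot, rope_type, n_ctx_orig, freq_base,
    /*freq_scale =*/ 1.0,   // no frequency scaling on SWA layers
    /*ext_factor =*/ 0.0,   // skip the YaRN extrapolation mixing
    /*attn_factor=*/ 1.0,   // no attention magnitude correction
    beta_fast, beta_slow);
```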

@2015aroras (Contributor Author) commented:
6997fad Clarified comment slightly

2015aroras and others added 2 commits September 15, 2025 15:13