#13609: Uplift dram and l1 allocators to use dram/l1 specific alignment #13762

abhullar-tt · 2024-10-11T21:17:02Z

Ticket

Problem description

Using the max of the DRAM and L1 alignments for both DRAM and L1 buffers was causing PCC mismatches in i2s (interleaved-to-sharded) and s2i (sharded-to-interleaved).

What's changed

Use L1-specific and DRAM-specific alignment for the respective allocations. This will require some ops to be uplifted to handle re-alignment.
@yugaoTT and @ntarafdar to add the corresponding op changes.
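
As a rough illustration of the change described above (not the actual tt-metal allocator code), the sketch below picks an alignment per buffer type instead of taking the maximum of the two. The enum, the function names, and the 32 B / 16 B values are assumptions chosen for illustration; the real values depend on the architecture.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-ins for illustration; not the tt-metal definitions.
enum class BufferType { DRAM, L1 };

// Assumed values: 32 B for DRAM, 16 B for L1.
constexpr uint32_t kDramAlignment = 32;
constexpr uint32_t kL1Alignment = 16;

// Old behaviour: one alignment (the max of both) for every buffer.
constexpr uint32_t shared_alignment(BufferType) {
    return std::max(kDramAlignment, kL1Alignment);
}

// New behaviour: the alignment depends on where the buffer lives.
constexpr uint32_t allocator_alignment(BufferType type) {
    return type == BufferType::DRAM ? kDramAlignment : kL1Alignment;
}
```

Under these assumed values, an L1 buffer was previously laid out with 32 B alignment and would now use 16 B, which is why ops that hard-code the shared value may need to be uplifted.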

Checklist

The post-commit runs below were triggered on 12/03.


Because he asked me to

bbradelTT · 2024-12-20T16:37:45Z

ttnn/cpp/ttnn/operations/data_movement/pad/device/pad_program_factory.cpp

@@ -1653,7 +1653,7 @@ operation::ProgramWithCallbacks pad_rm_sharded_width_only(

    // FIXME: assumes that this was sharded using DRAM alignment so that gaps are left in the tensor.


Should this comment be updated as well?

bbradelTT · 2024-12-20T16:42:39Z

tech_reports/prog_examples/shard_data_rm/shard_data_rm.md

@@ -63,7 +63,7 @@ uint32_t shard_size = shard_height * shard_width;
 uint32_t input_unit_size = sizeof(uint32_t);
 uint32_t shard_width_bytes = shard_width * data_size;
 uint32_t num_units_per_row = shard_width * input_unit_size;
-uint32_t padded_offset_bytes = align(input_unit_size, device->get_allocator_alignment());
+uint32_t padded_offset_bytes = align(input_unit_size, device->get_allocator_alignment(BufferType::L1));


It might be good to add more info about when you need different alignments, when the infra uses them, and why.
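
For context on the changed line, here is a minimal sketch of the padding it computes, assuming `align` rounds a size up to the next multiple of a power-of-two alignment and assuming 16 B for L1 and 32 B for DRAM. `align_up` is a hypothetical helper written for this example, not the tt-metal `align`.

```cpp
#include <cstdint>

// Hypothetical helper: round `value` up to the next multiple of `alignment`.
// Assumes `alignment` is a power of two.
constexpr uint32_t align_up(uint32_t value, uint32_t alignment) {
    return (value + alignment - 1) & ~(alignment - 1);
}

// A 4-byte unit padded for L1 vs. DRAM under the assumed alignments.
constexpr uint32_t kUnitSize = sizeof(uint32_t);
constexpr uint32_t kPaddedForL1 = align_up(kUnitSize, 16);
constexpr uint32_t kPaddedForDram = align_up(kUnitSize, 32);
```

With these assumptions, the same 4-byte unit is padded to 16 bytes in L1 and to 32 bytes in DRAM, so code that reads a shard back has to use the alignment of the buffer type it was written to.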

tt-aho · 2024-12-20T17:09:37Z

ttnn/cpp/ttnn/operations/data_movement/transpose/device/transpose_program_factory.cpp

@@ -676,7 +676,7 @@ operation::ProgramWithCallbacks transpose_hc_multi_core(
    // TODO: noc_async_write only require 16B alignment for both DRAM and L1 for Blackhole, so instead of reading in
    // face-lines from C tiles to form a single tile, we can load a single tile and then write out its face-lines to C
    // tiles
-    uint32_t alignment = dst_buffer->buffer_type() == tt::tt_metal::BufferType::DRAM ? DRAM_ALIGNMENT : L1_ALIGNMENT;
+    uint32_t alignment = device->get_allocator_alignment(dst_buffer->buffer_type());


Use dst_buffer->alignment() instead

tt-aho · 2024-12-20T17:13:34Z

ttnn/cpp/ttnn/operations/data_movement/pad/device/pad_program_factory.cpp

@@ -1653,7 +1653,7 @@ operation::ProgramWithCallbacks pad_rm_sharded_width_only(

    // FIXME: assumes that this was sharded using DRAM alignment so that gaps are left in the tensor.
    // if this changes, we should change the stick step to be 16B (L1 alignment).
-    auto dram_alignment_bytes = tt::tt_metal::hal.get_alignment(tt::tt_metal::HalMemType::DRAM);
+    auto dram_alignment_bytes = tt::tt_metal::hal.get_alignment(tt::tt_metal::HalMemType::L1);


Should be renamed to l1_alignment_bytes?

tt_metal/impl/device/device.hpp

tt_metal/impl/device/device.cpp

tt_metal/hw/inc/dataflow_api.h

@@ -921,9 +941,10 @@ struct InterleavedPow2AddrGen {
    const uint32_t bank_base_address;
    const uint32_t log_base_2_of_page_size;  // WARNING: This struct is used for optimized get_noc_addr in which case
                                             // you know that bank_unit_size is a power of 2
-    const uint32_t aligned_log_base_2_of_page_size = this->log_base_2_of_page_size > LOG_BASE_2_OF_ALLOCATOR_ALIGNMENT
+    const uint32_t log_base_2_of_allocator_alignment = interleaved_addr_gen::get_log_base2_of_allocator_alignment<DRAM>();


tt_metal/hw/inc/dataflow_api.h

@@ -1019,9 +1040,10 @@ template <bool DRAM>
 struct InterleavedPow2AddrGenFast {
    uint32_t bank_base_address;             // Base address for the whole tensor.
    const uint32_t log_base_2_of_page_size; // Num bytes in bank unit.
-    const uint32_t aligned_log_base_2_of_page_size = this->log_base_2_of_page_size > LOG_BASE_2_OF_ALLOCATOR_ALIGNMENT
+    const uint32_t log_base_2_of_allocator_alignment = interleaved_addr_gen::get_log_base2_of_allocator_alignment<DRAM>();
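
The diff hunks above are truncated, so the following is only a guess at the shape of the logic: the page-size shift is clamped to at least the log2 of the allocator alignment, which now depends on the `DRAM` template parameter. The values 5 (32 B) and 4 (16 B) and all function names below are illustrative assumptions, not the contents of `dataflow_api.h`.

```cpp
#include <cstdint>

// Assumed log2 alignments: 2^5 = 32 B for DRAM, 2^4 = 16 B for L1.
template <bool DRAM>
constexpr uint32_t log_base_2_of_allocator_alignment() {
    return DRAM ? 5 : 4;
}

// Pages smaller than the alignment still occupy one aligned slot in a bank.
template <bool DRAM>
constexpr uint32_t aligned_log_base_2_of_page_size(uint32_t log_base_2_of_page_size) {
    constexpr uint32_t log_align = log_base_2_of_allocator_alignment<DRAM>();
    return log_base_2_of_page_size > log_align ? log_base_2_of_page_size : log_align;
}

// Byte offset of the n-th page within a bank, using a shift instead of a multiply.
template <bool DRAM>
constexpr uint32_t page_offset_in_bank(uint32_t page_index_in_bank, uint32_t log_base_2_of_page_size) {
    return page_index_in_bank << aligned_log_base_2_of_page_size<DRAM>(log_base_2_of_page_size);
}
```

In this sketch an 8-byte page (log2 = 3) is spaced 32 bytes apart in DRAM but only 16 bytes apart in L1, which is the kind of difference a single shared constant could not express.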


abhullar-tt mentioned this pull request Oct 11, 2024

allocator uses 32B alignment for both DRAM and L1 #13609

Open

abhullar-tt linked an issue Oct 11, 2024 that may be closed by this pull request

allocator uses 32B alignment for both DRAM and L1 #13609

Open

llongTT added 5 commits December 18, 2024 23:44

#13609: take care of sharded padding failure due to DRAM/L1 alignment issue

353945b

#13609: stick to the usage of keep_l1_aligned = True for now

5e0bdda

#13609: switch to i2s/s2i call explicitly to keep l1 aligned

628e010

Merge branch 'main' into abhullar/diff-aligns

740938e

Merge branch 'main' into abhullar/diff-aligns

53c9f09

tt-aho self-requested a review December 19, 2024 21:54

Merge branch 'main' into abhullar/diff-aligns

49aef72

llongTT marked this pull request as ready for review December 20, 2024 16:35

llongTT requested review from ayerofieiev-tt, dmakoviichuk-tt, rfurko-tt, TT-BrianLiu, razorback3, dongjin-na, bbradelTT, ntarafdar, sjameelTT, jaykru-tt, yugi957, jvegaTT and llongTT as code owners December 20, 2024 16:35

bbradelTT approved these changes Dec 20, 2024

tt-aho reviewed Dec 20, 2024

Merge branch 'main' into abhullar/diff-aligns

d8a7c8d

llongTT requested a review from nardoTT as a code owner December 20, 2024 22:00

abhullar-tt and others added 4 commits December 20, 2024 23:05

Add allocator api to get alignment based on all buffer types

0174c05

Merge branch 'main' into abhullar/diff-aligns

6f16c44

#13609: Temporarily skip the failed tests to see if more tests fail

7c46541

#13609: skip more tests

de04fc0
