CANN/catlass: Per-Token Dequantization
2026/5/9 19:13:31

Block Epilogue Per Token Dequant

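catlass is the operator template library for CANN. It provides template samples for high-performance matrix multiplication and related fused operators on NPUs. Project address: https://gitcode.com/cann/catlass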

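Code location

Functional description

  • A partial specialization of BlockEpilogue that uses perTokenScale and perChannelScale to dequantize block data per token and per channel.

  • Formula: $blockD_{ij} = blockC_{ij} \cdot perChannelScale_j \cdot perTokenScale_i$

  • Currently supported data types for blockC, perChannelScale, perTokenScale, and blockD: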

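| blockC | perChannelScale | perTokenScale | blockD     |
|--------|-----------------|---------------|------------|
| int32  | half            | half          | half       |
| int32  | bfloat16_t      | bfloat16_t    | bfloat16_t |
| int32  | float           | float         | half       |
| int32  | float           | float         | bfloat16_t |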
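The formula above can be checked against a plain host-side reference. The following is a minimal sketch, not catlass code: the function name `ReferencePerTokenDequant` is hypothetical, and `float` stands in for `half` and `bfloat16_t`, which are not standard C++ types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Host-side reference for per-token / per-channel dequantization.
// blockC is row-major (m x n); perChannelScale has n entries (one per column),
// perTokenScale has m entries (one per row).
// Assumption: float stands in for half / bfloat16_t.
std::vector<float> ReferencePerTokenDequant(
    const std::vector<int32_t>& blockC,
    const std::vector<float>& perChannelScale,
    const std::vector<float>& perTokenScale,
    std::size_t m, std::size_t n)
{
    assert(blockC.size() == m * n);
    assert(perChannelScale.size() == n);
    assert(perTokenScale.size() == m);

    std::vector<float> blockD(m * n);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            // blockD_ij = blockC_ij * perChannelScale_j * perTokenScale_i
            blockD[i * n + j] = static_cast<float>(blockC[i * n + j])
                                * perChannelScale[j] * perTokenScale[i];
        }
    }
    return blockD;
}
```

A reference like this is mainly useful for comparing against device output within a tolerance suited to the output type.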

Dispatch policy

    // For AtlasA2, per token dequant
    template <uint32_t UB_STAGES_>
    struct EpilogueAtlasA2PerTokenDequant {
        using ArchTag = Arch::AtlasA2;
        static constexpr uint32_t UB_STAGES = UB_STAGES_;
    };
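A policy struct of this kind is typically used as a tag that selects a partial specialization. The sketch below illustrates that pattern in standalone C++. Everything in the `sketch` namespace is hypothetical, and the `BufferSlot` helper assumes that UB_STAGES is the number of buffer slots that tiles rotate through, which the source does not state.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {  // hypothetical namespace, not part of catlass

struct AtlasA2Tag {};  // stand-in for Arch::AtlasA2

// Same shape as the policy above: a tag type carrying UB_STAGES.
template <uint32_t UB_STAGES_>
struct PerTokenDequantPolicy {
    using ArchTag = AtlasA2Tag;
    static constexpr uint32_t UB_STAGES = UB_STAGES_;
};

// Primary template: declared only, so an unsupported policy fails at compile time.
template <class DispatchPolicy>
struct BlockEpilogueSketch;

// Partial specialization chosen whenever the policy is PerTokenDequantPolicy<N>.
template <uint32_t UB_STAGES_>
struct BlockEpilogueSketch<PerTokenDequantPolicy<UB_STAGES_>> {
    static constexpr uint32_t UB_STAGES = UB_STAGES_;

    // Assumption: with N stages, consecutive tiles rotate through N buffer slots.
    static constexpr uint32_t BufferSlot(uint32_t tileIdx) { return tileIdx % UB_STAGES; }
};

}  // namespace sketch
```

With two stages, tiles 0, 1, 2 map to slots 0, 1, 0, which is the usual double-buffering arrangement.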

Usage example

Block assembly

See example 12_quant_matmul.

    // ArchTag and CType are not defined in this snippet; they come from the surrounding example.
    constexpr uint32_t ubStages = 2;
    using EpilogueDispatchPolicy = Epilogue::EpilogueAtlasA2PerTokenDequant<ubStages>;
    using ScaleType = Gemm::GemmType<bfloat16_t, layout::VectorLayout>;
    using PerTokenScaleType = Gemm::GemmType<bfloat16_t, layout::VectorLayout>;
    using DType = Gemm::GemmType<bfloat16_t, layout::RowMajor>;

    using RowBroadcastMulType = Gemm::GemmType<float, layout::RowMajor>;
    using BroadcastOneBlkType = Gemm::GemmType<float, layout::RowMajor>;
    using OneBlkColumnBroadcastMulType = Gemm::GemmType<float, layout::RowMajor>;

    using EpilogueTileShape = MatrixShape<32, 256>;
    using TileRowBroadcastMul = Epilogue::Tile::TileRowBroadcastMul<ArchTag, RowBroadcastMulType, EpilogueTileShape>;
    using TileBroadcastOneBlk = Epilogue::Tile::TileBroadcastOneBlk<ArchTag, BroadcastOneBlkType, EpilogueTileShape::ROW>;
    using TileOneBlkColumnBroadcastMul = Epilogue::Tile::TileOneBlkColumnBroadcastMul<ArchTag, OneBlkColumnBroadcastMulType, EpilogueTileShape>;
    using TileCopy = Epilogue::Tile::TileCopy<ArchTag, CType, ScaleType, PerTokenScaleType, DType>;
    using TileScheduler = Epilogue::Tile::EpilogueHorizontalTileSwizzle;

    using BlockEpilogue = Epilogue::Block::BlockEpilogue<
        EpilogueDispatchPolicy,        // selected epilogue dispatch policy
        CType,                         // type of the block before dequantization
        ScaleType,                     // type of perChannelScale
        PerTokenScaleType,             // type of perTokenScale
        DType,                         // type of the block after dequantization
        TileRowBroadcastMul,           // tile component: broadcasts the (1, n) scale to (m, n) and multiplies it with the block
        TileBroadcastOneBlk,           // tile component: broadcasts the (m, 1) perTokenScale to (m, 32B)
        TileOneBlkColumnBroadcastMul,  // tile component: broadcasts the (m, 32B) perTokenScale to (m, n) and multiplies it with the block
        TileCopy,                      // tileCopy component
        TileScheduler                  // tile partitioning scheduler
    >;
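The three tile components described in the comments can be read as a three-step computation on one tile. The following host-side sketch shows one plausible reading of that sequence in plain C++. The function names only mirror the component names, the loops are illustrative rather than the actual vector instructions, and the sketch assumes 4-byte `float` elements, so one 32-byte block holds 8 of them.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: float is 4 bytes, so a 32-byte block holds 8 elements.
constexpr std::size_t kBlkElems = 32 / sizeof(float);

// Step 1: multiply every row of the (m, n) tile by the (1, n) per-channel scale.
void RowBroadcastMul(std::vector<float>& tile, const std::vector<float>& scale,
                     std::size_t m, std::size_t n)
{
    assert(tile.size() == m * n && scale.size() == n);
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t j = 0; j < n; ++j)
            tile[i * n + j] *= scale[j];
}

// Step 2: expand the (m, 1) per-token scale to (m, 32B), one 32-byte block per row.
std::vector<float> BroadcastOneBlk(const std::vector<float>& perToken, std::size_t m)
{
    assert(perToken.size() == m);
    std::vector<float> blk(m * kBlkElems);
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t k = 0; k < kBlkElems; ++k)
            blk[i * kBlkElems + k] = perToken[i];
    return blk;
}

// Step 3: multiply each row of the tile by its 32-byte block, reusing the block across all columns.
void OneBlkColumnBroadcastMul(std::vector<float>& tile, const std::vector<float>& blk,
                              std::size_t m, std::size_t n)
{
    assert(tile.size() == m * n && blk.size() == m * kBlkElems);
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t j = 0; j < n; ++j)
            tile[i * n + j] *= blk[i * kBlkElems + (j % kBlkElems)];
}
```

Applying the three steps in order to a tile of converted blockC values reproduces blockC_ij * perChannelScale_j * perTokenScale_i for each element.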

Block instantiation

See quant_matmul_multistage_workspace; in the kernel code's void operator()<AscendC::AIV> function:

BlockEpilogue blockEpilogue(resource);

Updating the block params

See quant_matmul_multistage_workspace; in the kernel code's void operator()<AscendC::AIV> function:

    EpilogueParams epilogueParams{
        params.ptrScale,          // GM address of perChannelScale
        layoutScale,              // layout of perChannelScale
        params.ptrPerTokenScale,  // GM address of perTokenScale
        layoutPerTokenScale,      // layout of perTokenScale
        params.ptrD,              // GM address of the output matrix
        layoutD                   // layout of the output matrix
    };
    blockEpilogue.UpdateParams(epilogueParams);

Block execution

See basic_matmul; in the kernel code's void operator()<AscendC::AIC> function:

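    blockEpilogue(
        blockShapeMNK,        // shape of the block
        blockCoordMNK,        // coordinate of the block in the output matrix (in units of blocks)
        actualBlockShapeMNK,  // actual shape of the block to process
        gmBlockC,             // start address in GM of the block to process
        layoutBlockC          // layout of the block to process
    );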
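The call distinguishes the nominal block shape from the actual shape of the block being processed. One common way such an actual shape arises is clamping at the matrix edge. The sketch below shows this for the M and N dimensions only. `ShapeMN` and `ActualBlockShape` are hypothetical names, and the kernel may derive the value differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a shape or coordinate in the M and N dimensions.
struct ShapeMN {
    uint32_t m;
    uint32_t n;
};

// Clamp the nominal block shape so that tail blocks do not run past the matrix edge.
// blockCoord is expressed in units of blocks, as in the call above.
ShapeMN ActualBlockShape(ShapeMN problem, ShapeMN block, ShapeMN blockCoord)
{
    uint32_t rowStart = blockCoord.m * block.m;
    uint32_t colStart = blockCoord.n * block.n;
    assert(rowStart < problem.m && colStart < problem.n);
    return ShapeMN{std::min(block.m, problem.m - rowStart),
                   std::min(block.n, problem.n - colStart)};
}
```

For a 300 x 500 output and 128 x 256 blocks, the block at coordinate (2, 1) covers only the 44 x 244 remainder.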

Constraints

  • Currently only RowMajor is supported as the layout of blockC and blockD, and only VectorLayout as the layout of perChannelScale and perTokenScale.
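To make the two layouts concrete, the sketch below shows the element-to-offset mapping they conventionally imply. `RowMajorSketch` and `VectorLayoutSketch` are hypothetical stand-ins, not the catlass `layout::RowMajor` and `layout::VectorLayout` types, whose interfaces are not shown in this document.

```cpp
#include <cassert>
#include <cstddef>

// Row-major matrix: element (row, col) sits at row * stride + col,
// where stride is the distance between consecutive rows (at least cols).
struct RowMajorSketch {
    std::size_t rows;
    std::size_t cols;
    std::size_t stride;

    std::size_t Offset(std::size_t row, std::size_t col) const
    {
        assert(row < rows && col < cols && stride >= cols);
        return row * stride + col;
    }
};

// Vector layout: element i of a one-dimensional scale vector sits at offset i.
struct VectorLayoutSketch {
    std::size_t length;

    std::size_t Offset(std::size_t i) const
    {
        assert(i < length);
        return i;
    }
};
```

Under this reading, perChannelScale is indexed by the column of blockC and perTokenScale by its row.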


Authoring statement: parts of this article were generated with AI assistance (AIGC) and are for reference only.
