CANN Vector Stride and Slicing Constraints
2026/5/9 12:42:34

Vec Stride and Slicing Constraints

cannbot-skills: CANNBot is a series of agents for CANN development aimed at improving development efficiency, and this repository provides reusable Skills modules for it. Project URL: https://gitcode.com/cann/cannbot-skills

Read this file when a vec operation needs to access part of a wider buffer, or when a "narrow" source (e.g. a row-max buffer) must align row by row with a "wide" destination.

Goal

Decide correctly when a vec operation can run continuously over a full buffer versus when it requires sliced views or explicit stride configuration.

1. The alignment problem

Vec operations infer repeat from the destination tensor, and infer strides from each tensor's span/shape. When a wide buffer (e.g. [M, 128]) is paired with a narrow buffer (e.g. [M, 8]), the repeat counts may not align row by row.

For float (C0=8):

  • [M, 128]: span[1]=128 matches neither 8*C0=64 nor C0=8 → default strides (blk=1, rep=8)
  • Each row takes 2 repeats (128 / 64 = 2)
  • [M, 8]: span[1]=8 == C0 → blk=0, rep=1
  • Each row takes 1 repeat from the narrow buffer
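The cadence mismatch above comes down to plain arithmetic. The sketch below restates it in Python. Like the infer_repeat formula in section 4, it assumes that one repeat covers 256 bytes, i.e. 8 blocks of C0 elements. The helper names are hypothetical and are not part of easyasc.

```python
# Hypothetical helpers (not easyasc API): arithmetic behind the per-row cadence.
def c0(dtype_size: int) -> int:
    """Elements per 32-byte block: 8 for float, 16 for half."""
    return 32 // dtype_size


def elems_per_repeat(dtype_size: int) -> int:
    """One repeat = 8 blocks = 256 bytes."""
    return 256 // dtype_size


def wide_repeats_per_row(row_width: int, dtype_size: int) -> int:
    """Repeats a contiguous (blk=1, rep=8) wide operand spends on one row."""
    return row_width // elems_per_repeat(dtype_size)


FLOAT_SIZE = 4
wide_cadence = wide_repeats_per_row(128, FLOAT_SIZE)  # [M, 128] float: 2 repeats per row
narrow_cadence = 1  # [M, 8] with blk=0, rep=1: advances one row per repeat
print(c0(FLOAT_SIZE), elems_per_repeat(FLOAT_SIZE), wide_cadence, narrow_cadence)
# prints: 8 64 2 1
```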

If sub(wide[M,128], wide[M,128], narrow[M,8]) is called directly:

  • repeat = M * 128 / 64 = 2M (from dst)
  • narrow advances 1 row per repeat → after repeat 0 (row 0, first half), narrow moves to row 1
  • row 0's second half gets row 1's value → misaligned!
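The following hypothetical Python model (not easyasc code) traces which narrow row each repeat reads when the unsliced call runs, which shows where the misalignment comes from. It assumes two things. First, dst/src0 use the default contiguous strides (blk=1, rep=8). Second, the narrow source, with blk=0 and rep=1, moves exactly one row per repeat.

```python
def trace_direct_sub(m: int, width: int = 128, dtype_size: int = 4):
    """Hypothetical model: which narrow row each repeat of an unsliced sub reads.

    dst/src0 run contiguously (blk=1, rep=8), so repeat r starts at element r * 64.
    The narrow source (blk=0, rep=1) advances exactly one row per repeat.
    Returns a list of (repeat, dst_row, narrow_row).
    """
    per_repeat = 256 // dtype_size      # 64 floats per repeat
    repeat = m * width // per_repeat    # inferred from dst: 2M for width 128
    trace = []
    for r in range(repeat):
        dst_row = (r * per_repeat) // width  # row of dst this repeat writes
        narrow_row = r                       # row of narrow this repeat reads
        trace.append((r, dst_row, narrow_row))
    return trace


trace = trace_direct_sub(m=4)
for r, dst_row, narrow_row in trace[:4]:
    print(f"repeat {r}: dst row {dst_row}, narrow row {narrow_row}")
# repeat 0: dst row 0, narrow row 0
# repeat 1: dst row 0, narrow row 1   <- row 0's second half uses row 1's value
# repeat 2: dst row 1, narrow row 2
# repeat 3: dst row 1, narrow row 3
```

Only repeat 0 lines up in this model. The repeat count is 2M while the narrow buffer has only M rows. As a result, narrow_row also runs past the last row of the [M, 8] buffer for the second half of the repeats.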

2. Fix: slice the wide buffer to 64-column views

Slicing to [M, 64] creates a view where span[1]=64 == 8*C0:

  • blk=1, rep=shape[1]//C0 (e.g. 128//8=16 for a 128-wide parent)
  • Each row takes 1 repeat → aligns with the narrow buffer's rep=1

    # Correct: sliced views ensure 1 repeat per row
    sub(ub[0:M, 0:64], ub[0:M, 0:64], max_buf)      # first half
    sub(ub[0:M, 64:128], ub[0:M, 64:128], max_buf)  # second half

The slice syntax creates a Tensor view with an updated span and offset while keeping the original shape. Stride auto-inference uses span to select the stride pattern and shape to compute rep_stride, so each repeat correctly skips the full parent row width.
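The sketch below is a small numeric check of the fix. It models the two sliced sub calls on a flat Python list and compares the result with a row-wise reference, in which each element has its own row's maximum subtracted. vec_sub_model is a hypothetical stand-in for the behavior described above, not the easyasc sub API. It processes 64 contiguous floats per repeat, uses a rep_stride of one full parent row, and advances the narrow source one row per repeat.

```python
PER_REPEAT = 64  # floats per repeat: 8 blocks x C0=8


def vec_sub_model(buf, row_max, start, rep_stride_elems, repeat):
    """Hypothetical model of sub(dst, dst, narrow) on a flat float buffer, in place.

    Each repeat touches 64 contiguous elements (blk=1) beginning at
    start + r * rep_stride_elems, and subtracts narrow row r (blk=0 on the
    narrow source means every lane sees the same row value).
    """
    for r in range(repeat):
        base = start + r * rep_stride_elems
        for k in range(PER_REPEAT):
            buf[base + k] -= row_max[r]


M, W = 3, 128
data = [float(i * W + j) for i in range(M) for j in range(W)]
row_max = [max(data[i * W:(i + 1) * W]) for i in range(M)]

buf = list(data)
# ub[0:M, 0:64]: rep = 128 // 8 = 16 blocks = 128 elements per repeat step, repeat = M
vec_sub_model(buf, row_max, start=0, rep_stride_elems=W, repeat=M)
# ub[0:M, 64:128]: same strides, offset by 64 columns
vec_sub_model(buf, row_max, start=64, rep_stride_elems=W, repeat=M)

expected = [data[i * W + j] - row_max[i] for i in range(M) for j in range(W)]
print(buf == expected)  # True
```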

3. When slicing is NOT needed

Purely element-wise operations (no narrow source) can run continuously over the full buffer:

| Operation | Needs slicing? | Reason |
|---|---|---|
| muls(wide, wide, scalar) | No | Scalar broadcasts uniformly |
| exp(wide, wide) | No | Same-shape in-place, no alignment issue |
| cast(half_out, float_in) | No | Same-shape element-wise conversion |
| sub(wide, wide, narrow) | Yes | Narrow source advances 1 row per repeat |
| vmax(dst64, wide_half1, wide_half2) | Yes | Needs column views of a wider buffer |
| brcb(wide, narrow) | Explicit strides | See the brcb section |

Rule: if all source and destination tensors have the same span and are operated on element-wise, no slicing is needed. If any operand has a different (narrower) width, slice the wider operands to match the narrow operand's per-row repeat cadence.
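The rule can be written as a small planning function. This is a hypothetical helper; Operand and plan_slices are not easyasc names. It makes three assumptions. The narrow operand consumes one row per repeat, so every wider operand is cut into 8*C0-column views. The narrowest operand is at most 8*C0 wide. Wide widths are multiples of 8*C0.

```python
from dataclasses import dataclass


@dataclass
class Operand:
    """Hypothetical description of a vec operand (not an easyasc type)."""
    name: str
    width: int = 0          # span[1]; ignored for scalars
    is_scalar: bool = False


def plan_slices(operands, c0=8):
    """Apply the slicing rule; returns {name: [(col_start, col_end), ...]}."""
    tensors = [op for op in operands if not op.is_scalar]  # scalars broadcast uniformly
    widths = {op.width for op in tensors}
    if len(widths) <= 1:
        return {}  # same span everywhere: run continuously over the full buffer
    narrowest = min(widths)
    view = 8 * c0  # one repeat per row
    assert narrowest <= view, "sketch only covers narrow operands up to 8*C0 wide"
    return {
        op.name: [(c, c + view) for c in range(0, op.width, view)]
        for op in tensors
        if op.width > narrowest
    }


print(plan_slices([Operand("dst", 128), Operand("src", 128), Operand("max_buf", 8)]))
# {'dst': [(0, 64), (64, 128)], 'src': [(0, 64), (64, 128)]}
print(plan_slices([Operand("dst", 128), Operand("src", 128), Operand("scale", is_scalar=True)]))
# {}
```

Consider the vmax row of the table. Here plan_slices([Operand("dst64", 64), Operand("wide", 128)]) yields {'wide': [(0, 64), (64, 128)]}, which corresponds to the column views of the wider buffer.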

4. Stride auto-inference rules

From vecutils.infer_strides(tensor) for float (C0=8):

| span[1] | Matches | blk_stride | rep_stride |
|---|---|---|---|
| 64 (= 8×C0) | Yes | 1 | shape[1] // C0 |
| 8 (= C0) | Yes | 0 | shape[1] // C0 |
| other | No | 1 (default) | 8 (default) |

For half (C0=16):

| span[1] | Matches | blk_stride | rep_stride |
|---|---|---|---|
| 128 (= 8×C0) | Yes | 1 | shape[1] // C0 |
| 16 (= C0) | Yes | 0 | shape[1] // C0 |
| other | No | 1 (default) | 8 (default) |

When span[0] == 1 and a match occurred, rep_stride is overridden to 0.
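The two tables and the span[0] == 1 override can be restated as one function. infer_strides_model below is a hypothetical Python restatement of the rules as described here, not the vecutils.infer_strides source. Instead of a Tensor object, it takes span, shape, and element size explicitly.

```python
def infer_strides_model(span, shape, dtype_size):
    """Hypothetical restatement of the stride tables (not the vecutils source).

    Returns (blk_stride, rep_stride) in 32-byte blocks.
    """
    c0 = 32 // dtype_size                    # 8 for float, 16 for half
    if span[1] == 8 * c0:                    # one full repeat per row
        blk, rep, matched = 1, shape[1] // c0, True
    elif span[1] == c0:                      # one block per row (e.g. row-max buffer)
        blk, rep, matched = 0, shape[1] // c0, True
    else:                                    # no match: contiguous defaults
        blk, rep, matched = 1, 8, False
    if matched and span[0] == 1:
        rep = 0                              # single-row view: rep_stride overridden to 0
    return blk, rep


FLOAT, HALF = 4, 2
print(infer_strides_model((64, 128), (64, 128), FLOAT))  # (1, 8)   unsliced wide buffer
print(infer_strides_model((64, 8), (64, 8), FLOAT))      # (0, 1)   narrow row-max buffer
print(infer_strides_model((64, 64), (64, 128), FLOAT))   # (1, 16)  64-column view
print(infer_strides_model((1, 64), (64, 128), FLOAT))    # (1, 0)   single-row view
print(infer_strides_model((64, 16), (64, 16), HALF))     # (0, 1)   half narrow buffer
```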

infer_repeat(tensor) always uses span[0] * span[1] / (256 // dtype.size).
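The following sketch of the repeat formula is a hypothetical restatement, not the vecutils code. It makes two points concrete: repeat depends only on span (never on shape), and for a slice it counts only the sliced region.

```python
def infer_repeat_model(span, dtype_size):
    """Hypothetical restatement: elements in span / elements per 256-byte repeat."""
    return span[0] * span[1] // (256 // dtype_size)


M = 64
cases = {
    "float [M, 128] unsliced": ((M, 128), 4),
    "float [M, 64] view": ((M, 64), 4),
    "float [M, 8] narrow": ((M, 8), 4),
    "half [M, 128]": ((M, 128), 2),
}
for label, (span, size) in cases.items():
    print(f"{label}: repeat = {infer_repeat_model(span, size)}")
```

With M = 64, this prints 128 (2M) for the unsliced float buffer and 64 for the 64-column view. It prints 8 for the narrow buffer and 64 for the half buffer. Repeat is inferred from the destination, so the narrow buffer's own value only matters when it is the dst.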

5. Column slicing via Tensor views

DSL tensor slicing (tensor[row_start:row_end, col_start:col_end]) creates a view with:

  • offset adjusted to the slice start
  • span set to the slice extent
  • shape inherited from the parent (full allocation width)

This means rep_stride = shape[1] // C0 correctly accounts for the full row width, while repeat = span[0] * span[1] // (256 // dtype_size) only covers the sliced region.
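The view bookkeeping can be modeled with a small frozen dataclass. ViewModel is hypothetical and is not easyasc's Tensor class. It assumes row and column slices with step 1, and its rep_stride property only applies to views whose span[1] matches 8*C0 or C0, as in the stride tables. The usage at the end uses a half buffer of shape [32, 256] purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViewModel:
    """Hypothetical stand-in for a DSL Tensor view (not easyasc's Tensor class)."""
    shape: tuple             # parent allocation shape, inherited by every view
    span: tuple              # extent the vec op touches
    offset: tuple = (0, 0)   # start of the view inside the parent
    dtype_size: int = 4      # bytes per element (4 = float, 2 = half)

    def __getitem__(self, key):
        rows, cols = key
        row0, row1, _ = rows.indices(self.span[0])
        col0, col1, _ = cols.indices(self.span[1])
        return ViewModel(
            shape=self.shape,                                     # shape inherited
            span=(row1 - row0, col1 - col0),                      # span = slice extent
            offset=(self.offset[0] + row0, self.offset[1] + col0),  # moved to slice start
            dtype_size=self.dtype_size,
        )

    @property
    def rep_stride(self):
        """Blocks between repeats (matched case): follows the full parent row width."""
        return self.shape[1] // (32 // self.dtype_size)

    @property
    def repeat(self):
        """Repeats needed: only the sliced region counts."""
        return self.span[0] * self.span[1] // (256 // self.dtype_size)


ub = ViewModel(shape=(32, 256), span=(32, 256), dtype_size=2)  # hypothetical half buffer
right = ub[0:32, 128:256]
print(right.span, right.shape, right.offset)  # (32, 128) (32, 256) (0, 128)
print(right.rep_stride, right.repeat)         # 16 32
```

Keeping shape on the view is what makes this work. rep_stride reads the parent width, while repeat reads only span.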

Example for ub_data[0:64, 64:128], where ub_data is Tensor(float, [64, 128]):

  • span = [64, 64], shape = [64, 128], offset = [0, 64]
  • blk=1, rep=128//8=16 (skips the full 128-wide row)
  • repeat = 64*64/64 = 64 (one repeat per row)
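The three numbers in this example, plus the start coordinate of each repeat, can be recomputed directly from the slice bounds. This is a plain-Python check under the float assumptions used throughout (C0=8, 64 elements per repeat). It does not call into easyasc.

```python
C0 = 8                    # float: 8 elements per 32-byte block
PARENT_SHAPE = (64, 128)  # ub_data = Tensor(float, [64, 128])
row_start, row_end, col_start, col_end = 0, 64, 64, 128  # ub_data[0:64, 64:128]

span = (row_end - row_start, col_end - col_start)
offset = (row_start, col_start)
assert span[1] == 8 * C0                  # 64 == 8*C0: the "one repeat per row" rule applies
blk, rep = 1, PARENT_SHAPE[1] // C0       # rep = 128 // 8 = 16 blocks (one full parent row)
repeat = span[0] * span[1] // (8 * C0)    # 64 elements per float repeat

# (row, col) inside the parent where each repeat starts.
flat_start = offset[0] * PARENT_SHAPE[1] + offset[1]
starts = [divmod(flat_start + r * rep * C0, PARENT_SHAPE[1]) for r in range(repeat)]

print(span, offset, blk, rep, repeat)  # (64, 64) (0, 64) 1 16 64
print(starts[:3])                      # [(0, 64), (1, 64), (2, 64)]
```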

Files to study

  • easyasc/stub_functions/vec/vecutils.py: stride inference logic
  • easyasc/utils/Tensor.py: slice/view creation
  • agent/example/kernels/a2/flash_attn_score.py: practical use of sliced sub alongside continuous exp/cast


Authorship note: parts of this article were generated with AI assistance (AIGC) and are provided for reference only.
