Understanding torch.triu: Constructing Upper Triangular Matrices in PyTorch
In deep learning and matrix computations, upper triangular matrices are widely used, where all elements below the main diagonal are set to zero. PyTorch provides the efficient function torch.triu() to generate upper triangular matrices and allows flexible control over which diagonal to retain.
In this blog post, we will explore:
- The basic usage of torch.triu()
- How the second parameter diagonal affects the output
- What torch.triu(all_ones, -1 * 2 + 1) generates
- Practical examples and applications

1. Introduction to torch.triu
1.1 Syntax
torch.triu(input, diagonal=0)

- input: the input tensor (a matrix, or a batch of matrices; triu is applied to the last two dimensions).
- diagonal: specifies which diagonal to retain:
  - diagonal = 0 (default): retains the main diagonal and the elements above it.
  - diagonal > 0: shifts retention upwards.
  - diagonal < 0: shifts retention downwards.

1.2 Example: Default diagonal = 0
import torch

A = torch.tensor([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])
B = torch.triu(A)
print(B)

Output:
tensor([[1, 2, 3],
        [0, 5, 6],
        [0, 0, 9]])

Explanation:

- The main diagonal (1, 5, 9) and the elements above it (2, 3, 6) are retained.
- The lower triangular part (4, 7, 8) is set to 0.

2. Understanding the diagonal Parameter
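Before walking through each offset individually, a minimal sketch that applies several diagonal values to the same matrix makes the shifting behavior easy to see (reusing the 3×3 matrix A from the example above):

```python
import torch

A = torch.tensor([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])

# Apply torch.triu with three different offsets to the same matrix:
# a larger diagonal zeroes out more entries, a smaller one keeps more.
for d in (-1, 0, 1):
    print(f"diagonal={d}:")
    print(torch.triu(A, diagonal=d))
```

Each iteration prints one of the three matrices discussed in the subsections below.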
2.1 diagonal > 0: Shift upwards

B = torch.triu(A, diagonal=1)
print(B)

Output:

tensor([[0, 2, 3],
        [0, 0, 6],
        [0, 0, 0]])

Explanation:
- diagonal=1 retains elements starting one diagonal above the main diagonal.
- The main diagonal (1, 5, 9) is set to 0.
- Only the elements 2, 3, 6 are preserved.

2.2 diagonal < 0: Shift downwards
B = torch.triu(A, diagonal=-1)
print(B)

Output:

tensor([[1, 2, 3],
        [4, 5, 6],
        [0, 8, 9]])

Explanation:
- diagonal=-1 retains elements starting one diagonal below the main diagonal.
- The main diagonal and the upper part remain unchanged.
- The lowest element 7 is set to 0, but 4 and 8 are retained.

3. What does torch.triu(all_ones, -1 * 2 + 1) generate?
Assume:
all_ones = torch.ones(5, 5)
B = torch.triu(all_ones, -1 * 2 + 1)
print(B)

Breaking down the diagonal argument:
-1 * 2 + 1 = -1, so the call is equivalent to torch.triu(all_ones, -1).
The all_ones matrix:

tensor([[1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1]])

torch.triu(all_ones, -1) result:

tensor([[1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1],
        [0, 1, 1, 1, 1],
        [0, 0, 1, 1, 1],
        [0, 0, 0, 1, 1]])

Explanation:
- diagonal=-1 means the main diagonal and the first diagonal below it are retained.
- Everything below the -1 diagonal is set to 0.

4. Applications of torch.triu()
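Alongside torch.triu, PyTorch also provides torch.triu_indices, which returns the coordinates of the upper-triangular entries and is often handy in the same situations. A small sketch, reusing the 3×3 matrix A from earlier:

```python
import torch

# torch.triu_indices(rows, cols, offset) returns a 2 x N tensor holding the
# row and column coordinates of every entry on or above the chosen diagonal.
idx = torch.triu_indices(3, 3, offset=0)

# The index pairs can gather the upper-triangular values of a matrix directly:
A = torch.tensor([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])
upper_values = A[idx[0], idx[1]]
print(upper_values)        # the six retained elements: 1, 2, 3, 5, 6, 9
print(upper_values.sum())  # tensor(26), matching torch.triu(A).sum()
```

This avoids materializing the zeroed-out lower triangle when only the retained values are needed.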
4.1 Generating Attention Masks (Transformers)
In Transformers, upper triangular masks are used to prevent future information leakage during autoregressive decoding:
seq_len = 5
mask = torch.triu(torch.ones(seq_len, seq_len), diagonal=1)
mask = mask.masked_fill(mask == 1, float('-inf'))
print(mask)

Output (mask matrix):
tensor([[ 0., -inf, -inf, -inf, -inf],
        [ 0.,   0., -inf, -inf, -inf],
        [ 0.,   0.,   0., -inf, -inf],
        [ 0.,   0.,   0.,   0., -inf],
        [ 0.,   0.,   0.,   0.,   0.]])

Added to the attention scores before the softmax, this mask ensures that the model can only attend to the current and past tokens.

4.2 Summing the Upper Triangular Matrix
A = torch.tensor([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])
upper_sum = torch.triu(A).sum()
print(upper_sum)  # tensor(26)

Explanation:
Only 1, 2, 3, 5, 6, 9 are retained, and 1 + 2 + 3 + 5 + 6 + 9 = 26.

4.3 Constructing Pascal's Triangle
n = 5
# Start from lower-triangular ones: the zeros above the main diagonal are
# needed so the recurrence below terminates each row correctly.
pascal = torch.tril(torch.ones(n, n))
for i in range(1, n):
    for j in range(1, i + 1):
        pascal[i, j] = pascal[i - 1, j - 1] + pascal[i - 1, j]
print(pascal)

Output:
tensor([[1., 0., 0., 0., 0.],
        [1., 1., 0., 0., 0.],
        [1., 2., 1., 0., 0.],
        [1., 3., 3., 1., 0.],
        [1., 4., 6., 4., 1.]])

5. Conclusion

- torch.triu() constructs upper triangular matrices, setting the elements below the chosen diagonal to zero.
- The diagonal parameter controls which diagonal to retain:
  - diagonal = 0: retains the main diagonal and above.
  - diagonal > 0: shifts upwards, zeroing more elements.
  - diagonal < 0: shifts downwards, keeping more elements.
- torch.triu(all_ones, -1 * 2 + 1) generates an upper triangular matrix with diagonal = -1.
- Common use cases: Transformer attention masks, matrix computations, and constructing Pascal's triangle.

torch.triu() is an essential function for matrix computations and deep learning, making PyTorch code more efficient and readable!
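One detail not covered above: torch.triu is not limited to 2D inputs. On a tensor with more than two dimensions it is applied independently to each trailing 2D matrix, which is convenient for building one causal mask and reusing it across a whole batch of attention scores. A minimal sketch:

```python
import torch

# torch.triu on a batch of matrices acts on each trailing 2D slice
# independently, so every matrix in the batch gets the same pattern.
batch = torch.ones(2, 3, 3)               # a batch of two 3x3 matrices
masked = torch.triu(batch, diagonal=1)
print(masked.shape)                       # torch.Size([2, 3, 3])
print(torch.equal(masked[0], masked[1]))  # True: identical pattern per matrix
```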
Postscript

Completed in Shanghai at 14:50 on February 23, 2025, with the assistance of the GPT-4o large model.