A Step-by-Step Tutorial: Training Your First HOI Detection Model on the HICO-Det Dataset (with Code)

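Building an HOI Detection Model from Scratch: A Practical HICO-Det Guide with Code Walkthrough

In computer vision, human-object interaction (HOI) detection has become an active research topic that builds on object detection. Unlike ordinary object detection, HOI detection must not only localize the people and objects in an image but also recognize the interactions between them. This finer-grained visual understanding makes HOI detection promising for applications such as intelligent surveillance, human-computer interaction, and content understanding.

HICO-Det is one of the most widely used HOI detection benchmarks. It covers 80 object categories, 117 verbs, and 600 human-object interaction categories. Beginners facing such a large label space and such varied interactions often do not know where to start. Starting from a concrete interaction such as "a person riding a bicycle", this article walks through the full workflow from data preparation to model training, with PyTorch code for each step.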

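1. Environment Setup and Data Parsing

1.1 Setting Up the Development Environment

Before working with HICO-Det, set up a suitable development environment. Python 3.8+ and PyTorch 1.10+ are recommended; the commands below pin PyTorch 1.10.0 together with the matching torchvision 0.11.1.

```bash
# Create and activate a conda environment
conda create -n hoi python=3.8 -y
conda activate hoi

# Install the core dependencies
pip install torch==1.10.0 torchvision==0.11.1
pip install numpy scipy matplotlib opencv-python
pip install scikit-learn pandas tqdm
```

The annotations are stored in MATLAB format, so we also need packages that can read such files. scipy.io.loadmat reads MAT files saved in formats older than v7.3, while h5py is needed for v7.3 files, which are HDF5 containers:

```bash
pip install h5py scipy
```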
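After installation, a quick sanity check can confirm the interpreter version and that the packages can be found. The following is a minimal sketch using only the standard library; `check_environment` and the `REQUIRED` list are illustrative names introduced here, not part of any package.

```python
import importlib.util
import sys

# Import names of the packages installed above
# (opencv-python is imported as cv2, scikit-learn as sklearn).
REQUIRED = ["torch", "torchvision", "numpy", "scipy", "cv2", "sklearn", "tqdm", "h5py"]


def check_environment(required=REQUIRED, min_python=(3, 8)):
    """Return (python_ok, missing) without importing the heavy packages."""
    python_ok = sys.version_info[:2] >= min_python
    missing = [name for name in required
               if importlib.util.find_spec(name) is None]
    return python_ok, missing


if __name__ == "__main__":
    python_ok, missing = check_environment()
    print("Python version OK:", python_ok)
    print("Missing packages:", missing or "none")
```

An empty list of missing packages only shows that the packages are discoverable; it does not verify their versions.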

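1.2 Downloading and Organizing the HICO-Det Dataset

HICO-Det can be downloaded from the official website. It consists of the following parts:

Image files: 47,776 images (38,118 for training / 9,658 for testing)
Annotation files:
- anno_bbox.mat: bounding boxes and interaction annotations
- list_action.txt: the list of the 600 HOI categories
- README: description of the annotation format

We suggest organizing the data directory as follows:

```
hico-det/
├── images/
│   ├── train2015/
│   └── test2015/
├── annotations/
│   ├── anno_bbox.mat
│   ├── list_action.txt
│   └── README
```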
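Before writing any loading code, it can help to verify that the directory really matches this layout. The sketch below checks for the expected entries with `pathlib`; `missing_entries` and `EXPECTED` are hypothetical helper names, and the paths are simply the ones from the tree above.

```python
from pathlib import Path

# Relative paths taken from the directory layout described above.
EXPECTED = [
    "images/train2015",
    "images/test2015",
    "annotations/anno_bbox.mat",
]


def missing_entries(root, expected=EXPECTED):
    """Return the expected relative paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    # An empty list means every expected entry was found.
    print(missing_entries("hico-det"))
```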

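1.3 Parsing the Annotation File

The HICO-Det annotations are stored in the MATLAB file anno_bbox.mat, which we load with scipy.io. The helper below converts the nested MATLAB structs into plain Python dicts and shifts MATLAB's 1-based class IDs and connection indices to 0-based, so they can be used directly as Python indices. The struct field names used here are assumed from the dataset README; check them against your copy. If your file was saved in MAT v7.3 format, it has to be read with h5py instead.

```python
import numpy as np
from scipy.io import loadmat

def _boxes(b):
    # Struct (array) with fields x1, x2, y1, y2 -> (N, 4) array of [x1, y1, x2, y2]
    return np.array([[s.x1, s.y1, s.x2, s.y2] for s in np.atleast_1d(b)],
                    dtype=np.float32).reshape(-1, 4)

def _to_dict(rec):
    # Shift MATLAB's 1-based class IDs and connection indices to 0-based
    hois = [{'id': int(h.id) - 1, 'invis': int(h.invis),
             'bboxhuman': _boxes(h.bboxhuman), 'bboxobject': _boxes(h.bboxobject),
             'connection': np.asarray(h.connection, dtype=np.int64).reshape(-1, 2) - 1}
            for h in np.atleast_1d(rec.hoi)]
    size = (int(rec.size.width), int(rec.size.height), int(rec.size.depth))
    return {'filename': str(rec.filename), 'size': size, 'hoi': hois}

def load_annotations(anno_path):
    # Field names are assumed from the dataset README; verify them against your copy
    mat = loadmat(anno_path, squeeze_me=True, struct_as_record=False)
    bbox_train = [_to_dict(r) for r in mat['bbox_train']]
    bbox_test = [_to_dict(r) for r in mat['bbox_test']]
    return bbox_train, bbox_test, mat['list_action']

# Example: load the annotations and print the first training sample
bbox_train, _, _ = load_annotations('hico-det/annotations/anno_bbox.mat')
first_sample = bbox_train[0]
print(f"Filename: {first_sample['filename']}")
print(f"Image size: {first_sample['size']}")
print(f"Number of HOIs: {len(first_sample['hoi'])}")
```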

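Structure of an annotation record:

Field	Type	Description
filename	str	Image file name
size	tuple	(width, height, number of channels)
hoi	list	List of interaction annotations
hoi[i].id	int	HOI category ID
hoi[i].invis	int	1 if the interaction is not visible in the image (such entries are skipped), otherwise 0
hoi[i].bboxhuman	list	Human bounding boxes [x1,y1,x2,y2]
hoi[i].bboxobject	list	Object bounding boxes [x1,y1,x2,y2]
hoi[i].connection	list	Index pairs (human index, object index) linking the boxes that interact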
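The fields in this table can be checked for consistency before training. The sketch below validates one record, assuming it has already been converted to a plain Python dict with the field names from the table and with 0-based connection indices (both are assumptions of this example; the raw MATLAB file uses 1-based indices). `validate_record` and the sample record are made up for illustration.

```python
def validate_record(record):
    """Return a list of human-readable problems found in one annotation record."""
    width, height, _ = record["size"]
    problems = []
    for k, hoi in enumerate(record["hoi"]):
        if hoi.get("invis", 0) == 1:
            continue  # invisible interactions are skipped
        for key in ("bboxhuman", "bboxobject"):
            for x1, y1, x2, y2 in hoi[key]:
                inside = 0 <= x1 < x2 <= width and 0 <= y1 < y2 <= height
                if not inside:
                    problems.append(f"hoi[{k}].{key}: bad box {[x1, y1, x2, y2]}")
        num_h, num_o = len(hoi["bboxhuman"]), len(hoi["bboxobject"])
        for h_idx, o_idx in hoi["connection"]:
            if not (0 <= h_idx < num_h and 0 <= o_idx < num_o):
                problems.append(f"hoi[{k}].connection: ({h_idx}, {o_idx}) out of range")
    return problems


# Hand-written example record (not taken from the dataset).
record = {
    "filename": "example.jpg",
    "size": (640, 480, 3),
    "hoi": [{
        "id": 0,
        "invis": 0,
        "bboxhuman": [[100, 50, 300, 400]],
        "bboxobject": [[120, 200, 700, 460]],  # x2 exceeds the image width
        "connection": [[0, 0]],
    }],
}
print(validate_record(record))
```

Running such a check over all records once is a cheap way to catch indexing mistakes (for example, forgetting the 1-based to 0-based shift) before they surface as training errors.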

2. Building the HOI Data Loader

2.1 Designing the Dataset Class

We create a class that inherits from torch.utils.data.Dataset to serve HICO-Det samples. Each sample holds the image plus one (human box, object box, HOI class) triplet per annotated pair:

```python
import os

import cv2
import numpy as np
import torch
from torch.utils.data import Dataset


class HICODetDataset(Dataset):
    def __init__(self, root_dir, annotations, transform=None, split='train2015'):
        self.root_dir = root_dir
        self.annotations = annotations
        self.transform = transform
        self.split = split  # image subfolder: 'train2015' or 'test2015'
        self.actions = self._load_action_list(
            os.path.join(root_dir, 'annotations', 'list_action.txt'))

    def _load_action_list(self, path):
        with open(path) as f:
            return [line.strip() for line in f]

    def __len__(self):
        return len(self.annotations)

    def __getitem__(self, idx):
        anno = self.annotations[idx]
        img_path = os.path.join(self.root_dir, 'images', self.split, anno['filename'])
        image = cv2.imread(img_path)
        if image is None:
            raise FileNotFoundError(img_path)
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

        # Collect one triplet per annotated pair. Class IDs and connection
        # indices are expected to be 0-based at this point.
        human_boxes, object_boxes, interactions = [], [], []
        for hoi in anno['hoi']:
            if hoi['invis'] == 1:  # interaction not visible: nothing to use
                continue
            for human_idx, object_idx in hoi['connection']:
                # Copy the boxes so augmentation never modifies the annotations
                human_boxes.append(np.array(hoi['bboxhuman'][human_idx], dtype=np.float32))
                object_boxes.append(np.array(hoi['bboxobject'][object_idx], dtype=np.float32))
                interactions.append(hoi['id'])

        sample = {
            'image': image,
            'human_boxes': human_boxes,
            'object_boxes': object_boxes,
            'interactions': interactions,
            'filename': anno['filename'],
        }
        if self.transform:
            sample = self.transform(sample)
        return sample
```

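2.2 Implementing Data Augmentation

Suitable data augmentation generally improves generalization in HOI detection. We write a dedicated transform that applies a random horizontal flip, normalizes the box coordinates to [0, 1], and resizes every image to a fixed size so that a batch can later be stacked into one tensor. The 512×512 size is an arbitrary choice for this tutorial:

```python
import random

import cv2
import numpy as np
import torch


class HOITransform:
    def __init__(self, is_train=True, size=(512, 512)):
        self.is_train = is_train
        self.size = size  # (width, height) every image is resized to

    def __call__(self, sample):
        image = sample['image']
        h, w = image.shape[:2]
        # Stack the boxes into (N, 4) arrays; reshape keeps this valid when N == 0
        human_boxes = np.array(sample['human_boxes'], dtype=np.float32).reshape(-1, 4)
        object_boxes = np.array(sample['object_boxes'], dtype=np.float32).reshape(-1, 4)

        # Random horizontal flip: mirror the image and the x-coordinates
        if self.is_train and random.random() > 0.5:
            image = image[:, ::-1, :]
            human_boxes[:, [0, 2]] = w - human_boxes[:, [2, 0]]
            object_boxes[:, [0, 2]] = w - object_boxes[:, [2, 0]]

        # Normalize box coordinates to [0, 1]; the resize below leaves them unchanged
        scale = np.array([w, h, w, h], dtype=np.float32)
        human_boxes = human_boxes / scale
        object_boxes = object_boxes / scale

        # Resize (the flipped view must be made contiguous first), then convert to tensors
        image = cv2.resize(np.ascontiguousarray(image), self.size)
        image = torch.from_numpy(image).permute(2, 0, 1).float() / 255.0

        return {
            'image': image,
            'human_boxes': torch.from_numpy(human_boxes),
            'object_boxes': torch.from_numpy(object_boxes),
            'interactions': torch.tensor(sample['interactions'], dtype=torch.long),
        }
```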
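The coordinate arithmetic behind the horizontal flip is easy to get wrong, so here is a small worked example in plain Python. After mirroring, the old right edge becomes the new left edge, which is why x1 and x2 swap roles. `hflip_box` and `normalize_box` are illustrative helpers, not part of the classes above.

```python
def hflip_box(box, width):
    """Mirror an [x1, y1, x2, y2] box horizontally in an image of the given width."""
    x1, y1, x2, y2 = box
    # The old right edge (x2) becomes the new left edge, and vice versa.
    return [width - x2, y1, width - x1, y2]


def normalize_box(box, width, height):
    """Scale pixel coordinates to the range [0, 1]."""
    x1, y1, x2, y2 = box
    return [x1 / width, y1 / height, x2 / width, y2 / height]


box = [100, 50, 300, 400]
flipped = hflip_box(box, width=640)
print(flipped)                              # [340, 50, 540, 400]
print(hflip_box(flipped, width=640))        # [100, 50, 300, 400]
print(normalize_box(flipped, 640, 480))
```

Flipping twice returns the original box, which makes a convenient unit test for any flip implementation.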

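2.3 Creating the Data Loader

We can now combine these components into a PyTorch data loader. Because each image has a different number of human-object pairs, the collate function simply returns the list of samples and the training loop assembles the tensors itself:

```python
from torch.utils.data import DataLoader


def collate_samples(batch):
    # Keep the batch as a list of samples; a named function (rather than a
    # lambda) can be pickled when num_workers > 0 on every platform
    return batch


# Example: create the training loader
train_dataset = HICODetDataset(
    root_dir='hico-det',
    annotations=bbox_train[:1000],  # use only a subset for demonstration
    transform=HOITransform(is_train=True)
)

train_loader = DataLoader(
    train_dataset,
    batch_size=8,
    shuffle=True,
    num_workers=4,
    collate_fn=collate_samples
)
```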
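Since each image contributes a variable number of pairs, the batch ends up as a flat list of pairs plus a record of which image each pair belongs to. The following framework-free sketch shows that bookkeeping; `flatten_pairs` is a hypothetical helper and the labels are toy values.

```python
def flatten_pairs(batch):
    """Flatten per-image pair labels and record the source image of each pair."""
    labels, image_index = [], []
    for i, sample in enumerate(batch):
        for label in sample["interactions"]:
            labels.append(label)
            image_index.append(i)
    return labels, image_index


# Toy batch: image 0 has two pairs, image 1 has none, image 2 has one.
batch = [
    {"interactions": [3, 3]},
    {"interactions": []},
    {"interactions": [17]},
]
labels, image_index = flatten_pairs(batch)
print(labels)        # [3, 3, 17]
print(image_index)   # [0, 0, 2]
```

The image index plays the same role as the first column of the RoI tensor passed to RoIAlign, which tells the operator which feature map in the batch a box refers to.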

3. Building the HOI Detection Model

3.1 Model Architecture

We implement a simplified two-stage HOI model in the style of Faster R-CNN based pipelines. Conceptually it has three components:

Feature extractor: ResNet-50 as the backbone
Human-object detection: in a full pipeline a detector such as Faster R-CNN supplies the human and object boxes; the simplified code below takes the boxes as input (the ground-truth boxes during training)
Interaction classification branch: predicts the interaction for each human-object pair

```python
import torch
import torch.nn as nn
import torchvision.models as models
from torchvision.ops import RoIAlign


class HOIModel(nn.Module):
    def __init__(self, num_actions=600):
        super().__init__()
        # Feature extractor: ResNet-50 up to layer4 (2048 channels)
        backbone = models.resnet50(pretrained=True)
        self.backbone = nn.Sequential(
            backbone.conv1, backbone.bn1, backbone.relu, backbone.maxpool,
            backbone.layer1, backbone.layer2, backbone.layer3, backbone.layer4
        )
        # RoI alignment; layer4 features are 32x smaller than the input image
        self.roi_align = RoIAlign(output_size=7, spatial_scale=1 / 32, sampling_ratio=2)
        # Interaction classifier on concatenated human and object features
        self.interaction_head = nn.Sequential(
            nn.Linear(2048 * 2, 1024),
            nn.ReLU(),
            nn.Linear(1024, num_actions)
        )

    def _pool(self, features, boxes, img_w, img_h):
        # boxes: list with one (N_i, 4) tensor of normalized boxes per image.
        # Build (K, 5) RoIs of [batch_index, x1, y1, x2, y2] in pixel coordinates.
        scale = features.new_tensor([img_w, img_h, img_w, img_h])
        rois = torch.cat([
            torch.cat([b.new_full((b.shape[0], 1), i), b * scale], dim=1)
            for i, b in enumerate(boxes)
        ], dim=0)
        # Average each 7x7 RoI feature map into a 2048-dim vector
        return self.roi_align(features, rois).mean(dim=(2, 3))

    def forward(self, images, human_boxes, object_boxes):
        features = self.backbone(images)
        img_h, img_w = images.shape[2:]
        # Pool humans and objects separately so that row k of each tensor
        # belongs to the same human-object pair
        human_feats = self._pool(features, human_boxes, img_w, img_h)
        object_feats = self._pool(features, object_boxes, img_w, img_h)
        combined_feats = torch.cat([human_feats, object_feats], dim=1)
        return self.interaction_head(combined_feats)
```
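The `spatial_scale` argument of RoIAlign should be the reciprocal of the backbone's total downsampling factor, because it maps box coordinates from image pixels to feature-map cells. The sketch below works out that factor for the standard torchvision ResNet-50 stages; `total_stride` and `to_feature_coords` are illustrative helpers written for this example.

```python
# Downsampling factor of each stage in a standard ResNet-50, in forward order.
RESNET50_STRIDES = {
    "conv1": 2, "maxpool": 2, "layer1": 1, "layer2": 2, "layer3": 2, "layer4": 2,
}


def total_stride(strides, last_stage):
    """Multiply the per-stage strides up to and including last_stage."""
    stride = 1
    for name, s in strides.items():
        stride *= s
        if name == last_stage:
            return stride
    raise KeyError(last_stage)


def to_feature_coords(box, stride):
    """Map an [x1, y1, x2, y2] pixel box to feature-map coordinates."""
    return [v / stride for v in box]


print(total_stride(RESNET50_STRIDES, "layer3"))    # 16
print(total_stride(RESNET50_STRIDES, "layer4"))    # 32
print(to_feature_coords([64, 32, 320, 256], 32))   # [2.0, 1.0, 10.0, 8.0]
```

So features taken after layer4 call for `spatial_scale=1/32`, while `1/16` would only be appropriate for features taken after layer3.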

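3.2 Loss Function and Evaluation Metrics

In this simplified setup every annotated human-object pair carries exactly one HOI category, so we train the classifier with a standard cross-entropy loss. In general the same pair can take part in several interactions at once (for example, holding and riding a bicycle), which makes the task multi-label; a per-class binary cross-entropy such as nn.BCEWithLogitsLoss is the usual choice in that case.

```python
class HOILoss(nn.Module):
    def __init__(self):
        super().__init__()
        self.cls_loss = nn.CrossEntropyLoss()

    def forward(self, preds, targets):
        """
        preds: (N, 600) interaction class scores (logits)
        targets: (N,) ground-truth interaction class IDs in [0, 599]
        """
        return self.cls_loss(preds, targets)
```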
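To make the loss concrete, here is the softmax cross-entropy for a single sample written in plain Python. It also yields a useful sanity check: an untrained classifier that scores all 600 classes equally should start near a loss of ln(600). `cross_entropy` is a hand-written illustration of the formula, not the PyTorch implementation.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy for one sample: -log softmax(logits)[target]."""
    m = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


uniform = [0.0] * 600
print(cross_entropy(uniform, target=42))   # ln(600), about 6.397
```

If the first training batches report a loss far above this value, the labels or the class indexing are worth a second look.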

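The following metrics are commonly used to evaluate HOI detection:

Metric	Definition	Notes
mAP	$\frac{1}{600}\sum_{i=1}^{600} AP_i$	Mean of the average precision over all 600 HOI categories
Role mAP	AP that also takes the human and object locations into account	A stricter criterion; the term comes from the V-COCO benchmark
Default	Official HICO-Det evaluation setting	Each category is evaluated on all test images; the companion Known Object setting uses only the images that contain that category's object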
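The sketch below shows how these metrics are typically computed for one HOI category: a predicted pair counts as a true positive only when the class matches and both the human box and the object box overlap the ground truth with IoU above 0.5. The AP here uses all-point interpolation, which is an assumption of this example and may differ from the official evaluation script; the function names and dict keys are also made up for illustration.

```python
def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def is_match(pred, gt, thresh=0.5):
    """True when the class agrees and BOTH boxes overlap the ground truth enough."""
    return (pred["hoi_id"] == gt["hoi_id"]
            and iou(pred["human"], gt["human"]) > thresh
            and iou(pred["object"], gt["object"]) > thresh)


def average_precision(is_tp, num_gt):
    """AP for one category.

    is_tp: true-positive flags of the detections, sorted by descending score
           (duplicate detections of an already matched pair must be False).
    num_gt: number of ground-truth pairs of this category.
    """
    if num_gt == 0:
        return 0.0
    precisions, recalls, tp = [], [], 0
    for rank, hit in enumerate(is_tp, start=1):
        tp += int(hit)
        precisions.append(tp / rank)
        recalls.append(tp / num_gt)
    # Make precision non-increasing as recall grows (precision envelope)
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum the area under the precision-recall curve
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


print(iou([0, 0, 10, 10], [0, 0, 10, 5]))         # 0.5
print(average_precision([True, False, True], 2))
```

The mAP in the table is then the mean of `average_precision` over the 600 categories.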

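4. Model Training and Optimization

4.1 Implementing the Training Loop

Below is the complete training loop, including gradient clipping, optional learning-rate scheduling, and checkpoint saving:

```python
import os

import torch
from tqdm import tqdm


def train_model(model, train_loader, criterion, optimizer, num_epochs=10,
                save_dir='checkpoints', scheduler=None, max_grad_norm=1.0):
    os.makedirs(save_dir, exist_ok=True)
    device = next(model.parameters()).device

    for epoch in range(num_epochs):
        model.train()
        running_loss, steps = 0.0, 0
        pbar = tqdm(train_loader, desc=f'Epoch {epoch+1}/{num_epochs}')
        for batch in pbar:
            # Assemble the batch (the loader returns a list of samples)
            images = torch.stack([item['image'] for item in batch]).to(device)
            human_boxes = [item['human_boxes'].to(device) for item in batch]
            object_boxes = [item['object_boxes'].to(device) for item in batch]
            interactions = torch.cat([item['interactions'] for item in batch]).to(device)
            if interactions.numel() == 0:  # no annotated pairs in this batch
                continue

            # Forward pass
            optimizer.zero_grad()
            outputs = model(images, human_boxes, object_boxes)
            loss = criterion(outputs, interactions)

            # Backward pass with gradient clipping
            loss.backward()
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
            optimizer.step()

            # Running statistics
            running_loss += loss.item()
            steps += 1
            pbar.set_postfix({'loss': running_loss / steps})

        if scheduler is not None:
            scheduler.step()  # one scheduler step per epoch

        # Save a checkpoint
        torch.save({
            'epoch': epoch,
            'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
            'loss': running_loss / max(steps, 1),
        }, f'{save_dir}/epoch_{epoch}.pth')
```

4.2 Optimization Strategies

Given the characteristics of HOI detection, we use the following optimization strategies:

Learning-rate scheduling: cosine annealing
Gradient clipping: prevents exploding gradients
Class-balanced sampling: addresses the long-tailed class distribution (discussed in section 5.1; not part of the code below)

```python
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

num_epochs = 20
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Initialize the model, optimizer, scheduler, and loss
model = HOIModel().to(device)
optimizer = AdamW(model.parameters(), lr=1e-4, weight_decay=1e-4)
scheduler = CosineAnnealingLR(optimizer, T_max=num_epochs, eta_min=1e-6)
criterion = HOILoss().to(device)

# Train the model
train_model(
    model,
    train_loader,
    criterion,
    optimizer,
    num_epochs=num_epochs,
    scheduler=scheduler
)
```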
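To see what the cosine annealing schedule does with these settings, the closed-form rule can be evaluated directly. The sketch below reproduces the formula $\eta_t = \eta_{min} + \frac{1}{2}(\eta_{max} - \eta_{min})(1 + \cos(\pi t / T_{max}))$ in plain Python with the values used above (initial rate 1e-4, minimum 1e-6, 20 epochs); `cosine_lr` is a hand-written illustration, not the PyTorch scheduler itself.

```python
import math


def cosine_lr(epoch, base_lr=1e-4, eta_min=1e-6, t_max=20):
    """Learning rate after `epoch` scheduler steps under cosine annealing."""
    cosine = math.cos(math.pi * epoch / t_max)
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + cosine)


for epoch in (0, 5, 10, 15, 20):
    print(epoch, f"{cosine_lr(epoch):.2e}")
```

The rate starts at the initial value, passes the midpoint between the two extremes halfway through training, and reaches the minimum at the final epoch.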

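4.3 Evaluation and Visualization

After training, we evaluate the model on the test set. The function below reports top-1 classification accuracy over the ground-truth human-object pairs. It is a quick sanity check rather than the mAP protocol described in section 3.2, which additionally requires detected boxes and ranked scores:

```python
def evaluate(model, test_loader):
    model.eval()
    device = next(model.parameters()).device
    correct, total = 0, 0

    with torch.no_grad():
        for batch in tqdm(test_loader, desc='Evaluating'):
            images = torch.stack([item['image'] for item in batch]).to(device)
            human_boxes = [item['human_boxes'].to(device) for item in batch]
            object_boxes = [item['object_boxes'].to(device) for item in batch]
            interactions = torch.cat([item['interactions'] for item in batch]).to(device)
            if interactions.numel() == 0:  # no annotated pairs in this batch
                continue

            outputs = model(images, human_boxes, object_boxes)
            predicted = outputs.argmax(dim=1)

            total += interactions.size(0)
            correct += (predicted == interactions).sum().item()

    accuracy = 100 * correct / max(total, 1)
    print(f'Test Accuracy: {accuracy:.2f}%')
    return accuracy
```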

For visualization, we can draw the boxes and the predicted interaction on an example image:

```python
import matplotlib.pyplot as plt
import matplotlib.patches as patches


def visualize_prediction(image, human_boxes, object_boxes, pred_action,
                         true_action=None, action_names=None):
    fig, ax = plt.subplots(1, figsize=(12, 8))
    ax.imshow(image.permute(1, 2, 0).cpu())
    img_h, img_w = image.shape[1:]

    def draw(boxes, color):
        for box in boxes:
            # Boxes are normalized, so scale them back to pixel coordinates
            x1, y1, x2, y2 = box.tolist()
            ax.add_patch(patches.Rectangle(
                (x1 * img_w, y1 * img_h), (x2 - x1) * img_w, (y2 - y1) * img_h,
                linewidth=2, edgecolor=color, facecolor='none'))

    draw(human_boxes, 'r')   # humans in red
    draw(object_boxes, 'b')  # objects in blue

    # Show the predicted (and optionally the true) interaction in the title
    names = action_names if action_names is not None else train_dataset.actions
    title = f"Predicted: {names[pred_action]}"
    if true_action is not None:
        title += f"\nTrue: {names[true_action]}"
    ax.set_title(title)
    ax.axis('off')
    plt.show()
```

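5. Advanced Techniques and Directions for Improvement

5.1 Handling Class Imbalance

HICO-Det has a severely long-tailed class distribution. The following strategies can help:

Resampling:
- Oversample rare categories
- Undersample frequent categories
Loss adjustments:
- Class-weighted cross-entropy
- Focal Loss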
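As an illustration of the resampling idea, the sketch below assigns each sample a weight inversely proportional to the frequency of its class and then draws a resampled set with the standard library. The labels are toy values and `sampling_weights` is a hypothetical helper; in PyTorch the same per-sample weights can be passed to `torch.utils.data.WeightedRandomSampler`. In this tutorial a sample is an image that may contain several pairs, so a per-image weight would first have to be derived from its pairs, which this sketch does not cover.

```python
import random
from collections import Counter


def sampling_weights(labels):
    """One weight per sample, inversely proportional to its class frequency."""
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]


# Toy label list: one frequent class and one rare class.
labels = ["ride bicycle"] * 90 + ["wash bicycle"] * 10
weights = sampling_weights(labels)

rng = random.Random(0)  # fixed seed for reproducibility
resampled = rng.choices(labels, weights=weights, k=1000)
print(Counter(resampled))
```

With these weights both classes are drawn with roughly equal probability, even though the rare class makes up only 10% of the original list.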

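```python
import torch
import torch.nn as nn


# Compute class weights for a weighted cross-entropy loss
def compute_class_weights(annotations, num_classes=600):
    class_counts = torch.zeros(num_classes)
    for anno in annotations:
        for hoi in anno['hoi']:
            if hoi['invis'] == 1:
                continue
            # hoi['id'] is expected to be a 0-based class index
            class_counts[hoi['id']] += len(hoi['connection'])

    # Inverse-frequency weights; clamping keeps classes with no samples
    # from receiving an enormous weight
    weights = 1.0 / class_counts.clamp(min=1.0)
    # Rescale so that the weights average to 1
    return weights / weights.sum() * num_classes


device = 'cuda' if torch.cuda.is_available() else 'cpu'
class_weight = compute_class_weights(bbox_train).to(device)
criterion = nn.CrossEntropyLoss(weight=class_weight)
```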
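Focal Loss, the other loss adjustment listed above, down-weights examples the model already classifies confidently so that hard examples dominate the gradient. A minimal single-sample sketch of the multi-class form $FL = -(1 - p_t)^{\gamma} \log p_t$ follows; it omits the optional class-balancing factor, and `focal_loss` is written here for illustration rather than taken from a library.

```python
import math


def focal_loss(logits, target, gamma=2.0):
    """Multi-class focal loss for one sample: -(1 - p_t)^gamma * log(p_t)."""
    m = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    log_pt = logits[target] - log_sum_exp   # log-probability of the true class
    pt = math.exp(log_pt)
    return -((1.0 - pt) ** gamma) * log_pt


easy = [5.0, 0.0, 0.0]   # true class 0 is predicted confidently
hard = [0.0, 1.0, 0.0]   # true class 0 is predicted poorly

for name, logits in (("easy", easy), ("hard", hard)):
    ce = focal_loss(logits, 0, gamma=0.0)   # gamma = 0 is plain cross-entropy
    fl = focal_loss(logits, 0, gamma=2.0)
    print(name, f"cross-entropy={ce:.4f}", f"focal={fl:.6f}")
```

With gamma set to 0 the expression reduces to ordinary cross-entropy, which makes it easy to verify an implementation against the standard loss.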

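5.2 Improving the Model Architecture

The baseline model can be improved further:

Attention mechanisms: add spatial and channel attention modules
Graph neural networks: model the structural relations between humans and objects
Multi-task learning: train detection and interaction classification jointly

The following outline adds a spatial attention map to the backbone features. The parts marked with ... are the same as in HOIModel and are left out here:

```python
class AttentionHOIModel(nn.Module):
    def __init__(self, num_actions=600):
        super().__init__()
        # Same feature extractor as before
        self.backbone = ...

        # Spatial attention: one weight in (0, 1) per feature-map location
        self.attention = nn.Sequential(
            nn.Conv2d(2048, 512, 1),
            nn.ReLU(),
            nn.Conv2d(512, 1, 1),
            nn.Sigmoid()
        )

        # Interaction classifier with dropout for regularization
        self.interaction_head = nn.Sequential(
            nn.Linear(2048 * 2, 1024),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(1024, num_actions)
        )

    def forward(self, images, human_boxes, object_boxes):
        features = self.backbone(images)

        # Apply attention: the (B, 1, H, W) map is broadcast over the channels
        attention_map = self.attention(features)
        features = features * attention_map

        # The rest (RoI pooling and classification) is unchanged
        ...
```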

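5.3 Deployment Considerations

When deploying an HOI detection model to production, consider the following:

Model compression:
- Use a lightweight backbone such as MobileNetV3
- Knowledge distillation
Inference optimization:
- TensorRT acceleration
- Half-precision inference
Adapting to the application:
- Fine-tuning on the target domain
- Customizing the interaction categories

```python
# Example: dynamic quantization of the linear layers.
# Dynamic quantization targets CPU inference, so move the model to the CPU first.
model_cpu = model.cpu().eval()
quantized_model = torch.quantization.quantize_dynamic(
    model_cpu, {nn.Linear}, dtype=torch.qint8
)
```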
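To build intuition for what int8 quantization does to the weights, the sketch below quantizes a few values with a simple symmetric scheme and maps them back. It is a conceptual illustration only: the helper names are made up, and PyTorch's actual quantization scheme (scale and zero-point selection) may differ from this simplified version.

```python
def quantize_symmetric(values, num_bits=8):
    """Map floats to signed integers using a single scale factor."""
    qmax = 2 ** (num_bits - 1) - 1            # 127 for int8
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Map the integers back to approximate float values."""
    return [v * scale for v in q]


weights = [0.5, -1.27, 0.003, 1.0]
q, scale = quantize_symmetric(weights)
restored = dequantize(q, scale)
print(q)
print([abs(a - b) for a, b in zip(weights, restored)])
```

Each value is stored in one byte instead of four, at the cost of a rounding error of at most half the scale per weight; very small weights such as 0.003 collapse to zero.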
