Text Generation: The Evolution from Seq2Seq to GPT

1. Technical Analysis

1.1 Evolution of Text Generation Techniques

Text generation has evolved from rule-based methods to deep learning:

Text generation technology roadmap:

  • Rule-based templates: filling slots in hand-written templates
  • Statistical language models: n-gram models
  • Neural language models: RNN/LSTM
  • Transformer: GPT/T5

1.2 Comparison of Text Generation Models

Model | Architecture | Key Feature | Representative Model
RNN/LSTM | Recurrent structure | Sequence modeling | Seq2Seq
Transformer | Attention mechanism | Parallel computation | GPT
T5 | Unified framework | Multi-task | T5
BERT | Bidirectional encoding | Understanding-oriented | BERT

1.3 Comparison of Generation Strategies

Generation strategies:

  • Greedy: pick the highest-probability token at each step
  • Beam Search: keep multiple candidate sequences
  • Sampling: draw the next token at random from the distribution
  • Top-K: restrict sampling to the K most likely tokens
  • Top-P (Nucleus): restrict sampling by a cumulative probability mass threshold
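As a minimal sketch of how these strategies pick a single next token (beam search, which works at the sequence level, is omitted), the helper below is illustrative only; the function name and parameters are not from the article's code.

import torch
import torch.nn.functional as F

def select_next_token(logits, strategy="greedy", temperature=1.0, top_k=50, top_p=0.9):
    """Pick the next token id from a 1-D logits tensor of shape (vocab_size,)."""
    if strategy == "greedy":
        return int(torch.argmax(logits))              # always take the most likely token

    logits = logits / temperature                     # temperature rescales the distribution

    if strategy == "top_k":
        values, _ = torch.topk(logits, top_k)
        logits[logits < values[-1]] = float("-inf")   # drop everything outside the top-k

    elif strategy == "top_p":
        sorted_logits, sorted_indices = torch.sort(logits, descending=True)
        cumulative = torch.cumsum(F.softmax(sorted_logits, dim=-1), dim=-1)
        remove = cumulative > top_p
        remove[1:] = remove[:-1].clone()              # shift right so the first token that
        remove[0] = False                             # crosses the threshold is still kept
        logits[sorted_indices[remove]] = float("-inf")

    probs = F.softmax(logits, dim=-1)                 # plain sampling if no filter applied
    return int(torch.multinomial(probs, num_samples=1))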

2. Core Implementation

2.1 RNN Text Generation

import torch
import torch.nn as nn
import torch.nn.functional as F


class RNNGenerator(nn.Module):
    """LSTM-based autoregressive text generator."""

    def __init__(self, vocab_size, embedding_dim, hidden_dim, num_layers=2, end_token=None):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, num_layers=num_layers)
        self.fc = nn.Linear(hidden_dim, vocab_size)
        self.end_token = end_token  # token id that terminates generation

    def forward(self, x, hidden=None):
        x = self.embedding(x)                      # (seq_len, batch, embedding_dim)
        output, hidden = self.lstm(x, hidden)      # (seq_len, batch, hidden_dim)
        logits = self.fc(output)                   # (seq_len, batch, vocab_size)
        return logits, hidden

    def generate(self, start_token, max_len=100, temperature=1.0):
        self.eval()
        generated = [start_token]
        hidden = None
        for _ in range(max_len):
            # Feed only the most recent token; the LSTM hidden state carries the history.
            input_ids = torch.tensor([generated[-1]]).unsqueeze(0)
            with torch.no_grad():
                logits, hidden = self.forward(input_ids, hidden)
            logits = logits.squeeze(0) / temperature
            probabilities = F.softmax(logits, dim=-1)
            next_token = torch.multinomial(probabilities, num_samples=1).item()
            generated.append(next_token)
            if self.end_token is not None and next_token == self.end_token:
                break
        return generated
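For orientation, a hypothetical call to the class above might look as follows; the vocabulary size and token ids are made-up values.

# Hypothetical usage of RNNGenerator; vocabulary size and token ids are illustrative.
model = RNNGenerator(vocab_size=10000, embedding_dim=256, hidden_dim=512, end_token=2)
token_ids = model.generate(start_token=1, max_len=50, temperature=0.8)
print(token_ids)  # list of generated token ids, to be decoded by your tokenizer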

2.2 Transformer Text Generation

class TransformerGenerator(nn.Module):
    """Decoder-style Transformer text generator."""

    def __init__(self, vocab_size, d_model=512, num_heads=8, d_ff=2048, num_layers=6, end_token=None):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, d_model)
        self.positional_encoding = PositionalEncoding(d_model)  # sinusoidal positions, see sketch below
        decoder_layer = nn.TransformerDecoderLayer(d_model, num_heads, d_ff)
        self.decoder = nn.TransformerDecoder(decoder_layer, num_layers)
        self.fc = nn.Linear(d_model, vocab_size)
        self.end_token = end_token  # token id that terminates generation

    def forward(self, tgt, memory=None, tgt_mask=None):
        tgt = self.embedding(tgt) * torch.sqrt(
            torch.tensor(self.embedding.embedding_dim, dtype=torch.float32))
        tgt = self.positional_encoding(tgt)
        if memory is None:
            # Without an encoder, cross-attend to a single zero vector so the decoder
            # layers still run; only the masked self-attention carries information.
            memory = torch.zeros(1, tgt.size(1), tgt.size(2), device=tgt.device)
        output = self.decoder(tgt, memory, tgt_mask=tgt_mask)
        return self.fc(output)

    def generate(self, start_token, max_len=100, temperature=1.0, top_k=50):
        self.eval()
        generated = [start_token]
        for _ in range(max_len):
            input_ids = torch.tensor([generated]).T  # (seq_len, 1)
            tgt_mask = nn.Transformer.generate_square_subsequent_mask(
                len(input_ids)).to(input_ids.device)
            with torch.no_grad():
                logits = self.forward(input_ids, tgt_mask=tgt_mask)
            logits = logits[-1, 0, :] / temperature       # logits of the last position
            if top_k > 0:
                v, _ = torch.topk(logits, top_k)
                logits[logits < v[-1]] = float('-inf')    # keep only the top-k candidates
            probabilities = F.softmax(logits, dim=-1)
            next_token = torch.multinomial(probabilities, num_samples=1).item()
            generated.append(next_token)
            if self.end_token is not None and next_token == self.end_token:
                break
        return generated
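The PositionalEncoding module referenced above is not defined in the original snippet; a minimal sketch following the standard sinusoidal formulation (assuming an even d_model and sequence-first tensors) could look like this:

import math

class PositionalEncoding(nn.Module):
    """Standard sinusoidal positional encoding (minimal sketch assumed by the generators above)."""

    def __init__(self, d_model, max_len=5000):
        super().__init__()
        position = torch.arange(max_len).unsqueeze(1).float()
        div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, 1, d_model)
        pe[:, 0, 0::2] = torch.sin(position * div_term)
        pe[:, 0, 1::2] = torch.cos(position * div_term)
        self.register_buffer('pe', pe)

    def forward(self, x):
        # x: (seq_len, batch, d_model); add the encoding for the first seq_len positions
        return x + self.pe[:x.size(0)]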

2.3 GPT-Style Generation

class GPTGenerator(nn.Module):
    """GPT-style decoder-only generator built on nn.Transformer with zero encoder layers."""

    def __init__(self, vocab_size, d_model=768, num_heads=12, d_ff=3072, num_layers=12):
        super().__init__()
        self.transformer = nn.Transformer(
            d_model=d_model,
            nhead=num_heads,
            num_encoder_layers=0,
            num_decoder_layers=num_layers,
            dim_feedforward=d_ff
        )
        self.embedding = nn.Embedding(vocab_size, d_model)
        self.positional_encoding = PositionalEncoding(d_model)
        self.fc = nn.Linear(d_model, vocab_size)

    def forward(self, x):
        x = self.embedding(x) * torch.sqrt(
            torch.tensor(self.embedding.embedding_dim, dtype=torch.float32))
        x = self.positional_encoding(x)
        # Causal mask applied to both self-attention and the cross-attention to the
        # (identical) memory, so no position can see future tokens.
        mask = nn.Transformer.generate_square_subsequent_mask(x.size(0)).to(x.device)
        output = self.transformer(x, x, tgt_mask=mask, memory_mask=mask)
        return self.fc(output)

    def generate(self, prompt, tokenizer, max_len=100, temperature=1.0, top_p=0.9):
        self.eval()
        input_ids = tokenizer.encode(prompt, return_tensors='pt').T  # (seq_len, 1)
        for _ in range(max_len):
            with torch.no_grad():
                logits = self.forward(input_ids)
            logits = logits[-1, 0, :] / temperature  # last position, shape (vocab_size,)
            if top_p < 1.0:
                # Nucleus (top-p) filtering: drop tokens outside the top-p probability mass.
                sorted_logits, sorted_indices = torch.sort(logits, descending=True)
                cumulative_probs = torch.cumsum(F.softmax(sorted_logits, dim=-1), dim=-1)
                sorted_indices_to_remove = cumulative_probs > top_p
                # Shift right so the first token that crosses the threshold is kept.
                sorted_indices_to_remove[1:] = sorted_indices_to_remove[:-1].clone()
                sorted_indices_to_remove[0] = False
                logits[sorted_indices[sorted_indices_to_remove]] = float('-inf')
            probabilities = F.softmax(logits, dim=-1)
            next_token = torch.multinomial(probabilities, num_samples=1).item()
            input_ids = torch.cat([input_ids, torch.tensor([[next_token]])], dim=0)
            if next_token == tokenizer.eos_token_id:
                break
        return tokenizer.decode(input_ids.squeeze().tolist())
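For a rough sense of how this class would be called, the sketch below borrows the Hugging Face GPT-2 tokenizer purely to supply token ids; the model is randomly initialized and the sizes are illustrative, so the output is meaningless until the model has been trained.

# Illustrative call only: the model here is untrained and deliberately small.
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPTGenerator(vocab_size=tokenizer.vocab_size, d_model=256, num_heads=4, d_ff=1024, num_layers=4)
text = model.generate("Once upon a time", tokenizer, max_len=30, temperature=0.8, top_p=0.9)
print(text)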

3. Performance Comparison

3.1 Model Performance Comparison

Model | Generation Quality | Training Difficulty | Inference Speed | Typical Use Case
RNN | - | - | - | Simple generation
Transformer | - | - | - | Medium-complexity generation
GPT-2 | Very high | - | - | Complex generation
GPT-3 | Extremely high | Very high | - | High-quality generation

3.2 Generation Strategy Comparison

Strategy | Diversity | Coherence | Controllability
Greedy | - | - | -
Beam Search | - | Very high | Very high
Top-K | - | - | -
Top-P | - | - | -
Temperature | Adjustable | Adjustable | Adjustable

3.3 Effect of Model Size

Model | Parameters | Generation Quality | Training Time
GPT-2 small | 124M | - | 1 week
GPT-2 medium | 355M | - | 2 weeks
GPT-2 large | 774M | Very high | 4 weeks
GPT-3 | 175B | Extremely high | Months

4. Best Practices

4.1 Choosing a Text Generation Model

def select_generator(task_type, data_size):
    """Pick a generator architecture based on task complexity."""
    if task_type == 'simple':
        return RNNGenerator(10000, 256, 512)
    elif task_type == 'medium':
        return TransformerGenerator(10000, 512, 8, 2048, 6)
    else:
        from transformers import GPT2LMHeadModel
        return GPT2LMHeadModel.from_pretrained('gpt2')


class GeneratorFactory:
    """Build a generator from a configuration dictionary."""

    @staticmethod
    def create(config):
        if config['type'] == 'rnn':
            return RNNGenerator(**config['params'])
        elif config['type'] == 'transformer':
            return TransformerGenerator(**config['params'])
        elif config['type'] == 'gpt':
            from transformers import GPT2LMHeadModel
            return GPT2LMHeadModel.from_pretrained(config['model_name'])
        raise ValueError(f"Unknown generator type: {config['type']}")
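A possible way to drive the factory is shown below; the configuration values are hypothetical and only meant to match the constructor signatures defined earlier in this article.

# Hypothetical configurations; parameter values are illustrative only.
config = {
    'type': 'transformer',
    'params': {'vocab_size': 10000, 'd_model': 512, 'num_heads': 8, 'd_ff': 2048, 'num_layers': 6},
}
model = GeneratorFactory.create(config)

gpt_config = {'type': 'gpt', 'model_name': 'gpt2'}   # loads a pretrained Hugging Face model
pretrained = GeneratorFactory.create(gpt_config)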

4.2 Text Generation Training Workflow

class TextGenerationTrainer:
    """Simple training loop for a language-model-style generator.

    The model is expected to return raw logits of shape (seq_len, batch, vocab_size)
    and to handle causal masking internally, as GPTGenerator does."""

    def __init__(self, model, optimizer, scheduler, loss_fn):
        self.model = model
        self.optimizer = optimizer
        self.scheduler = scheduler
        self.loss_fn = loss_fn

    def train_step(self, batch):
        self.optimizer.zero_grad()
        input_ids = batch['input_ids']
        labels = batch['labels']
        output = self.model(input_ids)
        # Flatten (seq_len, batch, vocab) -> (seq_len * batch, vocab) for cross-entropy.
        loss = self.loss_fn(output.reshape(-1, output.size(-1)), labels.reshape(-1))
        loss.backward()
        self.optimizer.step()
        self.scheduler.step()
        return loss.item()

    def evaluate(self, dataloader):
        self.model.eval()
        total_loss = 0
        with torch.no_grad():
            for batch in dataloader:
                input_ids = batch['input_ids']
                labels = batch['labels']
                output = self.model(input_ids)
                loss = self.loss_fn(output.reshape(-1, output.size(-1)), labels.reshape(-1))
                total_loss += loss.item()
        return total_loss / len(dataloader)
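A sketch of how the trainer might be wired together is given below; the optimizer, scheduler, and model sizes are illustrative choices, not prescribed by the article.

# Sketch of trainer setup; hyperparameters are illustrative.
model = GPTGenerator(vocab_size=10000, d_model=256, num_heads=4, d_ff=1024, num_layers=4)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1000, gamma=0.9)
loss_fn = nn.CrossEntropyLoss()

trainer = TextGenerationTrainer(model, optimizer, scheduler, loss_fn)
# train_dataloader should yield dicts with 'input_ids' and 'labels'
# (labels are the inputs shifted by one position for next-token prediction):
# for batch in train_dataloader:
#     loss = trainer.train_step(batch)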

5. Summary

Text generation has entered the Transformer era:

  1. GPT: currently the most capable family of text generation models
  2. Generation strategy: choose a decoding strategy that fits your requirements
  3. Model size: larger models usually generate better text but are slower
  4. Pretrained models: using an off-the-shelf pretrained model is recommended

Key points from the comparisons:

  • GPT-2 improves generation quality markedly over RNN models
  • Top-P sampling balances diversity and coherence
  • The temperature parameter controls randomness
  • Fine-tuning a pretrained GPT model is the recommended approach (a minimal sketch follows below)
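As a minimal sketch of that last recommendation, generation with a pretrained GPT-2 via the Hugging Face transformers library could look like the following; the prompt and decoding parameters are illustrative, and fine-tuning would then continue training this same model on your own corpus.

# Minimal sketch: sample from a pretrained GPT-2 with nucleus sampling.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')

input_ids = tokenizer.encode("Text generation has entered", return_tensors='pt')
output_ids = model.generate(
    input_ids,
    max_length=50,
    do_sample=True,       # sampling instead of greedy decoding
    top_p=0.9,            # nucleus sampling
    temperature=0.8,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))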
