
Getting Started with Qianfan-OCR: Calling the OCR API with Python requests and Parsing the JSON Response

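1. Project Overview

Qianfan-OCR is an open-source multimodal document intelligence model from Baidu Qianfan, built on the 4B-parameter Qwen3-4B language model. This end-to-end solution replaces the traditional multi-stage OCR pipeline with a single model and supports three core capabilities: text recognition, layout analysis, and document understanding.

Because it is released under the Apache 2.0 license, it can be used in commercial products and fine-tuned further. The model uses the InternVLChat architecture, which pairs an InternViT vision encoder with the Qwen3-4B language model, keeping the model lightweight while delivering strong multimodal understanding.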

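2. Environment Setup

2.1 Basic Requirements

Make sure your development environment meets the following requirements:

Python 3.8+
The requests library (for the API calls)
A reachable Qianfan-OCR service (default address: http://localhost:7860)

Install requests: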

pip install requests

2.2 Checking the Service

Before writing any code, confirm that the OCR service is running:

curl http://localhost:7860

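If the service is up, this returns the HTML of the Gradio web UI.

3. Basic API Calls

3.1 Simple Text Recognition

The most basic text recognition call looks like this: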

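```python
import base64

import requests


def simple_ocr(image_path):
    # Read the image and encode it as base64
    with open(image_path, "rb") as image_file:
        encoded_image = base64.b64encode(image_file.read()).decode("utf-8")

    # Build the request payload
    payload = {
        "image": encoded_image,
        "task": "ocr",
    }

    # Send the POST request; without a timeout, requests can wait indefinitely
    response = requests.post("http://localhost:7860/api/predict", json=payload, timeout=30)
    response.raise_for_status()  # raise on HTTP 4xx/5xx instead of parsing an error page

    # Return the raw JSON response
    return response.json()


# Usage example
result = simple_ocr("test.jpg")
print(result)
```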

3.2 Response Structure

A typical successful response contains the following fields:

```json
{
  "status": "success",
  "data": {
    "text": "All recognized text...",
    "language": "zh",
    "confidence": 0.95
  }
}
```

An error response may look like this:

```json
{
  "status": "error",
  "message": "Error description"
}
```
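Both response shapes above can be handled in one place. The sketch below assumes exactly the status / data / message layout shown in this section; parse_ocr_response and OcrError are hypothetical names, not part of any Qianfan-OCR client library. It returns the data object on success and raises on anything else, so callers never index into an error response by accident.

```python
from typing import Any, Dict


class OcrError(RuntimeError):
    """Raised when the service reports a failure (hypothetical helper type)."""


def parse_ocr_response(body: Dict[str, Any]) -> Dict[str, Any]:
    """Return the "data" object of a success response, or raise OcrError.

    Assumes the layout shown above: "status" is either "success" (with a
    "data" object) or "error" (with a "message" string).
    """
    status = body.get("status")
    if status == "success":
        data = body.get("data")
        if not isinstance(data, dict):
            raise OcrError("success response is missing a 'data' object")
        return data
    if status == "error":
        raise OcrError(body.get("message", "unknown error"))
    raise OcrError(f"unexpected status: {status!r}")


# Usage with the two example responses above
ok = {"status": "success", "data": {"text": "...", "language": "zh", "confidence": 0.95}}
print(parse_ocr_response(ok)["confidence"])  # 0.95

try:
    parse_ocr_response({"status": "error", "message": "bad image"})
except OcrError as e:
    print(f"OCR failed: {e}")  # OCR failed: bad image
```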

4. Advanced Features

4.1 OCR with Layout Analysis

Enable Layout-as-Thought mode (the "layout" flag in the payload) to get structured output:

```python
import base64

import requests


def layout_analysis(image_path):
    with open(image_path, "rb") as f:
        img_base64 = base64.b64encode(f.read()).decode("utf-8")

    payload = {
        "image": img_base64,
        "task": "ocr",
        "layout": True,  # enable layout analysis
    }
    response = requests.post("http://localhost:7860/api/predict", json=payload, timeout=30)
    response.raise_for_status()  # raise on HTTP 4xx/5xx
    return response.json()
```

Example response:

```json
{
  "status": "success",
  "data": {
    "text": "...",
    "layout": [
      {
        "type": "title",
        "text": "Document title",
        "bbox": [x1, y1, x2, y2]
      },
      {
        "type": "paragraph",
        "text": "Body text...",
        "bbox": [x1, y1, x2, y2]
      }
    ]
  }
}
```
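To turn the layout blocks into readable text, the sketch below renders them as Markdown in the order the service returns them. It assumes only the title and paragraph block types seen in the example response; any other type is emitted as plain text. layout_to_markdown is a hypothetical helper, and the bbox values in the usage example are made-up sample coordinates.

```python
from typing import Any, Dict, List


def layout_to_markdown(layout: List[Dict[str, Any]]) -> str:
    """Render layout blocks as Markdown, keeping the order the service returned.

    Only "title" is treated specially (as a level-1 heading); every other
    block type falls back to plain text, since the full set of types is an
    assumption here. Empty blocks are skipped.
    """
    parts = []
    for block in layout:
        text = block.get("text", "").strip()
        if not text:
            continue
        if block.get("type") == "title":
            parts.append(f"# {text}")
        else:
            parts.append(text)
    return "\n\n".join(parts)


# Usage with sample blocks (coordinates are invented)
layout = [
    {"type": "title", "text": "Document Title", "bbox": [10, 10, 300, 40]},
    {"type": "paragraph", "text": "Body text...", "bbox": [10, 60, 580, 200]},
]
print(layout_to_markdown(layout))
# # Document Title
#
# Body text...
```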

4.2 Prompt-Driven Information Extraction

A prompt lets you target specific information for extraction:

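```python
import base64

import requests


def prompt_ocr(image_path, prompt):
    with open(image_path, "rb") as f:
        img_base64 = base64.b64encode(f.read()).decode("utf-8")

    payload = {
        "image": img_base64,
        "task": "ocr",
        "prompt": prompt,  # instruction telling the model what to extract
    }
    response = requests.post("http://localhost:7860/api/predict", json=payload, timeout=30)
    response.raise_for_status()  # raise on HTTP 4xx/5xx
    return response.json()
```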
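The request functions in sections 3.1, 4.1 and 4.2 differ only in the optional layout and prompt fields. A hypothetical build_payload helper, sketched below under the assumption that the service accepts exactly the field names used in this article, builds all three variants from the raw image bytes:

```python
import base64
from typing import Any, Dict, Optional


def build_payload(image_bytes: bytes, prompt: Optional[str] = None, layout: bool = False) -> Dict[str, Any]:
    """Build the request body used in sections 3.1, 4.1 and 4.2.

    Field names ("image", "task", "layout", "prompt") follow this article's
    examples; optional fields are only included when they are set.
    """
    payload: Dict[str, Any] = {
        "image": base64.b64encode(image_bytes).decode("utf-8"),
        "task": "ocr",
    }
    if layout:
        payload["layout"] = True
    if prompt:
        payload["prompt"] = prompt
    return payload


# Usage with a real file (not run here):
# with open("invoice.jpg", "rb") as f:
#     payload = build_payload(f.read(), prompt="Extract the tables as Markdown.", layout=True)
# response = requests.post("http://localhost:7860/api/predict", json=payload, timeout=30)
print(build_payload(b"abc"))  # {'image': 'YWJj', 'task': 'ocr'}
```

Passing bytes rather than a path keeps the helper free of file I/O, so it also works for images that arrive from an upload or a database.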

Usage examples:

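```python
# Extract table content
result = prompt_ocr("invoice.jpg", "Extract the tables in this document and output them in Markdown format.")

# Extract specific fields
result = prompt_ocr("id_card.jpg", "Extract the following fields from the image: name, ID number, validity period. Output them as JSON.")
```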
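Asking for JSON in the prompt does not guarantee that the returned text is bare JSON: chat-style models often wrap it in a Markdown code fence or add a sentence around it. That is a general assumption about model output, not documented Qianfan-OCR behavior. The hypothetical extract_json helper below tries a few candidate substrings before giving up:

```python
import json
import re
from typing import Any

# Matches a fenced block opened by three backticks (optionally followed by
# "json") and closed by three backticks; `{3} avoids writing the fence literally.
_FENCED = re.compile(r"`{3}(?:json)?\s*(.*?)\s*`{3}", re.DOTALL)


def extract_json(text: str) -> Any:
    """Parse JSON from model output that may be wrapped in a Markdown fence.

    Tries the whole string first, then the first fenced block, then the span
    from the first "{" to the last "}". Raises ValueError if nothing parses.
    """
    candidates = [text]
    match = _FENCED.search(text)
    if match:
        candidates.append(match.group(1))
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        candidates.append(text[start : end + 1])

    for candidate in candidates:
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            continue
    raise ValueError("no valid JSON found in model output")


# Usage with a reply that has text around the JSON object
print(extract_json('Here are the fields: {"name": "Zhang San"}'))  # {'name': 'Zhang San'}
```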

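5. Error Handling and Debugging

5.1 Handling Common Errors

Wrap every API call in error handling that separates network failures from errors reported by the service: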

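```python
import base64

import requests


def safe_ocr(image_path):
    try:
        with open(image_path, "rb") as f:
            img_base64 = base64.b64encode(f.read()).decode("utf-8")

        response = requests.post(
            "http://localhost:7860/api/predict",
            json={"image": img_base64, "task": "ocr"},
            timeout=10,
        )
        response.raise_for_status()  # treat HTTP 4xx/5xx as request failures
        result = response.json()

        # An error reported in the response body (see section 3.2)
        if result.get("status") != "success":
            raise ValueError(f"OCR failed: {result.get('message', 'unknown error')}")
        return result["data"]
    except requests.exceptions.RequestException as e:
        # Connection refused, timeout, HTTP error status, or (requests 2.27+) a non-JSON body
        print(f"Network request failed: {e}")
    except Exception as e:
        # Missing image file, error status in the body, missing "data" field
        print(f"Processing failed: {e}")
    return None
```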
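safe_ocr gives up after the first failure. If the service is briefly unreachable, for example while it restarts, retrying with exponential backoff is a common complement. The call_with_retries helper below is a hypothetical, standard-library-only sketch; the attempt count and delays are arbitrary defaults. Note that requests raises its own exception classes (requests.exceptions.ConnectionError, requests.exceptions.Timeout), which do not inherit from Python's built-in ConnectionError or TimeoutError, so pass them explicitly through retry_on when wrapping a requests call.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Call fn(), retrying on the given exception types with exponential backoff.

    Sleeps base_delay, 2 * base_delay, 4 * base_delay, ... between attempts
    and re-raises the last exception once all attempts are used up.
    Exceptions not listed in retry_on propagate immediately.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
    raise ValueError("attempts must be at least 1")


# Usage with requests (not run here):
# result = call_with_retries(
#     lambda: requests.post(url, json=payload, timeout=10).json(),
#     retry_on=(requests.exceptions.ConnectionError, requests.exceptions.Timeout),
# )
```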

5.2 Service Health Monitoring

You can check the service's health status through the API:

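```python
import requests


def check_service_health():
    try:
        response = requests.get("http://localhost:7860/health", timeout=5)
        return response.status_code == 200
    except requests.exceptions.RequestException:
        # Connection refused, timeout, etc.: treat the service as unhealthy
        return False
```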
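Right after startup the service may not answer yet while the model loads, so scripts often poll the health check before sending work. Below is a hypothetical wait_until_healthy helper that accepts any check function, such as check_service_health; the 120-second default timeout and 2-second polling interval are arbitrary choices.

```python
import time
from typing import Callable


def wait_until_healthy(check: Callable[[], bool], timeout: float = 120.0, interval: float = 2.0) -> bool:
    """Poll check() until it returns True or `timeout` seconds have passed.

    Returns True as soon as the check succeeds, False once the deadline passes.
    time.monotonic() is used so clock adjustments do not affect the deadline.
    """
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Usage: block until the model has finished loading (not run here)
# if not wait_until_healthy(check_service_health, timeout=300):
#     raise SystemExit("Qianfan-OCR service did not become healthy in time")
```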

6. Case Study: Extracting Invoice Information

6.1 Complete Code Example

```python
import base64
import json

import requests


def extract_invoice_info(image_path):
    # Read the image
    with open(image_path, "rb") as f:
        img_data = base64.b64encode(f.read()).decode("utf-8")

    # Build the prompt
    prompt = """Extract the following information from the invoice and return it as JSON:
- Invoice number
- Issue date
- Seller name
- Buyer name
- Total amount (in words and in figures)
- Line items (name, specification, quantity, unit price, amount)"""

    # Send the request
    response = requests.post(
        "http://localhost:7860/api/predict",
        json={
            "image": img_data,
            "task": "ocr",
            "prompt": prompt,
            "layout": True,
        },
        timeout=15,
    )
    response.raise_for_status()  # raise on HTTP 4xx/5xx

    # Parse the result; .get() avoids a KeyError if "status" is missing
    result = response.json()
    if result.get("status") != "success":
        raise RuntimeError(f"Recognition failed: {result.get('message')}")

    text = result["data"]["text"]
    try:
        # Try to parse the model's answer as JSON
        return json.loads(text)
    except json.JSONDecodeError:
        # If parsing fails, return the raw text
        return text


# Usage example
invoice_info = extract_invoice_info("invoice.jpg")
print(json.dumps(invoice_info, indent=2, ensure_ascii=False))
```

6.2 Post-Processing Results

When the result comes back as unstructured text, you can add post-processing logic:

```python
import re


def postprocess_text(text):
    # Collapse runs of spaces and line breaks into single spaces
    text = " ".join(text.split())

    # Try to extract key information (example: dates written as 2024年3月5日)
    date_pattern = r"\d{4}年\d{1,2}月\d{1,2}日"
    dates = re.findall(date_pattern, text)

    return {
        "raw_text": text,
        "extracted_dates": dates,
    }
```
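postprocess_text returns dates exactly as they appear in the recognized text. To compare or store them, converting them to ISO 8601 usually helps. The sketch below reuses the same pattern with capture groups and drops matches that are not valid calendar dates; normalize_cn_dates is a hypothetical name, and the sample sentence is invented.

```python
import re
from datetime import date
from typing import List

# Same pattern as postprocess_text, with groups for year, month and day.
# The 年/月/日 characters stay because they appear in the recognized text.
CN_DATE = re.compile(r"(\d{4})年(\d{1,2})月(\d{1,2})日")


def normalize_cn_dates(text: str) -> List[str]:
    """Return every valid date in the text as an ISO string (YYYY-MM-DD)."""
    results = []
    for year, month, day in CN_DATE.findall(text):
        try:
            results.append(date(int(year), int(month), int(day)).isoformat())
        except ValueError:
            # Skip misreads such as 2024年13月40日 that are not real dates
            continue
    return results


print(normalize_cn_dates("Issued 2024年3月5日, valid until 2025年12月31日"))
# ['2024-03-05', '2025-12-31']
```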

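7. Summary

This guide covered:

Basic calls: using the Python requests library to interact with the Qianfan-OCR API
Advanced features: layout analysis and prompt-driven extraction
Error handling: robust exception handling around every request
Hands-on application: an end-to-end invoice information extraction example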

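As an open-source multimodal model, Qianfan-OCR offers several advantages over traditional OCR pipelines:

A simpler, end-to-end processing flow
Support for complex document understanding tasks
Flexible information extraction through prompts
An open-source license that permits commercial use and further development

For applications that need document intelligence quickly, this setup works out of the box. Possible next steps:

Optimizing batch processing
Fine-tuning the model on your own data
Integrating with other systems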
