Official script for batch downloading from the PDB
2026/7/2 1:22:13

1. RCSB-PDB has updated its crystal structure identifiers

The homepage of the official site provides a link for downloading the archive data.

2. An official batch-download script is provided

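The script supports batch downloads of multiple file types, and its ability to resume after an interrupted connection is very convenient. Its built-in documentation reads: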

This script is for downloading all released PDB entries (of a single file type/format) from the PDB Beta Archive. It uses the asynchronous aiohttp library to download multiple files concurrently when performing bulk downloads. It requires Python 3.8 or higher and the aiofiles and aiohttp packages, which can be installed with the following commands:

    pip install aiofiles
    pip install aiohttp

The script requires two input arguments to run. The following example command line downloads all mmCIF files and stores them under the directory `/home/my_user_id/download`:

    python BetaArchiveBatchDownloader.py --file_type mmcif --output_dir /home/my_user_id/download

Run any of the following command lines to see all supported download file types:

    python BetaArchiveBatchDownloader.py
    python BetaArchiveBatchDownloader.py -h
    python BetaArchiveBatchDownloader.py --help

It shows:

    --file_type FILE_TYPE
        The supported file types for downloading are listed in the left column.
        The corresponding file naming conventions are listed in the right column.

        mmcif               : pdb_xxxxxxxx.cif.gz
        pdb                 : pdb_xxxxxxxx.pdb.gz
        assemblies          : pdb_xxxxxxxx-assembly#.cif.gz
        XML                 : pdb_xxxxxxxx.xml.gz
        XML-extatom         : pdb_xxxxxxxx-extatom.xml.gz
        XML-noatom          : pdb_xxxxxxxx-noatom.xml.gz
        structure_factors   : pdb_xxxxxxxx-sf.cif.gz
        nmr_data_str        : pdb_xxxxxxxx_nmr-data.str.gz
        nmr_data_nef        : pdb_xxxxxxxx_nmr-data.nef.gz
        nmr_chemical_shifts : pdb_xxxxxxxx_cs.str.gz
        nmr_restraints      : pdb_xxxxxxxx.mr.gz
        nmr_restraints_v2   : pdb_xxxxxxxx_mr.str.gz
        validation_cif      : pdb_xxxxxxxx_validation.cif.gz
        validation_xml      : pdb_xxxxxxxx_validation.xml.gz
        validation_pdf      : pdb_xxxxxxxx_validation.pdf.gz
        full_validation_pdf : pdb_xxxxxxxx_full_validation.pdf.gz

How the downloaded files are stored:

Since the current Archive has more than 246,000 entries, it is not desirable to have a quarter of a million files under a single directory. The script first creates a top-level subdirectory named after the file type (/home/my_user_id/download/mmcif), then creates hash directories based on PDB IDs and stores the downloaded files in them. For the above example command, the downloaded files are stored as follows:

    /home/my_user_id/download/mmcif/00/pdb_0000100d.cif.gz
    /home/my_user_id/download/mmcif/00/pdb_0000200d.cif.gz
    /home/my_user_id/download/mmcif/00/pdb_0000200l.cif.gz
    /home/my_user_id/download/mmcif/00/pdb_0000300d.cif.gz
    /home/my_user_id/download/mmcif/00/pdb_0000400d.cif.gz
    /home/my_user_id/download/mmcif/01/pdb_0000101d.cif.gz
    /home/my_user_id/download/mmcif/01/pdb_0000101m.cif.gz
    /home/my_user_id/download/mmcif/01/pdb_0000201d.cif.gz
    /home/my_user_id/download/mmcif/01/pdb_0000201l.cif.gz
    /home/my_user_id/download/mmcif/01/pdb_0000301d.cif.gz
    /home/my_user_id/download/mmcif/01/pdb_0000401d.cif.gz
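To locate a file after the download finishes, the storage layout described above can be sketched in a few lines of Python. This is only a rough illustration, not code from the official script: `local_path` and `SUFFIXES` are hypothetical names, the suffix table copies just a few rows from the help text, and the rule that the hash directory is the two characters before the last character of the ID is an assumption inferred from the example paths.

```python
from pathlib import PurePosixPath

# A few suffixes copied from the naming table in the script's help text.
SUFFIXES = {
    "mmcif": ".cif.gz",
    "pdb": ".pdb.gz",
    "structure_factors": "-sf.cif.gz",
    "validation_pdf": "_validation.pdf.gz",
}


def local_path(output_dir, file_type, pdb_id):
    """Guess where a downloaded file is stored, based on the example paths."""
    # Assumption: the hash directory is the two characters before the last
    # character of the ID, e.g. pdb_0000100d -> "00", pdb_0000101m -> "01".
    hash_dir = pdb_id[-3:-1]
    file_name = pdb_id + SUFFIXES[file_type]
    return PurePosixPath(output_dir) / file_type / hash_dir / file_name


if __name__ == "__main__":
    print(local_path("/home/my_user_id/download", "mmcif", "pdb_0000100d"))
```

If the official script ever changes its hashing rule, only the `hash_dir` line would need to be adjusted.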

3. Downloading locally requires a small change to the script

Script download URL: https://cdn.rcsb.org/wwpdb/docs/BetaArchiveBatchDownloader.py

# Line 99: add trust_env=True to aiohttp.ClientSession
async with aiohttp.ClientSession(trust_env=True) as session:

# Line 270: add trust_env=True to aiohttp.ClientSession
async with aiohttp.ClientSession(trust_env=True) as session:
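With trust_env=True, aiohttp reads proxy settings from environment variables such as HTTP_PROXY and HTTPS_PROXY, so the change only helps if those variables are actually set in the shell that runs the script. The following is a minimal sketch, using only the standard library, for checking what Python can see before starting a long download. `visible_proxies` is a hypothetical helper and is not part of the official script.

```python
from urllib.request import getproxies


def visible_proxies():
    """Return the HTTP(S) proxy settings visible to Python in this environment."""
    # getproxies() reads variables such as http_proxy / https_proxy
    # (upper- or lower-case) and returns them keyed by scheme.
    return {
        scheme: url
        for scheme, url in getproxies().items()
        if scheme in ("http", "https")
    }


if __name__ == "__main__":
    found = visible_proxies()
    if found:
        for scheme, url in found.items():
            print(f"{scheme} proxy: {url}")
    else:
        print("No HTTP(S) proxy is set; trust_env=True will have nothing to pick up.")
```

If nothing is reported, export the proxy variables first and then rerun the downloader.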
