Codeplot
blast
Introduction

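BLAST (Basic Local Alignment Search Tool) is an analysis tool for comparing biological sequences and finding regions of local similarity between them. It can be used to infer functional and evolutionary relationships between sequences and to help identify members of gene families.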

This BLAST WDL workflow uses NCBI BLAST+ 2.13.0.
It includes the following subprograms:

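  • blastp: compares a protein query sequence against a protein database.
  • blastx: compares a nucleotide query sequence against a protein database.
  • blastn: compares a nucleotide query sequence against a nucleotide database.
  • tblastn: compares a protein query sequence against a nucleotide database. The given amino acid sequence is compared with the database sequences (both strands) translated in the different reading frames.
  • tblastx: compares a nucleotide query sequence against a nucleotide database. Both the query and the database sequences are translated into protein sequences in all reading frames, and the protein sequences are then compared.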

Use the method parameter to switch between subprograms. The default is blastn.
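The relationship between the subprograms and the method parameter can be sketched as a small lookup table. This is only an illustration of the list above: the names SUBPROGRAMS and choose_method and the translated flag are hypothetical helpers, not part of the workflow.

```python
# Query and database molecule types for each subprogram, as listed above.
SUBPROGRAMS = {
    "blastp": ("protein", "protein"),
    "blastx": ("nucleotide", "protein"),
    "blastn": ("nucleotide", "nucleotide"),
    "tblastn": ("protein", "nucleotide"),
    "tblastx": ("nucleotide", "nucleotide"),  # both sides are translated
}


def choose_method(query_type, db_type, translated=False):
    """Return a value for the `method` parameter (hypothetical helper).

    `translated` only matters for nucleotide-vs-nucleotide searches, where it
    selects tblastx (translated comparison) instead of blastn.
    """
    if (query_type, db_type) == ("nucleotide", "nucleotide"):
        return "tblastx" if translated else "blastn"
    for name, types in SUBPROGRAMS.items():
        if types == (query_type, db_type):
            return name
    raise ValueError(f"no subprogram for {query_type} vs {db_type}")
```

For instance, choose_method("protein", "nucleotide") selects tblastn, which would then be passed as the method value.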

For a detailed description of BLAST, see the NCBI documentation.

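Usage examples

1. Using a preset BLAST database

The workflow currently includes the CNGBdb COVID-19 database, and more CNGBdb archived data will be added in the future. Select a database by setting the dbname parameter in the input.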
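As a rough sketch, the inputs for a preset-database run might be assembled as follows. It assumes the platform accepts a standard WDL inputs JSON whose keys are the task name and attribute name joined by a dot (for example blast.dbname); the file name and database path are placeholders, not real values.

```python
import json


def preset_inputs(query_path, dbname, method="blastn"):
    """Build an inputs mapping for a preset-database search (hypothetical helper).

    Key names follow the "Task name" and "Attribute name" columns of the
    input table; the dotted-key layout is an assumption.
    """
    return {
        "blast.queryfa": query_path,
        "blast.dbname": dbname,
        "blast.method": method,
    }


# Placeholder values: replace with a real query file and a database path
# provided by the CNGBdb team.
inputs = preset_inputs("query.fa", "/path/to/preset_db")
print(json.dumps(inputs, indent=2))
```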

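2. Using a custom BLAST database

Use the File parameter custom_db and the String parameter custom_db_dbtype to specify, respectively, the sequence file of the custom database to search and its data type.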
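Because each subprogram expects a particular database type, it can help to check the combination before submitting. The sketch below assumes that custom_db_dbtype takes the values "nucl" or "prot", as the -dbtype option of makeblastdb does; that mapping is an assumption, and custom_db_inputs is a hypothetical helper.

```python
# Database type each subprogram searches, per the subprogram list above.
# The "nucl"/"prot" spellings are assumed to match makeblastdb -dbtype.
REQUIRED_DBTYPE = {
    "blastn": "nucl",
    "tblastn": "nucl",
    "tblastx": "nucl",
    "blastp": "prot",
    "blastx": "prot",
}


def custom_db_inputs(query_path, db_fasta, dbtype, method):
    """Build inputs for a custom database, rejecting mismatched types."""
    if method not in REQUIRED_DBTYPE:
        raise ValueError(f"unknown method: {method}")
    if REQUIRED_DBTYPE[method] != dbtype:
        raise ValueError(
            f"{method} needs a {REQUIRED_DBTYPE[method]} database, got {dbtype}"
        )
    return {
        "blast.queryfa": query_path,
        "blast.method": method,
        "blast.custom_db": db_fasta,
        "blast.custom_db_dbtype": dbtype,
    }
```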

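In addition, you can change the default parameters of each subprogram to obtain the results you need.

For example, setting the attribute word_size under the task name blast.runtblastn changes the word_size parameter that is specific to tblastn in the BLAST workflow.

See the input parameters below for details.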
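A per-subprogram override can be pictured as one more dotted key added to the inputs. The helper below is a hypothetical sketch that derives the task name (blast.run followed by the method) from the naming pattern in the input table; the word_size value is illustrative only.

```python
def with_overrides(inputs, method, **params):
    """Return a copy of `inputs` with subprogram-specific parameters added.

    Keys are built as blast.run<method>.<attribute>, following the task
    names in the input table. The original mapping is left unchanged.
    """
    task = f"blast.run{method}"
    merged = dict(inputs)
    for name, value in params.items():
        merged[f"{task}.{name}"] = value
    return merged


base = {"blast.queryfa": "query.fa", "blast.method": "tblastn"}
tuned = with_overrides(base, "tblastn", word_size=5)
```

Here tuned carries the extra key blast.runtblastn.word_size, while base keeps only the workflow-level settings.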

Contact us

This tool is provided by the computing platform team. If you have any questions or concerns, please contact CNGBdb@cngb.org

Script
Input
Task name | Attribute name | Type | Description
* blast | queryfa | File | Input file name
* blast.runtblastx | word_size | Int | Word size for wordfinder algorithm (length of best perfect match)
* blast.runtblastx | taxids | String | Restrict search of database to include only the specified taxonomy IDs (multiple IDs delimited by ',')
* blast.runtblastx | seg | String | Filter query sequence with SEG (Format: 'yes', 'window locut hicut', or 'no' to disable)
* blast.runtblastx | negative_taxids | String | Restrict search of database to everything except the specified taxonomy IDs
* blast.runtblastx | max_target_seqs | Int | Maximum number of aligned sequences to keep
* blast.runtblastx | max_hsps | Int | Set maximum number of HSPs per subject sequence to save for each query
* blast.runtblastx | matrix | String | Scoring matrix name (normally BLOSUM62)
* blast.runtblastx | lcase_masking | Boolean | Use lower case filtering in query and subject sequence(s)?
* blast.runtblastn | word_size | Int | Word size for wordfinder algorithm (length of best perfect match)
* blast.runtblastn | taxids | String | Restrict search of database to include only the specified taxonomy IDs (multiple IDs delimited by ',')
* blast.runtblastn | seg | String | Filter query sequence with SEG (Format: 'yes', 'window locut hicut', or 'no' to disable)
* blast.runtblastn | negative_taxids | String | Restrict search of database to everything except the specified taxonomy IDs
* blast.runtblastn | max_target_seqs | Int | Maximum number of aligned sequences to keep
* blast.runtblastn | max_hsps | Int | Set maximum number of HSPs per subject sequence to save for each query
* blast.runtblastn | matrix | String | Scoring matrix name (normally BLOSUM62)
* blast.runtblastn | lcase_masking | Boolean | Use lower case filtering in query and subject sequence(s)?
* blast.runtblastn | gapopen | Int | Cost to open a gap
* blast.runtblastn | gapextend | Int | Cost to extend a gap
* blast.runtblastn | comp_based_stats | String | Use composition-based statistics. D or d: default (equivalent to 2); 0 or F or f: no composition-based statistics; 1: composition-based statistics as in NAR 29:2994-3005, 2001; 2 or T or t: composition-based score adjustment as in Bioinformatics 21:902-911, 2005, conditioned on sequence properties; 3: composition-based score adjustment as in Bioinformatics 21:902-911, 2005, unconditionally
* blast.runblastx | word_size | Int | Word size for wordfinder algorithm (length of best perfect match)
* blast.runblastx | taxids | String | Restrict search of database to include only the specified taxonomy IDs (multiple IDs delimited by ',')
* blast.runblastx | seg | String | Filter query sequence with SEG (Format: 'yes', 'window locut hicut', or 'no' to disable)
* blast.runblastx | negative_taxids | String | Restrict search of database to everything except the specified taxonomy IDs
* blast.runblastx | max_target_seqs | Int | Maximum number of aligned sequences to keep
* blast.runblastx | max_hsps | Int | Set maximum number of HSPs per subject sequence to save for each query
* blast.runblastx | matrix | String | Scoring matrix name (normally BLOSUM62)
* blast.runblastx | lcase_masking | Boolean | Use lower case filtering in query and subject sequence(s)?
* blast.runblastx | gapopen | Int | Cost to open a gap
* blast.runblastx | gapextend | Int | Cost to extend a gap
* blast.runblastx | comp_based_stats | String | Use composition-based statistics. D or d: default (equivalent to 2); 0 or F or f: no composition-based statistics; 1: composition-based statistics as in NAR 29:2994-3005, 2001; 2 or T or t: composition-based score adjustment as in Bioinformatics 21:902-911, 2005, conditioned on sequence properties; 3: composition-based score adjustment as in Bioinformatics 21:902-911, 2005, unconditionally
* blast.runblastp | word_size | Int | Word size for wordfinder algorithm (length of best perfect match)
* blast.runblastp | taxids | String | Restrict search of database to include only the specified taxonomy IDs (multiple IDs delimited by ',')
* blast.runblastp | seg | String | Filter query sequence with SEG (Format: 'yes', 'window locut hicut', or 'no' to disable)
* blast.runblastp | negative_taxids | String | Restrict search of database to everything except the specified taxonomy IDs
* blast.runblastp | max_target_seqs | Int | Maximum number of aligned sequences to keep
* blast.runblastp | max_hsps | Int | Set maximum number of HSPs per subject sequence to save for each query
* blast.runblastp | matrix | String | Scoring matrix name (normally BLOSUM62)
* blast.runblastp | lcase_masking | Boolean | Use lower case filtering in query and subject sequence(s)?
* blast.runblastp | gapopen | Int | Cost to open a gap
* blast.runblastp | gapextend | Int | Cost to extend a gap
* blast.runblastp | comp_based_stats | String | Use composition-based statistics. D or d: default (equivalent to 2); 0 or F or f: no composition-based statistics; 1: composition-based statistics as in NAR 29:2994-3005, 2001; 2 or T or t: composition-based score adjustment as in Bioinformatics 21:902-911, 2005, conditioned on sequence properties; 3: composition-based score adjustment as in Bioinformatics 21:902-911, 2005, unconditionally
* blast.runblastn | word_size | Int | Word size for wordfinder algorithm (length of best perfect match)
* blast.runblastn | taxids | String | Restrict search of database to include only the specified taxonomy IDs (multiple IDs delimited by ',')
* blast.runblastn | tasks | String | Task to execute
* blast.runblastn | strand | String | Query strand(s) to search against database/subject
* blast.runblastn | reward | Int | Reward for a nucleotide match
* blast.runblastn | penalty | Int | Penalty for a nucleotide mismatch
* blast.runblastn | negative_taxids | String | Restrict search of database to everything except the specified taxonomy IDs
* blast.runblastn | max_target_seqs | Int | Maximum number of aligned sequences to keep
* blast.runblastn | max_hsps | Int | Set maximum number of HSPs per subject sequence to save for each query
* blast.runblastn | lcase_masking | Boolean | Use lower case filtering in query and subject sequence(s)?
* blast.runblastn | gapopen | Int | Cost to open a gap
* blast.runblastn | gapextend | Int | Cost to extend a gap
* blast.runblastn | dust | String | Filter query sequence with DUST (Format: 'yes', 'level window linker', or 'no' to disable)
* blast | threads | Int | Number of threads (CPUs) to use in the BLAST search
* blast | outfmt | Int | Alignment view options
* blast | method | String | BLAST subprogram to run: blastn, blastp, blastx, tblastn, or tblastx
* blast | dbname | String | Path of a BLAST database provided by the CNGBdb team
* blast | evalue | Float | Expectation value (E) threshold for saving hits
* blast | blast_docker_override | String | Docker image for the BLAST software
* blast | custom_db | File | User-defined sequence file for the BLAST database
* blast | custom_db_dbtype | String | Data type of the user-defined BLAST database file
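Most attribute names in the input table match BLAST+ command-line options of the same name, so one way to read the table is as the ingredients of a command line. The sketch below is an assumption about how such a mapping could look, not the workflow's actual implementation: blast_command is a hypothetical helper, and pairing threads with -num_threads is assumed.

```python
import shlex

METHODS = {"blastn", "blastp", "blastx", "tblastn", "tblastx"}


def blast_command(method, query, db, out, evalue, outfmt, threads, extra=None):
    """Assemble a BLAST+ argument list from workflow-style parameters.

    `extra` holds subprogram-specific options such as word_size or dust.
    Boolean options (for example lcase_masking) become bare flags.
    """
    if method not in METHODS:
        raise ValueError(f"unknown method: {method}")
    cmd = [
        method,
        "-query", query,
        "-db", db,
        "-out", out,
        "-evalue", str(evalue),
        "-outfmt", str(outfmt),
        "-num_threads", str(threads),  # assumed mapping for `threads`
    ]
    for name, value in (extra or {}).items():
        if value is True:
            cmd.append(f"-{name}")
        elif value is not False and value is not None:
            cmd += [f"-{name}", str(value)]
    return cmd


cmd = blast_command(
    "blastn", "query.fa", "db", "out.tsv",
    evalue=1e-5, outfmt=6, threads=4,
    extra={"word_size": 11, "lcase_masking": True, "dust": "no"},
)
print(shlex.join(cmd))
```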
Output
Task name | Attribute name | Type | Description
* blast | fina_output | File | Output file, returned to the named column of the corresponding table via this.xxx
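If outfmt is set to 6 (tab-separated output with the default 12 BLAST+ columns), the output file can be read with a few lines of Python. This sketch assumes that format and those default columns; parse_tabular is a hypothetical helper, and the sample line is made-up data used only to show the shape of a record.

```python
# Default column order of BLAST+ tabular output (outfmt 6).
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
INT_FIELDS = {"length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"}
FLOAT_FIELDS = {"pident", "evalue", "bitscore"}


def parse_tabular(text):
    """Parse outfmt 6 text into a list of dicts with typed values."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} columns, got {len(fields)}")
        row = dict(zip(COLUMNS, fields))
        for key in INT_FIELDS:
            row[key] = int(row[key])
        for key in FLOAT_FIELDS:
            row[key] = float(row[key])
        hits.append(row)
    return hits


# Made-up example line, not real search output.
sample = "query1\tsubj1\t98.50\t200\t3\t0\t1\t200\t101\t300\t1e-50\t365\n"
hits = parse_tabular(sample)
```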