blast
Introduction
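BLAST (Basic Local Alignment Search Tool) is an analysis tool that compares biological sequences for similarity and finds matching regions. It can be used to infer functional and evolutionary relationships between sequences and to help identify members of gene families.
This BLAST WDL workflow uses NCBI BLAST+ 2.13.0.
It includes the following subprograms: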
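- blastp: compares a protein query against a protein database.
- blastx: compares a nucleotide query against a protein database.
- blastn: compares a nucleotide query against a nucleotide database.
- tblastn: compares a protein query against a nucleotide database; the database sequences (both strands) are translated in every reading frame before being compared with the amino acid query.
- tblastx: compares a nucleotide query against a nucleotide database; both the query and the database sequences are translated into protein in every reading frame, and the comparison is then made at the protein level.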
Use the method parameter to switch between subprograms; the default is blastn.
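The pairing of query type and database type listed above can be sketched as a small lookup. This is only an illustrative helper, not part of the workflow: the function name choose_method, the translated flag, and the "nucl"/"prot" labels are assumptions made for the example.

```python
# Hypothetical helper: pick a value for the `method` input from the
# query type and the database type ("nucl" or "prot").
SUBPROGRAMS = {
    ("prot", "prot"): "blastp",
    ("nucl", "prot"): "blastx",
    ("nucl", "nucl"): "blastn",
    ("prot", "nucl"): "tblastn",
}


def choose_method(query_type: str, db_type: str, translated: bool = False) -> str:
    """Return the BLAST subprogram name for a query/database type pair."""
    if translated:
        # tblastx translates both the query and the database, so both
        # must be nucleotide sequences.
        if (query_type, db_type) != ("nucl", "nucl"):
            raise ValueError("tblastx needs a nucleotide query and database")
        return "tblastx"
    try:
        return SUBPROGRAMS[(query_type, db_type)]
    except KeyError:
        raise ValueError(f"unsupported types: {query_type!r}, {db_type!r}") from None


print(choose_method("prot", "nucl"))        # tblastn
print(choose_method("nucl", "nucl", True))  # tblastx
```

The returned string is what would be passed as the method input.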
For a detailed description of BLAST, see the NCBI documentation.
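Usage examples
1. Use a preset BLAST database
The workflow currently includes the CNGBdb COVID-19 database, and more CNGBdb archived data will be added in the future. Set the dbname parameter in the inputs to choose a database.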
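As a minimal sketch, the inputs for a preset-database search might be assembled like this. It assumes the workflow is launched with a standard WDL inputs JSON whose keys follow the Task name and Attribute name columns of the Input table; the file name and the database name are placeholders, since the available dbname values are not listed here.

```python
import json

# Workflow-level inputs use "<workflow>.<attribute>" keys, following the
# Input table. The values below are placeholders, not real paths or names.
inputs = {
    "blast.queryfa": "query.fa",         # hypothetical query FASTA file
    "blast.method": "blastn",            # the default subprogram
    "blast.dbname": "<preset-db-name>",  # placeholder: use a name the platform offers
}

print(json.dumps(inputs, indent=2))
```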
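2. Use a custom BLAST database
Use the File parameter custom_db and the String parameter custom_db_dbtype to specify, respectively, the sequence file of the custom database you want to search and its data type.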
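A custom-database run could be configured along these lines. This is a sketch under assumptions: the helper name custom_db_inputs and the file names are made up, and the accepted values of custom_db_dbtype are assumed to be the makeblastdb values "nucl" and "prot", which this document does not confirm.

```python
import json


def custom_db_inputs(query_fa: str, db_fasta: str, dbtype: str, method: str) -> dict:
    """Build workflow inputs that search a user-supplied sequence file."""
    # Assumption: dbtype takes the makeblastdb values "nucl" or "prot".
    if dbtype not in ("nucl", "prot"):
        raise ValueError("dbtype must be 'nucl' or 'prot'")
    return {
        "blast.queryfa": query_fa,
        "blast.method": method,
        "blast.custom_db": db_fasta,
        "blast.custom_db_dbtype": dbtype,
    }


custom = custom_db_inputs("query.fa", "my_genomes.fa", "nucl", "blastn")
print(json.dumps(custom, indent=2))
```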
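In addition, you can override the default parameters of each subprogram to get the results you need.
For example, setting the Attribute name word_size under the Task name blast.runtblastn changes the word_size parameter specific to tblastn in the blast workflow.
See the Input table below for a description of each parameter.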
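Task-level overrides can be added to the inputs in the same way. The sketch below assumes a standard WDL inputs JSON with "workflow.task.attribute" keys; with_overrides is a hypothetical helper, and the values 11 and 50 are illustrative rather than recommended settings.

```python
import json


def with_overrides(inputs: dict, task: str, **params) -> dict:
    """Return a copy of inputs with task-level parameters added."""
    merged = dict(inputs)
    for name, value in params.items():
        # Task-level keys follow the "Task name" column, e.g. blast.runblastn
        merged[f"{task}.{name}"] = value
    return merged


base = {"blast.queryfa": "query.fa", "blast.method": "blastn"}
tuned = with_overrides(base, "blast.runblastn", word_size=11, max_target_seqs=50)
print(json.dumps(tuned, indent=2))
```

Only the task that matches the selected method is relevant, so overrides set on blast.runblastn presumably have no effect when method is blastp.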
Contact us
This tool is provided by the Computing Platform team. If you have any questions or concerns, please contact CNGBdb@cngb.org
Script
Input
Task name | Attribute name | Type | Description |
---|---|---|---|
* blast | queryfa | File | Input file name |
blast.runtblastx | word_size | Int | Word size for wordfinder algorithm (length of best perfect match) |
blast.runtblastx | taxids | String | Restrict search of database to include only the specified taxonomy IDs (multiple IDs delimited by ',') |
blast.runtblastx | seg | String | Filter query sequence with SEG (Format: 'yes', 'window locut hicut', or 'no' to disable) |
blast.runtblastx | negative_taxids | String | Restrict search of database to everything except the specified taxonomy IDs |
blast.runtblastx | max_target_seqs | Int | Maximum number of aligned sequences to keep |
blast.runtblastx | max_hsps | Int | Set maximum number of HSPs per subject sequence to save for each query |
blast.runtblastx | matrix | String | Scoring matrix name (normally BLOSUM62) |
blast.runtblastx | lcase_masking | Boolean | Use lower case filtering in query and subject sequence(s)? |
blast.runtblastn | word_size | Int | Word size for wordfinder algorithm (length of best perfect match) |
blast.runtblastn | taxids | String | Restrict search of database to include only the specified taxonomy IDs (multiple IDs delimited by ',') |
blast.runtblastn | seg | String | Filter query sequence with SEG (Format: 'yes', 'window locut hicut', or 'no' to disable) |
blast.runtblastn | negative_taxids | String | Restrict search of database to everything except the specified taxonomy IDs |
blast.runtblastn | max_target_seqs | Int | Maximum number of aligned sequences to keep |
blast.runtblastn | max_hsps | Int | Set maximum number of HSPs per subject sequence to save for each query |
blast.runtblastn | matrix | String | Scoring matrix name (normally BLOSUM62) |
blast.runtblastn | lcase_masking | Boolean | Use lower case filtering in query and subject sequence(s)? |
blast.runtblastn | gapopen | Int | Cost to open a gap |
blast.runtblastn | gapextend | Int | Cost to extend a gap |
blast.runtblastn | comp_based_stats | String | Use composition-based statistics: D or d: default (equivalent to 2); 0 or F or f: no composition-based statistics; 1: composition-based statistics as in NAR 29:2994-3005, 2001; 2 or T or t: composition-based score adjustment as in Bioinformatics 21:902-911, 2005, conditioned on sequence properties; 3: composition-based score adjustment as in Bioinformatics 21:902-911, 2005, unconditionally |
blast.runblastx | word_size | Int | Word size for wordfinder algorithm (length of best perfect match) |
blast.runblastx | taxids | String | Restrict search of database to include only the specified taxonomy IDs (multiple IDs delimited by ',') |
blast.runblastx | seg | String | Filter query sequence with SEG (Format: 'yes', 'window locut hicut', or 'no' to disable) |
blast.runblastx | negative_taxids | String | Restrict search of database to everything except the specified taxonomy IDs |
blast.runblastx | max_target_seqs | Int | Maximum number of aligned sequences to keep |
blast.runblastx | max_hsps | Int | Set maximum number of HSPs per subject sequence to save for each query |
blast.runblastx | matrix | String | Scoring matrix name (normally BLOSUM62) |
blast.runblastx | lcase_masking | Boolean | Use lower case filtering in query and subject sequence(s)? |
blast.runblastx | gapopen | Int | Cost to open a gap |
blast.runblastx | gapextend | Int | Cost to extend a gap |
blast.runblastx | comp_based_stats | String | Use composition-based statistics: D or d: default (equivalent to 2); 0 or F or f: no composition-based statistics; 1: composition-based statistics as in NAR 29:2994-3005, 2001; 2 or T or t: composition-based score adjustment as in Bioinformatics 21:902-911, 2005, conditioned on sequence properties; 3: composition-based score adjustment as in Bioinformatics 21:902-911, 2005, unconditionally |
blast.runblastp | word_size | Int | Word size for wordfinder algorithm (length of best perfect match) |
blast.runblastp | taxids | String | Restrict search of database to include only the specified taxonomy IDs (multiple IDs delimited by ',') |
blast.runblastp | seg | String | Filter query sequence with SEG (Format: 'yes', 'window locut hicut', or 'no' to disable) |
blast.runblastp | negative_taxids | String | Restrict search of database to everything except the specified taxonomy IDs |
blast.runblastp | max_target_seqs | Int | Maximum number of aligned sequences to keep |
blast.runblastp | max_hsps | Int | Set maximum number of HSPs per subject sequence to save for each query |
blast.runblastp | matrix | String | Scoring matrix name (normally BLOSUM62) |
blast.runblastp | lcase_masking | Boolean | Use lower case filtering in query and subject sequence(s)? |
blast.runblastp | gapopen | Int | Cost to open a gap |
blast.runblastp | gapextend | Int | Cost to extend a gap |
blast.runblastp | comp_based_stats | String | Use composition-based statistics: D or d: default (equivalent to 2); 0 or F or f: no composition-based statistics; 1: composition-based statistics as in NAR 29:2994-3005, 2001; 2 or T or t: composition-based score adjustment as in Bioinformatics 21:902-911, 2005, conditioned on sequence properties; 3: composition-based score adjustment as in Bioinformatics 21:902-911, 2005, unconditionally |
blast.runblastn | word_size | Int | Word size for wordfinder algorithm (length of best perfect match) |
blast.runblastn | taxids | String | Restrict search of database to include only the specified taxonomy IDs (multiple IDs delimited by ',') |
blast.runblastn | tasks | String | Task to execute |
blast.runblastn | strand | String | Query strand(s) to search against database/subject |
blast.runblastn | reward | Int | Reward for a nucleotide match |
blast.runblastn | penalty | Int | Penalty for a nucleotide mismatch |
blast.runblastn | negative_taxids | String | Restrict search of database to everything except the specified taxonomy IDs |
blast.runblastn | max_target_seqs | Int | Maximum number of aligned sequences to keep |
blast.runblastn | max_hsps | Int | Set maximum number of HSPs per subject sequence to save for each query |
blast.runblastn | lcase_masking | Boolean | Use lower case filtering in query and subject sequence(s)? |
blast.runblastn | gapopen | Int | Cost to open a gap |
blast.runblastn | gapextend | Int | Cost to extend a gap |
blast.runblastn | dust | String | Filter query sequence with DUST (Format: 'yes', 'level window linker', or 'no' to disable) |
blast | threads | Int | Number of threads (CPUs) to use in the BLAST search |
blast | outfmt | Int | Alignment view option (output format number) |
blast | method | String | BLAST subprogram to run: blastn, blastp, blastx, tblastn, or tblastx (default: blastn) |
blast | dbname | String | Path of a preset BLAST database provided by the CNGBdb team |
blast | evalue | Float | Expectation value (E) threshold for saving hits |
blast | blast_docker_override | String | Docker image to use for the BLAST software |
blast | custom_db | File | User-defined sequence file for a custom BLAST database |
blast | custom_db_dbtype | String | Data type of the user-defined database file |
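
To make the table easier to relate to BLAST+ itself, the sketch below shows how these inputs might translate into a command line. It is an assumption, not the workflow's actual command: blast_command is a hypothetical helper, the defaults are choices made for the example, and the file and database names are placeholders. The flags -query, -db, -out, -evalue, -outfmt, -num_threads and -word_size are standard BLAST+ options.

```python
import shlex


def blast_command(method, query, db, out, evalue=10.0, outfmt=6, threads=1, **extra):
    """Assemble a BLAST+ argument list from workflow-style inputs."""
    cmd = [
        method,
        "-query", query,
        "-db", db,
        "-out", out,
        "-evalue", str(evalue),
        "-outfmt", str(outfmt),
        "-num_threads", str(threads),
    ]
    # Simplification: pass each extra attribute as "-name value". Boolean
    # switches such as lcase_masking take no value and are not handled here.
    for name, value in sorted(extra.items()):
        cmd += [f"-{name}", str(value)]
    return cmd


cmd = blast_command("blastn", "query.fa", "mydb", "hits.tsv", evalue=1e-5, word_size=11)
print(shlex.join(cmd))
```

Not every attribute shares its name with a BLAST+ flag; for instance the tasks attribute most likely corresponds to the -task option of blastn.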
Output
Task name | Attribute name | Type | Description |
---|---|---|---|
blast | fina_output | File | BLAST result file; use this.xxx to return it to the corresponding column of the data table |
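
If the workflow is run with outfmt set to 6 (tabular), the result file can be read with a few lines of Python. This is a sketch that assumes the default 12-column layout of BLAST+ tabular output; read_hits and the sample rows are made up for illustration.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def read_hits(handle, max_evalue=1e-5):
    """Yield hits as dicts, keeping only those at or below max_evalue."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        hit["pident"] = float(hit["pident"])
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        if hit["evalue"] <= max_evalue:
            yield hit


# Two made-up rows standing in for a real result file.
sample = io.StringIO(
    "q1\ts1\t98.5\t200\t3\t0\t1\t200\t5\t204\t1e-100\t360\n"
    "q1\ts2\t80.0\t50\t10\t0\t1\t50\t9\t58\t0.5\t30.1\n"
)
hits = list(read_hits(sample))
print([h["sseqid"] for h in hits])  # ['s1']
```

For a real run, replace the StringIO sample with an open handle on the file returned as fina_output.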