ProDesign uses a sequence file and an optional cluster file to generate probe sets.

The sequence file must be a regular FASTA file, while the cluster file can be in CD-HIT, TribeMCL, or ProDesign's own cluster format.

CD-HIT cluster file

>Cluster 0
0 2799aa, >PF04998.6|RPOC2_CHLRE/275-3073... *
>Cluster 1
0 2214aa, >PF06317.1|Q6Y625_9VIRU/1-2214... at 80%
1 2215aa, >PF06317.1|O09705_9VIRU/1-2215... at 84%
2 2217aa, >PF06317.1|Q6Y630_9VIRU/1-2217... *
3 2216aa, >PF06317.1|Q6GWS6_9VIRU/1-2216... at 84%
4 527aa, >PF06317.1|Q67E14_9VIRU/6-532... at 63%

....

where
a line starting with > starts a new cluster
a * at the end of a line means that this sequence is the representative of the cluster
a percentage at the end of a line (for example, at 84%) is the identity between this sequence and the representative
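
The rules above can be sketched as a small Python reader. This is only an illustration and not part of ProDesign: the function name parse_cdhit_clusters and the dictionary keys are hypothetical, and the regular expression assumes member lines look like the ones in the example (index, length with an aa or nt suffix, >name followed by three dots, then either * or a percentage).

```python
import re

# Matches a member line such as:
#   "1 2215aa, >PF06317.1|O09705_9VIRU/1-2215... at 84%"
# Groups: sequence length, sequence name, trailing "*" or "at NN%".
MEMBER_RE = re.compile(r"^\d+\s+(\d+)(?:aa|nt),\s+>(.+?)\.\.\.\s+(\*|at\s+.*)$")
PERCENT_RE = re.compile(r"(\d+(?:\.\d+)?)%")


def parse_cdhit_clusters(lines):
    """Return a list of clusters; each cluster is a list of member dicts.

    Hypothetical helper: the names and the returned structure are
    assumptions made for this sketch.
    """
    clusters = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A line starting with ">" opens a new cluster.
            clusters.append([])
            continue
        match = MEMBER_RE.match(line)
        if match is None or not clusters:
            raise ValueError(f"unexpected line: {raw!r}")
        length, name, tail = match.groups()
        is_representative = tail == "*"
        identity = None
        if not is_representative:
            # The percentage is the identity to the representative.
            pct = PERCENT_RE.search(tail)
            identity = float(pct.group(1)) if pct else None
        clusters[-1].append(
            {
                "name": name,
                "length": int(length),
                "representative": is_representative,
                "identity": identity,
            }
        )
    return clusters


SAMPLE = """\
>Cluster 0
0 2799aa, >PF04998.6|RPOC2_CHLRE/275-3073... *
>Cluster 1
0 2214aa, >PF06317.1|Q6Y625_9VIRU/1-2214... at 80%
1 2215aa, >PF06317.1|O09705_9VIRU/1-2215... at 84%
2 2217aa, >PF06317.1|Q6Y630_9VIRU/1-2217... *
"""

clusters = parse_cdhit_clusters(SAMPLE.splitlines())
representatives = [m["name"] for c in clusters for m in c if m["representative"]]
```

Each cluster keeps its members in file order, so the representative can be found by looking for the entry whose "representative" flag is set.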

 

TribeMCL cluster file

In a TribeMCL cluster file, each line is one cluster, written as a list of tab-separated sequence names. For example:

ref|NC_007607.1|:c42795-42529 ref|NC_007385.1|:40541-40807
ref|NC_007385.1|:c187634-187467
ref|NC_007946.1|:305929-306267 ref|NC_004431.1|:370454-370792
ref|NC_007946.1|:c306371-305979
ref|NC_000913.2|:c4520043-4518694
ref|AC_000091.1|:c4526700-4525351
ref|NC_007946.1|:c2144363-2141331
ref|NC_004431.1|:c2293189-2290157
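
A minimal Python sketch of reading this one-cluster-per-line layout follows. It is an illustration only: the function name parse_tribemcl_clusters is hypothetical, and the sketch assumes that names are separated by real tab characters and that blank lines can be ignored.

```python
def parse_tribemcl_clusters(lines):
    """Return a list of clusters, each a list of sequence names.

    Hypothetical helper for illustration; every non-blank line is
    treated as one cluster of tab-separated names.
    """
    clusters = []
    for raw in lines:
        if not raw.strip():
            continue  # skip blank lines
        names = [name.strip() for name in raw.split("\t") if name.strip()]
        clusters.append(names)
    return clusters


SAMPLE = (
    "ref|NC_007607.1|:c42795-42529\tref|NC_007385.1|:40541-40807\n"
    "ref|NC_007385.1|:c187634-187467\n"
    "ref|NC_007946.1|:305929-306267\tref|NC_004431.1|:370454-370792\n"
)

tribe_clusters = parse_tribemcl_clusters(SAMPLE.splitlines())
cluster_sizes = [len(cluster) for cluster in tribe_clusters]
```

Splitting on tabs only, rather than on any whitespace, keeps the sketch faithful to the stated format and avoids breaking names that might contain spaces.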

ProDesign cluster file

In a ProDesign cluster file, the first line is the total number of clusters and the second line is the size of the first cluster. The sequences in the clusters must appear in the same order as in the related FASTA file.



Input files