There have been several studies on the usefulness of spaced seeds (Ma et al., 2002; Keich et al., 2002; Buhler et al., 2003; Choi et al., 2004; Noe and Kucherov, 2004; Kucherov et al., 2004, 2005, 2006).
Spaced seeds of larger weight consume more memory and tend to reduce the specificity of oligonucleotide probes, while those of smaller weight tend to have lower coverage, so there is a tradeoff between specificity and coverage. We include some reported optimal spaced seeds here. For small datasets (e.g. fewer than 10,000 sequences), a spaced seed of weight 10 may be adequate. For large datasets (e.g. more than 50,000 sequences), spaced seeds of weight 11 or more are preferred.
The program Hedera may also be used to generate optimal spaced seeds of different weights.
Some examples of spaced seeds from Kucherov et al., 2004, 2005, 2006:
| model | w | spaced seeds | spaced seeds 2 @ | spaced seeds 4 @ |
|---|---|---|---|---|
| B | 10 | `##_##___##_#_###` | `##_#__##_@_#_@###` | `#@#_#@_#_@#__@###` |
| | | | `###_@#_@#_#_###` | |
| | 11 | `###_#__#_#__##_###` | `###_@#__#_@#_#_###` | `##@@#__#@_#_#_@###` |
| | | | `##@#__##_#_#_@###` | |
| DT1 | 10 | `###_#_##__#__###` | `###_#__#_@#@_###` | `##@#__@#_#_@#_@##` |
| | | | `##_##___@##_##@#` | |
| | 11 | `###_#__#__##_#_###` | `###_#__#@_#_##_@##` | `##@#_@@#_#__#@_###` |
| | | | `##@#@_##_##__###` | |
| DT2 | 10 | `##_##____##_##_##` | `##@#@_##____##_##` | `##@#@_##___@#@_##` |
| | | | `##_@#_##__@_##_##` | |
| | 11 | `##_###___##_##_##` | `##@##___@##_##_##` | `##@#@_##@___##_##@` |
| | | | `##_##_@#_#___#_#@_##` | |
| NT | 10 | `##_##____##_##_##` | `##_##___@##_##@#` | `##_@#@#@_##_@##` |
| | | | `##_##____##_@@_##_#` | |
| | 11 | `##_##____##_##_###` | `##@#@_##_##__###` | `##@#@_#@_##_@###` |
| | | | `##_##____##_@@_##_##` | |
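
To make the seed notation above concrete, the following Python sketch computes the span and weight of a seed and tests whether two equal-length sequence windows satisfy it. The symbol meanings are an assumption, since this document does not define them: `#` is taken as a position that must match, `_` as a don't-care position, and `@` as a position that accepts either a match or a transition (A/G or C/T) and counts as half a unit of weight. The function names `seed_span`, `seed_weight` and `seed_hits` are hypothetical and are not part of this program or of Hedera.

```python
# Assumed symbol meanings (not defined in this document):
#   '#' must match, '_' don't care, '@' match or transition (A<->G, C<->T)
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def seed_span(seed: str) -> int:
    """Number of sequence positions covered by the seed."""
    return len(seed)


def seed_weight(seed: str) -> float:
    """Weight under the assumed convention: '#' counts 1, '@' counts 0.5."""
    return seed.count("#") + 0.5 * seed.count("@")


def seed_hits(seed: str, a: str, b: str) -> bool:
    """Return True if the equal-length windows a and b satisfy the seed."""
    if not (len(seed) == len(a) == len(b)):
        raise ValueError("seed and both windows must have the same length")
    for symbol, x, y in zip(seed, a.upper(), b.upper()):
        if symbol == "#" and x != y:
            return False  # required match position differs
        if symbol == "@" and x != y and (x, y) not in TRANSITIONS:
            return False  # mismatch that is not a transition
    return True  # '_' positions never cause a rejection


if __name__ == "__main__":
    seed = "##_##___##_#_###"  # model B, w = 10, from the table
    print(seed_span(seed), seed_weight(seed))
```

Under this assumed convention, the seeds in the "2 @" and "4 @" columns come out at the same weight as the plain seed in their row, which is consistent with the `w` column.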