[gt-users] uniquesub, packedindex and reverse complements

James Casbon casbon at gmail.com
Fri Jul 3 11:31:43 CEST 2009


Can gt packedindex index both strands of DNA when building an index,
without both strands being explicitly included in the input file?

For example, if I have a FASTA file that contains two copies of
'AAAAA' and then query 'AAAAA' for uniqueness, no unique substrings
are reported, as expected:

james@datarig:/data/uniquesub/ecoli/test$ cat test.fa
>1
AAAAA
>2
AAAAA
james@datarig:/data/uniquesub/ecoli/test$ gt packedindex mkindex -dna -parts 1 -bsize 10 -locfreq 32 -indexname test -db test.fa -dir rev
...
james@datarig:/data/uniquesub/ecoli/test$ gt uniquesub -pck test -query test.fa -min 1 -output querypos sequence
unit 0 (1)
unit 1 (2)



But if the second sequence is the reverse complement of the first,
the two are not recognised as duplicates:


james@datarig:/data/uniquesub/ecoli/test$ cat test.fa
>1
AAAAA
>2
TTTTT
james@datarig:/data/uniquesub/ecoli/test$ gt packedindex mkindex -dna -parts 1 -bsize 10 -locfreq 32 -indexname test -db test.fa -dir rev
...
james@datarig:/data/uniquesub/ecoli/test$ gt uniquesub -pck test -query test.fa -min 1 -output querypos sequence
unit 0 (1)
0 5 aaaaa
unit 1 (2)
0 5 ttttt
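TTTTT is the reverse complement of AAAAA, so the two records hold the same sequence read from opposite strands. The brute-force Python sketch below is for intuition only and is not how gt packedindex or gt uniquesub work internally. It counts occurrences on one strand, which agrees with the output above (AAAAA occurs once), and then on both strands, where AAAAA occurs twice. It also shows the explicit workaround the question hopes to avoid: writing each record plus its reverse complement into the FASTA file before building the index, which doubles the input. All function names and the "_rc" header suffix are hypothetical.

```python
# Hypothetical helpers illustrating strand-aware matching on the example above.
# Brute force, for intuition only; not how gt packedindex/uniquesub work internally.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (ACGTN only)."""
    return seq.translate(COMPLEMENT)[::-1]


def count_overlapping(pattern, text):
    """Count possibly overlapping occurrences of pattern in text."""
    return sum(1 for i in range(len(text) - len(pattern) + 1)
               if text.startswith(pattern, i))


def count_occurrences(pattern, records, both_strands=False):
    """Count occurrences of pattern in records, optionally also in their reverse complements.

    Caveat: a reverse-complement palindrome such as "ACGT" is counted once per strand.
    """
    total = 0
    for seq in records:
        total += count_overlapping(pattern, seq)
        if both_strands:
            total += count_overlapping(pattern, reverse_complement(seq))
    return total


def add_reverse_complements(records):
    """Return a new dict with a reverse-complement entry ("<name>_rc") after each record."""
    both = {}
    for name, seq in records.items():
        both[name] = seq
        both[name + "_rc"] = reverse_complement(seq)
    return both


def to_fasta(records):
    """Format a {name: sequence} dict as FASTA text, one line per sequence."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records.items())


records = {"1": "AAAAA", "2": "TTTTT"}  # the second test.fa above
print(count_occurrences("AAAAA", records.values()))                     # 1
print(count_occurrences("AAAAA", records.values(), both_strands=True))  # 2
print(to_fasta(add_reverse_complements(records)), end="")
```

The FASTA text printed last could be saved to a new file and passed to -db in place of test.fa; whether that gives the desired uniquesub behaviour has not been checked here.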

