[gt-users] Tallymer to get k-mers in a specific range?

Stefan Kurtz stefan.kurtz at me.com
Fri Jan 15 22:38:44 CET 2010


I forgot two things:

The command is gt tallymer mkindex, not gt mkindex as in my example below.
Please also add the option -scan so that the suffix array is read
sequentially. When constructing the suffix array with suffixerator you
will probably have to use the option -parts <numberofparts> to reduce the
peak memory usage.
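
As a rough sketch, the two calls could be assembled as argument lists like
this. The options -scan, -parts, -esa, -mersize, -minocc and -maxocc are the
ones named in this thread; the suffixerator options -dna, -pl, -tis, -suf,
-lcp, -db and -indexname are assumptions based on typical usage and should
be checked against gt suffixerator -help. The function names and the file
name reads.fasta are hypothetical.

```python
from typing import List


def suffixerator_cmd(reads_fasta: str, index_name: str, parts: int) -> List[str]:
    """Build an argument list for constructing the enhanced suffix array.

    Only -parts comes from this thread; the remaining options are
    assumptions and should be verified with `gt suffixerator -help`.
    """
    return [
        "gt", "suffixerator",
        "-dna", "-pl", "-tis", "-suf", "-lcp",   # assumed table options
        "-parts", str(parts),                    # lowers the memory peak
        "-db", reads_fasta,                      # assumed input option
        "-indexname", index_name,                # assumed output option
    ]


def tallymer_mkindex_cmd(index_name: str, mersize: int,
                         minocc: int, maxocc: int) -> List[str]:
    """Build an argument list for the k-mer extraction call from this thread."""
    return [
        "gt", "tallymer", "mkindex",
        "-scan",                                 # read the suffix array sequentially
        "-esa", index_name,
        "-mersize", str(mersize),
        "-minocc", str(minocc),
        "-maxocc", str(maxocc),
    ]
```

The lists can be passed to subprocess.run, redirecting the standard output
of the second call to a file such as out.tmp.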

Bye,

Stefan

Stefan Kurtz wrote:
> Hi Dan,
>
> Dan Bolser wrote:
>> Hi All, Stefan,
>>
>> I've just read through the Tallymer manual, so apologies if this
>> question is obvious...
>>
>> I have a large set of reads, and I'd like to extract all k-mers (the
>> sequences) with occurrence between 200 and 250 (mitochondrially
>> derived).
>>
>> I can't seem to see a way to do that with the Tallymer tools...
>>   
>
> With the program gt tallymer mkindex you do not actually have to create
> an index; you can directly pull out the k-mers satisfying your length
> and occurrence constraints. Once you have created the enhanced suffix
> array of your read set, you can call
>
> gt mkindex -esa myesaindex -mersize 51 -minocc 200 -maxocc 250 > out.tmp
>
> In out.tmp you find lines like
>
> 200 34234
> aatctataatggaatattaaaaaaaaaaaaaattata....agagatatata
> ...
> <34233 more lines with 51-mers>
> 201 663
> atatcaggatatgagc......aactatcgactatacgacgacgacggac
> <662 more lines with 51-mers>
> ...
>
> A simple postprocessing script should make it easy to extract the
> 51-mers with the appropriate occurrence counts. Note that the generated
> output can be huge.
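
A minimal sketch of such a postprocessing script, assuming the output has
exactly the layout shown above: a header line with two integers (the
occurrence count and the number of k-mers that follow) and then one k-mer
per line. The function names are hypothetical. The input is streamed line
by line because the file can be very large.

```python
from typing import Iterable, Iterator, List, Tuple


def iter_kmers_by_occurrence(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (occurrence, kmer) pairs from the header/k-mer layout above."""
    current_occ = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        fields = line.split()
        # A header line consists of two integers: "<occurrence> <number of k-mers>".
        if len(fields) == 2 and fields[0].isdigit() and fields[1].isdigit():
            current_occ = int(fields[0])
        elif current_occ is not None:
            # Any other line is a k-mer belonging to the most recent header.
            yield current_occ, line


def select_kmers(lines: Iterable[str], min_occ: int, max_occ: int) -> List[str]:
    """Return the k-mers whose occurrence count lies in [min_occ, max_occ]."""
    return [kmer for occ, kmer in iter_kmers_by_occurrence(lines)
            if min_occ <= occ <= max_occ]
```

An open file object can be passed directly as the lines argument, for
example select_kmers(open("out.tmp"), 200, 250).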
>
>>
>> Also, I'd like to plot the k-mer frequency distribution for a given
>> k-mer length, i.e. given a k-mer length of 51, how many k-mers occur 1
>> time, 2 times, 3 times, ... x times. etc. I can't see how to do this
>> directly.
>>   
> The lines containing integer pairs in the output of the call above
> should give you exactly this information.
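
The integer-pair lines could be collected into a frequency table along the
following lines. This is a sketch under the same layout assumption (each
header line holds the occurrence count followed by the number of distinct
k-mers with that count); the function name is hypothetical. The table only
covers the occurrence range requested with -minocc and -maxocc, so a full
distribution would need a correspondingly wide range.

```python
from typing import Dict, Iterable


def occurrence_histogram(lines: Iterable[str]) -> Dict[int, int]:
    """Map each occurrence count to the number of distinct k-mers having it."""
    histogram: Dict[int, int] = {}
    for raw in lines:
        fields = raw.split()
        # Only header lines consist of exactly two integers; k-mer lines are skipped.
        if len(fields) == 2 and fields[0].isdigit() and fields[1].isdigit():
            histogram[int(fields[0])] = int(fields[1])
    return histogram
```

Sorting the resulting dictionary by key gives the x and y values for a
plot of the distribution.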
>
> Hope this helps. If not, contact me again.
>>
>> The best I can come up with is creating a fake sequence file with all
>> possible k-mers in it and passing that to search, but I'm sure this
>> information should be available directly from one of the index files.
>>
>>
>> Thanks for any help with these issues.
>>
>> All the best,
>> Dan.
>> _______________________________________________
>> gt-users mailing list
>> gt-users at genometools.org
>> http://genometools.org/mailman/listinfo/gt-users
>>
>>   
>
>


-- 
Prof. Dr. Stefan Kurtz
Zentrum fuer Bioinformatik
Universitaet Hamburg
Bundesstrasse 43
20146 Hamburg
Germany

Email:  kurtz at zbh.uni-hamburg.de
URL:    http://www.zbh.uni-hamburg.de/kurtz
Phone:  +49 (40) 42838 7311
FAX:    +49 (40) 42838 7312
