[gt-users] get feature by ID
Sascha Steinbiss
steinbiss at zbh.uni-hamburg.de
Wed Feb 11 15:16:59 CET 2009
Brent Pedersen wrote:
> hi again,
Hi Brent,
> one thing that i don't know how to do in genometools via the bindings
> is to get at a FeatureNode
> from its ID. i've been doing this by using a python dictionary, but
> then instead of simply using FeatureIndex.add_gff3file(), i have to
> use a stream and iterate over it in python to fill the dictionary. is
> there something like this in the C code that's just not exposed in the
> API?
With the current implementation, this is exactly the way to do it, as we
do not have an ID-based index for annotation files (yet). If you want to
write your own index like this, all root nodes must be read into memory
(which is quite easy, and the index can then be kept in memory for
subsequent accesses).
Furthermore, if annotations from multiple files are processed like this,
IDs are no longer guaranteed to be unique, because an ID needs only be
unique in the scope of a single annotation file. As the IDs are only
changed when they are output via a GFF3OutStream, at this point they may
still be ambiguous. Keep that in mind when indexing like this.
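
To illustrate the two points above, here is a minimal sketch of such a
dictionary-based index. It is not GenomeTools code: FakeNode and FakeStream
are hypothetical stand-ins so that the example runs on its own, and it only
assumes a stream offering next_tree() (as in the loop quoted below) and a
node offering get_attribute("ID"). build_id_index and the file names are
likewise made up for illustration. Keying the dictionary by (source, ID)
instead of by ID alone is one possible way to keep IDs from different files
apart. Only root nodes are indexed here.

```python
class FakeNode:
    """Hypothetical stand-in for a feature node (not the genometools class)."""

    def __init__(self, attributes):
        self.attributes = attributes

    def get_attribute(self, name):
        # Return the attribute value, or None if the node does not have it.
        return self.attributes.get(name)


class FakeStream:
    """Hypothetical stand-in for a stream handing out one root node per call."""

    def __init__(self, nodes):
        self._nodes = iter(nodes)

    def next_tree(self):
        # Return None once the stream is exhausted.
        return next(self._nodes, None)


def build_id_index(stream, source, index=None):
    """Drain `stream` and map (source, ID) -> root node."""
    if index is None:
        index = {}
    node = stream.next_tree()
    while node is not None:
        feature_id = node.get_attribute("ID")
        if feature_id is not None:
            # The source name is part of the key, because an ID only has to
            # be unique within a single annotation file.
            index[(source, feature_id)] = node
        node = stream.next_tree()
    return index


# Two files that both use the ID "gene1" end up as two separate entries.
index = build_id_index(FakeStream([FakeNode({"ID": "gene1"})]), "a.gff3")
index = build_id_index(FakeStream([FakeNode({"ID": "gene1"})]), "b.gff3", index)
print(sorted(index))
```

With a plain ID as key, the second file would silently overwrite the entry
from the first one.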
> on a sort of related note, one feature that would be nice is something
> to remove this boilerplate:
>
> genome_stream = FeatureStream(genome_stream, feature_index)
> feature = genome_stream.next_tree()
> while feature:
>     feature = genome_stream.next_tree()
>
> maybe with something like:
>
> FeatureStream(genome_stream, feature_index).iterate_all()
>
> that doesn't apply to the problem of making a name => feature hash, but
> is nice for filling a feature_index and doing an intron_stream.
This is a nice idea and should be implementable without many problems as
a general method in the GenomeStream class.
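
Until something like this exists in the bindings, a small helper on the
Python side can hide the loop. The following is only a sketch under
assumptions, not the eventual GenomeStream method: ListStream is a
hypothetical stand-in so the example is runnable, and iterate_all and
pull_all are names chosen here for illustration. The only thing assumed
about a real stream is that next_tree() returns None when it is exhausted.

```python
class ListStream:
    """Hypothetical stand-in for any stream object with a next_tree() method."""

    def __init__(self, nodes):
        self._nodes = iter(nodes)

    def next_tree(self):
        # Return None once there are no more nodes.
        return next(self._nodes, None)


def iterate_all(stream):
    """Yield every node from `stream` until next_tree() returns None."""
    node = stream.next_tree()
    while node is not None:
        yield node
        node = stream.next_tree()


def pull_all(stream):
    """Drain `stream` only for its side effects and return the node count."""
    count = 0
    for _ in iterate_all(stream):
        count += 1
    return count


# Draining a stream replaces the explicit while loop with a single call.
print(pull_all(ListStream(["gene1", "gene2", "gene3"])))

# The generator form also covers the dictionary case, since each node
# is handed to the caller.
for node in iterate_all(ListStream(["gene1", "gene2"])):
    print(node)
```

The generator variant is slightly more general than a method that only
drains the stream, as it serves both use cases mentioned above.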
> thanks,
> -brent
Sascha
--
Sascha Steinbiss
Center for Bioinformatics
University of Hamburg
Bundesstr. 43
20146 Hamburg
Germany
Email: steinbiss at zbh.uni-hamburg.de
URL: http://www.zbh.uni-hamburg.de/steinbiss
Phone: +49 (40) 42838 7322
FAX: +49 (40) 42838 7312