[gt-users] get feature by ID
Brent Pedersen
bpederse at gmail.com
Thu Feb 12 00:20:59 CET 2009
On Wed, Feb 11, 2009 at 6:16 AM, Sascha Steinbiss
<steinbiss at zbh.uni-hamburg.de> wrote:
> Brent Pedersen wrote:
>> hi again,
>
> Hi Brent,
>
>> one thing that i don't know how to do in genometools via the bindings
>> is to get at a FeatureNode from its ID. i've been doing this by using
>> a python dictionary, but then instead of simply using
>> FeatureIndex.add_gff3file(), i have to use a stream and iterate over
>> it in python to fill the dictionary. is there something like this in
>> the C code that's just not exposed in the API?
>
> Using the implementation as it is now, this is exactly the way to do it
> as we do not have an ID-based index for an annotation file (yet). Thus
> all root nodes must be read into memory if you want to write your own
> index like this (which is quite easy, and can also be kept in memory for
> subsequent accesses).
> Furthermore, if annotations from multiple files are processed like this,
> IDs are no longer guaranteed to be unique, because an ID needs only be
> unique in the scope of a single annotation file. As the IDs are only
> changed when they are output via a GFF3OutStream, at this point they may
> still be ambiguous. Keep that in mind when indexing like this.
i do this often enough that i just wrote a FeatureIndexMemory subclass
that automatically creates a dict mapping ID => gt_pointer. that way the
FeatureNode object is not created until it's actually requested.
i also had some fun abusing the python slicing syntax. usage and
implementation here:
http://gist.github.com/62359
it currently doesn't handle potentially repeated IDs... but i'm
usually just dealing with a single file at a time.
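
For readers without access to the gist, the idea can be sketched roughly as follows. This is not the gist's code and does not use the real genometools bindings: `LazyIDIndex`, its `add` method, and the `wrap` callable are hypothetical names chosen for illustration, and the "handles" are plain integers standing in for gt pointers. The sketch also rejects repeated IDs, which is one possible way to surface the ambiguity Sascha mentions when several files are indexed together.

```python
class LazyIDIndex:
    """Map feature IDs to raw handles; build wrapper objects only on lookup.

    Hypothetical sketch: `wrap` stands in for whatever turns a raw pointer
    into a FeatureNode-like object in the real bindings.
    """

    def __init__(self, wrap):
        self._wrap = wrap      # callable: raw handle -> node object
        self._handles = {}     # feature ID -> raw handle

    def add(self, feature_id, handle):
        # IDs are only unique within one annotation file, so a repeat
        # usually means two files were mixed; fail loudly instead of
        # silently overwriting the earlier entry.
        if feature_id in self._handles:
            raise ValueError("duplicate feature ID: %r" % (feature_id,))
        self._handles[feature_id] = handle

    def __contains__(self, feature_id):
        return feature_id in self._handles

    def __len__(self):
        return len(self._handles)

    def __getitem__(self, feature_id):
        # The wrapper object is created here, at request time.
        return self._wrap(self._handles[feature_id])


# Stand-in usage: integers play the role of pointers, tuples the role of nodes.
index = LazyIDIndex(wrap=lambda handle: ("node", handle))
index.add("gene1", 101)
index.add("gene2", 102)
node = index["gene2"]
```

Keying the dict on a (filename, ID) pair instead of the bare ID would be another way to keep entries from different files apart.
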
>
>> on a sort of related note, one feature that would be nice is something
>> to remove this boilerplate:
>>
>> genome_stream = FeatureStream(genome_stream, feature_index)
>> feature = genome_stream.next_tree()
>> while feature:
>>     feature = genome_stream.next_tree()
>>
>> maybe with something like:
>>
>> FeatureStream(genome_stream, feature_index).iterate_all()
>>
>> that doesn't apply to the problem of making a name => feature hash, but
>> is nice for filling a feature_index and doing an intron_stream.
>
> This is a nice idea and should be implementable without many problems as
> a general method in the GenomeStream class.
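
Until something like that exists in the bindings, the loop can be wrapped on the Python side. The following is a minimal sketch that assumes only what the boilerplate above shows: a stream object with a `next_tree()` method that returns a falsy value when exhausted. The names `iter_trees`, `drain`, and the `ListStream` stand-in are hypothetical and not part of genometools.

```python
def iter_trees(stream):
    """Yield each tree from `stream` until next_tree() returns a falsy value."""
    feature = stream.next_tree()
    while feature:
        yield feature
        feature = stream.next_tree()


def drain(stream):
    """Pull every tree through the stream for its side effects; return the count."""
    count = 0
    for _ in iter_trees(stream):
        count += 1
    return count


class ListStream:
    """Stand-in for a real stream so the sketch runs without genometools."""

    def __init__(self, items):
        self._items = iter(items)

    def next_tree(self):
        # Return None once the underlying items are used up.
        return next(self._items, None)


names = list(iter_trees(ListStream(["gene1", "gene2", "gene3"])))
pulled = drain(ListStream(["gene1", "gene2"]))
```

The generator form also covers the ID => feature case, since the caller can fill a dict inside an ordinary `for` loop, while `drain` matches the "just fill the feature_index" use.
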
>
>> thanks,
>> -brent
>
> Sascha
>
> --
> Sascha Steinbiss
> Center for Bioinformatics
> University of Hamburg
> Bundesstr. 43
> 20146 Hamburg
> Germany
>
> Email: steinbiss at zbh.uni-hamburg.de
> URL: http://www.zbh.uni-hamburg.de/steinbiss
> Phone: +49 (40) 42838 7322
> FAX: +49 (40) 42838 7312
>