[gt-users] how to access GtGenomeNode content from stream?

Sascha Steinbiss steinbiss at zbh.uni-hamburg.de
Tue Apr 21 11:13:19 CEST 2009


David Ellinghaus wrote:
> Dear genometools developers,

Dear David,

> I try to get familiar with the genometools and would like to implement a 
> stream in genometools
> simply reading huge tables from input files line by line.
> I wrote something like that for my  new "tools/gt_fam.c":
> 
>   GtNodeStream *fam_in_stream;
>   GtGenomeNode *gn;
> 
>   /* create a fam input stream */
>   fam_in_stream = 
> gt_fam_in_stream_new(gt_str_get(arguments->str_file_selection));
> 
>   /* pull the FAM files through the stream and free them afterwards */
>   while (!(had_err = gt_node_stream_next(fam_in_stream, &gn, err)) && gn) {
>    
>     (1) How to get the information from gn ?

That depends on which GenomeNode subclass/implementation you have 
created in the input stream. The SequenceNode, FeatureNode and 
CommentNode classes offer methods to extract specific data (see the 
respective header files), while the GenomeNode (genome_node_api.h) 
methods offer general accessors for ranges etc.
You can check if a GenomeNode is of a specific subclass/implements a 
specific interface by calling a try_cast function. For example, to check 
whether a GenomeNode supports SequenceNode functionality, use

GtSequenceNode *sn;
if ((sn = gt_sequence_node_try_cast(gn)) == NULL) {
   /* gn is some other node type, so sn must not be used here */
   printf("Could not cast genome node -- is not a sequence node!\n");
}

>     gt_genome_node_delete(gn);
>   }
> 
>   /* free */
>   gt_node_stream_delete(fam_in_stream);
> 
> 
> I adapted my own source code in "extended/fam_in_stream.c" from your 
> file "extended/bed_in_stream.c."
> Unfortunately, I don't know how to access my parsed value (fam_id) from 
> each line which must be in GtGenomeNode gn.
> Is there any possibility to cast the gn to access my string fam_id in 
> each line?

IMHO the cleanest solution would be to subclass GenomeNode, thus 
creating your own node type, and define appropriate accessors there 
(e.g. gt_fam_node_get_fam_id()). Gordon, do you agree?
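
To make the subclassing idea a bit more concrete, here is a minimal, 
self-contained sketch in plain C. It deliberately does not use the 
GenomeTools headers: Node, NodeClass, FamNode, fam_node_new(), 
other_node_new(), fam_node_try_cast(), fam_node_get_fam_id() and 
node_delete() are all hypothetical names made up for illustration, and 
the real GtGenomeNode class machinery differs in detail. It only shows 
the general pattern: the subclass embeds the base struct as its first 
member, each node carries a pointer to its class, and try_cast compares 
that pointer before handing out the subclass type.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for illustration only, NOT the GenomeTools API. */
typedef struct { const char *name; } NodeClass;

/* "Base class": every node starts with a pointer to its class. */
typedef struct { const NodeClass *c_class; } Node;

/* "Subclass": the base struct comes first, so a FamNode* is also a
   valid Node*. */
typedef struct {
  Node parent;
  char *fam_id;
} FamNode;

static const NodeClass fam_node_class   = { "FamNode" };
static const NodeClass other_node_class = { "OtherNode" };

/* Create a FamNode holding a private copy of fam_id; NULL if out of memory. */
Node* fam_node_new(const char *fam_id)
{
  FamNode *fn;
  assert(fam_id != NULL);
  if ((fn = malloc(sizeof *fn)) == NULL)
    return NULL;
  if ((fn->fam_id = malloc(strlen(fam_id) + 1)) == NULL) {
    free(fn);
    return NULL;
  }
  strcpy(fn->fam_id, fam_id);
  fn->parent.c_class = &fam_node_class;
  return &fn->parent;
}

/* Create a node of a different type, to show a failing cast. */
Node* other_node_new(void)
{
  Node *n = malloc(sizeof *n);
  if (n != NULL)
    n->c_class = &other_node_class;
  return n;
}

/* Return the node as FamNode if it is one, NULL otherwise. */
FamNode* fam_node_try_cast(Node *n)
{
  assert(n != NULL);
  return n->c_class == &fam_node_class ? (FamNode*) n : NULL;
}

/* Accessor defined on the subclass only. */
const char* fam_node_get_fam_id(const FamNode *fn)
{
  assert(fn != NULL);
  return fn->fam_id;
}

/* Free a node of either type, including subclass-owned members. */
void node_delete(Node *n)
{
  FamNode *fn;
  if (n == NULL)
    return;
  if ((fn = fam_node_try_cast(n)) != NULL)
    free(fn->fam_id);
  free(n);
}
```

In your loop you would then call the try_cast function on each node 
pulled from the stream and use the accessor only if the result is not 
NULL, in the same way as with gt_sequence_node_try_cast() above.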

As a side note: for efficient sequential (unmapped) sequence access, the 
GtSeqIterator and GtSeqIteratorQual classes in the latest GenomeTools 
versions may help you. I do not know the FAM format you are using here, 
but since I remember an earlier question of yours, I thought I would 
mention it ;)

> Thanks a lot in advance
> Best regards
> David

No problem!
Sascha

-- 
Sascha Steinbiss
Center for Bioinformatics
University of Hamburg
Bundesstr. 43
20146 Hamburg
Germany

Email:  steinbiss at zbh.uni-hamburg.de
URL:    http://www.zbh.uni-hamburg.de/steinbiss
Phone:  +49 (40) 42838 7322
FAX:    +49 (40) 42838 7312


