dictyBase Developers

Solving one problem at a time

Preparing and Loading Data for Visualization in GBrowse

The scripts in the Modware-Loader distribution are needed for data export and format conversion.

Export genome annotations in GFF3 format

The annotations are exported from a Chado Oracle database.

  • Exporting purpureum genome
modware-export chado2gff3 --org purpureum --dsn 'dbi:Oracle:host=host;sid=sid' \
       -u chado -p chado -o purpureum.gff3 --extra_gene_model Geneid \
       --include_aligned_feature EST --tolerate_missing --include_align_parts

The above command exports the Dictyostelium purpureum genome in GFF3 format, along with extra gene models and EST alignment features. Many more examples of GFF3 exports are here.
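Before loading, it can help to check what the exported file actually contains. The following is a minimal sketch and not part of Modware-Loader: a hypothetical helper, gff3_type_counts, that tallies the feature types in column 3 of a GFF3 file with awk. It assumes a tab-delimited file with the standard nine columns.

```shell
# Summarise a GFF3 file by feature type (column 3).
# gff3_type_counts is a hypothetical helper name, not a Modware-Loader command.
gff3_type_counts() {
    gff3_file="$1"
    awk -F'\t' '
        /^##FASTA/ { exit }        # embedded sequence follows; stop reading
        /^#/ || NF < 9 { next }    # skip directives, comments and short rows
        { count[$3]++ }            # column 3 holds the feature type
        END { for (t in count) printf "%s\t%d\n", t, count[t] }
    ' "$gff3_file" | sort
}
```

Running gff3_type_counts purpureum.gff3 prints one line per feature type with its count, which makes it easy to see whether the extra gene models and EST alignments were included in the export.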

Including gene products in fasta header

Aligning proteins to other genomes using tblastn

Here, as an example, Dictyostelium discoideum proteins will be aligned to the top-level assemblies (supercontigs) of the pallidum genome. A detailed alignment strategy with refinement is given here.

Loading features from GFF3

The GFF3 data will be loaded into a PostgreSQL database using BioPerl’s SeqFeature backend.

  • Install the DBI driver for PostgreSQL
cpanm DBD::Pg

Note: Do not set the -Darch option when compiling perl with perlbrew; otherwise DBI will not install.

  • Load data
bp_seqfeature_load.pl --dsn 'dbi:Pg:database=purpureum' -a 'DBI::Pg' -u uuser \
-p passs -f --summary -c purpureum.gff3

If you load more features into the same database, skip the -c option, because it creates and reinitializes the database and would erase what is already loaded. If the loader then complains about features not yet found in the database, skip the -f option as well.

Note: When loading tblastn alignments, skip the -c option but keep the -f option.

  • Edit the GBrowse configuration file and add the database as a data source.
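The rules above for when to pass -c and -f to bp_seqfeature_load.pl can be sketched as a small dry-run helper. This is only an illustration: the function name seqfeature_load_cmd and its three mode names are hypothetical, and the database name, user and password are the placeholder values used in the load command above. It prints the command instead of running it, so it can be reviewed first.

```shell
# Print the bp_seqfeature_load.pl command for a given load mode.
# seqfeature_load_cmd and the mode names are hypothetical, for illustration only.
#   initial      first load: create the database (-c) and use fast loading (-f)
#   incremental  add to an existing database: no -c, keep -f
#                (also the mode for tblastn alignments)
#   safe         no -c and no -f, for when the loader complains about
#                features not yet found in the database
seqfeature_load_cmd() {
    mode="$1"
    gff3_file="$2"
    case "$mode" in
        initial)     flags="-f --summary -c" ;;
        incremental) flags="-f --summary" ;;
        safe)        flags="--summary" ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    # Connection details are the placeholders from the load command above.
    echo "bp_seqfeature_load.pl --dsn 'dbi:Pg:database=purpureum' -a 'DBI::Pg' -u uuser -p passs $flags $gff3_file"
}
```

For example, seqfeature_load_cmd incremental tblastn.gff3 prints a command without -c, so the features already in the purpureum database are left in place. The file name tblastn.gff3 is made up for this example.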

RNA-seq(NGS) alignments data

The alignments are expected to be in BAM format. If they are not, run any standard NGS alignment pipeline (Bowtie, etc.) to produce BAM output.

  • Create a folder under $GBROWSE_ROOT/database
mkdir -p  $GBROWSE_ROOT/database/purpureum
  • Index the BAM file
samtools index file.bam
  • Also copy the FASTA sequence of the reference genome into the same folder

  • Install the Perl bindings for samtools

cpanm Bio::DB::Sam
  • Note: If the Bio::DB::Sam install fails, try installing it from source
# First download (samtools.sf.net) and compile samtools from source
cd samtools-<version>/
make
# It is fine if tview fails to compile; this can happen in the absence of the
# curses library. tview is not needed for the Perl module.

# Now download the Bio::DB::Sam source (http://metacpan.org/module/Bio::DB::Sam)
cd Bio-DB-SAM-<version>
SAMTOOLS=<samtools-path> perl Build.PL
./Build install
  • Edit and add the track configuration as described in this guide
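The folder setup and indexing steps above can be sketched as one helper. This is a minimal sketch under stated assumptions: the function name stage_bam_for_gbrowse is hypothetical, and it assumes the BAM file should sit in the same folder as the reference FASTA. The index is built only when samtools is on the PATH.

```shell
# Stage a BAM file and its reference FASTA under $GBROWSE_ROOT/database/<name>.
# stage_bam_for_gbrowse is a hypothetical helper name.
# Arguments: <name> <file.bam> <reference.fasta>
stage_bam_for_gbrowse() {
    name="$1"
    bam="$2"
    fasta="$3"
    if [ -z "$GBROWSE_ROOT" ]; then
        echo "GBROWSE_ROOT must be set" >&2
        return 1
    fi
    dest="$GBROWSE_ROOT/database/$name"
    mkdir -p "$dest" || return 1
    # Assumption: the BAM file is kept alongside the reference FASTA.
    cp "$bam" "$fasta" "$dest/" || return 1
    staged_bam="$dest/$(basename "$bam")"
    if command -v samtools >/dev/null 2>&1; then
        samtools index "$staged_bam"
    else
        echo "samtools not found; index $staged_bam manually" >&2
    fi
}
```

A call such as stage_bam_for_gbrowse purpureum file.bam purpureum.fa leaves both files, plus the index written by samtools, in $GBROWSE_ROOT/database/purpureum. The name purpureum.fa is made up for this example. Note that samtools index expects a coordinate-sorted BAM file.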

Next: Integrate headers and footers with gbrowse