The scripts in the Modware-Loader distribution are needed for data export and format conversion.
- First install Modware-Loader.
Export genome annotations in GFF3 format
The annotations are exported from the Chado Oracle database.
- Export the purpureum genome
This exports the Dictyostelium purpureum genome in GFF3 format, along with additional gene models and EST alignment features. Many more examples of GFF3 exports are available here.
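To sanity-check an export, it helps to know the GFF3 layout: every feature line has nine tab-separated columns (seqid, source, type, start, end, score, strand, phase, attributes), and the attributes column holds `key=value` pairs separated by semicolons, with reserved characters percent-encoded. The Python sketch below parses feature lines and tallies them by type. The function names are hypothetical helpers, not part of Modware-Loader.

```python
from collections import Counter
from urllib.parse import unquote

GFF3_COLUMNS = ("seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes")


def parse_attributes(field):
    """Split a GFF3 column-9 string such as 'ID=g1;Name=abc' into a dict."""
    attrs = {}
    if field == ".":
        return attrs
    for pair in field.strip().split(";"):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        # GFF3 percent-encodes reserved characters such as ';' and '='.
        attrs[key] = unquote(value)
    return attrs


def parse_gff3(lines):
    """Yield one dict per feature line, skipping comments and any ##FASTA section."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequences follow; there are no more features
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            raise ValueError(f"expected 9 columns, got {len(fields)}: {line!r}")
        feature = dict(zip(GFF3_COLUMNS, fields))
        feature["start"] = int(feature["start"])
        feature["end"] = int(feature["end"])
        feature["attributes"] = parse_attributes(feature["attributes"])
        yield feature


def count_feature_types(lines):
    """Tally features by their type column (gene, mRNA, EST match, ...)."""
    return Counter(f["type"] for f in parse_gff3(lines))
```

Running `count_feature_types` over an exported file gives a quick per-type tally to compare against what the export was expected to contain.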
Including gene products in the FASTA header
- Download the Dictyostelium protein FASTA file.
- mapgeneid2prod.pl: produces a map between gene IDs and product names.
- rewrite_dicty_fasta_header.pl: rewrites the FASTA header to include the product name.
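The two scripts are not reproduced here, but the idea behind them can be sketched in Python: load a gene ID to product map, then append the product to every FASTA header whose ID is in the map. The map format (one `gene_id<TAB>product` pair per line), taking the gene ID as the first whitespace-delimited token of the header, and the ` | ` separator are all assumptions for illustration, not the actual formats used by mapgeneid2prod.pl or rewrite_dicty_fasta_header.pl.

```python
def load_product_map(lines):
    """Read hypothetical 'gene_id<TAB>product' lines into a dict."""
    products = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        gene_id, _, product = line.partition("\t")
        if product:  # skip IDs that have no product name
            products[gene_id] = product
    return products


def first_token(header):
    """Default ID extractor: the text after '>' up to the first whitespace."""
    tokens = header[1:].split()
    return tokens[0] if tokens else ""


def rewrite_headers(fasta_lines, products, id_of=first_token):
    """Yield FASTA lines, appending ' | product' to headers whose ID is mapped."""
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            product = products.get(id_of(line))
            if product:
                line = f"{line} | {product}"
        yield line  # sequence lines pass through unchanged
```

If the gene ID sits in a pipe-delimited field rather than the first token, pass a different `id_of` function instead of changing the rewrite loop.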
Aligning proteins to other genomes using tblastn
Here, as an example, Dictyostelium discoideum proteins are aligned to the top-level assemblies (supercontigs) of the pallidum genome. A detailed alignment strategy with refinement is given here.
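The refinement strategy is documented separately; as a hedged illustration of only the raw step, the Python sketch below assumes tblastn (NCBI BLAST+) was run with `-outfmt 6`, whose twelve default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue and bitscore. It converts each hit into a GFF3 line on the supercontig coordinates so it could later be loaded with the other features. The `protein_match` type, the `tblastn` source label and the 1e-5 e-value cutoff are illustrative choices, not part of the documented pipeline.

```python
def blast_tab_to_gff3(lines, source="tblastn", max_evalue=1e-5):
    """Convert BLAST+ -outfmt 6 rows into GFF3 lines on subject (genome) coordinates."""
    for n, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        qseqid, sseqid = cols[0], cols[1]
        qstart, qend = int(cols[6]), int(cols[7])
        sstart, send = int(cols[8]), int(cols[9])
        evalue, bitscore = float(cols[10]), cols[11].strip()
        if evalue > max_evalue:
            continue  # drop weak hits
        # tblastn reports hits on the reverse strand with sstart > send;
        # GFF3 requires start <= end, with orientation in the strand column.
        strand = "+" if sstart <= send else "-"
        start, end = min(sstart, send), max(sstart, send)
        # The input line number keeps IDs unique; Target maps back to the protein.
        attrs = f"ID=hit{n};Name={qseqid};Target={qseqid} {qstart} {qend}"
        yield "\t".join([sseqid, source, "protein_match", str(start), str(end),
                         bitscore, strand, ".", attrs])
```

Query IDs containing GFF3 reserved characters (`;`, `=`, `,`) would need percent-encoding before the output is loaded.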
Loading features from GFF3
The GFF3 data are loaded into a PostgreSQL database using BioPerl’s SeqFeature (Bio::DB::SeqFeature::Store) backend.
- Install the DBI driver for PostgreSQL (DBD::Pg)
Note: Do not set the -Darch option when compiling Perl with perlbrew; otherwise DBI will not install.
- Load data
When loading more features into the same database, omit the -c option, since it re-initializes the database and erases its contents. If the loader then complains about features not yet found in the database, omit the -f option as well.
Note: When loading tblastn alignments, omit -c but keep the -f option.
- Add the database as a data source in the GBrowse configuration file.
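The -c/-f rules above can be captured in a small Python helper that assembles the loader's argument list. It assumes the loader is BioPerl's bp_seqfeature_load.pl, with -a selecting the DBI::Pg adaptor and -d taking the DBI data source name; the helper name, its keyword arguments and the example database name are hypothetical, so check the flags against `bp_seqfeature_load.pl --help` on your installation.

```python
def seqfeature_load_args(gff3_files, dbname, first_load=False,
                         missing_feature_errors=False, tblastn=False):
    """Build a bp_seqfeature_load.pl argument list following the -c/-f rules."""
    args = ["bp_seqfeature_load.pl", "-a", "DBI::Pg",
            "-d", f"dbi:Pg:dbname={dbname}"]
    # -c (re)creates the database: only for the very first load, and never
    # when adding tblastn alignments to an existing database.
    if first_load and not tblastn:
        args.append("-c")
    # -f (fast loading) stays on unless the loader complained about features
    # not yet in the database; tblastn loads always keep it.
    if tblastn or not missing_feature_errors:
        args.append("-f")
    args.extend(gff3_files)
    return args
```

The resulting list can be passed to `subprocess.run(args, check=True)`, which raises an error if the loader exits with a non-zero status.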
RNA-seq (NGS) alignment data
The alignments are expected to be in BAM format. If they are not, run any standard NGS alignment pipeline (Bowtie, etc.) to produce BAM output.
- Create a folder under $GBROWSE_ROOT/database
- Index the BAM file
Also copy the FASTA sequence of the reference genome into the same folder.
- Install the Perl bindings for samtools (Bio::DB::Sam)
- Note: If installing Bio::DB::Sam fails, try installing it from source.
- Add the track configuration as described in this guide.
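Before editing the track configuration, it can help to confirm that the folder holds everything the preceding steps produce: the BAM file, its index and the reference FASTA. A minimal Python sketch follows; it assumes samtools' default index name (`<name>.bam.bai`, with `<name>.bai` also accepted) and FASTA files ending in `.fa`, `.fasta` or `.fna`. The function name and the accepted extensions are assumptions, not part of GBrowse.

```python
from pathlib import Path

FASTA_SUFFIXES = {".fa", ".fasta", ".fna"}


def check_bam_track_dir(folder):
    """Return a list of problems; an empty list means the folder looks complete."""
    folder = Path(folder)
    if not folder.is_dir():
        return [f"{folder} is not a directory"]
    problems = []
    bams = sorted(folder.glob("*.bam"))
    if not bams:
        problems.append("no BAM file found")
    for bam in bams:
        # samtools index writes <name>.bam.bai; some tools write <name>.bai.
        candidates = [bam.with_name(bam.name + ".bai"), bam.with_suffix(".bai")]
        if not any(c.exists() for c in candidates):
            problems.append(f"{bam.name} is not indexed (no .bai file)")
    if not any(p.suffix.lower() in FASTA_SUFFIXES for p in folder.iterdir()):
        problems.append("no reference FASTA file found")
    return problems
```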