Stock Data Import
Data
Data that will be imported is in the format explained here
Rationale & SQL
Inventories
- The inventory data available for import has one inventory per row. Inventory, being a property of the stock, is stored in the stockprop table.
- We also have ontologies that explain the terms used to define a strain or a plasmid inventory. The ontologies are available here: strain & plasmid. These ontologies can be loaded using obo-loader.
- As defined in the ontologies:
  - strain-inventory: A stock is said to have a strain inventory when it has location, color, storage date, number of vials, obtained as, stored as, private comment, public comment
  - plasmid-inventory: A stock is said to have a plasmid inventory when it has location, color, storage date, obtained as, stored as, private comment, public comment
- The inventory data is stored in the database as follows: stockprop.type => { in => [qw/<terms from inventory ontologies>/]} & stockprop.value => <value of the term from ontology>
- For each DBS-ID/DBP-ID, the stockprop rows that share the same stockprop.rank make up one inventory record.
Once the data is imported, it can be viewed or retrieved using SQL.
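As a rough illustration of the storage scheme described above, the following sketch builds a tiny in-memory SQLite database with a simplified subset of the Chado tables and groups stockprop rows into inventory records by rank. The cv name strain_inventory, the stock ID and all values are assumptions made up for the example, and the column lists are trimmed well below the real Chado schema.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cv (cv_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, cv_id INTEGER, name TEXT);
CREATE TABLE stock (stock_id INTEGER PRIMARY KEY, uniquename TEXT);
CREATE TABLE stockprop (
    stockprop_id INTEGER PRIMARY KEY,
    stock_id INTEGER, type_id INTEGER, value TEXT, rank INTEGER
);
-- Hypothetical ontology name, stock ID and values, for illustration only.
INSERT INTO cv VALUES (1, 'strain_inventory');
INSERT INTO cvterm VALUES (1, 1, 'location'), (2, 1, 'color');
INSERT INTO stock VALUES (1, 'DBS0000001');
INSERT INTO stockprop (stock_id, type_id, value, rank) VALUES
    (1, 1, '2-9', 0), (1, 2, 'blue', 0),
    (1, 1, '3-4', 1), (1, 2, 'green', 1);
""")

rows = conn.execute("""
    SELECT s.uniquename, sp.rank, t.name, sp.value
    FROM stock s
    JOIN stockprop sp ON sp.stock_id = s.stock_id
    JOIN cvterm t ON t.cvterm_id = sp.type_id
    JOIN cv ON cv.cv_id = t.cv_id
    WHERE cv.name = 'strain_inventory'
    ORDER BY s.uniquename, sp.rank, t.name
""").fetchall()

# Rows sharing (stock, rank) belong to the same inventory record.
inventories = defaultdict(dict)
for stock_name, rank, term, value in rows:
    inventories[(stock_name, rank)][term] = value

for key, record in inventories.items():
    print(key, record)
```

Grouping on rank in the client keeps the SQL simple; the same result could also be pivoted in SQL if a fixed set of inventory terms is assumed.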
Strain-Plasmid
- Strains have plasmids associated with them. Some of these plasmids are available in the Dicty StockCenter, while some are not.
- The plasmids that are not currently available in the Stock Center may or may not become available in the future. Thus, we use the same data model as for the ones that are available. The only difference is stockcollection.name => 'Dicty Azkaban'
- If these plasmids are ever made available, we can change this to stockcollection.name => 'Dicty Stockcenter'
- For storing the relation we use the stock_relationship table: stock_relationship.subject_id => stock.stock_id (plasmid) & stock_relationship.object_id => stock.stock_id (strain). The relation term used is part_of, which is defined under cv.name => 'stock_relation'.
Once the data is imported, it can be viewed or retrieved using SQL.
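The strain-plasmid model above can be sketched as follows. This is only an illustration on an in-memory SQLite database: the tables are reduced to the columns needed here, the stockcollection_stock linking table is assumed to be how a stock is attached to a collection, and the DBS/DBP identifiers are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cv (cv_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, cv_id INTEGER, name TEXT);
CREATE TABLE stock (stock_id INTEGER PRIMARY KEY, uniquename TEXT);
CREATE TABLE stockcollection (stockcollection_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE stockcollection_stock (stockcollection_id INTEGER, stock_id INTEGER);
CREATE TABLE stock_relationship (subject_id INTEGER, object_id INTEGER, type_id INTEGER);

INSERT INTO cv VALUES (1, 'stock_relation');
INSERT INTO cvterm VALUES (1, 1, 'part_of');
-- Hypothetical identifiers: one strain, one available and one unavailable plasmid.
INSERT INTO stock VALUES (1, 'DBS0000001'), (2, 'DBP0000001'), (3, 'DBP0000002');
INSERT INTO stockcollection VALUES (1, 'Dicty Stockcenter'), (2, 'Dicty Azkaban');
INSERT INTO stockcollection_stock VALUES (1, 1), (1, 2), (2, 3);
-- subject = plasmid, object = strain
INSERT INTO stock_relationship VALUES (2, 1, 1), (3, 1, 1);
""")

rows = conn.execute("""
    SELECT strain.uniquename, plasmid.uniquename, sc.name
    FROM stock_relationship sr
    JOIN stock plasmid ON plasmid.stock_id = sr.subject_id
    JOIN stock strain ON strain.stock_id = sr.object_id
    JOIN cvterm t ON t.cvterm_id = sr.type_id
    JOIN cv ON cv.cv_id = t.cv_id
    JOIN stockcollection_stock scs ON scs.stock_id = plasmid.stock_id
    JOIN stockcollection sc ON sc.stockcollection_id = scs.stockcollection_id
    WHERE t.name = 'part_of' AND cv.name = 'stock_relation'
    ORDER BY plasmid.uniquename
""").fetchall()

for strain, plasmid, collection in rows:
    print(strain, plasmid, collection)
```

Because unavailable plasmids differ only in their collection name, making one available later would be a single update to its collection link rather than a change to the relationship rows.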
Strain-Genes
- Strains are associated with genes (features). The relation is mainly based on the genotype, which is common to the strain & the gene. Thus the link between a strain (stock) and a gene (feature) can be modeled using the genotype.
- From our data, one strain has one genotype. However, the relation between strains & genes is many-to-many.
stock -> stock_genotype -> genotype -> feature_genotype -> feature
- feature_genotype requires cgroup, chromosome_id and cvterm_id, which are not modeled yet.
Once the data is imported, it can be viewed or retrieved using SQL.
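The join path stock -> stock_genotype -> genotype -> feature_genotype -> feature can be sketched like this. It is a simplified illustration on in-memory SQLite: the feature_genotype columns cgroup, chromosome_id and cvterm_id are deliberately left out, and all identifiers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stock (stock_id INTEGER PRIMARY KEY, uniquename TEXT);
CREATE TABLE genotype (genotype_id INTEGER PRIMARY KEY, uniquename TEXT);
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, uniquename TEXT);
CREATE TABLE stock_genotype (stock_id INTEGER, genotype_id INTEGER);
-- cgroup, chromosome_id and cvterm_id are omitted in this sketch.
CREATE TABLE feature_genotype (feature_id INTEGER, genotype_id INTEGER);

-- Hypothetical identifiers: each strain has exactly one genotype.
INSERT INTO stock VALUES (1, 'DBS0000001'), (2, 'DBS0000002');
INSERT INTO genotype VALUES (1, 'genotype-1'), (2, 'genotype-2');
INSERT INTO feature VALUES (1, 'DDB_G0000001'), (2, 'DDB_G0000002');
INSERT INTO stock_genotype VALUES (1, 1), (2, 2);
-- genotype-1 involves two genes; the first gene is shared by both genotypes.
INSERT INTO feature_genotype VALUES (1, 1), (2, 1), (1, 2);
""")

rows = conn.execute("""
    SELECT s.uniquename, f.uniquename
    FROM stock s
    JOIN stock_genotype sg ON sg.stock_id = s.stock_id
    JOIN genotype g ON g.genotype_id = sg.genotype_id
    JOIN feature_genotype fg ON fg.genotype_id = g.genotype_id
    JOIN feature f ON f.feature_id = fg.feature_id
    ORDER BY s.uniquename, f.uniquename
""").fetchall()

for strain, gene in rows:
    print(strain, gene)
```

The sample data shows the many-to-many shape: one strain maps to two genes, and one gene maps to two strains, even though each strain has a single genotype.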
Publications & Strain characteristics
- All publication entries have been exported as PMIDs.
- Initially available in EndNote, the stockcenter publications have been converted to BibTeX format. These will be loaded using the loader written for chadopub2bib.
- Data import is fairly straightforward. Data is stored in the stock_pub linking table.
Once the data is imported, it can be viewed or retrieved using SQL.
- Similarly, the model for strain characteristics is very straightforward. Make sure you have the strain_characteristics ontology loaded.
- The linking table is stock_cvterm.
Once the data is imported, it can be viewed or retrieved using SQL.
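Both linking tables in this section, stock_pub and stock_cvterm, can be queried in much the same way. The sketch below uses in-memory SQLite with trimmed-down tables. It assumes that the PMID is kept in pub.uniquename and that strain_characteristics is also the cv name; the PMIDs, the stock ID and the characteristic terms are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cv (cv_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, cv_id INTEGER, name TEXT);
CREATE TABLE stock (stock_id INTEGER PRIMARY KEY, uniquename TEXT);
CREATE TABLE pub (pub_id INTEGER PRIMARY KEY, uniquename TEXT);
CREATE TABLE stock_pub (stock_id INTEGER, pub_id INTEGER);
CREATE TABLE stock_cvterm (stock_id INTEGER, cvterm_id INTEGER, pub_id INTEGER);

-- Hypothetical terms, PMIDs and stock ID, for illustration only.
INSERT INTO cv VALUES (1, 'strain_characteristics');
INSERT INTO cvterm VALUES (1, 1, 'axenic'), (2, 1, 'null mutant');
INSERT INTO stock VALUES (1, 'DBS0000001');
INSERT INTO pub VALUES (1, '00000001'), (2, '00000002');
INSERT INTO stock_pub VALUES (1, 1), (1, 2);
INSERT INTO stock_cvterm VALUES (1, 1, 1), (1, 2, 1);
""")

# Publications linked to a strain through stock_pub.
pmids = [r[0] for r in conn.execute("""
    SELECT p.uniquename
    FROM stock s
    JOIN stock_pub sp ON sp.stock_id = s.stock_id
    JOIN pub p ON p.pub_id = sp.pub_id
    WHERE s.uniquename = ?
    ORDER BY p.uniquename
""", ("DBS0000001",))]

# Strain characteristics linked through stock_cvterm.
characteristics = [r[0] for r in conn.execute("""
    SELECT t.name
    FROM stock s
    JOIN stock_cvterm sc ON sc.stock_id = s.stock_id
    JOIN cvterm t ON t.cvterm_id = sc.cvterm_id
    JOIN cv ON cv.cv_id = t.cv_id
    WHERE s.uniquename = ? AND cv.name = 'strain_characteristics'
    ORDER BY t.name
""", ("DBS0000001",))]

print(pmids)
print(characteristics)
```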
Parental strains
- The model for parental strains is the same as for strain-plasmids.
- The only difference is stock_relationship.subject_id => stock.stock_id (parent), and the relation term used is is_parent_of from cv.name => 'stock_relation'.
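A parent lookup under this model might look like the sketch below, again on in-memory SQLite with simplified tables. The two strain identifiers are hypothetical. The subject of the relationship is the parent and the object is the derived strain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cv (cv_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, cv_id INTEGER, name TEXT);
CREATE TABLE stock (stock_id INTEGER PRIMARY KEY, uniquename TEXT);
CREATE TABLE stock_relationship (subject_id INTEGER, object_id INTEGER, type_id INTEGER);

INSERT INTO cv VALUES (1, 'stock_relation');
INSERT INTO cvterm VALUES (1, 1, 'is_parent_of');
-- Hypothetical identifiers: DBS0000001 is the parent of DBS0000002.
INSERT INTO stock VALUES (1, 'DBS0000001'), (2, 'DBS0000002');
INSERT INTO stock_relationship VALUES (1, 2, 1);
""")

rows = conn.execute("""
    SELECT parent.uniquename, strain.uniquename
    FROM stock_relationship sr
    JOIN stock parent ON parent.stock_id = sr.subject_id
    JOIN stock strain ON strain.stock_id = sr.object_id
    JOIN cvterm t ON t.cvterm_id = sr.type_id
    JOIN cv ON cv.cv_id = t.cv_id
    WHERE t.name = 'is_parent_of' AND cv.name = 'stock_relation'
""").fetchall()

for parent, strain in rows:
    print(parent, "is_parent_of", strain)
```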
Phenotype
- Phenotype is an important data component. It is something that is observed for a given genotype. As explained in stock data export, it is dependent on genotype and environment.
- It is modeled the same way as in the legacy database.
- The ontologies that must be loaded before importing this data are Dicty Phenotypes, Dicty Environment and Dictyostelium Assay.
The model for phenotype is based on the following concept: a strain has a genotype. This genotype, under a certain environment, expresses to show a phenotype.
In the legacy database there are manually added phenotypes for strains. They are not linked to the Dicty Phenotypes ontology. To correct this, these phenotype terms were manually mapped to terms in the ontology, and the corrected data can be found here.
- This data is in a slightly different format than what is accepted for import. So the import data was manually created and can be found here. This file can be passed to modware-import dictystrain2chado using the --dsc_phenotypes parameter. As you will see, most of these phenotypes do not have a PMID associated with them. So we just use the Dicty Phenotypes reference as default.
Once the data is imported, it can be viewed or retrieved using SQL.
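The strain, genotype, environment and phenotype concept can be sketched as below. The source does not name the tables involved, so this assumes the standard Chado phenstatement table is what ties a genotype and an environment to a phenotype, and that phenotype.observable_id points at the ontology term. The tables are heavily simplified, and the term, environment and strain names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE stock (stock_id INTEGER PRIMARY KEY, uniquename TEXT);
CREATE TABLE genotype (genotype_id INTEGER PRIMARY KEY, uniquename TEXT);
CREATE TABLE stock_genotype (stock_id INTEGER, genotype_id INTEGER);
CREATE TABLE environment (environment_id INTEGER PRIMARY KEY, uniquename TEXT);
CREATE TABLE phenotype (phenotype_id INTEGER PRIMARY KEY, observable_id INTEGER);
-- Assumed linking table: one row per (genotype, environment, phenotype).
CREATE TABLE phenstatement (genotype_id INTEGER, environment_id INTEGER, phenotype_id INTEGER);

-- Hypothetical names, for illustration only.
INSERT INTO cvterm VALUES (1, 'example phenotype term');
INSERT INTO stock VALUES (1, 'DBS0000001');
INSERT INTO genotype VALUES (1, 'genotype-1');
INSERT INTO stock_genotype VALUES (1, 1);
INSERT INTO environment VALUES (1, 'example environment');
INSERT INTO phenotype VALUES (1, 1);
INSERT INTO phenstatement VALUES (1, 1, 1);
""")

rows = conn.execute("""
    SELECT s.uniquename, g.uniquename, e.uniquename, t.name
    FROM stock s
    JOIN stock_genotype sg ON sg.stock_id = s.stock_id
    JOIN genotype g ON g.genotype_id = sg.genotype_id
    JOIN phenstatement ps ON ps.genotype_id = g.genotype_id
    JOIN environment e ON e.environment_id = ps.environment_id
    JOIN phenotype p ON p.phenotype_id = ps.phenotype_id
    JOIN cvterm t ON t.cvterm_id = p.observable_id
""").fetchall()

for strain, genotype, environment, phenotype in rows:
    print(strain, genotype, environment, phenotype)
```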
Plasmid Sequence
- The plasmid sequences are available for import in either GenBank or FastA formats.
- With this import, we convert GenBank to FastA and import only FastA sequences.
- The plasmid sequences are stored in the feature table. The default uniquename & dbxref.accession is the DBP-ID.
- In case of GenBank, the GenBank accession is the dbxref.accession.
- Also, an entry is made in the stockprop table for each plasmid that has a sequence.
  - The stockprop.type => 'plasmid_vector' & the stockprop.value => feature_id.
- Also, as plasmids do not have an organism defined (and not enough metadata is available for a different kind of data model), the default organism is Dictyostelium discoideum.
- For importing the sequences, the --seq_data_dir parameter needs to be passed a path to a folder with clean/formatted sequence files.
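To illustrate the GenBank-to-FastA step, here is a minimal standard-library sketch that pulls the sequence out of the ORIGIN block of a single GenBank record and writes it as a FastA entry keyed by the DBP-ID. It is not the converter used by the import, which the text does not show, and it is not a full GenBank parser; in practice an established library would normally do this. The function name, the sample record and the DBP-ID are all hypothetical.

```python
def genbank_to_fasta(genbank_text, seq_id, width=60):
    """Return a FastA entry built from the ORIGIN block of one GenBank record."""
    in_origin = False
    chunks = []
    for line in genbank_text.splitlines():
        if line.startswith("ORIGIN"):
            in_origin = True
            continue
        if line.startswith("//"):
            # End-of-record marker.
            break
        if in_origin:
            # Sequence lines look like "        1 gatcctccat atacaacggt";
            # keep only the letters, dropping positions and spaces.
            chunks.append("".join(ch for ch in line if ch.isalpha()))
    sequence = "".join(chunks)
    wrapped = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + seq_id + "\n" + "\n".join(wrapped) + "\n"


# Hypothetical, heavily truncated record used only to exercise the function.
record = """LOCUS       EXAMPLE   20 bp    DNA   circular
ORIGIN
        1 gatcctccat atacaacggt
//
"""

fasta = genbank_to_fasta(record, "DBP0000001")
print(fasta, end="")
```

Using the DBP-ID as the FastA header mirrors the default uniquename described above; a GenBank accession, where present, would be recorded separately as the dbxref.accession.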
Plasmid Genes
- Plasmids have genes associated with them (from legacy data). However, not all the data about the sequence & loci is available.
- Had the sequence data been available, a different data model would have been adopted.
- Currently, the genes associated with plasmids are stored in the stockprop table with stockprop.type => 'has_part' (has_part is from the sequence ontology) and stockprop.value => DDB_G-ID.
Once the data is imported, it can be viewed or retrieved using SQL.
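Retrieving the genes stored for a plasmid could look like the following sketch on in-memory SQLite. The tables are simplified, the cv name sequence is assumed from the mention of the sequence ontology, and the DBP and DDB_G identifiers are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cv (cv_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, cv_id INTEGER, name TEXT);
CREATE TABLE stock (stock_id INTEGER PRIMARY KEY, uniquename TEXT);
CREATE TABLE stockprop (
    stockprop_id INTEGER PRIMARY KEY,
    stock_id INTEGER, type_id INTEGER, value TEXT, rank INTEGER
);
INSERT INTO cv VALUES (1, 'sequence');
INSERT INTO cvterm VALUES (1, 1, 'has_part');
-- Hypothetical plasmid and gene identifiers.
INSERT INTO stock VALUES (1, 'DBP0000001');
INSERT INTO stockprop (stock_id, type_id, value, rank) VALUES
    (1, 1, 'DDB_G0000001', 0), (1, 1, 'DDB_G0000002', 1);
""")

rows = conn.execute("""
    SELECT s.uniquename, sp.value
    FROM stock s
    JOIN stockprop sp ON sp.stock_id = s.stock_id
    JOIN cvterm t ON t.cvterm_id = sp.type_id
    JOIN cv ON cv.cv_id = t.cv_id
    WHERE t.name = 'has_part' AND cv.name = 'sequence'
    ORDER BY sp.rank
""").fetchall()

for plasmid, gene_id in rows:
    print(plasmid, gene_id)
```

Since the gene is held only as a text value in stockprop, nothing in the schema checks that the DDB_G-ID matches an existing feature; that would have to be verified separately.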
Command
The data is imported using the modware-import command. All the modules used by this command can be found under Modware::Import and Modware::Role::Stock::Import.
Both import commands share a set of common options, and each command also has options specific to it.
NOTE: Plasmid data will have to be imported before the strain data. This is because the strain-plasmid relations depend on the plasmid records.