Stock Data Import
Data
Data that will be imported is in the format explained here
Rationale & SQL
Inventories
- The inventory data available for import is one inventory per row. Inventory, being a property of the stock, is stored in the stockprop table.
- We also have ontologies that explain the terms used to define a strain or a plasmid inventory. The ontologies are available here: strain & plasmid. These ontologies can be loaded using obo-loader.
- As defined in the ontologies:
  - strain-inventory: A stock is said to have a strain inventory when it has location, color, storage date, number of vials, obtained as, stored as, private comment, public comment.
  - plasmid-inventory: A stock is said to have a plasmid inventory when it has location, color, storage date, obtained as, stored as, private comment, public comment.
- The inventory data is stored in the database as follows: stockprop.type => { in => [qw/<terms from inventory ontologies>/] } & stockprop.value => <value of the term from ontology>
- Rows that share the same stockprop.rank for a given DBS-ID/DBP-ID make up one inventory record.
Once data is imported, it can be viewed/retrieved using the following SQL;
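One way such a lookup might look, sketched against an in-memory SQLite stand-in rather than a real Chado instance. The trimmed table definitions, the cv name strain_inventory, the DBS-ID and the property values are all assumptions made for illustration; only the stockprop type/value/rank layout comes from the description above.

```python
import sqlite3

# In-memory stand-in for the Chado tables involved (columns trimmed).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cv (cv_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, cv_id INTEGER, name TEXT);
CREATE TABLE stock (stock_id INTEGER PRIMARY KEY, uniquename TEXT);
CREATE TABLE stockprop (
    stockprop_id INTEGER PRIMARY KEY,
    stock_id INTEGER, type_id INTEGER, value TEXT, rank INTEGER
);
INSERT INTO cv VALUES (1, 'strain_inventory');   -- assumed cv name
INSERT INTO cvterm VALUES (10, 1, 'location'), (11, 1, 'color');
INSERT INTO stock VALUES (100, 'DBS0000001');    -- made-up DBS-ID
-- Two inventory records for one strain: rank 0 and rank 1.
INSERT INTO stockprop (stock_id, type_id, value, rank) VALUES
    (100, 10, '2-5(12)', 0), (100, 11, 'blue', 0),
    (100, 10, '7-1(3)', 1), (100, 11, 'red', 1);
""")

def inventories(conn, stock_uniquename, cv_name="strain_inventory"):
    """Group the stockprop rows of one stock into inventory records by rank."""
    rows = conn.execute("""
        SELECT sp.rank, t.name, sp.value
        FROM stockprop sp
        JOIN stock s ON s.stock_id = sp.stock_id
        JOIN cvterm t ON t.cvterm_id = sp.type_id
        JOIN cv ON cv.cv_id = t.cv_id
        WHERE s.uniquename = ? AND cv.name = ?
        ORDER BY sp.rank, t.name
    """, (stock_uniquename, cv_name)).fetchall()
    records = {}
    for rank, term, value in rows:
        records.setdefault(rank, {})[term] = value
    return [records[r] for r in sorted(records)]

for record in inventories(conn, "DBS0000001"):
    print(record)
```

The SELECT itself uses only plain joins, so it should carry over to PostgreSQL largely unchanged; the grouping by rank is done in Python here only to keep the SQL short.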
Strain-Plasmid
- Strains have plasmids associated with them. Some of these plasmids are available in the Dicty StockCenter, while some are not.
- The plasmids that are not currently available in the Stock Center may or may not become available in the future. Thus, we use the same data model as for the ones that are available. The only difference is stockcollection.name => 'Dicty Azkaban'.
- If these plasmids are ever made available, we can change stockcollection.name => 'Dicty Stockcenter'.
- For storing the relation we use the stock_relationship table: stock_relationship.subject_id => stock.stock_id (plasmid) & stock_relationship.object_id => stock.stock_id (strain). The relation term used is part_of, which is defined under cv.name => 'stock_relation'.
Once data is imported, it can be viewed/retrieved using the following SQL;
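A minimal sketch of how the plasmids of a strain could be listed together with their collection, again using an in-memory SQLite stand-in. The stockcollection_stock linking table is assumed from the standard Chado schema, and the IDs and names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cv (cv_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, cv_id INTEGER, name TEXT);
CREATE TABLE stock (stock_id INTEGER PRIMARY KEY, uniquename TEXT, name TEXT);
CREATE TABLE stock_relationship (
    stock_relationship_id INTEGER PRIMARY KEY,
    subject_id INTEGER, object_id INTEGER, type_id INTEGER
);
CREATE TABLE stockcollection (stockcollection_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE stockcollection_stock (stockcollection_id INTEGER, stock_id INTEGER);

INSERT INTO cv VALUES (1, 'stock_relation');
INSERT INTO cvterm VALUES (10, 1, 'part_of');
-- Made-up records: one strain, one available and one unavailable plasmid.
INSERT INTO stock VALUES
    (1, 'DBS0000001', 'strain A'),
    (2, 'DBP0000001', 'plasmid in the Stock Center'),
    (3, 'DBP0000002', 'plasmid not in the Stock Center');
INSERT INTO stockcollection VALUES (1, 'Dicty Stockcenter'), (2, 'Dicty Azkaban');
INSERT INTO stockcollection_stock VALUES (1, 1), (1, 2), (2, 3);
-- plasmid (subject) part_of strain (object)
INSERT INTO stock_relationship (subject_id, object_id, type_id) VALUES
    (2, 1, 10), (3, 1, 10);
""")

PLASMIDS_OF_STRAIN = """
SELECT plasmid.uniquename, sc.name
FROM stock_relationship sr
JOIN stock strain  ON strain.stock_id  = sr.object_id
JOIN stock plasmid ON plasmid.stock_id = sr.subject_id
JOIN cvterm t ON t.cvterm_id = sr.type_id
JOIN cv ON cv.cv_id = t.cv_id
JOIN stockcollection_stock scs ON scs.stock_id = plasmid.stock_id
JOIN stockcollection sc ON sc.stockcollection_id = scs.stockcollection_id
WHERE strain.uniquename = ?
  AND t.name = 'part_of' AND cv.name = 'stock_relation'
ORDER BY plasmid.uniquename
"""

plasmids = conn.execute(PLASMIDS_OF_STRAIN, ("DBS0000001",)).fetchall()
for uniquename, collection in plasmids:
    print(uniquename, collection)
```

Filtering on the collection name in the WHERE clause would restrict the result to plasmids that can actually be ordered.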
Strain-Genes
- Strains are associated with genes (features). The relation is mainly based on the genotype, which is common to the strain & the gene. Thus the link between a strain (stock) and a gene (feature) can be modeled using the genotype.
- From our data, one strain has one genotype. However, the relation between strains & genes is many-to-many.
stock -> stock_genotype -> genotype -> feature_genotype -> feature
feature_genotype requires cgroup, chromosome_id and cvterm_id, which are not modeled yet.
Once data is imported, it can be viewed/retrieved using the following SQL;
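The join path above can be walked in either direction. The following is a rough sketch on an in-memory SQLite stand-in; the feature_genotype table is reduced to its two foreign keys (cgroup, chromosome_id and cvterm_id are left out), and all identifiers in the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stock (stock_id INTEGER PRIMARY KEY, uniquename TEXT);
CREATE TABLE genotype (genotype_id INTEGER PRIMARY KEY, uniquename TEXT);
CREATE TABLE stock_genotype (stock_id INTEGER, genotype_id INTEGER);
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, uniquename TEXT);
-- Real Chado also wants cgroup, chromosome_id and cvterm_id here; left out.
CREATE TABLE feature_genotype (feature_id INTEGER, genotype_id INTEGER);

-- One genotype per strain, but genes and strains are many-to-many.
INSERT INTO stock VALUES (1, 'DBS0000001'), (2, 'DBS0000002');
INSERT INTO genotype VALUES (1, 'genotype-1'), (2, 'genotype-2');
INSERT INTO stock_genotype VALUES (1, 1), (2, 2);
INSERT INTO feature VALUES (1, 'DDB_G0000001'), (2, 'DDB_G0000002');
INSERT INTO feature_genotype VALUES (1, 1), (2, 1), (1, 2);
""")

# stock -> stock_genotype -> genotype -> feature_genotype -> feature
JOIN_PATH = """
FROM stock s
JOIN stock_genotype sg ON sg.stock_id = s.stock_id
JOIN genotype g ON g.genotype_id = sg.genotype_id
JOIN feature_genotype fg ON fg.genotype_id = g.genotype_id
JOIN feature f ON f.feature_id = fg.feature_id
"""

def genes_for_strain(conn, strain):
    sql = ("SELECT f.uniquename " + JOIN_PATH +
           " WHERE s.uniquename = ? ORDER BY f.uniquename")
    return [row[0] for row in conn.execute(sql, (strain,))]

def strains_for_gene(conn, gene):
    sql = ("SELECT s.uniquename " + JOIN_PATH +
           " WHERE f.uniquename = ? ORDER BY s.uniquename")
    return [row[0] for row in conn.execute(sql, (gene,))]

print(genes_for_strain(conn, "DBS0000001"))
print(strains_for_gene(conn, "DDB_G0000001"))
```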
Publications & Strain characteristics
- All publication entries have been exported as PMIDs.
- Initially available in EndNote, the stockcenter publications have been converted to BibTeX format. They will be loaded using the loader written for chadopub2bib.
- Data import is pretty straightforward. Data is stored in the stock_pub linking table.
Once data is imported, it can be viewed/retrieved using the following SQL;
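A short sketch of a stock_pub lookup on an in-memory SQLite stand-in. It assumes that the PMID ends up in pub.uniquename, which is not stated above; the identifiers and titles are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stock (stock_id INTEGER PRIMARY KEY, uniquename TEXT);
CREATE TABLE pub (pub_id INTEGER PRIMARY KEY, uniquename TEXT, title TEXT);
CREATE TABLE stock_pub (stock_id INTEGER, pub_id INTEGER);
INSERT INTO stock VALUES (1, 'DBS0000001');
-- Assumption: the PMID is kept in pub.uniquename; values are placeholders.
INSERT INTO pub VALUES (1, 'PMID-A', 'first paper'), (2, 'PMID-B', 'second paper');
INSERT INTO stock_pub VALUES (1, 1), (1, 2);
""")

def publications_for(conn, stock_uniquename):
    """Return the pub.uniquename values linked to a stock, sorted."""
    rows = conn.execute("""
        SELECT p.uniquename
        FROM stock s
        JOIN stock_pub sp ON sp.stock_id = s.stock_id
        JOIN pub p ON p.pub_id = sp.pub_id
        WHERE s.uniquename = ?
        ORDER BY p.uniquename
    """, (stock_uniquename,)).fetchall()
    return [row[0] for row in rows]

print(publications_for(conn, "DBS0000001"))
```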
- Similarly, the model for strain characteristics is very straight-forward. Make sure you have the strain_characteristics ontology loaded.
- The linking table is stock_cvterm.
Once data is imported, it can be viewed/retrieved using the following SQL;
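As a sketch, characteristics could be read back through stock_cvterm as follows (in-memory SQLite stand-in). The cv name strain_characteristics is assumed to match the ontology name, and the two terms are placeholders rather than confirmed ontology entries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cv (cv_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, cv_id INTEGER, name TEXT);
CREATE TABLE stock (stock_id INTEGER PRIMARY KEY, uniquename TEXT);
CREATE TABLE stock_cvterm (stock_id INTEGER, cvterm_id INTEGER);
INSERT INTO cv VALUES (1, 'strain_characteristics');  -- assumed cv name
-- Placeholder terms, not checked against the actual ontology.
INSERT INTO cvterm VALUES (1, 1, 'axenic'), (2, 1, 'null mutant');
INSERT INTO stock VALUES (1, 'DBS0000001');
INSERT INTO stock_cvterm VALUES (1, 1), (1, 2);
""")

def characteristics_for(conn, stock_uniquename):
    """Return the characteristic term names attached to a strain, sorted."""
    rows = conn.execute("""
        SELECT t.name
        FROM stock s
        JOIN stock_cvterm sc ON sc.stock_id = s.stock_id
        JOIN cvterm t ON t.cvterm_id = sc.cvterm_id
        JOIN cv ON cv.cv_id = t.cv_id
        WHERE s.uniquename = ? AND cv.name = 'strain_characteristics'
        ORDER BY t.name
    """, (stock_uniquename,)).fetchall()
    return [row[0] for row in rows]

print(characteristics_for(conn, "DBS0000001"))
```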
Parental strains
- The model for parental strains is the same as strain-plasmids.
- The only difference is: stock_relationship.subject_id => stock.stock_id (parent). And the relation term used is is_parent_of from cv.name => 'stock_relation'.
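A minimal sketch of a parent lookup, with the parent as subject and the derived strain as object of an is_parent_of relationship. It runs on an in-memory SQLite stand-in, and the two DBS-IDs are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cv (cv_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, cv_id INTEGER, name TEXT);
CREATE TABLE stock (stock_id INTEGER PRIMARY KEY, uniquename TEXT);
CREATE TABLE stock_relationship (subject_id INTEGER, object_id INTEGER, type_id INTEGER);
INSERT INTO cv VALUES (1, 'stock_relation');
INSERT INTO cvterm VALUES (1, 1, 'is_parent_of');
INSERT INTO stock VALUES (1, 'DBS0000001'), (2, 'DBS0000002');
-- DBS0000001 (subject, parent) is_parent_of DBS0000002 (object, child)
INSERT INTO stock_relationship VALUES (1, 2, 1);
""")

def parents_of(conn, strain_uniquename):
    """Return the uniquenames of the direct parents of a strain."""
    rows = conn.execute("""
        SELECT parent.uniquename
        FROM stock_relationship sr
        JOIN stock child  ON child.stock_id  = sr.object_id
        JOIN stock parent ON parent.stock_id = sr.subject_id
        JOIN cvterm t ON t.cvterm_id = sr.type_id
        JOIN cv ON cv.cv_id = t.cv_id
        WHERE child.uniquename = ?
          AND t.name = 'is_parent_of' AND cv.name = 'stock_relation'
        ORDER BY parent.uniquename
    """, (strain_uniquename,)).fetchall()
    return [row[0] for row in rows]

print(parents_of(conn, "DBS0000002"))
```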
Phenotype
- Phenotype is an important data component. It is something that is observed for a given genotype. As explained in stock data export, it is dependent on genotype and environment.
- The way it is modeled is the same as legacy.
- The ontologies that are required to be loaded before importing this data: Dicty Phenotypes, Dicty Environment, Dictyostelium Assay.
The model for phenotype is based on the following concept:
A strain has a genotype. This genotype, under a certain environment, expresses to show a phenotype.
In the legacy database there are manually added phenotypes for strains. They are not linked to the Dicty Phenotypes ontology. To correct this, these phenotype terms were manually mapped to terms in the ontology, and the corrected data can be found here.
- This data is in a slightly different format than what is accepted for import. So the import data was manually created and can be found here. This file can be passed to modware-import dictystrain2chado using the --dsc_phenotypes parameter. As you will see, most of these phenotypes do not have a PMID associated with them. So we just use the Dicty Phenotypes reference as default.
Once data is imported, it can be viewed/retrieved using the following SQL;
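The strain, genotype, environment and phenotype chain could be queried roughly as below. This sketch assumes the standard Chado phenstatement table ties the three together and that the phenotype term sits in phenotype.observable_id; neither detail is spelled out above. The tables are trimmed, the term and environment names are placeholders, and the database is an in-memory SQLite stand-in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE stock (stock_id INTEGER PRIMARY KEY, uniquename TEXT);
CREATE TABLE genotype (genotype_id INTEGER PRIMARY KEY, uniquename TEXT);
CREATE TABLE stock_genotype (stock_id INTEGER, genotype_id INTEGER);
CREATE TABLE environment (environment_id INTEGER PRIMARY KEY, uniquename TEXT);
CREATE TABLE phenotype (phenotype_id INTEGER PRIMARY KEY, observable_id INTEGER);
-- Assumed linking table: genotype + environment -> phenotype.
CREATE TABLE phenstatement (
    genotype_id INTEGER, environment_id INTEGER, phenotype_id INTEGER
);
INSERT INTO cvterm VALUES (1, 'aberrant aggregation');   -- placeholder term
INSERT INTO stock VALUES (1, 'DBS0000001');
INSERT INTO genotype VALUES (1, 'genotype-1');
INSERT INTO stock_genotype VALUES (1, 1);
INSERT INTO environment VALUES (1, 'unspecified environment');
INSERT INTO phenotype VALUES (1, 1);
INSERT INTO phenstatement VALUES (1, 1, 1);
""")

def phenotypes_for(conn, strain_uniquename):
    """Return (phenotype term, environment) pairs observed for a strain."""
    return conn.execute("""
        SELECT t.name, e.uniquename
        FROM stock s
        JOIN stock_genotype sg ON sg.stock_id = s.stock_id
        JOIN phenstatement ps ON ps.genotype_id = sg.genotype_id
        JOIN phenotype p ON p.phenotype_id = ps.phenotype_id
        JOIN cvterm t ON t.cvterm_id = p.observable_id
        JOIN environment e ON e.environment_id = ps.environment_id
        WHERE s.uniquename = ?
        ORDER BY t.name
    """, (strain_uniquename,)).fetchall()

print(phenotypes_for(conn, "DBS0000001"))
```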
Plasmid Sequence
- The plasmid sequences are available for import in either GenBank or FastA formats.
- With this import, we convert GenBank to FastA and import only FastA sequences.
- The plasmid sequences are stored in the feature table. The default uniquename & dbxref.accession is the DBP-ID.
- In case of GenBank, the GenBank accession is the dbxref.accession.
- Also an entry is made in the stockprop table for each plasmid that has a sequence.
  - The stockprop.type => 'plasmid_vector' & the stockprop.value => feature_id.
- Also, as plasmids do not have an organism defined (and not enough metadata is available for a different kind of data model), the default is Dictyostelium discoideum.
- For importing the sequences, the param --seq_data_dir needs to be passed a path to a folder with clean/formatted sequence files.
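To illustrate the stockprop link to the feature table, here is a small sketch that follows a plasmid_vector property to its feature row and prints the sequence in FASTA layout. It uses an in-memory SQLite stand-in; storing the sequence in feature.residues is an assumption based on standard Chado, and the DBP-ID, feature_id and sequence are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE stock (stock_id INTEGER PRIMARY KEY, uniquename TEXT);
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, uniquename TEXT, residues TEXT);
CREATE TABLE stockprop (stock_id INTEGER, type_id INTEGER, value TEXT, rank INTEGER);
INSERT INTO cvterm VALUES (1, 'plasmid_vector');
INSERT INTO stock VALUES (1, 'DBP0000001');
INSERT INTO feature VALUES (50, 'DBP0000001', 'ATGCATGC');
-- stockprop.value holds the feature_id as text
INSERT INTO stockprop VALUES (1, 1, '50', 0);
""")

def plasmid_fasta(conn, plasmid_uniquename):
    """Return the plasmid sequence as a FASTA string, or None if absent."""
    row = conn.execute("""
        SELECT f.uniquename, f.residues
        FROM stock s
        JOIN stockprop sp ON sp.stock_id = s.stock_id
        JOIN cvterm t ON t.cvterm_id = sp.type_id
        JOIN feature f ON f.feature_id = CAST(sp.value AS INTEGER)
        WHERE s.uniquename = ? AND t.name = 'plasmid_vector'
    """, (plasmid_uniquename,)).fetchone()
    if row is None:
        return None
    return ">%s\n%s" % row

print(plasmid_fasta(conn, "DBP0000001"))
```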
Plasmid Genes
- Plasmids have genes associated with them (from legacy data). However, all the data about the sequence & loci is not available.
- Had the sequence data been available, a different data model would have been adopted.
- Currently, the genes associated with plasmids are stored in the stockprop table: stockprop.type => 'has_part' (has_part is from the sequence ontology) & stockprop.value => DDB_G-ID.
Once data is imported, it can be viewed/retrieved using the following SQL;
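A sketch of reading the gene IDs back from stockprop, on an in-memory SQLite stand-in. The cv name sequence for the sequence ontology is an assumption, and the DBP-ID and DDB_G-IDs are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cv (cv_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, cv_id INTEGER, name TEXT);
CREATE TABLE stock (stock_id INTEGER PRIMARY KEY, uniquename TEXT);
CREATE TABLE stockprop (stock_id INTEGER, type_id INTEGER, value TEXT, rank INTEGER);
INSERT INTO cv VALUES (1, 'sequence');          -- assumed cv name
INSERT INTO cvterm VALUES (1, 1, 'has_part');
INSERT INTO stock VALUES (1, 'DBP0000001');
-- One stockprop row per gene, value = DDB_G-ID
INSERT INTO stockprop VALUES
    (1, 1, 'DDB_G0000001', 0), (1, 1, 'DDB_G0000002', 1);
""")

def genes_on_plasmid(conn, plasmid_uniquename):
    """Return the DDB_G-IDs stored as has_part properties of a plasmid."""
    rows = conn.execute("""
        SELECT sp.value
        FROM stock s
        JOIN stockprop sp ON sp.stock_id = s.stock_id
        JOIN cvterm t ON t.cvterm_id = sp.type_id
        JOIN cv ON cv.cv_id = t.cv_id
        WHERE s.uniquename = ? AND t.name = 'has_part' AND cv.name = 'sequence'
        ORDER BY sp.rank
    """, (plasmid_uniquename,)).fetchall()
    return [row[0] for row in rows]

print(genes_on_plasmid(conn, "DBP0000001"))
```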
Command
The data is being imported using the modware-import command. All the modules used by this command can be found under Modware::Import and Modware::Role::Stock::Import
The command looks like this;
The options common for both commands are
Options specific to the commands:
To run the command
NOTE: Plasmid data must be imported before the strain data, because the strain-plasmid relations depend on the plasmid records.