Bioinformatics


Subprogramme Leader Graham McLaren,
g.mclaren@cgiar.org

EST Genotyping

Version: 2.0

Template Description: Template for EST Data

Sections available in this template

Section Name | Description | Conditions
Source | Information on the source of the dataset, the species it concerns and the name and version of the dataset | Mandatory
Experiment | General experiment data | Mandatory
Quality Assessment | Information about the quality measures used | Mandatory
Conditions | Experimental conditions | Mandatory
Summary | Description of the library containing the ESTs identified in the experiment. | Mandatory
ESTs | Information on individual ESTs. | Optional; multiple sheets allowed
Sequence Data | Information regarding the nucleotide sequences of the ESTs. | Optional; multiple sheets allowed
Samples | Information about samples used in the experiment | Mandatory; multiple sheets allowed
Institutions | List of institute codes used in passport data sections and their corresponding decoded name and addresses. | Optional
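
The Conditions column of the section list can be checked mechanically before a dataset is submitted. The following is a minimal sketch in Python, assuming each section is stored as a sheet whose name matches the Section Name exactly. That naming rule and the function name `missing_mandatory_sections` are assumptions made for illustration and are not defined by the template.

```python
# Section name -> True if the section is mandatory (taken from the section list).
# ASSUMPTION: sheet names match the section names exactly.
SECTIONS = {
    "Source": True,
    "Experiment": True,
    "Quality Assessment": True,
    "Conditions": True,
    "Summary": True,
    "ESTs": False,
    "Sequence Data": False,
    "Samples": True,
    "Institutions": False,
}


def missing_mandatory_sections(sheet_names):
    """Return the mandatory sections that have no sheet, in template order."""
    present = set(sheet_names)
    return [name for name, mandatory in SECTIONS.items()
            if mandatory and name not in present]
```

For example, a workbook containing only the Source and Experiment sheets would be reported as lacking Quality Assessment, Conditions, Summary and Samples.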

Source

Section Description: Information on the source of the dataset, the species it concerns and the name and version of the dataset

see section: source in GCPDataSubmissionTemplate2.0

for the following fields institute, principalInvestigator, projectCode, projectName, emailContact, species, ploidy, datasetName, version, creationDate, remark

Experiment

Section Description: General experiment data

see section: experiment in GCPGenotypingTemplate2.0

for the following fields operationalTaxonomicUnit, purposeOfStudy, missingData, remark

Quality Assessment

Section Description: Information about the quality measures used

see section: qualityAssessment in GCPDataSubmissionTemplate2.0

for the following fields qualityMeasure, standard, control, errorEstimator

Conditions

Section Description: Experimental conditions

see section: conditions in GCPGenotypingTemplate2.0

for the following fields samplingStrategy, controlGenotypes, rnaExtraction, cDnaSynthesis, libraryConstruction, sequencingMethod, reference

Field Name | Description | Conditions
RNA Extraction | A description of the RNA extraction method or reference to published method. | Mandatory
cDNA Synthesis | A description of the protocol for cDNA synthesis or reference to published method. | Mandatory
Library Construction | A description of the method used for cDNA library construction or reference to published method. | Mandatory
Sequencing Method | A description of the method used for sequencing of the cDNA library or reference to published method. | Mandatory

Summary

Section Description: Description of the library containing the ESTs identified in the experiment.

Field Name | Description | Conditions
Library Name | The user-defined name of the library. | Mandatory
Data Source | Name of database where the ESTs were submitted. Example: NCBI GenBank | Mandatory
GeneBank Accession Numbers | Accession numbers of the EST sequences in the database. Example: AB123456;AF000987 | Mandatory
Number of ESTs | Total number of ESTs in the library. | Mandatory
Creation Date in Database | Date library was created in the database. | Mandatory
Species | Taxonomic or common name of species analyzed. If data are from several species, separate the species names with a semicolon. | Mandatory
Tissue Type | Tissue type and organ source. Example: root | Mandatory
Cell Type | Cell type and name of cell line, if applicable. | Optional
Developmental Stage | Developmental stage of organism during sampling. Example: juvenile | Mandatory
Laboratory Host | Laboratory host of library. | Optional
Vector | Name and type of vector used to construct library. | Mandatory
Restriction Enzyme 1 | Restriction enzyme at site 1 of vector. | Optional
Restriction Enzyme 2 | Restriction enzyme at site 2 of vector. | Optional
Protocol Description | Description of library preparation methods. | Mandatory
Reference | One or more references to articles in which the genotyping procedures are published. Please place each reference on a separate row in the same column. | Optional
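
The GeneBank Accession Numbers and Species fields hold several values in one cell, separated by semicolons. A minimal sketch of how such a cell might be split when reading the Summary section is shown below. The helper name `split_multi_value` is hypothetical, and trimming whitespace and ignoring empty entries are assumptions rather than rules stated by the template.

```python
def split_multi_value(cell):
    """Split a semicolon-separated cell into trimmed, non-empty entries."""
    return [part.strip() for part in cell.split(";") if part.strip()]


# The example value given for GeneBank Accession Numbers:
accessions = split_multi_value("AB123456;AF000987")
```

The same helper can be applied to a Species cell that lists more than one species.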

ESTs

Section Description: Information on individual ESTs.

see section: markers in GCPMappingTemplate2.0

for the following fields estID, sampleID, geneBankAccessionNumber, contig, cluster, length, sequenceIDs, numberOfSsr, ssrDetected, forwardPrimer, reversePrimer, annealingTm, productLength, references

Field Name | Description | Conditions
EST ID | A unique identifier of the EST. The EST ID will be unique for a specific laboratory but is not a universal identifier. It must relate to an EST ID in the list of individual ESTs. | Mandatory; unique
Sample ID | A unique identifier of a DNA sample, which can be a sample in a well on a gel or a LIMS entry, or even a unique ID created specifically for this dataset. The SampleID is specific to a lab and is not a universal identifier. If the accession data is provided, it must relate to the SampleID in the accession sheet or file. | Mandatory; unique
GeneBank Accession Number | Accession number of the EST sequence in the database. Example: AB123456 | Mandatory
Contig | Identifier of the contig. Example: Contig1 | Optional
Length | Number of base pairs in the EST. | Optional
Cluster | Identifier for the cluster. | Optional
Sequence IDs | List of sequences in the cluster. | Optional
Number of SSR | Number of SSRs in the EST. | Optional
SSR Detected | List of tandem repeats in the cluster. | Mandatory
Product Length | Number of base pairs in the PCR product of the primer pair. | Mandatory
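
The uniqueness of the EST ID and the link between a Sample ID and the accession sheet are both easy to verify in code. The sketch below assumes the ESTs rows have already been read into dictionaries keyed by the field names `estID` and `sampleID` listed for this section. The function names and the in-memory row layout are assumptions made for illustration.

```python
from collections import Counter


def duplicate_ids(ids):
    """Return the IDs that occur more than once, in order of first appearance."""
    counts = Counter(ids)
    return [i for i in dict.fromkeys(ids) if counts[i] > 1]


def unknown_sample_ids(est_rows, sample_ids):
    """Return Sample IDs used in the EST rows that are absent from the sample list."""
    known = set(sample_ids)
    return sorted({row["sampleID"] for row in est_rows} - known)


# Illustrative rows only; the identifiers are made up.
est_rows = [
    {"estID": "E1", "sampleID": "S1"},
    {"estID": "E2", "sampleID": "S2"},
    {"estID": "E1", "sampleID": "S9"},
]
```

With these rows, `duplicate_ids` flags the repeated EST ID and `unknown_sample_ids` flags the Sample ID that has no matching entry in the sample list.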

Sequence Data

Section Description: Information regarding the nucleotide sequences of the ESTs.

Field Name | Description | Conditions
Sequence ID | Unique identifier of the sequence. | Mandatory
GeneBank Accession Number | Accession number of the sequence in the database. Example: AB123456 | Mandatory
SSR ID | Unique identifier of the SSR. | Mandatory
Motif Length | Number of base pairs in the motif. | Mandatory
Motif | Repeated motif in the SSR. | Mandatory
Number Of Repeats | Number of times motif is repeated. | Mandatory
SSR Start | Start site of SSR in the sequence. | Mandatory
SSR Stop | Stop site of SSR in the sequence. | Mandatory
Sequence Length | Number of base pairs in the sequence. | Mandatory
Putative Annotation | Putative annotation for the sequence. | Optional
Sequence | Actual sequence. | Mandatory
Remarks | Any additional information supporting the data that the authors / curators want to add. Multiple references should be separated with a semicolon. | Optional
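
Several Sequence Data fields describe the same SSR and can be cross-checked against each other. The sketch below is one possible consistency check, not a rule defined by the template. It assumes that SSR Start and SSR Stop are 1-based and inclusive and that the SSR is a perfect repeat of the motif; neither convention is stated in the template, so adjust the check if the data use different coordinates or allow imperfect repeats. The function name `check_ssr_record` is hypothetical.

```python
def check_ssr_record(sequence, sequence_length, motif, motif_length,
                     number_of_repeats, ssr_start, ssr_stop):
    """Return a list of inconsistencies found in one record (empty if none)."""
    problems = []
    if len(sequence) != sequence_length:
        problems.append("Sequence Length does not match Sequence")
    if len(motif) != motif_length:
        problems.append("Motif Length does not match Motif")
    # ASSUMPTION: coordinates are 1-based and inclusive.
    if not (1 <= ssr_start <= ssr_stop <= len(sequence)):
        problems.append("SSR Start/Stop fall outside the sequence")
    else:
        # ASSUMPTION: the SSR is a perfect, uninterrupted repeat of the motif.
        region = sequence[ssr_start - 1:ssr_stop].upper()
        if region != (motif * number_of_repeats).upper():
            problems.append("SSR region is not Motif repeated Number Of Repeats times")
    return problems
```

For a 15 bp sequence `ACGTGAGAGAGATTC` with motif `GA` repeated 4 times at positions 5 to 12, the function returns an empty list.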

Samples

Section Description: Information about samples used in the experiment

Samples (optional)

The first field in the Samples section is the SampleID, which relates directly to the SampleID field in the data spreadsheet or file. This SampleID is a unique identifier of a DNA sample, which can be a sample in a well on a gel or a LIMS entry. It could even be a unique identifier developed specifically for this dataset. In the case of multiple extractions from the same material, each sample would have a unique SampleID. Please refer to the section on Multiple Data Points for more details.

The GermplasmID field is an optional field for collections where a new GermplasmID is assigned each time an accession is regenerated or a new seed or germplasm sample is taken for some other reason. In such collections an accession is a set of samples with different GermplasmIDs. GermplasmIDs are often unique only within a specific database, so they should be prefixed by the database name or abbreviation. For example, an entry with GermplasmID 2341 in IWIS would be IWIS:2341.

The remaining accession data should be in either multi-crop passport descriptors (MCPD) or EURISCO descriptors format. MCPD defines a total of 28 descriptors for passport data, each of which equates to a column in the template. EURISCO defines an additional 6 descriptors for a total of 33 descriptors. Only a few MCPD or EURISCO descriptors are mandatory, and for the sake of brevity only the mandatory and some recommended optional fields are described here. However, the mandatory descriptors provide sufficient information to allow the accession to be found in the appropriate National Inventory or genebank. For a full description of all MCPD and EURISCO descriptors, please refer to the EURISCO_Descriptors.doc file, which is available from the EPGRIS website (http://www.ecpgr.cgiar.org/epgris/) or can be downloaded with the passport template.

see section: generalPassportData in GCPPassportTemplate2.0

for the following fields sampleID, sampleGermplasmID, localUniqueID, holdingInstitute, collectionName, genus, species, countryOfOrigin

Field Name | Description | Conditions
Sample ID | A unique identifier of a DNA sample, which can be a sample in a well on a gel or a LIMS entry, or even a unique ID created specifically for this dataset. The SampleID is specific to a lab and is not a universal identifier. If the accession data is provided, it must relate to the SampleID in the accession sheet or file. | Mandatory; unique
Germplasm ID | An alphanumeric value which uniquely identifies the germplasm. The proposed format is a concatenation of HoldingInstitute:CollectionName:LocalUniqueID. If a new Germplasm ID is assigned each time an accession is regenerated or for some reason sub-sampled, use the current germplasm ID prefixed with the system or database name. Example: NGA333:Genebank:252 Example: COL003:CIATBEAN:3542 Example: MEX064:IWIS:2341 | Mandatory; unique
Country of Origin | Code of the country in which the sample was originally collected. Use 3-letter ISO 3166-1 extended country codes. | Optional
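
The proposed Germplasm ID format, HoldingInstitute:CollectionName:LocalUniqueID, can be built and taken apart with a few lines of code. The sketch below handles only this three-part form. The function names are hypothetical, and rejecting colons inside a part is an assumption made so that the identifier can be split again unambiguously.

```python
def make_germplasm_id(holding_institute, collection_name, local_unique_id):
    """Join the three parts into HoldingInstitute:CollectionName:LocalUniqueID."""
    parts = [str(p).strip()
             for p in (holding_institute, collection_name, local_unique_id)]
    # ASSUMPTION: parts must be non-empty and must not contain the separator.
    if any(not p or ":" in p for p in parts):
        raise ValueError("each part must be non-empty and contain no colon")
    return ":".join(parts)


def parse_germplasm_id(germplasm_id):
    """Split a three-part Germplasm ID into (institute, collection, local ID)."""
    parts = germplasm_id.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError("expected HoldingInstitute:CollectionName:LocalUniqueID")
    return tuple(parts)
```

Calling `make_germplasm_id("MEX064", "IWIS", 2341)` reproduces the example value MEX064:IWIS:2341 from the table.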

Institutions

Section Description: List of institute codes used in passport data sections and their corresponding decoded name and addresses.

see section: institutions in GCPDataSubmissionTemplate2.0

for the following fields faoInstituteCode, organizationName, street, cityState, zipCode, country, institutionalEmail, institutionalTelephone, fax, url, primaryContactName

Copyright (c) 2004-2006 Bioversity, CIMMYT, IITA, IRRI

Developed by Guy Davenport (CIMMYT), Sarah Hearne (IITA), Mathieu Rouard (Bioversity), Genevieve Aquino (IRRI)

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.