Bioinformatics


Subprogramme Leader Graham McLaren,
g.mclaren@cgiar.org

SNP Genotyping

Version: 2.0

Template Description: Template for Genotyping Data using SNPs

Sections available in this template

Section Name | Description | Conditions
Source | Information on the source of the dataset, the species it concerns and the name and version of the dataset | Mandatory
Experiment | General experiment data | Mandatory
Quality Assessment | Information about the quality measures used | Mandatory
Conditions | Experimental conditions | Mandatory
Data List | The actual data as a list. This section is derived from the appearance of the ILLUMINA output file. | Mandatory; Multiple sheets allowed; Excludes
SNP | Information about SNPs used in the experiment | Optional; Multiple sheets allowed
Reference Sequences | none | Optional; Multiple sheets allowed
Samples | Information about samples used in the experiment | Mandatory; Multiple sheets allowed
Institutions | List of institute codes used in passport data sections and their corresponding decoded names and addresses. | Optional
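
The Conditions column above lends itself to a quick pre-submission check. The following is a minimal sketch in Python, not part of the template itself. It assumes the sheet names of a workbook have already been read into a list, and that a section spread over several sheets uses the section name as a prefix (for example "Data List 1"). The names `MANDATORY_SECTIONS` and `missing_sections` are hypothetical.

```python
# Mandatory sections, as listed in the table above.
MANDATORY_SECTIONS = [
    "Source",
    "Experiment",
    "Quality Assessment",
    "Conditions",
    "Data List",
    "Samples",
]


def missing_sections(sheet_names):
    """Return the mandatory sections that no sheet name starts with.

    Assumption (not stated by the template): sheets belonging to a
    section that allows multiple sheets share the section name as a
    prefix, e.g. "Data List 1" and "Data List 2".
    """
    normalized = [name.strip().lower() for name in sheet_names]
    return [
        section
        for section in MANDATORY_SECTIONS
        if not any(name.startswith(section.lower()) for name in normalized)
    ]
```

Calling `missing_sections` with the sheet names of a submission returns an empty list when every mandatory section is present at least once.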

Source

Section Description: Information on the source of the dataset, the species it concerns and the name and version of the dataset

see section: source in GCPDataSubmissionTemplate2.0

for the following fields institute, principalInvestigator, projectCode, projectName, emailContact, species, ploidy, datasetName, version, creationDate, remark

Experiment

Section Description: General experiment data

see section: experiment in GCPGenotypingTemplate2.0

for the following fields operationalTaxonomicUnit, purposeOfStudy, missingData, remark

Quality Assessment

Section Description: Information about the quality measures used

see section: qualityAssessment in GCPDataSubmissionTemplate2.0

for the following fields qualityMeasure, standard, control, errorEstimator

Conditions

Section Description: Experimental conditions

see section: conditions in GCPGenotypingTemplate2.0

for the following fields samplingStrategy, controlGenotypes, datasetType, dnaExtraction, polymorphismDetection, genotypingSoftware, snpCoding, reference

Field Name | Description | Conditions
Dataset Type | The type of dataset indicates which technologies were used to obtain the data: allelic sequencing with the Sanger method, or genotyping with "illumina", taqman. Example: sequencing, genotyping | Mandatory
Protocol | The protocol used to obtain the data, which depends on the Dataset Type field. Example: illumina, sanger, HRM… | Mandatory
Polymorphism Detection | The protocol used to detect the SNP. Example: PCR reactions were performed with genomic DNA and products were analysed by direct sequencing | Mandatory
SNP coding | How SNP names are referenced in the template, for example a unique number for an Illumina SNP, or the reference sequence name or number plus the SNP position on the reference sequence. Example: batch name_snp position | Optional

Data List

Section Description: The actual data as a list. This section is derived from the appearance of the ILLUMINA output file.

Field Name | Description | Conditions
SNP Code | The code of the SNP marker used. If the marker data is provided it must relate to Marker name in the marker sheet or file. Example: 122938546 | Mandatory
Sample ID | A unique identifier of a DNA sample, which can be a sample in a well on a gel or a LIMS entry, or even a unique ID created specifically for this dataset. The SampleID is specific to a lab and is not a universal identifier. If the accession data is provided it must relate to SampleID in the accession sheet or file. Example: B00338P | Mandatory; Unique
Allele | Allele name. Example: G | Optional
Quality | A quality score from 1 to 100 assigned by the genotyping software, or 200 if the base was corrected manually by the user. | Optional
Amount | The relative contribution of this allele to all alleles at this locus. For possible values for known ploidy and bulk data please refer to the Table of Allele Amounts below. | Optional
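
The field rules above can be sketched as a row-level check. This is an illustrative Python sketch under stated assumptions, not an official validator: it assumes each Data List row has been read into a dictionary keyed by the field names in the table, and it treats Quality as a number. The function name `check_data_row` is hypothetical.

```python
def check_data_row(row):
    """Return a list of problems found in one Data List row (a dict).

    Only the rules stated in the field table are checked: SNP Code and
    Sample ID are mandatory, and Quality (if given) is 1 to 100, or 200
    for a manually corrected base.
    """
    problems = []

    for field in ("SNP Code", "Sample ID"):
        value = row.get(field)
        if value is None or not str(value).strip():
            problems.append(f"{field} is mandatory")

    quality = row.get("Quality")
    if quality not in (None, ""):
        try:
            score = float(quality)
        except (TypeError, ValueError):
            problems.append("Quality must be a number")
        else:
            if not (1 <= score <= 100 or score == 200):
                problems.append(
                    "Quality must be 1-100, or 200 for a manual correction"
                )

    return problems
```

For example, a row with SNP Code 122938546, Sample ID B00338P, Allele G and Quality 200 yields no problems, while an empty SNP Code or a Quality of 150 is reported.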

SNP

Section Description: Information about SNPs used in the experiment

SNPs (optional)

The spreadsheet consists of ten columns, with each row representing an SNP marker.

see section: markers in GCPMappingTemplate2.0

for the following fields snpCode, type, length, chromosome, referenceSequenceName, position, fiveFlank, threeFlank, motif, forwardPrimer, reversePrimer, annealingTm, genBankAccessionNumber, unigene, references

Field Name | Description | Conditions
SNP Code | The code of the SNP marker used. If the marker data is provided it must relate to Marker name in the marker sheet or file. Example: 122938546 | Mandatory
Type | none | Mandatory
Length | none. Examples: 1; 4 | Mandatory
Chromosome | Chromosome on which this SNP is located. Example: 1 | Optional
Reference Sequence Name | The name or ID of the reference sequence on which this SNP is positioned. If the reference sequence section is provided, this name or ID must be present there. Example: ADOC01_01_Os | Optional
Position | Position on reference sequence | Optional
5flank | none | Optional
3flank | none | Optional
Unigene | The unigene ID in which the SNP is located | Optional
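
The rule that a Reference Sequence Name must also appear in the Reference Sequences section can be sketched as a cross-check. The Python below is a minimal illustration, assuming both sections have been read into lists of dictionaries keyed by field name. The function name `unknown_reference_names` is hypothetical.

```python
def unknown_reference_names(snp_rows, reference_rows):
    """Return reference sequence names used by SNP rows but not declared.

    snp_rows:       rows of the SNP section (dicts keyed by field name)
    reference_rows: rows of the Reference Sequences section
    """
    known = {row["Name"] for row in reference_rows}
    unknown = set()
    for row in snp_rows:
        # The field is optional, so rows without a value are skipped.
        name = row.get("Reference Sequence Name")
        if name and name not in known:
            unknown.add(name)
    return sorted(unknown)
```

An empty result means every SNP row that names a reference sequence points at an entry in the Reference Sequences section.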

Reference Sequences

Section Description: none

Field Name | Description | Conditions
Name | Name or ID of the sequence. Example: ADOC01_01_Os | Mandatory; Unique
GenBank Accession Number | An NCBI GenBank accession number for the marker reference sequence | Optional
Gene Name | If the sequence is a gene, give the name of the gene | Optional; Unique
Sequence | The sequence | Optional

Samples

Section Description: Information about samples used in the experiment

Samples (optional)

The first field in the sample sheet is the SampleID, which relates directly to the SampleID field in the data spreadsheet or file. This SampleID is a unique identifier of a DNA sample, which can be a sample in a well on a gel or a LIMS entry. It could even be a unique identifier developed specifically for this dataset. In the case of multiple extractions from the same material, each sample would have a unique SampleID. Please refer to the section on Multiple Data Points for more details.

The GermplasmID field is an optional field for collections where a new GermplasmID is assigned each time an accession is regenerated, or where a new seed or germplasm sample is taken for some other reason. An accession in this case is a collection of samples with different GermplasmIDs. GermplasmIDs are often unique only within a specific database, so they should be prefixed by the database name or abbreviation. For example, an entry with GermplasmID 2341 in IWIS would be IWIS:2341.

The remaining accession data should be in either multi-crop passport descriptors (MCPD) or EURISCO descriptors format. MCPD defines a total of 28 descriptors for passport data, each of which equates to a column in the template. EURISCO defines an additional 6 descriptors for a total of 33 descriptors. Only a few MCPD or EURISCO descriptors are mandatory, and for the sake of brevity only the mandatory and some recommended optional fields are described here. However, the mandatory descriptors provide sufficient information to allow the accession to be found in the appropriate National Inventory or genebank. For a full description of all MCPD and EURISCO descriptors please refer to the EURISCO_Descriptors.doc file, which is available from the EPGRIS website (http://www.ecpgr.cgiar.org/epgris/) or can be downloaded with the passport template.

see section: generalPassportData in GCPPassportTemplate2.0

for the following fields sampleID, sampleGermplasmID, localUniqueID, holdingInstitute, collectionName, genus, species, countryOfOrigin, accessionName

Field Name | Description | Conditions
Sample ID | A unique identifier of a DNA sample, which can be a sample in a well on a gel or a LIMS entry. The SampleID will be unique for a specific laboratory but is not a universal identifier. It must relate to SampleID in the data spreadsheet or file. | Mandatory; Unique
Germplasm ID | An alphanumeric value which uniquely identifies the germplasm. The proposed format is the concatenation HoldingInstitute:CollectionName:LocalUniqueID. If a new Germplasm ID is assigned each time an accession is regenerated or, for some reason, sub-sampled, use the current Germplasm ID prefixed with the system or database name. Examples: NGA333:Genebank:252; COL003:CIATBEAN:3542; MEX064:IWIS:2341 | Mandatory; Unique
Country of Origin | Code of the country in which the sample was originally collected. Use 3-letter ISO 3166-1 extended country codes. | Optional
Accession name | Either a registered or other formal designation given to the accession. First letter uppercase. Multiple names are separated with a semicolon without a space. Example: CT9993-5-10-1-M | Optional
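
The Germplasm ID and Accession name conventions above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than part of the template: the function names `make_germplasm_id` and `split_accession_names` are hypothetical, and rejecting a colon inside a component is an added safeguard, not a rule stated by the template.

```python
def make_germplasm_id(holding_institute, collection_name, local_unique_id):
    """Build HoldingInstitute:CollectionName:LocalUniqueID.

    Assumption: components must be non-empty and must not themselves
    contain a colon, so the result can be split again unambiguously.
    """
    parts = [
        str(part).strip()
        for part in (holding_institute, collection_name, local_unique_id)
    ]
    if any(not part or ":" in part for part in parts):
        raise ValueError("each component must be non-empty and contain no ':'")
    return ":".join(parts)


def split_accession_names(value):
    """Split an Accession name cell on semicolons (no surrounding spaces)."""
    return [name for name in value.split(";") if name]
```

With the values from the table, `make_germplasm_id("MEX064", "IWIS", 2341)` gives `MEX064:IWIS:2341`, matching the third example.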

Institutions

Section Description: List of institute codes used in passport data sections and their corresponding decoded name and addresses.

see section: institutions in GCPDataSubmissionTemplate2.0

for the following fields faoInstituteCode, organizationName, street, cityState, zipCode, country, institutionalEmail, institutionalTelephone, fax, url, primaryContactName

Copyright (c) 2004-2006 CIMMYT, CIRAD, IRRI

Developed by Guy Davenport (CIMMYT), Claire Billot (CIRAD), Kenneth McNally (IRRI), Manuel Ruiz (CIRAD), Genevieve Aquino (IRRI)

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.