|
Version: 2.0
Template Description: Template for Genotyping Data using SNPs
Mappings for this template
Sections available in this template
| Section Name | Description | Conditions |
| Source | Information on the source of the dataset, the species it concerns and the name and version of the dataset | Mandatory |
| Experiment | General experiment data | Mandatory |
| Quality Assessment | Information about the quality measures used | Mandatory |
| Conditions | Experimental conditions | Mandatory |
| Data List | The actual data as a list. This section is derived from the appearance of the ILLUMINA output file. | Mandatory Multiple sheets allowed Excludes |
| SNP | Information about SNPs used in the experiment | Optional Multiple sheets allowed |
| Reference Sequences | | Optional Multiple sheets allowed |
| Samples | Information about samples used in the experiment | Mandatory Multiple sheets allowed |
| Institutions | List of institute codes used in passport data sections and their corresponding decoded name and addresses. | Optional |
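The section rules in the table above can be checked mechanically before a file is submitted. The following is a minimal sketch, not part of the template itself: the function name `check_sections` is hypothetical, and it assumes each sheet of a submission has already been labelled with the section it belongs to. It treats "Multiple sheets allowed" as permission for more than one sheet per section and takes the Mandatory/Optional values from the Conditions column.

```python
# Section rules transcribed from the table above:
# section name -> (mandatory, multiple sheets allowed)
SECTIONS = {
    "Source": (True, False),
    "Experiment": (True, False),
    "Quality Assessment": (True, False),
    "Conditions": (True, False),
    "Data List": (True, True),
    "SNP": (False, True),
    "Reference Sequences": (False, True),
    "Samples": (True, True),
    "Institutions": (False, False),
}


def check_sections(sheet_sections):
    """Return a list of problems for a submission.

    sheet_sections: list of section names, one entry per sheet
    (hypothetical input shape, assumed for this sketch).
    """
    counts = {}
    for name in sheet_sections:
        counts[name] = counts.get(name, 0) + 1

    problems = []
    for name, (mandatory, multiple) in SECTIONS.items():
        n = counts.get(name, 0)
        if mandatory and n == 0:
            problems.append(f"missing mandatory section: {name}")
        if n > 1 and not multiple:
            problems.append(f"section allows only one sheet: {name}")
    for name in counts:
        if name not in SECTIONS:
            problems.append(f"unknown section: {name}")
    return problems
```

An empty list means that every mandatory section is present and no single-sheet section is repeated.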
Source
Section Description: Information on the source of the dataset, the species it concerns and the name and version of the dataset
see section: source in GCPDataSubmissionTemplate2.0
for the following fields
institute, principalInvestigator, projectCode, projectName, emailContact, species, ploidy, datasetName, version, creationDate, remark
Experiment
Section Description: General experiment data
see section: experiment in GCPGenotypingTemplate2.0
for the following fields
operationalTaxonomicUnit, purposeOfStudy, missingData, remark
Quality Assessment
Section Description: Information about the quality measures used
see section: qualityAssessment in GCPDataSubmissionTemplate2.0
for the following fields
qualityMeasure, standard, control, errorEstimator
Conditions
Section Description: Experimental conditions
see section: conditions in GCPGenotypingTemplate2.0
for the following fields
samplingStrategy, controlGenotypes, datasetType, dnaExtraction, polymorphismDetection, genotypingSoftware, snpCoding, reference
| Field Name | Description | Conditions |
| Dataset Type | The technology used to obtain the data: allelic sequencing with the Sanger method, or genotyping with Illumina or TaqMan. Example: sequencing, genotyping | Mandatory |
| Protocol | The protocol used to obtain the data, which depends on the Dataset Type field. Example: illumina, sanger, HRM… | Mandatory |
| Polymorphism Detection | The protocol used to detect the SNP. Example: PCR reactions were performed with genomic DNA and products were analysed by direct sequencing | Mandatory |
| SNP coding | How SNP names are referenced in the template, for example a unique number for an Illumina SNP, or the name of the reference sequence plus the SNP position on that sequence. Example: batch name_snp position | Optional |
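As an illustration of the SNP coding convention "reference sequence name + SNP position", here is a small sketch. The helper names `make_snp_code` and `split_snp_code`, the underscore separator (taken from the example "batch name_snp position"), the 1-based position and the position value 152 are all assumptions made for the example, not requirements of the template.

```python
def make_snp_code(reference_name, position):
    """Build an SNP code as <reference name>_<position> (assumed convention)."""
    if position < 1:
        raise ValueError("position is assumed to be 1-based")
    return f"{reference_name}_{position}"


def split_snp_code(code):
    """Split a code produced by make_snp_code back into (name, position).

    The split is made at the LAST underscore, because reference names
    such as ADOC01_01_Os contain underscores themselves.
    """
    name, sep, pos = code.rpartition("_")
    if not sep or not name or not pos.isdigit():
        raise ValueError(f"not a <name>_<position> code: {code!r}")
    return name, int(pos)


code = make_snp_code("ADOC01_01_Os", 152)  # 152 is a made-up position
print(code)
print(split_snp_code(code))
```

Codes that are plain unique numbers, as in the Illumina case, do not follow this pattern and would be stored as they are.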
Data List
Section Description: The actual data as a list. This section is derived from the appearance of the ILLUMINA output file.
| Field Name | Description | Conditions |
| SNP Code | The code of the SNP marker used. If marker data is provided, it must match the marker name in the marker sheet or file. Example: 122938546 | Mandatory |
| Sample ID | A unique identifier of a DNA sample, which can be a sample in a well on a gel, a LIMS entry, or a unique ID created specifically for this dataset. The SampleID is specific to a lab and is not a universal identifier. If accession data is provided, it must match the SampleID in the accession sheet or file. Example: B00338P | Mandatory Unique |
| Allele | Allele name. Example: G | Optional |
| Quality | A quality score from 1 to 100 assigned by the genotyping software, or 200 if the base was corrected manually by the user. | Optional |
| Amount | The relative contribution of this allele to all alleles at this locus. For possible values for known ploidy and bulk data please refer to the Table of Allele Amounts below. | Optional |
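The field rules for a Data List row can be expressed as a short check. This is a sketch under stated assumptions: the function name `validate_data_row` is hypothetical, a row is assumed to be a dictionary keyed by the field names above, and Quality is assumed to be a whole number. Only the rules stated in the table are checked (mandatory SNP Code and Sample ID; Quality in 1 to 100, or 200).

```python
def validate_data_row(row):
    """Return a list of rule violations for one Data List row."""
    errors = []

    # SNP Code and Sample ID are the two mandatory fields.
    for field in ("SNP Code", "Sample ID"):
        value = row.get(field)
        if value is None or not str(value).strip():
            errors.append(f"{field} is mandatory")

    # Quality is optional; when present it must be 1-100, or 200
    # for a base corrected manually by the user.
    quality = row.get("Quality")
    if quality not in (None, ""):
        try:
            q = int(quality)
        except (TypeError, ValueError):
            errors.append("Quality must be an integer")
        else:
            if not (1 <= q <= 100 or q == 200):
                errors.append(
                    "Quality must be 1-100, or 200 for a manually corrected base"
                )
    return errors


row = {"SNP Code": "122938546", "Sample ID": "B00338P", "Allele": "G", "Quality": 200}
print(validate_data_row(row))
```

The Amount field is not checked here because its permitted values are defined in the Table of Allele Amounts.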
SNP
Section Description: Information about SNPs used in the experiment
SNPs (optional)
The spreadsheet consists of ten columns, with each row representing an SNP marker.
see section: markers in GCPMappingTemplate2.0
for the following fields
snpCode, type, length, chromosome, referenceSequenceName, position, fiveFlank, threeFlank, motif, forwardPrimer, reversePrimer, annealingTm, genBankAccessionNumber, unigene, references
| Field Name | Description | Conditions |
| SNP Code | The code of the SNP marker used. If marker data is provided, it must match the marker name in the marker sheet or file. Example: 122938546 | Mandatory |
| Type | none | Mandatory |
| Length | none. Examples: 1, 4 | Mandatory |
| Chromosome | Chromosome on which this SNP is located. Example: 1 | Optional |
| Reference Sequence Name | The name or ID of the reference sequence on which this SNP is positioned. If the Reference Sequences section is provided, this name or ID must be present there. Example: ADOC01_01_Os | Optional |
| Position | Position on reference sequence | Optional |
| 5flank | none | Optional |
| 3flank | none | Optional |
| Unigene | The ID of the Unigene entry in which the SNP is located | Optional |
Reference Sequences
Section Description: none
| Field Name | Description | Conditions |
| Name | Name or ID of the sequence. Example: ADOC01_01_Os | Mandatory Unique |
| GenBank Accession Number | An NCBI GenBank accession number for the marker reference sequence | Optional |
| Gene Name | If the sequence is a gene, give the name of the gene | Optional Unique |
| Sequence | The sequence | Optional |
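Several field descriptions in this template state cross-references between sections: a Data List SNP Code must relate to the SNP section when marker data is provided, an SNP's Reference Sequence Name must be present in the Reference Sequences section when that section is provided, and a Data List Sample ID must relate to the Samples section. A minimal sketch of such a check follows. The function name `find_dangling_references` and the representation of each section as a list of dictionaries keyed by field name are assumptions made for illustration.

```python
def find_dangling_references(data_rows, snp_rows, reference_rows, sample_rows):
    """Return messages for identifiers that point at a missing row.

    Each argument is a list of dicts keyed by the template's field names.
    A section passed as an empty list is treated as "not provided",
    so references into it are not checked.
    """
    snp_codes = {row["SNP Code"] for row in snp_rows}
    ref_names = {row["Name"] for row in reference_rows}
    sample_ids = {row["Sample ID"] for row in sample_rows}

    problems = []
    for row in data_rows:
        if snp_rows and row["SNP Code"] not in snp_codes:
            problems.append(f"Data List: unknown SNP Code {row['SNP Code']}")
        if sample_rows and row["Sample ID"] not in sample_ids:
            problems.append(f"Data List: unknown Sample ID {row['Sample ID']}")
    for row in snp_rows:
        ref = row.get("Reference Sequence Name")
        if ref and reference_rows and ref not in ref_names:
            problems.append(
                f"SNP {row['SNP Code']}: unknown reference sequence {ref}"
            )
    return problems
```

Building the identifier sets once keeps the check linear in the number of rows, which matters for Data List sheets with many genotype calls.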
Samples
Section Description: Information about samples used in the experiment
Samples (optional)
The first field in the sample sheet is the SampleID, which relates directly to the SampleID field in the data spreadsheet or file. This SampleID is a unique identifier of a DNA sample, which can be a sample in a well on a gel or a LIMS entry. It could even be a unique identifier developed specifically for this dataset. In the case of multiple extractions from the same material, each sample would have a unique SampleID. Please refer to the section on Multiple Data Points for more details.
The GermplasmID field is an optional field for collections where a new GermplasmID is assigned each time an accession is regenerated, or where a new seed or germplasm sample is taken for some other reason. An accession in this case is a collection of samples with different GermplasmIDs. GermplasmIDs are often unique only within a specific database, so they should be prefixed by the database name or abbreviation. For example, an entry with GermplasmID 2341 in IWIS would be IWIS:2341.
The remaining accession data should be in either multi-crop passport descriptors (MCPD) or EURISCO descriptors format. MCPD defines a total of 28 descriptors for passport data, each of which equates to a column in the template. EURISCO defines an additional 6 descriptors for a total of 33 descriptors. Only a few MCPD or EURISCO descriptors are mandatory, and for the sake of brevity only the mandatory and some recommended optional fields are described here. However, the mandatory descriptors provide sufficient information to allow the accession to be found in the appropriate National Inventory or genebank. For a full description of all MCPD and EURISCO descriptors please refer to the EURISCO_Descriptors.doc file, which is available from the EPGRIS website (http://www.ecpgr.cgiar.org/epgris/) or can be downloaded with the passport template.
see section: generalPassportData in GCPPassportTemplate2.0
for the following fields
sampleID, sampleGermplasmID, localUniqueID, holdingInstitute, collectionName, genus, species, countryOfOrigin, accessionName
| Field Name | Description | Conditions |
| Sample ID | A unique identifier of a DNA sample, which can be a sample in a well on a gel or a LIMS entry. The SampleID will be unique for a specific laboratory but is not a universal identifier. It must relate to SampleID in the data spreadsheet or file. | Mandatory Unique |
| Germplasm ID | An alphanumeric value which uniquely identifies the germplasm. The proposed format is the concatenation HoldingInstitute:CollectionName:LocalUniqueID. If a new Germplasm ID is assigned each time an accession is regenerated or sub-sampled for some reason, use the current Germplasm ID prefixed with the system or database name. Examples: NGA333:Genebank:252, COL003:CIATBEAN:3542, MEX064:IWIS:2341 | Mandatory Unique |
| Country of Origin | Code of the country in which the sample was originally collected. Use 3-letter ISO 3166-1 extended country codes. | Optional |
| Accession name | Either a registered or other formal designation given to the accession. First letter uppercase. Multiple names separated with semicolon without space. Example: CT9993-5-10-1-M | Optional |
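The proposed Germplasm ID format, HoldingInstitute:CollectionName:LocalUniqueID, can be built and parsed as sketched below. The helper names `make_germplasm_id` and `parse_germplasm_id` are hypothetical, and the sketch assumes that none of the three components contains a colon. It handles only the three-part form, not shorter database-prefixed forms such as IWIS:2341.

```python
def make_germplasm_id(holding_institute, collection_name, local_unique_id):
    """Concatenate the three components with ':' as the separator."""
    parts = [
        str(part).strip()
        for part in (holding_institute, collection_name, local_unique_id)
    ]
    if any(not part or ":" in part for part in parts):
        raise ValueError("components must be non-empty and must not contain ':'")
    return ":".join(parts)


def parse_germplasm_id(germplasm_id):
    """Split a three-part Germplasm ID back into its named components."""
    parts = germplasm_id.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected Institute:Collection:LocalID, got {germplasm_id!r}")
    return {
        "holdingInstitute": parts[0],
        "collectionName": parts[1],
        "localUniqueID": parts[2],
    }


print(make_germplasm_id("MEX064", "IWIS", 2341))
print(parse_germplasm_id("COL003:CIATBEAN:3542"))
```

The dictionary keys reuse the field names listed for this section (holdingInstitute, collectionName, localUniqueID).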
Institutions
Section Description: List of institute codes used in passport data sections and their corresponding decoded name and addresses.
see section: institutions in GCPDataSubmissionTemplate2.0
for the following fields
faoInstituteCode, organizationName, street, cityState, zipCode, country, institutionalEmail, institutionalTelephone, fax, url, primaryContactName
Copyright (c) 2004-2006 CIMMYT, CIRAD, IRRI
Developed by Guy Davenport (CIMMYT), Claire Billot (CIRAD), Kenneth McNally (IRRI), Manuel Ruiz (CIRAD), Genevieve Aquino (IRRI)
This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.
|