Source: BAYLOR COLLEGE OF MEDICINE submitted to
COMPREHENSIVE IDENTIFICATION AND MAPPING AND CHARACTERIZATION OF HESSIAN FLY GENES USING AN INNOVATIVE WHOLE GENOME SEQUENCING APPROACH
 
PROJECT DIRECTOR: Richards, S.
 
PERFORMING ORGANIZATION
(N/A)
BAYLOR COLLEGE OF MEDICINE
HOUSTON, TX 77030
 
NON TECHNICAL SUMMARY: The Hessian fly is a major pest of US wheat crops and the world's most important wheat pest. Researchers, including many in the US funded by the USDA, are trying to find better ways to control this pest and reduce the damage it does to wheat yields. Both basic and applied research into this pest often requires genetic information, specifically about gene and protein structure: for example, identification of pesticide resistance genes; protein sequences of pesticide target proteins, to allow better design of pesticides against those targets; and bacterial expression of pesticide targets, allowing interactions between a pesticide and its target to be studied and better understood. Identifying genes and proteins of interest is expensive and time-consuming when conducted one gene at a time. In this proposal we will rapidly and very inexpensively identify, characterize and map every gene in the genome to speed research into this important pest species. We are applying new massively parallel sequencing technologies that dramatically reduce the cost of sequencing projects of this size, from tens of millions of dollars in the late 1990s, to about 5 million dollars around 2003, to $400,000 in this proposal. In other species, the availability of the "toolkit" of genes and proteins that make up an organism has dramatically accelerated research progress and results: for example, laboratories can now study the entire set of ligand-gated ion channels (a target of several major pesticides) with the confidence that they are not missing any, and with the full protein sequence encoded by each gene. Until now, the high cost of sequencing has made this global approach uneconomic for species with small communities of researchers; at the new lower costs, it is instead research on an insect species without a whole genome sequence and a full description of its gene and protein sets that becomes uneconomical.
Whilst there is always a delay between the acquisition of primary basic data and actual results in the field, we have no doubt that the data produced by this proposal will dramatically speed the efforts of Hessian fly researchers to reduce the damage caused by this important pest.
 
OBJECTIVES: We will identify, characterize and map the vast majority of genes of the wheat pest Mayetiola destructor - the Hessian fly.
1. Generate raw sequence data representing 12-fold coverage of the Hessian fly genome with 19 runs (21 attempted, allowing for a 10% failure rate) of the GS-FLX genome sequencer (454 Inc.), each run generating 100Mb of sequence in 250bp reads.
2. Generate 32X clone coverage paired-end data with 3kb and 10kb insert sizes using the 454 GS-20. This paired-end data will be used in the assembly process to determine the order and orientation of the majority of contigs in the assembled sequence.
3. Assemble 2Gb of raw 454 GS-FLX sequence reads and paired-end data into sequence scaffolds of ordered and oriented contigs, followed by placement on the existing physical map.
4. Generate ~1,200,000 EST sequences from a variety of Hessian fly tissues, to provide an extensive transcribed sequence data set to drive automated gene identification and annotation.
5. Produce an automated annotation of the assembled Hessian fly genome sequence based on EST data and protein homologies, using the BCM-HGSC import of the Ensembl gene annotation pipeline and other gene prediction programs, including NCBI Gnomon.
6. Deposit data in public databases and on the BCM-HGSC website; establish database collaborations with FlyBase and the KSU Arthropod Genomics Center.
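
The arithmetic behind objective 1 can be checked with a short sketch. It assumes the 158Mb genome size quoted in the progress report for this project; the function names are illustrative only and are not part of any project software.

```python
def expected_coverage(runs, mb_per_run, genome_mb):
    """Fold coverage = total sequenced bases / genome size."""
    return runs * mb_per_run / genome_mb


def expected_successful_runs(attempted, failure_rate):
    """Expected number of usable runs for a given per-run failure rate."""
    return attempted * (1.0 - failure_rate)


# Planned figures from objective 1. The 158 Mb genome size is an assumption
# taken from the progress report, not from the objectives themselves.
planned = expected_coverage(runs=19, mb_per_run=100, genome_mb=158)
usable = expected_successful_runs(attempted=21, failure_rate=0.10)

print(f"planned coverage: {planned:.1f}X")
print(f"expected usable runs from 21 attempts: {usable:.1f}")
```

Under these assumptions, 19 usable runs of 100Mb give about 12X coverage, and 21 attempts at a 10% failure rate yield about 19 usable runs.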
 
APPROACH: We will generate 12-fold random sequence coverage of the Hessian fly genome using the pyrosequencing platform from 454. Additionally, paired-end sequence data and transcript sequence data will be generated. The random sequence will be assembled into a draft genome sequence using the Atlas suite of assembly software tools developed at the Baylor College of Medicine Human Genome Sequencing Center. Genes will be annotated automatically using existing annotation software pipelines, with reference to the extensive transcript sequence data also generated by this project. All results will be placed in multiple publicly accessible data repositories.
 
CRIS NUMBER: 0212900
SUBFILE: CRIS
PROJECT NUMBER: TEXR-2007-04624
SPONSOR AGENCY: NIFA
PROJECT TYPE: NRI COMPETITIVE GRANT
PROJECT STATUS: EXTENDED
MULTI-STATE PROJECT NUMBER: (N/A)
START DATE: Feb 1, 2008
TERMINATION DATE: Jan 31, 2011

GRANT PROGRAM: ENTOMOLOGY/NEMATOLOGY
GRANT PROGRAM AREA: Plant Systems

CLASSIFICATION
Knowledge Area (KA)  Subject (S)  Science (F)  Objective (G)  Percent
211                  3110         1130         4.2            100%

CLASSIFICATION HEADINGS
KA211 - Insects, Mites, and Other Arthropods Affecting Plants
S3110 - Insects
F1130 - Entomology and acarology
G4.2 - Reduce Number and Severity of Pest and Disease Outbreaks


RESEARCH EFFORT CATEGORIES
BASIC 100%
APPLIED (N/A)%
DEVELOPMENTAL (N/A)%

KEYWORDS: hessian fly; wheat pest; genome sequence; pyrosequencing; massively parallel; insect genomics; gene annotation

PROGRESS: Feb 1, 2008 TO Jan 31, 2009
OUTPUTS: Our year one goals for the Hessian fly genome project were to: (1) generate 12X raw sequence coverage of the Hessian fly genome on the 454 FLX platform; (2) generate 32X "clone coverage" of the genome in paired-end data; and (3) assemble these sequence reads into a genome sequence of ordered and oriented contigs. Year two goals are to identify and annotate Hessian fly genes. One biological complication (which, to be honest, should have been foreseen) has delayed the project. The Hessian fly has an unusual sex determination system, wherein females give birth to either all-male or all-female offspring. To allow the creation of an inbred line for sequencing and assembly, Jeff Stuart identified a female that produced mostly male but some female offspring, and from her an inbred line was created that is >95% male. Unfortunately, the Hessian fly has two X chromosomes accounting for approximately 40% of the total genome; in males these will be at half coverage (6X) under our original sequencing strategy, and would likely produce smaller contigs. To overcome this limitation, we used an updated 454 chemistry (XLR), which produces longer reads and more data per run, to produce 20X sequence coverage of the Hessian fly genome, ensuring that the X chromosomes will have at least 10X sequence coverage. The reagents for this upgrade only became widely available in September 2008, which caused a 5-6 month delay. Eleven XLR runs were performed (10 of which were successful), generating a total of 3,582Mb of raw sequence, or 22.6X coverage of the 158Mb genome, fulfilling objective one of the grant. Of these, five 454-XLR runs were of paired-end sequence libraries, and these produced 38.75X paired-end coverage of the Hessian fly genome where both ends could be mapped within the initial assemblies.
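
The coverage reasoning above can be illustrated with a minimal sketch. It uses the report's own simplification that X-linked sequence in males receives half the genome-wide coverage, and it ignores the small fraction of females in the inbred line; the function names are hypothetical.

```python
def fold_coverage(total_mb, genome_mb):
    """Genome-wide fold coverage = raw sequence / genome size."""
    return total_mb / genome_mb


def x_coverage_in_males(genome_wide_coverage):
    """Report's approximation: X chromosomes in males sit at half coverage."""
    return genome_wide_coverage / 2.0


scenarios = [
    ("original plan", 12.0),
    ("revised target", 20.0),
    ("achieved (3,582 Mb over 158 Mb)", fold_coverage(3582, 158)),
]

for label, coverage in scenarios:
    x_cov = x_coverage_in_males(coverage)
    print(f"{label}: genome {coverage:.1f}X, X chromosomes {x_cov:.1f}X")
```

On this approximation the original 12X plan leaves the X chromosomes at 6X, the 20X target lifts them to 10X, and the sequence actually obtained puts them slightly above 11X.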
Our third goal for the year was the assembly of the raw sequence into ordered and oriented contigs. Because of the delay in obtaining sequence, the assembly process was only started in January 2009, and at this stage we can only report an intermediate assembly. This initial assembly has a total size of 126Mb and an N50 contig size of 3.5kb.
Initial assembly details:
Number of contigs (> 500bp) = 53,939
Number of bases = 126,251,682
Average contig size = 2,340
N50 contig size = 3,509
Largest contig size = 62,836
Q40 plus bases = 118,284,880 (93.69%)
Q39 minus bases = 7,966,802 (6.31%)
Whilst this assembly is clearly not good enough, we will be releasing it as an initial assembly on the Human Genome Sequencing Center website, and making it web searchable via BLAST to aid researchers on the Hessian fly and anyone else with an interest. A fuller, improved assembly is in progress, and we expect to produce a higher quality product in the coming year (likely in less than 6 months). Unfortunately this can only count as partial fulfillment of goal three for the year, and we hope to catch up over year two of the grant.
PARTICIPANTS: Stephen Richards (PD) directed the accumulation of sequence data for this project, and performed the initial assembly of the sequence data. Jeff Stuart (co-PD) produced and provided pure isolated Hessian fly DNA from an inbred Hessian fly line.
TARGET AUDIENCES: Not relevant to this project.
PROJECT MODIFICATIONS: We produced sequence using an upgraded version of the 454 pyrosequencing platform (upgrade from FLX to XLR). This allowed more sequence to be generated at the same cost, but delayed the project by 6 months. We increased the sequence coverage of the Hessian fly genome from 12X to 22X, to enable proper assembly of the X chromosomes (40% of the genome) from an inbred line consisting mostly of male individuals.
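
For readers unfamiliar with the N50 contig size quoted in the initial assembly details under OUTPUTS, the following is a minimal sketch of the conventional definition, together with a consistency check of the reported summary figures. The contig lengths in the example are made-up toy values; the per-contig lengths of the actual assembly are not given in this report.

```python
def n50(contig_lengths):
    """Largest length L such that contigs of length >= L together
    contain at least half of all assembled bases."""
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2.0
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contigs supplied")


# Toy example (hypothetical lengths, not project data).
toy_contigs = [10, 8, 6, 4, 2]
toy_n50 = n50(toy_contigs)

# Consistency check on the summary figures reported for the assembly.
total_bases = 126_251_682
num_contigs = 53_939
q40_bases = 118_284_880

avg_contig = total_bases // num_contigs
q40_percent = round(100 * q40_bases / total_bases, 2)

print(f"toy N50: {toy_n50}")
print(f"average contig size: {avg_contig}")
print(f"Q40 plus bases: {q40_percent}%")
```

The derived average contig size and Q40 percentage agree with the values reported in the assembly details.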

IMPACT: 2008-02-01 TO 2009-01-31 The availability of the genome sequence of the Hessian fly is the first step towards the comprehensive identification, mapping and characterization of Hessian fly genes. We are currently making this genomic information available and searchable via a web-based BLAST search on our website, to accelerate Hessian fly research into virulence genes and other genes that affect how the pest reduces crop yields.

PUBLICATION INFORMATION: 2008-02-01 TO 2009-01-31
No publications reported this period

PROJECT CONTACT INFORMATION
NAME: Stephen Richards
PHONE: 713-798-6667
FAX: 713-798-5741