Welcome to Dr. Zompo, the Zostera and Posidonia EST Repository

Posidonia oceanica

Zostera marina

Dr. Zompo is the first interactive seagrass sequence database. It is based on a total of 14,597 expressed sequence tags (ESTs) obtained from two seagrass species. These sequences have been processed, assembled, and comprehensively annotated to give experimentalists a broad foundation for designing experiments and for addressing the challenges of investigating this class of non-domesticated monocotyledon systems. The database is built on the Ruby on Rails framework and offers a rich set of features, including retrieval of experimentally determined heat-responsive transcripts, mining for molecular markers (SSRs and SNPs), and weighted keyword searches that give access to annotation gathered on several levels, including Pfam domains, Gene Ontology terms, and KEGG pathways. Dr. Zompo is interfaced with well-established plant genome sites such as The Arabidopsis Information Resource (TAIR) and the Rice Genome Annotation Project. With this project, we have initiated a valuable resource for plant biologists in general and the seagrass community in particular.

Details

We provide a collection of ESTs from two seagrass species, Posidonia oceanica and Zostera marina. The 9,412 Zostera and 5,185 Posidonia raw sequence reads were reduced to sets of 3,387 and 1,219 unigenes, respectively.
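The read and unigene counts above can be checked with a few lines of arithmetic. The sketch below only uses the numbers stated in this section; the "unigenes per read" ratio is an illustrative summary of how much redundancy the assembly removed, not a statistic reported by the database.

```python
# Read and unigene counts as stated in the text above.
counts = {
    "Zostera marina": {"reads": 9412, "unigenes": 3387},
    "Posidonia oceanica": {"reads": 5185, "unigenes": 1219},
}

total_reads = sum(c["reads"] for c in counts.values())
total_unigenes = sum(c["unigenes"] for c in counts.values())

for species, c in counts.items():
    # Fraction of raw reads that remain as distinct unigenes after assembly.
    ratio = c["unigenes"] / c["reads"]
    print(f"{species}: {c['reads']} reads -> {c['unigenes']} unigenes ({ratio:.1%})")

# total_reads == 14597, matching the EST count given in the introduction.
print(f"Total: {total_reads} reads -> {total_unigenes} unigenes")
```

The summed read count reproduces the 14,597 ESTs mentioned at the top of the page, which is a quick consistency check on the per-species figures.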

How the sequences were obtained, cloned, and sequenced

Posidonia EST libraries
Samples were collected by SCUBA diving from a meadow at Lacco Ameno, Ischia (Gulf of Naples), at depths of 5 and 25 meters. Total RNA was isolated from pooled young leaf tissue and meristematic portions of shoots using a CTAB method and precipitated with lithium chloride. The cDNA was synthesized by vertis Biotechnologie AG, Freising, Germany. It was ligated non-directionally into the EcoRI and BamHI sites of the vector plasmid pBSII SK+ (2961 bp). Fragment lengths ranged from about 650 bp to about 2000 bp, with the highest concentration between about 800 bp and 1200 bp. The titer was about 1900 cfu per μL of bacterial suspension, giving a total of about 11,000,000 recombinant clones. Plasmids were isolated (minipreps) from randomly selected clones of the library and sequenced. Sequencing was performed at both the Stazione Zoologica Anton Dohrn (Naples, Italy) and the Max Planck Institute of Molecular Genetics (Berlin, Germany), thanks to Marine Genomics Europe TP grants. The T7 primer was used for sequencing.
Zostera EST libraries
Total RNA was extracted from young leaf tissue and meristematic regions of plants. Samples were obtained from Schilksee and Maasholm (south-western Baltic Sea, Germany) at depths of 1.6 to 2.5 meters. Overall, five libraries representing different experimental conditions and tissue types were pooled to determine the unigenes contained in this database. One library is redundant and serves only to improve the assembly. Two libraries represent natural conditions, sampled under average summer and winter conditions, respectively. Two further libraries were prepared from plants collected at the same sites but subjected to a heat-stress treatment in aquaria at the water surface, where the ambient Baltic Sea water temperature was raised by illumination to 17°C and 25°C, respectively. RNA was extracted using the RNeasy Plant Kit (Qiagen, Hilden, Germany). Each library contained the pooled RNA of 4-6 genotypes. The cleaned tissue was shock-frozen immediately (< 10 min after collection) and stored at -80°C. Libraries were constructed with the Creator SMART cDNA Library Construction Kit (Clontech). An initial long-distance PCR (LD-PCR) amplification of 28 cycles, to enrich for full-length transcripts, was verified on agarose gels. cDNAs longer than 800 base pairs were selected using filter columns and agarose gels, as described in the Clontech manual. Because only 50% of bacterial clones carried an insert, we included an initial PCR-based size selection prior to sequencing. Sequencing was performed on plasmid preparations, using only the forward M13 primer, at the Max Planck Institute for Limnology, Ploen, Germany.

How the raw sequences were prepared, processed and analyzed

Low-quality regions, vector, adapter, and poly-A/poly-T fragments were removed from the raw sequences using two tools in succession: pregap4 and cross_match. Redundant or significantly overlapping reads were then clustered into contiguous sequences (contigs) using CAP3. This step defines the unigenes: a unigene is either a contig, where redundancy was found, or a singleton, where a read could not be mapped to any other sequence. Putative single nucleotide polymorphisms (SNPs) were identified with the QualitySNP pipeline, and simple sequence repeats (SSRs, also called microsatellites) were detected with SSRIT.
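To illustrate what SSR detection involves, here is a minimal regex-based sketch that finds perfect tandem repeats of 2 to 6 nt motifs in a unigene sequence. This is not SSRIT itself: the function name `find_ssrs` and the minimum-copy thresholds in `MIN_REPEATS` are hypothetical, illustrative values, not the parameters used to build Dr. Zompo.

```python
import re
from typing import List, Tuple

# Hypothetical thresholds: minimum number of tandem copies per motif length.
# Illustrative values only, not the parameters used for Dr. Zompo.
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq: str) -> List[Tuple[int, str, int]]:
    """Return (start, motif, copies) for perfect tandem repeats in seq."""
    seq = seq.upper()
    hits = []
    for k, min_copies in MIN_REPEATS.items():
        # A k-nt motif followed by at least (min_copies - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit, e.g. "ATAT"
            # (reported as the dinucleotide "AT") or homopolymers like "AA".
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


example = "GG" + "AT" * 8 + "CC" + "CAG" * 5 + "TT"
print(find_ssrs(example))  # [(2, 'AT', 8), (20, 'CAG', 5)]
```

Using a back-reference (`\1`) keeps the search to a single pass per motif length; a production tool would also handle imperfect repeats and report flanking sequence for primer design, which this sketch deliberately leaves out.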

How the annotation was obtained

SwissProt and Gene Ontology annotations were obtained from BLAST searches by transferring descriptions from significantly similar, already annotated sequences. KEGG annotation was produced with KAAS (KEGG Automatic Annotation Server), and Pfam annotation was obtained with Pfam's pfam_scan Perl script, version 0.7 (a wrapper around HMMPFAM), using standard parameters to search the Pfam-A database (v23.0).
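The BLAST-based annotation transfer described above can be sketched as "take the most significant hit per query and copy its description". The code below assumes BLAST tabular output (`-outfmt 6` in BLAST+, equivalent to `-m 8` in legacy blastall), whose 11th column is the e-value. The best-hit-by-e-value rule, the cutoff `EVALUE_CUTOFF`, the helper names, and the example accessions and descriptions are all hypothetical; the text does not state the exact criteria used.

```python
import csv
import io
from typing import Dict, Iterable, Tuple

# Hypothetical significance cutoff; the text does not state the threshold used.
EVALUE_CUTOFF = 1e-5


def best_hits(tabular: Iterable[str], cutoff: float = EVALUE_CUTOFF) -> Dict[str, Tuple[str, float]]:
    """Map each query (unigene) to its lowest-e-value subject from -outfmt 6 lines."""
    best: Dict[str, Tuple[str, float]] = {}
    for row in csv.reader(tabular, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue > cutoff:
            continue  # not significant, so no description is transferred
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


def transfer_annotation(best: Dict[str, Tuple[str, float]], descriptions: Dict[str, str]) -> Dict[str, str]:
    """Copy the description of each query's best-hit subject onto the query."""
    return {q: descriptions.get(s, "unknown") for q, (s, _) in best.items()}


# Example with made-up identifiers and descriptions (hypothetical data).
blast_lines = io.StringIO(
    "contig1\tSUBJ_A\t80.0\t200\t40\t0\t1\t600\t1\t200\t1e-50\t200\n"
    "contig1\tSUBJ_B\t40.0\t100\t60\t2\t1\t300\t5\t105\t1e-3\t40\n"
    "singleton7\tSUBJ_B\t35.0\t80\t52\t1\t1\t240\t1\t80\t0.5\t25\n"
)
descriptions = {"SUBJ_A": "example description A", "SUBJ_B": "example description B"}
annotations = transfer_annotation(best_hits(blast_lines), descriptions)
print(annotations)  # {'contig1': 'example description A'}
```

Here `singleton7` receives no annotation because its only hit falls above the cutoff, which mirrors the idea that descriptions are only taken from *significantly* similar sequences.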

References

  • Staden, R.; Beal, K. F. & Bonfield, J. K. The Staden package, 1998. Methods Mol Biol, 2000, 132, 115-130
  • Huang, X. & Madan, A. CAP3: A DNA sequence assembly program. Genome Res, 1999, 9, 868-877
  • Susko, E. & Roger, A. J. Estimating and comparing the rates of gene discovery and expressed sequence tag (EST) frequencies in EST surveys. Bioinformatics, 2004, 20, 2279-2287
  • Altschul, S. F.; Madden, T. L.; Schäffer, A. A.; Zhang, J.; Zhang, Z.; Miller, W. & Lipman, D. J. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res, 1997, 25, 3389-3402
  • Temnykh, S.; DeClerck, G.; Lukashova, A.; Lipovich, L.; Cartinhour, S. & McCouch, S. Computational and experimental analysis of microsatellites in rice (Oryza sativa L.): frequency, length variation, transposon associations, and genetic marker potential. Genome Res, 2001, 11, 1441-1452
  • Tang, J.; Vosman, B.; Voorrips, R. E.; van der Linden, C. G. & Leunissen, J. A. M. QualitySNP: a pipeline for detecting single nucleotide polymorphisms and insertions/deletions in EST data from diploid and polyploid species. BMC Bioinformatics, 2006, 7, 438
  • Boeckmann, B.; Bairoch, A.; Apweiler, R.; Blatter, M.; Estreicher, A.; Gasteiger, E.; Martin, M. J.; Michoud, K.; O'Donovan, C.; Phan, I.; Pilbout, S. & Schneider, M. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res, 2003, 31, 365-370
  • Finn, R. D.; Tate, J.; Mistry, J.; Coggill, P. C.; Sammut, S. J.; Hotz, H.; Ceric, G.; Forslund, K.; Eddy, S. R.; Sonnhammer, E. L. L. & Bateman, A. The Pfam protein families database. Nucleic Acids Res, 2008, 36, D281-D288
  • Ashburner, M.; Ball, C. A.; Blake, J. A.; Botstein, D.; Butler, H.; Cherry, J. M.; Davis, A. P.; Dolinski, K.; Dwight, S. S.; Eppig, J. T.; Harris, M. A.; Hill, D. P.; Issel-Tarver, L.; Kasarskis, A.; Lewis, S.; Matese, J. C.; Richardson, J. E.; Ringwald, M.; Rubin, G. M. & Sherlock, G. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet, 2000, 25, 25-29

Contact

We appreciate any feedback. In particular, if you encounter errors, bugs, or inconsistencies, please let us know.
Main developer: Lothar Wissler