Differences between revisions 33 and 54 (spanning 21 versions)
Revision 33 as of 2010-03-29 10:06:38
Size: 2034
Editor: root
Revision 54 as of 2012-03-12 11:01:26
Size: 3503
Editor: IanRees
Line 7: Line 7:
EMEN2 is an object-oriented database and electronic lab notebook. It is designed to store scientific data in a freeform way without limiting the ability to search and mine the results. Unlike a traditional database, where the contents of each record type (table) must be defined by a database administrator and strictly adhered to, each individual record in EMEN2 can have arbitrary additional parameters outside the record definition, and all such parameters remain fully searchable. Please note, this is NOT the related [[EMAN2]] image processing system.
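The free-form record idea described above can be illustrated with a small in-memory sketch. This is not EMEN2's actual data model or API: the class names `RecordDef` and `Record`, the `search` helper, and the parameter names are all hypothetical, chosen only to show a record carrying a parameter that its definition does not declare while remaining searchable.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class RecordDef:
    """Hypothetical record definition: a name plus the parameters it declares."""
    name: str
    params: frozenset


@dataclass
class Record:
    """Hypothetical record holding declared and undeclared parameters alike."""
    rectype: RecordDef
    values: dict = field(default_factory=dict)

    def extra_params(self) -> set:
        # Parameters present on this record but absent from its definition.
        return set(self.values) - set(self.rectype.params)


def search(records: list, param: str, value: Any) -> list:
    """Match on any parameter, whether or not the definition declares it."""
    return [r for r in records if r.values.get(param) == value]


# Illustrative data only; these names do not come from a real EMEN2 ontology.
grid = RecordDef("grid_imaging", frozenset({"grid_id", "magnification"}))
records = [
    Record(grid, {"grid_id": "G1", "magnification": 50000}),
    Record(grid, {"grid_id": "G2", "magnification": 50000,
                  "operator_note": "ice too thick"}),
]

noted = search(records, "operator_note", "ice too thick")
```

Here `operator_note` is not part of the `grid_imaging` definition, yet the second record stores it and `search` finds it in the same way as a declared parameter.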
Line 9: Line 9:
Records in the database may be arbitrarily linked to each other, much like the web. Any record may link to any number of other records of any type (the record's children), and any number of other records may link to it (the record's parents). This permits, for example, a publication to be linked into a publications folder as well as to a specific project, or a microscopy session to be a child of both the biological research project and the microscope the data was collected on.

Structural and computational biologists frequently work with complex data sets assembled from diverse experimental sources, public resources, and analysis methods. Archiving and mining these data sets with their complicated interrelationships remains a persistent challenge, particularly with “open science” initiatives to make entire workflows, including all raw and intermediate data, available with publications.
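The parent/child linking between records can be sketched as a small graph in which a record may have several parents. This is an illustrative model only, not EMEN2's storage layer; the `LinkGraph` class and the record names are assumptions made for the example.

```python
from collections import defaultdict


class LinkGraph:
    """Hypothetical many-to-many parent/child links between record names."""

    def __init__(self):
        self.children = defaultdict(set)
        self.parents = defaultdict(set)

    def link(self, parent, child):
        # Store the link in both directions so either side can be queried.
        self.children[parent].add(child)
        self.parents[child].add(parent)

    def descendants(self, name):
        """Return every record reachable by following child links from name."""
        seen = set()
        stack = [name]
        while stack:
            current = stack.pop()
            for child in self.children.get(current, ()):
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        return seen


graph = LinkGraph()
# A microscopy session is a child of both a project and a microscope.
graph.link("project", "session")
graph.link("microscope", "session")
# A publication is filed in a folder and also linked to the project.
graph.link("publications_folder", "publication")
graph.link("project", "publication")
graph.link("session", "micrograph")
```

With these links, `graph.parents["session"]` contains both `project` and `microscope`, and `graph.descendants("project")` reaches the session, its micrograph, and the publication.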

To address these needs, we have developed EMEN2, an object-oriented scientific database and electronic notebook. EMEN2 uses a flexible schema based on plain text descriptions of experimental protocols. These protocols may be local and describe techniques and data within a single lab group, reference published ontologies (e.g. GO, NCBO BioPortal), or contain links to external resources (PDB, GenBank, etc.). Similarly, an EMEN2 installation can itself act as a resource, providing public access to selected protocols and data. While originally developed to serve the needs of the cryo-EM community, we believe EMEN2’s architecture provides an excellent foundation for many other scientific endeavors.

EMEN2 is built entirely on open-source technologies. The core database is written in the Python programming language, with BerkeleyDB providing a robust embedded database back-end. The infrastructure is highly modular, permitting new ontologies to be fully implemented using only its “Web 2.0” interface. In addition, there is a remote API available for client applications. The included EMDash program is a standalone GUI tool for equipment integration, currently used to upload data transparently from our electron microscopes as it is being collected, and to integrate with other lab equipment. The EMEN2 server itself can be extended in a similar way by writing custom Python modules, which can expose additional views to the Web interface or new methods to the API.

A full ontology for cryo-EM has been established and has been in active internal use at the NCMI for about 3 years. It is used to archive all data at the center, and currently serves over 750 users, with over 16 terabytes of data in 460,000 records.
As an example of its extensibility and ontology-mapping capabilities, we have developed a module for harvesting the database and producing PDB-compliant XML files, which can be used to seed a structure deposition to EMDatabank.org.
Line 20: Line 27:
== EMEN2 Installation and Configuration ==
Line 21: Line 29:
 * [[http://pypi.python.org/pypi/emen2|EMEN2 download]]
Line 22: Line 31:
== Installation and Configuration ==
 * [[EMEN2/Dependencies|Dependencies]]
Line 24: Line 33:
* [[http://ncmi.bcm.edu/ncmi/software/software_details?selected_software=EMEN2|Download]]
 * [[EMEN2/Install|Installation]]
Line 26: Line 35:
* [[EMEN2/Dependencies|Dependencies]]
 * The [[EMEN2/emen2ctl|EMEN2 control script (emen2ctl)]] and [[EMEN2/Startup|starting EMEN2 server at boot]]
Line 28: Line 37:
* [[EMEN2/Install|Install]]
== EMDash: EMEN2 Client documentation ==
Line 30: Line 39:
* [[EMEN2/config.yml|Configuration]]
 * [[http://pypi.python.org/pypi/emdash|EMDash download]]
Line 32: Line 41:
* [[EMEN2/User_Guide|User Guide]]
 * [[EMEN2/emdash/Install|EMDash installation]]
Line 34: Line 43:
* [[EMEN2/FAQ|FAQ]]
 * [[EMEN2/emdash/Tutorial|EMDash tutorial]]
Line 36: Line 45:


== Technical Discussions ==

* [[EMEN2/Integration|Instrument Integration]]

* [[EMEN2/Architecture|EMEN2 Architecture]]

* [[EMEN2/Ontologies|EMEN2 Ontologies]]

* [[EMEN2/Export|Data harvesting and Export]]

* [[EMEN2/API|Public API and web services]]
 * [[EMEN2/emdash|Using EMDash on microscopes]]

EMEN2

An extensible, object-oriented electronic lab notebook

Please note, this is NOT the related EMAN2 image processing system.

Structural and computational biologists frequently work with complex data sets assembled from diverse experimental sources, public resources, and analysis methods. Archiving and mining these data sets with their complicated interrelationships remains a persistent challenge, particularly with “open science” initiatives to make entire workflows, including all raw and intermediate data, available with publications.

To address these needs, we have developed EMEN2, an object-oriented scientific database and electronic notebook. EMEN2 uses a flexible schema based on plain text descriptions of experimental protocols. These protocols may be local and describe techniques and data within a single lab group, reference published ontologies (e.g. GO, NCBO BioPortal), or contain links to external resources (PDB, GenBank, etc.). Similarly, an EMEN2 installation can itself act as a resource, providing public access to selected protocols and data. While originally developed to serve the needs of the cryo-EM community, we believe EMEN2’s architecture provides an excellent foundation for many other scientific endeavors.

EMEN2 is built entirely on open-source technologies. The core database is written in the Python programming language, with BerkeleyDB providing a robust embedded database back-end. The infrastructure is highly modular, permitting new ontologies to be fully implemented using only its “Web 2.0” interface. In addition, there is a remote API available for client applications. The included EMDash program is a standalone GUI tool for equipment integration, currently used to upload data transparently from our electron microscopes as it is being collected, and to integrate with other lab equipment. The EMEN2 server itself can be extended in a similar way by writing custom Python modules, which can expose additional views to the Web interface or new methods to the API.
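As a rough illustration of how a custom Python module could contribute new methods to a server's API, the sketch below uses a simple registry with a decorator. It is a generic pattern, not EMEN2's actual extension interface: `ApiRegistry`, `expose`, `call`, and the method name `microscope.session_count` are all hypothetical.

```python
class ApiRegistry:
    """Hypothetical registry mapping public API method names to functions."""

    def __init__(self):
        self._methods = {}

    def expose(self, name):
        """Return a decorator that registers a function under a public name."""
        def decorator(func):
            if name in self._methods:
                raise ValueError("API method already registered: " + name)
            self._methods[name] = func
            return func
        return decorator

    def call(self, name, *args, **kwargs):
        """Dispatch a call by name, as a remote API front end might."""
        try:
            method = self._methods[name]
        except KeyError:
            raise LookupError("unknown API method: " + name) from None
        return method(*args, **kwargs)


api = ApiRegistry()


# An extension module would import the registry and add its own methods.
@api.expose("microscope.session_count")
def session_count(sessions, microscope):
    """Count the sessions recorded on one microscope."""
    return sum(1 for s in sessions if s["microscope"] == microscope)


sessions = [{"microscope": "A"}, {"microscope": "B"}, {"microscope": "A"}]
count = api.call("microscope.session_count", sessions, "A")
```

Keeping registration separate from dispatch means the core server does not need to know about an extension in advance; it only looks up names at call time.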

A full ontology for cryo-EM has been established and has been in active internal use at the NCMI for about 3 years. It is used to archive all data at the center, and currently serves over 750 users, with over 16 terabytes of data in 460,000 records. As an example of its extensibility and ontology-mapping capabilities, we have developed a module for harvesting the database and producing PDB-compliant XML files, which can be used to seed a structure deposition to EMDatabank.org.
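The harvesting step mentioned above, in which local parameters are mapped to an external vocabulary and written out as XML, might look roughly like the following. This is a minimal sketch using Python's standard `xml.etree.ElementTree`; the element names, the `FIELD_MAP` table, and the record fields are invented for illustration and do not follow the real deposition schema or the actual EMEN2 module.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from local parameter names to target element names.
FIELD_MAP = {
    "specimen_name": "sampleName",
    "resolution_angstrom": "resolution",
}


def harvest(record, field_map=FIELD_MAP):
    """Serialize the mapped parameters of one record as an XML string."""
    root = ET.Element("deposition")
    for local_name, target_name in field_map.items():
        # Parameters without a mapping are left out of the export.
        if local_name in record:
            ET.SubElement(root, target_name).text = str(record[local_name])
    return ET.tostring(root, encoding="unicode")


record = {
    "specimen_name": "GroEL",
    "resolution_angstrom": 4.2,
    "internal_note": "not for export",
}
xml_text = harvest(record)
```

Only parameters listed in the mapping are exported, so lab-internal fields such as `internal_note` stay out of the generated file.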

EMEN2 Demo

There is a publicly accessible, read-only EMEN2 installation for accessing the NCMI's public datasets:

http://ncmi.bcm.edu/publicdata/db/home/

An overview document has been created to introduce new users to the EMEN2 web interface. It includes a number of screenshots.

EMEN2 Installation and Configuration

EMDash: EMEN2 Client documentation

EMEN2 (last edited 2013-04-22 20:02:57 by IanRees)