
WERA1015: Developing the US National Virtual Herbarium

Statement of Issues and Justification

In 2008, the Western Association of Agricultural Experiment Station Directors supported the WDC 12 project for Integrating Access to Information from Herbaria. This project brought together more than 40 representatives from herbaria across the country at the Botany 2008 conference. The group unanimously supported the creation of a US Virtual Herbarium (USVH). Working with regional networks, USVH will provide a single portal to information in the nation's herbaria. This proposal outlines the goals and objectives of a five-year project to develop USVH. The project will be led by a coordinating committee with representatives from herbaria and informatics. Overall, the project will support the Agricultural Experiment Stations (AES) by enabling more efficient access to the wealth of information in herbaria. This will increase the information available about plants that contribute to or affect the US agriculture industry.

Justification

Herbaria are rich sources of fine-grained information about plant diversity and biogeography, and a key resource for educating students. USVH will make the resources of the country's more than 625 herbaria freely accessible via the Web to scientists, consultants, students, and members of the public. This will revolutionize research and education in systematics, ecology, land management, conservation biology, biogeography, and biodiversity informatics. It will do so just as the creation of the first herbaria in the 1540s transformed plant taxonomy (Pavord 2005), by providing, for the first time, an effective means of documenting the meaning of a name and the plants of a region. Access to more specimen information, such as USVH will provide, will enable better use of analytical tools for several purposes:
- identifying the ecological and temporal factors that determine species distributions;
- predicting the areas to which an introduced species will spread, or in which additional populations of a native species may be found;
- comparing the ecological parameters that determine the distributions of insect pests with those that determine the distributions of their potential host species.

Placing images of herbarium specimens online will facilitate accurate identification of specimens collected in the field, a process that currently requires visiting a herbarium with a well-managed collection. Tools that draw on both distributional information and images can be used to develop powerful online identification resources, which will be welcomed by everyone who needs to identify plants.

Enabling rapid visualization of species distributions will highlight the areas from which there are few collections. This can be used to encourage greater participation in documenting the nation's flora and to stimulate interest in acquiring taxonomic skills. Representatives of several federal agencies, including the US Forest Service, have expressed dismay at the shortage of graduates able to identify plants. Development of USVH will not, in itself, solve this problem, but it will make available resources that can be used to address it.
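The gap check described above amounts to counting specimen records per area and flagging areas that fall below a threshold. The Python sketch below shows this step. It assumes the records have already been aggregated as (county, species) pairs. The county names, the records, and the threshold are hypothetical placeholders, not USVH data.

```python
from collections import Counter

# Hypothetical aggregated records as (county, species) pairs; a real portal
# would draw these from georeferenced specimen data.
RECORDS = [
    ("County A", "Achnatherum lemmonii"),
    ("County A", "Achnatherum lemmonii"),
    ("County A", "Example taxon"),
    ("County B", "Example taxon"),
]


def undercollected(records, all_counties, threshold):
    """Return the counties with fewer than `threshold` records, sorted by name."""
    counts = Counter(county for county, _species in records)
    # A Counter returns 0 for missing keys, so counties with no records are included.
    return sorted(c for c in all_counties if counts[c] < threshold)


print(undercollected(RECORDS, ["County A", "County B", "County C"], threshold=2))
# prints ['County B', 'County C']
```

The sketch iterates over a full county list rather than over the records. A county with no records at all is the largest gap, and it would never appear if only the records were scanned.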

A multi-state approach to the development of USVH is essential. Neither plants nor collectors recognize state boundaries. Information about a species based on its distribution in a single state, or on the holdings of a single state's herbaria, is therefore likely to be inaccurate. Even determining the state-level distribution of a species may require searching out-of-state herbaria. Holmgren and Holmgren (1977) reported that Stipa lemmonii (Vasey) Scribner [= Achnatherum lemmonii (Vasey) Barkworth] did not grow in Nevada or Utah. Barkworth and Linman (1984) later found specimens of it from both states. Of the two from Utah, one was in the Gray Herbarium of Harvard University and the other in the herbarium of Northern Arizona University. Those from Nevada had simply been misidentified. A multi-state approach will ensure that records held only in out-of-state herbaria are not overlooked. It will also allow the required infrastructure to be developed jointly, eliminating redundant effort and reducing the overall cost of its construction.

The goal of USVH is technologically feasible. Indeed, a few regional networks are already operational (e.g., SEINet, http://seinet.asu.edu/seinet/index.php, and the Consortium of California Herbaria, http://ucjeps.berkeley.edu/consortium/). Representatives from these networks have agreed to assist in developing USVH.

A larger challenge is sociological. It involves engaging the taxonomists in charge of herbaria, building bridges between taxonomists and computer scientists, and expanding the pool of individuals able to work at the interface between these two fields. This project focuses on developing the human resources and interactions needed to turn the idea of USVH into reality, by improving the dissemination of information on the processes involved. Support for software development, hardware purchases, and data entry will be sought from a variety of other sources. This project will not contribute directly to the financial cost of digitizing herbaria and building the networks on which USVH will depend. It will, however, contribute indirectly and substantially by reducing redundancy in software development and by accelerating the dissemination of information about new and improved protocols for the tasks required to build USVH.

The consequences of failing to establish this coordinating committee are hard to estimate. A national portal to US herbaria will probably be developed eventually. Without the committee, however, it is likely to take much longer and/or to include only the large herbaria. Australia's Virtual Herbarium (http://www.chah.gov.au/avh/avh.html), for instance, ignores all but the official state herbaria. The abundance of herbaria in the US is one of the country's strengths. The purpose of the coordinating committee is to ensure that the country benefits from the wealth of resources, both informational and human, that these herbaria represent.

The US National Virtual Herbarium will draw information from regional networks and individual herbaria. Consequently, this project focuses on three different levels: aiding people at individual herbaria in digitizing collections and making these resources Web-accessible, integrating information from multiple herbaria at a regional portal, and sharing data through a national portal, including enabling regional portals to reflect records for their region from extra-regional herbaria.
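The national level can be sketched as a routing step. The national portal pools records from all herbaria, and each regional portal selects records by where the specimen was collected rather than where it is housed. The Python sketch below illustrates this. The Record fields, the state-to-region table, and "Hypothetical Herbarium U" are assumptions for illustration. The two Utah records echo the Stipa lemmonii example discussed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    herbarium: str        # herbarium that holds the specimen
    herbarium_state: str  # state in which that herbarium is located
    taxon: str
    collected_in: str     # state in which the specimen was collected


# Hypothetical assignment of states to regional networks.
REGION_OF_STATE = {"UT": "Intermountain", "NV": "Intermountain",
                   "AZ": "Southwest", "MA": "Northeast"}

# Hypothetical national pool of records.
NATIONAL_POOL = [
    Record("Gray Herbarium", "MA", "Achnatherum lemmonii", "UT"),
    Record("Northern Arizona University", "AZ", "Achnatherum lemmonii", "UT"),
    Record("Hypothetical Herbarium U", "UT", "Achnatherum lemmonii", "NV"),
    Record("Northern Arizona University", "AZ", "Example taxon", "AZ"),
]


def records_for_region(pool, region):
    """Records collected within `region`, wherever they are housed."""
    return [r for r in pool if REGION_OF_STATE.get(r.collected_in) == region]


def extra_regional(records, region):
    """The subset of `records` held by herbaria outside `region`."""
    return [r for r in records if REGION_OF_STATE.get(r.herbarium_state) != region]


intermountain = records_for_region(NATIONAL_POOL, "Intermountain")
print([r.herbarium for r in extra_regional(intermountain, "Intermountain")])
# prints ['Gray Herbarium', 'Northern Arizona University']
```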

Previous work and current resources

Tracking progress. The number of active herbaria in the US is not known. Over 625 are registered with the Biodiversity Collections Index (BCI; http://www.biodiversitycollectionsindex.org/static/index.html). However, many smaller herbaria are not registered, and some of those listed are inactive. It is also not known how many individual herbaria have begun to database and/or image their collections. About 25, or 4% of those registered, are listed as providing information to GBIF (inquiry sent on Jan 7, 2009). One of the committee's first tasks will be to determine the number of herbaria in each of these categories, so that it can track progress and identify bottlenecks in creating USVH.
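Once the counts are known, tracking progress is simple arithmetic: each category's count is divided by the number of herbaria. With the figures above, 25 of the 625 BCI-registered herbaria gives 25/625 = 4%. The Python sketch below performs this calculation. The category names are hypothetical, and None marks counts the committee has not yet determined. The BCI total includes inactive herbaria and omits unregistered ones, so it is only an approximate denominator.

```python
def progress_report(total, counts):
    """Express each category's count as a percentage of `total` herbaria.

    Categories whose count is not yet known (None) are reported as None.
    """
    return {
        category: None if n is None else round(100 * n / total, 1)
        for category, n in counts.items()
    }


report = progress_report(625, {
    "providing data to GBIF": 25,   # figure from the text
    "databasing": None,             # not yet known
    "imaging": None,                # not yet known
})
print(report)
# prints {'providing data to GBIF': 4.0, 'databasing': None, 'imaging': None}
```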

The potential importance of small herbaria to this enterprise is hard to overestimate. They are often located in areas that, because of their remoteness, have not been well collected; among their collections are many specimens that are the only known record of a species occurring in a particular region (Edward Gilbert, pers. comm., 2009). Equally important, they are often at undergraduate institutions that provide more research opportunities to their students than is feasible at large research universities.

It is also important to ensure that those teaching plant systematics teach their students to record the kind and quality of information now required for conformity to international record standards. Supporting the regional networks will aid in achieving this.

There are two resources for recording the existence of herbaria: Index Herbariorum (IH; http://sweetgum.nybg.org/ih/) and the Biodiversity Collections Index (BCI; http://www.biodiversitycollectionsindex.org/static/index.html). The two projects are working to improve their interfaces so that information stored by both will need to be entered only once. They nonetheless serve different audiences, and each has some unique fields. One difference is critical to this project: BCI accepts information from small collections. Linking to BCI rather than IH will therefore best serve the project's needs.

Digitizing collections. Digitizing a specimen involves two steps: imaging the specimen as a whole, and then storing the label and annotation information in a database. There is little uniformity in the methods used for these steps. This has led to redundant effort and to confusion among those wishing to start digitizing. At the 2008 meeting, it was clear that everyone wanted to use the most efficient methods for digitizing their herbaria. There is, however, great uncertainty as to the optimal combination of equipment, workflow, and software. The answer varies with the size and purpose of each collection, but factual data and clear instructions would greatly aid those seeking to start or speed up the process in their collection.
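The two steps can be tracked for each specimen, so that a collection can see which step remains for each sheet. The Python sketch below records this status. The class, the field names, and the set of required label fields are hypothetical. In practice they would follow whatever the herbarium's database and its regional network require.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DigitizedSpecimen:
    catalog_number: str
    image_file: Optional[str] = None                  # step 1: image of the whole sheet
    label: dict = field(default_factory=dict)         # step 2: transcribed label data
    annotations: list = field(default_factory=list)   # later determinations, if any

    # Hypothetical minimum label data; a class attribute, not a dataclass field.
    REQUIRED_LABEL_FIELDS = ("scientific_name", "collector", "date", "locality")

    def remaining_steps(self):
        """List the digitizing steps not yet completed for this specimen."""
        steps = []
        if self.image_file is None:
            steps.append("image")
        missing = [f for f in self.REQUIRED_LABEL_FIELDS if not self.label.get(f)]
        if missing:
            steps.append("database label: missing " + ", ".join(missing))
        return steps


sheet = DigitizedSpecimen("000123", image_file="000123.jpg",
                          label={"scientific_name": "Achnatherum lemmonii"})
print(sheet.remaining_steps())
# prints ['database label: missing collector, date, locality']
```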

Sharing digitized information. The Global Biodiversity Information Facility (GBIF) and the Taxonomic Databases Working Group (TDWG) have sponsored the development of internationally recognized standards for sharing biodiversity information. Most US herbaria export their data according to the Darwin Core standard, with or without its extensions. Software for exporting information from a herbarium database must be tailored to the database used. There are tutorials for accomplishing this that can be readily understood by computer scientists. Clearly, there will be economies if the number of different herbarium database systems is minimized. The primary requirement, however, is that any system used be able to accommodate the desired fields. No database system is perfect, so information on the strengths and weaknesses of each system needs to be accessible. Some of this information already exists, but a regularly updated summary would be beneficial.
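The tailoring step is essentially a mapping from a herbarium's own column names to Darwin Core terms. The Python sketch below uses such a mapping to write a Darwin Core CSV. The local column names (acc_no, taxon, and so on) and the institution code are hypothetical. The right-hand names are Darwin Core terms, which should be checked against the version of the standard that a network's provider software expects.

```python
import csv
import io

# Hypothetical local column names -> Darwin Core terms. Only the left-hand
# side needs to be tailored to a particular herbarium database.
LOCAL_TO_DWC = {
    "acc_no": "catalogNumber",
    "taxon": "scientificName",
    "coll": "recordedBy",
    "coll_no": "recordNumber",
    "coll_date": "eventDate",
    "state": "stateProvince",
    "county": "county",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
}


def to_dwc(row, institution_code):
    """Map one local database row to a Darwin Core record (missing values -> "")."""
    rec = {dwc: row.get(local, "") for local, dwc in LOCAL_TO_DWC.items()}
    rec["institutionCode"] = institution_code
    rec["basisOfRecord"] = "PreservedSpecimen"
    return rec


def export_csv(rows, institution_code):
    """Return the rows as a Darwin Core CSV string with a header line."""
    out = io.StringIO()
    fields = ["institutionCode", "basisOfRecord", *LOCAL_TO_DWC.values()]
    writer = csv.DictWriter(out, fieldnames=fields)
    writer.writeheader()
    for row in rows:
        writer.writerow(to_dwc(row, institution_code))
    return out.getvalue()


print(export_csv([{"acc_no": "000123", "taxon": "Achnatherum lemmonii",
                   "state": "Utah"}], "XYZ"))
```

Only LOCAL_TO_DWC changes from one database system to another. This is where herbaria that use the same system could share work.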

There are two protocols for sharing digitized information: DiGIR and TAPIR (http://www.tdwg.org/activities/tapir/). TAPIR is the more recent and the only one that can accommodate images, so it is the one on which this project will focus. Implementing these protocols requires a background in information technology (IT) or computer science. The National Biological Information Infrastructure (NBII) program within the US Geological Survey has sponsored workshops on using these two protocols for people with an IT background, and it is willing to sponsor more. As a first step, every network must have at least one person, and preferably more, who knows how to set up, use, and maintain TAPIR.
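In practice, a TAPIR provider is queried over HTTP. The Python sketch below only builds request URLs and does not contact any server. The endpoint is hypothetical. The key-value-pair style with an `op` parameter, and the operation names listed, reflect our reading of the TAPIR specification, which should be consulted for exact parameter names.

```python
from urllib.parse import urlencode

# Operation names assumed from the TAPIR specification.
TAPIR_OPERATIONS = {"ping", "capabilities", "metadata", "inventory", "search"}


def tapir_url(endpoint, op, **params):
    """Build a TAPIR key-value-pair (HTTP GET) request URL for `op`."""
    if op not in TAPIR_OPERATIONS:
        raise ValueError(f"unknown TAPIR operation: {op}")
    return endpoint + "?" + urlencode({"op": op, **params})


# Hypothetical endpoint of a herbarium's TAPIR provider.
ENDPOINT = "http://tapir.example.org/herbarium"
print(tapir_url(ENDPOINT, "ping"))
# prints http://tapir.example.org/herbarium?op=ping
```

A ping request is a cheap first check that a newly installed provider is reachable.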

Integrating resources. Integrating information from multiple herbaria requires a further set of necessary and/or desirable software. Examples include software for data cleanup, accommodating differing taxonomic treatments, georeferencing, and examining and measuring images. At present there is little, if any, sharing of such software, and no easy way to determine what software already exists. A clearinghouse providing access to open-source software for these procedures would accelerate the creation of operational networks, a critical step in USVH development. Other resources are needed at the national level, for example:
- a list of species of concern and the states in which they are of concern;
- shape files for counties that include the dates for which they are valid;
- software for automating the sharing of records among regions.

The immediate need is to determine what is needed and what is available, and to make this information accessible.
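Two of the integration tasks listed above can be sketched in a few lines of Python: reconciling differing taxonomic treatments, and checking records against the counties that were valid on the collection date. Both tables are hypothetical placeholders. A working network would use an authoritative checklist for the synonymy, and county shape files with validity dates rather than a hand-written list. The single synonymy entry here is the Stipa lemmonii example discussed above.

```python
from datetime import date

# Hypothetical synonymy: name as written on a label -> accepted name under the
# treatment a portal follows.
SYNONYMS = {"Stipa lemmonii": "Achnatherum lemmonii"}

# Hypothetical counties with the dates for which they are valid (None = still valid).
COUNTIES = [
    ("Old County", date(1850, 1, 1), date(1900, 12, 31)),
    ("New County", date(1901, 1, 1), None),
]


def accepted_name(name, synonyms=SYNONYMS):
    """Return the accepted name, or the name unchanged if it has no synonym entry."""
    return synonyms.get(name, name)


def county_valid_on(county, when, counties=COUNTIES):
    """True if `county` existed on date `when` according to `counties`."""
    for name, start, end in counties:
        if name == county and start <= when and (end is None or when <= end):
            return True
    return False


print(accepted_name("Stipa lemmonii"))                  # prints Achnatherum lemmonii
print(county_valid_on("Old County", date(1950, 6, 1)))  # prints False
```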

Last Modified: 16-Feb-2009
