Adding value to data – Digital Repositories in the e-Science world

Special Session at the 4th IEEE International Conference on e-Science

(http://escience2008.iu.edu/)

December 7-12, 2008, Indianapolis, USA

New Keynote

Prof. David de Roure, Southampton, has agreed to give a keynote on his experiences in grid technologies and what these could mean for digital repositories. Details to follow soon.

DEADLINE EXTENDED

There is a great, untapped potential for synergies between grid/e-science technologies and a cluster of related systems addressing the management of digital assets in digital libraries and repositories. The digital material generated from and used by academic and other research is to an increasing extent being held in formal data management systems; these systems are variously categorized as digital repositories, libraries or archives, although the distinction between them relates more to the sort of data that they contain and the use to which the data is put, rather than to any major difference in functionality. In many cases, these systems are used currently to hold relatively simple objects, for example an institution’s pre-prints and publications, or e-theses. However, some institutions are beginning to use them to manage research data in a variety of disciplines, including physical sciences, social sciences, and the arts and humanities, as well as the output from various digitisation programmes.

Modern repository systems allow us to move away from the model of a stand-alone repository, library or archive, where objects are simply deposited for subsequent access and download. Instead, researchers are developing more sophisticated models in which these containers of data are integrated components of a larger e-Science research infrastructure, incorporating advanced tools and workflows. Such repositories are being used to model complex webs of information and to capture scholarly or scientific processes in their entirety, from raw data through to final publications. Repositories have been successfully combined with data grid technologies. In addition, computational grids seem to offer possible applications in digital preservation and curation, such as automatic metadata extraction and index creation. These systems could thus add value to the data-driven research lifecycle in e-Science.

We list a sample of the research challenges below:

  • Digital preservation and curation in research infrastructures: Digital preservation keeps digital objects usable over time. Digital curation is about maintaining the current body of data for research and plays an important role in the life cycle of research. The question we would like to investigate is where these activities are placed within e-Science-based research applications.
  • Interoperability: In attempting to build a robust distributed digital repository system, interoperability remains a problem because data exchange standards are often incompatible. We encourage submissions from researchers proposing solutions to a wide range of issues related to data interoperation and information sharing. Of particular interest is the secure combination of research data with commercial or publicly funded data sets, e.g. medical research data and hospital data.
  • Security: Although there has been some movement to support open access in research data in recent years, many science domains still rely on strict guarantees for security. In particular publishers of data would like to ensure that they keep control/
  • Provenance: In e-Science, provenance is important for several purposes. In research workflows, provenance helps in the interpretation of results by linking them to intermediate results. Recognizing provenance is also essential for judging the quality of data, as it links the data to its origins and its development.
  • Metadata Extraction: The success of digital repositories depends heavily on the quality of the metadata associated with their contents. Meaningful searches are not possible without such metadata. Computational grid technology could help with automatic metadata extraction. Considerable research has been done on how to use machine learning, data mining, text mining and similar techniques to derive meaningful metadata for data and documents. Less research has been done on how to utilize the e-Science infrastructure for this task.
  • Workflow Integration: Scientific work often spans a variety of different technological environments, e.g. research data held in data grids and the results of analyses of those data held in e-print repositories. We invite papers on the integration of these environments, as it is vital for seamless research work processes.
  • Architecture of Participation: Web 2.0 applications have been successfully used in the digital library world to enhance content. They also play an increasingly important role in the development of usable e-Science applications that meet researchers’ information needs. How can they be used to enhance the content of research data, e.g. by utilizing community knowledge of data?

In addition to the fields listed above, topics of interest include (but are not limited to):

  • Cyberinfrastructures (e.g. data grid technologies) to support digital preservation and curation
  • Creation and maintenance of asset management for research data
  • Federated repositories: content modelling, metadata creation and in particular ontology mapping
  • Data grid technologies and their role in digital curation and preservation
  • Creation and maintenance of digital libraries for research data
  • Access control and security across infrastructure and asset management systems
  • Persistent identification for research data and digital objects across e-Infrastructure
  • Provenance and authenticity of digital objects in distributed e-Infrastructure
  • Information and data services in e-Science applications

Papers submitted for presentation as part of the session should report original research that has not been published elsewhere.

We invite two types of papers:

  • Full Paper (not more than 8 pages): This type of paper reports research problems, solution ideas and preliminary results.
  • Short Paper/Poster (not more than 4 pages): The paper should contain a summary of research problems, solution ideas and preliminary results.

At least one author of each accepted submission must attend the special session. All papers submitted for presentation in the special session will be reviewed by at least three members of the Program Committee and will be published in the proceedings.

    Session chairs are:

    • Andreas Aschenbrenner - University of Goettingen
    • Tobias Blanke - Centre for e-Research (King's College London)
    • Mark Hedges - Centre for e-Research (King's College London)

    Confirmed members of the programme committee include:

    • Gregory Crane - PERSEUS (Tufts)
    • Neil Chue Hong - OMII-UK (Southampton)
    • Lee Dirks - Microsoft Research
    • Nicholas Ferguson - Trust-IT Services Ltd. (TBC)
    • Luigi Fusco - European Space Agency (ESA)
    • Wolfgang Gentzsch - Distributed European Infrastructure for Supercomputing (DEISA)
    • Jane Hunter - University of Queensland
    • Erwin Laure - European Organization for Nuclear Research (CERN)
    • Simon Lin - Academia Sinica Grid Computing (Taiwan) (TBC)
    • Elisabeth Lyon - UKOLN (University of Bath)
    • Reagan Moore - San Diego Supercomputer Center (SDSC)
    • Matthias Razum - FIZ Karlsruhe
    • Seamus Ross - HATII (University of Glasgow) (TBC)
    • Richard O. Sinnott - NeSC (University of Glasgow)
    • Thornton Staples - Fedora Commons
    • Andrew Treloar - Australian National Data Service (Melbourne)
    • Jano van Hemert - e-Science Institute (Edinburgh)
    • David Wallom - University of Oxford (TBC)
    • Paul Watry - University of Liverpool
    • Ann Zimmerman - University of Michigan

    Authors are invited to submit papers in double-column format, using single-spaced text in a 10-point font on 8.5 x 11 inch pages, as per the IEEE 8.5 x 11 manuscript guidelines. Authors should submit a PDF or PostScript (Level 2) file that will print on a PostScript printer. Submissions can be made via the submission website for the Special Session:

    https://cmt.research.microsoft.com/DReSNeteScience2008/

    The session will comprise invited lectures and oral presentations.

    Important dates

    • Deadline for Submission of full papers: August 31, 2008
    • Notification of Acceptance: September 15, 2008
    • Final submission of camera-ready papers: September 29, 2008