Report of the DELOS-NSF Working Group on Digital Imagery for Significant Cultural and Historical Materials[*]
Co-Chairs
Ching-chih Chen, Simmons College, Boston, USA
Alberto Del Bimbo, University of Florence, Italy
EU Members
Giuseppe Amato, CNR-IEI, Pisa, Italy
Nozha Boujemaa, INRIA - Rocquencourt, France
Patrick Bouthemy, INSA INRIA - Rennes, France
Joseph Kittler, University of Surrey, UK
Ioannis Pitas, Aristotle University of Thessaloniki, Greece
Arnold Smeulders, University of Amsterdam, The Netherlands
US Members
Kirk Alexander, Princeton University, USA
Kevin Kiernan, University of Kentucky, USA
Chung-Sheng Li, IBM T. J. Watson Research Center, USA
Howard Wactlar, Carnegie Mellon University, USA
James Z. Wang, Pennsylvania State University, USA
Recent revolutionary breakthroughs in computing and
communications with the epoch-making arrival of the Internet have begun to
demolish artificial disciplinary boundaries and to open vast new fields of interdisciplinary
research. One major area was outlined
in the recent report to the US President by the President’s Information
Technology Advisory Committee (PITAC), entitled Digital Libraries: Universal Access to Human Knowledge [US PITAC,
2001]. In its cover letter PITAC
defines digital libraries as “the networked collections of digital text, documents, images,
sounds, scientific data, and software that are the core of today’s Internet and
tomorrow’s universally accessible digital repositories of all human knowledge.” One of the chief
impediments to broadly useful access to digital libraries, however, is the
sharp cleavage in the academic research community between science and
humanities. The division is
particularly detrimental to research and practical development in digital
libraries, because computer scientists cannot adequately provide for universal
use of the world's cultural heritage without a deep understanding of the
relevant materials. The DELOS/NSF Working
Group for Digital Imagery of Significant Cultural and Historical Materials was
organized to bring together specialists who study these priceless materials
with technologists who have the expertise to help mine them in new ways to make
them universally available to the world's population
(http://www.delosnsf-imagewg.unifi.it/activities.htm).
Our world is rich with relatively inaccessible and increasingly vulnerable
repositories of unique paintings, sculptures, and other works of art and
fragile hand-written records in a plethora of styles and scripts on clay and
stone and wood and canvas and cave walls, on parchment and paper and papyrus,
not just in libraries and museums, but also in churches, temples, and mosques,
and in the living museums of the longest inhabited cities and villages
throughout the world. But time, natural disasters, thieves, vandals, and terrorists
are ever busy destroying them. For
example, for the past thirty years the museums of Europe have routinely checked
bags for bombs, and evacuated their premises in the middle of the day for bomb
scares. A unique Native American
pictograph known as the Blue Buffalo
was recently destroyed by an unknown vandal, while Michelangelo’s Pietà was damaged 30 years ago by a
crazed, hammer-wielding zealot who thought he was Jesus Christ. Not long ago the Taliban blew up two
1,500-year-old monumental Buddhas in the Bamiyan valley in
Afghanistan. Indeed, as we were
drafting this report, two priceless paintings by the Dutch master Vincent van
Gogh were stolen from the Van Gogh Museum in Amsterdam [CNN, 7 December 2002]. On September 11, 2001, widely known as “9/11,”
along with the horrific loss of life in the World Trade Center, the world lost
original Rodins and many other irreplaceable works of art in the destruction of
the “museum in the sky” on the 105th floor of the North Tower. As our museums have now become the “soft
targets” of terrorists, it suddenly becomes a matter of our health, security,
and economic well-being to archive, preserve, and even restore our most
significant cultural and historical materials in digital libraries.
Research on digital imagery for
significant cultural and historical materials is imperative because of its
potential impact on many related sciences and engineering, its relevance to
education at all levels, and its role in promoting cultural understanding. Research on digital cultural materials will
undoubtedly inspire and add new rigor to many related research fields, including
computer vision, artificial intelligence, information technology, data mining,
and image processing. Cultural
materials require specialized knowledge to prepare for ubiquitous use by
everyone from scholars to the general public.
Cultural materials are valuable not only to humanities research but also
to technological and scientific research.
Digital imagery of significant cultural and historical materials is of
great value in its own right for research on computational intelligence. Applying modern computing techniques to
analyze it will yield insights into general-purpose image archival,
distribution, and intelligent automatic information extraction.
Education at all levels is a
crucial part of our conception of collaboration and subsequent recommendations. Treasures of human culture infuse ordinary
people with inspiration, imagination, and pride. Historical materials record the history of our human
societies. Ancient artifacts reveal
social structure, the way people normally lived, fashion and entertainment, as
well as the technological level of the times.
The same modern technologies used by medicine, intelligence, forensics,
and space programs will bring greater access to the cultural heritage of all
times.
While other working groups deal with text and
multilingual approaches, our focus on digital imagery does not neglect the
beauty and cultural significance of the scripts of other cultures. Formerly so foreign and inaccessible to
other cultures, Chinese characters, Arabic script, and cuneiform tablets, for
example, can now be easily read and understood by people without foreign
language capabilities, with the help of intelligent graphical user
interfaces. These interfaces can even
instruct users how to compose these special hand-written artifacts or can
automatically translate the images into the different languages of any and all
readers. With the ever-increasing
importance of communication among people all over the world, it is crucial to
understand and respect cultural diversity and learn from each other. Prejudices often come from misunderstanding,
or unwillingness to understand.
Cultural materials are non-violent, unbiased, cultural ambassadors. Modern digital technologies have made it
possible to exhibit large collections of works from multiple cultures. Since an enormous quantity of historical and
cultural material has been created, both storage and distribution raise many
challenges. Further advancing digital
technologies for archiving and distributing these materials is of great
importance.
In this preliminary report we lay out an urgent
interdisciplinary research agenda to pursue collaborative projects that
develop, apply, and adapt leading-edge technologies to manage and analyze large
and varied digital collections of cultural materials.
2. OVERARCHING GOAL
Recognizing that significant
cultural and historical materials are not merely data, we advocate an
organized, continuing collaboration between subject specialists and
technologists to establish sustainable and enduring digital archives of the
world's cultural heritage and to provide universal and ubiquitous online access
for advanced research as well as for all levels of formal and informal public
education.
3. CONCEPTUAL FRAMEWORK

Figure 1. Conceptual Model of Our Research Agenda
Our conceptual model (Figure 1)
attempts to illustrate the relationships among people, content, and
technologies in our proposed research agenda.
Our interdisciplinary research will develop technologies to
enhance the way people create and access the content of their
cultural heritage. People encompass all
users, from curators and library and information scientists, to scholars,
teachers, and students in all areas of the humanities, to citizens of all
cultures. Content is the vast array of
significant cultural and historical materials throughout the world. Technologies are the enabling research and development in all related technical areas such as
information retrieval, image processing, artificial intelligence, and data
mining.
We recommend focused,
interdisciplinary research programs along the three edges and the center of
the triangle, areas that traditional research programs currently neglect. The research area between people and content
is the area of digital imagery creation and preservation. The area between content and technologies is
the efficient and effective retrieval of the content using
technologies. Research into presentation
and usability will enhance the ability to access the content. Effective applications and use of the
research results, under lifecycle management, will integrate research of the
three related areas.
4. THE FOUR INTERRELATED RESEARCH INITIATIVES
Digitization of cultural
artifacts should provide a lasting electronic record for scholarly and
universal access, preservation, and study.
At the present time, however, digitization projects are proceeding
without established methods of recording precise conditions of digitization. Experts in the subject field must begin to
work closely with technologists in developing digital imaging technologies for
historical archiving. We need tools that automatically protect the integrity,
fidelity, and security of digital images, and record any subsequent processing
of them. In addition to the automatic
recording of such technical metadata, these tools should provide the means for
subject specialists to encode descriptive metadata to facilitate subsequent
search and retrieval.
Digital
imaging modalities encompass visual appearance, texture, surface shape, and
sub-surface hidden structure. Multi-modal acquisition enables new insights
into structure and meaning. We do
not confine ourselves to any specific imaging modalities. Techniques such as photography, video, X-ray, 3-D scans, infrared,
UV, and laser scans have been used successfully for different art recording
purposes. Capturing cultural artifacts using different imaging modalities
creates the need for efficient, automated multimodal geometric registration
techniques. Novel technologies or
integration of existing technologies should be developed to better facilitate
the study and recording of collections of historical artifacts.
New methods of multi-modal rendering and presentation are required
to support different audiences and applications.
Techniques are required for measurement of degradation and support
of restoration. If a digital repository of
cultural artifacts is to serve not only as an educational and art history research tool
but also as a powerful tool for restoration, then, apart from visual
data (images, X-rays, etc.) and simple
text/metadata information, a wealth of other research/restoration data should
be stored in the repository [Pappas et al, 1999]. Such data can include physical
details (e.g. dimensions), restoration details, creation data, current physical
conditions, storage information, historical data, associated bibliography,
spectroscopy/colorimetry measurements (along with information for the measurement
location on the artifact), etc. Obviously,
efficient storage, recall and presentation of this information to the user
is a challenging task that requires significant research.
An
important consideration is to record the provenance and any subsequent changes
of the items in the collection, such as distinguishing between the original
source at some point in time and other existing renderings in the
archives. Historical artifacts and
works of art evolve over time and users must be able to distinguish the source,
the time, and the process. The needs of
users also evolve over time. The
recording process must both incorporate previous use and anticipate future
needs. For the best, longest-lasting
results, we must record artifacts with the highest resolution economically
possible. Naturally, this approach creates the need for special
recording procedures such as image recording in partially overlapping parts (so
as to minimize geometric distortions and maximize resolution) followed by image
mosaicing to synthesize the whole image. Furthermore, one should bear in mind
that certain situations require that the work of art should be digitized
in-place (consider for example murals and architectural monuments), a fact that
creates additional problems that need to be tackled.
Although recording should indeed
be performed at maximum resolution to facilitate research and restoration,
storage and transmission of recorded data might impose size/speed restrictions.
Thus, existing lossy/lossless compression schemes should be incorporated and
new compression techniques that take into account the special requirements of
digital art repositories should be pursued if necessary. Some good
candidates for this purpose are multiresolution schemes that can adapt to the
requirements of resolution and transmission speed of a specific
application.
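The multiresolution idea can be illustrated with a minimal sketch. It assumes a grayscale image held as a list of rows of integers and builds a pyramid by averaging 2x2 pixel blocks; the names `downsample`, `build_pyramid`, and `pick_level` are hypothetical and are not drawn from any standard or from the working group's own systems. Real repositories would more likely rely on an established multiresolution codec.

```python
from typing import List

Image = List[List[int]]  # grayscale image: rows of pixel intensities (0-255)


def downsample(image: Image) -> Image:
    """Halve width and height by averaging each 2x2 block (integer mean)."""
    height, width = len(image), len(image[0])
    half = []
    for y in range(0, height - 1, 2):
        row = []
        for x in range(0, width - 1, 2):
            total = (image[y][x] + image[y][x + 1]
                     + image[y + 1][x] + image[y + 1][x + 1])
            row.append(total // 4)
        half.append(row)
    return half


def build_pyramid(image: Image, min_size: int = 1) -> List[Image]:
    """Return [full, 1/2, 1/4, ...] down to min_size pixels per side (min_size >= 1)."""
    levels = [image]
    while min(len(levels[-1]), len(levels[-1][0])) // 2 >= min_size:
        levels.append(downsample(levels[-1]))
    return levels


def pick_level(levels: List[Image], max_pixels: int) -> Image:
    """Choose the most detailed level whose pixel count fits the budget."""
    for level in levels:
        if len(level) * len(level[0]) <= max_pixels:
            return level
    return levels[-1]  # nothing fits: fall back to the coarsest level
```

A viewer on a slow connection could request `pick_level(levels, max_pixels)` with a small budget and fetch finer levels only when the user zooms in, while the full-resolution level remains available for research and restoration.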
Creation
of a lasting historical record requires a repeatable imaging process. The imaging process should be calibrated so
that artifacts can be digitized at multiple instances of time to study degradation
or to return an artifact to its original condition in digital form. For example, the digitization of paintings
requires the recording of illumination and color for later calibration of the
exact appearances of the paintings at a particular point in time. The recording process should document
technical metadata such as time, date, equipment, lighting, and calibration
parameters. Technologies to automate
the recording of technical metadata can be developed for fast digitization of
large collections, alongside tools that let experts in the various subject
domains accrue descriptive metadata.
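As one possible illustration of automatically recorded technical metadata, the sketch below stores the items named above (time, date, equipment, lighting, calibration parameters) together with a log of later processing steps. The class `CaptureRecord`, its field names, and the helper `new_record` are assumptions made for this example only; they do not correspond to any existing metadata standard.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
from typing import Dict, List
import json


@dataclass
class CaptureRecord:
    """Technical metadata for one digitization event (field names are illustrative)."""
    artifact_id: str
    captured_at: str                     # ISO 8601 timestamp (date and time)
    equipment: str
    lighting: str
    calibration: Dict[str, float] = field(default_factory=dict)
    processing_log: List[str] = field(default_factory=list)

    def log_step(self, description: str) -> None:
        """Append a later processing step so the image history stays traceable."""
        self.processing_log.append(description)

    def to_json(self) -> str:
        """Serialize the record so it can be stored alongside the image file."""
        return json.dumps(asdict(self), sort_keys=True)


def new_record(artifact_id: str, equipment: str, lighting: str,
               calibration: Dict[str, float]) -> CaptureRecord:
    """Stamp the record with the current UTC time at the moment of capture."""
    now = datetime.now(timezone.utc).isoformat()
    return CaptureRecord(artifact_id, now, equipment, lighting, dict(calibration))
```

Descriptive metadata supplied by subject specialists would sit next to such a record rather than inside it, which keeps the automatically generated and the expert-supplied information distinguishable.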
Open
standards for capturing categories of artifacts can be developed to facilitate
inter-operable systems. At the same
time, established standards such as MPEG, DICOM, and LOC, developed for other
domains, should be studied to determine suitability for digitizing artifacts.
Digital preservation and
archiving activities in the area of arts should not focus only on “traditional”
works of art (paintings, sculptures, architectural monuments, works of decorative
art, etc.) but should be broad enough to include new forms of art that
are also in need of preservation. Thus, archiving and preservation of
computer-generated art (computer graphics, animations, web art), video art,
movies, artistic installations, landscape art, and art performances should also be
considered. The limited lifecycle of some of the above forms of art (web-art,
installations, performances, landscape art) is an additional reason for
preserving and archiving them using digital imagery. Preservation procedures
that take into account the particularities of these works of art should be
defined.
While “copyright” and
“intellectual property” issues are not addressed in this report, we are mindful
of the great importance and complexity of these problems and issues. It is therefore also important to explore
possibilities for creating a corpus of copyright-free image and video documents
for research and evaluation.
4.2. Retrieval
Computer science and humanities
disciplines often use the same terms in quite different ways. For example, in
content-based retrieval computer scientists use the word “content” to refer to
measurable visual properties, such as color, shape, texture, spatial relations,
features that we will here call physical content. For non-computer scientists “content” normally refers to
meaning. For example, the content of
the image of a manuscript page is not its color, shape, etc., but the meaning
of the text in the manuscript. In the
case of works of art, the content of a painting like the Mona Lisa is for a
computer scientist a given combination of color distribution and shape, while
for the art historian or visitor to the Louvre the content might be instead
painting techniques, historical models, iconographic styles, the representation
of women, the study of mood, ambivalent expression, Leonardo da Vinci, and any
other features that a computer cannot retrieve without descriptive markup by
specialists in art. For productive
collaboration between computer scientists and humanities scholars it is
necessary to understand and make provision for these differences between
physical content and meaningful content.
With the potential ambiguity concerning “content” in mind, we discuss a
variety of strictly computer science image-based retrieval topics below:
Still image attributes such as
color distribution, shape, texture, and descriptors invariant
to scale, light, or point of view are obtained by statistical image
analysis. Automatic generation of
categories (clustering) of these attributes enables visual overview of discrete
image collections. Dynamic video
attributes, such as motion field, scene activities, and camera motion, are
extracted from the temporal imagery of animation or motion pictures. Other more complex extractable features
include automatic transcript generation by speech recognition and geometric
3D-model description. Physical features
are automatically generated metadata, as distinct from descriptive metadata
supplied by experts.
Machine-based
image similarity search is computed by comparing automatically extracted
features. Similarity measures must be
defined with the specialist according to the feature set pertaining to the
specific domain. In performing
similarity searches on large repositories, scientists have investigated
efficiency and scalability issues. They
are developing algorithms and defining access methods that will allow highly
efficient search processes for increasingly larger image and video
collections. Global searches like
these can be usefully narrowed by reference to descriptive metadata supplied by
the domain specialists in the course of assembling and editing the image
archives.
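A minimal sketch of similarity search over one automatically extracted feature follows. It assumes images are available as lists of RGB pixel tuples, uses a coarse joint color histogram as the feature, and scores pairs by histogram intersection. These are illustrative choices, not the measures the report prescribes; as noted above, the appropriate feature set and similarity measure must be defined with the domain specialist. The function names are hypothetical.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0-255


def color_histogram(pixels: List[Pixel], bins_per_channel: int = 4) -> List[float]:
    """Normalized joint RGB histogram with bins_per_channel**3 bins.

    Assumes a non-empty pixel list and a bin count that divides 256.
    """
    step = 256 // bins_per_channel
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        index = ((r // step) * bins_per_channel + (g // step)) * bins_per_channel + (b // step)
        hist[index] += 1.0
    total = float(len(pixels))
    return [count / total for count in hist]


def histogram_intersection(a: List[float], b: List[float]) -> float:
    """Similarity in [0, 1]; 1.0 means identical color distributions."""
    return sum(min(x, y) for x, y in zip(a, b))


def rank_by_similarity(query: List[float],
                       collection: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Return (image id, score) pairs, most similar first."""
    scored = [(name, histogram_intersection(query, hist))
              for name, hist in collection.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

This linear scan compares the query against every stored histogram, which is exactly the cost that the efficiency and scalability work mentioned above aims to avoid; restricting `collection` beforehand by descriptive metadata is one simple way to narrow the search.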
For the subject specialists preparing the collections for
the universal user the term "semantic" most likely relates to
signification or meaning. The semantic
gap for imaging scientists is the space between low-level features such as
color, shape, and texture, and the high-level queries such as objects and
concepts. Precise machine search
allows the user to focus interest on selected objects or parts of an image,
such as a small detail in a complex landscape.
Machine learning techniques hold the promise to further bridge the
semantic gap by generalizing from manually generated descriptive markup. For example, computer algorithms can
potentially learn to classify paintings of different styles. The fact that a digitized work of art is not the work
itself but an image (instance) of this work, acquired at a certain time, under specific
conditions (size, resolution, camera position, light, physical condition of the
work, e.g. before or after a restoration operation) makes semantic-based
indexing and retrieval an absolute
necessity in this area. For example, a query on “Mona Lisa” should
retrieve all images of the painting regardless of size, view angle, restoration
procedures applied on the painting, etc. Alternatively, image fingerprinting
(or robust hashing) which deals with extracting unique image identifiers that
are robust to image deformations
(cropping, resizing, illumination changes, rotations etc) might be used along
with query by example techniques to partially deal with this task. However,
this area is still in the early stages of research and development. In the meantime, descriptive markup provided
by subject specialists remains the most precise and reliable recourse and will
continue to be an invaluable guide to any development of automated search
strategies.
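To make the fingerprinting idea concrete, here is a minimal sketch of one simple robust hash, often called an "average hash". It is named here as an illustrative stand-in, not as the technique the report has in mind. The sketch assumes a grayscale image stored as a list of rows whose dimensions are divisible by the grid size; the function names are hypothetical.

```python
from typing import List

Image = List[List[int]]  # grayscale rows, pixel values 0-255


def shrink(image: Image, grid: int = 8) -> List[float]:
    """Average the image down to grid x grid cells (dimensions assumed divisible by grid)."""
    cell_h, cell_w = len(image) // grid, len(image[0]) // grid
    cells = []
    for gy in range(grid):
        for gx in range(grid):
            block = [image[y][x]
                     for y in range(gy * cell_h, (gy + 1) * cell_h)
                     for x in range(gx * cell_w, (gx + 1) * cell_w)]
            cells.append(sum(block) / len(block))
    return cells


def average_hash(image: Image, grid: int = 8) -> int:
    """Fingerprint: one bit per cell, set when the cell is brighter than the mean."""
    cells = shrink(image, grid)
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits; small distances suggest the same underlying image."""
    return bin(a ^ b).count("1")
```

Because each bit records only whether a cell is brighter than the image mean, the fingerprint is unchanged by uniform resizing and by a uniform brightness shift that does not clip. It offers no robustness to rotation or cropping, which is one reason such hashes can only partially address the task.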
4) Integrated access to digital repositories
After decades of research, integration of image retrieval, text-based retrieval, and other
information retrieval techniques has improved search effectiveness. For example, text encoding provides semantic
description of content. Automatic
search can begin with user-defined constraints on the search domain. We can also define semantic structures as
relationships of concepts allowing high-level content-based retrieval, which
can be integrated with existing retrieval techniques to better facilitate user
access. Automatic image analysis can
furnish additional text annotation relating to physical features.
The availability of a huge amount of
digital material, both images and videos, of cultural heritage requires the
investigation of new cost-saving and effective methods for annotation and
retrieval that are easy to use for most users.
Image analysis processing techniques provide a powerful means to extract
useful information from pixels, and provide automatic description of image and
video content. These relate to "syntactic" information (like color
and texture, video editing effects), low-level primitives (like corners, shapes
and spatial relationships) and higher-level information (like objects, scene
content, subject description, even associated emotions...) as well as to
invariance under different aspects.
Content annotation based on pixels can be used to perform search
operations from objective measures and descriptors of the visual content. Effective descriptors that agree with human
perception and feeling are required, with particular attention paid to the
computer science "semantics" of images and scenes, among other
things. Obviously, some of these processes
cannot work alone for images of hand-written documents, such as ancient
manuscripts and cuneiform tablets, nor for some prominent styles of 20th-century
painting, such as cubism, impressionism, and abstract impressionism. In these cases, descriptive metadata must be
supplied first.
4.3. Presentation and Usability
Although it does not present a
complex computer science challenge, we recognize the basic need to integrate
standard specialist markup into any solutions for presentation and usability
for universal users. It is accordingly
essential to capture commentary and annotation from past, present, and future
users of digital archives of cultural and historical materials. This capability must be embedded at every
level of a system and should be part of its overall design. Along with collecting and progressively
adding specialist metadata, the system must be capable of tracking, filtering,
and quantifying all of this information.
There is a great challenge to
developers of computer systems with respect to making large collections of
digital imagery available and meaningfully accessible to investigators
interested in cultural and historical subject matter. In general this large and diverse audience generates demands on
computer systems for simple and intuitive interfaces that stress almost every
existing mode of presentation and usability, and demand substantially more
sophistication of the underlying systems to make such new interfaces
possible. We see the need for development
in five key areas: 1) design of advanced multimedia interfaces, 2)
display and delivery technologies, 3) three-dimensional issues, 4) presentation/exploration
of multimedia from multiple perspectives, and 5) visualization and summarization of cultural material/collections and
potential relations between collections.
1) Design of advanced multimedia interfaces
We need new interfaces both for
expressing queries of cultural materials
and for presenting results of cultural material so they can be exploited
meaningfully in multiple contexts (e.g. research, teaching, public
exhibition). In particular, for
non-specialists (and for searching vast collections) new graphical user
interfaces (GUIs) are needed for expressing both verbal and non-verbal
queries. To meet this need several
things must be done:
· New query and browse paradigms must be envisioned that permit
iterative refinement for the investigator.
· An abstract layer must be devised for posing queries in a way that
is independent of both data modality and language. These new query types include objects to deal with either low-level features
(texture, color, shape) of an item or with a high-level concept such as
indoor/outdoor, portrait/landscape, smile/frown or even metaphors, while
operations might include logical, temporal, or spatial
relationships.
· Given the abstract query layer, a translation layer must be
developed to present results of the query in an intelligible way to the user.
· These query possibilities must be translated into flexible and
informative interfaces (running the gamut from natural language to completely non-verbal
queries) for the widest possible audience.
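One way to picture the abstract query layer described in the list above is as a small tree of query objects that is evaluated against catalogue entries regardless of media type. The sketch below is only an assumption about how such a layer might look: the classes `Feature`, `Concept`, `And`, `Or`, and `Not`, and the `concepts` attribute on items, are hypothetical, and only logical operations are shown (temporal and spatial operations are omitted).

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union

Item = Dict[str, object]  # one catalogue entry: attribute name -> value


@dataclass(frozen=True)
class Feature:
    """Low-level constraint, e.g. a dominant color or texture class."""
    name: str
    value: object


@dataclass(frozen=True)
class Concept:
    """High-level concept label, e.g. 'portrait' or 'outdoor'."""
    label: str


@dataclass(frozen=True)
class And:
    parts: Tuple["Query", ...]


@dataclass(frozen=True)
class Or:
    parts: Tuple["Query", ...]


@dataclass(frozen=True)
class Not:
    part: "Query"


Query = Union[Feature, Concept, And, Or, Not]


def matches(query: Query, item: Item) -> bool:
    """Evaluate a query tree against one item, whatever its media type."""
    if isinstance(query, Feature):
        return item.get(query.name) == query.value
    if isinstance(query, Concept):
        return query.label in item.get("concepts", ())
    if isinstance(query, And):
        return all(matches(part, item) for part in query.parts)
    if isinstance(query, Or):
        return any(matches(part, item) for part in query.parts)
    if isinstance(query, Not):
        return not matches(query.part, item)
    raise TypeError(f"unsupported query node: {query!r}")
```

A graphical or natural-language front end would build such a tree on the user's behalf, and a separate translation layer would turn the matching items into a presentation suited to the user, so the same query can run over still images, video, or 3D records without being rephrased.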
In addition to abstracting and
generalizing query paradigms, systems must support both multilingual and
language-independent retrieval. For
example, one must be able to pose queries related to abstract concepts such as
“deity,” “truth,” “beauty,” and “style,” all of which exist in different
languages, different scripts, and different cultures with dramatically different
semantic meanings. The search results of such queries must permit the
presentation of cultural and historical images and videos available from
sources throughout the world. This
interface challenge is related to the abstract query layer but requires cultural
and language dependent ontology management to drive the abstract query layer so
that it can perform the necessary translations in multilingual contexts. Computers cannot possibly achieve this goal
without pervasive assistance from specialists.
In a system offering rich arrays of
material there are also new challenges simply to render the range of query
possibilities comprehensible.
Evolving digital collections must
support multi-modal (heterogeneous) data handling and their integrated
presentation (photographic images, UV, X-Ray etc.). All categories of users must be able to browse simultaneously
along manifold axes. In presenting query
results, new retrieval software must dynamically adapt to various devices and
bandwidths and must support personalized formatting of content. The issue of personalization requires
meta-tagging of the content: it must
permit a characterization of what is important in the content. A current example of this can be seen in the
trans-coding hints supported in MPEG-7.
This permits progressive visualization of content based on relative
prioritization of the content components.
This automatic metadata must also integrate material based on
characteristics of continually changing user demands.
Looking at a system from its ability
to learn to present material in a hierarchy related to what is important, it
will be desirable to attempt to monitor the actions of the user (areas of
visual concentration) to add attributes/annotations to the database. If a system can accumulate these over the
lifetime of the entry it can record both what the users express interest in as
well as what the domain specialists may tag as valuable. Finally, this value tagging can be used to
enable the display system to dynamically select between different or multiple
resolution representations or between 2D/3D representations of the
content. Such systems must also use
manually encoded metadata to narrow searches to specific realms of inquiry,
just as museums organize displays, or clients themselves organize their
individual visits to museums.
The design of advanced multimedia interfaces should always come
hand-in-hand with the objective of making the digital repositories available to
as many people as possible. Since the Internet is currently the most widely
used means of information sharing & retrieval, care should be taken to
construct new interfaces that integrate easily with standard Internet browsers
(e.g. as plug-ins) and do not require the installation of application-specific search/browsing
software.
2) Display and delivery technologies
Existing systems are workstation
based. Future systems designers will
need to investigate such technologies for many forms of digital imagery in
varying technical and personal contexts as:
· New display solutions. For example, research shows that large-scale
displays have a different impact on users and enable greater understanding of
complex data.
· Seamless interfaces which integrate information about an entire
collection with recent queries against the collection. This notion is embodied in projects such as
a DL-I project, “Concept Space” [Chen et al., 1997; Schatz and Chen, 1996]. In describing this project researchers have articulated
the need to pose a query and to present the results in the same display space,
which makes it easier to refine a query.
· Systems which support interaction based on gesture
recognition. We need to build on the
experience of museum guides and art historians and encourage their interaction
with computer scientists. These systems
can digitally capture a gesture by monitoring it directly. Gestures may include eye, hand, head,
and position movement, attention span tracking, and even mood (based on brain wave
detection). The collection of such
gestures may also help to create a natural way to interact with the viewer by
permitting the simulation of virtual guides.
· Different display spaces and different users. There are two degrees of freedom, the device
and the user. Currently, systems are
mostly designed with a single device and discrete user roles (such as curator,
preservationist, and general public) in mind.
What is needed is a framework to allow continuous personalization of a
common interface, so that the same interface can be adjusted for different user
roles using various devices.
3) Three-dimensional issues
Archives holding cultural and
historical materials will contain data about many artifacts with
three-dimensional attributes. There are two main purposes for 3D representations: (1)
the representation of existing objects and (2) the reconstruction of objects
that no longer exist.
3D is one way to examine cultural
artifacts in a more real context. To
fully support the presentation of three-dimensional artifacts developers will
need to create a solid and sustainable 3D representation and presentation
platform with associated queries. There
is a need to develop a formal extension of existing concepts, such as VRML and
its extensions in MPEG-4. In addition,
in a culturally aware user-world, curators and others will want access to 3D
texture information as well as to information
that is not related to the visual appearance of the work but is of high
importance for restoration and research (material properties & condition,
structural information, etc). Here
too there is a need to represent and query in this domain. Finally, research is
needed to determine which modes of virtual world presentation are meaningful
for cultural and historical investigations such as the relative value of
immersive versus non-immersive environments.
4) Presentation/Exploration of multimedia from multiple perspectives
A single cultural heritage
investigation (e.g., Rome Revisited, The First Emperor of China project,
the Perseus project, Uffizi, St Petersburg, Dunhuang, Sutton Hoo, etc.) may involve
artifacts of very different media types (a building, statues, a cave, a ship,
textiles, gold-working, inscriptions, pictographs) and draw on information from
the realms of history, art history, archeology and the like. This multidisciplinary environment
necessitates investigating a new level of abstraction across these axes so
that support for multiple perspectives need not be defined separately for each
case. Research is needed to define the
special common attributes of each of these kinds of perspectives.
5) Visualization and summarization of cultural
material/collections and potential relations between collections
As the quantity and range of
cultural archives expand, the summarization of collections themselves will be
required in addition to summarizations of like items within a single
collection. What kinds of relations can
be envisioned among collections? We
must imagine and make available new taxonomies for ever-emerging
interrelationships. Developments in
history, the expansion of technologies, wars, etc. all provide context for a
collection that will need to be queried, displayed and analyzed. This constantly evolving situation will
require new techniques to view and represent the attributes of an entire
collection and to relate these to the attributes of individual items. Additionally, one may want to define the
attributes of a collection relative to known items represented in the
collection. In particular, in the realm
of visual collections, one can imagine the Mona Lisa appearing in many collections. This notion has implications for
complementary content-based image retrieval (CBIR) as well.
4.4. Applications and Use
The critical issue is providing
access to digital archives in appropriate forms for widely different user
needs. Digital image and video archives, together with technologies to access
and present them, will provide a resource both for general education at all
levels and for specialized research.
Potential user groups include historians, curators, educators, students,
and members of the general public, with their use running the gamut from
curiosity to research to analysis.
Instructional technologies are
necessary for teaching and research in educational institutions, which utilize
digital archives to illustrate concepts and allow users to search for
relationships among many collections of artifacts. This advanced range of use requires discipline-specific advice,
supervision, and extensive descriptive metadata. Unified access to multiple archives from different sources is
required to support queries across heterogeneous documents of historical
materials.
Museum installations for
interaction with digital reproductions of cultural materials enhance their
educational value. Presentation of
historical artifacts together with digital enhancements can be used to
illustrate the context. Display of
multi-modal imagery can reveal structures or information not visible in the
original. Virtual representations of artifacts allow the user to interact and
explore objects. Natural mechanisms
such as gesture analysis are desirable to enhance interaction with visual
representations.
Extension of digital collections
online is necessary to share archives among sites in different geographic
locations. Sharing our cultural
heritages will promote productive interchange of knowledge and establish common
ground to reach the greatest possible audience. The technological challenge of online access is to achieve high
quality and high speed at low cost to the broadest possible user base.
There is a need to establish
benchmark datasets and related queries among user groups to facilitate
evaluation of the research progress.
Technology development is an iterative process requiring continuous
evaluation and improvements. Scalable
deployment to archive, search, and analyze very large collections of artifacts
is the ultimate challenge. Lifecycle
management of content capturing, cleansing, normalization, indexing and
retrieval is crucial for scalable deployment.
The lifecycle of "content" in a
digital library includes the following major stages:
1) Ingestion/creation - also known as capturing or digitizing; converts physical objects (paintings, artifacts) into a digital representation,
2) Editing - includes the normalization, standardization and cleansing of the captured data, including color and brightness adjustment,
3) Analysis - includes metadata extraction of various kinds, such as low-level features (color histograms, textures, shapes, geometries, etc.) and high-level features, and potentially correlation with other "related" content,
4) Management - includes the management of both metadata and content, such as developing indices for faster retrieval and addressing issues such as data integrity, consistency, and versioning,
5) Distribution - addresses issues related to disseminating content for consumption by the end user, which requires IT infrastructure (such as caching), copyright management, etc.
As a result, lifecycle management for a digital library potentially
includes the capturing and tracking of the workflow associated with all
aspects of content so that the processing steps of the content from creation to
dissemination can be better automated with richer, more intelligible, results.
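To make the Analysis stage more concrete, the sketch below computes one of the low-level features named above, a color histogram, and compares two images by histogram intersection. It is a minimal illustration under stated assumptions, not a method prescribed by the working group: the function names, the default of 4 bins per channel, the in-memory pixel lists and the choice of histogram intersection as the similarity measure are all assumptions.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255


def color_histogram(pixels: Sequence[Pixel], bins_per_channel: int = 4) -> List[float]:
    """Return a normalized joint RGB histogram with bins_per_channel**3 bins."""
    if not pixels:
        raise ValueError("image has no pixels")
    counts: Counter = Counter()
    for r, g, b in pixels:
        # Quantize each channel to a bin index in 0..bins_per_channel-1.
        rb = r * bins_per_channel // 256
        gb = g * bins_per_channel // 256
        bb = b * bins_per_channel // 256
        # Flatten the three bin indices into one joint index.
        counts[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1
    total = len(pixels)
    return [counts[i] / total for i in range(bins_per_channel ** 3)]


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Similarity in [0, 1]; 1.0 means the normalized histograms are identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Two tiny synthetic "images" given as flat pixel lists (invented data).
red_image = [(255, 0, 0)] * 4
mostly_red_image = [(250, 10, 5)] * 3 + [(0, 0, 255)]

similarity = histogram_intersection(
    color_histogram(red_image), color_histogram(mostly_red_image)
)
print(similarity)
```

In a lifecycle workflow of the kind described, such a feature vector would be computed once during Analysis and stored and indexed during Management, so that retrieval does not have to re-read the image data.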
5. MECHANISMS FOR COLLABORATIVE RESEARCH
1) Workshops in specific domains (works of art, the hand-written
record, film, sculpture, architecture, archeology, etc.)
· Domain training of technologists
· Focus on narrow technology domain problem assessment and idea exchange (i.e., “what I’ve tried on this problem and this is what works or not”)
· Focus on technological needs of domain specialists (i.e., “this is what I need, can’t you help me?”)
· Requires pre and post work on some large common data sets (which may have limited distribution)
2) International exchange programs for students and researchers
· Exchanges between like computer scientists and domain specialists to establish effective collaborative teams
· NSF stipends (including transportation) for EU people while in the US and US people while in the EU (for work in direct support of NSF grants, agreements, or contracts). Payments to US institutions required.
· Similar EU stipends if feasible
· Proposal: Grants for student/researcher exchange program (with simple rules regarding ownership of IP developed by the exchangee)
   o Periods of 3 months to 1 year
   o Family travel and living allowances
   o Some common use of code or data as a result, if only in experiments and papers
3) Shared testbeds of significant cultural and historical materials
· Focus on common (i) technology, (ii) corpus or (iii) application in each of the joint testbeds
· Partnerships may be from the same or from different categories above
· Most preferable are those between domain specialists (corpus, application) and technologists
   o Applying new content analysis or CBIR technology to domain-specific corpus; validating it “at scale”
   o Usage and user studies
   o Evaluation/testing of data exchange and display standards
· Proposal: Joint/common testbeds to be central theme for cooperation
   o Common corpus with agreed upon mutual IP access (which may be otherwise limited)
   o Common technology might be analysis, descriptive metadata, etc.
4) Bilateral researcher-to-researcher projects
· Funding under single grant (NSF or EU) or common proposal, independently funded by NSF and EU
· Domain specialist-to-technologist (preferred) or technologist-to-technologist
· IP problems similar to joint projects (institutional), for research and practical development of sophisticated GUIs.
5) International benchmarking competitions
· National Institute of Standards and Technology (NIST) Text REtrieval Conference (TREC) – team entries
· US/EU group pairing (preferred) or national/regional team competition
· Proposal: Grants for collaborative technologist/specialist participation in NIST TREC (or comparable benchmarking) with travel grants for cooperating researcher visits and attendance at yearly meetings
Among the possible opportunities
listed above for joint activity between the cultural and computer science
communities, a primary focus should be to stimulate as much
interaction as possible between the communities. To this end, two specific mechanisms stand out. First, a set of image testbeds is very much
needed that can drive computer science research in Content Based Image
Retrieval that is specifically directed to meaningful applications in the humanities. Second, a set of workshops to explicitly design and
gather these datasets would make an enormous contribution towards furthering
research of digital imagery of significant cultural and historical materials.
Mutual training workshops and a
common set of image testbeds could stimulate specific efforts to write the kind
of code necessary to develop truly accessible digital libraries of cultural and
historical materials. These efforts
might usefully be set up as search-and-retrieval competitions using multi-cultural
image datasets and judged by a joint team of citizen users, cultural and historical scholars, and
computer and information scientists. The joint evaluation team would
inspire the computer scientists to develop solid automatic retrieval
algorithms; the cultural/historical scholar would ensure that meaningful
cultural questions are tested; the citizen user would provide a necessary dose
of serendipity and a measure of broad and unpredictable applicability.
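A search-and-retrieval competition of this kind needs an agreed way to score a ranked result list against the judges' relevance decisions. The report does not name any scoring measure; the sketch below assumes two standard information-retrieval measures, precision at k and average precision, and uses invented image identifiers purely for illustration.

```python
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k returned items that the judges marked relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for item in ranked[:k] if item in relevant) / k


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of the precision values at each rank where a relevant item appears,
    divided by the total number of relevant items (missed items count as 0)."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


# Hypothetical system output for one query, and the judges' relevant set.
ranked_results = ["img_07", "img_02", "img_11", "img_05"]
judged_relevant = {"img_07", "img_11", "img_09"}

print(precision_at_k(ranked_results, judged_relevant, k=2))
print(average_precision(ranked_results, judged_relevant))
```

Averaging `average_precision` over all queries in a testbed gives a single figure per system, which is one simple way that entries from different teams could be compared.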
6. SUMMARY
The long-term outcomes of a
successful program of collaborative digital imaging research and collection
development will well serve three communities of interest: the citizen user, the cultural and
historical scholar, and the computer and information scientist.
For the universal citizen user it will provide:
· The ability to recreate the experience of getting to know an historic artifact in the simulated environment of its original place at the time from the convenience of a desktop.
· Remote accessibility that enables one to see the great creative works in settings, detail and perspective unavailable even to the local viewer, and to understand their history, context and relevance.
· Information facilities that enrich education, enhance cross-cultural understanding and sustain one’s heritage and cultural diversity.
For the cultural and historical
scholar it will enable:
· Routine use of machine image understanding technology. Automated classification of content, similarity search, and semantic retrieval will be standard database functions, usable by domain specialists without the aid of computer scientists and programmers.
· Capability and functionality for imagery search and summarization at least as rich and easy to use as that for textual sources.
· Understanding of art and artifacts will be technically deepened by sub-visual analysis and simulated reconstructions.
· Historical understanding and relevance will be enriched by a broader visual context and electronic visualization.
· Integration of descriptive markup by domain specialists.
For the computer scientist, the
grand challenge of determining image semantics and automatically verbalizing them
will be significantly addressed. With
useful accuracy in a range of domains, systems will be able to
· Describe objectively what a picture appears to say or mean,
· Describe what actions happen or what events occur in a video scene,
· Elaborate on its context by linking to the global information space through automatically generated visualizations in time and space.
The development and application of digital imaging technologies, combined with the increased information content understanding and accessibility to unique kinds and sources of visual data, will prove relevant to the many applications that serve the multiplicity of our society’s needs, including those crucial to