Internal Information

Main Referee:

Norbert Gerfelder, Kerstin Pipke

State of Entry:

Complete

Remarks:

General Information

Type of Document:

SMPTE Task Force Report

Document Title:

Report of the Task Force on Digital Image Architecture

Date of Publication:

September 1993

Publisher:

SMPTE
595 West Hartsdale Avenue
White Plains, NY 10607-1824
USA

Primary Source / Published in:

SMPTE Publication
Report of the Task Force on Digital Image Architecture
September 1992

No. of Pages:

50

Source of Supply:

SMPTE
595 West Hartsdale Avenue
White Plains, NY 10607-1824
USA
Phone:    (+1) 914 761 1100
Fax:      (+1) 914 761 3115

Report of the Task Force on Digital Image Architecture

SOCIETY OF MOTION PICTURE AND TELEVISION ENGINEERS (SMPTE)

595 West Hartsdale Avenue
White Plains, New York 10607

PREFACE

For some time, the communities participating in the standardization activities of the Society of Motion Picture and Television Engineers have considered the role of television in the future of visual communications. In recent years, the debate has been joined by members of other communities affected by the convergence of communication and imaging technologies enabled by a common set of digital building blocks.

The emergence of digital coding as the common language of visual communications may fundamentally change our view of the world. The extent to which this common language will affect life in the 21st century may be even more profound than the effect that the medium of television has had on life in the 20th century. Television has provided a window to the world - often real-time - for many of the 5.4 billion inhabitants of this planet. This medium of cultural and information exchange has enabled previously isolated populations to join an emerging global village - one increasingly free of barriers. The common digital language offers a unique opportunity to leverage converging technologies, such as television, computers and telecommunications, into a global communications network. Such a network would have the potential to offer a vastly augmented range of services to all system users, thus opening up new markets to all of the affected equipment and service providers.

Worldwide, there is a growing consensus that the time has come to develop standards for the television systems based on a new paradigm - appropriate for today - with forethought to future requirements. The introduction of digital technology into imaging industries, together with the widespread introduction of digital communications, creates a window of opportunity to establish a digital image architecture with unprecedented freedom of application and interconnection.

This report examines some of the fundamental issues that must be addressed in achieving a compatible set of standards enabling a globally interconnected and interoperable visual communications network. The essential concepts for this family of standards include: an open (non-proprietary) system architecture, interoperability, scalability and extensibility. It is hoped that this Report will stimulate the interest of many groups and organizations involved in the establishment of imaging standards, today and in the future, and lead to agreement on a single system, flexible enough to accommodate a wide variety of needs, while enabling worldwide interoperability.

The report was prepared by the SMPTE Task Force on Digital Image Architecture and is responsive to the Work Assignment, dated April 1991, which established the following objective:

The Task Force, early in its considerations, identified the need to expand the Work Assignment to include all relevant aspects of digital image systems - acquisition, processing, storage, transmission, reconstruction and display - and to consider systems across a much wider range of resolutions than previously planned. This was agreed to and is reflected in this Report. Requirements and constraints noted in SMPTE/IEEE/ATSC cosponsored digital system information exchange meetings have also been incorporated as appropriate.

The Report is, in essence, the outcome of a feasibility study concerning the creation of standards for digital image systems that are scalable and extensible, effecting a high level of interoperability between a diverse range of industries and applications. The work is, as yet, incomplete; however, it has already established an important though preliminary basis for a family of digital imaging standards. The Report raises many new questions and identifies additional work required to refine the concepts that form the basis of a digital image architecture. Of particular importance will be the selection of source and display refresh rates to provide performance and economic compatibility with today's television systems.

The concepts outlined can provide a basis for a modular open system architecture, in which the parameters and characteristics for each module, and the interfaces between these modules, are clearly defined and in the public domain.

Such a system should use common standard components to serve diverse needs across all affected industries. It should enable the movement of image data across application and industry boundaries without degradation and with minimum complication. This is interoperability.

Such a system should also provide the ability to adjust image parameters - temporal and spatial resolution, colorimetry and dynamic range - by varying the amount of data that is stored, transmitted, received, or displayed. This is scalability.

A digital image architecture must give forethought to evolution - to incorporate advances in technology within any module, without changes to any other module. It must be backward compatible with today's systems, and forward enabled to accommodate the technology explosions of the 21st century. This is extensibility.

The Report was prepared by a Task Force chaired initially by David Trzcinski (PictureTel) and latterly by Dr. Will Stackhouse (Jet Propulsion Laboratory), with wide participation from the computer, television, post production and telecommunications industries. A detailed list of the membership follows. The Report was considered by the SMPTE Standards Committee at its meeting of August 13th, 1992 and subsequently adopted after an in-depth review.

List of Members and Participants

Members

Will Stackhouse (Chair)           JPL
Walter Bender                     MIT
Craig Birkmaier (Editor)          PCUBED
Rita Brennan                      Apple Computer
Wayne Bretl                       Zenith
Barry Bronson (Co-chair)          Hewlett-Packard
Ken Davies (Ex Officio)           CBC, SMPTE
Gary Demos (Co-chair)             DemoGraFX
Hugo Gaggioni                     Sony
Bill Glenn                        Florida Atlantic University
Bob Keeler                        AT&T, Bell Labs.
Thomas Leedy                      NIST
Peiya Liu                         Siemens
Lee McKnight                      MIT
Robert Powers                     MCI Telecomms
Tom Meyer                         Duir Assoc.
Alan Reekie                       European Community (CCE)
Richard Solomon                   MIT
Arpad Toth                        Kodak
David Trzcinski                   PictureTel
Mitchell Wade                     DemoGraFX
Ken Yang                          Ampex

Participants

 
Stan Barron                       NBC, SMPTE
Si Becker                         SMPTE
Rex Buddenberg                    Consultant
Robert Burroughs                  Panasonic
David Carver                      MIT
Peter Dare                        Sony
Phil Dodds                        IMA
Charles Fenimore                  NIST
David Fibush                      Tektronix, SMPTE
Paul Fleischer                    Bellcore
Branko Gerovac                    DEC
Barry Gilbert                     Mayo Foundation
Christopher Hamlin                Apple Computer
David Herbine                     NADC
Clark Johnson                     Consultant
Thomas Leeder                     NIST
Bijoy Khandheria                  Mayo Foundation
Edward Krause                     General Instrument
Arvid Larson                      IEEE-USA
Derrick Lattibeaudiere            Panasonic
Richard Lau                       Bellcore
Bernard Lechner                   Consultant
Michael Liebhold                  Apple Computer
Henry Meadows                     Center for Telecomm. Research
Francois Michaud                  CBC
Marvin Mitchell                   Mayo Clinic
Robert Morrow                     USAF Academy
Robert Myers                      Hewlett-Packard
Suzanne Neil                      MIT
Bruce Penney                      Tektronix
Ken Phillips                      Citicorp
Ed Post                           Quark
Charles Poynton                   Sun
Glenn Reitmeier                   DSRC
Robert Sanderson                  Kodak
William Schreiber                 MIT
Scott Silver                      Tektronix
John Sprung                       Viacom
David Staelin                     MIT
Peter Symes                       Grass Valley Group
David Tennenhouse                 MIT
Greg Thagard                      CST
Mark Urdahl                       IBM
John Weaver                       Liberty Television
Merrill Weiss                     Consultant

International Participants

Norbert Gerfelder                 Fraunhofer Computer Graphics, ISO/IEC
Rainer Hofmann                    Fraunhofer Computer Graphics, ISO/IEC
Detlef Kroemker                   Fraunhofer Computer Graphics, ISO/IEC

1.0 Executive Summary

The SMPTE Task Force on Digital Image Architecture was charged with developing and proposing a structure for a hierarchy of digital image standards that would facilitate interoperation of image systems. The major objective was to establish the basis for image systems that are open, scalable and extensible, thus meeting the perceived needs for image communications in the environment likely to exist as computers, television and communications converge, enabled by pervasive digital technology.

The Task Force, formed from representatives of the affected industries and applications, has examined the issues, setting out those that are believed critical at this time, and has modelled, for discussion, further refinement and testing, one possible approach that meets the basic requirements. It has also produced extensive tutorial information concerning the matters under consideration.

The Key Concepts of the approach are defined in Section 2, setting the conditions for image systems that are:

Such systems would be based on a hierarchy that is:

Current and future image systems are presented and analyzed in Section 3.0 of the Report, which also states the main objectives of the Task Force activity:

Section 3.0 of the report establishes the fundamental concepts upon which a model for an open digital image architecture can be constructed, taking into consideration the objectives defined above.

Section 4.0 details the critical issues in the development of a suitable image architecture meeting the stated objectives:

It is believed that this approach will result in systems that achieve a good level of compatibility with current television and imaging systems, while placing a minimum of constraints on the path to the future (extensibility).

A model of an open architecture approach to image standards is developed in Section 5.0, one that is both compatible with the present and extensible to the future. It is based on a low order hierarchical approach, using image tiles. The model defines four levels of resolution and takes account of a number of possible aspect ratios currently in use. Additional analysis is provided regarding the selection of an appropriate family of image acquisition rates and display refresh rates. Finally a scalable coding approach is proposed that offers the ability to produce image data in packages that can be combined to produce images at a variety of spatial and temporal resolutions.

The work of the Task Force is expected to be of interest across a wide range of industries and applications. Section 6.0 examines the industries likely to be most affected, their specific imaging needs and the possible impacts of a defined digital image architecture.

In Section 7.0 the Task Force suggests additional work that must be completed to move towards a full implementation of the digital image architecture. The list of suggestions included in Section 7.0 is not exhaustive; it is recognized that in the process of validating the architectural concepts, additional areas for further analysis will be identified. An extensive list of questions is included which should be considered in the process of establishing standards for an architecture.

The suggestions include the following items of high priority:

A considerable amount of background and tutorial material was developed during the preparation of the Report. Some of it is believed to be of value generally or for reference in future work on the development of the digital image architecture. This material is included in Section 8.0:

2.0 Key Concepts

2.1 Introduction

As a starting point in the process of developing and communicating the requirements for a digital image architecture, it is important to establish a clear definition of the key concepts upon which the architecture is to be based. In many cases, existing definitions must be enhanced to bridge the gap between current practice and future requirements embodied in the architecture.

Two reference documents were utilized in the process of creating the definitions which follow:

Definitions obtained from the IEEE Dictionary are presented in "quotations" - they provide a reference point for the expanded definitions developed by the Task Force. Definitions presented in the Report of the SMPTE Task Force on Headers/Descriptors proved to be incomplete for the needs of this report, due to the expanded Work Assignment for the Task Force on Digital Image Architecture. While the definitions in this Report are consistent with the earlier work of the Task Force on Headers/Descriptors, they provide an expanded understanding of the key concepts for a digital image architecture.

2.2 Digital Image Architecture

A system architecture defines "the structure and relationship among the components of a system".

One of the major objectives of this Report is to define a system architecture which promotes sharing of images and equipment across applications and industry boundaries. To achieve this goal, the digital image architecture must be highly flexible to deal with a variety of diverse requirements, including the evolution of technology.

A Digital Image Architecture should be an open system, that is, one made up of functional modules with standard, public interfaces which can be assembled into a functional system: "a set of interconnected elements constituted to achieve a given objective by performing specified functions." Explicit objectives of the architecture include:

The Digital Image Architecture supports both natural and synthetic imagery including:

A key feature of the architecture is that it allows decoupling of the system into functional modules. The functional modules of the architecture are:

2.3 Interoperability

Interoperability is the sharing of images and equipment across application and industry boundaries. When dealing with digital image representations, this sharing should be facilitated without degrading image quality due to transformations in temporal and spatial resolution, grid geometry, and image aspect ratio.

This requires careful attention to the definition of the interfaces -- the shared boundaries -- between the functional modules.

The key interface definitions are

2.4 Hierarchy

A hierarchical digital image architecture is one in which various levels of performance are supported:

The architecture is hierarchical in order to address the requirements for scalability and extensibility.

2.4.1 Scalability

To scale is: "To change the quantity by a factor in order to bring its range within prescribed limits."

Scalability deals with the ability of an imaging system to adjust the level of performance by varying the amount of data that is stored, transmitted, received, or displayed -- up to the maximum resolution that was originally acquired. A number of specific definitions are implied:

2.4.2 Extensibility

Extensibility in the design of an hierarchical digital image architecture allows the system to evolve with advances in the underlying technologies so that additional levels of performance can be implemented, without rendering obsolete those existing products that conform to the basic requirements of the imaging hierarchy.

Extensibility implies designing evolution into the system. The transmission and display modules of the system should be cast as building blocks. The building blocks, because of their inherent modularity, may freely evolve over time.

3.0 Analysis of Imaging Architectures

3.1 Establishing a Framework for Analysis

For more than two decades, the application of digital processing techniques has contributed to the evolution of analog composite television systems, especially in the areas of video recording, image processing, and image synthesis. This evolutionary use of digital technology had little effect on the perception of imaging systems; from this perspective many observers believed that digital video would gradually replace analog video without any fundamental changes to the foundation of imaging systems.

However, in the past few years the evolutionary view of imaging systems has been challenged. At the 26th Annual SMPTE Advanced Television and Electronic Imaging Conference, John Watkinson suggested that we analyze the impact of digital technologies from another perspective: "To think that digital technology only impacts the underlying equipment and that otherwise it's business as usual is to miss the larger transformation that is occurring in each of the affected industries."

From Watkinson's perspective, the transition to a new digital imaging architecture represents the opportunity for a new paradigm. Proponents of this position have encouraged system designers to step back and take a global view of the impact that digital technologies are having on every industry that deals with electronic imaging; to think not just in terms of delivering ever-improving levels of image quality, but to consider what being digital really means.

John Naisbitt, in his 1982 best seller Megatrends: Ten New Directions Transforming Our Lives, stated that new technologies go through three phases as they become part of our daily lives. Applying Naisbitt's model to the evolution of electronic imaging systems leads to the following three paradigms:

From the new perspective, being digital deals with the shift to the third paradigm. It is the enabling technology that has made it possible for this Task Force to analyze the requirements for interoperability, scalability and extensibility, and to propose a set of guidelines to accomplish these goals. What are the aspects of being digital that have brought about this transformation in perspectives?

A major factor has been the geometric progression in computer processing capabilities - doubling computational power every two years, with little change in cost or size. This progression is projected to continue well into the next century. As a result, high resolution still image processing capabilities are now within reach of every computer user. Techniques once reserved for high-end workstations are now commonly applied in desktop computing, including the recent addition of full motion video as a data type.
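The geometric progression described above can be made concrete with a short calculation. The following is a minimal sketch in Python, assuming the two-year doubling period stated in the text holds exactly; the function name and the sample time spans are illustrative only and do not come from the Report.

```python
def growth_factor(years, doubling_period_years=2.0):
    """Relative computational power after `years`, assuming one doubling
    per `doubling_period_years` (two years, as stated in the text)."""
    return 2.0 ** (years / doubling_period_years)


# Illustrative time spans (assumed values, not taken from the Report).
for years in (2, 10, 20):
    print(f"after {years:2d} years: x{growth_factor(years):,.0f}")
```

Under this assumption a decade corresponds to five doublings, a factor of 32, and two decades to a factor of 1024, which illustrates why capabilities once reserved for high-end workstations reach the desktop within a few product generations.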

Video has also been a major beneficiary of the technology progression. Production systems that only a decade ago required a six foot rack of electronics can now be implemented in a few rack units - or on a few cards that plug into a personal computer.

The tremendous increase in computational power has enabled another critical aspect of being digital - video encoding based on the use of digital compression techniques to reduce the required data rate. A variety of compression technologies have evolved that remove image redundancy within and between video frames. The required data rate may also be significantly reduced by more efficient coding of the image at the source. Developments of such techniques are progressing rapidly and may become useful in the near future.

While compression technology has existed for many years, and continues to evolve, practical implementations for video have only become possible in the past few years due to the rapid evolution of digital processing technologies. This in turn has stimulated new research into scalable video encoding techniques that will allow multiple levels of image quality to be extracted from a single image data stream. Some observers predict that the processing power required for the decoding of scalable digital video streams will be universal and inexpensive before the end of this decade.

Improvements in data compression perform the same function as increases in bit carrying capacity in the communications system - delivery of more bits to the user. In the past decade, increases in communications capacity of several orders of magnitude have occurred.

In such an environment, the longevity of new equipment purchases may be dependent upon a digital image architecture that is designed with adequate provisions for extensibility. To meet this objective the Task Force has focused its attention on three areas:

3.2 Properties of Human Visual Perception

The human visual system deals with the physical world both in terms of its ability to resolve image detail (spatial resolution), and changes in the environment (temporal resolution). We experience the world visually by capturing light directly from a source, or as the reflections of light off of objects in our physical environment. The resulting perceptions of the environment are typically described in terms of size, shape, brightness, color, depth, direction, and speed. These qualities arise in the brain's image processing circuitry; essentially they result from a comparison of the acquired visual cues with what we have learned about the world's intrinsic structure.

As research has revealed more about the physiology of vision, prevailing theory has evolved, placing major emphasis on the computational and cognitive role played by the brain and local image receptors. In turn, this research is providing potentially valuable input to the designers of digital imaging systems.

3.2.1 Human Image Acquisition

The human visual system relies on multiple image receptors to deal with the diversity of environments that it encounters: cones are utilized for color image acquisition over a wide range of illumination levels; rods are utilized for monochrome image acquisition over the lower range of illumination levels.

The eye contains approximately two million cones and 120 million rods. The cones are organized into three broad groups of receptors that are sensitive to light in specific spectral bands; while these bands have significant overlaps, they roughly conform to the red, green, and blue portions of the spectrum. Red and green receptors each outnumber blue receptors by a factor of two to one. The dispersion of these receptors is not uniform, thus spatial perception deals with a complex matrix of receptor types and cognitive processing by the brain.

The center of the visual field, an area called the fovea, contains 30,000 to 40,000 cones and no rods. Outside the fovea the density of cones diminishes, interspersed among the high density rods. The cones within the fovea are responsible for high spatial detail perception while the extrafoveal cones and rods play an important role in visual search and influence directed eye movement. Central vision enables us to see detail, while peripheral vision is attuned to change.

Although high spatial resolution vision is restricted to the fovea, the visual system acquires high resolution images over a wide portion of the field of view. This is achieved through involuntary eye movements: high frequency tremor, slow drift, and rapid saccade.

Research has determined that it takes several hundred milliseconds for the eye to acquire a high spatial resolution image, synthesized from a number of overlapping views. Slow drift and rapid saccade are the mechanisms used for repositioning the fovea to acquire these multiple impressions. The tremor appears to be a mechanism to remove high frequency spatial noise. The tremor's oscillation occurs at a frequency range of 40 to 80 Hz over an area approximately equal to the size of a single cone.

Since human vision is binocular, involuntary eye movements also contribute to depth perception: the brain processes these overlapping views to obtain differences from which depth and spatial properties are inferred.

The spatial resolution of moving objects is also linked to eye movement:

3.2.2 Human Visual Processing

Much of the research in visual science today is focused on the processing of data acquired by the image receptors. A variety of specialized analyzers in the eye process data from small localized regions and accumulate the results into channels which are processed by the brain to create an integrated view of the physical environment.

There is evidence that the brain directs the activity of the image receptors for processes such as establishing white balance and light sensitivity levels. Simple localized analyzers are used to enhance the data transmitted back to the brain. Some of these analyzers are sensitive to a particular edge orientation; there are sufficient analyzers at each location to represent a full set of edge orientations. Additional tuned analyzers cover portions of the range of human sensitivity for spatial frequency, spatial position, temporal frequency, direction of motion, and binocular disparity.

The data processed by these analyzers moves to the brain through two types of channels: a set of fast responding channels with relatively transient responses to stimuli, and a set of slower channels with relatively sustained responses to stimuli. Transient channels process the output of analyzers that are tuned for low spatial and high temporal frequency stimuli. Sustained channels process the output of analyzers that are tuned for high spatial and low temporal frequency stimuli.

3.2.3 Thresholds for the Perception of Flicker

Above certain frequencies, flickering light sources will appear as a continuous light source. The relevant frequency is called the critical fusion frequency and varies with the level of illumination. Separate flicker thresholds exist for the transient and sustained processing channels.

Transient channels are sensitive to flickering light sources with low spatial resolution; this type of stimulation appears as wide-area flicker and is most noticeable in peripheral vision. At low levels of illumination (where rod vision is used) flicker fusion occurs at frequencies of only a few Hz; as the level of illumination increases and cone vision is triggered the fusion frequency increases.

Flicker from low light level sources such as a television or movie screen typically disappears in the range of 20 to 60 Hz. As screen size increases, taking up a larger portion of the field of vision, or if screen brightness increases, the frequency for flicker fusion increases.

Sustained channels are sensitive to flickering light sources with high spatial resolution; this type of stimulation appears as small-area flicker, often associated with moving objects. In this case the flicker fusion frequency can be much higher than for wide-area flicker; this form of flicker manifests itself as strobing of the object.

An excellent example is found in the single pixel horizontal lines often used in computer graphics. These lines do not appear to flicker on a progressive scan computer display which is refreshed at rates above 60 Hz; but if the same image is presented on an interlaced video display, the single pixel lines are presented in every other field (at 30 Hz) and they flicker. This is because the persistence of the display phosphor is shorter than the interval between refreshes of the line; higher scanning rates (either progressive or interlaced) eliminate the flicker.
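The single pixel line example can be expressed as a small calculation. The sketch below, in Python, assumes 2:1 interlace and a nominal 60 Hz vertical rate, as in the text; the function names and the chosen line number are hypothetical and serve only to illustrate why a one-line-high detail is redrawn at half the field rate on an interlaced display.

```python
def line_refresh_hz(vertical_rate_hz, interlaced):
    """Rate at which one particular scan line is redrawn.

    vertical_rate_hz is the number of vertical scans per second:
    frames/s for progressive scanning, fields/s for 2:1 interlace.
    """
    fields_per_frame = 2 if interlaced else 1
    return vertical_rate_hz / fields_per_frame


def fields_showing_line(line, n_fields, interlaced):
    """Indices of the vertical scans in which scan line `line` is drawn."""
    if not interlaced:
        return list(range(n_fields))
    # With 2:1 interlace, even lines belong to even fields, odd lines to odd fields.
    return [f for f in range(n_fields) if f % 2 == line % 2]


# A single pixel horizontal line on scan line 5 (arbitrary choice).
print(fields_showing_line(5, 6, interlaced=False))  # every scan
print(fields_showing_line(5, 6, interlaced=True))   # every other field
print(line_refresh_hz(60, interlaced=False), "Hz progressive")
print(line_refresh_hz(60, interlaced=True), "Hz interlaced")
```

Detail that spans two or more adjacent lines appears in both fields and is therefore refreshed at the full field rate, which is why the effect is specific to very fine vertical detail.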

3.2.4 Tuning Electronic Imaging Systems to Match Human Visual Perception

Our improved understanding of human visual perception together with an exponential improvement in electronic image processing techniques has set the stage for the design of a new digital image architecture.

In order for a new digital image architecture to be interoperable it must deal with existing imaging technologies. This requirement can place many constraints on the design of the architecture. It is important to understand the reasons that these constraints exist to determine if the new architecture must be similarly constrained.

3.2.5 The Elimination of Flicker on Scanning Displays

Information is presented in Section 4.3 which suggests that the refresh rate of scanning CRT displays should be linked to the field of view and brightness of the display. Lower refresh rates are acceptable when the display covers a narrow field of view, as is the case with our existing analog composite video delivery systems. Lower refresh rates are also acceptable for a display with a wide field of view at low brightness levels; typically this type of display requires a viewing environment with low ambient light levels such as a theater.

As the display covers a wider field of view at higher levels of brightness, the refresh rate must be increased to eliminate wide-area flicker. If information with high frequency edges, such as computer generated text and graphics, is presented on the display, it must also be refreshed at a higher rate. The computer industry uses progressive scanning with refresh frequencies above 60 Hz to eliminate flicker; larger displays (>= 16 inches diagonal) are typically refreshed at 72 or 75 Hz.

The same requirements for the elimination of wide-area flicker are now starting to influence the development of display systems for home entertainment. At the higher end of the home entertainment market it would be desirable for displays to provide a 50 degree field of view, and be viewable at normal room ambient light levels. Such a display has resolution and refresh requirements nearly identical to a large personal computer display.

3.2.6 Constraints that Dictated the Use of Interlace Scanning Technologies in Analog Composite Video Systems

Several factors influenced the decision to use interlace scanning techniques for acquisition and display when our composite video systems were designed:

Both interlaced and progressive scanning were evaluated; interlace proved to be the best solution to reduce signal bandwidth and minimize flicker in the display.

3.3 Models for the Design of an Imaging Architecture

As the design of a new digital imaging architecture is approached, it is important to take into account all the applications and industries that may utilize the architecture as well as the economic contributions of each in the development and purchase of the system components (see Section 6.0). Experience with analog television has amply demonstrated the value of "economies of scale". The opportunity now exists to design an open digital imaging architecture that is based on generic, inexpensive, and increasingly powerful processing elements.

The choice of a Digital Image Architecture has implications that reach far beyond the normal realm of standards-setting activities. Telecommunications, television, and computing have made major impacts on life in the 20th century -- their integration is likely to have a profound effect on the way that the world communicates, is educated, works, plays and relaxes in the next century.

3.3.1 Components of the Model - Resolution

The level of resolution perceived by a viewer is a function of the distance of the viewer from the display. Thus, to design a digital image architecture that provides constant perceived resolution across applications that involve different viewing distances (e.g., close for a computer display, further away for a conventional TV screen, still further for a large flat-panel display), the system must be scalable in terms of image resolution.
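As a rough illustration of why resolution must scale with viewing distance, the sketch below estimates the horizontal pixel count needed to hold angular resolution constant. The figure of 60 pixels per degree (about one arcminute per pixel) is an assumed rule of thumb, not a value taken from this report, and the function names are hypothetical. Multiplying the field of view by a constant pixel density is itself an approximation, since pixels near the edge of a flat screen subtend slightly smaller angles.

```python
import math

# Assumption: roughly one arcminute per pixel, a common rule of thumb.
ASSUMED_PIXELS_PER_DEGREE = 60.0


def horizontal_field_of_view_deg(screen_width, viewing_distance):
    """Angle subtended by the screen width, in degrees.

    Both arguments must use the same unit of length.
    """
    return math.degrees(2.0 * math.atan(screen_width / (2.0 * viewing_distance)))


def pixels_for_constant_resolution(screen_width, viewing_distance,
                                   pixels_per_degree=ASSUMED_PIXELS_PER_DEGREE):
    """Approximate horizontal pixel count for the assumed angular resolution."""
    fov = horizontal_field_of_view_deg(screen_width, viewing_distance)
    return round(fov * pixels_per_degree)


# The same screen viewed from half the distance needs more pixels
# to look equally sharp.
far = pixels_for_constant_resolution(screen_width=1.0, viewing_distance=3.0)
near = pixels_for_constant_resolution(screen_width=1.0, viewing_distance=1.5)
print(far, near)
```

Under these assumptions, halving the viewing distance roughly doubles the required pixel count, which is the motivation for a progression of related resolution levels.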

In addition to holding perceived resolution constant under varying viewing distances, it is considered desirable to provide even greater resolution in some applications, as discussed below and as implemented in current proposals for advanced television systems.

While it would be desirable to design an imaging architecture in which resolution could be scaled in a continuous fashion, a hierarchy based on a progression of related image resolution levels can provide similar benefits to system designers and simplify the process of interoperation. Section 5.2 and Section 5.3 provide a detailed analysis of the variables that affect the perceived resolution of a display and illustrate the principles of a hierarchical digital image architecture with a progression of four image resolution levels.

Throughout this report, the concept of a multi-resolution hierarchy will be discussed and refined. The Task Force has constructed a model to facilitate this discussion. It is recognized that many different sets of numbers can be used within this model. Four levels of resolution have been identified and defined; additional levels can be added to the progression, as enabling technologies allow support for higher levels of resolution. The four levels in the model are:

The concept of interoperability first appeared in the early days of television because of the need to integrate film material into the television program content. Unfortunately, film and video were in many respects incompatible. Elaborate shuttering mechanisms were developed for the telecine to make it possible to display film in the world of video; thus the concept of interoperability was born. For 525/60 the compromise was the use of 3:2 pull down, to accommodate the change from 24 to 30 fps (frames per second). The solution for 625/50 was easier - a 4% speed change, playing programs acquired at 24 fps at the television rate of 25 fps.
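The two film-to-video conversions described above can be sketched as follows. This is a minimal illustration, not a description of any particular telecine: the function names are hypothetical, the choice of which film frame receives three fields first is an assumption, and the nominal 30 fps (60 fields per second) rate is used.

```python
def three_two_pulldown(film_frames):
    """Spread film frames over video fields in an alternating 3, 2 cadence.

    Assumption: even-indexed film frames are held for three fields and
    odd-indexed frames for two, so 4 film frames fill 10 fields
    (5 interlaced video frames).
    """
    fields = []
    for index, frame in enumerate(film_frames):
        repeat = 3 if index % 2 == 0 else 2
        fields.extend([frame] * repeat)
    return fields


def speed_change_percent(film_fps=24.0, video_fps=25.0):
    """Percentage speed-up when film is simply played at the video frame rate."""
    return (video_fps / film_fps - 1.0) * 100.0


print(three_two_pulldown(["A", "B", "C", "D"]))
print(len(three_two_pulldown(range(24))))   # fields produced by one second of film
print(round(speed_change_percent(), 2))     # the "4%" speed change, more precisely
```

One second of 24 fps film yields 60 fields, that is, 30 interlaced frames, while the 625/50 approach shortens the running time by the ratio 24/25.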

The evolution of electronic image acquisition systems has been driven primarily by the mass market transmission standards -- NTSC, PAL and SECAM. New applications for video such as professional and personal video systems have been enabled through the economies of scale associated with these standards.

Thus, applications which require higher resolutions than those offered by NTSC, PAL and SECAM have either been forced to bear the expense of system development and low volume manufacturing - a luxury primarily reserved for the military - or to wait for the next imaging standard to evolve. It is interesting to note that the equipment developed for the various analog HDTV systems has seen extensive use in professional applications that need the added resolution afforded by these systems.

3.3.2 Components of the Model - Acquisition, Transmission and Display

The path through which an image passes from capture to display may involve as many as five major steps as shown in Figure 3.1. These steps are discussed in detail in Section 6.2.3 of this Report. As the imagery moves from one step to the next, it may be stored at one of several quality levels:

The first two steps are

and typically require production quality storage to preserve as much of the original imagery as possible for subsequent processing. After the processing steps have been completed the imagery may be stored at a lower level of quality for release to the distributor of the imagery; this is often referred to as contribution quality storage.

Delivering the imagery to the consumer typically involves the third step,

The imagery may be encoded and stored at a lower level of quality to conform to the transport characteristics; we refer to this as distribution quality storage.

Finally, the imagery must be decoded for display, requiring

The consumer may store the imagery for viewing at another time; this also requires distribution quality storage.

Some of these steps tend to be grouped with a specific level of storage quality, as illustrated in Figure 3.1. This allows a further simplification of the model based on three major system components - ACQUISITION, TRANSMISSION, and DISPLAY.

3.3.3 A Closed Architecture Model - Analog Composite Video

The transmission standards for the existing composite video systems frequently require all of these components to operate in close synchronism. The display is synchronized with live or taped program material that feeds the transmission system. Imagery acquired at other spatial or temporal resolutions requires conversion into the spatial and temporal specifications of the transmission standard. Such an architecture is depicted in Figure 3.2.

The advent of video recording provided a degree of decoupling of acquisition from the other components, allowing program producers to create program content without real-time constraints; however, transmission and display remain tightly coupled. Recording media for program content have typically been coupled to the transmission standard to take advantage of the bandwidth reduction techniques applied in the system. The design of consumer VCRs is based on compatibility with the transmission standard; packaged media played by the VCR must therefore conform to the same standard.

While interoperability between the various analog composite video systems has had to overcome differences in frame and line rates, these systems have been remarkably extensible. The acquisition, transmission and display components and the associated services of the system have evolved continuously over the past fifty years.

With the introduction of analog component video recording and processing systems in the '80s the video industry took a major step toward completely decoupling acquisition from transmission and display. The production community soon discovered the advantages of this decoupling.

By using analog component equipment for both acquisition and production, it became possible to edit video without concern for the multi-field color framing sequences that exist in subcarrier encoded composite video systems. Producers also discovered that fewer artifacts were introduced when layering video using component vision mixers and digital video effect systems. Decoupling of acquisition and production equipment from the encoded transmission standard produced far better results than could be achieved with composite video acquisition and production equipment - and the same video recorders also produced encoded outputs for transmission of the program.

3.3.4 An Open Architecture Model - Digital Hierarchies

In the '80s, the publishing industry experienced the collision of analog and digital technologies. Today, interoperability of media in the publishing industry is the rule rather than the exception, as digital image and document processing techniques, generally categorized under the umbrella of Desktop Publishing, have replaced traditional analog techniques.

To a large extent, the transition from the analog representations of printed media - type, line art, halftones, and color separations - to their digital counterparts, has been enabled by the use of scalable hierarchies for the acquisition, transmission, and display of printed materials. The tools for acquisition and production of print media have been separated from the display hierarchy, allowing output at the desired level of resolution.

Electronic transmission is also beginning to play a major role in the publishing of documents. Compact representations of printed media using page description languages have allowed high quality print representations to be moved efficiently through the telecommunications network using low data rate modems. Remote printing of documents on fax machines or networked printers is commonplace.

The desktop publishing metaphor has been used as a model to predict similar transitions in other media industries, most notably Desktop Video. However, the transition has not occurred at the pace that many industry pundits have predicted. This is due, in large part, to the difficulty of breaking the problem up into manageable components, that is, of creating separate hierarchies for acquisition, transmission, and display of motion imagery.

Interoperability of video systems with other media is facilitated by a complete decoupling of acquisition, transmission and display into separate hierarchies for each component. Such an architecture is depicted in Figure 3.3. Scalable representations of video will be enabled by this decoupling, and technological advances in one hierarchy can take place without upsetting the apple cart in the other two.

If a hierarchical digital imaging architecture is used as the model, a Digital Advanced Television System can be implemented that is equally adept at delivering low cost solutions that conform to a single point in the hierarchy and more expensive scalable solutions that support multiple points in the hierarchies.

The acquisition hierarchy can provide image capture solutions at various price/performance points that are appropriate for the application. Production systems can evolve that deal with single image formats, or multiple formats within the hierarchy. This is of particular importance to producers of program content with significant archival value. Imagery can be captured at a higher level in the acquisition hierarchy with an eye toward distribution at one or more of the lower levels of the transmission hierarchy; the archival value of the program is protected as it can be released at higher quality levels in the future as consumers purchase products at a higher level in the display hierarchy.

Viewing transmission as a hierarchy is critical to the concept of interoperability. A hierarchical imaging architecture would support a progression of image quality levels that are interoperable and extensible, and allow for incremental improvements in image quality within a single transmission standard. This requires the use of a scalable encoding structure; a core image would be encoded at the first level of the hierarchy, and enhancement information would be encoded for each of the higher resolution levels supported by the transmission standard.
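The idea of a core image plus enhancement information can be sketched with a two-level residual scheme on a single row of samples. This is only an illustration under simple assumptions (pair averaging for the core layer, sample repetition for prediction); it is not the coding method of any system discussed in this report, and all names are hypothetical.

```python
def upsample(core):
    """Predict the full-resolution row by repeating each core sample twice."""
    return [value for value in core for _ in range(2)]


def encode_scalable(samples):
    """Split an even-length row into a core layer and one enhancement layer.

    The core layer is a half-resolution row (average of each sample pair).
    The enhancement layer holds the difference between the original row
    and the prediction made from the core layer alone.
    """
    if len(samples) % 2:
        raise ValueError("this sketch expects an even number of samples")
    core = [(samples[i] + samples[i + 1]) // 2 for i in range(0, len(samples), 2)]
    enhancement = [s - p for s, p in zip(samples, upsample(core))]
    return core, enhancement


def decode(core, enhancement=None):
    """Decode the core layer alone, or core plus enhancement if supplied."""
    if enhancement is None:
        return list(core)
    return [p + e for p, e in zip(upsample(core), enhancement)]


row = [10, 12, 20, 24, 7, 9]
core, enhancement = encode_scalable(row)
print(decode(core))               # low-resolution display uses the core only
print(decode(core, enhancement))  # higher-level display recovers the full row
```

A decoder at the first level of the hierarchy can ignore the enhancement layer entirely, while a decoder at the next level combines both layers.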

A scalable encoding structure may be more difficult to design and possibly less efficient for a given quality level than an encoding designed specifically for that level. It has, however, several advantages that will accrue over time:

The economic benefits associated with scalable image encoding will be significant. The emerging consensus among experts in video compression technology is that scalability will carry a minor penalty for encoding overhead. Consider the impact on media server storage systems: a single scalable representation will make more efficient use of storage than multiple copies of the same material at different scales.
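The storage argument can be made concrete with a small calculation. The sketch below compares one independent copy per resolution level against a single scalable file. The file sizes and the 10% overhead are hypothetical figures chosen for illustration, and the model assumes that a scalable file is as large as the highest-level copy plus the encoding overhead.

```python
def separate_copies_size(level_sizes):
    """Total storage when every resolution level is kept as its own copy."""
    return sum(level_sizes)


def scalable_file_size(level_sizes, overhead_fraction):
    """Storage for one scalable file.

    Assumption: the file costs as much as the highest level alone,
    increased by a fractional penalty for scalability.
    """
    return max(level_sizes) * (1.0 + overhead_fraction)


# Hypothetical sizes, in arbitrary units, for three levels of one program.
level_sizes = [1.0, 4.0, 16.0]
print(separate_copies_size(level_sizes))
print(scalable_file_size(level_sizes, overhead_fraction=0.10))
```

Under these assumed figures the scalable file is smaller than the sum of the separate copies for any overhead below about 31%.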

The display hierarchy allows for a variety of products to evolve at various price/performance points that are appropriate for the application. Some display systems will evolve to single performance levels while others will offer multiple levels of performance within the transmission and display hierarchies.

Scalability plays a major role in the design of decoder and display components. If the transmission system delivers a scalable payload, only that portion of the information which is required for the display system need be decoded. A small personal information system may only need the low resolution component while a high-end home entertainment system can utilize all of the resolution components.

3.4 Factors That Have the Potential to Fundamentally Change Digital Image Architectures

Real world constraints, especially with respect to cost versus performance, are the driving factors in the implementation of a digital image architecture. In determining the requirements for the architecture, the Task Force has analyzed the current market situation as well as technology and regulatory trends that may reshape the market in the next few decades.

3.4.1 A Shift in the Pricing Structure of Broadband Telecommunications

Changes in the regulatory climate are likely to cause increased competition among all networked service providers (telcos, cable, data networks, etc.), and encourage service providers to upgrade the quality and capacity of these networks.

The current pricing structure for broad band telecommunications is typically based on channel bandwidth - the purchaser uses and pays for the entire channel regardless of the amount of information moved through it. In the future, greatly increased channel bandwidth and packetized encoding schemes using headers/descriptors for packet identification will cause a shift in pricing structure - the purchaser will pay only for the information content that moves through the channel. This concept, when applied to video services, has been described as pay-per-view-per-bit.

This shift in pricing structure is likely to act as a catalyst for the rapid evolution of video compression techniques and transmission standards, with an emphasis on two areas:

During the transition to wide bandwidth communications channels, data rate reduction will be the driving force as the cost per bit will be relatively high. As the cost per bit declines, the emphasis will shift to scalability. This will be due largely to the market advantages of maintaining a single data file that can be delivered to a wide range of users at different levels of the display hierarchy.

3.4.2 Programmable Decoders

Another major trend that is anticipated is the evolution from fixed single standard decoders to programmable decoders that can adapt to scalable image representations. Single standard decoders will be used primarily for devices that tap into the communications network and deal only with one type of image representation. Programmable decoders will deal with families of standards. Fax machines serve as a good example of single and multiple standard encoder/decoders. The Group 1 fax standard provided a single level of resolution; machines were expensive and their use was limited. With the addition of Group 3 fax standards, multiple levels of resolution were supported, including the older Group 1 format. Due to advances in technology, the new machines were better and cheaper, yet compatible with the existing Group 1 machines. The marketplace responded in a very positive manner.

Programmable decoders will be the key component in providing extensibility to the digital imaging architecture. Because of the diversity of image compression standards (Group 3 fax, H.261, JPEG, MPEG, DVI, etc.), these decoders will play an important role in the integration of video and high resolution imaging with desktop computer workstations. This same diversity, with the addition of a digital television standard (or standards) will lead toward the use of programmable decoders in home entertainment and information delivery systems. Essentially fixed solutions will drive the low end of the market, providing inexpensive mass market consumer products, while programmable solutions will dominate at middle and upper levels of the transmission and display hierarchies.

3.4.3 Trends in Display Technology

The use of scanning CRT display technology for certain applications is expected to decline over the next decade as LCD based direct-view displays and projection systems are perfected. LCD displays are used extensively today in portable computers, and LCD light valves for high resolution projection are showing great promise. In a light valve, the LCD is used to control the amount of light - from a flicker-free light source - that can pass through each pixel location; since the display is no longer the light source, significant improvements in brightness can be achieved.

The characteristics of LCD displays are significantly different from flying spot scanning CRT displays. Flying-spot systems must operate at refresh rates above the critical frequency for flicker fusion; display brightness is limited since the spot is the only source of illumination (most of the display is decaying at any point in time).

Every pixel in an LCD display receives constant illumination. LCDs can be characterized as having long persistence; in fact, a significant design challenge has been to provide faster pixel response to deal with full motion video. This has been accomplished through the use of a transistor at each pixel location (an active matrix display), providing rapid response for pixel replenishment.

The nature of the active matrix circuit also allows a pixel value to be held for at least one second without replenishment, giving the display characteristics similar to a frame buffer. Direct addressing of each pixel location would make it possible to update only those pixels which change from one refresh period to the next. Transmission systems that utilize digital compression techniques to eliminate interframe image redundancies may take advantage of these aspects of LCD displays to implement conditional replenishment.

3.4.4 Conditional Replenishment

A significant portion of the data rate reduction achieved by digital image compression techniques deals with the elimination of interframe redundancies. In essence, much of the complexity, and hence the cost, of these encoding systems involves the processing required to analyze motion image data streams to determine which pixels have changed between temporal samples.

Over the next 10 to 15 years image acquisition and display technologies are likely to move to conditional replenishment. Image acquisition systems may evolve with on-board digital processing to implement conditional image acquisition. These cameras will be programmable, offering several advantages over scanning cameras that continuously update the entire image raster, including the ability to:

Future display technologies are likely to evolve around direct view displays (possibly LCD) offered in different pixel densities. Direct addressing of LCD displays will allow the use of conditional refreshment of only those pixels that change from one refresh period to the next; the display itself may become the frame buffer, allowing portions of the image to be updated at different temporal rates. Or, combined with an appropriate multi-ported frame buffer design, such a display could support multiple temporal refresh rates simultaneously for different image streams.
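Conditional replenishment, as described in this section, can be sketched as two steps: find the pixels that changed between temporal samples, then update only those locations in a display that holds its pixel values. The function names, the list-of-rows frame layout and the threshold parameter are assumptions made for this illustration.

```python
def changed_pixels(previous, current, threshold=0):
    """Return (row, column, value) for each pixel that changed by more than threshold."""
    updates = []
    for r, (prev_row, cur_row) in enumerate(zip(previous, current)):
        for c, (old, new) in enumerate(zip(prev_row, cur_row)):
            if abs(new - old) > threshold:
                updates.append((r, c, new))
    return updates


def apply_updates(frame_buffer, updates):
    """Write only the changed pixels; every other location keeps its held value."""
    for r, c, value in updates:
        frame_buffer[r][c] = value
    return frame_buffer


previous = [[0, 0, 0], [0, 0, 0]]
current = [[0, 5, 0], [0, 0, 9]]

updates = changed_pixels(previous, current)
display = [row[:] for row in previous]  # display currently shows the old frame
apply_updates(display, updates)
print(updates)
print(display == current)
```

In this example only two of the six pixel values need to be transmitted and written, which is the saving that a directly addressed display would exploit.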

4.0 Critical Issues

4.1 Introduction

The Task Force has identified seven issues that are considered critical to the achievement of the objectives. Many of these issues are, by their nature, complex.

Backward compatibility to existing systems and extensibility to future systems present many technical challenges. The greatest challenge lies in preserving the value of existing infrastructures while enabling an orderly transition to the new architecture. For example, immense investments have been made in the acquisition and transmission infrastructures of our existing NTSC, PAL and SECAM television systems. Likewise, billions of consumers have invested in receivers and video recorders that support these systems. It is equally critical that investment in the vast archives of information and entertainment programming that exist today on film and video be protected, and that the new architecture unlock the economic potential of these archives.

In deliberating on these critical issues, every effort has been made to balance the interests arising from those investments with the future benefit to all of a single global standard. These deliberations have also taken into consideration the installed base of computer, medical, engineering and scientific imaging systems, and the diverse applications for still imaging in electronic publishing, visual databases and communications. Existing systems that demonstrate interoperability and extensibility - including some which have in fact been extended - were considered. Examples include the French Minitel system and the family of international facsimile standards.

The seven critical issues are:

4.2 The Establishment of Scalable and Interoperable Hierarchies for Basic Image Parameters

An ideal digital image architecture would allow the following image parameters to be independently varied, over a range of appropriate values:

While this independence may be technically feasible within the fifty year life span desired for the first digital imaging architecture, it does not appear to be practical for immediate implementation, nor is it required. The choice of an appropriate hierarchy for each of these parameters can provide adequate degrees of freedom for system design, while facilitating affordable, high quality transcoding between the levels in each hierarchy.

Scalable and interoperable hierarchies offer many benefits when communications channel issues are considered. Such an approach promotes effective utilization of existing communications channels and the development of new broad band communication services. The lower levels of the hierarchy provide solutions for the capacity constrained channels that exist today. The introduction of new broad band communications services will enable the use of higher data rates to support the improved performance available at higher levels in each hierarchy.

A digital image architecture that provides interoperability across applications with different spatial resolution requirements must be scalable in terms of resolution, as discussed in Section 3.3. Interoperability also requires a family of related image acquisition and display rates. The greatest benefit, in terms of cost and simplicity, is gained when the display operates at the same rate as, or an integer multiple of, the image acquisition rate. Though more expensive to implement, the greatest performance benefit is gained when motion compensation techniques are used in encoders/decoders to create in-between frames for display. Section 5.4 discusses the requirements for such a family.

To facilitate this hierarchical approach to a digital image architecture, a scalable approach to image coding is required. Furthermore, improved techniques for video compression are likely to be enabled by the geometric progression in computational hardware. The design of the architecture must make provisions for this progression. Section 5.5 discusses the use of scalable coding algorithms.

4.3 The Establishment of an Appropriate Relationship Between Image Acquisition and Display Refresh Rates

In early discussions about the use of digital codings for HDTV systems, it became clear that receivers would likely need one or more frame stores to implement image decoding. This prompted the idea that image acquisition rates could be decoupled from display refresh rates - the display could be refreshed at a rate that is an integer multiple of the acquisition rate. For this reason the questions of image acquisition rates and display refresh rates will be considered separately.

No topic generated as much discussion in the Task Force as image acquisition and display refresh rates. This is due in part to the diversity of rates that exist in the standards and resulting practices within each of the affected industries. The issue is further complicated by the evolution of television down parallel paths with respect to field rates. Their harmonization will require solutions that lie in the realm of digital technology as well as the realm of politics and negotiation.

The choice of an image acquisition rate is a tradeoff between motion rendition and the resulting data rate. The following considerations are important in establishing a family of acquisition rates.

There are many factors affecting the choice of a display refresh rate including:

Refresh rate will be determined by the above criteria and price/performance requirements established by market factors.

Experience has shown that for wide-screen CRT displays of high brightness, a refresh rate in the region of 72 to 75 Hz is required to achieve tolerable levels of wide-area flicker (see Section 3.2.5). In some situations refresh rates in excess of 100 Hz may be desirable. Receivers which operate at 100 Hz (double the normal 50 Hz interlaced scan rate) are being introduced in the 50 Hz market; rate doubling receivers operating at 120 Hz are also being developed for the 60 Hz market.

The relationship of display refresh and image update rates should be based on a progression that permits non-interpolative transformations between the acquisition and display rates in the new architecture (i.e., display at integer multiples of the image update rate). As an example, theatrical display of film is usually double or triple shuttered to minimize wide-area flicker of the display.
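A non-interpolative relationship between acquisition and display rates amounts to showing each acquired frame a whole number of times. The sketch below checks whether a pair of rates has that property and builds the resulting refresh sequence; the function names are hypothetical and rates are assumed to be whole numbers.

```python
def display_repeats(acquisition_fps, display_hz):
    """Return how many times each frame is shown, or None if the display
    rate is not an integer multiple of the acquisition rate (in which case
    some form of interpolation or uneven cadence would be needed)."""
    if acquisition_fps <= 0:
        raise ValueError("acquisition rate must be positive")
    quotient, remainder = divmod(display_hz, acquisition_fps)
    return quotient if remainder == 0 else None


def refresh_sequence(frames, repeats):
    """Repeat every frame the same number of times, like multiple shuttering."""
    return [frame for frame in frames for _ in range(repeats)]


print(display_repeats(24, 72))   # triple shuttering of 24 fps material
print(display_repeats(25, 75))
print(display_repeats(24, 60))   # not an integer multiple
print(refresh_sequence(["f0", "f1"], display_repeats(24, 72)))
```

Rates such as 24 fps on a 60 Hz display fall outside this family, which is why they require a cadence or interpolation rather than simple repetition.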

Further research into the choice of a single family of acquisition rates and display rates is required. An appropriate interoperable family should include a 24 or 25 fps image acquisition rate which would enable a 72 or 75 Hz display refresh rate. This is the subject of further discussion in Section 5.0 and Section 7.0.

4.4 The Use of Square Sampling Grids (Square Pixels)

The computer graphics, image processing, and publishing industries have adopted the use of geometrically square pixel sampling grids (frequently simply referred to as square pixels). The use of square pixels facilitates:

Early on, the computer graphics industry sought ways to insulate applications from variations in display technology. Support for different pixel configurations required run-time transformation of all graphical objects. Even then, applications rarely looked the same from display to display because different pixel configurations caused a variety of artifacts. These stopgap measures constrained functionality, reduced performance, and added cost to equipment and services. Ultimately, this approach failed.

Instead, computer graphics gravitated towards a common display technology based on square pixels. This simplified system design, which led to lower cost and better performance, enabled equipment and services to be used as commodities across a broad set of industries. Today the computer industry is a major consumer of displays, second only to consumer television receivers.

The use of a common pixel geometry eliminates the need for interpolative resampling when sharing imagery among all users. Resampling has two costs:

Thus the adoption of a common sampling grid is a key issue for discussion and resolution in the SMPTE work towards the specification of a digital image architecture.

4.5 The Establishment of Appropriate Representations for Colorimetry, Dynamic Range and Transfer Characteristics

The concepts of interoperability, scalability and extensibility apply not only to the sampling of the image but equally to the expression of its brightness and colorimetry. The digital image architecture must deal appropriately with dynamic range and colorimetry requirements of the acquisition (including processing), transmission and display modules of a system. During the acquisition and processing of imagery (for example post-production) image data that may not be required by the human visual system or reproducible on a given display may be required by processing hardware for optimal results. Similarly, the architecture must accommodate image exchanges between systems having differing dynamic range and colorimetry characteristics. The essential issues are summarized in the sub-sections which follow.

4.5.1 Extensibility

Existing image systems can reproduce only a limited range of the colors visible in the real world, often restricted to those corresponding to illuminated objects and the specific needs of the application. The colorimetry of television is currently confined to that of the display device. Figure 4.1 is a color space, within which are illustrated a red, green, blue (RGB) gamut of additive primary colors, and a typical yellow, cyan, magenta (YCM) gamut of subtractive colors. Also shown is the hue and saturation representation: saturation is the radial distance from a specified white point; hue is the associated angle. Hue and saturation vectors are shown pointing to the RGB and YCM color gamuts, as well as one vector that extends beyond both. It can be seen that this representation can be used to represent any visible color. This color space is application and device independent.
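The hue and saturation representation described above is a polar coordinate system centered on the white point. The sketch below assumes a two-dimensional chromaticity plane with CIE 1931 xy coordinates and the D65 white point; the report does not state which plane or white point Figure 4.1 uses, so both are assumptions, as are the function names.

```python
import math

# Assumption: CIE 1931 xy chromaticity of illuminant D65 as the white point.
D65_WHITE = (0.3127, 0.3290)


def hue_saturation(x, y, white=D65_WHITE):
    """Convert a chromaticity to (hue in degrees, saturation).

    Saturation is the radial distance from the white point and hue is
    the angle of that radius, measured counterclockwise from the +x axis.
    """
    dx, dy = x - white[0], y - white[1]
    saturation = math.hypot(dx, dy)
    hue = math.degrees(math.atan2(dy, dx)) % 360.0
    return hue, saturation


def chromaticity(hue_deg, saturation, white=D65_WHITE):
    """Inverse mapping: recover the chromaticity from hue and saturation."""
    angle = math.radians(hue_deg)
    return (white[0] + saturation * math.cos(angle),
            white[1] + saturation * math.sin(angle))


hue, saturation = hue_saturation(0.3127, 0.4290)
print(round(hue, 3), round(saturation, 3))
print(chromaticity(hue, saturation))
```

Because the radius is not limited by any device gamut, a vector in this form can point to a color outside both the RGB and YCM gamuts, which is the device independence the text refers to.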

In the future it may be possible and desirable to extend the colorimetry representation to include a wider range of colors, possibly even including those of self-luminous objects, as one example. A close examination of this issue is needed to establish the range of colors to be represented within the colorimetry of the digital image architecture.

A similar situation to that of colorimetry exists for the representation of dynamic range and transfer function. Current systems are individually optimized for the current technology and application and are not easily amenable to an increase in dynamic range. Mechanisms to effectively handle a much wider dynamic range need to be identified.

4.5.2 Scalability

To cover the intended range of applications, it is necessary that the color and dynamic range representations be capable of being scaled, preferably independently. For instance, the display of an image having a wide color gamut at the source must produce acceptable color on a display of limited color capability. The reverse situation is also true. Similarly, the display of an image of high dynamic range should not lose essential information when viewed on a display of low dynamic range. The representation must accommodate these requirements efficiently.

The situation is somewhat similar to that of motion picture film in which the latitude of the negative film enables exposure and color adjustment after the image capture and the S-curve of the film characteristic provides effective compression of the highlights and dark regions. Similar provisions may be required in digital image systems to provide reasonable representations for both small and large numbers of bits. A further consideration may concern the optimal distribution of any necessary compression/expansion in respect of overall image quality.

4.5.3 Interoperability

Interoperability demands that the chosen colorimetric representation and the dynamic range representation be device independent for current and future devices. In this fashion, devices supporting differing colorimetries and dynamic ranges can be supported.

It is also important that images of differing colorimetry and dynamic range at the acquisition device should be able to be combined effectively into a single image, when appropriately scaled.

The color space and dynamic range representations that could meet these objectives require extensive consideration. Section 7.7 includes a number of questions that should be considered in the analysis of these and other colorimetry issues.

4.6 The Use of Coherent Image Sampling (Progressive Scanning)

Historically, interlace has been used to achieve a 2:1 reduction in bandwidth requirements (i.e., data rate), and to eliminate wide-area flicker on scanning CRT displays. The use of progressive scanning is nearly universal in computer display applications, and is employed in some high quality video presentations.

In Section 3.2.5 it was established that higher scanning rates are required with displays that cover a wider field of view and/or operate at higher levels of brightness than today's television systems. Decoupling the refresh rate of the display from the image update rate provides a mechanism to deal with wide-area flicker.

4.7 Identification of the Characteristics of a Digital Image Stream (Header/Descriptors)

A fundamental prerequisite for interoperability in digital systems is a mechanism for identifying and describing digital image data. For this information to be shared, decoders must be capable of identifying and conforming to the incoming data. Even simple decoders - those that only recognize a single standard - must identify data streams which they can decode. This is one of the primary functions of the header. Decoders must also ignore unrecognized data, to allow for extensions to the data stream.

Descriptors provide application oriented information, such as image and coding parameters, processing history, identification of program content, copyright, and scrambling. They also enable extensibility; the descriptor may also contain the coding algorithm or language representation necessary to interpret the encapsulated data. This provides a mechanism whereby expert groups can create and standardize the transmission of messages to meet their needs.

Descriptors may be used to identify and describe data at different levels of an image hierarchy, thus allowing a display system to decode only that part of a stream necessary for its function or capability. Descriptors might also contain information about the preferred display characteristics for imagery.

Thus information such as the colorimetry of the original acquisition system, and the transfer characteristics of the process used to move images from one medium to another, can be included with the data. Decoders would use this information to optimize display of the image.
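
As an illustration only, the requirement that decoders use the descriptors they recognize and ignore the rest can be sketched with a tag-length-value layout. The tag numbers, field sizes, payloads and function names below are assumptions made for this sketch; they are not taken from the SMPTE Header/Descriptor work.

```python
import struct

# Hypothetical descriptor tags, chosen only for this sketch.
KNOWN_TAGS = {1: "colorimetry", 2: "transfer_characteristic"}


def pack_descriptor(tag, payload):
    """Encode one descriptor as: 1-byte tag, 2-byte big-endian length, payload."""
    return struct.pack(">BH", tag, len(payload)) + payload


def parse_descriptors(stream):
    """Return the descriptors this decoder understands and skip the rest."""
    found = {}
    offset = 0
    while offset + 3 <= len(stream):
        tag, length = struct.unpack_from(">BH", stream, offset)
        offset += 3
        payload = stream[offset:offset + length]
        offset += length
        if tag in KNOWN_TAGS:
            found[KNOWN_TAGS[tag]] = payload
        # Unknown tags are skipped by using their length field, so a later
        # extension to the stream does not break this decoder.
    return found


stream = (
    pack_descriptor(1, b"acquisition-primaries")
    + pack_descriptor(99, b"future extension")  # tag unknown to this decoder
    + pack_descriptor(2, b"film-to-video")
)
decoded = parse_descriptors(stream)
```

Because every descriptor carries its own length, a simple decoder can step over data it does not understand, which is the property that makes the stream extensible.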

The SMPTE Task Force on Header/Descriptor in their Final Report dated January 3, 1992, and approved by the SMPTE Standards Committee on February 6, outlined the criteria for the use of Header/Descriptors. Work is now progressing on the development of proposed SMPTE Standards, Recommended Practices and Engineering Guidelines.

4.8 Compatibility with Current Television and Motion Picture Standards

The installed base of NTSC, PAL and SECAM equipment within the program production community, together with massive consumer investment in compatible receivers and VCRs, must be supported in the transition to a digital image architecture. Of even greater importance is the requirement to preserve the value of the archives of programs that have been created for mass market distribution using these systems and to exploit these resources to the greatest extent possible in the future.

This is by far the most critical issue of all, so much so that its impact is clear in the discussion of many of the previous issues. Only the last of them, the use of headers/descriptors, is without precedent in existing entertainment industry practice. It is precisely where a dichotomy exists in current practice that the greatest controversy arises - on the issue of temporal rates.

The convergence on digital technology may provide the solutions that will resolve the temporal rate issue: convergence around the common language of digital coding, the progression in CPU performance, and the ability to design inexpensive modular interfaces in the form of mass-produced microchips.

It is likely that a number of solutions will evolve to facilitate interoperability between the existing world of film and analog television, and the new digital image architecture. These solutions should provide a variety of price/performance options appropriate to the application requirements.

5.0 An Example of a Hierarchical Digital Image Architecture

This section suggests a technology transparent hierarchy - one compatible with the present and extensible for the future.

To illustrate the model, specific numbers have been chosen that take advantage of the mathematical relationships discussed in Section 4.0, as well as the architectures of digital memory and processing components. These numbers are not intended as the basis for a standard, but rather, provide a starting point, from which the validity of the architectural concepts can be verified. Further work is required for verification of the model and determination of the exact numbers, upon which a standard can be based (see Section 7.0).

The following parameters of a hierarchical digital image architecture are discussed in this section:

5.1 Open Architecture

In Section 3.0 it was indicated that the opportunity exists to design an open digital image architecture based on generic, inexpensive, and increasingly powerful digital components.

For a digital image architecture to be cast as an open system, two steps are required:

There must be a systems engineering of the standards so that the modules work together. There are two basic interface definitions to be publicly standardized:

Some of the parameters that should be part of this communications service definition include:

This careful modularization encapsulates other issues, including the critical issues discussed in Section 4.0, so they can be addressed one by one.

It can be argued that there is no need for rigid architectural standards in a digital world; that programmability in the transmission and display hierarchies provides a sufficient basis for interoperability. Perhaps some day this will be true. If the goal of longevity for the first digital image architecture is achieved, it is likely that the designers of the next imaging architecture will be less constrained than we are today.

The first digital image architecture, however, must provide a bridge from the closed systems of the past to the open systems of the future. The fundamental structure of the digital building blocks and economies of scale associated with standardization suggest that the organizations charged with establishing these standards work in harmony.

5.2 Designing Display Systems to Deal with Multiple Spatial Resolution Requirements

The perceived resolution of a display is determined primarily by the viewing distance and the visual acuity of the observer. Visual acuity is often determined using sets of alternating black and white lines of equal width. One black/white line pair represents one cycle. The number of cycles that can be resolved across one degree of the eye's viewing field is typically used as a measure of human visual acuity, and is stated in cycles (line pairs) per degree. Under some conditions, with high contrast line pairs, human visual acuity extends beyond 40 cycles per degree; approximately 22 cycles per degree is perceived as a sharp image.

If the resolution of a display is held constant and the viewing distance is a variable, the resolution perceived by the viewer - measured in cycles per degree - will increase as the viewer moves away from the display. Therefore, all displays can be considered to be high resolution if viewed from an appropriate distance.

At a distance that varies with the visual acuity of each individual, the actual resolution of the display equals the limit of that viewer's ability to resolve image detail. Beyond this viewing distance additional image detail cannot be perceived; that is, the display has more resolution than is required for this viewer and set of viewing conditions.
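
The relationship between viewing distance and perceived resolution can be sketched numerically. This is a minimal sketch that assumes one cycle spans two pixels (one black and one white line, as defined above) and an on-axis viewer; the pixel pitch of 0.01 inch and the function name are hypothetical values chosen for illustration.

```python
import math


def perceived_cycles_per_degree(pixel_pitch, viewing_distance):
    """Cycles per degree at the screen centre; both arguments in the same unit."""
    # Angle subtended at the eye by a single pixel, in degrees.
    pixel_angle_deg = math.degrees(2 * math.atan(pixel_pitch / (2 * viewing_distance)))
    # Two pixels (one line pair) make one cycle.
    return 1 / (2 * pixel_angle_deg)


PIXEL_PITCH_IN = 0.01  # assumed display: 100 pixels per inch

near = perceived_cycles_per_degree(PIXEL_PITCH_IN, 30)  # viewer at 30 inches
far = perceived_cycles_per_degree(PIXEL_PITCH_IN, 60)   # viewer at 60 inches
```

Under these assumptions, doubling the viewing distance very nearly doubles the perceived resolution of the same display, which is why any display can appear sharp from far enough away.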

In some cases excess resolution may be desirable. For example, the operator of a personal computer can typically reduce the viewing distance to a high resolution desktop display by one-half, simply by leaning forward, thus taking advantage of additional resolution. At the desktop, moving 15 inches changes perceived resolution enough to be significant, while moving 15 inches in a movie theatre would have little effect on perceived resolution.

The NTSC transmission standard was designed to provide a resolution of approximately 21 cycles per degree over a viewing field of just under 11 degrees. Display size can be variable in today's television, ranging from a diagonal of a few inches (a personal display) to more than 30 feet (direct view displays in stadiums and projection displays in controlled lighting environments). These displays differ only in the size of their pixels. At the appropriate viewing distance, the perceived resolution of the personal display and the stadium display will equal the design goal of 21 cycles per degree, and both displays will cover 11 degrees of the observer's field of view.

Many display applications require higher levels of perceived resolution. To increase the level of perceived resolution, while holding viewing distance constant, additional samples of the same image must be added, increasing pixel density. To cover a wider field of view, as in wide-screen displays, holding the same viewing distance and perceived resolution, new information, at the same pixel density, must be added to extend the picture.

5.3 Defining a Spatial Resolution Hierarchy

Section 3.2 identified the need for a variety of image resolutions to deal with specific imaging requirements. These ranges can now be further defined in terms of field of view and resolution in cycles per degree.

With personal, home entertainment and theatre displays, the viewer can vary the distance from the display, and thus vary the perceived resolution, over a significant range (see Figure 5.1). Taking into account the variations in acuity in the population, and variations in viewing distance for each application, it is common practice to design a display system for the average viewing conditions in each application. The overlaps in cycles per degree between low, normal and high resolutions are shown in the table to account for these variations.

Resolution       Cycles per Degree

Low                   1 - 15
Normal               10 - 25
High                 20 - 30
Ultra High           30 - 40

A special case exists for head mounted displays which provide a fixed viewing distance; here the display manufacturer must select the level of resolution appropriate for the application and then design for a specific perceived resolution.

Using these guidelines, a high resolution display designed for a 35 degree field of view would require about two thousand pixels per line at 30 cycles per degree. In a desktop computing application where the viewer is 30 inches from the display, the length of an active line (display width) would be about 19 inches. In an entertainment application, such as a consumer television receiver viewed from a distance of 108 inches (9 feet), the length of an active line would be about 68 inches.
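
The figures in this example can be reproduced with a short calculation. This is a sketch under simplifying assumptions: two pixels per cycle, a flat display viewed on-axis, and resolution treated as uniform across the field of view. The function names are illustrative.

```python
import math


def pixels_per_line(field_of_view_deg, cycles_per_degree):
    """Pixels needed across the field of view, at two pixels per cycle."""
    return round(2 * cycles_per_degree * field_of_view_deg)


def display_width(field_of_view_deg, viewing_distance):
    """Width of a flat display that fills the field of view at this distance."""
    return 2 * viewing_distance * math.tan(math.radians(field_of_view_deg / 2))


pixels = pixels_per_line(35, 30)           # about two thousand pixels per line
desktop_width_in = display_width(35, 30)   # viewer 30 inches from the display
home_width_in = display_width(35, 108)     # viewer 108 inches (9 feet) away
```

The same pixel count serves both applications; only the physical size of the pixels changes with the viewing distance.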

These examples are illustrated in Figure 5.2. In this figure the principles described in this section are used to illustrate the relationships between the four resolution levels of the model hierarchy and a variety of display applications. The numbers, especially as they relate to image size (in pixels) are entirely relative; they serve only as examples of the pixel count required, at average viewing distances and fields of view, to achieve the specified perceived resolution.

It is important to note that seemingly diverse applications such as personal computer and home entertainment displays have similar resolution requirements as the size of the home entertainment display increases beyond the narrow field of view of today's television receivers. It is also important to note that direct view CRT displays (which are currently limited to around 40 inch diagonals) require resolution in the normal range for home entertainment applications.

FIGURE 5.1 - Relative Tile Resolutions
These groups of letters represent the relative resolution for each level of the hierarchy from Level 1 (top) to Level 4. To better understand the practical application in displays, place this figure where it can be viewed from a distance of between 30 inches and 15 feet. Level 4 should be sharp at 30 inches; as you move away each level lower in the hierarchy will become sharp.

5.3.1 Key Concepts of the Model

The example spatial resolution hierarchy is designed around a few basic concepts:

The hierarchy progression is based on the use of integer values related by powers of two. Essentially, at each higher level of the hierarchy, resolution doubles (e.g., 1, 2, 4, 8, etc.); subsets of the lowest level can be derived similarly (1/2, 1/4, 1/8, etc.).

It is noteworthy that such sequences also appear in the computer processor and memory component industry. This approach takes full advantage of the generic building blocks that are the driving force in the transition to a digital world.

In order to provide continuity between the various resolution levels of the hierarchy the model is based on the concept of an image tile. For the purposes of this discussion, a tile can be considered to be a constant portion of an image, representing the same part of the image regardless of the resolution level or image size. Thus, at each higher level in the hierarchy, the resolution within a tile doubles in each axis. This is illustrated in Figure 5.3.

The power of two progression may now be applied to determine the resolution, in pixels, for each level in the hierarchy.


Level  Name         Resolution in        Pixels in       Pixels in 32 x 32
                    Cycles per Degree    One Tile        Tile Superset

1      Low            1 - 15              16 x 16          512 x 512
2      Normal        10 - 25              32 x 32         1024 x 1024
3      High          20 - 30              64 x 64         2048 x 2048
4      Ultra High    30 - 40             128 x 128        4096 x 4096

In this model a tile represents 1/32nd of each dimension of the image at any level of the hierarchy. Thus each level consists of a 32 x 32 set of tiles (see Figure 5.3). The selection of this fraction for a tile is arbitrary; it was chosen because it is a convenient building block - integer multiples can be used to construct displays at all of the aspect ratios and spatial resolutions discussed in the model.
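
The power of two progression in the table can be expressed in a few lines. This is a sketch of the example numbers only; the constant and function names are illustrative, and the 32 x 18 tile display in the last line is a hypothetical wide-screen case chosen for this sketch rather than an entry taken from Figure 5.3.

```python
BASE_TILE_PIXELS = 16  # pixels along one axis of a tile at Level 1
TILES_PER_AXIS = 32    # the full superset is 32 x 32 tiles


def tile_pixels(level):
    """Pixels along one axis of a tile; doubles at each higher level."""
    return BASE_TILE_PIXELS * 2 ** (level - 1)


def superset_pixels(level):
    """Pixels along one axis of the full 32 x 32 tile superset."""
    return TILES_PER_AXIS * tile_pixels(level)


def display_pixels(tiles_wide, tiles_high, level):
    """Pixel dimensions of a display built from a rectangle of tiles."""
    return tiles_wide * tile_pixels(level), tiles_high * tile_pixels(level)


supersets = [superset_pixels(level) for level in (1, 2, 3, 4)]
wide_screen = display_pixels(32, 18, 3)  # assumed 16:9 display at Level 3
```

Because the tile count stays fixed while the pixels per tile double, moving between levels changes pixel density without changing which part of the image a tile covers.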

5.3.2 Construction of Displays from Tiles of the Appropriate Resolution

The table in Figure 5.3 provides a matrix of display aspect ratios and resolutions that can be derived from the full set of 32 x 32 tiles at each level. Since the tile size is a constant, each column represents a constant size display at four perceived levels of resolution.

The diagram in Figure 5.3 establishes several important relationships that provide a bridge to the past and illustrate how interoperability can be achieved:

The tile concept can similarly be applied to the manufacture of displays. In this case, a physical display tile would correspond to a conceptual tile and would have different physical sizes for different size displays and different pixel densities for different resolution requirements. Similarly, displays of different aspect ratios could be constructed by the selection of the appropriate conceptual tiles as shown in Figure 5.3.

Thus, using tiles and only four resolution levels, it is possible to construct a display for virtually every possible application; furthermore this display can also be used to show imagery from other levels of the hierarchy. This is especially practical if a scalable coding architecture is implemented that conforms to the same resolution progression.

5.4 A Family of Related Image Acquisition Rates and Display Refresh Rates

A family of image acquisition and display refresh rates should be based on a progression that permits non-interpolative transformations between the acquisition and display rates. This is easily implemented if the acquisition and display rates are the same, or if the display refresh rate is an integer multiple (or fraction) of the image acquisition rate.

Since significant archives of high resolution program material exist on film, which was acquired at 24 or 25 fps, one of these rates should be included in the progression. A progression based on integer multiples of 12 would include 12, 24, 36, 48, 60, 72, 96, 120 Hz, etc. A progression based on integer multiples of 12.5 would include 12.5, 25, 50, 75, 100, 125 Hz, etc. These progressions might also include integer fractions of 12 or 12.5 (e.g., 1/2 or 1/4 of the base frame rate for applications such as videoconferencing and searching of video databases).
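
The two progressions, and the integer multiple (or fraction) condition stated above, can be checked with exact arithmetic. This is a minimal sketch; the function names are illustrative and the condition is a direct reading of the preceding paragraphs, not a complete model of display behavior.

```python
from fractions import Fraction


def rate_family(base, multiples):
    """Rates formed as integer multiples of a base rate."""
    return [base * m for m in multiples]


def is_non_interpolative(acquisition_rate, display_rate):
    """True if the display rate is an integer multiple or integer fraction
    of the acquisition rate."""
    ratio = Fraction(display_rate) / Fraction(acquisition_rate)
    return ratio.denominator == 1 or ratio.numerator == 1


family_12 = rate_family(12, [1, 2, 3, 4, 5, 6, 8, 10])
family_12_5 = rate_family(Fraction(25, 2), [1, 2, 4, 6, 8, 10])

film_at_72 = is_non_interpolative(24, 72)  # each frame shown three times
film_at_60 = is_non_interpolative(24, 60)  # ratio is 5/2, not an integer
film_at_12 = is_non_interpolative(24, 12)  # every second frame shown
```

Exact fractions are used so that the 12.5 Hz family is tested without floating point rounding.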

It has been common practice in Europe to display 24 fps film at 25 fps for compatibility with PAL and SECAM; this results in a 4% speed increase. Many European programs produced for television distribution are acquired at 25 fps; if the family of rates is based on 24 fps, these programs would be played 4% slower. As indicated in Section 4.8, further research is required to determine the impact of choosing one of these rates on those industries that utilize film for image acquisition.
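
The speed changes quoted above follow directly from the ratio of the two rates. The sketch below shows the arithmetic; the 120 minute running time is a hypothetical value used only to illustrate the effect on program length.

```python
def speed_change_percent(acquired_fps, played_fps):
    """Percentage change in playback speed when the frame rate is changed."""
    return (played_fps / acquired_fps - 1) * 100


def running_time(original_minutes, acquired_fps, played_fps):
    """Running time after playing every frame once at the new rate."""
    return original_minutes * acquired_fps / played_fps


film_on_pal = speed_change_percent(24, 25)     # roughly 4% faster
video_at_24 = speed_change_percent(25, 24)     # 4% slower
shortened = running_time(120, 24, 25)          # assumed 120 minute film
```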

Ideally, compatibility with existing electronic imaging systems should be accommodated in the design of the standard modules that will interface these systems with the digital image architecture. By design, this would place the burden of compatibility on the systems that are being replaced rather than products that conform to the new architecture; thus the future will not be constrained by today's limitations.

In the process of developing the existing analog and digital high resolution television systems, the designers of these systems have demonstrated the practicality of such a modular approach to interoperability. A variety of translation devices have been demonstrated that allow interoperation between PAL, NTSC, HD-MAC and MUSE. The interface modules that will be required to transform the signals from these systems (especially NTSC and PAL) into the new architecture, offer the potential for large volumes. It is likely that the market for these modules will be characterized by intense competition, leading to a range of solutions at various price/performance levels.

In the near term the choice of a family of rates based on 12 or 12.5 Hz would provide optimally low cost and high performance for both advanced television and computer uses, as well as providing global interoperability. In the longer term, decoupling of acquisition, transmission and display is likely to lead to entirely new approaches to pixel replenishment that may render the current concept of image acquisition rates and display refresh rates meaningless.

5.5 Scalable Coding Algorithms

Scalable image decomposition offers the ability to produce image data in packages that can be combined to produce images at a variety of spatial and temporal resolutions. Decoding and displaying the lowest frequency image packets would produce an image at the first level of the hierarchy. Additional packets (encoded with spatial and temporal differences) would be decoded to produce images at higher levels of the hierarchy.
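
The idea of a base packet plus difference packets can be sketched on a single row of samples. This is a toy illustration in one spatial dimension, using pair averaging and sample repetition as assumed filters; it is not the method of any coding standard, it omits temporal differences, and it makes no attempt at compression.

```python
def upsample(base):
    """Repeat every base sample twice to return to the higher resolution."""
    return [value for value in base for _ in range(2)]


def encode(samples):
    """Split an even-length row into a low resolution packet and a
    difference packet."""
    base = [(samples[i] + samples[i + 1]) // 2 for i in range(0, len(samples), 2)]
    detail = [s - u for s, u in zip(samples, upsample(base))]
    return base, detail


def decode(base, detail=None):
    """Decode the base alone for the lower level, or add the differences
    to recover the higher level exactly."""
    if detail is None:
        return list(base)
    return [u + d for u, d in zip(upsample(base), detail)]


row = [10, 12, 20, 24, 30, 30, 8, 0]
base_packet, detail_packet = encode(row)
low_level = decode(base_packet)                  # half the samples
full_level = decode(base_packet, detail_packet)  # identical to the input row
```

A decoder that only understands the base packet still produces a usable image, while a more capable decoder adds the difference packet; applying the same split to the base packet again would yield a further, lower level.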

This approach enables extensibility. For example, the coding of low resolution imagery might remain unchanged to provide compatibility with existing decoders, while new coding methods, made possible by the geometric progression in computational hardware, can be introduced to support more advanced imagery. Increasingly powerful (and affordable) programmable decoders can provide compatibility with the standards that form the foundation of the digital image architecture, and the additional processing power required for future enhancements to the architecture.

6.0 Industries and Applications Considered

6.1 Industries and Applications

Industries are categorized by current market segments. It is important to keep in mind that convergences among existing industries will likely occur (e.g., computers and consumer electronics; audio, video, and datacomm), and as new opportunities to provide value products and services emerge, entirely new industry segments will undoubtedly come forth.

It is becoming difficult to draw the line, even today, between consumer electronics and computers. Today's video game machines, already in millions of homes, are marketed as consumer accessories to televisions, but are, in fact, more computationally capable than personal computers of only a few years ago. Similarly, personal computers are being marketed to the home market through traditional consumer electronics channels.

Traditional business factors should always be considered. These include equipment replacement costs, amortization, benefits, competition, market needs, and access to material.

Successful industry participants will both pay close attention to emerging trends and help to bring them about. Sometimes, deep pockets may be required to create a market. (It took years of major losses in both equipment and programming efforts before color television became profitable.) In contrast, agreement on a common architecture across a wide range of industries and applications would spread the costs and encourage early adoption.

The groupings used for this report help to relate application requirements to industries. It is well understood that there is already much overlap between industry groups and applications.

The industry groupings are as follows:

6.1.1 Entertainment Providers

Entertainment provider fields include programming, animation, games (personal and arcade), broadcasting, cinematography, post-production, theatrical presentation, and pre-recorded media.

The technologies used in these fields are highly dependent on downstream profits. It can be difficult to justify large investments (e.g., an HDTV production facility) in new technologies that can only be utilized by a small portion of their market. Smaller investments that require minimal infrastructure changes (e.g., MTS stereo, VHS-HQ) can be more easily justified, particularly when end-users can benefit with existing equipment or rapid upgrade is anticipated. Backward compatibility and extensibility are key issues here and can only be successfully violated when there are substantive benefits to the end user (e.g., audio compact disc).

Revenue streams can often be anticipated to flow well beyond the initial release of the product. Residuals from syndication, rentals, and sales require that providers anticipate future trends in end-user viewing equipment capabilities. This is one reason why most prime time television is shot on 35mm film and not video.

6.1.2 Distribution and Communications

The distribution and communication industries that will be affected by digital image systems include telephone, television broadcasting and cable TV, utilities, video conferencing, electronic mail (including text, data, image, animation, video, and sound), and mobile communications. Carrier channels that will play a yet-to-be-determined role in this process include optical fibers, broad and spot-beam satellites, microwave, cellular, conventional VHF/UHF terrestrial broadcast, broad band coaxial cable, and local and wide area networks. Also impacted will be video tape, video disc, game, and general software distribution.

There is some effort to establish a video dialtone similar, in concept, to today's voice telephone dialtone. As communication networks increase bandwidth, and compression technologies improve, an increased use of remote real-time visual communications can be expected.

These same advancements also facilitate rapid downloading of video information from media servers. At a 100:1 compression ratio, the data for a typical motion picture could be transmitted in a few minutes over a video-capable network.
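
The order of magnitude of this estimate can be checked with a simple calculation. Every input below except the 100:1 compression ratio is an assumption made for this sketch and is not taken from the report: a 720 x 486 picture at 30 frames per second and 16 bits per pixel, a two hour running time, and a 45 megabit per second link.

```python
def transfer_minutes(width, height, fps, bits_per_pixel,
                     runtime_seconds, compression_ratio, link_bits_per_second):
    """Minutes needed to send a compressed program over a link."""
    raw_bits = width * height * fps * bits_per_pixel * runtime_seconds
    compressed_bits = raw_bits / compression_ratio
    return compressed_bits / link_bits_per_second / 60


minutes = transfer_minutes(
    width=720, height=486, fps=30, bits_per_pixel=16,  # assumed picture format
    runtime_seconds=2 * 60 * 60,                       # assumed two hour film
    compression_ratio=100,                             # ratio quoted in the text
    link_bits_per_second=45_000_000,                   # assumed link capacity
)
```

With these assumed values the transfer takes between four and five minutes; a higher resolution source or a slower link would lengthen it in proportion.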

Because of the universal proliferation and conversion standards for the telephone, it is likely that we will soon see extensions of current fax standards including: voice fax (voice mail), high resolution color image fax, and video fax (video mail). One of the driving forces behind the development of the JPEG image compression standard was the need for an efficient data reduction technique for the transmission of still images.

The telecommunications industry is well down the road in the establishment of digital imaging standards. The CCITT, which controls fax standards, worked with the ISO on the JPEG standard and the videoconferencing standard, known as P.64 or H.261. These groups are also responsible for the MPEG family of moving picture standards. JPEG, MPEG I and P.64 form the basis for the first generation of image telecommunications products that are already starting to reach the market.

These standards were designed with a high degree of flexibility to deal with a variety of imaging applications; they have served as excellent examples for the Task Force in the areas of interoperability and scalability. Currently the MPEG group is working on extensibility; MPEG II is targeted for the delivery of higher quality motion image data streams in the range from two to forty megabits per second. The MPEG working group is investigating scalability as a requirement for this extension of MPEG. It would be beneficial for these new standards to relate harmoniously to other digital imaging architectures.

The merging of both broadcast and interactive voice, image (including graphics and video), text, and data across diverse transport media will create challenges in properly matching the information with the delivery mechanism. Current efforts to implement interactive television, for example, use differing transmission media for each direction (e.g., broadband in; telephone or cellular radio out).

Factors such as existing infrastructure, projected time and cost to deploy, bandwidth cost, regulatory issues, nature of the signal, target viewer, compression, error sources, localization, security, latency, etc., need to be considered.

The communications infrastructure deployed for the entertainment market could provide profound leverage for the information domain. For example, a broad consumer demand for access to high bandwidth entertainment (and other) services could accelerate the national installation of fiber-optic cables. Once in place, these high bandwidth networks could also be used as high performance links to super-computers and very large data bases, and broadly distribute real-time business, engineering, and scientific data.

While installation of fiber-optic cable to a major user base can take many years, new or existing satellites can cover huge population areas very quickly. A variation of direct broadcast satellite (DBS) transmission is spot-beam satellite technology. In this approach, as few as three satellites could be used to provide localized high quality (HDTV) signals to small inexpensive receiving devices in as many as 150 geographic areas within a country the size of the continental United States.

6.1.3 Professional Equipment Manufacturers

Equipment manufacturers who produce studio, production, storage and distribution, and test & measurement equipment will enjoy opportunities to provide their customers with new products and services that can be useful across a range of industries. Digital, extensible, scalable image architectures can provide high value per dollar and increased economies of scale.

The computer, medical, and graphics industries could similarly benefit from harmonious formats that would allow them to produce image generating, manipulating, managing, storing, and viewing applications and devices at reduced cost and increased interoperability.

Some specific industrial application areas include security equipment for surveillance and identification and product and process inspection.

6.1.4 Consumer Electronics Manufacturers

The introduction of digital technologies into consumer products opens the way to new and improved services and capabilities. As the consumer market increasingly demands higher image quality for both work at home (e.g., personal computers) and entertainment (e.g., televisions and video games), there will continue to be incentives to push the technologies that will bring a better picture to the consumer.

This will create opportunities in the receiving devices, the electronic components that go into them (e.g., semiconductors, light sources and modulators) and the subsystems (e.g., displays, tuners, and signal processors). The likely emergence of new product categories can both heighten and personalize the entertainment experience.

Ancillary devices (e.g., tape and disc recorder/players, camcorders, editing, processing, sound systems, printers, scanners, interactive peripherals) will be additional sources of added value products.

It is likely that computer control technologies will play an ever increasing role in home entertainment and information systems. The integration of all of the equipment listed in the preceding paragraph in the home entertainment environment has proven to be a major problem - and a significant opportunity. We have seen programmable remote control devices evolve to replace the profusion of separate infrared controllers (TV tuner, cable tuner, VCR, laserdisc, audio CD, radio tuner, etc.). The integration of the graphical user interface from the world of desktop computing with the home entertainment/information system has begun.

Collaborative cross-industry efforts will merge computers into home entertainment networks, dealing with the issues of component integration, connection to multiple sources of entertainment and information, user interface, and "user friendly" programming of the system. Various flavors of "personal computers" in the home will be able to connect to this network as well as intelligent appliances and remote control devices. Inexpensive networkable cameras will allow remote visual monitoring of the front door, the baby's room, etc.

6.1.5 Computers and Information

Human vision provides the highest bandwidth information interface to the machine world. Computer technology can serve as an effective enabling tool for image information creation, capture, processing, storage and archiving, access, transmission, and presentation. While computer assisted information in the 1980s largely focused on text, data, and simple graphics, rapid changes are taking place to support other media (audio, image, animation, video, simulations, etc.). This places increased demands on computer performance and human interface to handle the significantly higher data content in these media.

To provide specific types of information to users, new classes of specially tuned information appliances will likely develop. These appliances will rely on information providers to collect, generate, and organize information. In the education market, for example, an information appliance might be tuned toward providing everything a student needs to progress through a particular class. Besides basic course content, texts, lecture notes, assignments, etc., it could make extensive use of imagery to provide interactive multimedia tutorials, remedial help, lab simulations, extensive reference material, electronic messaging, and smart links to classmates.

In the information age, a critical challenge is the productive management of the overwhelming amount of information produced each year. Unfortunately, images and video tend to make this problem even greater. While database search engines deal reasonably well with keyword searches and inverted indexes on textual data, corresponding tools for other media have tremendous opportunities for improvement.

Museums and libraries could use electronic file systems to catalog and view very high resolution images of the masters. Sculptures and other three dimensional objects could be shown on stereographic or holographic displays, or printed on very high quality large format printers.

The role of the artist and graphics designer has changed dramatically as the quality and flexibility of the "electronic canvas" has come to emulate the various forms of traditional media. Just as the camcorder has allowed many budding cinematographers to explore their art, high resolution drawing tools with interactive training are revolutionizing electronic publishing and winning over graphic artists. Many artists are expanding into new markets such as videographics and animation from this electronic base.

Traditional forms of printed and published information delivery will continue to exist alongside newer media. Electronic billboards could change messages by day of week or time of day. Electronic books, magazines, catalogs, and advertisements can integrate interactive video and other media to tell a story, make a point, or sell a product. They can also elicit information from the user that can provide useful information to the publisher (e.g., "hard to understand this concept," "would like product in green").

6.1.6 Education

One strategy for promoting the use of digital image technologies in education is to leverage high volume consumer products. There is now a real opportunity to leverage scalable, interoperable, extensible consumer products into the classroom and other learning environments (e.g., lab, home, library, tutoring, group study).

Institutional training represents the high end of the educational market. An economic return on investment can often justify the use of expensive technology to maximize training "productivity" since the employee students are being paid wages while not working. Increased use of sophisticated interactive multimedia tools developed and used in these environments could find derivative use in public classrooms and the home.

6.1.7 Engineering and Science

Engineers and scientists have traditionally used the high end of graphics and imaging systems for data visualization, design, simulation and scientific visualization. This will likely continue as new uses expand into such areas as microscopy and astronomy.

This community has often utilized high-end versions of consumer technologies (e.g., TV CRT/Workstation CRT). Their role in leading versus leveraging the next generation of imaging systems is not clear. The existence of a proper digital image architecture will reduce barriers across applications, platforms, and markets.

6.1.8 Healthcare

Healthcare represents a growing cost concern for most industrialized societies. While a digital image architecture may not directly reduce costs, the judicious use of images and video can provide an improved cost/benefit ratio for physician training, medical research, and general patient care.

High resolution imaging can be useful in radiology, microscopy, patient monitoring (especially during surgery), and consultation with specialists in a remote location.

Image requirements can be very stringen