Available information parts inside this document:

Internal Information

Main Referee:

Norbert Gerfelder, Kerstin Pipke

State of Entry:

Complete

Remarks:

General Information

Type of Document:

SMPTE Task Force Report

Document Title:

Report of the Task Force on Digital Image Architecture

Date of Publication:

September 1993

Publisher:

SMPTE
595 West Hartsdale Avenue
White Plains, NY 10607-1824
USA

Primary Source / Published in:

SMPTE Publication
Report of the Task Force on Digital Image Architecture
September 1992

No. of Pages:

50

Source of Supply:

SMPTE
595 West Hartsdale Avenue
White Plains, NY 10607-1824
USA
Phone:    (+1) 914 761 1100
Fax:      (+1) 914 761 3115

Report of the Task Force on Digital Image Architecture

SOCIETY OF MOTION PICTURE AND TELEVISION ENGINEERS (SMPTE)

595 West Hartsdale Avenue
White Plains, New York 10607

PREFACE

For some time, the communities participating in the standardization activities of the Society of Motion Picture and Television Engineers have considered the role of television in the future of visual communications. In recent years, the debate has been joined by members of other communities affected by the convergence of communication and imaging technologies enabled by a common set of digital building blocks.

The emergence of digital coding as the common language of visual communications may fundamentally change our view of the world. The extent to which this common language will affect life in the 21st century may be even more profound than the effect that the medium of television has had on life in the 20th century. Television has provided a window to the world - often real-time - for many of the 5.4 billion inhabitants of this planet. This medium of cultural and information exchange has enabled previously isolated populations to join an emerging global village - one increasingly free of barriers. The common digital language offers a unique opportunity to leverage converging technologies, such as television, computers and telecommunications, into a global communications network. Such a network would have the potential to offer a vastly augmented range of services to all system users, thus opening up new markets to all of the affected equipment and service providers.

Worldwide, there is a growing consensus that the time has come to develop standards for the television systems based on a new paradigm - appropriate for today - with forethought to future requirements. The introduction of digital technology into imaging industries, together with the widespread introduction of digital communications, creates a window of opportunity to establish a digital image architecture with unprecedented freedom of application and interconnection.

This report examines some of the fundamental issues that must be addressed in achieving a compatible set of standards enabling a globally interconnected and interoperable visual communications network. The essential concepts for this family of standards include: an open (non-proprietary) system architecture, interoperability, scalability and extensibility. It is hoped that this Report will stimulate the interest of many groups and organizations involved in the establishment of imaging standards, today and in the future, and lead to agreement on a single system, flexible enough to accommodate a wide variety of needs, while enabling worldwide interoperability.

The report was prepared by the SMPTE Task Force on Digital Image Architecture and is responsive to the Work Assignment, dated April 1991, which established the following objective:

The Task Force, early in its considerations, identified the need to expand the Work Assignment to include all relevant aspects of digital image systems - acquisition, processing, storage, transmission, reconstruction and display - and to consider systems across a much wider range of resolutions than previously planned. This was agreed to and is reflected in this Report. Requirements and constraints noted in SMPTE/IEEE/ATSC cosponsored digital system information exchange meetings have also been incorporated as appropriate.

The Report is, in essence, the outcome of a feasibility study concerning the creation of standards for digital image systems that are scalable and extensible, effecting a high level of interoperability between a diverse range of industries and applications. The work is, as yet, incomplete; however, it has already established an important though preliminary basis for a family of digital imaging standards. The Report raises many new questions and identifies additional work required to refine the concepts that form the basis of a digital image architecture. Of particular importance will be the selection of source and display refresh rates to provide performance and economic compatibility with today's television systems.

The concepts outlined can provide a basis for a modular open system architecture, in which the parameters and characteristics for each module, and the interfaces between these modules, are clearly defined and in the public domain.

Such a system should use common standard components to serve diverse needs across all affected industries. It should enable the movement of image data across application and industry boundaries without degradation and with minimum complication. This is interoperability.

Such a system should also provide the ability to adjust image parameters - temporal and spatial resolution, colorimetry and dynamic range - by varying the amount of data that is stored, transmitted, received, or displayed. This is scalability.

A digital image architecture must give forethought to evolution - to incorporate advances in technology within any module, without changes to any other module. It must be backward compatible with today's systems, and forward enabled to accommodate the technology explosions of the 21st century. This is extensibility.

The Report was prepared by a Task Force chaired initially by David Trzcinski (PictureTel) and latterly by Dr. Will Stackhouse (Jet Propulsion Laboratory), with wide participation from the computer, television, post production and telecommunications industries. A detailed list of the membership follows. The Report was considered by the SMPTE Standards Committee at its meeting of August 13th, 1992 and subsequently adopted after an in-depth review.

List of Members and Participants

Members

Will Stackhouse (Chair)           JPL
Walter Bender                     MIT
Craig Birkmaier (Editor)          PCUBED
Rita Brennan                      Apple Computer
Wayne Bretl                       Zenith
Barry Bronson (Co-chair)          Hewlett-Packard
Ken Davies (Ex Officio)           CBC, SMPTE
Gary Demos (Co-chair)             DemoGraFX
Hugo Gaggioni                     Sony
Bill Glenn                        Florida Atlantic University
Bob Keeler                        AT&T, Bell Labs.
Thomas Leedy                      NIST
Peiya Liu                         Siemens
Lee McKnight                      MIT
Robert Powers                     MCI Telecomms
Tom Meyer                         Duir Assoc.
Alan Reekie                       European Community (CCE)
Richard Solomon                   MIT
Arpad Toth                        Kodak
David Trzcinski                   PictureTel
Mitchell Wade                     DemoGraFX
Ken Yang                          Ampex

Participants

 
Stan Barron                       NBC, SMPTE
Si Becker                         SMPTE
Rex Buddenberg                    Consultant
Robert Burroughs                  Panasonic
David Carver                      MIT
Peter Dare                        Sony
Phil Dodds                        IMA
Charles Fenimore                  NIST
David Fibush                      Tektronix, SMPTE
Paul Fleischer                    Bellcore
Branko Gerovac                    DEC
Barry Gilbert                     Mayo Foundation
Christopher Hamlin                Apple Computer
David Herbine                     NADC
Clark Johnson                     Consultant
Thomas Leeder                     NIST
Bijoy Khandheria                  Mayo Foundation
Edward Krause                     General Instrument
Arvid Larson                      IEEE-USA
Derrick Lattibeaudiere            Panasonic
Richard Lau                       Bellcore
Bernard Lechner                   Consultant
Michael Liebhold                  Apple Computer
Henry Meadows                     Center for Telecomm. Research
Francois Michaud                  CBC
Marvin Mitchell                   Mayo Clinic
Robert Morrow                     USAF Academy
Robert Myers                      Hewlett-Packard
Suzanne Neil                      MIT
Bruce Penney                      Tektronix
Ken Phillips                      Citicorp
Ed Post                           Quark
Charles Poynton                   Sun
Glenn Reitmeier                   DSRC
Robert Sanderson                  Kodak
William Schreiber                 MIT
Scott Silver                      Tektronix
John Sprung                       Viacom
David Staelin                     MIT
Peter Symes                       Grass Valley Group
David Tennenhouse                 MIT
Greg Thagard                      CST
Mark Urdahl                       IBM
John Weaver                       Liberty Television
Merrill Weiss                     Consultant

International Participants

Norbert Gerfelder                 Fraunhofer Computer Graphics, ISO/IEC
Rainer Hofmann                    Fraunhofer Computer Graphics, ISO/IEC
Detlef Kroemker                   Fraunhofer Computer Graphics, ISO/IEC

1.0 Executive Summary

The SMPTE Task Force on Digital Image Architecture was charged with developing and proposing a structure for a hierarchy of digital image standards that would facilitate interoperation of image systems. The major objective was to establish the basis for image systems that are open, scalable and extensible, thus meeting the perceived needs for image communications in the environment likely to exist as computers, television and communications converge, enabled by pervasive digital technology.

The Task Force, formed from representatives of the affected industries and applications, has examined the issues, setting out those that are believed critical at this time, and has modelled, for discussion, further refinement and testing, one possible approach that meets the basic requirements. It has also produced extensive tutorial information concerning the matters under consideration.

The Key Concepts of the approach are defined in Section 2, setting the conditions for image systems that are:

Such systems would be based on a hierarchy that is:

Current and future image systems are presented and analyzed in Section 3.0 of the Report, which also states the main objectives of the Task Force activity:

Section 3.0 of the report establishes the fundamental concepts upon which a model for an open digital image architecture can be constructed, taking into consideration the objectives defined above.

Section 4.0 details the critical issues in the development of a suitable image architecture meeting the stated objectives:

It is believed that this approach will result in systems that achieve a good level of compatibility with current television and imaging systems, while placing a minimum of constraints on the path to the future (extensibility).

A model of an open architecture approach to image standards is developed in Section 5.0, one that is both compatible with the present and extensible to the future. It is based on a low order hierarchical approach, using image tiles. The model defines four levels of resolution and takes account of a number of possible aspect ratios currently in use. Additional analysis is provided regarding the selection of an appropriate family of image acquisition rates and display refresh rates. Finally a scalable coding approach is proposed that offers the ability to produce image data in packages that can be combined to produce images at a variety of spatial and temporal resolutions.

The work of the Task Force is expected to be of interest across a wide range of industries and applications. Section 6.0 examines the industries likely to be most affected, their specific imaging needs and the possible impacts of a defined digital image architecture.

In Section 7.0 the Task Force suggests additional work that must be completed to move towards a full implementation of the digital image architecture. The list of suggestions included in Section 7.0 is not exhaustive; it is recognized that in the process of validating the architectural concepts, additional areas for further analysis will be identified. An extensive list of questions is included which should be considered in the process of establishing standards for an architecture.

The suggestions include the following items of high priority:

A considerable amount of background and tutorial material was developed during the preparation of the Report. Some of it is believed to be of value generally or for reference in future work on the development of the digital image architecture. This material is included in Section 8.0:

2.0 Key Concepts

2.1 Introduction

As a starting point in the process of developing and communicating the requirements for a digital image architecture, it is important to establish a clear definition of the key concepts upon which the architecture is to be based. In many cases, existing definitions must be enhanced to bridge the gap between current practice and future requirements embodied in the architecture.

Two reference documents were utilized in the process of creating the definitions which follow:

Definitions obtained from the IEEE Dictionary are presented in "quotations" - they provide a reference point for the expanded definitions developed by the Task Force. Definitions presented in the Report of the SMPTE Task Force on Headers/Descriptors proved to be incomplete for the needs of this report, due to the expanded Work Assignment for the Task Force on Digital Image Architecture. While the definitions in this Report are consistent with the earlier work of the Task Force on Headers/Descriptors, they provide an expanded understanding of the key concepts for a digital image architecture.

2.2 Digital Image Architecture

A system architecture defines "the structure and relationship among the components of a system".

One of the major objectives of this Report is to define a system architecture which promotes sharing of images and equipment across applications and industry boundaries. To achieve this goal, the digital image architecture must be highly flexible to deal with a variety of diverse requirements, including the evolution of technology.

A Digital Image Architecture should be an open system, that is, one made up of functional modules with standard, public interfaces which can be assembled into a functional system: "a set of interconnected elements constituted to achieve a given objective by performing specified functions." Explicit objectives of the architecture include:

The Digital Image Architecture supports both natural and synthetic imagery including:

A key feature of the architecture is that it allows decoupling of the system into functional modules. The functional modules of the architecture are:

2.3 Interoperability

Interoperability is the sharing of images and equipment across application and industry boundaries. When dealing with digital image representations, this sharing should be facilitated without degrading image quality due to transformations in temporal and spatial resolution, grid geometry, and image aspect ratio.

This requires careful attention to the definition of the interfaces -- the shared boundaries -- between the functional modules.

The key interface definitions are:

2.4 Hierarchy

A Hierarchical digital image architecture is one in which various levels of performance are supported:

The architecture is hierarchical in order to address the requirements for scalability and extensibility.

2.4.1 Scalability

To scale is: "To change the quantity by a factor in order to bring its range within prescribed limits."

Scalability deals with the ability of an imaging system to adjust the level of performance by varying the amount of data that is stored, transmitted, received, or displayed -- up to the maximum resolution that was originally acquired. A number of specific definitions are implied:

2.4.2 Extensibility

Extensibility in the design of an hierarchical digital image architecture allows the system to evolve with advances in the underlying technologies so that additional levels of performance can be implemented, without rendering obsolete those existing products that conform to the basic requirements of the imaging hierarchy.

Extensibility implies designing evolution into the system. The transmission and display modules of the system should be cast as building blocks. The building blocks, because of their inherent modularity, may freely evolve over time.

3.0 Analysis of Imaging Architectures

3.1 Establishing a Framework for Analysis

For more than two decades, the application of digital processing techniques has contributed to the evolution of analog composite television systems, especially in the areas of video recording, image processing, and image synthesis. This evolutionary use of digital technology had little effect on the perception of imaging systems; from this perspective many observers believed that digital video would gradually replace analog video without any fundamental changes to the foundation of imaging systems.

However, in the past few years the evolutionary view of imaging systems has been challenged. At the 26th Annual SMPTE Advanced Television and Electronic Imaging Conference, John Watkinson suggested that we analyze the impact of digital technologies from another perspective: "To think that digital technology only impacts the underlying equipment and that otherwise it's business as usual is to miss the larger transformation that is occurring in each of the affected industries."

From Watkinson's perspective, the transition to a new digital imaging architecture represents the opportunity for a new paradigm. Proponents of this position have encouraged system designers to step back and take a global view of the impact that digital technologies are having on every industry that deals with electronic imaging; to think not just in terms of delivering ever-improving levels of image quality, but to consider what being digital really means.

John Naisbitt, in his 1982 best seller Megatrends: Ten New Directions Transforming Our Lives, stated that new technologies go through three phases as they become part of our daily lives. Applying Naisbitt's model to the evolution of electronic imaging systems leads to the following three paradigms:

From the new perspective, being digital deals with the shift to the third paradigm. It is the enabling technology that has made it possible for this Task Force to analyze the requirements for interoperability, scalability and extensibility, and to propose a set of guidelines to accomplish these goals. What are the aspects of being digital that have brought about this transformation in perspectives?

A major factor has been the geometric progression in computer processing capabilities - doubling computational power every two years, with little change in cost or size. This progression is projected to continue well into the next century. As a result, high resolution still image processing capabilities are now within reach of every computer user. Techniques once reserved for high-end workstations are now commonly applied in desktop computing, including the recent addition of full motion video as a data type.
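The compounding effect of such a progression is easy to underestimate. The following minimal sketch simply evaluates the doubling rule stated above; the function name and the sample time spans are illustrative assumptions, not figures from the Task Force.

```python
def growth_factor(years: float, doubling_period_years: float = 2.0) -> float:
    """Relative computational power after `years`, assuming one doubling
    per `doubling_period_years` (the two-year figure quoted in the text)."""
    return 2.0 ** (years / doubling_period_years)


if __name__ == "__main__":
    # Illustrative spans only: one doubling period, one decade, two decades.
    for years in (2, 10, 20):
        print(f"after {years:>2} years: x{growth_factor(years):,.0f}")
```

Under this assumption a decade corresponds to five doublings, a factor of 32, and two decades to a factor of 1024.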

Video has also been a major beneficiary of the technology progression. Production systems that only a decade ago required a six foot rack of electronics can now be implemented in a few rack units - or on a few cards that plug into a personal computer.

The tremendous increase in computational power has enabled another critical aspect of being digital - video encoding based on the use of digital compression techniques to reduce the required data rate. A variety of compression technologies have evolved that remove image redundancy within and between video frames. The required data rate may also be significantly reduced by more efficient coding of the image at the source. Developments of such techniques are progressing rapidly and may become useful in the near future.

While compression technology has existed for many years, and continues to evolve, practical implementations for video have only become possible in the past few years due to the rapid evolution of digital processing technologies. This in turn has stimulated new research into scalable video encoding techniques that will allow multiple levels of image quality to be extracted from a single image data stream. Some observers predict that the processing power required for the decoding of scalable digital video streams will be universal and inexpensive before the end of this decade.

Improvements in data compression perform the same function as increases in bit carrying capacity in the communications system - delivery of more bits to the user. In the past decade, increases in communications capacity of several orders of magnitude have occurred.

In such an environment, the longevity of new equipment purchases may be dependent upon a digital image architecture that is designed with adequate provisions for extensibility. To meet this objective the Task Force has focused its attention on three areas:

3.2 Properties of Human Visual Perception

The human visual system deals with the physical world both in terms of its ability to resolve image detail (spatial resolution), and changes in the environment (temporal resolution). We experience the world visually by capturing light directly from a source, or as the reflections of light off of objects in our physical environment. The resulting perceptions of the environment are typically described in terms of size, shape, brightness, color, depth, direction, and speed. These qualities arise in the brain's image processing circuitry; essentially they result from a comparison of the acquired visual cues with what we have learned about the world's intrinsic structure.

As research has revealed more about the physiology of vision, prevailing theory has evolved, placing major emphasis on the computational and cognitive role played by the brain and local image receptors. In turn, this research is providing potentially valuable input to the designers of digital imaging systems.

3.2.1 Human Image Acquisition

The human visual system relies on multiple image receptors to deal with the diversity of environments that it encounters: cones are utilized for color image acquisition over a wide range of illumination levels; rods are utilized for monochrome image acquisition over the lower range of illumination levels.

The eye contains approximately two million cones and 120 million rods. The cones are organized into three broad groups of receptors that are sensitive to light in specific spectral bands; while these bands have significant overlaps, they roughly conform to the red, green, and blue portions of the spectrum. Red and green receptors each outnumber blue receptors by a factor of two to one. The dispersion of these receptors is not uniform, thus spatial perception deals with a complex matrix of receptor types and cognitive processing by the brain.

The center of the visual field, an area called the fovea, contains 30,000 to 40,000 cones and no rods. Outside the fovea the density of cones diminishes, interspersed among the high density rods. The cones within the fovea are responsible for high spatial detail perception while the extrafoveal cones and rods play an important role in visual search and influence directed eye movement. Central vision enables us to see detail, while peripheral vision is attuned to change.

Although high spatial resolution vision is restricted to the fovea, the visual system acquires high resolution images over a wide portion of the field of view. This is achieved through involuntary eye movements: high frequency tremor, slow drift, and rapid saccade.

Research has determined that it takes several hundred milliseconds for the eye to acquire a high spatial resolution image, synthesized from a number of overlapping views. Slow drift and rapid saccade are the mechanisms used for repositioning the fovea to acquire these multiple impressions. The tremor appears to be a mechanism to remove high frequency spatial noise. The tremor's oscillation occurs at a frequency range of 40 to 80 Hz over an area approximately equal to the size of a single cone.

Since human vision is binocular, involuntary eye movements also contribute to depth perception: the brain processes these overlapping views to obtain differences from which depth and spatial properties are inferred.

The spatial resolution of moving objects is also linked to eye movement:

3.2.2 Human Visual Processing

Much of the research in visual science today is focused on the processing of data acquired by the image receptors. A variety of specialized analyzers in the eye process data from small localized regions and accumulate the results into channels which are processed by the brain to create an integrated view of the physical environment.

There is evidence that the brain directs the activity of the image receptors for processes such as establishing white balance and light sensitivity levels. Simple localized analyzers are used to enhance the data transmitted back to the brain. Some of these analyzers are sensitive to a particular edge orientation; there are sufficient analyzers at each location to represent a full set of edge orientations. Additional tuned analyzers cover portions of the range of human sensitivity for spatial frequency, spatial position, temporal frequency, direction of motion, and binocular disparity.

The data processed by these analyzers moves to the brain through two types of channels: a set of fast responding channels with relatively transient responses to stimuli, and a set of slower channels with relatively sustained responses to stimuli. Transient channels process the output of analyzers that are tuned for low spatial and high temporal frequency stimuli. Sustained channels process the output of analyzers that are tuned for high spatial and low temporal frequency stimuli.

3.2.3 Thresholds for the Perception of Flicker

Above certain frequencies, flickering light sources will appear as a continuous light source. The relevant frequency is called the critical fusion frequency and varies with the level of illumination. Separate flicker thresholds exist for the transient and sustained processing channels.

Transient channels are sensitive to flickering light sources with low spatial resolution; this type of stimulation appears as wide-area flicker and is most noticeable in peripheral vision. At low levels of illumination (where rod vision is used) flicker fusion occurs at frequencies of only a few Hz; as the level of illumination increases and cone vision is triggered the fusion frequency increases.

Flicker from low light level sources such as a television or movie screen typically disappears in the range of 20 to 60 Hz. As screen size increases, taking up a larger portion of the field of vision, or if screen brightness increases, the frequency for flicker fusion increases.

Sustained channels are sensitive to flickering light sources with high spatial resolution; this type of stimulation appears as small-area flicker, often associated with moving objects. In this case the flicker fusion frequency can be much higher than for wide-area flicker; this form of flicker manifests itself as strobing of the object.

An excellent example is found in the single pixel horizontal lines often used in computer graphics. These lines do not appear to flicker on a progressive scan computer display which is refreshed at rates above 60 Hz; but if the same image is presented on an interlaced video display the single pixel lines are presented in every other field (at 30 Hz) and they flicker. This is because the persistence of the display phosphor is shorter than the refresh interval; higher scanning rates (either progressive or interlaced) eliminate the flicker.
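The arithmetic behind the single pixel line example can be sketched as follows. This is a minimal illustration under the simplifying assumption of a 2:1 interlace, in which each scan line belongs to exactly one of two alternating fields; the function name is hypothetical.

```python
def line_refresh_hz(vertical_rate_hz: float, interlaced: bool) -> float:
    """Rate at which any single scan line is redrawn.

    Progressive scan redraws every line on every vertical sweep.
    2:1 interlace (assumed here) redraws a given line only on every
    other sweep, so its refresh rate is half the field rate.
    """
    return vertical_rate_hz / 2 if interlaced else vertical_rate_hz


if __name__ == "__main__":
    print("60 Hz interlaced :", line_refresh_hz(60, interlaced=True), "Hz per line")
    print("60 Hz progressive:", line_refresh_hz(60, interlaced=False), "Hz per line")
    print("72 Hz progressive:", line_refresh_hz(72, interlaced=False), "Hz per line")
```

A one-pixel-high line on a 60 Hz interlaced display is therefore redrawn at 30 Hz, which falls inside the 20 to 60 Hz range quoted above for flicker fusion, while the same line on a progressive display is redrawn at the full vertical rate.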

3.2.4 Tuning Electronic Imaging Systems to Match Human Visual Perception

Our improved understanding of human visual perception together with an exponential improvement in electronic image processing techniques has set the stage for the design of a new digital image architecture.

In order for a new digital image architecture to be interoperable it must deal with existing imaging technologies. This requirement can place many constraints on the design of the architecture. It is important to understand the reasons that these constraints exist to determine if the new architecture must be similarly constrained.

3.2.5 The Elimination of Flicker on Scanning Displays

Information is presented in Section 4.3 which suggests that the refresh rate of scanning CRT displays should be linked to the field of view and brightness of the display. Lower refresh rates are acceptable when the display covers a narrow field of view, as is the case with our existing analog composite video delivery systems. Lower refresh rates are also acceptable for a display with a wide field of view at low brightness levels; typically this type of display requires a viewing environment with low ambient light levels such as a theater.

As the display covers a wider field of view at higher levels of brightness, the refresh rate must be increased to eliminate wide-area flicker. If information with high frequency edges, such as computer generated text and graphics, is presented on the display, it must also be refreshed at a higher rate. The computer industry uses progressive scanning with refresh frequencies above 60 Hz to eliminate flicker; larger displays (>= 16 inches diagonal) are typically refreshed at 72 or 75 Hz.

The same requirements for the elimination of wide-area flicker are now starting to influence the development of display systems for home entertainment. At the higher end of the home entertainment market it would be desirable for displays to provide a 50 degree field of view, and be viewable at normal room ambient light levels. Such a display has resolution and refresh requirements nearly identical to a large personal computer display.

3.2.6 Constraints that Dictated the Use of Interlace Scanning Technologies in Analog Composite Video Systems

Several factors influenced the decision to use interlace scanning techniques for acquisition and display when our composite video systems were designed:

Both interlaced and progressive scanning were evaluated; interlace proved to be the best solution to reduce signal bandwidth and minimize flicker in the display.

3.3 Models for the Design of an Imaging Architecture

As the design of a new digital imaging architecture is approached, it is important to take into account all the applications and industries that may utilize the architecture as well as the economic contributions of each in the development and purchase of the system components (see Section 6.0). Experience with analog television has amply demonstrated the value of "economies of scale". The opportunity now exists to design an open digital imaging architecture that is based on generic, inexpensive, and increasingly powerful processing elements.

The choice of a Digital Image Architecture has implications that reach far beyond the normal realm of standards-setting activities. Telecommunications, television, and computing have made major impacts on life in the 20th century -- their integration is likely to have a profound effect on the way that the world communicates, is educated, works, plays and relaxes in the next century.

3.3.1 Components of the Model - Resolution

The level of resolution perceived by a viewer is a function of the distance of the viewer from the display. Thus, to design a digital image architecture that provides constant perceived resolution across applications that involve different viewing distances (e.g., close for a computer display, further away for a conventional TV screen, still further for a large flat-panel display), the system must be scalable in terms of image resolution.
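The dependence on viewing distance can be illustrated with a little geometry. The following is a minimal sketch, not a method given in this report: it assumes a target of about 60 pixels per degree of visual angle (roughly one pixel per arcminute, a commonly quoted figure for normal visual acuity), and the function names and the display sizes are hypothetical.

```python
import math

# Assumption (not from the report): about 60 pixels per degree of visual
# angle, i.e. roughly one pixel per arcminute.
PIXELS_PER_DEGREE = 60.0


def field_of_view_deg(display_width, viewing_distance):
    """Horizontal angle subtended by a flat display viewed on-axis.

    Both arguments must use the same unit (for example metres).
    """
    return math.degrees(2.0 * math.atan(display_width / (2.0 * viewing_distance)))


def pixels_needed(display_width, viewing_distance, ppd=PIXELS_PER_DEGREE):
    """Horizontal pixel count giving about `ppd` pixels per degree on average."""
    return math.ceil(field_of_view_deg(display_width, viewing_distance) * ppd)


if __name__ == "__main__":
    # The same 1 m wide display viewed from three hypothetical distances.
    for distance in (1.0, 2.0, 4.0):
        fov = field_of_view_deg(1.0, distance)
        print(f"{distance:.0f} m: {fov:5.1f} deg, {pixels_needed(1.0, distance)} pixels")
```

Moving the viewer further away shrinks the angle the display subtends, so fewer pixels are needed for the same perceived resolution; this is the sense in which the architecture must scale.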

In addition to holding perceived resolution constant under varying viewing distances, it is considered desirable to provide even greater resolution in some applications, as discussed below and as implemented in current proposals for advanced television systems.

While it would be desirable to design an imaging architecture in which resolution could be scaled in a continuous fashion, a hierarchy based on a progression of related image resolution levels can provide similar benefits to system designers and simplify the process of interoperation. Section 5.2 and Section 5.3 provide a detailed analysis of the variables that affect the perceived resolution of a display and illustrate the principles of a hierarchical digital image architecture with a progression of four image resolution levels.

Throughout this report, the concept of a multi-resolution hierarchy will be discussed and refined. The Task Force has constructed a model to facilitate this discussion. It is recognized that many different sets of numbers can be used within this model. Four levels of resolution have been identified and defined; additional levels can be added to the progression, as enabling technologies allow support for higher levels of resolution. The four levels in the model are:

The concept of interoperability first appeared in the early days of television because of the need to integrate film material into the television program content. Unfortunately, film and video were in many respects incompatible. Elaborate shuttering mechanisms were developed for the telecine to make it possible to display film in the world of video; thus the concept of interoperability was born. For 525/60 the compromise was the use of 3:2 pull down, to accommodate the change from 24 to 30 fps (frames per second). The solution for 625/50 was easier - a 4% speed change, playing programs acquired at 24 fps at the television rate of 25 fps.
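The two frame rate conversions mentioned above can be sketched in a few lines. This is an illustration only: the function names are hypothetical, the cadence shown starts with the three-field frame (the opposite phase is equally valid), and the nominal 30 fps figure from the text is used.

```python
def pulldown_3_2(film_frames):
    """Map film frames to interlaced video fields using a 3:2 cadence.

    Frames alternately contribute 3 fields and 2 fields, so every 4 film
    frames become 10 fields (5 interlaced video frames).
    """
    fields = []
    for index, frame in enumerate(film_frames):
        repeats = 3 if index % 2 == 0 else 2
        fields.extend([frame] * repeats)
    return fields


def pair_fields(fields):
    """Group consecutive fields into video frames (two fields per frame)."""
    return [tuple(fields[i:i + 2]) for i in range(0, len(fields) - 1, 2)]


if __name__ == "__main__":
    fields = pulldown_3_2(["A", "B", "C", "D"])
    print(fields)
    print(pair_fields(fields))
    # 625/50 alternative: play 24 fps material at 25 fps.
    speedup = 25 / 24
    print(f"speed change: {(speedup - 1) * 100:.2f}%")
```

One second of film (24 frames) yields 60 fields, which is 30 interlaced frames. Some of the resulting video frames mix fields from two different film frames, which is one reason the conversion is a compromise rather than a clean mapping.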

The evolution of electronic image acquisition systems has been driven primarily by the mass market transmission standards -- NTSC, PAL and SECAM. New applications for video such as professional and personal video systems have been enabled through the economies of scale associated with these standards.

Thus, applications which require higher resolutions than those offered by NTSC, PAL and SECAM have either been forced to bear the expense of system development and low volume manufacturing - a luxury primarily reserved for the military - or to wait for the next imaging standard to evolve. It is interesting to note that the equipment developed for the various analog HDTV systems has seen extensive use in professional applications that need the added resolution afforded by these systems.

3.3.2 Components of the Model - Acquisition, Transmission and Display

The path through which an image passes from capture to display may involve as many as five major steps as shown in Figure 3.1. These steps are discussed in detail in Section 6.2.3 of this Report. As the imagery moves from one step to the next, it may be stored at one of several quality levels:

The first two steps are

and typically require production quality storage to preserve as much of the original imagery as possible for subsequent processing. After the processing steps have been completed the imagery may be stored at a lower level of quality for release to the distributor of the imagery; this is often referred to as contribution quality storage.

Delivering the imagery to the consumer typically involves the third step,

The imagery may be encoded and stored at a lower level of quality to conform to the transport characteristics; we refer to this as distribution quality storage.

Finally, the imagery must be decoded for display, requiring

The consumer may store the imagery for viewing at another time; this also requires distribution quality storage.

Some of these steps tend to be grouped with a specific level of storage quality, as illustrated in Figure 3.1. This allows a further simplification of the model based on three major system components - ACQUISITION, TRANSMISSION, and DISPLAY.

3.3.3 A Closed Architecture Model - Analog Composite Video

The transmission standards for the existing composite video systems frequently require all of these components to operate in close synchronism. The display is synchronized with live or taped program material that feeds the transmission system. Imagery acquired at other spatial or temporal resolutions requires conversion into the spatial and temporal specifications of the transmission standard. Such an architecture is depicted in Figure 3.2.

The advent of video recording provided a degree of decoupling of acquisition from the other components, allowing program producers to create program content without real-time constraints; however, transmission and display remain tightly coupled. Recording media for program content have typically been coupled to the transmission standard to take advantage of the bandwidth reduction techniques applied in the system. The design of consumer VCRs is based on compatibility with the transmission standard; packaged media played by the VCR must therefore conform to the same standard.

While interoperability between the various analog composite video systems has had to overcome differences in frame and line rates, these systems have been remarkably extensible. The acquisition, transmission and display components and the associated services of the system have evolved continuously over the past fifty years.

With the introduction of analog component video recording and processing systems in the '80s the video industry took a major step toward completely decoupling acquisition from transmission and display. The production community soon discovered the advantages of this decoupling.

By using analog component equipment for both acquisition and production, it became possible to edit video without concern for the multi-field color framing sequences that exist in subcarrier encoded composite video systems. Producers also discovered that fewer artifacts were introduced when layering video using component vision mixers and digital video effect systems. Decoupling of acquisition and production equipment from the encoded transmission standard produced far better results than could be achieved with composite video acquisition and production equipment - and the same video recorders also produced encoded outputs for transmission of the program.

3.3.4 An Open Architecture Model - Digital Hierarchies

In the '80s, the publishing industry experienced the collision of analog and digital technologies. Today, interoperability of media in the publishing industry is the rule rather than the exception, as digital image and document processing techniques, generally categorized under the umbrella of Desktop Publishing, have replaced traditional analog techniques.

To a large extent, the transition from the analog representations of printed media - type, line art, halftones, and color separations - to their digital counterparts has been enabled by the use of scalable hierarchies for the acquisition, transmission, and display of printed materials. The tools for acquisition and production of print media have been separated from the display hierarchy, allowing output at the desired level of resolution.

Electronic transmission is also beginning to play a major role in the publishing of documents. Compact representations of printed media using page description languages have allowed high quality print representations to be moved efficiently through the telecommunications network using low data rate modems. Remote printing of documents on fax machines or networked printers is commonplace.

The desktop publishing metaphor has been used as a model to predict similar transitions in other media industries, most notably Desktop Video. However, the transition has not occurred at the pace that many industry pundits have predicted. This is due, in large part, to the difficult task of breaking the problem up into manageable components, that is, creating separate hierarchies for acquisition, transmission, and display of motion imagery.

Interoperability of video systems with other media is facilitated by a complete decoupling of acquisition, transmission and display into separate hierarchies for each component. Such an architecture is depicted in Figure 3.3. Scalable representations of video will be enabled by this decoupling, and technological advances in one hierarchy can take place without upsetting the apple cart in the other two.

If a hierarchical digital imaging architecture is used as the model, a Digital Advanced Television System can be implemented that is equally adept at delivering low cost solutions that conform to a single point in the hierarchy and more expensive scalable solutions that support multiple points in the hierarchies.

The acquisition hierarchy can provide image capture solutions at various price/performance points that are appropriate for the application. Production systems can evolve that deal with single image formats, or multiple formats within the hierarchy. This is of particular importance to producers of program content with significant archival value. Imagery can be captured at a higher level in the acquisition hierarchy with an eye toward distribution at one or more of the lower levels of the transmission hierarchy; the archival value of the program is protected as it can be released at higher quality levels in the future as consumers purchase products at a higher level in the display hierarchy.

Viewing transmission as a hierarchy is critical to the concept of interoperability. A hierarchical imaging architecture would support a progression of image quality levels that are interoperable and extensible, and allow for incremental improvements in image quality within a single transmission standard. This requires the use of a scalable encoding structure; a core image would be encoded at the first level of the hierarchy, and enhancement information would be encoded for each of the higher resolution levels supported by the transmission standard.
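The idea of a core image plus enhancement information can be sketched with a simple resolution pyramid. The example below is a minimal illustration under stated assumptions, not the coding scheme of any proposed system: it works on a one-dimensional "scan line", is lossless, uses pair averaging and sample repetition as the resampling filters, and all function names are hypothetical. The input length must be divisible by 2 raised to the power (levels - 1).

```python
def downsample(samples):
    """Halve resolution by averaging adjacent pairs (length must be even)."""
    return [(samples[i] + samples[i + 1]) / 2 for i in range(0, len(samples), 2)]


def upsample(samples):
    """Double resolution by repeating each sample."""
    return [value for value in samples for _ in range(2)]


def encode_scalable(samples, levels):
    """Return (core, enhancements) for a `levels`-level hierarchy.

    `core` is the lowest-resolution signal; enhancements[k] holds the
    residual needed to step from level k to level k + 1.
    """
    pyramid = [samples]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    pyramid.reverse()  # lowest resolution first
    core = pyramid[0]
    enhancements = [
        [fine - coarse for fine, coarse in zip(pyramid[k + 1], upsample(pyramid[k]))]
        for k in range(levels - 1)
    ]
    return core, enhancements


def decode_scalable(core, enhancements, target_level):
    """Reconstruct at `target_level` (0 = core only), ignoring higher layers."""
    image = core
    for residual in enhancements[:target_level]:
        image = [c + r for c, r in zip(upsample(image), residual)]
    return image


if __name__ == "__main__":
    line = [10, 12, 20, 24, 30, 30, 8, 4]
    core, layers = encode_scalable(line, levels=3)
    for level in range(3):
        print(level, decode_scalable(core, layers, level))
```

A decoder that stops at level 0 never touches the enhancement layers, while a decoder that applies every layer recovers the original samples. A practical codec would also quantize and entropy-code each layer, which this sketch omits.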

A scalable encoding structure may be more difficult to design and possibly less efficient for a given quality level than an encoding designed specifically for that level. It has, however, several advantages that will accrue over time:

The economic benefits associated with scalable image encoding will be significant. The emerging consensus among experts in video compression technology is that scalability will carry a minor penalty for encoding overhead. Consider the impact on media server storage systems: a single scalable representation will make more efficient use of storage than multiple copies of the same material at different scales.
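The storage argument can be made concrete with some back-of-envelope arithmetic. Every number below is a hypothetical placeholder: the relative sizes of the four levels, the 10% overhead figure, and the assumption that a layered file costs about as much as its top level plus that overhead are chosen only to illustrate the comparison.

```python
def simulcast_storage(level_sizes):
    """Storage needed when every level is kept as a separate copy."""
    return sum(level_sizes)


def scalable_storage(level_sizes, overhead=0.10):
    """Storage for one layered file, assumed to cost the top level plus overhead."""
    return max(level_sizes) * (1.0 + overhead)


if __name__ == "__main__":
    # Hypothetical relative sizes for four resolution levels.
    sizes = [1, 4, 16, 64]
    print("separate copies:", simulcast_storage(sizes))
    print("single scalable file:", scalable_storage(sizes))
```

Under these assumptions the single scalable file is smaller than the set of separate copies as long as the overhead stays below the combined size of the lower levels.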

The display hierarchy allows for a variety of products to evolve at various price/performance points that are appropriate for the application. Some display systems will evolve to single performance levels while others will offer multiple levels of performance within the transmission and display hierarchies.

Scalability plays a major role in the design of decoder and display components. If the transmission system delivers a scalable payload, only that portion of the information which is required for the display system need be decoded. A small personal information system may only need the low resolution component while a high-end home entertainment system can utilize all of the resolution components.

3.4 Factors That Have the Potential to Fundamentally Change Digital Image Architectures

Real world constraints, especially with respect to cost versus performance, are the driving factors in the implementation of a digital image architecture. In determining the requirements for the architecture, the Task Force has analyzed the current market situation as well as technology and regulatory trends that may reshape the market in the next few decades.

3.4.1 A Shift in the Pricing Structure of Broadband Telecommunications

Changes in the regulatory climate are likely to cause increased competition among all networked service providers (telcos, cable, data networks, etc.), and encourage service providers to upgrade the quality and capacity of these networks.

The current pricing structure for broadband telecommunications is typically based on channel bandwidth - the purchaser uses and pays for the entire channel regardless of the amount of information moved through it. In the future, greatly increased channel bandwidth and packetized encoding schemes using headers/descriptors for packet identification will cause a shift in pricing structure - the purchaser will pay only for the information content that moves through the channel. This concept, when applied to video services, has been described as pay-per-view-per-bit.

This shift in pricing structure is likely to act as a catalyst for the rapid evolution of video compression techniques and transmission standards, with an emphasis on two areas:

During the transition to wide bandwidth communications channels, data rate reduction will be the driving force as the cost per bit will be relatively high. As the cost per bit declines, the emphasis will shift to scalability. This will be due largely to the market advantages of maintaining a single data file that can be delivered to a wide range of users at different levels of the display hierarchy.

3.4.2 Programmable Decoders

Another major trend that is anticipated is the evolution from fixed single standard decoders to programmable decoders that can adapt to scalable image representations. Single standard decoders will be used primarily for devices that tap into the communications network and deal only with one type of image representation. Programmable decoders will deal with families of standards. Fax machines serve as a good example of single and multiple standard encoder/decoders. The Group 1 fax standard provided a single level of resolution; machines were expensive and their use was limited. With the addition of Group 3 fax standards, multiple levels of resolution were supported, including the older Group 1 format. Due to advances in technology, the new machines were better and cheaper, yet compatible with the existing Group 1 machines. The marketplace responded in a very positive manner.

Programmable decoders will be the key component in providing extensibility to the digital imaging architecture. Because of the diversity of image compression standards (Group 3 fax, H.261, JPEG, MPEG, DVI, etc.), these decoders will play an important role in the integration of video and high resolution imaging with desktop computer workstations. This same diversity, with the addition of a digital television standard (or standards) will lead toward the use of programmable decoders in home entertainment and information delivery systems. Essentially fixed solutions will drive the low end of the market, providing inexpensive mass market consumer products, while programmable solutions will dominate at middle and upper levels of the transmission and display hierarchies.
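The difference between a fixed and a programmable decoder can be sketched as a dispatch table whose entries can be added after the device ships. This is a hypothetical illustration: the class name, the method names and the placeholder format label are assumptions, and the stand-in routine does no real image decoding.

```python
class ProgrammableDecoder:
    """Dispatches coded data to the routine registered for its format."""

    def __init__(self):
        self._routines = {}

    def install(self, format_name, routine):
        """Add or replace the decoding routine for a format."""
        self._routines[format_name] = routine

    def supports(self, format_name):
        """Report whether a routine is installed for this format."""
        return format_name in self._routines

    def decode(self, format_name, data):
        """Decode `data`, or fail clearly if the format is unknown."""
        if format_name not in self._routines:
            raise ValueError(f"no decoder installed for {format_name!r}")
        return self._routines[format_name](data)


if __name__ == "__main__":
    decoder = ProgrammableDecoder()
    # Placeholder routine standing in for a real decompression algorithm.
    decoder.install("demo-format", lambda data: data[::-1])
    print(decoder.supports("demo-format"), decoder.decode("demo-format", b"abc"))
```

A fixed decoder corresponds to a table with one permanent entry; extensibility comes from being able to call `install` for a format that did not exist when the device was built.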

3.4.3 Trends in Display Technology

The use of scanning CRT display technology for certain applications is expected to decline over the next decade as LCD based direct-view displays and projection systems are perfected. LCD displays are used extensively today in portable computers, and LCD light valves for high resolution projection are showing great promise. In a light valve, the LCD is used to control the amount of light - from a flicker-free light source - that can pass through each pixel location; since the display is no longer the light source, significant improvements in brightness can be achieved.

The characteristics of LCD displays are significantly different from flying-spot scanning CRT displays. Flying-spot systems must operate at refresh rates above the critical frequency for flicker fusion; display brightness is limited since the spot is the only source of illumination (most of the display is decaying at any point in time).

Every pixel in an LCD display receives constant illumination. LCDs can be characterized as having long persistence; in fact, a significant design challenge has been to provide faster pixel response to deal with full motion video. This has been accomplished through the use of a transistor at each pixel location (an active matrix display), providing rapid response for pixel replenishment.

The nature of the active matrix circuit also allows a pixel value to be held for at least one second without replenishment, giving the display characteristics similar to a frame buffer. Direct addressing of each pixel location would make it possible to update only those pixels which change from one refresh period to the next. Transmission systems that utilize digital compression techniques to eliminate interframe image redundancies may take advantage of these aspects of LCD displays to implement conditional replenishment.

3.4.4 Conditional Replenishment

A significant portion of the data rate reduction achieved by digital image compression techniques deals with the elimination of interframe redundancies. In essence, much of the complexity, and hence the cost, of these encoding systems involves the processing required to analyze motion image data streams to determine which pixels have changed between temporal samples.

Over the next 10 to 15 years image acquisition and display technologies are likely to move to conditional replenishment. Image acquisition systems may evolve with on-board digital processing to implement conditional image acquisition. These cameras will be programmable, offering several advantages over scanning cameras that continuously update the entire image raster, including the ability to:

Future display technologies are likely to evolve around direct view displays (possibly LCD) offered in different pixel densities. Direct addressing of LCD displays will allow the use of conditional refreshment of only those pixels that change from one refresh period to the next; the display itself may become the frame buffer, allowing portions of the image to be updated at different temporal rates. Or, combined with an appropriate multi-ported frame buffer design, such a display could support multiple temporal refresh rates simultaneously for different image streams.
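Conditional replenishment can be sketched as a comparison of two frames followed by a write of only the pixels that differ. The example below is a minimal sketch with hypothetical function names; frames are plain lists of rows of integer pixel values, and the optional threshold is an assumption added to show how small changes might be ignored.

```python
def changed_pixels(previous, current, threshold=0):
    """List (row, col, value) for pixels whose change exceeds `threshold`."""
    updates = []
    for r, (prev_row, cur_row) in enumerate(zip(previous, current)):
        for c, (old, new) in enumerate(zip(prev_row, cur_row)):
            if abs(new - old) > threshold:
                updates.append((r, c, new))
    return updates


def apply_updates(frame_buffer, updates):
    """Write only the updated pixels into the frame buffer, in place."""
    for r, c, value in updates:
        frame_buffer[r][c] = value
    return frame_buffer


if __name__ == "__main__":
    previous = [[0, 0, 0], [0, 0, 0]]
    current = [[0, 5, 0], [0, 0, 9]]
    updates = changed_pixels(previous, current)
    print(updates)
    print(apply_updates([row[:] for row in previous], updates))
```

Only the update list needs to be transmitted or written; pixels that did not change are held by the display, which is the frame buffer behaviour described above.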

4.0 Critical Issues

4.1 Introduction

The Task Force has identified seven issues that are considered critical to the achievement of the objectives. Many of these issues are, by their nature, complex.

Backward compatibility to existing systems and extensibility to future systems present many technical challenges. The greatest challenge lies in preserving the value of existing infrastructures while enabling an orderly transition to the new architecture. For example, immense investments have been made in the acquisition and transmission infrastructures of our existing NTSC, PAL and SECAM television systems. Likewise, billions of consumers have invested in receivers and video recorders that support these systems. It is equally critical that investment in the vast archives of information and entertainment programming that exist today on film and video be protected, and that the new architecture unlock the economic potential of these archives.

In deliberating on these critical issues, every effort has been made to balance the interests arising from those investments with the future benefit to all of a single global standard. These deliberations have also taken into consideration the installed base of computer, medical, engineering and scientific imaging systems, and the diverse applications for still imaging in electronic publishing, visual databases and communications. Existing systems that demonstrate interoperability and extensibility - including some which have in fact been extended - were considered. Examples include the French Minitel system and the family of international facsimile standards.

The seven critical issues are:

4.2 The Establishment of Scalable and Interoperable Hierarchies for Basic Image Parameters

An ideal digital image architecture would allow the following image parameters to be independently varied, over a range of appropriate values:

While this independence may be technically feasible within the fifty year life span desired for the first digital imaging architecture, it does not appear to be practical for immediate implementation, nor is it required. The choice of an appropriate hierarchy for each of these parameters can provide adequate degrees of freedom for system design, while facilitating affordable, high quality transcoding between the levels in each hierarchy.

Scalable and interoperable hierarchies offer many benefits when communications channel issues are considered. Such an approach promotes effective utilization of existing communications channels and the development of new broadband communication services. The lower levels of the hierarchy provide solutions for the capacity constrained channels that exist today. The introduction of new broadband communications services will enable the use of higher data rates to support the improved performance available at higher levels in each hierarchy.

A digital image architecture that provides interoperability across applications with different spatial resolution requirements must be scalable in terms of resolution, as discussed in Section 3.3. Interoperability also requires a family of related image acquisition and display rates. The greatest benefit, in terms of cost and simplicity, is gained when the display operates at the same rate as, or an integer multiple of, the image acquisition rate. Though more expensive to implement, the greatest performance benefit is gained when motion compensation techniques are used in encoders/decoders to create in-between frames for display. Section 5.4 discusses the requirements for such a family.

To facilitate this hierarchical approach to a digital image architecture, a scalable approach to image coding is required. Furthermore, improved techniques for video compression are likely to be enabled by the geometric progression in computational hardware. The design of the architecture must make provisions for this progression. Section 5.5 discusses the use of scalable coding algorithms.

4.3 The Establishment of an Appropriate Relationship Between Image Acquisition and Display Refresh Rates

In early discussions about the use of digital coding for HDTV systems, it became clear that receivers would likely need one or more frame stores to implement image decoding. This prompted the idea that image acquisition rates could be decoupled from display refresh rates - the display could be refreshed at a rate that is an integer multiple of the acquisition rate. For this reason the questions of image acquisition rates and display refresh rates will be considered separately.

No topic generated as much discussion in the Task Force as image acquisition and display refresh rates. This is due in part to the diversity of rates that exist in the standards and resulting practices within each of the affected industries. The issue is further complicated by the evolution of television down parallel paths with respect to field rates. Their harmonization will require solutions that lie in the realm of digital technology as well as the realm of politics and negotiation.

The choice of an image acquisition rate is a tradeoff between motion rendition and the resulting data rate. The following considerations are important in establishing a family of acquisition rates.

There are many factors affecting the choice of a display refresh rate including:

Refresh rate will be determined by the above criteria and price/performance requirements established by market factors.

Experience has shown that for wide-screen CRT displays of high brightness, a refresh rate in the region of 72 to 75 Hz is required to achieve tolerable levels of wide-area flicker (see Section 3.2.5). In some situations refresh rates in excess of 100 Hz may be desirable. Receivers which operate at 100 Hz (double the normal 50 Hz interlaced scan rate) are being introduced in the 50 Hz market; rate doubling receivers operating at 120 Hz are also being developed for the 60 Hz market.

The relationship of display refresh and image update rates should be based on a progression that permits non-interpolative transformations between the acquisition and display rates in the new architecture (i.e., display at integer multiples of the image update rate). As an example, theatrical display of film is usually double or triple shuttered to minimize wide-area flicker of the display.
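The integer-multiple relationship is easy to enumerate. The sketch below is illustrative only: the function name is hypothetical, rates are treated as whole numbers, and the 70 to 125 Hz window is an assumption chosen to bracket the refresh rates mentioned in this section.

```python
def refresh_candidates(acquisition_fps, min_refresh_hz, max_refresh_hz):
    """Integer multiples of the acquisition rate within a refresh-rate window.

    All arguments are integers; each result is a rate at which every
    acquired frame can simply be shown a whole number of times.
    """
    first = -(-min_refresh_hz // acquisition_fps)  # ceiling division
    last = max_refresh_hz // acquisition_fps
    return [acquisition_fps * n for n in range(first, last + 1)]


if __name__ == "__main__":
    for fps in (24, 25, 50, 60):
        print(fps, "fps ->", refresh_candidates(fps, 70, 125), "Hz")
```

Triple repetition of 24 fps material gives 72 Hz, which parallels the triple shuttering used in theatrical film projection.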

Further research into the choice of a single family of acquisition rates and display rates is required. An appropriate interoperable family should include a 24 or 25 fps image acquisition rate which would enable a 72 or 75 Hz display refresh rate. This is the subject of further discussion in Section 5.0, Section 7.0 and

4.4 The Use of Square Sampling Grids (Square Pixels)

The computer graphics, image processing, and publishing industries have adopted the use of geometrically square pixel sampling grids (frequently simply referred to as square pixels). The use of square pixels facilitates:

Early on, the computer graphics industry sought ways to insulate applications from variations in display technology. Support for different pixel configurations required run-time transformation of all graphical objects. Even then, applications rarely looked the same from display to display because different pixel configurations caused a variety of artifacts. These stopgap measures constrained functionality, reduced performance, and added cost to equipment and services. Ultimately, this approach failed.

Instead, computer graphics gravitated towards a common display technology based on square pixels. This simplified system design, which led to lower cost and better performance, enabled equipment and services to be used as commodities across a broad set of industries. Today the computer industry is a major consumer of displays, second only to consumer television receivers.

The use of a common pixel geometry eliminates the need for interpolative resampling when sharing imagery among all users. Resampling has two costs:

Thus the adoption of a common sampling grid is a key issue for discussion and resolution in the SMPTE work towards the specification of a digital image architecture.

4.5 The Establishment of Appropriate Representations for Colorimetry, Dynamic Range and Transfer Characteristics

The concepts of interoperability, scalability and extensibility apply not only to the sampling of the image but equally to the expression of its brightness and colorimetry. The digital image architecture must deal appropriately with dynamic range and colorimetry requirements of the acquisition (including processing), transmission and display modules of a system. During the acquisition and processing of imagery (for example, post-production), image data that may not be required by the human visual system, or reproducible on a given display, may still be required by processing hardware for optimal results. Similarly, the architecture must accommodate image exchanges between systems having differing dynamic range and colorimetry characteristics. The essential issues are summarized in the sub-sections which follow.

4.5.1 Extensibility

Existing image systems can reproduce only a limited range of the colors visible in the real world, often restricted to those corresponding to illuminated objects and the specific needs of the application. The colorimetry of television is currently confined to that of the display device. Figure 4.1 shows a color space, within which are illustrated a red, green, blue (RGB) gamut of additive primary colors and a typical yellow, cyan, magenta (YCM) gamut of subtractive colors. Also shown is the hue and saturation representation: saturation is the radial distance from a specified white point; hue is the associated angle. Hue and saturation vectors are shown pointing to the RGB and YCM color gamuts, as well as one vector that extends beyond both. It can be seen that this representation can be used to represent any visible color. This color space is application and device independent.
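The hue and saturation representation described above is a polar form of a point in a two-dimensional chromaticity plane. The following sketch shows the conversion in both directions. It rests on assumptions not stated in the report: the plane is treated as ordinary Cartesian coordinates, the default white point of (1/3, 1/3) is a placeholder, saturation is returned as a raw distance in that plane, and the function names are hypothetical.

```python
import math


def hue_saturation(x, y, white=(1 / 3, 1 / 3)):
    """Polar form of a chromaticity point relative to a white point.

    Returns (hue_degrees, saturation): the angle of the vector from the
    white point to (x, y), and the length of that vector.
    """
    dx, dy = x - white[0], y - white[1]
    hue = math.degrees(math.atan2(dy, dx)) % 360.0
    saturation = math.hypot(dx, dy)
    return hue, saturation


def from_hue_saturation(hue_degrees, saturation, white=(1 / 3, 1 / 3)):
    """Inverse mapping: polar coordinates back to a chromaticity point."""
    angle = math.radians(hue_degrees)
    return (white[0] + saturation * math.cos(angle),
            white[1] + saturation * math.sin(angle))


if __name__ == "__main__":
    hue, sat = hue_saturation(0.64, 0.33)  # arbitrary example point
    print(hue, sat)
    print(from_hue_saturation(hue, sat))
```

Because any point in the plane has an angle and a distance from the white point, the polar form is not tied to the gamut of a particular device, which is the property the text relies on.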

In the future it may be possible and desirable to extend the colorimetry representation to include a wider range of colors, possibly even including those of self-luminous objects, as one example. A close examination of this issue is needed to establish the range of colors to be represented within the colorimetry of the digital image architecture.

A similar situation to that of colorimetry exists for the representation of dynamic range and transfer function. Current systems are individually optimized for the current technology and application and are not easily amenable to an increase in dynamic range. Mechanisms to effectively handle a much wider dynamic range need to be identified.

4.5.2 Scalability

To cover the intended range of applications, it is necessary that the color and dynamic range representations be capable of being scaled, preferably independently. For instance, the display of an image having a wide color gamut at the source must produce acceptable color on a display of limited color capability. The reverse situation is also true. Similarly, the display of an image of high dynamic range should not lose essential information when viewed on a display of low dynamic range. The representation must accommodate these requirements efficiently.

The situation is somewhat similar to that of motion picture film in which the latitude of the negative film enables exposure and color adjustment after the image capture and the S-curve of the film characteristic provides effective compression of the highlights and dark regions. Similar provisions may be required in digital image systems to provide reasonable representations for both small and large numbers of bits. A further consideration may concern the optimal distribution of any necessary compression/expansion in respect of overall image quality.

4.5.3 Interoperability

Interoperability demands that the chosen colorimetric representation and the dynamic range representation be device independent for current and future devices. In this fashion, devices supporting differing colorimetries and dynamic ranges can be supported.

It is also important that images of differing colorimetry and dynamic range at the acquisition device should be able to be combined effectively into a single image, when appropriately scaled.

The color space and dynamic range representations that could meet these objectives require extensive consideration. Section 7.7 includes a number of questions that should be considered in the analysis of these and other colorimetry issues.

4.6 The Use of Coherent Image Sampling (Progressive Scanning)

Historically, interlace has been used to achieve a 2:1 reduction in bandwidth requirements (i.e., data rate), and to eliminate wide-area flicker on scanning CRT displays. The use of progressive scanning is nearly universal in computer display applications, and is employed in some high quality video presentations.

In Section 3.2.5 it was established that higher scanning rates are required with displays that cover a wider field of view and/or operate at higher levels of brightness than today's television systems. Decoupling the refresh rate of the display from the image update rate provides a mechanism to deal with wide-area flicker - this is discussed in

4.7 Identification of the Characteristics of a Digital Image Stream (Header/Descriptors)

A fundamental prerequisite for interoperability in digital systems is a mechanism for identifying and describing digital image data. For this information to be shared, decoders must be capable of identifying and conforming to the incoming data. Even simple decoders - those that only recognize a single standard - must identify data streams which they can decode. This is one of the primary functions of the header. Decoders must also ignore unrecognized data, to allow for extensions to the data stream.

Descriptors provide application oriented information, such as image and coding parameters, processing history, identification of program content, copyright, and scrambling. They also enable extensibility; the descriptor may also contain the coding algorithm or language representation necessary to interpret the encapsulated data. This provides a mechanism whereby expert groups can create and standardize the transmission of messages to meet their needs.

Descriptors may be used to identify and describe data at different levels of an image hierarchy, thus allowing a display system to decode only that part of a stream necessary for its function or capability. Descriptors might also contain information about the preferred display characteristics for imagery.

Thus information such as the colorimetry of the original acquisition system, and the transfer characteristics of the process used to move images from one medium to another, can be included with the data. Decoders would use this information to optimize display of the image.
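The requirement that a decoder pick out the descriptors it understands and skip the rest can be illustrated with a minimal sketch. The tag-length-value layout, the tag numbers, the payload strings and the function names below are hypothetical, invented only for this illustration; they are not taken from the SMPTE Header/Descriptor work.

```python
import struct

# Hypothetical descriptor tags, invented for this sketch only.
KNOWN_TAGS = {1: "colorimetry", 2: "transfer_characteristic"}


def make_record(tag, text):
    """Build one record: 1-byte tag, 2-byte big-endian length, ASCII payload."""
    data = text.encode("ascii")
    return struct.pack(">BH", tag, len(data)) + data


def parse_descriptors(stream):
    """Return the descriptors this decoder knows, skipping any it does not."""
    found = {}
    offset = 0
    while offset + 3 <= len(stream):
        tag, length = struct.unpack_from(">BH", stream, offset)
        offset += 3
        payload = stream[offset:offset + length]
        offset += length  # the length field says how far to skip, known tag or not
        if tag in KNOWN_TAGS:
            found[KNOWN_TAGS[tag]] = payload.decode("ascii")
    return found


stream = (
    make_record(1, "acquisition-primaries")
    + make_record(99, "future-extension")   # tag 99 is unknown to this decoder
    + make_record(2, "film-to-video")
)
print(parse_descriptors(stream))
```

Because every record carries its own length, a decoder built today can step over a record type defined later, which is the extensibility property described above.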

The SMPTE Task Force on Header/Descriptor in their Final Report dated January 3, 1992, and approved by the SMPTE Standards Committee on February 6, outlined the criteria for the use of Header/Descriptors. Work is now progressing on the development of proposed SMPTE Standards, Recommended Practices and Engineering Guidelines.

4.8 Compatibility with Current Television and Motion Picture Standards

The installed base of NTSC, PAL and SECAM equipment within the program production community, together with massive consumer investment in compatible receivers and VCRs, must be supported in the transition to a digital image architecture. Of even greater importance is the requirement to preserve the value of the archives of programs that have been created for mass market distribution using these systems and to exploit these resources to the greatest extent possible in the future.

This is by far the most critical issue of all, so much so that its impact is clear in the discussion of many of the previous issues. Only the last of them, the use of headers/descriptors, is without precedent in existing entertainment industry practice. It is precisely where a dichotomy exists in current practice that the greatest controversy arises - on the issue of temporal rates.

Digital convergence may provide the solutions that will resolve the temporal rate issue: convergence around the common language of digital coding, the progression in CPU performance, and the ability to design inexpensive modular interfaces in the form of mass produced microchips.

It is likely that a number of solutions will evolve to facilitate interoperability between the existing world of film and analog television, and the new digital image architecture. These solutions should provide a variety of price/performance options appropriate to the application requirements.

5.0 An Example of a Hierarchical Digital Image Architecture

This section suggests a technology transparent hierarchy - one compatible with the present and extensible for the future.

To illustrate the model, specific numbers have been chosen that take advantage of the mathematical relationships discussed in Section 4.0, as well as the architectures of digital memory and processing components. These numbers are not intended as the basis for a standard, but rather, provide a starting point, from which the validity of the architectural concepts can be verified. Further work is required for verification of the model and determination of the exact numbers, upon which a standard can be based (see Section 7.0).

The following parameters of a hierarchical digital image architecture are discussed in this section:

5.1 Open Architecture

In Section 3.0 it was indicated that the opportunity exists to design an open digital image architecture based on generic, inexpensive, and increasingly powerful digital components.

For a digital image architecture to be cast as an open system, two steps are required:

There must be a systems engineering of the standards so that the modules work together. There are two basic interface definitions to be publicly standardized:

Some of the parameters that should be part of this communications service definition include:

This careful modularization encapsulates other issues, including the critical issues discussed in Section 4.0, so they can be addressed one by one.

It can be argued that there is no need for rigid architectural standards in a digital world; that programmability in the transmission and display hierarchies provides a sufficient basis for interoperability. Perhaps some day this will be true. If the goal of longevity for the first digital image architecture is achieved, it is likely that the designers of the next imaging architecture will be less constrained than we are today.

The first digital image architecture, however, must provide a bridge from the closed systems of the past to the open systems of the future. The fundamental structure of the digital building blocks and economies of scale associated with standardization suggest that the organizations charged with establishing these standards work in harmony.

5.2 Designing Display Systems to Deal with Multiple Spatial Resolution Requirements

The perceived resolution of a display is determined primarily by the viewing distance and the visual acuity of the observer. Visual acuity is often determined using sets of alternating black and white lines of equal width. One black/white line pair represents one cycle. The number of cycles that can be resolved across one degree of the eye's viewing field is typically used as a measure of human visual acuity, and is stated in cycles (line pairs) per degree. Under some conditions, with high contrast line pairs, human visual acuity extends beyond 40 cycles per degree; approximately 22 cycles per degree is perceived as a sharp image.

If the resolution of a display is held constant and the viewing distance is a variable, the resolution perceived by the viewer - measured in cycles per degree - will increase as the viewer moves away from the display. Therefore, all displays can be considered to be high resolution if viewed from an appropriate distance.
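The relationship between viewing distance and perceived resolution can be sketched numerically. The example below assumes a flat display viewed on-axis and averages the cycles across the whole line; the 640 pixel, 16 inch wide display and the function name are hypothetical values chosen only for illustration.

```python
import math


def perceived_cycles_per_degree(pixels_per_line, width, distance):
    """Average cycles per degree across one line of a flat display viewed on-axis."""
    # Horizontal field of view subtended by the display, in degrees.
    fov_deg = 2 * math.degrees(math.atan(width / (2 * distance)))
    # One cycle is one black/white line pair, i.e. two pixels.
    return (pixels_per_line / 2) / fov_deg


# Hypothetical display: 640 pixels across a 16 inch wide screen.
for distance in (20, 40, 80, 160):
    cpd = perceived_cycles_per_degree(640, 16, distance)
    print(f"{distance:4d} inches -> {cpd:5.1f} cycles per degree")
```

The display itself never changes; only the angle it subtends shrinks as the viewer moves back, so the same pixels are packed into fewer degrees.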

At a distance that varies with the visual acuity of each individual, the actual resolution of the display equals the limit of that viewer's ability to resolve image detail. Beyond this viewing distance additional image detail cannot be perceived; that is, the display has more resolution than is required for this viewer and set of viewing conditions.

In some cases excess resolution may be desirable. For example, the operator of a personal computer can typically reduce the viewing distance to a high resolution desktop display by one-half, simply by leaning forward, thus taking advantage of additional resolution. At the desktop, perceived resolution changes enough to be significant, while moving 15 inches in a movie theatre would have little effect on perceived resolution.

The NTSC transmission standard was designed to provide a resolution of approximately 21 cycles per degree over a viewing field of just under 11 degrees. Display size can be variable in today's television, ranging from a diagonal of a few inches (a personal display) to more than 30 feet (direct view displays in stadiums and projection displays in controlled lighting environments). These displays differ only in the size of their pixels. At the appropriate viewing distance, the perceived resolution of the personal display and the stadium display will equal the design goal of 21 cycles per degree, and both displays will cover 11 degrees of the observer's field of view.

Many display applications require higher levels of perceived resolution. To increase the level of perceived resolution, while holding viewing distance constant, additional samples of the same image must be added, increasing pixel density. To cover a wider field of view, as in wide-screen displays, holding the same viewing distance and perceived resolution, new information, at the same pixel density, must be added to extend the picture.

5.3 Defining a Spatial Resolution Hierarchy

Section 3.2 identified the need for a variety of image resolutions to deal with specific imaging requirements. These ranges can now be further defined in terms of field of view and resolution in cycles per degree.

With personal, home entertainment and theatre displays, the viewer can vary the distance from the display, and thus vary the perceived resolution, over a significant range (see Figure 5.1). Taking into account the variations in acuity in the population, and variations in viewing distance for each application, it is common practice to design a display system for the average viewing conditions in each application. The overlaps in cycles per degree between low, normal and high resolutions are shown in the table to account for these variations.

Resolution       Cycles per Degree

Low                   1 - 15
Normal               10 - 25
High                 20 - 30
Ultra High           30 - 40

A special case exists for head mounted displays which provide a fixed viewing distance; here the display manufacturer must select the level of resolution appropriate for the application and then design for a specific perceived resolution.

Using these guidelines, a high resolution display designed for a 35 degree field of view would require about two thousand pixels per line at 30 cycles per degree. In a desktop computing application where the viewer is 30 inches from the display, the length of an active line (display width) would be about 19 inches. In an entertainment application, such as a consumer television receiver viewed from a distance of 108 inches (9 feet), the length of an active line would be about 68 inches.
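The figures in the preceding paragraph can be checked with a short calculation. This is a minimal sketch that assumes a flat display viewed on-axis and two pixels per cycle; the function names are illustrative only.

```python
import math


def pixels_per_line(fov_deg, cycles_per_degree):
    # One cycle is a black/white line pair, so it needs two pixels.
    return fov_deg * cycles_per_degree * 2


def display_width(fov_deg, distance):
    # Width of a flat display that subtends fov_deg when viewed on-axis.
    return 2 * distance * math.tan(math.radians(fov_deg / 2))


print(pixels_per_line(35, 30))         # 2100, i.e. about two thousand pixels per line
print(round(display_width(35, 30)))    # 19 (inches) at a 30 inch viewing distance
print(round(display_width(35, 108)))   # 68 (inches) at a 108 inch viewing distance
```

The pixel count depends only on field of view and perceived resolution; the viewing distance changes the physical width of the display, not the number of pixels it needs.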

These examples are illustrated in Figure 5.2. In this figure the principles described in this section are used to illustrate the relationships between the four resolution levels of the model hierarchy and a variety of display applications. The numbers, especially as they relate to image size (in pixels) are entirely relative; they serve only as examples of the pixel count required, at average viewing distances and fields of view, to achieve the specified perceived resolution.

It is important to note that seemingly diverse applications such as personal computer and home entertainment displays have similar resolution requirements as the size of the home entertainment display increases beyond the narrow field of view of today's television receivers. It is also important to note that direct view CRT displays (which are currently limited to around 40 inch diagonals) require resolution in the normal range for home entertainment applications.

FIGURE 5.1 - Relative Tile Resolutions
- These groups of letters represent the relative resolution for each level of the hierarchy from Level 1 (top) to Level 4. To better understand the practical application in displays, place this figure where it can be viewed from a distance of between 30 inches and 15 feet. Level 4 should be sharp at 30 inches; as you move away, each lower level in the hierarchy will become sharp.

5.3.1 Key Concepts of the Model

The example spatial resolution hierarchy is designed around a few basic concepts:

The hierarchy progression is based on the use of integer values related by powers of two. Essentially, at each higher level of the hierarchy, resolution doubles (e.g., 1, 2, 4, 8, etc.); subsets of the lowest level can be derived similarly (1/2, 1/4, 1/8, etc.).

It is noteworthy that such sequences also appear in the computer processor and memory component industry. This approach takes full advantage of the generic building blocks that are the driving force in the transition to a digital world.

In order to provide continuity between the various resolution levels of the hierarchy, the model is based on the concept of an image tile. For the purposes of this discussion, a tile can be considered to be a constant portion of an image, representing the same part of the image regardless of the resolution level or image size. Thus, at each higher level in the hierarchy, the resolution within a tile doubles in each axis. This is illustrated in Figure 5.3.

The power of two progression may now be applied to determine the resolution, in pixels, for each level in the hierarchy.


Level  Name         Resolution in        Pixels in      Pixels in 32 x 32
                    Cycles per Degree    One Tile       Tile Superset

1      Low           1 - 15               16 x 16         512 x 512
2      Normal       10 - 25               32 x 32        1024 x 1024
3      High         20 - 30               64 x 64        2048 x 2048
4      Ultra High   30 - 40              128 x 128       4096 x 4096

In this model a tile spans 1/32nd of the image in each axis at any level of the hierarchy. Thus each level consists of a 32 x 32 set of tiles (see Figure 5.3). The selection of this fraction for a tile is arbitrary; it was chosen because it is a convenient building block - integer multiples can be used to construct displays at all of the aspect ratios and spatial resolutions discussed in the model.
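The power of two progression in the table can be reproduced with a few lines of code. This is only a sketch of the example numbers used in this section; the constant and function names are illustrative.

```python
TILES_PER_AXIS = 32     # each level is a 32 x 32 set of tiles
BASE_TILE_PIXELS = 16   # a Level 1 tile is 16 x 16 pixels


def level_dimensions(level):
    """Return (pixels per tile axis, pixels per axis of the 32 x 32 tile superset)."""
    tile = BASE_TILE_PIXELS * 2 ** (level - 1)   # resolution doubles at each level
    return tile, tile * TILES_PER_AXIS


for level in range(1, 5):
    tile, full = level_dimensions(level)
    print(f"Level {level}: tile {tile} x {tile}, superset {full} x {full}")
```

Because the tile count stays fixed at 32 per axis, a given tile covers the same part of the picture at every level; only the number of pixels inside it changes.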

5.3.2 Construction of Displays from Tiles of the Appropriate Resolution

The table in Figure 5.3 provides a matrix of display aspect ratios and resolutions that can be derived from the full set of 32 x 32 tiles at each level. Since the tile size is a constant, each column represents a constant size display at four perceived levels of resolution.

The diagram in Figure 5.3 establishes several important relationships that provide a bridge to the past and illustrate how interoperability can be achieved:

The tile concept can similarly be applied to the manufacture of displays. In this case, a physical display tile would correspond to a conceptual tile and would have different physical sizes for different size displays and different pixel densities for different resolution requirements. Similarly, displays of different aspect ratios could be constructed by the selection of the appropriate conceptual tiles as shown in Figure 5.3.

Thus, using tiles and only four resolution levels, it is possible to construct a display for virtually every possible application; furthermore this display can also be used to show imagery from other levels of the hierarchy. This is especially practical if a scalable coding architecture is implemented that conforms to the same resolution progression.

5.4 A Family of Related Image Acquisition Rates and Display Refresh Rates

A family of image acquisition and display refresh rates should be based on a progression that permits non-interpolative transformations between the acquisition and display rates. This is easily implemented if the acquisition and display rates are the same, or if the display refresh rate is an integer multiple (or fraction) of the image acquisition rate.

Since significant archives of high resolution program material exist on film, which was acquired at 24 or 25 fps, one of these rates should be included in the progression. A progression based on integer multiples of 12 would include 12, 24, 36, 48, 60, 72, 96, 120 Hz, etc. A progression based on integer multiples of 12.5 would include 12.5, 25, 50, 75, 100, 125 Hz, etc. These progressions might also include integer fractions of 12 or 12.5 (e.g., 1/2 or 1/4 of the base frame rate) for applications such as videoconferencing and searching of video databases.
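The idea of a rate family, and the test for whether a given acquisition rate can be shown at a given refresh rate without interpolation, can be sketched as follows. The function names are illustrative, and the test simply encodes the integer multiple (or integer fraction) condition stated above.

```python
from fractions import Fraction


def rate_family(base, multiples):
    """Rates, in Hz, formed as integer multiples of a base rate."""
    return [base * m for m in multiples]


def is_non_interpolative(acquisition_rate, refresh_rate):
    """True if the refresh rate is an integer multiple or integer fraction
    of the acquisition rate, so frames can simply be repeated or dropped."""
    ratio = Fraction(refresh_rate) / Fraction(acquisition_rate)
    return ratio.denominator == 1 or ratio.numerator == 1


print(rate_family(12, range(1, 11)))      # multiples of 12 up to 120
print(rate_family(12.5, range(1, 11)))    # multiples of 12.5 up to 125
print(is_non_interpolative(24, 72))       # True: each frame is shown three times
print(is_non_interpolative(24, 60))       # False: 60 / 24 is not an integer
print(is_non_interpolative(25, 12.5))     # True: every second frame is used
```

The 24 to 60 case is the one that requires uneven frame repetition or interpolation, which is why a family built on a single base rate avoids it.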

It has been common practice in Europe to display 24 fps film at 25 fps for compatibility with PAL and SECAM; this results in a 4% speed increase. Many European programs produced for television distribution are acquired at 25 fps; if the family of rates is based on 24 fps, these programs would be played 4% slower. As indicated in Section 4.8, further research is required to determine the impact of choosing one of these rates on those industries that utilize film for image acquisition.
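The speed changes quoted above follow directly from the ratio of the two rates, as this small sketch shows. The 100 minute running time is a hypothetical figure used only to make the effect concrete.

```python
def speed_change_percent(acquired_fps, displayed_fps):
    """Change in playback speed when material is shown at a different rate."""
    return (displayed_fps / acquired_fps - 1) * 100


def running_time(minutes, acquired_fps, displayed_fps):
    """Running time after the rate change, in minutes."""
    return minutes * acquired_fps / displayed_fps


print(f"{speed_change_percent(24, 25):+.2f}%")   # +4.17%: 24 fps film shown at 25 fps
print(f"{speed_change_percent(25, 24):+.2f}%")   # -4.00%: 25 fps material shown at 24 fps
print(running_time(100, 24, 25))                  # 96.0 minutes for a 100 minute film
```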

Ideally, compatibility with existing electronic imaging systems should be accommodated in the design of the standard modules that will interface these systems with the digital image architecture. By design, this would place the burden of compatibility on the systems that are being replaced rather than products that conform to the new architecture; thus the future will not be constrained by today's limitations.

In the process of developing the existing analog and digital high resolution television systems, the designers of these systems have demonstrated the practicality of such a modular approach to interoperability. A variety of translation devices have been demonstrated that allow interoperation between PAL, NTSC, HD-MAC and MUSE. The interface modules that will be required to transform the signals from these systems (especially NTSC and PAL) into the new architecture, offer the potential for large volumes. It is likely that the market for these modules will be characterized by intense competition, leading to a range of solutions at various price/performance levels.

In the near term the choice of a family of rates based on 12 or 12.5 Hz would provide optimally low cost and high performance, for both advanced television and computer uses, as well as providing global interoperability. In the longer term decoupling of acquisition, transmission and display is likely to lead to entirely new approaches to pixel replenishment that may render the current concept of image acquisition rates and display refresh rates meaningless.

5.5 Scalable Coding Algorithms

Scalable image decomposition offers the ability to produce image data in packages that can be combined to produce images at a variety of spatial and temporal resolutions. Decoding and displaying the lowest frequency image packets would produce an image at the first level of the hierarchy. Additional packets (encoded with spatial and temporal differences) would be decoded to produce images at higher levels of the hierarchy.
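The packet structure described above can be illustrated with a deliberately simple sketch. It works on a single row of samples and uses pairwise averaging with difference packets (a Haar-like decomposition), which is an assumption made for illustration and not a coding method proposed by the Task Force; temporal scalability is not shown.

```python
def decompose(samples, levels):
    """Split samples into a low resolution base plus difference packets.

    Assumes len(samples) is divisible by 2 ** levels.
    """
    packets = []
    current = list(samples)
    for _ in range(levels):
        pairs = list(zip(current[0::2], current[1::2]))
        # Difference packet: how far each pair departs from its average.
        packets.append([(a - b) / 2 for a, b in pairs])
        # Lower resolution signal: the pairwise averages.
        current = [(a + b) / 2 for a, b in pairs]
    packets.reverse()   # coarsest differences first, in decoding order
    return current, packets


def reconstruct(base, packets):
    """Decode the base plus however many difference packets are supplied."""
    current = list(base)
    for diffs in packets:
        doubled = []
        for avg, diff in zip(current, diffs):
            doubled.extend([avg + diff, avg - diff])
        current = doubled
    return current


row = [10, 12, 14, 20, 50, 54, 40, 36]
base, packets = decompose(row, levels=2)
print(reconstruct(base, []))            # lowest level: 2 samples
print(reconstruct(base, packets[:1]))   # next level: 4 samples
print(reconstruct(base, packets))       # full resolution: the original 8 samples
```

A decoder that stops after the base, or after the first packet, still produces a valid lower resolution row, while a more capable decoder consumes every packet and recovers the original samples exactly.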

This approach enables extensibility. For example, the coding of low resolution imagery might remain unchanged to provide compatibility with existing decoders, while new coding methods, made possible by the geometric progression in computational hardware, can be introduced to support more advanced imagery. Increasingly powerful (and affordable) programmable decoders can provide compatibility with the standards that form the foundation of the digital image architecture, and the additional processing power required for future enhancements to the architecture.

6.0 Industries and Applications Considered

6.1 Industries and Applications

Industries are categorized by current market segments. It is important to keep in mind that convergences among existing industries will likely occur (e.g., computers and consumer electronics; audio, video, and datacomm), and as new opportunities to provide value products and services emerge, entirely new industry segments will undoubtedly come forth.

It is becoming difficult to draw the line, even today, between consumer electronics and computers. Today's video game machines, already in millions of homes, are marketed as consumer accessories to televisions but are, in fact, more computationally capable than personal computers of only a few years ago. Similarly, personal computers are being marketed to the home market through traditional consumer electronics channels.

Traditional business factors should always be considered. These include equipment replacement costs, amortization, benefits, competition, market needs, and access to material.

Successful industry participants will both pay close attention to emerging trends and help to bring them about. Sometimes, deep pockets may be required to create a market. (It took years of major losses in both equipment and programming efforts before color television became profitable.) In contrast, agreement on a common architecture across a wide range of industries and applications would spread the costs and encourage early adoption.

The groupings used for this report help to relate application requirements to industries. It is well understood that there is already much overlap between industry groups and applications.

The industry groupings are as follows:

6.1.1 Entertainment Providers

Entertainment provider fields include programming, animation, games (personal and arcade), broadcasting, cinematography, post-production, theatrical presentation, and pre-recorded media.

The technologies used in these fields are highly dependent on downstream profits. It can be difficult to justify large investments (e.g., an HDTV production facility) in new technologies that can only be utilized by a small portion of their market. Smaller investments that require minimal infrastructure changes (e.g., MTS stereo, VHS-HQ) can be more easily justified, particularly when end-users can benefit with existing equipment or rapid upgrade is anticipated. Backward compatibility and extensibility are key issues here and can only be successfully violated when there are substantive benefits to the end user (e.g., audio compact disc).

Revenue streams can often be anticipated to flow well beyond the initial release of the product. Residuals from syndication, rentals, and sales require that providers anticipate future trends in end-user viewing equipment capabilities. This is one reason why most prime time television is shot on 35mm film and not video.

6.1.2 Distribution and Communications

The distribution and communication industries that will be affected by digital image systems include telephone, television broadcasting and cable TV, utilities, video conferencing, electronic mail (including text, data, image, animation, video, and sound), and mobile communications. Carrier channels that will play a yet-to-be-determined role in this process include optical fibers, broad and spot-beam satellites, microwave, cellular, conventional VHF/UHF terrestrial broadcast, broad band coaxial cable, and local and wide area networks. Also impacted will be video tape, video disc, game, and general software distribution.

There is some effort to establish a video dialtone similar, in concept, to today's voice telephone dialtone. As communication networks increase bandwidth, and compression technologies improve, an increased use of remote real-time visual communications can be expected.

These same advancements also facilitate rapid downloading of video information from media servers. At a 100:1 compression ratio, the data for a typical motion picture could be transmitted in a few minutes over a video capable network.

Because of the universal proliferation and conversion standards for the telephone, it is likely that we will soon see extensions of current fax standards including: voice fax (voice mail), high resolution color image fax, and video fax (video mail). One of the driving forces behind the development of the JPEG image compression standard was the need for an efficient data reduction technique for the transmission of still images.

The telecommunications industry is well down the road in the establishment of digital imaging standards. The CCITT, which controls fax standards, worked with the ISO on the JPEG standard and produced the videoconferencing standard known as Px64 or H.261. These groups are also responsible for the MPEG family of moving picture standards. JPEG, MPEG I and Px64 form the basis for the first generation of image telecommunications products that are already starting to reach the market.

These standards were designed with a high degree of flexibility to deal with a variety of imaging applications; they have served as excellent examples for the Task Force in the areas of interoperability and scalability. Currently the MPEG group is working on extensibility; MPEG II is targeted for the delivery of higher quality motion image data streams in the range from two to forty megabits per second. The MPEG working group is investigating scalability as a requirement for this extension of MPEG. It would be beneficial for these new standards to relate harmoniously to other digital imaging architectures.

The merging of both broadcast and interactive voice, image (including graphics and video), text, and data across diverse transport media will create challenges in properly matching the information with the delivery mechanism. Current efforts to implement interactive television, for example, use differing transmission media for each direction (e.g., broadband in; telephone or cellular radio out).

Factors such as existing infrastructure, projected time and cost to deploy, bandwidth cost, regulatory issues, nature of the signal, target viewer, compression, error sources, localization, security, latency, etc., need to be considered.

The communications infrastructure deployed for the entertainment market could provide profound leverage for the information domain. For example, a broad consumer demand for access to high bandwidth entertainment (and other) services could accelerate the national installation of fiber-optic cables. Once in place, these high bandwidth networks could also be used as high performance links to super-computers and very large data bases, and to broadly distribute real-time business, engineering, and scientific data.

While installation of fiber-optic cable to a major user base can take many years, new or existing satellites can cover huge population areas very quickly. A variation of direct broadcast satellite (DBS) transmission is spot-beam satellite technology. In this approach, as few as three satellites could be used to provide localized high quality (HDTV) signals to small inexpensive receiving devices in as many as 150 geographic areas within a country the size of the continental United States.

6.1.3 Professional Equipment Manufacturers

Equipment manufacturers who produce studio, production, storage and distribution, and test & measurement equipment will enjoy opportunities to provide their customers with new products and services that can be useful across a range of industries. Digital, extensible, scalable image architectures can provide high value per dollar and increased economies of scale.

The computer, medical, and graphics industries could similarly benefit from harmonious formats that would allow them to produce image generating, manipulating, managing, storing, and viewing applications and devices at reduced cost and increased interoperability.

Some specific industrial application areas include security equipment for surveillance and identification and product and process inspection.

6.1.4 Consumer Electronics Manufacturers

The introduction of digital technologies into consumer products opens the way to new and improved services and capabilities. As the consumer market increasingly demands higher image quality for both work at home (e.g., personal computers) and entertainment (e.g., televisions and video games), there will continue to be incentives to push the technologies that will bring a better picture to the consumer.

This will create opportunities in the receiving devices, the electronic components that go into them (e.g., semiconductors, light sources and modulators) and the subsystems (e.g., displays, tuners, and signal processors). The likely emergence of new product categories can both heighten and personalize the entertainment experience.

Ancillary devices (e.g., tape and disc recorder/players, camcorders, editing, processing, sound systems, printers, scanners, interactive peripherals) will be additional sources of added value products.

It is likely that computer control technologies will play an ever increasing role in home entertainment and information systems. The integration of all of the equipment listed in the preceding paragraph in the home entertainment environment has proven to be a major problem - and a significant opportunity. We have seen programmable remote control devices evolve to replace the profusion of separate infrared controllers (TV tuner, cable tuner, VCR, laserdisc, audio CD, radio tuner, etc.). The integration of the graphical user interface from the world of desktop computing with the home entertainment/information system has begun.

Collaborative cross-industry efforts will merge computers into home entertainment networks, dealing with the issues of component integration, connection to multiple sources of entertainment and information, user interface, and "user friendly" programming of the system. Various flavors of "personal computers" in the home will be able to connect to this network, as will intelligent appliances and remote control devices. Inexpensive networkable cameras will allow remote visual monitoring of the front door, the baby's room, etc.

6.1.5 Computers and Information

Human vision provides the highest bandwidth information interface to the machine world. Computer technology can serve as an effective enabling tool for image information creation, capture, processing, storage and archiving, access, transmission, and presentation. While computer assisted information in the 1980s largely focused on text, data, and simple graphics, rapid changes are taking place to support other media (audio, image, animation, video, simulations, etc.). This places increased demands on computer performance and human interface to handle the significantly higher data content in these media.

To provide specific types of information to users, new classes of specially tuned information appliances will likely develop. These appliances will rely on information providers to collect, generate, and organize information. In the education market, for example, an information appliance might be tuned toward providing everything a student needs to progress through a particular class. Besides basic course content, texts, lecture notes, assignments, etc., it could make extensive use of imagery to provide interactive multimedia tutorials, remedial help, lab simulations, extensive reference material, electronic messaging, and smart links to classmates.

In the information age, a critical challenge is the productive management of the overwhelming amount of information produced each year. Unfortunately, images and video tend to make this problem even greater. While database search engines deal reasonably well with keyword searches and inverted indexes on textual data, corresponding tools for other media have tremendous opportunities for improvement.

Museums and libraries could use electronic file systems to catalog and view very high resolution images of the masters. Sculptures and other three dimensional objects could be shown on stereographic or holographic displays, or printed on very high quality large format printers.

The role of the artist and graphics designer has changed dramatically as the quality and flexibility of the "electronic canvas" has come to emulate the various forms of traditional media. Just as the camcorder has allowed many budding cinematographers to explore their art, high resolution drawing tools with interactive training are revolutionizing electronic publishing and winning over graphic artists. Many artists are expanding into new markets such as videographics and animation from this electronic base.

Traditional forms of printing and publishing will continue to exist alongside newer media. Electronic billboards could change messages by day of week or time of day. Electronic books, magazines, catalogs, and advertisements can integrate interactive video and other media to tell a story, make a point, or sell a product. They can also elicit feedback from the user that is useful to the publisher (e.g., "hard to understand this concept," "would like product in green").

6.1.6 Education

One strategy for promoting the use of digital image technologies in education is to leverage high volume consumer products. There is now a real opportunity to leverage scalable, interoperable, extensible consumer products into the classroom and other learning environments (e.g., lab, home, library, tutoring, group study).

Institutional training represents the high end of the educational market. An economic return on investment can often justify the use of expensive technology to maximize training "productivity" since the employee students are being paid wages while not working. Increased use of sophisticated interactive multimedia tools developed and used in these environments could find derivative use in public classrooms and the home.

6.1.7 Engineering and Science

Engineers and scientists have traditionally used the high end of graphics and imaging systems for data visualization, design, simulation and scientific visualization. This will likely continue as new uses expand into such areas as microscopy and astronomy.

This community has often utilized high-end versions of consumer technologies (e.g., TV CRT/Workstation CRT). Their role in leading versus leveraging the next generation of imaging systems is not clear. The existence of a proper digital image architecture will reduce barriers across applications, platforms, and markets.

6.1.8 Healthcare

Healthcare represents a growing cost concern for most industrialized societies. While a digital image architecture may not directly reduce costs, the judicious use of images and video can provide an improved cost/benefit ratio for physician training, medical research, and general patient care.

High resolution imaging can be useful in radiology, microscopy, patient monitoring (especially during surgery), and consultation with specialists in a remote location.

Image requirements can be very stringent. Doctors sometimes use a magnifying glass to look for subtle changes in gray level on an x-ray. Image fidelity is critical.

Training simulators, perhaps utilizing virtual reality techniques, can provide medical students with improved environments for learning over classroom and cadaver procedures.

Although the spatial resolutions and signal integrity requirements may exceed many other applications, the healthcare community would like to benefit from harmonization with other digital image architectures.

6.1.9 Military & Aerospace

Traditionally, the military and aerospace industries have driven the high end of the imaging market with severe mission-critical requirements. While in the past, cost was a concern second to functionality, new economics dictate that more effective leverage be made of existing standards, technologies, and products wherever possible.

Typical applications include radar and other tracking, surveillance, flight simulation, general training simulators, mission/situation control rooms, instrument control panels in aircraft/vehicles, satellite imaging, virtual reality, telepresence, and cartography.

Increasing emphasis is being placed on dual use technologies. The community learning network is one example of using advanced imaging technologies for both government and civilian education.

6.2 Application Requirements

Throughout the digital image path, there are specific and interrelated requirements that should be understood for any particular application or family of applications. A standards architect or application developer should be aware of not only current needs and projected needs, but also past infrastructures, potentially related application areas, and long term technology trends.

6.2.1 Latency

For any application, tolerance to transmission path latency should be considered. Small latency times can impose increased compression and transport costs. Some examples of acceptable latencies follow:

6.2.2 Synchronization with Other Media

Traditionally, motion pictures and video have been concerned primarily with synchronization between image and sound track. Future imaging architectures should consider synchronization requirements with other media and general control inputs and outputs. Examples of other media used for special applications might involve any of the other three human senses (touch, smell, taste), extensions of the first two (sight and sound), as well as physiological inputs (e.g., EKG, EEG, EMG, respiration, perspiration, salivation, biochemical levels).

Both input (i.e., response time) and output synchronization should be considered. Acceptable synchronization can vary with image content. For example, voice should have excellent synchronization when an actor's lips are seen, but looser synchronization is acceptable when the actor is off camera. Background music can tolerate even looser synchronization as long as it is not keyed to action or scene transitions.

Motion inputs (e.g., physical controls, gestures, head or eye tracking, facial expressions), and outputs (vibration, g-forces, wind) can also have varying needs for synchronization.

6.2.3 The Digital Image Path

Images flow through (and may be stored in) the following five processes: capture, processing, transport, reconstruction, and presentation.

Each of these represents a range of opportunities for industries and applications.

Image capture/acquisition/creation includes:

Factors affecting image capture include available light, subject motion, spatial resolution, colorimetry, and cost.

Good light sensitivity is an important factor in available light location shots. In studio productions, sensitive cameras can reduce equipment and electrical requirements for lighting and resultant air conditioning.

In scenes with high subject motion like sporting events, an image sensor configured with quick response to motion is important. A fast scan rate is the overriding factor here, particularly when minimal single frame blur is important (for slow motion or single frame playback). Current technology favors interlaced image sensors for this type of application; however, post-processing or future technical advances in image sensors can be applied to eliminate interlace artifacts before the signal goes very far down the image path.

The scan rate of an image sensor should also relate compatibly to the frame rate of its source material (e.g., movie film), and/or the anticipated frame rate of the viewing device.

Spatial resolution can be expected to improve for both still and motion image sensors. As described elsewhere in this Report, square grids and properly scalable array geometries are important factors in providing extensibility.

Specific applications can require high spatial, but lower temporal resolutions. Image scanners fall into this category. Used for medical X-rays, hard copy scanning, film conversion, and fax, a common characteristic is the need for high image integrity (e.g., error-free image sensors, lossless compression, robust error correction).

An ideal image sensor would be able to resolve the entire range of color tints and hues visible to a human eye over a very wide dynamic range. It would also have well defined electrical transfer characteristics. Falling short of this, it is important that the colorimetric transfer characteristics be sufficiently defined to accommodate faithful propagation throughout the image path.

Future image sensors will likely contain increasing amounts of on-device signal processing in the form of motion detection, compression, and error detection and correction. They may not be scanned, but interrupt driven, responding to changes in the image. Devices may even begin to take on some functional characteristics of the human retina.

Processing includes:

Transport includes:

Reconstruction includes:

A receiving device requires an image processing engine to properly reconstruct information from the signal. This information must be compatible with both the presentation device and local storage (e.g., tape, disc, semiconductor).

Presentation includes:

Presentation manipulation can be spatial (e.g., zoom, pan, detailing, colorization) or temporal (slow motion, still frame, fast scan). Many of these manipulations may be difficult to achieve with compression schemes that use incremental transmission or sub-sampling techniques.

Some applications require bi-directional capabilities. Some examples are: interactive communications, on-demand programming, pay-per-view, and client/server models.

In a client/server structure, the presentation "client" device may be physically separate from the reception/reconstruction "server." This model might apply to both robust and upgradable servers in a home neighborhood or on a computer network within an engineering office environment. In either case, the server would need to be able to interrogate the client so that it could properly reconstruct the presentation information.

6.3 Displays

6.3.1 General Considerations

No other component of a digital image system has more impact on industries and applications than the display. More than bandwidth limitations, image capture, and signal processing, the performance and economic constraints of available displays are currently the greatest pacing factors.

Applications that can live within current display constraints, or rapidly utilize or promote advancements in both flat screen and projection (and to a lesser degree, direct view CRTs) will be in the best positions to prosper.

Potential display image sizes can range from a wrist watch to a planetarium. General factors that should be considered in specifying a display include: number of viewers, viewing conditions, spatial and temporal resolutions, pixel size and shape, lithographed versus variable picture elements, refresh rate, brightness, density, color gamut, micro defects, aging, reliability, aliasing, artifacts, aspect ratio, overall display image area, display package size, power requirements, and cost. Some of these factors, not already discussed, will be expanded on.

There are two general categories of viewing environments: single viewer and multiple viewer. Traditionally, single viewer display sizes have been smaller (<17 inches) and the applications have been more "task" oriented and interactive (e.g., computer display). Multiple viewer displays have been larger (>19 inches) and have been more "entertainment" oriented (e.g., TV) and passive.

Both of these traditions are changing and will continue to do so. In particular, a proliferation of single viewer entertainment displays (e.g., personal TVs and games), and multiple viewer task displays (e.g., electronic white boards and overhead projection panels) will be fueled by continuing advances in display technologies.

Viewing angle is an important factor in tuning displays to applications. The viewing angle is a function of display size and viewer distance. For a constant spatial resolution at the viewer's retina, overall display resolution needs to increase as viewing distance decreases.
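The relationship between display size, viewing distance, and required resolution can be sketched numerically. The following Python fragment is an illustration only: the function names are hypothetical, and the figure of one arcminute per pixel is an assumed stand-in for "constant spatial resolution at the viewer's retina," not a value taken from this Report.

```python
import math

def viewing_angle_deg(display_width_m, distance_m):
    """Horizontal angle subtended by a flat display viewed on-axis."""
    return math.degrees(2 * math.atan(display_width_m / (2 * distance_m)))

def pixels_for_constant_acuity(display_width_m, distance_m, arcmin_per_pixel=1.0):
    """Horizontal pixel count so that a pixel at screen center subtends
    roughly `arcmin_per_pixel` at the eye (assumed acuity target)."""
    pixel_angle_rad = math.radians(arcmin_per_pixel / 60.0)
    pixel_pitch_m = 2 * distance_m * math.tan(pixel_angle_rad / 2)
    return math.ceil(display_width_m / pixel_pitch_m)

# A hypothetical 1 meter wide display viewed from three distances.
for distance in (4.0, 2.0, 1.0):
    print(distance,
          round(viewing_angle_deg(1.0, distance), 1),
          pixels_for_constant_acuity(1.0, distance))
```

Under these assumptions, halving the viewing distance roughly doubles the number of pixels needed across the same screen width.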

As screen sizes increase and images get brighter, flicker becomes more of an issue in scanned displays. Even a 72 Hz scan rate can produce noticeable flicker with younger viewers in some situations. At a 50 or 60 Hz display scan rate, screens with high brightness that cover a wide field of view can produce objectionable levels of flicker.

It is important to separate flicker produced from scanning a display (commonly a flying spot on a CRT), from other causes (e.g., capture, conversion, interlace, signal processing artifacts).

Head mounted viewing devices (glasses or goggles) could make single user, low cost, high resolution displays practical. Additionally, a viewing device with dual displays (versus a mirror arrangement) would have the inherent ability to display stereoscopic images. This type of device could have both task and entertainment applications, would have a size and privacy advantage for portable applications (e.g., portable computing, viewing proprietary videos on airplanes), and would operate well in poor ambient lighting situations. It could also be the lowest cost way to deliver high resolution images to the early consumer market.

Future displays might also provide stereoscopic images without special viewing glasses and virtual holography (stereoscopic images with multiple viewing perspectives). Image architectures would need to pay particular attention to accommodating the latter.

As the market demands increasingly improved display capabilities, entirely new technologies and display structures may come into being. Some features which could find their way into future displays include: directly addressable image elements, layered structures with control over picture element persistence, variable spatial and temporal resolutions across the surface, on-display scene creation and manipulation, fixed eye position displays that map resolutions to match human retinas, and eye tracking displays that tune resolutions across the surface to produce an optimal image for the viewer.

6.3.2 Practical Limits

A digital image architecture that is extensible and interoperable can be expected to improve in quality over the years. There are, however, practical limits beyond which hardware capabilities exceed human visual perception, making further gains unproductive.

Although current sound technology parameters are close to or beyond human audible capabilities, there is a vast chasm of opportunities to be filled before we approach our optical limits. There are more mundane limitations as well.

For example, while one might imagine a 20 meter diagonal display with 0.1 mm pixel pitch, there are practical limits to both physical display size (how many home living/entertainment rooms could support such a large screen?) and human capabilities (one would need a magnifying glass or binoculars to appreciate such spatial resolution). On the other hand, close examination of images from the old masters might justify just such a display. And topological images might even require that this physically limited display be manipulated to bring in additional portions or viewpoints of the larger source image(s).
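The scale of this example can be checked with a short calculation. The Python sketch below is illustrative only: the 16:9 aspect ratio and the one arcminute acuity figure are assumptions introduced here, and the function names are hypothetical.

```python
import math

def pixel_grid(diagonal_m, pitch_m, aspect_w=16, aspect_h=9):
    """Columns and rows of a display with the given diagonal and pixel pitch.
    The 16:9 aspect ratio is an assumption, not a value from the Report."""
    diag_units = math.hypot(aspect_w, aspect_h)
    width_m = diagonal_m * aspect_w / diag_units
    height_m = diagonal_m * aspect_h / diag_units
    return round(width_m / pitch_m), round(height_m / pitch_m)

def resolving_distance_m(pitch_m, acuity_arcmin=1.0):
    """Distance at which one pixel subtends `acuity_arcmin` (assumed acuity)."""
    return pitch_m / math.tan(math.radians(acuity_arcmin / 60.0))

cols, rows = pixel_grid(20.0, 0.0001)      # 20 m diagonal, 0.1 mm pitch
print(cols, rows, cols * rows)
print(round(resolving_distance_m(0.0001), 3))
```

Under these assumptions the grid is on the order of 10^10 pixels, and individual pixels would only be resolved from within about a third of a meter of the screen, which is consistent with the remark about needing a magnifying glass.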

6.3.3 Future Receiver/Display Possibilities

During the next twenty years, some receiver/display options that should be considered include:

Wrist watch display

Personal viewing device

Home entertainment display (classical HDTV)

Physician's work surface (simulates x-ray light wall)

Engineer's white board

Drafting table

Writer's work table

Artist's canvas

Make-up mirror

Augmented imagery (ancillary to main viewing surface)

6.4 Toward the AAAA (Anything, Anytime, Anyplace Appliance)

At the extreme in communications, imagine having a global archive, switching, communication, and receiving infrastructure in place that would let an individual access information in any media (Library of Congress, Smithsonian, encyclopedias, technical/professional journals, movies and TV programs from all film libraries, art from all major museums, company databases, home videos, personal medical records, new car facts with pictures and videos, weather maps, bank accounts, sports scores with selected replays, visual stock performance, restaurant menus complete with aromas and a view from your table, etc.) plus real-time access to any individual or group of people (voice, video, shared display, etc.)

7.0 Future Work & Other Issues

This Report is in essence a progress report on the requirements for establishing digital imaging standards that are interoperable, scalable and extensible. In many cases additional research and further discussions will be required to resolve the remaining issues. This section identifies some of those issues that need to be resolved. These provocative issues will benefit from discussion in the SMPTE engineering community at large. The SMPTE Standards Committee will determine engineering committee assignments to undertake a more detailed evaluation of these elements.

Gating Technologies

A gating technology sets the pace for advancements in technology products and systems. For most of the history of television, the display was the fundamental gating technology. Only in recent years has this role shifted to the transmission standard itself.

It should not be assumed that future architectures will be gated by display technologies over the long term. Other elements in the image path should be carefully evaluated as to their potential impact as gating technologies.

Strawman Proposals

For a robust digital architecture standard to successfully cross industries, applications, and time, it is critical that thorough simulation and testing be performed across a range of applications.

At least three diverse strawman applications should be selected as test vehicles. Interoperability between these should be verified. Candidate applications include: broadcast television, multi-media computer workstations, medical x-ray, virtual reality, flight simulation, video phone, scientific visualization, and client-server networks.

Long Term Extensibility

An accurate forecast of technologies and applications for the next fifty years is unlikely. However, a diligent evaluation of potentially relevant work underway in research laboratories throughout the world, and a careful study of anthropology and market demographics should help in achieving long term extensibility.

Specifically, breakthroughs in image capture, display, communications, storage, and signal processing technologies could all have a profound effect on future image based applications.

Frame Rate Evaluation

There is a need to gather existing data and perform experiments to determine desirable frame rates for a range of services including sports and other events with high temporal content. Frame rates that convert poorly to or from existing standards should be identified so that they can be avoided.

Other Issues

The questions which follow should be considered in the future work leading to a digital image architecture. They cover a wide range of topics and considerations. Some may ultimately prove to be important, others may become insignificant due to advances in technology.

As discussed in Section 2, the concept of scalability refers to the ability to extract higher and lower quality results from a common signal format. The concept of extensibility indicates the need to accommodate future enhancements in systems due to the rapid pace of technology.

Scalability and Extensibility require that many of the following areas have mechanisms for increasing and decreasing:

In each of the above areas, many issues arise in evaluating efficient mechanisms for increasing and decreasing.

7.1 Image resolution

7.1.1 Are simple fractions a good guide for image resolution scaling and extending? If so, what should be the numerical basis?

The following issues support the use of simple fractions when scaling resolution:

7.1.2 Given that CCDs, active matrix liquid crystal displays and projectors, and other devices, have inherently lithographed, and therefore very fixed, resolutions, should the numerical basis and specific (and optimally related) image sizes be definitively determined?

7.1.3 Further in light of the many such fixed size, resolution, and raster format devices, would not square pixels be an important consideration? How many industries and applications require square pixels, and would be hurt if a non-square pixel format were chosen for advanced television? How severe would the degradations in quality and conversion cost be in such a case? Conversely, how many industries and applications not requiring square pixels would be hurt if a square pixel environment were imposed upon them?

7.1.4 If simple fraction resolution transformation guidelines are deemed worthwhile, should there be a numerical basis of certain base resolutions? Is the horizontal or fast axis more critical for the basis due to digital design considerations? If so, are powers of two the optimal basis for horizontal resolutions, such as 512, 1024, 2048, etc.?

7.1.5 When transcoding resolution, what parameters are required to perform optimal transcoding? Is the bandlimiting associated with transcoding at ratios other than simple fractions an acceptable degradation? In what industries/applications is it acceptable, and not acceptable?

7.1.6 CCDs, active matrix liquid crystal displays and projectors, computer generated images, and other image scanning, generating, and displaying devices can produce digital image values which are not bandwidth limited. Further, it is common for computer displays to use text, windows, and graphics which are aligned to the raster and which use maximum bandwidth signals such as white lines of pixels on black and black dots on white, etc. Given that such non-band-limited signals are common and useful in many industries, is the issue of requiring band limiting for transcoding, compression, or coding problematic? What industries would be significantly hindered if high definition systems required band limiting?
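The tension described in 7.1.6 can be illustrated with a toy example. The Python sketch below is not a proposed method: the helper names are hypothetical and the two-tap average is only a crude stand-in for a real band-limiting filter. It halves the resolution of a maximum bandwidth scanline (alternating white and black pixels) with and without pre-filtering.

```python
def decimate(samples, factor, phase=0):
    """Keep every `factor`-th sample with no pre-filter (point sampling)."""
    return samples[phase::factor]

def box_filter_decimate(samples, factor):
    """Average each group of `factor` samples before discarding
    (a crude band limit, used here only for illustration)."""
    return [sum(samples[i:i + factor]) / factor
            for i in range(0, len(samples) - factor + 1, factor)]

# One scanline of alternating white (255) and black (0) pixels.
line = [255, 0] * 8

print(decimate(line, 2, phase=0))      # every sample is 255
print(decimate(line, 2, phase=1))      # every sample is 0
print(box_filter_decimate(line, 2))    # every sample is 127.5
```

Without band limiting, the result depends entirely on sampling phase (all white or all black); with band limiting, the fine detail is replaced by uniform gray. Neither outcome preserves the original pattern, which is the dilemma the question raises.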

7.1.7 What useful increments of scaling might be best for a resolution hierarchy? Factors of two, being one optimally decodable resolution per octave? Two samples per resolution octave such as 3/4 and 1/2, or 3/2 and 2? Or is continuously variable resolution, and associated band limiting a requirement in some industries/applications?
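The alternatives named in 7.1.7 can be enumerated directly. The following Python sketch is illustrative only; the function and constant names are hypothetical, and the base width of 2048 is borrowed from the power-of-two examples in 7.1.4 rather than being a recommendation.

```python
from fractions import Fraction

ONE_PER_OCTAVE = (Fraction(1, 2),)
TWO_PER_OCTAVE = (Fraction(3, 4), Fraction(1, 2))

def resolution_ladder(base_width, ratios, octaves):
    """Widths at and below `base_width`, applying the simple-fraction
    `ratios` within each octave for the given number of octaves."""
    widths = [Fraction(base_width)]
    octave_top = Fraction(base_width)
    for _ in range(octaves):
        widths.extend(octave_top * r for r in ratios)
        octave_top /= 2
    # Report whole-pixel widths as plain integers.
    return [int(w) if w.denominator == 1 else w for w in widths]

print(resolution_ladder(2048, ONE_PER_OCTAVE, 2))   # [2048, 1024, 512]
print(resolution_ladder(2048, TWO_PER_OCTAVE, 2))   # [2048, 1536, 1024, 768, 512]
```

With a power-of-two base, both ladders land on whole pixel counts, which is one practical argument for combining simple fractions with such a basis.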

7.1.8 Should an image architecture emphasize the ability to apply more resolution to some screen areas than others? Or should constant image resolution and quality be mandated for all areas of the image? Is the answer different for different industries/applications? What problems might arise if such a signal format were considered for production? What issues arise within production switchers for such formats?

7.1.9 If some image areas are updated with different resolutions, or temporal rates, than others, should the universal header or descriptor contain this information and make it visible to all devices, or is it acceptable if such information is hidden within the data stream?

7.1.10 Are rectangular and square image region structures, such as those proposed as tiles, a useful construct in providing interoperability and flexibility in image update?

7.1.11 How likely are future image structures which are not xy raster based, such as hexagonal or Poisson distribution samplings? Is it possible to develop an image architecture which has mechanisms to accommodate such structures in the future? Can we anticipate the transcoding steps between a square-pixel xy raster and a uniform hexagonal or Poisson distribution raster and thereby do our best to allow for such future possibilities?

7.1.12 How completely should image filtering and processing histories be specified in order to support subsequent image processing operations? A knowledge of the concatenation of all pre-filters may be desirable in complex image operations. How lengthy are such histories likely to become?

7.1.13 How do flying spot devices, such as CRTs and cathode-ray cameras, relate to fixed raster "lithographed" devices such as CCDs and active matrix liquid crystal displays and projectors? How can the high definition image architecture accommodate both types of image sources and displays without substantial quality loss? Can both types of image data be processed in the same transforming devices using the same parameters, or are different processing steps required for the two different types of image data?

7.1.14 Is there an appropriate representation in which an idealized pixel can be generated through signal processing? Such an idealized pixel would serve as input to resolution scaling (also known as resolution transcoding). Would the digital signal processing required to create such an idealized pixel result in unacceptable artifacts due to the footprint of the convolutional processing kernel?

7.1.15 In traditional television, there was no possibility of color dot triad alignment with scanlines or pixels. However, with the advent of lithographed displays, such as active matrix flat panel displays, the relationship between the color triad and the pixel becomes exact. Is there an optimal organization of color area portions in the context of a digital image architecture? Should the colors be overlaid onto a common area through the use of lenslets, fibers, or other techniques? Should the pixels be adjusted so that they even overlap through such optical techniques in order to reduce blocky appearance?

7.1.16 If the color regions representing a pixel must remain spatially distinct, is there a particular arrangement which is optimal? If so, should the precise positions of the color sub-pixel areas be taken into account in the digital image architecture and in the representation, capture, and processing of the digital image signal? What effect do the gaps between color regions have? How do lenticular or Fresnel screens affect color?

7.1.17 If the color regions representing a pixel must remain spatially distinct, could more than one triad be placed within one logical pixel? Is there an optimal number of such sub-triads, such as perhaps four? Is there an optimal arrangement of such sub-triads? If so, should this arrangement be taken into account in the digital image architecture and in the representation, capture, and processing of the digital image signal?

7.1.18 If a color space were to support more than three primary colors, would there be benefit to using four or more primaries in some appropriate configuration as a standardized pixel shape?

7.1.19 On a CRT, the spot shape is usually a round or ellipsoidal Gaussian which flies horizontally. On a flat panel display, the spot is usually a square shape which is stationary. There may also be dead-zones between pixels. What signal properties should be adjusted to take these issues into account?

7.1.20 Could some of these issues be handled by the use of an idealized or standardized pixel representation, with defined transformations at the receiving device appropriate to its particular pixel configuration? If so, how should the digital image architecture specify this?

7.1.21 What should be done concerning similar issues in CCD image sensors?

7.1.22 What is the impact of these various issues on the Kell factor?

7.1.23 Would other pixel shapes such as triangular, hexagonal, and diamond have advantages for future image capture and display technologies?

7.1.24 The 1.333:1 (4 x 3) aspect ratio is widely used in television and computers. In the motion picture industry, 1.37:1, 1.66:1, 1.85:1, and 2.35:1 are all commonly used. The 8.5" x 11" page has an aspect ratio of about 0.77:1. The European page size approaches 0.71:1. Computer display memories are most simply organized with aspect ratios which are simple fractions such as 1:1, 3:2, and 2:1. Medical radiology, still photography, newspapers, magazines, books, and other images have a number of commonly used aspect ratios. How do we achieve support of all of the aspect ratios in common use?

7.1.25 Can the header/descriptor be used to indicate the aspect ratio and resolution of an image, so that the displaying device can do its own version of letter boxing (unused areas) or overscan (discarded areas)? For those systems which have compressed frame groupings, how could the edges of the letterbox be protected from the moving blocks?
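As a sketch of what a display might do with an aspect ratio carried in a header/descriptor, the Python fragment below computes the active picture area and the unused bars when an image is fitted entirely within a display (letterboxing). It assumes square pixels; the function name and the display dimensions are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

def fit_image(display_w, display_h, image_aspect):
    """Fit an image of the given aspect ratio entirely inside the display.

    Returns (active_w, active_h, side_bar, top_bottom_bar) in pixels,
    assuming square pixels on the display.
    """
    display_aspect = Fraction(display_w, display_h)
    if image_aspect >= display_aspect:
        # Image is relatively wider: use full width, bars above and below.
        active_w = display_w
        active_h = round(display_w / image_aspect)
    else:
        # Image is relatively taller: use full height, bars at the sides.
        active_h = display_h
        active_w = round(display_h * image_aspect)
    return (active_w, active_h,
            (display_w - active_w) // 2, (display_h - active_h) // 2)

# 16:9 material on a hypothetical 1024 x 768 (4:3) display.
print(fit_image(1024, 768, Fraction(16, 9)))   # (1024, 576, 0, 96)
# 4:3 material on a hypothetical 1024 x 576 (16:9) display.
print(fit_image(1024, 576, Fraction(4, 3)))    # (768, 576, 128, 0)
```

An overscan policy would instead scale the image to fill the display and discard the overflow; the same header information would support either choice at the receiver.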

7.1.26 What would be the most widely used mappings between common aspect ratios and anticipated common screen sizes?

7.1.27 If the 16:9 aspect ratio becomes popular, what mappings are likely for European (metric) and American (English) sized paper pages, wide screen movies at 2.35:1, television and film at 1.333:1 (4 x 3), movies at 1.85:1, and other widely used image formats?

7.1.28 Can the digital image architecture support not only a wide variety of aspect ratios in the material being displayed, but also a wide variety of aspect ratios at the receiving display itself?

7.1.29 If the digital image architecture supports multiple aspect ratios, with interoperability between displays at such various aspect ratios, what are the key technical issues? Should the horizontal resolution of all aspect ratios be held at simple fractional relationships, while allowing the vertical resolution (with square pixels) to vary in fine increments to fit the exact aspect ratio desired at the display?

7.1.30 The diagonals of the camera apertures of common film formats have dimensions as follows:


Format                       Diagonal

35mm Full Aperture           31.14 mm
35mm Academy                 27.16 mm
35mm Still                   43.27 mm
65mm                         57.30 mm
Professional Roll Still      101.1 mm

There are many high quality lenses in existence for each of these formats. Would it be useful to keep these dimensions in mind when developing lithographed sensors such as CCD arrays?

7.2 Image Temporal Rate

7.2.1 Should a single temporal rate be emphasized? Or is there a need to support multiple temporal rates for different industries/applications, or within one application?

7.2.2 Are there sufficient mechanisms available in temporal properties of high definition systems to handle the issue of computer CRT displays requiring refresh rates higher than 70 Hz?

7.2.3 Is there a mechanism for reliably and consistently transforming high definition television imagery to 24 frame per second film for theatrical release?

7.2.4 Should there be a family of temporal rates which are related by a simple fraction rule? What should be the numerical basis of such rates?

7.2.5 When temporally transcoding, what temporal beat frequencies are visually acceptable? Does the 12 Hz beat frequency of the 3-2 pull down, and its wide use and seeming acceptance, indicate that 12 Hz or higher is an acceptable beat frequency, or are there frame patterns in which higher beat frequencies are required for acceptable viewing?
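The 12 Hz figure follows from the structure of the 3-2 cadence itself. The Python sketch below is illustrative only, with hypothetical function names: it expands film frames into video fields and computes the rate at which the five-field cadence repeats at 60 fields per second.

```python
from itertools import cycle

def pulldown_fields(film_frames, cadence=(3, 2)):
    """Repeat each film frame for the number of video fields given by
    the cadence, alternating 3 fields and 2 fields."""
    fields = []
    for frame, repeats in zip(film_frames, cycle(cadence)):
        fields.extend([frame] * repeats)
    return fields

def cadence_repeat_rate_hz(field_rate_hz, cadence=(3, 2)):
    """Rate at which one full cadence cycle (3 + 2 = 5 fields) recurs."""
    return field_rate_hz / sum(cadence)

print(pulldown_fields(["A", "B", "C", "D"]))
# ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'D', 'D']
print(cadence_repeat_rate_hz(60))   # 12.0
```

Twenty-four film frames expand to sixty fields, and the uneven 3-2 pattern recurs twelve times per second, which is the beat frequency referred to above.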

7.2.6 What sort of synchronization mechanisms are optimal for digital systems, given that inherent digital system flexibility need not require every device to be locked to a common master very-high-frequency oscillator (near 100 MHz)?

7.2.7 CCDs, active matrix liquid crystal displays and projectors, and other devices utilize a portion of horizontal or vertical retrace intervals to transfer to/from frame buffers. Future systems may not require these intervals. How will this affect the need to dedicate signal time to these intervals?

7.2.8 Given that CCDs, active matrix liquid crystal displays and projectors, and other devices, have no inherent flicker or update rate requirements, should temporal rate flexibility be part of the high definition architecture?

7.2.9 Should an image architecture emphasize the ability to use a higher update rate for some screen areas than others? Or should constant image rate be mandated for all areas of the image? Is the answer different for different industries/applications?

7.2.10 It is common to use a 50% temporal duty sampling cycle (180 degree shutter in film cameras to allow film pull-down), which provides a balance between motion blur and sharpness. Is not this temporal undersampling certain to introduce temporal aliasing during temporal rate transcoding? Is not such aliasing certain to appear as artifacts which occur at the temporal beat frequency rate (e.g., a 50 Hz to 60 Hz transcoding would have a 10 Hz beat frequency)?
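The beat frequencies quoted in this section can be reproduced with a simple model. The Python sketch below assumes plain frame repetition with integer rates (no interpolation or motion estimation) and takes the beat to be the rate at which the source-to-output frame alignment pattern repeats; both are simplifying assumptions, and the function names are hypothetical.

```python
from math import gcd

def beat_frequency_hz(src_rate, dst_rate):
    """Rate at which the frame alignment pattern repeats, for integer
    rates under simple frame repetition (an assumed model)."""
    return gcd(src_rate, dst_rate)

def source_index_pattern(src_rate, dst_rate):
    """Source frame shown for each output frame over one repeat period,
    always choosing the most recent source frame."""
    period_out = dst_rate // gcd(src_rate, dst_rate)
    return [(k * src_rate) // dst_rate for k in range(period_out)]

print(beat_frequency_hz(50, 60))        # 10
print(source_index_pattern(50, 60))     # [0, 0, 1, 2, 3, 4]
print(beat_frequency_hz(24, 60))        # 12
```

In the 50 Hz to 60 Hz case, one source frame in every five is shown twice, and that irregularity recurs ten times per second. More sophisticated converters spread the error differently, but the underlying repeat period is the same.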

7.2.11 Some CCD sensors used in cameras see the entire frame area during the exposure time. This is similar to film exposure. Some tube cameras scan the image top to bottom, whether progressively or interlaced. What are the temporal processing, displaying, and viewing effects caused by mixing devices which integrate the entire image versus those that scan the image from top to bottom? What temporal issues arise due to the fact that the top of the image may be seen or displayed nearly a frame time before the bottom of the image, and half a frame time before the center of the image?

7.2.12 As just mentioned, both displays and sensors exist which scan from top to bottom or which integrate the entire image for the frame time. What architectural issues should be examined in attempting to take this issue into account? What issues are involved in converting a scanned image for area display or in converting an area sensed image for scanned display? How do these issues affect film scanning such as in a telecine? Does the wipe time involved in the physical film camera shutter have an effect? How do these issues affect film recording from an electronically captured moving image?

7.2.13 How do these scanning pattern issues affect temporal transcoding, finding motion vectors for compression or standards conversion, effects processing, or other image processing operations? What issues arise when compositing or mixing multiple image sources captured with different scanning patterns?

7.2.14 Standards converters which convert between 525/60 and 625/50 are available at various levels of cost/performance, utilizing a number of techniques. At the highest level of performance, motion estimation may be employed to interpolate frames. Undersampling due to interlace may have an impact on these processes. What problems may thus arise for architectures that rely on temporal transcoding or standards conversion?

7.2.15 When displaying multiple windows of moving images on a screen, as in a future video teleconference, how can buffering be minimized for each picture stream in order to achieve display synchronization? What options are available for local, regional, and global synchronization, both loose and near exact?

7.2.16 Is there a benefit to selecting a particular master oscillation rate, from which pixel clocks in the scalable system are derived? If so, what candidate rates might offer advantages?

7.2.17 Some applications, such as teleconferencing, interactive flight simulation, or virtual reality, require low latency. Other applications, such as broadcast television, can have substantial latency without much problem. What digital image architecture mechanisms are needed to provide for those applications which require low latency?

7.2.18 Compression algorithm design is significantly affected by a low latency requirement. What latency is implied by any candidate digital advanced television systems? What effect would such inherent latency have on usefulness for those applications requiring low latency?

7.2.19 Digital network design is affected by needs for real-time bandwidth as well as latency requirements. How does the need for low latency combined with high real-time bandwidth in these industries affect digital interactive network design?

7.3 Image Layers, Overlays, and Windows

7.3.1 In some workstations, one to four bits (usually two or four) are used for overlay planes, which are added on top of the underlying image (which is usually full color). However, many popular systems, like the Macintosh II(R), the DECstation 5000(R), and others, do not use overlay planes. It is possible for X-Windows and other window management systems to support overlay planes as well as managing full color windows in the main bit planes. Should overlay planes be part of digital advanced television architectures?

7.3.2 If overlay planes are used, should these overlay planes be implemented in hardware, or as a virtual mechanism in software, or can both be accommodated?

7.3.3 How many bits of real or virtual overlay plane should be mandated or recommended, if overlay planes are mandated or recommended?

7.3.4 The common practice of bandwidth limiting moving images suggests the possibility of using overlay planes to contain non-band-limited imagery, with the band-limited moving images using the main bit planes underneath. Overlay planes could easily contain the usually non-band-limited window borders, text, stipple patterns, graphics, etc., which characterize computer screens. This architecture would require all receiving devices to support either real or virtual overlay planes. Is such an architecture appropriate?

7.3.5 Is it necessary for interoperability across industries and applications to allow for the possibility of non-band-limited picture data co-existing with band-limited imagery from cameras or other (possibly synthetic) sources?

7.3.6 Do appropriate digital image compression algorithms exist which can pass non-band-limited picture data such as that used typically on computer screens? If such a compression technique exists, would this allow such non-band-limited picture data to be transmitted together with the moving picture stream? What are the properties of such a compression algorithm, if one exists?

7.3.7 Can the data areas available in some digital advanced television proposals be used to convey encoded data for use with real or virtual overlay planes? Would Unix X-Windows(R), Display Postscript(R), Apple Macintosh Toolbox(R), Microsoft Windows(R), fax, or other forms of encoded graphic and text data such as run-length codes be conveyable in this manner? Are there one or more such techniques which might be appropriate to support for digital advanced television?

7.3.8 Is the proprietary nature of many of these formats a barrier to interoperability, or are there potential solutions to provide universal access?

7.3.9 Are open standards such as IGES for vector and graphics images, CGM for raster images, or Open Document Architecture (ODA) for compound documents worth considering in light of the desire for universal access?

7.3.10 Should one or a small number of such formats be supported universally? By standardizing on only one such format, all receiving devices would only need to support that single format. If no such single standard is chosen, then each receiving device desiring to display computer-type window or graphics displays might need to support many or all of the formats in common use. Is there a way to encourage adoption of one or a small number of text and graphics protocol standards to be universally supported?

7.3.11 Should a digital image architecture require that text and graphic data, typical of computer displays, be able to be passed to the display by either the use of real or virtual overlay planes or appropriate compression algorithms capable of passing this information? Should the digital image architecture insist on at least one of these two ways of passing computer display information?

7.3.12 As an alternative to screen-resolution-specific graphics, should all graphics be specified with much higher precision than the display? Such graphics would likely use outline fonts and graphic commands which can do proportional blending of text with appropriate filters, and which allow image detail to be placed between pixels or lines. Would such non-raster-aligned text and graphics, with appropriate filtering, be acceptably legible and clear compared to raster-aligned and non-band-limited text and graphics as is typical of current computer screens?

7.3.13 If presentation of non-band-limited image information, as is typical of computer displays, is a requirement, then should multiple screen resolutions be supported for resolution scalability? If so, should the simple fraction guideline be used for the relationships of screen resolutions due to the need to preserve legibility and clarity of the non-bandwidth-limited image data?

7.3.14 Should the capability for selecting among a variety of overlay-planes for display be part of digital advanced television architectures? Could such a selection be useful for closed captions, sign-language inserts, foreign language subtitles, television program guides, sports statistics, or other picture augmentation information?

7.3.15 How many such simultaneous overlay planes could or should be supported?

7.3.16 Should more general compositing functions other than overlay be supported at the receiving device as a part of digital advanced television architectures? In particular, should alpha blending be supported? Alpha blending uses proportional blending at edges of the overlaying area. This technique is also known as proportional matte edge compositing.
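The proportional blending described above is commonly written as a weighted sum of foreground and background samples. The following Python sketch shows this for a single sample value; the function name alpha_blend and the sample values are illustrative assumptions, not part of any proposal discussed here.

```python
def alpha_blend(foreground, background, alpha):
    """Blend one sample: alpha = 1.0 shows only the foreground, 0.0 only the background."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * foreground + (1.0 - alpha) * background

# A soft matte edge: coverage ramps from fully background to fully foreground.
edge_alphas = [0.0, 0.25, 0.5, 0.75, 1.0]
blended = [alpha_blend(200, 40, a) for a in edge_alphas]
print(blended)  # [40.0, 80.0, 120.0, 160.0, 200.0]
```

A hard-edged overlay is the special case in which alpha takes only the values 0 and 1.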

7.3.17 If such compositing should be supported, should there be limits or scene-specification guidelines for the amount of area involved in mattes and proportional edges each frame, or each second?

7.3.18 If such compositing should be supported, how many layers of composite should be allowed? If the activation of the composited foreground overlays is supported, it is likely to be controlled and specified by the user at the receiving device. If foreground overlays are transmitted in moderate time intervals such as a second or more, then the number of overlays allowed will affect the amount of buffering required at the receiving device. Should such issues be a part of an image architecture for digital advanced television?

7.3.19 Is the concept of tiles and plates useful by providing compact data representation for locating pixels used in proportional alpha blending? Tiles and plates are screen area subdivisions allowing specification of screen locations. The concept of plates is similar to city blocks and the concept of tiles is similar to house lots, with pixels being similar to locations on a grid laid over the house lot. This organization makes addressing a given pixel, tile, or plate location simpler. It also allows all locations on a given street to be easily located due to the proximity of each house to the next. Is this method of area subdivision useful? Are there other methods for subdividing an image which are appropriate or useful in digital advanced image architectures?
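One way to picture the plate, tile, and pixel hierarchy is as a three-level address. The Python sketch below is illustrative only: the sizes (8 x 8 pixel tiles, 4 x 4 tiles per plate) and the function names locate and to_pixel are assumptions, since no dimensions are specified here.

```python
TILE_SIZE = 8        # pixels per tile edge (assumed for illustration)
TILES_PER_PLATE = 4  # tiles per plate edge (assumed for illustration)

def locate(x, y):
    """Split a pixel coordinate into (plate, tile-within-plate, pixel-within-tile)."""
    plate_size = TILE_SIZE * TILES_PER_PLATE
    plate = (x // plate_size, y // plate_size)
    tile = ((x % plate_size) // TILE_SIZE, (y % plate_size) // TILE_SIZE)
    pixel = (x % TILE_SIZE, y % TILE_SIZE)
    return plate, tile, pixel

def to_pixel(plate, tile, pixel):
    """Inverse of locate: rebuild the absolute pixel coordinate."""
    plate_size = TILE_SIZE * TILES_PER_PLATE
    x = plate[0] * plate_size + tile[0] * TILE_SIZE + pixel[0]
    y = plate[1] * plate_size + tile[1] * TILE_SIZE + pixel[1]
    return x, y

print(locate(100, 37))  # ((3, 1), (0, 0), (4, 5))
```

Pixels that share a plate and tile share the leading parts of the address, so a matte edge confined to a few tiles can be described by naming those tiles and then only the short within-tile offsets.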

7.3.20 Should the use of windows on the display of the receiving device be anticipated as part of the architecture? What potential effect would there be on the architecture from anticipating the use of two or more windows within the display, each independently controlled by each user at each display? Is the minimal functionality of picture-in-picture appropriate, or is the elaborate window sizing, positioning, and overlaying capability of typical computer window systems appropriate, or somewhere in between?

7.3.21 If windows are anticipated, how would dynamic resizing take into account the possible need for simple-fractional guidelines in resolution scaling? Should there be notches in the window sizes at simple fraction points to allow clearer and more legible text and graphics?
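The idea of notches at simple fraction points can be sketched as snapping a requested scale to the nearest fraction with a small denominator. In the Python sketch below, the maximum denominator of 4, the 1920 x 1080 native size, and the function names are all assumptions made for illustration.

```python
from fractions import Fraction

def snap_to_simple_fraction(scale, max_denominator=4):
    """Nearest fraction to `scale` whose denominator does not exceed max_denominator."""
    return Fraction(scale).limit_denominator(max_denominator)

def notched_size(native_width, native_height, requested_width):
    """Window size after snapping the requested scale to a simple fraction."""
    scale = snap_to_simple_fraction(requested_width / native_width)
    return int(native_width * scale), int(native_height * scale), scale

print(notched_size(1920, 1080, 1300))  # snaps to 2/3 scale: 1280 x 720
print(notched_size(1920, 1080, 1000))  # snaps to 1/2 scale: 960 x 540
```

A window manager built this way would let the user drag freely but settle on the nearest notch when the drag ends.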

7.3.22 If the local computer wishes to use the overlay planes, how would such use interact with remote control of these overlay planes? Would the local system have priority? How would such priority be controlled?

7.3.23 Is it not likely that locally simulated synthetic computer-generated images will often be overlaid onto the digital advanced television image stream? What filter representations are appropriate for matching the pixel representations of the simulated and real or television imagery? Should idealized pixels be used? What should be the assumptions concerning flying-spot versus lithographed raster pixel representations in this context? What are the optimal computer anti-aliasing filters for such composited images?

7.4 Compression Quality Level

7.4.1 What is the proper tradeoff between compression quality, data rate, image resolution, image dynamic range, image color fidelity, and image temporal rate? How should these tradeoffs be resolved for different industries and applications (e.g., medicine, scientific visualization, production, videoconferencing, transmission)?

7.4.2 Can a family of compression quality levels be developed which allow spatial and temporal resolution scalability with a layered coding technique? Can such a compression technique compete successfully with single point solution or single resolution/rate heavily optimized systems? Would such a technique offer benefits to interoperability between industries/applications? Are the benefits of scalability and extensibility sufficient to justify the effort to develop a layered compression system of sufficient quality? How much is gained by compromising the orthogonality required for scalable layered systems via the use of non-linear terms? Are such non-linear data interactions likely to hinder other interoperability needs, in addition to the desire for scalability and extensibility?

7.4.3 How serious is the problem of motion vector interaction between motion vectors used in standards conversion and motion vectors used in motion-compensated compression? Would an image which has been transcoded in either resolution, temporal rate, or both, interact acceptably with a subsequent digital compression algorithm, or would the compression motion vectors show severe errors and aliasing beat frequencies? If the image is further transcoded after subsequent decompression, would the errors compound further?

7.4.4 What sub-pixel resolution is required for motion vectors used in motion-compensated compression?

7.4.5 What is the effect of concatenated compress/decompress cycles within one algorithm as used when exchanging images multiple times between different industries/applications? What is the effect of concatenated compress/decompress cycles between different compression algorithms? Are the preliminary studies which indicate that this might result in severe degradation correct?

7.4.6 It is common practice in some compression schemes, such as MPEG, to use frame groupings. How can live switching be performed if frame groupings are not aligned? How can misalignment of key intermediate anchor frames be prevented when performing multiple compress and decompress cycles?

7.4.7 Do these issues argue for commonality and compatibility of compression algorithms, and a minimization or elimination of temporal and spatial transcoding in processing images? If transcoding is applied, are simple fraction-based temporal and spatial transcodings less prone to degradation than arbitrary fraction transcodings?

7.4.8 Can a high resolution system architecture accommodate the rapid algorithmic and digital hardware advances in the state of the art which appear to be inevitable each year? If an optimal algorithm for this year is selected, what is the likelihood that this algorithm would be obsolete in five or ten years due to improvements in hardware or algorithmic techniques? If compression algorithms are likely to become obsolete every five years, what high definition system architecture principles can be developed to allow radical algorithmic and hardware improvements as appropriate? Is the header/descriptor sufficient, or are other principles required in conjunction with the basic compression algorithm design to allow extensibility or easy replacement/upgrade? Can new algorithms coexist with old algorithms while providing efficient bandwidth/spectrum usage, in light of the fact that new algorithms may be many times more efficient than old ones? Can old algorithms be continued in use when their inefficiency approaches factors of four or eight below optimal with respect to the newest algorithms and hardware? How can extensibility be accommodated, as will certainly be required, while maintaining service to older devices which require inefficient digital signal architectures for a given point in time?

7.4.9 How likely are the current DCT and sub-band systems to advance into the optimal algorithms of five to ten years from now? Are other algorithms such as fractal, wavelet, vector quantization or other large codebook algorithms likely to be more efficient at some future hardware capability level? Is it likely that future compression algorithms may be as yet unanticipated? What steps can be taken to prepare for such major shifts in compression techniques, should they occur?

7.4.10 Is it likely that decompression chips in receiving devices could be programmable? If so, could updates to compression algorithms be downloaded using header/descriptor support, or by other software distribution methods? Would such updates be useful? Would it be useful to place the decompression module on a standardized card, so that the chip itself could be replaced as technology advances?

7.4.11 Are there proposed signal formats based upon very rapid partial frames which can support multiple receiver display rates from a single signal without degradation of any rate? Do such formats also provide for minimum buffering of multiple asynchronous sources being presented on the same display?

7.4.12 Some applications, such as the colorization of movies, create data which defines the objects and their boundaries and motion for every frame. Can such data be useful in a system architecture for compression or other uses?

7.4.13 Is it useful to gather macro information about a scene by encoding data such as camera position, motion, and orientation? Could tripod head encoders be useful for this purpose? Are there navigational tools which could be adapted for this purpose? Would such global information be of sufficient value in compression and other uses to warrant the capturing of this information?

7.5 Data Rate in Relationship to Image Quality

7.5.1 Can digital image compression algorithms be layered in resolution and temporal rate, while at the same time being usefully layered in data rate?

7.5.2 Is a data rate hierarchy possible in this context?

7.5.3 There will always be a variety of bus rates, memory bandwidths, disk transfer rates, and channel rates. What are the useful rates of a data rate hierarchy, if such a hierarchy is possible?

7.5.4 Is orthogonality of temporal and spatial resolution via a layered hierarchical compression technique possible? Is data rate orthogonality, integrated with temporal and spatial resolution, possible?

7.5.5 Can other useful augmentations be layered onto the data rate such as extra camera views, stereoscopic imagery, z-value depth information, blending coefficients for compositing, additional dynamic range or improved colorimetry?

7.5.6 Can high quality still frames be acceptably interleaved into the data stream concurrent with the moving image data stream?

7.5.7 Can alternate aspect ratios be provided simultaneously by an appropriately layered data stream construction?

7.5.8 Can extra image channels such as closed-caption sign language display windows, previews of future shows, and others, be acceptably layered into the data stream?

7.5.9 Can three dimensional image construction information be provided for those receiving devices capable of creating three dimensional computer generated images? Could such images, by computer synthesis at the display, substitute for tiny details which are not adequately captured with the camera resolution such as red-orange golf balls (usually red and blue have lower resolution than green due to chroma-sub-sampling in the Y,Cr,Cb technique)? Could computer graphics create new interactive games or other locally interactive education or training in this way? Will future display devices be likely to have the capability to generate some amount of screen area containing three dimensional computer generated synthetic images and composite them correctly into the two-dimensional high definition background image situation?

7.5.10 Transport errors are highly dependent on the transport channel. What sort of error protection/correction should accompany advanced television digitally compressed data, in light of such data's extreme sensitivity to errors? Are the mechanisms being developed in the transport header portion of the universal header/descriptor sufficient, or should all digitally compressed picture data contain inherent protection and correction protocols?

7.5.11 How should encryption be supported? Is public-key encryption appropriate and sufficient? What levels of encryption are required for various transport media and various uses?

7.5.12 Packet-retry type networks, such as the current Internet, or Ethernet (R) with TCP/IP cannot guarantee delivery of data for a real-time stream since a packet of data may be "bumped" and must be resent. Real-time streams require that data be both intact as well as "on-time", invalidating packet-retry protocols which would provide resent data subsequent to the required time. It is therefore likely that the basic data network infrastructure will need to significantly change in order to support real-time imagery and audio streams in shared channels, unless switched point-to-point services are used due to insufficient shared-channel infrastructure technology. In light of such significant changes, is it possible to anticipate the future networking protocols and techniques so that the high definition image architecture can be developed to be compatible? Are packet prioritization and priority-based graceful degradation likely to be key techniques in such future networks? If so, how much priority information and priority verification and authorization is needed? Also, how many levels of priority, and what priority schemes might be required, to optimize quality for all users of a shared channel as well as priority packet routing performance? How much guaranteed bandwidth is required by each type of user, and can such data bandwidth be guaranteed? Will such potential guarantee requirements need some amount of data bandwidth reservation? Is it possible to design appropriate shared networks with a hybrid of reservation and non-reservation, applied with both reservation and non-reservation portions of each connection's data stream? Is such a technique a reasonable match between prioritized compressed advanced television and shared data networks, such that a certain amount of real-time bandwidth is guaranteed, but an additional portion is not reserved but is usually provided?

7.6 Image Luminance Dynamic Range

7.6.1 As cameras and displays continue to increase their luminance dynamic range beyond the 100:1 which is now common, would it not be desirable to be able to support such extended range? Are 1000:1 luminance dynamic range cameras and display devices likely in the next five to ten years?
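To give a rough sense of scale for the ratios mentioned above, a luminance range can be expressed as a number of doublings (photographic stops). This is a conventional measure brought in here only for illustration; the function name dynamic_range_stops is hypothetical.

```python
import math

def dynamic_range_stops(contrast_ratio):
    """Luminance range expressed in photographic stops (doublings of light)."""
    return math.log2(contrast_ratio)

for ratio in (100, 1000):
    print(f"{ratio}:1 is about {dynamic_range_stops(ratio):.1f} stops")
```

On this measure, moving from 100:1 to 1000:1 adds a little over three stops of range.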

7.6.2 Can a given advanced television digital compression algorithm be augmented to allow more bits for luminance, or would the requirement for more accuracy defeat the ability of the algorithm to provide the required compression ratio?

7.6.3 Can luminance transfer functions such as those used in CCIR Rec. 709 and SMPTE 240M be augmented to provide extended black and white range?
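For reference, the Rec. 709 transfer function is a power law with a short linear segment near black. The Python sketch below uses the constants from that recommendation; evaluating it above 1.0 is an assumption made here only to illustrate what an extended white range might look like, and is not part of the recommendation. The function name rec709_oetf is hypothetical.

```python
def rec709_oetf(linear):
    """Rec. 709 opto-electronic transfer function.

    `linear` is scene luminance normalized so that reference white is 1.0.
    Values above 1.0 simply continue the power law (an assumption for illustration).
    """
    if linear < 0.018:
        return 4.5 * linear               # linear segment near black
    return 1.099 * linear ** 0.45 - 0.099  # power-law segment

for level in (0.0, 0.018, 0.18, 1.0, 2.0):
    print(level, round(rec709_oetf(level), 4))
```

The curve maps 0.0 to 0.0 and 1.0 to 1.0, so any extension of the black or white range has to assign meaning to signal values outside that interval.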

7.7 Image Colorimetric Range

7.7.1 Do there exist cases in which red or blue details on dark backgrounds, or yellow or cyan details on light backgrounds, would require higher resolution in color for both broadcasting and other industries/applications?

7.7.2 What are the benefits of an RGB digital representation for interoperability across industries and applications versus the Y,Cr,Cb (also called Y,Pr,Pb and YUV) color difference representation commonly used in television?
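The relationship between the two representations can be sketched as a weighted sum for luminance plus two scaled color differences. The Python sketch below assumes component values normalized to the range 0 to 1 and uses the luma coefficients of the current ITU-R BT.709 recommendation; both choices, and the function names, are assumptions made for illustration.

```python
KR, KG, KB = 0.2126, 0.7152, 0.0722  # BT.709 luma coefficients (assumed here)

def rgb_to_ycbcr(r, g, b):
    """Convert normalized R, G, B to luminance and two color differences."""
    y = KR * r + KG * g + KB * b
    cb = (b - y) / (2.0 * (1.0 - KB))  # scaled to the range -0.5 .. 0.5
    cr = (r - y) / (2.0 * (1.0 - KR))  # scaled to the range -0.5 .. 0.5
    return y, cb, cr

def ycbcr_to_rgb(y, cb, cr):
    """Inverse conversion back to normalized R, G, B."""
    r = y + 2.0 * (1.0 - KR) * cr
    b = y + 2.0 * (1.0 - KB) * cb
    g = (y - KR * r - KB * b) / KG
    return r, g, b

print(rgb_to_ycbcr(0.5, 0.5, 0.5))  # neutral gray: color differences near zero
print(rgb_to_ycbcr(1.0, 0.0, 0.0))  # pure red: Cr at its maximum
```

In floating point the round trip is essentially lossless; the precision and resolution questions raised in this section arise once the components are quantized to a limited number of bits or the color differences are subsampled.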

7.7.3 What are the tradeoffs for using wider gamuts, including gamuts beyond the real spectrum, in covering the real colors? What industries/applications, such as perhaps museums, colleges, printing, photography, and motion pictures, may require accurate color reproduction over a color gamut which is wider than is commonly proposed for high definition television systems?

7.7.4 What other color representations, such as HSV, CIE x,y or CIE u',v' are useful?

7.7.5 How much precision loss accompanies a given high definition color transformation to and from these other representations?

7.7.6 Is there benefit from using a color space which supports color sensors and displays which use more than three color primaries?

7.7.7 When are linear color representations needed for computations? What linear color representations are appropriate in a device-independent context?

7.7.8 What color representations offer device independence?

7.7.9 Can luminance or other brightness measure be represented such that it is orthogonal to color representation? Can such a representation offer color invariance under exposure or illumination level or adjustment?

7.7.10 What color representation is most useful for adjusting a wider gamut of color for a narrower gamut display? What are the tradeoffs between clipping to the narrower gamut, and a softer adjustment, similar to highlight compression in the S-Curve?

7.7.11 What are the perceptual uniformity properties of various useful color spaces?

7.7.12 Are Hue Saturation and Value representations useful in these contexts?

7.7.13 How can digital numeric representation efficiency be optimized, while still allowing the possibility of wide gamut colorimetry in addition to efficient support of narrower gamuts?

7.7.14 What device independent color spaces are most efficient for compression?

7.7.15 Luminance, which matches human color sensitivity, is an appropriate representation of brightness near the display. Other representations may be required in the studio where processing is required. For example, blue screen compositing involving transparency requires as much detail in blue as in green. How should the division be made between the use of luminance versus an equal representation of red, green and blue?

7.8 Image, Number of Active Channels

7.8.1 As efficiency of compression improves, if ways are found to allow new compression techniques to replace old ones, then new data bandwidth will be made available with each such upgrade. This bandwidth can carry more active image channels. These channels can be of quality equivalent to the original if a factor of two is gained in the upgrade. This would make possible the two required image streams for stereoscopic images. Alternatively, or additionally, a number of lower-quality channels could be added with alternate views of the scene from different cameras (cockpit camera in a car race or helmet camera in football, closeups, long shots, etc). Also, entirely different programming is possible on the new data. Can an architecture be developed which will allow this evolution?

7.8.2 Is the header/descriptor mechanism likely to be a major element in providing such augmented capabilities, or do the issues extend into the nature of upgradable compression algorithms?

7.9 Audio Quality

7.9.1 Musicam (used in MPEG) and other compressed audio systems are also likely to improve. Similar issues of upgradability and backward compatibility must be considered when evaluating the system architecture. Can the image architecture for this upgrade path, if one is found, be applied as effectively to audio?

7.9.2 As compression techniques improve, a given level of quality can be maintained while reducing data bandwidth requirements. What are the best uses for newly freed bandwidth? Will older devices be able to decode new algorithms if they are made to be somewhat programmable from the beginning? Programmable devices will allow algorithmic improvements which can utilize a given device. However, it is likely that improvements in algorithms will have to be accompanied by new hardware, thus making the upgrade path and backward compatibility path difficult. Are there ways to improve this situation in the system architecture?

7.10 Audio, Number of Channels

7.10.1 Additional audio channels can be added in expanding bandwidth due to more efficient algorithms or due to increased bandwidth or higher reliability channels.

7.10.2 Likely uses for additional channels are six-channel surround sound, and multiple languages on separate tracks.

8.0 Annex

8.1 Glossary of Terms

Alias
a form of image distortion associated with spatial and temporal filtering. A common form of aliasing is a stairstepped appearance along diagonal and curved lines. See Scaling:Interpolation.

Compression
the process of removing redundancies in a digital data stream to reduce the amount of data that must be stored or transmitted. The following terms are often used in describing image compression systems:

Lossless Compression
techniques for data reduction in which the original information can be recovered exactly as it existed prior to encoding.

Lossy Compression
techniques for data reduction in which some information may be lost in the process of encoding or decoding the data. In image compression, an effort is made to preserve as much of the visually-significant data as is possible, sacrificing, when necessary, only data unlikely to be perceived by the average viewer.

Scalable Coding
the ability to encode a visual sequence so as to enable the decoding of the digital data stream at various spatial and/or temporal resolutions. Scalable compression techniques typically filter the image into separate bands of spatial and/or temporal data. Appropriate data reduction techniques are then applied to each band to match the response characteristics of human vision.

Fixed Data Rate Compression
techniques designed to produce a data stream with a constant data rate. Such techniques may vary the quality of quantization to match the allocated bandwidth.

Variable Data Rate Compression
techniques designed to produce a data stream with a variable data rate. Such techniques typically maintain a constant level of quantization producing a variable data rate based on the spatial and temporal energy content of the images being encoded.

Conditional Replenishment
a technique whereby various portions of the image are updated at differing rates. Interframe coding techniques utilize this concept as they eliminate redundancies between frames, only storing or transmitting image data related to the changes between frames. Conditional replenishment occurs as image sequences are reconstructed in dual ported frame memory; current display designs typically scan the frame memory at a constant rate to refresh the display. The following terms are related to the concept of conditional replenishment:

Addressable Display
a display designed to allow each pixel element to be controlled independently, allowing for conditional replenishment.

Conditional Acquisition
an image acquisition system that processes the image data and outputs information about changes as they occur, up to the maximum temporal sampling rate of the image sensor. Parallel outputs from separate image regions (tiles) may permit higher sampling rates.

Incremental Update
the practice of sending information only about changes to an image that occur between temporal samples; computer graphics and animation systems typically employ this technique.
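A minimal Python sketch of incremental update follows. It compares two small frames, lists only the pixels that changed, and rebuilds the new frame at the receiver from the old frame plus that list. The frame contents and the function names changed_pixels and apply_updates are illustrative assumptions; practical systems generally work on blocks or regions rather than single pixels.

```python
def changed_pixels(previous, current):
    """List (row, col, new_value) for every pixel that differs between two frames."""
    updates = []
    for r, (old_row, new_row) in enumerate(zip(previous, current)):
        for c, (old, new) in enumerate(zip(old_row, new_row)):
            if old != new:
                updates.append((r, c, new))
    return updates

def apply_updates(frame, updates):
    """Rebuild the current frame from the previous frame plus the update list."""
    rebuilt = [row[:] for row in frame]  # copy so the stored frame is not modified
    for r, c, value in updates:
        rebuilt[r][c] = value
    return rebuilt

previous = [[0, 0, 0], [0, 5, 0], [0, 0, 0]]
current = [[0, 0, 0], [0, 0, 5], [0, 0, 0]]
updates = changed_pixels(previous, current)
print(updates)  # [(1, 1, 0), (1, 2, 5)]
```

Only two of the nine pixels need to be sent in this example; a static scene would produce an empty update list.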

Multirate Encoding
the digital encoding of multiple image components that are presented to the encoder at different temporal rates. For example, a high spatial/low temporal resolution component may be acquired at 24 fps while a low spatial/high temporal resolution component may be acquired at 72 fps.

Latency
the delay (latent period) between the occurrence of an event and its display by an electronic imaging system. Each of the following factors can contribute to the total latency of an imaging system; in some cases, image distortions may be related to the latency of the system:

Encoding/Decoding
interframe coding techniques utilized for digital video compression require the processing of multiple frames to eliminate redundancies in the static areas of the image and to calculate motion vectors for the motion components of the image. Depending on the sophistication of the coding, interframe compression can introduce delays from two to several hundred frames.
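Frame counts such as those above translate into time once a frame rate is chosen. The Python sketch below assumes a rate of 30 frames per second purely for illustration; the function name latency_ms is hypothetical.

```python
def latency_ms(frames, frame_rate_hz):
    """Delay in milliseconds contributed by buffering a number of frames."""
    return 1000.0 * frames / frame_rate_hz

for frames in (2, 15, 300):
    print(f"{frames} frames at 30 Hz: {latency_ms(frames, 30):.1f} ms")
```

At the assumed rate, two frames amount to roughly 67 ms, while three hundred frames amount to ten seconds.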

Storage
used here to describe the access speed of a digital image storage system. Fast access speeds reduce the time required to retrieve specific data. CD-ROM is said to have high latency due to its slow access speed, while hard disks have lower latency due to their faster access speeds.

Synchronization/processing
the use of frame synchronizers, timebase correction, and digital processors for visual effects introduces frame delays that accumulate as the signal passes through a video production system.

Pixel
the smallest picture element of an image (one sample of each color component). A digital display is typically specified in terms of pixels and color depth, the number of digital bits stored per pixel. A picture element is also called a "pel" in the field of image processing.

Quantization Levels
the predetermined levels at which an analog signal can be sampled as determined by the resolution of the analog to digital converter (in bits per sample) or the number of bits stored for the sampled signal. See: Sampling.
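The number of levels follows directly from the number of bits per sample. The Python sketch below assumes a uniform quantizer that uses the full code range for values between 0.0 and 1.0; real video interfaces often reserve some codes, which is ignored here. The function names are hypothetical.

```python
def quantization_levels(bits_per_sample):
    """Number of distinct levels available with the given number of bits."""
    return 2 ** bits_per_sample

def quantize(value, bits_per_sample):
    """Map an analog value in [0.0, 1.0] to the nearest integer code."""
    top_code = quantization_levels(bits_per_sample) - 1
    clamped = min(max(value, 0.0), 1.0)
    return int(clamped * top_code + 0.5)  # round to the nearest code

print(quantization_levels(8))   # 256
print(quantization_levels(10))  # 1024
print(quantize(0.5, 8))         # 128
```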

Resolution
the capability of an optical system, or other imaging system, of making clear and distinguishable the separate parts or components of an object. With respect to the relationship of the human visual system to an imaging system display, several factors must be taken into consideration:

Spatial Resolution
the ability of the display to reproduce adequate detail to allow the visual system to distinguish the separate parts or components of an object.

Temporal Resolution
the ability of the display to reproduce adequate detail to allow the visual system to distinguish the separate parts or components of an object that is moving through the display.

Perceived Resolution
from the observer's viewpoint, the apparent resolution of a display. This concept is based on the ability of the viewer to resolve all image detail presented by the display, which depends on viewing distance. At the ideal viewing distance, where the viewer can just resolve all of that detail, perceived and actual spatial resolution are equal; at greater viewing distances the perceived resolution is higher than the actual spatial resolution of the display.

Sampling
the first step in the process of converting an analog signal into a digital representation. This is accomplished by measuring the value of the analog signal at regular intervals called samples. These values are then encoded to provide a digital representation of the analog signal. Image samples are usually called pixels. See: Quantization Levels.
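
The two steps defined under Sampling and Quantization Levels can be sketched as follows. This is a minimal illustration, not a description of any particular converter: the function name `sample_and_quantize`, the assumed input range of -1 to +1, and the uniform spacing of levels are all assumptions made for the example.

```python
import math

def sample_and_quantize(signal, sample_rate, duration, bits):
    """Sample `signal(t)` at regular intervals and encode each sample.

    Assumes the analog value always lies in the range -1.0 .. +1.0 and
    that the 2**bits quantization levels are evenly spaced (hypothetical
    choices for this sketch).
    """
    levels = 2 ** bits                      # number of quantization levels
    count = int(sample_rate * duration)     # number of samples taken
    codes = []
    for k in range(count):
        t = k / sample_rate                 # regular sampling instants
        value = signal(t)
        # Map -1..+1 onto the integer codes 0..levels-1.
        codes.append(round((value + 1.0) / 2.0 * (levels - 1)))
    return codes

# One cycle of a 1 Hz sine wave, 8 samples per second, 8 bits per sample.
codes = sample_and_quantize(lambda t: math.sin(2 * math.pi * t), 8, 1, 8)
print(codes)
```

With 8 bits per sample there are 256 levels, so every code falls between 0 and 255; fewer bits give coarser steps between adjacent levels.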

Scaling (spatial)
alteration of the spatial resolution of an acquired image to decrease or increase the number of pixels used to represent the image. Any of the following techniques may be used for image scaling, resulting in the addition of image artifacts as indicated:

Interpolation
the process of averaging pixel information when scaling an image. When an image is reduced in size, pixels are averaged to create a single new pixel that replaces two or more adjacent pixels; when an image is scaled up in size, additional pixels are created by averaging the values of adjacent pixels. Interpolation generally causes apparent softening of the image when it is increased in size, because the averaging process does not create any new information.

Pixel Replication
a process used to display an image at a larger size by repeating pixels along a horizontal line and/or repeating lines to increase the vertical size. For example, a 320 x 240 pixel image can be displayed at 640 x 480 size by duplicating each pixel along a line and then repeating the line; the resulting image will contain blocks of four pixels with the same value.
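
The doubling described in this entry can be sketched in a few lines. The function name `replicate_pixels` and the representation of an image as a list of rows are assumptions for the illustration only.

```python
def replicate_pixels(image, factor=2):
    """Enlarge an image (a list of rows of pixel values) by repetition."""
    enlarged = []
    for row in image:
        # Repeat each pixel `factor` times along the horizontal line.
        wide_row = [pixel for pixel in row for _ in range(factor)]
        # Repeat the widened line `factor` times to increase vertical size.
        for _ in range(factor):
            enlarged.append(list(wide_row))
    return enlarged

small = [[1, 2],
         [3, 4]]
for row in replicate_pixels(small):
    print(row)
# [1, 1, 2, 2]
# [1, 1, 2, 2]
# [3, 3, 4, 4]
# [3, 3, 4, 4]
```

Each source pixel becomes a block of four identical pixels, which is the blocky appearance the entry refers to.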

Resampling
the process of converting images between the spatial resolutions utilized by different imaging systems. The process may include interpolation to correct for differences in pixel geometry or to scale the image. Resampling is frequently used to change the density of pixels, typically measured in dots per inch (DPI), when preparing images for printing using halftones and color separations.

Sub-Sampling
bandwidth reduction techniques that reduce the amount of digital data used to represent an image. The following techniques are commonly utilized:

Chroma Sub-Sampling
the reduction of color resolution by reducing the bandwidth of color difference signals as practiced in composite video transmission and recording systems or by eliminating some color difference pixel information in digital processing systems.

Decimation (pixel sub-sampling)
the process of discarding complete samples. The resulting image is reduced in size and may suffer from aliasing.
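
The difference between decimation and the averaging described under Interpolation can be seen on a single line of pixels. This is a sketch under simple assumptions: one scan line held as a Python list, a reduction factor of two, and hypothetical helper names `decimate` and `average_down`.

```python
def decimate(line, factor=2):
    """Keep every `factor`-th sample and discard the rest."""
    return line[::factor]

def average_down(line, factor=2):
    """Replace each group of `factor` adjacent samples with their mean."""
    usable = len(line) - len(line) % factor   # ignore a ragged tail
    return [sum(line[i:i + factor]) / factor
            for i in range(0, usable, factor)]

# Fine detail: alternating black (0) and white (255) pixels.
stripes = [0, 255] * 4
print(decimate(stripes))      # [0, 0, 0, 0]
print(average_down(stripes))  # [127.5, 127.5, 127.5, 127.5]
```

Decimation turns the striped line into solid black, a false result of the kind called aliasing, while averaging yields a uniform mid-gray that at least preserves the mean brightness.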

Scaling (temporal)
alteration of the temporal resolution of a visual sequence to decrease (the typical case) or increase the amount of data used to represent the visual sequence. This process may include one or more of the following techniques:

Interpolation
the process of adding or deleting temporal samples to a visual sequence by averaging adjacent temporal samples. Results are poor when there is rapid motion.

Motion compensation
the process of adding or deleting temporal samples to a visual sequence by predicting where a moving object should appear in the resulting new temporal sample.

Replication
the repetition of temporal data to increase the temporal display rate of a visual sequence. Typically used in computer systems to provide a higher screen refresh rate than the temporal rate of the visual sequence being displayed. Also used in film to video conversions to change from the 24 fps temporal rate of film to the 30 fps temporal rate of NTSC composite video (3:2 pull down).
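
The 3:2 cadence mentioned in this entry can be sketched as follows. The sketch assumes that film frames are held alternately for three and two video fields and that consecutive fields are then paired into interlaced video frames; it ignores field parity (odd and even lines), and the function name `pulldown_32` is hypothetical.

```python
from itertools import cycle

def pulldown_32(film_frames):
    """Map film frames to interlaced video frames using a 3:2 cadence."""
    fields = []
    # Hold frames alternately for 3 fields and 2 fields.
    for frame, repeats in zip(film_frames, cycle((3, 2))):
        fields.extend([frame] * repeats)
    # Pair consecutive fields into video frames; an unpaired final
    # field, if any, is dropped.
    return [tuple(fields[i:i + 2]) for i in range(0, len(fields) - 1, 2)]

print(pulldown_32(["A", "B", "C", "D"]))
# [('A', 'A'), ('A', 'B'), ('B', 'C'), ('C', 'C'), ('D', 'D')]
print(len(pulldown_32(list(range(24)))))   # 30
```

Four film frames become five video frames, so 24 film frames fill 30 video frames. Two of every five video frames mix fields from different film frames.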

Speed Change
playing information acquired at one temporal rate at a different temporal rate. For example, 24 fps film is played at 25 fps when it is converted to PAL composite video (a 4% speed change).
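
The arithmetic behind the 4% figure can be checked directly. The function name `speed_change` and the 100-minute running time are assumptions chosen for the illustration; the frame rates must be given as integers.

```python
from fractions import Fraction

def speed_change(source_fps, playback_fps, running_time_min):
    """Return (percent increase in rate, new running time in minutes)."""
    rate_ratio = Fraction(playback_fps, source_fps)      # e.g. 25/24
    speedup_percent = float((rate_ratio - 1) * 100)
    new_running_time = float(running_time_min / rate_ratio)
    return speedup_percent, new_running_time

speedup, minutes = speed_change(24, 25, 100)
print(f"{speedup:.2f}% faster, {minutes:.1f} min")   # 4.17% faster, 96.0 min
```

The frame rate rises by about 4.17%, while the running time falls by exactly 4%: a 100-minute film plays in 96 minutes.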

Sub-sampling
dropping entire temporal samples (e.g. video frames) to reduce the data rate.

8.2 Temporal Rate Analysis

This section includes detailed descriptions of some techniques for temporal rate conversion that might be utilized in the process of translating existing film or television pictures for display at higher refresh rates.

8.2.1 59.94 and 60.0 Hz vs 72 and 75 Hz

Computers which use CRT displays (as opposed to flat panel displays) currently comprise nearly all of the market except for portable laptop computers. Substantial experience with high resolution image presentation, with wide field of view and high brightness, has increasingly led to higher refresh rates such as 72 and 75 Hz.

The use of 72 Hz is very naturally compatible with 24 fps film, being exactly three times the film frame rate. The rate of 75 Hz is compatible with the common European practice of transferring motion picture film to 50 Hz video by running it at 25 fps, or about 4% fast. A display which has a sync tolerance range of 4% could adapt to both 72 Hz and 75 Hz picture rates. A 4% tolerance does not add undue cost to a display. A CRT or other display which can present both 72 and 75 Hz progressively scanned images can be a key architectural element in a digital image architecture. Such a display can be used on computers, for high quality presentation of motion picture film, and for 50 Hz material.
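
The compatibility argument above comes down to whether the display rate is a whole multiple of the source rate. The following sketch makes that check explicit; the function name `refreshes_per_frame` and the particular rate pairs listed are assumptions for illustration, and rates are restricted to integers.

```python
from fractions import Fraction

def refreshes_per_frame(source_fps, display_hz):
    """Exact number of display refreshes available per source frame."""
    return Fraction(display_hz, source_fps)

for source_fps, display_hz in [(24, 72), (25, 75), (24, 60), (30, 72)]:
    ratio = refreshes_per_frame(source_fps, display_hz)
    # A whole number means every frame is shown the same number of times.
    kind = "uniform repeat" if ratio.denominator == 1 else "uneven cadence"
    print(f"{source_fps} fps on {display_hz} Hz: {ratio} ({kind})")

# Spread between the two display rates, relative to 72 Hz.
gap_percent = float((Fraction(75, 72) - 1) * 100)
print(f"75 Hz is {gap_percent:.2f}% above 72 Hz")
```

Both 24 fps on 72 Hz and 25 fps on 75 Hz give exactly three refreshes per frame, whereas 24 fps on 60 Hz gives 5/2 and therefore needs an alternating cadence. The spread between 72 and 75 Hz works out to about 4.17%, which is the roughly 4% tolerance quoted above.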

8.2.2 Motion Prediction

Very expensive receivers can attempt to perform motion-predictive frame rate conversion. The motion vectors which are used in the motion compensation portion of the digital image compression process could be helpful in this regard. However, compression only requires a statistically beneficial correlation between the motion vectors and image motion, while rate conversion requires an accurate motion analysis in order to be artifact free.

8.2.3 Temporal Undersampling

It is common practice to temporally undersample. This leads to reversed wagon wheels and many other well-known temporal aliasing artifacts. In motion picture film production, temporal undersampling takes the form of a near 50% duty cycle camera. Shutter angles near 180 degrees are typical for most production, where approximately half the time is used to expose each frame, with the other half being used to pull the film down to the next frame.

In video, the temporal light sampling time for each pixel is sometimes adjustable. Video temporal light capturing duty cycles will vary from about 30% to about 100%. Short duty cycles or adjustable duty cycles are only available on some cameras, since light sensitivity is usually reduced. However, even some home video cameras have a persistence type control to allow image capture duty cycle adjustment between sharp frames and smooth motion.

Some experts feel that a 50% exposure duty cycle for each frame is the most common choice since it provides a balance between image sharpness and smooth motion. Some photography of moving objects is likely to call for a short shutter time in order to emphasize sharpness and to distinguish each image on each frame separately. Other uses such as "go motion", used successfully in many Lucasfilm productions, may favor smooth motion by choosing a 100% duty cycle. The 100% duty cycle is achieved in this example by a repeat motion non-real-time computer controlled camera.

In computer graphics, when motion blur is simulated, a duty cycle is usually specified. Some standard software rendering packages, such as Pixar's RenderMan(R), offer a value from 0 to 1 which controls the image duty cycle for the motion blur processing.
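
The duty cycles discussed above relate to shutter angle and exposure time by simple proportions. The sketch below assumes a rotary shutter in which the open angle is a fraction of a full 360 degree turn per frame; the function names are hypothetical and do not correspond to any camera or renderer interface.

```python
def duty_cycle_from_shutter_angle(angle_degrees):
    """Fraction of each frame period during which light is captured."""
    return angle_degrees / 360.0

def exposure_time(duty_cycle, frame_rate):
    """Exposure time in seconds for one frame."""
    return duty_cycle / frame_rate

duty = duty_cycle_from_shutter_angle(180)   # typical film production
print(duty)                                 # 0.5
print(exposure_time(duty, 24))              # about 1/48 second
print(exposure_time(1.0, 24))               # 100% duty cycle: 1/24 second
```

On this scale a 180 degree shutter corresponds to a duty cycle of 0.5, and the 0 to 1 value offered by rendering packages can be read the same way.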

The common use of temporal undersampling virtually ensures some degree of aliasing on images which have periodic motion or object interaction near the frame rate or its harmonics. There are also some pathological cases where image misrepresentation due to temporal aliasing can make motion vector analysis impossible. When such motion vectors are required for de-interlacing or other standards conversion involving temporal transcoding, such conversions become artifact prone due to incorrect motion analysis. Temporal aliasing is also exaggerated when temporal transcoding is employed without the use of motion vectors.
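
The reversed wagon wheel is a convenient worked case of aliasing near the frame rate. The sketch below uses the standard folding rule for a sampled periodic signal; the wheel parameters (8 spokes, 2.75 revolutions per second) and the function names are assumptions chosen for the illustration, and identical spokes are assumed so that the image repeats once per spoke spacing.

```python
def apparent_frequency(true_hz, frame_rate):
    """Fold a periodic rate into the range -frame_rate/2 .. +frame_rate/2."""
    return true_hz - frame_rate * round(true_hz / frame_rate)

def apparent_wheel_speed(rev_per_sec, spokes, frame_rate):
    """Apparent rotation (rev/s) of a spoked wheel; negative is backward."""
    spoke_hz = rev_per_sec * spokes     # rate at which the pattern repeats
    return apparent_frequency(spoke_hz, frame_rate) / spokes

print(apparent_wheel_speed(2.75, 8, 24))   # -0.25 (appears to turn backward)
print(apparent_wheel_speed(3.0, 8, 24))    # 0.0   (appears stationary)
```

At 2.75 revolutions per second the spoke pattern repeats 22 times per second, just under the 24 fps frame rate, so the wheel appears to turn slowly backward. At exactly 3 revolutions per second the pattern matches the frame rate and the wheel appears frozen.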

It is very likely that temporal aliasing artifacts will appear at conversion rate beat frequencies. Thus, it is necessary to use beat frequencies which are at rates as high as possible.

8.2.4 Summary of Temporal Rate Analysis

In a digital image architecture which is interoperable and scalable across industries and applications, it is necessary to accommodate displays which exceed 70 Hz in refresh rate.

A choice of either 72 or 75 Hz for the display refresh rate could be beneficial in both advanced television systems and computer uses, particularly since NTSC may, in the future, become obsolete.

It is likely that flat panel displays such as active matrix liquid crystal color displays may begin to take hold in the market. These devices do not have characteristic flicker. Thus, it is possible to update the image at low rates, such as 24 fps when presenting film images, without resulting flicker.

It is possible that future display devices may be developed which maintain the display on an effectively indefinite pixel-by-pixel basis. These devices will be frame rate independent and will permit even greater efficiency in the distribution of visual communications.

8.3 Bibliography

Boynton, Robert M., "Human Color Perception," Chapter 8, Science of Vision, K.N. Leibovic, Editor, Springer-Verlag, New York, 1990.

Leibovic, K. N., "Perceptual Aspects of Spatial and Temporal Relationships," Chapter 6, Science of Vision, K.N. Leibovic, Editor, Springer-Verlag, New York, 1990.

Maguire, William, Naomi Weisstein, and Victor Klymenko, "From Visual Structure to Perceptual Function," Chapter 9, Science of Vision, K.N. Leibovic, Editor, Springer-Verlag, New York, 1990.

Symes, Peter D., "The Enhanced Viewing Experience -- What Does It Take?," Proceedings of the 26th Annual SMPTE Advanced Television and Electronic Imaging Conference, SMPTE, White Plains, NY, 1992.