SMPTE 595 West Hartsdale Avenue White Plains, NY 10607-1824 USA
SMPTE Publication Report of the Task Force on Digital Image Architecture September 1992
Phone: (+1) 914 761 1100 Fax: (+1) 914 761 3115
The emergence of digital coding as the common language of visual communications may fundamentally change our view of the world. The extent to which this common language will affect life in the 21st century may be even more profound than the effect that the medium of television has had on life in the 20th century. Television has provided a window to the world, often in real time, for many of the 5.4 billion inhabitants of this planet. This medium of cultural and information exchange has enabled previously isolated populations to join an emerging global village, one increasingly free of barriers. The common digital language offers a unique opportunity to leverage converging technologies, such as television, computers and telecommunications, into a global communications network. Such a network would have the potential to offer a vastly augmented range of services to all system users, thus opening up new markets to all of the affected equipment and service providers.
Worldwide, there is a growing consensus that the time has come to develop standards for television systems based on a new paradigm, one that is appropriate for today and gives forethought to future requirements. The introduction of digital technology into imaging industries, together with the widespread introduction of digital communications, creates a window of opportunity to establish a digital image architecture with unprecedented freedom of application and interconnection.
This report examines some of the fundamental issues that must be addressed in achieving a compatible set of standards enabling a globally interconnected and interoperable visual communications network. The essential concepts for this family of standards include: an open (non-proprietary) system architecture, interoperability, scalability and extensibility. It is hoped that this Report will stimulate the interest of many groups and organizations involved in the establishment of imaging standards, today and in the future, and lead to agreement on a single system, flexible enough to accommodate a wide variety of needs, while enabling worldwide interoperability.
The report was prepared by the SMPTE Task Force on Digital Image Architecture and is responsive to the Work Assignment, dated April 1991, which established the following objective:
The Report is, in essence, the outcome of a feasibility study concerning the creation of standards for digital image systems that are scalable and extensible, effecting a high level of interoperability between a diverse range of industries and applications. The work is, as yet, incomplete; however, it has already established an important though preliminary basis for a family of digital imaging standards. The Report raises many new questions and identifies additional work required to refine the concepts that form the basis of a digital image architecture. Of particular importance will be the selection of source and display refresh rates to provide performance and economic compatibility with today's television systems.
The concepts outlined can provide a basis for a modular open system architecture, in which the parameters and characteristics for each module, and the interfaces between these modules, are clearly defined and in the public domain.
Such a system should use common standard components to serve diverse needs across all affected industries. It should enable the movement of image data across application and industry boundaries without degradation and with minimum complication. This is interoperability.
Such a system should also provide the ability to adjust image parameters - temporal and spatial resolution, colorimetry and dynamic range - by varying the amount of data that is stored, transmitted, received, or displayed. This is scalability.
A digital image architecture must give forethought to evolution: it must be able to incorporate advances in technology within any module without changes to any other module. It must be backward compatible with today's systems, and forward enabled to accommodate the technology explosions of the 21st century. This is extensibility.
The Report was prepared by a Task Force chaired initially by David Trzcinski (PictureTel) and latterly by Dr. Will Stackhouse (Jet Propulsion Laboratory), with wide participation from the computer, television, post production and telecommunications industries. A detailed list of the membership follows. The Report was considered by the SMPTE Standards Committee at its meeting of August 13th, 1992 and subsequently adopted after an in-depth review.
Will Stackhouse (Chair), JPL; Walter Bender, MIT; Craig Birkmaier (Editor), PCUBED; Rita Brennan, Apple Computer; Wayne Bretl, Zenith; Barry Bronson (Co-chair), Hewlett-Packard; Ken Davies (Ex Officio), CBC, SMPTE; Gary Demos (Co-chair), DemoGraFX; Hugo Gaggioni, Sony; Bill Glenn, Florida Atlantic University; Bob Keeler, AT&T Bell Labs; Thomas Leedy, NIST; Peiya Liu, Siemens; Lee McKnight, MIT; Robert Powers, MCI Telecomms; Tom Meyer, Duir Assoc.; Alan Reekie, European Community (CCE); Richard Solomon, MIT; Arpad Toth, Kodak; David Trzcinski, PictureTel; Mitchell Wade, DemoGraFX; Ken Yang, Ampex
Stan Barron, NBC, SMPTE; Si Becker, SMPTE; Rex Buddenberg, Consultant; Robert Burroughs, Panasonic; David Carver, MIT; Peter Dare, Sony; Phil Dodds, IMA; Charles Fenimore, NIST; David Fibush, Tektronix, SMPTE; Paul Fleischer, Bellcore; Branko Gerovac, DEC; Barry Gilbert, Mayo Foundation; Christopher Hamlin, Apple Computer; David Herbine, NADC; Clark Johnson, Consultant; Thomas Leeder, NIST; Bijoy Khandheria, Mayo Foundation; Edward Krause, General Instrument; Arvid Larson, IEEE-USA; Derrick Lattibeaudiere, Panasonic; Richard Lau, Bellcore; Bernard Lechner, Consultant; Michael Liebhold, Apple Computer; Henry Meadows, Center for Telecomm. Research; Francois Michaud, CBC; Marvin Mitchell, Mayo Clinic; Robert Morrow, USAF Academy; Robert Myers, Hewlett-Packard; Suzanne Neil, MIT; Bruce Penney, Tektronix; Ken Phillips, Citicorp; Ed Post, Quark; Charles Poynton, Sun; Glenn Reitmeier, DSRC; Robert Sanderson, Kodak; William Schreiber, MIT; Scott Silver, Tektronix; John Sprung, Viacom; David Staelin, MIT; Peter Symes, Grass Valley Group; David Tennenhouse, MIT; Greg Thagard, CST; Mark Urdahl, IBM; John Weaver, Liberty Television; Merrill Weiss, Consultant
Norbert Gerfelder, Fraunhofer Computer Graphics, ISO/IEC; Rainer Hofmann, Fraunhofer Computer Graphics, ISO/IEC; Detlef Kroemker, Fraunhofer Computer Graphics, ISO/IEC
The Task Force, formed from representatives of the affected industries and applications, has examined the issues, setting out those that are believed critical at this time, and has modelled, for discussion, further refinement and testing, one possible approach that meets the basic requirements. It has also produced extensive tutorial information concerning the matters under consideration.
The Key Concepts of the approach are defined in Section 2, setting the conditions for image systems that are:
Section 4.0 details the critical issues in the development of a suitable image architecture meeting the stated objectives:
A model of an open architecture approach to image standards is developed in Section 5.0, one that is both compatible with the present and extensible to the future. It is based on a low order hierarchical approach, using image tiles. The model defines four levels of resolution and takes account of a number of possible aspect ratios currently in use. Additional analysis is provided regarding the selection of an appropriate family of image acquisition rates and display refresh rates. Finally a scalable coding approach is proposed that offers the ability to produce image data in packages that can be combined to produce images at a variety of spatial and temporal resolutions.
The Report is expected to be of interest across a wide range of industries and applications. Section 6.0 examines the industries likely to be most affected, their specific imaging needs and the possible impacts of a defined digital image architecture.
In Section 7.0 the Task Force suggests additional work that must be completed to move towards a full implementation of the digital image architecture. The list of suggestions included in Section 7.0 is not exhaustive; it is recognized that additional areas for further analysis will be identified in the process of validating the architectural concepts. An extensive list of questions is included which should be considered in the process of establishing standards for an architecture.
The suggestions include the following items of high priority:
Two reference documents were utilized in the process of creating the definitions which follow:
One of the major objectives of this Report is to define a system architecture which promotes sharing of images and equipment across application and industry boundaries. To achieve this goal, the digital image architecture must be highly flexible to deal with a variety of diverse requirements, including the evolution of technology.
A Digital Image Architecture should be an open system, that is, one made up of functional modules with standard, public interfaces which can be assembled into a functional system: "a set of interconnected elements constituted to achieve a given objective by performing specified functions." Explicit objectives of the architecture include:
This requires careful attention to the definition of the interfaces -- the shared boundaries -- between the functional modules.
The key interface definitions are
Scalability deals with the ability of an imaging system to adjust the level of performance by varying the amount of data that is stored, transmitted, received, or displayed -- up to the maximum resolution that was originally acquired. A number of specific definitions are implied:
Extensibility implies designing evolution into the system. The transmission and display modules of the system should be cast as building blocks. The building blocks, because of their inherent modularity, may freely evolve over time.
However, in the past few years the evolutionary view of imaging systems has been challenged. At the 26th Annual SMPTE Advanced Television and Electronic Imaging Conference, John Watkinson suggested that we analyze the impact of digital technologies from another perspective: "To think that digital technology only impacts the underlying equipment and that otherwise it's business as usual is to miss the larger transformation that is occurring in each of the affected industries."
From Watkinson's perspective, the transition to a new digital imaging architecture represents the opportunity for a new paradigm. Proponents of this position have encouraged system designers to step back and take a global view of the impact that digital technologies are having on every industry that deals with electronic imaging; to think not just in terms of delivering ever-improving levels of image quality, but to consider what being digital really means.
John Naisbitt, in his 1982 best seller Megatrends: Ten New Directions Transforming Our Lives, stated that new technologies go through three phases as they become part of our daily lives. Applying Naisbitt's model to the evolution of electronic imaging systems leads to the following three paradigms:
A major factor has been the geometric progression in computer processing capabilities - doubling computational power every two years, with little change in cost or size. This progression is projected to continue well into the next century. As a result, high resolution still image processing capabilities are now within reach of every computer user. Techniques once reserved for high-end workstations are now commonly applied in desktop computing, including the recent addition of full motion video as a data type.
Video has also been a major beneficiary of the technology progression. Production systems that only a decade ago required a six-foot rack of electronics can now be implemented in a few rack units, or on a few cards that plug into a personal computer.
The tremendous increase in computational power has enabled another critical aspect of being digital: video encoding based on the use of digital compression techniques to reduce the required data rate. A variety of compression technologies have evolved that remove image redundancy within and between video frames. The required data rate may also be significantly reduced by more efficient coding of the image at the source. Such techniques are developing rapidly and may become useful in the near future.
While compression technology has existed for many years, and continues to evolve, practical implementations for video have only become possible in the past few years due to the rapid evolution of digital processing technologies. This in turn has stimulated new research into scalable video encoding techniques that will allow multiple levels of image quality to be extracted from a single image data stream. Some observers predict that the processing power required for the decoding of scalable digital video streams will be universal and inexpensive before the end of this decade.
Improvements in data compression perform the same function as increases in bit carrying capacity in the communications system - delivery of more bits to the user. In the past decade, increases in communications capacity of several orders of magnitude have occurred.
In such an environment, the longevity of new equipment purchases may be dependent upon a digital image architecture that is designed with adequate provisions for extensibility. To meet this objective the Task Force has focused its attention on three areas:
As research has revealed more about the physiology of vision, prevailing theory has evolved to place major emphasis on the computational and cognitive role played by the brain and local image receptors. In turn, this research is providing potentially valuable input to the designers of digital imaging systems.
The eye contains approximately two million cones and 120 million rods. The cones are organized into three broad groups of receptors that are sensitive to light in specific spectral bands; while these bands have significant overlaps, they roughly conform to the red, green, and blue portions of the spectrum. Red and green receptors each outnumber blue receptors by a factor of two to one. The distribution of these receptors is not uniform; thus spatial perception involves a complex matrix of receptor types and cognitive processing by the brain.
The center of the visual field, an area called the fovea, contains 30,000 to 40,000 cones and no rods. Outside the fovea the density of cones diminishes, and the cones are interspersed among densely packed rods. The cones within the fovea are responsible for the perception of fine spatial detail, while the extrafoveal cones and rods play an important role in visual search and influence directed eye movement. Central vision enables us to see detail, while peripheral vision is attuned to change.
Although high spatial resolution vision is restricted to the fovea, the visual system acquires high resolution images over a wide portion of the field of view. This is achieved through involuntary eye movements: high frequency tremor, slow drift, and rapid saccades.
Research has determined that it takes several hundred milliseconds for the eye to acquire a high spatial resolution image, synthesized from a number of overlapping views. Slow drift and rapid saccades are the mechanisms used for repositioning the fovea to acquire these multiple impressions. The tremor appears to be a mechanism to remove high frequency spatial noise. The tremor's oscillation occurs in a frequency range of 40 to 80 Hz, over an area approximately equal to the size of a single cone.
Since human vision is binocular, involuntary eye movements also contribute to depth perception: the brain processes these overlapping views to obtain differences from which depth and spatial properties are inferred.
The spatial resolution of moving objects is also linked to eye movement:
There is evidence that the brain directs the activity of the image receptors for processes such as establishing white balance and light sensitivity levels. Simple localized analyzers are used to enhance the data transmitted back to the brain. Some of these analyzers are sensitive to a particular edge orientation; there are sufficient analyzers at each location to represent a full set of edge orientations. Additional tuned analyzers cover portions of the range of human sensitivity for spatial frequency, spatial position, temporal frequency, direction of motion, and binocular disparity.
The data processed by these analyzers moves to the brain through two types of channels: a set of fast responding channels with relatively transient responses to stimuli, and a set of slower channels with relatively sustained responses to stimuli. Transient channels process the output of analyzers that are tuned for low spatial and high temporal frequency stimuli. Sustained channels process the output of analyzers that are tuned for high spatial and low temporal frequency stimuli.
Transient channels are sensitive to flickering light sources with low spatial resolution; this type of stimulation appears as wide-area flicker and is most noticeable in peripheral vision. At low levels of illumination (where rod vision is used) flicker fusion occurs at frequencies of only a few Hz; as the level of illumination increases and cone vision is triggered the fusion frequency increases.
Flicker from low light level sources such as a television or movie screen typically disappears in the range of 20 to 60 Hz. As screen size increases, taking up a larger portion of the field of vision, or as screen brightness increases, the flicker fusion frequency increases.
Sustained channels are sensitive to flickering light sources with high spatial resolution; this type of stimulation appears as small-area flicker, often associated with moving objects. In this case the flicker fusion frequency can be much higher than for wide-area flicker; this form of flicker manifests itself as strobing of the object.
An excellent example is found in the single pixel horizontal lines often used in computer graphics. These lines do not appear to flicker on a progressive scan computer display refreshed at rates above 60 Hz, but if the same image is presented on an interlaced video display, the single pixel lines are presented in every other field (at 30 Hz) and they flicker. This is because the persistence of the display phosphor is shorter than the interval between successive refreshes of the line; higher scanning rates (either progressive or interlaced) eliminate the flicker.
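The arithmetic in this example can be sketched as follows. The sketch assumes 2:1 interlace, with one field carrying the odd lines and the other the even lines. The 60 Hz fusion threshold is a simplifying assumption chosen to match the figures quoted above, not a perceptual constant.

```python
def line_refresh_hz(rate_hz: float, interlaced: bool) -> float:
    """Rate at which a feature one scan line high is redrawn.

    rate_hz is the frame rate for progressive scan, or the field rate for
    2:1 interlaced scan. With interlace, a single scan line belongs to only
    one of the two fields, so it is redrawn at half the field rate.
    """
    return rate_hz / 2 if interlaced else rate_hz


def likely_to_flicker(rate_hz: float, interlaced: bool,
                      fusion_hz: float = 60.0) -> bool:
    # fusion_hz is an illustrative assumption; the real fusion frequency
    # depends on brightness, field of view and phosphor persistence.
    return line_refresh_hz(rate_hz, interlaced) < fusion_hz


print(line_refresh_hz(60, interlaced=True))     # 30.0
print(likely_to_flicker(60, interlaced=True))   # True
print(likely_to_flicker(72, interlaced=False))  # False
```

The same one-pixel line that is redrawn 72 times per second on a progressive display is redrawn only 30 times per second on a 60-field interlaced display. That halving is the source of the difference described above.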
In order for a new digital image architecture to be interoperable it must deal with existing imaging technologies. This requirement can place many constraints on the design of the architecture. It is important to understand the reasons that these constraints exist to determine if the new architecture must be similarly constrained.
As the display covers a wider field of view at higher levels of brightness, the refresh rate must be increased to eliminate wide-area flicker. If information with high frequency edges, such as computer generated text and graphics, is presented on the display, it must also be refreshed at a higher rate. The computer industry uses progressive scanning with refresh frequencies above 60 Hz to eliminate flicker; larger displays (16 inches diagonal or more) are typically refreshed at 72 or 75 Hz.
The same requirements for the elimination of wide-area flicker are now starting to influence the development of display systems for home entertainment. At the higher end of the home entertainment market it would be desirable for displays to provide a 50-degree field of view, and be viewable at normal room ambient light levels. Such a display has resolution and refresh requirements nearly identical to a large personal computer display.
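A rough sketch shows how a 50-degree field of view translates into a pixel count and a viewing distance. It assumes that each pixel should subtend about one arc-minute at the eye, an acuity figure chosen here for illustration and not taken from this Report. It also treats the angle per pixel as uniform across the screen, which is only approximately true for a flat screen.

```python
import math


def pixels_across(fov_degrees: float, arcmin_per_pixel: float = 1.0) -> int:
    """Horizontal pixels needed so each pixel subtends arcmin_per_pixel.

    The 1 arc-minute default is an assumed acuity figure used for illustration.
    """
    return math.ceil(fov_degrees * 60 / arcmin_per_pixel)


def distance_in_screen_widths(fov_degrees: float) -> float:
    """Viewing distance, in screen widths, at which a flat screen spans fov_degrees."""
    return 0.5 / math.tan(math.radians(fov_degrees / 2))


print(pixels_across(50))                        # 3000
print(round(distance_in_screen_widths(50), 2))  # 1.07
```

Doubling the assumed angle per pixel halves the pixel count, so the result is only as good as the acuity assumption. The sketch is meant to show how field of view drives resolution, not to fix a number.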
The choice of a Digital Image Architecture has implications that reach far beyond the normal realm of standards-setting activities. Telecommunications, television, and computing have made major impacts on life in the 20th century; their integration is likely to have a profound effect on the way that the world communicates, is educated, works, plays and relaxes in the next century.
In addition to holding perceived resolution constant under varying viewing distances, it is considered desirable to provide even greater resolution in some applications, as discussed below and as implemented in current proposals for advanced television systems.
While it would be desirable to design an imaging architecture in which resolution could be scaled in a continuous fashion, a hierarchy based on a progression of related image resolution levels can provide similar benefits to system designers and simplify the process of interoperation. Section 5.2 and Section 5.3 provide a detailed analysis of the variables that affect the perceived resolution of a display and illustrate the principles of a hierarchical digital image architecture with a progression of four image resolution levels.
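The following sketch illustrates what "related" resolution levels provide. It builds a hypothetical four-level hierarchy in which each level is twice the size of the one below it. The 2:1 ratio and the 2048 x 1152 top level are assumptions for the example, not the values chosen by the Task Force. The point is that any two levels are then related by a simple integer scale factor.

```python
def resolution_levels(top_w: int, top_h: int, levels: int = 4,
                      ratio: int = 2) -> list[tuple[int, int]]:
    """Sizes of a hypothetical hierarchy, lowest level first.

    Each level is `ratio` times the level below it in both dimensions.
    """
    sizes = [(top_w, top_h)]
    for _ in range(levels - 1):
        w, h = sizes[-1]
        if w % ratio or h % ratio:
            raise ValueError(f"{w}x{h} is not divisible by {ratio}")
        sizes.append((w // ratio, h // ratio))
    return sizes[::-1]


def scale_factor(lower: tuple[int, int], higher: tuple[int, int]) -> int:
    """Integer up-scaling factor between two levels of the hierarchy."""
    fx, rx = divmod(higher[0], lower[0])
    fy, ry = divmod(higher[1], lower[1])
    if rx or ry or fx != fy:
        raise ValueError("levels are not related by an integer factor")
    return fx


levels = resolution_levels(2048, 1152)      # hypothetical 16:9 top level
print(levels)  # [(256, 144), (512, 288), (1024, 576), (2048, 1152)]
print(scale_factor(levels[0], levels[3]))   # 8
```

Because every conversion is an integer scale, moving imagery between levels needs only simple decimation or interpolation filters rather than arbitrary-ratio resampling.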
Throughout this report, the concept of a multi-resolution hierarchy will be discussed and refined. The Task Force has constructed a model to facilitate this discussion. It is recognized that many different sets of numbers can be used within this model. Four levels of resolution have been identified and defined; additional levels can be added to the progression, as enabling technologies allow support for higher levels of resolution. The four levels in the model are:
The evolution of electronic image acquisition systems has been driven primarily by the mass market transmission standards -- NTSC, PAL and SECAM. New applications for video such as professional and personal video systems have been enabled through the economies of scale associated with these standards.
Thus, applications which require higher resolutions than those offered by NTSC, PAL and SECAM have either been forced to bear the expense of system development and low volume manufacturing - a luxury primarily reserved for the military - or to wait for the next imaging standard to evolve. It is interesting to note that the equipment developed for the various analog HDTV systems has seen extensive use in professional applications that need the added resolution afforded by these systems.
The first two steps are
Delivering the imagery to the consumer typically involves the third step,
Finally, the imagery must be decoded for display, requiring
Some of these steps tend to be grouped with a specific level of storage quality, as illustrated in Figure 3.1. This allows a further simplification of the model based on three major system components: ACQUISITION, TRANSMISSION, and DISPLAY.
The advent of video recording provided a degree of decoupling of acquisition from the other components, allowing program producers to create program content without real-time constraints; however, transmission and display remain tightly coupled. Recording media for program content have typically been coupled to the transmission standard to take advantage of the bandwidth reduction techniques applied in the system. The design of consumer VCRs is based on compatibility with the transmission standard; packaged media played by the VCR must therefore conform to the same standard.
While interoperability between the various analog composite video systems has had to overcome differences in frame and line rates, these systems have been remarkably extensible. The acquisition, transmission and display components and the associated services of the system have evolved continuously over the past fifty years.
With the introduction of analog component video recording and processing systems in the '80s the video industry took a major step toward completely decoupling acquisition from transmission and display. The production community soon discovered the advantages of this decoupling.
By using analog component equipment for both acquisition and production, it became possible to edit video without concern for the multi-field color framing sequences that exist in subcarrier encoded composite video systems. Producers also discovered that fewer artifacts were introduced when layering video using component vision mixers and digital video effect systems. Decoupling of acquisition and production equipment from the encoded transmission standard produced far better results than could be achieved with composite video acquisition and production equipment - and the same video recorders also produced encoded outputs for transmission of the program.
To a large extent, the transition from the analog representations of printed media - type, line art, halftones, and color separations - to their digital counterparts, has been enabled by the use of scalable hierarchies for the acquisition, transmission, and display of printed materials. The tools for acquisition and production of print media have been separated from the display hierarchy, allowing output at the desired level of resolution.
Electronic transmission is also beginning to play a major role in the publishing of documents. Compact representations of printed media using page description languages have allowed high quality print representations to be moved efficiently through the telecommunications network using low data rate modems. Remote printing of documents on fax machines or networked printers is commonplace.
The desktop publishing metaphor has been used as a model to predict similar transitions in other media industries, most notably Desktop Video. However, the transition has not occurred at the pace that many industry pundits have predicted. This is due, in large part, to the difficulty of breaking the problem into manageable components, that is, of creating separate hierarchies for the acquisition, transmission, and display of motion imagery.
Interoperability of video systems with other media is facilitated by a complete decoupling of acquisition, transmission and display into separate hierarchies, one for each component. Such an architecture is depicted in Figure 3.3. Scalable representations of video will be enabled by this decoupling, and technological advances in one hierarchy can take place without upsetting the apple cart in the other two.
If a hierarchical digital imaging architecture is used as the model, a Digital Advanced Television System can be implemented that is equally adept at delivering low cost solutions that conform to a single point in a hierarchy, and at delivering more expensive scalable solutions that support multiple points in the hierarchies.
The acquisition hierarchy can provide image capture solutions at various price/performance points that are appropriate for the application. Production systems can evolve that deal with single image formats, or multiple formats within the hierarchy. This is of particular importance to producers of program content with significant archival value. Imagery can be captured at a higher level in the acquisition hierarchy with an eye toward distribution at one or more of the lower levels of the transmission hierarchy; the archival value of the program is protected as it can be released at higher quality levels in the future as consumers purchase products at a higher level in the display hierarchy.
Viewing transmission as a hierarchy is critical to the concept of interoperability. A hierarchical imaging architecture would support a progression of image quality levels that are interoperable and extensible, and allow for incremental improvements in image quality within a single transmission standard. This requires the use of a scalable encoding structure; a core image would be encoded at the first level of the hierarchy, and enhancement information would be encoded for each of the higher resolution levels supported by the transmission standard.
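A minimal sketch of a core-plus-enhancement structure follows. It uses a Laplacian-pyramid-style scheme chosen here purely for illustration; the Report does not prescribe a coding method. The core is a repeatedly 2:1-decimated image, and each enhancement layer is the residual between one level and an upsampled copy of the level below. Box-filter averaging and pixel replication are simplifying assumptions. The residuals are left uncompressed, so this form shows only the layering and saves no data; a practical coder would quantize and entropy-code them.

```python
def downsample(img):
    """2:1 decimation in each direction by averaging 2x2 blocks."""
    return [[(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) // 4
             for x in range(0, len(img[0]), 2)]
            for y in range(0, len(img), 2)]


def upsample(img):
    """1:2 interpolation by simple pixel replication."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def encode(img, levels):
    """Split img into a core image plus levels-1 enhancement layers.

    Dimensions must be divisible by 2 ** (levels - 1). Layers are returned
    coarse-to-fine, so a decoder can stop after any prefix of them.
    """
    enhancements = []
    current = img
    for _ in range(levels - 1):
        lower = downsample(current)
        predicted = upsample(lower)
        enhancements.append([[a - b for a, b in zip(ra, rb)]
                             for ra, rb in zip(current, predicted)])
        current = lower
    return current, enhancements[::-1]


def decode(core, enhancements, upto):
    """Rebuild the image using the core and the first `upto` layers."""
    img = core
    for residual in enhancements[:upto]:
        predicted = upsample(img)
        img = [[a + b for a, b in zip(ra, rb)]
               for ra, rb in zip(predicted, residual)]
    return img


picture = [[10, 12, 14, 16],
           [11, 13, 15, 17],
           [20, 22, 24, 26],
           [21, 23, 25, 27]]
core, layers = encode(picture, levels=3)
print(core)                                     # [[18]]
print(decode(core, layers, upto=1))             # [[11, 15], [21, 25]]
print(decode(core, layers, upto=2) == picture)  # True
```

Each residual is computed against exactly the prediction the decoder will form, so adding every layer reconstructs the original exactly. A receiver that stops early still obtains a valid lower-resolution image.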
A scalable encoding structure may be more difficult to design and possibly less efficient for a given quality level than an encoding designed specifically for that level. It has, however, several advantages that will accrue over time:
The display hierarchy allows for a variety of products to evolve at various price/performance points that are appropriate for the application. Some display systems will evolve to single performance levels while others will offer multiple levels of performance within the transmission and display hierarchies.
Scalability plays a major role in the design of decoder and display components. If the transmission system delivers a scalable payload, only that portion of the information which is required for the display system need be decoded. A small personal information system may only need the low resolution component while a high-end home entertainment system can utilize all of the resolution components.
The current pricing structure for broad band telecommunications is typically based on channel bandwidth: the purchaser uses and pays for the entire channel regardless of the amount of information moved through it. In the future, greatly increased channel bandwidth and packetized encoding schemes using headers/descriptors for packet identification will cause a shift in pricing structure, so that the purchaser will pay only for the information content that moves through the channel. This concept, when applied to video services, has been described as pay-per-view-per-bit.
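Packet headers that identify which resolution layer a payload belongs to allow each receiver to take only the portion it can display, and allow a per-bit charge to be metered on what is actually delivered. The packet layout below is hypothetical; no particular transport format or header structure is implied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    # Hypothetical header/descriptor fields identifying the payload.
    program_id: int
    layer: int          # 0 = core image, 1.. = enhancement layers
    payload: bytes


def select_layers(packets, program_id, max_layer):
    """Keep only the packets a receiver at a given display level needs."""
    return [p for p in packets
            if p.program_id == program_id and p.layer <= max_layer]


def bits_delivered(packets):
    """Payload bits actually moved, the quantity a per-bit tariff would meter."""
    return 8 * sum(len(p.payload) for p in packets)


stream = [Packet(7, 0, bytes(100)), Packet(7, 1, bytes(300)),
          Packet(7, 2, bytes(900)), Packet(9, 0, bytes(100))]

print(bits_delivered(select_layers(stream, 7, max_layer=0)))  # 800
print(bits_delivered(select_layers(stream, 7, max_layer=2)))  # 10400
```

A small personal receiver taking only the core of program 7 would be metered for 800 bits, while a high-end receiver taking all three layers would be metered for 10,400.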
This shift in pricing structure is likely to act as a catalyst for the rapid evolution of video compression techniques and transmission standards, with an emphasis on two areas:
Programmable decoders will be the key component in providing extensibility to the digital imaging architecture. Because of the diversity of image compression standards (Group 3 fax, H.261, JPEG, MPEG, DVI, etc.), these decoders will play an important role in the integration of video and high resolution imaging with desktop computer workstations. This same diversity, with the addition of a digital television standard (or standards) will lead toward the use of programmable decoders in home entertainment and information delivery systems. Essentially fixed solutions will drive the low end of the market, providing inexpensive mass market consumer products, while programmable solutions will dominate at middle and upper levels of the transmission and display hierarchies.
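By software analogy, the extensibility offered by a programmable decoder can be sketched as a dispatch table keyed by a format identifier carried with the data. This is an assumption-laden illustration: a real programmable decoder would load microcode or firmware. The identifiers and decoder functions below are hypothetical placeholders, not the standards named above.

```python
from typing import Callable, Dict

# Hypothetical registry: format identifiers and decoder functions are
# placeholders, not real codec names or APIs.
DECODERS: Dict[str, Callable[[bytes], str]] = {}


def register(format_id: str):
    """Decorator that loads a decoder for one format identifier."""
    def wrap(fn: Callable[[bytes], str]) -> Callable[[bytes], str]:
        DECODERS[format_id] = fn
        return fn
    return wrap


def decode_stream(format_id: str, data: bytes) -> str:
    try:
        decoder = DECODERS[format_id]
    except KeyError:
        raise ValueError(f"no decoder loaded for {format_id!r}") from None
    return decoder(data)


@register("format-a")
def decode_format_a(data: bytes) -> str:
    return f"format-a picture from {len(data)} bytes"


# Adding a decoder later leaves existing ones untouched.
@register("format-b")
def decode_format_b(data: bytes) -> str:
    return f"format-b picture from {len(data)} bytes"


print(decode_stream("format-b", b"\x00" * 16))  # format-b picture from 16 bytes
```

New formats extend the table without any change to the decoders already installed. A fixed, low-cost receiver corresponds to a table with a single entry.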
The characteristics of LCD displays are significantly different from those of flying-spot scanning CRT displays. Flying-spot systems must operate at refresh rates above the critical frequency for flicker fusion; display brightness is limited since the spot is the only source of illumination (most of the display is decaying at any point in time).
Every pixel in an LCD display receives constant illumination. LCDs can be characterized as having long persistence; in fact, a significant design challenge has been to provide faster pixel response to deal with full motion video. This has been accomplished through the use of a transistor at each pixel location (an active matrix display), providing rapid response for pixel replenishment.
The nature of the active matrix circuit also allows a pixel value to be held for at least one second without replenishment, giving the display characteristics similar to a frame buffer. Direct addressing of each pixel location would make it possible to update only those pixels which change from one refresh period to the next. Transmission systems that utilize digital compression techniques to eliminate interframe image redundancies may take advantage of these aspects of LCD displays to implement conditional replenishment.
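Conditional replenishment can be sketched as sending only the pixels that changed and writing just those into a display that holds every other value. The sketch assumes frames stored as flat, row-major lists. The change threshold is an illustrative parameter, not something specified in this Report.

```python
def changed_pixels(prev, curr, threshold=0):
    """(index, new_value) pairs for pixels that changed by more than threshold.

    Frames are flat, row-major lists of pixel values of equal length.
    """
    return [(i, c) for i, (p, c) in enumerate(zip(prev, curr))
            if abs(c - p) > threshold]


def replenish(frame_buffer, updates):
    """Write only the changed pixels; all others keep their held values."""
    for i, value in updates:
        frame_buffer[i] = value
    return frame_buffer


prev = [10, 10, 10, 10, 10, 10]
curr = [10, 10, 50, 10, 11, 10]
updates = changed_pixels(prev, curr, threshold=2)
print(updates)                         # [(2, 50)]
print(replenish(list(prev), updates))  # [10, 10, 50, 10, 10, 10]
```

With a threshold above zero, small changes (pixel 4 here) are not sent, and the held value drifts slightly from the source. A threshold of zero reproduces the new frame exactly at the cost of more updates.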
Over the next 10 to 15 years image acquisition and display technologies are likely to move to conditional replenishment. Image acquisition systems may evolve with on-board digital processing to implement conditional image acquisition. These cameras will be programmable, offering several advantages over scanning cameras that continuously update the entire image raster, including the ability to:
Backward compatibility with existing systems and extensibility to future systems present many technical challenges. The greatest challenge lies in preserving the value of existing infrastructures while enabling an orderly transition to the new architecture. For example, immense investments have been made in the acquisition and transmission infrastructures of our existing NTSC, PAL and SECAM television systems. Likewise, billions of consumers have invested in receivers and video recorders that support these systems. It is equally critical that investment in the vast archives of information and entertainment programming that exist today on film and video be protected, and that the new architecture unlock the economic potential of these archives.
In deliberating on these critical issues, every effort has been made to balance the interests arising from those investments with the future benefit to all of a single global standard. These deliberations have also taken into consideration the installed base of computer, medical, engineering and scientific imaging systems, and the diverse applications for still imaging in electronic publishing, visual databases and communications. Existing systems that demonstrate interoperability and extensibility - including some which have in fact been extended - were considered. Examples include the French Minitel system and the family of international facsimile standards.
The seven critical issues are:
Scalable and interoperable hierarchies offer many benefits when communications channel issues are considered. Such an approach promotes effective utilization of existing communications channels and the development of new broadband communication services. The lower levels of the hierarchy provide solutions for the capacity constrained channels that exist today. The introduction of new broadband communications services will enable the use of higher data rates to support the improved performance available at higher levels in each hierarchy.
A digital image architecture that provides interoperability across applications with different spatial resolution requirements must be scalable in terms of resolution, as discussed in Section 3.3. Interoperability also requires a family of related image acquisition and display rates. The greatest benefit, in terms of cost and simplicity, is gained when the display operates at the same rate as, or an integer multiple of, the image acquisition rate. Though more expensive to implement, the greatest performance benefit is gained when motion compensation techniques are used in encoders/decoders to create in-between frames for display. Section 5.4 discusses the requirements for such a family.
To facilitate this hierarchical approach to a digital image architecture a scalable approach to image coding is required. Furthermore, improved techniques for video compression are likely to be enabled by the geometric progression in computational hardware. The design of the architecture must make provisions for this progression. Section 5.5 discusses the use of scalable coding algorithms.
No topic generated as much discussion in the Task Force as image acquisition and display refresh rates. This is due in part to the diversity of rates that exist in the standards and resulting practices within each of the affected industries. The issue is further complicated by the evolution of television along parallel paths with respect to field rates (50 Hz and 60 Hz). Harmonizing these rates will require solutions that lie in the realm of digital technology as well as the realm of politics and negotiation.
The choice of an image acquisition rate is a tradeoff between motion rendition and the resulting data rate. The following considerations are important in establishing a family of acquisition rates.
Experience has shown that for wide-screen CRT displays of high brightness, a refresh rate in the region of 72 to 75 Hz is required to achieve tolerable levels of wide-area flicker (see Section 3.2.5). In some situations refresh rates in excess of 100 Hz may be desirable. Receivers which operate at 100 Hz (double the normal 50 Hz interlaced scan rate) are being introduced in the 50 Hz market; rate doubling receivers operating at 120 Hz are also being developed for the 60 Hz market.
The relationship of display refresh and image update rates should be based on a progression that permits non-interpolative transformations between the acquisition and display rates in the new architecture (i.e., display at integer multiples of the image update rate). As an example, theatrical display of film is usually double or triple shuttered to minimize wide-area flicker, so that each 24 fps frame is shown two or three times, at 48 or 72 Hz.
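The non-interpolative relationship can be tested directly. The sketch below is illustrative only, and its function names are hypothetical. It checks whether a display rate is an integer multiple of an acquisition rate and, if so, produces the display sequence by plain repetition, the digital analogue of multiple shuttering. Rates are converted through `str()` into exact fractions so that decimal rates such as 12,5 fps compare exactly.

```python
from fractions import Fraction
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def repeats_per_frame(display_hz: float, acquisition_fps: float) -> Optional[int]:
    """Times each acquired image is shown, or None if the display rate is not an integer multiple."""
    # str() keeps decimal rates such as 12.5 exact when converted to fractions
    ratio = Fraction(str(display_hz)) / Fraction(str(acquisition_fps))
    return ratio.numerator if ratio.denominator == 1 else None


def shutter(frames: Sequence[T], display_hz: float, acquisition_fps: float) -> List[T]:
    """Repeat each image k times, with no interpolation, when display = k x acquisition rate."""
    k = repeats_per_frame(display_hz, acquisition_fps)
    if k is None:
        raise ValueError(f"{display_hz} Hz is not an integer multiple of {acquisition_fps} fps")
    return [frame for frame in frames for _ in range(k)]


print(repeats_per_frame(48, 24), repeats_per_frame(72, 24))   # 2 3
print(repeats_per_frame(60, 24))                             # None
print(shutter(["A", "B"], 72, 24))                            # ['A', 'A', 'A', 'B', 'B', 'B']
```

For 24 fps film, 48 and 72 Hz give 2 and 3, matching double and triple shuttering. 60 Hz gives None. NTSC handles that case with 3:2 pulldown, an uneven repetition of film frames across fields.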
Further research into the choice of a single family of acquisition rates and display rates is required. An appropriate interoperable family should include a 24 or 25 fps image acquisition rate, which would enable a 72 or 75 Hz display refresh rate. This is the subject of further discussion in Section 5.0, Section 7.0 and …
4.4 The Use of Square Sampling Grids (Square Pixels)
The computer graphics, image processing, and publishing industries have adopted the use of geometrically square pixel sampling grids (frequently referred to simply as square pixels). The use of square pixels facilitates:
Instead, computer graphics gravitated towards a common display technology based on square pixels. This simplified system design, which led to lower cost and better performance, enabled equipment and services to be used as commodities across a broad set of industries. Today the computer industry is a major consumer of displays, second only to consumer television receivers.
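Whether a raster has square pixels follows from simple arithmetic: the pixel aspect ratio is the display aspect ratio multiplied by rows/columns. The sketch below is illustrative. Its 720 x 480 example assumes, for simplicity, that all 720 samples span the full 4:3 picture width, which glosses over the blanking conventions of real digital video standards.

```python
from fractions import Fraction


def pixel_aspect_ratio(columns: int, rows: int, aspect_w: int, aspect_h: int) -> Fraction:
    """Width/height of one pixel for a raster filling a display of aspect aspect_w:aspect_h."""
    return Fraction(aspect_w, aspect_h) * Fraction(rows, columns)


def square_pixel_columns(columns: int, rows: int, aspect_w: int, aspect_h: int) -> Fraction:
    """Columns needed after horizontal resampling to make the pixels square."""
    return columns * pixel_aspect_ratio(columns, rows, aspect_w, aspect_h)


print(pixel_aspect_ratio(1024, 768, 4, 3))    # 1   (square)
print(pixel_aspect_ratio(720, 480, 4, 3))     # 8/9 (non-square)
print(square_pixel_columns(720, 480, 4, 3))   # 640
```

Any raster whose ratio is not 1 must be resampled (here, 720 columns to 640) before it can share a square-pixel display. A common pixel geometry avoids exactly this interpolative resampling.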
The use of a common pixel geometry eliminates the need for interpolative resampling when sharing imagery among all users. Resampling has two costs:
In the future it may be possible and desirable to extend the colorimetry representation to include a wider range of colors, possibly even those of self-luminous objects. A close examination of this issue is needed to establish the range of colors to be represented within the colorimetry of the digital image architecture.
A similar situation to that of colorimetry exists for the representation of dynamic range and the transfer function. Current systems are individually optimized for the current technology and application and are not easily amenable to an increase in dynamic range. Mechanisms to handle a much wider dynamic range effectively need to be identified.
The situation is somewhat similar to that of motion picture film in which the latitude of the negative film enables exposure and color adjustment after the image capture and the S-curve of the film characteristic provides effective compression of the highlights and dark regions. Similar provisions may be required in digital image systems to provide reasonable representations for both small and large numbers of bits. A further consideration may concern the optimal distribution of any necessary compression/expansion in respect of overall image quality.
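To illustrate a curve that compresses highlights and shadows, the sketch below maps relative scene exposure to an n-bit code value through a logistic function of exposure measured in stops (log2). The logistic shape, the 0.18 mid-grey anchor and the `contrast` parameter are assumptions chosen for illustration. They do not model any particular film stock or any proposed standard. The same curve is quantized to whatever number of bits is available, echoing the need for reasonable representations with both small and large numbers of bits.

```python
import math


def s_curve_code(exposure: float, bits: int = 8, mid_grey: float = 0.18, contrast: float = 1.5) -> int:
    """Map relative exposure (> 0) to an integer code value on a logistic S-curve."""
    if exposure <= 0:
        raise ValueError("exposure must be positive")
    stops = math.log2(exposure / mid_grey)           # distance from mid-grey, in stops
    y = 1.0 / (1.0 + math.exp(-stops / contrast))    # S-curve in (0, 1); flattens at both ends
    return round(y * ((1 << bits) - 1))              # quantize to the available code range


# One stop near mid-grey spans many more codes than one stop deep in the highlights.
mid_step = s_curve_code(0.36) - s_curve_code(0.18)
high_step = s_curve_code(0.18 * 2**7) - s_curve_code(0.18 * 2**6)
print(mid_step, high_step)
```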
It is also important that images acquired with differing colorimetry and dynamic range can be combined effectively into a single image when appropriately scaled.
The color space and dynamic range representations that could meet these objectives require extensive consideration. Section 7.7 includes a number of questions that should be considered in the analysis of these and other colorimetry issues.
In Section 3.2.5 it was established that higher scanning rates are required with displays that cover a wider field of view and/or operate at higher levels of brightness than today's television systems. Decoupling the refresh rate of the display from the image update rate provides a mechanism to deal with wide-area flicker - this is discussed in …
4.7 Identification of the Characteristics of a Digital Image Stream (Header/Descriptors)
A fundamental prerequisite for interoperability in digital systems is a mechanism for identifying and describing digital image data. For this information to be shared, decoders must be capable of identifying and conforming to the incoming data. Even simple decoders - those that only recognize a single standard - must identify data streams which they can decode. This is one of the primary functions of the header. Decoders must also ignore unrecognized data, to allow for extensions to the data stream.
Descriptors provide application oriented information, such as image and coding parameters, processing history, identification of program content, copyright, and scrambling. They also enable extensibility; the descriptor may also contain the coding algorithm or language representation necessary to interpret the encapsulated data. This provides a mechanism whereby expert groups can create and standardize the transmission of messages to meet their needs.
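The header and descriptor behaviour described above can be sketched with a simple type-length-value layout: a decoder identifies whether it can decode a stream and skips any descriptor it does not understand. Everything about the format below is hypothetical and chosen only for illustration. It consists of a 4-byte stream identifier followed by descriptors, each made of a 1-byte tag, a 2-byte big-endian length and the value bytes. It is not the layout of any SMPTE header/descriptor standard.

```python
import struct
from typing import Dict, Optional

# Hypothetical identifiers and tags understood by this particular decoder.
SUPPORTED_STREAMS = {b"IMG1"}
KNOWN_TAGS = {0x01: "image_size", 0x02: "colorimetry"}


def parse_stream(data: bytes) -> Optional[Dict[str, bytes]]:
    """Return the recognized descriptors, or None if this decoder cannot handle the stream."""
    if data[:4] not in SUPPORTED_STREAMS:
        return None                                   # header identifies a stream we cannot decode
    descriptors: Dict[str, bytes] = {}
    pos = 4
    while pos < len(data):
        if pos + 3 > len(data):
            raise ValueError("truncated descriptor header")
        tag, length = struct.unpack_from(">BH", data, pos)
        pos += 3
        if pos + length > len(data):
            raise ValueError("truncated descriptor value")
        if tag in KNOWN_TAGS:
            descriptors[KNOWN_TAGS[tag]] = data[pos:pos + length]
        # Unknown tags are skipped using their length, so extensions do not break this decoder.
        pos += length
    return descriptors


stream = (b"IMG1"
          + struct.pack(">BH", 0x01, 4) + struct.pack(">HH", 1024, 1024)
          + struct.pack(">BH", 0x7F, 2) + b"??"       # an extension this decoder does not know
          + struct.pack(">BH", 0x02, 3) + b"RGB")
print(parse_stream(stream))
```

The length field is what makes extension safe. An older decoder can step over a descriptor it was never designed for without losing its place in the stream.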
Descriptors may be used to identify and describe data at different levels of an image hierarchy, thus allowing a display system to decode only that part of a stream necessary for its function or capability. Descriptors might also contain information about the preferred display characteristics for imagery.
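Suppose that each layer of a scalable stream is announced by a descriptor giving its hierarchy level and where its data lies. This is a hypothetical structure, not one defined by this Report. Under that assumption, a decoder can work out which part of the stream it needs to read as shown below. The byte sizes in the example are invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LayerDescriptor:
    level: int    # hierarchy level the layer adds (1 = lowest); hypothetical field
    offset: int   # byte offset of the layer's coded data in the stream; hypothetical field
    length: int   # byte length of that coded data; hypothetical field


def select_layers(layers: List[LayerDescriptor], max_level: int) -> List[LayerDescriptor]:
    """Keep, in level order, only the layers a decoder of capability max_level needs."""
    return sorted((layer for layer in layers if layer.level <= max_level), key=lambda layer: layer.level)


def bytes_to_read(layers: List[LayerDescriptor], max_level: int) -> int:
    return sum(layer.length for layer in select_layers(layers, max_level))


stream_layers = [LayerDescriptor(1, 0, 100), LayerDescriptor(2, 100, 300),
                 LayerDescriptor(3, 400, 1200), LayerDescriptor(4, 1600, 4800)]
print(bytes_to_read(stream_layers, 2))   # 400
```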
Thus information such as the colorimetry of the original acquisition system, and the transfer characteristics of the process used to move images from one media to another, can be included with the data. Decoders would use this information to optimize display of the image.
The SMPTE Task Force on Header/Descriptor in their Final Report dated January 3, 1992, and approved by the SMPTE Standards Committee on February 6, outlined the criteria for the use of Header/Descriptors. Work is now progressing on the development of proposed SMPTE Standards, Recommended Practices and Engineering Guidelines.
This is by far the most critical issue of all, so much so that its impact is clear in the discussion of many of the previous issues. Only the last of them, the use of headers/descriptors, is without precedent in existing entertainment industry practice. It is precisely where a dichotomy exists in current practice that the greatest controversy arises - on the issue of temporal rates.
The convergence that comes with being digital may provide the solutions that resolve the temporal rate issue: convergence around the common language of digital coding, the progression in CPU performance, and the ability to design inexpensive modular interfaces in the form of mass-produced microchips.
It is likely that a number of solutions will evolve to facilitate interoperability between the existing world of film and analog television and the new digital image architecture. These solutions should provide a variety of price/performance options appropriate to the requirements of each application.
To illustrate the model, specific numbers have been chosen that take advantage of the mathematical relationships discussed in Section 4.0, as well as the architectures of digital memory and processing components. These numbers are not intended as the basis for a standard, but rather, provide a starting point, from which the validity of the architectural concepts can be verified. Further work is required for verification of the model and determination of the exact numbers, upon which a standard can be based (see Section 7.0).
The following parameters of a hierarchical digital image architecture are discussed in this section:
For a digital image architecture to be cast as an open system, two steps are required:
It can be argued that there is no need for rigid architectural standards in a digital world; that programmability in the transmission and display hierarchies provides a sufficient basis for interoperability. Perhaps some day this will be true. If the goal of longevity for the first digital image architecture is achieved, it is likely that the designers of the next imaging architecture will be less constrained than we are today.
The first digital image architecture however, must provide a bridge from the closed systems of the past to the open systems of the future. The fundamental structure of the digital building blocks and economies of scale associated with standardization suggest that the organizations charged with establishing these standards work in harmony.
If the resolution of a display is held constant and the viewing distance is a variable, the resolution perceived by the viewer - measured in cycles per degree - will increase as the viewer moves away from the display. Therefore, all displays can be considered to be high resolution if viewed from an appropriate distance.
At a distance that varies with the visual acuity of each individual, the actual resolution of the display equals the limit of that viewer's ability to resolve image detail. Beyond this viewing distance additional image detail cannot be perceived; that is, the display has more resolution than is required for this viewer and set of viewing conditions.
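The dependence of perceived resolution on viewing distance can be made concrete. The sketch assumes two pixels per cycle (the Nyquist limit) and a display of fixed pixel pitch. At distance d, one degree subtends a width of 2 d tan(0.5°), so cycles per degree grows in proportion to distance. The 100 pixels per inch and the 30 cycles per degree acuity used in the example are arbitrary assumptions.

```python
import math


def cycles_per_degree(pixels_per_inch: float, distance_in: float) -> float:
    """Perceived resolution of a display viewed from distance_in inches."""
    inches_per_degree = 2 * distance_in * math.tan(math.radians(0.5))  # width subtended by 1 degree
    return pixels_per_inch * inches_per_degree / 2                      # two pixels per cycle


def limiting_distance(pixels_per_inch: float, acuity_cpd: float) -> float:
    """Distance beyond which a viewer with the given acuity can no longer see added detail."""
    return acuity_cpd / cycles_per_degree(pixels_per_inch, 1.0)


for d in (15, 30, 60):
    print(d, round(cycles_per_degree(100, d), 1))
print(round(limiting_distance(100, 30), 1))
```

Halving the viewing distance halves the display's cycles per degree. If the viewer was beyond the limiting distance, this brings previously invisible detail within reach of the viewer's acuity.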
In some cases excess resolution may be desirable. For example, the operator of a personal computer can typically reduce the viewing distance to a high resolution desktop display by one-half simply by leaning forward, and thus take advantage of the additional resolution. At desktop viewing distances a move of 15 inches changes perceived resolution enough to be significant, while moving 15 inches in a movie theatre would have little effect on perceived resolution.
The NTSC transmission standard was designed to provide a resolution of approximately 21 cycles per degree over a viewing field of just under 11 degrees. Display size can be variable in today's television, ranging from a diagonal of a few inches (a personal display) to more than 30 feet (direct view displays in stadiums and projection displays in controlled lighting environments). These displays differ only in the size of their pixels. At the appropriate viewing distance, the perceived resolution of the personal display and the stadium display will equal the design goal of 21 cycles per degree, and both displays will cover 11 degrees of the observer's field of view.
Many display applications require higher levels of perceived resolution. To increase the level of perceived resolution, while holding viewing distance constant, additional samples of the same image must be added, increasing pixel density. To cover a wider field of view, as in wide-screen displays, holding the same viewing distance and perceived resolution, new information, at the same pixel density, must be added to extend the picture.
With personal, home entertainment and theatre displays, the viewer can vary the distance from the display, and thus vary the perceived resolution, over a significant range (see Figure 5.1). Taking into account the variations in acuity in the population, and variations in viewing distance for each application, it is common practice to design a display system for the average viewing conditions in each application. The overlaps in cycles per degree between low, normal and high resolutions are shown in the table to account for these variations.
Resolution    Cycles per Degree
Low           1 - 15
Normal        10 - 25
High          20 - 30
Ultra High    30 - 40
A special case exists for head mounted displays, which provide a fixed viewing distance; here the display manufacturer must select the level of resolution appropriate for the application and then design for a specific perceived resolution.
Using these guidelines, a high resolution display designed for a 35 degree field of view would require about two thousand pixels per line at 30 cycles per degree. In a desktop computing application where the viewer is 30 inches from the display, the length of an active line (display width) would be about 19 inches. In an entertainment application, such as a consumer television receiver viewed from a distance of 108 inches (9 feet), the length of an active line would be about 68 inches.
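The figures in this paragraph can be checked with two short formulas. They assume two pixels per cycle and a flat display centred on the line of sight: pixels per line = 2 x cycles per degree x field of view, and active line length = 2 d tan(field of view / 2). The first formula treats every degree of the field alike, which is an approximation for a flat screen.

```python
import math


def pixels_per_line(fov_deg: float, cpd: float) -> float:
    return 2 * cpd * fov_deg                       # two samples per cycle across the field of view


def active_line_length(fov_deg: float, distance: float) -> float:
    return 2 * distance * math.tan(math.radians(fov_deg / 2))


print(pixels_per_line(35, 30))                     # 2100, "about two thousand"
print(round(active_line_length(35, 30), 1))        # desktop at 30 inches: 18.9
print(round(active_line_length(35, 108), 1))       # living room at 108 inches: 68.1
```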
These examples are illustrated in Figure 5.2. In this figure the principles described in this section are used to illustrate the relationships between the four resolution levels of the model hierarchy and a variety of display applications. The numbers, especially as they relate to image size (in pixels) are entirely relative; they serve only as examples of the pixel count required, at average viewing distances and fields of view, to achieve the specified perceived resolution.
It is important to note that seemingly diverse applications such as personal computer and home entertainment displays have similar resolution requirements as the size of the home entertainment display increases beyond the narrow field of view of today's television receivers. It is also important to note that direct view CRT displays (which are currently limited to around 40 inch diagonals) require resolution in the normal range for home entertainment applications.
It is noteworthy that such sequences also appear in the computer processor and memory component industry. This approach takes full advantage of the generic building blocks that are the driving force in the transition to a digital world.
In order to provide continuity between the various resolution levels of the hierarchy, the model is based on the concept of an image tile. For the purposes of this discussion, a tile can be considered to be a constant portion of an image, representing the same part of the image regardless of the resolution level or image size. Thus, at each higher level in the hierarchy, the resolution within a tile doubles in each axis. This is illustrated in Figure 5.3.
The power of two progression may now be applied to determine the resolution, in pixels, for each level in the hierarchy.
Level  Name        Resolution in Cycles per Degree  Pixels in One Tile  Pixels in 32 x 32 Tile Superset
1      Low         1 - 15                           16 x 16             512 x 512
2      Normal      10 - 25                          32 x 32             1024 x 1024
3      High        20 - 30                          64 x 64             2048 x 2048
4      Ultra High  30 - 40                          128 x 128           4096 x 4096
In this model a tile spans 1/32nd of the image width and 1/32nd of the image height (1/1024th of its area) at any level of the hierarchy. Thus each level consists of a 32 x 32 set of tiles (see Figure 5.3). The selection of this fraction for a tile is arbitrary; it was chosen because it is a convenient building block - integer multiples can be used to construct displays at all of the aspect ratios and spatial resolutions discussed in the model.
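The tile arithmetic behind the table can be written out directly. This sketch restates the model's numbers: 16-pixel tiles at level 1, doubling in each axis per level, with 32 tiles per axis. The `display_size` helper and its 32 x 18 tile example are illustrative assumptions that show how a non-square display could be assembled from whole tiles.

```python
TILES_PER_AXIS = 32                   # each level is a 32 x 32 set of tiles
BASE_TILE = 16                        # pixels per tile side at level 1 (Low)
LEVEL_NAMES = {1: "Low", 2: "Normal", 3: "High", 4: "Ultra High"}


def tile_side(level: int) -> int:
    """Pixels per tile side: doubles in each axis at every higher level."""
    return BASE_TILE * 2 ** (level - 1)


def superset_side(level: int) -> int:
    return TILES_PER_AXIS * tile_side(level)


def display_size(level: int, tiles_wide: int, tiles_high: int) -> tuple:
    """Raster built from a whole number of tiles (not necessarily the full 32 x 32 superset)."""
    return tiles_wide * tile_side(level), tiles_high * tile_side(level)


for level, name in LEVEL_NAMES.items():
    print(level, name, tile_side(level), superset_side(level))

# A tile keeps the same relative position at every level: tile column 7 starts 7/32 of the way across.
print(7 * tile_side(1) / superset_side(1), 7 * tile_side(3) / superset_side(3))
print(display_size(3, 32, 18))        # a hypothetical 32 x 18 tile display at level 3
```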
The diagram in Figure 5.3 establishes several important relationships that provide a bridge to the past and illustrate how interoperability can be achieved:
Thus, using tiles and only four resolution levels, it is possible to construct a display for virtually every possible application; furthermore this display can also be used to show imagery from other levels of the hierarchy. This is especially practical if a scalable coding architecture is implemented that conforms to the same resolution progression.
Since significant archives of high resolution program material exist on film, which was acquired at 24 or 25 fps, one of these rates should be included in the progression. A progression based on integer multiples of 12 would include 12, 24, 36, 48, 60, 72, 96, 120 Hz, etc. A progression based on integer multiples of 12,5 would include 12,5, 25, 50, 75, 100, 125 Hz, etc. These progressions might also include integer fractions of 12 or 12,5 (e.g., 1/2 or 1/4 of the base frame rate) for applications such as videoconferencing and searching of video databases.
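The two candidate progressions can be generated and compared exactly. In this sketch, the 600 Hz upper limit and the function name are arbitrary choices made for illustration. Each family is built from integer multiples of its base rate plus 1/2 and 1/4 of that base.

```python
from fractions import Fraction
from typing import List


def rate_family(base: float, max_hz: float) -> List[Fraction]:
    """Integer multiples of base up to max_hz, plus 1/4 and 1/2 of base for low-rate uses."""
    b = Fraction(str(base))                 # str() keeps 12.5 exact
    rates = {b / 4, b / 2}
    multiple = 1
    while b * multiple <= max_hz:
        rates.add(b * multiple)
        multiple += 1
    return sorted(rates)


film_family = rate_family(12, 600)          # 3, 6, 12, 24, 36, ...
tv_family = rate_family(12.5, 600)          # 3.125, 6.25, 12.5, 25, 37.5, ...
shared = sorted(set(film_family) & set(tv_family))
print([float(r) for r in shared])           # [300.0, 600.0]
```

The two progressions share no rate below 300 Hz, the least common multiple of 12 and 12,5.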
It has been common practice in Europe to display 24 fps film at 25 fps for compatibility with PAL and SECAM; this results in a 4% speed increase. Many European programs produced for television distribution are acquired at 25 fps; if the family of rates is based on 24 fps, these programs would be played 4% slower. As indicated in Section 4.8, further research is required to determine the impact of choosing one of these rates on those industries that utilize film for image acquisition.
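The 4% figures follow from the ratio of the two rates when every frame is shown exactly once. This short sketch works through the arithmetic. The two-hour running time in the example is an arbitrary assumption.

```python
from fractions import Fraction


def speed_change(acquired_fps: int, played_fps: int) -> Fraction:
    """Fractional change in playback speed (positive = faster) when frames are shown 1:1."""
    return Fraction(played_fps, acquired_fps) - 1


def running_time(minutes: float, acquired_fps: int, played_fps: int) -> float:
    return minutes * acquired_fps / played_fps


print(float(speed_change(24, 25)))   # 0.041666..., film shown at 25 fps runs about 4% fast
print(float(speed_change(25, 24)))   # -0.04, 25 fps material shown at 24 fps runs 4% slow
print(running_time(120, 24, 25))     # 115.2 minutes for a two-hour film
```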
Ideally, compatibility with existing electronic imaging systems should be accommodated in the design of the standard modules that will interface these systems with the digital image architecture. By design, this would place the burden of compatibility on the systems that are being replaced rather than on products that conform to the new architecture; thus the future will not be constrained by today's limitations.
In the process of developing the existing analog and digital high resolution television systems, the designers of these systems have demonstrated the practicality of such a modular approach to interoperability. A variety of translation devices have been demonstrated that allow interoperation between PAL, NTSC, HD-MAC and MUSE. The interface modules that will be required to transform the signals from these systems (especially NTSC and PAL) into the new architecture offer the potential for large volumes. It is likely that the market for these modules will be characterized by intense competition, leading to a range of solutions at various price/performance levels.
In the near term the choice of a family of rates based on 12 or 12,5 Hz would provide optimally low cost and high performance, for both advanced television and computer uses, as well as providing global interoperability. In the longer term decoupling of acquisition, transmission and display is likely to lead to entirely new approaches to pixel replenishment that may render the current concept of image acquisition rates and display refresh rates meaningless.
This approach enables extensibility. For example, the coding of low resolution imagery might remain unchanged to provide compatibility with existing decoders, while new coding methods, made possible by the geometric progression in computational hardware, can be introduced to support more advanced imagery. Increasingly powerful (and affordable) programmable decoders can provide compatibility with the standards that form the foundation of the digital image architecture, and the additional processing power required for future enhancements to the architecture.
It is becoming difficult, even today, to draw the line between consumer electronics and computers. Today's video game machines, already in millions of homes, are marketed as consumer accessories to televisions but are, in fact, more computationally capable than personal computers of only a few years ago. Similarly, personal computers are being sold to the home market through traditional consumer electronics channels.
Traditional business factors should always be considered. These include equipment replacement costs, amortization, benefits, competition, market needs, and access to material.
Successful industry participants will both pay close attention to emerging trends and help to bring them about. Sometimes, deep pockets may be required to create a market. (It took years of major losses in both equipment and programming efforts before color television became profitable.) In contrast, agreement on a common architecture across a wide range of industries and applications would spread the costs and encourage early adoption.
The groupings used for this report help to relate application requirements to industries. It is well understood that there is already much overlap between industry groups and applications.
The industry groupings are as follows:
The technologies used in these fields are highly dependent on downstream profits. It can be difficult to justify large investments (e.g., an HDTV production facility) in new technologies that can only be utilized by a small portion of their market. Smaller investments that require minimal infrastructure changes (e.g., MTS stereo, VHS-HQ) can be more easily justified, particularly when end-users can benefit with existing equipment or rapid upgrade is anticipated. Backward compatibility and extensibility are key issues here and can only be successfully violated when there are substantive benefits to the end user (e.g., audio compact disc).
Revenue streams can often be anticipated to flow well beyond the initial release of the product. Residuals from syndication, rentals, and sales require that providers anticipate future trends in end-user viewing equipment capabilities. This is one reason why most prime time television is shot on 35mm film and not video.
There is some effort to establish a video dialtone similar, in concept, to today's voice telephone dialtone. As communication networks increase bandwidth, and compression technologies improve, an increased use of remote real-time visual communications can be expected.
These same advancements also facilitate rapid downloading of video information from media servers. At a 100:1 compression ratio, the data for a typical motion picture could be transmitted in a few minutes over a video-capable network.
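The "few minutes" estimate can be reproduced under stated assumptions, none of which come from this Report except the compression ratio. The assumptions are a two-hour programme, a source rate of 216 Mbit/s (8-bit 4:2:2 sampling per CCIR Recommendation 601), the 100:1 ratio from the text, and two example link rates: 44,736 Mbit/s (DS3) and 155,52 Mbit/s (SONET OC-3). Protocol overhead is ignored.

```python
def transfer_minutes(source_mbps: float, duration_min: float,
                     compression_ratio: float, link_mbps: float) -> float:
    """Minutes needed to send a compressed programme over a link running at link_mbps."""
    compressed_bits = source_mbps * 1e6 * duration_min * 60 / compression_ratio
    return compressed_bits / (link_mbps * 1e6) / 60


for name, rate in (("DS3", 44.736), ("OC-3", 155.52)):
    print(name, round(transfer_minutes(216, 120, 100, rate), 1))   # DS3 5.8, OC-3 1.7
```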
Because of the universal proliferation of, and conversion standards for, the telephone, it is likely that we will soon see extensions of current fax standards including voice fax (voice mail), high resolution color image fax, and video fax (video mail). One of the driving forces behind the development of the JPEG image compression standard was the need for an efficient data reduction technique for the transmission of still images.
The telecommunications industry is well down the road in the establishment of digital imaging standards. The CCITT, which controls fax standards, worked with the ISO on the JPEG standard and developed the videoconferencing standard known as p×64 or H.261. The ISO is also responsible for the MPEG family of moving picture standards. JPEG, MPEG I and p×64 form the basis for the first generation of image telecommunications products that are already starting to reach the market.
These standards were designed with a high degree of flexibility to deal with a variety of imaging applications; they have served as excellent examples for the Task Force in the areas of interoperability and scalability. Currently the MPEG group is working on extensibility; MPEG II is targeted for the delivery of higher quality motion image data streams in the range from two to forty megabits per second. The MPEG working group is investigating scalability as a requirement for this extension of MPEG. It would be beneficial for these new standards to relate harmoniously to other digital imaging architectures.
The merging of both broadcast and interactive voice, image (including graphics and video), text, and data across diverse transport media will create challenges in properly matching the information with the delivery mechanism. Current efforts to implement interactive television, for example, use differing transmission media for each direction (e.g., broadband in; telephone or cellular radio out).
Factors such as existing infrastructure, projected time and cost to deploy, bandwidth cost, regulatory issues, nature of the signal, target viewer, compression, error sources, localization, security, latency, etc., need to be considered.
The communications infrastructure deployed for the entertainment market could provide profound leverage for the information domain. For example, broad consumer demand for access to high bandwidth entertainment (and other) services could accelerate the national installation of fiber-optic cables. Once in place, these high bandwidth networks could also serve as high performance links to supercomputers and very large databases, and broadly distribute real-time business, engineering, and scientific data.
While installation of fiber-optic cable to a major user base can take many years, new or existing satellites can cover huge population areas very quickly. A variation of direct broadcast satellite (DBS) transmission is spot-beam satellite technology. In this approach, as few as three satellites could be used to provide localized high quality (HDTV) signals to small inexpensive receiving devices in as many as 150 geographic areas within a country the size of the continental United States.
The computer, medical, and graphics industries could similarly benefit from harmonious formats that would allow them to produce image generating, manipulating, managing, storing, and viewing applications and devices at reduced cost and increased interoperability.
Some specific industrial application areas include security equipment for surveillance and identification, and product and process inspection.
This will create opportunities in the receiving devices, the electronic components that go into them (e.g., semiconductors, light sources and modulators) and the subsystems (e.g., displays, tuners, and signal processors). The likely emergence of new product categories can both heighten and personalize the entertainment experience.
Ancillary devices (e.g., tape and disc recorder/players, camcorders, editing, processing, sound systems, printers, scanners, interactive peripherals) will be additional sources of added value products.
It is likely that computer control technologies will play an ever increasing role in home entertainment and information systems. The integration of all of the equipment listed in the preceding paragraph in the home entertainment environment has proven to be a major problem - and a significant opportunity. We have seen programmable remote control devices evolve to replace the profusion of separate infrared controllers (TV tuner, cable tuner, VCR, laserdisc, audio CD, radio tuner, etc.). The integration of the graphical user interface from the world of desktop computing with the home entertainment/information system has begun.
Collaborative cross-industry efforts will merge computers into home entertainment networks, dealing with the issues of component integration, connection to multiple sources of entertainment and information, user interface, and "user friendly" programming of the system. Various flavors of "personal computers" in the home will be able to connect to this network, as will intelligent appliances and remote control devices. Inexpensive networkable cameras will allow remote visual monitoring of the front door, the baby's room, etc.
To provide specific types of information to users, new classes of specially tuned information appliances will likely develop. These appliances will rely on information providers to collect, generate, and organize information. In the education market, for example, an information appliance might be tuned toward providing everything a student needs to progress through a particular class. Besides basic course content, texts, lecture notes, assignments, etc., it could make extensive use of imagery to provide interactive multimedia tutorials, remedial help, lab simulations, extensive reference material, electronic messaging, and smart links to classmates.
In the information age, a critical challenge is the productive management of the overwhelming amount of information produced each year. Unfortunately, images and video tend to make this problem even greater. While database search engines deal reasonably well with keyword searches and inverted indexes on textual data, corresponding tools for other media have tremendous opportunities for improvement.
Museums and libraries could use electronic file systems to catalog and view very high resolution images of the masters. Sculptures and other three dimensional objects could be shown on stereographic or holographic displays, or printed on very high quality large format printers.
The role of the artist and graphics designer has changed dramatically as the quality and flexibility of the "electronic canvas" has come to emulate the various forms of traditional media. Just as the camcorder has allowed many budding cinematographers to explore their art, high resolution drawing tools with interactive training are revolutionizing electronic publishing and winning over graphic artists. Many artists are expanding into new markets such as videographics and animation from this electronic base.
Traditional printed forms of information delivery will continue to exist alongside newer media. Electronic billboards could change messages by day of week or time of day. Electronic books, magazines, catalogs, and advertisements can integrate interactive video and other media to tell a story, make a point, or sell a product. They can also elicit information from the user that can be useful to the publisher (e.g., "hard to understand this concept," "would like product in green").
Institutional training represents the high end of the educational market. The return on investment can often justify expensive technology that maximizes training "productivity," since employees in training are paid wages while away from their regular work. Sophisticated interactive multimedia tools developed and used in these environments could find derivative use in public classrooms and the home.
This community has often utilized high-end versions of consumer technologies (e.g., TV CRT/Workstation CRT). Their role in leading versus leveraging the next generation of imaging systems is not clear. The existence of a proper digital image architecture will reduce barriers across applications, platforms, and markets.
High resolution imaging can be useful in radiology, microscopy, patient monitoring (especially during surgery), and consultation with specialists in a remote location.
Image requirements can be very stringent. Doctors sometimes use a magnifying glass to look for subtle changes in gray level on an x-ray. Image fidelity is critical.
Training simulators, perhaps using virtual reality techniques, can give medical students learning environments that improve on classroom instruction and cadaver procedures.
Although its spatial resolution and signal integrity requirements may exceed those of many other applications, the healthcare community would like to benefit from harmonization with other digital image architectures.
Typical applications include radar and other tracking, surveillance, flight simulation, general training simulators, mission/situation control rooms, instrument control panels in aircraft/vehicles, satellite imaging, virtual reality, telepresence, and cartography.
Increasing emphasis is being placed on dual use technologies. The community learning network is one example of using advanced imaging technologies for both government and civilian education.
Both input (i.e., response time) and output synchronization should be considered. Acceptable synchronization can vary with image content. For example, voice requires tight synchronization when an actor's lips are visible, but looser synchronization is acceptable when the actor is off camera. Background music can tolerate even looser synchronization as long as it is not keyed to action or scene transitions.
Motion inputs (e.g., physical controls, gestures, head or eye tracking, facial expressions), and outputs (vibration, g-forces, wind) can also have varying needs for synchronization.
Each of these represents a range of opportunities for industries and applications.
Image capture/acquisition/creation includes:
Good light sensitivity is an important factor in available-light location shots. In studio productions, sensitive cameras can reduce the equipment and electrical power needed for lighting, and the resulting air-conditioning load.
In scenes with high subject motion, such as sporting events, an image sensor configured for quick response to motion is important. A fast scan rate is the overriding factor here, particularly when minimal single-frame blur is important (for slow motion or single-frame playback). Current technology favors interlaced image sensors for this type of application; however, post-processing or future advances in image sensors can eliminate interlace artifacts before the signal goes very far down the image path.
The scan rate of an image sensor should also relate compatibly to the frame rate of its source material (e.g., movie film), and/or the anticipated frame rate of the viewing device.
Spatial resolution can be expected to improve for both still and motion image sensors. As described elsewhere in this Report, square grids and properly scalable array geometries are important factors in providing extensibility.
Some applications require high spatial but lower temporal resolution. Image scanners fall into this category. Whether used for medical X-rays, hard copy scanning, film conversion, or fax, they share a need for high image integrity (e.g., error-free image sensors, lossless compression, robust error correction).
An ideal image sensor would be able to resolve the entire range of color tints and hues visible to a human eye over a very wide dynamic range. It would also have well defined electrical transfer characteristics. Falling short of this, it is important that the colorimetric transfer characteristics be sufficiently defined to accommodate faithful propagation throughout the image path.
Future image sensors will likely contain increasing amounts of on-device signal processing in the form of motion detection, compression, and error detection and correction. They may not be scanned, but interrupt driven, responding to changes in the image. Devices may even begin to take on some functional characteristics of the human retina.
Processing includes:
Presentation includes:
Some applications require bi-directional capabilities. Some examples are: interactive communications, on-demand programming, pay-per-view, and client/server models.
In a client/server structure, the presentation "client" device may be physically separate from the reception/reconstruction "server." This model might apply to both robust and upgradable servers in a home neighborhood or on a computer network within an engineering office environment. In either case, the server would need to be able to interrogate the client so that it could properly reconstruct the presentation information.
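The interrogation step might look like the following minimal sketch. The capability fields (`width`, `height`, `max_rate_hz`), the `choose_format` policy of preferring the most pixels and then the highest rate, and any sample formats are hypothetical; this Report does not define a client/server protocol.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DisplayCaps:
    """Hypothetical capability record a presentation client returns when interrogated."""
    width: int        # active pixels per line the client can show
    height: int       # active lines the client can show
    max_rate_hz: int  # highest frame rate the client can present


@dataclass(frozen=True)
class Format:
    """One reconstruction format the server is able to produce."""
    width: int
    height: int
    rate_hz: int


def choose_format(caps: DisplayCaps, available: list[Format]) -> Format | None:
    """Pick the largest server format that fits within the client's reported limits."""
    fitting = [
        f for f in available
        if f.width <= caps.width and f.height <= caps.height and f.rate_hz <= caps.max_rate_hz
    ]
    if not fitting:
        return None  # nothing the client can present; a real system would need a fallback
    # Prefer more pixels first, then the higher temporal rate.
    return max(fitting, key=lambda f: (f.width * f.height, f.rate_hz))
```

A real server would likely weigh more than size and rate (color gamut, pixel shape, bandwidth), but the pattern of querying the client before reconstructing is the same.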
Applications that can live within current display constraints, or rapidly utilize or promote advancements in both flat screen and projection (and to a lesser degree, direct view CRTs) will be in the best positions to prosper.
Potential display image sizes can range from a wrist watch to a planetarium. General factors that should be considered in specifying a display include: number of viewers, viewing conditions, spatial and temporal resolutions, pixel size and shape, lithographed versus variable picture elements, refresh rate, brightness, density, color gamut, micro defects, aging, reliability, aliasing, artifacts, aspect ratio, overall display image area, display package size, power requirements, and cost. Several of these factors not already discussed are expanded on below.
There are two general categories of viewing environments: single viewer and multiple viewer. Traditionally, single-viewer displays have been smaller (<17 inches) and their applications more "task" oriented and interactive (e.g., computer displays). Multiple-viewer displays have been larger (>19 inches), more "entertainment" oriented (e.g., TV), and passive.
Both of these traditions are changing and will continue to do so. In particular, continuing advances in display technologies will fuel a proliferation of single-viewer entertainment displays (e.g., personal TVs and games) and multiple-viewer task displays (e.g., electronic white boards and overhead projection panels).
Viewing angle is an important factor in tuning displays to applications. The viewing angle is a function of display size and viewer distance. For a given display size and a constant spatial resolution at the viewer's retina, overall display resolution needs to increase as viewing distance decreases.
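The relationship can be made concrete with a short calculation. The sketch assumes a flat display viewed on-axis and a target of about 60 pixels per degree (roughly one arcminute per pixel, a commonly cited figure for normal visual acuity); that target, the uniform-angle approximation, and the function names are illustrative assumptions, not values from this Report.

```python
import math


def viewing_angle_deg(width: float, distance: float) -> float:
    """Horizontal angle subtended by a flat display of given width at a given distance (same units)."""
    return math.degrees(2 * math.atan(width / (2 * distance)))


def pixels_needed(width: float, distance: float, pixels_per_degree: float = 60.0) -> int:
    """Horizontal pixel count that keeps roughly the target angular resolution at the eye.

    Treats the angle as evenly divided across the screen, which slightly overstates
    the need near the edges of a flat display.
    """
    return math.ceil(viewing_angle_deg(width, distance) * pixels_per_degree)
```

Under these assumptions, a 1 m wide display viewed from 1 m subtends about 53 degrees and calls for about 3188 columns; viewed from 2 m it subtends about 28 degrees and calls for about 1685.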
As screen sizes increase and images get brighter, flicker becomes more of an issue in scanned displays. Even a 72 Hz scan rate can produce noticeable flicker with younger viewers in some situations. At a 50 or 60 Hz display scan rate, screens with high brightness that cover a wide field of view can produce objectionable levels of flicker.
It is important to distinguish flicker produced by scanning a display (commonly a flying spot on a CRT) from flicker with other causes (e.g., capture, conversion, interlace, signal processing artifacts).
Head mounted viewing devices (glasses or goggles) could make single-user, low cost, high resolution displays practical. Additionally, a viewing device with dual displays (versus a mirror arrangement) would have the inherent ability to display stereoscopic images. This type of device could have both task and entertainment applications, would have a size and privacy advantage for portable applications (e.g., portable computing, viewing proprietary videos on airplanes), and would operate well in poor ambient lighting. It could also be the lowest cost way to deliver high resolution images to the early consumer market.
Future displays might also provide stereoscopic images without special viewing glasses and virtual holography (stereoscopic images with multiple viewing perspectives). Image architectures would need to pay particular attention to accommodating the latter.
As the market demands increasingly improved display capabilities, entirely new technologies and display structures may come into being. Some features which could find their way into future displays include: directly addressable image elements, layered structures with control over picture element persistence, variable spatial and temporal resolutions across the surface, on-display scene creation and manipulation, fixed eye position displays that map resolutions to match human retinas, and eye tracking displays that tune resolutions across the surface to produce an optimal image for the viewer.
Although current sound technology parameters are close to or beyond the limits of human hearing, there is a vast chasm of opportunity to be filled before imaging technology approaches the limits of human vision. There are more mundane limitations as well.
For example: While one might imagine a 20 meter diagonal display with 0.1mm pixel pitch, there are practical limits to both physical display size (how many home living/entertainment rooms could support such a large screen?) and human capabilities (one would need a magnifying glass or binoculars to appreciate such spatial resolution.) On the other hand, close examination of images from the old masters might justify just such a display. And topological images might even require that this physically limited display be manipulated to bring in additional portions or viewpoints of the larger source image(s).
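The scale of this example can be checked with simple arithmetic. The Report does not state an aspect ratio for the hypothetical display, so the sketch below assumes 4:3 with square pixels; other ratios change the row and column counts but not the order of magnitude.

```python
import math


def panel_pixels(diagonal_mm: float, pitch_mm: float, aspect_w: int, aspect_h: int) -> tuple[int, int]:
    """Columns and rows for a panel with a given diagonal, square pixel pitch, and aspect ratio."""
    diag_units = math.hypot(aspect_w, aspect_h)          # e.g. 5 for a 4:3 panel
    width_mm = diagonal_mm * aspect_w / diag_units
    height_mm = diagonal_mm * aspect_h / diag_units
    return round(width_mm / pitch_mm), round(height_mm / pitch_mm)


cols, rows = panel_pixels(20_000, 0.1, 4, 3)  # 20 m diagonal, 0.1 mm pitch, assumed 4:3
print(cols, rows, cols * rows)  # 160000 120000 19200000000
```

That is roughly 19.2 billion pixels, several thousand times the pixel count of any proposed high definition format.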
Wrist watch display
Gating Technologies
A gating technology sets the pace for advancements in technology products and systems. For most of the history of television, the display was the fundamental gating technology. Only in recent years has this role shifted to the transmission standard itself.
It should not be assumed that future architectures will be gated by display technologies over the long term. Other elements in the image path should be carefully evaluated as to their potential impact as gating technologies.
At least three diverse strawman applications should be selected as test vehicles. Interoperability between these should be verified. Candidate applications include: broadcast television, multi-media computer workstations, medical x-ray, virtual reality, flight simulation, video phone, scientific visualization, and client-server networks.
Specifically, breakthroughs in image capture, display, communications, storage, and signal processing technologies could all have a profound effect on future image based applications.
As discussed in Section 2, the concept of scalability refers to the ability to extract higher and lower quality results from a common signal format. The concept of extensibility indicates the need to accommodate future enhancements in systems due to the rapid pace of technology.
Scalability and Extensibility require that many of the following areas have mechanisms for increasing and decreasing:
The following issues support the use of simple fractions when scaling resolution:
7.1.3 Further, in light of the many devices with fixed size, resolution, and raster format, would square pixels not be an important consideration? How many industries and applications require square pixels, and would be hurt if a non-square pixel format were chosen for advanced television? How severe would the degradations in quality and conversion cost be in such a case? Conversely, how many industries and applications not requiring square pixels would be hurt if a square pixel environment were imposed upon them?
7.1.4 If simple fraction resolution transformation guidelines are deemed worthwhile, should certain base resolutions share a common numerical basis? Is the horizontal (fast) axis more critical for the basis due to digital design considerations? If so, are powers of two the optimal basis for horizontal resolutions, such as 512, 1024, 2048, etc.?
7.1.5 When transcoding resolution, what parameters are required to perform optimal transcoding? Is the bandlimiting associated with transcoding at ratios other than simple fractions an acceptable degradation? In what industries/applications is it acceptable, and not acceptable?
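One reason simple fractions ease transcoding can be shown by counting filter phases. When the input step per output sample reduces to p/q, the output samples fall on only q distinct fractional input positions, so a resampling filter needs only q sets of coefficients and its pattern repeats every q output samples. The sketch below counts these phases; it illustrates the arithmetic only and does not design the filters themselves. The function name is illustrative.

```python
from fractions import Fraction


def resampling_phases(in_size: int, out_size: int) -> list[Fraction]:
    """Distinct fractional input positions (filter phases) used when resampling in_size -> out_size."""
    step = Fraction(in_size, out_size)  # input samples advanced per output sample, reduced to p/q
    period = step.denominator           # the phase pattern repeats every q output samples
    return sorted({(n * step) % 1 for n in range(period)})
```

Scaling 1920 samples to 1280 (a 3/2 step) needs 2 phases, and 1920 to 1440 needs 3, whereas 1920 to 1366 needs 683, each with its own coefficient set.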
7.1.6 CCDs, active matrix liquid crystal displays and projectors, computer generated images, and other image scanning, generating, and displaying devices can produce digital image values which are not bandwidth limited. Further, it is common for computer displays to use text, windows, and graphics which are aligned to the raster and which use maximum bandwidth signals such as white lines of pixels on black and black dots on white, etc. Given that such non-band-limited signals are common and useful in many industries, is the issue of requiring band limiting for transcoding, compression, or coding problematic? What industries would be significantly hindered if high definition systems required band limiting?
7.1.7 What useful increments of scaling might be best for a resolution hierarchy? Factors of two, being one optimally decodable resolution per octave? Two samples per resolution octave such as 3/4 and 1/2, or 3/2 and 2? Or is continuously variable resolution, and associated band limiting a requirement in some industries/applications?
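The hierarchies in this question can be generated mechanically. The sketch below takes a base line width (2048, one of the powers of two suggested in 7.1.4), the ratios applied within each octave, and a number of octaves, and rejects any step that does not land on a whole number of pixels. The function name and parameters are illustrative.

```python
from fractions import Fraction


def resolution_ladder(base: int, steps_per_octave: list[Fraction], octaves: int) -> list[int]:
    """Line widths from applying each in-octave ratio to successive halvings of base."""
    widths = []
    for k in range(octaves):
        top = Fraction(base, 2 ** k)  # width at the top of octave k
        for r in steps_per_octave:
            w = top * r
            if w.denominator != 1:
                raise ValueError(f"{base} does not scale to a whole pixel count at {w}")
            widths.append(int(w))
    return widths
```

One resolution per octave from 2048 gives 2048, 1024, 512, and so on; adding a 3/4 step gives 2048, 1536, 1024, 768, 512, 384. A base of 1000 with the same two steps fails at 187.5, which suggests why bases with many factors of two are attractive.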
7.1.8 Should an image architecture emphasize the ability to apply more resolution to some screen areas than others? Or should constant image resolution and quality be mandated for all areas of the image? Is the answer different for different industries/applications? What problems might arise if such a signal format were considered for production? What issues arise within production switchers for such formats?
7.1.9 If some image areas are updated with different resolutions, or temporal rates, than others, should the universal header or descriptor contain this information and make it visible to all devices, or is it acceptable if such information is hidden within the data stream?
7.1.10 Are rectangular and square image-region structures, such as those proposed as tiles, a useful construct in providing interoperability and flexibility in image update?
7.1.11 How likely are future image structures which are not xy raster based, such as hexagonal or Poisson distribution samplings? Is it possible to develop an image architecture which has mechanisms to accommodate such structures in the future? Can we anticipate the transcoding steps between a square-pixel xy raster and a uniform hexagonal or Poisson distribution raster and thereby do our best to allow for such future possibilities?
7.1.12 How completely should image filtering and processing histories be specified in order to support subsequent image processing operations? A knowledge of the concatenation of all pre-filters may be desirable in complex image operations. How lengthy are such histories likely to become?
7.1.13 How do flying spot devices such as CRTs and cathode-ray cameras relate to fixed raster "lithographed" devices such as CCDs and active matrix liquid crystal displays and projectors? How can the high definition image architecture accommodate both types of image sources and displays without substantial quality loss? Can both types of image data be processed in the same transforming devices using the same parameters, or are different processing steps required for the two different types of image data?
7.1.14 Is there an appropriate representation in which an idealized pixel can be generated through signal processing? Such an idealized pixel would serve as the input to resolution scaling (also known as resolution transcoding). Would the digital signal processing required to create such an idealized pixel result in unacceptable artifacts due to the footprint of the convolutional processing kernel?
7.1.15 In traditional television, there was no possibility of color dot triad alignment with scanlines or pixels. However, with the advent of lithographed displays, such as active matrix flat panel displays, the relationship between the color triad and the pixel becomes exact. Is there an optimal organization of color area portions in the context of a digital image architecture? Should the colors be overlaid onto a common area through the use of lenslets, fibers, or other techniques? Should the pixels be adjusted so that they even overlap through such optical techniques in order to reduce blocky appearance?
7.1.16 If the color regions representing a pixel must remain spatially distinct, is there a particular arrangement which is optimal? If so, should the precise positions of the color sub-pixel areas be taken into account in the digital image architecture and in the representation, capture, and processing of the digital image signal? What effect do the gaps between color regions have? How do lenticular or Fresnel screens affect color?
7.1.17 If the color regions representing a pixel must remain spatially distinct, could more than one triad be placed within one logical pixel? Is there an optimal number of such sub-triads, such as perhaps four? Is there an optimal arrangement of such sub-triads? If so, should this arrangement be taken into account in the digital image architecture and in the representation, capture, and processing of the digital image signal?
7.1.18 If a color space were to support more than three primary colors, would there be a benefit to using four or more primaries in some appropriate configuration as a standardized pixel shape?
7.1.19 On a CRT, the spot shape is usually a round or elliptical Gaussian which flies horizontally. On a flat panel display, the spot is usually a stationary square. There may also be dead zones between pixels. What signal properties should be adjusted to take these issues into account?
7.1.20 Could some of these issues be handled by the use of an idealized or standardized pixel representation, with defined transformations at the receiving device appropriate to its particular pixel configuration? If so, how should the digital image architecture specify this?
7.1.21 What should be done concerning similar issues in CCD image sensors?
7.1.22 What is the impact of these various issues on the Kell factor?
7.1.23 Would other pixel shapes such as triangle, hexagonal, and diamond have advantages for future image capture and display technologies?
7.1.24 The 1.333:1 (4 x 3) aspect ratio is widely used in television and computers. In the motion picture industry, 1.37:1, 1.66:1, 1.85:1, and 2.35:1 are all commonly used. The 8.5" x 11" page has an aspect ratio of about 0.77:1, and the European page size approaches 0.71:1. Computer display memories are most simply organized with aspect ratios which are simple fractions such as 1:1, 3:2, and 2:1. Medical radiology, still photography, newspapers, magazines, books, and other images have a number of commonly used aspect ratios. How can all of the aspect ratios in common use be supported?
7.1.25 Can the header/descriptor be used to indicate the aspect ratio and resolution of an image, so that the displaying device can do its own version of letterboxing (unused areas) or overscan (discarded areas)? For those systems which have compressed frame groupings, how could the edges of the letterbox be protected from the moving blocks?
7.1.26 What would be the most widely used mappings between common aspect ratios and anticipated common screen sizes?
7.1.27 If the 16:9 aspect ratio becomes popular, what mappings are likely for European (metric) and American (English) sized paper pages, wide screen movies at 2.35:1, television and film at 1.333:1 (4 x 3), movies at 1.85:1, and other widely used image formats?
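The area cost of each mapping can be computed directly. The sketch below assumes the whole source image is shown without cropping (letterboxing or pillarboxing rather than overscan) and returns the fraction of the display area the image occupies; the function name and the use of exact fractions are illustrative choices.

```python
from fractions import Fraction


def fit_image(src_ar: Fraction, disp_ar: Fraction) -> tuple[str, Fraction]:
    """Fit a source aspect ratio (width/height) inside a display aspect ratio without cropping.

    Returns the kind of mapping and the fraction of the display area the image fills.
    """
    if src_ar > disp_ar:  # source is wider: full width, unused bands top and bottom
        return "letterbox", disp_ar / src_ar
    if src_ar < disp_ar:  # source is narrower: full height, unused bands left and right
        return "pillarbox", src_ar / disp_ar
    return "full", Fraction(1)
```

For example, on a 16:9 display, 2.35:1 material fills 320/423 (about 76%) of the screen, 4:3 material fills 3/4, and a portrait 8.5" x 11" page (17/22) fills only 153/352 (about 43%).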
7.1.28 Can the digital image architecture support not only a wide variety of aspect ratios in the material being displayed, but also a wide variety of aspect ratios at the receiving display itself?
7.1.29 If the digital image architecture supports multiple aspect ratios, with interoperability between displays at such various aspect ratios, what are the key technical issues? Should the horizontal resolution of all aspect ratios be held at simple fractional relationships, while allowing the vertical resolution (with square pixels) to vary in fine increments to fit the exact aspect ratio desired at the display?
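Under the approach suggested in this question, the horizontal resolution is fixed and the vertical line count follows from the aspect ratio when pixels are square. The sketch below computes that count and optionally rounds it to a multiple (for example, a hypothetical block size of 8 lines for compression); the `multiple` parameter is an assumption, not a requirement stated in this Report.

```python
from fractions import Fraction


def lines_for_aspect(width: int, aspect: Fraction, multiple: int = 1) -> int:
    """Active line count for square pixels at a given width and aspect ratio (width/height).

    The exact height width/aspect is rounded to the nearest multiple of `multiple`.
    """
    exact = Fraction(width) / aspect
    return round(exact / multiple) * multiple
```

For example, 2048 pixels at 16:9 gives exactly 1152 lines, while 2048 pixels at 2.35:1 gives about 871.5 lines, which becomes 872 when constrained to multiples of 8.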
7.1.30 The diagonals of the camera apertures of common film formats have dimensions as follows:
Format                     Diagonal
35mm Full Aperture         31.14 mm
35mm Academy               27.16 mm
35mm Still                 43.27 mm
65mm                       57.30 mm
Professional Roll Still    101.1 mm

There are many high quality lenses in existence for each of these formats. Would it be useful to keep these dimensions in mind when developing lithographed sensors such as CCD arrays?
7.2.2 Are there sufficient mechanisms available in temporal properties of high definition systems to handle the issue of computer CRT displays requiring refresh rates higher than 70 Hz?
7.2.3 Is there a mechanism for reliably and consistently transforming high definition television imagery to 24 frame per second film for theatrical release?
7.2.4 Should there be a family of temporal rates which are related by a simple fraction rule? What should be the numerical basis of such rates?
7.2.5 When temporally transcoding, what temporal beat frequencies are visually acceptable? Does the 12 Hz beat frequency of the 3-2 pulldown, and its wide use and seeming acceptance, indicate that 12 Hz or higher is an acceptable beat frequency, or are there frame patterns in which higher beat frequencies are required for acceptable viewing?
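The relationship between two frame rates and the resulting beat frequency can be sketched with integer arithmetic. Assuming nominal integer rates (24 frame per second film onto 60 fields per second, ignoring the 1000/1001 offset of NTSC-derived rates), repeating source frames according to the cadence below produces a pattern that recurs at the greatest common divisor of the two rates: 12 Hz for 3-2 pulldown and 10 Hz for 50 Hz to 60 Hz. The function names are hypothetical, and simple frame repetition is only the most basic conversion method; motion-compensated interpolation behaves differently.

```python
from math import gcd


def cadence(src_rate: int, dst_rate: int) -> list[int]:
    """Times each source frame is shown when src_rate frames map onto dst_rate output slots.

    A count of 0 means the source frame is dropped (possible when src_rate > dst_rate).
    """
    period = src_rate // gcd(src_rate, dst_rate)  # source frames per repeating cycle
    pattern, shown = [], 0
    for i in range(1, period + 1):
        # Output slots that start before source frame i begins, i.e. those showing frames 0..i-1
        # (ceiling division).
        target = -(-(i * dst_rate) // src_rate)
        pattern.append(target - shown)
        shown = target
    return pattern


def beat_hz(src_rate: int, dst_rate: int) -> int:
    """Frequency at which the repetition cadence recurs, for integer nominal rates."""
    return gcd(src_rate, dst_rate)
```

For 24 to 60 this yields the familiar [3, 2] cadence at 12 Hz; for 50 to 60 it yields [2, 1, 1, 1, 1], one repeated frame every tenth of a second.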
7.2.6 What sort of synchronization mechanisms are optimal for digital systems, given that inherent digital system flexibility need not require every device to be locked to a common master very-high-frequency oscillator (near 100 MHz)?
7.2.7 CCDs, active matrix liquid crystal displays and projectors, and other devices utilize a portion of the horizontal or vertical retrace intervals to transfer to/from frame buffers. Future systems may not require these intervals. How will this affect the need to dedicate signal time to these intervals?
7.2.8 Given that CCDs, active matrix liquid crystal displays and projectors, and other devices, have no inherent flicker or update rate requirements, should temporal rate flexibility be part of the high definition architecture?
7.2.9 Should an image architecture emphasize the ability to use a higher update rate for some screen areas than others? Or should constant image rate be mandated for all areas of the image? Is the answer different for different industries/applications?
7.2.10 It is common to use a 50% temporal sampling duty cycle (a 180 degree shutter in film cameras, to allow film pull-down), which provides a balance between motion blur and sharpness. Is this temporal undersampling not certain to introduce temporal aliasing during temporal rate transcoding? Is such aliasing not certain to appear as artifacts which occur at the temporal beat frequency rate? (e.g., a 50 Hz to 60 Hz transcoding would have a 10 Hz beat frequency)
7.2.11 Some CCD sensors used in cameras see the entire frame area during the exposure time. This is similar to film exposure. Some tube cameras scan the image top to bottom, whether progressively or interlaced. What are the temporal processing, displaying, and viewing effects caused by mixing devices which integrate the entire image versus those that scan the image from top to bottom? What temporal issues arise due to the fact that the top of the image may be seen or displayed nearly a frame time before the bottom of the image, and half a frame time before the center of the image?
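The timing offset described here can be quantified for a progressive top-to-bottom scan. The sketch assumes uniform line timing with no blanking interval and ignores interlace; under those assumptions a row's capture (or display) time trails the top row in proportion to its position, and a vertical edge moving horizontally is sheared by the distance it travels in nearly one frame time. An area-integrating sensor or display would show no such shear. Function names are illustrative.

```python
def row_time_offset(row: int, rows: int, frame_time: float) -> float:
    """Time at which a row is scanned, relative to the top row (row 0), for a uniform scan."""
    return frame_time * row / rows


def scan_shear_px(speed_px_per_s: float, rows: int, frame_time: float) -> float:
    """Horizontal offset between the top and bottom rows of a vertical edge moving at constant speed."""
    return speed_px_per_s * row_time_offset(rows - 1, rows, frame_time)
```

At 60 frames per second with 1080 rows, the center row trails the top row by 1/120 s, and an edge moving at 1200 pixels per second is sheared by about 20 pixels from top to bottom.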
7.2.12 As just mentioned, both displays and sensors exist which scan from top to bottom or which integrate the entire image for the frame time. What architectural issues should be examined in attempting to take this issue into account? What issues are involved in converting a scanned image for area display or in converting an area sensed image for scanned display? How do these issues affect film scanning such as in a telecine? Does the wipe time involved in the physical film camera shutter have an effect? How do these issues affect film recording from an electronically captured moving image?
7.2.13 How do these scanning pattern issues affect temporal transcoding, finding motion vectors for compression or standards conversion, effects processing, or other image processing operations? What issues arise when compositing or mixing multiple image sources captured with different scanning patterns?
7.2.14 Standards converters which convert between 525/60 and 625/50 are available at various levels of cost/performance, utilizing a number of techniques. At the highest level of performance, motion estimation may be employed to interpolate frames. Undersampling due to interlace may affect these processes. What problems may thus arise for architectures that rely on temporal transcoding or standards conversion?
7.2.15 When displaying multiple windows of moving images on a screen, as in a future video teleconference, how can buffering be minimized for each picture stream in order to achieve display synchronization? What options are available for local, regional, and global synchronization, both loose and near exact?
7.2.16 Is there a benefit to selecting a particular master oscillation rate, from which pixel clocks in the scalable system are derived? If so, what candidate rates might offer advantages?
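One candidate approach is to choose a master rate that is an exact integer multiple of every line rate to be supported. As a sketch, the least common multiple of the 525/59.94 line rate (4.5 MHz / 286, about 15 734.27 Hz) and the 625/50 line rate (15 625 Hz) can be computed exactly with rational arithmetic; it comes out to 2.25 MHz, the relationship underlying the 13.5 MHz (6 x 2.25 MHz) sampling rate of CCIR Recommendation 601. The function name `rational_lcm` is illustrative, and extending the set of rates (for example, to computer display rates) is left open.

```python
from fractions import Fraction
from math import gcd


def rational_lcm(a: Fraction, b: Fraction) -> Fraction:
    """Smallest positive frequency that is an integer multiple of both a and b.

    For fractions in lowest terms, lcm(p/q, r/s) = lcm(p, r) / gcd(q, s).
    """
    num = a.numerator * b.numerator // gcd(a.numerator, b.numerator)  # lcm of numerators
    return Fraction(num, gcd(a.denominator, b.denominator))


line_525 = Fraction(4_500_000, 286)  # 525/59.94 line rate, about 15 734.27 Hz
line_625 = Fraction(15_625)          # 625/50 line rate
print(rational_lcm(line_525, line_625))  # 2250000
```

The result is exactly 143 times the 525-line rate and 144 times the 625-line rate.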
7.2.17 Some applications, such as teleconferencing, interactive flight simulation, or virtual reality, require low latency. Other applications, such as broadcast television, can have substantial latency without much problem. What digital image architecture mechanisms are needed to provide for those applications which require low latency?
7.2.18 Compression algorithm design is significantly affected by a low latency requirement. What latency is implied by each of the candidate digital advanced television systems? What effect would such inherent latency have on their usefulness for applications requiring low latency?
7.2.19 Digital network design is affected by needs for real-time bandwidth as well as latency requirements. How does the need for low latency combined with high real-time bandwidth in these industries affect digital interactive network design?
7.3.2 If overlay planes are used, should these overlay planes be implemented in hardware, or as a virtual mechanism in software, or can both be accommodated?
7.3.3 How many bits of real or virtual overlay plane should be mandated or recommended, if overlay planes are mandated or recommended?
7.3.4 The common practice of bandwidth limiting moving images suggests the possibility of using overlay planes to contain non-band-limited imagery, with the band-limited moving images using the main bit planes underneath. Overlay planes could easily contain the usually non-band-limited window borders, text, stipple patterns, graphics, etc., which characterize computer screens. This architecture would require all receiving devices to support either real or virtual overlay planes. Is such an architecture appropriate?
7.3.5 Is it necessary for interoperability across industries and applications to allow for the possibility of non-band-limited picture data co-existing with band-limited imagery from cameras or other (possibly synthetic) sources?
7.3.6 Do appropriate digital image compression algorithms exist which can pass non-band-limited picture data such as that used typically on computer screens? If such a compression technique exists, would this allow such non-band-limited picture data to be transmitted together with the moving picture stream? What are the properties of such a compression algorithm, if one exists?
7.3.7 Can the data areas available in some digital advanced television proposals be used to convey encoded data for use with real or virtual overlay planes? Would Unix X-Windows(R), Display Postscript(R), Apple Macintosh Toolbox(R), Microsoft Windows(R), fax, or other forms of encoded graphic and text data such as run-length codes be conveyable in this manner? Are there one or more such techniques which might be appropriate to support for digital advanced television?
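Run-length coding, one of the encoded forms named in this question, illustrates why such representations suit computer-screen imagery: it is lossless, so single-pixel lines and hard black/white edges are reproduced exactly without any band limiting. The sketch below is generic run-length coding of one scanline, not the coding used by fax or by any of the named window systems; the function names are illustrative.

```python
def rle_encode(row: list[int]) -> list[tuple[int, int]]:
    """Encode a scanline as (value, run_length) pairs; lossless, so hard edges survive exactly."""
    runs: list[tuple[int, int]] = []
    for v in row:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((v, 1))              # start a new run
    return runs


def rle_decode(runs: list[tuple[int, int]]) -> list[int]:
    """Expand (value, run_length) pairs back into a scanline."""
    return [v for v, n in runs for _ in range(n)]
```

A black scanline crossed by a two-pixel white line, for example, encodes as three pairs regardless of line length, while camera imagery with noise and gradual shading would gain little.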
7.3.8 Is the proprietary nature of many of these formats a barrier to interoperability, or are there potential solutions to provide universal access?
7.3.9 Are open standards such as IGES for vector and graphics images, CGM for raster images, or Open Document Architecture (ODA) for compound documents worth considering in light of desire for universal access?
7.3.10 Should one or a small number of such formats be supported universally? By standardizing on only one such format, all receiving devices would only need to support that single format. If no such single standard is chosen, then each receiving device desiring to display computer-type window or graphics displays might need to support many or all of the formats in common use. Is there a way to encourage adoption of one or a small number of text and graphics protocol standards to be universally supported?
7.3.11 Should a digital image architecture require that text and graphic data, typical of computer displays, be passable to the display through either real or virtual overlay planes or appropriate compression algorithms capable of passing this information? Should the digital image architecture insist on at least one of these two ways of passing computer display information?
7.3.12 As an alternative to screen-resolution-specific graphics, should all graphics be specified with much higher precision than the display? Such might likely use outline fonts and graphic commands which can do proportional blending of text with appropriate filters, and which allow image detail to be placed between pixels or lines. Would such non-raster-aligned text and graphics, with appropriate filtering, be acceptably legible and clear compared to raster-aligned and non-band-limited text and graphics as is typical of current computer screens?
7.3.13 If presentation of non-band-limited image information, as is typical of computer displays, is a requirement, then should multiple screen resolutions be supported for resolution scalability? If so, should the simple fraction guideline be used for the relationships of screen resolutions due to the need to preserve legibility and clarity of the non-bandwidth-limited image data?
7.3.14 Should the capability for selecting among a variety of overlay-planes for display be part of digital advanced television architectures? Could such a selection be useful for closed captions, sign-language inserts, foreign language subtitles, television program guides, sports statistics, or other picture augmentation information?
7.3.15 How many such simultaneous overlay planes could or should be supported?
7.3.16 Should more general compositing functions other than overlay be supported at the receiving device as a part of digital advanced television architectures? In particular, should alpha blending be supported? Alpha blending uses proportional blending at edges of the overlaying area. This technique is also known as proportional matte edge compositing.
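Proportional matte edge compositing reduces, per sample, to a weighted sum of foreground and background. The sketch assumes straight (non-premultiplied) alpha in the range 0 to 1 and sample values in a linear-light representation; blending gamma-corrected values directly, as many systems do, gives somewhat different edge results. A simple overlay plane corresponds to the special case where alpha is only 0 or 1. Function names are illustrative.

```python
def alpha_blend(fg: float, bg: float, alpha: float) -> float:
    """Composite one sample: alpha = 1 gives the foreground, alpha = 0 the background."""
    return alpha * fg + (1.0 - alpha) * bg


def composite_row(fg_row: list[float], bg_row: list[float], alpha_row: list[float]) -> list[float]:
    """Composite a scanline sample by sample using a per-sample matte."""
    return [alpha_blend(f, b, a) for f, b, a in zip(fg_row, bg_row, alpha_row)]
```

A white foreground over black with a matte edge of [1, 0.75, 0.25, 0] produces the graded edge [1, 0.75, 0.25, 0] instead of a hard step.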
7.3.17 If such compositing should be supported, should there be limits or scene-specification guidelines for the amount of area involved in mattes and proportional edges each frame, or each second?
7.3.18 If such compositing should be supported, how many layers of composite should be allowed? If the activation of the composited foreground overlays is supported, it is likely to be controlled and specified by the user at the receiving device. If foreground overlays are transmitted in moderate time intervals such as a second or more, then the number of overlays allowed will affect the amount of buffering required at the receiving device. Should such issues be a part of an image architecture for digital advanced television?
7.3.19 Is the concept of tiles and plates useful in providing a compact data representation for locating pixels used in proportional alpha blending? Tiles and plates are screen area subdivisions allowing specification of screen locations. The concept of plates is similar to city blocks and the concept of tiles is similar to house lots, with pixels being similar to locations on a grid laid over the house lot. This organization simplifies addressing a given pixel, tile, or plate location. It also allows all locations on a given street to be located easily, since each house is adjacent to the next. Is this method of area subdivision useful? Are there other methods for subdividing an image which are appropriate or useful in digital advanced image architectures?
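The city-block analogy maps directly onto integer division. The following is a minimal sketch assuming hypothetical sizes of 8 x 8 pixels per tile and 8 x 8 tiles per plate; the report does not specify any sizes, and the names are illustrative only.

```python
from typing import NamedTuple

# Hypothetical subdivision sizes, chosen only for illustration.
TILE_SIZE = 8                              # pixels per tile edge ("house lot")
TILES_PER_PLATE = 8                        # tiles per plate edge ("city block")
PLATE_SIZE = TILE_SIZE * TILES_PER_PLATE   # 64 pixels per plate edge


class Address(NamedTuple):
    plate: tuple[int, int]   # (column, row) of the plate on the screen
    tile: tuple[int, int]    # (column, row) of the tile within its plate
    pixel: tuple[int, int]   # (column, row) of the pixel within its tile


def locate(x: int, y: int) -> Address:
    """Convert an absolute pixel coordinate into a plate/tile/pixel address."""
    plate = (x // PLATE_SIZE, y // PLATE_SIZE)
    tile = ((x % PLATE_SIZE) // TILE_SIZE, (y % PLATE_SIZE) // TILE_SIZE)
    pixel = (x % TILE_SIZE, y % TILE_SIZE)
    return Address(plate, tile, pixel)


def to_xy(addr: Address) -> tuple[int, int]:
    """Inverse of locate(): rebuild the absolute pixel coordinate."""
    x = addr.plate[0] * PLATE_SIZE + addr.tile[0] * TILE_SIZE + addr.pixel[0]
    y = addr.plate[1] * PLATE_SIZE + addr.tile[1] * TILE_SIZE + addr.pixel[1]
    return x, y


print(locate(1000, 300))
```

For compositing, one plausible use is to send per-pixel alpha coefficients only for tiles that actually contain a proportional matte edge, identifying every other tile as fully opaque or fully transparent with a single flag.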
7.3.20 Should the use of windows on the display of the receiving device be anticipated as part of the architecture? What effect on the architecture would anticipating two or more windows within the display have, each independently controlled by the user at that display? Is the minimal functionality of picture-in-picture appropriate, is the elaborate window sizing, positioning, and overlaying capability of typical computer window systems appropriate, or is something in between appropriate?
7.3.21 If windows are anticipated, how would dynamic resizing take into account the possible need for simple-fractional guidelines in resolution scaling? Should there be notches in the window sizes at simple fraction points to allow clearer and more legible text and graphics?
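Notched window sizing can be sketched as snapping a requested size to the nearest simple fraction of the full screen dimension. The notch set below (all fractions p/q with q no larger than 4) is a hypothetical choice for illustration, as are the function names.

```python
from fractions import Fraction

# Hypothetical notch set: every fraction p/q in (0, 1] with q <= 4.
NOTCHES = sorted({Fraction(p, q) for q in range(1, 5) for p in range(1, q + 1)})


def snap_window(requested: int, full: int) -> tuple[int, Fraction]:
    """Snap a requested window dimension to the nearest simple-fraction notch.

    Only notches that give a whole number of pixels are considered. For a
    scale of p/q, every q source pixels map onto p window pixels, so the
    resampling phase pattern repeats after a short, fixed run of pixels.
    """
    target = Fraction(requested, full)
    candidates = [f for f in NOTCHES if (full * f).denominator == 1]
    best = min(candidates, key=lambda f: abs(f - target))
    return int(full * best), best


print(snap_window(700, 1920))
print(snap_window(1000, 1920))
```

A window dragged to 700 pixels on a 1920-pixel-wide screen would settle at 640 pixels (one third), and one dragged to 1000 pixels would settle at 960 (one half); intermediate sizes could still be allowed, with the notches acting as preferred stopping points.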
7.3.22 If the local computer wishes to use the overlay planes, how would such use interact with remote control of these overlay planes? Would the local system have priority? How would such priority be controlled?
7.3.23 Is it not likely that locally simulated, synthetic computer-generated images will often be overlaid onto the digital advanced television image stream? What filter representations are appropriate for matching the pixel representations of the simulated and real or television imagery? Should idealized pixels be used? What should be the assumptions concerning flying-spot versus lithographed raster pixel representations in this context? What are the optimal computer anti-aliasing filters for such composited images?
7.4.2 Can a family of compression quality levels be developed which allow spatial and temporal resolution scalability with a layered coding technique? Can such a compression technique compete successfully with single-point solutions, that is, systems heavily optimized for a single resolution and rate? Would such a technique offer benefits to interoperability between industries/applications? Are the benefits of scalability and extensibility sufficient to justify the effort to develop a layered compression system of sufficient quality? How much is gained by compromising the orthogonality required for scalable layered systems via the use of non-linear terms? Are such non-linear data interactions likely to hinder other interoperability needs, in addition to the desire for scalability and extensibility?
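A minimal sketch of the layering idea, assuming a two-level pyramid along one scan line: the base layer is the line averaged down to half resolution, and the enhancement layer is the residual against a pixel-replicated prediction. There is no quantization or entropy coding here, so this shows only the layer structure, not a compression technique; the function names are hypothetical.

```python
def upsample(base):
    # Pixel replication: the simplest interpolator a base-only decoder could use.
    return [v for v in base for _ in range(2)]


def split_layers(row):
    """Split an even-length scan line into a half-resolution base layer
    and a full-resolution enhancement (residual) layer."""
    if len(row) % 2:
        raise ValueError("row length must be even")
    base = [(row[i] + row[i + 1]) / 2 for i in range(0, len(row), 2)]
    enhancement = [x - p for x, p in zip(row, upsample(base))]
    return base, enhancement


def reconstruct(base, enhancement=None):
    """A base-only decoder ignores the enhancement layer; a full decoder adds it."""
    prediction = upsample(base)
    if enhancement is None:
        return prediction
    return [p + e for p, e in zip(prediction, enhancement)]


row = [10, 12, 40, 44, 90, 90, 15, 11]
base, enh = split_layers(row)
print(base)                      # half-resolution picture
print(reconstruct(base))         # what a base-only receiver would display
print(reconstruct(base, enh))    # full-resolution picture
```

Because the enhancement layer here is a purely additive residual, the two decoders remain consistent with each other; non-linear inter-layer terms of the kind the question mentions would tend to complicate that clean separation.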
7.4.3 How serious is the problem of interaction between motion vectors used in standards conversion and motion vectors used in motion-compensated compression? Would an image which has been transcoded in resolution, temporal rate, or both interact acceptably with a subsequent digital compression algorithm, or would the compression motion vectors show severe errors and aliasing beat frequencies? If the image is further transcoded after subsequent decompression, would the errors compound further?
7.4.4 What sub-pixel resolution is required for motion vectors used in motion-compensated compression?
7.4.5 What is the effect of concatenated compress/decompress cycles within one algorithm, as used when exchanging images multiple times between different industries/applications? What is the effect of concatenated compress/decompress cycles between different compression algorithms? Are the preliminary studies which indicate that this might result in severe degradation correct?
7.4.6 It is common practice in some compression schemes, such as MPEG, to use frame groupings. How can live switching be performed if frame groupings are not aligned? How can misalignment of key intermediate anchor frames be prevented when performing multiple compress and decompress cycles?
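The alignment problem can be made concrete with a small calculation. This sketch assumes closed frame groups of fixed length, each beginning with an intra-coded anchor frame, and treats a clean switch point as a frame where both streams begin a new group; the group lengths, offsets and function names are hypothetical.

```python
from math import gcd


def is_anchor(frame: int, gop_length: int, offset: int) -> bool:
    """True if `frame` starts a (closed) frame group in this stream."""
    return (frame - offset) % gop_length == 0


def ever_aligned(gop_a: int, off_a: int, gop_b: int, off_b: int) -> bool:
    """Two periodic anchor patterns coincide at some frame only if their
    offsets agree modulo gcd(gop_a, gop_b) (Chinese remainder theorem)."""
    return (off_a - off_b) % gcd(gop_a, gop_b) == 0


def clean_switch_points(gop_a, off_a, gop_b, off_b, horizon):
    """Frames below `horizon` where both streams start a new group, so a cut
    from A to B leaves A on a group boundary and enters B on an intra frame."""
    return [f for f in range(horizon)
            if is_anchor(f, gop_a, off_a) and is_anchor(f, gop_b, off_b)]


print(clean_switch_points(12, 0, 15, 3, 120))
print(ever_aligned(12, 0, 12, 6))
```

With groups of 12 and 15 frames, clean cut points recur only every 60 frames (at frames 48 and 108 in this example), and two streams with identical 12-frame groups offset by six frames never align at all. Without some alignment convention, a live switch would require re-encoding around the cut.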
7.4.7 Do these issues argue for commonality and compatibility of compression algorithms, and for a minimization or elimination of temporal and spatial transcoding in processing images? If transcoding is applied, are temporal and spatial transcodings based on simple fractions less prone to degradation than transcodings by arbitrary fractions?
7.4.8 Can a high resolution system architecture accommodate the rapid algorithmic and digital hardware advances in the state of the art which appear to be inevitable each year? If an optimal algorithm for this year is selected, what is the likelihood that this algorithm would be obsolete in five or ten years due to improvements in hardware or algorithmic techniques? If compression algorithms are likely to become obsolete every five years, what high definition system architecture principles can be developed to allow radical algorithmic and hardware improvements as appropriate? Is the header/descriptor sufficient, or are other principles required in conjunction with the basic compression algorithm design to allow extensibility or easy replacement/upgrade? Can new algorithms coexist with old algorithms while providing efficient bandwidth/spectrum usage, in light of the fact that new algorithms may be many times more efficient than old ones? Can old algorithms be continued in use when their inefficiency approaches factors of four or eight below optimal with respect to the newest algorithms and hardware? How can extensibility be accommodated, as will certainly be required, while maintaining service to older devices which require inefficient digital signal architectures for a given point in time?
7.4.9 How likely is it that the current DCT and sub-band systems will still be the optimal algorithms in five to ten years? Are other algorithms, such as fractal, wavelet, vector quantization or other large codebook algorithms, likely to be more efficient at some future level of hardware capability? Is it likely that future compression algorithms may be as yet unanticipated? What steps can be taken to prepare for such major shifts in compression techniques, should they occur?
7.4.10 Is it likely that decompression chips in receiving devices could be programmable? If so, could updates to compression algorithms be downloaded using header/descriptor support, or by other software distribution methods? Would such updates be useful? Would it be useful to place the decompression module on a standardized card, so that the chip itself could be replaced as technology advances?
7.4.11 Are there proposed signal formats based upon very rapid partial frames which can support multiple receiver display rates from a single signal without degradation of any rate? Do such formats also provide for minimum buffering of multiple asynchronous sources being presented on the same display?
7.4.12 Some applications, such as the colorization of movies, create data which defines the objects and their boundaries and motion for every frame. Can such data be useful in a system architecture for compression or other uses?
7.4.13 Is it useful to gather macro information about a scene by encoding data such as camera position, motion, and orientation? Could tripod head encoders be useful for this purpose? Are there navigational tools which could be adapted for this purpose? Would such global information be of sufficient value in compression and other uses to warrant the capturing of this information?
7.5.2 Is a data rate hierarchy possible in this context?
7.5.3 There will always be a variety of bus rates, memory bandwidths, disk transfer rates, and channel rates. What are the useful rates of a data rate hierarchy, if such a hierarchy is possible?
7.5.4 Is orthogonality of temporal and spatial resolution via a layered hierarchical compression technique possible? Is data rate orthogonality, integrated with temporal and spatial resolution, possible?
7.5.5 Can other useful augmentations be layered onto the data rate such as extra camera views, stereoscopic imagery, z-value depth information, blending coefficients for compositing, additional dynamic range or improved colorimetry?
7.5.6 Can high quality still frames be acceptably interleaved into the data stream concurrent with the moving image data stream?
7.5.7 Can alternate aspect ratios be provided simultaneously by an appropriately layered data stream construction?
7.5.8 Can extra image channels such as closed-caption sign language display windows, previews of future shows, and others, be acceptably layered into the data stream?
7.5.9 Can three-dimensional image construction information be provided for those receiving devices capable of creating three-dimensional computer-generated images? Could such images, by computer synthesis at the display, substitute for tiny details which are not adequately captured at the camera resolution, such as red-orange golf balls (red and blue usually have lower resolution than green due to chroma subsampling in the Y,Cr,Cb technique)? Could computer graphics create new interactive games or other locally interactive education or training in this way? Are future display devices likely to be able to generate some amount of screen area containing three-dimensional computer-generated synthetic images and composite them correctly into the two-dimensional high definition background image?
7.5.10 Transport errors are highly dependent on the transport channel. What sort of error protection/correction should accompany advanced television digitally compressed data, in light of such data's extreme sensitivity to errors? Are the mechanisms being developed in the transport header portion of the universal header/descriptor sufficient, or should all digitally compressed picture data contain inherent protection and correction protocols?
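As a small illustration of what inherent error correction costs and buys, the following sketch implements the classic Hamming (7,4) code, which corrects any single flipped bit in each 7-bit codeword at the price of 3 parity bits per 4 data bits. It is shown only as an example of the technique; practical transport protection would use much stronger codes, and the function names are hypothetical.

```python
def hamming74_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword.

    Bit order is p1 p2 d1 p3 d2 d3 d4 (positions 1..7).
    """
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(c):
    """Correct up to one flipped bit and return (data_bits, error_position).

    error_position is 0 when the syndrome shows no error.
    """
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3   # syndrome equals the 1-based error position
    if pos:
        c[pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], pos


sent = hamming74_encode([1, 0, 1, 1])
received = sent.copy()
received[4] ^= 1                      # the channel flips bit position 5
print(hamming74_decode(received))     # ([1, 0, 1, 1], 5)
```

The 75% overhead of this particular code illustrates the tradeoff the question raises: protection embedded in every picture packet consumes bandwidth that compression worked to save, which is one argument for matching the protection to the channel instead.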
7.5.11 How should encryption be supported? Is public-key encryption appropriate and sufficient? What levels of encryption are required for various transport media and various uses?
7.5.12 Packet-retry type networks, such as the current Internet, or Ethernet(R) with TCP/IP, cannot guarantee delivery of data for a real-time stream, since a packet of data may be "bumped" and must then be resent. Real-time streams require that data be both intact and "on time", which rules out packet-retry protocols that would deliver resent data after the required time. It is therefore likely that the basic data network infrastructure will need to change significantly in order to support real-time imagery and audio streams in shared channels, unless switched point-to-point services are used due to insufficient shared-channel infrastructure technology. In light of such significant changes, is it possible to anticipate the future networking protocols and techniques so that the high definition image architecture can be developed to be compatible? Are packet prioritization and priority-based graceful degradation likely to be key techniques in such future networks? If so, how much priority information, and how much priority verification and authorization, are needed? Also, how many levels of priority, and what priority schemes, might be required to optimize quality for all users of a shared channel as well as priority packet routing performance? How much guaranteed bandwidth is required by each type of user, and can such data bandwidth be guaranteed? Will such potential guarantee requirements need some amount of data bandwidth reservation? Is it possible to design appropriate shared networks with a hybrid of reservation and non-reservation, applied to both reserved and non-reserved portions of each connection's data stream? Is such a technique a reasonable match between prioritized compressed advanced television and shared data networks, such that a certain amount of real-time bandwidth is guaranteed, while an additional portion is not reserved but is usually provided?
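The hybrid of reserved and non-reserved bandwidth can be sketched as a two-stage allocation on one shared link: every reservation is granted first, and any capacity left over is handed out in priority order. Everything here (the stream names, the units, the priority rule and the function names) is a hypothetical illustration, not a proposed protocol.

```python
from dataclasses import dataclass


@dataclass
class Stream:
    name: str
    reserved: float   # guaranteed rate, e.g. Mbit/s (hypothetical units)
    extra: float      # additional rate wanted but not reserved
    priority: int     # larger value is served first for unreserved capacity


def allocate(capacity: float, streams: list[Stream]) -> dict[str, float]:
    """Grant every reservation, then share leftover capacity by priority.

    Raises ValueError if the reservations alone exceed the link capacity,
    which corresponds to refusing admission to a new connection.
    """
    reserved_total = sum(s.reserved for s in streams)
    if reserved_total > capacity:
        raise ValueError("reservations exceed link capacity")
    grants = {s.name: s.reserved for s in streams}
    spare = capacity - reserved_total
    for s in sorted(streams, key=lambda s: s.priority, reverse=True):
        bonus = min(s.extra, spare)
        grants[s.name] += bonus
        spare -= bonus
    return grants


print(allocate(45, [Stream("hdtv", 20, 10, 2),
                    Stream("data", 5, 20, 1),
                    Stream("preview", 5, 5, 3)]))
```

In a layered compression scheme, one natural mapping would carry the base layer in the reserved portion and enhancement layers in the unreserved portion, so that congestion removes detail gracefully instead of interrupting the picture.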
7.6.2 Can a given advanced television digital compression algorithm be augmented to allow more bits for luminance, or would the requirement for more accuracy defeat the ability of the algorithm to provide the required compression ratio?
7.6.3 Can luminance transfer functions such as those used in CCIR Rec. 709 and SMPTE 240M be augmented to provide extended black and white range?
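For reference, the Rec. 709 transfer characteristic is V = 1.099 L^0.45 - 0.099 for L of at least 0.018 and V = 4.5 L below that, while SMPTE 240M uses the constants 1.1115, 0.1115, 0.0228 and 4.0 in the same form. One hypothetical way to augment either curve for extended range is to mirror it for negative values (blacker than reference black) and simply continue the power-law segment above 1.0 (whiter than reference white). This sketch shows that idea only; it is not defined by either standard, and the function names are illustrative.

```python
# (alpha, beta, slope): V = alpha * L**0.45 - (alpha - 1) for L >= beta,
# and V = slope * L below beta.
REC709 = (1.099, 0.018, 4.5)
SMPTE240M = (1.1115, 0.0228, 4.0)


def oetf(L: float, params=REC709) -> float:
    """Standard transfer characteristic, defined for 0 <= L <= 1."""
    alpha, beta, slope = params
    if L < beta:
        return slope * L
    return alpha * L ** 0.45 - (alpha - 1)


def oetf_extended(L: float, params=REC709) -> float:
    """Hypothetical extended-range version: odd-symmetric below 0, with the
    power-law segment continued unchanged above 1."""
    if L < 0:
        return -oetf(-L, params)
    return oetf(L, params)


for L in (-0.1, 0.0, 0.5, 1.0, 1.5):
    print(L, round(oetf_extended(L), 4), round(oetf_extended(L, SMPTE240M), 4))
```

Because the extension leaves the 0 to 1 segment untouched, existing signals would keep their meaning; the cost is that the digital code range must reserve room below black and above white for the extended values.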
7.7.2 What are the benefits of an RGB digital representation for interoperability across industries and applications versus the Y,Cr,Cb (also called Y,Pr,Pb and YUV) color difference representation commonly used in television?
7.7.3 What are the tradeoffs for using wider gamuts, including gamuts beyond the real spectrum, in covering the real colors? What industries/applications, such as perhaps museums, colleges, printing, photography, and motion pictures, may require accurate color reproduction over a color gamut which is wider than is commonly proposed for high definition television systems?
7.7.4 What other color representations, such as HSV, CIE x,y or CIE u',v' are useful?
7.7.5 How much precision loss accompanies a given high definition color transformation to and from these other representations?
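The precision question can be explored numerically. This sketch assumes Rec. 709 luma coefficients, gamma-corrected R'G'B' values in the range 0 to 1, and 8-bit studio-range coding (Y' codes 16 to 235, Cb and Cr codes 16 to 240); it converts a grid of colors to Y'CbCr, quantizes, converts back, and reports the worst error. The function names are hypothetical.

```python
import itertools

KR, KB = 0.2126, 0.0722          # Rec. 709 luma coefficients
KG = 1.0 - KR - KB


def rgb_to_ycbcr(r, g, b):
    y = KR * r + KG * g + KB * b
    cb = (b - y) / (2 * (1 - KB))   # (B' - Y') / 1.8556
    cr = (r - y) / (2 * (1 - KR))   # (R' - Y') / 1.5748
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr):
    r = y + 2 * (1 - KR) * cr
    b = y + 2 * (1 - KB) * cb
    g = (y - KR * r - KB * b) / KG
    return r, g, b


def quantize_8bit(y, cb, cr):
    """Round to 8-bit studio-range codes and map the codes back to values."""
    yq = round(16 + 219 * y)
    cbq = round(128 + 224 * cb)
    crq = round(128 + 224 * cr)
    return (yq - 16) / 219, (cbq - 128) / 224, (crq - 128) / 224


def max_round_trip_error(steps=11):
    """Worst absolute R'G'B' error over a steps x steps x steps color grid."""
    grid = [i / (steps - 1) for i in range(steps)]
    worst = 0.0
    for rgb in itertools.product(grid, repeat=3):
        back = ycbcr_to_rgb(*quantize_8bit(*rgb_to_ycbcr(*rgb)))
        worst = max(worst, max(abs(old - new) for old, new in zip(rgb, back)))
    return worst


print(max_round_trip_error())
```

The inverse matrix multiplies chroma quantization error by 1.5748 in red and 1.8556 in blue, so the transformation itself, and not only the 8-bit word length, contributes to the loss; repeating the experiment with other matrices or word lengths gives a quick comparison between representations.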
7.7.6 Is there benefit from using a color space which supports color sensors and displays which use more than three color primaries?
7.7.7 When are linear color representations needed for computations? What linear color representations are appropriate in a device-independent context?
7.7.8 What color representations offer device independence?
7.7.9 Can luminance or another brightness measure be represented such that it is orthogonal to the color representation? Can such a representation offer color invariance under changes in exposure or illumination level?
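CIE colorimetry already provides one such separation: luminance Y carries brightness, while chromaticity coordinates such as x,y or u',v' are ratios of the tristimulus values and are therefore unchanged when X, Y and Z are all scaled by the same exposure factor. The sketch below demonstrates that invariance; it assumes linear tristimulus values, and real camera responses with clipping or nonlinearity would only approximate it. The function names are hypothetical.

```python
def chromaticity_xy(X: float, Y: float, Z: float) -> tuple[float, float]:
    """CIE 1931 chromaticity coordinates."""
    s = X + Y + Z
    return X / s, Y / s


def chromaticity_uv_prime(X: float, Y: float, Z: float) -> tuple[float, float]:
    """CIE 1976 UCS coordinates u', v'."""
    d = X + 15 * Y + 3 * Z
    return 4 * X / d, 9 * Y / d


def split_luminance(X: float, Y: float, Z: float):
    """Separate brightness (Y) from a scale-invariant color description."""
    return Y, chromaticity_uv_prime(X, Y, Z)


X, Y, Z = 0.95047, 1.0, 1.08883           # D65 white, normalized to Y = 1
for k in (0.25, 1.0, 4.0):                # exposure or illumination scaling
    print(split_luminance(k * X, k * Y, k * Z))
```

Only the first element (luminance) changes with k. The u',v' diagram was designed by the CIE to be more perceptually uniform than x,y, which bears on the uniformity question in 7.7.11.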
7.7.10 What color representation is most useful for adjusting a wider gamut of color for a narrower gamut display? What are the tradeoffs between clipping to the narrower gamut, and a softer adjustment, similar to highlight compression in the S-Curve?
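The difference between clipping and a softer adjustment can be shown on a single coordinate normalized so that 1.0 is the limit of the narrower display gamut (for example a chroma magnitude or a channel value). The knee position and the exponential roll-off below are hypothetical choices made for illustration; which coordinate to compress is exactly the open question above.

```python
import math


def hard_clip(v: float) -> float:
    """Clip to the display range; all out-of-gamut values collapse to 1.0."""
    return min(max(v, 0.0), 1.0)


def soft_compress(v: float, knee: float = 0.8) -> float:
    """Leave values below `knee` untouched and roll values above it off
    smoothly toward 1.0, with no slope discontinuity at the knee."""
    if v <= knee:
        return max(v, 0.0)
    headroom = 1.0 - knee
    return knee + headroom * (1.0 - math.exp(-(v - knee) / headroom))


for v in (0.5, 0.9, 1.2, 1.4):
    print(v, hard_clip(v), round(soft_compress(v), 4))
```

Clipping keeps in-gamut values exact but maps 1.2 and 1.4 to the same output, losing detail in saturated areas; the soft curve keeps them distinct at the cost of slightly altering values between the knee and the gamut boundary.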
7.7.11 What are the perceptual uniformity properties of various useful color spaces?
7.7.12 Are hue, saturation and value (HSV) representations useful in these contexts?
7.7.13 How can the efficiency of the digital numeric representation be optimized while still allowing wide gamut colorimetry, in addition to efficient support of narrower gamuts?
7.7.14 What device independent color spaces are most efficient for compression?
7.7.15 Luminance, which matches human color sensitivity, is an appropriate representation of brightness near the display. Other representations may be required in the studio, where processing is required. For example, blue screen compositing involving transparency requires as much detail in blue as in green. How should the division be made between the use of luminance and an equal representation of red, green and blue?
7.8.2 Is the header/descriptor mechanism likely to be a major element in providing such augmented capabilities, or do the issues extend into the nature of upgradable compression algorithms?
7.9.2 As compression techniques improve, a given level of quality can be maintained while reducing data bandwidth requirements. What are the best uses for newly freed bandwidth? Will older devices be able to decode new algorithms if they are made somewhat programmable from the beginning? Programmable devices will allow a given device to take advantage of some algorithmic improvements. However, it is likely that improvements in algorithms will have to be accompanied by new hardware, making the upgrade path and backward compatibility path difficult. Are there ways to improve this situation in the system architecture?
7.10.2 Likely uses for additional channels are six-channel surround sound, and multiple languages on separate tracks.
The use of 72 Hz is naturally compatible with 24 fps film, being exactly three times that rate. The rate of 75 Hz is compatible with the common European practice of transferring motion picture film to 50 Hz video by running it at 25 fps, or about 4% fast. A display which has a sync tolerance range of about 4% could therefore adapt to both 72 Hz and 75 Hz picture rates. A 4% tolerance does not add undue cost to a display. A CRT or other display which can present both 72 and 75 Hz progressively scanned images can be a key architectural element in a digital image architecture. Such a display can be used on computers as well as for high quality presentation of motion picture film and of 50 Hz material.
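The rate relationships in the preceding paragraph can be checked with exact fractions. The function names are illustrative only.

```python
from fractions import Fraction


def cadence(display_hz: int, source_fps: int) -> Fraction:
    """Display refreshes per source frame, as an exact fraction."""
    return Fraction(display_hz, source_fps)


def tolerance_needed(*rates: float) -> float:
    """Fractional sync range a display needs in order to lock to all given rates."""
    return max(rates) / min(rates) - 1


for display, source in [(72, 24), (75, 25), (75, 50), (72, 25), (75, 24)]:
    print(display, source, cadence(display, source))
print(f"{tolerance_needed(72, 75):.2%}")
```

The pairs 72/24, 75/25 and 75/50 give the simple cadences 3, 3 and 3/2, while the mismatched pairs 72/25 and 75/24 do not, which is why each display rate is paired with its own film speed. The required sync range, 75/72 - 1, is about 4.17%, the same ratio as the 25/24 speed-up used in European film transfer.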
In video, the temporal light sampling time for each pixel is sometimes adjustable. Video temporal light capturing duty cycles typically range from about 30% to about 100%. Short or adjustable duty cycles are available only on some cameras, since shorter exposures usually reduce light sensitivity. However, even some home video cameras have a persistence-type control that allows the image capture duty cycle to be adjusted between sharp frames and smooth motion.
Some experts feel that a 50% exposure duty cycle for each frame is the most common choice, since it balances image sharpness against smooth motion. Some photography of moving objects is likely to call for a short shutter time in order to emphasize sharpness and to distinguish each image on each frame separately. Other uses, such as "go motion", used successfully in many Lucasfilm productions, may favor smooth motion by choosing a 100% duty cycle. In this example, the 100% duty cycle is achieved by a repeatable-motion, non-real-time, computer-controlled camera.
In computer graphics, when motion blur is simulated, a duty cycle is usually specified. Some standard software rendering packages, such as Pixar's RenderMan(R), offer a value from 0 to 1 which controls the image duty cycle for the motion blur processing.
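The duty cycle translates directly into exposure time and blur length. In this sketch the duty cycle is the same 0-to-1 value described above; the function names are hypothetical, and the blur calculation assumes an object moving at constant speed during the exposure.

```python
def exposure_seconds(duty_cycle: float, fps: float) -> float:
    """Integration time per frame for a temporal duty cycle in (0, 1]."""
    if not 0.0 < duty_cycle <= 1.0:
        raise ValueError("duty cycle must be in (0, 1]")
    return duty_cycle / fps


def shutter_angle(duty_cycle: float) -> float:
    """Equivalent rotary-shutter angle in degrees (film convention)."""
    return 360.0 * duty_cycle


def blur_length_px(speed_px_per_s: float, duty_cycle: float, fps: float) -> float:
    """Length of the motion-blur streak left by an object at constant speed."""
    return speed_px_per_s * exposure_seconds(duty_cycle, fps)


# An object crossing a 1920-pixel-wide frame in 2 seconds, filmed at 24 fps.
for duty in (0.3, 0.5, 1.0):
    print(duty, shutter_angle(duty), round(blur_length_px(960, duty, 24), 2))
```

A 50% duty cycle corresponds to the familiar 180-degree film shutter and, in this example, a blur of 20 pixels per frame; the 100% case doubles the streak to 40 pixels, trading sharpness for smoother motion.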
The common use of temporal undersampling virtually ensures some degree of aliasing on images which have periodic motion or object interaction near the frame rate or its harmonics. There are also some pathological cases where image misrepresentation due to temporal aliasing can make motion vector analysis impossible. When such motion vectors are required for de-interlacing or other standards conversion involving temporal transcoding, such conversions become artifact prone due to incorrect motion analysis. Temporal aliasing is also exaggerated when temporal transcoding is employed without the use of motion vectors.
It is very likely that temporal aliasing artifacts will appear at the beat frequencies of rate conversions. Thus, it is necessary to choose rates whose conversion beat frequencies are as high as possible.
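One simple way to quantify this beat frequency, assuming integer rates and conversion by frame repetition without interpolation, is the greatest common divisor of the two rates: the pattern of repeated frames recurs that many times per second. The function names below are hypothetical.

```python
from math import gcd


def cadence_repeat_hz(src_hz: int, dst_hz: int) -> int:
    """Rate at which the source/destination frame-timing pattern repeats.

    Artifacts tied to that pattern (uneven frame repetition) are modulated
    at this frequency, so a higher value should be less visible.
    """
    return gcd(src_hz, dst_hz)


def cadence_pattern(src_hz: int, dst_hz: int) -> list[int]:
    """Number of output frames each source frame occupies over one cycle."""
    period_src = src_hz // gcd(src_hz, dst_hz)
    counts = []
    for i in range(period_src):
        # Index of the first output frame at or after the start of source frame i.
        start = (i * dst_hz + src_hz - 1) // src_hz
        end = ((i + 1) * dst_hz + src_hz - 1) // src_hz
        counts.append(end - start)
    return counts


for src, dst in [(24, 60), (24, 72), (50, 60), (25, 75), (24, 75)]:
    print(src, "->", dst, cadence_repeat_hz(src, dst), "Hz", cadence_pattern(src, dst))
```

Converting 24 fps to 60 Hz produces the familiar 3:2 repetition recurring at 12 Hz, and 50 to 60 Hz recurs at only 10 Hz, while 24 to 72 Hz and 25 to 75 Hz repeat every frame identically, with no beat below the frame rate. Mismatched pairs such as 24 to 75 Hz fall to a 3 Hz beat.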
A choice of either 72 or 75 Hz for the display's refresh rate could be beneficial in both advanced television systems and computer uses, particularly since NTSC may, in the future, become obsolete.
Flat panel displays, such as active matrix liquid crystal color displays, are likely to begin to take hold in the market. These devices do not exhibit the characteristic flicker of CRT displays. Thus, it is possible to update the image at low rates, such as 24 fps when presenting film images, without visible flicker.
It is possible that future display devices may be developed which maintain the display on an effectively indefinite pixel-by-pixel basis. These devices will be frame rate independent and will permit even greater efficiency in the distribution of visual communications.
Leibovic, K. N., "Perceptual Aspects of Spatial and Temporal Relationships," Chapter 6, Science of Vision, K.N. Leibovic, Editor, Springer-Verlag, New York, 1990.
Maguire, William, Naomi Weisstein, and Victor Klymenko, "From Visual Structure to Perceptual Function," Chapter 9, Science of Vision, K.N. Leibovic, Editor, Springer-Verlag, New York, 1990.
Symes, Peter D., "The Enhanced Viewing Experience -- What Does It Take?," Proceedings of the 26th Annual SMPTE Advanced Television and Electronic Imaging Conference, SMPTE, White Plains, NY, 1992.