Pixels and Resolution
A picture is worth a thousand words, but requires a larger file size.
Image pixels:
In a colour display or a printed image, a pixel (picture element)
is a dot or small area of the picture that is large enough to
carry the three attributes hue, colour saturation, and brightness,
but too small to carry any detail.
An electronically
produced picture is made up of an array of pixels and is normally
viewed from a distance such that the underlying structure is
no longer visible.
In practice, image pixels
are usually composite: the pixels of
a TV picture being composed of areas of red, green, and blue
illumination; and the pixels in a printed image being built up
from groups of cyan, yellow, magenta, and black ink spots. Any
structure within a pixel will average to the required combination
of brightness and colour when viewed from a suitable distance.
Pixel data:
In the context of photography, the information used to create
a displayed image is usually extracted from a data file. The
data file therefore contains, or can be decompressed in some
way to provide, a set of numbers representing each pixel in the
image. In an uncompressed RGB image file, a pixel is represented
by three numbers, these being related to the brightness values
for red, green, and blue at a particular point in a video display.
In an uncompressed CMYK image file, a pixel is represented by
four numbers, these being related to the amounts of cyan, magenta,
yellow, and black ink to be applied to a particular point in
a printed picture. Other representations are also possible, and
we can, of course, convert between representations for such purposes
as viewing CMYK files on a monitor screen, or printing RGB files
onto paper.
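As an illustration of these representations, here is a minimal sketch in Python of the naive, device-independent conversion from an RGB pixel to a CMYK pixel (real conversions use ICC colour profiles; the example pixel values are invented):

    def rgb_to_cmyk(r, g, b):
        """Naive RGB (0-255) to CMYK (0-1) conversion, ignoring colour profiles."""
        if (r, g, b) == (0, 0, 0):
            return 0.0, 0.0, 0.0, 1.0  # pure black: black ink only
        c = 1 - r / 255
        m = 1 - g / 255
        y = 1 - b / 255
        k = min(c, m, y)               # extract the common grey component as black ink
        return ((c - k) / (1 - k), (m - k) / (1 - k), (y - k) / (1 - k), k)

    # A mid-orange pixel: three numbers in an RGB file, four in a CMYK file.
    print(rgb_to_cmyk(230, 120, 40))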
The maximum amount of detail that can
be recorded in an image
file is dependent on the number of pixels, but the number of
pixels should not be used as a measure of resolution. The relationship
between pixels and actual sensor resolution will be discussed
shortly, but the most obvious way in which the pixel count can become
divorced from recorded detail lies in the fact that files can
be re-sized. If a file is re-sized to reduce the number of pixels
in it, detail may be lost, but if a file is re-sized to increase
the number of pixels, new detail cannot be created (although
subjective improvements can be obtained).
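The point can be demonstrated with the Pillow imaging library (a sketch; 'photo.jpg' is a placeholder filename): downsizing discards detail, and upsizing the result does not bring it back:

    from PIL import Image

    img = Image.open("photo.jpg")            # placeholder file
    small = img.resize((img.width // 4, img.height // 4), Image.LANCZOS)
    restored = small.resize(img.size, Image.LANCZOS)
    # 'restored' has the original pixel count but only the detail of 'small';
    # interpolation smooths the result, it cannot recreate lost information.
    restored.save("restored.jpg")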
Camera pixels:
Digital cameras also have pixels. They must have because the
manufacturers say they do, and the pixel number is usually assumed
to be a measure of resolution. It is therefore tempting to think
that if the pixel count of a file produced by a camera is set
to be the same as the number of pixels in the camera sensor,
then the native resolution of the camera will be carried through
to the final image. Unfortunately, this will not be the case,
for two very important reasons:

1. The resolution of the camera sensor is not related to the
resolution of the lens.

2. A camera does not necessarily have identifiable RGB pixels.
The pixels in the camera output are created by interpolation
from raw sensor data and so cannot be assumed to be directly
related to the individual sensor elements. Furthermore, cameras
do not have to have their light sensors arranged as RGB triads,
and so it does not always make sense to think of a single colour
sample as a third of a pixel. This ambiguity has gradually led
to a consensus within the camera industry, which is that the
pixels of cameras should be defined differently from the pixels
of files and images. Specifically:
The number of pixels stated for a digital camera is the total
number of light sensing elements (red + green + blue).
This number is, of course, considerably larger than the number
of RGB pixels corresponding to a notional 'native' resolution,
but if a single number is to be used as an indicator of camera
quality, it is the only measure that gives a reasonably fair
comparison between the various different sensor architectures.
Filter-mosaic cameras:
Most digital cameras use a two-dimensional array of light sensors
and colour filters arranged in a pattern known as a 'Bayer mosaic'
(US Pat. No. 3971065). The filter mosaic has twice as many green
filters as red or blue filters; e.g., an 8M pixel camera has
4M green, 2M blue, and 2M red sensors.
With this architecture it is not
possible to identify physically
discrete RGB pixels, but the sensor data can be processed to
produce an image that has about twice as much luminance (brightness)
resolution as it has colour resolution, luminance resolution
being subjectively more important. The RGB pixel equivalent of
the luminance resolution is about equal to the number of green
sensors* if the raw image processing
is done well; so an output file size that is 'just large enough'
to capture all of the genuine detail that the camera can produce
will have up to about half as many RGB pixels as there are camera
(single colour) pixels.
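A sketch of the sensor arithmetic, using the 8M example above (the 2:1:1 ratio is the defining property of the Bayer pattern):

    import numpy as np

    # One tile of the Bayer pattern: two greens on one diagonal, red and blue on the other.
    tile = np.array([["G", "R"],
                     ["B", "G"]])

    camera_pixels = 8_000_000
    green = camera_pixels // 2   # 4M
    red   = camera_pixels // 4   # 2M
    blue  = camera_pixels // 4   # 2M
    print(green, red, blue)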
The normal maximum output file size for a Bayer mosaic camera,
however, usually has nearly as many RGB pixels as there are
camera pixels, RGB pixels being obtained (notionally, ignoring
processing) by going to every intersection between four camera
pixels and taking the adjacent light values (a red, a blue, and
the average of two green samples). Hence the output RGB pixels
are unique, but share samples with other pixels; and the maximum
file size should not be taken to be indicative of sensor resolution.
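The notional reconstruction just described can be sketched in Python (a conceptual illustration only; real raw converters use far more sophisticated interpolation, and a GRBG layout is assumed here):

    import numpy as np

    def notional_demosaic(raw):
        """Notional RGB from a GRBG Bayer mosaic: one RGB pixel at every
        intersection of four camera pixels (a sketch, not a real algorithm)."""
        h, w = raw.shape
        colour = np.empty((h, w), dtype="U1")            # colour of each photosite
        colour[0::2, 0::2] = "G"; colour[0::2, 1::2] = "R"
        colour[1::2, 0::2] = "B"; colour[1::2, 1::2] = "G"
        rgb = np.zeros((h - 1, w - 1, 3))
        for y in range(h - 1):
            for x in range(w - 1):
                vals = raw[y:y + 2, x:x + 2].ravel()     # four adjacent samples
                cols = colour[y:y + 2, x:x + 2].ravel()
                rgb[y, x, 0] = vals[cols == "R"][0]      # one red sample
                rgb[y, x, 1] = vals[cols == "G"].mean()  # average of two greens
                rgb[y, x, 2] = vals[cols == "B"][0]      # one blue sample
        return rgb

Note how adjacent output pixels share samples, as described above.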
* The recipe for simulating white light using the three additive
primary colours is 0.59G + 0.3R + 0.11B. Hence the green channel,
being the main contributor, carries more luminance information
than the other colour channels.
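In code form (a trivial sketch; the weights are the classic luma coefficients as rounded in the footnote):

    def luminance(r, g, b):
        # Green dominates the luminance sum, hence its double weighting in the mosaic.
        return 0.30 * r + 0.59 * g + 0.11 * b

    print(luminance(1.0, 1.0, 1.0))  # white -> 1.0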
3-Chip cameras:
In a 3-chip camera, light from the lens is split into three,
filtered, and sent to 3 monochrome imaging chips, one each for
red, green, and blue. The native luminance resolution of such
a camera is approximately equal to the number of sensing elements
in the green channel. The colour resolution will be slightly
less than the luminance resolution because there is a limit on
the accuracy with which the three colour images can be superimposed.
The 3-chip imaging method was once preferred as a superior alternative
to the Bayer mosaic for high-quality video cameras (i.e., relatively
low resolution but fast readout cameras). In high resolution systems
however, it is the optical splitter prism rather than
the pixel number that places limitations on the ultimate resolution
obtainable. Consequently, as pixel numbers and in-camera processing
power have increased, single chip imaging methods
have become universal.
The Foveon X3® sensor:
The Foveon X3 has its blue, green, and red sensor arrays in
different layers of the chip, with the sensing elements stacked
in perfect registration. Hence the sensor does have identifiable
RGB pixels, even though the number of 'camera' pixels will normally
be reported as the total of red + green + blue sensor elements.

[Illustration by kind courtesy of Foveon Inc. All rights reserved.]
The X3 architecture eliminates many of
the interpolation artifacts
associated with filter mosaic cameras, and gives the same resolution
in all 3 colour channels*. The output file size that is just large
enough
to capture all of the detail the sensor can produce has the same
number of pixels as there are light sensing elements in any of
the layers; but due to the equal luminance and colour resolutions,
and freedom from artifacts, the potential detail-rendering capability
is at least as good as, and generally better than, that of a mosaic
sensor of the same size having the same number of camera (single
colour) pixels.

For more information, see: www.foveon.com/X3_tech.html
* The X3 Quattro sensor is a modified form having more luminance
than colour resolution: it has four times as many sensor elements
in the top layer as in each of the two layers beneath.
Line-scan cameras (scanners):
In a typical scanner, a portion of the image is focused onto
a strip-sensor, and the sensor is moved in small steps in order
to acquire the whole image. The sensor usually has 3 strips (one
each for red, green, and blue), each consisting of a linear array
of photosensors. An RGB image is built up line-by-line by stepping
the 3 different colour recording strips into position in turn.
Since each dot is sampled three times, generally with near-perfect
registration (in single-pass scanners at least), luminance and
colour resolutions are practically identical. If the scanner
software is set to give an output file at the scanner's native
(i.e., optical) resolution, each pixel in the output is directly
traceable to a point in the image (neglecting the various processing
operations that may be carried out between image acquisition
and file creation).
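A sketch of the acquisition loop, assuming hypothetical hardware-interface functions read_strip() and step_to_line() (these names are invented for illustration):

    import numpy as np

    def scan(height, width, read_strip, step_to_line):
        """Build an RGB image line by line from a 3-strip linear sensor."""
        image = np.zeros((height, width, 3))
        for line in range(height):
            for ch, colour in enumerate(("red", "green", "blue")):
                step_to_line(line, colour)          # move the relevant strip into position
                image[line, :, ch] = read_strip(colour)
        return image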
Sensor comparisons:
In the real world, fine detail does not have to be in black and
white. Consequently, imaging devices that give equal resolution
in luminance and colour (X3 and scanners) have the potential
to give higher quality colour pictures for a given number of
'native' RGB pixels than imaging devices that have reduced colour
resolution (mosaic and 3-chip). Hence we cannot even use 'equivalent
native RGB pixels', or 'file size just big enough to capture
all of the information' as an objective measure of comparative
resolution, because the information captured varies in quality.
We can illustrate this point by comparing two hypothetical 10.2M
pixel cameras, one based on the Foveon X3 and one using a Bayer
mosaic:
Sensor | Camera pixels | Luminance resolution           | Overall resolution
       |               | (equivalent native RGB pixels) | (equivalent native RGB pixels)
X3     | 10.2M         | 3.4M                           | 3.4M
Mosaic | 10.2M         | ≤5.1M                          | ≤2.55M
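The figures in the table follow directly from the definitions given earlier; a sketch of the arithmetic:

    camera_pixels = 10_200_000

    # X3: three stacked layers, so native RGB pixels = camera pixels / 3,
    # and luminance and colour resolution are equal.
    x3_rgb = camera_pixels / 3                  # 3.4M

    # Mosaic: luminance resolution ~ number of green sensors (half the total, at best);
    # overall (full colour) resolution ~ half of that again.
    mosaic_luma   = camera_pixels / 2           # <= 5.1M
    mosaic_colour = camera_pixels / 4           # <= 2.55M
    print(x3_rgb, mosaic_luma, mosaic_colour)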
If we exclude subjective criteria, the X3 obviously gives the
best resolution, but if we accept that detail is less important
in areas of high colour saturation, the two image capturing methods
are about the same. Thus defining the number of camera pixels
as the total number of light sensing elements gives a simple
method of comparison, and has the virtue that the pixels so defined
are related to physical light sensors. The problem with simple
comparisons, however, is that they are sometimes far too
simple. One major limitation of the mosaic system is that it
produces interpolation artifacts (false detail), and cameras
often incorporate a blur filter in order to reduce such artifacts
to an acceptable level. A blur filter reduces resolution in the
raw camera output, and although this loss can be restored by
a process known as deconvolution, deconvolution is often too
computationally intensive to be carried out in the camera. Also, one of the many ways in
which camera pixels become divorced from output pixels is that
the raw sensor data is usually subjected to a noise reduction
process. Such processing also reduces resolution slightly and
may also generate artifacts, and so low sensor noise is a definite
advantage.
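For illustration, blur and its partial reversal by deconvolution can be sketched with scikit-image's Richardson-Lucy routine; this is a generic example, not the processing used in any particular camera:

    import numpy as np
    from scipy.signal import convolve2d
    from skimage.restoration import richardson_lucy

    image = np.random.rand(64, 64)              # stand-in for a detailed scene
    psf = np.ones((3, 3)) / 9                   # simple blur-filter kernel
    blurred = convolve2d(image, psf, mode="same", boundary="symm")

    # Deconvolution recovers much of the lost resolution, at a computational cost.
    restored = richardson_lucy(blurred, psf, num_iter=30)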
On the issue of whether overall (full
colour) resolution is truly
less important than luminance resolution, we can at this point
allow some consideration of lenses to creep into the discussion.
All lenses exhibit at least some minor chromatic aberration,
this being due to their inability to provide exactly the same
amount of magnification at all wavelengths in the visible spectrum.
The problem manifests itself as colour fringing in off-centre
picture detail, and is of particular relevance to underwater
photographers because it is unavoidable when using an air-corrected
lens behind an air-water boundary. It can however be substantially
reduced using software, by a process known as radial correction
(see the lens correction article);
but by a parallel argument, it is also true that maximum resolution
usually cannot be obtained unless radial correction is
applied. As we move away from the centre of an image projected
by a lens, we expect any detail with a strong luminance component
(e.g., a small white spot) to be turned into a detail with strong
colour components (i.e., red, green, and blue spots either separate
or partially overlapping). In the correction process, which involves
re-mapping the red and blue images to superimpose them properly
on the green image, fine colour detail is converted into fine
luminance detail. Consequently, if we fail to preserve the same
amount of detail in all colour channels of the image recording
device, we lose some of the ability to extract fine luminance
detail from the data.
From a software point of view, radial correction for lenses working
in air requires only a knowledge of the lens focal length (zoom)
setting, and to a lesser extent, the focus setting. This information
is easily recorded at the time of picture taking, and so provision
for radial correction can (at least in principle) be included in the
camera firmware or the
raw-file conversion software. Underwater
photographers however, being in the business of adapting air-working
lenses for underwater use, will need to apply their own corrections.
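A sketch of radial correction, assuming the simplest case of a constant magnification error: the red and blue channels are rescaled about the image centre so that they register with green (real corrections use a radius-dependent polynomial, and the scale factors here are invented):

    import numpy as np
    from scipy.ndimage import map_coordinates

    def rescale_channel(channel, scale):
        """Magnify a channel by 'scale' about the image centre."""
        h, w = channel.shape
        yy, xx = np.mgrid[0:h, 0:w].astype(float)
        cy, cx = (h - 1) / 2, (w - 1) / 2
        # Sample the source at radially shrunken coordinates to magnify the output.
        coords = [(yy - cy) / scale + cy, (xx - cx) / scale + cx]
        return map_coordinates(channel, coords, order=1, mode="nearest")

    def radial_correct(rgb, r_scale=1.002, b_scale=0.998):  # invented scale factors
        out = rgb.copy()
        out[:, :, 0] = rescale_channel(rgb[:, :, 0], r_scale)
        out[:, :, 2] = rescale_channel(rgb[:, :, 2], b_scale)
        return out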
Sensor resolution:
Resolution (resolving power) is the ability to record or represent
detail. A physically meaningful measure of resolution can be
given in terms of the maximum number of parallel lines that
can be reproduced over a given distance; a grid of lines being
the worst-case test. In a monochrome camera sensor, or a colour
imaging system having superimposed R, G, and B pixels, we can
relate resolution in lines to the number of sensing elements
in a strip perpendicular to those lines by observing that lines
cannot be resolved unless there are enough sensing elements to
register all of the alternations between light and dark. Therefore,
the limit of resolution is reached at the point at which there are
just two sensor elements for every line. This situation is represented
in the diagram below, which shows what happens when a test pattern
of lines is projected onto a sensor at a magnification that
makes the spacing between the lines equal to the distance between
two sensor elements.
If the system is operating at something close to the maximum
resolution of the lens, the lens will blur the detail and project
a pattern of smoothly alternating light intensity onto a row
of light sensors. Each sensor records the average intensity of
the light collected over its area, and so the output will be
a row of alternating dark grey and light grey pixels. Thus, although
there is some loss of contrast caused by the lens in this case,
the output is a faithful reproduction of the original detail.
Notice however, that in the diagram above the brightness peaks
were lined up so that they fell in the middle of their respective
sensors. When the pattern is moved sideways by one half of the
pixel spacing an entirely different result is obtained:
Now the peaks and troughs of the projected image lie on the boundaries
between the sensor elements. Hence if one sensor element records
the average of the transition from lightest to darkest, the one
next to it will record the average from darkest to lightest,
and the result will be a mid-grey in either case. Hence if we
move the camera or the test card, the contrast in the detail
recorded at maximum resolution will vary between zero and some
maximum value. Thus we may deduce that two pixels for every line
of resolution is an absolute minimum, and represents the point
at which all contrast in the resulting image may just be extinguished.
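The phase dependence is easy to simulate: a sketch in which a sinusoidal line pattern, at exactly two sensor elements per line, is box-averaged over each element at the two alignments discussed (a half-pixel shift corresponds to a phase change of π/2):

    import numpy as np

    def sample(phase, n_pixels=8, oversample=1000):
        """Box-average a sinusoidal pattern (period = 2 pixel widths) over each sensor."""
        x = np.linspace(0, n_pixels, n_pixels * oversample, endpoint=False)
        pattern = 0.5 + 0.5 * np.cos(np.pi * x + phase)
        return pattern.reshape(n_pixels, oversample).mean(axis=1)

    aligned = sample(-np.pi / 2)   # peaks centred on sensors: alternating greys
    shifted = sample(0.0)          # peaks on sensor boundaries: uniform mid-grey
    print(aligned.round(3))        # e.g. [0.818 0.182 0.818 0.182 ...]
    print(shifted.round(3))        # all ~0.5: contrast extinguished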
Also, we must note that there is no rule to say that the lines
in the resolution test pattern must be horizontal or vertical,
and if we want to state the resolution for lines of arbitrary
orientation, we must take the worst case, which is the number
of pixels per unit length along a line at 45° to the horizontal
or vertical. The diagonal of a 1 × 1 square is √2. Therefore, if
a monochrome sensor (or a colour sensor having superimposed R,
G, and B pixels) has p sensor elements per mm in the horizontal
and vertical directions, the absolute maximum resolution in lines/mm
will be p/(2√2).
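A worked example of the formula (the 200 elements/mm figure is invented for illustration):

    import math

    p = 200                             # sensor elements per mm (hypothetical)
    horizontal = p / 2                  # lines/mm for lines aligned with the grid
    diagonal = p / (2 * math.sqrt(2))   # worst case: lines at 45 degrees
    print(horizontal, round(diagonal, 1))   # 100.0 and 70.7 lines/mm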
Unfortunately, we cannot use this simple notion of image capture
to deduce the resolving power of a Bayer mosaic sensor, except
to note that its luminance resolution will be, at best, about
the same as that of a monochrome camera with half as many
pixels. We cannot assume, moreover, that the best performance
will always be obtained; it depends on the sophistication (and hence
the cost) of the optical system and the software that converts
the raw data into RGB pixels. In particular, a blur filter is
required in order to ensure that no detail in the image projected
onto the sensor is smaller than the area of 4 camera pixels (if
a detail is too small it will fall selectively on a single colour
pixel and be recorded incorrectly as colour information), and
intensive calculation is required in order to reconstruct the
full resolution image from the raw recorded data.
The sensor resolution, of course, does
not tell us what the overall
resolution of a camera will be. The final limiting factor is
lens resolution, which is beyond the scope of this article but
well covered elsewhere. See for example, the excellent website
of Norman Koren
www.normankoren.com
and particularly the tutorials on image sharpness and MTF
www.normankoren.com/Tutorials/MTF.html
See also:
Dick Lyon's photo technology lectures.
We can however offer a general rule based on sensor format size:
The resolving power of lenses is not
infinitely scalable. Therefore
when a lens is used to project an image onto a surface, the maximum
amount of detail that can be preserved is a function of the
size of the image. Consequently, if a large format camera and
a small format camera are both used to produce a picture of the
same size, the large format camera will give the greater resolution
in the final picture (assuming a sufficient number of pixels
and well designed lenses in both cases).

DWK, Nov 2004, Updated March 2018.
(Thanks to Dick Lyon of Foveon Inc. for helpful advice and suggestions
in the writing of this article.)
© David W Knight, 2004, 2018
© 1998-2004 Foveon, Inc. Foveon X3 and the X3 logo are registered
trademarks of Foveon Inc.