
Design and Implementation

A prototype system has been designed and built for M = N = 64 and G = 16. Using 16 grey-tones preserves sufficient shading information, while smooth changes in brightness are still perceived as smooth changes in loudness. The latter is important to avoid audible, distracting artificial discretization boundaries within, for example, near-uniform image backgrounds. The design uses a 64 × 64 pixel matrix instead of a 50 × 50 matrix, because powers of 2 are more convenient for a digital implementation. To allow better-quality sound representations in less time-critical situations (e.g., static images), the design also includes a user-switchable conversion time T of 1.05 s or 2.10 s. According to (6), this gives a system throughput of up to 16 kb/s for T ≈ 1 s, and 8 kb/s for T ≈ 2 s.
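Equation (6) is not reproduced in this section. As a minimal sketch, assuming it has the form M · N · log₂(G) / T (every pixel carrying log₂ G bits), the quoted throughput figures can be checked as follows; the function name is hypothetical.

```python
import math

def throughput_bits_per_s(m: int, n: int, g: int, t_seconds: float) -> float:
    """Information rate of an M x N image with G grey-tones converted every T seconds.

    Assumption: throughput = M * N * log2(G) / T (log2(G) bits per pixel).
    """
    bits_per_image = m * n * math.log2(g)  # 64 * 64 * 4 = 16384 bits
    return bits_per_image / t_seconds

for t in (1.05, 2.10):
    rate = throughput_bits_per_s(64, 64, 16, t)
    print(f"T = {t:.2f} s: {rate / 1000:.1f} kb/s")  # k = 1000 here
# T = 1.05 s: 15.6 kb/s
# T = 2.10 s: 7.8 kb/s
```

With T rounded to 1 s and 2 s, the same expression gives roughly the 16 kb/s and 8 kb/s stated above.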

The M = 64 sinusoidal oscillators have not been implemented as physically distinct analog components, as in [6]. Instead, the behaviour of the oscillators is emulated by fast digital computation. This makes the implementation much more compact (portable), more robust (against detuning) and more flexible (programmable), as well as cheaper and lower in power dissipation (for battery operation), than a large set of independent, precisely tuned analog oscillators could be. In [6], the tuning problem was addressed with digital frequency dividers, but the other drawbacks remained because of the 64-fold hardware needed for signal integration, modulation, and summation.

Fig. 2. Principles of the sound sample synthesis.

Fig. 2 shows schematically how a sound sample is calculated for M oscillators: the phase of oscillator i is updated with its own phase step, the sine of the resulting phase is obtained, scaled by the brightness of the pixel at vertical position (row) i, and the result is added to a superposition accumulator. The phases, phase steps, scaled sine values and pixel brightnesses all reside in memory modules. This avoids expensive hardware multipliers and sine evaluators, and makes it possible to implement alternative mappings simply by reprogramming. The frequencies of all 64 individual oscillators, which together determine the bandwidth used, are programmable. This provides sufficient flexibility for further optimization of the system. It may also be used to accommodate individual differences in hearing among users, including hearing impairments, by means of a personalized frequency distribution. In the present prototype, a 16K-word memory module holds 256 different and independent frequency distributions for the 64 oscillators (256 × 64 = 16384 words), selectable without reprogramming.
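The per-sample computation of Fig. 2 can be sketched in software. This is a minimal illustration, not the prototype's actual memory layout: it assumes 16-b stored phases truncated to 12 b for the table address (as described later in this section), a hypothetical scaled-sine table indexed by (brightness, truncated phase), and a hypothetical peak amplitude of 511 chosen so that the sum of 64 contributions fits a signed 16-b word.

```python
import math

M = 64            # oscillators, one per image row
G = 16            # grey-tones
PHASE_BITS = 16   # stored phase width
SINE_BITS = 12    # truncated phase width used to address the sine table
AMPLITUDE = 511   # hypothetical: 64 * 511 = 32704 < 2**15

# Hypothetical scaled-sine memory: one precomputed entry per (brightness, phase),
# so neither a multiplier nor a sine evaluator is needed at run time.
SCALED_SINE = [
    [round(b / (G - 1) * AMPLITUDE * math.sin(2 * math.pi * k / 2**SINE_BITS))
     for k in range(2**SINE_BITS)]
    for b in range(G)
]

def synthesize_sample(phases, phase_steps, column):
    """Compute one superposition sample for the current image column.

    phases: list of M stored 16-b phase values, updated in place
    phase_steps: list of M 16-b phase increments (one frequency distribution)
    column: list of M brightness values 0..G-1; row i drives oscillator i
    """
    acc = 0
    for i in range(M):
        phases[i] = (phases[i] + phase_steps[i]) % 2**PHASE_BITS  # phase update
        k = phases[i] >> (PHASE_BITS - SINE_BITS)                 # truncate to 12 b
        acc += SCALED_SINE[column[i]][k]                          # lookup and accumulate
    return acc
```

A full-brightness pixel in row 0 whose oscillator advances a quarter period per sample contributes the table peak: `synthesize_sample([0] * M, [2**14] + [0] * 63, [15] + [0] * 63)` returns 511.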

To reduce system cost further, only commercially available components of standard speed were used. Apart from the CMOS memory components, most of the system logic was implemented in standard LS TTL technology. With a system clock of F = 2 MHz, serial computation of the superposed oscillator signals yields samples at a rate of F/M = 31.25 kHz, which is high enough for very good audio quality (cf. CD players, which use 44.1 kHz). The superposition samples are represented by 16-b values, again for high audio quality. Processing 2 million oscillator samples per second (64 oscillators × 31.25 kHz) with a 2 MHz system clock, i.e., one oscillator contribution per clock cycle, requires significant parallelism. Because the transformation itself is fixed, a pipelined design was chosen to provide the required data flow: several oscillator contributions are processed simultaneously, each in a different stage of the required sequence of operations. For example, the phase step of oscillator i+1 is read from memory at the same time as the scaled sine value of oscillator i, whose phase step was read one clock period (500 ns) earlier. The resulting special-purpose computer is optimized for image-to-sound conversion. The complete conversion system, including the 20 ms frame-grabbing and 16 grey-tone digitization hardware for input and the analog output stages for the headphones, has been implemented on a single 236 × 160 mm circuit board with a measured dissipation of 4.4 W. In addition, a commercial 2.7 W Philips 12TX3412 vidicon observation camera is presently used for input. This standard 625-line PAL camera delivers interlaced images of alternately 312 and 313 lines every 20 ms, of which only 64 lines are used for conversion: three out of every four lines in a centered subset of 256 lines are discarded. The digitization hardware uses Gray code encoding in a 4-b flash A/D converter, and includes a sample-and-hold circuit to cope with very rapid video signal transitions.
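The text does not say which Gray code the flash converter uses. The sketch below assumes the standard reflected binary Gray code, and only illustrates the property that commonly motivates it: adjacent levels differ in exactly one bit, so a code captured during a transition is off by at most one grey-tone. The function names are hypothetical.

```python
def binary_to_gray(n: int) -> int:
    """Reflected binary Gray code of a non-negative integer."""
    return n ^ (n >> 1)

def gray_to_binary(g: int) -> int:
    """Inverse of binary_to_gray: XOR together all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

# All 16 grey-tone levels of a 4-b converter
codes = [binary_to_gray(level) for level in range(16)]
# Neighbouring levels differ in exactly one bit
assert all(bin(a ^ b).count("1") == 1 for a, b in zip(codes, codes[1:]))
print([format(c, "04b") for c in codes[:4]])  # ['0000', '0001', '0011', '0010']
```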
The sampling, digitization and storage of a 64 × 64 pixel image take place within a single 20 ms frame time, which avoids blurred images. Because the camera generates frames independently, vertical and horizontal (delayed) synchronization signals are used to synchronize the conversion system with the camera. Due to this synchronization, the conversion system may first have to wait for less than 20 ms before the 20 ms frame grabbing of a new camera image can actually start. Still, the resulting total acquisition time of less than 20 ms + 20 ms = 40 ms is easily negligible compared with the conversion time T of about 1 or 2 s. The user only notices a characteristic synchronization click, which serves to delimit successive sound patterns.
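The acquisition delay bound can be checked with a simple timing model. This sketch assumes, hypothetically, that frame starts occur at exact multiples of 20 ms and that grabbing always begins at the next frame boundary after a request.

```python
FRAME_MS = 20.0  # PAL field period (50 Hz)

def acquisition_latency_ms(request_ms: float) -> float:
    """Time from a grab request until a 64 x 64 image is stored.

    Hypothetical model: wait for the next frame boundary (< 20 ms),
    then grab during one full frame (20 ms).
    """
    wait = (-request_ms) % FRAME_MS  # time until the next vertical sync
    return wait + FRAME_MS

print(acquisition_latency_ms(5.0))  # 35.0
```

Even the worst case, just under 40 ms, is under 4 % of T = 1.05 s.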

Fig. 3. The pipelined system architecture.

Fig. 3 illustrates the system architecture in more detail, close to the component level. The analog input (camera IN) and output (headphones OUT) stages are only indicated symbolically. The drawn width of the data and address buses is proportional to their number of bits. The two oversized 2K × 8 SRAMs for the 64 16-b values were used only because discrete 64 × 16 b memories are non-standard, and consequently far more expensive than these larger memory chips. By dividing the system clock by 2²¹ or 2²², the user can switch T at any moment between 1.05 s and 2.10 s (2²¹/F ≈ 1.05 s and 2²²/F ≈ 2.10 s).
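The clock division also fixes the timing per image column. A minimal sketch, assuming the N = 64 columns receive equal time slices and one sound sample is produced every M = 64 clock cycles (one cycle per oscillator); the function name is hypothetical.

```python
F_HZ = 2_000_000  # system clock F
M = N = 64        # oscillators (rows) and image columns

def conversion_timing(exponent: int, f_hz: int = F_HZ, m: int = M, n: int = N):
    """Return (T in seconds, sound samples per column) for a division by 2**exponent.

    Assumes equal time slices per column and one sound sample per M clock cycles.
    """
    cycles = 2**exponent               # clock cycles per complete image conversion
    t_s = cycles / f_hz                # conversion time T
    samples_per_column = cycles // n // m
    return t_s, samples_per_column

for exponent in (21, 22):
    t_s, samples = conversion_timing(exponent)
    print(f"2**{exponent}: T = {t_s:.4f} s, {1000 * t_s / N:.2f} ms and {samples} samples per column")
# 2**21: T = 1.0486 s, 16.38 ms and 512 samples per column
# 2**22: T = 2.0972 s, 32.77 ms and 1024 samples per column
```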

The width of the data and address buses has to be selected very carefully. For example, truncating digital phase values to fewer bits leads to a kind of frequency noise. The bandwidth of this noise should be much smaller than the frequency differences between neighbouring oscillators. In the present design this condition is met with a frequency noise below F/2¹⁸ ≈ 7.6 Hz, which results from the division of F by M = 64 = 2⁶ to obtain the sound sample frequency, and an additional division by 2¹² corresponding to the truncated 12-b phase values used to address the sine values over one full period. The truncation to 12 b can cause at most a phase step error of 1/2¹² period from one sound sample to the next. The oscillator frequencies are freely programmable integer multiples of F/2²² ≈ 0.48 Hz, because the phase values are updated and stored as 16-b values before being truncated to 12 b. This ensures the assumed orthogonality of the oscillators, and provides a very fine scale for selecting and programming a particular set of frequency distributions.
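The two frequency scales follow directly from the bus widths. The sketch below computes them and shows how a target frequency would map to a 16-b phase step; the helper names are hypothetical, and rounding to the nearest step is an assumed programming rule.

```python
F_HZ = 2_000_000  # system clock F
M = 64            # oscillators; one sound sample per M clock cycles
PHASE_BITS = 16   # stored phase width
SINE_BITS = 12    # truncated phase width

fs = F_HZ / M                          # sound sample rate: 31.25 kHz
resolution_hz = fs / 2**PHASE_BITS     # frequency grid: F / 2**22
noise_bound_hz = fs / 2**SINE_BITS     # truncation noise bound: F / 2**18

def phase_step_for(freq_hz: float) -> int:
    """16-b phase increment giving the nearest programmable frequency."""
    return round(freq_hz / resolution_hz)

def realized_frequency(step: int) -> float:
    """Oscillator frequency produced by a given 16-b phase increment."""
    return step * resolution_hz

print(f"{resolution_hz:.3f} Hz grid, noise < {noise_bound_hz:.2f} Hz")
# 0.477 Hz grid, noise < 7.63 Hz
```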

For experimentation, an observation monitor is presently an integral part of the experimental setup. It gives the sighted experimenter or teacher visual feedback on the field of view and on the quality of the camera image in terms of contrast and definition.


Peter B.L. Meijer, ``An Experimental System for Auditory Image Representations,'' IEEE Transactions on Biomedical Engineering, Vol. 39, No. 2, pp. 112-121, Feb 1992.