Self-Training for The vOICe


 A training manual for The vOICe is under development, and the current version is online at The vOICe Training Manual web page. It can also be downloaded as a zipped Microsoft Word file (manual.zip) with linked MP3 sound files.

Technically, The vOICe sensory substitution and synthetic vision approach provides access to any visual information through an auditory display. A theoretical possibility is that it can not only be used for practical purposes in various visual tasks, but that it may - through education and extensive immersive use with conscious and subconscious visual processing - also lead to vivid and truly visual sensations, a "visual awakening", by exploiting the neural plasticity of the human brain. However, very little is known about the prospects, and learning to see requires much effort on the part of the blind user, possibly comparable to mastering a foreign language, and without guarantees of worthwhile results. Here we will consider several approaches to self-education, given that blindness institutes and organizations are in general not yet prepared to teach seeing with sound using The vOICe (for Windows). In addition to the built-in exercise mode and fully immersive use with a wearable setup with a head-mounted camera, we will also introduce a complementary novel self-training paradigm for starting to learn to see with sound without any hardware.

Just as with learning a foreign language or mastering a musical instrument, mastering seeing-with-sound can at times appear to progress slowly, causing some frustration. To alleviate this, blind users are welcome to share insights and experiences in The vOICe User Group.




In general, the advice is basically to "play" while paying attention to what you perceive, much like a child exploring new things and places. Be curious, and try to notice patterns that relate what you do, and what you know is there, to what you hear in the image sounds.


Mapping principles

It is essential that you as a blind user first obtain a thorough understanding of the principles of The vOICe seeing-with-sound mapping, which consists of three simple rules, each dealing with one fundamental aspect of (black-and-white) vision: rule 1 concerns left and right, rule 2 concerns up and down, and rule 3 concerns dark and light. The actual rules are:

  1. Left and Right.

    Video is sounded in a left to right scanning order, by default at a rate of one image snapshot per second. You will hear the stereo sound pan from left to right correspondingly. Hearing some sound on your left or right thus means having a corresponding visual pattern on your left or right, respectively.

  2. Up and Down.

    During every scan, pitch means elevation: the higher the pitch, the higher the position of the visual pattern. Consequently, if the pitch goes up or down, you have a rising or falling visual pattern, respectively.

  3. Dark and Light.

    Loudness means brightness: the louder the brighter. Consequently, silence means black, and a loud sound means white, and anything in between is a shade of grey.

In other words, The vOICe scans every image from left to right, while associating height with pitch and brightness with loudness. Further examples can be found in the introductory mini-tutorial.
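For readers who like to experiment, the three rules can be sketched in a few lines of code. The following Python sketch is not The vOICe's actual implementation; the image resolution, frequency range and sample rate are illustrative assumptions only.

```python
import math

def soundscape(image, scan_time=1.0, f_lo=500.0, f_hi=5000.0, rate=16000):
    """One left-to-right scan of a greyscale image (rows of 0.0..1.0,
    row 0 = top): column position maps to time within the scan (rule 1),
    row maps to pitch (rule 2), and pixel brightness maps to loudness (rule 3)."""
    rows, cols = len(image), len(image[0])
    # Exponential pitch scale: the top row gets the highest frequency.
    freqs = [f_lo * (f_hi / f_lo) ** ((rows - 1 - r) / (rows - 1)) for r in range(rows)]
    n = int(scan_time * rate)
    samples = [0.0] * n
    for i in range(n):
        t = i / rate
        c = min(int(cols * t / scan_time), cols - 1)  # which column is sounding now
        for r in range(rows):
            if image[r][c] > 0.0:  # silence means black
                samples[i] += image[r][c] * math.sin(2 * math.pi * freqs[r] * t)
    return samples

# A single bright pixel in the top-left corner of an otherwise black 4x4 view
# produces a high-pitched beep at the very start of the scan, then silence.
view = [[0.0] * 4 for _ in range(4)]
view[0][0] = 1.0
audio = soundscape(view)
```

Real soundscapes of course use a far higher resolution than this 4x4 toy view, and add stereo panning for rule 1, but the column-by-column structure is the same.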


Built-in exercise mode (no camera needed)

One training option is to make use of The vOICe for Windows's built-in exercise mode under function key F11. It allows you to hear the soundscapes of randomly placed rectangles, circles, triangles and line segments. No camera is needed here, and by default you will hear an image containing two bright filled rectangles placed at random positions. Although it does not replace immersive use, it is particularly convenient for users who do not have a suitable camera yet, or who want to train without wearing an immersive setup. The exercise mode can be applied while lying in bed before going to sleep, or to make some good use of any sleepless periods during the night. Thus you build a basic level of fluency and understanding through visualization of the soundscapes of randomly generated simple views. The generated patterns can be used for exercising your conscious mental analysis: what shapes (objects) do you hear at what positions? For simple patterns like this, vision training is possible through self-study without tactile or sighted reference feedback on correctness: your brain will build a self-consistent map, while visualization should help activate higher-order multimodal association cortices in your brain and help recruit visual areas for visual processing.

Note: Neuroscience research indicates that visual cortex excitability increases during visual mental imagery, thereby possibly contributing to the likelihood of obtaining visual percepts. See for instance the publication "Visual cortex excitability increases during visual mental imagery: a TMS study in healthy human subjects," by Roland Sparing, Felix Mottaghy, Giorgio Ganis, William Thompson, Rudolf Töpper, Stephen Kosslyn and Alvaro Pascual-Leone, in Brain Research, Vol. 938, No. 1-2, 2002, pp. 92-97.

Beware that meaningless auditory input is much less likely to activate more than the auditory areas in your brain, so you should not listen to the sounds as if they were a meaningless jumble, but keep your attention focused on interpreting the sounds in terms of views with shapes. This takes effort and persistent active involvement! However, with more complex visual patterns a practical problem often remains that you have no independent feedback to know with certainty whether your mental interpretation of the view is correct. Also beware that normal human beings, blind or sighted, generally have a very poor memory for more or less random patterns, so you should not expect to be able to mentally grasp the positions of more than a few shapes or objects at a time from these soundscapes.

Still, the placement of shapes of random sizes at occasionally overlapping random positions is meant to help the brain build an orderly 2D response map with overlapping receptive fields, by analogy to what is known to work with even unsupervised learning in artificial neural networks (e.g., with Kohonen self-organizing maps), while some degree of supervised learning applies here as well through your own rational interpretation of the soundscapes. In the exercise mode, you can activate manual pattern updating by pressing the space bar. The arrow keys will then move a shape, and the plus and minus keys on the numeric keypad will select the shape to move.
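The self-organizing map analogy can be made concrete with a toy example. The following Python sketch trains a tiny one-dimensional Kohonen map on random stimulus positions; the map size, learning schedule and neighborhood width are arbitrary illustrative choices, not anything taken from The vOICe itself.

```python
import math
import random

random.seed(1)

# A tiny 1-D Kohonen self-organizing map: 8 units learn, unsupervised, to
# cover the interval [0, 1] from nothing but randomly placed stimuli, with
# neighboring units developing neighboring (overlapping) receptive fields.
units = [random.random() for _ in range(8)]
steps = 2000
for step in range(steps):
    x = random.random()                                          # random stimulus position
    w = min(range(len(units)), key=lambda i: abs(units[i] - x))  # winning unit
    lr = 0.5 * (1.0 - step / steps)                              # decaying learning rate
    sigma = 2.0 * (1.0 - step / steps) + 0.1                     # shrinking neighborhood
    for i in range(len(units)):
        h = math.exp(-((i - w) ** 2) / (2.0 * sigma ** 2))       # neighborhood function
        units[i] += lr * h * (x - units[i])
# After training, the units have spread out to cover the stimulus range.
```

The randomly placed exercise-mode shapes play the role of the random stimuli here, and the brain's 2D response map the role of the (here merely one-dimensional) unit array.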

Mindfold mask

Note: Anyone who has at least light perception may consider using a good light-blocking blindfold for total darkness, such as the commercial Mindfold sleep shades (sleep mask). This eye mask has foam conveniently cut out for the eyes, such that you can keep your eyes open while working with The vOICe without interference or distraction from even residual natural eyesight. Moreover, there are indications that complete visual deprivation (light deprivation or "dark exposure") of only hours already causes a significant increase in neural plasticity in the brain.

Note 2: A trick that can boost soundscape visualization for some users is to blink your eyes as fast as you can (e.g., five times per second!). Blinking suppresses normal visual input from the eyes, but it is also known to activate visual brain areas when blinking in the dark! See the publication "Blinking suppresses the neural response to unchanging retinal stimulation" by Davina Bristow, John-Dylan Haynes, Richard Sylvester, Christopher D. Frith and Geraint Rees, in Current Biology, Vol. 15, 2005, pp. 1296–1300: "In contrast to the suppression of activity during retinal stimulation by blinks in both retinotopic V3 and parietal and prefrontal cortices, we also observed, in the LGN and early visual areas V1–V3, a positive signal associated with blinking in the absence of retinal stimulation". Of course fast blinking is tiring and may quickly give you dry eyes, so only apply this intermittently and for brief periods of time as a separate exercise mode to limit the inconveniences.

By default, two filled rectangles of varying brightness are randomly positioned, with the pattern changing either automatically every three soundscape scans, or manually after the space bar has been pressed. Other settings can be configured via the menu Edit | Exercise Preferences | Randomly Placed Shapes.

Further usage notes: with the default two bright rectangles of the exercise mode, you get two sound bursts, with duration depending on their width, pitch depending on their height and elevation, and loudness depending on brightness. Every time you press the space bar, you get a new random size and position for two filled rectangles. Subsequently, you can move the currently selected shape around by using your arrow keys, and you can use the plus and minus keys on your numeric keypad to select another shape to move around. The name of the next selected shape is spoken to you, although with default settings the shapes are both rectangles and you will only note which one is selected from the changes that you hear when you move the shape around with your arrow keys. The up and down arrow keys give corresponding pitch changes as the selected shape moves up and down, while the left and right arrow keys change the timing and stereo position of the shape's sound as this changes the horizontal position of the shape relative to any other shapes in the soundscape. This description may sound more complicated than it really is, so just try it and you will soon find out.

As stated before, instead of two filled rectangles, you can select many other combinations of shapes via the menu Edit | Exercise Preferences | Randomly Placed Shapes. You will need your screen reader to navigate this dialog. You can set the number of rectangles, circles, triangles and line segments; by default you have 2 rectangles, 0 circles, 0 triangles and 0 line segments. If you are new to vision, hearing two shapes in one soundscape scan can still be a bit confusing, and it may then be advisable to first go back to sounding only one shape, and move that shape around with your arrow keys. Try this for various shapes of choice until you feel comfortable in understanding what you hear in the soundscapes. Subsequently change settings to get two shapes in a soundscape scan, and try telling which is which.
You can also make use of your echoic memory (auditory sensory memory) to mentally "replay" and further analyze any soundscape that you just heard: the time span of echoic memory is generally estimated to be on the order of three or four seconds - long enough to hold a soundscape from The vOICe.
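A rough idea of what the exercise mode generates can also be sketched in code. The following Python sketch mimics the default setting in spirit only; the grid size, brightness range and size limits are made-up assumptions, and the function name is hypothetical.

```python
import random

def random_shapes_view(width=64, height=64, n_rects=2, seed=None):
    """A black view (pixel values 0.0..1.0) with n_rects bright filled
    rectangles of random size and brightness at random, possibly
    overlapping, positions - in the spirit of the exercise mode default."""
    rng = random.Random(seed)
    img = [[0.0] * width for _ in range(height)]
    for _ in range(n_rects):
        w, h = rng.randint(4, width // 3), rng.randint(4, height // 3)
        x, y = rng.randint(0, width - w), rng.randint(0, height - h)
        b = rng.uniform(0.5, 1.0)  # bright, but of varying brightness
        for r in range(y, y + h):
            for c in range(x, x + w):
                img[r][c] = max(img[r][c], b)  # overlaps keep the brighter value
    return img

pattern = random_shapes_view(seed=42)  # a fresh pattern per "space bar press"
```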

The free-running exercise mode may also be combined with brainwave entrainment. The brainwave entrainment option is based on the idea that cross-modal and multisensory perception takes place when different areas in the human brain synchronize their natural oscillations for efficient cooperation and information transfer. This goes along with measurable brainwave patterns, for instance in the theta and gamma range. What one would really hope to achieve through use of The vOICe is to induce the brain activity patterns needed for conscious and truly visual perception through sound. Although this possibility is currently still completely hypothetical, the brainwave entrainment option aims to facilitate the synchronization of brain areas by additionally providing, through binaural beats and other sound components, the brain frequencies that would arise naturally when auditory and visual brain areas are functionally linked. A fundamental hypothesis is also that there exist certain thresholds that brain activity needs to cross before a conscious visual percept occurs, and an important goal is to somehow cross these thresholds for conscious visual perception, such that seeing with sound becomes a truly visual experience, much like forms of artificial synesthesia. The brainwave entrainment is meant to facilitate this by moving brain activity closer to those thresholds, but on its own it may not be quite enough for most people to really cross the thresholds without a still lengthy adaptation process. One key idea is therefore to include further methods, such that a combination of approaches may finally and literally "add up" to move above the thresholds involved in conscious visual perception, preferably also with a minimum of training time and effort.
When viewing the brain as a collection of modules that either inhibit or reinforce each other's activities, reinforcement seems more likely when the tasks at hand require cooperation, such as through brainwave synchronization. This is what we will try to further exploit in the following.
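To illustrate the binaural beat principle only (The vOICe's actual entrainment sounds are more elaborate), here is a minimal Python sketch: each ear receives a plain tone, and the slight frequency offset between the ears is what the auditory system perceives as a beat at the difference frequency. The carrier frequency, beat frequency and sample rate are arbitrary assumptions.

```python
import math

def binaural_beat(carrier=220.0, beat=40.0, seconds=2.0, rate=8000):
    """Left ear gets the carrier tone, right ear the carrier plus the beat
    frequency (here 40 Hz, in the gamma range; use ~6 Hz for theta).
    Neither channel contains the beat by itself - it arises in the brain
    when the two monaural signals are combined."""
    n = int(seconds * rate)
    left = [math.sin(2 * math.pi * carrier * i / rate) for i in range(n)]
    right = [math.sin(2 * math.pi * (carrier + beat) * i / rate) for i in range(n)]
    return left, right

left, right = binaural_beat()
```

Note that this effect requires stereo headphones: played over loudspeakers the two tones mix in the air instead of in the auditory system.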

One important purpose of vision is to allow for follow-up action, such as picking up an object that you see. However, in terms of brain activation, we might in part turn things around and point at things that we somehow know (or imagine) are there, such that the motor areas in the brain will generate signals that are received by the visual areas as being consistent with a visual percept, even if it isn't a conscious visual percept - yet. Thus we may make better use of the exercise mode by including physical action. Apart from using the brainwave entrainment feature while safely lying in bed, you can also stand up or sit on a chair (with eyes closed or in the dark if you are sighted or have low vision), and try to point at shapes that you see/hear in the soundscapes. You may observe that this helps in immediately perceiving a greater visual realism, and this without needing a fully wearable setup or even a camera: you can do this while tethered by stereo headphones to a nearby desktop PC. Now assuming that you have the exercise mode set to two rectangles and a line, and the brainwave entrainment option active, for instance in one of its preset modes, you can experiment with gesturing what you see/hear. The exercise mode should typically be running such that it repeats each random view three times (the default setting). After the first soundscape scan you can mentally pick a shape - or two shapes if you are getting good at it - to gesture during the second and third repeat scan, before a new random view is generated. Try to physically point your finger at any small items that show up in the soundscapes with randomly placed shapes, use both hands to indicate the apparent width or height of any of the bigger rectangles that show up, and trace out the slope of a rising or falling line with your finger.
You can trace out shapes "in the air" or add touch by tracing out the shapes with your finger on a flat surface such as a wall in front of you, while ensuring that your finger pointing matches the directions from where you hear the sounds.

As an alternative, or complementary, means of "gesturing" soundscape content you can use your tongue, "drawing" basic parts of soundscape content with the tip of your tongue against your palate (roof of the mouth). It can have a number of advantages over gesturing soundscape content with your hands. For example, gesturing with your hands can quickly become physically tiring, whereas moving your tongue takes less physical effort. Moreover, one would not easily gesture soundscape content in public, if only because of what is considered socially acceptable behavior. Drawing with the tip of your tongue against the palate can be done such that it is virtually invisible to any bystanders. It is also easily done while lying in bed listening to the exercise mode showing, say, bright lines and rectangles. You may notice that the added "hand gesturing" or "tongue drawing" works better than passively listening to the exercise mode soundscapes even while paying attention to what you are seeing/hearing. Somehow the brain seems to "love it when a plan comes together", in this case by creating a richer multi-sensorimotor interaction. Gesturing of soundscape content with hands or tongue may be instrumental to this, and together with the brainwave entrainment option help raise the likelihood of obtaining truly visual sensations along with the objective visual information content encoded in the soundscapes. The underlying general idea is to add together multiple forms of congruent subthreshold crossmodal sensory stimulation (coactivation).

Further multimodal support for visual perception may be added by vocalizing soundscape content. Obviously you can pronounce at most one shape at a time, so simultaneously sounding shapes can at best be vocalized over multiple soundscape scans. With a rising line you can try to pronounce its pitch sweep as "ooiieep", and with a small rectangle at the top left and another one at the bottom right you can try to approximate their soundscape as something like "chee choo", and so on. Note that this approach already has an analogy in the treatment of strokes and brain injuries, where people afflicted with aphasia learn to speak again more quickly if they sing words and phrases (applied in Melodic Intonation Therapy, MIT).

Another built-in exercise mode for practising the interpretation of soundscapes is the game of Tic-Tac-Toe, activated by Shift F11.


Working with image files (no camera needed)

The vOICe for Windows can also work with image files in various popular formats, such as GIF, BMP and JPEG. Simply pressing Control O will pop up an image file requester. This approach can be useful if you have a good resource for images that are relatively simple or described in sufficient detail. One sample set is the zip file named "TimesNewRoman72.zip" which contains images of all letters 'a' through 'z' in both lowercase and uppercase, as well as the digits '0' through '9'. Other sample sets are the zip file named "lines.zip" which contains images of various straight lines and combinations of straight lines, the zip file named "keyboard.zip" which contains photographs of a regular desktop PC keyboard, and the zip file named "fence.zip" which contains images of crossing a street at an angle while approaching a wooden fence on the other side of the road. Short descriptions are in the included ".txt" files. Options such as zoom (function key F4) can be applied, while the arrow keys allow for moving around in a zoomed view. Moreover, you can switch to manual selection of the next or previous image using the Shift right-arrow or Shift left-arrow key combination, or randomize the presentation sequence with Alt R.


Mental imagery of soundscapes (no hardware needed)

With The vOICe's built-in exercise mode, the image import facility and with live camera views, the difficult task is to hear out and interpret the multiple items in the view, often without full feedback to know whether your interpretation is correct and complete. Now an interesting finding in neuroscience is that when people imagine performing a certain task, much the same brain areas are activated as when physically performing that task, and it has in addition been found that training by imagining a task can quite effectively contribute to mastering that task, and to some extent act as a replacement. Here we use that to our advantage for a new brain training paradigm in sensory substitution, specifically for training synthetic vision with The vOICe. The trick is as simple as it is fundamental: rather than trying to figure out what the sound of some view means visually, we turn things around and imagine The vOICe sound that belongs to a view that we mentally define ourselves! Through this visual imagery we can always ensure that our "visual" view is correct and complete, while imagining the corresponding soundscapes is significantly easier than deriving visual content from soundscapes. Our still unproven hypothesis is that the direct correspondence between image and sound, as well as the brain mechanisms for imagining tasks, will make for an effective sensory training tool. It does not require any hardware: no computer, no headphones, no camera, but just a thorough understanding of The vOICe's simple image-to-sound mapping principles. You can apply it anywhere: in bed, while walking, in the shower, you name it. Now let's clarify the approach by means of a number of examples that right away function as training exercises:
  1. A bright horizontal bar high in the view: imagine a steady high-pitched tone lasting the full one-second scan.

  2. A bright line rising from bottom left to top right: imagine a tone sweeping up in pitch while panning from left to right.

  3. A small bright spot at the bottom right: imagine a brief low-pitched beep near the end of the scan, heard on your right.

The above set of examples is of course by no means exhaustive, but it is hoped that you will from now on be able to elaborate on them yourself and think of many situations that you can then try to translate mentally into sound. In principle this can be done by both late-blind and early-blind people, although it will be somewhat helpful if you are already familiar with visual perspective and the appearance of common objects through prior visual experience.


Table-top grasping exercise (with camera glasses)

For this direct grasping exercise it is assumed that you have a setup with USB camera sunglasses or equivalent, for consistent alignment of the camera with your head. The exercise can with minor variations also be done with a handheld webcam or a mobile phone version of The vOICe (see box below for a blind user's account), but the experience will then be less vision-like.

Be seated at a dark wooden table that is emptied of anything visually distracting. Of course you can also put dark cloth over any table top to much the same effect. Get a relatively small but bright object, for instance a yellow rectangular DUPLO brick as discussed here, but it may also be a white pen or something else of similar size that is much brighter than the table top. Make sure that lighting is adequate and that there are no distracting shadows.

Now you drop the object on the table such that it bounces around a bit without dropping off the table, landing at a somewhat arbitrary (random) position, and your task is to grab the object without sweeping your hand over the table. So you need to visually locate the object and then reach out with your arm and grab it with your hand. The most reliable way to do this is to first center the object in your (camera) view, and only then reach out.

In order to do so, you first have to get the object in the camera view if it isn't already, by looking around. The small bright object sounds much like a beep. Once the beep is noticed, you must center it in your camera view, both vertically and horizontally. Then the object is located right ahead, in the direction where your nose is pointing.

For vertical alignment you need to tilt your head up and down until the object beep is at medium pitch (not high and not low). Next, while maintaining this pitch, you turn your head left and right until the beep sounds half a second after the start of each soundscape, that is, in the horizontal middle of the default one-second duration of each soundscape scan. The direction from which the object sound seems to come will then also be straight ahead and not to the left or right. Then you can reach out and grab the object, imagining that it is in the direction where your nose is pointing.
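The centering logic can be written out explicitly. In the following Python sketch the frequency range is an assumption (The vOICe's actual range may differ), but the geometry follows the mapping rules: the beep's time within the one-second scan gives the horizontal offset, and its pitch gives the vertical one.

```python
import math

def offsets(beep_time, beep_freq, scan_time=1.0, f_lo=500.0, f_hi=5000.0):
    """Estimate how far off-center an object is from its beep. Returns
    (horizontal, vertical), each in -0.5..0.5; (0, 0) means straight ahead,
    positive values mean to the right of and above center, respectively."""
    horizontal = beep_time / scan_time - 0.5  # late beep = object on the right
    # Invert the (assumed) exponential pitch scale into a height fraction, 1.0 = top.
    height = math.log(beep_freq / f_lo) / math.log(f_hi / f_lo)
    vertical = height - 0.5                   # high pitch = object above center
    return horizontal, vertical

# A beep half a second into the scan, at the geometric mean of the frequency
# range, is centered both horizontally and vertically:
h, v = offsets(0.5, math.sqrt(500.0 * 5000.0))
```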

In early practice it is normal to be a few centimeters off the mark, but after a few hours of practice you should be able to grab the object spot on most of the time. Practise this exercise until it becomes a fluent action that you can repeat with ease. You can at the same time try to visualize the object that you are looking for, in order to emphasize the visual nature of what you are accomplishing. Try to avoid falling into the old habit of sweeping your hand over the table to locate the object (although a quick correction of a few centimeters is OK). The goal is really to grasp the object spot on, and this is doable through practice. Learning camera-hand coordination can be fast because you can cast and grab for the DUPLO brick every five seconds or so, giving you some 180 trials in just 15 minutes.

Moreover, you will notice that it is fairly easy to tell the orientation of a DUPLO brick from how fast its pitch rises or falls, even though this takes only a split second. It can be much harder to tell whether the brick landed upside down or not (or on its side), but the eight bumps on top of the brick as well as the pattern of ridges on the bottom both give a characteristic sound texture that contributes to the realism of "seeing".

Once you have mastered the above, you can relax the condition of first centering the object in your view, and directly grasp for any off-center object, guided by pitch and the direction from where you hear the object sound. The next stage is that you extend the single object grasping exercise by casting several DUPLO bricks onto the table, and grasp them all without first centering them one by one in your view. Thus you can very efficiently grasp each of multiple objects from just a single visual sound view. It is much like mastering a foreign language, where you first master conscious application of strict rules of grammar, but can later "forget" about the conscious application of these rules once this becomes automatic and fluent, because conscious application would then only slow you down.

Once you can perform the table top grasping exercise with reasonable ease, you can start training for more mobile situations. Starting from a position at several meters distance from the table, center the object in your view, and walk towards it while keeping the object centered in your view. Note that you will need to bend over to prevent the object from vanishing from your view. Finally grasp it when you are close enough (with the object filling a noticeable slice of your view). In order to strive for maximum fluency, think of this exercise as if you were a predator tracking and going after its prey.

Image sequence: DUPLO brick on a dark wooden table, viewed from several meters away down to within grasping range.

The single object grasping exercise may sound elementary, and it is, but it is important to first get the basics right, and it will also build confidence that good use can be made of certain aspects of the often very complex visual sounds. It is also part of many more complex behaviors in mobile situations that can be mastered only after mastering this particular grasping skill. Reaching for a door handle is an immediate practical application of this skill. Directly grasping an object without groping or sweeping your hand is something that you could not have done without "sight".

``I was at my girlfriends house one day. She wanted me to get a clock down off the wall for her so the battery could be changed. Given that I knew that this clock like most of her others was round I decided to try to find it with the mobile phone version of The vOICe. I aimed the camera at the wall and within a few seconds found the clock in question and grabbed it down without first having to search the wall with my hands.''

BS, male blind user of The vOICe MIDlet for Nokia Symbian phones (now obsolete), September 23, 2010.

``While cleaning up a rental I was using The vOICe to find leftover carpet scraps, packaging, etc. that had been left on the floor. My mother and I are renovating a home for rent, and she wanted me to clean up the living room since that was where all of the stuff was put. I had my N86 with me so I ran The vOICe. After getting an idea of the sound of the carpet that was supposed to be there I then went around the room and picked up the scraps to be thrown out. I was rather surprised at how simple it was even without headphones, but I guess like anything else as long as the skill is used it won't fade.''

BS, male blind user of The vOICe MIDlet for Nokia Symbian phones (now obsolete), February 23, 2011.


Fully immersive use (with camera glasses)

It is best to alternate the various kinds of exercises with the real thing, which is to walk around with a wearable setup with for instance a notebook PC in a backpack, stereo headphones and a head-mounted camera. The latter can be a simple home-made setup with a webcam strapped to a hat or helmet of sorts, or a more sophisticated setup with USB camera sunglasses that contain a tiny hidden camera above the nose in the bridge of the glasses. Whatever you do with such a mobile setup, make sure that your personal safety is ensured, because most of the visual information will only be very confusing and distracting at first, and the sounds will inevitably mask the environmental sounds a bit. It is strongly advised to start playing with it in a very familiar and safe environment, such as your home. That also makes it much easier to relate what you see/hear to what you already know is there. Some of the following may be useful to know for congenitally blind people who may be only partly familiar with the various visual concepts.

To some extent, moving around totally blind is a bit like hopping from object to object with perceptual gaps in between - unless the objects themselves emit sounds. This is clearly an oversimplification, but you get the idea. The next object or obstacle often comes into view, by touch or echo, only after losing contact, again in terms of touch or echo, with the former object. With vision, there is a greater continuity in relating to objects because often several objects are in view at the same time, and the next object comes into view before the former object vanishes from the view. So there is a greater amount of perceptual overlap, even if the distance between objects or landmarks is fairly large. That also helps in following a route. Although natural hearing already gives you a sensory surround experience, vision will often provide far more detail about your surroundings, generally enhancing context-awareness. By the way, everything said here about vision also applies to the soundscapes from the camera, because the soundscapes contain the same visual information.

Note: To aid the learning process in active sensing, it is important to be as systematic as possible while exploring even your familiar home environment, which now acts as an engaging training environment. Vision clearly gives you a wider perceptual range than touch, while touch is perfectly familiar to you, so you have to somehow bridge the two types of perceptual input. In case of a moderately distant indoor view with some characteristic sound pattern, such as a rhythm of vertical bars, you would slowly walk towards the source of this visual pattern while continuously reorienting the camera as needed such that the pattern stays within your view. Any rhythm would slow down while you get closer, until at some point you would normally be able to reach out and identify or verify the physical source of the visual pattern by touch. Thus you will learn what was the source of the visual pattern both at close range and at the original larger distance. After a while these patterns should become so familiar to you that you always immediately know in what direction you are looking, and what your approximate position is in your visual environment. Some perceptual recalibration is involved for accurate grasping, also depending on the viewing angle of your camera.

In addition to often having several objects in view that are more or less in the foreground, there is also a visual background. That background is anything distant that is not occluded by the objects in the foreground. It could be a skyline of houses or buildings or whatever there is at a distance. Since distant things appear smaller visually, this background is often highly cluttered with tiny details, and one of the things that takes a lot of brain processing is to figure out what is in the foreground and what is in the background. That is difficult - the sighted have had a lifetime of vision training to do it without apparent effort - but we can say something about what clues the brain gets and uses for that purpose. One of the key things is apparent change. A door at ten meters distance may appear visually the same as the side of a building at a hundred meters. Both can look like shaded rectangles filling part of the view. However, as you make a few strides towards the door, the door will appear to grow, while the building view barely changes. This is because your relative distance to the nearby door changes much more rapidly than the relative distance to the distant building. So although the visual shapes may at some point even be identical, the amount of change as you move tells something about which items are distant and which items are nearby.
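This distance cue is simple arithmetic: apparent (angular) size scales roughly as the inverse of distance. A small Python sketch with the door and building from the example above:

```python
def apparent_growth(distance, stride):
    """Factor by which something's apparent size grows after moving
    `stride` meters straight towards it (size is roughly 1/distance)."""
    return distance / (distance - stride)

door = apparent_growth(10.0, 2.0)       # nearby door: grows by 25%
building = apparent_growth(100.0, 2.0)  # distant building: grows by only ~2%
```

The same two strides thus change the door's apparent size more than ten times as strongly as the building's, which is exactly the clue the brain can exploit.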

The vOICe in IEEE Spectrum, February 2004

It is the parts of the soundscapes that stay constant as you move that belong to the distant background (not counting the overall changes in pitch as you look up or down, or the lateral shifts as you look left or right). So if you pay attention to those non-changing parts of the soundscape view, they will give you a frame of reference for your heading, somewhat like a "visual compass". The background too will change as you move along, but much more slowly.
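The pitch and panning regularities mentioned here follow from the general image-to-sound mapping principle: a left-to-right column scan, with pitch encoding elevation and loudness encoding brightness. The toy Python sketch below illustrates that principle only; the sample rate, frequency range and scan time are illustrative assumptions, not the actual settings of The vOICe for Windows:

```python
import math

def soundscape(image, scan_time=1.0, rate=8000, f_lo=500.0, f_hi=3000.0):
    """Toy image-to-sound mapping in the spirit of The vOICe: columns are
    scanned left to right over scan_time seconds, each row maps to a fixed
    sine frequency (top row = highest pitch), and pixel brightness (0..1)
    sets the loudness of that sine. Real implementations differ in the
    frequency distribution, stereo panning, and windowing."""
    rows, cols = len(image), len(image[0])
    freqs = [f_hi - r * (f_hi - f_lo) / max(rows - 1, 1) for r in range(rows)]
    samples_per_col = int(scan_time * rate / cols)
    out = []
    for c in range(cols):
        for n in range(samples_per_col):
            t = (c * samples_per_col + n) / rate
            s = sum(image[r][c] * math.sin(2 * math.pi * freqs[r] * t)
                    for r in range(rows))
            out.append(s / rows)
    return out

# A single bright pixel in the top-left corner gives a short
# high-pitched tone at the start of the scan, then silence:
img = [[1.0, 0.0],
       [0.0, 0.0]]
audio = soundscape(img, scan_time=0.5)
```

Looking up shifts bright pixels to higher rows and thus raises the pitch, while turning left or right shifts the pattern earlier or later within the scan, which is exactly why the stable, slowly changing parts of the soundscape mark the distant background.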

Note: By going around you learn how objects relate visually to each other, how visual perspective and shading can affect the appearance of one and the same object, and so on. In other words, the mental training task here becomes learning to hear out the key invariants - particularly objects - in a scene despite the various transformations that change the visual appearance of objects through lighting, position and orientation, (partial) occlusion by other objects, parallax and visual perspective in general. The importance of this kind of live interaction with the environment, and the implicit "calibration" or "recalibration" of the body's sensorimotor feedback loop, means that one might even speak of "sensorimotor substitution" rather than the more commonly used term "sensory substitution". Adequate movement as such can be a very important contributing factor to visual perception, although it is likely not the only factor: even while passively looking at views with an unknown underlying structure, pattern recognition and pattern prediction by the human brain may lead to an incremental and mostly statistics-based learning of vision. Still, active exploration and interaction can be very important facilitators in learning through active vision. Some researchers speak of "embodied cognition" to emphasize the importance of this interplay between movement and visual perception and understanding.

Now imagine that you are walking towards a parked car. Do not try this kind of thing without proper safety precautions, such as having a sighted person with you, preferably an orientation and mobility instructor. With the camera pointing at this car, chances are that the car is still distant enough to appear fairly small. So much of the remaining view will be filled with whatever is around the car, including its background. Only when you get close to the car will it seem to grow until it completely fills the soundscape, but if you walk along it, it will simply drop out of view on your left or right while other things come into view, maybe the next parked car. So roughly speaking, apart from simple horizontal and vertical shifts, it is the perceived amount of change as you move that tells you which things must be nearby. Knowing this does not make it easy, far from it, but it is useful to be at least aware of such visual principles while experiencing and learning about live soundscapes. Again, if you go mobile, make sure that whatever you do takes place in a safe environment, because your brain will be overwhelmed with new and confusing input that it cannot quite handle yet, and your normal hearing will also be partly masked by the soundscapes.

If you are new to vision, an analogy with natural hearing can be helpful for grasping these visual concepts of foreground and background. Suppose there is a distant road with traffic on your left and a distant school with the murmur of children playing on your right. As with the visual background, this auditory background won't change much as you make a few strides, although it will turn left as you turn right and vice versa. However, with this constant background continuously present, you may suddenly touch or start hearing the echo of a parked car or wall, so here too it is the nearby things that give the more rapid perceptual changes.

It is the distant background, be it from natural sounds or from the camera's soundscapes, that helps you maintain your heading. The nearby objects are only confusing in that particular respect. Again, to oversimplify: nearby objects are of course important in mobility because they can be obstacles, while distant ones, if perceived, are important too, but rather for orientation, for maintaining your overall heading, and for avoiding veering.

If you know of some metal fence or security door made of regularly spaced vertical bars, that can be a useful visual object for learning about the effects of visual perspective. Let's assume it is a fence. In using The vOICe, such a fence gives a very strong regular rhythm of noise bursts corresponding to the sequence of bars being automatically scanned from left to right. Hard to miss, those kinds of gratings. At a distance the rhythm will be very fast. As you approach the fence, the "optical" rhythm slows down because the apparent visual spacing between the bars increases, until there are typically only a few bars in view when the fence gets within arm's reach. You will also find that when you are close to the fence and look along it, the optical rhythm moves fast for the more distant parts of the fence and slowly for the nearby parts. It is all a consequence of distant things appearing smaller in a visual view.

Desktop PC keyboard, slanted view from the left

Don't have a notebook PC yet? To learn a bit about visual perspective while still "tied" to the desktop PC with a regular webcam, you can point the webcam at your keyboard. The rows of keys give a characteristic rhythm. This rhythm slows down as you move the camera closer to the keyboard, and speeds up as the distance gets larger. This is because visual items appear bigger at close range, such that fewer keys fit in the view. If you look at the keyboard at an angle, you will find that the rhythm speeds up or slows down within one soundscape view, because the more distant parts of the keyboard have a faster rhythm than the closer parts. By experimenting and paying attention to the relation between what you do and what you hear, you will learn about the effects of visual perspective. You can try the same things with a bookshelf holding a row of books, and once mobile you will indeed see/hear the same effects with fences, distant rows of trees or poles or windows, and so on. Practice with various familiar objects that you can touch, such that you can mentally connect what you see/hear in the soundscapes to what you can feel and understand.
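The relation between distance and rhythm can also be estimated numerically. As a rough sketch, assuming a pinhole camera with an illustrative 60 degree horizontal field of view and a one-second scan (both hypothetical example settings), the burst rate of a regular grating is roughly the number of bars that fit in the view divided by the scan time:

```python
import math

def rhythm_hz(spacing_m, distance_m, fov_deg=60.0, scan_time=1.0):
    """Approximate burst rate for a regular grating (fence bars, rows of
    keys) seen head-on: bars visible within the camera's horizontal field
    of view, divided by the scan time. fov_deg and scan_time are assumed
    example settings, not fixed properties of The vOICe."""
    view_width_m = 2 * distance_m * math.tan(math.radians(fov_deg / 2))
    bars_in_view = view_width_m / spacing_m
    return bars_in_view / scan_time

# A fence with bars spaced every 12 cm:
print(round(rhythm_hz(0.12, 10.0)))  # distant: a fast rhythm
print(round(rhythm_hz(0.12, 1.0)))   # within arm's reach: a few bursts
```

In this simple model the burst rate grows linearly with distance, so halving the distance halves the rhythm, which matches the rhythm slowing down as you approach the fence or keyboard.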

In many cases, the special sound rendering of The vOICe lets you readily distinguish soundscapes from natural environmental sounds. The fact that external sounds usually do not make much visual sense, and vice versa, also helps to resolve possible ambiguities. Occasionally you may still get confused by sounds for which it is not immediately apparent whether they are part of the visual sounds or part of the natural environmental sounds. Typically, if much the same sound is heard again after a second (or whatever The vOICe scan time was set to), then it was a soundscape, because a natural environmental sound is very unlikely to repeat itself in a similar way. This re-interpretation may take a second or two, and this extra mental check will resolve most remaining ambiguities at the expense of a slight delay, assuming that the scene is not changing too dynamically. Some room for ambiguity will always remain, but that is no different from normal biological vision.
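This mental repetition check can be mimicked in code: correlate the incoming sound with itself shifted by one scan period, and treat a strong match as evidence of a soundscape. Below is a crude, self-contained Python sketch with illustrative signal and threshold values (the 0.8 threshold and the test signals are made up for the example):

```python
import math

def repeats_at_interval(samples, rate, interval_s, threshold=0.8):
    """Crude test of whether a sound repeats after interval_s seconds,
    much as a listener checks whether a suspect sound recurs after one
    scan period: normalized correlation between the signal and itself
    shifted by one interval. The threshold is an illustrative choice."""
    lag = int(interval_s * rate)
    a, b = samples[:-lag], samples[lag:]
    num = sum(x * y for x, y in zip(a, b))
    den = (sum(x * x for x in a) * sum(y * y for y in b)) ** 0.5
    return den > 0 and num / den > threshold

rate = 1000
beep = [math.sin(2 * math.pi * 100 * n / rate) for n in range(200)]
silence = [0.0] * 800

soundscape_like = (beep + silence) * 2      # same burst every second
one_off = beep + silence + [0.0] * 1000     # burst heard only once

print(repeats_at_interval(soundscape_like, rate, 1.0))  # True
print(repeats_at_interval(one_off, rate, 1.0))          # False
```

A real implementation would of course run on streaming audio; the point here is only that periodicity at the scan interval is a reliable signature of a soundscape, just as the paragraph above describes for the listener's own mental check.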

Opaque clip-on sunglasses attached to camera glasses

Note: for blind people with residual eyesight it is important to train in using The vOICe without relying on eyesight. The same applies to sighted people who want to master The vOICe. It can therefore be very helpful to block natural eyesight, either simply by keeping the eyes closed or, preferably, by putting clip-on sunglasses on camera glasses after making those clip-on sunglasses completely opaque, for instance by covering them with thick black tape. At a local optician one may buy clip-on sunglasses that have not yet been cut to fit a specific pair of glasses, such that a wide field of view is blocked once the clip-on sunglasses are made opaque. The photograph on the right illustrates the resulting glasses setup, which blocks most natural eyesight (all central vision and most peripheral vision) but not the camera lens and its central and peripheral view via The vOICe, using a wide-angle lens in combination with The vOICe's foveal view setting. A convenient side-effect for interested sighted users of The vOICe is that the little bit of remaining peripheral eyesight lets one move around more naturally in a safe (home) environment, without the stress of being an "amateur blind", and really focus attention on the sounds of The vOICe for most of the view while performing visual training tasks using only The vOICe.

Note 2: Also consider applying the fast blinking trick described above, because fast intentional blinking may help with soundscape visualization by activating visual brain areas while blind or blindfolded (or while wearing the opaque clip-on sunglasses).

Once you start experiencing the live soundscapes immersively, you will find that some things can be made sense of even when you do not recognize the individual visual objects, and some rational background knowledge about the rules of visual perspective, and about what to pay attention to, can be helpful in the beginning for making at least a little sense of the typically very high complexity of the camera sounds. Are you ready now to raise the bar and implement your own personalized training programme?

YouTube video clips of training during SBIR phase I evaluation study on The vOICe by MetaModal LLC

Blind man visually picks up objects using The vOICe (PIP version)
Blind man finds his cane using The vOICe
Blind man performs street crossing using The vOICe
Blind man scores a goal using The vOICe
Blind man analyzes abstract house shape using The vOICe (PIP version)
Blind man plays visual tic-tac-toe using The vOICe

Looking around in the desert, nearby cactus

Copyright © 1996 - 2024 Peter B.L. Meijer

Everything should be made as simple as possible, but not simpler

Albert Einstein