Monday, July 30, 2018

A sound visualization app that makes deafness irrelevant

There are some very good programmers among the readers – this could be a fun exercise. I just accidentally "chatted" with a deaf guy on Twitter and decided to spend a few more minutes thinking about some high-brow, hi-tech ways to aid deaf people – who may represent, by some counts, up to 5% of the world population.

It's a huge market. It could be interesting to create a helpful app that effectively "returns the hearing" by visualizing as much of the information contained in the sound as possible. The app could run on ordinary phones, but perhaps "Google Glass" or a similar gadget would be the form factor that deaf folks find most helpful.

First, the app should do voice recognition – like what happens when you click the microphone icon at Google.com. Second, it should try to pick up the maximum amount of information about the frequencies represented in the sound and noise that are currently present.
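
For the first step, one doesn't have to start from scratch. Here is a minimal live-captioning sketch in Python, assuming the third-party SpeechRecognition package and its free Google Web Speech backend – both are my illustrative choices, not the only option:

```python
# A minimal live-captioning sketch, assuming the third-party
# SpeechRecognition package and a working microphone.
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.Microphone() as mic:
    recognizer.adjust_for_ambient_noise(mic)        # calibrate to background noise
    audio = recognizer.listen(mic, phrase_time_limit=5)

try:
    # Sends the clip to Google's free Web Speech API and prints the caption
    print(recognizer.recognize_google(audio))
except sr.UnknownValueError:
    print("(unintelligible)")
```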

That information should be quickly Fourier-analyzed – each split second of the sound – and the Fourier components should undergo some additional processing that extracts the information that is maximally helpful for communicating or distinguishing things; that information would then be represented by some image.
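
A minimal sketch of that per-split-second analysis, assuming Python with NumPy and SciPy; the 16 kHz sample rate, the 512-sample window, and the 440 Hz test tone are illustrative assumptions:

```python
import numpy as np
from scipy.signal import stft

fs = 16000                                  # sample rate in Hz (assumed)
t = np.arange(fs) / fs                      # one second of time stamps
x = np.sin(2 * np.pi * 440 * t)             # a 440 Hz test tone

# Short-time Fourier transform: the sound is chopped into ~32 ms windows
# and each window is Fourier-analyzed separately.
freqs, times, Z = stft(x, fs=fs, nperseg=512)
magnitude = np.abs(Z)                       # shape: (frequencies, time frames)

peak_bins = magnitude.argmax(axis=0)        # loudest frequency in each frame
print(freqs[peak_bins][:5])                 # -> roughly 440 Hz in every frame
```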

The image obtained from the sound could contain more than just curves or spectrograms (see the spectrogram of a "song" that decodes into cool images of creatures; graph-like pictures for fun sounds; spectrograms of vowels).
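
Once the analysis exists, rendering such a basic picture takes a few lines. A sketch using Matplotlib's built-in spectrogram plot; the upward-sweeping test tone and the color map are arbitrary choices:

```python
import numpy as np
import matplotlib.pyplot as plt

fs = 16000
t = np.arange(2 * fs) / fs                       # two seconds of audio
chirp = np.sin(2 * np.pi * (200 + 400 * t) * t)  # a tone sweeping upward

# One possible "picture of sound": time on x, frequency on y, color = loudness
plt.specgram(chirp, NFFT=512, Fs=fs, cmap="magma")
plt.xlabel("time [s]")
plt.ylabel("frequency [Hz]")
plt.show()
```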

Colors could be used everywhere to represent the number of frequency peaks, the ratios of the dominant frequencies, and lots of other things. The graphical representation should help the user distinguish the known vowels and consonants, along with other important sounds, and their frequencies. You could approach the problem in a truly sophisticated way – as a machine learning problem. Criteria that maximally distinguish the sounds could be graphed in some way. Also, a detailed stereo analysis of the sources' locations could be performed and influence the pictures.
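
The stereo analysis is quite tractable: the tiny delay between the two microphones ("ears") betrays the direction of the source. A sketch of the standard cross-correlation trick, with a simulated 12-sample delay standing in for a real two-channel recording:

```python
import numpy as np

fs = 48000                                  # sample rate (assumed)
delay = 12                                  # simulated inter-ear delay in samples

rng = np.random.default_rng(0)
source = rng.standard_normal(fs)            # a noise-like source signal
left = source
right = np.roll(source, delay)              # the right mic hears it later

# Cross-correlate the channels; the position of the peak reveals the time lag.
corr = np.correlate(left, right, mode="full")
lag = corr.argmax() - (len(right) - 1)
print(lag, lag / fs)                        # -> -12 samples, i.e. -0.25 ms
```

The lag in seconds, multiplied by the speed of sound and divided by the microphone spacing, gives the sine of the arrival angle – and that angle is what the pictures could encode.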

Music could be reverse engineered into notes – but perhaps in some more "biological" way, so that the programmer who does it doesn't have to hard-code any precise notation. Instead, the users would spontaneously learn to recognize what is contained in the sound at each instant.
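
The first step of such reverse engineering is elementary: each equal-temperament note corresponds to a frequency, so the dominant frequency of each frame (extracted as above) maps to a note. A sketch, assuming the standard A4 = 440 Hz tuning:

```python
import numpy as np

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def nearest_note(freq_hz: float) -> str:
    """Map a frequency to the closest equal-temperament note (A4 = 440 Hz)."""
    midi = round(69 + 12 * np.log2(freq_hz / 440.0))  # MIDI note number
    return NOTE_NAMES[midi % 12] + str(midi // 12 - 1)

print(nearest_note(261.6))   # -> C4, middle C
print(nearest_note(392.0))   # -> G4
```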

If the sound is pictured in a sufficiently comprehensive, readable way, isn't it obvious that a (deaf) human being could learn to "hear" the sound with his eyes as well as others can hear it with their ears? Do you agree with my intuition that a human being gets much more information from the eyes than from the ears? Compare the audio and the visual data inside a video file – the video stream typically consumes an order of magnitude more bits than the audio track. I think it's just a question of how this should be done optimally, not whether it's possible at all.

The visualization could respect the superposition principle to some degree – so that if two people are speaking at the same moment, you can in principle read both of them simultaneously.
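
The Fourier picture respects that principle automatically because the transform is linear: the peaks of two simultaneous sources show up side by side in the spectrum of the mixture. A tiny demonstration, with two pure tones standing in for two voices:

```python
import numpy as np

fs = 16000
t = np.arange(fs) / fs
voice_a = np.sin(2 * np.pi * 220 * t)       # stand-ins for two simultaneous
voice_b = np.sin(2 * np.pi * 330 * t)       # speakers
mixture = voice_a + voice_b                 # sound adds linearly in the air

spectrum = np.abs(np.fft.rfft(mixture))
freqs = np.fft.rfftfreq(len(mixture), d=1 / fs)

# Both components survive in the mixture's spectrum as separate peaks
print(sorted(freqs[spectrum.argsort()[-2:]]))   # -> [220.0, 330.0]
```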
