- Research Article
- Open access
3D-Audio Matting, Postediting, and Rerendering from Field Recordings
EURASIP Journal on Advances in Signal Processing volume 2007, Article number: 047970 (2007)
Abstract
We present a novel approach to real-time spatial rendering of realistic auditory environments and sound sources recorded live, in the field. Using a set of standard microphones distributed throughout a real-world environment, we record the sound field simultaneously from several locations. After spatial calibration, we segment from this set of recordings a number of auditory components, together with their locations. We compare existing time-difference-of-arrival (TDOA) estimation techniques between pairs of widely spaced microphones and introduce a novel, efficient hierarchical localization algorithm. Using the high-level representation thus obtained, we can edit and rerender the acquired auditory scene over a variety of listening setups. In particular, we can move or alter the different sound sources and arbitrarily choose the listening position. We can also composite elements of different scenes together in a spatially consistent way. Our approach provides efficient rendering of complex soundscapes that would be challenging to model using discrete point sources and traditional virtual acoustics techniques. We demonstrate a wide range of possible applications for games, virtual and augmented reality, and audiovisual postproduction.
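The pairwise time-delay estimation step mentioned above can be illustrated with the generalized cross-correlation with phase transform (GCC-PHAT), a standard estimator in the family of Knapp and Carter's generalized correlation method. What follows is a minimal sketch under stated assumptions, not the authors' implementation and not their hierarchical localization algorithm. The function names `fft` and `gcc_phat_tdoa`, the `max_delay_s` parameter, and the pure-Python radix-2 FFT (used only to keep the example dependency-free) are illustrative choices. The estimator whitens the cross-power spectrum of two microphone signals so that only phase information remains, then returns the lag of the largest correlation peak.

```python
import cmath
import math
import random
from typing import List, Optional, Sequence


def fft(x: Sequence[complex], inverse: bool = False) -> List[complex]:
    """Recursive radix-2 FFT (unnormalized). len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    sign = 1.0 if inverse else -1.0
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def gcc_phat_tdoa(
    x1: Sequence[float],
    x2: Sequence[float],
    fs: float,
    max_delay_s: Optional[float] = None,
) -> float:
    """Estimate the delay of x2 relative to x1, in seconds, with GCC-PHAT.

    A positive value means the sound reaches the second microphone later.
    """
    # Zero-pad to a power of two >= len(x1) + len(x2) so the correlation is
    # linear rather than circular.
    n = 1
    while n < len(x1) + len(x2):
        n *= 2
    spec1 = fft([complex(v) for v in x1] + [0j] * (n - len(x1)))
    spec2 = fft([complex(v) for v in x2] + [0j] * (n - len(x2)))

    # Cross-power spectrum with phase-transform (PHAT) weighting: keep only
    # the phase of each bin, zeroing bins with (near) zero energy.
    weighted = []
    for a, b in zip(spec1, spec2):
        c = b * a.conjugate()
        mag = abs(c)
        weighted.append(c / mag if mag > 1e-12 else 0j)

    # Back to the lag domain: index m holds lag m, index n - m holds lag -m.
    corr = [v.real / n for v in fft(weighted, inverse=True)]

    # Optionally restrict the search to physically plausible lags.
    max_lag = n // 2
    if max_delay_s is not None:
        max_lag = min(max_lag, int(max_delay_s * fs))

    best_lag, best_val = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        if corr[lag % n] > best_val:
            best_lag, best_val = lag, corr[lag % n]
    return best_lag / fs


if __name__ == "__main__":
    # Synthetic check: white noise that reaches microphone 2 37 samples later.
    rng = random.Random(0)
    fs = 48000.0
    source = [rng.gauss(0.0, 1.0) for _ in range(1024)]
    mic1 = source
    mic2 = [0.0] * 37 + source
    print(gcc_phat_tdoa(mic1, mic2, fs), 37 / fs)
```

For two microphones spaced d metres apart, a natural bound is `max_delay_s = d / c`, with c ≈ 343 m/s in air, which discards physically impossible lags. For widely spaced microphones this bound is large, so reverberation and competing sources make spurious peaks more likely; a practical system would also replace the recursive FFT here with an optimized one.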
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Gallo, E., Tsingos, N. & Lemaitre, G. 3D-Audio Matting, Postediting, and Rerendering from Field Recordings. EURASIP J. Adv. Signal Process. 2007, 047970 (2007). https://doi.org/10.1155/2007/47970
DOI: https://doi.org/10.1155/2007/47970