I got interested in seeing what the spectral distributions of everyday sounds look like. So I installed an app on my phone (Smart Recorder) and started recording them. The most interesting result (so far) comes from the simplest sound I have recorded: a bottle of milk being filled at my kitchen tap. I present the audio, the associated spectrogram and the theoretical analysis in this post. All the work is performed in Python, from reading the data to plotting.

# Audio recording

Here is the audio:

As it sounds, it is just a bottle being filled with water. At the beginning (\(t\) < 1 second) there is nothing, until I open the tap. After about 32 seconds, the bottle is full and water overflows into the sink. There is a constant component linked to the impact of the water on the bottom of the bottle/water column. Besides that, a distinct and interesting tone that changes in time can be heard. This sound is a resonance of the air column, with a closed end (which is actually the water surface) and an open end:

The frequency increases as the wavelength \(\lambda\) decreases, approximately linearly in time until around 20 seconds. After that, the increase is no longer constant, due to the non-linear change in the space available for the air inside the bottle, which originates from the reduction of the diameter with the height.

# Spectrogram

In order to follow the evolution of the spectral content of the signal, a spectrogram sounds like a very good option. The idea is to calculate the Fourier transform for consecutive frames of the signal, thereby obtaining a time distribution of spectra. In the end, you get a 2D result, that is, a spectrum for each time (based on the center of the frame).
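The idea of "an FFT per frame" can be sketched in a few lines. This is an illustration only (`scipy.signal.spectrogram`, used below, also applies a window function and proper scaling):

```python
import numpy as np

def simple_spectrogram(x, fs, nperseg, noverlap):
    """Naive spectrogram: one FFT magnitude spectrum per overlapping frame."""
    step = nperseg - noverlap
    n_frames = 1 + (len(x) - nperseg) // step
    freq = np.fft.rfftfreq(nperseg, d=1/fs)
    # each spectrum is associated with the center time of its frame
    time = (np.arange(n_frames)*step + nperseg/2) / fs
    Sxx = np.empty((len(freq), n_frames))
    for i in range(n_frames):
        frame = x[i*step : i*step + nperseg]
        Sxx[:, i] = np.abs(np.fft.rfft(frame))**2
    return freq, time, Sxx
```

Feeding it a pure sine produces a single horizontal ridge at the sine's frequency, which is exactly the kind of trace we are looking for in the bottle recording.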

The first step is reading the audio file (.wav) generated by the app. I used the function available in scipy (`scipy.io.wavfile`):

```python
print('Reading audio file: ' + audio_filename)
sampling_frequency, data = scipy.io.wavfile.read(audio_folder + audio_filename)
```

Rather than coding my own spectrogram function, the calculation is done using the dedicated function in the `signal` module, also from scipy (`signal.spectrogram`):

```python
# properties for the spectrogram calculation
nperseg = 8192                    # segment size for the spectrogram (nfft)
noverlap = np.int16(nperseg*0.9)  # number of elements in the overlap

# selecting data corresponding to the time interval
# calculation is not performed for the full time
ns, = np.shape(data)
t = np.linspace(0, ns, ns)*(1/sampling_frequency)
mask = (time_start <= t) & (t <= time_end)
data = data[mask]

# calculating the spectrogram
print('Calculating spectrogram')
freq, time_spec, Sxx = signal.spectrogram(
    data, sampling_frequency,
    nperseg=nperseg, noverlap=noverlap)
print('Done')
```

The parameters used for the calculation (segment size and overlap) were chosen to give a nice visualization of the spectral evolution. A segment that is too big won't follow the evolution of the spectra correctly, while a frame that is too small results in poor spectral resolution. A frame size of 8192 points represents a time duration of 0.186 seconds and a frequency resolution of 5.38 Hz. With an overlap of 90%, the available resolution in time is 0.0186 seconds. Take a look at the result:
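These trade-offs follow directly from the parameters; here is a quick check (the sampling rate is my assumption: a 0.186 s frame of 8192 points implies roughly 44.1 kHz, a typical recording rate):

```python
# time/frequency trade-off for the chosen spectrogram parameters
sampling_frequency = 44100  # assumed recording rate [Hz]
nperseg = 8192
noverlap = int(nperseg*0.9)

frame_duration = nperseg/sampling_frequency           # frame length [s]
frequency_resolution = sampling_frequency/nperseg     # spectral bin width [Hz]
time_step = (nperseg - noverlap)/sampling_frequency   # time between spectra [s]

print('{:.3f} s | {:.2f} Hz | {:.4f} s'.format(
    frame_duration, frequency_resolution, time_step))
```

Doubling `nperseg` halves the bin width but doubles the frame duration, which is exactly the compromise described above.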

Since the microphone was not calibrated, I don't know its sensitivity, and most importantly, I have no idea what happens inside the operating system and the recording app in terms of weightings and corrections, so it is impossible to give absolute values for the sound pressure levels. Nevertheless, the analysis of the relative energy levels at each frequency remains very fascinating.

There are 4 highly energetic traces: a fundamental frequency and its harmonics. We must recall that for an air column with one open end, only the odd harmonics are present (only the first 2 are illustrated in the diagram):

Those frequencies grow in time in a more or less parabolic manner. The spectrogram is as clear as the audio in showing that after 20 seconds the increase is more pronounced. To make things even more interesting, one can present the sound and the spectrogram simultaneously, so I merged the two in a video where we can follow the time evolution with the help of a vertical red bar walking along the time scale, synced with the audio:

The video is made in Python using matplotlib. The `animation` module allows the creation of videos; it was also used in this previous post. To get a nice-looking animation, several dpi and bit rate values were tested, 800 and 800 being the selected values.

For those parameters, rendering time started getting really long, so in order to speed things up a little, I parallelized the code in a simple way. My strategy was dividing the frames between the processors: rather than asynchronously passing data to ffmpeg, which I believe would be a very complex mission, I just passed the task of creating one part of the video to each processor, using the `multiprocessing` package. The only small issue is that you cannot pass matplotlib objects between processes, so the plot is initiated in each process (inside the function `generate_images(args)`, presented next) rather than in the main code (my original idea), by calling the function `spectrogram_plot`. The bar is added and its position is updated for each frame:

```python
def generate_images(args):
    # plotting the base image
    data, frames, fs, fps, folder, duration = args
    video_dpi = 800
    frequency_min = 20    # minimal frequency in plot [Hz]
    frequency_max = 6000  # maximal frequency in plot [Hz]
    FFMpegWriter = animation.writers['ffmpeg']
    writer = FFMpegWriter(fps=fps, codec='mpeg4', bitrate=800)
    fig, ax = spectrogram_plot(data, duration, fs)
    # adding the line representing the current time
    line, = ax.plot([0, 0], [frequency_min, frequency_max],
                    color='r', linestyle='-', linewidth=0.8)
    video_filename = '{:}temp_vid{:08d}.mp4'.format(folder, frames[0])
    print('Starting writing video: ' + video_filename)
    n_frames = frames[-1] - frames[0] + 1
    # creating intermediary video files
    with writer.saving(fig, video_filename, dpi=video_dpi):
        i_frame = 0
        for frame in frames:
            t = frame/fps
            if i_frame % fps == 0:
                print('{:} ({:06}/{:06})'.format(
                    video_filename, i_frame, n_frames))
            # updating the line location
            line.set_xdata([t, t])
            writer.grab_frame()  # adding frame
            i_frame += 1
    print('End writing video: ' + video_filename)
    # returning the name of the generated file
    return video_filename
```
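The frame-splitting step can be sketched as follows (the function name `split_frames` is mine, not from the scripts; the argument layout in the commented dispatch is taken from `generate_images` above). Chunks must stay contiguous so each partial video covers one uninterrupted stretch of the timeline:

```python
import numpy as np

def split_frames(n_frames, n_procs):
    """Divide frame indices into contiguous chunks, one per process."""
    return [c for c in np.array_split(np.arange(n_frames), n_procs) if c.size]

# the dispatch then looks roughly like:
# from multiprocessing import Pool
# args = [(data, chunk, fs, fps, folder, duration)
#         for chunk in split_frames(n_frames, 5)]
# with Pool(5) as pool:
#     video_files = pool.map(generate_images, args)
```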

After they are all done, the partial videos are merged through a call to ffmpeg directly from Python (using the `subprocess` module). The text file (list_files.txt) with the names of the individual videos is also generated in Python and used as an input for `ffmpeg`. After the videos are merged into one silent spectrogram, the audio is embedded:
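Generating the list file is a one-liner per video; a minimal sketch (the helper name is mine), using the `file '<name>'` format expected by ffmpeg's concat demuxer:

```python
# write the list of partial videos in ffmpeg concat demuxer format
def write_concat_list(video_filenames, list_filename='list_files.txt'):
    with open(list_filename, 'w') as f:
        for name in video_filenames:
            f.write("file '{:}'\n".format(name))

write_concat_list(['temp_vid00000000.mp4', 'temp_vid00000150.mp4'])
```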

```python
print('Joining videos')
subprocess.call(['ffmpeg','-y','-safe','0','-f',
                 'concat','-i','list_files.txt',
                 '-c','copy',silent_folder])

# joining audio and video files
# audio file is cut to respect the selected video duration
print('Merging audio + video')
# output properties
subprocess.call(['ffmpeg','-y',
                 '-i',silent_folder,
                 '-ss','0:00:{:02d}'.format(time_start),
                 '-t','{:d}'.format(video_duration),
                 '-i','{:}'.format(audio_folder + audio_filename),
                 '-crf','15',
                 '-level','5.1',
                 '-shortest',
                 '-metadata','title="bottle spectrogram"',
                 final_folder])
print('End')
```

Actually, what matplotlib does for the animation is wrap ffmpeg. I followed the same principle, just doing it myself. With 5 processors, the complete process takes about 10 minutes on my PC (half of the time with only one). It's far from perfect scalability, but at least I gained 10 minutes.

# Theoretical frequencies

It's not hard to calculate the resonance frequencies in this case. Considering a tube closed at one end and open at the other, with its odd harmonics:

\[ f_n = \frac{n c}{4 L} \]

where \(n\) is the harmonic number (1, 3, 5, …), \(c\) is the speed of sound in air, and \(L\) is the air column height. As the bottle is filled and \(L\) is reduced, the wavelength (\(L = \lambda/4\) for the fundamental) is reduced too, and thus the frequency increases, as noted in the audio and the spectrogram.
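As a quick numerical sanity check of the formula (taking \(c\) = 340 m/s = 34000 cm/s, and an arbitrary 20 cm air column):

```python
c = 34000.0  # speed of sound in air [cm/s]

def resonance_frequency(L, n=1):
    """Resonance frequency [Hz] of a closed/open air column of height L [cm];
    n is the harmonic number (odd: 1, 3, 5, ...)."""
    return n*c/(4.0*L)

# a 20 cm air column: fundamental plus the next three odd harmonics
frequencies = [resonance_frequency(20.0, n) for n in (1, 3, 5, 7)]
```

This gives 425 Hz for the fundamental, with the odd harmonics at 3, 5 and 7 times that value, the same odd-harmonic pattern visible as the 4 traces in the spectrogram.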

The first step in identifying the theoretical frequencies is to define the shape of the bottle, so that the evolution of the equivalent air column height is known at each time. This was actually the hardest part of the work, since I didn't have any proper tool to do it. My solution was cutting the bottle (actually an identical twin of the one used in the recording), tracing the contour of one of the halves on paper and using a ruler + 2 set squares to get the external radius at different heights.

For calculation purposes, I separated the bottle into three parts, based on the change of diameter: the base, of constant diameter (\(z\) from 0 to 14.5 cm, measured from the bottom); the top, the region with the constriction (14.5 < \(z\) < 21.5 cm); and the neck, with a more or less constant diameter of 2.0 cm (\(z\) from 21.5 to 23.5 cm, the top of the bottle). The thickness of the bottle is roughly approximated as 0.5 mm at the body and top and 1.5 mm at the neck.

For each part of the bottle, the radius is approximated by a 2nd degree polynomial in the relative height \(\Delta z\) (the distance from the beginning of the current part):

\[ r(\Delta z) = a\,\Delta z^2 + b\,\Delta z + c \]

The coefficients are obtained using a least squares fit; the starting point is imposed by fixing the constant term \(c\), with my homemade function `least_squares_fit(x,y,y0=[])` in bottle_shape.py. Even if only \(C^0\) continuity is imposed, the transitions are not aggressive singularities. Globally, the result follows the trend of the measured points well, as you can see next:
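The constrained fit can be sketched with numpy (this is my own minimal version, not the `least_squares_fit` from bottle_shape.py): fixing the constant term to the known starting radius pins the curve to the beginning of each bottle part, and only \(a\) and \(b\) are solved for.

```python
import numpy as np

def fit_radius(dz, r, r0):
    """Least-squares fit of r(dz) = a*dz**2 + b*dz + r0, with r(0) = r0 imposed."""
    A = np.column_stack([dz**2, dz])
    (a, b), *_ = np.linalg.lstsq(A, r - r0, rcond=None)
    return a, b, r0

# synthetic points standing in for the measured radii
dz = np.linspace(0.0, 5.0, 10)
a, b, c = fit_radius(dz, 0.1*dz**2 - 0.3*dz + 4.0, 4.0)
```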

The volume can be defined as the integral of the cross-section area (in this case, \(\pi\) times the square of the radius) from the bottom (\(\Delta z = 0\)) to the top (\(\Delta z = H\)):

\[ V = \int_0^H \pi \, r(\Delta z)^2 \, \mathrm{d}\Delta z \]

Since, for each part of the bottle, the radius is approximated as a 2nd degree polynomial, an analytical solution for the volume is possible. The calculation is not that hard (mostly dealing with exponents), but very long. I was smart enough not to try to do this by hand, and went directly to Maxima, where only 1 line of code is necessary:

```maxima
integrate(pi*(a*x^2 + b*x + c)^2, x, 0, H)
```

The answer obtained is:

\[ V = \pi \left( \frac{a^2 H^5}{5} + \frac{a b H^4}{2} + \frac{(b^2 + 2 a c) H^3}{3} + b c H^2 + c^2 H \right) \]
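Coding this closed form and cross-checking it against a numerical integral is a cheap way to catch transcription mistakes; the coefficients below are made up for the check, not the fitted bottle values:

```python
import numpy as np

def volume_analytic(a, b, c, H):
    """Volume of revolution of r(x) = a*x**2 + b*x + c between 0 and H."""
    return np.pi*(a**2*H**5/5 + a*b*H**4/2
                  + (b**2 + 2*a*c)*H**3/3 + b*c*H**2 + c**2*H)

# fine trapezoidal rule as an independent numerical check
a, b, c, H = 0.02, -0.1, 4.0, 14.5
x = np.linspace(0.0, H, 100001)
f = np.pi*(a*x**2 + b*x + c)**2
v_num = np.sum(0.5*(f[1:] + f[:-1])*np.diff(x))
```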

Another tricky step was coding the calculation of the volume, since each one of the 3 parts has a different equation. I tried for some time to avoid having a million `if` statements, but was defeated by my urge to post this as soon as possible. My solution was having a sum of contributions up to the selected height, subtracting the excess counted in the volume calculation of the already filled parts. See the code:

```python
vol = vol_base(z) + \
      (z > 14.5)*(vol_curve_1(z-14.5) - vol_base(z-14.5)) + \
      (z > 21.5)*(vol_curve_2(z-21.5)
                  - (vol_curve_1(z-14.5) - vol_curve_1(21.5-14.5))) + \
      (z > 23.5)*(-(vol_curve_2(z-21.5) - vol_curve_2(23.5-21.5)))
```

Before we can move on, three hypotheses are made: 1) the bottle is considered completely rigid and perfectly vertical, 2) the volume flow rate is constant, 3) the speed of sound in air is taken as 340 m/s. They are quite reasonable, since there was no notable change in temperature or in my apartment's infrastructure during the 30 seconds of the recording. To define the flow rate, the sound recording was analyzed in Audacity, so the start and end of the filling process could be identified: the start at approximately 1.145 seconds and the end at 34.022 seconds. Based on the total bottle volume of 1071.05 \(\mathrm{cm}^3\), naturally bigger than the volume of milk that is sold inside it, the volume flow rate is 32.58 \(\mathrm{cm}^3/\mathrm{s}\).
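The flow rate is just the total volume over the filling interval read off Audacity:

```python
# flow rate from the timestamps identified in Audacity
total_volume = 1071.05          # bottle volume [cm^3]
t_start, t_end = 1.145, 34.022  # start/end of the filling process [s]

flow_rate = total_volume/(t_end - t_start)  # [cm^3/s]
print('{:.2f} cm^3/s'.format(flow_rate))
```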

It is not the volume we are looking for, though, but the air/water column height. In order to obtain it, the inverse function must be calculated. This is done numerically with Newton's method, as implemented in scipy (`scipy.optimize.newton`). The difference between the target volume and the volume at height \(z\) is the function to drive to zero:

```python
def height_in_bottle(volume, z0=0):
    """ Inverse function to get the height from the water volume """
    # if not given, the initial estimate is the height for a cylinder
    z0 += (z0 == 0)*volume/(np.pi*radius**2)
    def fun_to_zero(z):
        return volume_in_bottle(z) - volume
    return scipy.optimize.newton(fun_to_zero, z0,
                                 tol=1e-5, maxiter=100)
```

So, for each time, the volume is calculated using the constant volume rate. From the water volume, the water column (and thus air column) height is obtained from the regression laws for the shape of the bottle. Using the tube resonance frequency equation, the air resonance frequencies are then known at each time. The values are superimposed on the original spectrogram, and I get a very nice result:

The outcome is quite impressive considering the poor precision of the measurement of the bottle. The differences at the very beginning of the time scale are associated with the details on the bottom of the bottle, which give it structural strength but are completely ignored in my model of the shape. Also, accounting for the actual position of the nodes at the open end of the tube (the end correction) could improve the results.

All the scripts generated for this analysis are available here. You are more than welcome to download, modify and share!