Audio analysis with Python's librosa: a summary

I only ever touched audio analysis in a university class.
These days Python libraries seem to cover most of it, so I tried out the basics.

librosa

A library for audio analysis.
github.com

Installing librosa

pip install librosa

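Loading audio

import librosa

audio_path = "hogehoge"  # placeholder: path to an audio file
y, sr = librosa.load(audio_path)  # by default: mono, resampled to 22050 Hz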
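
If no audio file is at hand, a synthetic tone can stand in for what librosa.load returns (a 1-D float waveform plus a sampling rate). This is only a sketch: the 440 Hz tone and the 2-second length are arbitrary choices of mine, and 22050 Hz simply mirrors librosa's default sampling rate.

```python
import numpy as np

sr = 22050                      # same value as librosa.load's default rate
duration = 2.0                  # seconds (arbitrary, for illustration)
t = np.arange(int(sr * duration)) / sr
y = (0.5 * np.sin(2 * np.pi * 440.0 * t)).astype(np.float32)

# y is a 1-D float waveform; its length in seconds is samples / rate.
print(y.shape, y.dtype, len(y) / sr)
```

A y and sr built this way should be usable with the snippets in the following sections, although a pure tone gives rather dull plots.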

Playing audio

Note: this is run in a Jupyter notebook.

import IPython.display
IPython.display.display(IPython.display.Audio(y, rate=sr))

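STFT (short-time Fourier transform)

import numpy as np
import matplotlib.pyplot as plt
import librosa.display

S = np.abs(librosa.stft(y))  # magnitude spectrogram
fig, ax = plt.subplots(figsize=(12, 4))
img = librosa.display.specshow(librosa.amplitude_to_db(S, ref=np.max), sr=sr, y_axis='log', x_axis='time', ax=ax)
ax.set(title='Log-frequency spectrogram')  # this is a spectrogram, not a chromagram
fig.colorbar(img, ax=ax, format='%+2.0f dB')
fig.tight_layout()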
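
To get a feel for what the STFT call computes, here is a rough NumPy-only sketch: slice the signal into overlapping frames, apply a window, and take the FFT of each frame. The function name simple_stft is my own. The frame size 2048 and hop 512 are, as far as I know, librosa's defaults, but this version skips the edge padding librosa applies and uses np.hanning, so the numbers will not match librosa.stft exactly.

```python
import numpy as np

def simple_stft(y, n_fft=2048, hop_length=512):
    """Magnitude STFT without edge padding; shape (1 + n_fft // 2, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop_length
    frames = np.stack([
        y[i * hop_length : i * hop_length + n_fft] * window
        for i in range(n_frames)
    ])
    # One FFT per frame, then transpose so rows are frequency bins.
    return np.abs(np.fft.rfft(frames, axis=1)).T

sr_demo = 22050
t = np.arange(sr_demo) / sr_demo
tone = np.sin(2 * np.pi * 440.0 * t)        # 1-second test tone (hypothetical input)
S_demo = simple_stft(tone)

# The strongest bin should sit next to 440 Hz; bin k maps to k * sr / n_fft Hz.
peak_bin = int(S_demo.mean(axis=1).argmax())
peak_hz = peak_bin * sr_demo / 2048
print(S_demo.shape, peak_bin, peak_hz)
```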

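Mel scale

A scale that reflects human auditory perception. People distinguish pitches well at low frequencies but poorly at high frequencies, so the mel scale spaces low frequencies finely and high frequencies coarsely.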
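
The compression at high frequencies is easy to see with a conversion formula. The sketch below uses the common HTK-style formula, mel = 2595 * log10(1 + f / 700). Note this is an assumption on my part for illustration: librosa's default conversion uses a slightly different (Slaney-style) variant, so the exact values differ.

```python
import math

def hz_to_mel_htk(f_hz):
    """HTK-style Hz -> mel conversion."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz_htk(m):
    """Inverse of hz_to_mel_htk."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# The same 100 Hz step covers far more mels at low frequencies than at high ones.
low_step = hz_to_mel_htk(200.0) - hz_to_mel_htk(100.0)
high_step = hz_to_mel_htk(8100.0) - hz_to_mel_htk(8000.0)
print(low_step, high_step)
```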

S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)  # recent librosa versions require y as a keyword argument
log_S = librosa.power_to_db(S, ref=np.max)
plt.figure(figsize=(12,4))
librosa.display.specshow(log_S, sr=sr, x_axis='time', y_axis='mel')
plt.title('mel power spectrogram')
plt.colorbar(format='%+02.0f dB')
plt.tight_layout()

Onset detection

An onset is (apparently) the point where a sound, such as a note or a drum hit, begins.

o_env = librosa.onset.onset_strength(y=y, sr=sr)  # y must be passed by keyword in recent librosa versions
times = librosa.times_like(o_env, sr=sr)
onset_frames = librosa.onset.onset_detect(onset_envelope=o_env, sr=sr)
D = np.abs(librosa.stft(y))
fig, ax = plt.subplots(nrows=2, sharex=True)
librosa.display.specshow(librosa.amplitude_to_db(D, ref=np.max),x_axis='time', y_axis='log', ax=ax[0])
ax[0].set(title='Power spectrogram')
ax[0].label_outer()
ax[1].plot(times, o_env, label='Onset strength')
ax[1].vlines(times[onset_frames], 0, o_env.max(), color='r', alpha=0.9,linestyle='--', label='Onsets')
ax[1].legend()
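
For intuition about the onset strength curve, here is a simplified spectral flux sketch in plain NumPy: for every frame, sum how much each frequency bin's magnitude rose since the previous frame, then pick local peaks. As far as I understand, librosa's onset strength is based on the same idea but works on a log-power mel spectrogram and uses a more careful peak picker, so this is not its actual implementation. The function name, the relative threshold, and the two test bursts are my own assumptions.

```python
import numpy as np

def spectral_flux_onsets(y, sr, n_fft=512, hop_length=128, rel_threshold=0.5):
    """Return (flux per frame, onset times in seconds) using plain spectral flux."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop_length
    frames = np.stack([
        y[i * hop_length : i * hop_length + n_fft] * window
        for i in range(n_frames)
    ])
    mag = np.abs(np.fft.rfft(frames, axis=1))          # (n_frames, n_bins)
    # Keep only increases in magnitude between consecutive frames.
    rise = np.maximum(np.diff(mag, axis=0), 0.0).sum(axis=1)
    flux = np.concatenate([[0.0], rise])
    threshold = rel_threshold * flux.max()
    # A frame counts as an onset if it is a local maximum above the threshold.
    peaks = [
        t for t in range(1, n_frames - 1)
        if flux[t] > threshold and flux[t] >= flux[t - 1] and flux[t] >= flux[t + 1]
    ]
    # Report the time of each frame's center.
    times = (np.array(peaks) * hop_length + n_fft / 2) / sr
    return flux, times

# Hypothetical test signal: silence with two decaying 1 kHz bursts.
sr_demo = 8000
sig = np.zeros(2 * sr_demo)
n = np.arange(800)
burst = np.exp(-n / 160.0) * np.sin(2 * np.pi * 1000.0 * n / sr_demo)
true_onsets = [4096, 10240]                             # sample positions
for start in true_onsets:
    sig[start:start + 800] += burst

flux, onset_times = spectral_flux_onsets(sig, sr_demo)
print(onset_times)
```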

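Beat tracking

Automatically estimates the tempo of the audio (in BPM) and returns it together with the frame indices of the detected beats.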

onset_env = librosa.onset.onset_strength(y=y, sr=sr,
                                         aggregate=np.median)
tempo, beats = librosa.beat.beat_track(onset_envelope=onset_env,
                                       sr=sr)
import matplotlib.pyplot as plt
hop_length = 512
fig, ax = plt.subplots(nrows=2, sharex=True)
times = librosa.times_like(onset_env, sr=sr, hop_length=hop_length)
M = librosa.feature.melspectrogram(y=y, sr=sr, hop_length=hop_length)
librosa.display.specshow(librosa.power_to_db(M, ref=np.max),
                         y_axis='mel', x_axis='time', hop_length=hop_length,
                         ax=ax[0])
ax[0].label_outer()
ax[0].set(title='Mel spectrogram')
ax[1].plot(times, librosa.util.normalize(onset_env),
         label='Onset strength')
ax[1].vlines(times[beats], 0, 1, alpha=0.5, color='r',
           linestyle='--', label='Beats')
ax[1].legend()
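
The beats come back as frame indices, not seconds. A small sketch of turning them into times and cross-checking the tempo, assuming a hop length of 512 at 22050 Hz. The beat_frames values are made up for illustration (roughly one beat every 0.5 s), and the BPM computed here from beat spacing is only a sanity check, not how beat_track itself estimates tempo.

```python
import numpy as np

sr_demo = 22050
hop_length = 512
# Hypothetical beat positions in frames, about 0.5 s apart
beat_frames = np.array([0, 22, 43, 65, 86, 108])

# frame index -> seconds: frame * hop_length / sr
beat_times = beat_frames * hop_length / sr_demo
intervals = np.diff(beat_times)              # seconds between consecutive beats
bpm_from_beats = 60.0 / intervals.mean()     # beats per minute
print(beat_times, bpm_from_beats)
```

The result lands near, but not exactly on, 120 BPM, because beat positions are quantized to whole frames.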

MFCC (mel-frequency cepstral coefficients)

mfccs = librosa.feature.mfcc(y=y, sr=sr)  # uses the y and sr loaded earlier; shape is (n_mfcc, n_frames)
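
Roughly speaking, MFCCs are what you get by taking a log mel spectrogram and applying a discrete cosine transform (DCT) along the frequency axis, keeping only the first few coefficients. Below is a NumPy sketch of that last step with an orthonormal DCT-II. The function name, the 40 mel bands, and the 13 coefficients are my own choices for illustration, and the flat input is fake data, so this shows the idea and not librosa's internals.

```python
import numpy as np

def dct2_ortho(x):
    """Orthonormal DCT-II along axis 0 of a (n_bands, n_frames) array."""
    n = x.shape[0]
    k = np.arange(n)[:, None]                 # output coefficient index
    m = np.arange(n)[None, :]                 # input band index
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    basis *= np.sqrt(2.0 / n)
    basis[0] *= np.sqrt(0.5)                  # first row ends up scaled by sqrt(1 / n)
    return basis @ x

n_mels_demo, n_frames_demo, n_mfcc_demo = 40, 3, 13
log_mel = np.full((n_mels_demo, n_frames_demo), 2.0)   # fake, perfectly flat log-mel spectrum
mfcc_demo = dct2_ortho(log_mel)[:n_mfcc_demo]          # keep the first coefficients only

# A flat spectrum has no shape, so only coefficient 0 (overall level) is non-zero.
print(mfcc_demo[:, 0])
```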

HPSS (Harmonic-Percussive Sound Separation)

Separating the harmonic and percussive components is also a single function call.

y, sr = librosa.load(librosa.ex('choice'))
y_harmonic, y_percussive = librosa.effects.hpss(y)

The wealth of bundled examples is a nice touch.
