Machine Learning - Pitch Detection

Input - continuous monophonic audio signal sampled at 44.1 kHz, 16-bit
ANN Architecture - Use a recurrent neural network (RNN) with LSTM or GRU cells, trained using Adam.
Layer size (or should it simply be the number of parameters?) - large enough to capture the longest waveform. The Nyquist condition can be ignored here, as it is already accounted for by the sample rate. A filter layer is probably needed.
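Use a ReLU output so there is a nice linear output for positive values. A softmax would instead produce a probability distribution over discrete pitch classes, not a continuous value.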
If each cell (4 weights) acts as a filter, we could initialise the forget weights to pick out frequencies rather than leaving them random.
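The architecture notes above could be sketched in TensorFlow roughly as follows. This is only a minimal sketch under assumptions: the function name `build_pitch_model`, the default of 22 units, the single LSTM layer, and the mean-squared-error loss are illustrative choices, not settled parts of the design. The filter layer and the custom forget-weight initialisation are left out.

```python
import tensorflow as tf


def build_pitch_model(units: int = 22) -> tf.keras.Model:
    """Hypothetical helper: one LSTM layer followed by a single ReLU output."""
    model = tf.keras.Sequential([
        # Variable-length mono audio: (time steps, 1 channel).
        tf.keras.Input(shape=(None, 1)),
        # Recurrent layer; swap in tf.keras.layers.GRU to try GRU cells.
        tf.keras.layers.LSTM(units),
        # One non-negative continuous value per input sequence.
        tf.keras.layers.Dense(1, activation="relu"),
    ])
    # Adam as stated in the notes; the MSE loss is an assumption.
    model.compile(optimizer="adam", loss="mse")
    return model
```

Calling `build_pitch_model()(tf.zeros((2, 400, 1)))` should return a tensor of shape `(2, 1)`, one value per sequence in the batch.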

$\frac{1}{4} \cdot \frac{44100}{27.5} \approx 400.9$

$\frac{1}{4} \cdot 88 = 22$
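The two sizing figures can be reproduced with a few lines of Python. Two readings here are assumptions rather than anything the notes state: that the factor of 1/4 comes from the 4 weights per cell, and that 88 is the number of notes on a standard piano keyboard. All constant names are illustrative.

```python
SAMPLE_RATE_HZ = 44_100   # input sample rate
LOWEST_FREQ_HZ = 27.5     # A0, the longest waveform to capture
WEIGHTS_PER_CELL = 4      # assumed origin of the 1/4 factor
NOTE_COUNT = 88           # assumed: keys on a standard piano

# Samples needed to hold one full period of the lowest note.
samples_per_period = SAMPLE_RATE_HZ / LOWEST_FREQ_HZ

# Cells needed if each cell contributes 4 weights.
cells_for_longest_wave = samples_per_period / WEIGHTS_PER_CELL
cells_for_note_count = NOTE_COUNT / WEIGHTS_PER_CELL

print(round(samples_per_period, 1))      # 1603.6
print(round(cells_for_longest_wave, 1))  # 400.9
print(cells_for_note_count)              # 22.0
```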

Training Data - computer-generated audio at various frequencies and phase shifts, possibly including harmonics and different waveforms. Range: A0 (27.5 Hz) to B8 (7902.13 Hz).
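A generator for this kind of synthetic data might look like the sketch below, using only the standard library. The helper names (`note_frequency`, `make_example`, `random_example`), the 2048-sample window, and the use of twelve-tone equal temperament to step from A0 to B8 are assumptions made for illustration.

```python
import math
import random

SAMPLE_RATE_HZ = 44_100
A0_HZ = 27.5
NUM_NOTES = 99  # semitone steps A0..B8 inclusive, assuming equal temperament


def note_frequency(semitones_above_a0: int) -> float:
    """Equal-tempered frequency of the note n semitones above A0."""
    return A0_HZ * 2 ** (semitones_above_a0 / 12)


def make_example(freq_hz, n_samples=2048, phase=0.0, harmonics=(1.0,)):
    """Sum of sine partials; harmonics[k] is the amplitude of partial k+1."""
    nyquist = SAMPLE_RATE_HZ / 2
    # Drop partials at or above Nyquist so they cannot alias.
    partials = [(k + 1, amp) for k, amp in enumerate(harmonics)
                if (k + 1) * freq_hz < nyquist]
    scale = sum(abs(amp) for _, amp in partials) or 1.0
    samples = []
    for i in range(n_samples):
        t = i / SAMPLE_RATE_HZ
        # The same phase offset is added to every partial.
        value = sum(amp * math.sin(2 * math.pi * mult * freq_hz * t + phase)
                    for mult, amp in partials)
        samples.append(value / scale)  # keeps the signal within [-1, 1]
    return samples


def random_example(rng: random.Random):
    """Return (samples, frequency) for a random note and random phase."""
    freq_hz = note_frequency(rng.randrange(NUM_NOTES))
    phase = rng.uniform(0.0, 2 * math.pi)
    return make_example(freq_hz, phase=phase, harmonics=(1.0, 0.5, 0.25)), freq_hz
```

For example, `random_example(random.Random(0))` returns a list of 2048 samples and the frequency used as its label. Other waveforms (square, sawtooth) could be approximated by changing the `harmonics` amplitudes.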
Output - continuous value proportional to the frequency of the input signal
Test data set - tuning fork, guitar strings, violin strings

Technologies used

Python
TensorFlow
