Applying Data Mining Techniques to Identify Musical Instruments

A case-study project completed as part of the Data Mining unit of the BSc (Hons) Computing degree at Bournemouth University.

Background:
The project started from a training dataset containing numerical features extracted from short audio clips of two musical instruments playing simultaneously. Each instance in the training dataset is annotated with the two instruments that play, e.g. Saxophone and Tuba.

The task was to build a predictive model that recognises an instrument from the same numerical features in a test dataset. The instances in the test dataset were generated from short audio clips of a single instrument playing.

The data:

  • Training dataset – 2000 instances, each annotated with the two instruments playing. Available here – train_dataset
  • Test dataset – 5000 instances for which the model has to determine the instrument that is playing. Available here – test_dataset

The two datasets had different numbers of attributes: 123 in the training dataset and 127 in the test dataset.

Some of the attributes are listed below:

  • BandsCoef1-33, bandsCoefSum – Flatness coefficients
  • MFCC1-13 – MFCC coefficients
  • HamoPk1-28 – Harmonic peaks
  • Prj1-33, prjmin, prjmax, prjsum, prjdis, prjstd – Spectrum projection coefficients
  • SpecCentroid, specSpread, energy, log spectral centroid, log spectral spread, flux, rolloff, zerocrossing – acoustic spectral features
  • LogAttackTime, temporalCentroid – Temporal features

Additionally, the single-instrument (test) dataset contained:

  • Frameid – Frame identifier; each frame is a 40 ms long signal
  • Note – Pitch information
  • Playmethod – One schema of musical instrument classification according to the way they are played
  • Class1, Class2 – Another schema of musical instrument classification, according to Hornbostel-Sachs

The instruments were the following: Accordian, Cello, B-FlatClarinet, DoubleBass, FrenchHorn, Oboe, Piano, SopranoSaxophone, BassSaxophone, BaritoneSaxophone, TenorSaxophone, AltoSaxophone, Synthbass, TenorTrombone, CTrumpet, cTrumpet, B-FlatTrumpet, Tuba, Viola, Violin, AcousticBass, ElectricGuitar, Vibraphone and Marimba.

Modelling tool:
Weka 3 – Data Mining Software in Java

Online Model Checker:
Kaggle

The methodology used for this data mining project was the Cross Industry Standard Process for Data Mining (CRISP-DM). The diagram below shows its six phases, which were executed in sequence to complete the project.

All of the required steps of the Business Understanding phase were completed. Once the Data Understanding phase was also complete, the following was done:

  1. Reducing the number of instruments from 24 to 18.
  2. Registering 4 unknown instruments.
  3. Separating the instruments into 3 main groups with subgroups (figure available below).

During the Data Preparation phase, the steps listed above were executed, and the data was also transferred from the provided .csv files into a SQL database for easier manipulation and preparation. Because every instance in the training dataset has two target labels (the two instruments playing), this is a multi-label classification problem. The training dataset was therefore transformed into multiple single-instrument detector datasets, each with a boolean target (e.g. Oboe / not Oboe). This kind of transformation is called a “problem transformation method”, and more information about it can be found here. All of the methods discussed in the paper were analysed, and the one that transforms the multi-label problem into a set of binary classifiers was chosen.
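To make the transformation concrete, here is a minimal sketch of how two-label rows can be turned into one "X / not X" dataset per instrument (this style is often called binary relevance). It is not the SQL code used in the project: the row shape, the function name `binary_relevance` and the feature values are assumptions made up for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical row shape: (feature vector, (instrument_a, instrument_b)).
Row = Tuple[List[float], Tuple[str, str]]


def binary_relevance(rows: List[Row]) -> Dict[str, List[Tuple[List[float], bool]]]:
    """Build one 'X / not X' dataset per instrument from two-label rows."""
    instruments = sorted({name for _, labels in rows for name in labels})
    detectors = {}
    for instrument in instruments:
        # Every row is kept; only the target changes to a boolean.
        detectors[instrument] = [
            (features, instrument in labels) for features, labels in rows
        ]
    return detectors


# Toy data with made-up feature values, for illustration only.
rows = [
    ([0.1, 0.7], ("Oboe", "Tuba")),
    ([0.4, 0.2], ("Cello", "Oboe")),
    ([0.9, 0.5], ("Tuba", "Viola")),
]
oboe_dataset = binary_relevance(rows)["Oboe"]
print([target for _, target in oboe_dataset])  # [True, True, False]
```

Each detector dataset keeps all of the rows, which is also why the classes become unbalanced: most rows are "not X" for any given instrument.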

During the Modelling phase, it was necessary to step back to the Data Preparation phase multiple times. The approach was to run 5-fold cross-validation tests on the training dataset for each instrument using different algorithms.
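As a rough illustration of what a 5-fold cross-validation split looks like, the sketch below divides 2000 instance indices into five folds and uses each fold once as the test part. In the project the splitting was handled by Weka; the function name `k_fold_indices` and the fixed seed are assumptions, and unlike some tools this simple version does not stratify by class.

```python
import random
from typing import List, Tuple


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal folds
    splits = []
    for i, test in enumerate(folds):
        # Training part: every index that is not in the current test fold.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


splits = k_fold_indices(2000, k=5)
for train, test in splits:
    print(len(train), len(test))  # 1600 400 for each of the five splits
```

Every instance ends up in the test part exactly once, so the averaged score uses the whole training dataset without ever testing on rows the model was trained on.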

The algorithms that were used were:

  1. Logistic Regression (SimpleLogistic in Weka)
  2. Decision Tree (J48 in Weka)
  3. Naive Bayes (NaiveBayes in Weka)

They were selected specifically because they work in different ways. Logistic Regression proved to be the best one for the problem once the multi-label classification had been transformed into binary classification, because it fits a single decision boundary and only has to work out whether something is true or false. The algorithm assigns each instance a weight based on the combination of its attributes and uses it to determine whether the instance is likely to be true or likely to be false. In that way it can separate the true cases from the false cases: the closer the overall weight is to 1, the more probable it is that the instance is true, and the closer it is to 0, the more probable it is that the instance is false. Another advantage I found with this algorithm was that its output can be interpreted as a probability. That was a really useful side effect when cross-evaluating the results collected from all of the instrument detectors, because the probability could be taken into account when deciding which instrument was playing.
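The idea can be sketched in a few lines: a logistic model turns a weighted sum of the attributes into a value between 0 and 1, and the per-instrument detectors can then be compared on that value. The weights below are made up, and picking the detector with the highest probability is an assumed combination rule for illustration, not necessarily the exact rule used in the project.

```python
import math
from typing import Dict, List, Tuple


def logistic_probability(weights: List[float], bias: float, features: List[float]) -> float:
    """Sigmoid of the weighted sum of the features, a value in (0, 1)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def pick_instrument(detectors: Dict[str, Tuple[List[float], float]], features: List[float]) -> str:
    """Assumed rule: the detector reporting the highest probability wins."""
    scores = {
        name: logistic_probability(weights, bias, features)
        for name, (weights, bias) in detectors.items()
    }
    return max(scores, key=scores.get)


# Made-up (weights, bias) pairs for two hypothetical detectors.
detectors = {
    "Oboe": ([2.0, -1.0], 0.0),
    "Tuba": ([-1.5, 0.5], 0.2),
}
sample = [1.0, 0.5]
print(pick_instrument(detectors, sample))  # Oboe
```

With these toy numbers the Oboe detector returns roughly 0.82 and the Tuba detector roughly 0.26, so the sample is labelled Oboe.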

Additionally, because the problem was transformed into binary classification, the problem of imbalanced classes appeared. In many cases there were, for example, 1800 Not Oboe instances to 300 Oboe instances. The problem was tackled by applying the Synthetic Minority Over-sampling Technique (SMOTE).
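For readers who have not met SMOTE before, its core idea is to create new minority-class instances by interpolating between an existing minority instance and one of its nearest minority neighbours. The following is a simplified sketch of that idea using only the standard library, not the implementation used in the project; the function names, the default `k` and the toy points are assumptions.

```python
import random
from typing import List

Vector = List[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority: List[Vector], n_new: int, k: int = 5, seed: int = 0) -> List[Vector]:
    """Create n_new synthetic points by interpolating between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority instances")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself).
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: squared_distance(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class with made-up feature values.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote(minority, n_new=4, k=2)
print(len(new_points))  # 4
```

Because each synthetic point lies on a line between two real minority instances, the new data stays inside the region the minority class already occupies instead of simply duplicating rows.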

The best results the algorithm achieved on the balanced training dataset were:

  • Precision – 0.994 / 1
  • Recall – 0.994 / 1
  • F-Measure – 0.994 / 1
  • ROC Area – 1 / 1
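For reference, the first three measures can be computed from the counts of true positives, false positives and false negatives. The sketch below uses the standard definitions on a small made-up example; it does not reproduce the project's figures, and the function name `precision_recall_f1` is an assumption.

```python
from typing import List, Tuple


def precision_recall_f1(actual: List[bool], predicted: List[bool]) -> Tuple[float, float, float]:
    """Precision, recall and F-measure for the positive class."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up labels: True means "Oboe", False means "not Oboe".
actual = [True, True, True, False, False]
predicted = [True, True, False, True, False]
precision, recall, f1 = precision_recall_f1(actual, predicted)
print(round(precision, 3), round(recall, 3), round(f1, 3))  # 0.667 0.667 0.667
```

Here two of the three predicted Oboe rows are correct (precision) and two of the three real Oboe rows are found (recall), so all three measures come out at about 0.667.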

For the Evaluation phase, all of the collected insight was combined with the results from the prediction models, and the final predictions were evaluated in Kaggle to obtain a prediction score.
At the end of the competition, the prediction model scored 59%, while the best result was 69%.

 
