LDA in Julia using MultivariateStats
I'm learning classification methods from Brunton & Kutz's book "Data-Driven Science and Engineering", but instead of using only the MATLAB and Python code resources, I'm rewriting the textbook examples in Julia, because it is my main programming language.
I can't figure out why fitting a MulticlassLDA model to the data doesn't work: it throws DimensionMismatch("Inconsistent array sizes."), but as far as I can tell my arrays are passed to the fit function as indicated in the documentation.
This is my code:
using MAT, LinearAlgebra, Statistics, MultivariateStats
# Load data in MATLAB format. Available at http://www.databookuw.com/
dogs = read(matopen("../DATA/dogData_w.mat"),"dog_wave")
cats = read(matopen("../DATA/catData_w.mat"),"cat_wave")
CD = hcat(dogs,cats)
u, s, v = svd(CD .- mean(CD)) # SVD of the mean-centred data
xtrain = vcat(v[1:60,2:2:4],v[81:140,2:2:4]) #training data array, dims 120x2
label = Int.(vcat(ones(60),-ones(60))) # label vector, length 120
xtest = vcat(v[61:80,2:2:4],v[141:160,2:2:4])
classf = fit(MulticlassLDA, 2, xtrain, label)

You have two issues, which can be fixed as follows:
label = [fill(1, 60); fill(2, 60)] # labels must range from 1 to nc (the number of classes)
classf = fit(MulticlassLDA, 2, permutedims(xtrain), label) # observations in xtrain must be stored in columns (not rows)
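If you'd rather keep labels such as -1/+1 elsewhere in your code, one option is to convert them to 1..nc just before calling fit. The helper below, `to_class_indices`, is a hypothetical name used only for this sketch (it is not part of MultivariateStats); it numbers the distinct labels in sorted order, so -1 becomes 1 and +1 becomes 2.

```julia
# Hypothetical helper (not part of MultivariateStats): map arbitrary class
# labels to consecutive integers 1..nc, in sorted order of the distinct labels.
function to_class_indices(labels::AbstractVector)
    classes = sort(unique(labels))                      # distinct labels, sorted
    index = Dict(c => i for (i, c) in enumerate(classes))
    return [index[l] for l in labels], classes          # class indices, plus label for each index
end

# Example: 60 samples labelled +1 followed by 60 samples labelled -1.
label_pm = Int.(vcat(ones(60), -ones(60)))
y, classes = to_class_indices(label_pm)

classes        # [-1, 1]
extrema(y)     # (1, 2)
```

With this ordering the first 60 entries of y are 2 and the last 60 are 1; `classes[k]` recovers the original label for class index k.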
See the comment in https://multivariatestatsjl.readthedocs.io/en/stable/index.html:
All methods implemented in this package adopt the column-major convention of JuliaStats: in a data matrix, each column corresponds to a sample/observation, while each row corresponds to a feature (variable or attribute).
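To see why a 120×2 xtrain trips a size check, here is a small standard-library-only sketch. `check_layout` is a hypothetical helper that mimics, as an assumption, the kind of consistency check that raised the error: with observations in columns, the number of columns of X must equal length(y). It is not the package's actual code, and the random matrix merely stands in for xtrain.

```julia
# Stand-in for xtrain: 120 observations stored as rows, 2 features as columns
# (the row-major layout from the question). Random values, for illustration only.
xtrain_rows = rand(120, 2)
y = [fill(1, 60); fill(2, 60)]

# Column-major convention: one column per observation, one row per feature.
X = permutedims(xtrain_rows)    # 2 × 120

# Hypothetical helper: each column of X must have exactly one label in y.
function check_layout(X::AbstractMatrix, y::AbstractVector)
    d, n = size(X)                # d features, n observations
    n == length(y) || throw(DimensionMismatch("Inconsistent array sizes."))
    return (features = d, observations = n)
end

check_layout(X, y)                # (features = 2, observations = 120)

# With the row-major matrix, size(xtrain_rows, 2) == 2 but length(y) == 120,
# so the same check throws a DimensionMismatch.
```

Note that `permutedims` returns a new 2×120 `Matrix` (a copy), not a lazy view of the original data.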
And here is the explanation of the y argument to fit, from https://multivariatestatsjl.readthedocs.io/en/stable/mclda.html#dataanalysis:
y: the vector of class labels, of length n (the number of observations). Each element of y must be an integer between 1 and nc (the number of classes).
I hope this helps.
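As a rough illustration of why integer labels in 1..nc are convenient, here is a hand-rolled two-class Fisher discriminant on synthetic column-major data, using only the standard library. It is a sketch, not MultivariateStats's implementation: `class_means` and `fisher_direction` are hypothetical names, the data are random, and the labels in y are used directly as column indices into the class-mean matrix, which only works when every label lies in 1:nc.

```julia
using LinearAlgebra, Statistics, Random

# Hypothetical helper: per-class means of column-major data X (d × n) with
# integer labels y in 1:nc. y[j] is used directly as a column index.
function class_means(X::AbstractMatrix, y::AbstractVector{<:Integer}, nc::Integer)
    d, n = size(X)
    length(y) == n || throw(DimensionMismatch("need one label per column of X"))
    all(in(1:nc), y) || throw(ArgumentError("labels must lie in 1:$nc"))
    sums = zeros(d, nc)
    counts = zeros(Int, nc)
    for j in 1:n
        sums[:, y[j]] .+= X[:, j]    # accumulate observation j into its class column
        counts[y[j]] += 1
    end
    return sums ./ counts'           # d × nc; column k is the mean of class k
end

# Hypothetical two-class Fisher discriminant: w = Sw \ (μ₂ - μ₁),
# where Sw is the within-class scatter matrix summed over both classes.
function fisher_direction(X::AbstractMatrix, y::AbstractVector{<:Integer})
    M = class_means(X, y, 2)
    C = X .- M[:, y]                 # centre each observation on its own class mean
    Sw = C * C'                      # d × d within-class scatter
    return Sw \ (M[:, 2] - M[:, 1]), M
end

# Synthetic, well-separated data: 2 features × 120 observations (column-major).
Random.seed!(1)
X = hcat(randn(2, 60) .+ [-3.0, 0.0], randn(2, 60) .+ [3.0, 0.0])
y = [fill(1, 60); fill(2, 60)]

w, M = fisher_direction(X, y)
threshold = dot(w, (M[:, 1] + M[:, 2]) / 2)    # midpoint of the projected class means
predicted = [dot(w, x) > threshold ? 2 : 1 for x in eachcol(X)]
accuracy = mean(predicted .== y)
```

Because Sw is positive definite, class 2's mean always projects above class 1's, so thresholding at the midpoint assigns label 2 to points on that side.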