How do I input a set of images from a directory into Python to use as a training set?
I've been able to load datasets from URLs and use them as training/testing sets; however, I want to extend this to images. Basically, if I have 150 images of cats, how would I load them and use them for classification?
Current code, which loads the Iris dataset from a URL:
    import pandas
    from pandas.plotting import scatter_matrix
    import matplotlib.pyplot as plt
    from sklearn import model_selection
    from sklearn.metrics import classification_report
    from sklearn.metrics import confusion_matrix
    from sklearn.metrics import accuracy_score
    from sklearn.neighbors import KNeighborsClassifier

    url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
    names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
    dataset = pandas.read_csv(url, names=names)

    print(dataset.shape)
    print(dataset.head(20))
    print(dataset.loc)
    print(dataset.describe())
    print(dataset.loc)
    plt.show()
    dataset.hist()
    plt.show()
    scatter_matrix(dataset)
    plt.show()

    # split into features (first four columns) and class labels
    array = dataset.values
    X = array[:,0:4]
    Y = array[:,4]
    validation_size = 0.20
    seed = 7
    X_train, X_validation, Y_train, Y_validation = model_selection.train_test_split(
        X, Y, test_size=validation_size, random_state=seed)

    seed = 7
    scoring = 'accuracy'
    models = []
    models.append(('KNN', KNeighborsClassifier()))

    # evaluate each model in turn
    results = []
    names = []
    for name, model in models:
        # recent scikit-learn versions require shuffle=True when random_state is set
        kfold = model_selection.KFold(n_splits=10, shuffle=True, random_state=seed)
        cv_results = model_selection.cross_val_score(model, X_train, Y_train, cv=kfold, scoring=scoring)
        results.append(cv_results)
        names.append(name)
        msg = "%s: %f (%f)" % (name, cv_results.mean(), cv_results.std())
        print(msg)

    fig = plt.figure()
    fig.suptitle('Algorithm Comparison')
    ax = fig.add_subplot(111)
    plt.boxplot(results)
    ax.set_xticklabels(names)
    plt.show()

    # fit on the training split and report results on the validation split
    knn = KNeighborsClassifier()
    knn.fit(X_train, Y_train)
    predictions = knn.predict(X_validation)
    print(accuracy_score(Y_validation, predictions))
    print(confusion_matrix(Y_validation, predictions))
    print(classification_report(Y_validation, predictions))
You can use your library of choice to read images with sequential filenames:
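    from skimage import io  # imread lives in the skimage.io submodule

    # image-000.jpg, image-001.jpg, ..., image-149.jpg
    filenames = ['image-%03d.jpg' % n for n in range(150)]
    images = []
    for f in filenames:
        im = io.imread(f)  # each image is read as a NumPy array
        images.append(im)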
`images` is a list of images.
You can also iterate through any sort of filenames, or pull only files with a certain extension from a directory using the `os` module. The principle is the same: just construct the list of filenames and read each file in turn.
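As a rough illustration of the `os` approach, here is a minimal sketch that collects image paths from a directory by extension. The function name `list_image_files` and the default extension tuple are assumptions made for this example, not part of any library.

```python
import os

def list_image_files(directory, extensions=(".jpg", ".jpeg", ".png")):
    """Return the sorted paths of files in `directory` with a matching extension.

    `list_image_files` is a hypothetical helper written for this example.
    """
    paths = []
    for name in os.listdir(directory):
        # Compare extensions case-insensitively so 'CAT.JPG' is also picked up.
        ext = os.path.splitext(name)[1].lower()
        full_path = os.path.join(directory, name)
        if ext in extensions and os.path.isfile(full_path):
            paths.append(full_path)
    # Sort so the order is reproducible across runs and platforms.
    return sorted(paths)
```

The returned list can then be used in place of `filenames` in the reading loop.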
However, I recommend using `pims`, possibly with a processing pipeline:
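    import pims
    import numpy as np

    # lazily load every file matching the glob pattern
    images = pims.ImageSequence('images-*.jpg')

    @pims.pipeline
    def grayarr(im):
        # keep only the first colour channel as a 2-D array
        return np.array(im)[:,:,0]

    images = grayarr(images)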
At this point you can index into `images` with numpy-like slicing.
`pims` is especially helpful when you're dealing with so many images that you can't hold them all in RAM. You can read about these things in the pims documentation.
You could use `glob` to collect the image files from a directory:
    from PIL import Image
    import glob

    list_of_images = []
    # assuming you are dealing with .jpg files
    for filename in glob.glob('file_directory/*.jpg'):
        im = Image.open(filename)
        list_of_images.append(im)
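Whichever loader you pick, a scikit-learn estimator such as the `KNeighborsClassifier` in the question expects a 2-D feature matrix with one row per sample. One simple option, sketched below, is to flatten each image into a row of pixel values. The helper `images_to_features`, the synthetic 32x32 images, and the all-zero labels are assumptions for illustration only; real images would need to be resized to a common shape first, and a classifier needs at least two classes (for example cats and non-cats) to learn anything.

```python
import numpy as np

def images_to_features(images):
    """Flatten equally sized images into the rows of a 2-D feature matrix.

    `images_to_features` is a hypothetical helper written for this example.
    """
    arrays = [np.asarray(im, dtype=np.float32) for im in images]
    first_shape = arrays[0].shape
    if any(a.shape != first_shape for a in arrays):
        raise ValueError("all images must share one shape; resize them first")
    # Each row is one image; scale 8-bit pixel values to the range [0, 1].
    return np.stack([a.ravel() for a in arrays]) / 255.0

# Synthetic stand-ins for loaded images: 150 RGB images of 32x32 pixels.
rng = np.random.default_rng(0)
images = [rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8) for _ in range(150)]

X = images_to_features(images)
y = np.zeros(len(images), dtype=int)  # hypothetical labels: 0 = "cat"
print(X.shape)  # (150, 3072)
```

`X` and `y` can then be passed to `train_test_split` and `fit` in the same way as the Iris arrays in the question.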