I am trying to pre-process data for an MLP classifier. I am generating 40 MFCCs and 222 spectral centroids per .wav file.
This is my model evaluation code:
# Display model architecture summary
model.summary()
# Calculate pre-training accuracy
score = model.evaluate(x_test, y_test, verbose=1)
accuracy = 100*score[1]
print("Pre-training accuracy: %.4f%%" % accuracy)
I have combined both of my features into a single list (F) this way:
X = np.array(featuresdf.feature_mfcc.tolist(), dtype=object)
# print(type(X))
X2 = np.array(featuresdf.sc.tolist(), dtype=object)
# print(type(X2))
# print(X2)

F = []
for i, val in enumerate(X):
    temp_x = val
    temp_x2 = X2[i]
    # concat = temp_x + temp_x2
    concat = np.hstack((temp_x, temp_x2))
    F.append(concat)
# y is the dependent variable that we will predict
y = np.array(featuresdf.class_label.tolist())
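For reference, a hedged sketch (my assumption, not from the post) of how featuresdf could have been built with librosa so that each row carries 40 MFCC means plus a 222-frame spectral-centroid vector, and of how F is finally turned into the 2-D array a classifier expects; extract_features and the clip-length assumption are hypothetical:

import numpy as np
import librosa

def extract_features(file_name):
    # Assumption: clips of roughly 5 s at librosa's default sr/hop give ~222 centroid frames.
    audio, sr = librosa.load(file_name, res_type='kaiser_fast')
    mfcc = np.mean(librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=40).T, axis=0)  # shape (40,)
    sc = librosa.feature.spectral_centroid(y=audio, sr=sr)[0]                  # shape (n_frames,)
    return mfcc, sc

# After the loop above, F holds one 262-element vector per file (40 + 222),
# which can be stacked into the 2-D float array the model expects:
X_all = np.array(F, dtype=float)  # shape (n_samples, 262)
print(X_all.shape)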
See also questions close to this topic
-
Python File Tagging System does not retrieve nested dictionaries in dictionary
I am building a file tagging system using Python. The idea is simple. Given a directory of files (and files within subdirectories), I want to filter them out using a filter input and tag those files with a word or a phrase.
If I got the following contents in my current directory:
data/
    budget.xls
    world_building_budget.txt
a.txt
b.exe
hello_world.dat
world_builder.spec
and I execute the following command in the shell:
py -3 tag_tool.py -filter=world -tag="World-Building Tool"
My output will be:
These files were tagged with "World-Building Tool":

data/world_building_budget.txt
hello_world.dat
world_builder.spec
My current output isn't exactly like this but basically, I am converting all files and files within subdirectories into a single dictionary like this:
def fs_tree_to_dict(path_):
    file_token = ''
    for root, dirs, files in os.walk(path_):
        tree = {d: fs_tree_to_dict(os.path.join(root, d)) for d in dirs}
        tree.update({f: file_token for f in files})
        return tree
Right now, my dictionary looks like this:

key: ''

In the following function, I am turning the empty values ('') into empty lists (to hold my tags):

def empty_str_to_list(d):
    for k, v in d.items():
        if v == '':
            d[k] = []
        elif isinstance(v, dict):
            empty_str_to_list(v)
When I run my entire code, this is my output:
hello_world.dat ['World-Building Tool']
world_builder.spec ['World-Building Tool']

But it does not see data/world_building_budget.txt. This is the full dictionary:

{'data': {'world_building_budget.txt': []}, 'a.txt': [], 'hello_world.dat': [], 'b.exe': [], 'world_builder.spec': []}
This is my full code:
import os, argparse

def fs_tree_to_dict(path_):
    file_token = ''
    for root, dirs, files in os.walk(path_):
        tree = {d: fs_tree_to_dict(os.path.join(root, d)) for d in dirs}
        tree.update({f: file_token for f in files})
        return tree

def empty_str_to_list(d):
    for k, v in d.items():
        if v == '':
            d[k] = []
        elif isinstance(v, dict):
            empty_str_to_list(v)

parser = argparse.ArgumentParser(description="Just an example",
                                 formatter_class=argparse.ArgumentDefaultsHelpFormatter)
parser.add_argument("--filter", action="store", help="keyword to filter files")
parser.add_argument("--tag", action="store", help="a tag phrase to attach to a file")
parser.add_argument("--get_tagged", action="store", help="retrieve files matching an existing tag")
args = parser.parse_args()

filter = args.filter
tag = args.tag
get_tagged = args.get_tagged

current_dir = os.getcwd()
files_dict = fs_tree_to_dict(current_dir)
empty_str_to_list(files_dict)

for k, v in files_dict.items():
    if filter in k:
        if v == []:
            v.append(tag)
            print(k, v)
    elif isinstance(v, dict):
        empty_str_to_list(v)
        if get_tagged in v:
            print(k, v)
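For reference, a minimal sketch (not from the post) of how the tagging loop could recurse into nested dictionaries so that files such as data/world_building_budget.txt are also matched; tag_files is a hypothetical helper name:

import os

def tag_files(d, filter_word, tag_phrase, prefix=""):
    # Walk the nested dict: dict values are subdirectories, list values are files.
    for name, value in d.items():
        path = os.path.join(prefix, name)
        if isinstance(value, dict):
            tag_files(value, filter_word, tag_phrase, path)  # recurse into the subdirectory
        elif filter_word in name:
            value.append(tag_phrase)  # attach the tag to this file's list
            print(path, value)

# e.g. tag_files(files_dict, filter, tag)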
-
Actually I am working on a project and it is showing "no module named pip_internal". Please help me with this.
File "C:\Users\pjain\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main return _run_code(code, main_globals, None, File "C:\Users\pjain\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code exec(code, run_globals) File "C:\Users\pjain\AppData\Local\Programs\Python\Python310\Scripts\pip.exe\__main__.py", line 4, in <module> File "C:\Users\pjain\AppData\Local\Programs\Python\Python310\lib\site-packages\pip\_internal\__init__.py", line 4, in <module> from pip_internal.utils import _log
I am using PyCharm with the conda interpreter.
-
Looping the function if the input is not a string
I'm new to Python (first of all). I have a homework assignment to write a function that checks whether an item exists in a dictionary or not.
inventory = {"apple" : 50, "orange" : 50, "pineapple" : 70, "strawberry" : 30} def check_item(): x = input("Enter the fruit's name: ") if not x.isalpha(): print("Error! You need to type the name of the fruit") elif x in inventory: print("Fruit found:", x) print("Inventory available:", inventory[x],"KG") else: print("Fruit not found") check_item()
I want the function to loop again only if the input written is not a string. I've tried to put return under print("Error! You need to type the name of the fruit"), but it didn't work. Help.
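One possible sketch (not from the post, just an illustration) of looping until the input is alphabetic, using a while loop instead of return:

inventory = {"apple": 50, "orange": 50, "pineapple": 70, "strawberry": 30}

def check_item():
    x = input("Enter the fruit's name: ")
    while not x.isalpha():  # keep asking until the input is alphabetic
        print("Error! You need to type the name of the fruit")
        x = input("Enter the fruit's name: ")
    if x in inventory:
        print("Fruit found:", x)
        print("Inventory available:", inventory[x], "KG")
    else:
        print("Fruit not found")

check_item()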
-
How to correctly cast a function to the GPU
I am attempting to get MFCC extraction from audio files using the definition:
def extract_features(file_name):
    try:
        durationSeconds = 4
        audio, sample_rate = librosa.load(file_name, res_type='kaiser_fast')
        trimmed = librosa.util.fix_length(audio, int(sample_rate * durationSeconds))
        #mfccs = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=nmfcc)
        mfccs = librosa.feature.mfcc(y=trimmed, sr=sample_rate, n_mfcc=nmfcc)
        pad_width = max_pad_len - mfccs.shape[1]
        mfccs = np.pad(mfccs, pad_width=((0, 0), (0, pad_width)), mode='constant')
    except Exception as e:
        print("Error encountered while parsing file: ", file_name)
        return None
    return mfccs
However, when I add @jit just before the definition, I get the error:

numba.errors.UnsupportedError: Failed in object mode pipeline (step: analyzing bytecode)
Exception object cannot be stored into variable (e).

File "Binaryclass_SenSpe_CNNSoftmax_SaveFeatExtr.py", line 52:
def extract_features(file_name):
    <source elided>
    except Exception as e:
    ^
Can you tell me if it is possible to cast this definition to the GPU?
Thanks.
-
MFCC classification: how can I solve the dimension error of y in sklearn?
prediction_class = labelencoder.inverse_transform(predicted_label)
prediction_class
ValueError: y should be a 1d array, got an array of shape (1, 10) instead.
Please help me to solve this.
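A hedged sketch of one way this error is typically resolved, reusing predicted_label and labelencoder from the snippet above and assuming the (1, 10) array holds one score per class: LabelEncoder.inverse_transform wants 1-D integer class indices, so take the argmax along the class axis first.

import numpy as np

predicted_indices = np.argmax(predicted_label, axis=1)  # shape (1,): index of the best class
prediction_class = labelencoder.inverse_transform(predicted_indices)
print(prediction_class)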
-
Get the pronunciation correctness of two audio files using MFCC and DTW
First of all, I really don't have an idea about what I am doing with this code. I simply want to compare two .wav files and check the pronunciation correctness. I have searched the internet and found out that this can be done using MFCC and DTW. I got a sample code and it is working fine. But I want to get the distance between the two sounds as a percentage. Can anyone help me with this, please? And how do I read this result? Does 0.0 mean the original file and the testing file are the same, i.e. the lower the number, the better?
import librosa
from dtw import dtw
from numpy.linalg import norm

y1, sr1 = librosa.load('original.wav')
y2, sr2 = librosa.load('testing_file.wav')

mfcc1 = librosa.feature.mfcc(y1, sr1)
mfcc2 = librosa.feature.mfcc(y2, sr2)

dist, cost, acc_cost, path = dtw(mfcc1.T, mfcc2.T, dist=lambda x, y: norm(x - y, ord=1))
print('Normalized distance between the two sounds:', dist)
# Normalized distance between the two sounds: 52367.556983947754
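Regarding reading the result: a DTW distance of 0.0 does mean the two MFCC sequences are identical, and lower is better. A rough heuristic sketch (my assumption, not an established metric) for turning the distance into a bounded percentage is shown below; the scale constant is arbitrary and would need tuning on known good and bad recordings.

import numpy as np

scale = 10000.0  # arbitrary; tune on reference recordings
similarity_pct = 100.0 * np.exp(-dist / scale)  # 100% when dist == 0, decaying toward 0
print("Similarity: %.1f%%" % similarity_pct)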
-
One-Hot Encoding on big dataset
I was wondering what I should do if the dataset has an extremely huge number of entries which are strings. I have to use them in a multilinear regression model. As far as I have searched, I came to the conclusion that I have to do One-Hot Encoding for the string values in the dataset. But there are more than 100K entries. So, won't the resources allocated to the system be used up entirely when using them in a linear regression model, since there are so many values, and is it a good approach to use them like this? Also, is there any other way for me to use string values in a multilinear regression model?
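A hedged sketch (hypothetical column names, not from the post) of the usual way to keep one-hot encoding of 100K+ string entries manageable: scikit-learn's OneHotEncoder emits a SciPy sparse matrix by default, and LinearRegression accepts sparse input directly, so the full dense matrix never has to be materialised.

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.linear_model import LinearRegression

# Hypothetical data standing in for the real dataset: one string column plus a numeric column.
df = pd.DataFrame({"city": ["Pune", "Delhi", "Pune", "Mumbai"],
                   "area": [500, 700, 550, 900],
                   "price": [50, 90, 55, 120]})

pre = ColumnTransformer(
    [("cat", OneHotEncoder(handle_unknown="ignore"), ["city"])],
    remainder="passthrough")  # numeric columns pass through unchanged

X = pre.fit_transform(df[["city", "area"]])  # sparse one-hot block + numeric column
model = LinearRegression().fit(X, df["price"])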
-
Join large set of CSV files where the header is the timestamp for the file
I have a large set of CSV files, approximately 15,000 of them, and I would like to figure out how to join them together into one file for data processing.
Each file follows a simple pattern, with a timestamp that corresponds to the period of time represented by the data in that CSV file.
Ex.
file1.csv
2021-07-23 08:00:00
Unit.Device.No03.ErrorCode;11122233
Unit.Device.No04.ErrorCode;0
Unit.Device.No05.ErrorCode;0
Unit.Device.No11.ErrorCode;0
file2.csv
2021-07-23 08:15:00
Unit.Device.No03.ErrorCode;0
Unit.Device.No04.ErrorCode;44556666
Unit.Device.No05.ErrorCode;0
Unit.Device.No11.ErrorCode;0
Each file starts with the timestamp. I would like to join all the files in a directory, transpose the "Unit.Device" entries to columns, use the original header as a timestamp column, and for each file add a new row with the corresponding "ErrorCode" in each column.
Like this:
Timestamp;Unit.Device.No03.ErrorCode;Unit.Device.No04.ErrorCode;Unit.Device.No05.ErrorCode..
2021-07-23 08:00:00;11122233;0;0;0;0....
2021-07-23 08:15:00;0;44556666;0;0;0....
Any simple tools for this, or Python routines?
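A hedged Python sketch (my assumption about the exact layout: the first line of each file is the timestamp, every following line is "<name>;<error code>", and the files match *.csv in the current directory) of joining them with pandas:

import glob
import pandas as pd

rows = []
for path in glob.glob("*.csv"):
    with open(path) as f:
        lines = [line.strip() for line in f if line.strip()]
    row = {"Timestamp": lines[0]}      # the header line is the timestamp
    for line in lines[1:]:
        name, value = line.split(";")  # e.g. Unit.Device.No03.ErrorCode;11122233
        row[name] = value
    rows.append(row)

df = pd.DataFrame(rows).sort_values("Timestamp")
df.to_csv("joined.csv", sep=";", index=False)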
-
Can different word embedding methods produce the same vocabulary on the same dataset?
I want to ask why my embedding matrices have different dimensions from each other. First, I used word2vec on the IMDB dataset and produced around 17,620 vocabulary words without using any stop words. Secondly, I used GloVe on the same dataset and got around 193,093 vocabulary words, and lastly fastText on the same dataset produced around 222,422 vocabulary words. Is there any trick to make the vocabulary size the same? Thanks for any help.
creating corpus
import nltk
from nltk.tokenize import word_tokenize
nltk.download("punkt")
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

train_size = int(train.shape[0] * 0.9)

def create_corpus_tk(df):
    corpus = []
    for text in df["text"]:
        words = [word.lower() for word in word_tokenize(text)]
        corpus.append(words)
    return corpus

corpus = create_corpus_tk(train[:train_size])
tokenizer code
tokenizer = Tokenizer(num_words=num_words)
tokenizer.fit_on_texts(corpus)
word_index = tokenizer.word_index
print("Number of unique words:", len(word_index))
w2v code
from gensim.models import Word2Vec

model = Word2Vec(corpus, sg=1, min_count=1, window=10, vector_size=100)
print(model)
words = list(model.wv.index_to_key)
print(len(words))
fast text code
from gensim.models import FastText

modelFS = FastText(corpus, sg=1, min_count=1, window=10, vector_size=100)
words = list(modelFS.wv.index_to_key)
print(len(words))
glove code
embedding_dict = {}
with open(os.getcwd() + "\\ready\\glove.twitter.27B.100d.txt", "r") as f:
    for line in f:
        values = line.split()
        word = values[0]
        vectors = np.asarray(values[1:], "float32")
        embedding_dict[word] = vectors
f.close()

num_words = len(word_index)
embedding_matrix = np.zeros((num_words, 100))

for word, i in word_index.items():
    if i < num_words:
        emb_vec = embedding_dict.get(word)
        if emb_vec is not None:
            embedding_matrix[i] = emb_vec
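A hedged sketch (my assumption, reusing word_index and the Word2Vec model from the snippets above) of one way to keep the vocabulary identical across methods: build it once from the Keras tokenizer and only look words up in whichever trained model is being used, exactly as the GloVe block above already does.

import numpy as np

num_words = len(word_index) + 1  # Keras word indices start at 1
embedding_matrix_w2v = np.zeros((num_words, 100))
for word, i in word_index.items():
    if word in model.wv:                        # gensim KeyedVectors membership test
        embedding_matrix_w2v[i] = model.wv[word]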
-
Interpreting coefs_ in a sklearn MLPClassifier()
I am training a sklearn MLPClassifier() on an input dataset with 23 features. But layer 0 in coefs_, i.e. clf.coefs_[0], only has 18 arrays in it. Does this mean 5 of the features have zero weights (i.e. are not used)? If yes, how do I find which features got dropped during training?

In general, suppose I have N input features and K hidden layers of size (S_1, ..., S_k). Is this the correct interpretation of how coefs_ will look?

coefs_[i] can be interpreted as giving the weights for layer i (layer 0 connects the inputs with the first hidden layer). In the above case, i will run from 0 to K (1 input layer + K hidden layers).

coefs_[i][j] gives the weights connecting input j in layer i with each of the inputs in layer i + 1. E.g.
- coefs_[0][0] will have S_1 entries telling how input feature 0 is connected to each of S_1 "features" in hidden layer 1
- coefs_[1][0] will have S_2 entries telling how hidden layer feature 0 is connected to each of S_2 "features" in hidden layer 2
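To illustrate the shapes, a small sketch on hypothetical random data (23 features, hidden layers of size 10 and 5): coefs_[0] has one row per input feature seen during fit, i.e. shape (n_features, S_1).

import numpy as np
from sklearn.neural_network import MLPClassifier

X = np.random.rand(100, 23)
y = np.random.randint(0, 2, 100)
clf = MLPClassifier(hidden_layer_sizes=(10, 5), max_iter=50).fit(X, y)

print([w.shape for w in clf.coefs_])  # [(23, 10), (10, 5), (5, 1)] for a binary target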
-
How to use a multilayer neural network to predict next-day energy consumption in R
The objective of this question is to use a multilayer neural network (MLP-NN) to predict the next step-ahead (i.e. next day) electricity consumption for the 11:00 hour case. The first 430 samples will be used as the training data, while the remaining ones will be used as the testing set.
[Image of the energy consumption data; there are 501 samples in total.]
OK, so I have no idea where to start. How do I determine the inputs? I have to use an autoregressive model. Help.
-
Validation accuracy not improving despite an increase in the model accuracy for the ISIC-2019 skin lesion dataset using MLP
I have been working on the ISIC_2019 dataset to classify skin lesions. I seem to have hit a roadblock.
I was able to pre-process the data and build an LBP and an MLP. [Image: MLP layers]
I am able to train a model where I can see that the model is learning from the available dataset. [Image: code snippet for the MLP and accuracy details]
[Image: MLP Model 2] This looks like the model is learning, but the accuracy still remains low.
However the validation accuracy is too low. Could someone provide some feedback as to what I might be doing wrong?
-
Spectral signature of classified image in GEE using Random Forest
I am using this script to draw the average spectral signature of all classes together, and of each class separately, from an image classified with the RF algorithm in GEE.
var bands = ['B1', 'B2', 'B3', 'B4', 'B5', 'B6', 'B7', 'B8', 'B8A', 'B9', 'B11', 'B12', 'NDVI', 'EVI', 'GNDVI', 'NBR', 'NDII'];

var Training_Points = Water.merge(Residential).merge(Agricultural).merge(Arbusti).merge(BoschiMisti).merge(Latifoglie).merge(Conifere).merge(BareSoil);

var classes = ee.Image().byte().paint(Training_Points, "land_class").rename("land_class")

var stratified_points = classes.stratifiedSample({
  numPoints: 50,
  classBand: 'land_class',
  scale: 10,
  region: Training_Points,
  geometries: false,
  tileScale: 6
})
print(stratified_points, 'stratified_points')

// Create training data
var training_Stratified = RF_classified.select(bands).sampleRegions({
  collection: stratified_points,
  properties: ['land_class'],
  scale: 10,
  tileScale: 2
});

var bands = RF_classified.bandNames()
var numBands = bands.length()
var bandsWithClass = bands.add('land_class')
var classIndex = bandsWithClass.indexOf('land_class')

// Use .combine() to get a reducer capable of computing multiple stats on the input
var combinedReducer = ee.Reducer.mean().combine({
  reducer2: ee.Reducer.stdDev(),
  sharedInputs: true})

// Use .repeat() to get a reducer for each band and then use .group() to get stats by class
var repeatedReducer = combinedReducer.repeat(numBands).group(classIndex)

var stratified_points_Stats = training_Stratified.reduceColumns({
  selectors: bands.add('land_class'),
  reducer: repeatedReducer,
})

// Result is a dictionary, we do some post-processing to extract the results
var groups = ee.List(stratified_points_Stats.get('groups'))

var classNames = ee.List(['Water', 'Residential', 'Agricultural', 'Arbusti', 'BoschiMisti', 'Latifoglie', 'Conifere', 'BareSoil'])

var fc = ee.FeatureCollection(groups.map(function(item) {
  // Extract the means
  var values = ee.Dictionary(item).get('mean')
  var groupNumber = ee.Dictionary(item).get('group')
  var properties = ee.Dictionary.fromLists(bands, values)
  var withClass = properties.set('class', classNames.get(groupNumber))
  return ee.Feature(null, withClass)
}))

// Chart spectral signatures of training data
var options = {
  title: 'Average Spectral Signatures',
  hAxis: {title: 'Bands'},
  vAxis: {title: 'Reflectance', viewWindowMode: 'explicit', viewWindow: {max: 6000, min: 0}},
  lineWidth: 1,
  pointSize: 4,
  series: {
    0: {color: '105af0'},
    1: {color: 'dc350a'},
    2: {color: 'caa712'},
    3: {color: 'b9ffa4'},
    4: {color: '369b47'},
    5: {color: '21ff2d'},
    6: {color: '275b25'},
    7: {color: 'f7e084'},
  }};

// Default band names don't sort properly. Instead, we can give a dictionary with labels for each band in the X-Axis
var bandDescriptions = {
  'B2': 'B2/Blue',
  'B3': 'B3/Green',
  'B4': 'B4/Red',
  'B5': 'B5/Red Edge 1',
  'B6': 'B5/Red Edge 2',
  'B7': 'B7/Red Edge 3',
  'B8': 'B8/NIR',
  'B8A': 'B8A/Red Edge 4',
  'B11': 'B11/SWIR-1',
  'B12': 'B12/SWIR-2'
}

// Create the chart and set options.
var chart = ui.Chart.feature.byProperty({
  features: fc,
  xProperties: bandDescriptions,
  seriesProperty: 'class'
})
.setChartType('ScatterChart')
.setOptions(options);
print(chart)

var classChart = function(land_class, label, color) {
  var options = {
    title: 'Spectral Signatures for ' + label + ' Class',
    hAxis: {title: 'Bands'},
    vAxis: {title: 'Reflectance', viewWindowMode: 'explicit', viewWindow: {max: 6000, min: 0}},
    lineWidth: 1,
    pointSize: 4,
  };
  var fc = training_Stratified.filter(ee.Filter.eq('land_class', land_class))
  var chart = ui.Chart.feature.byProperty({
    features: fc,
    xProperties: bandDescriptions,
  })
  .setChartType('ScatterChart')
  .setOptions(options);
  print(chart)
}

classChart(0, 'Water')
classChart(1, 'Residential')
classChart(2, 'Agricultural')
classChart(3, 'Arbusti')
classChart(4, 'BoschiMisti')
classChart(5, 'Latifoglie')
classChart(6, 'Conifere')
classChart(7, 'BareSoil')
I receive the error:
Error generating chart: Image.select: Pattern 'B1' did not match any bands.
I do not understand where the problem is, since I used the same script before to draw a histogram of the training data and it worked well.
-
OpenAPI Specification - Use of Discriminator and oneOf - Spectral linting
OpenAPI Discriminator using oneOf
A minimal example of using a discriminator with an openApi spec and linting with Spectral.
Error message:
~/git/openapi_discriminator/openapi/v1/api.yaml
  22:23  error  oas3-valid-media-example  "example" property must match exactly one schema in oneOf  paths./discriminatortest.get.responses[200].content.application/json.example
Background
OpenAPI schema with a simple GET method which can return different types of Animal. A subclass of Animal is defined which can either be a Chicken or a Dog. The only property Animals have is legs. A discriminator is used to distinguish between a Chicken and a Dog, where a Chicken has two legs and a Dog has four legs.

Aim
I want to verify that the example in a request response matches only one schema.
Question
I thought using a discriminator might mean that anything with two legs is a Chicken and anything with four legs is a Dog. Am I mistaken, and it is still legitimate for a Dog to have two legs, and this is why it's erroring? I could change it to anyOf, but then the discriminator has no use?

Code
Code repo - openapi_discriminator
openapi_discriminator/openapi/v1/api.yaml:

openapi: "3.0.3"
info:
  title: Open API Discriminator Example
  version: "v1"
tags:
  - name: discriminator
paths:
  /discriminatortest:
    get:
      tags:
        - discriminator
      summary: Example using discriminator
      description: "Demonstrate a minimal example"
      responses:
        "200":
          description: Created
          content:
            application/json:
              schema: {$ref: "schemas.yaml#/components/schemas/Animal"}
              example:
                legs: "two"
openapi_discriminator/openapi/v1/schemas.yaml:

openapi: "3.0.3"
components:
  schemas:
    Animal:
      type: object
      discriminator:
        propertyName: legs
        mapping:
          two: Chicken
          four: Dog
      oneOf:
        - $ref: '#/components/schemas/Dog'
        - $ref: '#/components/schemas/Chicken'
    Chicken:
      type: object
      required:
        - legs
      properties:
        legs:
          type: string
    Dog:
      type: object
      required:
        - legs
      properties:
        legs:
          type: string
openapi_discriminator/openapi/.spectral.yml
extends: spectral:oas
rules:
  info-contact: false
  info-description: false
  oas3-api-servers: false
  openapi-tags: true
  operation-tags: true
  operation-operationId: false
  operation-description: true
Run linting command:
spectral lint "openapi/v1/api.yaml" --ruleset openapi/.spectral.yml