How to automoderate ads using machine learning?
There are ads, consisting of a title + description + pictures and some other features. The text is in Russian or Romanian.
There are 3 tasks:
 determine whether the category is correct;
 determine whether the title is correct;
 determine that nothing illegal is being sold.
For the categories task I tried the following: title + body + the text pulled out of the picture using EfficientNet, then TF-IDF to encode the text and an SVC to classify. I got at most 85% accuracy, but in production it drops to 75%. I also tried fastText encodings and transliterating the text with and without removing stopwords, but didn't get a significant improvement.
As for determining whether the title is correct, I'm stuck.
How can I make the algorithm understand the correlation between the title and the description together with the photos? Should I use multiple inputs, or FeatureUnion in scikit-learn?
Please point me in a direction, or tell me what to google or what to use. Thank you.
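For combining the text fields, one scikit-learn direction is a per-field TF-IDF inside a ColumnTransformer (the FeatureUnion idea applied to named columns). A minimal sketch, assuming a hypothetical DataFrame with columns `title`, `description`, `ocr_text` (text pulled from the images), and a `category` label; the rows below are toy data:

```python
# Sketch: one TfidfVectorizer per text field, combined with a
# ColumnTransformer, feeding a linear SVC. Column names are assumptions.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

df = pd.DataFrame({
    "title": ["vand iphone 11", "prodam kvartiru", "vand bicicleta"],
    "description": ["telefon nou", "kvartira v centre", "bicicleta de munte"],
    "ocr_text": ["iphone", "", "mountain bike"],
    "category": ["electronics", "real_estate", "sport"],
})

# Separate vectorizers keep each field's vocabulary apart, so the
# classifier can weight title terms differently from body terms.
features = ColumnTransformer([
    ("title", TfidfVectorizer(), "title"),
    ("description", TfidfVectorizer(), "description"),
    ("ocr", TfidfVectorizer(), "ocr_text"),
])

clf = Pipeline([("features", features), ("svc", LinearSVC())])
clf.fit(df, df["category"])
preds = clf.predict(df)
```

The multiple-input neural network (e.g. one embedding branch per field, merged before the classifier head) is the other direction mentioned in the question; the ColumnTransformer version is usually the quicker baseline to compare against.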
See also questions close to this topic

Python FFT Dominant Frequency
I need to find the dominant frequency in my coefficient-of-lift data. I found the following code on Stack Exchange, but the frequency I am getting is quite large and is not the dominant frequency. I know this because the 2D analysis is easy to check with a graph; it is sinusoidal. By dominant frequency, I mean the frequency that repeats most often.
```python
#!/usr/bin/env python
import sys
import numpy
from numpy import sin
from math import pi

print("Hello World!")
data = numpy.loadtxt('CoefficientLiftData.dat', usecols=(0, 3))
n = data.size
timestep = 0.000005
print(data)

fourier = numpy.fft.fft(data)
frequencies = numpy.fft.fftfreq(n, d=timestep)
positive_frequencies = frequencies[numpy.where(frequencies >= 0)]
magnitudes = abs(fourier[numpy.where(frequencies >= 0)])
peak_frequency = numpy.argmax(magnitudes)
print(peak_frequency)
```
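The printed value is likely large because `numpy.argmax` returns an *index* into the magnitude array, not a frequency, and the FFT is applied to the whole two-column array. A hedged sketch of a fix, factored into a function (the file layout and which column holds the lift are assumptions from the question):

```python
# Sketch: FFT one column only, then map the argmax index back into the
# frequency array to get a value in Hz instead of an array index.
import numpy as np

def dominant_frequency(signal, timestep):
    """Frequency (Hz) of the largest positive-frequency FFT magnitude."""
    n = signal.size
    fourier = np.fft.fft(signal)
    frequencies = np.fft.fftfreq(n, d=timestep)
    positive = frequencies > 0            # skip the DC bin at 0 Hz
    magnitudes = np.abs(fourier[positive])
    return frequencies[positive][np.argmax(magnitudes)]

# Usage with the question's file (assuming the second selected column is lift):
#     data = np.loadtxt('CoefficientLiftData.dat', usecols=(0, 3))
#     print(dominant_frequency(data[:, 1], timestep=0.000005))
```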

Reloading the specified function from separate files
In file a there is a button that calls a function from file b and is then removed. Then another button is created that should reset the GUI to its original state.
Unfortunately, I don't know how to configure this properly to make it work. In trying to fix it, I ended up with the following errors:
```
ImportError: cannot import name 'GUI' from partially initialized module 'b' (most likely due to a circular import) (c:\Users\user\Desktop\english_app\b.py)
TypeError: back() missing 1 required positional argument: 'root'
```
file a:
```python
from tkinter import *
from b import dest
from functools import partial


class Gui:
    def __init__(self, root):
        self.b = Button(root, text='destroy', command=partial(dest, root))
        self.b.pack()


if __name__ == "__main__":
    root = Tk()
    Gui(root)
    mainloop()
```
file b:
```python
from tkinter import *


class dest:
    def __init__(self, root):
        for widget in root.winfo_children():
            widget.destroy()
        self.b_back = Button(root, text="Back", command=self.back)
        self.b_back.pack()

    def back(self, root):
        for widget in root.winfo_children():
            widget.destroy()
        Gui(root)
```
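One common way to resolve the circular import in the error is to avoid importing `Gui` in file b at all, and instead have file a pass the class (or any rebuild callback) into `dest`. A sketch of that idea, merged into one file for brevity; the names follow the question, and the extra `rebuild` argument is an assumption:

```python
# Sketch: file b never imports Gui; file a hands Dest a rebuild callback.
from tkinter import Tk, Button
from functools import partial


class Dest:
    def __init__(self, root, rebuild):
        for widget in root.winfo_children():
            widget.destroy()
        self.rebuild = rebuild
        # Bind root here so back() needs no extra argument at click time.
        self.b_back = Button(root, text="Back", command=partial(self.back, root))
        self.b_back.pack()

    def back(self, root):
        for widget in root.winfo_children():
            widget.destroy()
        self.rebuild(root)        # recreate the original GUI


class Gui:
    def __init__(self, root):
        # Pass the class itself as the callback -- no import of Gui needed in b.
        self.b = Button(root, text='destroy', command=partial(Dest, root, Gui))
        self.b.pack()

# Usage (needs a display):
#     root = Tk()
#     Gui(root)
#     root.mainloop()
```

Binding `root` with `partial` also fixes the `back() missing 1 required positional argument` error, since Tkinter invokes button callbacks with no arguments.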

How to create a model attribute based on another attribute?
I have a very simple `Item` model. When I create a new instance, say through the admin dashboard, I want my slug to be generated automatically based on the title attribute; for example, if the title is "my item title", the slug should be "myitemtitle".
```python
class Item(models.Model):
    title = models.CharField(max_length=100)

    @property
    def slug(self):
        slug = str(self.title).replace(' ', '')
        return slug

    def get_absolute_url(self):
        return reverse('core:item_detail', kwargs={'slug': self.slug})
```
But `slug` is not a column in the db, so I cannot get an `Item` instance using the `slug` value; I have to query using the `title` attribute.
```python
# my view
def detail_view(request, slug):
    item = get_object_or_404(Item, title=str(slug).replace('', ' '))
    context = {'item': item}
    return render(request, 'detail.html', context)
```
```python
# my urls path
path('product/<slug>', detail_view, name='item_detail')
```
My question is: is there a way to automatically generate a `slug` attribute based on `title` so I can query the db using `slug`?
GoogleNet fails to classify images
I built the Keras GoogLeNet from here: https://www.analyticsvidhya.com/blog/2018/10/understanding-inception-network-from-scratch/ The only difference is that I replaced the 1000 classes in the output layer with 3, and the data is prepared this way:
```python
def grey_preprocessor(xarray):
    xarray = (xarray / 127.5) - 1
    return xarray

img_resol = (224, 224)
train_batches = ImageDataGenerator(horizontal_flip=True, preprocessing_function=grey_preprocessor).flow_from_directory(
    directory=train_path, target_size=img_resol, classes=['bacterial', 'healthy', 'viral'], batch_size=10)
valid_batches = ImageDataGenerator(horizontal_flip=True, preprocessing_function=grey_preprocessor).flow_from_directory(
    directory=valid_path, target_size=img_resol, classes=['bacterial', 'healthy', 'viral'], batch_size=10)
test_batches = ImageDataGenerator(horizontal_flip=True, preprocessing_function=grey_preprocessor).flow_from_directory(
    directory=test_path, target_size=img_resol, classes=['bacterial', 'healthy', 'viral'], batch_size=10, shuffle=False)

assert train_batches.n == 4222
assert valid_batches.n == 300
assert test_batches.n == 150
assert train_batches.num_classes == valid_batches.num_classes == test_batches.num_classes == 3
```
However, the accuracy on every batch is 0.3333, which means it isn't classifying at all. I understand it could be many things. What is a good way to troubleshoot this?

How to load images and text labels for CNN regression from different folders
I have two folders, X_train and Y_train. X_train contains images; Y_train contains vectors as .txt files. I am trying to train a CNN for regression.
I could not figure out how to load the data and train the network. When I use `ImageDataGenerator`, it assumes that the X_train and Y_train folders are classes.
```python
import os
import tensorflow as tf
from glob2 import glob

os.chdir(r'C:\Data')
x_files = glob('X_train\\*.jpg')
y_files = glob('Y_train\\*.txt')
```
Above, I found their paths; how can I load them and get them ready for `model.fit`? Thank you.
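One possible direction (a sketch, under the assumption that each image shares a basename with its `.txt` target vector of whitespace-separated floats) is to pair the files yourself and build a `tf.data.Dataset`, which `model.fit` accepts directly:

```python
# Sketch: pair each image with the .txt vector sharing its basename,
# then stream (image, vector) pairs via tf.data. Paths are assumptions.
import os
import numpy as np
import tensorflow as tf

def make_dataset(x_dir, y_dir, image_size=(224, 224), batch_size=32):
    # Matching by sorted basename keeps images and targets aligned.
    stems = sorted(os.path.splitext(f)[0] for f in os.listdir(x_dir)
                   if f.endswith(".jpg"))
    image_paths = [os.path.join(x_dir, s + ".jpg") for s in stems]
    targets = np.stack([np.loadtxt(os.path.join(y_dir, s + ".txt"),
                                   dtype="float32")
                        for s in stems])

    def load(path, target):
        img = tf.io.decode_jpeg(tf.io.read_file(path), channels=3)
        img = tf.image.resize(img, image_size) / 255.0
        return img, target

    ds = tf.data.Dataset.from_tensor_slices((image_paths, targets))
    return ds.map(load).batch(batch_size)

# Usage:
#     train_ds = make_dataset("X_train", "Y_train")
#     model.fit(train_ds, epochs=10)
```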

Keras Binary Model Getting stuck at 50% Accuracy
I'm training a model to understand the impact of news on market volatility. The model seems to be fine and the dataset classes are balanced, so I'm not sure what exactly is wrong.
I have coded a basic model using pretrained word embeddings:
```python
model = tf.keras.models.Sequential([
    tf.keras.layers.Embedding(vocab_size + 1, embedding_dim, weights=[embedding_matrix]),
    tf.keras.layers.LSTM(300, return_sequences=True, activation='relu'),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(254, activation='relu')),
    tf.keras.layers.Dropout(0.4),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dropout(0.4),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(loss='binary_crossentropy', optimizer='Adam', metrics=['binary_accuracy'])
```
Training the model, I get this:
```
109/109 [==============================] - 265s 2s/step - loss: 0.6945 - binary_accuracy: 0.5032 - val_loss: 0.6927 - val_binary_accuracy: 0.5161
109/109 [==============================] - 265s 2s/step - loss: 0.6945 - binary_accuracy: 0.5032 - val_loss: 0.6978 - val_binary_accuracy: 0.5123
109/109 [==============================] - 265s 2s/step - loss: 0.6945 - binary_accuracy: 0.5032 - val_loss: 0.6859 - val_binary_accuracy: 0.5096
109/109 [==============================] - 265s 2s/step - loss: 0.6945 - binary_accuracy: 0.5032 - val_loss: 0.6801 - val_binary_accuracy: 0.5245
```
I thought my issue might be that the data simply isn't related and the model has nothing to learn, but I'm not sure even about that. I have published the dataset and the notebook on GitHub so you can reproduce the issue; it would be great if you could find out what is going on.

What happens when the L1 regularizer penalty is 0?
I am playing around with deep learning. The final layer of my model is a dense layer, and when I set the L1 regularizer to 0 it actually performs better than with any other value I have tested. I'm just wondering what is going on internally here, as it clearly isn't dividing the penalty by zero and raising an error, as I would have expected.
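For intuition: the L1 term is an *added* penalty, `loss = data_loss + l1 * sum(|w|)`, so setting the factor to 0 simply makes the extra term vanish; nothing is divided. A tiny sketch:

```python
# Sketch: the L1 penalty as it is added to the loss. With l1 = 0 the
# term is exactly zero, i.e. identical to training with no regularizer.
import numpy as np

def l1_penalty(weights, l1):
    return l1 * np.abs(weights).sum()

w = np.array([0.5, -2.0, 1.5])
small = l1_penalty(w, 0.01)   # small extra loss that shrinks weights
zero = l1_penalty(w, 0.0)     # exactly 0 -- the penalty disappears
```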

MovieLens Dataset: Using the Timestamp for recommendations
I am currently working on developing a recommender system with the MovieLens dataset. The dataset provides a timestamp, but I do not know how to deal with it.
What can I do with the timestamps? How can I use them in the recommendation engine?
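Two common uses of the timestamp, sketched below on a few fake rows in the MovieLens `ratings` column layout (userId, movieId, rating, timestamp): a chronological train/test split, and a recency weight for ratings (the monthly decay factor is a made-up example):

```python
# Sketch: MovieLens timestamps are Unix seconds; convert once, then use
# them for time-based splitting and recency weighting.
import pandas as pd

ratings = pd.DataFrame({
    "userId": [1, 1, 2, 2],
    "movieId": [10, 20, 10, 30],
    "rating": [4.0, 3.5, 5.0, 2.0],
    "timestamp": [964982703, 964982931, 1147868817, 1439474476],
})

ratings["dt"] = pd.to_datetime(ratings["timestamp"], unit="s")

# (1) Chronological split: train on the past, evaluate on the future.
cutoff = ratings["dt"].quantile(0.75)
train = ratings[ratings["dt"] <= cutoff]
test = ratings[ratings["dt"] > cutoff]

# (2) Recency weight: newer ratings count more when averaging.
age_days = (ratings["dt"].max() - ratings["dt"]).dt.days
ratings["weight"] = 0.99 ** (age_days / 30)   # hypothetical monthly decay
```

Time-aware evaluation (never testing on ratings older than the training data) is usually the biggest practical win; the recency weight can then be folded into whatever similarity or loss the engine uses.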

Combine the output of 2 classifiers to automatically detect prediction
I have a rather complicated problem I need to solve. I created 2 deep learning classifiers using 2 different methods and I would like to combine them to increase accuracy. I thought about building and training a single model using the functional API but I would like to keep it simple at first.
So my models output 2 different probability vectors (it's multiclass). Let's say: [0, 0.12, 0.8, 0.08] and [0.1, 0.1, 0.5, 0.3]. In this case, by intuition, we can agree it's class 2.
I could sum or multiply them and take the argmax. But I would also like to set a "threshold" and not classify the inputs that fall below it.
I could, for example, consider the input correctly classified if both classifiers agree on the same label with >50% confidence. But the problem is that sometimes one of the classifiers could fail (input too noisy, bad generalization), and I wouldn't like to reject that input, especially if the other classifier was sure about the prediction.
The aim of having 2 models for classification is to have a fallback solution in case one of them fails. But fine-tuning the correct equation could take ages. Since I already have the target labels, I was thinking of building a 3rd model that learns the best equation to combine those 2 outputs.
This model could be as simple as a multiclass logistic regression. It could learn the subtleties of both models for every class, and perhaps even which model should be trusted more for each class.
However, I have no idea how I could input my probability vectors into such a model. Would it be okay if I just concatenate the two vectors? Something like this: [[0, 0.12, 0.8, 0.08], [0.1, 0.1, 0.5, 0.3]] -> [0, 0.12, 0.8, 0.08, 0.1, 0.1, 0.5, 0.3]
But where would the threshold be? And what if both classifiers fail? Would the regression overfit to that "failure" so it can still predict the correct output? This is really mind-boggling!
If you have any better ideas, I would gladly take them!
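The 3rd-model idea can be sketched directly: concatenating the two probability vectors is a standard stacking input, and the meta-model's own `predict_proba` then gives a single confidence to threshold on. Everything below uses synthetic probabilities as stand-ins for the two classifiers' outputs (in a real setup they should come from held-out predictions, not the training data, to avoid leakage):

```python
# Sketch: logistic-regression stacking over two concatenated probability
# vectors, with a rejection threshold on the meta-model's confidence.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, k = 200, 4
y = rng.integers(0, k, size=n)

def fake_probs(y, noise):
    # Stand-in for a base model's softmax output: one-hot truth + noise.
    p = np.eye(k)[y] + noise * rng.random((len(y), k))
    return p / p.sum(axis=1, keepdims=True)

p1, p2 = fake_probs(y, 0.8), fake_probs(y, 1.5)
X = np.hstack([p1, p2])               # shape (n, 2k): just concatenate

meta = LogisticRegression(max_iter=1000).fit(X, y)

probs = meta.predict_proba(X)
pred = probs.argmax(axis=1)
confident = probs.max(axis=1) >= 0.5  # reject inputs below the threshold
```

The threshold thus lives *after* the meta-model, on its calibrated output, rather than being hand-tuned per base classifier; the meta-model itself learns how much to trust each classifier per class from its weights.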

Why, when I do 4-cluster clustering with KMeans, do I have only one inertia and not 4?
I have a dataframe and I did 4-cluster clustering using sklearn's KMeans:
```python
km = KMeans(n_clusters=4, init='random', n_init=10, max_iter=10, tol=1e-4,
            random_state=10, algorithm='full')
km.fit(df)
```
So, I have 4 clusters, but when I do this:
km.inertia_
I get only one value:
1732.350
However, according to the definition of inertia, it is the sum of squared distances of samples to their closest cluster center. So shouldn't there be 4 inertia values, not 1, or am I wrong?
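`inertia_` is a single number because it is the total over *all* samples; a per-cluster breakdown can be recomputed from `labels_` and `cluster_centers_`. A sketch on synthetic data:

```python
# Sketch: recompute a per-cluster inertia and check it sums to inertia_.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(10)
X = np.vstack([rng.normal(loc=c, size=(50, 2)) for c in (0, 5, 10, 15)])

km = KMeans(n_clusters=4, n_init=10, random_state=10).fit(X)

per_cluster = np.array([
    ((X[km.labels_ == k] - km.cluster_centers_[k]) ** 2).sum()
    for k in range(4)
])
# per_cluster holds four values, one per cluster;
# per_cluster.sum() equals km.inertia_.
```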

MSE in sklearn Tree
Can you tell me what the MSE describes in each node? I had assumed it was the actual MSE for the region containing the samples, but that would mean we are getting worse after the splits: at the root the MSE would be 7.592, while the sum of the MSEs of the terminal nodes would be much higher, which should not be possible, right? I suppose I am misunderstanding the MSE here; can someone be kind enough to enlighten me?
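A likely resolution: CART guarantees that the *sample-weighted average* of the child MSEs never exceeds the parent's MSE; the plain sum of child MSEs can well be larger. A sketch that checks this on a depth-1 tree with synthetic data:

```python
# Sketch: compare the root MSE to the weighted average of child MSEs.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 2 * X[:, 0] + rng.normal(size=200)

t = DecisionTreeRegressor(max_depth=1, random_state=0).fit(X, y).tree_
root = 0
left, right = t.children_left[root], t.children_right[root]

# impurity[node] is the MSE of the samples reaching that node.
weighted = (t.n_node_samples[left] * t.impurity[left]
            + t.n_node_samples[right] * t.impurity[right]) / t.n_node_samples[root]
# weighted child MSE <= root MSE, even if the unweighted sum is larger.
```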

UnimplementedError: Cast string to float is not supported
```
UnimplementedError                        Traceback (most recent call last)
UnimplementedError: 2 root error(s) found.
  (0) Unimplemented: Cast string to float is not supported
     [[node functional_11/Cast (defined at :1) ]]
  (1) Cancelled: Function was cancelled before it was started.
0 successful operations. 0 derived errors ignored. [Op:__inference_train_function_47870]

Function call stack:
train_function -> train_function
```
Here is my code. Any idea what I am doing wrong?
```python
lstm_layer = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(lstm_units, dropout=0.2, recurrent_dropout=0.2))

# loading our matrix
emb = tf.keras.layers.Embedding(max_words, embedding_dim, input_length=300,
                                weights=[embedding_matrix], trainable=False)

input1 = tf.keras.Input(shape=(300,))
e1 = emb(input1)
x1 = lstm_layer(e1)

input2 = tf.keras.Input(shape=(300,))
e2 = emb(input2)
x2 = lstm_layer(e2)

mhd = lambda x: tf.keras.backend.abs(x[0] - x[1])
merged = tf.keras.layers.Lambda(function=mhd, output_shape=lambda x: x[0],
                                name='L1_distance')([x1, x2])
preds = tf.keras.layers.Dense(1, activation='sigmoid')(merged)

model = tf.keras.Model(inputs=[input1, input2], outputs=preds)
model.compile(loss='mse', optimizer='adam')
model.summary()

history = model.fit([X_train[:, 0], X_train[:, 1]], y_train, epochs=100,
                    validation_data=([X_valid[:, 0], X_valid[:, 1]], y_valid))
```

How to predict if an item should be recommended based on co-purchase history using an MLP?
I have the Amazon books rating dataset.
Using an MLP (1 hidden layer with 4 units), I have to predict whether a user 'U' would buy an item 'I', given 'k' (e.g. k = 2) other users who have co-purchased the item (similar to the CBOW model predicting the main word given the context words).
So far I have made a user-user one-mode projection (with ratings as weights) and computed node2vec embeddings (length 10) of the graph.
I am unable to understand what the input to the MLP would be and what would be in the output layer.
My thought: the input would be a flattened array of the embeddings of the k+1 users, but I am unable to incorporate the item into the input. I am also confused about the output layer.
Please tell me if I am thinking in the right direction (about the MLP). If not, please give me a hint.
PS: this is an assignment question. I made the one-mode projection because I was told to.
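One CBOW-style reading of this setup (an interpretation, not the official answer): the input is the flattened embeddings of the k co-purchasing users concatenated with the item's embedding, and the output layer is a single sigmoid unit giving the purchase probability. A sketch with fake embeddings; dimension 10 and k = 2 follow the question:

```python
# Sketch: MLP with one 4-unit hidden layer over concatenated embeddings.
# All embeddings and labels below are random stand-ins.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
emb_dim, k, n = 10, 2, 300

user_emb = rng.normal(size=(n, k * emb_dim))   # k co-purchasers, flattened
item_emb = rng.normal(size=(n, emb_dim))       # the item's node2vec vector
X = np.hstack([user_emb, item_emb])            # shape (n, (k + 1) * emb_dim)
y = rng.integers(0, 2, size=n)                 # 1 = user U bought item I

clf = MLPClassifier(hidden_layer_sizes=(4,), max_iter=500).fit(X, y)
p_buy = clf.predict_proba(X)[:, 1]             # probability of purchase
```

With a binary target, a single sigmoid output is the usual choice; alternatively, if the task is framed as "predict which item", the output would be a softmax over items, but the buy/no-buy phrasing in the question points to the binary form.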

How to remove rows that have English sentences in pandas
I have a pandas data frame with 2 columns: the first contains Arabic sentences and the second contains labels (1, 0).
I want to remove all rows that contain English sentences.
Any suggestions?
Here is an example; I want to remove the second row:
إيطاليا لتسريع القرارات اللجوء المهاجرين، الترحيل [0]
Border Patrol Agents Recover 44 Migrants from Stash House [0]
الديمقراطيون مواجهة الانتخابات رفض عقد اجتماعات تاون هول [0]
شاهد لايف: احتفال ترامب "اجعل أمريكا عظيمة مرة أخرى"  بريتبارت [0]
المغني البريطاني إم آي إيه: إدارة ترامب مليئة بـ "كذابون باثولوجي" [0]
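One simple approach is to drop any row whose sentence contains Latin letters, using a regex filter on the text column (the column names below are assumptions):

```python
# Sketch: keep only rows with no Latin characters in the text column.
import pandas as pd

df = pd.DataFrame({
    "text": ["إيطاليا لتسريع القرارات اللجوء المهاجرين",
             "Border Patrol Agents Recover 44 Migrants from Stash House",
             "الديمقراطيون مواجهة الانتخابات"],
    "label": [0, 0, 0],
})

arabic_only = df[~df["text"].str.contains(r"[A-Za-z]", regex=True)]
arabic_only = arabic_only.reset_index(drop=True)
```

If sentences can legitimately mix scripts (e.g. an Arabic sentence quoting an English name), a stricter rule such as "majority of characters are Latin" or a language-detection library may be preferable to the plain regex.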