Recommender system in Python
I have built a website on my machine and want to add a "Product Recommendation" feature.
I already have user names, product names, and user reviews for those products in a CSV file, which I exported using "mongoexport".
I want to recommend items to each user using collaborative filtering. The output should be a CSV or text file listing users and their recommendations, based on their similarity to other users, so that I can use this file to display recommended products for each user on the website.
Any suggestions on how to implement this are welcome.
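One possible starting point is a small user-based collaborative filter written with the standard library only. This is a sketch under assumptions, not a definitive implementation: it assumes each review carries a numeric rating, and the sample rows, the function names, and the output layout (user name followed by recommended products) are all hypothetical.

```python
import csv
import io
import math
from collections import defaultdict


def load_ratings(rows):
    """Build {user: {product: rating}} from (user, product, rating) rows."""
    ratings = defaultdict(dict)
    for user, product, rating in rows:
        ratings[user][product] = float(rating)
    return dict(ratings)


def cosine(a, b):
    """Cosine similarity between two sparse rating dicts."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[p] * b[p] for p in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(ratings, user, n=3):
    """Rank products the user has not rated by similarity-weighted ratings."""
    mine = ratings.get(user, {})
    scores = defaultdict(float)
    weights = defaultdict(float)
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(mine, other_ratings)
        if sim <= 0:
            continue
        for product, rating in other_ratings.items():
            if product not in mine:
                scores[product] += sim * rating
                weights[product] += sim
    # Highest weighted average first; product name breaks exact ties.
    ranked = sorted(scores, key=lambda p: (-scores[p] / weights[p], p))
    return ranked[:n]


def write_recommendations(ratings, out_file, n=3):
    """Write one row per user: the user name followed by recommended products."""
    writer = csv.writer(out_file)
    for user in sorted(ratings):
        writer.writerow([user] + recommend(ratings, user, n))


# Hypothetical sample data standing in for the mongoexport CSV.
sample = [
    ("alice", "lamp", 5), ("alice", "desk", 4),
    ("bob", "lamp", 5), ("bob", "desk", 4), ("bob", "chair", 5),
    ("carol", "desk", 2), ("carol", "rug", 4),
]
ratings = load_ratings(sample)
buffer = io.StringIO()
write_recommendations(ratings, buffer)
print(buffer.getvalue())
```

To write a real file, an open file handle (for example `open("recommendations.csv", "w", newline="")`) can be passed in place of the `StringIO` buffer. For larger datasets a dedicated library is likely a better fit than this pure-Python loop.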
See also questions close to this topic
Keras Lambda Layer Before Embedding: Use to Convert Text to Integers
I currently have a keras model which uses an Embedding layer. Something like this:
    import tensorflow as tf

    input = tf.keras.layers.Input(shape=(20,), dtype='int32')
    x = tf.keras.layers.Embedding(input_dim=1000,
                                  output_dim=50,
                                  input_length=20,
                                  trainable=True,
                                  embeddings_initializer='glorot_uniform',
                                  mask_zero=False)(input)
This is great and works as expected. However, I want to be able to send text to my model, have it preprocess the text into integers, and continue normally.
1) The Keras docs say that Embedding layers can only be used as the first layer in a model: https://keras.io/layers/embeddings/
2) Even if I could add a Lambda layer before the Embedding, I'd need it to keep track of certain state (like a dictionary mapping specific words to integers). How might I go about this stateful preprocessing?
In short, I need to modify the underlying Tensorflow DAG, so when I save my model and upload to ML Engine, it'll be able to handle my sending it raw text.
PySpark: Sort or OrderBy DataFrame Column *Numerically* not working correctly
I have some fitness data that I'm trying to sort numerically, but the results do not match what the manual and other examples (e.g. "Spark DataFrame groupBy and sort in the descending order (pyspark)") show:
    from pyspark.sql.functions import col, desc

    display(df.sort(col("Calories Burned").desc()))  # fails to sort correctly, shows 876, then 4756
    display(df.orderBy("Calories Burned", ascending=False))  # fails to sort correctly, shows 876, then 4756
    display(df.sort(desc("Calories Burned")))
All of these examples display the following two columns of data (there are more columns, but I'm abbreviating for space):
    Date        Calories Burned
    2018-10-18  876
    2018-05-26  4756
    2018-05-05  4440
As you can see, these are not sorting numerically. Spark isn't taking the number of digits into account, so 876 appears before 4756.
Whether or not I include the col function makes no difference, either.
How can this dataset be sorted numerically so the data looks more like this?
    Date        Calories Burned
    2018-05-26  4756
    2018-05-05  4440
    2018-10-18  876
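The ordering shown in the question (876 before 4756) is what a descending sort of strings produces, so one plausible explanation is that the column was read as text rather than as a number. The pure-Python sketch below only illustrates that difference; it assumes the values are strings. In PySpark, `df.printSchema()` shows the column type, and casting before sorting, for example `col("Calories Burned").cast("int")`, would be the analogous fix.

```python
# Values as they would look if the column had been read as strings.
rows = [("2018-10-18", "876"), ("2018-05-26", "4756"), ("2018-05-05", "4440")]

# A descending sort on raw strings compares character by character,
# so "876" lands before "4756" because "8" > "4".
as_text = sorted(rows, key=lambda r: r[1], reverse=True)

# Converting to int first gives the numeric order.
as_number = sorted(rows, key=lambda r: int(r[1]), reverse=True)

print([r[1] for r in as_text])    # ['876', '4756', '4440']
print([r[1] for r in as_number])  # ['4756', '4440', '876']
```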
Turtle is not reacting to onkeypress [SOLVED]
I am new to Python, so I took some time and watched some videos about how to make a simple "snake" game. I did everything the presenter said, but when it came to the keyboard bindings something went wrong and I can't move my turtle.
    import turtle
    import time

    delay = 0.1

    # Screen
    wn = turtle.Screen()
    wn.title("Snake Game By AniPita")
    wn.bgcolor('black')
    wn.setup(600, 600)
    wn.tracer(0)

    # Snake Head
    head = turtle.Turtle()
    head.speed(0)
    head.shape("square")
    head.color("white")
    head.penup()
    head.goto(0, 0)
    head.direction = "stop"

    # Functions
    def go_up():
        head.direction == "up"

    def go_down():
        head.direction == "down"

    def go_left():
        head.direction == "left"

    def go_right():
        head.direction == "right"

    def move():
        if head.direction == "up":
            y = head.ycor()
            head.sety(y + 10)
        if head.direction == "down":
            y = head.ycor()
            head.sety(y - 10)
        if head.direction == "left":
            x = head.xcor()
            head.setx(x - 10)
        if head.direction == "right":
            x = head.xcor()
            head.setx(x + 10)

    # Keyboard Bindings
    wn.onkeypress(go_up(), 'w')
    wn.onkeypress(go_down(), 's')
    wn.onkeypress(go_left(), 'a')
    wn.onkeypress(go_right(), 'd')
    wn.listen()

    # Main Game
    while True:
        wn.update()
        time.sleep(delay)
        move()

    wn.mainloop()
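Two details in the code above look like likely culprits: the handlers use `==` (a comparison) where `=` (an assignment) seems intended, and `onkeypress` receives `go_up()` (the result of calling the function, which is `None`) rather than `go_up` (the function itself). The sketch below shows both effects without opening a turtle window. The `Head` class and the `bindings` dict are hypothetical stand-ins for the turtle object and for `wn.onkeypress`.

```python
class Head:
    """Hypothetical stand-in for the turtle object; it only tracks direction."""
    def __init__(self):
        self.direction = "stop"


head = Head()


def go_up_compare():
    head.direction == "up"   # comparison: the result is discarded, nothing changes


def go_up_assign():
    head.direction = "up"    # assignment: the direction actually changes


go_up_compare()
after_compare = head.direction    # still "stop"

# A binding table standing in for wn.onkeypress(handler, key).
bindings = {}
bindings["w"] = go_up_assign      # store the function itself, no parentheses
# bindings["w"] = go_up_assign()  # would call it once now and store None

bindings["w"]()                   # simulate the key press
after_assign = head.direction     # now "up"
print(after_compare, after_assign)
```

Applied to the game, that would mean writing `head.direction = "up"` inside the handlers and binding with `wn.onkeypress(go_up, 'w')`.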
Item to item collaborative filtering: How to optimize the size required for similarity matrix?
I am working on a recommendation problem. The similarity matrix is very large, taking up nearly 260 MB, and is currently stored in a CSV file. How can I reduce the file size required for the similarity matrix?
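One common way to shrink an item-to-item similarity matrix is to keep only the k strongest neighbours of each item, on the assumption that weak similarities rarely influence the final recommendations. The sketch below illustrates that idea with nested dicts. The item names, the scores, and the function name are hypothetical, and whether pruning is acceptable depends on how the similarities are used downstream. Storing only one triangle of a symmetric matrix, or using a compressed binary format instead of CSV, are other options that may help.

```python
import heapq


def top_k_neighbours(similarity, k):
    """Keep only the k most similar other items for each item.

    similarity: {item: {other_item: score}}, i.e. a dense matrix as nested dicts.
    """
    pruned = {}
    for item, row in similarity.items():
        # Skip the diagonal entry (an item is always fully similar to itself).
        candidates = ((other, s) for other, s in row.items() if other != item)
        pruned[item] = dict(heapq.nlargest(k, candidates, key=lambda pair: pair[1]))
    return pruned


# Hypothetical 4-item symmetric similarity matrix.
similarity = {
    "a": {"a": 1.0, "b": 0.9, "c": 0.1, "d": 0.4},
    "b": {"a": 0.9, "b": 1.0, "c": 0.3, "d": 0.2},
    "c": {"a": 0.1, "b": 0.3, "c": 1.0, "d": 0.8},
    "d": {"a": 0.4, "b": 0.2, "c": 0.8, "d": 1.0},
}
pruned = top_k_neighbours(similarity, k=2)
print(pruned["a"])  # {'b': 0.9, 'd': 0.4}
```

With n items, this reduces storage from n * n entries to n * k.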
tips for algorithms/pseudocode preparation
I entered the world of programming and algorithms two days ago. I have no prior experience or knowledge of the subject.
In class, I have been taught to write basic algorithms in pseudocode. I will be having a test soon - to test these concepts and more. I'm quite intimidated by that and would like some more practice.
To give an idea of the difficulty, these are some of our class questions:
1. Write a function that returns if a number is prime or not.
2. Print the sum of divisors of a number n.
3. Count all the palindromes from 1001 to 10000.
To find if a number is a palindrome, I did this:
    while n > 0
        d = n mod 10
        t = (n - d)/10
    if t = n then
        "number is a palindrome"
I don't know if this is right or not - I have seen some answers in which they do t = n/10 but I don't understand how this gives us a value. As in the first round, if I pick a number, say 252, it will become 25.2 by the end of the function and then go back and become even smaller. How can we have decimals?
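On the decimals question: answers that write t = n/10 usually mean integer division, which discards the remainder, so 252 becomes 25 rather than 25.2. That gives the same result as (n - d)/10. A minimal Python sketch of the digit-reversal approach, where `//` is integer division and `%` is mod (the function and variable names are illustrative):

```python
def is_palindrome(n):
    """Reverse the digits of n using integer arithmetic and compare."""
    original = n
    reversed_digits = 0
    while n > 0:
        d = n % 10                                  # last digit, e.g. 252 -> 2
        reversed_digits = reversed_digits * 10 + d  # append it to the reversed number
        n = n // 10                                 # integer division: 252 -> 25, not 25.2
    return reversed_digits == original


# Class question 3: count the palindromes from 1001 to 10000 inclusive.
count = sum(1 for n in range(1001, 10001) if is_palindrome(n))
print(count)  # 90
```

The key difference from the pseudocode in the question is that the reversed number is built up digit by digit and compared with the original value only after the loop ends.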
Which library can I use for building a recommendation application in Android Studio?
I want to build a recommendation application in Android Studio, but I don't know which library I can use for that. Can anyone help me, please?
How to handle NaN value in ALS recommendation model in Pyspark?
I am using the PySpark ALS model for recommendations. I have game IDs (Gameid) and the ratings given to those games by the players who played them. I split the data randomly into two parts, build an ALS model on one part, and test the model's performance on the other via the RMSE value.
I found NaN values in the test data predictions, especially for players who were not part of the training data set, and also for players who appear in both the training and test data sets but have very few games in each, for example a player with 2 games in the training data and 2 games in the test data.
How should I handle this scenario in the ALS model?
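NaN predictions for users or items unseen during training are a known cold-start effect. PySpark's `ALS` estimator has a `coldStartStrategy` parameter, and setting it to `"drop"` removes rows with NaN predictions before evaluation. The pure-Python sketch below only illustrates the two usual ways of treating such rows when computing RMSE: dropping them, or substituting a fallback such as a global mean rating. The sample pairs, the fallback value, and the function names are hypothetical.

```python
import math


def rmse_dropping_nan(pairs):
    """RMSE over (actual, predicted) pairs, skipping NaN predictions."""
    kept = [(a, p) for a, p in pairs if not math.isnan(p)]
    if not kept:
        return float("nan")
    return math.sqrt(sum((a - p) ** 2 for a, p in kept) / len(kept))


def rmse_with_fallback(pairs, fallback):
    """RMSE where NaN predictions are replaced by a fallback value."""
    filled = [(a, fallback if math.isnan(p) else p) for a, p in pairs]
    return math.sqrt(sum((a - p) ** 2 for a, p in filled) / len(filled))


# Hypothetical (actual rating, predicted rating) pairs; NaN marks a cold-start player.
pairs = [(4.0, 3.0), (2.0, 3.0), (5.0, float("nan"))]
print(rmse_dropping_nan(pairs))
print(rmse_with_fallback(pairs, fallback=3.0))
```

Dropping gives a cleaner metric but hides the cold-start cases. A fallback keeps them in the evaluation at the cost of a cruder prediction.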
Basic filtering of data based on user & item in Python SciKit
I am trying to implement a recommender system for users based on their ratings, which I think is the most common kind. I have been reading a lot and shortlisted Surprise, a Python scikit for recommender systems.
While I am able to import data and run predictions, the result is not exactly what I want.
Right now what I have: I can pass a user_id, item_id and rating and get the probability of that user giving the rating I passed.
What I really want to do: Pass a user_id and in return get a list of items that would be potentially liked/rated highly by that user based on the data.
    from surprise import Reader, Dataset
    from surprise import SVD, evaluate

    # Define the format
    reader = Reader(line_format='user item rating timestamp', sep='\t')
    # Load the data from the file using the reader format
    data = Dataset.load_from_file('./data/ecomm/e.data', reader=reader)
    # Split data into 5 folds
    data.split(n_folds=5)

    algo = SVD()
    # Retrieve the trainset.
    trainset = data.build_full_trainset()
    algo.fit(trainset)

    # Inputs are: user_id, item_id & rating.
    print(algo.predict(3, 107, 1))
Sample lines from data file.
First column is user_id, 2nd is item id, 3rd is rating and then timestamp.
    196  242  3  881250949
    186  302  3  891717742
    22   377  1  878887116
    244  51   2  880606923
    166  346  1  886397596
    298  474  4  884182806
    115  265  2  881171488
    253  465  5  891628467
    305  451  3  886324817
    6    86   3  883603013
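Getting a list of items for a user usually comes down to estimating a rating for every item the user has not rated and keeping the highest estimates. In Surprise, such estimates can be produced with `trainset.build_anti_testset()` followed by `algo.test(...)`, where each returned prediction carries `uid`, `iid` and `est` fields. The sketch below shows only the grouping and ranking step, using plain tuples so it runs without the library. The sample estimates and the function name are hypothetical.

```python
from collections import defaultdict


def top_n_per_user(predictions, n=10):
    """Group (user, item, estimate) triples by user and keep the n best items."""
    by_user = defaultdict(list)
    for uid, iid, est in predictions:
        by_user[uid].append((iid, est))
    return {
        uid: [iid for iid, _ in sorted(items, key=lambda x: x[1], reverse=True)[:n]]
        for uid, items in by_user.items()
    }


# Hypothetical estimated ratings for items the users have not rated yet.
predictions = [
    ("196", "302", 3.9), ("196", "377", 2.1), ("196", "51", 4.4),
    ("186", "242", 3.2), ("186", "51", 4.0),
]
top = top_n_per_user(predictions, n=2)
print(top["196"])  # ['51', '302']
```

With real Surprise predictions, the loop would read `p.uid`, `p.iid` and `p.est` from each prediction instead of unpacking a triple.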
Vector coefficients based on similarity
I've been looking for a way to create a recommendation system based on vector similarity. Basically, I have a few vectors per user, for example:
    User1: [0,3,7,8,5], [3,5,8,2,4], [1,5,3,9,4]
    User2: [3,1,6,7,9], [2,4,1,3,8], [7,8,3,3,1]
For every vector I need to calculate a coefficient and, based on that coefficient, differentiate one vector from another. I've found formulas that calculate a coefficient from the similarity of 2 vectors, which is not really what I want. I need a formula that calculates a coefficient per vector, so that I can then do some other calculations with those coefficients. Are there any good formulas for this? Thanks
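The question does not say what the coefficient should capture, so the following is only a sketch of two candidate per-vector numbers, both assumptions rather than a recommended formula: the Euclidean norm (the vector's length) and the cosine similarity between a vector and the mean of the same user's vectors. Either yields a single value per vector without needing a second user's data.

```python
import math


def norm(v):
    """Euclidean length of a vector: one number per vector."""
    return math.sqrt(sum(x * x for x in v))


def cosine_to_centroid(v, vectors):
    """Cosine similarity between v and the mean of a user's vectors."""
    centroid = [sum(column) / len(vectors) for column in zip(*vectors)]
    dot = sum(a * b for a, b in zip(v, centroid))
    return dot / (norm(v) * norm(centroid))


# User1's vectors from the question.
user1 = [[0, 3, 7, 8, 5], [3, 5, 8, 2, 4], [1, 5, 3, 9, 4]]
coefficients = [(norm(v), cosine_to_centroid(v, user1)) for v in user1]
for v, (length, similarity) in zip(user1, coefficients):
    print(v, round(length, 3), round(similarity, 3))
```

Which of these, if either, is suitable depends on what the later calculations need the coefficient to express.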