Which recommendation algorithm would be suitable for this use case?
I am working on a user-user collaborative filtering recommendation system where I want to generate, for each category, a list of products for the customers who have viewed the most products.
Currently I am using both implicit and explicit data for the recommendation system. What features should we include for the matrix factorization, and which algorithm is best for this use case? Should we categorize the products for each user before the matrix factorization? Can I get the final results in the form customer_id, category_customers, product_id?
customer_id  customer  product_id
01           A         101
02           A         102
03           B         103
Am I heading in the right direction? Would really appreciate some help on what direction to follow.
See also questions close to this topic
 Why is this an invalid syntax?

Compact Lists OR Faster lists?
Is there a way to create lists / lists of lists (and maybe dicts) that act like lists in Python but take less memory space,
even if access to the in-memory structure is slower?
Or the other way around: faster, but taking more memory.
Using in-memory DBs like Redis is, I suppose, both slower and more memory-hungry!
One possible usage is NLP and ML tasks where we have to store big chunks of parsed text, or features.
One approach for words is to create a lexicon/dict and keep an integer list, but that is still a Python list, and I suppose the metadata overhead will be a bigger percentage of the total.
In [280]: sys.getsizeof(list(range(100)))
Out[280]: 1008

In [281]: sys.getsizeof(array('i', range(100)))
Out[281]: 472

In [282]: sys.getsizeof(list(range(1000)))
Out[282]: 9112

In [283]: sys.getsizeof(array('i', range(1000)))
Out[283]: 4184
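The lexicon idea mentioned above can go one step further: store the token stream itself as an array('i') rather than a Python list of ints. A minimal sketch (all names are illustrative):

```python
import sys
from array import array

def encode_tokens(tokens):
    """Map each distinct word to an integer id and store the token
    stream as a compact array of C ints instead of a list of strings."""
    lexicon = {}                  # word -> integer id
    ids = array('i')              # 4-byte ints, no per-item object overhead
    for tok in tokens:
        ids.append(lexicon.setdefault(tok, len(lexicon)))
    return lexicon, ids

lexicon, ids = encode_tokens("the cat sat on the mat the cat".split())
print(list(ids))                  # [0, 1, 2, 3, 0, 4, 0, 1]
print(sys.getsizeof(ids))         # far smaller than the equivalent list of str
```

Unlike a list of ints, the array stores raw 4-byte values, so the per-element metadata overhead the question worries about disappears.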

How can I decode the RSA encryption more efficiently?
For a project I'm decrypting RSA messages. My code works correctly, but the automated check says it's too slow.
I've tested the algorithm and I've concluded that the bottleneck is in the following code:
message = (c**d) % n
Without this line, the code runs instantaneously. Here c is the encrypted message, d is the modular multiplicative inverse, and n = pq. The encrypted message is 783103, so I get that I'm dealing with large numbers, but right now it takes around 1 second to run. Is there any way to speed this up?
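One relevant fact: Python's built-in three-argument pow performs modular exponentiation by repeated squaring, so it never materializes the astronomically large intermediate c**d. A sketch with textbook toy parameters (real keys are far larger):

```python
# Toy RSA parameters for illustration only; real moduli are much larger.
p, q = 61, 53
n = p * q                      # 3233
e, d = 17, 2753                # d is the inverse of e mod (p-1)*(q-1)

m = 65                         # plaintext
c = pow(m, e, n)               # encrypt with fast modular exponentiation
assert pow(c, d, n) == m       # decrypt: same speed-up, no giant c**d

# i.e. replace `message = (c**d) % n` with `message = pow(c, d, n)`
```

The difference matters because c**d alone can have millions of digits before the % is ever applied, while pow(c, d, n) keeps every intermediate value below n.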

Saving edge with min and max weight in a strongly connected component and reconstructing the condensed tree in Tarjan's algorithm
For some problem that I'm solving I need to find the strongly connected components (SCCs) of a directed, weighted graph. After finding these components I need to condense the graph, so that in the new graph every vertex corresponds to a strongly connected component of the old graph. Also, for each strongly connected component that has more than one vertex, I need to store the minimum and maximum weight of the edges inside that component. So far I have come up with this code:
struct adjNode {
    int val, weight;
};

struct graphEdge {
    int start_ver, end_ver, weight;
};

class Graph {
public:
    int count_vertices;      // No. of vertices
    int count_edges;         // No. of edges
    list<adjNode> *adjList;  // A dynamic array of adjacency lists
    int sccCount;

    Graph(int n, int m) {
        this->count_vertices = n;
        this->count_edges = m;
        adjList = new list<adjNode>[n];
    }

    void addEdge(graphEdge edge) {
        adjNode node{ edge.end_ver, edge.weight };
        adjList[edge.start_ver].push_back(node);
    }

    void fillGraph(graphEdge edges[]) {
        for (int i = 0; i < count_edges; i++) {
            adjNode node{ edges[i].end_ver, edges[i].weight };
            adjList[edges[i].start_ver].push_back(node);
        }
    }

    void print() {
        for (int i = 0; i < count_vertices; i++) {
            for (auto j = adjList[i].begin(); j != adjList[i].end(); j++) {
                cout << i << " -> " << j->val << " weight: " << j->weight << endl;
            }
        }
    }

    int *GetSCCs() {
        sccCount = 0;
        int *sccs = new int[count_vertices];
        int *ids = new int[count_vertices];
        int *low = new int[count_vertices];
        bool *visited = new bool[count_vertices];
        stack<int> *st = new stack<int>();

        // Initialize ids, low, and visited arrays
        for (int i = 0; i < count_vertices; i++) {
            ids[i] = UNVISITED;
            low[i] = UNVISITED;
            visited[i] = false;
        }

        for (int i = 0; i < count_vertices; i++) {
            if (ids[i] == UNVISITED) {
                SCCDFS(i, 0, ids, low, st, visited, sccs);
            }
        }
        return sccs;
    }

    void SCCDFS(int v, int id, int ids[], int low[], stack<int> *st,
                bool visited[], int sccs[]) {
        ids[v] = low[v] = id++;
        st->push(v);
        visited[v] = true;

        for (auto i = adjList[v].begin(); i != adjList[v].end(); i++) {
            int u = i->val;  // u is the current adjacent of 'v'
            // If u is not visited yet, then recur for it
            if (ids[u] == UNVISITED) {
                SCCDFS(u, id, ids, low, st, visited, sccs);
            }
            if (visited[i->val]) {
                low[v] = min(low[v], low[u]);
            }
        }

        // On recursive callback, if we're at the root node (start of SCC),
        // empty the seen stack until back to root.
        if (ids[v] == low[v]) {
            int tempNode;
            int minEdgeWeight = numeric_limits<int>::max();
            int maxEdgeWeight = numeric_limits<int>::min();
            for (tempNode = st->top();; tempNode = st->top()) {
                st->pop();
                sccs[tempNode] = sccCount;
                visited[tempNode] = false;
                if (tempNode == v) break;
            }
            sccCount++;
        }
    }
};
The Graph object implements the standard Tarjan algorithm to find SCCs. As a result, the object can return an sccs array that assigns each node to some component:

Graph graph(6, 7);
graphEdge edges[]{
    {0, 1, 1}, {0, 5, 4}, {2, 1, 2}, {2, 3, 2},
    {4, 2, 3}, {1, 4, 3}, {4, 5, 4}
};
graph.fillGraph(edges);
graph.print();
int *sccs = graph.GetSCCs();
for (int i = 0; i < graph.count_vertices; i++) {
    cout << sccs[i] << endl;
}
Every edge is inserted as {start, end, weight}. The output array would be:
3 2 2 0 2 1
which means there are 4 components: nodes 0, 3 and 5 are single-vertex components, and nodes 1, 2 and 4 together form another SCC.
To sum up the problem, here are my two main goals:
 Modify the above algorithm so that a new Graph g2 can be constructed from the SCCs.
 Save the weight of the edge with maximum weight and the weight of the edge with minimum weight in each component, in some data structure.
All of this has to be done keeping in mind that the time complexity should not exceed the original O(V+E).
When the code reaches if (ids[v] == low[v]) it has found the head of a component. I think this could help, but the only way I came up with to save the max/min weights was the brute-force one: checking all edges of all vertices of each component on the fly. Also, I'm not quite sure how to handle the construction of the condensed/compressed graph.
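For reference, a sketch (in Python; the C++ translation is direct) of how both goals can be met with one extra O(E) pass over the edges once the sccs array is known, keeping the total at O(V+E). The example data is the graph from the question:

```python
def component_weight_ranges(edges, sccs):
    """edges: (start, end, weight) triples; sccs: node -> component id.
    Returns {component: (min_weight, max_weight)} for every component
    that contains at least one internal edge."""
    ranges = {}
    for u, v, w in edges:
        if sccs[u] == sccs[v]:                 # edge lies inside one SCC
            lo, hi = ranges.get(sccs[u], (w, w))
            ranges[sccs[u]] = (min(lo, w), max(hi, w))
    return ranges

def condense(edges, sccs):
    """Every edge crossing two components becomes an edge of the new graph."""
    return {(sccs[u], sccs[v]) for u, v, w in edges if sccs[u] != sccs[v]}

edges = [(0, 1, 1), (0, 5, 4), (2, 1, 2), (2, 3, 2),
         (4, 2, 3), (1, 4, 3), (4, 5, 4)]
sccs = [3, 2, 2, 0, 2, 1]                      # the output shown above
print(component_weight_ranges(edges, sccs))    # {2: (2, 3)}
print(condense(edges, sccs))
```

Both passes look at each edge exactly once, so no per-component brute-force scan is needed.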

What is the best way to save sequence of elements for scrolling forward and backward?
I'm creating a screen that shows random posts to the user. It has two buttons: "back" and "forward". The "forward" button loads the next post; the "back" button shows the previous post. If the list of previous posts is empty, the back button is shadowed (non-clickable). I'm planning to implement this using two stacks: one for posts loaded with the "forward" button and one for posts revisited with the "back" button, so the user can scroll back through already-loaded posts until reaching the first post, and vice versa.
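A rough sketch of the two-stack idea described above (load_next stands in for whatever actually fetches a random post; all names are illustrative):

```python
class PostBrowser:
    def __init__(self, load_next):
        self.load_next = load_next   # callable that fetches a new random post
        self.back_stack = []         # posts behind the current one
        self.forward_stack = []      # posts ahead, filled by pressing "back"
        self.current = None

    def forward(self):
        if self.current is not None:
            self.back_stack.append(self.current)
        # reuse an already-seen post if the user previously went back
        self.current = (self.forward_stack.pop()
                        if self.forward_stack else self.load_next())
        return self.current

    def can_go_back(self):           # drives the button's shadowed state
        return bool(self.back_stack)

    def back(self):
        if not self.can_go_back():
            return self.current      # button should be non-clickable here
        self.forward_stack.append(self.current)
        self.current = self.back_stack.pop()
        return self.current
```

Pressing "forward" only hits the loader when the forward stack is empty, so scrolling back and forth over already-seen posts costs nothing.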

Efficiently filling torch.Tensor at equal index positions
I have a 6-dimensional all-zero PyTorch tensor lrel_w that I want to fill with 1s at positions where the indices of the first three dimensions match the indices of the last three dimensions. I'm currently solving this trivially using 3 nested for loops:

lrel_w = torch.zeros(
    input_size[0], input_size[1], input_size[2],
    input_size[0], input_size[1], input_size[2]
)
for c in range(input_size[0]):
    for x in range(input_size[1]):
        for y in range(input_size[2]):
            lrel_w[c, x, y, c, x, y] = 1
I'm sure there must be a more efficient way of doing this, however I have not been able to figure it out.
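One loop-free formulation, assuming input_size holds the three dimensions: build an identity matrix over the flattened index and reshape it to six dimensions. Sketched here with NumPy; torch.eye and Tensor.reshape behave the same way:

```python
import numpy as np

input_size = (2, 3, 4)                     # illustrative dimensions
c, x, y = input_size

# eye over the flattened c*x*y index, reshaped back to 6-D:
# entry [c1,x1,y1,c2,x2,y2] is 1 exactly when (c1,x1,y1) == (c2,x2,y2)
lrel_w = np.eye(c * x * y).reshape(c, x, y, c, x, y)

assert lrel_w[1, 2, 3, 1, 2, 3] == 1
assert lrel_w[1, 2, 3, 0, 2, 3] == 0
assert lrel_w.sum() == c * x * y           # one 1 per diagonal position
```

This works because reshaping maps the row index of the identity to the first three axes and the column index to the last three, so the 1s land exactly where the two index triples coincide.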

How to obtain model weights of pretrained model and output for each tensor in tensorflow?
I have two related questions, but feel free to only answer one if you are unsure about the other. Note, this is using tensorflow in graph mode and not keras since I am working with an existing codebase, which I realize complicates things a bit more.
 I would like to get the model weights for a pretrained model. I am loading the model using the following:
with tf.Session() as sess:
    last_check = tf.train.latest_checkpoint(checkpoint_dir)
    saver = tf.train.import_meta_graph(last_check + '.meta')
    saver.restore(sess, last_check)
However, I am not sure how to actually get the weights. In Keras I would normally just do model.get_weights(). How can I achieve this in tensorflow?
I also want to get the intermediate outputs for each tensor in the pretrained model. Normally in Keras I could do what is shown below to obtain the output of each layer in the model, but I am not sure how to do something similar in tensorflow with a pretrained model. How can I implement this?
inp = model.input
outputs = [layer.output for layer in model.layers]
functors = [K.function([inp], [out]) for out in outputs]
layer_outputs = [func(x) for func in functors]

Tensorflow set_shape function doesn't throw an error with a different shape at runtime
So I loaded some images of resolution 1024x1024 into a list of tensors, and then used the function set_shape to change the shape of every tensor of the list to [128,128,3]: see code example here
However, when I call eval() and check the shape of the image coming from the tensor, it says that the shape is [1024,1024,3]. see code example here
Then why didn't set_shape throw an error?

Recommender neural network architecture
I want to create a recommendation system. The engine of the recommendation is a vector representation of items. I want to use this architecture because my features fit it.
But how is the layer before the dropout formed? Is it a vector concatenation? How can this be done with PyTorch?

API: Music recommendation on Spotify based on tweets, is word2vec or doc2vec required?
I want to design a recommendation system that recommends songs to users based on their tweets that use the hashtag #nowplaying. Initially, I wanted to weight their tweets and analyze them word by word to see if they've tweeted anything ABOUT certain songs, but that seems like it will take more time. Since I won't be analyzing each tweet, but rather just looking for tweets that contain the hashtag, do I still need to use word2vec or doc2vec in this case?
Just for reference, I will provide a brief idea of the system below. Feel free to comment or give your thoughts on them.
 User will give access to their Spotify account using the API.
 Prompt user to log in or if the account is public, they can just input their Twitter username.
 Twitter's API will run a search for #nowplaying on their account (tweets). For now, the timespan will be from 90 days until the most recent tweet. (I read on Twitter API that I can only do a search for tweets within 7 days, is this correct?)
 Analyze the tweet that has the hashtag and extract the song title from the tweets and pass the data to Spotify API to analyze.
 Find the extracted song on Spotify using the Spotify API.
 Based on that song, the API will provide 10-20 related songs to be compiled into a playlist for the user.
 Once the songs have been compiled into a playlist, the system will show the user the song recommendations, and the user will have the option to save it to their account or discard it.
I have close to zero experience working with APIs, but I have some experience with a few languages like Java and Python.

Trouble with SVD Algorithm for Recommendation Engine
I am working on a recommendation engine, the ultimate goal of which is to find the rating for all user-item pairs ("all" to a reasonable extent). I leverage the highly useful library "Surprise", although I did give some other packages a shot. A very elaborate walkthrough is provided here.
Here is the general flow of what I do:
 Take the dataset with user_id, item_id and rating as three columns,
 Split the data into train and test sets,
 Turn the train set into a sparse matrix,
 Conduct SVD through stochastic gradient descent,
 Evaluate the model on the test set and estimate RMSE and MAE.
Although the RMSE and MAE in the output dataframe are rather low, the same metrics for observations where the original rating is high are catastrophically off! The model assigns close-to-average values to most pairs, since the distribution of the ratings is skewed to the right. In other words, the model produces nonsense predictions for user-item pairs whose rating should be high!
I do not think this is a code problem since the same algo works tolerably on other datasets.
Here is my question: can I possibly feed the model more features that would communicate user similarities? For example, data about the user's country of residence, age, interface language, etc.? If yes, what is the right way to go about it? If no, what are some other ways to improve the model?
I would very much appreciate any kind of recommendation (no pun intended) or general take on how you would solve this problem!

How can I assess the accuracy of a matrix factorization recommender system using cross-validation?
I would like to assess my model using cross-validation, but I am failing to fit it into this code.
accuracy <- function(x, real_ratings) {
  # user and book factors:
  # first 100 elements are latent factors for users, the next 150 for books
  user_factors <- matrix(x[1:100], 50, 10)
  book_factors <- matrix(x[101:250], 10, 100)

  # the user-book interaction: predictions from dot products of the factors
  predictions <- user_factors %*% book_factors

  # root mean squared error over all rated books
  errors <- (real_ratings - predictions) ^ 2
  sqrt(mean(errors[!is.na(real_ratings)]))
}

set.seed(123)
# optimization step
rec <- optim(par = runif(280), accuracy, real_ratings = df,
             control = list(maxit = 100000000))
rec$convergence
rec$value

# extract optimal user factors
user_factors <- matrix(rec$par[1:100], 50, 10)
# extract optimal book factors
book_factors <- matrix(rec$par[101:250], 10, 100)
head(book_factors)

set.seed(123)
# check predictions for one user
predicted_ratings <- user_factors %*% book_factors
rbind(round(predicted_ratings[1, ], 1), as.numeric(df[1, ]))

# check accuracy
errors <- (df - predicted_ratings) ^ 2
sqrt(mean(errors[!is.na(df)]))
I don't know how I could go about this.
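One common way to cross-validate a factorization model is to hold out a fraction of the observed ratings, fit on the rest, and score RMSE on the held-out cells only. A minimal Python sketch of the fold-splitting and scoring logic (the fitting step stays whatever optimizer you already use; names are illustrative):

```python
import numpy as np

def rating_folds(ratings, n_folds=5, seed=123):
    """Split the observed (non-NaN) cells of a ratings matrix into folds.
    Yields (train, test) matrices where held-out cells are set to NaN
    in train and are the only non-NaN cells in test."""
    rng = np.random.default_rng(seed)
    rows, cols = np.where(~np.isnan(ratings))
    order = rng.permutation(len(rows))
    for fold in range(n_folds):
        held = order[fold::n_folds]           # every n-th observed cell
        train = ratings.copy()
        train[rows[held], cols[held]] = np.nan
        test = np.full_like(ratings, np.nan)
        test[rows[held], cols[held]] = ratings[rows[held], cols[held]]
        yield train, test

def rmse(predictions, test):
    m = ~np.isnan(test)
    return float(np.sqrt(np.mean((predictions[m] - test[m]) ** 2)))
```

Each fold's train matrix would play the role of real_ratings in the optimization above, and the resulting predicted_ratings would be scored against that fold's test matrix with rmse.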

Simple way of performing Matrix Factorization with tensorflow 2
I've been searching for how to perform matrix factorization in the very simple and basic case that I will show, but didn't find anything; I only found complex and long solutions. So I will present what I want to solve:
U x V = A
I would just like to know how to solve this equation in TensorFlow 2, where A is a known sparse matrix, and U and V are two randomly initialized matrices. I would like to find U and V such that their product is approximately equal to A.
For example, having these variables:
# I use this function to build a toy dataset for the sparse matrix
def build_rating_sparse_tensor(ratings):
    indices = ratings[['U_num', 'V_num']].values
    values = ratings['rating'].values
    return tf.SparseTensor(
        indices=indices,
        values=values,
        dense_shape=[ratings.U_num.max() + 1, ratings.V_num.max() + 1])

# here I create what will be the matrix A
ratings = (pd.DataFrame({'U_num': list(range(0, 10_000)) * 30,
                         'V_num': list(range(0, 60_000)) * 5,
                         'rating': np.random.randint(6, size=300_000)})
           .sample(1000)
           .drop_duplicates(subset=['U_num', 'V_num'])
           .sort_values(['U_num', 'V_num'], ascending=[1, 1]))

# Variables
A = build_rating_sparse_tensor(ratings)
U = tf.Variable(tf.random_normal(
    [A.shape[0], embeddings], stddev=init_stddev))
# this matrix would be transposed in the equation
V = tf.Variable(tf.random_normal(
    [A.shape[1], embeddings], stddev=init_stddev))

# loss function
def sparse_mean_square_error(sparse_ratings, user_embeddings, movie_embeddings):
    predictions = tf.reduce_sum(
        tf.gather(user_embeddings, sparse_ratings.indices[:, 0]) *
        tf.gather(movie_embeddings, sparse_ratings.indices[:, 1]),
        axis=1)
    loss = tf.losses.mean_squared_error(sparse_ratings.values, predictions)
    return loss
Is it possible to do this with a particular loss function, optimizer and learning schedule?
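Not the TF2 code itself, but the objective can be sketched framework-free in NumPy with plain gradient descent on the observed cells, which pins down the loss and update rule that any TF2 optimizer would then apply (toy sizes; the boolean mask stands in for the sparse A):

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 20, 30, 4

# toy rank-k target matrix; mask marks the "observed" (rated) cells
A = rng.normal(size=(n_users, k)) @ rng.normal(size=(k, n_items))
mask = rng.random((n_users, n_items)) < 0.3

U = rng.normal(scale=0.3, size=(n_users, k))
V = rng.normal(scale=0.3, size=(n_items, k))

lr = 0.1
for _ in range(3000):
    err = (U @ V.T - A) * mask             # error on observed cells only
    U -= lr * (err @ V) / mask.sum()       # gradient step on U
    V -= lr * (err.T @ U) / mask.sum()     # gradient step on V

mse = float(np.sum(((U @ V.T - A) * mask) ** 2) / mask.sum())
```

The key design point carries over to TF2 directly: the loss is evaluated only at the observed indices (as the sparse_mean_square_error above already does), and any optimizer that follows its gradient will drive U @ V.T toward A on those cells.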
Thank you very much.

Which algorithm can be used for this scenario?
I want to create a recommendation for frequently viewed and bought products, based on the category, using both implicit and explicit data. Kindly suggest an algorithm for this use case. Can we go for user-to-user collaborative filtering?