How to validate an ant colony optimization algorithm vs. a parallel ant colony optimization algorithm for dimensionality reduction
I am using the ant colony algorithm for dimensionality reduction, and I want to compare it with a parallel version of the same algorithm. My questions are: what kind of dataset should I use, and how do I validate these algorithms? This is for a paper, so any help is appreciated.
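On validation: the usual approach in papers is to run both variants on several public benchmark datasets (e.g. from the UCI repository) with many random seeds each, then compare solution quality (e.g. classification accuracy on the reduced feature set) and wall-clock time with a paired statistical test. A minimal harness sketch, where `run_aco` and `run_parallel_aco` are hypothetical stand-ins for the two implementations:

```python
import random
import statistics

def run_aco(seed):
    # Hypothetical stand-in: returns the best fitness found by one run
    # of the sequential ACO feature-selection algorithm.
    rng = random.Random(seed)
    return 0.80 + rng.random() * 0.05

def run_parallel_aco(seed):
    # Hypothetical stand-in for the parallel variant.
    rng = random.Random(seed + 1000)
    return 0.81 + rng.random() * 0.05

def compare(algo_a, algo_b, n_runs=30):
    """Paired comparison over n_runs seeds: mean score of each algorithm,
    plus how often algorithm B beats algorithm A on the same seed index."""
    a = [algo_a(s) for s in range(n_runs)]
    b = [algo_b(s) for s in range(n_runs)]
    wins_b = sum(1 for x, y in zip(a, b) if y > x)
    return statistics.mean(a), statistics.mean(b), wins_b

mean_a, mean_b, wins = compare(run_aco, run_parallel_aco)
print(f"sequential={mean_a:.3f} parallel={mean_b:.3f} parallel wins {wins}/30")
```

In a real experiment, replace the win count with a Wilcoxon signed-rank test, and report both the quality metric and the speedup of the parallel variant.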
See also questions close to this topic

General parallelism with PostgreSQL CTEs
I'm working with some large data, and getting parallel plans for my queries is necessary. I also like to use CTEs to express my queries, but after reading PostgreSQL's documentation I'm not sure whether CTEs pose a serious limitation on parallelism.
In one place, CTE scans and temporary tables are marked as 'parallel restricted', where 'parallel restricted' is defined as:
A parallel restricted operation is one which cannot be performed in a parallel worker, but which can be performed in the leader while parallel query is in use.
Elsewhere, the description of parallel limitations as far as CTEs are concerned is a bit different:
If a query contains a data-modifying operation either at the top level or within a CTE, no parallel plans for that query will be generated.
In my case, I don't have any data-modifying operations.
To what degree will CTEs limit the quality of my parallel plan, if at all?
To be fair, I've had some difficulty understanding the implications of the first definition. Since CTEs can be materialized as temporary tables, I assume the impact is even more relevant there. The second definition, on the other hand, suggests that CTE-related parallelism limitations only concern data-modifying operations.

CRCW PRAM primality test
I am thinking about a primality test implemented in the CRCW PRAM model. This is what I've managed:

```
// input: number n
result = n > 1;
if ((n > 3) and ((n % 2 == 0) or (n % 3 == 0))) {
    result = false;
}
for i = 5 ... sqrt(n); i += 6 {
    do parallel {
        if ((n % i == 0) or (n % (i + 2) == 0)) {
            result = false;
        }
    }
}
```

The reason for `result = n > 1;` is to prevent checking negative numbers (and 0 and 1); the `for` loop does the rest of the job. My questions are:

1. I know that in the CRCW PRAM model, concurrent writers must all write the same value to a memory cell. But does it matter if part of the code is not parallel? I mean, `n > 1` can be `true` or `false`. Does that disqualify my solution?
2. Can I use `+= 6` in the for loop?
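For reference, the sequential 6k ± 1 test that the pseudocode above parallelizes looks like this in Python (the `+= 6` step is valid precisely because every prime greater than 3 has the form 6k − 1 or 6k + 1):

```python
import math

def is_prime(n: int) -> bool:
    """Trial division testing only candidates of the form 6k - 1 and 6k + 1."""
    if n <= 1:
        return False
    if n <= 3:
        return True          # 2 and 3 are prime
    if n % 2 == 0 or n % 3 == 0:
        return False
    # Every prime > 3 is of the form 6k - 1 or 6k + 1, so step by 6.
    for i in range(5, math.isqrt(n) + 1, 6):
        if n % i == 0 or n % (i + 2) == 0:
            return False
    return True

print([p for p in range(30) if is_prime(p)])
# -> [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

In a CRCW PRAM, each candidate divisor pair (i, i + 2) would be assigned to its own processor; every processor that finds a divisor concurrently writes the same value `false`, which is exactly what the common-write model permits.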

Spark SQL: how to achieve parallel processing of a dataframe at group level, but sequential processing of rows within each group
 Apply grouping on the data frame. Let us say it results in 100 groups with 10 rows each.
 I have a function that has to be applied on each group. It can happen in parallel and in any order (i.e., it is up to Spark's discretion to pick any group in any order for execution).
 But within a group, I need a guarantee of sequential processing of the rows, because after processing each row in a group, I use its output in the processing of the remaining rows in the group.
We took the approach below, where everything ran on the driver and could not utilize the Spark cluster's parallel processing across nodes (as expected, performance was really bad):
1) Break the main DF into multiple dataframes, placed in an array:

```scala
val securityIds = allocStage1DF.select("ALLOCONEGROUP").distinct.collect.flatMap(_.toSeq)
val bySecurityArray = securityIds.map(securityId => allocStage1DF.where($"ALLOCONEGROUP" <=> securityId))
```

2) Loop through the array of dataframes and pass each to a method that processes it row by row:

```scala
df.coalesce(1).sort($"PRIORITY".asc).collect().foreach { row =>
  AllocOneOutput.allocOneOutput(row)
}
```
What we are looking for is a combination of parallel and sequential processing.
Parallel processing at group level, because these are all independent groups that can be parallelized.
Within each group, rows have to be processed one after the other in a sequence, which is very important for our use case.
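In Spark terms this calls for parallelism across groups with an ordered, stateful pass within each group; one Spark-side option is `Dataset.groupByKey` followed by `flatMapGroups`, sorting each group's rows before folding over them. The shape of the pattern, sketched in plain Python (not Spark code; all data below is toy data):

```python
from concurrent.futures import ThreadPoolExecutor

# Toy data: each row carries a group key, a priority, and a value.
rows = [
    {"group": "A", "priority": 1, "value": 2},
    {"group": "A", "priority": 2, "value": 3},
    {"group": "B", "priority": 1, "value": 5},
    {"group": "B", "priority": 2, "value": 7},
]

def process_group(group_rows):
    """Sequential within the group: each row's result feeds the next row."""
    carry = 0
    for row in sorted(group_rows, key=lambda r: r["priority"]):
        carry = carry + row["value"]  # stand-in for per-row logic using prior output
    return carry

# Partition rows by group key.
groups = {}
for row in rows:
    groups.setdefault(row["group"], []).append(row)

# Parallel across groups: each group is an independent task.
with ThreadPoolExecutor() as pool:
    results = dict(zip(groups, pool.map(process_group, groups.values())))

print(results)  # -> {'A': 5, 'B': 12}
```

The key point is that the unit of parallelism is the group, never the row: the sequential dependency lives entirely inside `process_group`, so the scheduler is free to run groups in any order.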

Fisher's Linear Discriminant Implementation
I am implementing Fisher's Linear Discriminant from scratch. I am working on the MNIST data, restricted to the digits 1 and 2 (a two-class problem). My goal was to reduce the 28 × 28 = 784 dimensions to 10 dimensions using PCA, and I have completed this part.
Now, I want to run Fisher's LD to project the PCA-reduced training data to 1 dimension and estimate a threshold to discriminate the two classes. Can you tell me how to do this latter part?
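A minimal sketch of that latter part: the Fisher direction is w = S_W⁻¹(m₁ − m₂), both classes are projected onto it, and a simple threshold is the midpoint of the projected class means. Synthetic Gaussian data stands in here for the 10-dimensional PCA output:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-ins for the 10-D PCA-projected training data of each class.
X1 = rng.normal(loc=0.0, scale=1.0, size=(100, 10))
X2 = rng.normal(loc=3.0, scale=1.0, size=(100, 10))

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
# Within-class scatter matrix S_W = S_1 + S_2.
S_w = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
w = np.linalg.solve(S_w, m1 - m2)          # Fisher direction, up to scale

# Project both classes onto the 1-D Fisher axis.
p1, p2 = X1 @ w, X2 @ w
threshold = (p1.mean() + p2.mean()) / 2

# Classify: class 1 if the projection falls on class 1's side of the threshold.
sign = 1.0 if p1.mean() > threshold else -1.0
pred1 = sign * (p1 - threshold) > 0
pred2 = sign * (p2 - threshold) > 0
accuracy = (pred1.sum() + (~pred2).sum()) / (len(X1) + len(X2))
print(f"training accuracy: {accuracy:.2f}")
```

The midpoint threshold implicitly assumes roughly equal class covariances and priors; with real MNIST digits it may be better to sweep the 1-D projection and pick the threshold that minimizes training error.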

Process finished with exit code -1073740940 (0xC0000374) using scikit-learn KernelPCA
First of all, I tried to perform dimensionality reduction on my n_samples x 53 data using scikit-learn's Kernel PCA with a precomputed kernel. The code worked without any issues when I first tried it with 50 samples. However, when I increased the number of samples to 100, I suddenly got the following message:
Process finished with exit code -1073740940 (0xC0000374)
Here's the detail of what I want to do:
I want to obtain the optimum value of the kernel function hyperparameter in my Kernel PCA function, defined as follows.
```python
from sklearn.decomposition import KernelPCA as drm
from somewhere import costfunction
from somewhere_else import customkernel

def kpcafun(w, X):
    # X is the sample matrix, w is the kernel hyperparameter
    n_princomp = 2
    drmodel = drm(n_princomp, kernel='precomputed')
    k_matrix = customkernel(X, X, w)
    transformed_x = drmodel.fit_transform(k_matrix)
    cost = costfunction(transformed_x)
    return cost
```
Therefore, to optimize the hyperparameters I used the following code.

```python
from scipy.optimize import minimize

# assume that wstart and optimbound are already defined
res = minimize(kpcafun, wstart, method='L-BFGS-B', bounds=optimbound, args=(X,))
```
The strange thing is that when I debugged the first 10 iterations of the optimization process, nothing strange happened; all variable values looked normal. But when I turned off the breakpoints and let the program continue, the message appeared without any Python exception or traceback.
Does anyone know what might be wrong with my code? Or anyone has some tips to resolve a problem like this?
Thanks
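Exit code 0xC0000374 is the Windows status for heap corruption, which usually points at native code (a C extension) rather than Python-level logic. One cheap thing to rule out is a malformed precomputed kernel matrix, since that is what crosses the Python/native boundary here. A minimal sanity check, with a simple RBF kernel standing in for the custom kernel (which isn't shown in the question):

```python
import numpy as np

def check_kernel_matrix(K, n_samples):
    """Sanity-check a precomputed kernel matrix before handing it to KernelPCA."""
    assert K.shape == (n_samples, n_samples), f"bad shape {K.shape}"
    assert np.isfinite(K).all(), "kernel matrix contains NaN or inf"
    assert np.allclose(K, K.T, atol=1e-8), "kernel matrix is not symmetric"
    return True

# Illustration with an RBF kernel standing in for the custom kernel.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 53))
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-0.5 * sq_dists)
print(check_kernel_matrix(K, 100))  # -> True
```

If the checks pass and the crash persists, the usual next steps are upgrading or bisecting the numpy/scipy/scikit-learn versions, since a heap-corruption crash with no traceback almost always originates inside one of them.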

Using UMAP with HDBSCAN Clustering with a cosine metric
I am trying to use UMAP for clustering as exemplified in the UMAP docs, where HDBSCAN clustering is run on UMAP dimension-reduced data. However, I need to use the cosine metric, and the results do not make sense to me. That is why I created a simple example that I still do not understand.
See the comments in the code. I expect the 0th and 1st points to be clustered in one cluster, and the 2nd and 3rd in another.
```python
import numpy as np
import matplotlib.pyplot as plt
import hdbscan
import umap
from numpy.linalg import norm  # import linear algebra norm

# Create 4 points in 3 dimensions. 0th,1st and 2nd,3rd are aligned
# to have the same cosine (right?)
m = np.array([[0, 1, 0], [0, 8, 0], [0.5, 0.5, 0.1], [5, 5, 1]])

fig = plt.figure(figsize=(12, 6))
_ = plt.scatter(m[:, 0], m[:, 1], marker='o', c=[1, 2, 3, 4], cmap='viridis')
_ = plt.title('4 pts in 3D, 0th and 1st have same cosine, 2nd and 3rd have same cosine')
_ = plt.ylim([0, 10])
for i, txt in enumerate(range(4)):
    _ = plt.annotate(str(txt), (m[i, 0], m[i, 1]))
plt.savefig('01.png')

# As a test, apply HDBSCAN clustering directly: the labels=[0,0,1,1] show the
# two pairs of points getting clustered together as expected
metric = 'cosine'  # Apparently this should be euclidean, but in that case I get no clusters
model_hdbscan = hdbscan.HDBSCAN(min_cluster_size=2, min_samples=1,
                                metric=metric, algorithm='generic').fit(np.double(m))
clustered = (model_hdbscan.labels_ >= 0)
fig = plt.figure(figsize=(12, 6))
plt.scatter(m[clustered, 0], m[clustered, 1], c=model_hdbscan.labels_[clustered],
            marker='o', cmap='tab10', label='clustered data')
plt.title('clustering of 4 points in 3D, labels (ie cluster associations):'
          + str(model_hdbscan.labels_) + '. good!\n'
          + 'Clustered %.1f percent of data in %d clusters. metric=%s'
          % (sum(clustered) / len(clustered) * 100, max(model_hdbscan.labels_) + 1, metric))
for i, txt in enumerate(range(4)):
    _ = plt.annotate(str(txt), (m[i, 0], m[i, 1]))
plt.savefig('02.png')

# Now apply UMAP to reduce the dimensions from 3 to 2.
# I am honestly not sure this is right
clusterable_embedding = umap.UMAP(n_components=2, n_neighbors=3, min_dist=0.0,
                                  metric=metric, random_state=1).fit_transform(m)
fig = plt.figure(figsize=(12, 6))
plt.scatter(clusterable_embedding[:, 0], clusterable_embedding[:, 1],
            marker='o', c=[0, 1, 2, 3], cmap='tab10')
_ = plt.title('UMAP from 3D to 2D using the %s metric' % metric)
for i, txt in enumerate(range(4)):
    _ = plt.annotate(str(txt), (clusterable_embedding[i, 0], clusterable_embedding[i, 1]))
plt.savefig('03.png')

# Cluster the reduced data. Notice that the labels=[0,1,0,1] are not what I expect
metric = 'cosine'
model_hdbscan = hdbscan.HDBSCAN(min_cluster_size=2, min_samples=1,
                                metric=metric, algorithm='generic').fit(np.double(clusterable_embedding))
clustered = (model_hdbscan.labels_ >= 0)
fig = plt.figure(figsize=(12, 6))
plt.scatter(clusterable_embedding[clustered, 0], clusterable_embedding[clustered, 1],
            c=model_hdbscan.labels_[clustered], marker='o', cmap='tab10', label='clustered data')
plt.title('Clustering of 4 UMAP transformed points to 2D. labels (ie cluster association):'
          + str(model_hdbscan.labels_) + '. BAD!')
for i, txt in enumerate(range(4)):
    _ = plt.annotate(str(txt), (clusterable_embedding[i, 0], clusterable_embedding[i, 1]))
plt.savefig('04.png')
```
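One observation that may help debug this (an identity, not a confirmed fix for the question): the cosine distance between two vectors is exactly half the squared Euclidean distance between their L2-normalized versions. So L2-normalizing the rows of the data first, and then using the default euclidean metric for both UMAP and HDBSCAN, is often a more predictable way to get cosine-like behaviour than passing `metric='cosine'` to both libraries:

```python
import numpy as np

def cosine_distance(a, b):
    """1 - cos(a, b)."""
    return 1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

rng = np.random.default_rng(0)
a, b = rng.normal(size=3), rng.normal(size=3)

# L2-normalize, then take half the squared Euclidean distance.
an, bn = a / np.linalg.norm(a), b / np.linalg.norm(b)
half_sq_euclid = np.sum((an - bn) ** 2) / 2

print(np.isclose(cosine_distance(a, b), half_sq_euclid))  # -> True
```

With the four points in the question, rows 0/1 and rows 2/3 normalize to identical unit vectors, so Euclidean clustering on the normalized data should recover the expected [0, 0, 1, 1] grouping.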

Preloaded knapsack where there is a constraint on the number of items that can be swapped
I'm looking to solve the following knapsack problem with the following conditions.
 Knapsack is already filled to capacity 'C' with 'n' objects
 The 'n' objects are a subset of a universe of 'm' objects, so a 0/1 knapsack
 We can only make a maximum of 'x' number of object swaps
The knapsack in this instance is a portfolio of fixed-income assets. The assets have a number of attributes, including price and yield. As prices change continuously, it would be ideal to have a dynamic approach to this problem, and for that reason I think an ant colony optimisation algorithm might not be the worst approach, but I am open to other ideas.
My current thought process is that the following approach might work:
 The graph of the problem is an ant nest ('K') surrounded by 'm' objects
 Each object is connected to 'K' with an edge
 A subset of the objects, 'n', have edges with a pheromone trail already laid down, as these are preselected objects
 Each ant randomly selects 'n' objects from 'm'
 Solutions with '>x' changes from initial knapsack are discarded
 Fitness of remaining solutions is calculated and pheromone is applied to best solution
 Process is repeated until convergence
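The steps above can be sketched in code. Everything below is toy, illustrative data (item values/weights, capacity, swap budget), and the pheromone update is a deliberately simple evaporate-and-reinforce rule rather than a tuned ACO variant:

```python
import random

random.seed(0)

# Toy universe of m objects: (value, weight) pairs.
m_items = [(random.uniform(1, 10), random.uniform(1, 5)) for _ in range(20)]
capacity = 30.0
initial = set(range(8))       # knapsack preloaded with n = 8 objects
max_swaps = 2                 # at most x swaps from the initial selection

pheromone = [1.0] * len(m_items)
for i in initial:
    pheromone[i] = 2.0        # pre-laid trail on the preselected objects

def fitness(selection):
    value = sum(m_items[i][0] for i in selection)
    weight = sum(m_items[i][1] for i in selection)
    return value if weight <= capacity else 0.0   # infeasible -> worthless

def swaps(selection):
    return len(initial - selection)  # items removed == items swapped out

best, best_fit = initial, fitness(initial)
for _ in range(200):          # ants x iterations, flattened into one loop
    # Each ant picks n objects with probability proportional to pheromone.
    candidates = random.choices(range(len(m_items)),
                                weights=pheromone, k=len(initial) * 3)
    selection = set()
    for i in candidates:
        if len(selection) == len(initial):
            break
        selection.add(i)
    if len(selection) < len(initial) or swaps(selection) > max_swaps:
        continue              # discard solutions with > x changes
    f = fitness(selection)
    if f > best_fit:
        best, best_fit = selection, f
    # Evaporate, then reinforce the best solution found so far.
    pheromone = [0.95 * p for p in pheromone]
    for i in best:
        pheromone[i] += 0.5

print(sorted(best), round(best_fit, 2))
```

Note that rejecting over-budget solutions outright (as in the bullet list) wastes ant evaluations; an alternative worth considering is constructing solutions that respect the swap budget by design, e.g. by sampling only the swapped-in items.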
Appreciate any advice as to whether this is a good approach or if you could suggest alternatives.

Need help creating a hybrid GA and ant colony algorithm for TSP in Python
I have two Python implementations for the travelling salesman problem: one using an ant colony algorithm and one using a genetic algorithm. I want to build a hybrid approach from these two that produces a single optimal solution as output.
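One common way to hybridize the two, sketched below under toy assumptions (random cities, untuned parameters): ants construct tours guided by pheromone, a GA order-crossover refines the ant population each iteration, and the best tour found reinforces the pheromone for the next round. This is a sketch of the structure, not a drop-in replacement for the two existing codes:

```python
import math
import random

random.seed(1)

# Toy TSP instance: random city coordinates.
cities = [(random.random(), random.random()) for _ in range(12)]
n = len(cities)

def dist(i, j):
    return math.hypot(cities[i][0] - cities[j][0], cities[i][1] - cities[j][1])

def tour_length(tour):
    return sum(dist(tour[k], tour[(k + 1) % n]) for k in range(n))

pheromone = [[1.0] * n for _ in range(n)]

def ant_tour():
    """ACO construction: next city chosen by pheromone / distance."""
    tour = [random.randrange(n)]
    while len(tour) < n:
        cur = tour[-1]
        choices = [c for c in range(n) if c not in tour]
        weights = [pheromone[cur][c] / (dist(cur, c) + 1e-9) for c in choices]
        tour.append(random.choices(choices, weights=weights)[0])
    return tour

def order_crossover(p1, p2):
    """GA operator: copy a slice of p1, fill the rest in p2's order."""
    a, b = sorted(random.sample(range(n), 2))
    child = [None] * n
    child[a:b] = p1[a:b]
    rest = [c for c in p2 if c not in child[a:b]]
    idx = 0
    for k in range(n):
        if child[k] is None:
            child[k] = rest[idx]
            idx += 1
    return child

best = None
for _ in range(40):
    population = [ant_tour() for _ in range(10)]       # ACO builds tours
    population.sort(key=tour_length)
    # GA refinement: breed children from the best ants, keep improvements.
    for _ in range(10):
        child = order_crossover(*random.sample(population[:5], 2))
        if tour_length(child) < tour_length(population[-1]):
            population[-1] = child
            population.sort(key=tour_length)
    leader = population[0]
    if best is None or tour_length(leader) < tour_length(best):
        best = leader
    # Pheromone update: evaporate, then reinforce the best tour's edges.
    pheromone = [[0.9 * p for p in row] for row in pheromone]
    for k in range(n):
        i, j = best[k], best[(k + 1) % n]
        pheromone[i][j] += 1.0 / tour_length(best)
        pheromone[j][i] = pheromone[i][j]

print(round(tour_length(best), 3))
```

To adapt this to the two existing codes, keep the ACO's own construction and pheromone rules and keep the GA's own operators; the only glue needed is feeding ant-built tours into the GA population and letting the GA's best result drive the pheromone reinforcement.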