Slicing an image into color-based layers with scikit-learn
I have an image to which I applied KMeans color clustering. Now I need to present only the purple clusters in one image and the orange clusters in a different image. How do I do that?
See also questions close to this topic

KeyError when processing a pandas DataFrame
For a pathway p_i, the CNA data of associated genes were extracted from the CNV matrix (C), producing an intermediate matrix B ∈ R^(n×r_i), where r_i is the number of genes involved in the pathway p_i. That is, the matrix B consists of samples in rows and genes for a given pathway in columns. Using principal component analysis (PCA), the matrix B was decomposed into uncorrelated components, yielding G_{p_i} ∈ R^(n×q), where q is the number of principal components (PCs).
import pandas as pd
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import LabelEncoder
import csv

def get_kegg_pathways():
    kegg_pathways = []
    with open(directory + "hsa.txt", newline="") as keggfile:
        kegg = pd.read_csv(keggfile, sep="\t")
        for row in kegg:  # for row in kegg.itertuples():
            kegg_pathways.append(row)
    return kegg_pathways

def main():
    # Pathway info
    kegg = get_kegg_pathways()
    # q : number of principal components (PCs)
    # C : CNV matrix
    # G : mRNA expression matrix
    # M : DNA methylation matrix
    q = 5
    C = []
    G = []
    M = []
    # Process common data (denoted as matrix B)
    cna_sample_index = {}
    process_common = True
    if process_common:
        for i, p in enumerate(kegg):
            genes = {}
            first = True
            for s in p:
                if first:
                    first = False
                else:
                    if s != "NA":
                        genes[s] = 1
            # Loop through each sample
            B = []
            pathways = []
            for s in ld:
                B.append([])
                pathways.append(cna_sample_index[p])
            Bi = 0
            for index, row in cna.df.itertuples():
                if row[0].upper() in genes:
                    Bi2 = Bi
                    for c in pathways:
                        B[Bi2].append(cna.df.iloc[index, c])
                        Bi2 = Bi2 + 1
            pca_cna = cna.fit()
            pca_cna.fit(B)
Traceback:
File "/home/melissachua/main.py", line 208, in <module>
    main()
File "/home/melissachua/main.py", line 165, in main
    pathways.append(cna_sample_index[p])
KeyError: 'hsa00010_Glycolysis_/_Gluconeogenesis'
kegg:
                                       0    1
0  hsa00010_Glycolysis_/_Gluconeogenesis  NaN
1     hsa00020_Citrate_cycle_(TCA_cycle)  NaN
2     hsa00030_Pentose_phosphate_pathway  NaN

cna:
  Hugo_Symbol  TCGA02000101  TCGA02000102  TCGA02000103
0       0.001         0.002         0.003         0.004
1       0.005         0.006         0.007         0.008
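For what it's worth, the KeyError simply means that pathway string was never added as a key to `cna_sample_index` before the lookup. A minimal sketch of a defensive lookup (the dictionary contents here are hypothetical):

```python
# Hypothetical contents: the failing pathway is missing from the index
cna_sample_index = {"hsa00020_Citrate_cycle_(TCA_cycle)": 3}
p = "hsa00010_Glycolysis_/_Gluconeogenesis"

pathways = []
idx = cna_sample_index.get(p)   # returns None instead of raising KeyError
if idx is not None:
    pathways.append(idx)
else:
    # decide explicitly what a missing pathway means, e.g. skip it or log it
    pass
```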
Is there a way to use mutual information as part of a pipeline in scikit-learn?
I'm creating a model with scikit-learn. The pipeline that seems to be working best is:
1. mutual_info_classif with a threshold
2. PCA
3. LogisticRegression
I'd like to do them all using sklearn's pipeline object, but I'm not sure how to get the mutual info classification in. For the second and third steps I do:
pca = PCA(random_state=100)
lr = LogisticRegression(random_state=200)
pipe = Pipeline([
    ('dim_red', pca),
    ('pred', lr),
])
But I don't see a way to include the first step. I know I can create my own class to do this, and I will if I have to, but is there a way to do this within sklearn?
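One possibility, without writing a custom class, is scikit-learn's univariate feature-selection transformers, which accept `mutual_info_classif` as a score function. A sketch on synthetic data, with the threshold expressed as a percentile via `SelectPercentile` (which may or may not match the thresholding you had in mind):

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectPercentile, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Synthetic stand-in for the real data
X, y = make_classification(n_samples=200, n_features=30, random_state=0)

pipe = Pipeline([
    # keep the top 50% of features by mutual information with the target
    ('mi', SelectPercentile(mutual_info_classif, percentile=50)),
    ('dim_red', PCA(random_state=100)),
    ('pred', LogisticRegression(random_state=200, max_iter=1000)),
])
pipe.fit(X, y)
```

`SelectKBest(mutual_info_classif, k=...)` works the same way if you prefer a fixed feature count over a percentile.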

Why do KMedoids and hierarchical clustering return different results?
I have a huge DataFrame that contains only 0s and 1s. I used
scipy.cluster.hierarchy
to get the dendrogram, and then sch.fcluster
to extract the clusters at a specific cutoff (the metric for the distance matrix is Jaccard; the linkage method is "centroid"). However, when I wanted to determine the optimal number of clusters for my DataFrame, I found that KMedoids combined with the elbow method can help. Once I knew the best number of clusters, e.g. 2, I tried
KMedoids(n_clusters=2,metric='jaccard').fit(dataset)
to get the clusters, but the result differs from the hierarchical method. (The reason I don't use KMeans is that it is too slow for my DataFrame.) Therefore, I ran a test (rows with index 0, 1, 2, 3 will be grouped):
import pandas as pd
import numpy as np
from scipy.spatial.distance import pdist

label1 = np.random.choice([0, 1], size=20)
label2 = np.random.choice([0, 1], size=20)
label3 = np.random.choice([0, 1], size=20)
label4 = np.random.choice([0, 1], size=20)
dataset = pd.DataFrame([label1, label2, label3, label4])
dataset
Method KMedoids:
Since there are only 4 rows, the number of clusters was set to 2.
from sklearn_extra.cluster import KMedoids

cobj = KMedoids(n_clusters=2, metric='jaccard').fit(dataset)
labels = cobj.labels_
labels
The clustering result is shown below:
Method Hierarchical:
import scipy.cluster.hierarchy as sch

# calculate distance matrix
disMat = sch.distance.pdist(dataset, metric='jaccard')
disMat1 = sch.distance.squareform(disMat)
# cluster:
Z2 = sch.linkage(disMat1, method='centroid')
sch.fcluster(Z2, t=1, criterion='distance')
To get the same number of clusters, I tried several cutoffs; the number of clusters was 2 when the cutoff was set to 1. Here is the result:
I also read that the DataFrame passed to KMedoids should be the original data, not the distance matrix, but it seems that KMedoids converts the original DataFrame into a new one for some reason, because I got this data conversion warning:
DataConversionWarning: Data was converted to boolean for metric jaccard warnings.warn(msg, DataConversionWarning)
I also got a warning when performing the hierarchical method:
ClusterWarning: scipy.cluster: The symmetric nonnegative hollow observation matrix looks suspiciously like an uncondensed distance matrix
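For what it's worth, that ClusterWarning usually means a square distance matrix was passed where SciPy expected the condensed vector returned by pdist; sch.linkage treats a 2-D input as raw observations, not distances. A sketch of the call with the condensed matrix, on a synthetic stand-in dataset:

```python
import numpy as np
import scipy.cluster.hierarchy as sch
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
dataset = rng.integers(0, 2, size=(4, 20))      # synthetic stand-in for the real data

disMat = pdist(dataset, metric='jaccard')       # condensed form, shape (n*(n-1)/2,)
Z2 = sch.linkage(disMat, method='centroid')     # pass the condensed matrix directly
clusters = sch.fcluster(Z2, t=1, criterion='distance')
```

Note also that centroid linkage formally assumes Euclidean distances, so 'average' linkage may pair better with Jaccard.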
Purpose:
What I want is a method that returns the clusters when I already know the optimal number of clusters. The hierarchical method requires trying different cutoffs, while KMedoids doesn't, but it returns a different result.
Can anybody explain this to me? And are there better ways to perform clustering?
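If the optimal number of clusters is already known, one way to avoid hunting for a cutoff is fcluster's 'maxclust' criterion, which asks directly for a given number of flat clusters; a sketch on a synthetic binary dataset:

```python
import numpy as np
import scipy.cluster.hierarchy as sch
from scipy.spatial.distance import pdist

rng = np.random.default_rng(1)
dataset = rng.integers(0, 2, size=(4, 20))      # synthetic stand-in for the real data

# average linkage on the condensed Jaccard distances
Z = sch.linkage(pdist(dataset, metric='jaccard'), method='average')

# ask for at most 2 flat clusters instead of tuning a distance cutoff
labels = sch.fcluster(Z, t=2, criterion='maxclust')
```

scikit-learn's AgglomerativeClustering can likewise take n_clusters directly, including with a precomputed distance matrix.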

How to calculate the Rand index for a KMeans clustering?
I want to calculate the Rand index after applying KMeans clustering that repeats 30 times; then from the results I need to calculate the mean and std of the Rand index.
I have already tried to do it, but the only value that I get is 1.0.
This is what I've done so far:
kmeans_model = KMeans(n_clusters=2, random_state=1, max_iter=30, init="random").fit(data)
y = kmeans_model.fit_predict(data[data.columns])
cluster_labels = kmeans_model.labels_
sample_silhouette_values = metrics.silhouette_samples(data, cluster_labels)
sample_randIndex_values = metrics.adjusted_rand_score(y, kmeans_model.labels_)
silhouette_score_list = []
rand_index_list = []
for label in range(2):
    silhouette_score_list.append(sample_silhouette_values[cluster_labels == label].mean())
print(silhouette_score_list)
data['class'] = y
sample_randIndex_values
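A side note on the constant 1.0: `adjusted_rand_score(y, kmeans_model.labels_)` compares a labeling with itself, since `y` and `labels_` come from the same fit, so it is always 1.0. The Rand index needs two different labelings, e.g. each run's result against reference labels. A sketch with synthetic data (make_blobs stands in for your data and true classes):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Synthetic stand-in: X is the data, y_true the reference labels
X, y_true = make_blobs(n_samples=200, centers=2, random_state=0)

scores = []
for seed in range(30):                          # 30 independent KMeans runs
    labels = KMeans(n_clusters=2, init="random", n_init=1,
                    random_state=seed).fit_predict(X)
    scores.append(adjusted_rand_score(y_true, labels))

mean_ari, std_ari = np.mean(scores), np.std(scores)
```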

Object segmentation using mean shift
I have this image:
I am interested in segmenting only the objects that appear in the image, so I did something like this:
import numpy as np
import cv2
from sklearn.cluster import MeanShift, estimate_bandwidth
# from skimage.color import rgb2lab

# Loading original image
originImg = cv2.imread('test/2019_00254.jpg')

# Shape of original image
originShape = originImg.shape

# Converting image into array of dimension [nb of pixels in originImage, 3]
# based on r g b intensities
flatImg = np.reshape(originImg, [-1, 3])

# Estimate bandwidth for meanshift algorithm
bandwidth = estimate_bandwidth(flatImg, quantile=0.1, n_samples=100)
ms = MeanShift(bandwidth=bandwidth, bin_seeding=True)

# Performing meanshift on flatImg
ms.fit(flatImg)

# (r,g,b) vectors corresponding to the different clusters after meanshift
labels = ms.labels_

# Remaining colors after meanshift
cluster_centers = ms.cluster_centers_

# Finding and displaying the number of clusters
labels_unique = np.unique(labels)
n_clusters_ = len(labels_unique)
print("number of estimated clusters : %d" % n_clusters_)

segmentedImg = cluster_centers[np.reshape(labels, originShape[:2])]
cv2.imshow('Image', segmentedImg.astype(np.uint8))
cv2.waitKey(0)
cv2.destroyAllWindows()
But the problem is that it segments the whole image, including the background. How can I run segmentation on the objects only? Note that I have the bounding-box coordinates of each object.
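Since the bounding-box coordinates are available, one option is to run mean shift separately on the pixels inside each box and paste the result back, leaving the background untouched. A sketch with a synthetic image and hypothetical boxes (replace img with the cv2.imread result and bboxes with your real coordinates):

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth

img = np.random.randint(0, 255, (60, 80, 3), dtype=np.uint8)  # stand-in for cv2.imread(...)
bboxes = [(5, 5, 30, 30), (35, 40, 55, 75)]                   # hypothetical (y0, x0, y1, x1)

segmented = img.copy()
for (y0, x0, y1, x1) in bboxes:
    # cluster only the pixels inside this bounding box
    roi = img[y0:y1, x0:x1].reshape(-1, 3).astype(float)
    bw = estimate_bandwidth(roi, quantile=0.1, n_samples=100)
    ms = MeanShift(bandwidth=bw, bin_seeding=True).fit(roi)
    # replace each pixel with its cluster center and paste back
    seg = ms.cluster_centers_[ms.labels_].reshape(y1 - y0, x1 - x0, 3)
    segmented[y0:y1, x0:x1] = seg.astype(np.uint8)
```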

Texture transformation
I am working on eigen-transformed texture to detect objects in an image. This work was published in ACCV 2006, page 71. The full PDF is available as chapter 3 of https://www.diva-portal.org/smash/get/diva2:275069/FULLTEXT01.pdf. I am not able to follow the steps after getting the texture descriptors. I took patches of 32×32 and, for every patch, calculated eigenvalues to get the texture descriptor. What to do with these texture descriptors afterwards is what I cannot follow. Any help to unblock would be really appreciated.
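I can't speak for the paper's exact pipeline, but the descriptor stage described above can be sketched as follows: tile a grayscale image into 32×32 patches and take each patch's singular values (the square roots of the eigenvalues of P^T P) as its descriptor. A typical next step would be to stack these vectors and feed them to a classifier or a clustering step.

```python
import numpy as np

img = np.random.rand(128, 128)   # stand-in for a real grayscale image
patch = 32

descriptors = []
for i in range(0, img.shape[0] - patch + 1, patch):
    for j in range(0, img.shape[1] - patch + 1, patch):
        p = img[i:i + patch, j:j + patch]
        s = np.linalg.svd(p, compute_uv=False)   # 32 singular values, descending
        descriptors.append(s)

descriptors = np.array(descriptors)   # shape: (num_patches, 32)
```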