Document Cluster

I am clustering news articles. I used the Universal Sentence Encoder for the document embeddings and passed a precomputed cosine-distance matrix (4000×4000, one row per document) to the HDBSCAN clustering algorithm, but the resulting clusters are very problematic. I should note that my dataset is quite noisy: it contains ads, comments, etc. that are passed off as articles. What would be the best approach to get good results? My initial idea is to apply a dimensionality reduction technique (e.g. UMAP or NMF) and then pass the reduced vectors to KMeans. If I do reduce dimensionality, how should I choose the target number of dimensions (to what extent should I reduce)?
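For reference, here is a minimal sketch of the pipeline I have in mind, with synthetic vectors standing in for the Universal Sentence Encoder output and PCA standing in for UMAP (`umap.UMAP(n_components=..., metric="cosine")` would slot in the same way); the sizes, component count, and cluster count are placeholders:

```python
import numpy as np
from sklearn.preprocessing import normalize
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Stand-in for USE output: 4000 documents x 512-dim embeddings.
X = rng.normal(size=(4000, 512)).astype(np.float32)

# L2-normalize so Euclidean distance in the reduced space roughly
# tracks cosine distance on the original embeddings.
X = normalize(X)

# Reduce dimensionality; 50 components is an arbitrary placeholder.
reduced = PCA(n_components=50, random_state=0).fit_transform(X)

# Cluster the reduced vectors; 20 clusters is also a placeholder.
labels = KMeans(n_clusters=20, n_init=10, random_state=0).fit_predict(reduced)
print(labels.shape)  # one cluster id per document
```

My open question is how to pick `n_components` here in a principled way rather than guessing.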

PS. This is my first time working on an NLP task, so please don't be too harsh.