Hi all,
Since we cannot use an accuracy chart for clustering algorithms, how can we verify the accuracy of clustering models in terms of their classification and regression tasks?
Thank you very much in advance for your guidance and advice.
With best regards,
Yours sincerely,
Actually, you can use an accuracy chart if you define any column as predictable for the clustering model. You can verify this with the Targeted Mailing data set from Adventure Works DW: create a Microsoft Clustering model and set Bike Buyer as Predict. After you process the model, you will be able to get a lift chart, a profit chart, and a classification matrix for your clustering model.
Good luck,
|||Of course, the strength of clustering is finding intrinsic categories in your data to help you understand it better. You can use the Cluster Diagram, Cluster Profiles, Cluster Characteristics, and Cluster Discrimination views to explore your data. If you are a domain expert, these tools should also help you validate the quality of the resulting clusters.
We have different clustering algorithms, such as K-Means and EM. You can also use our live sample, Clustering Art, to try different clustering algorithms. This will help you understand each algorithm better. If you feel like it, you could also compose your own data set with interesting categories and see whether a specific clustering algorithm can successfully locate them.
Good luck,
|||Hi, Yimin Wu,
Thank you very much indeed.
With best regards,
Yours sincerely,
|||
You can also run
SELECT TOP 1 MSOLAP_NODE_SCORE FROM MyClusterModel.CONTENT
to get a number indicating how well the model clusters the training cases. This number is only comparable between models built on the same data set; comparing it across disparate models would be meaningless.
A better way to determine cluster "accuracy" is to create a test set and then average the PredictCaseLikelihood() of all the cases in the test set. The higher the result, the better the cluster model works against new data.
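To illustrate the idea of averaging case likelihoods over a test set, here is a minimal Python sketch. It is not how Analysis Services computes PredictCaseLikelihood(), and its raw density values are not on the same scale. The two models, their (weight, mean, std) parameters, and the test cases are all hypothetical, chosen only to show that a model whose clusters match the held-out data gets the higher average.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def case_likelihood(x, clusters):
    """Likelihood of one case under a mixture: sum of weight * density."""
    return sum(w * gaussian_pdf(x, m, s) for (w, m, s) in clusters)

def mean_case_likelihood(cases, clusters):
    """Average per-case likelihood over a held-out test set."""
    return sum(case_likelihood(x, clusters) for x in cases) / len(cases)

# Hypothetical cluster models, each a list of (weight, mean, std) tuples.
model_a = [(0.5, 0.0, 1.0), (0.5, 10.0, 1.0)]  # two clusters near 0 and 10
model_b = [(1.0, 5.0, 1.0)]                    # one cluster near 5

# Hypothetical held-out cases that were not used for training.
test_cases = [-0.5, 0.3, 9.6, 10.4]

score_a = mean_case_likelihood(test_cases, model_a)
score_b = mean_case_likelihood(test_cases, model_b)

# The model with the higher average likelihood fits the new data better.
best = "model_a" if score_a > score_b else "model_b"
print(best, score_a, score_b)
```

As with MSOLAP_NODE_SCORE, the comparison only makes sense when both models are scored against the same test cases.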
|||Hi, Jamie, thanks a lot for your guidance. Very helpful.
With best regards,
Yours sincerely,