Hi, I am working on a project to detect heart failure, and I want to use the k-means algorithm for clustering and an SVM for classification. Can I split the dataset into training and testing sets? Since I am using k-means, is that OK? Please help... thanks.

Answer:

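Yes, you can split the dataset even though you are using k-means. You can split it randomly into two sets, split it sequentially (the first part for training, the rest for testing), or split it into large, temporally adjacent blocks. ANOVA-style tests can then tell you whether the results differ significantly between those splits.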
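Here is a minimal sketch of those three kinds of split in plain Python. The function names and the 20% test fraction are my own choices, not from any library, and `block_splits` assumes the rows are already in time order. Whichever split you use, fit k-means and the SVM on the training part only, then just assign/classify the test rows, so nothing from the test set leaks into the model.

```python
import random

def random_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows, then hold out the last test_fraction of them."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) - round(len(shuffled) * test_fraction)
    return shuffled[:cut], shuffled[cut:]

def sequential_split(rows, test_fraction=0.2):
    """Keep the original order: earlier rows train, the last rows test."""
    cut = len(rows) - round(len(rows) * test_fraction)
    return rows[:cut], rows[cut:]

def block_splits(rows, n_blocks=5):
    """Cut time-ordered rows into contiguous blocks and hold each block out once.

    The last block absorbs any remainder so that no row is dropped.
    """
    size = len(rows) // n_blocks
    bounds = [i * size for i in range(n_blocks)] + [len(rows)]
    blocks = [rows[bounds[i]:bounds[i + 1]] for i in range(n_blocks)]
    for i, test in enumerate(blocks):
        train = [row for j, block in enumerate(blocks) if j != i for row in block]
        yield train, test

# Hypothetical example: 10 patient records identified only by their index.
records = list(range(10))
train, test = random_split(records)
print(len(train), len(test))            # 8 2
print(sequential_split(records))        # ([0, 1, 2, 3, 4, 5, 6, 7], [8, 9])
for train, test in block_splits(records, n_blocks=3):
    print(test)
```

If you use scikit-learn instead, `train_test_split` covers the random case and `KFold(n_splits=5, shuffle=False)` gives similar contiguous blocks.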

Answer:

Take a look at these two papers on choosing the best k for k-means clustering and on how to split a dataset into test and analysis data:

An Approach for Characterizing Workloads in Google Cloud to Derive Realistic Resource Utilization Models

2013 IEEE Seventh International Symposium on Service-Oriented System Engineering

Ismael Solis Moreno, Peter Garraghan, Paul Townend, Jie Xu; School of Computing, University of Leeds, Leeds, UK

{scism, scpmg, p.m.townend, j.xu}@leeds.ac.uk

==================================================

Analysis, Modeling and Simulation of Workload Patterns in a Large-Scale Utility Cloud

Ismael Solis Moreno, Peter Garraghan, Paul Townend, and Jie Xu, Member, IEEE

IEEE Transactions on Cloud Computing, Vol. 2, No. 2, April-June 2014
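Separately from those papers, here is a minimal sketch of one common way to pick k, the elbow heuristic (I have not checked whether the papers use this exact method). It holds out a test set first, computes the within-cluster sum of squared errors (SSE) for k = 1..6 on the analysis set only, and then assigns the test points to centroids learned from the analysis set. The toy data, the 20% test fraction, and the helper names are hypothetical; the random restarts play the same role as `n_init` in scikit-learn's `KMeans`.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda c: sq_dist(point, centroids[c]))

def sse(points, centroids):
    """Within-cluster sum of squared distances (what the elbow plot shows)."""
    return sum(sq_dist(p, centroids[nearest(p, centroids)]) for p in points)

def kmeans(points, k, iterations=50, restarts=10):
    """Lloyd's algorithm with random restarts; keeps the lowest-SSE run."""
    best = None
    for seed in range(restarts):
        centroids = [list(p) for p in random.Random(seed).sample(points, k)]
        for _ in range(iterations):
            groups = [[] for _ in range(k)]
            for p in points:
                groups[nearest(p, centroids)].append(p)
            for c, members in enumerate(groups):
                if members:  # an empty cluster keeps its previous centroid
                    centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
        if best is None or sse(points, centroids) < sse(points, best):
            best = centroids
    return best

# Hypothetical toy data: three well-separated 2-D groups stand in for real features.
rng = random.Random(1)
data = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
        for cx, cy in [(0, 0), (6, 0), (0, 6)] for _ in range(40)]

# Hold out 20% as a test set before any clustering is done.
rng.shuffle(data)
cut = len(data) - round(len(data) * 0.2)
analysis, test = data[:cut], data[cut:]

# Elbow heuristic: SSE on the analysis set only, for k = 1..6.
curve = {k: sse(analysis, kmeans(analysis, k)) for k in range(1, 7)}
for k, value in curve.items():
    print(k, round(value, 1))

# Suppose the elbow is at k = 3: fit once on the analysis set, then only
# assign test points to the learned centroids (no refitting on test data).
centroids = kmeans(analysis, 3)
test_clusters = [nearest(p, centroids) for p in test]
```

Pick the k after which the printed SSE stops dropping sharply. On this toy data that should be around 3, but on real heart-failure features the elbow is often much less clear.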