Hi, I am working on a project that detects heart failure, and I want to use the k-means algorithm for clustering and an SVM for classification. Can I split the dataset into training and testing sets? Is that OK given that I am using k-means? Please help. Thanks.
CodePudding user response:
Yes, you can split the data randomly into two sets. You can also split it into sequential sets, or into large temporally adjacent sets. That is what the ANOVA tests are all about.
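A minimal sketch of the random split, assuming scikit-learn is available. The question does not say how the clusters feed into the classifier, so this example assumes the distances to the k-means centroids are appended as extra features for the SVM. The synthetic data from `make_classification` is only a stand-in for the heart failure dataset, and `n_clusters=3` is an arbitrary choice. The point it illustrates is that the scaler and k-means are fitted on the training set only, so the test set stays unseen until evaluation.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the heart failure data (assumption: the real
# dataset would be loaded here instead).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Split first, so the test set is never seen by any fitted step.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Fit the scaler on the training set only, then apply it to both sets.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Fit k-means on the training set only (n_clusters=3 is an arbitrary choice).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_train_s)

# Assumed design: append the distance to each centroid as extra features.
train_feats = np.hstack([X_train_s, kmeans.transform(X_train_s)])
test_feats = np.hstack([X_test_s, kmeans.transform(X_test_s)])

# Train the SVM on the training features and score it on the held-out set.
clf = SVC(kernel="rbf").fit(train_feats, y_train)
accuracy = clf.score(test_feats, y_test)
print(f"test accuracy: {accuracy:.3f}")
```

For a sequential or temporal split, the same pattern should apply: replace the `train_test_split` call with a slice of the ordered data and keep every `fit` call on the training portion.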
CodePudding user response:
Take a look at these two papers on choosing the best k in k-means clustering and on how to split a dataset into test and analysis data:
Ismael Solis Moreno, Peter Garraghan, Paul Townend, and Jie Xu (School of Computing, University of Leeds, Leeds, UK). "An Approach for Characterizing Workloads in Google Cloud to Derive Realistic Resource Utilization Models." 2013 IEEE Seventh International Symposium on Service-Oriented System Engineering.

Ismael Solis Moreno, Peter Garraghan, Paul Townend, and Jie Xu. "Analysis, Modeling and Simulation of Workload Patterns in a Large-Scale Utility Cloud." IEEE Transactions on Cloud Computing, vol. 2, no. 2, April-June 2014.
