Evaluating subset selection methods for use case points estimation

Šilhavý, Radek; Šilhavý, Petr; Prokopová, Zdenka

dc.title	Evaluating subset selection methods for use case points estimation	en
dc.contributor.author	Šilhavý, Radek
dc.contributor.author	Šilhavý, Petr
dc.contributor.author	Prokopová, Zdenka
dc.relation.ispartof	Information and Software Technology
dc.identifier.issn	0950-5849
dc.date.issued	2018
utb.relation.volume	97
dc.citation.spage	1
dc.citation.epage	9
dc.type	article
dc.language.iso	en
dc.publisher	Elsevier Science BV
dc.identifier.doi	10.1016/j.infsof.2017.12.009
dc.relation.uri	https://www.sciencedirect.com/science/article/pii/S0950584917305153
dc.subject	Software Development Effort Estimation	en
dc.subject	Software size estimation	en
dc.subject	Clustering techniques	en
dc.subject	Spectral Clustering	en
dc.subject	K-means	en
dc.subject	Moving Window	en
dc.subject	Use Case Points	en
dc.description.abstract	When the Use Case Points method is used for software effort estimation, users face low model accuracy, which affects its practical application. This study investigates the significance of subset selection methods for the prediction accuracy of Multiple Linear Regression models obtained by the stepwise approach. K-means, Spectral Clustering, the Gaussian Mixture Model and Moving Window are evaluated as appropriate subset selection techniques. The methods were assessed against several evaluation criteria and then tested statistically. Evaluation was performed on two independent datasets, which differ in project types and size. Both were split by the hold-out method. When clustering was used, the training set was clustered into 3 classes and an independent regression model was created for each class; these models were then used to predict the testing set. When Moving Window was used, windows of sizes 5, 10 and 15 were tested. The results show that clustering techniques decrease prediction errors significantly compared with the Use Case Points or Moving Window methods. Spectral Clustering was selected as the best-performing solution because it reduces the Sum of Squared Errors by 32% for the first dataset and by 98% for the second dataset. For the second dataset, the Mean Absolute Percentage Error is less than 1% for Spectral Clustering, 9% for Moving Window and 27% for Use Case Points. For the first dataset, prediction errors are significantly higher: 53% for Spectral Clustering, compared with 165% for Use Case Points. It can be concluded that subset selection techniques are a significant method for improving the prediction ability of the linear regression models used for software development effort prediction, and that clustering performs better than the Moving Window method.	en
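The abstract's two subset selection strategies can be illustrated with a small sketch. The clustering strategy partitions the training set into 3 classes, fits an independent regression model per class, and routes each test project to its class model. The Moving Window strategy fits a single model on only the most recent projects. The sketch also shows the two error measures quoted above (Sum of Squared Errors and Mean Absolute Percentage Error). It is a minimal illustration under stated assumptions, not the authors' implementation. It uses a single hypothetical size predictor, plain ordinary least squares in place of the study's stepwise Multiple Linear Regression, and Lloyd's k-means (one of the four evaluated techniques) on that one feature instead of Spectral Clustering. It also assumes training projects are listed in completion order for the window. All function names and the toy data are hypothetical.

```python
# Sketch only: univariate OLS stands in for the study's stepwise MLR,
# and 1-D k-means stands in for the clustering techniques evaluated.
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for effort = a + b * size (one predictor)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # all sizes equal (e.g. a one-project class): predict mean effort
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def nearest(x, centroids):
    """Index of the centroid closest to x (ties go to the lowest index)."""
    return min(range(len(centroids)), key=lambda c: abs(x - centroids[c]))


def kmeans_1d(xs, k, iters=100):
    """Lloyd's k-means on one feature, centroids initialised evenly over sorted data."""
    srt = sorted(xs)
    step = max(k - 1, 1)
    centroids = [srt[i * (len(srt) - 1) // step] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            groups[nearest(x, centroids)].append(x)
        # keep the old centroid if a group ends up empty
        updated = [mean(g) if g else centroids[c] for c, g in enumerate(groups)]
        if updated == centroids:  # converged
            break
        centroids = updated
    return centroids


def cluster_models(train_x, train_y, k=3):
    """Cluster the training set into k classes and fit one regression per class."""
    centroids = kmeans_1d(train_x, k)
    models = []
    for c in range(k):
        pts = [(x, y) for x, y in zip(train_x, train_y) if nearest(x, centroids) == c]
        if not pts:  # empty class: fall back to a model fitted on all training data
            models.append(fit_line(train_x, train_y))
            continue
        xs, ys = zip(*pts)
        models.append(fit_line(xs, ys))
    return centroids, models


def predict_clustered(x, centroids, models):
    """Route a test project to its nearest class and use that class's model."""
    a, b = models[nearest(x, centroids)]
    return a + b * x


def predict_moving_window(x, train_x, train_y, window):
    """Fit only on the `window` most recent projects (lists assumed in completion order)."""
    a, b = fit_line(train_x[-window:], train_y[-window:])
    return a + b * x


def sse(actual, predicted):
    """Sum of Squared Errors."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))


def mape(actual, predicted):
    """Mean Absolute Percentage Error in percent (assumes nonzero actual effort)."""
    return 100 * mean(abs(a - p) / a for a, p in zip(actual, predicted))


if __name__ == "__main__":
    # Hypothetical toy data: three size regimes with different effort trends.
    train_x = [10, 12, 14, 50, 52, 54, 100, 102, 104]
    train_y = [20, 24, 28, 160, 166, 172, 400, 410, 420]
    test_x, test_y = [13, 51, 103], [26, 163, 415]

    centroids, models = cluster_models(train_x, train_y, k=3)
    clustered = [predict_clustered(x, centroids, models) for x in test_x]
    a, b = fit_line(train_x, train_y)  # single global model, no subset selection
    single = [a + b * x for x in test_x]
    print(clustered)  # → [26.0, 163.0, 415.0]
    print(sse(test_y, clustered), sse(test_y, single))
    print(mape(test_y, clustered), mape(test_y, single))
```

On the same toy data, `predict_moving_window(13, train_x, train_y, 3)` fits only the three most recent (large) projects and returns -35.0 for a small project. This shows how a window can discard relevant history when project types are mixed. The toy data were constructed so that each class is exactly linear, so the numbers here are illustrative only and say nothing about the study's datasets.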
utb.faculty	Faculty of Applied Informatics
dc.identifier.uri	http://hdl.handle.net/10563/1007858
utb.identifier.obdid	43878637
utb.identifier.scopus	2-s2.0-85039969351
utb.identifier.wok	000428008600001
utb.source	j-wok
dc.date.accessioned	2018-04-23T15:01:48Z
dc.date.available	2018-04-23T15:01:48Z
dc.rights	Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri	https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.rights.access	openAccess
utb.contributor.internalauthor	Šilhavý, Radek
utb.contributor.internalauthor	Šilhavý, Petr
utb.contributor.internalauthor	Prokopová, Zdenka
utb.fulltext.affiliation	Radek Silhavy, Petr Silhavy, Zdenka Prokopova, Tomas Bata University in Zlin, Faculty of Applied Informatics, Nad Stranemi 4511, Zlin 76001, Czech Republic. Corresponding author: R. Silhavy (e-mail addresses: radek@silhavy.cz, rsilhavy@fai.utb.cz).
utb.fulltext.dates	Received 30 June 2017 Received in revised form 18 December 2017 Accepted 21 December 2017 Available online 28 December 2017
utb.fulltext.sponsorship	-
utb.wos.affiliation	[Silhavy, Radek; Silhavy, Petr; Prokopova, Zdenka] Tomas Bata Univ Zlin, Fac Appl Informat, Nad Stranemi 4511, Zlin 76001, Czech Republic
utb.scopus.affiliation	Silhavy R., Tomas Bata University in Zlin, Faculty of Applied Informatics, Nad Stranemi 4511, Zlin, 76001, Czech Republic; Silhavy P., Tomas Bata University in Zlin, Faculty of Applied Informatics, Nad Stranemi 4511, Zlin, 76001, Czech Republic; Prokopova Z., Tomas Bata University in Zlin, Faculty of Applied Informatics, Nad Stranemi 4511, Zlin, 76001, Czech Republic
utb.fulltext.projects	-