UTB Publications
Repository of UTB publication activity

Towards improving the efficiency of software development effort estimation via clustering analysis


dc.title Towards improving the efficiency of software development effort estimation via clustering analysis en
dc.contributor.author Vo Van, Hai
dc.contributor.author Ho, Le Thi Kim Nhung
dc.contributor.author Prokopová, Zdenka
dc.contributor.author Šilhavý, Radek
dc.contributor.author Šilhavý, Petr
dc.relation.ispartof IEEE Access
dc.identifier.issn 2169-3536
dc.date.issued 2022
utb.relation.volume 10
dc.citation.spage 83249
dc.citation.epage 83264
dc.type article
dc.language.iso en
dc.publisher Institute of Electrical and Electronics Engineers Inc.
dc.identifier.doi 10.1109/ACCESS.2022.3185393
dc.relation.uri https://ieeexplore.ieee.org/document/9803030
dc.subject software effort estimation en
dc.subject Function Point Analysis en
dc.subject dataset clustering en
dc.subject K-Means en
dc.subject categorical variables en
dc.subject machine learning en
dc.subject clustering methods en
dc.subject computational modeling en
dc.subject estimation en
dc.subject industries en
dc.subject organizations en
dc.subject software en
dc.subject software algorithms en
dc.description.abstract Introduction: Accurately estimating software effort is a major challenge for project managers during software development. Inaccurate forecasts lead to overestimation or underestimation of effort, either of which can be detrimental to stakeholders. The Function Point Analysis (FPA) method of the International Function Point Users Group is one of the most important methods for software effort estimation. However, the practice of applying the FPA method in the same way across all software areas needs to be reexamined. Aim: We propose a model for evaluating the influence of data clustering on software development effort estimation and for identifying the best clustering method. We call this model the effort estimation using machine learning applied to the clusters (EEAC) model. Method: We partition the dataset with each clustering method and then apply the FPA and EEAC methods to the resulting clusters to estimate effort. The clustering methods used in this study comprise five categorical-variable criteria (Development Platform, Industry Sector, Language Type, Organization Type, and Relative Size) and the k-means clustering algorithm. Results: The experimental results show that the estimation accuracy obtained with clustering consistently exceeds the accuracy obtained without clustering for both the FPA and EEAC methods. Notably, for the FPA method, the average improvement rate of clustering over no clustering was highest, at 58.06%, in terms of the RMSE. For the EEAC method, this figure reached 65.53%. The Industry Sector categorical variable yields the best estimation accuracy among the tested clustering criteria, including k-means clustering. With this criterion, the improvement in RMSE is 63.68% for the FPA method and 72.02% for the EEAC method. Conclusion: Dataset clustering yields better results than no clustering for both the FPA and EEAC methods. Industry Sector is the most suitable of the tested clustering methods. en
utb.faculty Faculty of Applied Informatics
dc.identifier.uri http://hdl.handle.net/10563/1011054
utb.identifier.obdid 43884089
utb.identifier.scopus 2-s2.0-85133809040
utb.identifier.wok 000842087800001
utb.source j-scopus
dc.date.accessioned 2022-07-27T09:08:40Z
dc.date.available 2022-07-27T09:08:40Z
dc.description.sponsorship Faculty of Applied Informatics, Tomas Bata University in Zlin [IGA/CebiaTech/2022/001, RVO/FAI/2021/002]
dc.rights Attribution 4.0 International
dc.rights.uri https://creativecommons.org/licenses/by/4.0/
dc.rights.access openAccess
utb.contributor.internalauthor Vo Van, Hai
utb.contributor.internalauthor Ho, Le Thi Kim Nhung
utb.contributor.internalauthor Prokopová, Zdenka
utb.contributor.internalauthor Šilhavý, Radek
utb.contributor.internalauthor Šilhavý, Petr
utb.fulltext.affiliation Vo Van Hai (1), Ho Le Thi Kim Nhung (1), Zdenka Prokopova (1), Radek Silhavy (1), and Petr Silhavy (1); (1) Faculty of Applied Informatics, Tomas Bata University in Zlin, 75501 Zlin, Czech Republic. Corresponding author:
utb.fulltext.sponsorship This study was supported by the Faculty of Applied Informatics, Tomas Bata University in Zlin, under projects IGA/CebiaTech/2022/001 and RVO/FAI/2021/002.
utb.wos.affiliation [Vo Van Hai; Ho Le Thi Kim Nhung; Prokopova, Zdenka; Silhavy, Radek; Silhavy, Petr] Tomas Bata Univ Zlin, Fac Appl Informat, Zlin 75501, Czech Republic
utb.scopus.affiliation Faculty of Applied Informatics, Tomas Bata University in Zlin, Zlin, Czech Republic
utb.fulltext.projects IGA/CebiaTech/2022/001
utb.fulltext.projects RVO/FAI/2021/002
utb.fulltext.faculty Faculty of Applied Informatics
utb.fulltext.ou -
utb.identifier.jel -
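
The Method step in the abstract (cluster the dataset, estimate effort per cluster, then compare RMSE with and without clustering) can be illustrated with a minimal sketch. It is not the authors' FPA or EEAC implementation: a simple least-squares line of effort against size stands in for the estimator, the improvement rate is assumed to be the percentage reduction in RMSE relative to the unclustered baseline, errors are computed in-sample, and the six projects are made-up toy values rather than ISBSG data. All function and field names are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def fit_line(points):
    """Least-squares fit of effort = a + b * size for (size, effort) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:  # all sizes equal: fall back to predicting the mean effort
        return mean_y, 0.0
    b = sum((x - mean_x) * (y - mean_y) for x, y in points) / sxx
    return mean_y - b * mean_x, b


def rmse(actual, predicted):
    """Root mean square error between two equally long sequences."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def improvement_rate(rmse_baseline, rmse_clustered):
    """Percentage reduction in RMSE versus the baseline (assumed definition)."""
    return (rmse_baseline - rmse_clustered) / rmse_baseline * 100.0


def compare(projects, key):
    """Return (RMSE without clustering, RMSE with clustering on `key`)."""
    actual = [p["effort"] for p in projects]

    # Baseline: one model fitted on the whole dataset.
    a, b = fit_line([(p["size"], p["effort"]) for p in projects])
    global_pred = [a + b * p["size"] for p in projects]

    # Clustered: one model per value of the categorical variable.
    clusters = defaultdict(list)
    for p in projects:
        clusters[p[key]].append((p["size"], p["effort"]))
    models = {label: fit_line(pts) for label, pts in clusters.items()}
    clustered_pred = [
        models[p[key]][0] + models[p[key]][1] * p["size"] for p in projects
    ]

    return rmse(actual, global_pred), rmse(actual, clustered_pred)


# Made-up toy projects (illustration only, not taken from the study).
projects = [
    {"sector": "Banking", "size": 100, "effort": 1000},
    {"sector": "Banking", "size": 200, "effort": 2000},
    {"sector": "Banking", "size": 300, "effort": 3000},
    {"sector": "Government", "size": 100, "effort": 400},
    {"sector": "Government", "size": 200, "effort": 800},
    {"sector": "Government", "size": 300, "effort": 1200},
]

base, clustered = compare(projects, "sector")
print(f"RMSE without clustering: {base:.2f}")
print(f"RMSE with clustering:    {clustered:.2f}")
print(f"Improvement rate:        {improvement_rate(base, clustered):.2f}%")
```

In this toy data the two sectors have different productivity, so a single global model fits neither well, while the per-sector models fit each group closely. A realistic comparison would evaluate on held-out projects rather than on the training data.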

Except where otherwise noted, this record is licensed under Attribution 4.0 International.