Comparing multiple linear regression, deep learning and multiple perceptron for functional points estimation

Huynh Thai, Hoc; Šilhavý, Radek; Prokopová, Zdenka; Šilhavý, Petr

dc.title	Comparing multiple linear regression, deep learning and multiple perceptron for functional points estimation	en
dc.contributor.author	Huynh Thai, Hoc
dc.contributor.author	Šilhavý, Radek
dc.contributor.author	Prokopová, Zdenka
dc.contributor.author	Šilhavý, Petr
dc.relation.ispartof	IEEE Access
dc.identifier.issn	2169-3536
dc.date.issued	2022
utb.relation.volume	10
dc.citation.spage	112187
dc.citation.epage	112198
dc.type	article
dc.language.iso	en
dc.publisher	Institute of Electrical and Electronics Engineers Inc.
dc.identifier.doi	10.1109/ACCESS.2022.3215987
dc.relation.uri	https://ieeexplore.ieee.org/document/9925239
dc.relation.uri	https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9925239
dc.subject	function point analysis	en
dc.subject	industry sector	en
dc.subject	multiple linear regression	en
dc.subject	multiple perceptron neural network	en
dc.subject	one-hot encoding	en
dc.subject	relative size	en
dc.subject	software effort estimation	en
dc.subject	software work effort	en
dc.description.abstract	This study compares PyTorch-based Deep Learning and Multiple Perceptron Neural Networks with Multiple Linear Regression for software effort estimation based on function point analysis. The predictors investigated are Adjusted Function Points, Function Point Categories, Industry Sector, and Relative Size, and the ISBSG dataset (version 2020/R1) serves as the historical dataset. Estimation performance is compared across the models using the prediction level at 0.30 (PRED(0.30)) and standardized accuracy (SA). According to the findings, the Multiple Perceptron Neural Network using Adjusted Function Points combined with Industry Sector as predictors achieved an SA of 53% and a PRED(0.30) of 61%. PyTorch-based Deep Learning produced similar but even better results, reaching 72% for both SA and PRED(0.30). Both the PyTorch-based Deep Learning and the Multiple Perceptron models outperformed Multiple Linear Regression and the baseline models on the experimental dataset. Furthermore, in the studied dataset, Adjusted Function Points may not yield higher accuracy than Function Point Categories.	en
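The two accuracy criteria named in the abstract can be made concrete with a short sketch. It assumes the usual definitions from the effort-estimation literature: PRED(0.30) is the share of projects whose magnitude of relative error, |actual - predicted| / actual, is at most 0.30, and standardized accuracy is SA = (1 - MAR / MAR_P0) × 100, where MAR is the mean absolute residual and MAR_P0 is the MAR of random guessing. The function names are illustrative and not taken from the paper. MAR_P0 is computed here as its exact expectation over all ordered project pairs rather than by repeated random sampling; this record does not state which procedure the authors used.

```python
from statistics import mean


def mre(actual: float, predicted: float) -> float:
    """Magnitude of relative error of one estimate (assumes actual > 0)."""
    return abs(actual - predicted) / actual


def pred(actuals, predictions, level=0.30):
    """PRED(level): share of projects whose MRE is at most `level`."""
    hits = sum(1 for a, p in zip(actuals, predictions) if mre(a, p) <= level)
    return hits / len(actuals)


def mar(actuals, predictions):
    """Mean absolute residual of the estimates."""
    return mean(abs(a - p) for a, p in zip(actuals, predictions))


def mar_p0(actuals):
    """Expected MAR of random guessing, where each project's effort is
    'predicted' by the actual effort of another project drawn uniformly.
    Assumption: computed exactly as the mean |y_i - y_j| over all ordered
    pairs i != j instead of averaging many random runs."""
    n = len(actuals)
    total = sum(abs(actuals[i] - actuals[j])
                for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))


def standardized_accuracy(actuals, predictions):
    """SA in percent: 100 for perfect estimates, 0 for random guessing."""
    return (1 - mar(actuals, predictions) / mar_p0(actuals)) * 100
```

For instance, with hypothetical actual efforts [100, 200, 300, 400] and predictions [110, 150, 330, 600], three of the four relative errors fall within 0.30, so PRED(0.30) is 0.75, and SA comes out at about 56.5%.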
utb.faculty	Faculty of Applied Informatics
dc.identifier.uri	http://hdl.handle.net/10563/1011200
utb.identifier.obdid	43884050
utb.identifier.scopus	2-s2.0-85140787575
utb.identifier.wok	000875651600001
utb.source	j-scopus
dc.date.accessioned	2022-11-29T07:49:18Z
dc.date.available	2022-11-29T07:49:18Z
dc.description.sponsorship	Faculty of Applied Informatics, Tomas Bata University in Zlin [IGA/CebiaTech/2022/001, RVO/FAI/2021/002]
dc.rights	Attribution 4.0 International
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.rights.access	openAccess
utb.contributor.internalauthor	Huynh Thai, Hoc
utb.contributor.internalauthor	Šilhavý, Radek
utb.contributor.internalauthor	Prokopová, Zdenka
utb.contributor.internalauthor	Šilhavý, Petr
utb.fulltext.affiliation	HUYNH THAI HOC, RADEK SILHAVY, ZDENKA PROKOPOVA, AND PETR SILHAVY; Faculty of Applied Informatics, Tomas Bata University in Zlín, 76001 Zlín, Czech Republic. Corresponding author: Petr Silhavy (psilhavy@utb.cz). HUYNH THAI HOC was born in Tra Vinh, Vietnam, in 1980. He received the B.S. degree in mathematics and computer science from the University of Science (HCMUS), Vietnam, in 2002, and the M.S. degree in geographic information system from the University of Technology (HCMUT), Vietnam, in 2007. He is currently pursuing the Ph.D. degree in software engineering with Tomas Bata University in Zlín, Czech Republic. He worked as a GIS Developer at DITAGIS, HCMUT, from 2002 to 2007. From 2007 to 2014, he was a Lecturer with the Faculty of Information Technology, University of Natural Resources and Environment (HCMUNRE). From 2011 to 2018, he was a Lecturer with the Faculty of Information Technology, Industrial University of Ho Chi Minh City (IUH), Vietnam. From 2018 to 2019, he was a Lecturer with the Faculty of Information Technology, School of Engineering and Technology, Van Lang University, Ho Chi Minh City, Vietnam. His research interests include software effort estimation and data science. RADEK SILHAVY was born in Vsetin in 1980. He received the B.Sc., M.Sc., and Ph.D. degrees in engineering informatics from the Faculty of Applied Informatics, Tomas Bata University in Zlín, in 2004, 2006, and 2009, respectively. He is currently an Associate Professor and a Researcher with the Department of Computer and Communication Systems. His major research interests include effort estimation in software engineering and empirical methods in software and system engineering. ZDENKA PROKOPOVA was born in Rimavska Sobota, Slovakia, in 1965. She received the master’s degree in automatic control theory and the Ph.D. degree in technical cybernetics from Slovak Technical University in 1988 and 1993, respectively.
She worked as an Assistant at Slovak Technical University from 1988 to 1993. From 1993 to 1995, she worked as a programmer of database systems in the data-lock business firm. From 1995 to 2000, she worked as a Lecturer at the Brno University of Technology. Since 2001, she has been at the Faculty of Applied Informatics, Tomas Bata University in Zlín. She currently holds the position of an Associate Professor with the Department of Computer and Communication Systems. Her research interests include programming and applications of database systems, mathematical modeling, computer simulation, and the control of technological systems. PETR SILHAVY was born in Vsetin in 1980. He received the B.Sc., M.Sc., and Ph.D. degrees in engineering informatics from the Faculty of Applied Informatics, Tomas Bata University in Zlín, in 2004, 2006, and 2009, respectively. From 1999 to 2018, he was the CTO of a company specializing in database systems development. He currently holds the position of an Associate Professor with Tomas Bata University in Zlín. His major research interests include software engineering, empirical software engineering, system engineering, data mining, and database systems.
utb.fulltext.dates	Received 3 October 2022; accepted 16 October 2022; date of publication 19 October 2022; date of current version 28 October 2022
utb.fulltext.sponsorship	This work was supported by the Faculty of Applied Informatics, Tomas Bata University in Zlín, under Project IGA/CebiaTech/2022/001 and Project RVO/FAI/2021/002.
utb.wos.affiliation	[Huynh Thai Hoc; Silhavy, Radek; Prokopova, Zdenka; Silhavy, Petr] Tomas Bata Univ Zlin, Fac Appl Informat, Zlin 76001, Czech Republic
utb.scopus.affiliation	Faculty of Applied Informatics, Tomas Bata University in Zlin, Nad Stranemi 4511, Zlin, Czech Republic
utb.fulltext.projects	IGA/CebiaTech/2022/001
utb.fulltext.projects	RVO/FAI/2021/002
utb.fulltext.faculty	Faculty of Applied Informatics
utb.fulltext.faculty	Faculty of Applied Informatics
utb.fulltext.faculty	Faculty of Applied Informatics
utb.fulltext.faculty	Faculty of Applied Informatics
utb.fulltext.ou	-
utb.fulltext.ou	-
utb.fulltext.ou	-
utb.fulltext.ou	-