Application of machine learning approach and its subset algorithms in estimating genomic breeding values

Document Type : Scientific-Extensional Article

Authors

1 Ph.D. Student of Animal and Poultry Breeding & Genetics, Department of Animal Science, College of Agriculture and Natural Resources, University of Tehran, Karaj, Iran

2 Department of Animal Science, Faculty of Agriculture, Tarbiat Modares University, Tehran, Iran

3 M.Sc. of Animal Breeding and Genetics, Department of Animal Science, College of Agriculture and Natural Resources, University of Tehran, Karaj, Iran

Abstract

Genomic selection uses genotypic and phenotypic data simultaneously to evaluate animals genetically within a short period of time, so that superior animals can be selected. The development of data mining algorithms for big data analysis in the digital era has contributed greatly to the estimation of breeding values in livestock and poultry breeding. Recently, machine learning methods and their sub-algorithms, such as Deep Learning (DL), Random Forest (RF), Support Vector Machine (SVM), and boosting, which are categorized as non-parametric methods of animal evaluation, have been introduced into genomic selection. Machine learning algorithms not only offer breeders greater potential and efficiency but are also better suited to big data. These algorithms enable breeders to estimate non-additive effects such as dominance and epistasis and to study complex relationships between variables (such as marker interactions). The core idea of these algorithms is to learn from training data (here, the genotypic and phenotypic information of the animals in the reference population) and then to predict the genomic breeding values of the candidate population from its genotypic information alone. Some of these methods have been used successfully in animal genomic evaluations and have given acceptable results with low error. The purpose of this study is to describe machine learning approaches and their sub-algorithms, together with their role in predicting the genetic architecture of traits with complex inheritance. Using the machine learning approach to identify the most efficient algorithm, together with the growing volume of phenotypic and genomic data, is therefore likely to have a significant impact on the future of livestock and poultry breeding.
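
The train-on-reference, predict-on-candidates workflow described in the abstract can be illustrated with a minimal sketch. It is not the method of any study discussed here: it uses a small hand-written gradient boosting of single-SNP decision stumps on simulated data, and every name, population size, and effect size (`simulate`, `fit_stump`, `boost`, `qtl_effects`, 300 reference and 100 candidate animals) is a hypothetical choice made for illustration. Stumps split on one marker at a time, so this sketch can pick up additive and dominance effects but not epistasis.

```python
import random


def simulate(n_animals, n_snps, qtl_effects, rng):
    """Simulate 0/1/2 genotypes, true genetic values and phenotypes (toy data)."""
    genotypes, genetic_values, phenotypes = [], [], []
    for _ in range(n_animals):
        # Each genotype is the sum of two alleles drawn with frequency 0.5.
        g = [rng.randint(0, 1) + rng.randint(0, 1) for _ in range(n_snps)]
        # a = additive effect per allele copy, d = dominance deviation of heterozygotes.
        tgv = sum(a * g[j] + (d if g[j] == 1 else 0.0)
                  for j, (a, d) in qtl_effects.items())
        genotypes.append(g)
        genetic_values.append(tgv)
        phenotypes.append(tgv + rng.gauss(0.0, 1.0))  # add environmental noise
    return genotypes, genetic_values, phenotypes


def fit_stump(X, residuals):
    """Return the single-SNP split that most reduces the squared error of the residuals."""
    n, total, best = len(residuals), sum(residuals), None
    for j in range(len(X[0])):
        for threshold in (0.5, 1.5):  # the only useful cut points for 0/1/2 codes
            left = [r for row, r in zip(X, residuals) if row[j] < threshold]
            if not left or len(left) == n:
                continue  # split leaves one side empty
            left_sum, n_left = sum(left), len(left)
            right_sum, n_right = total - left_sum, n - n_left
            # Maximising this quantity is equivalent to minimising the split's SSE.
            gain = left_sum ** 2 / n_left + right_sum ** 2 / n_right
            if best is None or gain > best[0]:
                best = (gain, j, threshold, left_sum / n_left, right_sum / n_right)
    return best[1:]


def boost(X, y, n_rounds=100, shrinkage=0.1):
    """Fit stumps sequentially, each one to the residuals left by the previous ones."""
    intercept = sum(y) / len(y)
    residuals = [yi - intercept for yi in y]
    stumps = []
    for _ in range(n_rounds):
        j, t, left_mean, right_mean = fit_stump(X, residuals)
        stumps.append((j, t, shrinkage * left_mean, shrinkage * right_mean))
        residuals = [r - shrinkage * (left_mean if row[j] < t else right_mean)
                     for row, r in zip(X, residuals)]
    return intercept, stumps


def predict(model, X):
    """Predict genomic values from genotypes only."""
    intercept, stumps = model
    return [intercept + sum(l if row[j] < t else r for j, t, l, r in stumps)
            for row in X]


def correlation(a, b):
    """Pearson correlation between two equally long lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


rng = random.Random(42)
# Hypothetical QTL: SNP index -> (additive effect, dominance deviation).
qtl_effects = {0: (1.0, 0.0), 5: (-1.0, 0.0), 10: (0.8, 0.5),
               15: (1.2, 0.0), 20: (-0.7, 0.0)}

# Reference population: genotypes and phenotypes are both available.
ref_X, _, ref_y = simulate(300, 30, qtl_effects, rng)
# Candidate population: only genotypes are used for prediction.
cand_X, cand_tgv, _ = simulate(100, 30, qtl_effects, rng)

model = boost(ref_X, ref_y)
gebv = predict(model, cand_X)
accuracy = correlation(gebv, cand_tgv)
print(f"Correlation between predicted and true genetic values: {accuracy:.2f}")
```

In this toy setting the true genetic values of the candidates are known because they were simulated, so the correlation between predicted and true values can be computed directly. With real data the prediction step is the same, but accuracy has to be assessed by other means, such as cross-validation within the reference population.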
