Application of machine learning approach and its subset algorithms in estimating genomic breeding values

Ghafouri, Farzad; Alipour, Somayeh; Mohamadian Jeshvaghani, Sadegh

doi:10.22059/domesticsj.2020.310252.1050

Article type: Scientific-extension article

Authors

Farzad Ghafouri ¹
Somayeh Alipour ²
Sadegh Mohamadian Jeshvaghani ³

¹ Ph.D. student of Animal and Poultry Breeding and Genetics, Department of Animal Science, College of Agriculture and Natural Resources, University of Tehran, Karaj, Iran

² Department of Animal Science, Faculty of Agriculture, Tarbiat Modares University, Tehran, Iran

³ M.Sc. in Animal Breeding and Genetics, Department of Animal Science, College of Agriculture and Natural Resources, University of Tehran, Karaj, Iran

Abstract

Genomic selection uses genotypic and phenotypic data simultaneously so that animals can be evaluated genetically within a short time and the genetically superior ones identified. In the digital era, data mining algorithms developed for big data make a substantial contribution to the estimation of breeding values in livestock and poultry breeding. Recently, machine learning methods and the algorithms within this family, such as deep learning (DL), random forest (RF), support vector machine (SVM), and boosting, which are classed as non-parametric methods, have been introduced into genomic selection. Their advantages include high potential and efficiency, particularly with very large data sets (big data), the ability to estimate non-additive effects such as dominance and epistasis, and the ability to model complex relationships between variables, such as interactions between markers. The central idea of these algorithms is to learn from training data, here the genotypes and phenotypes of the animals in the reference population, and then to predict the genomic breeding values of the animals in the candidate population from their genotypes alone. Some of these methods have been used successfully in genomic evaluations of animals and have produced acceptable results with low prediction error. The purpose of this study is to define the machine learning approach and its algorithms and to describe their role in predicting the genetic architecture of traits with complex inheritance. Consequently, using machine learning to identify the most efficient algorithm, together with the growing volume of phenotypic and genomic data, is likely to have a significant impact on the future of livestock and poultry breeding, particularly on the genetic progress of animals.

Keywords

Machine learning
Genetic algorithm
Deep learning
Random forest
Breeding value
Non-parametric methods
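The central idea described in the abstract, training on the genotypes and phenotypes of a reference population and then predicting candidates from their genotypes alone, can be sketched with boosting, one of the algorithm families it names. What follows is a minimal sketch under stated assumptions, not the method of the authors or of any cited study: it uses L2 boosting with single-split regression trees (stumps), genotypes coded 0/1/2, and a hypothetical simulated population. The names `fit_stump`, `StumpBoosting`, `simulate`, and `pearson`, as well as all effect sizes and parameter values (`n_rounds`, `nu`), are illustrative assumptions. Because each stump splits on one marker, a sum of stumps can capture a non-linear single-locus effect such as dominance, but it cannot represent epistasis. Modelling marker interactions requires deeper trees, as in random forests or boosting with deeper base learners.

```python
"""Minimal L2 boosting with regression stumps for genomic prediction.

Illustrative sketch only: genotypes are coded 0/1/2 and all data are simulated.
"""
import random
from statistics import fmean


def fit_stump(X, r):
    """Return the single-split tree (feature, threshold, left_value, right_value)
    that minimises the squared error of residuals r, or None if no marker is
    polymorphic."""
    best, best_sse = None, float("inf")
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        # Candidate thresholds: every observed genotype except the largest,
        # so both sides of the split are non-empty.
        for thr in values[:-1]:
            left = [ri for row, ri in zip(X, r) if row[j] <= thr]
            right = [ri for row, ri in zip(X, r) if row[j] > thr]
            lm, rm = fmean(left), fmean(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if sse < best_sse:
                best, best_sse = (j, thr, lm, rm), sse
    return best


def predict_stump(stump, row):
    j, thr, lm, rm = stump
    return lm if row[j] <= thr else rm


class StumpBoosting:
    """Each round fits a stump to the current residuals and adds a copy of it
    shrunk by the learning rate nu (hypothetical helper, not a library API)."""

    def __init__(self, n_rounds=100, nu=0.1):
        self.n_rounds, self.nu = n_rounds, nu
        self.base, self.stumps = 0.0, []

    def fit(self, X, y):
        self.base = fmean(y)  # start from the mean phenotype
        self.stumps = []
        pred = [self.base] * len(y)
        for _ in range(self.n_rounds):
            resid = [yi - pi for yi, pi in zip(y, pred)]
            stump = fit_stump(X, resid)
            if stump is None:  # every marker is monomorphic: nothing to split on
                break
            self.stumps.append(stump)
            pred = [p + self.nu * predict_stump(stump, row) for p, row in zip(pred, X)]
        return self

    def predict_one(self, row):
        return self.base + self.nu * sum(predict_stump(s, row) for s in self.stumps)

    def predict(self, X):
        return [self.predict_one(row) for row in X]


def simulate(n_animals, n_snps, seed=1):
    """Hypothetical population: additive effect at SNP 0, complete dominance
    at SNP 1 (heterozygote = homozygote), plus Gaussian environmental noise.
    Returns genotypes, phenotypes and total genetic values (TGV)."""
    rng = random.Random(seed)
    X = [[rng.choice((0, 1, 2)) for _ in range(n_snps)] for _ in range(n_animals)]
    tgv = [0.5 * x[0] + (0.8 if x[1] >= 1 else 0.0) for x in X]
    y = [g + rng.gauss(0.0, 0.5) for g in tgv]
    return X, y, tgv


def pearson(a, b):
    ma, mb = fmean(a), fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


if __name__ == "__main__":
    X, y, tgv = simulate(n_animals=400, n_snps=20)
    # Reference population: genotyped and phenotyped animals used for training.
    X_ref, y_ref = X[:300], y[:300]
    # Candidates: only their genotypes enter the prediction.
    X_cand, tgv_cand = X[300:], tgv[300:]
    model = StumpBoosting(n_rounds=200, nu=0.1).fit(X_ref, y_ref)
    pred = model.predict(X_cand)
    print(f"r(prediction, TGV) in candidates = {pearson(pred, tgv_cand):.2f}")
```

Running the script prints the correlation between the predictions and the simulated total genetic values (additive plus dominance) of the candidates, the accuracy measure commonly reported in simulation studies. Its value depends on the seed, `n_rounds`, and `nu`. In practice, such settings are typically tuned by cross-validation within the reference population.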

Volume 20, Issue 2 (serial no. 17)
Azar 1399 (November/December 2020)
Pages 19-29
