CAPTO - A method for understanding problem domains for data science projects
Keywords:
Data science, Knowledge capture, Knowledge Discovery in Databases
Abstract
Data Science aims to infer knowledge from facts and evidence expressed in data. This inference occurs through the Knowledge Discovery in Databases (KDD) process, which requires an understanding of the application domain. In practice, however, too little time is spent on understanding this domain, and consequently the extracted knowledge may be incorrect or irrelevant. Considering that understanding the problem is an essential step in the KDD process, this work proposes CAPTO, a method for understanding problem domains that is based on knowledge management models. Drawing on the available or acquired tacit and explicit knowledge, the method provides a strategy for constructing a conceptual model that represents the problem domain. This model contains the main dimensions (perspectives), aspects, and attributes that may be relevant when starting a data science project. As a case study, the method is applied to the Type 2 Diabetes domain, and the results show its effectiveness. The conceptual model obtained through CAPTO can be used as an initial step for the conceptual selection of attributes.