Correspondence to Dr Michael Oliver Chaiton, Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
Introduction
The urgency of eliminating tobacco-related health problems and the increasing complexity of tobacco research call for sophisticated analytical methods that can handle vast amounts of data and perform highly specialised tasks. This paper provides a brief introduction to machine learning (ML) and a scoping review of the tobacco literature for studies that self-identified as using ML in their analyses.
A gentle introduction to machine learning
ML was historically described as ‘a field of study that gives computers the ability to learn without being explicitly programmed’.1 A more intuitive definition of ML is ‘a set of methods that can automatically detect patterns in data, and then use the uncovered patterns to predict future data, or to perform other kinds of decision making under uncertainty’.2 At its core, ML uses brute computational force to replace human guidance in data analysis; it can thus be viewed as a natural extension of traditional statistical approaches, with a much lower ratio of human-guided to machine-guided steps in the analytical pipeline.3
ML is commonly divided into three classes: supervised, unsupervised and reinforcement learning.2 A broader definition of ML also includes deep learning, which uses artificial neural networks inspired by the human brain to perform supervised, unsupervised or reinforcement learning tasks.4 Each class of ML aims to solve a distinct problem and has unique features that may appeal to tobacco researchers.
Supervised learning deals with prediction. It involves the training and validation of a model to ‘predict the values of one or more outputs or response variables for a given set of input or predictor variables’.5 When the goal is to obtain a highly accurate predictive model for future data through repeated trials of training and testing, the task is supervised learning. If, on the other hand, the objective is statistical inference (ie, hypothesis testing and the estimation of point estimates or CIs), the task falls into the realm of statistical modelling, not supervised learning.6 Supervised learning is relevant to any tobacco research that demands highly accurate prediction, such as the development of a public health surveillance...
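The cycle of repeated training and testing that characterises supervised learning can be illustrated with a minimal sketch in Python. Everything in it is an assumption made for illustration rather than anything taken from the reviewed studies: the data are synthetic (one predictor and a binary outcome), the model is a deliberately simple threshold classifier, and the function names are hypothetical. The sketch fits the model on part of the data, scores it on the held-out part, and repeats this across five folds (k-fold cross-validation).

```python
import random


def make_synthetic_data(n, seed=0):
    """Hypothetical toy data: one predictor x and a binary outcome y."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        # Outcome 1 shifts the predictor upward; noise keeps the classes overlapping.
        x = rng.gauss(2.0 if y == 1 else 0.0, 1.0)
        data.append((x, y))
    return data


def fit_threshold(train):
    """Training step: put the decision threshold midway between the class means."""
    xs0 = [x for x, y in train if y == 0]
    xs1 = [x for x, y in train if y == 1]
    if not xs0 or not xs1:
        raise ValueError("training data must contain both classes")
    return (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2.0


def predict(threshold, x):
    """Prediction step: classify as 1 when the predictor exceeds the threshold."""
    return 1 if x > threshold else 0


def accuracy(threshold, rows):
    """Share of rows whose predicted outcome matches the observed outcome."""
    correct = sum(1 for x, y in rows if predict(threshold, x) == y)
    return correct / len(rows)


def cross_validate(data, k=5):
    """Repeated training and testing: each fold is held out once for testing."""
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        test = folds[i]
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        threshold = fit_threshold(train)          # learn from the training folds only
        scores.append(accuracy(threshold, test))  # evaluate on unseen data
    return scores


data = make_synthetic_data(200)
scores = cross_validate(data, k=5)
mean_accuracy = sum(scores) / len(scores)
print("held-out accuracy per fold:", [round(s, 2) for s in scores])
print("mean held-out accuracy:", round(mean_accuracy, 2))
```

The quantity of interest here is accuracy on data the model has not seen, which is what separates this workflow from statistical modelling: no hypothesis test or CI is produced for the threshold itself.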