Publication Library

Causally Colored Reflections on Leo Breiman’s “Statistical Modeling: The Two Cultures”

Description: This note provides a re-assessment of Breiman’s contributions to the art of statistical modeling, in light of recent advances in machine learning and causal inference. It highlights the crisp separation between the data-fitting and data-interpretation components of statistical modeling.

Created At: 14 December 2024

Updated At: 14 December 2024

Statistical Modeling: The Two Cultures

Description: There are two cultures in the use of statistical modeling to reach conclusions from data. One assumes that the data are generated by a given stochastic data model. The other uses algorithmic models and treats the data mechanism as unknown. The statistical community has been committed to the almost exclusive use of data models. This commitment has led to irrelevant theory, questionable conclusions, and has kept statisticians from working on a large range of interesting current problems. Algorithmic modeling, both in theory and practice, has developed rapidly in fields outside statistics. It can be used both on large complex data sets and as a more accurate and informative alternative to data modeling on smaller data sets. If our goal as a field is to use data to solve problems, then we need to move away from exclusive dependence on data models and adopt a more diverse set of tools.

Created At: 14 December 2024

Updated At: 14 December 2024

Artificial Intelligence Index Report 2024

Created At: 14 December 2024

Updated At: 14 December 2024

Model Selection Using Database Characteristics: Developing a Classification Tree for Longitudinal Incidence Data

Description: When managers and researchers encounter a dataset, they typically ask two key questions: (1) which model (from a candidate set) should I use? and (2) if I use a particular model, when is it going to likely work well for my business goal? This research addresses those two questions, and provides a rule, i.e., a decision tree, for data analysts to portend the “winning model” before having to fit any of them for longitudinal incidence data. We characterize datasets based on managerially relevant (and easy-to-compute) summary statistics, and we use classification techniques from machine learning to provide a decision tree that recommends when to use which model. By doing the “legwork” of obtaining this decision tree for model selection, we provide a time-saving tool to analysts. We illustrate this method for a common marketing problem (i.e., forecasting repeat purchasing incidence for a cohort of new customers) and demonstrate the method’s ability to discriminate among an integrated family of a hidden Markov model (HMM) and its constrained variants. We observe a strong ability for dataset characteristics to guide the choice of the most appropriate model, and we observe that some model features (e.g., the “back-and-forth” migration between latent states) are more important to accommodate than others (e.g., the inclusion of an “off” state with no activity). We also demonstrate the method’s broad potential by providing a general “recipe” for researchers to replicate this kind of model classification task in other managerial contexts (outside of repeat purchasing incidence data and the HMM framework).

Created At: 14 December 2024

Updated At: 14 December 2024

Connecting Workforce Analytics to Better Business Results

Created At: 14 December 2024

Updated At: 14 December 2024
