1 Class Introduction

Disclaimer: The appearance of U.S. Department of Defense (DoD) visual information does not imply or constitute DoD endorsement. The views expressed in this presentation are those only of the author and do not represent the official position of the U.S. Army, DoD, or the federal government.

1.1 Topics & Class Structure

  1. Overview of modeling
  2. Tidymodels (R)
  3. scikit-learn (Python)

1.2 Software Prerequisites

  1. R 3.6.x or newer
  2. RStudio 1.2.x or newer
  3. Python 3.7 or newer
  4. scikit-learn 1.0.0 or newer
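Before class, it may help to confirm that your Python environment meets these minimums. The sketch below is one hedged way to do that. The helper names `parse_version` and `meets_minimum` are our own illustrative choices, not part of any library. The parser keeps only the leading numeric parts of a version string, so pre-release suffixes such as `rc1` or `.dev0` are dropped and a release candidate compares equal to the final release. Python is checked against 3.7 because scikit-learn 1.0 requires Python 3.7 or newer.

```python
import sys


def parse_version(text):
    """Turn a version string such as '1.0.2' into a tuple of ints, (1, 0, 2).

    Only leading digits of each dot-separated part are kept, and parsing
    stops at the first part with no leading digit, so '1.3.dev0' -> (1, 3).
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if version string `installed` is at least `minimum`."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad with zeros so that '1.0' and '1.0.0' compare as equal.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    py = ".".join(str(n) for n in sys.version_info[:3])
    print("Python", py, "OK" if meets_minimum(py, "3.7") else "too old")
    try:
        import sklearn
        ok = meets_minimum(sklearn.__version__, "1.0.0")
        print("scikit-learn", sklearn.__version__, "OK" if ok else "too old")
    except ImportError:
        print("scikit-learn is not installed")
```

If you need comparisons that fully understand pre-release and post-release tags, the `Version` class from the `packaging` library is the usual tool.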

1.3 Human Prerequisites

We assume you have:

  1. A working knowledge of R and RStudio and/or Python;
  2. Some experience with contemporary ‘tidy’ coding concepts;
  3. An understanding of modeling principles.

Let’s take an informal poll to see everyone’s experience and comfort level with these topics.

Do your best to follow along. We are happy to answer questions. This presentation is available at https://rwward.github.io/etf2021-r-py-modeling/.

1.4 Tutorial Challenges

  1. We recognize everyone has different statistical and coding backgrounds.
  2. Don’t be afraid to ask questions.
  3. If you miss something we said, others likely have too; you’ll be helping them by speaking up.
  4. It’s difficult to know how we should pace the class, so please communicate!

1.5 End State

  1. Students understand the general modeling process in R and Python;
  2. Students have access to resources to learn more.

1.6 Instructor Introductions

1.6.1 MAJ Dusty Turner

Army

  • Combat Engineer
  • Platoon Leader / Executive Officer / Company Commander
  • Geospatial / Sapper / Route Clearance
  • Hawaii / White Sands Missile Range / Iraq / Afghanistan

Education

  • West Point ’07
    • Operations Research, BS
  • Missouri University of Science and Technology ’12
    • Engineering Management, MS
  • THE Ohio State ’16
    • Integrated Systems Engineering, MS
    • Applied Statistics, Graduate Minor

Data Science

1.6.2 Robert Ward

Education

  • University of Chicago, ’13
    • Political Science & English, BA
  • Columbia University School of International and Public Affairs, ’18
    • Master of International Affairs, Specialization in Advanced Policy and Economic Analysis

Data Science

  • R user since 2011; also knows some Python and has forgotten some Stata
  • Worked for the Government Accountability Office (GAO) Applied Research & Methods team
  • Operations Research Systems Analyst at the Center for Army Analysis (CAA) and Army Leader Dashboard/Vantage PM team

1.7 Let’s Get Started…

1.7.1 Prerequisite Packages

# In the R console:
install.packages(c("tidyverse", "tidymodels", "reticulate", "glmnet", "randomForest"), dependencies = TRUE)

# In a terminal (not inside R or Python):
pip install "scikit-learn>=1.0" pandas matplotlib
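After running the `pip install` line, a quick sanity check is to confirm that each Python package can be found by the interpreter you will use in class, without actually importing it. This sketch relies on the standard-library function `importlib.util.find_spec`, which returns `None` for a top-level module it cannot locate. Note that the pip package `scikit-learn` is imported as `sklearn`, which is why the mapping lists both names. The `REQUIRED` table and the `missing_packages` helper are illustrative names of our own, not part of pip or any other tool.

```python
import importlib.util

# pip package name -> module name used in `import` statements.
REQUIRED = {
    "scikit-learn": "sklearn",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose top-level module cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing:", ", ".join(missing))
        print("Try: pip install " + " ".join(missing))
    else:
        print("All Python packages for the class were found.")
```

Running this with the same interpreter that RStudio's `reticulate` or your Python editor points to helps catch the common problem of installing packages into a different Python environment than the one you actually use.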