Modeling in R and Python
8 DEC 2021
1 Class Introduction
Disclaimer: The appearance of U.S. Department of Defense (DoD) visual information does not imply or constitute DoD endorsement. The views expressed in this presentation are those only of the author and do not represent the official position of the U.S. Army, DoD, or the federal government.
1.1 Topics & Class Structure
- Overview of modeling
- Tidymodels (R)
- scikit-learn (Python)
1.2 Software Prerequisites
- R 3.6.x or newer
- RStudio 1.2.x or newer
- Python 3.6 or newer
- scikit-learn 1.0.0 or newer
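One way to confirm the Python side of these prerequisites is a short version check like the sketch below. It is only an illustration, not part of the class materials: the helper names `parse_version` and `check_package` are hypothetical, and it assumes Python 3.8 or newer, because `importlib.metadata` is not in the standard library before that.

```python
import re
import sys
from importlib import metadata  # standard library from Python 3.8 onward

# Minimum versions taken from the prerequisites list above.
MIN_PYTHON = (3, 6)
MIN_PACKAGES = {"scikit-learn": (1, 0, 0)}


def parse_version(text):
    """Convert a version string such as '1.0.2' into a 3-integer tuple."""
    numbers = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep only the leading digits
        if match is None:
            break
        numbers.append(int(match.group()))
    numbers += [0] * (3 - len(numbers))  # pad '0.24' to (0, 24, 0)
    return tuple(numbers)


def check_package(name, minimum):
    """Return a one-line status message for an installed package."""
    try:
        installed = metadata.version(name)
    except metadata.PackageNotFoundError:
        return f"{name}: not installed"
    status = "OK" if parse_version(installed) >= minimum else "too old"
    return f"{name} {installed}: {status}"


python_ok = sys.version_info[:2] >= MIN_PYTHON
print("Python", sys.version.split()[0] + ":", "OK" if python_ok else "too old")
for package, minimum in MIN_PACKAGES.items():
    print(check_package(package, minimum))
```

Running the script prints one status line for Python and one per package. The R and RStudio versions need to be checked separately from within RStudio.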
1.3 Human Prerequisites
We assume you have:
- A working knowledge of R and RStudio and/or Python;
- Some experience with contemporary ‘tidy’ coding concepts;
- An understanding of modeling principles.
Let’s take an informal poll to see everyone’s experience / comfort level with these topics.
Do your best to follow along. We are happy to answer questions. This presentation is available at https://rwward.github.io/etf2021-r-py-modeling/.
1.4 Tutorial Challenges
- We recognize everyone has different statistical and coding backgrounds.
- Don’t be afraid to ask questions.
- If you miss something we said, it is likely others have too - you’ll be helping them by speaking up.
- It’s difficult to know how we should pace the class, so please communicate!
1.5 End State
- Students generally understand the modeling process in R and Python;
- Students have access to resources to learn more.
1.6 Instructor Introductions
1.6.1 MAJ Dusty Turner
Army
- Combat Engineer
- Platoon Leader / Executive Officer / Company Commander
- Geospatial / Sapper / Route Clearance
- Hawaii / White Sands Missile Range / Iraq / Afghanistan
Education
- West Point ’07
- Operations Research, BS
- Missouri University of Science and Technology ’12
- Engineering Management, MS
- THE Ohio State ’16
- Integrated Systems Engineering, MS
- Applied Statistics, Graduate Minor
Data Science
- R User Since ’14
- Catch me on Twitter: @dtdusty
- http://dustysturner.com/
1.6.2 Robert Ward
Education
- University of Chicago, ’13
- Political Science & English, BA
- Columbia University School of International and Public Affairs, ’18
- Master of International Affairs, Specialization in Advanced Policy and Economic Analysis
Data Science
- R user since 2011; also know some Python and have forgotten some Stata
- Worked for Government Accountability Office Applied Research & Methods
- Operations Research Systems Analyst at the Center for Army Analysis (CAA) and Army Leader Dashboard/Vantage PM team
1.7 Let’s Get Started…
1.7.1 Prerequisite Packages
install.packages(c("tidyverse", "tidymodels", "reticulate", "glmnet", "randomForest"), dependencies = TRUE)
pip install scikit-learn pandas matplotlib
1.7.2 Follow Along!
Book:
https://rwward.github.io/etf2021-r-py-modeling/
GitHub repo for data and code:
https://github.com/rwward/etf2021-r-py-modeling