Tidy Data Science with the Tidyverse and Tidymodels is licensed under a Creative Commons Attribution 4.0 International License.
Create a reproducible example (reprex)
Goal: create the simplest example possible to illustrate the problem/question, that anyone can run on their own machine
Can't use data stored on your computer (others won't have that)
Can't assume options or settings are the same across computers
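A minimal sketch of data that anyone can recreate, assuming base R only. The name `sim_data`, the seed, and the simulated values are made up for illustration; only the column names mirror the slides.

```r
# Fix the random seed so every machine generates the same rows
set.seed(2021)

n_students <- 10

# Simulated stand-in for data that lives only on your computer
sim_data <- data.frame(
  student_id = sample(1000:9999, n_students),          # unique ids, no replacement
  skill_1 = rbinom(n_students, size = 1, prob = 0.5),  # 1 = mastered, 0 = not yet
  skill_2 = rbinom(n_students, size = 1, prob = 0.5),
  skill_3 = rbinom(n_students, size = 1, prob = 0.5)
)

sim_data
```

Because the seed and every setting are stated in the script itself, nothing depends on files or options that exist only on one computer.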
reprex to the rescue!
Question: How do I sort by a sum and then all component columns?
dat3
#> # A tibble: 50 x 4
#>    student_id skill_1 skill_2 skill_3
#>         <int>   <int>   <int>   <int>
#>  1       3462       0       0       1
#>  2       3510       1       1       1
#>  3       9717       1       0       1
#>  4       3985       0       1       0
#>  5       2841       1       0       1
#>  6       4370       1       0       1
#>  7       5760       0       0       1
#>  8       7745       0       0       0
#>  9       3756       0       0       1
#> 10       6106       1       0       1
#> # … with 40 more rows
dat4
#> # A tibble: 50 x 5
#>    student_id skill_1 skill_2 skill_3 skill_4
#>         <int>   <int>   <int>   <int>   <int>
#>  1       1472       0       1       1       1
#>  2       7097       0       1       0       1
#>  3       2148       0       1       1       0
#>  4       3036       0       1       0       1
#>  5       3312       1       1       1       1
#>  6       8740       0       1       0       0
#>  7       9649       0       1       1       1
#>  8       2077       0       0       0       1
#>  9       6014       0       1       0       0
#> 10       6657       1       0       0       1
#> # … with 40 more rows
I have some data that shows which of 3 skills each student has mastered. I want to sort the data by the total number of skills mastered, and then by each skill. But the number of skills can change. How can I write a solution that will work for any number of skills?
Question: How do I sort a data frame by total skills and then each component skill?
dat3
#> # A tibble: 50 x 4
#>    student_id skill_1 skill_2 skill_3
#>         <int>   <int>   <int>   <int>
#>  1       3462       0       0       1
#>  2       3510       1       1       1
#>  3       9717       1       0       1
#>  4       3985       0       1       0
#>  5       2841       1       0       1
#>  6       4370       1       0       1
#>  7       5760       0       0       1
#>  8       7745       0       0       0
#>  9       3756       0       0       1
#> 10       6106       1       0       1
#> # … with 40 more rows
You can't do anything with this. You don't have dat3 on your computer, and you can't copy and paste this printed data frame into an R object. You would have to build it by hand.
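One common way around the copy/paste problem is base R's `dput()`, which prints code that rebuilds an object. The sketch below assumes a small data frame named `local_data` (a hypothetical name) holding the first three rows of the printed output.

```r
# Hypothetical stand-in for a data frame that exists only on your machine
local_data <- data.frame(
  student_id = c(3462L, 3510L, 9717L),
  skill_1 = c(0L, 1L, 1L),
  skill_2 = c(0L, 1L, 0L),
  skill_3 = c(1L, 1L, 1L)
)

# dput() prints R code that recreates the object;
# paste that printed code into your question instead of the console output
dput(local_data)
```

For a large data set, sharing a handful of rows this way is usually enough to illustrate the problem.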
library(tidyverse)

ex_data <- tibble(
  stu = c(1, 2, 3, 4, 5),
  skill_1 = c(0, 0, 1, 1, 1),
  skill_2 = c(1, 1, 0, 0, 0),
  skill_3 = c(0, 1, 0, 1, 1)
)
ex_data
#> # A tibble: 5 x 4
#>     stu skill_1 skill_2 skill_3
#>   <dbl>   <dbl>   <dbl>   <dbl>
#> 1     1       0       1       0
#> 2     2       0       1       1
#> 3     3       1       0       0
#> 4     4       1       0       1
#> 5     5       1       0       1
Bad: How do I sort by a sum and then all component columns?
Better: How can I sort a data frame by total skills and then each component skill?
Best: Provide an example of what you want (including the better question), and solutions you've tried.
# What I want:
ex_data %>%
  mutate(total = skill_1 + skill_2 + skill_3) %>%
  # desc() takes a single vector, so each skill is wrapped separately
  arrange(total, desc(skill_1), desc(skill_2), desc(skill_3))
#> # A tibble: 5 x 5
#>     stu skill_1 skill_2 skill_3 total
#>   <dbl>   <dbl>   <dbl>   <dbl> <dbl>
#> 1     3       1       0       0     1
#> 2     1       0       1       0     1
#> 3     4       1       0       1     2
#> 4     5       1       0       1     2
#> 5     2       0       1       1     2
But without specifying each skill individually, because the number of skills may change.
# What I've tried
ex_data %>%
  rowwise() %>%
  mutate(total = sum(c_across(starts_with("skill")))) %>%
  ungroup() %>%
  arrange(total, desc(starts_with("skill")))
#> Error: arrange() failed at implicit mutate() step.
#> * Problem with `mutate()` input `..2`.
#> x `starts_with()` must be used within a *selecting* function.
#> ℹ See <https://tidyselect.r-lib.org/reference/faq-selection-context.html>.
#> ℹ Input `..2` is `starts_with("skill")`.
Brief description of what you're doing
Reproducible data
What you've tried
What you've gotten
What you want to get
reprex makes three of these parts easier: reproducible data, what you've tried, and what you've gotten.
reprex()
The reprex() function from the reprex package will run code, format it nicely, and render the output to your clipboard.
reprex(x = NULL, venue, session_info, style)
x: The code to turn into a reprex. If left as NULL, reprex() looks for code on the clipboard.
venue: Where the question is being posted.
session_info: Whether or not to include session information.
style: Whether or not to format code in tidy style.
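A hedged sketch of a call that sets all four arguments explicitly. It assumes the reprex package is installed (and styler, which `style = TRUE` relies on); the toy data frame `ex` and the variable `reprex_venue` are made up for illustration. The call is guarded so it only runs in an interactive session, since reprex() renders a document and touches the clipboard.

```r
# "gh" targets GitHub-flavored Markdown; "r" would produce a commented R script instead
reprex_venue <- "gh"

# Guarded: only run interactively, and only if reprex is available
if (interactive() && requireNamespace("reprex", quietly = TRUE)) {
  reprex::reprex(
    x = {
      # Everything inside the braces is the code that gets run and rendered
      ex <- data.frame(stu = 1:3, skill_1 = c(0, 1, 1))
      ex[order(-ex$skill_1), ]
    },
    venue = reprex_venue,
    session_info = TRUE,  # append session details to the output
    style = TRUE          # restyle the code before rendering
  )
}
```

Calling `reprex()` with no arguments instead takes whatever code is currently on the clipboard.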
ex_data %>%
  rowwise() %>%
  mutate(total = sum(c_across(starts_with("skill")))) %>%
  ungroup() %>%
  arrange(total, across(starts_with("skill"), desc))
#> # A tibble: 5 x 5
#>     stu skill_1 skill_2 skill_3 total
#>   <dbl>   <dbl>   <dbl>   <dbl> <dbl>
#> 1     3       1       0       0     1
#> 2     1       0       1       0     1
#> 3     4       1       0       1     2
#> 4     5       1       0       1     2
#> 5     2       0       1       1     2
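We need to use across() in the arrange function. starts_with() is a selection helper, so it only works inside a selecting function such as across(), which then applies desc() to each selected column.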
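To check that this approach really works for any number of skills, here is a minimal sketch wrapped in a function and applied to a four-skill example. The helper `sort_by_skills()`, the `prefix` argument, and the `ex_data4` values are hypothetical, and `rowSums(across(...))` is swapped in for the `rowwise()` / `c_across()` step; it assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical helper: total the skill columns, then sort by the total (ascending)
# and by every skill column (descending), however many there are
sort_by_skills <- function(data, prefix = "skill") {
  data %>%
    mutate(total = rowSums(across(starts_with(prefix)))) %>%
    arrange(total, across(starts_with(prefix), desc))
}

# Made-up example with four skills instead of three
ex_data4 <- tibble(
  stu = c(1, 2, 3, 4, 5),
  skill_1 = c(0, 0, 1, 1, 1),
  skill_2 = c(1, 1, 0, 0, 0),
  skill_3 = c(0, 1, 0, 1, 1),
  skill_4 = c(1, 0, 0, 1, 0)
)

sorted4 <- sort_by_skills(ex_data4)
sorted4
```

The same function can be called on the three-skill `ex_data` without any changes, because the columns are selected by prefix rather than by name.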
R4DS: Expanding on this workshop. Much more to learn!
MDSR: Beginning to end: data management, programming, statistics, machine learning, special topics in DS
MD: More statistics (more regression, hypothesis testing, confidence intervals, etc.)
AdvR: How R works (environments, data structures, meta programming)
R Packages: How to make your own package! 2nd edition work in progress
HOPR: Intro to R as a programming language, in the context of data science/data analysis
SocViz: Intro to good looking graphics with ggplot2
Cookbook: Basic recipes for creating and customizing plots
Fundamentals: Made with ggplot2 & Rmd, but no code in book. Focus is on what makes a graphic informative and appealing.
TMWR: How to use tidymodels, best practices, etc.
FEATENG: Recipes -- how to extract more information from your data, including best practices, recommendations, etc.
HOML: Focused on machine learning methods and models - random forest, clustering algos, gradient boosting machines, neural networks, stacking, more!
RMD: Everything you could ever want to know about R Markdown. Includes chapters on extensions as well.
cookbook: Popular how-tos for common tasks in R Markdown
bookdown: writing books, articles, dissertations, etc.
Blogs
My personal list of R community members