What is the syntax for grouping data and then summarizing it?
`grouped_data <- original_data %>% group_by(group_column) %>% summarize(mean_col = mean(value_column), ...)`
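A minimal sketch of grouping and then summarizing, assuming the `dplyr` package (part of `tidyverse`) is installed. The tibble `toy` and its columns `clinic` and `score` are made-up names for illustration and are not from the notes.

```r
library(dplyr)

# Hypothetical toy data: two clinics with two scores each
toy <- tibble(
  clinic = c("NY", "NY", "KY", "KY"),
  score  = c(1, 3, 10, 20)
)

# Group by one column, then summarize a different column within each group
by_clinic <- toy %>%
  group_by(clinic) %>%
  summarize(mean_score = mean(score), n = n())

by_clinic  # one row per clinic, with its mean score and row count
```

Note that the column passed to `group_by` defines the groups, while the column inside `mean` is the value being averaged within each group.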
What does the `filter` function do and what is its basic syntax?
`filter` selects rows based on a condition. The syntax is `filtered_data <- original_data %>% filter(condition)`.
How can you rename columns while selecting them using the `select` function?
`opt_rename <- opt %>% select(newName = oldName)`
What is the syntax to select specific columns from a dataset using `tidyverse`?
`new_data <- original_data %>% select(column1, column2)`
How do you sort data by multiple columns using the `arrange` function?
The syntax is `sorted_data <- original_data %>% arrange(column1, column2)`.
Describe how to replace NA values in a dataset.
Use `replace_na` inside `mutate` to overwrite the NAs in a column with a chosen value (here `0`): `polyps7 <- polyps %>% mutate(column_name = replace_na(column_name, 0))`
How do you create a new R Markdown document in RStudio?
Go to `File -> New File -> R Markdown`, name the document (e.g., `Chapter 2`), and save it in the project directory.
How do you apply conditional logic to create new columns using `mutate` and `case_when`?
Example: `polyps3 <- polyps2 %>% mutate(Improvement = case_when(Total > 0 ~ 'Decline', Total == 0 ~ 'No Change', Total < 0 ~ 'Improvement'))`.
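The same `case_when` logic can be tried on a small made-up tibble to see how each condition maps to a label. This is only a sketch: `toy_polyps` is a hypothetical stand-in with a single `Total` column, not the real dataset.

```r
library(dplyr)

# Hypothetical stand-in data: one positive, one zero, one negative Total
toy_polyps <- tibble(Total = c(5, 0, -3))

# Conditions are checked top to bottom; the first TRUE one supplies the label
labeled <- toy_polyps %>%
  mutate(Improvement = case_when(
    Total > 0  ~ "Decline",
    Total == 0 ~ "No Change",
    Total < 0  ~ "Improvement"
  ))

labeled
```

Rows that match none of the conditions (for example, an NA in `Total`) receive `NA` in the new column.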
Which packages are required for the data wrangling tasks in this class?
The required packages are `tidyverse` and `medicaldata`.
Provide a code example for filtering data where `Clinic` is `NY` in the `opt` dataset.
`filtered_data <- opt %>% filter(Clinic == 'NY')`
Which function can be used to count missing values in specific columns? Give an example.
`summarize`, combined with `sum(is.na(...))`, counts the NAs in each column. Example: `polyps %>% summarize(missing_baseline = sum(is.na(Baseline)), ...)`.
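As a rough illustration of counting NAs per column, the sketch below uses a made-up tibble. The column `Month3` and the output name `missing_month3` are assumptions added for the example; only `Baseline` and `missing_baseline` appear in the card above.

```r
library(dplyr)

# Hypothetical toy data with some missing values
toy <- tibble(
  Baseline = c(7, NA, 12, NA),
  Month3   = c(NA, 4, 6, 8)
)

# is.na() returns TRUE/FALSE per value; sum() counts the TRUEs
missing_counts <- toy %>%
  summarize(
    missing_baseline = sum(is.na(Baseline)),
    missing_month3   = sum(is.na(Month3))
  )

missing_counts
```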
What is the first step in creating a new project for data wrangling and analysis in RStudio?
Navigate to `File -> New Project -> New Directory -> New Project`, name the project (e.g., `intro_wrangling_analysis`), and save it in the appropriate folder.
What is the purpose of the `mutate` function in `tidyverse`?
The `mutate` function creates new columns or modifies existing ones based on expressions or computations.
How do you calculate summary statistics while ignoring NAs in calculations?
`summary_data <- polyps %>% summarize(mean_value = mean(column_name, na.rm = TRUE))`
How do you drop rows with NA values in a specific column?
`polyp6 <- polyps %>% drop_na(column_name)`
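Dropping rows and replacing NAs lead to different summaries, which a small sketch can make concrete. It assumes `dplyr` and `tidyr` (both part of `tidyverse`) are installed; the tibble `toy` and its `id` column are invented for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: two of the four Baseline values are missing
toy <- tibble(id = 1:4, Baseline = c(7, NA, 12, NA))

# Option 1: drop rows where Baseline is NA (keeps 2 rows)
dropped <- toy %>% drop_na(Baseline)

# Option 2: keep all rows but replace NA with 0 (keeps 4 rows)
filled <- toy %>% mutate(Baseline = replace_na(Baseline, 0))

mean(dropped$Baseline)               # 9.5
mean(filled$Baseline)                # 4.75
mean(toy$Baseline, na.rm = TRUE)     # 9.5, same as dropping the NA rows
```

Replacing NAs with `0` pulls the mean down, so it is only appropriate when a missing value genuinely means zero.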
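Why might you need to redefine the rounding function in R, and what is the custom rounding function given in the notes?
R's built-in `round` rounds a value that falls exactly halfway to the nearest even digit (for example, `round(2.5)` returns `2`), which differs from the round-half-up convention many people expect. The custom function from the notes rounds halves away from zero instead: `round2 <- function(x, digits) { posneg <- sign(x); z <- abs(x)*10^digits; z <- z + 0.5; z <- trunc(z); z <- z/10^digits; z*posneg }`.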
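To see the difference in practice, the sketch below defines `round2` exactly as in the card and compares it with base R's `round` on a few halfway values. It uses base R only; the test values are chosen for illustration.

```r
# Custom rounding function from the notes: rounds halves away from zero
round2 <- function(x, digits) {
  posneg <- sign(x)          # remember the sign
  z <- abs(x) * 10^digits    # shift the target digit to the ones place
  z <- z + 0.5               # push halves over the threshold
  z <- trunc(z)              # drop the fractional part
  z <- z / 10^digits         # shift back
  z * posneg                 # restore the sign
}

round(0.5)        # 0 (half goes to the even digit)
round2(0.5, 0)    # 1

round(2.5)        # 2
round2(2.5, 0)    # 3

round2(-2.5, 0)   # -3 (away from zero)
```

Values that cannot be represented exactly in binary floating point (such as `1.005`) may still round in surprising ways with either function.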