Week6: Grain development III

R-intermediate
Author

Tien-Cheng

Published

May 23, 2023

Welcome to the sixth course! You will learn more about dataframe wrangling:

Learning goals
  1. data frame wrangling with dplyr
  2. ggplot()
Discussion
  1. What is the component of function()? How could you document them?
  2. How many data type can you remember?
  3. How many ways could you match a cell in dataframe? which functions are available?
  4. What is the function of mutate(), filter() and select()? (dplyr) 4.5 If you have multiple options in these function, what operator should you use for separation? 4.6 what is the meaning of , in filter(condition1, condition2)?
  5. What are the elements of ggplot()
  6. What are |,&,!,== and %in%? When to use them?
  7. How do you have an overview of a dataframe? Which function can you use?
  8. If you want to combine two columns into one, how many ways are available?

To match the pattern,how the order before or after %in% matters?

data.frame(x=letters) %>% 
  filter(x%in%c('a','b'))
data.frame(x=letters) %>% 
  filter(c('a','b')%in%x)

c('a','b')%in%letters
letters%in%c('a','b')

advanced mutate add-on :across

apply function for multiple columns

across( column_name (e.g., "y") or column_position (e.g., 2),FUNCTION)

FUNCTION examples

  1. function(x){x +1}

syntax: ~{.x+1}

  1. function(x){mean(x, ...)}

syntax: ~{mean(.x,...)}

... means the additional option other than the first input argument. For example, ... could be na.rm=True.

Now here is one example of applying function to multiple columns

climate %>% 
  select(ends_with("Temperature")) %>% 
  head(.,3) %>%
  glimpse()


climate %>% 
  mutate(across(where(is.numeric),~{round(.x, digits = 2)})) %>%
  # mutate(across(where(is.numeric),function(x){round(x, digits = 2)})) %>%
  select(ends_with("Temperature")) %>% 
  head(.,3) %>% 
  glimpse()

# reduce your code chunk by using function
display <- function(x){
  # subset dataframe and summarized for displaying purporse
  # x: input data frame
  x %>% 
    dplyr::select(ends_with("Temperature")) %>% 
    head(.,3) %>% 
    dplyr::glimpse()
}

climate %>% 
  mutate(across(where(is.numeric),function(x){round(x, digits = 2)})) %>%
  display()

Conclude your dataframe by groups

group_by()

Learn from examples!

practice
  1. read climate.csv from data folder using relative path and name it as climate.
  2. find the sowing date (i.e., the first date) for each year (DFG_year) and each sowing treatment(sowing_date) from .
climate_sub <- climate %>% 
  dplyr::select(DayTime,DailyMean_Temperature,DFG_year,sowing_date)
  1. calculate thermal time based on DailyMean_Temperature from climate_sub
thermal_time <- climate_sub%>%
  group_by(DFG_year,sowing_date) %>% 
  mutate(DayTime=as.Date(DayTime,format="%Y-%m-%d")) %>% 
  arrange(DayTime) %>% 
  mutate(ACC=cumsum(DailyMean_Temperature))


# check with
library(ggplot2)
merge_thermal_time <- thermal_time%>% 
  merge(.,climate %>% 
          select(DayTime,DFG_year,sowing_date,Acc_Temperature)) 

merge_thermal_time %>% 
  ggplot(.,aes(x=ACC,y=Acc_Temperature))+
  geom_point(shape=0,size=.5,alpha=.5)+
  geom_abline(intercept=0)+
  facet_grid(sowing_date~DFG_year)

recommendation

purrr