Week2: Working directory and accessor

vector
working directory
Author

Tien-Cheng

Welcome to the second course! You will learn working directory, subset elements from vector, list and dataframe.

Note
  1. data type logical and operator
  2. accessor []
  3. What is working directory (wd)?
  4. How to access elements from vector, list and dataframe

1 Conditions: logical operators and vectors

Logical vector Logical operators 1

  • L==R: direction doesn’t matters.
  • L%in%R: one sided, check if object L is presence in R.
  • !: negate the logical vector.
# check if pattern exist in vector
3%in%c(1,3)
# what is the difference?
c(1,3)%in%3
2%in%c(1,3) 
1==2 
1==c(2,1) 
# elementwise check whether L equals
c(2,1)==1
# check identity pairwise
c(1,2)==c(2,4)
# is '!' reverse the logical vector?
!1==2 
1!=2 
c(1,3)==2
# what does which() returns?
which(c(1,3)==3) 
# what will be the data type? check with str()
c(1,2,NA) %>% is.na() 
c(1,2,NA) %>% is.na() %>% which() 
c(1,2,NA) %>% is.na() %>% !.
c(1,2,NA) %>% !is.na() 
!is.na(c(1,2,NA))

Preconditions examples inside a function

# check if data type match
arg <- ""
is.character(arg)
if(is.character(arg)){
  print("character")
}
if(is.character(arg)){
  print("character")
}else{
  error("type other than character")
}
if(is.character(arg)){
  warning("wrong")
}
if(is.character(arg)){
  stop("wrong")
}
challenge

Inside your plusone function, please check first whether input x is numeric, then proceed the process. if not, return with message “wrong input type” using stop()

2 Working directory

2.1 preparation

  1. Open the folder that contain Wheat_BSC_project.Rproj
  2. download the data from HU-box, save it in data.
  3. create Week2.R and save it in folder src.

‘folder structure’

What is working directory (wd)?

2.2 abbreviation path: “.” for wd and “..” for parent of wd

"." means the working directory (wd) where this R script exists.

".." means the parent (one level higher) directory of ".".

library(dplyr)

# working directory, abbreviated as "."
getwd()
# parent directory, abbreviated as ".."
dirname(getwd())
# assign current path to variable
current_path <- getwd()
# check the type 
current_path %>% str()


# check files in the directory

# are they different?
"." %>% list.files(path=.)
getwd() %>% list.files(path=.)

# are they different?
".." %>% list.files(path=.)
getwd() %>% dirname() %>% list.files(path=.)
challenge

Although the meaning of . is the same as getwd(), the content is depending on the environment you are working with.

Right click R studio logo, open a new R studio window, compare the result of getwd() in R project and R

2.3 accessing files and folder inside a R project

Which one do you prefer? Why do we prefer relative path?

# absolute path, did you get error?
"C:/Users/marse/seadrive_root/Tien-Che/My Libraries/PhD_Tien/Project/Postdoc_teaching/BSC_project_IPFS2023/data" %>% list.files(path=.)
# relative path in R base
parent_path <- getwd() 
paste0(parent_path,"/data") %>% list.files(path=.)

# Does this works? 
".\data" %>% list.files(path=.)
"data" %>% list.files(path=.)
symbol Absolute path relative path color
A C/users/Wheat_BSC_project .. black
B C/users/Wheat_BSC_project/data ? blue
C C/users/Wheat_BSC_project/src . red
D C/users/Wheat_BSC_project/src/data ? blue

Below are four relative paths. Please rewrite them in absolute (full) path form. Which two are the same? Based on the figure illustated below, path 1-4 should be A,B,C or D?

  1. "ear_summarized.csv"

  2. "data/ear_summarized.csv"

  3. "./data/ear_summarized.csv"

  4. "../data/ear_summarized.csv"

3 get element from a vetor with accessors []

accessor

vector indexing start from 1 to the length of the vector.

empty_vec <- c()
length(empty_vec)
# what is the type of the empty vec?
empty_vec %>% str()

# NULL: empty 
empty_vec[1]
empty_vec[0]


vec <- c(1,3,5)
vec[1]
#reorder the vector 
vec[c(2,1,3)]

# removing the indexed elements
vec[-1]
vec[-2]

# indexing start from 1, not 0
# therefore you get, numeric(0)
vec[0]
# when access exceeding the range of a vector, what datatype do you get? 
vec[4]
vec %>% .[length(.)+1]
vec[1:4]
vec[4:1]

# find specific element or position
vec[c(F,T,F)]
vec[vec==5]
# when codition not match at all, it will return? 
vec[vec==2]
vec[c(F,F,F)]
vec %>% .[c(F)]
vec[vec=="a"]

# default str vector
letters
LETTERS
# when the query does not match, guess what will be the datatype? 
letters %>% .[.==2]
letters %>% .[c(F)]
# vector over write
vec
vec <- c(2,1,3)
vec
challenge
vec <- c(1, 2, 3, 4, 5)
logical_vec <- c(TRUE, FALSE)
subset_vec <- vec[logical_vec]
subset_vec
[1] 1 3 5

what did you observe? Is there any vector recycling?

What happen when you enter vec[TRUE]?

Supplementary information of special datatypes:2

  • 2 Data type emplty: NULLat zero position: numeric(0)

  • 4 list: keep the diversity of data type

    Make a list is like put a cookie(content of list element) in the cookie jar(list element).

    # list without element name
    list_a <- list(c(1,2))

    list without name There are 3 common accessors for list:

    1. access the list element (cookie jar)
    • [] access the list position
    1. access the content of list element (all cookies in jar)
    • [[]] access the content of a list element by position or name
    1. access the specific content of list element (selected cookie(s))
    • [[]][], position of logical vector
    vec_obj <- c(1,2,4,5)
    # position vector
    pos_vec <- c(2,3,1)
    list_obj <- list(vec_obj)
    # element id
    ele_ind <- 1
    action vector list
    extract from content vec_obj[pos_vec] list_obj[[ele_ind]][pos_vec]
    refer to content (data type:list) vec_obj list_obj[[ele_ind]]
    refer to position (data type:list) list_obj[ele_ind]
    • pos_vec or ele_ind could also be either numeric or logical

    4.1 access content by name

    $ access the content of a list element by name list_object$element_name or list_object[[element_name]]

    # list without element name
    list_b <- list(nam=c(1,2))

    list with name

    More about the accessors. 3

    # create a simple list
    list(1)
    # create a simple list with name "x" for first element
    list(x=1)
    list(x=1)["x"]
    # extract content
    list(x=1)$"x"
    list(x=1)[[1]]
    list(x=1)[["x"]]
    
    # extract with pipe
    list(x=1) %>% .[[1]]
    list(x=1) %>% .$"x"
    
    # long list
    long_list_example <- list(1,c(1,2),
                              T,c(T,T),
                              "str",c("a","b"),
                              list(1),
                              mean,data.frame())
    # check the content
    long_list_example
    # check structure of this list 
    # list_complex_example %>% str()
    # list_complex_example %>% glimpse()
    # list_complex_example
    # first list 
    long_list_example[1]
    # content of first list
    long_list_example[[1]]
    # first element of content of first list
    long_list_example[[1]][1]
    challenge

    can you guess what data type are these?

    # non-sense
    long_list_example[[1]][2]
    long_list_example[1][1]
    long_list_example[1][2]
    long_list_example[2][2]
    # meaningful
    long_list_example[[2]][2]

    4.2 lapply: apply functions and return list

    lapply(vector, function) ?lapply

    # input is vector
    c(1,4) %>% 
      lapply(.,FUN=function(x){x+3})
    # input is list
    list(2,4,c(1,4)) %>% 
      lapply(.,FUN=function(x){x+3})
    # input has differnt type
    list(2,4,c(1,4),"8") %>% 
      lapply(.,FUN=function(x){x+3})
    challenge

    Why you get error in the last line?

    5 dataframe is a special type of list

    each column has one data type

    # create a dataframe 
    df <- data.frame(time=as.Date("2023-04-16",format="%Y-%m-%d")+seq(1,3,1),
                     temp=c(20,15,13),
                     thermal_time=cumsum(c(20,15,13)))
    # another way
    df <- data.frame(time=as.Date("2023-04-16",format="%Y-%m-%d")+seq(1,3,1)) 
    df$temp=c(20,15,13)
    df$thermal_time=cumsum(df$temp)
    
    # third method
    library(dplyr)
    df <- data.frame(time=as.Date("2023-04-16",format="%Y-%m-%d")+seq(1,3,1)) %>% 
      mutate(temp=c(20,15,13), 
             thermal_time=cumsum(temp))
    df
    challenge

    Is it possible to create data frame with vectors of different length?

    data.frame(time=as.Date("2023-04-16",format="%Y-%m-%d")+seq(1,3,1), 
               temp=c(20,13))

    5.1 extract columns from data frame

    You can subset dataframe by indexing [row,column]

    dataframe[,column] select the whole role for selected columnn

    dataframe[row,] select the whole column of selected rows

    Select multiple row or column by puting logical or numeric vector in the square bracket.

    challenge

    use df,

    1. Access column thermal_time as vector

    2. Extract temp when time is 2023-04-17

    3. Extract first row and first column with [1,]and [,1]

    if you want to turn a data frame (df) by 90 degree (“transpose”), which function can you use? Could you find the answer on google or chatGPT?