R Data Types (Chapter 1, Episode 2)

Variables are used to store information. A variable is a name that refers to a reserved location in memory, and when a variable is created, some space is set aside for its value. You will often need to store values of different data types, such as integer, floating point, and Boolean (called logical in R). The data type matters because it determines how much memory is allocated and what kind of value can be stored. In statically typed languages such as C and Java, a variable is declared with a particular data type. This is not the case in R: a variable is assigned an R-object, and the data type of that R-object becomes the data type of the variable. There are different types of R-objects, but the common ones include the following:

  • Vectors

  • Matrices

  • Lists

  • Arrays

  • Data Frames

  • Factors
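The point that a variable takes its type from the R-object assigned to it can be sketched as follows. The variable names here are made up for illustration.

```r
# A variable takes its type from whatever object is currently assigned to it.
value <- 42L                      # an integer vector of length 1
class_as_integer <- class(value)

value <- "forty-two"              # the same name now refers to a character vector
class_as_character <- class(value)

value <- TRUE                     # and now to a logical vector
class_as_logical <- class(value)

print(c(class_as_integer, class_as_character, class_as_logical))
```

No type declaration is needed at any point; reassigning the name is enough to change the type.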

Vector

A vector is a one-dimensional sequence of elements. A vector can be created from any of the basic data types (for example numeric, character, or logical), but all of its elements must be of the same type. The simplest way to create a vector in R is with the c() function, which combines its arguments into a vector. For example:

student <- c("james","mark","jane")
print(student)

# Get the class of our vector.
print(class(student))

output

[1] "james" "mark"  "jane" 
[1] "character"

vector1 <- c(1,2,5.3,6,-2,4) # a numeric vector
vector2 <- c(TRUE,TRUE,TRUE,FALSE,TRUE,FALSE) # a logical vector
print(vector1)
print(vector2)

output

[1]  1.0  2.0  5.3  6.0 -2.0  4.0
[1]  TRUE  TRUE  TRUE FALSE  TRUE FALSE

print(vector1[c(2,4)]) # prints the 2nd and 4th elements of the vector
print(vector2[c(1,4)]) # prints the 1st and 4th elements of the vector

output

[1] 2 6
[1]  TRUE FALSE
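Because a vector holds elements of a single type, R converts mixed input to a common type when c() is called. A minimal sketch of this coercion, with illustrative variable names:

```r
# Mixing types in c() does not fail; R coerces everything to one common type.
mixed <- c(1, "a", TRUE)
print(mixed)          # every element is now a character string
print(class(mixed))

numbers_and_flags <- c(1.5, TRUE, FALSE)
print(numbers_and_flags)   # the logical values are converted to 1 and 0
print(class(numbers_and_flags))
```

If a vector unexpectedly prints with quotes around its numbers, a stray character element is the usual cause.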

If you need to create a more complex sequence, you can use the seq() function. It lets you specify either the step size (by) or the number of points in the interval (length.out). For example:

seq(1, 3, by=0.2)

seq(1, 5, length.out=4)

output

 [1] 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.4 2.6 2.8 3.0
[1] 1.000000 2.333333 3.666667 5.000000 # length.out = 4 gives 4 evenly spaced points, so the step is (5 - 1)/(4 - 1) = 4/3
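To make the length.out behaviour concrete, the sketch below computes the step by hand and compares the result with seq(). The variable names are chosen for illustration only.

```r
# With length.out, seq() spaces the points evenly between the two end points.
from <- 1
to <- 5
n_points <- 4

# n_points values are separated by (n_points - 1) equal steps.
step <- (to - from) / (n_points - 1)
by_hand <- from + step * (0:(n_points - 1))
with_seq <- seq(from, to, length.out = n_points)

print(step)
print(by_hand)
print(all.equal(by_hand, with_seq))
```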

Matrices

A matrix is a data set arranged in a two-dimensional rectangular form. All the elements of a matrix must be of the same data type, and all the columns must have the same length. It is similar to a vector, but it comes with a dimension (dim) attribute. To check the dimensions of an object, use the dim() or attributes() function. For example:

numbers <- c(1,2,3,4,5,6,7,8)
myMatrix <- matrix(numbers, nrow = 2, ncol = 4) # the values are filled in column by column
print(myMatrix)

output

     [,1] [,2] [,3] [,4]
[1,]    1    3    5    7
[2,]    2    4    6    8

x <- matrix(1:12, nrow = 4, dimnames = list(c("W","X","Y","Z"), c("A","B","C"))) # row names first, then column names
print(x)

output

  A B  C
W 1 5  9
X 2 6 10
Y 3 7 11
Z 4 8 12
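The dimension attribute mentioned above can be inspected directly. The following is a small sketch using a matrix like the first one; it also shows the byrow argument of matrix(), which fills the values row by row instead of column by column.

```r
# A matrix is a vector with a dim attribute attached.
m <- matrix(1:8, nrow = 2, ncol = 4)

print(attributes(m))   # a list that contains the dim attribute
print(dim(m))          # number of rows, then number of columns

# By default values are filled column by column; byrow = TRUE fills row by row.
by_row <- matrix(1:8, nrow = 2, ncol = 4, byrow = TRUE)
print(by_row)
```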

We can also create a matrix with the cbind() and rbind() functions, for column bind and row bind respectively. cbind() creates a matrix by joining vectors as columns, while rbind() creates a matrix by joining vectors as rows. For example:

Row Bind

vectorA <- seq(3, 36, by=3)
print(vectorA)
vectorB <- seq(4,48 , by =4)
print(vectorB)

r <- rbind(vectorA, vectorB)
print(r)

output

print(vectorA)
[1]  3  6  9 12 15 18 21 24 27 30 33 36
print(vectorB)
[1]  4  8 12 16 20 24 28 32 36 40 44 48
print(r)
        [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
vectorA    3    6    9   12   15   18   21   24   27    30    33    36
vectorB    4    8   12   16   20   24   28   32   36    40    44    48

#To check if the matrix has column names

colnames(r)

We can add custom column names to the matrix by assigning a character vector to colnames():

colnames(r) <- c('1','2','3','4','5','6','7','8','9','10','11','12')
r

Note: the number of names supplied must equal the number of columns in the matrix.

output

        1 2  3  4  5  6  7  8  9 10 11 12
vectorA 3 6  9 12 15 18 21 24 27 30 33 36
vectorB 4 8 12 16 20 24 28 32 36 40 44 48

Column Bind

vectorA <- seq(3, 36, by=3)
print(vectorA)
vectorB <- seq(4,48 , by =4)
print(vectorB)

c <- cbind(vectorA, vectorB)
print(c)

output

print(vectorA)
[1]  3  6  9 12 15 18 21 24 27 30 33 36
print(vectorB)
[1]  4  8 12 16 20 24 28 32 36 40 44 48
print(c)
      vectorA vectorB
 [1,]       3       4
 [2,]       6       8
 [3,]       9      12
 [4,]      12      16
 [5,]      15      20
 [6,]      18      24
 [7,]      21      28
 [8,]      24      32
 [9,]      27      36
[10,]      30      40
[11,]      33      44
[12,]      36      48

#To check if the matrix has row names

rownames(c)

We can add custom row names to the matrix by assigning a character vector to rownames():

rownames(c) <- c('row1','row2','row3','row4','row5','row6','row7','row8','row9','row10','row11','row12')
c

Note: the number of names supplied must equal the number of rows in the matrix.

output

      vectorA vectorB
row1        3       4
row2        6       8
row3        9      12
row4       12      16
row5       15      20
row6       18      24
row7       21      28
row8       24      32
row9       27      36
row10      30      40
row11      33      44
row12      36      48
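Once a matrix has row and column names, they can be used for indexing in place of positions. A short sketch follows; the matrix, its names, and its values are invented for illustration.

```r
# Build a small named matrix (hypothetical data) and index it by name.
scores <- cbind(maths = c(70, 85, 90), physics = c(65, 80, 95))
rownames(scores) <- c("james", "mark", "jane")

print(scores["mark", "physics"])  # a single cell, selected by row and column name
print(scores[, "maths"])          # a whole column, returned as a named vector
print(scores[2, ])                # a whole row, selected by position
```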

Lists

A list is an R-object that can hold elements of different types. This means that a list may hold together elements that are not related, such as functions, vectors, and even other lists. For example:

mylist  <- list(FALSE, TRUE, c(1,2,3,4), 'Ruqy')
print(mylist)
typeof(mylist)
length(mylist)

output

[[1]]
[1] FALSE

[[2]]
[1] TRUE

[[3]]
[1] 1 2 3 4

[[4]]
[1] "Ruqy"

# typeof(mylist)
[1] "list"

# length(mylist)
[1] 4

To get the 3rd element of the list:

mylist[[3]]    # gets the 3rd element of the list
mylist[[3]][3] # gets the 3rd value of the 3rd element
mylist[[3]][1] # gets the 1st value of the 3rd element

output

[1] 1 2 3 4
[1] 3
[1] 1
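List elements can also be given names, and single and double brackets behave differently on a list. The sketch below uses a made-up list to show the difference.

```r
# A list with named elements (the names and values are illustrative).
person <- list(name = "Ruqy", scores = c(1, 2, 3, 4), active = TRUE)

sub_list <- person["scores"]     # single brackets: a list of length 1
scores   <- person[["scores"]]   # double brackets: the numeric vector itself
same     <- person$scores        # $ is shorthand for [[ ]] with a name

print(class(sub_list))
print(class(scores))
print(identical(scores, same))
```

Double brackets are the form to use when the element itself is needed, for example to do arithmetic on the vector.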

Arrays

An array can have any number of dimensions. The array() function takes a dim argument, a vector that gives the size of each dimension of the array you want to create. For example:

x <- array(c('A', 'B', 'C'), dim = c(3,3,2)) # creates 2 layers of 3 rows and 3 columns each; the values are recycled to fill the array
print(x)

output

, , 1

     [,1] [,2] [,3]
[1,] "A"  "A"  "A" 
[2,] "B"  "B"  "B" 
[3,] "C"  "C"  "C" 

, , 2

     [,1] [,2] [,3]
[1,] "A"  "A"  "A" 
[2,] "B"  "B"  "B" 
[3,] "C"  "C"  "C"
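Elements of an array are accessed with one index per dimension. A brief sketch on the same array as above:

```r
# The same 3 x 3 x 2 array: rows, columns, layers.
x <- array(c("A", "B", "C"), dim = c(3, 3, 2))

print(dim(x))       # the size of each dimension
print(length(x))    # total number of elements: 3 * 3 * 2
print(x[2, 3, 1])   # row 2, column 3, layer 1
print(x[, , 2])     # leaving an index empty selects everything: the whole 2nd layer
```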

Data Frames

Data frames are tabular data objects. Unlike in a matrix, the different columns of a data frame can hold different data types: the first column can be numeric, the second can contain logical values, the third can have characters, and so on. A data frame is a list of vectors of equal length, and it is a two-dimensional object. For example:

x <- data.frame('name' = c('Seun', 'Toby'), 's/n' = 1:2, 'age' = c(12,45)) # 's/n' is not a valid R name, so it is converted to s.n
print(x)

output

  name s.n age
1 Seun   1  12
2 Toby   2  45

Since a data frame is a list of columns, we can access its columns using the [, [[ or $ operators. Single brackets return a data frame containing the selected column:

x['name']

output

  name
1 Seun
2 Toby

To get more than one column in the data frame:

x[c('name', 'age')]

output

  name age
1 Seun  12
2 Toby  45
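The [[ and $ operators return the column itself as a vector, not as a data frame. A minimal sketch on a small data frame like the one above (the variable names are illustrative):

```r
# A small data frame with one character column and one numeric column.
people <- data.frame(name = c("Seun", "Toby"), age = c(12, 45))

ages_vector <- people$age        # a plain numeric vector
ages_again  <- people[["age"]]   # the same vector, selected with [[ ]]
ages_frame  <- people["age"]     # still a data frame, with a single column

print(ages_vector)
print(class(ages_frame))
str(people)                      # compact summary of the columns and their types
```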

Factors

Factors are data objects used to categorise data and store it as levels. They can store both strings and integers. Factors are most useful in columns with a limited number of unique values, and they are widely used in data analysis and statistical modelling. Internally, a factor is stored as integer codes that point to its levels. For example:

d <- factor(c("East","West","East","North","North","East","West","West","West","East"))
print(d)
typeof(d)
class(d)

output

> print(d)
 [1] East  West  East  North North East  West  West  West  East 
Levels: East North West

> typeof(d)
[1] "integer"

> class(d)
[1] "factor"
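The levels and the underlying integer codes of a factor can be inspected directly. This sketch reuses the same factor as above:

```r
d <- factor(c("East", "West", "East", "North", "North",
              "East", "West", "West", "West", "East"))

print(levels(d))       # the distinct categories, sorted alphabetically by default
print(as.integer(d))   # the integer code stored for each element
print(table(d))        # how many times each level occurs
```

The integer codes are why typeof(d) reports "integer" even though the values print as text.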