R Data Types (Chapter 1, Episode 2)
Variables are used to store information. Each variable reserves a memory location that holds a value, so creating a variable sets aside some space in memory. You will often need to store data of different types, including integer, floating point, Boolean, etc., so the data type of the variable matters: it determines how much memory is allocated and what kind of value can be stored there. In many programming languages, such as C and Java, a variable is declared with a particular data type. This is not the case in R: a variable is assigned an R-object, and the data type of that R-object becomes the data type of the variable. There are different types of R-objects, but the common ones include the following:
Vectors
Matrices
Lists
Arrays
Data Frames
Factors
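As a rough illustration of how a variable takes its type from the object assigned to it, here is a minimal sketch. The variable name value is only an illustrative choice and is not used elsewhere in this episode.

```r
# The same variable name can hold objects of different types over time;
# class() reports the type of whatever object is currently assigned.
value <- 42L
print(class(value))   # "integer"

value <- 3.14
print(class(value))   # "numeric"

value <- "hello"
print(class(value))   # "character"

value <- TRUE
print(class(value))   # "logical"
```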
Vectors
A vector is simply a one-dimensional sequence of elements. A vector can be created from any of the basic data types, but all of its elements must be of the same type. The simplest way to create a vector in R is with the c() function, which combines its arguments into a vector. For example:
student <- c("james","mark","jane")
print(student)
# Get the class of our vector.
print(class(student))
output
[1] "james" "mark" "jane"
[1] "character"
vector1 <- c(1,2,5.3,6,-2,4) # a numeric vector
vector2 <- c(TRUE,TRUE,TRUE,FALSE,TRUE,FALSE) #a logical vector
print(vector1)
print(vector2)
output
[1] 1.0 2.0 5.3 6.0 -2.0 4.0
[1] TRUE TRUE TRUE FALSE TRUE FALSE
print(vector1[c(2,4)]) # prints the 2nd and 4th elements of vector1
print(vector2[c(1,4)]) # prints the 1st and 4th elements of vector2
output
[1] 2 6
[1] TRUE FALSE
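Because all elements of a vector must share one type, c() converts mixed inputs to a common type. The sketch below shows this behaviour; the names mixed and nums are illustrative only.

```r
# Mixing a number, a string and a logical: everything becomes character.
mixed <- c(1, "two", TRUE)
print(mixed)          # "1" "two" "TRUE"
print(class(mixed))   # "character"

# Mixing numbers and logicals: TRUE becomes 1 and FALSE becomes 0.
nums <- c(1.5, TRUE, FALSE)
print(nums)           # 1.5 1.0 0.0
print(class(nums))    # "numeric"
```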
If you need to create a more complex sequence, you can use the seq() function. You can specify either the step size (by) or the number of points in the interval (length.out). For example:
seq(1, 3, by=0.2)
seq(1, 5, length.out=4)
output
[1] 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.4 2.6 2.8 3.0
[1] 1.000000 2.333333 3.666667 5.000000 # length.out = 4 gives 4 evenly spaced values; the step is (to - from) / (length.out - 1), i.e. (5 - 1) / 3 = 4/3 in this case
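To see how by and length.out relate, the following sketch computes the step size by hand and checks that both calls give the same sequence. The variable names from, to, n and step are assumptions made for this illustration.

```r
from <- 1
to <- 5
n <- 4

# With n points there are n - 1 gaps between them.
step <- (to - from) / (n - 1)
print(step)

by_step <- seq(from, to, by = step)
by_length <- seq(from, to, length.out = n)

# all.equal() compares with a small tolerance, which suits decimal values.
print(all.equal(by_step, by_length))   # TRUE
```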
Matrices
A matrix is a data set represented in a two-dimensional rectangular form. All the elements of a matrix must be of the same data type, and all its columns must have the same length. It is similar to a vector, but it comes with a dimension attribute. To check the dimensions of an object, use the dim() or attributes() function. For example:
numbers <- c(1,2,3,4,5,6,7,8)
myMatrix <- matrix(numbers, nrow = 2, ncol = 4) # filled column by column
print(myMatrix)
output
[,1] [,2] [,3] [,4]
[1,] 1 3 5 7
[2,] 2 4 6 8
x <- matrix(1:12, nrow = 4, dimnames = list(c("W","X","Y","Z"), c("A","B","C"))) # row names first, then column names
print(x)
output
A B C
W 1 5 9
X 2 6 10
Y 3 7 11
Z 4 8 12
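The dimension attribute mentioned earlier can be inspected directly. This is a small sketch that rebuilds the 2 by 4 matrix from above; the second matrix, byRow, is an added illustration of the byrow argument of matrix().

```r
myMatrix <- matrix(1:8, nrow = 2, ncol = 4)

print(dim(myMatrix))          # 2 4 (rows, then columns)
print(attributes(myMatrix))   # a list holding the dim attribute
print(nrow(myMatrix))         # 2
print(ncol(myMatrix))         # 4

# By default values fill the matrix column by column;
# byrow = TRUE fills it row by row instead.
byRow <- matrix(1:8, nrow = 2, ncol = 4, byrow = TRUE)
print(byRow[1, ])             # 1 2 3 4
```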
We can also create a matrix with the cbind() and rbind() functions, for column bind and row bind respectively. cbind() creates a matrix by joining vectors as columns, while rbind() creates a matrix by joining vectors as rows. For example:
Row Bind
vectorA <- seq(3, 36, by=3)
print(vectorA)
vectorB <- seq(4,48 , by =4)
print(vectorB)
r <- rbind(vectorA, vectorB)
print(r)
output
print(vectorA)
[1] 3 6 9 12 15 18 21 24 27 30 33 36
print(vectorB)
[1] 4 8 12 16 20 24 28 32 36 40 44 48
print(r)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
vectorA 3 6 9 12 15 18 21 24 27 30 33 36
vectorB 4 8 12 16 20 24 28 32 36 40 44 48
#To check if the matrix has column names
colnames(r)
We can add custom column names to the matrix by assigning a character vector:
colnames(r) <- c('1','2','3','4','5','6','7','8','9','10','11','12')
r
Note: ensure the number of names supplied equals the number of columns in the matrix.
output
1 2 3 4 5 6 7 8 9 10 11 12
vectorA 3 6 9 12 15 18 21 24 27 30 33 36
vectorB 4 8 12 16 20 24 28 32 36 40 44 48
Column Bind
vectorA <- seq(3, 36, by=3)
print(vectorA)
vectorB <- seq(4,48 , by =4)
print(vectorB)
c <- cbind(vectorA, vectorB)
print(c)
output
print(vectorA)
[1] 3 6 9 12 15 18 21 24 27 30 33 36
print(vectorB)
[1] 4 8 12 16 20 24 28 32 36 40 44 48
print(c)
vectorA vectorB
[1,] 3 4
[2,] 6 8
[3,] 9 12
[4,] 12 16
[5,] 15 20
[6,] 18 24
[7,] 21 28
[8,] 24 32
[9,] 27 36
[10,] 30 40
[11,] 33 44
[12,] 36 48
#To check if the matrix has row names
rownames(c)
We can add custom row names to the matrix by assigning a character vector:
rownames(c) <- c('row1','row2','row3','row4','row5','row6','row7','row8','row9','row10','row11','row12')
c
Note: ensure the number of names supplied equals the number of rows in the matrix.
output
vectorA vectorB
row1 3 4
row2 6 8
row3 9 12
row4 12 16
row5 15 20
row6 18 24
row7 21 28
row8 24 32
row9 27 36
row10 30 40
row11 33 44
row12 36 48
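The two functions produce the same data in different orientations. The sketch below uses two short vectors, a and b, chosen only for this illustration, to compare the shapes and show that transposing one result gives the other.

```r
a <- c(3, 6, 9)
b <- c(4, 8, 12)

byRow <- rbind(a, b)   # each vector becomes a row: 2 rows, 3 columns
byCol <- cbind(a, b)   # each vector becomes a column: 3 rows, 2 columns

print(dim(byRow))      # 2 3
print(dim(byCol))      # 3 2

# t() transposes a matrix, so the transposed row bind matches the column bind.
print(all(t(byRow) == byCol))   # TRUE
```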
Lists
A list is an R-object that can hold elements of different types. This means that a list may hold together elements that are not related, such as functions, vectors, and even other lists. For example:
mylist <- list(FALSE, TRUE, c(1,2,3,4), 'Ruqy')
print(mylist)
typeof(mylist)
length(mylist)
output
[[1]]
[1] FALSE
[[2]]
[1] TRUE
[[3]]
[1] 1 2 3 4
[[4]]
[1] "Ruqy"
# typeof(mylist)
[1] "list"
# length(mylist)
[1] 4
To get the 3rd element of the list:
mylist[[3]] # returns the 3rd element of the list (the numeric vector)
mylist[[3]][3] # gets the 3rd value of the 3rd element
mylist[[3]][1] # gets the 1st value of the 3rd element
output
[1] 1 2 3 4
[1] 3
[1] 1
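List elements can also be given names, and single and double brackets behave differently. The following is a minimal sketch; the list person and its element names are made up for this illustration.

```r
person <- list(name = "Ruqy", scores = c(1, 2, 3, 4), active = TRUE)

# Single brackets keep the list wrapper: the result is a list of length 1.
sub <- person["scores"]
print(class(sub))      # "list"

# Double brackets (or $) extract the element itself.
vec <- person[["scores"]]
print(class(vec))      # "numeric"
print(person$name)     # "Ruqy"

print(names(person))   # "name" "scores" "active"
```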
Arrays
An array can have any number of dimensions. The array() function takes a “dim” argument, a vector that gives the size of each dimension of the array. For example:
x <- array(c('A', 'B', 'C'), dim = c(3,3,2)) # creates 2 layers of 3 rows and 3 columns each; the values are recycled to fill the array
print(x)
output
, , 1
[,1] [,2] [,3]
[1,] "A" "A" "A"
[2,] "B" "B" "B"
[3,] "C" "C" "C"
, , 2
[,1] [,2] [,3]
[1,] "A" "A" "A"
[2,] "B" "B" "B"
[3,] "C" "C" "C"
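Elements of an array are addressed with one index per dimension. This short sketch rebuilds the array from above and pulls out a single element and a whole layer.

```r
x <- array(c("A", "B", "C"), dim = c(3, 3, 2))

print(dim(x))        # 3 3 2
print(length(x))     # 18 elements in total (3 * 3 * 2)

# Index order is row, column, layer.
print(x[2, 1, 2])    # "B"

# Leaving an index empty selects everything along that dimension.
layer1 <- x[, , 1]
print(dim(layer1))   # 3 3
```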
Data Frames
Data frames are simply tabular data objects. Unlike a matrix, a data frame can hold a different data type in each column: the first column can be numeric, the second can contain logical values, the third can hold characters, etc. A data frame is a list of vectors of equal length, and it is a two-dimensional object. For example:
x <- data.frame('name' = c('Seun', 'Toby'), 's/n' = 1:2, 'age' = c(12,45)) # 's/n' is not a valid R name, so data.frame() converts it to s.n
print(x)
output
name s.n age
1 Seun 1 12
2 Toby 2 45
To access the columns of a data frame, we can use the [, [[ or $ operators. The [ operator returns a data frame, while [[ and $ return the column itself as a vector. For example:
x['name']
output
name
1 Seun
2 Toby
To get more than one column in the data frame:
x[c('name', 'age')]
output
name age
1 Seun 12
2 Toby 45
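The difference between the three operators is easiest to see side by side. This is a minimal sketch on a smaller version of the data frame; stringsAsFactors = FALSE is set explicitly here as an assumption, so that the name column stays a character vector on older R versions too.

```r
x <- data.frame(name = c("Seun", "Toby"), age = c(12, 45),
                stringsAsFactors = FALSE)

# Single brackets return a data frame with the selected column(s).
print(class(x["age"]))   # "data.frame"

# Double brackets and $ return the column as a plain vector.
print(x[["age"]])        # 12 45
print(x$name)            # "Seun" "Toby"

# Row and column indices together pick out a single cell.
print(x[2, "age"])       # 45
```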
Factors
Factors are data objects used to categorise data and store it under levels. They can be built from both strings and integers. Factors are most useful for columns with a limited number of unique values, and they are widely used in data analysis and statistical modelling. For example:
d <- factor(c("East","West","East","North","North","East","West","West","West","East"))
print(d)
typeof(d) # factors are stored internally as integer codes
class(d)
output
> print(d)
[1] East West East North North East West West West East
Levels: East North West
> typeof(d)
[1] "integer"
> class(d)
[1] "factor"
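Since a factor stores integer codes that point at its levels, both parts can be inspected separately. The sketch below reuses the same factor and also counts the values per level with table().

```r
d <- factor(c("East", "West", "East", "North", "North",
              "East", "West", "West", "West", "East"))

# The distinct categories, sorted alphabetically by default.
print(levels(d))       # "East" "North" "West"

# The integer code of each value: 1 = East, 2 = North, 3 = West.
print(as.integer(d))   # 1 3 1 2 2 1 3 3 3 1

# Number of observations per level.
print(table(d))
```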