4.2 Vectors, matrices and data.frames

Vectors are collections of elements of the same type, without dimensions. A vector of length 1 is called a scalar.

scalar_1 <- 1
scalar_2 <- 2
scalar_3 <- 3
length(scalar_1) # the length of a scalar is 1

## [1] 1

(vector_123 <- c(scalar_1, scalar_2, scalar_3))

## [1] 1 2 3

(vector_long <- c(vector_123, vector_123))

## [1] 1 2 3 1 2 3

dim(vector_123) # a vector has no dimensions, so dim() returns NULL

## NULL
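
A short base-R sketch of a few more ways to build and index vectors. The name `nested` is only for illustration; `seq()`, `rep()` and `[` are standard base R.

```r
# c() flattens nested input: the result is always a one-dimensional vector
nested <- c(c(1, 2), c(3, c(4, 5)))
length(nested)                  # 5
is.null(dim(nested))            # TRUE: still no dimensions

# other common ways to build vectors
seq(from = 2, to = 10, by = 2)  # 2 4 6 8 10
rep(c(1, 2, 3), times = 2)      # 1 2 3 1 2 3, the same values as vector_long

# square-bracket indexing starts at 1
nested[2]                       # 2
nested[c(1, 5)]                 # 1 5
```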

The four most commonly used atomic types are logical, double, integer and character (R also has the rarely used complex and raw types):

typeof(TRUE)

## [1] "logical"

typeof(1.5)

## [1] "double"

typeof(1L)

## [1] "integer"

typeof("one")

## [1] "character"
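
Because all elements of a vector must share one type, combining different types with c() silently coerces them. A minimal base-R sketch of that coercion order (logical < integer < double < character):

```r
typeof(c(TRUE, 2L))         # "integer"
typeof(c(2L, 1.5))          # "double"
typeof(c(1.5, "one"))       # "character"
c(TRUE, FALSE) + 1          # logicals become 1 and 0, so this gives 2 1

# the : operator produces integers, while c() with plain numbers produces doubles
typeof(1:3)                 # "integer"
typeof(c(1, 2, 3))          # "double"
identical(1:3, c(1, 2, 3))  # FALSE, because the types differ
```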

R can serve as a calculator.

#basic mathematical operations
scalar_1+scalar_2

## [1] 3

scalar_3^scalar_2

## [1] 9

Computations on vectors are performed element-wise.

vector_123*scalar_2

## [1] 2 4 6
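
Element-wise operations also work between two vectors, pairing elements by position. When the lengths differ, R recycles the shorter vector; multiplying by a scalar, as above, is simply recycling a length-1 vector. A small sketch:

```r
vector_123 <- c(1, 2, 3)
vector_123 + c(10, 20, 30)   # 11 22 33: elements are paired by position
vector_123 * vector_123      # 1 4 9
vector_123 > 1               # FALSE TRUE TRUE: comparisons are element-wise too

# the shorter vector is recycled to the length of the longer one
c(1, 2, 3, 4) + c(10, 20)    # 11 22 13 24
```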

Functions in R are called by name, with their arguments in parentheses. A basic function is print().

print("Hello world!")

## [1] "Hello world!"

log() is another useful function; it takes two arguments, x and base.

# the order of the arguments matters only when they are not named
log(8, base = 2)

## [1] 3

log(x = 8, base = 2)

## [1] 3

log(8, 2)

## [1] 3

log(base = 2, x = 8)

## [1] 3

log(2, 8)

## [1] 0.3333333
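
To see which arguments a function takes, and their defaults, args() is handy. The sketch below also uses log2() and log10(), the base-R shortcuts for common bases:

```r
# args() shows a function's arguments and their default values
args(log)               # function (x, base = exp(1)): base defaults to e
log(exp(2))             # natural logarithm, prints 2
# shortcuts for common bases
log2(8)                 # 3
log10(1000)             # 3
# named arguments can appear in any order; the remaining unnamed value is matched to x
log(base = 10, 1000)    # 3
```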

Another important function is sample(), which draws a random sample from a vector.

sample(1:5, 3) #random sample without replacement

## [1] 2 3 1

sample(1:5, 3, replace = TRUE) #random sample with replacement

## [1] 4 3 5

sample(1:5, 3, prob = c(0.2,0.1,0.3,0.1,0.3)) # weighted sample favouring odd numbers, without replacement

## [1] 3 1 5
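
The samples above change on every run. Calling set.seed() first makes them reproducible. With replacement and many draws, the observed frequencies also approach the prob weights. A sketch; the seed value 123 is arbitrary:

```r
set.seed(123)
first <- sample(1:5, 3)
set.seed(123)
second <- sample(1:5, 3)
identical(first, second)     # TRUE: same seed, same sample

# many weighted draws with replacement: frequencies approach the weights
set.seed(123)
draws <- sample(1:5, 10000, replace = TRUE, prob = c(0.2, 0.1, 0.3, 0.1, 0.3))
table(draws) / length(draws) # roughly 0.2 0.1 0.3 0.1 0.3
```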

A matrix is just a vector with dimensions.

# adding dimensions to a vector turns it into a matrix
vector_long_2 <- vector_long
identical(vector_long, vector_long_2)

## [1] TRUE

(dim(vector_long_2) <- c(2,3))

## [1] 2 3

identical(vector_long, vector_long_2)

## [1] FALSE

dim(vector_long)

## NULL

dim(vector_long_2)

## [1] 2 3
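
The only difference between the two objects is the `dim` attribute. A small sketch showing that removing it again gives back the original vector:

```r
vector_long <- c(1, 2, 3, 1, 2, 3)
vector_long_2 <- vector_long
dim(vector_long_2) <- c(2, 3)
attributes(vector_long_2)              # $dim: 2 3, the only attribute added
is.matrix(vector_long_2)               # TRUE

# dropping the dim attribute turns the matrix back into the original vector
dim(vector_long_2) <- NULL
identical(vector_long, vector_long_2)  # TRUE
```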

Matrices can be created using the matrix() function.

matrix(vector_long, 2, 3)

##      [,1] [,2] [,3]
## [1,]    1    3    2
## [2,]    2    1    3

By default, matrix() fills values column by column. This can be changed by setting byrow = TRUE.

matrix(vector_long, 2, 3, byrow = TRUE)

##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    1    2    3
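
Filling by row gives the same result as filling the transposed shape by column and then transposing with t(). Elements are accessed as `[row, column]`. A sketch using the same values:

```r
v <- c(1, 2, 3, 1, 2, 3)
m_col <- matrix(v, nrow = 2, ncol = 3)                # filled column by column
m_row <- matrix(v, nrow = 2, ncol = 3, byrow = TRUE)  # filled row by row
identical(m_row, t(matrix(v, nrow = 3, ncol = 2)))    # TRUE

# indexing is [row, column]; leaving an index empty selects everything
m_col[1, ]   # first row: 1 3 2
m_col[, 2]   # second column: 3 1
m_col[2, 3]  # 3
```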

Example: Simulate an adjacency matrix for a network⁴

set.seed(1992)
# 1 = link, 0 = no link
x <- sample(c(1,0), 25, replace = TRUE, prob=c(.5,.5))
#names of the network's nodes
dim_names <- list(c("Thea", "Pravin", "Troy", "Albin", "Clementine"),
                  c("Thea", "Pravin", "Troy", "Albin", "Clementine"))
# create a 5x5 adjacency matrix
(matrix_data2 <- matrix(x,
                        nrow = 5,
                        ncol = 5,
                        byrow = TRUE,
                        dimnames = dim_names)) # set the names of the rows and columns

##            Thea Pravin Troy Albin Clementine
## Thea          1      1    1     1          0
## Pravin        1      0    1     1          1
## Troy          0      0    1     0          1
## Albin         1      1    0     1          0
## Clementine    1      0    0     1          0

isSymmetric(matrix_data2)

## [1] FALSE
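
Since the simulated matrix is not symmetric, it can be read as a directed network. The sketch below rebuilds it and computes out- and in-degrees with rowSums() and colSums(). It assumes rows are senders and columns are receivers, a convention not stated above. It then derives one possible undirected version by keeping a link whenever it exists in either direction. Dropping self-loops on the diagonal is a further, optional assumption.

```r
set.seed(1992)
x <- sample(c(1, 0), 25, replace = TRUE, prob = c(.5, .5))
nodes <- c("Thea", "Pravin", "Troy", "Albin", "Clementine")
adj <- matrix(x, nrow = 5, ncol = 5, byrow = TRUE, dimnames = list(nodes, nodes))

# assumed convention: row = sender, column = receiver
out_degree <- rowSums(adj)   # links sent by each node
in_degree  <- colSums(adj)   # links received by each node

# undirected version: a link exists if it exists in either direction
adj_undirected <- (adj + t(adj) > 0) * 1
diag(adj_undirected) <- 0    # optional: drop self-loops
isSymmetric(adj_undirected)  # TRUE
```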

A data.frame is a list of vectors (its columns) that all have the same length; unlike a matrix, different columns can hold different types. We can convert a matrix into a data.frame and vice versa.

# convert the matrix to a data.frame
class(matrix_data2)

## [1] "matrix" "array"

df_data2 <- as.data.frame(matrix_data2)
class(df_data2)

## [1] "data.frame"
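
The conversion also works in the other direction with as.matrix(). Because a matrix holds a single type, converting a data.frame with mixed column types coerces everything to character. A sketch with illustrative values:

```r
# a small numeric matrix with row and column names (illustrative values)
m <- matrix(1:6, nrow = 2, dimnames = list(c("a", "b"), c("x", "y", "z")))
df <- as.data.frame(m)     # columns x, y, z; row names a, b
df$y                       # columns can be extracted by name: 3 4
back <- as.matrix(df)      # and converted back again
all(back == m)             # TRUE: same values, same layout

# a data.frame may mix column types ...
mixed <- data.frame(id = c("a", "b"), value = c(1.5, 2), stringsAsFactors = FALSE)
sapply(mixed, typeof)      # id: "character", value: "double"
# ... but as.matrix() then has to coerce everything to character
typeof(as.matrix(mixed))   # "character"
```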

data.frames can be slow and memory-hungry on large data sets, so many R users turn to alternative classes from contributed packages, such as data.table.⁵
Another important data structure to become familiar with, if you are new to R, is the list.
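
Here is a minimal base-R sketch of a list. The names `my_list`, `label`, `values`, `flag` and `nested` are hypothetical:

```r
# a list can hold elements of different types and lengths, including other lists
my_list <- list(label = "a", values = c(1, 2, 3), flag = TRUE, nested = list(n = 1L))
length(my_list)          # 4
my_list$values           # 1 2 3
my_list[["flag"]]        # TRUE: double brackets return the element itself
class(my_list["flag"])   # "list": single brackets return a sub-list
my_list$nested$n         # 1

# a data.frame is a list whose elements are equal-length columns
is.list(data.frame(a = 1:2, b = c("x", "y")))  # TRUE
```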

For undirected one-mode networks, the adjacency matrix is symmetric. This symmetry might not hold for two-mode (bipartite) networks. For directed graphs, the adjacency matrix can be asymmetric to reflect the direction of each link/edge. (Thanks Pierre Olivier for your input)
Adjacency matrices, especially large ones, are best stored as sparse matrices for memory efficiency.