MASM22/FMSN30(40) Linear and Logistic Regression (with Data Gathering)

Basic computing

This is a quick introduction to R. You will note that it has many similarities with Matlab, as well as some confusing differences.

R is made for computing things. If you want to find the result of \(2 + 4\) you simply write

2 + 4

and R will answer

2 + 4
#> [1] 6

The notation [1] 6 means that the first value (in this case the only value) of the answer is 6. If you want to do a multiplication you write

2 * 4
#> [1] 8

All common mathematical functions are available. In order to calculate \(4^2\), \(\sqrt{4}\), \(\ln(4)\) and \(e^4\) you write

4^2
#> [1] 16
sqrt(4)
#> [1] 2
log(4)
#> [1] 1.386294
exp(4)
#> [1] 54.59815
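
The function log() gives the natural logarithm by default. As a small sketch going slightly beyond the examples above, log() also accepts a base argument, and log10() and log2() are shortcuts for the two most common bases:

```r
log(100, base = 10)  # logarithm with base 10
#> [1] 2
log10(1000)          # shortcut for base 10
#> [1] 3
log2(8)              # shortcut for base 2
#> [1] 3
exp(1)               # the number e itself
#> [1] 2.718282
```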

Note that R uses a decimal point, never a decimal comma.

If you want to save the result of a calculation you have to give the result a name. This is done using the notation <- (less-than immediately followed by a dash). If you want to calculate \(2+4\) and save the result under the name myresult, you give the command

myresult <- 2 + 4

A variable named myresult should now be listed in the Environment window. You can also see that it contains the value 6. You can ask R to print the answer by writing

myresult
#> [1] 6

We can use this variable in other functions. For example, we can write

sqrt(myresult)
#> [1] 2.44949

to get the square root of 6. The expression sqrt() is a function. Every function call in R ends in parentheses, even when the function takes no arguments, e.g., q().

3. Variables

You can collect several values into one variable, a vector, using the function c() (c for combine or collect):

x <- c(3, 5, 7, 11, 13)

You can then perform the same calculations as before, but on all the values at the same time:

x + 3
#> [1]  6  8 10 14 16
sqrt(x)
#> [1] 1.732051 2.236068 2.645751 3.316625 3.605551
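
The same element-by-element behaviour applies to the other arithmetic operators, and also when both operands are vectors of the same length. A minimal sketch, where the second vector c(1, 2, 3, 4, 5) is made up purely for illustration:

```r
x <- c(3, 5, 7, 11, 13)   # the same vector as above
x * 2                     # every element is doubled
#> [1]  6 10 14 22 26
x^2                       # every element is squared
#> [1]   9  25  49 121 169
x + c(1, 2, 3, 4, 5)      # elements are added pairwise
#> [1]  4  7 10 15 18
```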

You can also combine several variables into one longer variable:

y <- c(17, 19, 23, 29, 31)
z <- c(x, y)
z
#>  [1]  3  5  7 11 13 17 19 23 29 31

Sometimes you will want to create structured data, e.g., series or repeated sequences. There are two commands for this: seq() and rep(). In addition you can use the colon sign “:”. Try out the following commands and try to understand what they do:

seq(1, 100, 9)
seq(to = 100, from = 1, by = 9)
seq(f = 1, t = 100, length.out = 10)
1:3
3:1
rep(c(1, 2, 3), times = 3)
rep(1:3, each = 4)
rep(1:3, t = 3, e = 4)
rep(1:3, length.out = 20)
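
Some of the commands above use shortened argument names such as f, t and e. This works because R lets you abbreviate an argument name as long as the abbreviation is unambiguous. As a quick check, the abbreviated and the fully spelled-out calls should give identical results:

```r
# f and t are matched to the arguments from and to of seq()
identical(seq(f = 1, t = 100, length.out = 10),
          seq(from = 1, to = 100, length.out = 10))
#> [1] TRUE

# t and e are matched to the arguments times and each of rep()
identical(rep(1:3, t = 3, e = 4),
          rep(1:3, times = 3, each = 4))
#> [1] TRUE
```

Writing the names out in full is usually easier to read, so the abbreviations are best kept for quick experiments.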

If you need help on a particular function you can use the help function by writing help(seq) or ?seq. You can also use the “Help” window in RStudio. The colon sign is not a function but an operator, so you have to write help(":") using quotes.

Sometimes you only want some of the values in a variable. We can choose values using []:

myvalues <- 21:30
myvalues
#>  [1] 21 22 23 24 25 26 27 28 29 30
myvalues[1]
#> [1] 21
myvalues[c(1, 3, 5)]
#> [1] 21 23 25
myvalues[1:3]
#> [1] 21 22 23

You can also choose to exclude values:

myvalues[-1]
#> [1] 22 23 24 25 26 27 28 29 30
myvalues[-c(1, 3, 5)]
#> [1] 22 24 26 27 28 29 30
myvalues[-(2:4)]
#> [1] 21 25 26 27 28 29 30
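
Besides positions, [] also accepts a vector of TRUE and FALSE values, which makes it possible to choose values by a condition. The following is a small sketch of that idea; the threshold 25 is an arbitrary choice for illustration:

```r
myvalues <- 21:30          # the same vector as above
myvalues > 25              # one TRUE or FALSE per value
#>  [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
myvalues[myvalues > 25]    # keep only the values where the condition is TRUE
#> [1] 26 27 28 29 30
```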

4. Standard functions

There is a large number of functions in R. Here are some examples of basic statistical functions. The first one creates 100 random numbers from a standard normal distribution. Run help on the others to find out what they do.

x <- rnorm(100)
x
mean(x)
var(x)
sd(x)
median(x)
boxplot(x)
boxplot(x, horizontal = TRUE)
hist(x)
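
Since rnorm() draws new random numbers every time, your results will differ from run to run. If you want the same numbers each time, one option is to set the seed of the random number generator first. A minimal sketch, where the seed value 1 is an arbitrary choice:

```r
set.seed(1)          # fix the random number generator (1 is arbitrary)
x <- rnorm(100)      # 100 standard normal random numbers
summary(x)           # minimum, quartiles, mean and maximum in one call
all.equal(sd(x), sqrt(var(x)))  # the standard deviation is the square root of the variance
#> [1] TRUE
```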

5. Objects

All variables in R are objects. You can see the objects you have created in the Environment window. You can also list them using the command ls(). If you want to remove an object you use the command remove() or, shorter, rm():

rubbish <- c(1, 19, 23.4)
ls()
#> [1] "myresult" "myvalues" "rubbish"  "x"        "y"        "z"
remove(rubbish)
ls()
#> [1] "myresult" "myvalues" "x"        "y"        "z"
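
The shorter form rm() works in the same way. As a small sketch, the function exists() can be used to check whether an object with a given name is available before and after it is removed:

```r
rubbish <- c(1, 19, 23.4)
exists("rubbish")    # the object has just been created
#> [1] TRUE
rm(rubbish)          # same effect as remove(rubbish)
exists("rubbish")    # the object is gone
#> [1] FALSE
```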

If you want to remove all objects you can combine the two commands into

remove(list = ls())
ls()
#> character(0)

Be careful! R will NOT warn you that you are removing anything. It assumes you know what you are doing. Now we have a nice empty environment. Save your script file and close RStudio. You can answer “No” when asked to save the workspace. Since you saved your script file, you can rerun the commands and recreate the objects the next time you run R.

Continue with Computer exercise 1.