News
 [16/5-19] Corrected a misprint in Project 3. Of course the log link implies that ln(μ) = xβ, not the other way around.
 [14/5-19] Project 3 added.
 [13/5-19] Code and data for Lecture 10 added.
 [9/5-19] Slides for Lecture 10 added. Material on quantile regression added.
 [7/5-19] Slides and R code for Lecture 9 added. Project 2 updated.
 [2/5-19] Slides and R code for Lecture 8 added.
 [23/4-19] Signup for oral exams and oral presentations of Project 3 open. Questions (oral_vt19.pdf) added.
 [15/4-19] Slides and R code for Lecture 7 added. Solutions to compulsory lab 3 added. First part of Project 2 added.
 [12/4-19] Signup for the individual oral exams opened for six exchange students. Opens for everyone else Tuesday 23/4.
 [8/4-19] Solutions to Lab 2 added. Slides, R code and data for Lecture 6 added.
 [5/4-19] Slides and R code for Lecture 5 added.
 [4/4-19] Files for compulsory computer lab 3 added.
 [3/4-19] Project 1 and Oslo data updated.
 [2/4-19] R code for Lecture 4 added.
 [1/4-19] R code for Lecture 3 and solutions to Lab 1 added. Slides for Lecture 4 added.
 [29/3-19] Slides for Lecture 3 added.
 [28/3-19] Instructions and data for compulsory computer lab 2 added.
 [27/3-19] Added some more course-specific R help.
 [25/3-19] Slides and R code for Lecture 2 added. The first part of Project 1, with data, has been added. The rest will show up later this week.
 [22/3-19] Slides and R code for Lecture 1 added. Signup for all three compulsory exercises open.
 [20/3-19] Instructions for computer exercise 1 added, see the appropriate time in the schedule below.
 [12/3-19] Signup for compulsory lab 1 open. If either of the times gets fully booked we will add a second room for that time slot. Update 14/3: the 10-12 slot has got a second room.
 [12/3-19] Welcome letter
 [23/1-19] The course starts Monday 25 March, 2019, 13.15-15.00 in MH:Riesz.
 This course is taught jointly with FMSN40 Linear and logistic regression with data gathering, 9hp: please check http://www.maths.lth.se/matstat/kurser/fmsn40/
 [23/1-19] FMSN30/MASM22 schedule
 [23/1-19] Compulsory computer exercises: Wednesday 27 March, 10.15-12.00 or 13.15-15.00; Wednesday 3 April, 10.15-12.00 or 13.15-15.00; Wednesday 10 April, 10.15-12.00 or 13.15-15.00.
 [23/1-19] Peer review of project reports: Project 1 Monday 15 April, 13.15-14.00; Project 2 Wednesday 15 May, 8.15-9.00.
 [23/1-19] Project 1+2 deadlines: Project 1 Wednesday 17 April, 16.00; Project 2 Friday 17 May, 16.00.
 [23/1-19] Project 3 presentations: Monday 27 May, 8.15-10 or 10.15-12 or 15.15-17; Tuesday 28 May, 10.15-12 or 15.15-17; Wednesday 29 May, 8.15-10 or 13.15-15 or 15.15-17.
 [23/1-19] Oral exams: 31 May - 20 June.
R
We will use the statistical program R, which can be downloaded from http://ftp.acc.umu.se/mirror/CRAN/ free of charge for all major platforms. It is a good idea to install it on your own computer, if you have one. It is also good practice to use an appropriate editor for writing and executing R programs; I have therefore set up a page for RStudio (for Windows/Linux/MacOS).
Notice that this course is about statistics and is not an in-depth course about R. We will discuss the commands needed to produce the desired output and answer the relevant statistical questions. However, we will not cover tips and tricks, good programming practice or any advanced use of such a powerful computer language. R has a large and friendly user community and you will be able to find plenty of good guides, tutorials and answered questions by a simple Google search. Here follow some of the many guides freely available on the web:
 R.pdf A (small) R Tutorial
 RTutorial.pdf A Short R Tutorial
 Rintro.pdf An Introduction to R
Course-specific help:
 Introduction to RStudio and RStudio projects
 Basic computations in R
 Matrix manipulation
 R Markdown for combining R code, output and text.
 lab1_vt19_useful.pdf
 lab2_vt19_useful.pdf
 lab3_vt19_useful.pdf
Computer Labs
You will have the chance to book specific computer lab sessions. That is, you do not have to attend all labs listed in the schedule below, only the ones you book. Pay special attention to the mandatory labs, listed as "Compulsory computer lab" in the schedule: you MUST attend one of those each week for the first three weeks.
Literature
 Rawlings, J.O., Pantula, S.G., Dickey, D.A.: Applied Regression Analysis: A Research Tool, 2nd ed., Springer, available as e-book.
 Agresti, A.: An Introduction to Categorical Data Analysis, 2nd ed., Wiley, 2007, available as e-book.
Schedule spring 2019
Week  Time and place  Contents  Material

w13  Mon 25/3, 13.15-15, MH:Riesz  Lecture 1: Introduction; Review of simple linear regression, linear relationships, linear models and basic assumptions (normality, homoscedasticity, linearity, independence), least squares estimation, basic properties of expectation, variance and covariance; mean and variance of least squares estimators.  Rawlings Ch.1; lecture1_vt19.pdf; lecture1_vt19.R
Wed 27/3, 8.15-10, MH:Riesz  Lecture 2: Continuation of simple linear regression; distribution of least squares estimators; prediction; confidence intervals; hypothesis testing, p-values, quantiles.  Rawlings Ch.1; lecture2_vt19.pdf; lecture2_vt19.R
Wed 27/3, 10.15-12, MH:230+231 or 13.15-15, MH:230  Compulsory computer lab 1  lab1_vt19.pdf; lab1_vt19_solutions.pdf; lab1_vt19_solutions.R; lab1_vt19_solutions.Rmd
Thu 28/3, 8.15-10, MH:230+231  Work on project 1  project1_vt19.pdf (updated!); oslo.txt (updated!)
w14  Mon 1/4, 13.15-15, MH:Riesz  Lecture 3. Multiple regression: matrix notation, properties of least squares estimators for multiple regression; confidence intervals for multiple regression; critical requirements; ill-ranked design matrices, lack of invertibility; categorical variables.  Rawlings Ch.3, 4, 6.5; lecture3_vt19.pdf; lecture3_vt19.R
Wed 3/4, 8.15-10, MH:Riesz  Lecture 4. Analysis of variance: variability decomposition. Global F-test. ANOVA tables. Partial F-test.  Rawlings Ch.9; lecture4_vt19.pdf; lecture4_vt19.R
Wed 3/4, 10.15-12, MH:230+231 or 13.15-15, MH:230  Compulsory computer lab 2  lab2_vt19.pdf; sleep.txt; lab2_vt19_solutions.pdf; lab2_vt19_solutions.R; lab2_vt19_solutions.Rmd
Thu 4/4, 8.15-10, MH:230+231  Work on project 1
w15  Mon 8/4, 13.15-15, MH:Riesz  Lecture 5. R-squared, adjusted R-squared. AIC & BIC, automatic selection methods.  Rawlings Ch.7; lecture5_vt19.pdf; lecture5_vt19.R
Wed 10/4, 8.15-10, MH:Riesz  Lecture 6. Problem areas in least squares; Regression diagnostics: outliers w.r.t. X (leverage), distribution of residuals, standardised and studentised residuals; graphical tools for residual analysis. Influential observations (Cook's distance, DFBETAS).  Rawlings Ch.10-11; lecture6_vt19.pdf; lecture6_vt19.R; f6data.txt
Wed 10/4, 10.15-12, MH:230+231 or 13.15-15, MH:230  Compulsory computer lab 3  lab3_vt19.pdf; CDI.txt; lab3_vt19_solutions.pdf; lab3_vt19_solutions.R; lab3_vt19_solutions.Rmd
Thu 11/4, 8.15-10, MH:230+231  Work on project 1
w16  Mon 15/4, 13.15-15, MH:Riesz  13.15-14.00: Peer assessment, project 1; 14.15-15: Wrapping up linear regression
Tue 16/4, 8.15-10, MH:Riesz  Lecture 7. Binary data, Bernoulli and binomial distributions, odds ratios, and the start of logistic regression.  Agresti Ch. 1, sec 1.2.1, sec 2.3; lecture7_vt19.pdf; lecture7_vt19.R
Wed 17/4, 13.15-15, MH:230  Finish project 1 and start on project 2  project2_vt19.pdf (updated 7/5); CDI.txt (same data as for lab 3)
Wed 17/4, 16.00  Project 1 final deadline at 16:00. MASM22/FMSN30 students email the report to FMSN30@matstat.lu.se. Subject field: Project1 by studid1 and studid2  
w17  EASTER BREAK and RE-EXAM PERIOD
w18  
w19  Mon 6/5, 13.15-15, MH:Riesz  Lecture 8. Maximum likelihood estimation, Newton-Raphson, properties, deviance and likelihood ratio tests. Akaike (again), pseudo-R2.  Agresti: 1.3.1, 1.4.1, 2.3.1-2.3.3; several topics scattered in chapter 4, particularly sections 4.1-4.2; lecture8_vt19.pdf; lexture8_vt19.R
Wed 8/5, 8.15-10, MH:Riesz  Lecture 9. Residuals and model validation in logistic regression.  lecture9_vt19.pdf; lecture9_vt19.R
Wed 8/5, 10.15-12, MH:230  Work on project 2
Thu 9/5, 8.15-10, MH:230  Work on project 2
w20  Mon 13/5, 13.15-15, MH:Riesz  Lecture 10. Poisson distribution and Poisson regression; Negative binomial regression. Quantile regression.  Agresti: several sections in Chapter 3; QuantileRegression.pdf; lecture10_vt19.pdf; lecture10_vt19.R; negbin_data.txt
Wed 15/5, 8.15-10, MH:Riesz  8.15-9.00: Peer assessment, project 2; 9.15-10: Wrapping up.
Wed 15/5, 10.15-12, MH:230  Work on project 2 and start on project 3  project3_vt19.pdf; oslo.txt
Thu 16/5, 8.15-10, MH:230  Work on project 2 and/or start on project 3
Fri 17/5, 16.00  Project 2 final deadline at 16.00. MASM22/FMSN30 students email the report to FMSN30@matstat.lu.se. Subject field: Project2 by studid1 and studid2
w21  Wed 22/5, 13.15-15, MH:230  Work on project 3
Thu 23/5, 8.15-10, MH:230  Work on project 3
w22  Mon 27/5, 8.15-10, MH:Sigma  Project 3 oral presentations  Signup
Mon 27/5, 10.15-12, MH:Sigma  Project 3 oral presentations
Mon 27/5, 15.15-17, MH:Sigma  Project 3 oral presentations
Tue 28/5, 10.15-12, MH:Sigma  Project 3 oral presentations
Tue 28/5, 15.15-17, MH:Sigma  Project 3 oral presentations
Wed 29/5, 8.15-10, MH:Sigma  Project 3 oral presentations
Wed 29/5, 13.15-15, MH:Sigma  Project 3 oral presentations
Wed 29/5, 15.15-17, MH:Sigma  Project 3 oral presentations
Fri 31/5, 8-17, MH:2278  Individual oral exams. Send Project 3 presentation to fmsn30@matstat.lu.se before your oral exam. Subject: "Project3 by studid1 and studid2".  Signup; oral_vt19.pdf
w23  Mon 3/6, Tue 4/6, Wed 5/6, Fri 7/6, 8.15-17.00, MH:2278
w24  Mon 10/6, Tue 11/6, Wed 12/6, Fri 14/6, 8.15-17.00, MH:2278
w25  Mon 17/6, Tue 18/6, Wed 19/6, Thu 20/6, 8.15-17.00, MH:227
Level
Advanced level.
Aim
Regression analysis deals with modelling how one characteristic (height, weight, price, concentration, etc) varies with one or several other characteristics (sex, living area, expenditures, temperature, etc). Linear regression is introduced in the basic course in mathematical statistics but here we expand with, e.g., "how do I check that the model fits the data", "what should I do if it doesn't fit", "how uncertain is it", and "how do I use it to draw conclusions about reality".
When performing a survey where people can answer "yes/no" or "little/just fine/much", or "car/bicycle/bus" or some other categorical alternative, you cannot use linear regression. Then you need logistic regression instead. This is the topic in the second half of the course.
Contents
Least squares and the maximum likelihood method; odds ratios; Multiple linear and logistic regression; Matrix formulation; Methods for model validation, residuals, outliers, influential observations, multicollinearity, change of variables; Choice of regressors, F-test, likelihood ratio test; Confidence intervals and prediction. Introduction to: Correlated errors, Poisson regression as well as multinomial and ordinal logistic regression.
Prerequisites
At least 60 ECTS at university level including an introductory course in mathematical statistics, e.g. MASA01 Mathematical statistics, basic course, 15hp, or MASB02 Mathematical statistics (for chemists) 7.5hp, or MASB03 Mathematical statistics (for physicists) 9hp, or MASB11 Biostatistics, basic course 7.5hp, or equivalent.
Teaching and examination
The teaching consists of lectures, computer exercises and project work. Attendance at the three computer exercises is compulsory. The examination is written and oral, in the form of written reports for projects 1 and 2, an oral presentation of project 3 and an individual oral examination.
Lecturer
Anna Lindgren, tel 0462224276, office MH:136, Matematikcentrum anna@maths.lth.se.
Teaching Assistants
Learning outcomes
Knowledge and understanding
For a passing grade the student must
 Describe the differences between continuous and discrete data, and the resulting consequences for the choice of statistical model,
 Give an account of the ideas behind different estimation principles,
 Describe the statistical properties of such estimates as appear in regression analysis,
 Interpret regression relations in terms of conditional distributions,
 Explain the concepts of odds and odds ratio, and describe their relation to probabilities and to logistic regression.
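As a small illustration of the last point, the relations between probability, odds, odds ratio and logistic regression can be sketched as follows. The symbols p, p_0, p_1, x, β_0 and β_1 are notation assumed here for the example, not taken from the course material, and the model shown is the simplest case with a single covariate.

```latex
% Odds of an event with probability p, and the inverse relation
\mathrm{odds} = \frac{p}{1-p}, \qquad p = \frac{\mathrm{odds}}{1+\mathrm{odds}}

% Odds ratio comparing two groups with probabilities p_1 and p_0
\mathrm{OR} = \frac{p_1/(1-p_1)}{p_0/(1-p_0)}

% Simple logistic regression: the log-odds are linear in x,
% so a one-unit increase in x multiplies the odds by e^{\beta_1}
\ln\frac{p(x)}{1-p(x)} = \beta_0 + \beta_1 x
\quad\Longrightarrow\quad
\frac{\mathrm{odds}(x+1)}{\mathrm{odds}(x)} = e^{\beta_1}

% Illustrative numbers: p_1 = 0.8 gives odds 4, p_0 = 0.5 gives odds 1, so OR = 4
```

Note that an odds ratio of 4 does not mean the probability is four times larger: in the illustrative numbers the probability only rises from 0.5 to 0.8.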
Skills and abilities
For a passing grade the student must
 Formulate a multiple linear regression model for a concrete problem,
 Formulate a multiple logistic regression model for a concrete problem,
 Estimate the parameters in the regression model and interpret them,
 Examine the validity of the model and make suitable modifications of the model,
 Use the resulting model for prediction,
 Use some statistical computer program for analysis of regression data, and interpret the results,
 Present the analysis and conclusions of a practical problem in a written report and an oral presentation.
Judgement and approach
For a passing grade the student must
 Always check the prerequisites before stating a regression model,
 Evaluate the plausibility of a performed study,
 Reflect on the limitations of the chosen model and estimation method, as well as alternative solutions.