0. This document

This is an R Markdown (.Rmd) document, and you may be reading its compiled HTML version. Please get the .Rmd version from our website, in order to:

In case of errors, look for help or contact your instructor.

PROBLEM 1: a study of cars

Loading the data

The following lines load a dataset into a variable called mtcars, and show its structure as an R variable.

data(mtcars) # loads predefined data
str(mtcars) # shows structure of the variable
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

You can ask for help about the data by calling help(mtcars). The help page tells you a bit more about the meaning, units of measurement, etc., of every column of the data frame.

You can see the whole dataset or just the first rows:

head(mtcars) # shows first rows
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
mtcars # shows the whole dataset (watch out if too long!)
##                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
summary(mtcars) # summary of every column
##       mpg             cyl             disp             hp       
##  Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
##  1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
##  Median :19.20   Median :6.000   Median :196.3   Median :123.0  
##  Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
##  3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
##  Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0  
##       drat             wt             qsec             vs        
##  Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000  
##  1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000  
##  Median :3.695   Median :3.325   Median :17.71   Median :0.0000  
##  Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375  
##  3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000  
##  Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000  
##        am              gear            carb      
##  Min.   :0.0000   Min.   :3.000   Min.   :1.000  
##  1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000  
##  Median :0.0000   Median :4.000   Median :2.000  
##  Mean   :0.4062   Mean   :3.688   Mean   :2.812  
##  3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000  
##  Max.   :1.0000   Max.   :5.000   Max.   :8.000

Be aware that the columns vs and am are numeric, but their values are really codes:

  • vs=0 means V-engine and vs=1 means straight engine.
  • am=0 means automatic car and am=1 means manual car.

Also, columns cyl, gear and carb are numeric, but they are better considered as categorical for some plots.
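As an illustration of the remark above, the coded columns can be turned into factors on a copy of the data. This is only a sketch: the name `mtcars2` and the label strings are assumptions chosen here, not part of the assignment.

```r
data(mtcars)

# Work on a copy so that the original numeric columns remain available
# (the name 'mtcars2' is a hypothetical choice for this sketch)
mtcars2 <- mtcars

# Replace the 0/1 codes by readable labels
mtcars2$vs <- factor(mtcars2$vs, levels = c(0, 1),
                     labels = c("V-shaped", "straight"))
mtcars2$am <- factor(mtcars2$am, levels = c(0, 1),
                     labels = c("automatic", "manual"))

# Treat the count-like columns as categorical
count_cols <- c("cyl", "gear", "carb")
mtcars2[count_cols] <- lapply(mtcars2[count_cols], factor)

str(mtcars2[, c("vs", "am", count_cols)])  # all five are now factors
table(mtcars2$am)                          # number of cars per transmission type
```

With factors, plotting functions such as boxplot() group the data by level instead of treating the codes as numbers.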

There is nothing else to be done in this section.

Exercises on mtcars

1.1. (a) Show the general plot in order to see the pairwise relations among the columns of the data. (b) Choose the pair of columns showing the most appealing scatterplot. (c) In that choice, which variable could be seen as the “cause” and which one could be seen as the “effect”? Explain why.

# here you put some code and get the right plot
# you can hide the R code in the output HTML
# by setting the option echo=FALSE
# for instance  ```{r 1-1, echo=FALSE}

Here you put your comments

1.2. Show the plot of efficiency (miles per gallon, mpg) as a function of the weight of the car. Is the trend apparently linear? Do you think the same trend will continue beyond the plot? Why or why not?

# here you put some code and get the right plot

Here you put your comments

1.3. Compute the average “miles per gallon” for the sample of automatic cars, as well as for the sample of manual cars.

# here you put some code and get the right values
mean.auto = 0
mean.manual = 0

The mean value for the subsample of automatic cars is 0 and the mean value for the subsample of manual cars is 0.

1.4. Let us assume that the variable “miles per gallon” follows a normal distribution (\(N(\mu_A, \sigma^2_A)\) for the automatic cars, and \(N(\mu_M, \sigma^2_M)\) for the manual cars). In spite of the two different mean values found in problem 1.3, is it possible that the two true population mean values \(\mu_A\) and \(\mu_M\) are actually the same value? Name a statistical technique for solving that question, and apply it with R. Comment on the obtained result.

# here you put some code and get the answer

Here you put your comments

1.5. (a) Draw comparative boxplots of “miles per gallon”, for the three groups of cars, defined by the “number of cylinders”. (b) What does the plot of this sample “apparently” tell about the comparison of the mean values of the three populations? (c) In spite of the plot, is it possible that the three true population mean values (\(\mu_4\), \(\mu_6\) and \(\mu_8\)) for the populations of cars of 4, 6 and 8 cylinders, respectively, have indeed the same value? Name a statistical technique for solving that question, and apply it with R. Comment on the obtained result.

# here you put some code and get the right plot

Here you put your comments

1.6. (a) Plot the effect of the “displacement” on the time for the “1/4-mile run”. Color the points by transmission type. Can you see a clear boundary line separating the points of automatic and manual cars? (b) If an unknown car had a displacement of 225 cubic inches, and ran the 1/4 mile in 16 seconds, how would you classify it: as an automatic or as a manual car? Why?

# here you put some code and get the right plot

Here you put your comments

1.7. What are the names of the automatic cars with the maximum and minimum time, respectively, for the 1/4-mile run?

# here you put some code and get the names of the cars

Here you put your comments

1.8. What is the rank position of “Honda Civic” with respect to the variable “miles per gallon”? (Hint: have a look at the rank() function)
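To see how rank() behaves before applying it to the data, here is a small sketch on a toy named vector. The vector `x` and the names `r_asc` and `r_desc` are made up for illustration.

```r
# Toy named vector (hypothetical values, not taken from mtcars)
x <- c(b = 30, a = 10, c = 20, d = 20)

# rank() ranks in ascending order by default; tied values share the average rank
r_asc <- rank(x)
r_asc    # b = 4, a = 1, c = 2.5, d = 2.5

# Ranking from largest to smallest: negate the values.
# ties.method = "min" gives tied values the best (smallest) rank of the tie.
r_desc <- rank(-x, ties.method = "min")
r_desc   # b = 1, a = 4, c = 2, d = 2

# Names are preserved, so one element can be looked up by name
r_desc["b"]
```

Whether rank 1 should mean the smallest or the largest value, and how ties are handled, are choices to state explicitly in your answer.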

# here you put some code and get the position

Here you put your comments

PROBLEM 2: write a function for standardizing a multivariate sample contained in a matrix or data.frame.

# Instructions:
# the name of the function is 'standardize'
# the argument is a matrix or data.frame
# try to avoid "for loops"
# 1st: compute vector of mean values of all columns
# 2nd: compute vector of standard deviation of all columns
# 3rd: subtract each mean from its column
# 4th: divide each column by its s.d.
# 5th: return the new data.frame (or matrix)
# verify your result with some examples
standardize = function(x) {
  # here your code
}
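For the verification step, one possible check is that every column of the result has mean 0 and standard deviation 1. The helper `check_standardized` and its tolerance `tol` are hypothetical names introduced for this sketch. The base R function scale() is used only as a stand-in for your own function, and it always returns a matrix.

```r
# Hypothetical helper: TRUE if every column has mean ~0 and s.d. ~1
check_standardized <- function(z, tol = 1e-8) {
  z <- as.matrix(z)
  col_means <- colMeans(z)       # vector of column means
  col_sds   <- apply(z, 2, sd)   # vector of column standard deviations
  all(abs(col_means) < tol) && all(abs(col_sds - 1) < tol)
}

# Stand-in for the output of your own function: base R's scale()
reference <- scale(mtcars)

check_standardized(reference)  # TRUE: columns are centered and scaled
check_standardized(mtcars)     # FALSE: raw data are not standardized
```

Once your function is written, pass its output to the helper in place of `reference`. Comparing against scale() with all.equal() is another option, after converting both results to the same type.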

PROBLEM 3: (Classical kinematics problem) A shooter is lying on the ground, pointing his bazooka at an angle of \(\pi/4\) above the ground. This angle is actually subject to an error modeled by a Gaussian distribution with mean \(0\) and standard deviation \(\pi/60\). The initial speed of the projectile should be \(300\) m/s, but it is actually a completely random value between \(299.00\) and \(301.00\) m/s. Finally, the effect of the wind should be null, but it actually produces a horizontal acceleration whose value follows a Gaussian distribution with mean \(0\) m/s\(^2\) and standard deviation \(0.1\) m/s\(^2\). The vertical acceleration is due to gravity (exactly \(9.8\) m/s\(^2\) downwards). (a) If everything were perfect (no error in any parameter), at what distance from the shooter would the shot hit the ground? (b) But life is not easy: what is the probability that the impact lies within 50 m of the theoretical target? Explain how you computed it. (Remark: this is a 2D problem, not 3D: the typical parabolic shot explained in secondary school.)
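As a reminder of the underlying model, the equations of motion can be written as follows. The symbols are assumptions introduced here and do not appear in the statement: \(\theta\) is the actual launch angle, \(v_0\) the initial speed, \(a\) the horizontal acceleration due to the wind (taken as positive in the firing direction), \(g\) the gravitational acceleration, \(T\) the flight time and \(d\) the impact distance. The shot is assumed to start at the origin.

\[
\begin{aligned}
x(t) &= v_0 \cos(\theta)\, t + \tfrac{1}{2}\, a\, t^2, \\
y(t) &= v_0 \sin(\theta)\, t - \tfrac{1}{2}\, g\, t^2, \\
y(T) = 0,\ T > 0 \;&\Rightarrow\; T = \frac{2\, v_0 \sin(\theta)}{g}, \\
d = x(T) &= \frac{v_0^2 \sin(2\theta)}{g} + \frac{2\, a\, v_0^2 \sin^2(\theta)}{g^2}.
\end{aligned}
\]

With \(a = 0\) the last expression reduces to the usual range formula \(v_0^2 \sin(2\theta)/g\).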

# here your computations

Here your comments