GCU Statistics Project

Description

Applied
13. This question should be answered using the Weekly data set, which
is part of the ISLR2 package. This data is similar in nature to the
Smarket data from this chapter's lab, except that it contains 1,089
weekly returns for 21 years, from the beginning of 1990 to the end of
2010.
(a) Produce some numerical and graphical summaries of the Weekly
data. Do there appear to be any patterns?
(b) Use the full data set to perform a logistic regression with
Direction as the response and the five lag variables plus Volume
as predictors. Use the summary function to print the results. Do
any of the predictors appear to be statistically significant? If so,
which ones?
(c) Compute the confusion matrix and overall fraction of correct
predictions. Explain what the confusion matrix is telling you
about the types of mistakes made by logistic regression.
(d) Now fit the logistic regression model using a training data period
from 1990 to 2008, with Lag2 as the only predictor. Compute the
confusion matrix and the overall fraction of correct predictions
for the held out data (that is, the data from 2009 and 2010).
(e) Repeat (d) using LDA.
(f) Repeat (d) using QDA.
(g) Repeat (d) using KNN with K = 1.
(h) Repeat (d) using naive Bayes.
(i) Which of these methods appears to provide the best results on
this data?
(j) Experiment with different combinations of predictors, including possible transformations and interactions, for each of the
methods. Report the variables, method, and associated confusion matrix that appears to provide the best results on the held
out data. Note that you should also experiment with values for
K in the KNN classifier.
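Parts (c) and (d) both come down to tabulating predicted classes against actual classes, first on the full data and then on a held-out period. The following is a minimal sketch of that workflow, not a solution to the exercise: it uses simulated data instead of Weekly so that it runs without ISLR2, and the data frame `sim`, its column names, and the 0.5 probability cutoff are all assumptions made for illustration.

```r
set.seed(1)

# Simulated stand-in for a data set with a year, one predictor,
# and a two-level response (hypothetical names, not the Weekly columns)
n <- 400
sim <- data.frame(
  year = rep(2001:2010, each = n / 10),
  x    = rnorm(n)
)
sim$direction <- factor(ifelse(0.5 * sim$x + rnorm(n) > 0, "Up", "Down"),
                        levels = c("Down", "Up"))

# Train on the earlier years and hold out the last two
train <- sim$year <= 2008
fit <- glm(direction ~ x, data = sim, family = binomial, subset = train)

# Predicted probability of the second factor level ("Up") on the held-out rows
prob <- predict(fit, newdata = sim[!train, ], type = "response")
pred <- factor(ifelse(prob > 0.5, "Up", "Down"), levels = c("Down", "Up"))

# Rows are predictions, columns are the true classes
conf_mat <- table(Predicted = pred, Actual = sim$direction[!train])

# Overall fraction of correct predictions: diagonal cells over all cells
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)

print(conf_mat)
print(accuracy)
```

The two off-diagonal cells of `conf_mat` are the two kinds of mistake (predicting "Up" when the truth is "Down", and the reverse), which is what part (c) asks you to interpret.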
14. In this problem, you will develop a model to predict whether a given
car gets high or low gas mileage based on the Auto data set.
(a) Create a binary variable, mpg01, that contains a 1 if mpg contains
a value above its median, and a 0 if mpg contains a value below
its median. You can compute the median using the median()
function. Note you may find it helpful to use the data.frame()
function to create a single data set containing both mpg01 and
the other Auto variables.
(b) Explore the data graphically in order to investigate the association between mpg01 and the other features. Which of the other
features seem most likely to be useful in predicting mpg01? Scatterplots and boxplots may be useful tools to answer this question. Describe your findings.
(c) Split the data into a training set and a test set.
(d) Perform LDA on the training data in order to predict mpg01
using the variables that seemed most associated with mpg01 in
(b). What is the test error of the model obtained?
(e) Perform QDA on the training data in order to predict mpg01
using the variables that seemed most associated with mpg01 in
(b). What is the test error of the model obtained?
(f) Perform logistic regression on the training data in order to predict mpg01 using the variables that seemed most associated with
mpg01 in (b). What is the test error of the model obtained?
(g) Perform naive Bayes on the training data in order to predict
mpg01 using the variables that seemed most associated with mpg01
in (b). What is the test error of the model obtained?
(h) Perform KNN on the training data, with several values of K, in
order to predict mpg01. Use only the variables that seemed most
associated with mpg01 in (b). What test errors do you obtain?
Which value of K seems to perform the best on this data set?
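The steps in Exercise 14 (a median-based binary variable, a train/test split, and a test error) can be sketched on a different data set. The example below is only an illustration under stated assumptions: it uses the built-in mtcars data rather than Auto, a 70/30 random split, and the predictors wt and hp, all chosen arbitrarily, and it assumes the MASS package (normally installed with R) is available for lda().

```r
library(MASS)  # provides lda(); assumed to be installed

set.seed(1)

# mtcars ships with base R; it stands in here for the Auto data
cars_df <- mtcars

# 1 if mpg is above its median, 0 otherwise
cars_df$mpg01 <- as.integer(cars_df$mpg > median(cars_df$mpg))

# Random 70/30 split into training and test rows (the proportion is an assumption)
train_idx <- sample(nrow(cars_df), size = floor(0.7 * nrow(cars_df)))
train_set <- cars_df[train_idx, ]
test_set  <- cars_df[-train_idx, ]

# LDA on two illustrative predictors, fitted on the training rows only
lda_fit <- lda(mpg01 ~ wt + hp, data = train_set)

# Predicted classes come back as a factor with levels "0" and "1"
lda_pred <- as.integer(as.character(predict(lda_fit, newdata = test_set)$class))

# Test error: share of test rows where the prediction is wrong
test_error <- mean(lda_pred != test_set$mpg01)
print(test_error)
```

The same split and the same `mean(pred != truth)` calculation can be reused for each of the other classifiers, so that their test errors are comparable.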
15. This problem involves writing functions.
(a) Write a function, Power(), that prints out the result of raising 2
to the 3rd power. In other words, your function should compute
2^3 and print out the results.
Hint: Recall that x^a raises x to the power a. Use the print()
function to output the result.
(b) Create a new function, Power2(), that allows you to pass any
two numbers, x and a, and prints out the value of x^a. You can
do this by beginning your function with the line
> Power2 <- function(x, a) {
You should be able to call your function by entering, for instance,
> Power2(3, 8)
on the command line. This should output the value of 3^8, namely,
6,561.
(c) Using the Power2() function that you just wrote, compute 10^3,
8^17, and 131^3.
(d) Now create a new function, Power3(), that actually returns the
result x^a as an R object, rather than simply printing it to the
screen. That is, if you store the value x^a in an object called
result within your function, then you can simply return() this
result, using the following line:
return(result)
The line above should be the last line in your function, before
the } symbol.
(e) Now using the Power3() function, create a plot of f(x) = x^2.
The x-axis should display a range of integers from 1 to 10, and
the y-axis should display x^2. Label the axes appropriately, and
use an appropriate title for the figure. Consider displaying either
the x-axis, the y-axis, or both on the log-scale. You can do this
by using log = "x", log = "y", or log = "xy" as arguments to
the plot() function.
(f) Create a function, PlotPower(), that allows you to create a plot
of x against x^a for a fixed a and for a range of values of x. For
instance, if you call
> PlotPower (1:10 , 3)
then a plot should be created with an x-axis taking on values
1, 2, ..., 10, and a y-axis taking on values 1^3, 2^3, ..., 10^3.
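Exercise 15 turns on the difference between a function that prints a value and one that returns it for later use, and on what a log-scale axis does to a power curve. A minimal sketch follows; the function names `show_power` and `get_power` are hypothetical and chosen for illustration, not the names the exercise asks for.

```r
# Prints x^a to the console as a side effect
show_power <- function(x, a) {
  print(x^a)
}

# Stores x^a in an object and returns it explicitly,
# so the caller can assign it or pass it to another function
get_power <- function(x, a) {
  result <- x^a
  return(result)
}

show_power(3, 8)         # prints [1] 6561

# ^ is vectorised, so one call gives the squares of 1 to 10
x <- 1:10
y <- get_power(x, 2)

# With both axes on a log scale, y = x^2 plots as a straight line
plot(x, y, log = "xy",
     xlab = "x (log scale)", ylab = "x^2 (log scale)",
     main = "f(x) = x^2 on log-log axes")
```

Because the returned value is an ordinary numeric vector, it can be handed straight to plot(), which is the pattern part (e) relies on.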
Please complete the problems from the textbook listed below. For each problem, I've provided guidance
for what you should submit. The exercises for Chapter 4 can be found in section 4.8, with the first
section covering conceptual questions that do not require R code and the second covering applied
questions that do require R code.
Chapter 4 Exercise 5 (conceptual)
o Write your answer in full sentences.
o You do not need to provide a proof nor is any math necessary for answering any parts of
this question.
o No R code or output is required.
Chapter 4 Exercise 6 (conceptual)
o This exercise requires you to do some simple calculations.
o You are not required to write anything; just show your calculation.
o No R code or output is required.
Chapter 4 Exercise 7 (conceptual)
o Write your answer in full sentences.
o You are not required to write anything; just show your calculation.
o This exercise requires you to do some calculations; these are more difficult than Exercise
6's but still reasonable. If you struggle with this question, please contact me for help.
o No R code or output is required.
Chapter 4 Exercise 8 (conceptual)
o Write your answer in full sentences.
o This exercise requires you to do a calculation that is extremely simple if you understand how
KNN works. If you struggle with this question, please contact me for help.
o No R code or output is required.
Chapter 4 Exercise 9 (conceptual)
o Write your answer in full sentences.
o This exercise requires you to do a calculation that is extremely simple (and should be a
review of week 1 material).
o No R code or output is required.
Chapter 4 Exercise 12 (conceptual)
o Write your answer in full sentences.
o This exercise requires you to do some algebra. For assistance, see slide 29 of the Logistic
Regression PPT, page 141 of the textbook, and Video 2 – Logistic Regression.
o No R code or output is required.
Chapter 4 Exercise 13 (applied)
o For each question, include the following:
  - The R code
      - Do not use screenshots – I need to be able to copy and paste your code
        into RStudio.
      - Please use Courier New font to make the code more readable.
  - The R output – for this assignment, you can use screenshots.
  - For parts (a), (b), (c), and (i), you should answer the questions using full
    sentences.
  - Note: you only need to provide R code and output for (d), (e), (f), (g), and (h);
    however, part (i) asks you to discuss (d)-(h), and your discussion should use full
    sentences.
o Tip for part (a): discuss the summary of each variable, the correlation matrix for the
quantitative variables, and the pairs plot for all variables.
o Tip for part (j): this question is deliberately open-ended. Please try at least 15 different
variable/method combinations and provide the code you used to implement them and
judge their performance. However, you only need to provide an explanation and
confusion matrix for the best performing model.
Chapter 4 Exercise 14 (applied)
o For each question, include the following:
  - The R code
      - Do not use screenshots – I need to be able to copy and paste your code
        into RStudio.
      - Please use Courier New font to make the code more readable.
  - The R output – for this assignment, you can use screenshots.
  - Your answers to the questions in full sentences.
o To help with creating the binary variable in part (a), I recommend that you review the
code on page 56 that creates the Elite variable for the College data set and adds it to the
data frame.
Chapter 4 Exercise 15 (applied)
o Writing functions is important if you want to use R for real applications. You will get
practice doing so while completing this problem.
o For all parts, include the following:
  - The R code
      - Do not use screenshots – I need to be able to copy and paste your code
        into RStudio.
      - Please use Courier New font to make the code more readable.
  - The R output – for this assignment, you can use screenshots. Note, no output is
    required for part (d).
  - For part (e), explain why you might display the x- or y-axis on a log-scale.
Here's an example of how you might answer questions that require R code. Note, this example is related
to regression and not classification.
Question: Using the Boston dataset, regress the median home price (medv) on the percentage of
residents with lower social status (lstat). Does there appear to be a relationship between the variables?
What is the R^2 of the model and what does it say about the model fit?
R code
library(ISLR2)  # provides the Boston data set
summary(lm(medv ~ lstat, data = Boston))
R Output
Answer
Yes – the regression indicates that the percentage of lower-status residents (lstat) is related to the
median home value (medv) for a census tract, as the p-value for its regression coefficient is less than
0.05. The output shows the R^2 is 0.5441, which indicates that about 54.41% of the variability in medv is
explained by lstat. This is a relatively high R^2 for a simple linear regression.
Tip: you can expand the window that includes the R output by clicking the highlight icon in the
screenshot below. This will help you capture the full R output in one screenshot.
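The R^2 and p-value quoted in the example answer above can also be read off the fitted model programmatically instead of from the printed summary. The sketch below shows one way to do that. It uses the built-in cars data set (stopping distance against speed) as a stand-in so that it runs without any extra packages; the variable names `fit_summary`, `r_squared`, and `slope_p` are illustrative.

```r
# cars is a base R data set; it stands in for Boston here
fit <- lm(dist ~ speed, data = cars)
fit_summary <- summary(fit)

# Proportion of variance in the response explained by the model
r_squared <- fit_summary$r.squared

# p-value for the slope, taken from the coefficient table
slope_p <- coef(fit_summary)["speed", "Pr(>|t|)"]

print(round(r_squared, 4))
print(slope_p < 0.05)
```

Pulling the numbers out this way makes it easy to quote them exactly in a written answer.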
