Please answer the following questions, following the guidelines mentioned below, in separate documents.
1. Overview of Data Mining Process (Chapter – 2): (30 points)
Problem 2.1: (8 points)
Problem 2.3: (2 points)
Problem 2.5: (5 points)
Problem 2.6: (5 points)
Problem 2.10: (5 points)
Problem 2.11: (5 points) – Use the Toyota Corolla file
a. While exploring the data, please display all the column names and remove the redundant ones. Take time to understand the data so that it is easier to use.
i. In 2.11 b.i, replace “Metallic” with “Color”.
ii. The random seed is the default seed, so you do not need to specify it in your code.
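For the column-inspection step in 2.11 a, a minimal sketch in base R might look like the following. The data frame `corolla`, its column names, and its values are hypothetical stand-ins rather than the contents of the Toyota Corolla file, and which columns count as redundant is a judgment you need to make from the real data.

```r
# Hypothetical stand-in for the Toyota Corolla data. In the assignment you
# would load the provided file with read.csv() instead; these column names
# and values are placeholders, not the real file's contents.
corolla <- data.frame(
  Id         = 1:4,
  Price      = c(13500, 13750, 13950, 14950),
  Age        = c(23, 23, 24, 26),
  Age_Months = c(23, 23, 24, 26),  # duplicates Age, so treated as redundant here
  stringsAsFactors = FALSE
)

# Display all column names.
print(names(corolla))

# Drop the columns judged redundant (this choice is illustrative only).
redundant <- c("Age_Months")
corolla_clean <- corolla[, !(names(corolla) %in% redundant), drop = FALSE]

# Confirm which columns remain.
print(names(corolla_clean))
```

Keeping the redundant names in a character vector makes the decision easy to document in a comment and easy to revise after a closer look at the data.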
2. Data Exploration and Visualization (Chapter – 3): (20 points)
Q1. Car Insurance: Scatter Plots: The sales department of a car insurance company is looking for prospective customers. The company is interested in classifying customers by a prospective Yes or No response, based on Income and the previous year's claim amount. A marketing expert looked at a random sample of 24 households, given in the file car_insurance.csv.
- Using R, create a scatter plot of Total Claim amount vs. Income, color-coded by the outcome
variable Yes/No. Make sure to obtain a well-formatted plot using ggplot (create legible
labels and a legend, etc.) (7 points)
- In general, can we say that a customer with a higher income is more likely to respond Yes to an insurance claim? (3 points)
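For the Q1 scatter plot, a minimal ggplot2 sketch might look like the following. The column names `Income`, `Total_Claim_Amount`, and `Response` are assumptions about car_insurance.csv, and the six rows are placeholder values used only to make the example run; they are not the 24 sampled households.

```r
library(ggplot2)

# Placeholder data only. In the assignment, replace this with
# read.csv("car_insurance.csv") and adjust the column names to match the file.
insurance <- data.frame(
  Income             = c(35, 48, 62, 75, 90, 110),
  Total_Claim_Amount = c(300, 520, 410, 780, 650, 900),
  Response           = factor(c("No", "No", "Yes", "No", "Yes", "Yes"))
)

# Total claim amount (y) vs. income (x), color-coded by the Yes/No outcome.
p <- ggplot(insurance,
            aes(x = Income, y = Total_Claim_Amount, color = Response)) +
  geom_point(size = 3) +
  labs(
    title = "Total Claim Amount vs. Income",
    x     = "Income",
    y     = "Total claim amount",
    color = "Response"
  ) +
  theme_minimal()

print(p)
```

Storing the outcome as a factor gives a discrete legend with one color per response level, which keeps the plot legible.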
Q2. Problem 3.3 – LaptopSales2008 (10 points): You need not use ggplot. Simple graphs will do.
3. Case Study – Movie Theater Releases: (30 points) (see the file “Case – Movie Theater Releases.pdf”)
Histograms: 10 points
Scatter Diagram: 10 points
Interpretation: 10 points
Submission Requirements: (Multiple files must be uploaded) (20 points)
All submission files must be named LN_FN_Assignment#, with the appropriate extension (.R, .docx, etc.).
- R-Code (10 points): Submit fully functional R code that is ready to run. Please include reasonable comments, such as the question number. Submit only a .R file, not a .txt or any other file type.
- Summary Write-up (10 points): Your report must be fully self-sufficient to a reader. That is, it must include:
(a) the problem statement number (matching the textbook/e-learning),
(b) the R code (cut and pasted from your code file),
(c) illustrations: graphs and tables (properly numbered and visible),
(d) a proper explanation of the results (referring to figures and tables, as appropriate).
- The summary write-up (from d) must be submitted via “Turnitin” below. Failing to do so will reduce your score by 20%.