THIS ASSIGNMENT MUST BE DONE IN EXCEL. THERE ARE 2 “NEW QUESTIONS” AT THE BOTTOM OF THIS PAGE. THOSE ANSWERS MUST BE DONE IN WORD.
The purpose of this assignment is to use spreadsheet capabilities to perform data manipulation and to explain the process used in handling the data.
For this assignment, you will use the “Claims” dataset. In the dataset, the claims data for n = 608 people are recorded. The data derive from a random sample of females diagnosed with ischemic heart disease over 24 months (see Exercise 7.27 in the textbook).
Instead of using urgent care centers, some people rely on the Emergency Room (ER) to address most, if not all, of their medical needs. In fact, someone who has three or more ER visits within 24 months is considered a high ER user. Complete the steps below to execute this assignment.
- Using the dataset and Excel, create a new column titled “High_ER_User” that contains “Yes” if the person has three or more ER visits and “No” otherwise.
- Duration is measured in days, but 30-day intervals are more appropriate for most reporting purposes. Using Excel, create a new column titled “Duration_Months” by converting the duration into 30-day intervals.
- Complications and comorbidities are often rare; therefore, these two negative events are summed together. Using Excel, create a new column titled “Comps_Comorbs” by adding complications and comorbidities.
- Age is often grouped in 10-year intervals. Using Excel’s VLOOKUP function, create a new column titled “Age_Group” with grouped ages of “21-30 yrs,” “31-40 yrs,” and so on for 10-year intervals. The last age group would be “61-70 yrs.” Use a tab titled “Age_Groups” for this task.
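The logic behind the four new columns can also be sketched outside Excel, which may help when checking the spreadsheet formulas. The Python sketch below is illustrative only. The input field names (`Age`, `ER`, `Duration`, `Complications`, `Comorbidities`) are hypothetical and may not match the column headers in the Claims dataset, the example record is made up, and dividing the duration by 30 without rounding is an assumption, since the steps above do not say whether to round. The age lookup mimics an approximate-match VLOOKUP against a table of lower bounds.

```python
from bisect import bisect_right

# Lookup table analogous to an "Age_Groups" tab:
# the lower bound of each 10-year band, paired with its label.
AGE_LOWER_BOUNDS = [21, 31, 41, 51, 61]
AGE_LABELS = ["21-30 yrs", "31-40 yrs", "41-50 yrs", "51-60 yrs", "61-70 yrs"]


def age_group(age):
    """Return the label whose lower bound is the largest one <= age.

    This mirrors an approximate-match lookup on a sorted first column.
    Ages above 70 would fall into the last band, so the data should be
    checked for out-of-range ages separately.
    """
    index = bisect_right(AGE_LOWER_BOUNDS, age) - 1
    if index < 0:
        raise ValueError(f"age {age} is below the first band")
    return AGE_LABELS[index]


def add_derived_columns(row):
    """Return a copy of one record with the four new columns added."""
    out = dict(row)
    # Three or more ER visits marks a high ER user.
    out["High_ER_User"] = "Yes" if row["ER"] >= 3 else "No"
    # Assumption: plain division by 30, with no rounding.
    out["Duration_Months"] = row["Duration"] / 30
    out["Comps_Comorbs"] = row["Complications"] + row["Comorbidities"]
    out["Age_Group"] = age_group(row["Age"])
    return out


# Made-up record for illustration only (not taken from the Claims dataset).
example = {"Age": 47, "ER": 4, "Duration": 365,
           "Complications": 1, "Comorbidities": 2}
print(add_derived_columns(example))
```

Keeping the band boundaries in a separate table, rather than in nested conditions, is the same design choice as putting them on their own tab: the bands can change without touching the lookup logic.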
Next, create a pivot table from the data and complete the following steps (refer to the examples in the resource “Data Manipulation Screenshots”).
- Use “High_ER_User” as a filter to obtain two filtered views of the pivot table.
- Summarize the data to get counts of claims, sum of claims and months, and average of procedures, prescribed drugs, ER visits, and complications/comorbidities.
- Add a calculated field titled “Claims PM” to the pivot table. This calculated field is the sum of claims divided by the sum of duration months and measures the average claim amount per month (PM).
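To make the “Claims PM” definition concrete, the sketch below groups a few made-up records by “High_ER_User” and computes the sum of claims divided by the sum of duration months for each group. It is a rough Python analogue of the filtered pivot views, not a replacement for the Excel work. The field names `Claims` and `ER` are hypothetical, and only one of the requested averages is shown to keep the example short.

```python
def summarize(rows):
    """Summarize one group of records, in the spirit of a pivot view."""
    n = len(rows)
    total_claims = sum(r["Claims"] for r in rows)
    total_months = sum(r["Duration_Months"] for r in rows)
    return {
        "count": n,
        "sum_claims": total_claims,
        "sum_months": total_months,
        "avg_er": sum(r["ER"] for r in rows) / n,
        # Claims PM is a ratio of sums, not an average of per-row ratios.
        "claims_pm": total_claims / total_months,
    }


def pivot_by_filter(rows, key="High_ER_User"):
    """Split the records by the filter value and summarize each group."""
    groups = {}
    for r in rows:
        groups.setdefault(r[key], []).append(r)
    return {value: summarize(group) for value, group in groups.items()}


# Made-up records for illustration only (not taken from the Claims dataset).
rows = [
    {"High_ER_User": "Yes", "Claims": 1200.0, "Duration_Months": 12.0, "ER": 4},
    {"High_ER_User": "Yes", "Claims": 600.0, "Duration_Months": 3.0, "ER": 3},
    {"High_ER_User": "No", "Claims": 300.0, "Duration_Months": 10.0, "ER": 0},
]

for value, summary in pivot_by_filter(rows).items():
    print(value, summary)
```

For the “Yes” group in this toy data, the ratio of sums is 1800 / 15 = 120, whereas averaging the two per-person ratios (100 and 200) would give 150. The two calculations answer different questions, so it is worth confirming that the pivot table uses the ratio of sums described in the step above.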
APA format is not required, but solid academic writing is expected.
This assignment uses a grading rubric. Please review the rubric prior to beginning the assignment to become familiar with the expectations for successful completion.
Excel spreadsheet column updates specified in steps 1-4 are complete and correct. The pivot table described in steps 1-3 is complete and correct.
New Question
Suppose you had daily temperature data indicating the “high” point of each day for 2015. If you want to show how the high differs over time, what are some of the plot types that will allow you to do this? What are some benefits to binning the data into one of 52 weeks and plotting the average high for each week? Would it make sense to do something similar for the four quarters in the year? Why or why not?
New Question
Data are often missing for various reasons, which poses challenges for data analysis. For example, suppose you wanted to analyze the yearly incomes of the faculty at GCU. When asked for their incomes, 25% of the faculty did not participate in the survey; therefore, their incomes are missing from the dataset. How would you summarize the income data in this case? Is it appropriate to ignore the missing incomes and summarize the data without them? Should you estimate the missing incomes, perhaps with the overall average, to complete the data set?
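As a starting point for thinking about the missing-income question, the sketch below compares two of the options it mentions: summarizing only the observed incomes, and filling each missing income with the overall average. The incomes are made up for illustration, with 2 of 8 values (25%) missing, and the function names are hypothetical. It is a minimal sketch of the mechanics, not a recommended analysis.

```python
from statistics import mean, stdev


def complete_case(values):
    """Keep only the observed (non-missing) values."""
    return [v for v in values if v is not None]


def mean_impute(values):
    """Replace each missing value with the mean of the observed values."""
    fill = mean(complete_case(values))
    return [fill if v is None else v for v in values]


# Made-up incomes for illustration only; None marks a non-respondent.
incomes = [52000, 61000, None, 75000, 48000, None, 90000, 66000]

observed = complete_case(incomes)
imputed = mean_impute(incomes)

print("observed:", len(observed), mean(observed), stdev(observed))
print("imputed: ", len(imputed), mean(imputed), stdev(imputed))
```

Filling with the overall average leaves the mean unchanged but shrinks the standard deviation, because the filled values add no spread while the sample size grows. Neither approach corrects for bias if the faculty who declined to answer tend to have systematically different incomes from those who responded.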