ECON 335: Applied Statistics Project (Due
- Brief report (should be at least two pages)
- 80 points
- Use the template on Canvas
- Use the data on Canvas with your name
- Upload to Canvas
You are the owner of a small movie theater. Since you only have one screen, you need to decide which movie to show. You decide to run a regression analysis that seeks to explain daily ticket sales (in dollars) using the movie’s budget (in millions) and length (in minutes).
With your results, you want to predict the daily ticket sales for the following movies:
Inferno of Impact: has a budget of 98 million dollars and is 110 minutes long.
Triple Payback: has a budget of 65 million dollars and is 123 minutes long.
NOTE: for this project, it is okay if you extrapolate your prediction (i.e. predict outside of the relevant range of your data).
Your goal is to write a summary of the problem and your results, using the data provided. You want to:
- write an introduction: describe the problem, your methodology, and the data you will use.
- present descriptive statistics: report the minimum, maximum, average, and standard deviation for all variables. Include at least one scatterplot that shows the relationship between the dependent variable and an independent variable.
- write down a regression model: identify your dependent and independent variable(s) and write down the regression equation you are going to estimate.
- describe the results: you want to report:
- the overall F-statistic and what it implies
- the estimated coefficients
- the interpretation of coefficients
- statistical tests of the coefficients, i.e., formulate a null and alternative hypothesis for whether a coefficient is statistically significant, and then report the results.
- the predicted ticket sales for both movies
- write a conclusion: summarize what you did, and which movie you chose to show
This project is worth 80 points. To provide consistency in grading, I will use a standardized rubric, which can be found on Canvas by clicking on the “written assignment” link under the “Paper” module.
This is an individual project. Any plagiarism will result, at minimum, in a zero for the project.
Econ 335 Project
As an entrepreneur, I run a small movie theater. Because my resources are limited, the theater has only one screen. Two movies are in high demand, Inferno of Impact and Triple Payback, and since I cannot show them concurrently, I must decide which one to show. I want to choose the movie that is expected to generate the higher daily ticket sales, using each movie's budget and length to predict its sales. Inferno of Impact has a budget of $98 million and runs 110 minutes; Triple Payback has a budget of $65 million and runs 123 minutes.
The data cover the last fifty days and record daily ticket sales together with the budget and length of the movies. I first summarized the data using the minimum, maximum, mean, and standard deviation of each variable. The minimum is the smallest value in the data and the maximum is the largest (NCSS Statistical 3), while the mean and standard deviation describe the average and the dispersion of daily ticket sales, budget, and length.
Next, I ran a regression analysis relating daily ticket sales to budget and length. It served two purposes. First, it quantified the relationship between daily ticket sales and a movie's budget and length. Second, it let me predict the daily ticket sales of each of the two movies, Inferno of Impact and Triple Payback, from their budgets and lengths. I performed all analyses in IBM SPSS Statistics 21 and present the results in tables and figures. The estimated intercept and coefficients then guided my decision about which movie to show.
Mean, Standard Deviation, Minimum and Maximum
From Table 1, daily ticket sales range from a minimum of $80.65 to a maximum of $763.63. Movie budgets range from $50.00 million to $149.00 million, and movie lengths range from 46 minutes to 180 minutes. The ranges for daily ticket sales, budget, and length are therefore $682.98, $99 million, and 134 minutes, respectively.
Table 1 also shows that the mean daily ticket sales, budget, and length are $386.83, $103.64 million, and 121.82 minutes, respectively. The standard deviations of $147.812, $26.645 million, and 39.069 minutes measure the spread of daily ticket sales, budget, and length around their means.
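The summary statistics in Table 1 came from SPSS, but the same quantities are easy to reproduce with Python's standard `statistics` module. The sketch below is an illustration, not the procedure used in this report: it assumes the raw data are available as a plain list of numbers, and it uses the sample standard deviation (dividing by n − 1), which I assume matches the SPSS output. The function name `describe` is illustrative. The last lines simply recompute the ranges quoted above from the reported minima and maxima.

```python
import statistics


def describe(values):
    """Return the minimum, maximum, mean, sample standard deviation, and range."""
    data = list(values)
    lo, hi = min(data), max(data)
    return {
        "min": lo,
        "max": hi,
        "mean": statistics.mean(data),
        # statistics.stdev divides by n - 1 (sample standard deviation)
        "sd": statistics.stdev(data),
        "range": hi - lo,
    }


# Ranges implied by the minima and maxima reported in Table 1
reported = {
    "ticket_sales": (80.65, 763.63),  # dollars per day
    "budget": (50.00, 149.00),        # millions of dollars
    "length": (46, 180),              # minutes
}
ranges = {name: round(hi - lo, 2) for name, (lo, hi) in reported.items()}
print(ranges)  # {'ticket_sales': 682.98, 'budget': 99.0, 'length': 134}
```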
Fig. 1. Scatterplot showing daily ticket sales against the movie’s budget
The fitted line in Fig. 1, which ignores movie length, has an intercept of about $278: the predicted daily ticket sales for a movie with a budget of zero, a value far outside the data. Its slope of 0.00000105 means that each additional dollar of budget is associated with an increase of $0.00000105 in daily ticket sales, or about $1.05 for each additional $1 million of budget.
Fig. 2. Scatterplot showing daily ticket sales against the movie’s length
The fitted line in Fig. 2, which ignores the budget, has an intercept of about $329, the predicted daily ticket sales for a movie of zero length. Its slope of 0.48 means that each additional minute of length is associated with an increase of about $0.48 in daily ticket sales.
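The fitted lines in Fig. 1 and Fig. 2 are simple (one-predictor) least-squares regressions. As a minimal sketch, assuming the raw data are available as two equal-length lists, the slope is the sum of cross-deviations of x and y divided by the sum of squared deviations of x, and the intercept is chosen so that the line passes through the point of means. The function name `fit_simple_ols` is illustrative, and the example data are made up.

```python
def fit_simple_ols(x, y):
    """Least-squares intercept and slope for the line y = a + b * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation, so the slope is undefined")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar  # the line passes through (x_bar, y_bar)
    return intercept, slope


# Made-up data lying exactly on y = 3 + 2x
print(fit_simple_ols([1, 2, 3, 4], [5, 7, 9, 11]))  # (3.0, 2.0)
```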
Y = α + β1·B + β2·L + e
The problem this paper addresses is predicting the daily ticket sales of two movies. As Troeger notes, a regression model is useful for investigating bivariate and multivariate associations among variables: the investigator hypothesizes that one variable depends on another variable or on a combination of several variables (1). Daily ticket sales is therefore the dependent variable, represented by Y in the regression model. Daily ticket sales are modeled as a function of a constant or intercept term, alpha (α); the budget of each movie (B); the length of each movie in minutes (L); and an error term (e). As Freedman notes, the error term represents other predictors that have not been included in the equation (1). Budget and length are the predictor (independent) variables, and β1 and β2 are their coefficients, respectively.
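SPSS estimates α, β1, and β2 by ordinary least squares. To show what that calculation does, the sketch below solves the least-squares normal equations for two predictors in plain Python, working with deviations from the means. It is an illustration only: it assumes the data are three equal-length lists, the function name is hypothetical, and the example data are made up so that the true coefficients are known.

```python
def fit_two_predictor_ols(y, x1, x2):
    """OLS estimates (alpha, beta1, beta2) for y = alpha + beta1*x1 + beta2*x2 + e."""
    n = len(y)
    y_bar, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - y_bar for v in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    # Solve the 2x2 normal equations by Cramer's rule
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("predictors are perfectly collinear")
    beta1 = (s22 * s1y - s12 * s2y) / det
    beta2 = (s11 * s2y - s12 * s1y) / det
    alpha = y_bar - beta1 * m1 - beta2 * m2  # fitted plane passes through the means
    return alpha, beta1, beta2


# Made-up example where y = 5 + 2*x1 + 3*x2 exactly
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [5 + 2 * a + 3 * b for a, b in zip(x1, x2)]
print(fit_two_predictor_ols(y, x1, x2))  # approximately (5.0, 2.0, 3.0)
```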
Model Summary Output
| Model | R | R² | Adjusted R² | Std. Error of the Estimate |
a. Predictors: Length of a Movie, Movie’s Budget
The model summary table shows how well the regression line accounts for the total variation in the dependent variable. R² is the coefficient of multiple determination: the proportion of the variability in daily ticket sales that is explained by the length and budget of the movie. The value of R² (0.05) shows that length and budget together account for only 5% of the variation in daily ticket sales. Because R² is far from 1, the model has little explanatory power, and its predictions should be treated with caution.
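R² and the adjusted R² in the model summary both follow from the sums of squares. A minimal sketch, assuming n = 50 daily observations (the fifty days described above) and k = 2 predictors: with R² = 0.05, the adjusted R², which penalizes the number of predictors, is only about 0.01. The function names are illustrative.

```python
def r_squared(ss_reg, ss_total):
    """Share of the total variation explained by the regression."""
    return ss_reg / ss_total


def adjusted_r_squared(r2, n, k):
    """R^2 adjusted for k predictors estimated from n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


print(round(adjusted_r_squared(0.05, n=50, k=2), 4))  # 0.0096
```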
Analysis of Variance
| Sum of Squares | df | Mean Square | F | Sig. (p-value) |
a. Dependent Variable: Daily Ticket Sales
b. Predictors: (Constant), Length of a Movie, Movie’s Budget
ANOVA is useful in multiple regression because it tests hypotheses about the model's parameters (Shalabh 23). The overall F-test examines the null hypothesis that β1 = β2 = 0, that is, that neither budget nor length helps explain daily ticket sales. The F-statistic is closely related to R²: for a given sample size and number of predictors, a higher R² implies a larger F. R² is the ratio of the regression sum of squares (SSreg) to the total sum of squares (SSt), i.e., R² = SSreg/SSt = 1 − SSres/SSt, where SSres is the residual sum of squares. SSreg/SSt is thus the share of the total variation that the regression model explains.
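The relationship between F and R² can be written explicitly as F = (R²/k) / ((1 − R²)/(n − k − 1)), where k is the number of predictors and n the number of observations. A minimal sketch, assuming n = 50 (the fifty days of data) and k = 2: plugging in the rounded R² of 0.05 gives F ≈ 1.24, consistent with the F-statistic of 1.241 reported in Table 3. The function name is illustrative.

```python
def f_from_r_squared(r2, n, k):
    """Overall F-statistic of a regression with k predictors and n observations."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


print(round(f_from_r_squared(0.05, n=50, k=2), 3))  # 1.237
```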
From Table 3, the F-statistic is 1.241 with a p-value of 0.298. Whether the regression model is useful for predicting daily ticket sales depends on the size of this p-value. At a significance level (α) of 5%, the model is not useful, because p = 0.298 > 0.05. In other words, the null hypothesis that β1 = β2 = 0 is not rejected: there is no statistically significant evidence that the length of a movie or its budget influences daily ticket sales.
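With exactly two predictors, the p-value of the overall F-test has a closed form, because an F(2, d) variable satisfies P(F > f) = (1 + 2f/d)^(−d/2). This shortcut holds only when the numerator degrees of freedom equal 2; other cases need a statistics library. The sketch assumes 47 residual degrees of freedom (fifty observations minus three estimated parameters) and applies the 5% decision rule used above; the function name is illustrative.

```python
def f_test_p_value_two_predictors(f_stat, df_resid):
    """Upper-tail probability of an F(2, df_resid) distribution at f_stat."""
    return (1 + 2 * f_stat / df_resid) ** (-df_resid / 2)


alpha = 0.05  # significance level
p = f_test_p_value_two_predictors(1.241, df_resid=47)
print(round(p, 3))  # 0.298
print("reject H0" if p <= alpha else "fail to reject H0")  # fail to reject H0
```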
Coefficients, Testing and Analysis
| Model | B (unstandardized) | Std. Error | Beta (standardized) | t | Sig. |
| Length of a Movie | .455 | .538 | .120 | .846 | .402 |
a. Dependent Variable: Daily Ticket Sales
The coefficient results in Table 4 show how much each predictor contributes to the model and whether that contribution is significant. The p-value of the t-test for each independent variable was compared with the predetermined significance level of 5%. These p-values test the null hypothesis that a predictor has no effect on the dependent variable. A predictor whose p-value is at or below 5% is a meaningful addition to the regression model and significantly affects the dependent variable; a larger p-value indicates that the predictor's effect is not statistically significant.
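The t-statistic for each coefficient is its estimate divided by its standard error, and its two-sided p-value comes from Student's t distribution with n − k − 1 degrees of freedom. SPSS does this internally; as an illustration only, the sketch below computes the p-value with the standard library by integrating the t density numerically with Simpson's rule. It assumes 47 residual degrees of freedom (fifty observations, two predictors) and uses the Length of a Movie row of Table 4; the results should land close to the reported t = .846 and p = .402. For real work, a statistics library such as SciPy would be the usual choice, and the function names here are illustrative.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_t_p_value(t_stat, df, steps=2000):
    """P(|T| >= |t_stat|) via Simpson's rule on [0, |t_stat|] and symmetry."""
    if steps % 2:
        raise ValueError("Simpson's rule needs an even number of steps")
    upper = abs(t_stat)
    h = upper / steps
    area = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        area += weight * t_pdf(i * h, df)
    area *= h / 3  # approximates P(0 <= T <= |t_stat|)
    return 1 - 2 * area


# Length of a Movie row of Table 4: B = .455, Std. Error = .538
t_length = 0.455 / 0.538
p_length = two_sided_t_p_value(t_length, df=47)
print(round(t_length, 3), round(p_length, 3))
```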
From Table 4, the unstandardized coefficient on the movie’s budget is 0.000001027, with a p-value of 0.200. Because the budget is entered in dollars in the regression, this means that, controlling for the length of a movie, each additional dollar of budget is associated with an increase of $0.000001027 in daily ticket sales, or about $1.03 for each additional $1 million.
These results test the following hypotheses:
Ho: β1 = 0: Movie’s budget is not useful in forecasting daily ticket sales
Ha: β1 ≠ 0: Movie’s budget is useful in forecasting daily ticket sales
Since the p-value for the movie’s budget, 0.200, exceeds 0.05, the null hypothesis (Ho: β1 = 0) is not rejected. At the 5% significance level, there is no evidence that the movie’s budget is useful in forecasting daily ticket sales.
The unstandardized coefficient on the length of a movie is 0.455, with a p-value of 0.402. Controlling for the movie’s budget, each additional minute of length is associated with an increase of $0.455 in daily ticket sales.
Ho: β2 = 0: Length of a movie is not useful in forecasting daily ticket sales
Ha: β2 ≠ 0: Length of a movie is useful in forecasting daily ticket sales
Because the p-value of 0.402 exceeds 0.05, the null hypothesis is not rejected: at the 5% significance level, there is no evidence that the length of a movie is useful in forecasting daily ticket sales.
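From the results in Table 4, the estimated multiple regression equation is Ŷ = 224.991 + 0.000001027·B + 0.455·L, where B is measured in dollars and L in minutes. The intercept means that a movie with zero length and zero budget would have predicted daily ticket sales of $224.99; because no movie in the data comes close to these values (the shortest is 46 minutes and the smallest budget is $50 million), the intercept has no practical interpretation.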
Predicting Ticket Sales
The predicted daily ticket sales for the two movies are:
Inferno of Impact = 224.991 + 0.000001027 × 98,000,000 + 0.455 × 110 = 224.991 + 100.646 + 50.050 = $375.69
Triple Payback = 224.991 + 0.000001027 × 65,000,000 + 0.455 × 123 = 224.991 + 66.755 + 55.965 = $347.71
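The two predictions can be checked with a few lines of Python. As in the equations above, the budget is entered in dollars, which the size of the budget coefficient implies; the coefficients are the rounded estimates reported above, so the results are accurate to about a cent, and the function name is illustrative. Evaluating the same equation at the Table 1 means (budget converted from millions to dollars) returns roughly the mean daily ticket sales of $386.83, as expected, since a least-squares fit with an intercept passes through the point of means.

```python
def predicted_daily_sales(budget_dollars, length_minutes):
    """Fitted equation: Y-hat = 224.991 + 0.000001027*B + 0.455*L (B in dollars)."""
    return 224.991 + 0.000001027 * budget_dollars + 0.455 * length_minutes


inferno = predicted_daily_sales(98_000_000, 110)
payback = predicted_daily_sales(65_000_000, 123)
print(f"Inferno of Impact: ${inferno:.2f}")  # Inferno of Impact: $375.69
print(f"Triple Payback: ${payback:.2f}")  # Triple Payback: $347.71
print("Show:", "Inferno of Impact" if inferno > payback else "Triple Payback")

# Sanity check at the sample means from Table 1 (budget of $103.64 million)
print(f"At the means: ${predicted_daily_sales(103_640_000, 121.82):.2f}")  # At the means: $386.86
```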
The regression model had little explanatory power: budget and length together explain only about 5% of the variation in daily ticket sales, and the overall F-test is not significant. The coefficient tests showed that neither a movie’s budget nor its length is a statistically significant predictor of daily ticket sales, so neither null hypothesis, (1) that the movie’s budget is not useful in forecasting daily ticket sales and (2) that the length of a movie is not useful in forecasting daily ticket sales, could be rejected. Nevertheless, the estimated equation predicts daily ticket sales of $375.69 for Inferno of Impact and $347.71 for Triple Payback. I therefore chose to show Inferno of Impact, because it has the higher predicted daily ticket sales. The low R² also suggests that factors not included in the model account for most of the variation in daily ticket sales.
Freedman, David A. “What is the error term in a regression equation?” 2005. University of California, Berkeley. https://www.stat.berkeley.edu/~census/epsilon.pdf. 17 November 2018.
NCSS Statistical. “Descriptive statistics – Summary tables.” n.d. NCSS Statistical. https://ncss-wpengine.netdna-ssl.com/wp-content/themes/ncss/pdf/Procedures/NCSS/Descriptive_Statistics-Summary_Tables.pdf. 17 November 2018.
Shalabh. “Chapter 2 simple linear regression analysis.” n.d. Indian Institute of Technology Kanpur. http://home.iitk.ac.in/~shalab/econometrics/Chapter2-Econometrics-SimpleLinearRegressionAnalysis.pdf. 17 November 2018.
Troeger, Vera E. “The simple linear regression model.” n.d. University of Warwick. https://warwick.ac.uk/fac/soc/economics/staff/vetroeger/teaching/po906_week567.pdf. 17 November 2018.