Thursday, March 26, 2015

Data Analysis for Mega Poll (Prologue)

I haven't really made a post on here in a long time.  However, I wanted to help out PushDustin with his newest poll.  I have taken some of the data and analyzed it, and here are my preliminary results.  I will also make some posts about some of the new details that have come out.

To give some background on what I did, I ran two kinds of analysis:

-Hypothesis Test - This is where I test a hypothesis using statistics. Say, for instance, I want to know who spends more on shoes, men or women. I would take the data for each group and compare the means (averages). If the means are statistically different, it means gender makes a difference in the price paid for shoes. For the difference to be statistically significant, the P-Value has to be below 0.05.

-Regression Analysis - This is where I'm trying to see whether a change in X explains a change in Y. One of the analyses I ran compared age to spending. This gives me a beta. The beta means that for every one-unit change in X, Y changes by that amount. I also ran correlations as part of this.
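
To make the hypothesis test idea concrete, here is a rough Python sketch of the shoe example. The spending numbers are made up for illustration, and `permutation_p_value` is a hypothetical helper name. It uses a permutation test (shuffling the group labels many times) rather than whatever test a stats package would run, simply because that needs nothing beyond the standard library.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_shuffles=2000, seed=0):
    """Two-sided p-value for a difference in means, found by shuffling labels."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        # Split the shuffled values into two fake groups of the original sizes.
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    return hits / n_shuffles


# Made-up shoe spending in dollars (assumption, not poll data).
men = [60, 75, 80, 55, 90, 70, 65, 85]
women = [95, 110, 120, 100, 130, 105, 90, 115]

p = permutation_p_value(men, women)
print(f"P-Value: {p:.4f}")
print("significant at 0.05" if p < 0.05 else "not significant at 0.05")
```

The p-value here is the share of shuffles that produce a gap at least as large as the real one, so it always lands between 0 and 1.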

I also want to note that I made some adjustments. For age, I threw out any value over 80, and for hours played, I threw out any value over 100. There were some obvious joke answers.

For the money, I made a few changes:
-I converted everything to dollars. Exchange rates are based on data from 3/16/15.
-If they gave me a range, I used the average.
-If they said "anything," I used 25, as it was a high number.
-I deleted anything that was unclear. Also, some people put their favorite character. Popular choices were Wolf, Ice Climbers, Lucas, Snake, Rayman, Isaac, K. Rool and Bob. I should note that some people would pay more for specific characters.
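
The cleanup rules above could be sketched in Python roughly like this. It is only an illustration: the actual cleanup was done by hand, the function names `clean_price` and `keep_response` are hypothetical, and the exchange rate is left as a parameter because the 3/16/15 rates are not listed here.

```python
import re

ANYTHING_VALUE = 25.0  # stand-in used for answers of "anything"


def clean_price(raw, rate_to_usd=1.0):
    """Return a dollar amount, or None when the answer is unclear."""
    text = raw.strip().lower()
    if text == "anything":
        return ANYTHING_VALUE
    # Pull out every number in the answer, e.g. "5-10" gives [5.0, 10.0].
    numbers = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", text)]
    if len(numbers) == 1:
        return numbers[0] * rate_to_usd
    if len(numbers) == 2:
        # A range: use the average of the two ends.
        return sum(numbers) / 2 * rate_to_usd
    # No number at all (a character name, say) or too many to interpret.
    return None


def keep_response(age, hours_played):
    """Drop joke answers: ages over 80 and more than 100 hours played."""
    return age <= 80 and hours_played <= 100


for answer in ["5", "5-10", "anything", "Wolf"]:
    print(repr(answer), "->", clean_price(answer))
```

Answers that come back as None are the ones that would be deleted as unclear.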

By the way, that part took about an hour to do. I won't do much with money until the final results are in. 

Melee compared to other games
I ran an ANOVA (hypothesis) test to see whether people who said Melee was their favorite game were also more likely to have played Project M. For this test, I coded "Melee is your favorite game" as 1 if it was and 0 if it wasn't. I compared that across "Yes: I Played Project M" and "No/I Don't Know." The Yes group had a count of 1,421 and the No/I Don't Know group had a count of 910. The P-Value was 2.05. This means that I could not reject the null hypothesis, so there was no statistically significant difference in Project M play between Melee fans and everyone else.
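
For readers curious about the mechanics, a one-way ANOVA on 0/1 data like this can be sketched as below. The responses are made up, `one_way_anova_f` is a hypothetical helper name, and the sketch only computes the F statistic. Turning F into a p-value needs the F distribution, which a stats package handles (SciPy's `scipy.stats.f_oneway`, for example, returns both).

```python
from statistics import mean


def one_way_anova_f(*groups):
    """F statistic: variation between group means over variation within groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    ss_between = 0.0
    ss_within = 0.0
    for g in groups:
        group_mean = mean(g)
        ss_between += len(g) * (group_mean - grand_mean) ** 2
        ss_within += sum((v - group_mean) ** 2 for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Made-up flags (assumption): 1 = Melee is the favorite game, 0 = it is not.
played_project_m = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]
no_or_dont_know = [0, 1, 0, 0, 1, 0, 1, 0, 0, 1]

f_stat = one_way_anova_f(played_project_m, no_or_dont_know)
print(f"F statistic: {f_stat:.3f}")
```

An F statistic has no upper limit, while the p-value derived from it always falls between 0 and 1. An F near or below 1 means the groups differ about as much as random noise would suggest.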

That was the only hypothesis test I ran for now. Next, I looked at how "Hardcore" someone was. I compared hours played to the For Fun/For Glory ranking. This was a regression analysis, so I'm checking whether hours played would affect how "Hardcore" they were. I expected that as hours played per week went up, so would the "Hardcore" number. What I found was that it didn't. The beta was negative 0.62, which fits with what I suggested (unless I'm mixing up the numbers), but the R-Squared was 0.006. This means only 0.6 percent of the change in hardcore is explained by the change in hours.
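
Here is a minimal sketch of where a beta and an R-Squared come from in a one-variable regression. The hours and ratings are made-up numbers, and `simple_regression` is a hypothetical helper name, so the output will not match the poll results.

```python
from statistics import mean


def simple_regression(x, y):
    """Least-squares fit of y = intercept + beta * x; returns beta, intercept, R-Squared."""
    mean_x, mean_y = mean(x), mean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    # R-Squared: share of the variation in y that the fitted line accounts for.
    ss_residual = sum((yi - (intercept + beta * xi)) ** 2 for xi, yi in zip(x, y))
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    return beta, intercept, 1 - ss_residual / ss_total


# Made-up data (assumption): hours played per week and a "Hardcore" rating.
hours = [2, 5, 10, 20, 40, 3, 15, 8]
hardcore = [3, 1, 4, 2, 3, 5, 2, 4]

beta, intercept, r_squared = simple_regression(hours, hardcore)
print(f"beta: {beta:.3f}, R-Squared: {r_squared:.3f}")
```

A beta can look meaningful on its own, but when the R-Squared is close to 0 the line explains almost none of the scatter, which is the situation described above.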


Money$$$

Now, for the fun one. I ran a regression analysis to see how the change in age and the change in hours played affected how much money someone would spend. I expected the former would have a negative correlation and the latter would have a positive correlation. The correlations were about what I expected: 0.07 with hours played and negative 0.02 with age. The betas were about the same, but the R-Squares were laughably low, so changes in either age or hours played do not explain how much someone will pay for character DLC.

One last thing to note is the mean and standard deviation (SD). To explain, SD measures how spread out the results are around the mean. In a bell curve, about 68 percent of the results fall within plus/minus one standard deviation of the mean.  A higher SD means the results were more spread out.  A lower one means they were closer together. The mean age is 19.6 and 68 percent fall between 15.1 and 24.1 (one SD +/-). The mean for DLC price is $7.55. The SD is $9.05. The SD is that high because there was a wide variety of answers, including a lot of 0, 5 and even 20 or more.
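
The one-SD ranges can be checked with a few lines of Python. The means and the price SD come from the numbers above. The age SD of 4.5 is inferred from the 15.1 to 24.1 range rather than stated directly, and `one_sd_interval` is a hypothetical helper name.

```python
def one_sd_interval(mean_value, sd):
    """Range covering plus/minus one standard deviation around the mean."""
    return mean_value - sd, mean_value + sd


# Age: mean 19.6, SD 4.5 (inferred from the 15.1 to 24.1 range).
age_low, age_high = one_sd_interval(19.6, 4.5)
print(f"Age: {age_low:.1f} to {age_high:.1f}")

# DLC price: mean $7.55, SD $9.05.
price_low, price_high = one_sd_interval(7.55, 9.05)
print(f"DLC price: {price_low:.2f} to {price_high:.2f}")
```

The price range works out to roughly -1.50 to 16.60. Since nobody can pay less than $0, that negative lower end is a sign the price answers are not shaped like a bell curve, so the 68 percent rule is only a loose guide for them.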

I will note that there is a lot more I can glean from the data. However, the trend so far is that there is little relation between the variables. What this is telling me is that Smash Brothers fans are very diverse people. There is no common trend coming from the data, even where it would be reasonable to expect one. There is definitely a lot of difference among the people who play and love Smash.


Additionally, my interpretation of the data is that what Sakurai said about DLC is somewhat true. The SD of the price for DLC characters is very high, and there were a lot of responses that were 0, which implies those people don't want to pay for it. The mean was still about $7, which is consistent with the $5 you see in other games. The data implies there is a market for it, in that people are definitely willing to pay. The issue will be setting a price that is reasonable to the greatest number of people.

That's all for now. If you have anything you want to know, just ask in the comments. I likely won't do the money stuff until it's all done. Oh, and this was all on English data.  I plan to go over it more when the poll officially ends in a few days.  Since this took so long, I will likely do it over a few blog posts.  Thanks for reading.


Don't forget to take the poll.  The more data, the better.
https://docs.google.com/forms/d/17deELX3QVDofXksu8roKCl_kyREFSThygPSoEVT2eNs/viewform
