Malstats v0.02

2016-05-06
    The first working build of Malstats is up. It took about 3 days; I finished it yesterday, let people populate the DB for a day, and it still seems to be running fine. Currently it only gives users a "hipster" score: for each show you've completed, take the % of people in the DB who have NOT completed it, then average those percentages. Essentially, finishing a show that no one else in the DB has seen gives that show a hipster score of 100%; take every show you've finished, average that value, and you've got your score. Right now it's a pure average, which I believe won't be as representative once many users are added to the DB and it becomes harder to have a show that no one else has seen. What I'll do is simply square the hipster score for each show, so shows closer to 100% drop only a little in value, while shows closer to 0% lose most of their value.
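As a rough sketch of the plain-average version described above (in Python rather than the site's PHP): all names and the sample data here are made up for illustration. It also assumes "no one else" means you don't count yourself, so a show only you finished scores 100%; the post doesn't spell that detail out.

```python
from collections import Counter


def show_hipster_scores(completed_by_user):
    """Map each show to the fraction of OTHER users who have not completed it.

    Assumption: the completer is excluded from the count, so a show finished
    by exactly one user scores 1.0 and a show finished by everyone scores 0.0.
    """
    total_users = len(completed_by_user)
    # Count how many users completed each show (each user counted once per show).
    completions = Counter(
        show for shows in completed_by_user.values() for show in set(shows)
    )
    if total_users <= 1:
        # With a single user there is nobody else who could have seen anything.
        return {show: 1.0 for show in completions}
    return {
        show: (total_users - count) / (total_users - 1)
        for show, count in completions.items()
    }


def plain_average_score(user, completed_by_user, show_scores):
    """Plain average of the per-show scores over one user's completed shows."""
    shows = set(completed_by_user[user])
    if not shows:
        return 0.0
    return sum(show_scores[s] for s in shows) / len(shows)


# Hypothetical data: three users and the shows each has completed.
completed = {
    "alice": ["Show A", "Show B"],
    "bob": ["Show A"],
    "carol": ["Show A", "Show C"],
}
scores = show_hipster_scores(completed)
for user in completed:
    print(user, f"{plain_average_score(user, completed, scores):.0%}")
```

Here "Show A" was finished by everyone and is worth nothing, while "Show B" and "Show C" each have a single completer and are worth the full 100%.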

For example, say we have 2 users, each with 3 entries:
User 1's per-show hipster scores: 100%, 100%, 0%
User 2's per-show hipster scores: 66%, 66%, 66%

It's fair to assume that User 1 should have the higher hipster score, since they've seen 2 shows no one else has, but both users average to a hipster score of 66%. Now if we square those values and then average them:

User 1's squared scores: 100%, 100%, 0%
User 2's squared scores: 44%, 44%, 44%

User 1 still has an avg of 66%, but User 2 has now dropped to 44%. After we take the square root of each average, User 1 ends up with a score of 81% and User 2 with a score of 66%, which better reflects which of them watches more obscure shows.

Another example:
User 1: 100%, 0%, 50%, 50%, 75%, 25% - 6 entries, 50% avg
User 2: 100%, 100%, 0%, 90%, 20%, 90%, 25%, 25%, 25%, 25% - 10 entries, also 50% avg

Squared and then rooted:
User 1: 100%, 0%, 25%, 25%, 56%, 6% - 35% avg, rooted -> 60%
User 2: 100%, 100%, 0%, 81%, 4%, 81%, 6%, 6%, 6%, 6% - 39% avg, rooted -> 62%

Though User 2 has watched a lot of popular shows, they've also seen a fair number of obscure ones, which gives them the higher score in the end.
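The square, average, then square-root procedure used in these examples is a root mean square. A minimal sketch in Python that reruns both examples (the function names and labels are made up for illustration, and scores are written as fractions, so 0.66 means 66%):

```python
import math


def plain_average(show_scores):
    """Ordinary mean of the per-show scores (0.0 for an empty list)."""
    return sum(show_scores) / len(show_scores) if show_scores else 0.0


def hipster_score(show_scores):
    """Square each per-show score, average the squares, then take the square root."""
    if not show_scores:
        return 0.0
    mean_of_squares = sum(s * s for s in show_scores) / len(show_scores)
    return math.sqrt(mean_of_squares)


# Per-show scores copied from the two worked examples.
examples = {
    "example 1, user 1": [1.00, 1.00, 0.00],
    "example 1, user 2": [0.66, 0.66, 0.66],
    "example 2, user 1": [1.00, 0.00, 0.50, 0.50, 0.75, 0.25],
    "example 2, user 2": [1.00, 1.00, 0.00, 0.90, 0.20, 0.90,
                          0.25, 0.25, 0.25, 0.25],
}
for label, scores in examples.items():
    print(f"{label}: plain {plain_average(scores):.1%}, "
          f"squared-and-rooted {hipster_score(scores):.1%}")
```

Since this works on unrounded values, the results can differ by a point from figures rounded by hand at each step (for instance, example 1's User 1 comes out at 81.6%).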

    Before the new layout design, I had a script that calculated the hipster score in addition to grabbing the usual tags for the DB. One problem, however, was the tags not provided in MAL's XML data: season, duration, VAs. Those would have to be fetched with an HTML scraper. HTML scrapers, though, seem to take their time with each request, so when the script has to fetch data from 200-600 pages and each page takes a second or two to load, the loading times get long. So I trashed that script and had to redesign with some sort of queue in mind. The available queue systems seemed too complicated for me to understand and implement, but I was able to find a short one that didn't require any libraries and consisted of just 2 PHP scripts and a crontab shell to automate them. Tinkering with this code turned out to be quite troublesome: when I placed the SQL login into the worker script, it crashed the SQL DB, since the script was in an infinite loop. Figuring out how to restart MySQL took an hour or two, but in the process I figured out how to use cron to run background scripts, which gave me what I needed to make my own queue system. My design was pretty simple and would probably get ridicule from an actual web developer, but how I got it to work is: So yeah, it seems to work fine, now I've just got to expand the features.

    The first step is to make individual pages. Currently everyone's score is shown on your page, regardless of which username you submitted. That'll be moved to /global/, and your page will just display your own score in addition to whatever else. Second, I plan on cataloguing cast and staff members. I recently learned of AJAX's capabilities, which greatly helps with how I'm going to code it (I was going to just dump every statistic through PHP and then hide the elements). Third will be stats per season, and then hopefully, in the future, replications of the graphs I create in Excel.