Any thoughts?
Data was pulled from MyWaifuList
Also, anything else waifu-related you want me to investigate?
im not familiar with the site
what kind of other metrics do they track?
This same plot but with average per age
Could be just a colored line over the rest
Who are these outliers?
It's a public dataset from kaggle
They have 15000+ waifus on record, but most have little to no data
They also track weight, height, bust, waist, origin, blood type, series, birth date, tags
like average of different age categories?
I cut off the plot above 50, since some waifus have a registered age of several thousand years
The waifu with highest like/trash ratio is Miyamori Aoi, with 171 likes and 4 trashes
There are higher values, but those waifus are all missing an age in the set
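For anyone wanting to reproduce the cutoff: a minimal sketch of the filtering described above, assuming each waifu is a dict with hypothetical `age`, `likes` and `trashes` fields (a missing age being absent or `None`). It drops unknown ages and ages above 50, then picks the highest like/trash ratio among what is left. This is not the poster's actual R code.

```python
def filter_by_age(waifus, max_age=50):
    """Keep waifus whose age is known and at most max_age.

    Each waifu is a dict (hypothetical field names); a missing age
    is either absent or None, and such records are dropped.
    """
    return [w for w in waifus if w.get("age") is not None and w["age"] <= max_age]


def best_ratio(waifus):
    """Return the waifu with the highest likes/trashes (0 trashes counted as 1)."""
    return max(waifus, key=lambda w: w["likes"] / max(w["trashes"], 1))
```

Filtering before taking the max is what keeps the higher-ratio waifus with no registered age out of the ranking.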
Yeah, average for each column of the plot
rating vs bust size
lets end this debate
>young girls are more desirable
whoa... really activated my almonds...
Thick black line denotes mean, boxes denote quantiles
Mean of 14-18 looks to be the highest, but there's not a clear difference with other groups
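A rough sketch of the per-age-group numbers behind that kind of plot: the mean (the thick line) plus quartiles (the box edges) of a score per age group. The `(age, score)` input shape and the bin edges are hypothetical; the thread doesn't say exactly where the groups were cut.

```python
from collections import defaultdict
from statistics import mean, quantiles


def summarize_by_age_group(records, bins):
    """Mean and quartiles of scores per age group.

    records: iterable of (age, score) pairs
    bins: list of (label, lo, hi) with inclusive age bounds (hypothetical edges)
    """
    groups = defaultdict(list)
    for age, score in records:
        for label, lo, hi in bins:
            if lo <= age <= hi:
                groups[label].append(score)
                break  # each age falls into at most one group
    summary = {}
    for label, scores in groups.items():
        # quantiles() needs at least two data points on older Pythons
        q1, med, q3 = quantiles(scores, n=4) if len(scores) > 1 else [scores[0]] * 3
        summary[label] = {"mean": mean(scores), "q1": q1, "median": med, "q3": q3}
    return summary
```

Comparing each group's mean against the other groups' q1..q3 range is a quick way to see whether a "highest" mean actually stands out.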
Plot the difference or the logarithm of the ratio. Like this, you can only see if there are particularly high like-numbers, while a 1/1 and a 1/10000 look practically the same. Another way to plot it would be two histograms, one for likes one for trashes, similar to steam's positive/negative review histograms.
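To make the suggestion concrete, a small sketch: on a linear axis the raw ratios 1/1 = 1.0 and 1/10000 = 0.0001 both sit near zero, while log10 of the ratio maps them to 0 and -4, and 10000/1 to +4, so likes and trashes get symmetric treatment. The `pseudocount` parameter is an assumption (not from the thread) for keeping zero counts finite.

```python
import math


def log_ratio(likes, trashes, pseudocount=0.0):
    """log10 of likes/trashes; 0 means equal counts, symmetric around 0.

    pseudocount is an optional smoothing term (an assumption, not from
    the thread). With the default 0.0, trashes == 0 raises ZeroDivisionError.
    """
    return math.log10((likes + pseudocount) / (trashes + pseudocount))


def difference(likes, trashes):
    """Plain likes minus trashes, the other option suggested."""
    return likes - trashes
```

The difference keeps the absolute scale (a 3000-rating waifu dominates), while the log ratio only measures balance.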
Can you plot like to trash ratio in dependence of popularity?
If you have the technical know-how, can you do an unbinned likelihood fit for the data shown in the OP?
If you have the technical know-how, can you train a neural network to try and produce the perfect waifu?
No clear correlation between the two
Most waifus have a bust of around 89-90cm
Keep in mind: of the 15425+ waifus, bust size was only reported for 1775
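One way to put a number on "no clear correlation" is a Pearson coefficient computed only over waifus that actually have a bust value, since most records lack one. A sketch assuming hypothetical `(bust_cm or None, score)` pairs; note Pearson only measures a linear relationship, which is a choice made here, not something from the thread.

```python
import math


def pearson_complete_pairs(pairs):
    """Pearson r between bust (cm) and score, skipping missing busts.

    pairs: iterable of (bust_cm or None, score). Returns (r, n_used).
    """
    used = [(b, s) for b, s in pairs if b is not None]
    n = len(used)
    if n < 2:
        raise ValueError("need at least two waifus with a reported bust")
    mx = sum(b for b, _ in used) / n
    my = sum(s for _, s in used) / n
    cov = sum((b - mx) * (s - my) for b, s in used)
    sx = math.sqrt(sum((b - mx) ** 2 for b, _ in used))
    sy = math.sqrt(sum((s - my) ** 2 for _, s in used))
    return cov / (sx * sy), n
```

An r near 0 would back up the "no clear correlation" reading; returning `n_used` makes it obvious how few waifus the estimate rests on.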
Just an amateur statistician here
To correct for popularity I could add weights per series or character, but for all 5000+ series in the dataset that would take forever
By taking the ratio of likes/trashes I have partly corrected for popularity instead
I also censored every waifu with fewer than 4 ratings
Like/trash ratios for waifus with 0 trash ratings were calculated as likes/1
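The two rules above fit in a few lines. A sketch assuming "ratings" means likes plus trashes (the thread doesn't spell that out); the function name is hypothetical.

```python
def like_trash_ratio(likes, trashes, min_ratings=4):
    """Like/trash ratio with the thread's two rules.

    Waifus with fewer than min_ratings total ratings (assumed to be
    likes + trashes) are censored and return None; 0 trashes counts as 1.
    """
    if likes + trashes < min_ratings:
        return None
    return likes / max(trashes, 1)
```

One side effect of the likes/1 rule: 5 likes with 0 trashes and 5 likes with 1 trash end up with the same score.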
the dataset is far from perfect; the only truly unbiased variables are name, series, trashes and likes
Besides, is the MyWaifuList community's opinion objectively right?
It should be possible to estimate like/trash scores per waifu if I had more complete data though...
>MyWaifuList
What the fuck
Did you extract only the like trash ratio from the dataset?
If you still have the absolute number of trashes and likes you can use that as your measure of popularity.
like this?
most waifus have fewer than 30 ratings, while the popular ones can have over 3000, so plots will turn out very squashed at the low end if I isolate likes or trashes
The plot is a bit fucked, because waifus with few ratings vary much more than waifus with many ratings
Both dimensions are log transformed
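A sketch of the log-log version: popularity taken as likes + trashes (as suggested above), plotted against the like/trash ratio, both on log10. Dropping waifus with 0 likes is an assumption made here because log(0) is undefined; the actual plot may have handled them differently.

```python
import math


def log_log_points(waifus, min_ratings=4):
    """(log10 popularity, log10 like/trash ratio) for each waifu.

    waifus: iterable of (likes, trashes). Popularity is likes + trashes.
    Waifus under min_ratings are censored; 0 trashes counts as 1; 0 likes
    is skipped because log10(0) is undefined (an assumption).
    """
    points = []
    for likes, trashes in waifus:
        total = likes + trashes
        if total < min_ratings or likes == 0:
            continue
        ratio = likes / max(trashes, 1)
        points.append((math.log10(total), math.log10(ratio)))
    return points
```

Points with a small total spread widely on the y axis, which matches the large variation at low rating counts.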
I think you could say that popular waifus mostly have positive ratios
Most trash waifu is jailbreak from The Emoji Movie btw (52 trash vs 0 likes)
>le waifu == female character xd
I'm getting too old for this shit
What tools do you use to construct these graphs?
>binning
I use RStudio, kernel/GitHub for the database, and the ggplot2 package for the nicer plots
>rstudio
naisu
schoolwork has me most familiar with mathematica, numpy/pyplot as well as excel's more complicated bits
Do a ListPlot3D
Don't really get into maths that much in my studies
I am usually not that into coding either, but since I discovered publicly available datasets I've been coding in my spare time