Any thoughts?
Data was pulled from MyWaifuList

Also, anything else waifu-related you want me to investigate?

Attached: like to trash ratio vs. age.png (531x466, 10K)

I'm not familiar with the site
What other metrics do they track?

Same plot, but with the average per age

Could just be a colored line over the rest

Who are these outliers?

It's a public dataset from Kaggle
They have 15000+ waifus on record, but most have little to no data
They also track weight, height, bust, waist, origin, blood type, series, birth date, tags

Like the average of different age categories?

I cut off the plot above 50, since some waifus have a registered age of several thousand years
The waifu with the highest like/trash ratio is Miyamori Aoi, with 171 likes and 4 trashes
There are higher values, but those waifus are all missing an age in the dataset
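A minimal sketch of the filtering described above: drop waifus with no age, cut off ages above 50, then take the highest like/trash ratio. The records are hypothetical placeholders (only the 171/4 split comes from the thread), and treating 0 trashes as likes/1 follows the rule the thread states further down.

```python
# Hypothetical records: (name, age or None, likes, trashes).
records = [
    ("waifu_a", 21, 171, 4),      # 171 likes / 4 trashes, as in the thread
    ("waifu_b", None, 300, 2),    # higher ratio, but no age -> dropped
    ("waifu_c", 3000, 500, 1),    # "age" in the thousands -> cut off
    ("waifu_d", 16, 40, 10),
]

MAX_AGE = 50  # plot cutoff mentioned in the thread

# Keep only waifus that can actually appear on the age plot
plottable = [r for r in records if r[1] is not None and r[1] <= MAX_AGE]

# Highest like/trash ratio; 0 trashes counts as likes/1
best = max(plottable, key=lambda r: r[2] / max(r[3], 1))
print(best[0], best[2] / max(best[3], 1))
```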

Yeah, average for each column of the plot
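Averaging each column of the scatter plot amounts to grouping the points by age and taking the mean per group. A small sketch with hypothetical (age, ratio) pairs; the resulting dictionary is what would be drawn as the colored line over the points.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (age, like/trash ratio) pairs, one per waifu
points = [(16, 4.0), (16, 2.0), (17, 10.0), (25, 1.5), (25, 0.5)]

# Group ratios by age, i.e. by column of the scatter plot
by_age = defaultdict(list)
for age, ratio in points:
    by_age[age].append(ratio)

# One average per age, in ascending age order
age_means = {age: mean(ratios) for age, ratios in sorted(by_age.items())}
print(age_means)  # {16: 3.0, 17: 10.0, 25: 1.0}
```

Note that a plain mean of ratios is pulled up strongly by a few very high ratios, which is part of why the log-ratio suggestion further down helps.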

rating vs bust size
let's end this debate

Attached: flat vs fat.png (1208x994, 747K)

>young girls are more desirable
whoa... really activated my almonds...

Thick black line denotes the median (the ggplot2 boxplot default), boxes span the 25th to 75th percentiles
Median of 14-18 looks to be the highest, but there's no clear difference with other groups

Attached: like to trash ratio vs. age categories.png (531x466, 9K)
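The numbers behind a box per age category can be sketched as below. The category edges are hypothetical (only "14-18" appears in the thread), ages are assumed to be whole years, and `method="inclusive"` is used because it should match the linear interpolation of R's default quantile type, so the values line up with what ggplot2 draws.

```python
from statistics import median, quantiles

# Hypothetical age categories; only "14-18" is mentioned in the thread
BINS = [(0, 13), (14, 18), (19, 25), (26, 50)]

def age_category(age):
    """Label of the category containing a whole-year age, or None if out of range."""
    for low, high in BINS:
        if low <= age <= high:
            return f"{low}-{high}"
    return None

def box_summary(points):
    """Map category label -> (25th percentile, median, 75th percentile) of the ratios."""
    groups = {}
    for age, ratio in points:
        label = age_category(age)
        if label is not None:
            groups.setdefault(label, []).append(ratio)
    summary = {}
    for label, ratios in groups.items():
        if len(ratios) >= 2:  # quantiles() needs at least two values
            q1, _, q3 = quantiles(ratios, n=4, method="inclusive")
            summary[label] = (q1, median(ratios), q3)
    return summary

print(box_summary([(15, 1.0), (16, 2.0), (17, 3.0), (18, 4.0), (20, 5.0)]))
```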

Plot the difference or the logarithm of the ratio. Like this, you can only see if there are particularly high like-numbers, while a 1/1 and a 1/10000 look practically the same. Another way to plot it would be two histograms, one for likes and one for trashes, similar to Steam's positive/negative review histograms.
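A quick sketch of why the log helps: log10(likes/trashes) is symmetric around 0, so 10:1 and 1:10 land at +1 and -1, and 1/1 vs 1/10000 end up 4 units apart instead of both hugging zero. The function name is just for illustration; both counts must be positive, so waifus with 0 likes or 0 trashes need a separate rule.

```python
import math

def log_like_trash_ratio(likes: int, trashes: int) -> float:
    """log10(likes / trashes); both counts must be > 0."""
    return math.log10(likes / trashes)

# The two cases from the post plus the thread's top waifu (171 likes, 4 trashes)
for likes, trashes in [(1, 1), (1, 10000), (171, 4)]:
    print(likes, trashes, likes / trashes, round(log_like_trash_ratio(likes, trashes), 2))
```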

Can you plot the like/trash ratio as a function of popularity?
If you have the technical know-how, can you do an unbinned likelihood fit for the data shown in the OP?
If you have the technical know-how, can you train a neural network to try and produce the perfect waifu?
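On the unbinned likelihood fit: one way to do it for the OP's data is to skip the age bins entirely and treat each waifu's likes as binomial draws whose like probability depends on age, then maximize the likelihood over individual waifus. The logistic model P(like | age) = sigmoid(a + b*age) and the Newton-Raphson solver below are my assumptions, a sketch rather than anything used in the thread.

```python
import math

def sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_like_probability(records, max_iter=50, tol=1e-10):
    """Unbinned ML fit of the assumed model P(like | age) = sigmoid(a + b * age).

    records: iterable of (age, likes, trashes), one entry per waifu.
    Each waifu adds likes*log(p) + trashes*log(1 - p) to the log-likelihood,
    so no age binning is involved. Solved with Newton's method; needs
    ratings at more than one age and no perfect separation.
    """
    records = list(records)
    a, b = 0.0, 0.0
    for _ in range(max_iter):
        ga = gb = 0.0           # gradient of the log-likelihood
        iaa = iab = ibb = 0.0   # Fisher information (negative Hessian)
        for age, likes, trashes in records:
            n = likes + trashes
            p = sigmoid(a + b * age)
            resid = likes - n * p
            w = n * p * (1.0 - p)
            ga += resid
            gb += resid * age
            iaa += w
            iab += w * age
            ibb += w * age * age
        det = iaa * ibb - iab * iab
        # Newton step: (da, db) = I^-1 * gradient
        da = (ibb * ga - iab * gb) / det
        db = (iaa * gb - iab * ga) / det
        a, b = a + da, b + db
        if abs(da) < tol and abs(db) < tol:
            break
    return a, b
```

A negative fitted b would mean the like probability falls with age; the inverse of the information matrix at the optimum gives rough error bars on a and b.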

No clear correlation between the two
Most waifus have a bust of around 89-90cm
Keep in mind: of the 15425+ waifus, bust size was only reported for 1775

Attached: like to trash ratio vs. bust size.png (972x466, 13K)
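A sketch of putting a number on "no clear correlation": keep only waifus with a reported bust, then compute Pearson's r between bust and the log like/trash ratio. The rows are hypothetical, and using the log ratio is my choice; Pearson's r only picks up linear relationships.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical rows: (bust in cm or None, log10 like/trash ratio)
rows = [(89, 0.5), (90, 1.2), (None, 2.0), (78, 0.3), (95, 0.9)]

# Only waifus with a reported bust can be used
complete = [(bust, ratio) for bust, ratio in rows if bust is not None]
busts, ratios = zip(*complete)
r = pearson(busts, ratios)
print(len(complete), round(r, 3))
```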

Just an amateur statistician here
To correct for popularity I could add weights per series or character, but for all 5000+ series in the dataset that would take forever
By taking the like/trash ratio I have partly corrected for popularity instead
I also excluded every waifu with fewer than 4 ratings

Like/trash ratios for waifus with 0 trash ratings were calculated as likes/1
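The two rules above (drop waifus with fewer than 4 ratings, treat 0 trashes as likes/1) fit in one small function. This assumes "ratings" means likes + trashes; the function name is illustrative.

```python
from typing import Optional

MIN_RATINGS = 4  # waifus with fewer ratings are excluded

def like_trash_ratio(likes: int, trashes: int) -> Optional[float]:
    """Like/trash ratio as described in the thread, or None if the waifu is excluded."""
    if likes + trashes < MIN_RATINGS:
        return None
    # 0 trashes would divide by zero, so those are computed as likes/1
    return likes / max(trashes, 1)

print(like_trash_ratio(171, 4), like_trash_ratio(10, 0), like_trash_ratio(2, 1))
```

One side effect of the likes/1 rule: a waifu with 10 likes and 0 trashes gets the same ratio as one with 10 likes and 1 trash.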

The dataset is far from perfect; the only truly unbiased variables are name, series, trashes and likes
Besides, is the MyWaifuList community's opinion objectively right?
It should be possible to estimate like/trash scores per waifu if I had more complete data though...

>MyWaifuList
What the fuck

Did you extract only the like/trash ratio from the dataset?
If you still have the absolute number of trashes and likes you can use that as your measure of popularity.

Like this?
Most waifus have fewer than 30 ratings, while the popular ones can have over 3000, so plots of likes or trashes alone turn out very squashed at the low end

Attached: log(like to trash ratio) vs. age.png (972x658, 20K)

The plot is a bit fucked because waifus with few ratings vary much more than those with many
Both axes are log transformed

I think you could say that popular waifus mostly have positive ratios

Attached: log(ratings) vs log(liketrash ratio).png (972x658, 85K)
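A sketch of the log-log coordinates (total ratings = likes + trashes as popularity, as suggested above) plus one way to quantify why low-rated waifus scatter more: by the delta method, the standard error of ln(likes/trashes) is roughly sqrt(1/likes + 1/trashes). Note that the formula is for the natural log; divide by ln(10) for log10 units. Function names are illustrative, and both counts must be positive.

```python
import math
from typing import Tuple

def log_coordinates(likes: int, trashes: int) -> Tuple[float, float]:
    """(log10 of total ratings, log10 of like/trash ratio) for one waifu."""
    return math.log10(likes + trashes), math.log10(likes / trashes)

def log_ratio_standard_error(likes: int, trashes: int) -> float:
    """Approximate standard error of ln(likes/trashes) via the delta method."""
    return math.sqrt(1 / likes + 1 / trashes)

# Same 3:1 split at very different popularity
for likes, trashes in [(3, 1), (30, 10), (3000, 1000)]:
    print(log_coordinates(likes, trashes), round(log_ratio_standard_error(likes, trashes), 3))
```

With the same 3:1 split, the uncertainty shrinks about tenfold for every hundredfold increase in ratings, which matches the funnel shape of the plot.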

The most trashed waifu is Jailbreak from The Emoji Movie btw (52 trashes vs 0 likes)

Attached: trash.jpg (880x966, 330K)

>le waifu == female character xd
I'm getting too old for this shit

What tools do you use to construct these graphs?

>binning

I use RStudio, a Kaggle kernel/GitHub for the dataset, and the ggplot2 package for the nicer plots

>rstudio
naisu
Schoolwork has made me most familiar with Mathematica, NumPy/pyplot, and Excel's more complicated bits

Attached: Screen Shot 2017-04-04 at 10.46.26 AM.png (1786x1414, 455K)

Do a ListPlot3D

Don't really get into maths that much in my studies
I am usually not that into coding either, but since I discovered publicly available datasets I've been coding in my spare time