statistics – Wagner Gonçalves Pinto

An analysis of the height of volleyball players

I have recently been following two major volleyball events: the men’s and women’s 2022 FIVB Volleyball World Championship.

I got interested in the different physical profiles of the teams. The build of the players on the Japanese women’s team seemed completely different from the others, yet they are quite successful. To take a deeper look and see how height correlates with a volleyball team’s success (if there is any correlation at all), I decided to check the numbers myself.

Volleyball on air by Steven Ross, flickr, CC-BY-NC-2.0, cropped

This post is an adapted version of a Jupyter notebook that includes all the data scraping, treatment and plotting scripts in Python, which you can find on my GitHub.


“Ink, toil, tears and sweat”

During my PhD, I kept every pen that I finished. There was not much thought behind it at the beginning, but when the numbers started to become significant I thought it could be a fun and useless statistic that I could easily keep track of.

On my last day I took a picture of them, with a bar graph in mind. I forgot about it, and now, almost 2 years later and with the right time for posting long gone, I came across the images once more while cleaning my hard drive. Finally, I present my use of pens.

Histogram of consumed pens categorized by color

In total, there are 9 pens: 7 black, 1 blue and 1 red. Naturally, they do not represent all the pens I used over the 3 years of my PhD; they are the ones that I used on a daily basis and completely emptied of ink. As you can see, I’m more of a gel than a ballpoint kind of guy, as I feel that the latter does not really leave a trail of what is being written. The same reasoning explains my preference for black, after I realized that blue is not as readable to me. Blue and red were only used for revisions or when I drew diagrams.

The legal number of working hours per year in France is 1607. This means that during my 3 years as a PhD student (4821 hours) I consumed 0.00187 pens per hour, or 0.01307 per 7-hour working day. On average, about 536 hours were necessary to finish a pen, that is, 3 pens per year. I have no idea whether my consumption was low or high, and I leave it here hoping that this random stat I tracked may interest someone.
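For anyone who wants to redo the arithmetic, here is a minimal sketch of it in Python. The 7-hour working day is my assumption (it is the value that reproduces the per-day figure); everything else comes from the numbers above.

```python
# Figures quoted in the post
LEGAL_HOURS_PER_YEAR = 1607  # legal working hours per year in France
YEARS = 3                    # duration of the PhD
PENS = 9                     # pens completely emptied of ink

# Assumption: a 7-hour working day (not stated explicitly in the post)
HOURS_PER_DAY = 7

total_hours = LEGAL_HOURS_PER_YEAR * YEARS
pens_per_hour = PENS / total_hours
pens_per_day = pens_per_hour * HOURS_PER_DAY
hours_per_pen = total_hours / PENS
pens_per_year = PENS / YEARS

print(f"total hours:   {total_hours}")
print(f"pens per hour: {pens_per_hour:.5f}")
print(f"pens per day:  {pens_per_day:.5f}")
print(f"hours per pen: {hours_per_pen:.1f}")
print(f"pens per year: {pens_per_year:.1f}")
```

Computing the hours per pen from the exact totals, rather than from the already rounded hourly rate, avoids a small rounding drift in the last digits.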

How many packs to complete the album?

The World Cup recently ended. Along with the competition comes the sticker album; it’s quite a big tradition, but I’ve never joined in. I got interested in the statistics behind it and asked myself how many stickers you must buy to fill the album completely.

My approach is a statistical simulation that models each pack until the album is complete. The same procedure is repeated for a large number of runs to get an estimated distribution of the total number of packs/stickers needed to complete the album. First, I tested the convergence of the routine, initially based on two assumptions: the distribution of the stickers is uniform (that is, you have an equal chance of getting any of the stickers) and there are no repeated stickers within a pack (this one is maintained for all the tests here). Second, I tested the advantage of buying the missing stickers (from 1 to 50). Finally, two cases where the distribution is not uniform are evaluated: for a selected nation, the stickers are more abundant (from +10% to +50%) or rarer (from -10% to -50%) than the others.

This analysis can be performed for any album; the only necessary variables are the number of stickers in the album and the number of stickers in a pack. For this case, the values for the Panini World Cup sticker book are used:

681 stickers in the album;
5 stickers per pack.

The possibility of buying missing stickers directly from the publisher (maximum of 50) is also considered in this work.
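To give an idea of how such a simulation can look, here is a minimal sketch in Python. It is not the code from my notebook: the function names, the rejection-based weighted draw and the 20-sticker size of the "selected nation" are all assumptions made for illustration. Only the 681 stickers, the 5 stickers per pack and the limit of 50 bought stickers come from the text.

```python
import random
import statistics
from itertools import accumulate


def draw_pack(rng, n_stickers, pack_size, cum_weights=None):
    """Return `pack_size` distinct sticker indices (no repeats in a pack)."""
    if cum_weights is None:
        # Uniform case: sampling without replacement
        return rng.sample(range(n_stickers), pack_size)
    # Non-uniform case (assumed model): weighted draws, rejecting repeats
    pack = set()
    while len(pack) < pack_size:
        pack.add(rng.choices(range(n_stickers), cum_weights=cum_weights)[0])
    return pack


def packs_to_complete(n_stickers=681, pack_size=5, buy_missing=0,
                      weights=None, rng=None):
    """Simulate one collector and return the number of packs opened.

    Opening stops as soon as at most `buy_missing` stickers are missing,
    since those can then be bought directly.
    """
    if rng is None:
        rng = random.Random()
    cum_weights = list(accumulate(weights)) if weights is not None else None
    owned = set()
    packs = 0
    while n_stickers - len(owned) > buy_missing:
        owned.update(draw_pack(rng, n_stickers, pack_size, cum_weights))
        packs += 1
    return packs


def simulate(runs=200, seed=0, **kwargs):
    """Repeat the simulation `runs` times with a reproducible seed."""
    rng = random.Random(seed)
    return [packs_to_complete(rng=rng, **kwargs) for _ in range(runs)]


if __name__ == "__main__":
    # Hypothetical nation of 20 stickers that is 50% rarer than the rest
    rare_nation = [0.5] * 20 + [1.0] * 661
    cases = {
        "uniform": {},
        "uniform, buying the last 50": {"buy_missing": 50},
        "one nation 50% rarer": {"weights": rare_nation},
    }
    for label, kwargs in cases.items():
        results = simulate(**kwargs)
        print(f"{label}: mean {statistics.mean(results):.0f} packs, "
              f"median {statistics.median(results):.0f} packs")
```

The number of runs is kept small here so that the script finishes quickly; a convergence test like the one described above would increase `runs` until the mean stops changing noticeably.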


A pie chart about pie chart names

As part of my PhD, I’ve recently retaken statistics classes. Somewhere in the process, I learned that pie charts are called camembert, a type of cheese, in France. After the couple of seconds it took me to realize that those are pie charts, I recalled that they are also called differently in Brazil: pizza charts. Since then I’ve been wondering what circle charts are called in different countries/languages.

To satisfy my curiosity, I looked at Wikipedia articles on circle charts in several languages (full list on the Wikidata page). I also stumbled across this French course on circle charts by J. R. Lobry of the University of Lyon, which led me to the ISI (International Statistical Institute) glossary.

To see what the graph nicknames are, I used Google Translate, always from the original language to English. The process started from the “Also known as” column on the Wikidata page and, if necessary, moved on to the article itself, where I looked for expressions of the form “something diagram” or “something chart”; occasionally I translated the full article. A total of 38 languages were analyzed, mostly Indo-European (26).

The results are presented in the following graph: pie (36.8%) stands for languages where circle graphs are called pie charts, or are named after some regional recipe that Wikipedia told me was a type of pie; cake (15.8%), pizza (2.6%) and cheese (2.6%) follow the same idea; pie/cake (13.2%) covers cases where the two versions were presented, such as the German Kuchen- oder Tortendiagramm, or where the translated word resulted in both terms; and none (28.9%) represents cases where I could not recognize any food analogy in the articles. In most of those cases only terminology related to circles or sectors was found.

The nicknames of circle charts. A circle chart about circle charts: each sector shows a food that is associated with circle charts in different languages: pie (36.8%); none (28.9%); cake (15.8%); pie/cake (13.2%); pizza (2.6%); and cheese (2.6%). Each slice is filled with the food it stands for, with light grey for none.
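The percentages can be traced back to whole numbers of languages. The short Python sketch below does that check; note that the counts per category are inferred by me from the percentages and the total of 38 languages, they are not copied from the raw data.

```python
TOTAL_LANGUAGES = 38  # number of languages analyzed, as stated in the post

# Assumption: counts inferred from the published percentages
counts = {
    "pie": 14,
    "none": 11,
    "cake": 6,
    "pie/cake": 5,
    "pizza": 1,
    "cheese": 1,
}

# The inferred counts must add up to the number of languages
assert sum(counts.values()) == TOTAL_LANGUAGES

# Share of each category, in percent, rounded to one decimal place
shares = {name: round(100 * n / TOTAL_LANGUAGES, 1) for name, n in counts.items()}

for name, share in shares.items():
    print(f"{name}: {counts[name]} languages ({share}%)")
```

With only 38 languages, a single language is worth about 2.6 percentage points, which is why pizza and cheese show the same share.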

Although the pie chart is not really a good choice for representing any type of data, I considered it a must for an analysis of pie charts. Data treatment and plotting were done with Excel 2016, with a little help from Inkscape to prepare the images. The raw data is available here.

My results are obviously limited by my not so extensive sources, which don’t account for regionalisms (such as the use of different terms in countries that speak the same language). Also, there is a strong chance that mistakes were made in translation, which is a real problem when similar foods, such as cake and pie, may be called by the same word depending on the situation. Such nuances are certainly lost by Google Translate when no context is given (or even when it is supplied!). Finally, I lack the proper knowledge to technically distinguish a pie from a cake, so the “cake” and “pie/cake” categories must be considered with care.

Surprisingly, according to my results, the only outliers (highlighted in the figure) are the ones that I have personally encountered in my academic life. That explains why I have not found any analysis like this on the Internet.

If you find this slightly interesting, please share!

Photos used to build the graph:

“Cherry pie” and “And yet another apple pie” by Benny Mazur, cropped, used under CC BY 2.0
“All Good Pizza” by Dale Cruse, cropped, used under CC BY 2.0
“Banana cake made by Liam, Isaac, Julia, Jamie Oliver recipe” by Alpha, cropped, used under CC BY-NC 2.0
“Normandy Camembert Cheese” by jackmac34, cropped, used under CC0