During my PhD, I kept all the pens that I finished. There was not much thought behind it at the beginning, but when the numbers started getting significant, I thought it could be a fun and useless statistic that I could easily keep track of.
On my last day I took a picture with them, with a bar graph in mind. Then I forgot about it, and now, almost 2 years later and long past the right time to post it, I came across the images again while cleaning my hard drive. So, finally, here is my pen usage.
In total, there are 9 pens: 7 black, 1 blue and 1 red. Naturally, they do not represent all the pens I used while working over the 3 years of my PhD; they are the ones I used on a daily basis and completely drained of ink. As you can see, I’m more of a gel than a ballpoint kind of guy, as I feel that the latter does not really leave a trail of what is being written. The same motivation explains my preference for black, after realizing that blue is not as readable to me. Blue and red were only used for revisions or when drawing diagrams.
The legal number of working hours per year in France is 1607, which means that during my 3 years as a PhD student (4821 hours) I consumed about 0.00187 pens per hour, or 0.01307 per 7-hour working day. On average, about 536 hours were needed to finish a pen, i.e. 3 pens per year. I have no idea whether my consumption was low or high, so I leave it here in the hope that this random stat I tracked may interest someone.
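The arithmetic above can be reproduced in a few lines of Python. This is only a minimal sketch; the 7-hour working day (a 35-hour week over 5 days) is my assumption for the per-day figure, chosen because it is the value that reproduces the 0.01307 pens per day.

```python
# Pen-consumption arithmetic. Assumptions: 1607 legal working hours per year
# in France and a 7-hour working day (35-hour week / 5 days).
PENS = 9
YEARS = 3
HOURS_PER_YEAR = 1607
HOURS_PER_DAY = 7  # assumption, see above

total_hours = YEARS * HOURS_PER_YEAR          # 4821 hours
pens_per_hour = PENS / total_hours            # about 0.001867
pens_per_day = pens_per_hour * HOURS_PER_DAY  # about 0.01307
hours_per_pen = total_hours / PENS            # about 535.7
pens_per_year = PENS / YEARS                  # 3.0

print(f"{total_hours} h, {pens_per_hour:.5f} pens/h, {pens_per_day:.5f} pens/day")
print(f"{hours_per_pen:.0f} h per pen, {pens_per_year:.0f} pens per year")
```

Dividing the total hours by the number of pens directly (rather than inverting the rounded hourly rate) avoids accumulating rounding error in the hours-per-pen figure.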
For a set of destinations, finding the shortest route that visits each of them once and returns to the start is the well-known Traveling Salesman Problem, a combinatorial optimization problem.
For a small number of stops, there is no difficulty in checking all possible combinations. This becomes impossible as the number of destinations grows, because the number of routes grows factorially with it. For example, with a fixed starting point, only 12 destinations already give 11! = 39,916,800 possible itineraries.
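The exhaustive approach described above can be sketched in Python as follows. This is a minimal illustration (not code from the post) that assumes Euclidean distances between 2-D points and fixes city 0 as the start, so it evaluates (n - 1)! orderings, counting each tour and its reverse separately.

```python
import itertools
import math


def tour_length(order, coords):
    """Total length of the closed tour visiting coords in the given order."""
    return sum(
        math.dist(coords[order[i]], coords[order[(i + 1) % len(order)]])
        for i in range(len(order))
    )


def brute_force_tsp(coords):
    """Exhaustive search: fix city 0 as the start and try every permutation
    of the remaining n - 1 cities, i.e. (n - 1)! candidate tours."""
    n = len(coords)
    best_order, best_len = None, math.inf
    for perm in itertools.permutations(range(1, n)):
        order = (0,) + perm
        length = tour_length(order, coords)
        if length < best_len:
            best_order, best_len = order, length
    return best_order, best_len


# Number of orderings for 12 destinations with a fixed start: 11!
print(math.factorial(12 - 1))
```

For the four corners of a unit square, `brute_force_tsp([(0, 0), (1, 0), (1, 1), (0, 1)])` returns the perimeter tour of length 4. Already at around a dozen cities the loop becomes painfully slow, which is exactly why a heuristic is needed.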
One way to solve this problem, with no guarantee of finding the best path but at least a good one, is inspired by the behavior of ants. Proposed by Dorigo in 1993, it uses the dynamics of pheromones as a means of selecting the shortest route.
In this post I present my implementation of the ant system in Python. An application to a real-world problem will show you how to use ants to plan your next road trip.
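As a rough idea of what an ant system looks like, here is a minimal textbook-style sketch; it is not the implementation from the post. Each ant builds a tour by moving from city i to an unvisited city j with probability proportional to tau_ij^alpha * eta_ij^beta, where eta_ij = 1 / d_ij; afterwards, pheromone evaporates by a factor (1 - rho) and each ant deposits q / L_k on the edges of its tour of length L_k. The parameter values below are arbitrary assumptions, and the code assumes all city coordinates are distinct.

```python
import math
import random


def ant_system(coords, n_ants=20, n_iter=100, alpha=1.0, beta=3.0,
               rho=0.5, q=1.0, seed=0):
    """Minimal Ant System sketch for a symmetric TSP (illustrative only)."""
    rng = random.Random(seed)
    n = len(coords)
    dist = [[math.dist(a, b) for b in coords] for a in coords]
    # Heuristic desirability eta = 1 / distance (diagonal is never used).
    eta = [[1.0 / dist[i][j] if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    tau = [[1.0] * n for _ in range(n)]  # uniform initial pheromone
    best_tour, best_len = None, math.inf

    for _ in range(n_iter):
        tours = []
        for _ in range(n_ants):
            start = rng.randrange(n)
            tour, unvisited = [start], set(range(n)) - {start}
            while unvisited:
                i = tour[-1]
                candidates = list(unvisited)
                # Transition weights: pheromone^alpha * heuristic^beta.
                weights = [tau[i][j] ** alpha * eta[i][j] ** beta
                           for j in candidates]
                j = rng.choices(candidates, weights=weights)[0]
                tour.append(j)
                unvisited.remove(j)
            length = sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))
            tours.append((tour, length))
            if length < best_len:
                best_tour, best_len = tour, length
        # Evaporation on every edge...
        for i in range(n):
            for j in range(n):
                tau[i][j] *= 1.0 - rho
        # ...then each ant reinforces the edges of its own tour.
        for tour, length in tours:
            for k in range(n):
                a, b = tour[k], tour[(k + 1) % n]
                tau[a][b] += q / length
                tau[b][a] += q / length
    return best_tour, best_len
```

Shorter tours deposit more pheromone per edge, so good edges become more attractive over the iterations, while evaporation lets the colony forget poor early choices. Raising `beta` makes ants greedier toward nearby cities; raising `alpha` makes them follow the colony's memory more closely.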
Recently, and happily, vaccination campaigns against COVID-19 have been popping up around the world. The choice of the first person to be vaccinated is largely political, a great opportunity to send a message to the population. After seeing the first images from the UK, Europe and Brazil, I got interested in finding out who received the first injection in every country. As of the day I gathered the data, the average first recipient was a 64-year-old retired woman. Below, I present my small research in detail.
From March 17 to May 11 2020, France was in lockdown due to COVID-19 [1, 2]. During that period, leaving home was limited to essential trips (buying groceries, going to work if working from home was impossible, a short workout close to home, etc.). Several venues remained closed after May 11 (like movie theaters), but “non-essential” trips were allowed again. As expected, these restrictions had an enormous impact on the movement of people and vehicles and, therefore, on urban noise. I live next to a 4-lane avenue, about 8 km from an airport and 3 km from a hospital, so transportation noise is part of my routine. A few days into the restrictions, I had the idea to somehow measure the effect of the lockdown on the noise pollution I’m exposed to.
To do so, from March 31 to June 30 (92 days), I went out to my balcony at 18h30 and recorded 5 minutes of ambient sound with my smartphone (using the Smart Recorder app, at a sampling frequency of 44.1 kHz and with automatic gain control disabled). All the analysis, from data processing to plotting, is performed in Python. An example of what I recorded is presented next (note that the audio files are downsampled and compressed for publication):
Sample recording (May 28, after the end of the lockdown) in which we can hear, for example, vehicles passing by [1:40-2:00, 3:15-3:25] and an airplane landing [2:03-2:40].
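One simple way to turn such recordings into comparable numbers is an equivalent level per time window. The sketch below is a hypothetical illustration, not the post's actual processing: it assumes the recordings were exported as WAV files with signed integer or float samples, and because the phone is not calibrated, the levels are in dB relative to digital full scale (dBFS), not dB SPL.

```python
import numpy as np
from scipy.io import wavfile


def load_wav_mono(path):
    """Read a WAV file as floats in [-1, 1], averaging channels if stereo.
    Assumes signed integer PCM or float samples (8-bit unsigned not handled)."""
    fs, data = wavfile.read(path)
    if np.issubdtype(data.dtype, np.integer):
        data = data / float(np.iinfo(data.dtype).max + 1)
    if data.ndim == 2:
        data = data.mean(axis=1)
    return fs, data.astype(float)


def level_dbfs(samples, fs, window_s=1.0):
    """Equivalent (RMS) level in dBFS over consecutive, non-overlapping windows.
    A trailing partial window is discarded."""
    n = int(window_s * fs)
    n_windows = len(samples) // n
    frames = np.asarray(samples[: n_windows * n], dtype=float).reshape(n_windows, n)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    return 20 * np.log10(np.maximum(rms, 1e-12))  # guard against log(0)
```

For example, `fs, x = load_wav_mono("2020-05-28.wav")` (a hypothetical file name) followed by `level_dbfs(x, fs)` gives one value per second for a 5-minute recording; a full-scale sine wave comes out at about -3 dBFS. Since the gain control was disabled, levels from different days remain comparable relative to each other.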
I got curious about what the spectral distribution of everyday sounds looks like, so I installed an app on my phone (Smart Recorder) and started recording them. The most interesting result so far comes from the simplest sound I have recorded: a milk bottle being filled at my kitchen tap. In this post I present the audio, the associated spectrogram and the theoretical analysis. All the work is performed in Python, from reading the data to plotting.
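A spectrogram makes a tone that drifts in time easy to follow. Here is a minimal sketch using `scipy.signal.spectrogram` that picks the strongest frequency in each time frame; the function name and the `nperseg=4096` frame size (about 93 ms and a 10.8 Hz bin width at 44.1 kHz) are my own assumptions, not necessarily what the post uses.

```python
import numpy as np
from scipy.signal import spectrogram


def dominant_frequency_track(samples, fs, nperseg=4096):
    """Return (times, peak_freqs): the frequency of the strongest spectral
    bin in each spectrogram frame, useful for tracking a drifting tone."""
    freqs, times, sxx = spectrogram(samples, fs=fs, nperseg=nperseg)
    # sxx has shape (n_freqs, n_frames): take the loudest bin per frame.
    peak_freqs = freqs[np.argmax(sxx, axis=0)]
    return times, peak_freqs
```

For a recording with a strong broadband component (like the water jet), the raw maximum can jump between the resonance and the noise; restricting the search to a frequency band around the tone is a simple refinement.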
Here is the audio:
As it sounds, it is just a bottle being filled with water. At the beginning (\(t\) < 1 second) there is nothing, until I open the tap. After about 32 seconds, the bottle is full and water overflows into the sink. There is a constant component linked to the impact of the water on the bottom of the bottle and, later, on the water column. Besides that, a distinct and interesting tone that changes in time can be heard. This sound is the resonance of an air column with a closed end (which is actually the water surface) and an open end.
The frequency increases as the wavelength \(\lambda\) decreases; until around 20 seconds, this decrease is linear in time. After that, the variation is no longer constant, because the space available for the air inside the bottle shrinks non-linearly as the bottle’s diameter decreases with height.
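The reasoning above can be made concrete with the textbook quarter-wavelength relation for a tube closed at one end: the fundamental has \(\lambda = 4L\), so \(f = c / (4L)\), where \(L\) is the length of the air column. The sketch below ignores the end correction and treats the bottle as a perfect cylinder filled at a constant flow rate; the bottle dimensions and flow rate are hypothetical values, not measurements from the post.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, dry air at about 20 °C


def fundamental_frequency(air_column_m, c=SPEED_OF_SOUND):
    """Closed-open tube: lambda = 4 L, so f = c / (4 L) (end correction ignored)."""
    return c / (4.0 * air_column_m)


def air_column(t_s, height_m, diameter_m, flow_m3_s):
    """Air-column length after t seconds of filling a cylinder at constant flow.

    The water level rises by flow / area every second, so L (and lambda = 4 L)
    shrinks linearly in time, while f = c / (4 L) rises ever faster.
    """
    area = math.pi * diameter_m ** 2 / 4.0
    return height_m - flow_m3_s * t_s / area


# Hypothetical bottle and tap: 25 cm tall, 8 cm wide cylinder, 0.03 L/s flow.
for t in range(0, 21, 5):
    L = air_column(t, height_m=0.25, diameter_m=0.08, flow_m3_s=3e-5)
    print(f"t = {t:2d} s  L = {100 * L:5.1f} cm  f = {fundamental_frequency(L):6.1f} Hz")
```

Once the water reaches the neck, the cross-section is no longer constant, so the cylinder model no longer applies; the level then rises faster for the same flow, and the air column shortens non-linearly.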
The World Cup has just ended. Along with the competition comes the sticker album, a big tradition that I have never joined. I got interested in the statistics behind it and asked myself how many stickers you must buy to complete the album.
My approach is a statistical simulation that models each pack until the album is complete. The same procedure is repeated for a large number of runs to estimate the distribution of the total number of packs/stickers needed to complete the album. First, I tested the convergence of the routine, initially based on 2 simplifying assumptions: the distribution of the stickers is uniform (that is, you have an equal chance of getting any sticker) and there are no repeated stickers within a pack (this one is kept for all the tests here). Second, I tested the advantages of buying the missing stickers (from 1 to 50). Finally, two cases where the distribution is not uniform are evaluated: for a selected nation, the stickers are more abundant (from +10% to +50%) or rarer (from -10% to -50%) than the others.
This analysis can be performed for any album; the only required variables are the number of stickers in the album and the number of stickers per pack. For this case, the values of the Panini World Cup sticker album are selected:
681 stickers in the album;
5 stickers per pack.
The possibility of buying missing stickers directly from Panini (up to 50) is also considered in this work.
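The simulation described above could look roughly like this minimal NumPy sketch, which is not the code behind the post. Each pack is drawn with `Generator.choice(..., replace=False)` so that stickers within a pack are distinct; opening stops once at most `buy_missing` stickers are missing (those are assumed to be bought directly). The optional `weights` argument is a hypothetical way to make one nation’s stickers more abundant or rarer.

```python
import numpy as np


def packs_to_complete(n_stickers=681, per_pack=5, buy_missing=0,
                      weights=None, rng=None):
    """Open packs until at most `buy_missing` stickers are missing.

    Each pack holds `per_pack` distinct stickers (no repeats within a pack).
    `weights` gives relative sticker abundance (uniform if None).
    Returns the number of packs opened.
    """
    rng = np.random.default_rng() if rng is None else rng
    p = None
    if weights is not None:
        w = np.asarray(weights, dtype=float)
        p = w / w.sum()
    have = np.zeros(n_stickers, dtype=bool)
    missing = n_stickers
    packs = 0
    while missing > buy_missing:
        pack = rng.choice(n_stickers, size=per_pack, replace=False, p=p)
        missing -= np.count_nonzero(~have[pack])  # count only new stickers
        have[pack] = True
        packs += 1
    return packs


def simulate(runs=1000, seed=0, **kwargs):
    """Repeat the experiment to estimate the distribution of packs needed."""
    rng = np.random.default_rng(seed)
    return np.array([packs_to_complete(rng=rng, **kwargs) for _ in range(runs)])
```

For example, `5 * simulate(runs=1000)` gives a sample of the number of stickers bought, and `simulate(runs=1000, buy_missing=50)` estimates the benefit of buying the last 50. A rarer nation could be modeled with something like `weights = np.r_[np.full(20, 0.9), np.ones(661)]`, where the nation’s size and position in the album are placeholders.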
As part of my PhD, I recently retook statistics classes. Somewhere along the way, I heard pie charts being called camembert, a type of cheese, in France. After the couple of seconds it took me to realize that they meant pie charts, I recalled that they also have a different name in Brazil: pizza charts. Since then, I’ve been wondering what circle charts are called in different countries and languages.
To satisfy my curiosity, I looked at the Wikipedia articles on circle charts in several languages (full list on the Wikidata page). I also stumbled across this French course on circle charts by J. R. Lobry of the University of Lyon, which took me to the ISI (International Statistical Institute) glossary.
To find the charts’ nicknames, I used Google Translate, always from the original language to English. The process started from the “Also known as” column on the Wikidata page and, when necessary, continued in the article itself, where I looked for expressions resembling “something diagram” or “something chart”; occasionally, I translated the full article. A total of 38 languages were analyzed, mostly Indo-European (26).
The results are presented in the following graph: pie (36.8%) stands for languages where circle graphs are called pie charts, or are named after some regional recipe that Wikipedia told me was a type of pie; cake (15.8%), pizza (2.6%) and cheese (2.6%) follow the same idea; pie/cake (13.2%) covers cases where both versions were presented, such as German Kuchen- oder Tortendiagramm, or where the translated word yielded both terms; and none (28.9%) represents cases where I could not recognize any food analogy in the articles. In most of those cases, only terms related to circles or sectors were found.
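Since 38 languages were analyzed, each percentage corresponds to a whole number of languages. The short Python sketch below is only a cross-check (the post’s own processing was done in Excel); the counts are back-computed from the published percentages, not taken from the raw data.

```python
# Category counts inferred from the published percentages (38 languages).
counts = {"pie": 14, "cake": 6, "pie/cake": 5, "pizza": 1, "cheese": 1, "none": 11}

total = sum(counts.values())
shares = {category: round(100 * n / total, 1) for category, n in counts.items()}

print(total)   # number of languages
print(shares)  # percentage per category, rounded to one decimal
```

Because each share is rounded to one decimal, the six values add up to 99.9% rather than exactly 100%.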
Although the pie chart is not really a good choice for representing any type of data, I considered it a must for an analysis of pie charts. Data processing and plotting were done in Excel 2016, with a little help from Inkscape to prepare the images. The raw data is available here.
My results are obviously limited by my not-so-extensive sources, which don’t account for regionalisms (such as the use of different terms in countries that speak the same language). Also, there is a strong chance that mistakes were made in translation, which is a real problem when similar foods, such as cake and pie, may be called by the same word depending on the situation. Certainly, such nuances are neglected by Google Translate when no context is given (or even when it is!). Finally, I lack the knowledge to technically distinguish a pie from a cake, so the “cake” and “pie/cake” categories must be considered with care.
Surprisingly, according to my results, the only outliers (highlighted in the figure) are the ones I have personally encountered in my academic life. That explains why I have not found any analysis like this on the Internet.
If you found this even slightly interesting, please share!