1.8 Meeting Videos
1.8.1 Cohort 1
Meeting chat log
00:04:49 Madeline Arnold (she/her): Hello everyone! I’m eating breakfast so going to have my camera off for now :)
00:15:08 Scott Nestler: Effectively, the median is a trimmed mean that removes 50% of the lower and upper values in a data set.
00:15:33 Morgan Grovenburg: Haha!
00:16:38 Scott Nestler: Also, if you need a standard error (SE) calculation, you can't do that with a median, but you can with a trimmed mean, for both normal and non-normal data.
00:17:56 Scott Nestler: Actually, I mis-spoke. You can calculate the SE of the median, but it is generally higher than for the mean (or trimmed mean). So trimmed mean is a balance of resistance to outliers and providing a low SE.
00:20:14 pavitra: Scott, so is trimmed mean best practice, or median?
00:21:33 Jone Aliri: the trimmed mean it's less conservative than the median
00:21:53 Jone Aliri: what is best practice depends on the data
00:22:10 Diego Ramírez González: A trimmed mean is probably not a good idea if you don't have a lot of data points
00:22:46 pavitra: makes sense!
00:24:24 Madeline Arnold (she/her): For folks who’ve used trimmed mean (new to me!) do you often use the 10% percentile cutoff described in PS4DS or some other cutoff for outliers?
00:27:04 Scott Nestler: https://en.wikipedia.org/wiki/Median_absolute_deviation#Relation_to_standard_deviation
00:29:10 pavitra: there are no dumb questions
00:33:37 Madeline Arnold (she/her): In my experience in biology research, if spread of data is bigger it means I need a bigger n (need to have more samples to be confident about the estimated mean being accurate)
00:33:53 Kaytee Flick: Same Madeline...that's what I was thinking about
00:35:18 shamsuddeen: Thanks
00:38:36 Diego Ramírez González: if you increase the sample size the mean and the median (50th percentile) will be closer
00:39:42 jonathan.bratt: And maybe deciles rather than percentiles are easier to read for this example.
00:45:48 Morgan Grovenburg: <3 boxplots
00:45:50 Scott Nestler: I really like comparative box plots, when you are trying to look at the distribution of data for 2 or more categories.
00:45:56 pavitra: still cant see distribution in boxplots
00:46:51 Diego Ramírez González: the box is the 25th and 75th percentiles, the line in the middle of the box is the median, the whiskers are 1.5*IQR and the points outside are outliers
00:47:10 Rahul: This was very helpful to understand why 1.5 is used https://towardsdatascience.com/understanding-boxplots-5e2df7bcbd51
00:47:18 Francisco Escobar: 1.58 gives ~95% confidence interval for the median https://ggplot2.tidyverse.org/reference/geom_boxplot.html
00:47:38 Diego Ramírez González: i like boxplot, but violin plots are better :)
00:47:43 Diego Ramírez González: boxplots*
00:48:02 Anne Hoffrichter: I like the combination of violin and boxplots ;)
00:48:04 Kaytee Flick: I was just going to ask how we feel about violin plots:P
00:48:17 jiwan: https://twitter.com/CedScherer/status/1387155336998670344/photo/1
00:48:19 Diego Ramírez González: yeah, violin + boxplot is even better
00:49:41 pavitra: the good ole' area under curve
00:49:54 Diego Ramírez González: this is my favorite gif :)
https://twitter.com/danilobzdok/status/1341893126592593924
00:50:46 Kaytee Flick: That's epic
00:50:50 Scott Nestler: Ridgeline plots!
00:50:51 Madeline Arnold (she/her): @Diego love it
00:51:06 pavitra: is there something called "raindrop" plot or something?
00:51:54 pavitra: I really like raindrop plots
00:51:58 Diego Ramírez González: anything but dynamite plots
00:53:31 Diego Ramírez González: i agree, those two distributions don't even overlap, the dynamite plot is not so bad here
00:53:33 pavitra: Sorry to belabor this. However, why is MAD not as prevalent as SD? No rush to answer here - maybe we can discuss further in the slack channel
00:54:59 Jone Aliri: It's because MAD is calculated with absolute values
00:55:06 pavitra: Nassim Taleb loves MAD and says it is more accurate for skewed data
00:55:56 pavitra: Good point, Jone
00:56:04 Scott Nestler: Also, the absolute value function is non-smooth, which used to create all sorts of calculation issues.
01:01:37 pavitra: great presentation, Jon!
01:02:41 priyanka gagneja: Bi-modal you mean
01:03:12 pavitra: thanks a lot, y'all.Gotta go.
01:03:35 Jone Aliri: And the other problema is that teh distance could be different from 1 to 2 or from 3 to 4 in Likert
01:04:55 Diego Ramírez González: i guess it depends on the distribution of your data and what assumptions you are willing to make about the measurement
01:05:59 Madeline Arnold (she/her): I hope we learn more about this topic! Thanks for the question Sheila
01:06:22 Kaytee Flick: Congrats Scott!!!
01:06:35 Jone Aliri: In psychology we use a lot of scales with Likert... and a lot of times we add them...
01:07:23 Scott Nestler: A good way to remember not to use means on Likert scale data is to think: The average of Agree and Strongly Agree is not Agree-And-A-Half.
01:07:30 Scott Nestler: https://bookdown.org/Rmadillo/likert/summary.html
01:07:53 Diego Ramírez González: yeah, but for some of these instruments people will add the answers to get a total score
01:08:13 Jim Gruman: not sure if this adds anything, but there are domains where a geometric mean is more appropriate than the average
01:08:29 Morgan Grovenburg: I don't have a good answer, but I use the `HH` package to visualized likert scales https://xang1234.github.io/likert/
01:09:06 Jim Gruman: and in market research, my company skips to valuing only the "top box scores"
01:10:01 Jone Aliri: Yes @Diego that's it, we get the total score which we can use like a continuos scale :)
01:10:11 Jim Gruman: thank you!!
01:10:16 Andrew G. Farina: Thanks Jon!