R dplyr summarize percent

9/2/2023

Figure 1: Happy R (adapted artwork with added beach and cocktail images).

The gtsummary package in R creates amazing publication / presentation / whatever-you-need-it-for ready tables of summary statistics.

Background

A colleague learning R just told me that he spent 45 minutes searching for a summary table function and couldn't quite find anything that met his needs. I then showed him gtsummary in 5 minutes, and he was won over.

This blog post is to promote gtsummary and make it more searchable for those still seeking the one table to rule them all. The gtsummary documentation is excellent, so I won't cover all of its awesome functionality, but I will add a bit of my specific experience. If you are still searching for your favorite table package, here are two round-up resources: "My favourite R package for: summarising data" by Dabbling with Data (2018) and "How to make beautiful tables in R" by R for the Rest of Us (2019).

Try it out!

I'll demonstrate with the Youth Risk Behavior Surveillance System (YRBSS) data; my previous post, Leveraging labelled data in R, has more background details. The default table presents Median (IQR) for continuous variables and n (%) for categorical variables.

And wait - did you see that?! The raw data had variable names of q12, stheight, and q69, but the table printed the variable labels! (I previously tweeted about the awesome package pairing of haven and gtsummary.) If your data does not come with handy labels, you can create them with the label option in tbl_summary or with the var_label function in the labelled package.

Here are a few modifications you might be interested in trying to customize your table, including adding an overall column, custom statistic formatting, and table styling. Note that there is an overall N that corresponds to the number of observations, and each variable can have its own N that corresponds to the number of non-missing observations for that variable. In the customized table, the statistics presented are Mean (SD) and % (n / N), and the statistical tests performed are the Wilcoxon rank-sum test and the chi-square test of independence.

In addition, gtsummary makes an educated guess on how to summarize your data and which statistical test to use. Pay attention to the footnote on the statistical tests performed and adjust if needed with the test argument in the add_p function.

One default I frequently correct is the treatment of discrete numeric values. For example, consider a rating scale with possible values of 1, 2, 3, … 7, but in which respondents only select values of 3, 4, 5. Making an educated guess and seeing only three unique values, gtsummary will treat this as a categorical variable and return frequencies of those values; however, you may still want a mean. You can override these potentially undesirable defaults in gtsummary. This data set does not have a great example of this, so I'll make one.

Let's start by creating our own data, consisting of 2 categorical variables, gender and smoking:

    set.seed(10)
    # create 2 categorical variables with 80 observations each
    gender = sample(c('Female', 'Male'), 80, replace = TRUE)
    smoking = sample(c('Past smoker', 'Current smoker', 'Non-smoker'), 80, replace = TRUE)
    dat = data.frame(gender = as.factor(gender),
                     smoking = as.factor(smoking))
    str(dat)
    # $ smoking: Factor w/ 3 levels "Current smoker",..: 2 1 3 2 1 1 1 2 2 2 ...

Next, we will create a frequency table and a bar plot to summarize these data one variable at a time, then we will create a contingency table and a stacked bar plot to describe the relationship between the 2 variables.

Summarizing gender and smoking, one variable at a time

A frequency table shows the number of occurrences of each category of a variable:

    table(dat$gender)
    table(dat$smoking)

Interpretation: Our sample consists of 40 females and 40 males. Of the 80 participants, 26 are current smokers, 24 are past smokers, and 30 are non-smokers.

We can use bar plots to visualize these 2 frequency tables:

    par(mfrow=c(1,2)) # show the following plots side by side
    barplot(table(dat$gender), ylab = 'Number of participants')
    barplot(table(dat$smoking), ylab = 'Number of participants')

Returning to tables, instead of showing the number of occurrences of each category, we can show the proportion of each category:

    prop.table(table(dat$gender))

Interpretation: 50% of the participants are females and 50% are males.
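The plan for the simulated gender and smoking data also mentions a contingency table and a stacked bar plot for the relationship between the 2 variables. Here is a minimal base R sketch of that step, assuming the same simulated data; the names tab and row_pct are my own, chosen for illustration.

```r
set.seed(10)
# Recreate the simulated data: 2 categorical variables, 80 observations each
gender  <- sample(c('Female', 'Male'), 80, replace = TRUE)
smoking <- sample(c('Past smoker', 'Current smoker', 'Non-smoker'), 80, replace = TRUE)
dat <- data.frame(gender = as.factor(gender), smoking = as.factor(smoking))

# Contingency table: counts for each smoking status x gender combination
tab <- table(dat$smoking, dat$gender)
print(tab)

# Row percentages: within each smoking status, the share of each gender
row_pct <- round(100 * prop.table(tab, margin = 1), 1)
print(row_pct)

# Stacked bar plot: one bar per gender, stacked segments are smoking categories
barplot(tab, ylab = 'Number of participants', legend.text = TRUE)
```

Passing margin = 1 to prop.table gives percentages within each row (smoking status); margin = 2 would instead give percentages within each gender.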
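For the discrete numeric default in gtsummary, here is a hedged sketch of how the override might look. The data frame survey and its columns group and rating are hypothetical, simulated only to mimic a 1 to 7 scale on which respondents chose 3, 4, or 5; the t-test is just one possible choice for the test argument, not a recommendation.

```r
library(gtsummary)

set.seed(10)
# Hypothetical data: a 1-7 rating scale on which respondents only chose 3, 4, or 5
survey <- data.frame(
  group  = sample(c('A', 'B'), 80, replace = TRUE),
  rating = sample(3:5, 80, replace = TRUE)
)

# Force rating to be summarized as continuous (mean and SD) rather than as
# frequencies, and give it a readable label
tbl <- tbl_summary(
  survey,
  by = group,
  type = list(rating ~ "continuous"),
  statistic = list(all_continuous() ~ "{mean} ({sd})"),
  label = list(rating ~ "Rating (1-7 scale)")
)

# Add an overall column, then a p-value with an explicitly chosen test
tbl <- add_overall(tbl)
tbl <- add_p(tbl, test = list(rating ~ "t.test"))

tbl
```

Without the type argument, a variable with only three unique values would likely be tabulated as categorical, which is the default described in the post.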
