12.7 Performance Comparison

12.7.1 Computational speed

library(tidyverse)

# Load the pre-computed benchmark results (a bench::mark() summary)
res <- read_rds('./data/res.rds')

res |> 
  select(1:8) |>   # keep the first eight benchmark columns
  knitr::kable()
expression               min (s)   median (s)    itr/sec   mem_alloc (B)   gc/sec   n_itr   n_gc
summary_tbl               0.0000     1.00e-09   8.90e+08               0     0.00   10000      0
collect(summary_duckdb)   0.0258     2.70e-02   3.69e+01          516704     4.34      17      2
collect(summary_arrow)    0.1258     1.37e-01   7.44e+00          123880     2.48       3      1
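A result like this typically comes from bench::mark(). Below is a minimal sketch of how res might have been generated, assuming summary_tbl holds an already-materialized in-memory summary while summary_duckdb and summary_arrow are lazy queries against the DuckDB and Arrow backends:

library(bench)
library(dplyr)

# Hypothetical reconstruction: each expression retrieves the same summary
# from a different backend. collect() forces the lazy DuckDB/Arrow queries
# to run, while summary_tbl is just an object lookup.
res <- bench::mark(
  summary_tbl,
  collect(summary_duckdb),
  collect(summary_arrow),
  check = FALSE  # the backends return results of different classes
)

Note how summary_tbl appears essentially free: evaluating it only looks up an object that already exists in memory, whereas collect() does the actual computation for the other two backends.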

12.7.2 Memory footprint

#        tbl      arrow     duckdb 
# 2004855136        504      51352 
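The contrast here is the point: the in-memory tibble occupies roughly 2 GB of RAM, while the arrow and duckdb objects take only a few hundred bytes and about 50 KB respectively, because they are merely handles to data held outside of R. A sketch of how such a comparison might be produced, assuming the three objects are named tbl, arrow, and duckdb:

library(lobstr)

# Report each object's size in bytes (object names assumed).
# obj_size() measures the R object itself; for the arrow and duckdb
# objects that is just the reference, not the underlying data.
sapply(
  list(tbl = tbl, arrow = arrow, duckdb = duckdb),
  \(x) as.numeric(obj_size(x))
)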

12.7.3 Disk storage footprint

# A tibble: 3 × 2
#   format    footprint
#   <chr>   <fs::bytes>
# 1 duckdb        1.95G
# 2 parquet     350.46M
# 3 rds         211.92M
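The rds file is the smallest because saveRDS() gzip-compresses by default, Parquet’s columnar compression lands in between, and the DuckDB database file is the largest since it stores its own internal structures alongside the data. A sketch of how these footprints might be measured with the fs package; the file paths are assumptions for illustration:

library(fs)
library(tibble)
library(dplyr)

# Measure each format's size on disk (paths are hypothetical)
tibble(
  format    = c("duckdb", "parquet", "rds"),
  footprint = fs_bytes(c(
    file_size("./data/data.duckdb"),       # single database file
    sum(dir_info("./data/parquet")$size),  # a Parquet dataset may span multiple files
    file_size("./data/data.rds")
  ))
) |> 
  arrange(desc(footprint))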

12.7.4 Overall guidelines

  • If your data is small (i.e., less than a couple hundred megabytes), just use CSV because it’s easy, cross-platform, and versatile.
  • If your data is larger than a couple hundred megabytes and you’re just working in R (either by yourself or with a few colleagues), use .rds because it’s space-efficient and optimized for R.
  • If your data is around a gigabyte or more, you need to share it across platforms (i.e., not just R but also Python, etc.), and you don’t want to run a SQL-based RDBMS, store it in the Parquet format and read it with the arrow package.
  • If you want to work in SQL with a local data store, use DuckDB: it offers more features and better performance than RSQLite, and it doesn’t require a client-server architecture that can be cumbersome to set up and maintain (see the connection sketch after these guidelines).
  • If you have access to an RDBMS server (hopefully maintained by a professional database administrator), use the appropriate DBI interface (e.g., RMariaDB, RPostgreSQL, etc.) to connect to it.
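To make the last two bullets concrete, here is a minimal connection sketch; the database path and table name are hypothetical:

library(DBI)
library(duckdb)

# Embedded DuckDB: no server to run, the database is just a local file
con <- dbConnect(duckdb::duckdb(), dbdir = "./data/data.duckdb")
dbGetQuery(con, "SELECT COUNT(*) AS n FROM my_table")  # `my_table` is hypothetical
dbDisconnect(con, shutdown = TRUE)

# With a managed server, only the driver and connection details change, e.g.:
# con <- dbConnect(RPostgreSQL::PostgreSQL(), host = "...", dbname = "...", user = "...")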