More on I/O with arrow (2)
Large CSV files can be converted to a Parquet dataset partitioned across multiple files.
Queries that filter on the partitioning column then only need to read the matching files, which can make a massive difference in performance.
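A minimal sketch of such a conversion, assuming the CSV input sits in one directory. The paths, the use of `mtcars` as stand-in data, and the choice of `cyl` as the partitioning column are illustrative assumptions, not taken from the original material.

```r
library(arrow)

# Hypothetical input: write a small CSV so the sketch is self-contained.
csv_dir <- file.path(tempdir(), "csv_input")
dir.create(csv_dir, showWarnings = FALSE)
write.csv(mtcars, file.path(csv_dir, "mtcars.csv"), row.names = FALSE)

# Open the CSV file(s) as a dataset; the data is not read into memory here.
csv_ds <- open_dataset(csv_dir, format = "csv")

# Write a Parquet dataset with one directory per value of `cyl`.
parquet_dir <- file.path(tempdir(), "parquet_output")
write_dataset(csv_ds, parquet_dir, format = "parquet", partitioning = "cyl")

# List the resulting files, one per partition directory.
parquet_files <- list.files(parquet_dir, recursive = TRUE)
print(parquet_files)
```

A partitioning column with a moderate number of distinct values tends to work best; very many small files can cost more than they save.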
There’s also arrow::to_duckdb() for converting a dataset to DuckDB.
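As a rough illustration, the sketch below hands an Arrow dataset to DuckDB and continues with dplyr verbs. It assumes the duckdb, dplyr and dbplyr packages are installed; the dataset path and the `mtcars` data are placeholders chosen for this example.

```r
library(arrow)
library(dplyr)

# Hypothetical dataset: write mtcars as Parquet and open it as a dataset.
ds_dir <- file.path(tempdir(), "duckdb_demo")
write_dataset(mtcars, ds_dir, format = "parquet")
ds <- open_dataset(ds_dir)

# Pass the dataset to DuckDB, aggregate there, and collect the result into R.
result <- ds |>
  to_duckdb() |>
  group_by(cyl) |>
  summarise(mean_mpg = mean(mpg, na.rm = TRUE)) |>
  arrange(cyl) |>
  collect()

print(result)
```

The work between `to_duckdb()` and `collect()` is carried out by DuckDB, so only the aggregated result is brought back into the R session.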
arrow also provides read_csv_arrow(), read_tsv_arrow(), read_delim_arrow() and read_json_arrow().
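A short sketch of reading a single file with one of these readers. The temporary file and its contents are made up for the example; `as_data_frame = FALSE` is used to keep the result as an Arrow Table rather than a data frame.

```r
library(arrow)

# Hypothetical input file for the example.
csv_file <- tempfile(fileext = ".csv")
write.csv(head(mtcars), csv_file, row.names = FALSE)

# By default the result is returned as a data frame.
df <- read_csv_arrow(csv_file)

# With as_data_frame = FALSE the data stays in an Arrow Table.
tbl <- read_csv_arrow(csv_file, as_data_frame = FALSE)

print(class(df))
print(class(tbl))
```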