More on I/O with arrow (2)
Large CSV files can be partitioned into a Parquet dataset consisting of multiple files.
This can make a massive difference in query performance, because Parquet is a columnar format and queries can skip partitions they do not need.
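A minimal sketch of the partitioning step, using `open_dataset()` and `write_dataset()` from arrow. The input data (`mtcars`), the temporary directories and the choice of `cyl` as the partitioning column are illustrative assumptions, not part of the original material.

```r
library(arrow)

# Hypothetical input: write a small CSV so the sketch is self-contained
csv_dir <- file.path(tempdir(), "csv_in")
dir.create(csv_dir, showWarnings = FALSE)
write.csv(mtcars, file.path(csv_dir, "mtcars.csv"), row.names = FALSE)

# Open the CSV file(s) lazily as an Arrow dataset
ds <- open_dataset(csv_dir, format = "csv")

# Write a Parquet dataset, one subdirectory per value of `cyl`
pq_dir <- file.path(tempdir(), "parquet_out")
write_dataset(ds, pq_dir, format = "parquet", partitioning = "cyl")

# Inspect the resulting partition directories and files
list.files(pq_dir, recursive = TRUE)
```

The Parquet dataset can then be reopened with `open_dataset(pq_dir)` and queried with dplyr verbs; a filter on `cyl` only needs to touch the matching partition.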
There’s also arrow::to_duckdb(), which exposes an Arrow dataset to DuckDB as a dbplyr table without copying the data up front.
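A short sketch of handing Arrow data to DuckDB. It assumes the duckdb and dbplyr packages are installed; the in-memory table built from `mtcars` and the aggregation are illustrative only.

```r
library(arrow)
library(dplyr)

# Illustrative Arrow table; a dataset from open_dataset() works the same way
tab <- arrow_table(mtcars)

res <- tab |>
  to_duckdb() |>                      # register with DuckDB, returns a lazy tbl
  group_by(cyl) |>
  summarise(mean_mpg = mean(mpg, na.rm = TRUE)) |>
  collect()                           # run the query and pull the result into R

res
```

The query is only executed when `collect()` is called; before that, the dplyr verbs are translated to SQL and run by DuckDB.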
arrow also provides read_csv_arrow(), read_tsv_arrow(), read_delim_arrow() and read_json_arrow().
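As a small illustration of the single-file readers, the sketch below reads a CSV with `read_csv_arrow()`. The temporary file and its contents are assumptions made for the example.

```r
library(arrow)

# Hypothetical CSV file for the example
csv_file <- tempfile(fileext = ".csv")
write.csv(head(mtcars), csv_file, row.names = FALSE)

# Default: returns a data frame (tibble)
df <- read_csv_arrow(csv_file)

# Keep the result as an Arrow Table instead of converting to a data frame
tab <- read_csv_arrow(csv_file, as_data_frame = FALSE)
```

The other readers follow the same pattern, with `read_delim_arrow()` additionally taking a `delim` argument.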