22.8 The parquet format (2)

Benefits of Parquet

  • Usually smaller than the equivalent CSV file, thanks to efficient encodings and compression
  • Stores each column’s data type in the file, whereas a CSV reader has to guess types
  • Column-oriented: stores data column by column, like R’s data frames, whereas CSV stores it row by row
  • Splits data into chunks, so a reader can sometimes skip the chunks it doesn’t need
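
To make these benefits concrete, here is a minimal sketch that assumes the arrow package is installed. The data frame and the temporary file paths are made up for illustration; the size difference you see will depend on your data and the compression codec in use.

```r
library(arrow)

# Hypothetical example data; any data frame would do.
df <- data.frame(
  id    = 1:1000,
  day   = as.Date("2024-01-01") + 0:999,
  group = rep(c("a", "b"), 500)
)

csv_path     <- tempfile(fileext = ".csv")
parquet_path <- tempfile(fileext = ".parquet")

write.csv(df, csv_path, row.names = FALSE)
write_parquet(df, parquet_path)

# Compare file sizes in bytes (results vary with the data).
file.size(csv_path)
file.size(parquet_path)

# Column types are stored in the file, so `day` comes back as a Date.
from_parquet <- read_parquet(parquet_path)
class(from_parquet$day)

# read.csv() has to guess from the text, so `day` does not come back as a Date.
from_csv <- read.csv(csv_path)
class(from_csv$day)

# Column-oriented storage lets a reader load only the columns it asks for.
ids_only <- read_parquet(parquet_path, col_select = "id")
names(ids_only)
```

Selecting columns at read time is where the column-by-column layout tends to pay off on wide tables, since the other columns need not be loaded into R.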

Flaws of Parquet

  • Binary format, so it is not human-readable
  • If you open a Parquet file as plain text, you see gibberish rather than readable rows; you need a Parquet-aware reader to get the data back
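
As a rough illustration of the readability point, the sketch below writes a tiny Parquet file and inspects its first bytes. It assumes the arrow package is installed; the data and the temporary path are made up for the example.

```r
library(arrow)

# Hypothetical example file.
parquet_path <- tempfile(fileext = ".parquet")
write_parquet(data.frame(x = 1:3, y = c("a", "b", "c")), parquet_path)

# Read the first 4 bytes as raw bytes. Parquet files begin with the
# magic marker "PAR1"; what follows is binary, not readable text.
first_bytes <- readBin(parquet_path, what = "raw", n = 4)
rawToChar(first_bytes)

# A Parquet-aware reader is needed to get the table back.
read_parquet(parquet_path)
```

By contrast, a CSV file can be opened in any text editor, which is why CSV is still convenient for small files that people need to inspect by eye.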