Partitioning

  • Split data across files so analyses can skip files they do not need
  • Experiment to find the best partitioning for your data
  • Recommendations:
    • 20 MB < file size < 2 GB
    • At most 10,000 files in total
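
The points above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a definitive implementation. The `key=value` directory naming, the CSV format, the helper names (`write_partitioned`, `read_partition`, `check_layout`) and the decimal reading of MB and GB are all assumptions made for the example and do not come from the slide.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path

# Thresholds from the recommendations (decimal units assumed).
MIN_BYTES = 20 * 1000**2   # 20 MB
MAX_BYTES = 2 * 1000**3    # 2 GB
MAX_FILES = 10_000


def write_partitioned(rows, root, key):
    """Write one CSV per distinct value of `key`, under root/key=value/."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    for value, group in groups.items():
        part_dir = Path(root) / f"{key}={value}"
        part_dir.mkdir(parents=True, exist_ok=True)
        with open(part_dir / "part-0.csv", "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(group[0].keys()))
            writer.writeheader()
            writer.writerows(group)


def read_partition(root, key, value):
    """Read only the files for one partition value; other directories are never opened."""
    rows = []
    for path in sorted((Path(root) / f"{key}={value}").glob("*.csv")):
        with open(path, newline="") as f:
            rows.extend(csv.DictReader(f))
    return rows


def check_layout(root):
    """Compare a dataset directory against the size and count recommendations."""
    sizes = [p.stat().st_size for p in Path(root).rglob("*") if p.is_file()]
    return {
        "n_files": len(sizes),
        "too_many_files": len(sizes) > MAX_FILES,
        "too_small": sum(s < MIN_BYTES for s in sizes),
        "too_large": sum(s > MAX_BYTES for s in sizes),
    }


# Toy data: partition by year, then read back a single year.
rows = [
    {"year": 2020, "value": 1.5},
    {"year": 2020, "value": 2.5},
    {"year": 2021, "value": 3.5},
]
root = tempfile.mkdtemp()
write_partitioned(rows, root, "year")
rows_2020 = read_partition(root, "year", "2020")  # the year=2021 file is skipped
report = check_layout(root)
```

On real data, running something like `check_layout` against a few candidate partition keys is one way to do the experimenting: a key with too many distinct values tends to produce many tiny files, while a key with too few produces files that are too large to skip usefully.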