• wise_pancake@lemmy.ca
    link
    fedilink
    arrow-up
    1
    arrow-down
    1
    ·
    11 hours ago

    I’d probably just use line delimited JSON or CSV for this use case. It plays nicely with cat and other standard tools and basically all the yaml is doing is wrapping raw json and adding extra parse time/complexity.

    In the end consider converting this to parquet for analysis, you probably won’t get much from compression or row-group clustering, but you will get benefits from the column store format when reading the data.

    • qaz@lemmy.worldOP
      link
      fedilink
      English
      arrow-up
      1
      ·
      edit-2
      11 hours ago

      Thanks for the advice, but this is just the format of some eyetracking software I had to use not something I develop myself