• F/15/Cali@threads.net@sh.itjust.works
      9 days ago

      The answer is simpler than you could ever conceive. Companies piloted by incompetent, selfish pricks are just scraping the entire internet in order to grab every niblet of data they can. Writing code to do what they’re doing in a less destructive fashion would require effort that they are entirely unwilling to put in. If that weren’t the case, the overwhelming majority of scrapers wouldn’t ignore robots.txt files. I hate AI companies so fucking much.
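
      (For what it’s worth, honoring robots.txt really is a trivial amount of effort. Here’s a rough sketch using only Python’s standard-library urllib.robotparser; the crawler name and URLs are made-up placeholders, not anyone’s actual scraper.)

      ```python
      # Rough sketch: check robots.txt before fetching, standard library only.
      # Bot name and URLs are hypothetical placeholders.
      from urllib.robotparser import RobotFileParser

      BOT_NAME = "ExampleBot/1.0"

      robots = RobotFileParser()
      robots.set_url("https://example.com/robots.txt")
      robots.read()  # one extra HTTP request per site

      url = "https://example.com/some/page"
      if robots.can_fetch(BOT_NAME, url):
          # Respect Crawl-delay if the site specifies one; default to 1 second.
          delay = robots.crawl_delay(BOT_NAME) or 1
          # ... wait `delay` seconds, then fetch the page ...
      else:
          # The site asked not to be crawled here; a well-behaved scraper skips it.
          pass
      ```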

      • pivot_root@lemmy.world
        9 days ago

        “robots.txt files? You mean those things we use as part of the site index when scraping it?”

        — AI companies, probably