cm0002@lemmy.world to Programmer Humor@programming.dev · 1 month ago

APIs vs Web Scrapers

lemmy.ml

  • cross-posted to:
  • programmerhumor@lemmy.ml
276

  • HappyFrog@lemmy.blahaj.zone
    37 upvotes · 1 month ago

    As long as the scraper follows robots.txt.
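
A minimal sketch of what "following robots.txt" could look like in a Python scraper, using the standard library's `urllib.robotparser`. The robots.txt content, the `example-bot` user agent, and the example.com URLs are all made up for illustration, and the rules are parsed from a string so the snippet runs without network access.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption for this example).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the parsed rules allow user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)
allowed_public = may_fetch(parser, "example-bot", "https://example.com/public/page")
allowed_private = may_fetch(parser, "example-bot", "https://example.com/private/data")
delay_seconds = parser.crawl_delay("example-bot")  # seconds to wait between requests, if declared
```

A real scraper would typically call `set_url()` and `read()` to load the site's actual robots.txt, then check `may_fetch` before every request and sleep for the declared crawl delay.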

    • Jankatarch@lemmy.world
      29 upvotes · 1 month ago

      It’s equivalent to “the code.”

      • kautau@lemmy.world
        16 upvotes · 1 month ago

      • dejected_warp_core@lemmy.world
        2 upvotes · 25 days ago

        It really should be “parley.txt”.

  • TropicalDingdong@lemmy.world
    21 upvotes · 1 month ago

    beautiful soup

  • mspencer712@programming.dev
    12 upvotes · 1 month ago

    I feel like there should be a third box with Wall Street raider types, for scrapers that use Selenium browser automation.

    I don’t think it’s entirely unblockable (AdSense seems to know to serve only unmonetized PSA ads), but I think it’s very difficult to distinguish “this is a real browser controlled by an end user” from “this is a real browser controlled by automated test software”.

    • erytau@programming.dev
      English · 5 upvotes · 1 month ago

      Fourth panel as well, with those bots collecting data for AI training that don’t respect your robots.txt, change user agents, and overload your servers.

      • dejected_warp_core@lemmy.world
        1 upvote · 25 days ago

        War boys from Fury Road?

  • Kojichan@lemmy.world
    2 upvotes · 29 days ago

    I saw a Python scraper in my server logs earlier today. Strangest thing to see.


Welcome to Programmer Humor!

This is a place where you can post jokes, memes, humor, etc. related to programming!

For sharing awful code, there's also Programming Horror.

Rules

  • Keep content in English
  • No advertisements
  • Posts must be related to programming or programmer topics