• Rodeo@lemmy.ca
    link
    fedilink
    English
    arrow-up
    1
    ·
    2 years ago

    Because that becomes distribution.

    Which is the crux of this issue: using the data for training was probably legal use under copyright, but if the AI begins to share training data that is distribution, and that is definitely illegal.

    • CapeWearingAeroplane@sopuli.xyz
      link
      fedilink
      English
      arrow-up
      0
      arrow-down
      1
      ·
      2 years ago

      First of all no: Training a model and selling the model is demonstrably equivalent to re-distributing the raw data.

      Secondly: What about all the copyleft work in there? That work is specifically licensed such that nobody can use the work to create a non-free derivative, which is exactly what openAI has done.

      • Rodeo@lemmy.ca
        link
        fedilink
        English
        arrow-up
        0
        ·
        2 years ago

        Copyleft is the only valid argument here. Everything else falls under fair use as it is a derivative work.

        • CapeWearingAeroplane@sopuli.xyz
          link
          fedilink
          English
          arrow-up
          0
          ·
          2 years ago

          If I scrape a bunch of data, put it in a database, and then make that database queryable only using obscure, arcane prompts: Is that a derivative work permitted under fair use?

          Because if you can get chatgpt to spit out raw training data with the right prompt, it can essentially be used as a database of copyrighted stuff that is very difficult to query.

          • Rodeo@lemmy.ca
            link
            fedilink
            English
            arrow-up
            0
            ·
            2 years ago

            No because that would be distribution, as I’ve already stated.

            If it doesn’t spit out raw data and instead changes it somehow, it’s a derivative work.

            I can spell out the distinction for you twice more if you still don’t get it.

            • CapeWearingAeroplane@sopuli.xyz
              link
              fedilink
              English
              arrow-up
              1
              ·
              2 years ago

              Exactly! Then you agree that because chatgpt can be coerced into spitting out raw, unmodified data, distributing it is a violation of copyright. Glad we’re on the same page.

              You should look up the term “rhetorical question” by the way.

              • Rodeo@lemmy.ca
                link
                fedilink
                English
                arrow-up
                1
                ·
                2 years ago

                So you understand the distinction between distribution and derivative work? Great!