• 0 Posts
  • 17 Comments
Joined 2 years ago
Cake day: June 24th, 2023


  • Wouldn’t the algorithm that creates these models in the first place fit the bill? Given that it takes a bunch of text data, and manages to organize this in such a fashion that the resulting model can combine knowledge from pieces of text, I would argue so.

    What is understanding knowledge anyways? Wouldn’t humans fail to fit the bill either, given that for most of our knowledge we do not know why it is the way it is, or even held rules that were - in hindsight - incorrect?

    If a model is more capable of solving a problem than an average human being, doesn’t it, in its own way, display some form of intelligence? And, to take things to the utter extreme, wouldn’t evolution itself be intelligent, given that it causes intelligent behavior to emerge, for example, viruses adapting to external threats? What about an (iterative) optimization algorithm that finds solutions that no human would be able to find?

    Intelligence has a very clear definition.

    I would disagree; it is probably one of the hardest things to define out there, its definition has changed greatly over time, and it is core to the study of philosophy. Every time a being or thing fits a definition of intelligence, the definition is often altered to exclude it, as has been done many times.


  • The flute doesn’t make for a good example, as the end user can take it and modify it as they wish, including fitting third party parts.

    If we force it: it would be as if the manufacturer made it such that all (even third party) parts for these flutes could only be distributed through their store, and they used this restriction to force any third party to comply with additional requirements.

    The key problem isn’t including third party parts, it is actively blocking the use of third party parts, forcing additional rules (which affect existing markets, like payment processors) upon them, and making use of control and market dominance to accomplish this.

    The Microsoft case was, in my view, weaker than this case against Apple, but their significant market dominance in the desktop OS market made it such that it was deemed anti-competitive anyways. It probably did not help that web standards suffered greatly while MS was at the helm, and that making a competitive, compatible browser was nigh impossible: most websites were designed for IE, using IE-specific tech, effectively locking users into IE. Because all users were on IE, developing a website with different tech was effectively useless, as users would end up using IE for other websites anyways. As IE was effectively the Windows browser (ignoring the brief period of IE for Mac…), this also helped ensure Windows’ dominance. Note that, without market dominance, websites would not have pandered specifically to IE, and this specific tie-in would have been much less problematic.

    In the end, Google ended IE’s reign with Google Chrome, advertising it through the Google search engine’s reach. But if Microsoft had locked down the OS like Apple does, and required everything to go through their ‘app store’, I don’t doubt we would have ended up with a browser engine restriction similar to Apple’s, with all browsers effectively being a wrapper around the exact same underlying engine.



  • Yes, true, but that is assuming:

    1. Any potential future improvement solely comes from ingesting more useful data.
    2. That the amount of data produced is not ever-increasing (even excluding AI slop).
    3. No (new) techniques that make it more efficient in terms of the data required to train are published or engineered.
    4. No (new) techniques that improve reliability are used, e.g. by specializing it for code auditing specifically.

    What the author of the blogpost has shown is that it can find useful issues even now. If you apply this to a codebase, have a human categorize the reported issues as real or fake, and train the thing to make it more likely to generate real issues and less likely to generate false positives, it could still be improved specifically for this application. That does not require nearly as much data as general improvements.
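
    Even something far simpler than fine-tuning the model itself would already help here. A minimal sketch of the label-and-filter idea, assuming scikit-learn; the report texts, labels, and the TF-IDF + logistic regression choice are all made up for illustration, not anything from the blogpost:

    ```python
    # Hypothetical triage filter: rank model-generated audit reports using human labels.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Reports previously triaged by a human reviewer: 1 = real issue, 0 = false positive.
    reports = [
        "use-after-free in session teardown when logoff races with pending I/O",
        "buffer may be uninitialized (but it is zeroed two lines earlier)",
        "missing bounds check on attacker-controlled length field",
        "claims a double free, but the second free is on a different object",
    ]
    labels = [1, 0, 1, 0]

    # Train a cheap classifier on the human verdicts.
    triage = make_pipeline(TfidfVectorizer(), LogisticRegression())
    triage.fit(reports, labels)

    # Rank new reports so the ones most likely to be real get reviewed first.
    new_reports = ["possible out-of-bounds read in the logoff handler"]
    scores = triage.predict_proba(new_reports)[:, 1]
    for report, score in sorted(zip(new_reports, scores), key=lambda x: -x[1]):
        print(f"{score:.2f}  {report}")
    ```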

    While I agree that improvements are not a given, I wouldn’t assume that they can never happen anymore. Despite these companies having effectively exhausted all of the text on the internet, improvements are currently still being made left, right, and center. If the many billions they are spending improve these models such that we get a fancy new tool for ensuring our software is safer and more secure: great! If it ends up being an endless money pit and nothing ever comes of it, oh well. I’ll just wait and see which of the two it will be.


  • Not quite, though. In the blogpost the pentester notes that it found a similar issue (one he had overlooked) occurring elsewhere, in the logoff handler, which he noticed and verified while sifting through a number of the reports it generated. Additionally, the pentester noted that the fix it supplied accounted for (and documented) an issue that his own suggested fix was (still) susceptible to. This shows that it could be(come) a new tool that lets us identify issues that are not found with techniques like fuzzing and that can even be overlooked by a pentester actively searching for them, never mind a kernel programmer.

    Now, these models generate a ton of false positives, which makes the signal-to-noise ratio still much lower than what would be preferred. But the fact that a language model can locate and identify these issues at all, even if sporadically, is already orders of magnitude more than what I would have expected initially. I would have expected it to only hallucinate issues, not find anything that is remotely like an actual security issue - much like the spam the curl project is experiencing.


  • The key point being made is that if you are committing de facto copyright infringement or plagiarism by creating a copy, it shouldn’t matter whether that copy was made through copy-paste, by re-compressing the same image, or by using an AI model. The product here is the copy-paste operation, the image editor, or the AI model - not the (copyrighted) image itself. You can still sell computers with copy-paste (despite some attempts from large copyright holders with DRM), and you can still sell image editors.

    However, unlike copy-paste and the image editor, the AI model could memorize and emit training data without the input implying the copyrighted work. (This excludes the case where the image itself was provided, or a highly detailed description of the work was given, since in that case it would clearly be the user who is at fault and intending for this to happen.)

    At the same time, it should be noted that exact replication of training data isn’t exactly desirable in any case, and online services for image generation could include an image similarity check against the training data - many probably do this already.
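
    As a rough sketch of what such a check could look like (my own construction, assuming the Pillow and imagehash libraries; the paths and threshold are placeholders): compare a perceptual hash of the generated image against precomputed hashes of the training images and reject near-duplicates.

    ```python
    # Hypothetical near-duplicate check against training images via perceptual hashes.
    from pathlib import Path

    from PIL import Image
    import imagehash

    # Precomputed hashes of the training set (in practice stored in an index or database).
    training_hashes = {
        p.name: imagehash.phash(Image.open(p))
        for p in Path("training_images").glob("*.png")
    }

    def too_similar(generated_path: str, max_distance: int = 5) -> bool:
        """Flag a generated image that is a near-copy of some training image."""
        generated_hash = imagehash.phash(Image.open(generated_path))
        return any(
            generated_hash - known_hash <= max_distance  # Hamming distance between hashes
            for known_hash in training_hashes.values()
        )

    if too_similar("generated.png"):
        print("Generated image is suspiciously close to a training image; regenerate.")
    ```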






  • I think the video LegalEagle uploaded explains it quite succinctly: for the sale there was a certain split of the proceeds between the creditors, and the creditors entitled to the largest portion were willing to forego part of their share so that the other creditors would get a larger portion if The Onion’s bid won. In effect, the other creditors would get more money out of the 1.75m bid than out of the 3.5m bid, and the creditors that ‘got less’ are the very ones that offered to give up that money in the first place.
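
    With made-up percentages (I don’t know the real split or the size of the waiver), the arithmetic works out roughly like this: if the largest creditor group waives most of its share under the smaller bid, the remaining creditors end up with more in absolute terms.

    ```python
    # Toy arithmetic with made-up percentages - this only shows the mechanism,
    # not the actual figures from the bankruptcy sale.
    def other_creditors_payout(bid, largest_share, waived=0.0):
        """Money left for the other creditors after the largest group takes its (possibly waived) share."""
        largest_take = bid * largest_share * (1 - waived)
        return bid - largest_take

    # Hypothetical: the largest creditor group is normally entitled to 90% of proceeds,
    # but waives 80% of that share if the smaller bid wins.
    print(f"Other creditors under the $3.5M bid:  ${other_creditors_payout(3_500_000, 0.90):,.0f}")
    print(f"Other creditors under the $1.75M bid: ${other_creditors_payout(1_750_000, 0.90, waived=0.80):,.0f}")
    ```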




  • A very similar situation to the one analysed in this recently published paper. The quality of what is generated degrades significantly.

    They mostly investigate replacing the data with AI-generated data at each step, though, so I doubt the effect will be as pronounced in practice. Human writing will still be included, and even human curation of AI-generated text will skew the distribution of the training data (as the selection these editors perform inevitably would, since only reasonable text makes it through the cracks).
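
    The core effect is easy to reproduce in a toy setting. A quick sketch of my own (not the paper’s experiment), assuming numpy: repeatedly fit a Gaussian to samples drawn from the previous fit, with no fresh human data mixed back in, and the spread collapses over the generations.

    ```python
    # Toy illustration of model collapse: each generation "trains" only on the
    # previous generation's output, and the fitted distribution narrows over time.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 20                                    # small "training set" each generation
    data = rng.normal(0.0, 1.0, size=n)       # generation 0: real data, std = 1

    for generation in range(1, 501):
        mu, sigma = data.mean(), data.std()   # fit the model to the current data
        data = rng.normal(mu, sigma, size=n)  # next generation sees only model output
        if generation % 100 == 0:
            print(f"generation {generation}: std = {sigma:.4f}")
    ```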