1. 30

  2. 7

    SQL injection is solved - in theory - by escaping inputs properly or better by prepared queries. Of course it’s been a hugely disappointing disaster trying to get practice to keep up. Anyway..

    There seems to be no way to correctly prevent prompt injection, is there? You can push the probability down by adding more guard rules to the prompts, but I don’t trust that.

    Perhaps a multi-modal version needs to be trained, with a system prompt completely separate from the user input? I really don’t know if that would even help. Who knows?

    1. 7

      I’d wager there’s no general way to prevent this sort of thing in a text autocompletion engine.

      1. 3

        If it were transforming trees, or structured text, then quotation marks would not just suggest an inner block of text, but denote a hard barrier between outer and inner context.

        At that point, the problem shifts to prompt engineering again, since the prompt must direct the transformer to use the quotation blocks when appropriate.

        1. 4

          I don’t understand LLMs really, but ISTM they lex stuff into tokens, right? Couldn’t you make up a token that is deliberately excluded from the semantics of the universe and then use it as a barrier? Essentially, the fnord solution.

          1. 3

            There are at least two special tokens present in the training process of these systems to mark the start and the end of the text. There is no way for a human to type these two tokens into the prompt, because no utf-8 sequence of characters would produce them.

            Introducing a third special token (or however many) and using it as a separator could prove very effective at avoiding prompt injection. The main challenge here is that the network would have to be retrained (or at least fine-tuned) to understand the semantics of this new token, and training data adjusted to use it consistently.

            1. 2

              Yes, it would need to be added in at the very beginning of the process. All of these prompt jailbreaks come by overcoming the late additions that are either fine tuned or worse just one shot learning.

      2. 6

        I’ve had some decent results by block-quoting the user input in a prompt, but nothing definitive. I think the only safe solution here is to treat the LLM output as unsanitized user input, and not eval it in a trusted environment.

        1. 3

          I’ve had limited success by parsing the model’s output. The model definitely appeared to fuzz the parser!

      3. 6

        It seems to me that remote code execution is the entire feature. It just needs to be run in a sandbox.

        If the authors consider this surprising or an exploit, I really wonder what they expected to happen.

        1. 3

          The author of the web application or the author of the blog post?

          1. 3

            I meant the authors of the web app.

        2. 4


          Not a security risk.

          Bad guy’s wouldn’t say “please”. ;-)

          1. 3

            Can someone please explain to me where the execution of arbitrary code is occurring?

            Input text prompt -> model -> output text -> escaping -> web page code

            The model is bounded in what functions it uses, ie it doesn’t run system()? So it can’t occur in the model step?

            The escaping step could be something simple (plus setting the page charset):

            sed 's|<|&lt;|g' 's|>|&gt;|g' 's|\n|<br />|g'

            That leaves only your website code as being the vulnerable step? In which case this has nothing to do with GPT or machine learning models at all? Or am I missing something?

            1. 2

              Can someone please explain to me where the execution of arbitrary code is occurring?

              it’s happening in their demo web app. they allow the model to generate Rails code based on the user’s prompt and then execute it on their backend.

              1. 9

                Much as I enjoy bashing GPT (and I really do), it seems that the problem here is allowing a user to submit arbitrary code and executing it without sandboxing. The fact that there’s a nice UI in front of the massive security hole doesn’t change anything.

                1. 2

                  Why would they design it to allow that? EDIT: Looks like the linked article asks the same question. Sorry.

                  1. 2

                    I don’t know if this is the correct answer but recently I saw someone showing off chat gpt doing things for him like signing up for a service with his credit card on the internet. I believe this requires chat gpt to execute code based on the user input so that it accomplishes the task for them (which sounds similar to this), admittedly this would be awesome for personal use but of course only if you host it yourself since as everyone here has mentioned it is a giant security hole to normally do such unless there are serious guard rails around what tasks could be accomplished. I do not work in ML this could be complete garbage I just spouted.

                    1. 2

                      So… in theory the bot could also earn them money and direct it to their bank account. Then self-host itself by paying for VPS? Excellent :D

                      1. 1

                        Can I borrow this?! I love it.

                        1. 2

                          I will only allow you to plagiarise this idea if you use a neural network to launder it :)