1. 7

I’m looking for recommendations for an automatic memory management scheme for a programming language. It must be portable, concurrent, and have short pause times. Simpler implementation is more important than good performance. Reference-counting and/or tracing garbage collection are both under consideration (I’m open to other suggestions too). The programmer should not have to think about memory management much, if at all (so, if reference-counting is used, the implementation should also detect reference cycles).

When I look at introductory literature, I see simple schemes with long pause times. When I look at popular programming language implementations, I see schemes that are performant at the cost of a complex implementation. But what I want is a simpler implementation, and I am willing to accept poor performance (except for my concern about pause times). By simpler to implement I mean fewer lines of code; if a garbage-collection library is used, the LOC in that library count too. This is all relative; it is understood that the simplest implementation with short pause times may still be pretty complex, and I will also accept a little implementation complexity in exchange for a lot of performance. The criteria are:

portable, concurrent, short pause times > simple implementation > low memory usage > fast

I’m on the fence about compaction. I do want the user to be able to run programs for a long time without running out of memory. But compaction makes interop more complex. Various contemporary language implementations, such as Go and Python, have non-compacting GCs, and I don’t hear complaints about people’s programs crashing due to memory fragmentation. Is fragmentation only a theoretical issue that doesn’t come up in practice – or do those implementations do something complicated to avoid fragmentation?

This will be a high-level programming language with a runtime. The runtime will know where pointers are and could do things like box values and store GC metadata there, have write barriers and/or read barriers, reference count, etc.

Why not a vanilla mark-sweep collector? That can have long pause times.

Why not the Boehm-Demers-Weiser conservative garbage collector ( https://www.hboehm.info/gc/ )? (a) it is not portable; the docs say “Our collector is not, and cannot be, implemented as completely portable C code.” In concurrent, incremental mode it appears to be even less portable. (b) It has many lines of code. (c) Because our runtime will know where pointers are, we do not need a conservative collector.

What do you recommend? Thanks in advance.

  1.  

  2. 7

    I would start with a stop-the-world mark-and-sweep collector – no compacting, copying, or moving – coupled with tagged pointers to avoid allocation churn for numbers, etc. Then, when and if you find the stop-the-world pauses to be too much of a problem, start looking at making it more complicated.
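
    For concreteness, a minimal stop-the-world mark-and-sweep core can be sketched in a few dozen lines. This is a hypothetical toy (the `Obj`/`Heap` names are invented, and a real collector manages raw memory rather than Python lists), but it shows how little machinery the basic algorithm needs:

```python
class Obj:
    def __init__(self, *children):
        self.children = list(children)  # outgoing pointers
        self.marked = False

class Heap:
    def __init__(self):
        self.objects = []  # every object ever allocated
        self.roots = []    # stand-in for registers/stack/globals

    def alloc(self, *children):
        o = Obj(*children)
        self.objects.append(o)
        return o

    def collect(self):
        # Mark: trace everything reachable from the roots.
        worklist = list(self.roots)
        while worklist:
            o = worklist.pop()
            if not o.marked:
                o.marked = True
                worklist.extend(o.children)
        # Sweep: keep only marked objects; reset marks for next cycle.
        live = [o for o in self.objects if o.marked]
        for o in live:
            o.marked = False
        self.objects = live
```

    Because marking is a graph traversal from the roots, cycles are collected for free – the worklist never revisits an already-marked object.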

    I am being serious here; that is how the JS engines worked for more than a decade (for SpiderMonkey it could be two), and the GC pauses were essentially unnoticed. This was at a time when all active web pages were running from a single shared heap, and prior to the more aggressive NaN-boxing of modern engines, so they were taking more of an allocation hit during numeric-heavy operations.

    The need for small pause times is almost entirely a result of modern JS being used to do a lot at a high frame rate, so pauses of potentially 20ms, which were historically fine, are no longer tenable.

    I would ask very simply: why do you need the GC to be concurrent? “Because other engines do it” is not a good answer. Making a concurrent GC is vastly more challenging than a single-threaded one, as you are essentially taking on the complexity of incremental and generational GC.

    portable, concurrent, short pause times > simple implementation > low memory usage > fast

    As above, why do you want concurrent? The only reason for a concurrent GC is performance. Short pause times are related to fast; a simple implementation is counter to concurrent; low memory usage means ref counting. GC cost is proportional to how much of the heap is live – if your heap is filled with live objects you end up having to GC more frequently, and each GC takes longer, because time to scan is proportional to how many live objects there are.

    Why not a vanilla mark-sweep collector? That can have long pause times.

    This is a problem that is, in general, way overblown. A long pause time requires a very large heap (pause time is proportional to live heap); generational GCs partially mitigate that at the cost of much higher memory usage and much more complexity, and they can still have very large pause times when the heap gets large.

    The only way to guarantee short pause times is to support incremental GC, which can be built with any of the standard mark/sweep, generational, copying, etc GCs because being incremental is independent of the tracing and collection strategy. It is also very hard to implement.

    Mark/sweep collectors also easily extend to generational collection if you feel that your GC is suffering from scanning large amounts of long lived objects.

    But again, JS engines used stop-the-world GCs for years. In principle you can make a GC pause for a very large amount of time; in practice that doesn’t occur very often. Blocking the construction of a GC on the basis of a potential problem, or choosing a complex implementation over a simple one, is imo not a great choice.

    Assuming a good design, your interface to your GC should not expose the actual implementation, so changing in future should not be more complex than implementing the complicated thing first. The only real problem is write barriers, but that’s a large part of the complexity of every GC other than mark sweep.

    1. 1

      long pause times… a problem that is in general way overblown… if you find the stop the world pauses to be too much of a problem start looking at making it more complicated.

      Thank you for your long and detailed answer. It could be true that I am worrying too much about pauses. My worries are probably because, in the distant past, I recall hearing about people’s bad experiences with Java GC pause times. Perhaps those experiences were just outliers.

      why do you need the gc to be concurrent?

      One goal of this language is to be suitable for writing programs with very large numbers of threads executing in parallel.

      your interface to your GC should not expose the actual implementation, so changing in future should not be more complex than implementing the complicated thing first

      Yes I agree, I am trying to future-proof the interface to the GC. This is one reason why I do not want to put off the implementation of an incremental GC until later: I don’t want to miss a requirement for this interface (such as read or write barriers) that may be needed for fancy GCs but not for simple ones. Here’s what I have so far; can you think of anything I should add to the following list? The runtime may need to be able to:

      • produce a list of the locations of any pointers within data, including in:
        • the registers
        • the stack
        • the heap
        • the globals
      • impose read barriers and write barriers
      • interrupt execution every now and then and execute GC code
      1. 1

        I recall hearing about people’s bad experiences with Java GC pause times

        In my experience long pause times are rarely a cause; they’re an effect of poor algorithmic / data structure choices. The specific GCs that are bundled with OpenJDK are pretty well tuned, and the default (G1) is concurrent and generational.

    2. 6

      The “treadmill” garbage collection technique was posted as a topic right here on lobste.rs not that long ago. I don’t have the original link but here is a brief paper (which may be a bit light on detail, but you should be able to find more): https://dl.acm.org/doi/pdf/10.1145/130854.130862

      It may fit the bill for you - seems simple enough, usable in “realtime” scenarios so pause times should be limited (you can control the maximum pause by tweaking the algorithm, bearing in mind that there will always be a trade-off between low pauses and total throughput).

      1. 1

        Thanks, I read that and it seems simple and meets almost all of my criteria (not sure about concurrent, I’ll have to think about that).

        It seems like the key is that there is an always-on read barrier which allows the garbage collector to detect when the mutator accesses something that the garbage collector otherwise might have thought was unused. That sounds like the sort of tradeoff of performance for simplicity that I am after.
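
        If I understand it right, the barrier could be sketched like this (hypothetical names; a Baker-style read barrier that shades any white object the mutator touches):

```python
WHITE, GREY, BLACK = "white", "grey", "black"

class Cell:
    def __init__(self, value=None):
        self.value = value
        self.color = WHITE  # "unscanned, possibly garbage"

grey_queue = []  # objects the collector still has to scan

def read_barrier(cell):
    # Invariant: the mutator never sees a white object. Any white
    # cell it loads is shaded grey so the collector will scan it
    # instead of reclaiming it mid-cycle.
    if cell.color == WHITE:
        cell.color = GREY
        grey_queue.append(cell)
    return cell.value
```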

      2. 3

        I think what would formally tick all the boxes is:

        • strict FP language which prevents cycles from arising without extra restrictions on programming model (eg, elm or roc)
        • atomic ref-counts
        • deferred freeing of objects: when ref count reaches zero, just toss the object onto garbage list and have a background thread to do actual cleaning and recursive deallocation
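
        The deferred-freeing bullet might look something like this in practice (a hypothetical sketch; names invented, and the drain loop would run on the background thread rather than being called directly):

```python
import queue

free_queue = queue.Queue()  # zero-refcount objects awaiting teardown

class Ref:
    def __init__(self, *children):
        self.rc = 1  # born with one reference
        self.children = list(children)
        for c in self.children:
            c.rc += 1
        self.freed = False

def dec_ref(obj):
    obj.rc -= 1
    if obj.rc == 0:
        # Don't tear the object graph down recursively here: queue
        # the object so the mutator's pause stays bounded.
        free_queue.put(obj)

def drain_free_queue():
    # In a real runtime this loop runs on the background thread.
    while not free_queue.empty():
        obj = free_queue.get()
        obj.freed = True
        for c in obj.children:
            dec_ref(c)  # may enqueue more work; never recurses deeply
```
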
        1. 1

          Thanks, in my context I think constraining the programmer to never create cycles is a tall order, but I’ll consider it.

        2. 3

          I would recommend giving reference counting a try. You don’t need a lot of code, it is conceptually simple to understand and avoids GC pauses unless you have excessively deep chains of garbage data. In practice the performance impact is nearly “real time” (but not really real time). Data can remain uncompacted if you have a single block size (or several pools of differently sized blocks). Circular references will need to have a solution (unless your language is side-effect free), which can be done by tracing all data at “safe” points, where the cost of traversing all data is acceptable.

          The one tricky thing is that you have to be very careful to keep track of objects during the execution time of your code, as memory leaks are easy to produce, especially in compiler-generated code. One nice property of RC is that you free memory very early and it is straightforward to attach resource-cleanup operations (“finalizers”). You can implement various schemes to manage total heap size on top of that (another criterion: do you want your system to have a fixed heap or a growing/shrinking one? What about being able to sustain peak loads?)

          But in the end every automatic memory management scheme gets complicated when integrated into a non-trivial runtime system - all have their advantages and disadvantages. Mark/Sweep without compaction is probably the simplest possible approach, but you will have pauses. These pauses (depending on heap size) can be so short that they are not noticable in “normal” applications, but once you have latency issues (e.g. games), they become a real problem, even if short.

          1. 1

            Thanks, I will consider reference counting, with the addition of a tracing GC for detecting cycles when a pause is acceptable. Several pools of differently-sized blocks sounds like a good idea. To answer your question, I’d like a growing heap. A shrinking heap would be nice but is not essential.

          2. 2

            In addition to lots of good advice already given, if you decide to implement your own GC, I recommend the MicroPython GC as a “good enough and simple” reference, and of course the toy one I wrote, which is conservative, incremental, tracing, not compacting, but not concurrent: https://github.com/robey/mwgc . If you want to write the GC in your own language, you’ll probably end up needing to deeply understand the code anyway, which means writing it yourself. :)

            1. 1

              Thanks, I will check out your GC and the Micropython one. I read https://github.com/robey/mwgc#how-it-works just now, it looks great; it seems simple and meets almost all of my criteria. It seems like the sort of thing that I am after.

              Would it be fair to characterize your GC as a conservative, incremental, not compacting tricolor mark-sweep tracing garbage collector with a write barrier?

              1. 2

                Yes, exactly – with the caveat that the write barrier is required but not implemented. :)
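
                For anyone curious what that missing write barrier would do: in a tricolor collector the invariant is that a black (fully scanned) object must never point at a white (unscanned) one without the collector hearing about it. A hypothetical Dijkstra-style sketch (names invented):

```python
WHITE, GREY, BLACK = 0, 1, 2

class Node:
    def __init__(self):
        self.refs = []
        self.color = WHITE

class IncrementalGC:
    def __init__(self, roots):
        self.roots = roots
        self.grey = []  # marking worklist

    def start_cycle(self, all_objects):
        for o in all_objects:
            o.color = WHITE
        for r in self.roots:
            self.shade(r)

    def shade(self, o):
        if o.color == WHITE:
            o.color = GREY
            self.grey.append(o)

    def write_barrier(self, src, dst):
        # Dijkstra-style insertion barrier: a store that would make
        # a black object point at a white one re-greys the target.
        src.refs.append(dst)
        if src.color == BLACK:
            self.shade(dst)

    def step(self, budget):
        # Do a bounded amount of marking, then yield to the mutator.
        while self.grey and budget > 0:
            o = self.grey.pop()
            for child in o.refs:
                self.shade(child)
            o.color = BLACK
            budget -= 1
        return not self.grey  # True when marking is complete
```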

            2. 1

              For later readers, here’s a summary of things I think I’ve learned from the replies (so far, in the first 2 days) and from some further reading/thinking I’ve done in that time:

              • everyone suggested a tracing garbage collector (GC) of some sort (one suggestion was for reference counting, with a tracing cycle detector done only when pauses can be tolerated)
              • most commentators feel that my criteria are too stringent, and that it would be better to give up on some of them for now, especially one or more of: portability, concurrency, short pauses, cycle-detection
                • if portability can be given up, then the Boehm-Demers-Weiser collector ( https://www.hboehm.info/gc/ ) can be used
                • if the language can get by without mutable data, then cycle-detection isn’t necessary
                • if the language can get by without sharing garbage-collected memory between threads, then concurrency isn’t necessary
                • I could also consider giving up only the intersection of some of these criteria; for example, I could have concurrency without cycle-detection, and have cycle-detection on memory which is only visible to one thread, but not detect cycles in shared memory
              • no one suggested a moving (copying or compacting) garbage collector
              • incremental tricolor mark-sweep tracing GCs were mentioned multiple times, so I should probably search the literature for those (some literature uses the term “real-time” rather than incremental; it’s probably worth searching for both terms, although I don’t think they are synonymous)
              • a list of special things that the implementation may need to do in order to permit garbage collection includes:
                • produce a list of the locations of any pointers within data, including in:
                  • the registers, stack, globals, anything referenced from external/non-managed code (“the roots”)
                  • the heap
                • read barriers and write barriers
                • interrupt execution every now and then for GC
                • override pointer comparisons (so that eg copying collectors can say that a fromspace pointer and a tospace pointer to the corresponding object are equivalent)
                • derived pointers maybe shouldn’t be stored, not even in registers, because moving GCs may move the base pointer, and hence invalidate all derived pointers.
                  • Instead, could use fat pointers that always store the base pointer next to an offset
                    • still can’t do pointer subtraction though
                • associate GC metadata (like color) with each pointer (can either be right next to, or in, the pointer (pointer tagging); or in an object header; or in a separate table of GC metadata)
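
              To illustrate the fat-pointer bullet: a derived (interior) pointer can be represented as a base pointer plus an offset, so that a moving collector only ever has to rewrite the base part. A hypothetical sketch:

```python
class FatPtr:
    """An interior pointer stored as (base, offset): the GC sees and
    may rewrite base; offset is opaque to it and never changes."""
    def __init__(self, base, offset=0):
        self.base = base
        self.offset = offset

    def add(self, n):
        # Pointer arithmetic only ever touches the offset.
        return FatPtr(self.base, self.offset + n)

    def load(self):
        return self.base[self.offset]
```

              As noted above, subtracting two unrelated fat pointers still isn’t meaningful, since their bases may differ.
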
              1. 1

                Remember, ref-counting can have arbitrarily long pause times too, when the last reference to a large object graph is cleared and that entire graph gets torn down piece by piece. Some systems tried to work around this with “deferred” ref-counting, where they’d only do limited work during a free and queue the rest for later, but I’ve not heard of this being a problem in anything recent.

                Current thinking about concurrency leads to threads not sharing memory, so some systems like Erlang have a heap per thread. This means your memory manager doesn’t have to be concurrent (win!) and you can instantly free all of a thread’s allocations when it exits (win!) but any heap object passed from one thread to another has to be copied (lose!) The latter problem can be pretty serious, as I recall from when co-workers of mine were analyzing CouchDB’s performance. The Pony language has an interesting way around this by letting one heap “borrow” an object from another heap.

                1. 1

                  Thanks, yes, for concurrency, I think not sharing memory is probably the way to go for the most part. It would be a nice bonus to also provide a concurrent garbage-collected memory-sharing mechanism in case the programmer really wants to share some memory, but I suppose that would be a lot of trouble for a mechanism that would only get used now and then. I’ll look into Pony’s borrowing, that sounds interesting.

                2. 1

                  I do not think there is a “simple” collection scheme that is going to meet your requirements of being “simple” to implement. Automatic memory management is actually a hard and complicated problem, as evidenced by the fact that there are constant updates to algorithms and the state of the art.

                  So, I think there are two obvious paths here:

                  1. Accept the dependency and adopt the Boehm-Demers-Weiser collector
                  2. Roll up your sleeves and spend the time to research what’s out there, and work through the features of each algorithm and figure out which tradeoffs you can deal with for your purpose.

                  The other option is to build a simple Mark-Sweep collector and do nothing until the language is far enough along that you’re actually seeing it be problematic in the real world. At this point, investing in a rabbit hole will actually pay off because you’ve proven that the language has enough legs that the effort is worth it.

                  1. 1

                    Sorry, I didn’t intend to say that I wanted to do automatic memory management in just a “few” lines of code! I just meant that I would rather have fewER lines of code even at the cost of worse performance. I expect that any solution that detects cycles and has short pause times will still have many lines of code. I edited the question to clarify.

                    Regarding option #1, automatic memory management will be included in the somewhat self-hosting language implementation, therefore if the right answer is Boehm-Demers-Weiser, I will have to manually rewrite the Boehm-Demers-Weiser collector in my language line-by-line – hence my concern with lines of code in libraries. I’m hoping I can find something shorter than that. Also, it seems like Boehm-Demers-Weiser cannot be used in portable code, especially in concurrent, incremental mode; the docs say “Our collector is not, and cannot be, implemented as completely portable C code.” and “The collector uses a mark-sweep algorithm. It provides incremental and generational collection under operating systems which provide the right kind of virtual memory support. (Currently this includes SunOS[45], IRIX, OSF/1, Linux, and Windows, with varying restrictions.)” This would make it impossible to translate into my higher-level language.

                    Regarding option #2, this post is part of that work.

                    Regarding a vanilla mark sweep collector, my understanding is that that could have very long pause times?

                    1. 4

                      Also, it seems like Boehm-Dehmer-Weiser cannot be used in portable code, especially in concurrent, incremental mode; the docs say “Our collector is not, and cannot be, implemented as completely portable C code.”

                      This is true and will be the case for almost any garbage collector. It needs to be able to scan stack memory, enumerate globals, and so on. You can do the ‘GC in uncooperative environments’ trick to avoid stack scanning, at the cost of increasing compiler complexity and reducing performance. You can avoid enumerating globals by tracking any that were exposed to your language as implicit roots. Boehm also uses OS-specific hooks for write barriers (e.g. marking pages as read-only to catch traps on accesses). Writing all of that in portable code will require building your own platform abstraction layers over all of these.
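
                      One portable way to avoid scanning the native stack is an explicit shadow stack (this is the Henderson-style “accurate GC in an uncooperative environment” approach, a different trick from Boehm’s conservative scanning): the compiler emits code that registers each frame’s GC-visible locals in a side structure the collector can walk. A hypothetical sketch:

```python
# Side structure the collector walks instead of the native stack.
shadow_stack = []

class Frame:
    # The compiler would emit one of these per function activation,
    # listing that frame's pointer-typed locals.
    def __init__(self, **gc_locals):
        self.gc_locals = gc_locals

    def __enter__(self):
        shadow_stack.append(self)
        return self

    def __exit__(self, *exc):
        shadow_stack.pop()

def enumerate_roots():
    # Portable root enumeration: no assembly, no stack scanning.
    return [v for f in shadow_stack for v in f.gc_locals.values()]
```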

                      It would be good to understand why self hosting is a goal for you. In general, it’s a terrible idea for high-level languages because it requires that you provide type-system escape hatches that leave you with a language that combines the performance of Lua with the memory safety of C.

                      1. 1

                        Thanks, what is the “GC in uncooperative environments” trick – are you referring to https://hboehm.info/spe_gc_paper/preprint.pdf ?

                        My plan is to have a virtual machine layer which handles garbage collection (and thread scheduling and everything else that really wants platform-specific optimization), and then to write the rest of the language implementation targeting that virtual machine.

                        So for example, a virtual machine instruction might be “load object from memory into local variable”, and then while executing this instruction the virtual machine implementation could do things like invoke a read barrier, make a note of any new pointers which are being written into the stack, etc.

                        The “reference implementation” will be self-hosting and portable, but if the language is used enough to justify it, more performant platform-specific implementations could be created later, in most cases by re-implementing the virtual machine for each platform. The reason for the reference implementation being self-hosting is to make it easier for developers to understand, modify, and port it.

                        So, the key for me is to understand how to structure the virtual machine in order to make it possible for later platform-specific implementations of it to do whatever they need to do to make garbage collection efficient. This is one reason why I do not want to put off the implementation of an incremental GC until later: I don’t want to miss a requirement for the structure of the virtual machine (such as read or write barriers) that may be needed for fancy GCs but not for simple ones. Here are the requirements that I have collected so far; can you think of anything I should add to the following list?

                        The runtime may need to be able to:

                        • produce a list of the locations of any pointers within data, including in:
                          • the registers
                          • the stack
                          • the heap
                          • the globals
                        • impose read barriers and write barriers
                        • interrupt execution every now and then and execute GC code
                        1. 2

                          Thanks, what is the “GC in uncooperative environments” trick – are you referring to https://hboehm.info/spe_gc_paper/preprint.pdf ?

                          Yup, that’s the one.

                          So for example, a virtual machine instruction might be “load object from memory into local variable”, and then while executing this instruction the virtual machine implementation could do things like invoke a read barrier, make a note of any new pointers which are being written into the stack, etc.

                          So then the VM can’t be implemented in the language, because the VM sits explicitly under the language’s type safety. If you can implement the GC in the language then the language must be able to express things like converting between integers and pointers to store mark bits inside pointers and so on.

                          It is possible that you are using the term ‘self-hosting’ in a way that I am unfamiliar with. Can you elaborate a bit?

                          produce a list of the locations of any pointers within data, including in: the registers the stack the heap the globals

                          Or, in simpler terms, to produce a set of GC roots. This is necessary for a tracing GC, it is not necessary for a ref-counting model (even ref counting with cycle detection).

                          impose read barriers and write barriers

                          These are necessary for concurrent or incremental tracing. Slightly different barriers are necessary for reference counting.

                          interrupt execution every now and then and execute GC code

                          This is necessary for most tracing-based approaches (even mostly concurrent collection typically needs at least to pause each thread).

                          In my experience, there is a huge value in co-designing your memory management model with your language. Trying to produce a language that is agnostic to memory-management policy and an implementation that allows many different policies means that you will make a lot of compromises that won’t become apparent until you are a long way into writing code in the language.

                          1. 1

                            So then the VM can’t be implemented in the language, because the VM sits explicitly under the language’s type safety. If you can implement the GC in the language then the language must be able to express things like converting between integers and pointers to store mark bits inside pointers and so on.

                            It is possible that you are using the term ‘self-hosting’ in a way that I am unfamiliar with. Can you elaborate a bit?

                            Perhaps I shouldn’t have said self-hosting. With regards to garbage collection, my plan is that the reference implementation of the GC will be expressed in terms of operations of a portable virtual machine. This virtual machine will have an opaque pointer representation that will not support things like storing mark bits inside pointers. The reference implementation could instead do things like storing mark bits in separate memory locations next to the pointers. Similarly, it won’t be able to take advantage of things like using the OS’s virtual memory system to trap writes for write barriers; instead, the garbage collector implementation could create a “write-with-write-barrier” function and the convention would be that the rest of the higher-level-language implementation will always use the “write-with-write-barrier” function to write to managed memory.

                            This virtual machine, plus a set of functions such as “write-with-write-barrier” (and similar functions which encapsulate not just the GC, but also stuff like scheduling), forms an abstraction layer. The goal is to write the implementation of the new language’s semantics on top of this abstraction layer. So there are three layers here: (1) the virtual machine; (2) stuff like “write-with-write-barrier”; (3) the language semantics.
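
                            The layer-2 convention might look like this (a hypothetical sketch; here the hook maintains a remembered set, as a generational collector would want, but a simple collector could make it a no-op):

```python
remembered_set = set()  # objects mutated since the last collection

def write_with_write_barrier(obj, field, value):
    # Layer 2: every store into managed memory goes through here,
    # so a future collector can hook stores without touching layer 3.
    remembered_set.add(id(obj))
    obj[field] = value
```

                            Swapping in a smarter barrier (card marking, OS page traps) then only changes this function, not the language semantics above it.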

                            This reference implementation will be portable and somewhat easy to read, but it will be very inefficient. If efficiency is desired, additional platform-specific implementations of the new language could be written in any other language – for example, in addition to the reference implementation, there could be another implementation of the new language in C on the GNU/Linux platform. If anyone desired to use the language for “real-world use”, they would probably need one of these more efficient platform-specific implementations. One easy way to get started writing a platform-specific implementation will be to (manually) translate the reference implementation of the virtual machine into the target language/platform (e.g. port layers 1 and 2 to C on Linux), and then incrementally improve it to take advantage of platform-specific mechanisms (eg put mark bits inside pointers, use OS paging mechanisms to trap writes). The plan is for these platform-specific improvements to replace layer (2) but leave layer (3) unchanged. This will make it easier to keep the platform-specific implementation in sync with the reference implementation as the language’s semantics are changed over time.

                            The reference implementation will have terrible performance by design, so why do I care what kind of garbage collection it does? My main concern is ensuring that the reference implementation will be structured to include whatever ‘hooks’ are needed by fancy garbage collection algorithms. So, for example, if I later wanted to make a C implementation of my language and use a garbage collector with a write barrier, I want to be able to do that by only modifying layers 1 and 2. But if the reference implementation didn’t have even the concept of a write barrier, then you’d have to overhaul layer 3 to replace all of the stores to managed memory with write-with-write-barrier. I feel like the best way to ensure I haven’t missed anything is to at least familiarize myself with the algorithms that might be needed later. If I want to be extra careful to make sure I haven’t missed anything, then I should actually use a somewhat fancy garbage collection algorithm in the reference implementation.

                            In my experience, there is a huge value in co-designing your memory management model with your language.

                            I’m still early in the designing of both the language and the memory management model. The language will be a scripting language with similar semantics and use-cases as Python, except with less emphasis on reference types and more emphasis on permitting very many threads of parallel execution – I was intrigued by the parallel (bitonic?) sort example in the book The Connection Machine and I would like to explore algorithms where the number of “threads” is similar to the number of data items being manipulated.

                            1. 2

                              This virtual machine, plus a set of functions such as “write-with-write-barrier” (and similar functions which encapsulate not just the GC, but also stuff like scheduling), forms an abstraction layer. The goal is to write the implementation of the new language’s semantics on top of this abstraction layer. So there are three layers here: (1) the virtual machine; (2) stuff like “write-with-write-barrier”; (3) the language semantics.

                              My concern here is that you’re explicitly permitting write-without-write-barrier by having this as a layer on top of the core VM, which can then introduce memory safety bugs and means that your language is not memory safe by construction, it is memory safe if you get a load of things right in multiple loosely coupled layers.

                              I’m still early in the designing of both the language and the memory management model. The language will be a scripting language with similar semantics and use-cases as Python, except with less emphasis on reference types and more emphasis on permitting very many threads of parallel execution – I was intrigued by the parallel (bitonic?) sort example in the book The Connection Machine and I would like to explore algorithms where the number of “threads” is similar to the number of data items being manipulated.

                              If you’re designing for concurrency then co-designing the memory management model and the language is even more important. Your GC can easily become a bottleneck. Temporal memory safety is a global property and so requires global synchronisation or some language properties that limit it. In Verona, for example, we allocate objects in regions. A region can have an arbitrary object graph inside but only a single pointer from the outside. We use a linear type for that and so we can deallocate an entire region deterministically when that pointer goes away and we can guarantee no concurrent mutation of objects within a region. This means that we can do local GC of any region, independently of the rest of the system (we can also have different memory-management policies in different regions, such as reference counted, traced, or bump-allocated with no deallocation before the end of the region’s lifetime).

                              For high thread counts (or, more accurately, for high degrees of concurrency), you’re going to see these problems a lot faster than in a language like Python that is mostly single-threaded with some concurrency bolted on as an afterthought. Again in Verona, we have concurrent owners (cowns) that own the root of a tree of regions and we schedule behaviours that run over a set of cowns. A cown may own a single object and so the maximum degree of concurrency is the number of objects. This required us to aggressively co-design the type system and the memory management model.

                              The simplest way of doing this is to follow an Erlang-like model where there is no mutable data at all. This gives you some very nice properties in a GC’d heap (all references are to older objects and so you can do a single-pass compacting GC) or lets you do atomic reference counting on strongly connected components. If you have shared mutable state then you need to have a memory model (what happens when two threads simultaneously read and write the same field of a shared object) and you need to carefully consider how this interacts with your memory management policy because any decision here will aggressively constrain the space of possible implementations.
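
                              A sketch of why the no-mutable-data model composes so well with reference counting (names are illustrative): a cons cell is initialized once with pointers to already-existing, older cells and never reassigned afterwards, so the heap is acyclic and a plain refcount with a cascading release reclaims everything, no cycle detector needed:

```c
#include <stdlib.h>

/* Immutable cons cell: `next` always points at an older cell and is
 * never reassigned, so reference cycles cannot form. */
typedef struct Cell {
    int refcount;
    int value;
    struct Cell *next;
} Cell;

static int live_cells = 0; /* for observing collection behaviour */

static Cell *cons(int value, Cell *next) {
    Cell *c = malloc(sizeof *c);
    c->refcount = 1; /* the caller holds this reference */
    c->value = value;
    c->next = next;
    if (next) next->refcount++;
    live_cells++;
    return c;
}

/* Because the graph is acyclic, a simple cascading decrement frees
 * every unreachable cell. */
static void release(Cell *c) {
    while (c && --c->refcount == 0) {
        Cell *next = c->next;
        free(c);
        live_cells--;
        c = next;
    }
}
```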

                              1. 1

                                Thanks for your helpful description of Verona. It’s an interesting approach, and it also helps clarify for me what co-designing the memory management model with a language can look like. I’ll read about Verona further.

                                My concern here is that you’re explicitly permitting write-without-write-barrier by having this as a layer on top of the core VM, which can then introduce memory safety bugs

                                Yes, I worry about this too. Some possible mitigations would be: create another intermediate layer in which raw writes are replaced by write-with-write-barrier; introduce some sort of automated check; or just try to be extra careful.

                                1. 2

                                  I’ll read about Verona further.

                                  Happy to answer any questions.

                                  Yes, I worry about this too. Some possible mitigations would be: create another intermediate layer in which raw writes are replaced by write-with-write-barrier; introduce some sort of automated check; or just try to be extra careful.

                                  The Java approach is to have only the safe operations in the bytecode. This is nice, because it means that you cannot express the dangerous thing, but it made it harder to implement things like MMTk. They ended up with the sun.misc.Unsafe class that lets you bypass all of the rules.

                                  This gets problematic later because any of the invariants that you can guarantee in the language can be used as axioms for defining transforms that are valid for optimisation. If you allow code to break the rules then you can’t rely on the rules holding for optimisation (or you can go the C path and say that breaking the rules is undefined behaviour, but then you don’t have any safety guarantees). In Verona, we say that the language runtime is part of the implementation and so it is allowed to break the rules, but only in ways that the rest of the implementation (specifically, the compiler) knows about. We’ll see if that works.

                      2. 1

                        Regarding a vanilla mark sweep collector, my understanding is that that could have very long pause times?

                        I give a more detailed answer elsewhere, but essentially any non-incremental GC can have long pause times. Still, “can” have a long pause time is not the same as “will”. JS engines went a long time, and dealt with very large heaps, with stop-the-world mark/sweep collectors.
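
                        For reference, a minimal stop-the-world mark/sweep looks something like the following (an illustrative sketch, not any engine's actual collector; all names are mine). The pause is the whole `collect` call: marking touches every live object and sweeping touches every allocated object, which is why such a collector *can* pause for a long time on a large heap without necessarily doing so on typical ones:

```c
#include <stdbool.h>
#include <stdlib.h>

/* Every object carries a mark bit and sits on an intrusive list of
 * all allocations so the sweep can find garbage. */
typedef struct Node {
    bool marked;
    struct Node *left, *right; /* up to two outgoing pointers */
    struct Node *all_next;     /* intrusive all-objects list */
} Node;

static Node *all_objects = NULL;
static int heap_count = 0;

static Node *new_node(Node *left, Node *right) {
    Node *n = calloc(1, sizeof *n);
    n->left = left;
    n->right = right;
    n->all_next = all_objects;
    all_objects = n;
    heap_count++;
    return n;
}

/* Mark phase: proportional to the number of live objects.
 * (Recursion stands in for an explicit mark stack.) */
static void mark(Node *n) {
    if (!n || n->marked) return;
    n->marked = true;
    mark(n->left);
    mark(n->right);
}

/* Sweep phase: proportional to the number of allocated objects.
 * Unreached objects are freed; survivors are unmarked for next time. */
static void collect(Node **roots, int nroots) {
    for (int i = 0; i < nroots; i++) mark(roots[i]);
    Node **link = &all_objects;
    while (*link) {
        Node *n = *link;
        if (n->marked) {
            n->marked = false;
            link = &n->all_next;
        } else {
            *link = n->all_next;
            free(n);
            heap_count--;
        }
    }
}
```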

                        1. 1

                          Regarding option #2, this post is part of that work.

                          The problem is that there’s really no one-size-fits-all solution here, and the characteristics and semantics of your language may permit optimizations that work in your situation but not in others. A great example of this is One-Pass Real-Time Generational Mark-Sweep Garbage Collection by Armstrong and Virding, which was used in Erlang. Because Erlang’s data is immutable, this extremely simple collector was viable (it has since been replaced). A similar algorithm (at least, I think it’s similar; I haven’t read the paper) called “Treadmill” is linked in a sibling thread.

                          Regarding a vanilla mark sweep collector, my understanding is that that could have very long pause times?

                          It’s possible. It just depends on how your language executes code, how memory is utilized in the program, and what the application actually needs to optimize for. If this language doesn’t exist yet, the GC algorithm is the least of your worries, honestly. I’d do the simplest thing and come back to it later.

                          1. 1

                            Thanks, I will take a look at that paper, and I will keep an eye out for any language-specific guarantees that I could make that might help the garbage collector.