Dwedit 16 hours ago

The game is not running from RAM; it is running from Flash ROM. This means that code and static data can be placed in ROM rather than RAM.

This is comparable to the GBA, which has 384KB of total RAM and a ROM cartridge slot for storing the game code and data. But the GBA runs at only 16.78MHz, whereas the EFR32MG24 used for this project is overclocked to 136.5MHz.

  • smirutrandola 15 hours ago

    The article says that even if you put all the static data in flash, you still have to fit about 1.5 MB of non-static data if you don't optimize it. Besides that, all graphics are loaded from the relatively slow external SPI flash, which tops out at 17 MB/s even with the overclock. Yes, the GBA is much slower, but its access to cartridge data is faster than 17 MB/s (and its random-read latency is in the 100 ns range, not the 1-2 us range).

  • zahlman 15 hours ago

    >This is comparable to the GBA, which has 384KB of total RAM

    I assume you are thinking of the 32KiB of on-chip work RAM plus 256KiB of on-board work RAM plus 96KiB of video RAM. But pedantically there is also a 1KiB region of palette RAM and 1KiB of "object attribute memory", separate from the VRAM, making 386KiB total. (Not counting the I/O control registers, which one ordinarily wouldn't think of as "memory" but get a dedicated region of that address space.)

    Aside from the ROM on a cartridge - up to 32MiB - there is 16KiB of BIOS ROM, and the system can address 64KiB of EEPROM for game save data.

    https://problemkaputt.de/gbatek.htm#gbamemorymap

    • Dwedit 13 hours ago

      I really don't count palette RAM and OAM as extra memory, even though I have used unused palette memory to store sound sample data.
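A rough way to see why the 1-2 us random-read latency quoted above matters as much as the 17 MB/s streaming rate is to model each SPI flash read as a fixed latency plus transfer time. This is a minimal sketch under stated assumptions: the constants are the thread's figures (1.5 us is an assumed midpoint of the quoted range, and MB is taken as 10^6 bytes), not measurements of the actual board, and `spi_read_us` / `spi_latency_share` are hypothetical helpers.

```c
/* Hypothetical cost model for one read from external SPI flash:
   fixed per-access latency + transfer time at the streaming rate.
   Both constants are assumptions taken from the thread, not measurements. */
#define SPI_BYTES_PER_SEC 17e6 /* ~17 MB/s with overclock (MB = 10^6 bytes) */
#define SPI_LATENCY_US    1.5  /* assumed midpoint of the quoted 1-2 us */

/* Estimated microseconds to fetch `bytes` bytes in a single access. */
double spi_read_us(double bytes) {
    return SPI_LATENCY_US + bytes / SPI_BYTES_PER_SEC * 1e6;
}

/* Fraction of that access time spent on latency rather than transfer. */
double spi_latency_share(double bytes) {
    return SPI_LATENCY_US / spi_read_us(bytes);
}
```

Under these assumptions a 17-byte read costs about 2.5 us, 60% of it latency, while a 17,000-byte read costs about 1001.5 us with latency under 0.2%. That is why streaming graphics in large contiguous chunks pays off far more than scattered small reads.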

bittwiddle 17 hours ago

Impressive memory optimizations. Streaming out converted pixel values was a neat way of pulling off the "framebuffer" without having enough memory to store all the 16-bit values. Solid engineering.

  • Dwedit 13 hours ago

    Ooh, once we get to "streaming pixel values" out, then we're secretly using the LCD screen's internal memory as a second framebuffer.

    • marlone93 7 hours ago

      The LCD's internal memory is write-only and is used only to hold the image being shown. That's unlike the GBA, where video RAM behaves like general-purpose RAM, just slower.
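A minimal sketch of the line-streaming idea bittwiddle and Dwedit describe, assuming (the thread does not confirm this) that the renderer draws 8-bit palette indices and the LCD accepts RGB565 pixels. Only one line of 16-bit pixels exists in RAM at a time: at a hypothetical 320x240, that is 640 bytes instead of the 153,600 bytes of a full 16-bit framebuffer. `build_palette565`, `convert_line`, `stream_frame` and the `lcd_write_fn` callback are hypothetical names, not the port's actual API.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack 8-bit-per-channel RGB into RGB565 (5 bits red, 6 green, 5 blue). */
uint16_t rgb888_to_565(uint8_t r, uint8_t g, uint8_t b) {
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Build a 256-entry RGB565 lookup table from a 768-byte RGB palette. */
void build_palette565(const uint8_t pal_rgb[768], uint16_t lut[256]) {
    for (int i = 0; i < 256; i++)
        lut[i] = rgb888_to_565(pal_rgb[3 * i], pal_rgb[3 * i + 1],
                               pal_rgb[3 * i + 2]);
}

/* Convert one row of 8-bit palette indices into RGB565 pixels. */
void convert_line(const uint8_t *indices, size_t width,
                  const uint16_t lut[256], uint16_t *out) {
    for (size_t x = 0; x < width; x++)
        out[x] = lut[indices[x]];
}

/* Hypothetical display callback that pushes `n` pixels to the LCD. */
typedef void (*lcd_write_fn)(const uint16_t *pixels, size_t n);

/* Stream a frame line by line: only `line_buf` (width 16-bit pixels)
   is ever held in RAM, never a full 16-bit framebuffer. */
void stream_frame(const uint8_t *frame8, size_t width, size_t height,
                  const uint16_t lut[256], uint16_t *line_buf,
                  lcd_write_fn write) {
    for (size_t y = 0; y < height; y++) {
        convert_line(frame8 + y * width, width, lut, line_buf);
        write(line_buf, width);
    }
}
```

The lookup table costs 512 bytes once, after which each pixel is a single table read, so the conversion stays cheap enough to run while the previous line is being sent.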

vardump 16 hours ago

A great achievement, given the hardware.

Quake will probably run at 60 FPS on the RP2350, double-buffered and with full sound quality. But it's nowhere near as hard to achieve that as on the Arduino Nano Matter board. The RP2350 has 520 kB of RAM and a dual-core Cortex-M33, and it can even run at 300 MHz (150 MHz nominal).

Earlier: https://news.ycombinator.com/item?id=41195669

thesnide 6 hours ago

This shows the power of having a fixed computing budget.

Much modern software should really be built this way to limit the amount of energy used, especially on laptops but also in the cloud.

Yet optimizing is rarely considered worth it compared to adding more features to fill a list ;)

lacoolj 18 hours ago

what's with the website load time? like individual elements on this page taking multiple seconds to show. is it not 2024 yet?

  • bragr 17 hours ago

    Having a CDN doesn't help your performance when you tell it not to cache the page

      bragr@<>:~$ dig +short community.silabs.com
      community.silabs.com.00da0000000l2kimas.live.siteforce.com.
      sdc.prod.communities.salesforce.cdn.edgekey.net.
      e78038.dsca.akamaiedge.net.
      173.223.234.17
      173.223.234.11
      bragr@<>:~$ curl -Is https://community.silabs.com/s/share/a5UVm000000Vi1ZMAS/quake-ported-to-arduino-nano-matter-and-sparkfun-thing-plus-matter-boards?language=en_US | grep -i cache
      cache-control: no-cache,must-revalidate,max-age=0,no-store,private
      x-origin-cache-control: no-cache,must-revalidate,max-age=0,no-store,private
    
    That said, the assets are cacheable, so there was probably just a thundering herd for the assets until they were well cached by Akamai's mid and edge tiers.

    • toast0 17 hours ago

      When I've used a CDN, there were separate headers to control the CDN with the same semantics as cache-control... so you can serve the cache-control you want to browsers and control the CDN separately.

      If it doesn't feel like it's cached, it probably isn't; but you can't assume the cache-control headers you see are controlling the CDN.

      • bragr 17 hours ago

        Depends on the Akamai property config which could be anything. IIRC by default it uses the standard cache headers and doesn't strip or rewrite them, although it definitely can.

    • iknowstuff 16 hours ago

      Ugh, old.reddit.com sends no-store when you're signed in, and it's driving me mad because it breaks the back/forward cache.

      • ahoka 15 hours ago

        All “security” guidelines blindly suggest no-store. Also, combining private with no-store makes no sense, since no-store already forbids any cache from storing the response.

  • Muromec 16 hours ago

    >individual elements on this page taking multiple seconds to show. is it not 2024 yet?

    It's exactly what 2024 feels like. Future sucks.

  • Gee101 18 hours ago

    Maybe it's running on an Arduino Nano Matter.

ant6n 15 hours ago

The real hackery is the GBA port mentioned in the article (running at 16.7MHz): https://www.xda-developers.com/how-quake-ported-game-boy-adv...

  • smirutrandola 15 hours ago

    Yes that is really impressive.

    Still, it was done with 50% more memory, a third of the resolution, and without implementing all of the game's features.

    • vardump 4 hours ago

      But with a fraction of the CPU resources. The Arduino Nano Matter's Cortex-M33 is overclocked to 136.5 MHz, while the GBA's ARM7TDMI runs at a mere 16.78 MHz.

      The ARM7TDMI takes 1-4 cycles to perform a simple 32-bit x 32-bit multiply, depending on the value of the multiplier. I believe the Cortex-M33 takes just 1 cycle to do the same. The ARM7TDMI has no divide instruction and, critically, no FPU, which Quake relies on.

      The GBA has only 32 kB of zero-wait-state RAM (a.k.a. internal working RAM), versus 276 kB on the Arduino Nano Matter.

      The GBA's 256 kB RAM block (external working RAM) has a massive 6-cycle access time when loading a 32-bit value.

      It's a true miracle someone managed to get even a third of the resolution on such weak hardware!

      • marlone93 4 hours ago

        I think the article says the same. The gba port is impressive.

        I guess an FPU would not even be required with a 120-pixel horizontal resolution.

        The CM33 does even more in a single cycle: for instance, two 16-bit multiplications plus the addition and accumulation of their results.

        Still, it is the first time the "full" Quake has been ported to run in less than 300 kB of RAM.
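The single-cycle operation marlone93 describes matches the Armv8-M DSP-extension instruction SMLAD: two signed 16x16 multiplies on packed halfwords, with both products added to a 32-bit accumulator. The sketch below is a portable C model of that arithmetic, not the instruction itself; `smlad_model` and `dot_i16` are hypothetical helpers, and the Q (overflow) flag the hardware sets is not modeled.

```c
#include <stddef.h>
#include <stdint.h>

/* Sign-extend the low 16 bits of v without implementation-defined casts. */
static int32_t sext16(uint32_t v) {
    int32_t x = (int32_t)(v & 0xFFFFu);
    return (x >= 0x8000) ? x - 0x10000 : x;
}

/* Model of SMLAD: acc + (x.lo * y.lo) + (x.hi * y.hi), where x and y each
   hold two signed 16-bit halves. The sum wraps modulo 2^32. */
int32_t smlad_model(uint32_t x, uint32_t y, int32_t acc) {
    int32_t p0 = sext16(x) * sext16(y);             /* low halves */
    int32_t p1 = sext16(x >> 16) * sext16(y >> 16); /* high halves */
    uint32_t sum = (uint32_t)acc + (uint32_t)p0 + (uint32_t)p1;
    return (int32_t)sum;
}

/* Pack two int16 values into one word: a in the low half, b in the high. */
static uint32_t pack16x2(int16_t a, int16_t b) {
    return (uint32_t)(uint16_t)a | ((uint32_t)(uint16_t)b << 16);
}

/* Dot product of two int16 arrays, consuming two elements per SMLAD step. */
int32_t dot_i16(const int16_t *a, const int16_t *b, size_t n) {
    int32_t acc = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2)
        acc = smlad_model(pack16x2(a[i], a[i + 1]),
                          pack16x2(b[i], b[i + 1]), acc);
    if (i < n) /* odd leftover element */
        acc += (int32_t)a[i] * b[i];
    return acc;
}
```

On the target itself, CMSIS exposes the real instruction as the `__SMLAD` intrinsic, so a loop shaped like `dot_i16` retires two multiply-accumulates per instruction instead of one.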

        • vardump 2 hours ago

          Agreed on all other counts except the FPU.

          Quake performs one FPU divide per pixel for perspective-correct texture mapping.

          ARM7TDMI does not have any kind of divide, so perspective correction is tricky, even if it's just 120 px horizontally.

          • marlone93 2 hours ago

            Afaik, Quake does not do one divide per pixel; it works in steps of 8 pixels (see dscan.c in WinQuake). True, there is no divide instruction, but instead of spending hundreds of cycles on a software division, tables and other approximations could be used. Of course, DIV/VDIV, which take 14 cycles or fewer, are a strong boost on the CM4/CM33.
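One of the table-based approaches marlone93 mentions, sketched in C: precompute 16.16 fixed-point reciprocals once, then turn each division into a multiply and a shift, which an ARM7TDMI can do (its long multiply covers the 32x32-to-64-bit product). This is an illustrative assumption, not how the GBA port actually divides; `init_recip_table`, `div_by_table` and the 1024-entry table size are hypothetical. Because the reciprocals are truncated, the result can be one less than the exact quotient.

```c
#include <stdint.h>

/* Hypothetical table size: divisors 1..RECIP_MAX are supported. */
#define RECIP_MAX 1024

/* recip_table[d] = floor(2^16 / d), a 16.16 fixed-point reciprocal. */
static uint32_t recip_table[RECIP_MAX + 1];

/* Fill the table once at startup; these are the only real divisions. */
void init_recip_table(void) {
    recip_table[0] = 0; /* unused: division by zero is not supported */
    for (uint32_t d = 1; d <= RECIP_MAX; d++)
        recip_table[d] = (1u << 16) / d;
}

/* Approximate n / d with one multiply and one shift.
   Preconditions: 1 <= d <= RECIP_MAX and n < 65536.
   The result is either the exact quotient or one less. */
uint32_t div_by_table(uint32_t n, uint32_t d) {
    return (uint32_t)(((uint64_t)n * recip_table[d]) >> 16);
}
```

For perspective correction, an off-by-one in a texture coordinate is rarely visible, which is why this kind of approximation is attractive when hundreds of cycles per exact divide are unaffordable.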

            • vardump 2 hours ago

              Oh, it divides only once every 8 pixels and interpolates in between and still looks so good? I stand corrected.

              By the way, it's "d_scan.c" for anyone who's trying to web search for it.

              • marlone93 an hour ago

                It means almost an order of magnitude fewer divisions (and fewer of the additional calculations as well).

                Quake had to do this because a divide per pixel would have been too expensive, especially for a low-end Pentium, when it was released in 1996. And yes, it is not even noticeable, especially at low resolution.
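A floating-point sketch of the subdivision trick discussed in this thread: s/z, t/z and 1/z are linear in screen space, so exact texture coordinates are computed with one divide at the start of the span and one every 8 pixels, and plain linear interpolation fills the pixels in between. It is modeled on the idea in d_scan.c but is not Quake's code, which converts to 16.16 fixed point for the stepping; `span_texcoords` and its parameters are hypothetical.

```c
#include <stddef.h>

#define SUBDIV 8

/* 1/n for n = 1..SUBDIV, folded at compile time, so stepping across a
   segment needs no runtime divide. */
static const float inv_n[SUBDIV + 1] = {
    0.0f, 1.0f, 1.0f / 2, 1.0f / 3, 1.0f / 4,
    1.0f / 5, 1.0f / 6, 1.0f / 7, 1.0f / 8
};

/* Fill s_out/t_out for `count` pixels of one span. sdivz, tdivz and zi are
   s/z, t/z and 1/z at the first pixel; d_* are their per-pixel steps.
   Returns the number of perspective divides (1/zi) performed. */
int span_texcoords(float sdivz, float tdivz, float zi,
                   float d_sdivz, float d_tdivz, float d_zi,
                   int count, float *s_out, float *t_out) {
    int divides = 1;
    float z = 1.0f / zi; /* exact (s, t) at the first pixel */
    float s = sdivz * z, t = tdivz * z;

    for (int x = 0; x < count; ) {
        int n = (count - x < SUBDIV) ? count - x : SUBDIV;

        /* Jump n pixels ahead and compute the exact (s, t) there. */
        sdivz += d_sdivz * n;
        tdivz += d_tdivz * n;
        zi += d_zi * n;
        float znext = 1.0f / zi;
        divides++;
        float snext = sdivz * znext, tnext = tdivz * znext;

        /* Linearly interpolate across the segment. */
        float ds = (snext - s) * inv_n[n], dt = (tnext - t) * inv_n[n];
        for (int i = 0; i < n; i++, x++) {
            s_out[x] = s + ds * i;
            t_out[x] = t + dt * i;
        }
        s = snext;
        t = tnext;
    }
    return divides;
}
```

For a 120-pixel span this uses 16 divides (one at the start, then one per 8-pixel segment) instead of 120, which is the roughly order-of-magnitude saving marlone93 describes; the error between segment endpoints stays small because depth changes little over 8 pixels.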

anthk 14 hours ago

This is witchcraft...