AsciiDoc3 speeds up 15-20% with @functools.lru_cache

Published: 2018-09-26
AsciiDoc3 is fast, but it's getting even faster. No tricks needed, just making use of Python's "included batteries". Read more about my tuning of AsciiDoc3's core asciidoc3.py. (Feel free to follow the steps and try yourself!)
The first step is to gather some information about the behavior of asciidoc3.py before we start the tuning procedure. We use module cProfile in a quick dirty way (output shortened):
pc:~/.asciidoc3/doc$ python3 -m cProfile ../asciidoc3.py userguide.txt
8979863 function calls (8952700 primitive calls) in 13.708 seconds
Ordered by: standard name
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
41    0.000    0.000    0.000    0.000 frozen importlib._bootstrap:103(release)
41    0.000    0.000    0.000    0.000 frozen importlib._bootstrap:143(__init__)
41    0.000    0.000    0.000    0.000 frozen importlib._bootstrap:147(__enter__)
41    0.000    0.000    0.000    0.000 frozen importlib._bootstrap:151(__exit__)
41    0.000    0.000    0.000    0.000 frozen importlib._bootstrap:157(_get_module_lock)
...
46    0.000    0.000    0.000    0.000 abc.py:9(abstractmethod)
3961    0.009    0.000    0.020    0.000 asciidoc3.py:101(items)
2122    0.024    0.000    0.372    0.000 asciidoc3.py:4253(subs)
1599    0.006    0.000    0.096    0.000 asciidoc3.py:4263(isnext)
2648    0.008    0.000    0.027    0.000 asciidoc3.py:427(dovetail)
910823    1.635    0.000    2.364    0.000 asciidoc3.py:4276(match)
1924    0.022    0.000    0.110    0.000 asciidoc3.py:4295(extract_passthroughs)
1924    0.003    0.000    0.004    0.000 asciidoc3.py:4304(restore_passthroughs)
...

We are interested only in functions of asciidoc3.py (see column "filename"), not in system calls like frozen importlib._bootstrap. I tried out some of the shown functions, but in the end I got stuck with this candidate in "class Macros, def match(...)"

910823    1.635    0.000    2.364    0.000 asciidoc3.py:4276(match)

We're talking about more than 900,000 calls and 2.3 seconds cumtime - about 18% of the entire run-time: "def match(...)" deserves some acceleration. Let's start (asciidoc3.py - edited lines are bold #####):

...
from decimal import Decimal, ROUND_HALF_UP
from functools import lru_cache  #####
from io import StringIO as io_StringIO
...

class Macros:
    ...
    def isnext(self):
        """Return matching macro if block macro is next on reader."""
        reader.skip_blank_lines()
        line = reader.read_next()
        if line:
            for m in self.macros:
                if m.prefix == '#':
                    if m.reo.match(line):
                        self.current = m
                        return m
        return False

    @lru_cache()  #####
    def match(self, prefix, name, text):
        """Return re match object matching 'text' with macro type 'prefix',
        macro name 'name'."""
        for m in self.macros:
            if m.prefix == prefix:
                mo = m.reo.match(text)
                if mo:
                    if m.name == name:
                        return mo
                    if re.match(name, mo.group('name')):
                        return mo
        return None

    def extract_passthroughs(self, text, prefix=''):
    ...

    ...

def asciidoc3(backend, doctype, confiles, infile, outfile, options):
    ...
        else:
            if document.attributes.get('DoNotUse') == 'AndDoNotEdit':
                Ad3Codec.fixed_encoding = False
            else:
                Ad3Codec.fixed_encoding = True
            # to ensure encoding in subprocess and conf-files:
            Ad3Codec.add_encoding_for_compatibility()
            if 'quirks' in document.attributes:
                message.verbose('No quirks-mode in AsciiDoc3 any more!')
            writer.newline = config.newline
            try:
                writer.open(outfile, reader.bom)
                try:
                    document.translate(has_header) # Generate the output.
                finally:
                    writer.close()
            finally:
                reader.closefile()
        # inspect cache use               #####
        print(macros.match.cache_info())  #####
    except KeyboardInterrupt:
        raise
    except Exception as e:
        # Cleanup
        ...

The output looks very interesting:

CacheInfo(hits=885758, misses=25065, maxsize=128, currsize=128)
7205035 function calls (7177872 primitive calls) in 10.923 seconds
...
25065 0.046 0.000 0.075 0.000 asciidoc3.py:4275(match)

Trying other values for maxsize in steps of 2**x

@lru_cache(131072)
2**17
CacheInfo(hits=889588, misses=21235, maxsize=131072, currsize=21235)
7196631 function calls (7169468 primitive calls) in 10.912 seconds

@lru_cache(65536)
2**16
CacheInfo(hits=889588, misses=21235, maxsize=65536, currsize=21235)
7196631 function calls (7169468 primitive calls) in 10.915 seconds

BTW, the output-file "userguide.html" is binary identical in all cases.

Wow! We see (13.708 - 10.915) / 0.13708 = ~20% less time.

But beware, we have to keep in mind, AsciiDoc3 should run with any version of Python3, and @functools.lru_cache was implemented in Python 3.2. So we have to edit asciidoc3.py avoiding uncatched exceptions:

from decimal import Decimal, ROUND_HALF_UP
try:                                                   #####
    from functools import lru_cache                    #####
except (NameError, ModuleNotFoundError, ImportError):  #####
    pass                                               #####
from io import StringIO as io_StringIO

...

    def match(self, prefix, name, text):
        """Return re match object matching 'text' with macro type 'prefix',
        macro name 'name'."""
        #print("y")
        for m in self.macros:
            if m.prefix == prefix:
                mo = m.reo.match(text)
                if mo:
                    if m.name == name:
                        return mo
                    if re.match(name, mo.group('name')):
                        return mo
        return None
    try:                                                  #####
        #lru_cache not implemented in Python < 3.3        #####
        match = lru_cache(65536)(match) # 2**16 = 65536   #####
    except (NameError, ModuleNotFoundError, ImportError): #####
        pass                                              #####

...
and remove the "# inspect cache use #print(macros.match.cache_info())" statements.
The mentioned ~20% are also seen for other input files.
The edited "asciidoc3.py" will be part of the next version AsciiDoc3-3.0.3, coming asap!
Summary:
@functools.lru_cache saves about 20% of my developing time!

links