AsciiDoc3
Text based document generation using Python 3.x


This document is a diary which I wrote during the porting process. You can follow my steps towards AsciiDoc3.
You do not need the information given here to work with AsciiDoc3. But perhaps some details help to understand the logic or the use of the program.
Please note that the following text has not been fully revised (and is not complete, either). You may encounter some errors and imprecise statements.

1. Introduction, First Steps

1.1. Rationale

AsciiDoc is a well-known and useful tool to build good-looking HTML pages, DocBook documents and man pages out of one single plain-text source file. Via a2x you can avoid the disturbing XML of DocBook and get PDF, HTML, and other formats; a2x uses AsciiDoc. Both asciidoc.py and a2x.py are written in Python2 and cannot be executed with Python3. My aim is to port asciidoc.py (and, in a second step, a2x.py and the other files) from Python2 to Python3 (later in this text often called v2 and v3, respectively). This document describes my efforts in detail. Because my mother tongue is not English, the wording is not so smooth, but hopefully sufficient. Sorry in advance.
On *nix operating systems (Unix, GNU/Linux, BSD …) asciidoc is easily installed from the well-known repositories using apt-get, yum, emerge and so on. Windows users have to go a more complicated way. If you want more information about this and about AsciiDoc or a2x in general, have a look at the following web resources: "asciidoc.org" or "methods.co.nz/asciidoc". Almost all of the information given there can be found at "asciidoc3.org", too. The latter website tries to give an overview of AsciiDoc3.
I wrote this file primarily for myself, to learn more about Python3 and AsciiDoc. You are invited to follow my steps, but you may of course skip all the steps that are not interesting for you. To use asciidoc3, neither the details of the porting procedure nor this document is necessary. You may read the "quickstart", "faqs" or the other docs given at "asciidoc3.org" to get started.

1.2. Open Source, License, GitLab

AsciiDoc is open source. "COPYRIGHT" says: "… you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version…"
Thank you, Stuart, for writing this marvellous piece of software and giving it to the people! I hereby declare all my work on AsciiDoc, modifying it or adding something, Copyright © by Berthold Gehrke. Free use of this software is granted under the terms of the GNU General Public License Version 2 or, at your option, any later version (GNU GPLv2+).
You can find the sources and additional information at https://www.gitlab.com/asciidoc3/asciidoc3.

1.3. Setting up the environment

1.3.1. Download and local installation

As a first step I download the tarball from https://sourceforge.net/projects/asciidoc/files/latest/download. At the time of writing the current version is "asciidoc-8.6.9.tar.gz". So I unpack asciidoc-8.6.9.tar.gz into a temporary local folder ("asciidoc-8.6.9") and copy asciidoc.py (only this one file) to my "working directory" ~/ad3. "ad3" stands for asciidoc3. Now I have all asciidoc files "local" in ~/asciidoc-8.6.9 and just one file, asciidoc.py, in ~/ad3. If I notice a missing file while working, I’ll copy the needed file to ~/ad3.

1.3.2. The first text: Hello World

To test my "environment", let me try to "translate" a very simple text to html using ~/ad3/asciidoc.py. The minimalistic input file "helloworld.txt" to process with asciidoc.py is not really complicated. It contains just one line (12 ASCII characters):

helloworld.txt
Hello World!

I save "helloworld.txt" in a new directory ~/ad3/inputfiles and try to write an html file to the new directory ~/ad3/outputfiles:

python2 ~/ad3/asciidoc.py -o ~/ad3/outputfiles/helloworld.html ~/ad3/inputfiles/helloworld.txt

but: "asciidoc: FAILED: configuration file asciidoc.conf missing".
So I copy "asciidoc.conf" from ~/asciidoc-8.6.9 to ~/ad3; repeating the command shows
"asciidoc: FAILED: missing backend conf file: xhtml11.conf".
Copying this file and running again results in a longer output:

"asciidoc: WARNING: include file not found: ~/ad3/stylesheets/asciidoc.css
asciidoc: WARNING: include file not found: ~/ad3/javascripts/asciidoc.js
asciidoc: FAILED: missing conf file: lang-en.conf"

After copying this (note the two new directories "stylesheets" and "javascripts") and starting it once again, no FAILED or WARNING appears and in folder ~/ad3/outputfiles/ a new file "helloworld.html" shows up. Opening it with Mozilla:

helloworld with AsciiDoc (Python2)

helloworld

Tip The horizontal rule and the line "Last updated: … CEST" can be configured (or omitted) via the file "lang-en.conf" or the respective conf in your own language, e.g. "lang-fr.conf", in the section "[footer-text]".

Please keep in mind that I work "local", only in my home directory. There is no command line like this:

asciidoc -o helloworld.html helloworld.txt

because I have not installed asciidoc (e.g. via "sudo apt-get install asciidoc") and therefore it’s not found in PATH. Because of the makefile and the *.conf files, asciidoc expects to find some information in certain folders and/or searches for it during execution. If you would like to follow my steps of porting to Python3 on your own machine and asciidoc is installed, try

sudo apt-get purge asciidoc

to rule out any interference. So did I; all necessary asciidoc/a2x work in the meantime was done in a lubuntu installed in a KVM virtual machine. Of course, something like VirtualBox or jails is another option.

The first step is done, let’s have a short break. Now I have the following data in my "working directory" ~/ad3. The other files are still in ~/asciidoc-8.6.9, waiting for the time they are needed.

  1. Directories and Files

.
├── asciidoc.py
├── asciidoc.conf
├── inputfiles
│   └── helloworld.txt
├── javascripts
│   └── asciidoc.js
├── lang-en.conf
├── outputfiles
│   └── helloworld.html
├── stylesheets
│   └── asciidoc.css
└── xhtml11.conf

1.3.3. Files to be ported

AsciiDoc comes with a bundle of python source code files *.py, configuration files *.conf, txt-files *.txt and other (*.png, *.css, *.js, *.aap …) in a few folders, now located in my directory ~/asciidoc-8.6.9. If you take a second look, you’ll find eight py-files:

"./filters/code/code-filter.py"
"./filters/graphviz/graphviz2png.py"
"./filters/latex/latex2png.py"
"./filters/music/music2png.py"
"./tests/testasciidoc.py"
"a2x.py"
"asciidoc.py"
"asciidocapi.py"

As mentioned before, in the first step I lay my focus on "asciidoc.py" - the other files will follow later, see below.

1.3.4. Python2-, Python3-Interpreter, IDLE

Besides downloading and unpacking the asciidoc files there are almost no requirements. At least I need a python2 and a python3 interpreter executable on my machine. That is of course given on almost every Linux system: "python" is in most cases a link to python2; to start python3 you have to enter (you guessed it) "python3". Furthermore I use the included IDLE (2.7 or 3.5), which comes with Ubuntu16.04.

1.4. Summary

To migrate asciidoc.py, I use the source (and all the other files) included in asciidoc-8.6.9.tar.gz, which I unpack into a development working directory. All work is done on an ordinary PC running Ubuntu16.04 LTS; Python2/IDLE and Python3/IDLE are used as shipped.

2. Editing asciidoc2.py

2.1. Adding line numbers to asciidoc2.py

The first thing to do is to create asciidoc2.py from asciidoc.py; this is done with the following "addlineno.py". The new name underlines the difference between the original asciidoc.py and the edited file asciidoc2.py.
All further work is done with "asciidoc2.py" as the source.

The starting file asciidoc.py is a 6265-line source to be executed with python2. For better handling (documentation and customizing via regex) I add "line numbers" at the end of each line; apart from these trailing comments, asciidoc2.py is identical to asciidoc.py.

You can find "addlineno.py" here.

#!/usr/bin/env python3
""" adds 'line numbers' at the end of every line of a given
    python2 or python3 source code file. This is an
    'easy quick tool' written beside a greater project,
    migrate 2to3 Asciidoc, see 'asciidoc3.org'
    Copyright (c) 2017 by Berthold Gehrke
    <berthold.gehrke@gmail.com> Free use of this software
    is granted under the terms of the GNU General Public
    License version 3 (GNU GPL v3).
    """

from os.path import expanduser

homedir = expanduser('~')             # e.g.: /home/username
max_linelength = 0

with open(homedir + "/ad3/asciidoc.py","r") as inputfile:
    """
    find out the maximum length of lines in a file ... """

    for line in inputfile:
        # more exactly than line.split('\n'): see \n\r in Windows-OS
        part = line.splitlines()
        len_a = len(part[0])
        max_linelength = max(max_linelength, len_a)
    # ... and add some whitespace
    max_linelength += 5
    #print(max_linelength)


with open(homedir + "/ad3/asciidoc.py","r") as inputfile:
    """
    writes the line number at the end of the line"""

    with open(homedir + "/ad3/asciidoc2.py","w") as outputfile:  # (1)
        line_nr = 1
        for line in inputfile:
            part = line.splitlines()   # see comment above
            len_a = len(part[0])
            if part[0].strip().endswith('\\'):
                # Do nothing because of '\' (line continuation char)
                # is not always (mostly not) compatible with # comments.
                # The next line - the first without '\' - gets the
                # proper line number.
                pass
            else:
                line = "".join(part[0] + (max_linelength - len_a)*" " \
                               + "##_nr_: " + str(line_nr) + " _" + "\n")
                # the trailing " _" makes it a little easier for module
                # "re" to work: r'##_nr_10' would also find '##_nr_100'
                # + "\n" = Unix or GNU/Linux; replace with '+ "\r\n"'
                # on Windows-OS or "\r" on older Mac-OS etc.
            outputfile.write(line)
            line_nr += 1
1 Apart from the appended line-number comments, asciidoc2.py is identical to the original asciidoc.py

The result is shown in the following example. Note that some characters/indentation and whitespace preceding the ##_nr_ are removed to make it more readable.

Source code asciidoc.py: lines 29-32 before adding line numbers
[...]
SUBS_VERBATIM = ('specialcharacters','callouts')

NAME_RE = r'(?u)[^\W\d][-\w]*' # Valid section or ...
OR, AND = ',', '+'             # Attribute list separators.
[...]
Source code, lines 29-32, after adding line numbers, now asciidoc2.py
[...]
SUBS_VERBATIM = ('specialcharacters','callouts')                ##_nr_: 29
                                                                ##_nr_: 30
NAME_RE = r'(?u)[^\W\d][-\w]*' # Valid section or ...           ##_nr_: 31
OR, AND = ',', '+'             # Attribute list separators.     ##_nr_: 32
[...]
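Why the marker ends in " _" can be seen in a small sketch of the lookups the later regex editing relies on. Only the marker format is taken from addlineno.py; the sample text and the helper name find_line are made up for illustration.

```python
import re

# Two numbered lines in the format written by addlineno.py (sample text is illustrative).
sample = (
    "a = 1     ##_nr_: 10 _\n"
    "b = 2     ##_nr_: 100 _\n"
)

def find_line(text, line_nr):
    """Return the source line carrying the given ##_nr_ marker, or None."""
    pattern = r'^.*##_nr_: ' + str(line_nr) + r' _$'
    match = re.search(pattern, text, flags=re.M)
    return match.group() if match else None

# Without the trailing " _", a search for line 10 also hits line 100.
ambiguous = re.findall(r'##_nr_: 10', sample)
# With it, the search is exact.
exact = re.findall(r'##_nr_: 10 _', sample)
```

The closing " _" acts as a terminator, so a pattern for one line number cannot accidentally match a longer number that starts with the same digits.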

Note that "addlineno.py" has to pay attention to the line continuation character "\". Here comes a second example.

Source code asciidoc.py: line 598 et seqq.
[...]
# enveloping quotes and punctuation e.g. a='x', ('x'), 'x', ['x'].
     reo = re.compile(r'(?msu)(^|[^\w;:}])(\[(?P<attrlist>[^[\]]+?)\])?' \
        + r'(?:' + re.escape(lq) + r')' \
        + r'(?P<content>\S|\S.*?\S)(?:'+re.escape(rq)+r')(?=\W|$)')
[...]
Source code, line 598 et seqq., after adding line numbers, now asciidoc2.py - line continuation character
[...]
# enveloping quotes and punctuation e.g. a='x', ('x'), 'x', ['x'].   ##_nr_: 598
     reo = re.compile(r'(?msu)(^|[^\w;:}])(\[(?P<attrlist>[^[\]]+?)\])?' \
        + r'(?:' + re.escape(lq) + r')' \
        + r'(?P<content>\S|\S.*?\S)(?:'+re.escape(rq)+r')(?=\W|$)')  ##_nr_: 601
[...]

As you see, "addlineno.py" works fine.
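The handling of the continuation character can also be tried on a string instead of on files. The following is a condensed sketch of the same idea, not the tool itself; the function name number_lines and the demo source are assumptions made for this example.

```python
def number_lines(source, pad=5):
    """Append '##_nr_: N _' to every line that does not end in a backslash."""
    lines = source.splitlines()
    width = max((len(ln) for ln in lines), default=0) + pad
    out = []
    for nr, text in enumerate(lines, start=1):
        if text.strip().endswith('\\'):
            # A comment after the continuation character would break the statement,
            # so this line stays untouched; the counter still advances.
            out.append(text)
        else:
            out.append(text.ljust(width) + "##_nr_: " + str(nr) + " _")
    return "\n".join(out) + "\n"

demo = "x = 1 + \\\n    2\ny = x\n"
numbered = number_lines(demo)
```

Because the marker is an ordinary comment, the numbered text is still valid Python and can be executed like the original.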

Warning Due to the lineno (the additional characters ##_nr_ …) the option "--doctest" is no longer suitable - this will be corrected later.

2.2. IDLE (2.7.12)

As I mentioned before, I use the IDLE included with Python as my IDE; yes, I do, despite the well-known limitations. To see if everything is ok so far, I try to translate "helloworld.txt" via IDLE. To do so, the command-line options have to be hardwired directly in the file asciidoc2.py. I need something like

if not sys.argv[1:]:
    sys.argv += ["-o", "./outputfiles/helloworld.html", "./inputfiles/helloworld.txt"]

Let’s do it: add the above snippet manually to asciidoc2.py …

[...]
import sys, os, re, time, traceback, tempfile, subprocess, codecs, locale, unicodedata, copy    ##_nr_: 9 _
if not sys.argv[1:]:
    sys.argv += ["-o", "~/ad3/outputfiles/helloworld.html", "~/ad3/inputfiles/helloworld.txt"]
                                                                                                ##_nr_: 10 _
### Used by asciidocapi.py ###                                                                  ##_nr_: 11 _
[...]

... open the altered asciidoc2.py with IDLE (2.7.12) and run it (F5). It works, I find "helloworld.html" in ~/ad3/outputfiles/.
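The effect of the hardwired arguments can be checked in isolation. This is only a sketch: the function name effective_argv and the constant DEFAULT_ARGS are hypothetical, and the function returns a new list instead of changing sys.argv in place.

```python
# Hypothetical defaults, mirroring the snippet added to asciidoc2.py.
DEFAULT_ARGS = ["-o", "./outputfiles/helloworld.html", "./inputfiles/helloworld.txt"]

def effective_argv(argv, defaults=DEFAULT_ARGS):
    """Return argv unchanged if options were given, else argv plus the defaults."""
    if argv[1:]:
        return list(argv)
    return list(argv) + list(defaults)

from_idle = effective_argv(["asciidoc2.py"])                      # started with F5
from_shell = effective_argv(["asciidoc2.py", "-v", "other.txt"])  # started in a shell
```

Started from IDLE there are no command-line options, so the defaults apply; a call from the shell with its own options is left alone.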

2.3. inputone.txt

To test asciidoc2.py and later asciidoc3.py :-) with a "real" file that makes some more use of AsciiDoc (more than "helloworld.txt" does), we use "inputone.txt". You can find it here.

inputone.txt; our long term "test-input"
== First Paragraph ==
Here comes something, nobody cares about this stuff. And now *bold*, and this ist _italic_, and the last +monospaced+. After this (hidden +) +
begins a new line.
The german umlauts and two other non-ascii glyphs / letters: äöüÄÖÜß € (euro) ¢ (cent).

== Second Paragraph
This sample comes with an image. The file name paths are relative to the location of the referring document:
image:redsquare.jpg[caption]
The "caption" in brackets corresponds to HTML alt="caption" and is not seen in the browser. The image is rendered inline; this is perhaps not preferred footnote:[or is it?], we have a footnote here.

Another snippet:

image:redsquare.jpg[caption]

New line and the image at the beginningfootnote:[footnote without space] of the line.

New line:

image::redsquare.jpg["caption",align="center"]

We see the image at the center of the line. At the end a link: Visit the http://www.asciidoc3.org[home of asciidoc3]! +
_END_
Note Before asciidoc3.py undergoes some unit tests, this file is renamed to in001.txt so that it can be processed automatically.

For the moment, please note, that our inputone.txt is very simple and covers only a few of the possibilities of AsciiDoc; I do know that. But that is just what I want; you’ll see soon the reason why.

Changing the line "sys.argv += ["-o", "~/ad3/outputfiles/helloworld.html", "~/ad3/inputfiles/helloworld.txt"]" to "sys.argv += ["-o", "./outputfiles/outputone.html", "./inputfiles/inputone.txt"]" in asciidoc2.py leads to this output:

inputone.txt computed with AsciiDoc (Python2): outputone.html

outputone

Warning Javascript has to be enabled in your browser to render outputone.html correctly!
Note Later you’ll find this output as o001.html.

2.4. Ready to start

Now everything is ready to start 2to3: asciidoc2.py runs with IDLE (Python2.x). And it’s working: Inputfile "inputone.txt" hardwired computes to "outputone.html".
I have the following data in my "working directory" ~/ad3.

.
├── asciidoc2.py
├── asciidoc.conf
├── asciidoc.py
├── inputfiles
│   ├── helloworld.txt
│   ├── inputone.txt
│   └── redsquare.jpg
├── javascripts
│   └── asciidoc.js
├── lang-en.conf
├── mytools
│   └── addlineno.py
├── outputfile
├── outputfiles
│   ├── helloworld.html
│   └── outputone.html
├── stylesheets
│   └── asciidoc.css
└── xhtml11.conf

2.5. Why not trying 2to3 right now?

It seems to be the easiest way: why not use Python’s script 2to3 to migrate asciidoc2.py right now? Let’s see what happens:

2to3 -v  asciidoc2.py

The "-v" ("--verbose") option produces a lot of informational output. Because the "-w" ("--write") option is not given, the tool does not write any changes (dry run).

[...]
--- ~/ad3/asciidoc2.py  (original)
+++ ~/ad3/asciidoc2.py  (refactored)
@@ -64,7 +64,7 @@
         d._keys = self._keys[:]                               ##_nr_: 62 _
         return d                                              ##_nr_: 63 _
     def items(self):                                          ##_nr_: 64 _
-        return zip(self._keys, self.values())                 ##_nr_: 65 _
+        return list(zip(self._keys, list(self.values())))     ##_nr_: 65 _
     def keys(self):                                           ##_nr_: 66 _
         return self._keys                                     ##_nr_: 67 _
     def popitem(self):                                        ##_nr_: 68 _
[...]
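What the refactored line ##_nr_: 65 _ is about can be shown with a tiny stand-in class. KeyedDict is invented for this sketch and contains only what the demonstration needs; it is not asciidoc's OrderedDict.

```python
class KeyedDict(dict):
    """Tiny stand-in for a dict that remembers the order of its keys."""
    def __init__(self):
        super().__init__()
        self._keys = []

    def __setitem__(self, key, value):
        if key not in self:
            self._keys.append(key)
        super().__setitem__(key, value)

    def values(self):
        return [self[k] for k in self._keys]

    def items_lazy(self):
        # The unchanged Python 2 line: under Python 3, zip() yields an iterator.
        return zip(self._keys, self.values())

    def items(self):
        # The line as written by 2to3: a real list, as in Python 2.
        return list(zip(self._keys, list(self.values())))

d = KeyedDict()
d['b'] = 2
d['a'] = 1
lazy = d.items_lazy()
first_pass = list(lazy)
second_pass = list(lazy)   # the iterator is exhausted after one pass
```

The iterator can be consumed only once and supports neither len() nor indexing, which is why 2to3 wraps the call in list() to keep the Python 2 behaviour.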

Looks nice, so I try

2to3 -v -w -n --add-suffix=3 ~/ad3/asciidoc2.py

Now I find a new file "asciidoc2.py3", rename it to "asciidoc3_try.py" and open it with IDLE (now of course using Python3.5). Indeed, line ##_nr_: 65 _ is changed to

return list(zip(self._keys, list(self.values()))) ##_nr_: 65 _

as seen above in the dry run snippet. So let’s run (F5):

asciidoc3_try: FAILED: unexpected error:
asciidoc3_try: ------------------------------------------------------------
Traceback (most recent call last):
  File "~/ad3/asciidoc3_try.py", line 5925, in asciidoc
    if not config.load_from_dirs('asciidoc.conf',include=['attributes']):  ##_nr_: 5924 _
  File "~/ad3/asciidoc3_try.py", line 4737, in load_from_dirs
    if self.load_file(f, include=include):                                 ##_nr_: 4736 _
  File "~/ad3/asciidoc3_try.py", line 4610, in load_file
    rdr.open(fname)                                                        ##_nr_: 4609 _
  File "~/ad3/asciidoc3_try.py", line 4117, in open
    if Reader1.read(self):                                                 ##_nr_: 4116 _
  File "~/ad3/asciidoc3_try.py", line 4135, in read
    if len(self.__next__) <= self.READ_BUFFER_MIN:                         ##_nr_: 4134 _
AttributeError: 'Reader' object has no attribute '__next__'

What the hell does that message mean? Let’s look at line ##_nr_: 4134 _ (Python3):

if len(self.__next__) <= self.READ_BUFFER_MIN:                              ##_nr_: 4134 _

and at the corresponding asciidoc2.py line (Python2):

if len(self.next) <= self.READ_BUFFER_MIN:                                  ##_nr_: 4134 _

Doing so makes it easy to see: of course "class Reader()" has no attribute '__next__'. See (Python2):

self.next = []          # Read ahead buffer containing                      ##_nr_: 4089 _

The answer is given in the documentation about the fixer next:

next Converts the use of iterator’s next() methods to the next() function. It also renames next() methods to __next__().

— https://docs.python.org/2/library/2to3.html
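The clash can be reproduced in a few lines. The class below is a stand-in written for this sketch, not asciidoc's Reader; that the read access is renamed while the assignment keeps its old name is inferred from the traceback above.

```python
class Reader:
    """Stand-in: 'next' is a plain list attribute, not an iterator method."""
    READ_BUFFER_MIN = 10

    def __init__(self):
        self.next = []      # read-ahead buffer, as in ##_nr_: 4089 _

    def needs_refill(self):
        # Original Python 2 line: works, because 'next' is just a list.
        return len(self.next) <= self.READ_BUFFER_MIN

    def needs_refill_after_fixer(self):
        # After the 'next' fixer: the attribute name no longer exists.
        return len(self.__next__) <= self.READ_BUFFER_MIN

reader = Reader()
try:
    reader.needs_refill_after_fixer()
    error = None
except AttributeError as exc:
    error = str(exc)
```

The fixer assumes that every "next" belongs to the iterator protocol; here it is merely an unlucky variable name.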

asciidoc2.py’s "class Reader()" is not an iterator; it just has a list-type variable called "next". Probably this can be fixed by excluding the fixer "next":

2to3 -v -w -f all -x next -f buffer -f set_literal -f idioms -f ws_comma -n --add-suffix=3 ~/ad3/asciidoc2.py

And indeed, running the new asciidoc3_try.py avoids the next-error. But it shows a new (the next :-) error message:

asciidoc3_try: FAILED: unexpected error:
asciidoc3_try: ------------------------------------------------------------
Traceback (most recent call last):
  File "~/ad3/asciidoc3_try.py", line 5924, in asciidoc
    if not config.load_from_dirs('asciidoc.conf', include=['attributes']): ##_nr_: 5924 _
  File "~/ad3/asciidoc3_try.py", line 4736, in load_from_dirs
    if self.load_file(f, include=include):                                 ##_nr_: 4736 _
  File "~/ad3/asciidoc3_try.py", line 4609, in load_file
    rdr.open(fname)                                                        ##_nr_: 4609 _
  File "~/ad3/asciidoc3_try.py", line 4116, in open
    if Reader1.read(self):                                                 ##_nr_: 4116 _
  File "~/ad3/asciidoc3_try.py", line 4154, in read
    mo = macros.match('+', r'^include[1]?$', result)                       ##_nr_: 4154 _
  File "~/ad3/asciidoc3_try.py", line 3814, in match
    mo = m.reo.match(text)                                                 ##_nr_: 3814 _
TypeError: cannot use a string pattern on a bytes-like object
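This TypeError is easy to reproduce outside of asciidoc. The pattern is copied from the traceback; the sample data and the choice of UTF-8 for decoding are assumptions of this sketch.

```python
import re

include_re = re.compile(r'^include[1]?$')   # a str pattern, as in the traceback

raw = b'include'    # bytes, e.g. what a file opened in binary mode delivers
try:
    include_re.match(raw)
    error = None
except TypeError as exc:
    error = str(exc)

# Decoding first (the encoding is an assumption) gives the regex a str to work on.
text = raw.decode('utf-8')
matched = include_re.match(text) is not None
```

In Python 3 a str pattern only works on str and a bytes pattern only on bytes; the two can no longer be mixed as in Python 2.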

And as a matter of fact, the wide field of bytes-like objects vs. strings in porting Python 2 to 3 will take a huge amount of my time … see below. Perhaps I could handle the bytes/string matter at this point, but I prefer another approach towards asciidoc3.py. I stop using 2to3 for a while and come back to it later. Follow me to the next section to learn more about it.
My approach: I’m not going the way with 2to3 as described in the section above. Not yet. It is indeed a hard job to eliminate the "bugs" in asciidoc3_try.py after using 2to3; you have had a first glance at it in the section above. I’ll try another way: I’ll strip all (to be more precise: a great part) of the lines of source code in asciidoc2.py that are not mandatory to translate inputone.txt to outputone.html. This "abridged" asciidoc2_new.py is then ported to Python3 via the well-known 2to3 script, and afterwards I’ll (try to) fix the tracebacks and failures … The next step will see a more difficult inputtwo.txt, then inputthree.txt and so on until all input is working.
To stress it once again: I know that this is perhaps not the "best" or most pythonic way, but I’d like to go this way nevertheless. This is also because I want to learn more about Python3 and about asciidoc.

2.6. Using editad2.py

I do not want to edit asciidoc2.py manually: I would have to do it very often, and backtrack, and go forward again … Probably using Git would be a solution, but as mentioned before, it is too much overhead for a single developer. So let’s do it with a Python3 program: "editad2.py". It makes heavy use of regex (module re: re.sub() and re.subn()) and string methods like replace(). I’ll append the necessary tuning of the original asciidoc2.py step by step. Every step appends some lines of code; to do it for yourself, you may comment out the unnecessary or unused parts. editad2.py resides in the directory "mytools" and is executed with Python3. I do not use 2to3 on the entire original asciidoc2.py source code in the first step. I’ll wipe out source code that is not necessary to translate a simple input to html. That abridged asciidoc2.py then undergoes 2to3, and then we’ll see.

In other words, we are preparing asciidoc2.py for the migration to Python3. For the further steps we use "editad2.py" (a python3 program) to edit the original (now line-numbered) asciidoc2.py. The output of editad2.py is "asciidoc2_new.py", another Python2 program. asciidoc2_new.py is expected to do (and has to do!) the same work as asciidoc2.py on the same input.
I can check this easily by running asciidoc2_new.py in IDLE 2.7. The output has to be binary identical to the output of asciidoc2.py (and of the original asciidoc.py, too).
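The "binary identical" check can be automated instead of being done by eye. A minimal sketch, assuming both outputs exist as files; the function name same_bytes and the file names in the usage note are hypothetical.

```python
import filecmp

def same_bytes(path_a, path_b):
    """True if both files have identical content, compared byte by byte."""
    # shallow=False forces a comparison of the content, not only of size and mtime.
    return filecmp.cmp(path_a, path_b, shallow=False)
```

A call such as same_bytes("outputfiles/outputone.html", "outputfiles/outputone_new.html") would then confirm that an editing step has not changed the result.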

2.6.1. Starting point

The first step was to add line numbers to asciidoc2.py; that is done already. The second was the IDLE tuning, adding "sys.argv[1:] …". We did this manually; here it is done by editad2.py.

editad2.py; customizing asciidoc2.py
#!/usr/bin/env python3
""" Author:  Berthold Gehrke <berthold.gehrke@gmail.com>
    Edits asciidoc2.py to be easier ported to Python v3.
    Detailed Information is given here: 'asciidoc3.org'
    (c) 2017, GPLv3
    """

import re
from os.path import expanduser

homedir = expanduser('~')

# input file (asciidoc2.py = asciidoc.py with line numbers) to edit; the file itself is not changed
# 'ad2new' = the abridged asciidoc2.py as a string -> asciidoc2_new.py
with open(homedir + "/ad3/asciidoc2.py","r") as thestart:
    ad2new = thestart.read()

# start editing asciidoc2.py
# import only one module per line in alphabetical order
ad2new, counter = re.subn(r'import sys, os, re.*?##_nr_: 10 _', \
           """import codecs
import copy
import locale
import os
import re
import subprocess
import sys
import tempfile
import time
import traceback
import unicodedata

#########
""", ad2new, flags=re.S)
assert counter == 1

# add sys.argv[] to start via IDLE F5
ad2new, counter = re.subn(r'#########', \
                         r"""
if not sys.argv[1:]:
    sys.argv += ["-o", "./outputfiles/outputone.html", "./inputfiles/inputone.txt"]
    #sys.argv += ["-o", "./outputfiles/outputone.html", "./inputfiles/oldtables.txt"]

""", ad2new)
assert counter == 1

with open(homedir + "/ad3/asciidoc2_new.py","w") as theend:
    theend.write(ad2new)
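The pair "re.subn() plus assert counter == …" is the safety net of every editing step: if a pattern does not match exactly as often as expected, the run stops instead of silently producing a wrong file. The idea can be wrapped in a helper; sub_exactly is a hypothetical name and not part of editad2.py.

```python
import re

def sub_exactly(pattern, repl, text, expected=1, flags=0):
    """Like re.subn(), but fail loudly unless exactly `expected` replacements happen."""
    new_text, count = re.subn(pattern, repl, text, flags=flags)
    if count != expected:
        raise ValueError("%r: expected %d replacement(s), got %d"
                         % (pattern, expected, count))
    return new_text

# Illustrative input, shaped like the numbered import line of asciidoc2.py.
source = "import sys, os   ##_nr_: 9 _\n             ##_nr_: 10 _\nrest\n"
edited = sub_exactly(r'import sys, os.*?##_nr_: 10 _',
                     "import os\nimport sys\n", source, flags=re.S)
```

With flags=re.S the dot also matches newlines, so one pattern can span several numbered lines, from the first words of a block up to the marker of its last line.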

2.6.2. Import statements, sys.argv

Why did I rearrange the "import" section to a single import statement per line? First, it makes the imports clearer and more readable, see here or here. And the 2to3 script works more precisely, though that seems not to be necessary in our case. The alphabetical order is recommended by Guido van Rossum, the BDFL himself, see here. Let’s do it your way, Guido!
The next step is the adding of "sys.argv +=", see above.

2.7. Points without changing the program logic

2.7.1. Part i

At this point I go through asciidoc2_new.py from source code line 1 to the end. Perhaps I notice some interesting things regarding porting 2to3. Here we are:

##_nr_: 12 _
VERSION = '8.6.9' … # See CHANGLOG file for version history.
→ to be edited in asciidoc3.py

##_nr_: 14 _
MIN_PYTHON_VERSION = '2.4' … # Require this version of Python or better.
→ of course to be changed in asciidoc3.py (2.4 to 3.0)

##_nr_: 31 _
NAME_RE = r'(?u)[^\W\d][-\w]*' … # Valid section or attribute name.
→ (?u) is set by default as an implicit/deprecated flag in Python3, but there is no need to change this now. These needless but innocuous flags may be deleted when everything works: there are a lot of (?u) in asciidoc2.py …

##_nr_: 41 _
class OrderedDict(dict): …
→ There is a "class collections.OrderedDict()" in Python3’s module "collections". Probably it’s more or less the same as implemented here in AsciiDoc2. In Python 3.6 the dict type has been reimplemented: "The order of elements in **kwargs now corresponds to the order in which keyword arguments were passed to the function…" Of course we do not alter our program now; we can do it later, when asciidoc3.py is running fine.
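Two of the notes above can be tried out directly in Python 3: the (?u) flag of ##_nr_: 31 _ and the standard-library OrderedDict mentioned for ##_nr_: 41 _. The sample name and the dictionary keys are made up, and whether collections.OrderedDict is a drop-in replacement for asciidoc's own class is not checked here.

```python
import re
from collections import OrderedDict

# ##_nr_: 31 _ : the pattern with and without the explicit (?u) flag
NAME_RE_V2 = r'(?u)[^\W\d][-\w]*'
NAME_RE_V3 = r'[^\W\d][-\w]*'
name = 'größe-1'
with_flag = re.fullmatch(NAME_RE_V2, name) is not None
without_flag = re.fullmatch(NAME_RE_V3, name) is not None
# Only an explicit re.ASCII restricts \w to ASCII letters again.
ascii_only = re.fullmatch(NAME_RE_V3, name, flags=re.ASCII) is not None

# ##_nr_: 41 _ : the standard-library OrderedDict keeps insertion order
d = OrderedDict()
d['zeta'] = 1
d['alpha'] = 2
d['mid'] = 3
keys = list(d.keys())
last = d.popitem()   # removes the most recently inserted pair by default
```

For str patterns Unicode matching is the default in Python 3, so the pattern behaves the same with and without (?u).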

2.7.2. A First Improvement

##_nr_: 382 _
if float(sys.version[:3]) >= 2.6 or sys.platform[:4] == 'java':
→ There is no need for the else-branch beginning in ##_nr_: 410 _, because "float(sys.version[:3])" is of course greater than 2.6 when using Python3, isn’t it?
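A side note on this version test: cutting the version string to three characters still gives a value above 2.6 for every Python 3, but it misreads two-digit minor versions. The sketch below shows this and the tuple comparison that avoids it; the function name version_as_float is made up for the example.

```python
import sys

def version_as_float(version_string):
    """The approach of ##_nr_: 382 _: parse the first three characters."""
    return float(version_string[:3])

# '3.10.4' is cut to '3.1', so it looks older than '3.9.1'.
truncated = version_as_float('3.10.4')
older = version_as_float('3.9.1')

# Comparing the version_info tuple avoids the truncation.
modern_enough = sys.version_info >= (2, 6)
```

For the decision taken here (drop the else-branch) this makes no difference, since both values are well above 2.6.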

We have to do three things: 1. wipe out the else-branch 2. dedent the if-branch and drop the if-statement 3. move the "import" to the top. The third, because this is the recommended place for all import statements, as seen above.

Erasing the else-branch
# no need of else branch
# erase else branch
ad2new, counter = re.subn(r'else:   # Use deprecated.*?##_nr_: 474 _', r'', ad2new, flags=re.S)
assert counter == 1

# erase 'if' and import
ad2new, counter = re.subn(r'# The following functions are so we don.*?##_nr_: 386 _', r'', ad2new, flags=re.S)
assert counter == 1

# dedent 'if-branch'
ad2new, counter = re.subn(r'    def get_args\(val\):.*?##_nr_: 409 _', \
           """def get_args(val):                                          ##_nr_: 387 _
    d = {}                                                                ##_nr_: 388 _
    args = ast.parse("d(" + val + ")", mode='eval').body.args             ##_nr_: 389 _
    i = 1                                                                 ##_nr_: 390 _
    for arg in args:                                                      ##_nr_: 391 _
        if isinstance(arg, ast.Name):                                     ##_nr_: 392 _
            d[str(i)] = literal_eval(arg.id)                              ##_nr_: 393 _
        else:                                                             ##_nr_: 394 _
            d[str(i)] = literal_eval(arg)                                 ##_nr_: 395 _
        i += 1                                                            ##_nr_: 396 _
    return d                                                              ##_nr_: 397 _
                                                                          ##_nr_: 398 _
def get_kwargs(val):                                                      ##_nr_: 399 _
    d = {}                                                                ##_nr_: 400 _
    args = ast.parse("d(" + val + ")", mode='eval').body.keywords         ##_nr_: 401 _
    for arg in args:                                                      ##_nr_: 402 _
        d[arg.arg] = literal_eval(arg.value)                              ##_nr_: 403 _
    return d                                                              ##_nr_: 404 _
                                                                          ##_nr_: 405 _
def parse_to_list(val):                                                   ##_nr_: 406 _
    values = ast.parse("[" + val + "]", mode='eval').body.elts            ##_nr_: 407 _
    return [literal_eval(v) for v in values]                              ##_nr_: 408 _
""", ad2new, flags=re.S)
assert counter == 1

# move import-statements
ad2new, counter = re.subn(r'import codecs', r'import ast\nfrom ast import literal_eval\n\g<0>', ad2new)
assert counter == 1
Warning From now on you need Python >= 2.6 to run asciidoc2_new.py and the tests!
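What the kept if-branch does can be tried in Python 3 as well. The two functions below follow get_kwargs and parse_to_list from the listing above (line-number markers removed); the sample attribute strings are assumptions for illustration.

```python
import ast
from ast import literal_eval

def get_kwargs(val):
    """Parse 'a=1, b="x"' into a dict by wrapping it in a dummy call d(...)."""
    d = {}
    for kw in ast.parse("d(" + val + ")", mode='eval').body.keywords:
        d[kw.arg] = literal_eval(kw.value)
    return d

def parse_to_list(val):
    """Parse '1, "two", 3.0' into a list of Python literals."""
    values = ast.parse("[" + val + "]", mode='eval').body.elts
    return [literal_eval(v) for v in values]

kwargs = get_kwargs('width=50, align="center"')
items = parse_to_list('1, "two", 3.0')
```

The text is only parsed and its literals are evaluated; nothing is executed, which is the advantage over a plain eval().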

2.7.3. Part ii

##_nr_: 1223 _
def char_encoding():
→ Here and in the following lines we see the "coding" of both the input and output document. This will take some "rearrangements" in AsciiDoc3 because of the new principles of coding strings in Python3.

2.7.4. Erasing next

##_nr_: 1299 _
def next():
→ We had some irritation because of next() vs. __next__ (see above). To avoid any interference, I decide to rename "next" to "nxt" in asciidoc2_new.py. Due to this there will be no "next" in asciidoc3.py either. That is not mandatory: another option is to exclude the fixer "next", but "nxt" makes it somewhat clearer. There is no "next-iterator" at this place.

"next" converts to "nxt"
# BEGIN next to nxt; _nr_: 1299 _ etc
ad2new, counter = re.subn(r'def next\(\):', r'def nxt(): ', ad2new)
assert counter == 1

# nr1616, 1622, 1648, 1660, 1664
# nr1756, 1858, 1936, 1990, 2026
# nr2204, 2274, 2311, 2316, 2809
# nr2833, 2835, 2839, 2965
ad2new, counter = re.subn(r'Lex\.next\(\)', r'Lex.nxt()', ad2new)
assert counter == 19

# seven next in 'def translate_body(terminator=Title): ##_nr_: 2309 to 2324':
# nr2311, 2312, 2312, 2313, 2315, 2316, 2319
def sevennxt(part):
    a, b = re.subn(r'next', r'nxt', part.group())
    assert b == 7
    return a
ad2new, counter = re.subn(r'def translate_body.*?2324 _', sevennxt, ad2new, flags=re.S)
assert counter == 1

# ten next in 'def translate_item(self): ##_nr_: 2818 to 2853':
# nr2831, 2839, 2840, 2841, 2842, 2843, 2844, 2845, 2848, 2850
def tennxt(part):
    a, b = re.subn(r'next', r'nxt', part.group())
    #print("b =", b)
    assert b == 10
    return a
ad2new, counter = re.subn(r'def translate_item.*?2853 _', tennxt, ad2new, flags=re.S)
assert counter == 1

# but _not_ here "continuation = reader.read_nxt() == '+' ##_nr_: 2831 _"
ad2new, counter = re.subn(r"continuation = reader\.read_nxt\(\) == '\+' ", \
                          r"continuation = reader.read_next() == '+'", ad2new)
assert counter == 1

ad2new, counter = re.subn(r'self\.next', r'self.nxt', ad2new)
assert counter == 13
# END next to nxt
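The functions sevennxt and tennxt share one technique: the outer re.subn() selects a region of the source, and a function passed as replacement renames only inside that region, again with a counted assert. A generalized sketch follows; the name rename_in_region and the sample source are invented for it.

```python
import re

def rename_in_region(text, region_pattern, old, new, expected):
    """Replace `old` by `new` only inside the single region matched."""
    def fix_region(match):
        fixed, count = re.subn(old, new, match.group())
        assert count == expected, (count, expected)
        return fixed
    result, regions = re.subn(region_pattern, fix_region, text, flags=re.S)
    assert regions == 1
    return result

source = (
    "def keep():           ##_nr_: 1 _\n"
    "    return next(it)   ##_nr_: 2 _\n"
    "def change():         ##_nr_: 3 _\n"
    "    next = a.next     ##_nr_: 4 _\n"
)
edited = rename_in_region(source, r'def change.*?4 _', r'next', r'nxt', 2)
```

The real iterator call in keep() stays untouched, while both occurrences inside change() are renamed.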

2.7.5. Erasing "Old Tables"

##_nr_: 1327 _
elif tables_OLD.isnext():
→ Old tables are deprecated in asciidoc2.py. My decision is to wipe out "old tables".

No need of old tables any more
# 'old table' class is deprecated/obsolete
ad2new, counter = re.subn(r'# Deprecated old table classes.*?5189 _', \
                          r"""#Old table classes are no longer supported ...
""", ad2new, flags=re.S)
assert counter == 1

ad2new, counter = re.subn(r'    def load\(self,name,entries\):\s+##_nr_: 5217 _.*?5593 _', \
                          r"""
    def translate(self):
        message.verbose('deprecated old tables found --> old tables are no longer supported in asciidoc3.py')
        sys.exit("deprecated old tables found --> old tables are no longer supported in asciidoc3.py")
""", ad2new, flags=re.S)
assert counter == 1

If asciidoc3_new.py detects an old table, a message is shown (only in verbose-mode) and the execution stops.

2.7.6. Part iii

##_nr_: 2082 _
s = ul[:2]*((ul_len+1)/2)
→ We have to be careful when porting the "/" division of Python2 to "//" in Python3.
→ The same in _nr_: 3239, 3322, 3339, and 3340
(→ _nr_: 5350 _ is already erased.)

##_nr_: 5836 _
for f in os.walk(d).next()[1]:
→ I do not rename this to something like ".nxt()[1]": this really is an iterator!
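
In Python 3 generators no longer have a .next() method; the built-in next() is used instead, and 2to3's "next" fixer performs this rewrite. A minimal sketch of the idiom on a temporary directory follows; the two subdirectory names are assumptions made only for this example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    os.mkdir(os.path.join(d, "filters"))
    os.mkdir(os.path.join(d, "themes"))
    # os.walk() returns a generator; its first item describes the top directory
    # as (dirpath, dirnames, filenames). Index 1 is the list of subdirectories.
    subdirs = sorted(next(os.walk(d))[1])

print(subdirs)
```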

2.7.7. Erasing/ignoring option "unsafe"

##_nr_: 6132 _
#DEPRECATED: --unsafe option
→ "Unsafe" option is deprecated. Asciidoc3 will ignore --unsafe.

no need of option "unsafe" any more
# 'unsafe' option is deprecated/obsolete
ad2new, counter = re.subn(r'        #DEPRECATED: --unsafe option.*?6132 _', \
                          r"""        #DEPRECATED: --unsafe option is ignored!
        if o == '--unsafe':
            message.verbose('unsafe option is ignored!') #document.safe = False""", ad2new, flags=re.S)
assert counter == 1

2.7.8. Erasing "doctest"

##_nr_: 6098 _
Doctests: …
→ Doctests may cause problems, because the output of Python3 (remember: bytes vs. str) is not the "same" as that of Python2. In addition, our lineno markers alter the output; that is a second reason. So I wipe out the one and only doctest here. See also the following quotation (emphasis as in the original):

... First of all: do not use doctest. There is a doctest converter in 2to3, but it does not give you much. …

— http://lucumr.pocoo.org/2010/2/11/porting-to-python-3-a-guide/
No need of option "doctest" (for the present)
# 'doctest' option is unuseful
ad2new, counter = re.subn(r'    Doctests:.*?6116 _', r'    """', ad2new, flags=re.S)
assert counter == 1

ad2new, counter = re.subn(r'        # Run module doctests\..*?6236 _', \
                          r"""        # No doctest in asciidoc3
        message.verbose('no doctest in asciidoc3')
        sys.exit("no doctest in asciidoc3")""", ad2new, flags=re.S)
assert counter == 1
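
Why a doctest written for Python2 can fail under Python3 without any change in behavior: doctest compares the printed representation as text, and bytes now print with a b prefix. The following sketch is an illustration only; the docstring is invented and has nothing to do with the doctest removed from asciidoc.py.

```python
import doctest

# Expected output as Python 2 would have printed it (no b prefix).
docstring = """
>>> 'abc'.encode('ascii')
'abc'
"""

test = doctest.DocTestParser().get_doctest(docstring, {}, 'sample', None, 0)
runner = doctest.DocTestRunner(verbose=False)
# 'out' receives the failure report; it is discarded here to keep the output short.
result = runner.run(test, out=lambda text: None)

print(result.failed, result.attempted)
```

Under Python 3 the single example is attempted and fails, because the interpreter prints b'abc' where the docstring expects 'abc'.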

2.7.9. Summary

So I have reached the end of asciidoc2_new.py. If I run asciidoc2_new.py, it generates outputone.html as before. My working directory looks like this:

Directories and files at this particular time
.
├── asciidoc2_new.py
├── asciidoc2.py
├── asciidoc3_try.py
├── asciidoc.conf
├── asciidoc.py
├── inputfiles
│   ├── helloworld.txt
│   ├── inputone.txt
│   └── redsquare.jpg
├── javascripts
│   └── asciidoc.js
├── lang-en.conf
├── mytools
│   ├── addlineno.py
│   └── editad2.py
├── outputfiles
│   ├── helloworld.html
│   └── outputone.html
├── stylesheets
│   └── asciidoc.css
└── xhtml11.conf

"asciidoc3_try.py" is unnecessary, so I delete it.

2.8. Temporarily erasing unused stuff

2.8.1. Import trace

As said before, "inputone.txt" uses only a minimum of the power of asciidoc. There is no need of some of the classes, functions, definitions or options in asciidoc2_new.py to deal with inputone.txt. Some hints here are: plugins, themes, header, trace, FloatingTitle …
I’ll try to eliminate the unused stuff step by step.

But how can we see what is "unused stuff"? I make use of Python’s "trace" module; there are no important differences between the Python2 and Python3 versions in this respect. I import "trace" as "my_trace" to prevent confusion with AsciiDoc’s own trace option. The function we trace is called "ad_main()" and consists of nothing but the former "if __name__ == '__main__':" part. Finally we add the my_trace output statements at the end.

Adding my_trace
# add builtin 'trace' as my_trace (to prevent confusion with asciidoc's trace)
# function to be traced is 'def ad_main()': that is nothing else than 'if __name__ == '__main__' ...
# add my_trace-output statements at the end of source code
ad2new, counter = re.subn(r"(?<=6211 _\n)if __name__ == '__main__':(.)*? ##_nr_: 6212 _", \
           r"def ad_main():\g<1>                                ##_nr_: 6212 _new: 'main' is to be traced", ad2new, flags=re.S)
assert counter == 1

ad2new, counter = re.subn(r'##_nr_: 6265 _', \
           """##_nr_: 6265 _

homedir = os.path.expanduser('~')

import trace as my_trace
tracer = my_trace.Trace(ignoredirs = [sys.prefix, sys.exec_prefix],
                        trace = 0,
                        count = 1)
tracer.run('ad_main()')
r = tracer.results()
print '  begin of tracer output '
r.write_results(show_missing=True, coverdir = homedir + '/ad3/my_traceoutput')
print '  end of tracer output '""", ad2new, flags=re.S)
assert counter == 1
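
The injected code above is Python2 (note the print statements), because asciidoc2_new.py is still a Python2 program. For comparison, a minimal Python3 sketch of the same trace calls follows. The function body and the temporary output directory are assumptions for the example; only the Trace arguments mirror the listing.

```python
import os
import sys
import tempfile
import trace

def ad_main():
    """Stand-in for the function to be traced (hypothetical workload)."""
    return sum(n for n in range(5))

# count=1 records how often each line runs; trace=0 suppresses the line-by-line echo.
tracer = trace.Trace(ignoredirs=[sys.prefix, sys.exec_prefix], trace=0, count=1)
result = tracer.runfunc(ad_main)      # runfunc returns the function's return value

coverdir = os.path.join(tempfile.mkdtemp(), "my_traceoutput")
results = tracer.results()
# Writes one '<module>.cover' file per traced module into coverdir.
results.write_results(show_missing=True, coverdir=coverdir)
```

runfunc() is used here instead of run('ad_main()') so that the example does not depend on the name being visible in __main__; either call collects the same counts.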

2.8.2. Running "trace"

If I run the current asciidoc2_new.py, what happens? First, the output is file outputone.html, which is binary identical to the primary outputone.html. That’s expected. And in addition we get the analysis in ~/ad3/my_traceoutput/asciidoc2_new.cover. A snippet follows:

Part of my_traceoutput
>>>>>> class AttrDict(dict):                                                           ##_nr_: 88 _
           """                                                                         ##_nr_: 89 _
           Like a dictionary except values can be accessed as attributes i.e. obj.foo  ##_nr_: 90 _
           can be used in addition to obj['foo'].                                      ##_nr_: 91 _
           If an item is not present None is returned.                                 ##_nr_: 92 _
           """                                                                         ##_nr_: 93 _
>>>>>>     def __getattr__(self, key):                                                 ##_nr_: 94 _
  193:         try: return self[key]                                                   ##_nr_: 95 _
   70:         except KeyError: return None                                            ##_nr_: 96 _
>>>>>>     def __setattr__(self, key, value):                                          ##_nr_: 97 _
   70:         self[key] = value                                                       ##_nr_: 98 _
>>>>>>     def __delattr__(self, key):                                                 ##_nr_: 99 _
>>>>>>         try: del self[key]                                                      ##_nr_: 100 _
>>>>>>         except KeyError, k: raise AttributeError, k                             ##_nr_: 101 _
>>>>>>     def __repr__(self):                                                         ##_nr_: 102 _
>>>>>>         return '<AttrDict ' + dict.__repr__(self) + '>'                         ##_nr_: 103 _
>>>>>>     def __getstate__(self):                                                     ##_nr_: 104 _
>>>>>>         return dict(self)                                                       ##_nr_: 105 _
>>>>>>     def __setstate__(self,value):                                               ##_nr_: 106 _
>>>>>>         for k,v in value.items(): self[k]=v                                     ##_nr_: 107 _

This snippet shows us that erasing "class AttrDict(dict)" from the source is not a good idea: "try: return self[key]" in line 95 is called 193 times. Scrolling down shows the function "file_in(fname, directory)" is the first with no calls:

Part of my_traceoutput, an uncalled function
>>>>>> def file_in(fname, directory):                                                  ##_nr_: 246 _
           """Return True if file fname resides inside directory."""                   ##_nr_: 247 _
>>>>>>     assert os.path.isfile(fname)                                                ##_nr_: 248 _
           # Empty directory (not to be confused with None) is the current directory.  ##_nr_: 249 _
>>>>>>     if directory == '':                                                         ##_nr_: 250 _
>>>>>>         directory = os.getcwd()                                                 ##_nr_: 251 _
           else:                                                                       ##_nr_: 252 _
>>>>>>         assert os.path.isdir(directory)                                         ##_nr_: 253 _
>>>>>>         directory = os.path.realpath(directory)                                 ##_nr_: 254 _
>>>>>>     fname = os.path.realpath(fname)                                             ##_nr_: 255 _
>>>>>>     return os.path.commonprefix((directory, fname)) == directory                ##_nr_: 256 _

2.8.3. Unused stuff: Shortening asciidoc2_new.py

So I erase lines 246 to 256 and many others; see the following abridged listing. Some explanations follow afterwards.

Classes, def() … temporarily erased
# temporarly erasing 'def file_in ... ' 246 ... 256
ad2new, counter = re.subn(r'def file_in.*?256 _', r'# temporarly [...] , ad2new, flags=re.S)
assert counter == 1

# temporarly erasing 'def subs_tag ... ' 620 ... 635
ad2new, counter = re.subn(r'def subs_tag.*?635 _', r'# temporarly [...] , ad2new, flags=re.S)
assert counter == 1

# temporarly erasing 'def dump_section ... ' 698 ... 718
ad2new, counter = re.subn(r'def dump_section.*?718 _', r'# [...] , ad2new, flags=re.S)
assert counter == 1

# temporarly erasing 'def filter_lines ... ' 748 ... 835
ad2new, counter = re.subn(r'def filter_lines.*?835 _', r'#  [...] , ad2new, flags=re.S)
assert counter == 1

# temporarly erasing 'def parse_author ... ' 1672 ... 1706
ad2new, counter = re.subn(r'def parse_author.*?1706 _', r'#  [...] , ad2new, flags=re.S)
assert counter == 1

# temporarly erasing 'class Header ... ' 1748 ... 1825
ad2new, counter = re.subn(r'class Header.*?1825 _', r'#  [...] , ad2new, flags=re.S)
assert counter == 1

# temporarly erasing 'class FloatingTitle ... ' 2197 ... 2213
# class FloatingTitle has to be defined -> pass
ad2new, counter = re.subn(r'class FloatingTitle.*?2213 _', \
                          r'class FloatingTitle(Title): pass #  [...] , ad2new, flags=re.S)
assert counter == 1

# temporarly erasing 'def get_param plus def get_subs plus def dump' 2443 ... 2488
ad2new, counter = re.subn(r'def get_param.*?2488 _', \
                          r'# temporarly erasing: def get_param plus  [...] , ad2new, flags=re.S)
assert counter == 1

# temporarly erasing def translate_entry plus ... plus def translate in class List 2802 ... 2985
ad2new, counter = re.subn(r'def translate_entry.*?2985 _', \
                          r'# temporarly erasing def translate_entry plus  [...] , ad2new, flags=re.S)
assert counter == 1

# temporarly erasing class Column plus class Cell 3108 ... 3136
ad2new, counter = re.subn(r'class Column.*?3136 _', \
                          r'# temporarly erasing class Column plus  [...] , ad2new, flags=re.S)
assert counter == 1

# temporarly erasing def validate_attributes plus ... plus def translate(self) in class Table 3214 ... 3660
ad2new, counter = re.subn(r'def validate_attributes.*?3660 _', \
                          r'# temporarly erasing def validate_attributes  [...] , ad2new, flags=re.S)
assert counter == 1

# temporarly erasing class CalloutMap ...  4035 ... 4072
# class CalloutMap has to be defined -> pass
ad2new, counter = re.subn(r'class CalloutMap.*?4072 _', \
                          r'class CalloutMap: pass # temporarly  [...] , ad2new, flags=re.S)
assert counter == 1

# temporarly erasing def dump 4899 ... 4947
ad2new, counter = re.subn(r'def dump\(self\):\s+##_nr_: 4899 _.*?4947 _', \
                          r'# temporarly erasing def dump  [...] , ad2new, flags=re.S)
assert counter == 1

# temporarly erasing filter and theme plugin commands ... 5662 ... 5857
ad2new, counter = re.subn(r'# filter and theme plugin commands\..*?5857 _', \
                          r'# temporarly erasing filter and  [...] , ad2new, flags=re.S)
assert counter == 1

# temporarly erasing def usage plus def show_help ... 6050 ... 6088
ad2new, counter = re.subn(r'def usage\(msg.*?6088 _', \
                          r'# temporarly erasing def usage  [...] , ad2new, flags=re.S)
assert counter == 1

Class "FloatingTitle" is referenced and so it is not cropped totally, only a "pass" remains; the same for class "CalloutMap". Note that I do not cut every unused line, it is only for shortening the number of source code lines. By the way: "east_asian_widths" (nr1235 ff.) is part of the refactoring in the next chapter, so it remains.

After these steps we comment out the "adding my_trace" part shown in the listing "Adding my_trace" above, and let editad2.py run again. Now I have the abridged asciidoc2.py, named asciidoc2_new.py, which has "only" 4510 lines of code compared with the original 6265; that’s approx. 28% less. And, which is of course obligatory, asciidoc2_new.py produces the identical "outputone.html" as before.

At the end there is some pretty printing to avoid a single very long line:

Split a long line
# lineno 'pretty printing': ##_nr_: 853 _ in two lines
ad2new, counter = re.subn(r"if name not in \('e.*?853 _", \
                          r"""if name not in ('eval', 'eval3', 'sys', 'sys2', 'sys3', 'include', 'include1', \\
                    'counter', 'counter2', 'set', 'set2', 'template'):     ##_nr_: 853 _""", ad2new, flags=re.S)
assert counter == 1

2.8.4. 2to3 on asciidoc2.py again

Now we can do 2to3 again …

2to3 -v -w -f all -f buffer -f set_literal -f idioms -f ws_comma -n --add-suffix=23again ~/ad3/asciidoc2_new.py

I use all fixers plus "buffer" and "set_literal" though we have no buffer or sets in asciidoc2_new.py, and idioms and ws_comma to make it complete. The new "asciidoc2_new.py23again" is renamed to "asciidoc3_start.py" - let’s take a first look at it.

3. AsciiDoc3 starts to work

3.1. Introducing editad3.py

All further steps are made with the help of "editad3.py". This is the counterpart of editad2.py and manages the necessary refactoring of the new Python3 program. After running 2to3 and renaming I have "asciidoc3_start.py", hopefully a runnable Python3 program, but "some work" is still to do. The first easy step is to rearrange the line numbers into one column:

Line numbers to asciidoc3.py
### input-file (asciidoc3_start.py) to edit, but this file is not altered
### pretty printing lineno
import os
import re
from io import StringIO

homedir = os.path.expanduser('~')

with open(homedir + "/.asciidoc3/asciidoc3_start.py", "r") as thestart:
    step0 = StringIO()
    for line in thestart:
        # move the marker of every numbered line to column 100
        if re.search(r"##_nr_: \d{1,4} _", line):
            part = line.split("##_nr_")
            code = part[0].rstrip()
            line = code + (100 - len(code)) * " " + "##_nr_" + part[1]
        step0.write(line)
    ad3new = step0.getvalue()
    del step0  # no need of 'step0' any more

The output "step0", an in-memory file (StringIO()), lives only for some milliseconds; the editing begins now. And again the first step is some pretty printing; in this case four lines are shortened:

Format line numbers
# start editing ad3new (a 'long string')
# doing some more "pretty printing"
# erasing lineno that results too long lines
ad3new, counter = re.subn(r'##_nr_: 4333 _', r'', ad3new)
assert counter == 1
ad3new, counter = re.subn(r'##_nr_: 4352 _', r'', ad3new)
assert counter == 1
ad3new, counter = re.subn(r'##_nr_: 4683 _', r'', ad3new)
assert counter == 1
ad3new, counter = re.subn(r'##_nr_: 6004 _', r'', ad3new)
assert counter == 1

3.2. Some easy prearrangements

Before beginning "the real" refactoring, I make some easy changes:
- shebang,
- copyright,
- VERSION,
- MIN_PYTHON_VERSION.

Some Editing at the beginning
# shebang python3 interpreter and _no_ lineno here to prevent the program from disturbing to find the interpreter
ad3new, counter = re.subn(r'env python \s*##_nr_: 1 _', r'env python3', ad3new)
assert counter == 1

# -*- coding: utf-8 -*- this line may be removed in the next version of 'asciidoc3.py' (utf-8 is the default)
ad3new, counter = re.subn(r'env python3', r'\g<0>\n# -*- coding: utf-8 -*-', ad3new)
assert counter == 1

# adding (c) 2to3
ad3new, counter = re.subn(r'asciidoc - converts an AsciiDoc.+?"""', \
                          r'''asciidoc3.py converts an AsciiDoc text file to HTML or DocBook or
Manpage using Python3.x - asciidoc3.py is also used by a2x3.py
Copyright (C) 2002-2010 Stuart Rackham
Copyright (C) 2018 by Berthold Gehrke <berthold.gehrke@gmail.com> for Python3 version and 2to3 work.
Free use of this software is granted under the terms of the
GNU General Public License Version 3 or higher (GNU GPLv3).                                         ##_nr_: 6 _
"""''', ad3new, flags=re.S)
assert counter == 1

ad3new, counter = re.subn(r"MIN_PYTHON_VERSION = '2\.4'", r"MIN_PYTHON_VERSION = '3.0'", ad3new)
assert counter == 1

3.3. Be careful with that "/", Eugene

As mentioned before, there is a pitfall in porting the "/" division from Python2 to Python3 when integers are given: in Python2 it is floor division (integer division), in Python3 floating point division. 2to3 often makes the wrong decision here (more precisely: 2to3 doesn’t have sufficient information to make the right one), and so it does in this case. I correct that line. The lines 3239, 3322, 3339, and 3340 show the same behavior, but they are temporarily erased as an unused part of "class Table".

Integer division in asciidoc3.py
# We have to be careful with porting the "/"-Division in Python2 --> "//" in Python3
ad3new, counter = re.subn(r's = ul\[:2\]\*\(\(ul\_len\+1\)\/2\)\s+##_nr_: 2082 _', \
                          r's = ul[:2]*((ul_len+1)//2)           ##_nr_: 2082 _', ad3new)
assert counter == 1
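
Why this one character matters can be seen in a few lines. The sketch below reuses the expression from nr2082; the values of ul and ul_len are sample values chosen for the example.

```python
ul = "--"
ul_len = 7

# Python 3: '/' is true division and yields a float, even for two ints.
try:
    s = ul[:2] * ((ul_len + 1) / 2)
except TypeError as exc:
    error = str(exc)   # a str cannot be multiplied by a float

# '//' is floor division and keeps the int, as '/' did for ints in Python 2.
s = ul[:2] * ((ul_len + 1) // 2)
print(s)
```

So the unported line does not quietly produce a wrong underline; it raises a TypeError as soon as it is reached.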

3.4. Brute force, nice try, fails

The moment has arrived to start asciidoc3_new.py. Will it run without problems, so that AsciiDoc is successfully ported? Press F5 and see:

First run of asciidoc3.py
asciidoc3_new: FAILED: unexpected error:
asciidoc3_new: ------------------------------------------------------------
Traceback (most recent call last):
  File "~/ad3/asciidoc3_new.py", line 4240, in asciidoc
    if not config.load_from_dirs('asciidoc.conf', include=['attributes']): ##_nr_: 5924 _
  File "~/ad3/asciidoc3_new.py", line 3696, in load_from_dirs
    if self.load_file(f, include=include):                                 ##_nr_: 4736 _
  File "~/ad3/asciidoc3_new.py", line 3569, in load_file
    rdr.open(fname)                                                        ##_nr_: 4609 _
  File "~/ad3/asciidoc3_new.py", line 3076, in open
    if Reader1.read(self):                                                 ##_nr_: 4116 _
  File "~/ad3/asciidoc3_new.py", line 3114, in read
    mo = macros.match('+', r'^include[1]?$', result)                       ##_nr_: 4154 _
  File "~/ad3/asciidoc3_new.py", line 2811, in match
    mo = m.reo.match(text)                                                 ##_nr_: 3814 _
TypeError: cannot use a string pattern on a bytes-like object

Déjà vu. The same as above, when I ran 2to3 on the original unabridged asciidoc2.py. Shit happens :-), but there is a solution. In a first attempt (wrong, as will be seen later) I try the brute force method and give every reported error just what it wants. Let’s add some "print" function calls at the right place; as you know, this is very often the easiest way to debug:

Adding some "print"-statements
ad3new, counter = re.subn(r'    def match\(self, prefix, name, text\):.*?3820 _', \
                          r"""
    def match(self, prefix, name, text):                                                      ##_nr_: 3809 _
        "Return re match object matching 'text' with macro type 'prefix', macro name 'name'." ##_nr_: 3810 _
                                                                                              ##_nr_: 3811 _
        for m in self.macros:                                                                 ##_nr_: 3812 _
            if m.prefix == prefix:                                                            ##_nr_: 3813 _
                print("m.prefix: ", m.prefix)
                print("prefix: ", prefix)
                print("type prefix: ", type(prefix))
                print("m.reo: ", m.reo)
                print("text: ", text)
                print("type text: ", type(text))
                mo = m.reo.match(text)                                                        ##_nr_: 3814 _
                if mo:                                                                        ##_nr_: 3815 _
                    if m.name == name:                                                        ##_nr_: 3816 _
                        return mo                                                             ##_nr_: 3817 _
                    if re.match(name, mo.group('name')):                                      ##_nr_: 3818 _
                        return mo                                                             ##_nr_: 3819 _
        return None                                                                           ##_nr_: 3820 _
""", ad3new, flags=re.S)
assert counter == 1

The output:

[...]
m.prefix:  +
prefix:  +
type prefix:  <class 'str'>
m.reo:  re.compile('(?u)^(?P<name>[\\\\]?\\w(\\w|-)*?)::(?P<target>\\S*?)(\\[(?P<attrlist>.*?)\\])$')
text:  b'#'
type text:  <class 'bytes'>
asciidoc3_new: FAILED: unexpected error:
asciidoc3_new: ------------------------------------------------------------
Traceback (most recent call last):
  File "~/ad3/asciidoc3_new.py", line 4248, in asciidoc
    if not config.load_from_dirs('asciidoc.conf', include=['attributes']):              ##_nr_: 5924 _
[...]

We recognize the issue: "prefix" is of type "str", but "text" is of type "bytes". So we add …

ad3new, counter = re.subn(r'print\("type text: ", type\(text\)\)', \
                          r"\g<0>\n                text = str(text)", ad3new)
assert counter == 1

... to have a "text = str(text)" right before the critical line nr3814.
Yes it works! This exception is eliminated. But the next is thrown:

Another error pops up
[...]
asciidoc3_new: FAILED: unexpected error:
asciidoc3_new: ------------------------------------------------------------
Traceback (most recent call last):
  File "~/ad3/asciidoc3_new.py", line 4249, in asciidoc
    if not config.load_from_dirs('asciidoc.conf', include=['attributes']):       ##_nr_: 5924 _
  File "~/ad3/asciidoc3_new.py", line 3705, in load_from_dirs
    if self.load_file(f, include=include):                                       ##_nr_: 4736 _
  File "~/ad3/asciidoc3_new.py", line 3578, in load_file
    rdr.open(fname)                                                              ##_nr_: 4609 _
  File "~/ad3/asciidoc3_new.py", line 3086, in open
    if self.cursor[2].startswith(UTF8_BOM):                                      ##_nr_: 4117 _
TypeError: startswith first arg must be bytes or a tuple of bytes, not str
>>>

All right, we place a print-function in the line before nr4117 …

ad3new, counter = re.subn(r'##_nr_: 4116 _', \
                          r'\g<0>\n            print("type UTF8_BOM: ", type(UTF8_BOM))', ad3new)
assert counter == 1

... the output identifies "UTF8_BOM" as a string, but bytes are expected.

[...]
text:  b'#'
type text:  <class 'bytes'>
type UTF8_BOM:  <class 'str'>
asciidoc3_new: FAILED: unexpected error:
asciidoc3_new: ------------------------------------------------------------
Traceback (most recent call last):
  File "~/ad3/asciidoc3_new.py", line 4250, in asciidoc
[...]

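So we change it (don’t forget the "encoding"!). Note that this is not the same value as the literal b'\xef\xbb\xbf': encoding the three-character string as UTF-8 yields six bytes, not the three bytes of the BOM. The exception disappears either way:

ad3new, counter = re.subn(r"UTF8_BOM = '\\xef\\xbb\\xbf'\s+##_nr_: 4078 _", \
                          r"UTF8_BOM = bytes('\\xef\\xbb\\xbf', encoding = 'utf8')", ad3new)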
assert counter == 1
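
A quick check in a Python3 interpreter shows what the two spellings of the BOM constant actually contain. This is a side note to the brute-force attempt, not a recommended fix; the variable names are invented for the example.

```python
import codecs

as_text_then_encoded = bytes('\xef\xbb\xbf', encoding='utf8')
as_bytes_literal = b'\xef\xbb\xbf'

print(as_text_then_encoded)   # the str characters U+00EF U+00BB U+00BF, each encoded as two bytes
print(as_bytes_literal)       # the three raw bytes of the UTF-8 byte order mark

same_as_bom = (as_bytes_literal == codecs.BOM_UTF8)
```

Both values are bytes, so startswith() no longer raises a TypeError, but only the bytes literal equals codecs.BOM_UTF8 and would match a real BOM at the start of a file.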

Things are getting better(?):

[...] 1
text:  b'#'
type text:  <class 'bytes'>
asciidoc3_new: FAILED: unexpected error:
asciidoc3_new: ------------------------------------------------------------
Traceback (most recent call last):
  File "~/ad3/asciidoc3_new.py", line 4250, in asciidoc
    if not config.load_from_dirs('asciidoc.conf', include=['attributes']):      ##_nr_: 5924 _
  File "~/ad3/asciidoc3_new.py", line 3706, in load_from_dirs
    if self.load_file(f, include=include):                                      ##_nr_: 4736 _
  File "~/ad3/asciidoc3_new.py", line 3592, in load_file
    found = reo.findall(s)                                                      ##_nr_: 4622 _
TypeError: cannot use a string pattern on a bytes-like object
1 Here we find about 50 lines regarding the "prefix" from above, you can ignore it or comment out the appropriate print-statement.

We go on:

ad3new, counter = re.subn(r'##_nr_: 4621 _', \
                          r'\g<0>\n            s = str(s)                  ##_nr_: 4621 _ ', ad3new)
assert counter == 1

It works …

[...] 1
m.reo:  re.compile('(?u)^(?P<name>[\\\\]?\\w(\\w|-)*?)::(?P<target>\\S*?)(\\[(?P<attrlist>.*?)\\])$')
text:  b'#--------------------------------------------------------------------'
type text:  <class 'bytes'>
asciidoc3_new: ERROR: [attributes] missing 'attributelist-pattern' entry
asciidoc3_new: FAILED: unexpected error:
asciidoc3_new: ------------------------------------------------------------
Traceback (most recent call last):
  File "~/ad3/asciidoc3_new.py", line 4267, in asciidoc
    AttributeList.initialize()                                                     ##_nr_: 5940 _
  File "~/ad3/asciidoc3_new.py", line 1638, in initialize
    AttributeList.pattern = document.attributes['attributelist-pattern']           ##_nr_: 1923 _
  File "~/ad3/asciidoc3_new.py", line 143, in __getitem__
    return dict.__getitem__(self, key.lower())                                     ##_nr_: 115 _
KeyError: 'attributelist-pattern'
1 If you haven’t done it before, you see a huge amount of logging output. Perhaps it’s a good idea to comment out the lines "print prefix …" nr: 3813 etc.
We have something new. What does it mean? Looking into "asciidoc.conf" I see:
[...]
attributelist-pattern=(?u)(^\[\[(?P<id>[\w_:][\w_:.-]*)(,(?P<reftext>.*?))?\]\]$)|(^\[(?P<attrlist>.*)\]$)
[...]

"attributelist-pattern" is written there, but asciidoc3_new.py doesn’t find it (?) or can’t read it (?) and throws a KeyError. It’s not easy to fix this "bug" with the possibilities of IDLE and its debug mode. Something like PyDev (pydev.org) or LiClipse (liclipse.com) is more convenient in cases like this. To localize the relevant line(s) of code we have to check the different calls of the function "document.update_attributes()". Why that? Compared with most other software projects we have a big plus: asciidoc2.py runs well with the same input! This way I found out that the "bug" is somewhere between "document.update_attributes()" in nr5919 and "document.update_attributes()" in nr5934. Both are called in "def asciidoc(…)".
We know that, because after nr5919 document.attributes is exactly the same as at the corresponding point in asciidoc2.py (believe me; if you want to verify it yourself, add the appropriate "print()"). But after nr5934 "document.attributes" has changed in asciidoc2.py, while it has not in asciidoc3_new.py. So let’s add a print-function at nr4252 in asciidoc3_new.py:

Adding some more "print"-statements
[...]
assert cursor                                            ##_nr_: 4252 _
    self.nxt.insert(0, cursor)
    for listitem in self.nxt:
        if str(listitem[2]).find('attributelist-pattern') != -1:
            print("self.nxt: ", self.nxt)
[...]

That is done in my "editad3.py" by

ad3new, counter = re.subn(r'##_nr_: 4252 _', \
                          r"""\g<0>
        for listitem in self.nxt:
            if str(listitem[2]).find('attributelist-pattern') != -1:
                print("self.nxt: ", self.nxt)""", ad3new)
assert counter == 1

And here follows the beginning of the (pretty-formatted) output; the rest is omitted:

[...]
self.nxt:
[['~/ad3/asciidoc.conf', 40, b'#quirks='],
 ['~/ad3/asciidoc.conf', 41, b'# HTML source code highlighter (source-highlight, pygments or highlight).'],
 ['~/ad3/asciidoc.conf', 42, b'source-highlighter=source-highlight'],
 ['~/ad3/asciidoc.conf', 43, b'# Uncomment to use deprecated quote attributes.'],
 ['~/ad3/asciidoc.conf', 44, b'#deprecated-quotes='],
 ['~/ad3/asciidoc.conf', 45, b'empty='],
 ['~/ad3/asciidoc.conf', 46, b'sp=" "'],
 ['~/ad3/asciidoc.conf', 47, b'# Attribute and AttributeList element patterns.'],
 ['~/ad3/asciidoc.conf', 48, b'attributeentry-pattern=^:(?P<attrname>\\w[^.]*?)(\\.(?P<attrname2>.*?))?:(\\s+(?P<attrvalue>.*))?$'],
 ['~/ad3/asciidoc.conf', 49, b'attributelist-pattern=(?u)(^\\[\\[(?P<id>[\\w_:][\\w_:.-]*)(,(?P<reftext>.*?))?\\]\\]$)|(^\\[(?P<attrlist>.*)\\]$)']]
[...]

There it is, in the last cited line: "…, 49, b'attributelist-pattern= …" (you may have another number than 49, that depends on your asciidoc.conf).
Yep. asciidoc.conf, and especially "attributelist-pattern", is read in bytes mode. And because "attributelist-pattern" is the first attribute to be looked up (you remember, "AttributeList.pattern = document.attributes['attributelist-pattern']" in nr1923), a KeyError is thrown there. Of course 'attributelist-pattern' is not the same as b'attributelist-pattern'.
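
The three effects seen in this chapter can be reproduced in isolation. The sketch below uses a simplified stand-in pattern and a sample config line taken from the output above; whether the keys really end up as bytes inside asciidoc3_new.py is not verified here, the dictionary only illustrates the comparison.

```python
import re

pattern = re.compile(r'^(?P<name>[\w-]+)=')        # simplified stand-in, not AsciiDoc's pattern
line = b'source-highlighter=source-highlight'      # a config line as read in binary mode

# 1. A str pattern refuses to work on bytes.
try:
    pattern.match(line)
except TypeError as exc:
    message = str(exc)

# 2. str() on bytes does not decode; it returns the repr, prefix and quotes included.
wrong = str(line)

# 3. decode() yields the text that a str pattern and str dict keys expect.
right = line.decode('utf-8')
name = pattern.match(right).group('name')

# A str key never equals a bytes key, hence a lookup by str fails.
attributes = {b'source-highlighter': b'source-highlight'}
found = 'source-highlighter' in attributes

print(message, wrong, name, found, sep='\n')
```

Point 2 is the reason why "text = str(text)" only silences the exception: the pattern then runs on a string that starts with b' instead of on the real content of the line.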

Bytes, strings and Unicode
The biggest problem you may encounter relates to one of the most important changes in Python 3; strings are now always Unicode. This will simplify any application that needs to use Unicode, which is almost any application that is to be used outside of English-speaking countries.

— http://python3porting.com/problems.html

So let’s stop the disappointing "brute-force method" described above and this bytes-vs-string mess: we have to edit asciidoc3_new.py from scratch, using Python3’s approach to read/open/write/close data streams and files. Strings are encoded in 'utf8' unless forced otherwise; source and config files are presumed to be correctly opened using 'utf8', too … and so on.

I comment out all editad3.py-lines since chapter "Brute force, nice try" (see file editad3.py). Running again with F5 you’ll have this output again:

Revisited: First run of asciidoc3.py
asciidoc3_new: FAILED: unexpected error:
asciidoc3_new: ------------------------------------------------------------
Traceback (most recent call last):
[...]
  File "~/ad3/asciidoc3_new.py", line 2811, in match
    mo = m.reo.match(text)                                                  ##_nr_: 3814 _
TypeError: cannot use a string pattern on a bytes-like object
>>>

At this point we have the following files:

.
├── asciidoc2_new.py         1
├── asciidoc2_new.py23again  2
├── asciidoc2.py             3
├── asciidoc3_new.py         4
├── asciidoc3_start.py       5
├── asciidoc.conf
├── asciidoc.py              6
├── inputfiles
│   ├── helloworld.txt
│   ├── inputone.txt
│   └── redsquare.jpg
├── javascripts
│   └── asciidoc.js
├── lang-en.conf
├── mytools
│   ├── addlineno.py
│   ├── editad2.py
│   └── editad3.py
├── my_traceoutput           7
│   └── asciidoc2_new.cover
├── outputfiles
│   ├── helloworld.html
│   └── outputone.html
├── stylesheets
│   └── asciidoc.css
└── xhtml11.conf
1 asciidoc2.py (lineno) = 3 edited by editad2.py
2 file 1 scripted with 2to3
3 the original asciidoc.py with lineno
4 asciidoc3_start.py edited by editad3.py
5 binary identical to 2
6 the original asciidoc.py from asciidoc.org
7 output of my_trace

File "asciidoc2_new.py23again" and directory "my_traceoutput" aren’t needed anymore.

3.5. Summary

What have we done until now and what do we have right here?
We added lineno to the original asciidoc.py, then shortened it with the help of editad2.py. This asciidoc2_new.py still processes inputone.txt correctly to outputone.html. 2to3 converts asciidoc2_new.py to asciidoc3_start.py. This is edited by editad3.py to asciidoc3_new.py. We did some easy editorial refactoring, but the mess regarding str vs. bytes couldn’t be solved by the "brute-force" approach, as you see above.
So we have to try another plan in the next chapter.

3.6. Second attempt: open(file) with encoding='utf-8' (not "b")

As you remember, after giving up the brute-force try, running asciidoc3_new.py gives the exception:

[...]
  File "~/ad3/asciidoc3_new.py", line 2811, in match
    mo = m.reo.match(text)                                              ##_nr_: 3814 _
TypeError: cannot use a string pattern on a bytes-like object
>>>

And now the "real" migration work begins. We have to check and consider all open(file) and write(file) operations.
It seems to be a good idea to change the file-reading and file-writing mode to the default Python3 way, i.e. text mode. That means: in nr4108 I change 'rb' to 'r', and in nr4432 'wb+' to 'w'. The "errors = 'strict'" is not necessary (it is the default), but we make use of it later on. And right here it underlines that we are dealing with the "new" Python3 open().

Opening files in Python3
# Python3 open file, default
ad3new, counter = re.subn(r"self\.f = open\(fname, 'rb'\)\s+##_nr_: 4108 _", \
                          r"self.f = open(fname, 'r', encoding = 'utf-8', errors = 'strict')     ##_nr_: 4108 _", ad3new)
assert counter == 1

ad3new, counter = re.subn(r"self\.f = open\(fname, 'wb\+'\)\s+##_nr_: 4432 _", \
                          r"self.f = open(fname, 'w', encoding = 'utf-8', errors = 'strict')     ##_nr_: 4432 _", ad3new)
assert counter == 1
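
The difference between the two modes can be tried out in isolation. The following is only a minimal sketch and not part of editad3.py; the file name "demo.txt" and the regular expression are made up for illustration. It reproduces the TypeError from above with a file opened in 'rb' mode and shows that the same pattern matches once the file is opened in text mode with an explicit encoding.

```python
import os
import re
import tempfile

# Hypothetical demo file; not part of the AsciiDoc3 sources.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w", encoding="utf-8", errors="strict") as f:
    f.write("= Title\n")

pattern = re.compile(r"^= (?P<title>.+)$")   # a str pattern, as used in asciidoc.py

# Binary mode ('rb') yields bytes: matching a str pattern against them fails.
with open(path, "rb") as f:
    raw_line = f.readline()
try:
    pattern.match(raw_line)
    bytes_error = None
except TypeError as e:
    bytes_error = str(e)

# Text mode ('r') with an explicit encoding yields str: the match works.
with open(path, "r", encoding="utf-8", errors="strict") as f:
    text_line = f.readline().rstrip("\n")
title = pattern.match(text_line).group("title")
```

In this sketch "bytes_error" holds the same message as the traceback above, while "title" is an ordinary Python3 string.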

When we run asciidoc3_new.py with these edits, the following message is written:

asciidoc3_new: FAILED: 'utf-8' codec can't decode

To make it more troublesome :-) no line number is given. But the source is easy to identify as "def char_decode(s)", nr1255 ff., because this is the only place with that message. So I know that "return s.decode(char_encoding())" raises the exception. We are trapped in the "Python2 vs. Python3 encoding/decoding str/bytes mess" again. To bring some light into this darkness, I "scrutinize" (or is it spelled "scrutinise"?) the encoding/decoding functions. Here they are:

Encoding/Decoding functions in asciidoc3.py before refactoring
[...]
def char_encoding():                                                                  ##_nr_: 1223 _
    encoding = document.attributes.get('encoding')                                    ##_nr_: 1224 _
    if encoding:                                                                      ##_nr_: 1225 _
        try:                                                                          ##_nr_: 1226 _
            codecs.lookup(encoding)                                                   ##_nr_: 1227 _
        except LookupError as e:                                                      ##_nr_: 1228 _
            raise EAsciiDoc(str(e))                                                   ##_nr_: 1229 _
    return encoding                                                                   ##_nr_: 1230 _
                                                                                      ##_nr_: 1231 _
def char_len(s):                                                                      ##_nr_: 1232 _
    return len(char_decode(s))                                                        ##_nr_: 1233 _
                                                                                      ##_nr_: 1234 _
east_asian_widths = {'W': 2,   # Wide                                                 ##_nr_: 1235 _
                     'F': 2,   # Full-width (wide)                                    ##_nr_: 1236 _
                     'Na': 1,  # Narrow                                               ##_nr_: 1237 _
                     'H': 1,   # Half-width (narrow)                                  ##_nr_: 1238 _
                     'N': 1,   # Neutral (not East Asian, treated as narrow)          ##_nr_: 1239 _
                     'A': 1}   # Ambiguous (s/b wide in East Asian context,           ##_nr_: 1240 _
                               # narrow otherwise, but that doesn't work)             ##_nr_: 1241 _
"""Mapping of result codes from `unicodedata.east_asian_width()` to character         ##_nr_: 1242 _
column widths."""                                                                     ##_nr_: 1243 _
                                                                                      ##_nr_: 1244 _
def column_width(s):                                                                  ##_nr_: 1245 _
    text = char_decode(s)                                                             ##_nr_: 1246 _
    if isinstance(text, str):                                                         ##_nr_: 1247 _
        width = 0                                                                     ##_nr_: 1248 _
        for c in text:                                                                ##_nr_: 1249 _
            width += east_asian_widths[unicodedata.east_asian_width(c)]               ##_nr_: 1250 _
        return width                                                                  ##_nr_: 1251 _
    else:                                                                             ##_nr_: 1252 _
        return len(text)                                                              ##_nr_: 1253 _
                                                                                      ##_nr_: 1254 _
def char_decode(s):                                                                   ##_nr_: 1255 _
    if char_encoding():                                                               ##_nr_: 1256 _
        try:                                                                          ##_nr_: 1257 _
            return s.decode(char_encoding())                                          ##_nr_: 1258 _
        except Exception:                                                             ##_nr_: 1259 _
            raise EAsciiDoc("'%s' codec can't decode \"%s\"" % (char_encoding(), s))  ##_nr_: 1261 _
    else:                                                                             ##_nr_: 1262 _
        return s                                                                      ##_nr_: 1263 _
                                                                                      ##_nr_: 1264 _
def char_encode(s):                                                                   ##_nr_: 1265 _
    if char_encoding():                                                               ##_nr_: 1266 _
        return s.encode(char_encoding())                                              ##_nr_: 1267 _
    else:                                                                             ##_nr_: 1268 _
        return s                                                                      ##_nr_: 1269 _
[...]

3.6.1. def char_encoding()

Let’s start with "def char_encoding()": First the function looks up the key "encoding" in the dictionary "document.attributes". By default this value is given in asciidoc.conf, section [attributes], as "UTF-8". So encoding is set to "UTF-8". In a second step "def char_encoding()" tries to find "encoding" in the list of Python3 codecs (see here: ascii, cp1252, iso8859_7, utf_7, utf_8 and many more …). In our case "UTF-8" is no problem, and at last "UTF-8" is returned as the result of the function. If the codec is not found, e.g. when misspelled as in "UTD-8", codecs.lookup() raises a LookupError, which is re-raised as EAsciiDoc.
When no attribute "encoding" is found in any conf-file, or it is set to None or '', the "if encoding" yields "False" and "def char_encoding()" returns None or '' respectively.
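
The codec lookup itself can be checked in a few lines. This is just a sketch; the helper name "is_known_codec" is my own and does not exist in asciidoc.py.

```python
import codecs

def is_known_codec(name):
    """Return True if Python knows a codec called `name` (hypothetical helper)."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False

# The lookup is case-insensitive and reports the canonical codec name.
canonical = codecs.lookup("UTF-8").name

known = is_known_codec("UTF-8")       # the default from asciidoc.conf
misspelled = is_known_codec("UTD-8")  # the misspelled example from the text
```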

3.6.2. def char_decode(s) / def char_encode(s)

I skip "def char_len(s)", "east_asian_widths", and "def column_width(s)" - they aren’t necessary to solve the current "codec can’t decode" error; but I’ll come back to this point later.
What about "def char_decode(s)" and "def char_encode(s)"? These two definitions look very similar! The differences: the first has a try/except statement and "return s.decode", the second has "return s.encode"; both use "char_encoding()" (which, as seen above, "normally" defaults to "UTF-8") and share the same program logic.

First, a closer look at "def char_decode(s)": What is "s"? To make it more visible, I add two print(), …

char_decode(s) before refactoring
def char_decode(s):
    print("s: ", s, type(s)) ##_nr_: 1255 _
    if s == '': print("s == ''")
    if char_encoding():
[...]

... the answer is: "s:  <class 'str'>" followed by "s == ''", i.e. "s" is an empty string.
To understand the encoding/decoding procedures in this part of the code, I do the same with asciidoc2_new.py; the output shows for example: "s: '', 'First Paragraph', <type 'str'>". First, and perhaps surprisingly: asciidoc2_new.py also has "s" as a string at this point. "def char_decode(s)" is called in line nr1725: initials = (char_decode(firstname)[:1] … with firstname = attrs.get('firstname', '') from nr1710. This takes place within "def process_author_names()", but in "inputone.txt" we haven’t given any author name or firstname at all.
So, since the default in attrs.get - the second argument - is '', "s" is '' as well. And in asciidoc2.py, a Python2 program, "s" is a Python2 string: normal strings in Python2 are byte strings (8-bit). In "def char_decode(s)" this Python2 string is decoded to a Unicode string, using UTF-8.
But this doesn’t work in asciidoc3_new.py: Python3 strings do not have a method called "decode()" - strings are Unicode strings by default. The call s.decode() therefore raises an AttributeError, which the broad "except Exception" catches and turns into the "codec can’t decode" message. That is the reason for the error here! In other words: asciidoc3_new.py needs no "decoding" at this place; a Python3 string is always already "decoded". The "encoding" is already given in "open(fname, encoding='utf-8')". The open() function does the "decoding" job, if there is any.
Second, "def char_encode(s)". That’s easy to see now that we have "solved" def char_decode(s): Python3 turns a string into bytes with the given encoding, in most cases utf-8.
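
To see this behaviour without running the whole program, here is a small standalone sketch. The class "EAsciiDoc" is only a stand-in for the real exception class, and "char_decode_py2_style" is a made-up name for a function with the same structure as the 2to3 output.

```python
class EAsciiDoc(Exception):
    """Stand-in for the exception class of asciidoc.py (assumption)."""

def char_decode_py2_style(s, encoding="utf-8"):
    # Same structure as the 2to3 output: the broad "except Exception"
    # also swallows the AttributeError raised by str.decode in Python 3.
    try:
        return s.decode(encoding)
    except Exception:
        raise EAsciiDoc("'%s' codec can't decode \"%s\"" % (encoding, s))

has_decode = hasattr("", "decode")                # str objects have no decode()
decoded = char_decode_py2_style(b"caf\xc3\xa9")   # bytes still decode fine

try:
    char_decode_py2_style("")                     # the empty firstname default
    message = None
except EAsciiDoc as e:
    message = str(e)
```

The message produced for the empty string starts exactly like the one printed by asciidoc3_new.py, although no codec was ever involved: the real cause is the missing method.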
After this we can shorten "def char_decode(s)": not only do we need no "decoding", there isn’t any to do:

char_decode(s) after refactoring
def char_decode(s):
    return s                                     ##_nr_: 1255 _

And because "char_decode(s)" now simply returns "s" in asciidoc3.py, the function is superfluous and may be omitted altogether … Let’s do it.

erasing char_decode(s)
# shorten / erase def char_decode(s)
ad3new, counter = re.subn(r'def char_decode\(s\):.*?1263 _', \
                          r"""
# def char_decode(s): return s # no longer needed in asciidoc3!
""", ad3new, flags=re.S)
assert counter == 1
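
All edits in editad3.py follow this pattern: re.subn() returns the new text together with the number of replacements, and the assert stops the script if the pattern did not match exactly once. A toy version, with a three-line "source" invented for this example, may make the roles of re.S and the non-greedy ".*?" clearer:

```python
import re

# Toy source text with line-number markers in the style used in this diary.
source = (
    "def char_decode(s):          ##_nr_: 1255 _\n"
    "    return s                 ##_nr_: 1263 _\n"
    "def char_encode(s):          ##_nr_: 1265 _\n"
)

# re.S lets "." match newlines; "*?" is non-greedy and stops at the first "1263 _".
edited, counter = re.subn(r"def char_decode\(s\):.*?1263 _",
                          "# def char_decode(s): return s  # no longer needed",
                          source, flags=re.S)
assert counter == 1   # exactly one replacement, otherwise the pattern is wrong
```

Without flags=re.S the pattern would not span the line break and "counter" would be 0, so the assert would fail immediately.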

The second step is to eliminate every "char_decode()" call:

Erasing the 4 char_decode()-calls
# eliminate char_decode(), i
ad3new, counter = re.subn(r'return len\(char_decode\(s\)\)', r'return len(s)             ', ad3new)
assert counter == 1

# eliminate char_decode(), ii
ad3new, counter = re.subn(r'text = char_decode\(s\)', r'text = s             ', ad3new)
assert counter == 1

# eliminate char_decode(), iii
ad3new, counter = re.subn(r"initials = \(char.*?##_nr_: 1726 _", \
                          r'initials = (firstname[:1] + middlename[:1] + lastname[:1])', ad3new, flags=re.S)
assert counter == 1

# eliminate char_decode(), iv
ad3new, counter = re.subn(r"char_decode\(title\)\)\.strip\('_'\)\.lower\(\)                    ##_nr_: 2244 _", \
                          r"title).strip('_').lower()                                 ##_nr_: 2244 _ ", ad3new)
assert counter == 1

After doing "eliminate char_decode(), i" we see:

def char_len(s):                                    ##_nr_: 1232 _
    return len(s)                                   ##_nr_: 1233 _

Yeah, the same game again: def char_len(s) is exactly the same as len(s). We can eliminate this, and the calls, too.

# shorten / erase def char_len(s)
ad3new, counter = re.subn(r'def char_len\(s\):.*?1233 _', \
                          r"""
# def char_len(s): return len(s) # no longer needed in asciidoc3!
""", ad3new, flags=re.S)
assert counter == 1

# eliminate char_len(), i
ad3new, counter = re.subn(r'char_len\(ul\)\s+##_nr_: 2072 _', \
                          r'len(ul)                                                         ##_nr_: 2072 _', ad3new)
assert counter == 1

# eliminate char_len(), ii
ad3new, counter = re.subn(r'char_len\(title\) < ul_len\+3\)\):\s+##_nr_: 2079 _', \
                          r'len(title) < ul_len+3)):                                        ##_nr_: 2079 _', ad3new)
assert counter == 1

And, believe it or not, a third one follows. We see

def column_width(s):                                                                        ##_nr_: 1245 _
    text = s
[...]

"text" is the same as "s", so we can do a refactoring and eliminate "text":

# refactoring of def column_width(s)
ad3new, counter = re.subn(r'def column_width.*?1253 _', \
                          r"""def column_width(s):
    if isinstance(s, str):                                                                   ##_nr_: 1247 _
        width = 0                                                                            ##_nr_: 1248 _
        for c in s:                                                                          ##_nr_: 1249 _
            width += east_asian_widths[unicodedata.east_asian_width(c)]                      ##_nr_: 1250 _
        return width                                                                         ##_nr_: 1251 _
    else:                                                                                    ##_nr_: 1252 _
        return len(s)                                                                        ##_nr_: 1253 _""", ad3new, flags=re.S)
assert counter == 1

Line nr1247 brings us to the next question: Is it possible, that "s" is not a string? Only one single line (nr2071) calls "column_width()":

title_len = column_width(title)                           ##_nr_: 2071 _

and "title" is obviously a string. So I do a second refactoring of def column_width(s); the "assert" quietens any remaining doubts:

Refactoring of def column_width(s)
# second refactoring of def column_width(s)
ad3new, counter = re.subn(r'def column_width.*?1254 _', \
                          r"""def column_width(s):
    assert type(s) == type('string')  # this line is to be deleted later
    width = 0                                                                                ##_nr_: 1248 _
    for c in s:                                                                              ##_nr_: 1249 _
        width += east_asian_widths[unicodedata.east_asian_width(c)]                          ##_nr_: 1250 _
    return width                                                                             ##_nr_: 1251 _""", ad3new, flags=re.S)
assert counter == 1
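
What the refactored function computes can be tried out separately. The sketch below uses the same mapping as asciidoc.py but is written as a standalone function (without the temporary assert); the sample strings are my own.

```python
import unicodedata

# Same mapping as in asciidoc.py: wide and full-width characters take two columns.
east_asian_widths = {"W": 2, "F": 2, "Na": 1, "H": 1, "N": 1, "A": 1}

def column_width(s):
    """Number of display columns needed for the str `s`."""
    return sum(east_asian_widths[unicodedata.east_asian_width(c)] for c in s)

ascii_width = column_width("Title")   # five narrow characters
cjk_width = column_width("日本語")     # three wide characters
```

For plain ASCII titles column_width(s) and len(s) agree; they differ only for wide characters, which matters when a title is compared with the length of its underline.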

We go to the next question: What is "def char_encode(s)" (nr1265) good for?

def char_encode(s):                                                                          ##_nr_: 1265 _
    if char_encoding():                                                                      ##_nr_: 1266 _
        return s.encode(char_encoding())                                                     ##_nr_: 1267 _
    else:                                                                                    ##_nr_: 1268 _
        return s                                                                             ##_nr_: 1269 _

Let’s find out who is calling char_encode(s). There are three source lines: nr1281, nr1727, and nr2249. To start with nr1727:

initials = char_encode(initials).upper()                                                     ##_nr_: 1727 _

"initials" is introduced in line nr1714, but the lines that matter here are nr1724/5 …

if not initials:                                                                             ##_nr_: 1724 _
    initials = (firstname[:1] + middlename[:1] + lastname[:1])

... and so "initials" (in Python3) is a concatenation of three strings into one string; no encoding is necessary here either. I erase this "char_encode()", but leave an "assert" like above.

# eliminate char_encode(), i
ad3new, counter = re.subn(r'initials = char_encode\(.*?##_nr_: 1727 _', \
                          r"""assert type(initials) == type('string')  # this line is to be deleted later
            initials = initials.upper()                                                             ##_nr_: 1727 _""", ad3new)
assert counter == 1

The second call of "char_encode()" is found at nr2249:

base_id = char_encode(base_id)                                          ##_nr_: 2249 _

To make it short, we have the same procedure as above:

# eliminate char_encode(), ii
ad3new, counter = re.subn(r'base_id = char_encode\(.*?##_nr_: 2249 _', \
                          r"assert type(base_id) == type('string')  # this line is to be deleted later", ad3new)
assert counter == 1

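There is another "encode" to deal with in nr2248. Keep in mind that "base_id" is a string, while .encode('ascii', 'ignore') returns bytes in Python3; the result therefore has to be converted back to a string: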

# additional to char_encode(), ii
ad3new, counter = re.subn(r"base_id = unicodedata\.normalize.*?2248 _", \
      """base_id = unicodedata.normalize('NFKD', base_id).encode('ascii', 'ignore')
            base_id = str(base_id, encoding = 'ascii')""", ad3new, flags=re.S)
assert counter == 1

We’ll have to test this later, when we have a title with non-ascii letters and "ascii-ids" in document.attributes.
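
Until then, a small standalone sketch shows what these two lines do with non-ascii letters. The function name "ascii_id" and the sample titles are made up for this example.

```python
import unicodedata

def ascii_id(title):
    """Reduce `title` to plain ASCII (sketch of the two replaced lines)."""
    # NFKD splits "Ü" into "U" plus a combining diaeresis ...
    decomposed = unicodedata.normalize("NFKD", title)
    # ... encode() drops everything that is not ASCII and returns bytes ...
    ascii_bytes = decomposed.encode("ascii", "ignore")
    # ... and str() turns the bytes back into a Python 3 string.
    return str(ascii_bytes, encoding="ascii")

umlaut = ascii_id("Überschrift")
accent = ascii_id("Café")
```

Characters without a decomposition into an ASCII base letter are dropped entirely by the 'ignore' handler.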

The third "char_encode()" comes in line nr1281:

result = char_encode(result.decode(locale.getdefaultlocale()[1]))                           ##_nr_: 1281 _

This looks a bit complicated. On my system "import locale; locale.getdefaultlocale()" evaluates to "('de_DE', 'UTF-8')" and so "locale.getdefaultlocale()[1]" = 'UTF-8'.
What is "result" in this line? It is a string … So in asciidoc3_new "result.decode(locale.getdefaultlocale()[1])" always raises an error, since a string has no method "decode()"! The except-branch is taken and "result" is not altered. I fix this:

# eliminate char_encode(), iii
ad3new, counter = re.subn(r'try:\s+##_nr_: 1280 _.*?##_nr_: 1284 _', \
                          r"""try:
        assert type(result) == type('string')  # this line is to be deleted later
        result = bytes(result, 'utf8') # assumes, that 'result' is a 'utf-8' string --> TODO
        result = result.decode(locale.getdefaultlocale()[1])
    except Exception:
        pass
    return result""", ad3new, flags=re.S)
assert counter == 1

The TODO means just that: I have to look at this point later again …
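
As a note for that later look: the round trip str → bytes → str can be tested on its own. The sketch below is not the code of asciidoc3_new.py. It swaps locale.getdefaultlocale()[1] for locale.getpreferredencoding(False), because getdefaultlocale() is deprecated since Python 3.11; the function name "to_locale_text" is made up.

```python
import locale

def to_locale_text(result):
    """Round-trip a str through bytes, like the patched lines (sketch only)."""
    try:
        raw = bytes(result, "utf8")                  # str -> bytes, assumes utf-8
        # Swapped-in call: preferred encoding instead of getdefaultlocale()[1].
        return raw.decode(locale.getpreferredencoding(False))   # bytes -> str
    except Exception:
        return result                                # keep the input unchanged

plain = to_locale_text("abc")
```

On a system with a utf-8 locale this round trip returns every string unchanged; differences can only show up with other locale encodings.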
We do not need "def char_encode()" any more.

# shorten / erase def char_encode(s)
ad3new, counter = re.subn(r'##_nr_: 1264 _.*?1270 _', \
                          r"""
# def char_encode(s) ... # no longer needed in asciidoc3!
""", ad3new, flags=re.S)
assert counter == 1

Now we do some import optimization:

# import optimization, i
ad3new, counter = re.subn(r'import locale', r'from locale import getdefaultlocale', ad3new)
assert counter == 1

# import optimization, ii
ad3new, counter = re.subn(r'locale\.getdefaultlocale', r'getdefaultlocale', ad3new)
assert counter == 1

# import optimization, iii
ad3new, counter = re.subn(r'import unicodedata\s+##_nr_: 2247 _', r'#import unicodedata   # is already imported', ad3new)
assert counter == 1

# import optimization, iv
ad3new, counter = re.subn(r'import unicodedata', r'from unicodedata import east_asian_width, normalize', ad3new, 1)
assert counter == 1

# import optimization, v
ad3new, counter = re.subn(r'unicodedata\.east_asian_width\(c\)', r'east_asian_width(c)', ad3new)
assert counter == 1

# import optimization, vi
ad3new, counter = re.subn(r'unicodedata\.normalize\(', r'normalize(', ad3new)
assert counter == 1

3.7. Reaching the first Milestone!

After this encoding-work and the non-mandatory import-rearrangement I run asciidoc3_new.py again.

YES, it works! The output "outputone.html" is binary identical to the output produced by asciidoc2_new.py! We have reached the first milestone and ported asciidoc.py to Python3 - unfortunately so far only for processing the "small" inputone.txt. But nevertheless this is good news!
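
One way to check "binary identical" from Python is filecmp.cmp() with shallow=False. In this sketch two throwaway files stand in for the two outputs; the file names and the content are invented for the example.

```python
import filecmp
import os
import tempfile

# Two throwaway files stand in for the outputs of asciidoc2_new.py and
# asciidoc3_new.py; names and content are made up for this example.
workdir = tempfile.mkdtemp()
out_v2 = os.path.join(workdir, "outputone_v2.html")
out_v3 = os.path.join(workdir, "outputone_v3.html")
for name in (out_v2, out_v3):
    with open(name, "w", encoding="utf-8") as f:
        f.write("<html><body><p>First Paragraph</p></body></html>\n")

# shallow=False forces a byte-by-byte comparison instead of trusting os.stat().
identical = filecmp.cmp(out_v2, out_v3, shallow=False)
```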

3.8. Starting with the unabridged asciidoc3.py

As I stressed, there is (a lot of) work still to do. The next steps are as follows: run 2to3 on an unabridged asciidoc2_new.py, rename the result to asciidoc3_start.py, do some "pretty printing" and a first check on it. Hopefully, this "asciidoc3_new.py" handles inputone.txt, too.

First I run editad2.py after commenting out the "temporarily erasing" parts. The second step is to convert this asciidoc2_new.py via 2to3.

2to3 -v -w -f all -f buffer -f set_literal -f idioms -f ws_comma -n --add-suffix=23again ~/ad3/asciidoc2_new.py

File "asciidoc2_new.py23again" is renamed to "asciidoc3_start.py", and then editad3.py is run (the same editad3.py as before; this part ends at #@#). And now let’s run "asciidoc3_new.py": It works! We see the identical "outputone.html" as before. asciidoc3_new.py now has more code lines, of course. First we do some pretty printing and import rearrangement.

Pretty printing on the unabridged asciidoc3.py
#@# from now on we have to edit the "full" asciidoc3_start.py
# doing some "pretty printing"
ad3new, counter = re.subn(r'##_nr_: 12 _', r'  \g<0>', ad3new)
assert counter == 1
ad3new, counter = re.subn(r' ##_nr_: 509 _', r'', ad3new)
assert counter == 1
ad3new, counter = re.subn(r'##_nr_: 1250 _', r'            \g<0>', ad3new)
assert counter == 1
ad3new, counter = re.subn(r'##_nr_: 3281 _', r'', ad3new)
assert counter == 1
ad3new, counter = re.subn(r'##_nr_: 3283 _', r'', ad3new)
assert counter == 1
ad3new, counter = re.subn(r'# Read ahead buffer containing ', r' # Read ahead buffer containing', ad3new)
assert counter == 1
ad3new, counter = re.subn(r'\s+##_nr_: 3280 _', r'', ad3new)
assert counter == 1
ad3new, counter = re.subn(r'\s+##_nr_: 3282 _', r'', ad3new)
assert counter == 1
ad3new, counter = re.subn(r'\s+##_nr_: 3284 _', r'', ad3new)
assert counter == 1
ad3new, counter = re.subn(r'\s+##_nr_: 4332 _', r'', ad3new)
assert counter == 1
ad3new, counter = re.subn(r'\s+##_nr_: 4334 _', r'', ad3new)
assert counter == 1
ad3new, counter = re.subn(r'\s+##_nr_: 4351 _', r'', ad3new)
assert counter == 1
ad3new, counter = re.subn(r'\s+##_nr_: 4353 _', r'', ad3new)
assert counter == 1
ad3new, counter = re.subn(r'\s+##_nr_: 4682 _', r'', ad3new)
assert counter == 1
ad3new, counter = re.subn(r'\s+##_nr_: 4684 _', r'', ad3new)
assert counter == 1
ad3new, counter = re.subn(r'\s+##_nr_: 6003 _', r'', ad3new)
assert counter == 1
ad3new, counter = re.subn(r'\s+##_nr_: 6005 _', r'', ad3new)
assert counter == 1

# import optimization, a1
ad3new, counter = re.subn(r'import codecs', r'from codecs import lookup', ad3new)
assert counter == 1
# import optimization, a2
ad3new, counter = re.subn(r'codecs\.lookup\(encoding\)', r'lookup(encoding)       ', ad3new)
assert counter == 1

# import optimization, b1
ad3new, counter = re.subn(r'import copy', r'from copy import copy as ccopy, deepcopy as dcopy', ad3new)
assert counter == 1
# import optimization, b2
ad3new, counter = re.subn(r'copy\.copy\(self\)', r'ccopy(self)    ', ad3new)
assert counter == 1

# import optimization, c1
ad3new, counter = re.subn(r'import tempfile', r'from tempfile import mkstemp', ad3new)
assert counter == 1
# import optimization, c2
ad3new, counter = re.subn(r'tempfile\.mkstemp\(\)', r'mkstemp()         ', ad3new)
assert counter == 1

3.8.1. Do not forget: careful with that "/" …

What about the temporarily "dimmed" "/" divisions in lines 3239, 3322, 3339, and 3340? Let’s see:
3239 → "abswidth = float(v[:-1])/100 * config.pagewidth". "float" ensures that no modification is necessary.
3322 → "width = float(100 - percents)/float(len(self.columns) - n)". Just the same.
3339 → "col.pcwidth = (float(col.width)/props)*100". And here, too.
3340 → "col.abswidth = self.abswidth * (col.pcwidth/100)". To adjust this:

Python3 integer division, part ii
# We have to be careful with porting the "/" division from Python2 --> "//" in Python3, part ii
ad3new, counter = re.subn(r'\(col\.pcwidth/100\) ', r'(col.pcwidth//100)', ad3new)
assert counter == 1

(Be careful! Later during the migration I realized that this division should have been left unchanged: col.pcwidth is a float (see line 3339), so "/" was already a true division in Python2. Shit happens. See here.)
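
Why "//" is the wrong choice here can be seen with two numbers. The values below are invented; only the fact that the percentage is a float comes from the code above.

```python
pcwidth = 25.0     # a percentage stored as float, like col.pcwidth
abswidth = 120     # hypothetical total width

true_div = abswidth * (pcwidth / 100)     # 120 * 0.25
floor_div = abswidth * (pcwidth // 100)   # 120 * 0.0, the column collapses

int_py3 = 7 / 2      # Python 3: true division, even for two ints
int_floor = 7 // 2   # the result Python 2 gave for 7 / 2
```

So "//" is only the right translation where Python2 divided two integers; with a float operand "/" behaves the same in both versions.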

3.8.2. We need more information: verbose option

To see a little more information during the processing of the input, I add the "verbose"-option.

# set option 'verbose'
ad3new, counter = re.subn(r'\= \["\-o",', r'= ["-v", "-o",', ad3new, 1)
assert counter == 1

All right: asciidoc3_new.py processes "inputone.txt" as requested and expected; the output "outputone.html" is ok.

4. AsciiDoc3: here comes the sticking point

asciidoc3_new.py is working, but so far we have verified only a very small part of the features of AsciiDoc(3). For instance, essential features like tables, lists etc. do not yet occur in our input files, and neither do manpages, docbook as an output, or the a2x toolchain …
What is to do next?
And secondly - you have spotted this - I made assumptions that have to be discussed now. You remember, we had "self.f = open(fname, 'r', encoding = 'utf-8', errors = 'strict')" in nr4108 and "self.f = open(fname, 'w', encoding = 'utf-8', errors = 'strict')" in nr4432. Or, in nr1280, we have "assert type(result) == type('string') # this line is to be deleted later … result = bytes(result, 'utf8') # assumes, that 'result' is a 'utf-8' string → TODO".
What happens when the input is not utf-8 encoded, or when an output encoding other than utf-8 is required?
Furthermore, the function "def char_encoding()" nr1223 has fallen into disuse due to these "improvements": there is no longer any reference to "def char_encoding()" at all.

I encounter the following question, which is in my opinion the "sticking point" of all the porting labour:
AsciiDoc3 has to deal with different encodings of input and output files as a consequence of Python3’s redesigned strings. The self-evident default is 'utf-8', which is strongly recommended for both input and output. But the user can (of course) choose another "input-encoding" as well as another "output-encoding"; in addition, these two may differ. And the handling of encoding errors has to be set: strict, ignore, replace? (By the way: users should know what they are doing when setting anything other than the default, strict; change this only for debugging.) Another situation to take care of occurs when stdin is the infile and/or stdout is the outfile. There is also the problem of API compatibility regarding these points. And at last (?) we have to deal with BOM or not-BOM (that’s the question) …
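
The three topics named here (different input/output encodings, error handlers, BOM) can each be shown with plain Python3 calls. This is only an illustration with invented file names and sample text, not the way AsciiDoc3 handles it.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "input_latin1.txt")   # hypothetical file names
dst = os.path.join(workdir, "output_utf8.txt")

# Input and output encoding may differ: read latin-1, write utf-8.
with open(src, "w", encoding="latin-1") as f:
    f.write("Grüße\n")
with open(src, "r", encoding="latin-1", errors="strict") as f:
    text = f.read()
with open(dst, "w", encoding="utf-8", errors="strict") as f:
    f.write(text)

# The error handler decides what happens with undecodable bytes.
raw = "Grüße".encode("latin-1")
try:
    raw.decode("utf-8", errors="strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
replaced = raw.decode("utf-8", errors="replace")   # inserts U+FFFD markers
ignored = raw.decode("utf-8", errors="ignore")     # silently drops the bytes

# BOM: 'utf-8-sig' strips a leading BOM when decoding, plain 'utf-8' keeps it.
with_bom = b"\xef\xbb\xbfTitle"
kept = with_bom.decode("utf-8")
stripped = with_bom.decode("utf-8-sig")
```

With 'ignore' the two umlaut bytes simply vanish, which is exactly why anything other than 'strict' should be used only for debugging.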

4.1. New Class Ad3Codec

To manage this bundle of difficulties, I decided to introduce one new class with two subclasses. The subclasses handle questions of input and output encoding, respectively. Both are derived from one base class which is not instantiated itself. The skeleton of this construction looks like this:

Draft for new classes in AsciiDoc3
class Ad3Codec():
    """New in AsciiDoc3. Class Ad3Codec is not used directly,
       but as a base class to Ad3In/Ad3Out."""

    def char_encoding(self):
        """the same(more or less) as in Asciidoc2"""
        [...]


class Ad3In(Ad3Codec):
    """Exactly one instance in AsciiDoc3.
       Handles codec of input-encoding ... """

    def [...]:
        pass


class Ad3Out(Ad3Codec):
    """Exactly one instance in AsciiDoc3.
       Handles codec of output-encoding ... """

    def [...]:
        pass

As I can say, the following lines of code cost me several hours to assemble. Probably the easiest way to introduce them and the ideas behind them is to give the entire piece and then try to explain it. Here we go - of course the line numbers are not part of the source code (try "$ nl sourcecode.py"), and neither is the whitespace at the beginning of every line.

The new AsciiDoc3-classes; source code
1       # def char_len(s): return len(s)            # no longer needed in AsciiDoc3
2       # def char_decode(s): return s              # no longer needed in AsciiDoc3
3       # def char_encode(s) ...                    # no longer needed in AsciiDoc3

4       class Ad3Codec():
5           """New in AsciiDoc3. Class Ad3Codec is not used directly, but provides
6              static methods and serves as a base class to Ad3In/Ad3Out.
7              Python3's muddle concerning encoding regarding the users input, stdin/out,
8              output, and command line options happens here."""

9           ad3codec_counter = 0

10          def __init__(self):
11              Ad3Codec.ad3codec_counter += 1
12              if Ad3Codec.ad3codec_counter > 2:
13                  raise EAsciiDoc('Class Ad3Codec(): too many instances')

14          def check_encoding(self, encoding_name):
15              """Checks if 'encoding_name' is a valid Python codec.
16              If not, an exception is thrown and the program exits.
17              Replaces 'char_encoding()' in AsciiDoc2."""
18              try:
19                  lookup(encoding_name.lower())
20                  return
21              except LookupError as e:
22                  raise EAsciiDoc(str(e))

23          def check_errors(self, error_name):
24              """Checks if 'error_name' is valid. If not, an exception is thrown
25                 and the program exits."""
26              if error_name.rstrip().lower() not in ('strict', 'ignore', 'replace'):
27                  raise EAsciiDoc("invalid error handler: '%s'" % error_name)  # 'e' was undefined here
28              return

29          def block_updating(self):
30              """blocks immediately any further altering of
31              _ie/_oe/_ierr/_oerr, respectively."""
32              raise NotImplementedError
33
34          @staticmethod
35          def update_encoding(fname, attrs, cmd_attrs):
36              """bundles the updating functions for _ie/_oe/_ierr/_oerr."""
37              if ad3in and ad3out:
38                  ad3in.update_input_encoding(fname, attrs, cmd_attrs)
39                  ad3in.update_input_errors(fname, attrs, cmd_attrs)
40                  ad3out.update_output_encoding(fname, attrs, cmd_attrs)
41                  ad3out.update_output_errors(fname, attrs, cmd_attrs)
42              else:
43                  raise NotImplementedError


44      class Ad3In(Ad3Codec):
45          """Exactly one instance in AsciiDoc3, handles the attributes 'input-encoding'
46             and 'input-errors' and their corresponding 'private' variables
47             '_ie' and '_ierr'."""

48          ad3in_counter = 0

49          def __init__(self):
50              super().__init__()
51              Ad3In.ad3in_counter += 1
52              if Ad3In.ad3in_counter != 1:
53                  raise EAsciiDoc('Class Ad3In: Only one instance allowed')
54              self._ie = 'utf-8'
55              self._ierr = 'strict'
56              self.update_ie_allowed = True
57              self.update_ierr_allowed = True
58
59          def update_input_encoding(self, fname, conf_attrs, cmd_attrs):
60              """Input-encoding '_ie' is updated and then fixed, find this point in time at approx. line 4860 of the source code.
61                 (Search for "update_attrs(self.conf_attrs, d)" and add three lines downwards). The user can not set _ie directly:
62                 she (or he) sets the attribute "input-encoding" in command line (ranks first) or conf-files. Just before the output
63                 file is opened, '_ie' is fixed and inalterable later on. This implies, _ie is not affected by ':input-encoding: xyz'
64                 somewhere in the input-file, such a statement is ignored. "input-encoding" has to be set in the command-line or *.conf,
65                 '_ie' is initialized as 'utf-8'."""
66              if self.update_ie_allowed:
67                  #message.verbose('...')
68                  message.verbose('Entering "update_input-encoding()" with file: {filename}'.format(filename=fname))
69                  # 'stdin' here seems quite impossible - this if-branch is added for 'security reasons'.
70                  if fname == 'stdin'.lower():
71                      """Stdin found, no more encoding updates any more! No change is made to _ie."""
72                      self.block_updating()
73                      message.verbose('stdin as input-file found, encodings are fixed: {filename}'.format(filename=fname))
74                  elif fname.rstrip().endswith(".conf"):             # is this necessary? TODO
75                      if cmd_attrs and ('input-encoding' in cmd_attrs):              # _ie as set in command-line, overrides conf-files
76                          self._ie = cmd_attrs['input-encoding']
77                          self.update_ie_allowed = False
78                          message.verbose('input-encoding is fixed as found in command-line: {iencoding}'.format(iencoding=self._ie))
79                      elif conf_attrs and ('input-encoding' in conf_attrs):          # _ie as set in conf-files by user
80                          self._ie = conf_attrs['input-encoding']                    # may be None or missing
81                          message.verbose('input-encoding updated as in {filename}: {iencoding}'.format(filename=fname, iencoding=self._ie))
82                      else: pass
83                  else:
84                      """fname is not == stdin and does not end with '.conf': there is nothing to do."""
85              else:
86                  # self.update_ie_allowed == False
87                  message.verbose('altering input-encoding is no longer allowed')

88              # End of updating/fixing. Validate the current input-encoding:
89              self.check_encoding(self._ie)

90          def get_input_encoding(self, fname):
91              """get the input-encoding: conf/css/py are utf-8, 'stdin' is possibly 'special',
92              asciidoc3-input-file is opened with _ie. 'stdin' and 'input-file' block any
93              further altering of '_ie'."""
94              if fname.rstrip().lower().endswith('.conf'):
95                  return 'utf-8'
96              elif fname.rstrip().lower().endswith('.css'):
97                  return 'utf-8'
98              elif fname.rstrip().lower().endswith('.py'):
99                  return 'utf-8'
100             elif fname.lower() == 'stdin':
101                 self.block_updating()
102                 if document.attributes['infile'] == 'stdin':
103                     return self._ie
104                 else:
105                     try:
106                         return getdefaultlocale()[1]
107                     except Exception:
108                         return 'utf-8'
109             # AsciiDoc3 opens 'infile' and/or a file not ending with conf/css/py
110             elif fname == document.attributes['infile'] or  \
111                     not (fname.rstrip().lower().endswith('.conf') or \
112                          fname.rstrip().lower().endswith('.css')  or \
113                          fname.rstrip().lower().endswith('.py')):
114                 self.block_updating()
115                 return self._ie
116             else:
117                 raise EAsciiDoc('AsciiDoc3 cannot open a file that is not ending with conf/css/py or "stdin" or infile')

118         def update_input_errors(self, fname, conf_attrs, cmd_attrs):
119             """Input-errors is updated and fixed undergoing the same procedures as input-encoding.
120                _ierr has to be set in *.conf or command-line, _ierr is initialized as 'strict'."""
121             if self.update_ierr_allowed:
122                 message.verbose('Entering "update/fix input-errors" with input-file: {filename}'.format(filename=fname))
123                 # 'stdin' here seems quite impossible - this if-branch is added for 'security reasons'.
124                 if fname == 'stdin'.lower():
125                     """stdin found, no more encoding updates any more! No change is made to _ierr."""
126                     self.block_updating()
127                     message.verbose('stdin found, encodings are fixed: {filename}'.format(filename=fname))
128                 elif fname.rstrip().endswith(".conf"):
129                     if cmd_attrs and 'input-errors' in cmd_attrs:
130                         self._ierr = cmd_attrs['input-errors']
131                         self.update_ierr_allowed = False
132                         message.verbose('input-errors updated as in command-line: {iencodingerrors}'.format(iencodingerrors=self._ierr))
133                     elif conf_attrs and 'input-errors' in conf_attrs:
134                         self._ierr = conf_attrs['input-errors']
135                         message.verbose('input-errors updated as in {filename}: {iencodingerrors}'.format(filename=fname, iencodingerrors=self._ierr))
136                 else:
137                     """fname is not == stdin and does not end with '.conf': there is nothing to do."""
138             else:
139                 # self.update_ierr_allowed == False
140                 message.verbose('altering input-errors is no longer allowed')
141
142             # End of updating/fixing. Validate the current errors:
143             self.check_errors(self._ierr)

144         def get_input_errors(self, fname):
145             if fname.lower().endswith('.conf'):
146                 return 'strict'
147             elif fname.lower().endswith('.css'):
148                 return 'strict'
149             elif fname.lower().endswith('.py'):
150                 return 'strict'
151             # AsciiDoc3 opens 'infile' and/or a file not ending with conf/css/py
152             elif fname in ('stdin', document.attributes['infile']) or  \
153                         not (fname.rstrip().lower().endswith('.conf') or \
154                         fname.rstrip().lower().endswith('.css')  or \
155                         fname.rstrip().lower().endswith('.py')):
156                 self.block_updating()
157                 return self._ierr
158             else:
159                 raise EAsciiDoc('AsciiDoc3 cannot open a file that is not ending with conf/css/py or "stdin" or infile')

160         def block_updating(self):
161             """no more updating _ie/_ierr"""
162             self.update_ie_allowed = False
163             self.update_ierr_allowed = False
164
165         east_asian_widths = {'W' : 2,   # Wide
166                              'F' : 2,   # Full-width (wide)
167                              'Na': 1,   # Narrow
168                              'H' : 1,   # Half-width (narrow)
169                              'N' : 1,   # Neutral (not East Asian, treated as narrow)
170                              'A' : 1}   # Ambiguous (s/b wide in East Asian context,
171                                         # narrow otherwise, but that doesn't work)
172         """Mapping of result codes from 'unicodedata.east_asian_width()' to character
173         column widths."""
174
175         def column_width(self, s):
176             """Computes the 'widths' of utf-8-words, especially east-asian."""
177             assert type(s) == type('string')  ###!###
178             width = 0
179             for c in s:
180                 width += self.east_asian_widths[east_asian_width(c)]
181             return width


182     class Ad3Out(Ad3Codec):
183         """Exactly one instance in AsciiDoc3, handles the attributes 'output-encoding'
184            and 'output-errors' and their corresponding 'private' variables
185            '_oe' and '_oerr'."""
186
187         ad3out_counter = 0

188         def __init__(self):
189             super().__init__()
190             Ad3Out.ad3out_counter += 1
191             if Ad3Out.ad3out_counter != 1:
192                 raise EAsciiDoc('class Ad3Out: Only one instance allowed')
193             self._oe = 'utf-8'                        # this is the default
194             self._oerr = 'strict'                     # this is the default
195             self.update_oe_allowed = True
196             self.update_oerr_allowed = True

197         def update_output_encoding(self, fname, conf_attrs, cmd_attrs):
198             """Output-encoding '_oe' is updated and then fixed, find this point in time at approx. line 4860 of the source code.
199                (Search for "update_attrs(self.conf_attrs, d)" and add three lines downwards). The user can not set _oe directly:
200                she (or he) sets the attribute "output-encoding" in command line (ranks first) or conf-files. Just before the output
201                file is opened, '_oe' is fixed and inalterable later on. This implies, _oe is not affected by ':output-encoding: xyz'
202                somewhere in the input-file, such a statement is ignored. "output-encoding" has to be set in the command-line or *.conf,
203                '_oe' is initialized as 'utf-8'."""

204             if self.update_oe_allowed:
205                 message.verbose('Entering "update/fix output-encoding" with input-file: {filename}'.format(filename=fname))
206                 # 'stdin' here seems quite impossible - this if-branch is added for 'security reasons'.
207                 if fname == 'stdin'.lower():
208                     """Stdin found, no more encoding updates any more! No change is made to _oe."""
209                     self.block_updating()
210                     message.verbose('stdin found, encodings are fixed: {filename}'.format(filename=fname))
211                 elif fname.rstrip().endswith(".conf"):
212                     if cmd_attrs and 'output-encoding' in cmd_attrs:             # as set in command-line, overrides conf-files
213                         self._oe = cmd_attrs['output-encoding']
214                         self.update_oe_allowed = False
215                         message.verbose('output-encoding is fixed as found in command-line: {oencoding}'.format(oencoding=self._oe))
216                     elif conf_attrs and ('output-encoding' in conf_attrs):       # as set in conf-files from user
217                         self._oe = conf_attrs['output-encoding']                 # may be None or missing
218                         message.verbose('output-encoding updated as in {filename}: {oencoding}'.format(filename=fname, oencoding=self._oe))
219                     else: pass
220                 else:
221                     """fname is not == stdin and does not end with '.conf': there is nothing to do."""
222             else:
223                 # self.update_oe_allowed == False
224                 message.verbose('altering output-encoding is no longer allowed')

225             # End of updating/fixing. Validate the current output-encoding:
226             self.check_encoding(self._oe)

227         def get_output_encoding(self, fname):
228             if fname:
229                 ad3in.block_updating()
230                 self.block_updating()
231                 return self._oe
232             else:
233                 raise EAsciiDoc("AsciiDoc3 tries to open an unproper file")
234
235         def update_output_errors(self, fname, conf_attrs, cmd_attrs):
236             """Output-errors (_oerr) is fixed; _oerr has to be set in *.conf or command-line."""
237             if self.update_oerr_allowed:
238                 message.verbose('Entering "update/fix output-errors" with input-file: {filename}'.format(filename=fname))
239                 # 'stdin' here seems quite impossible - this if-branch is added for 'security reasons'.
240                 if fname == 'stdin'.lower():
241                     """Stdin found, no more encoding updates any more! No change is made to _oerr."""
242                     self.update_oerr_allowed = False
243                     message.verbose('stdin found, encodings are fixed: {filename}'.format(filename=fname))
244                 elif fname.rstrip().endswith(".conf"):
245                     if cmd_attrs and ('output-errors' in cmd_attrs):
246                         self._oerr = cmd_attrs['output-errors']
247                         self.update_oerr_allowed = False
248                         message.verbose('output-errors updated as in command-line: {oencoding}'.format(oencoding=self._oerr))
249                     elif conf_attrs and ('output-errors' in conf_attrs):
250                         self._oerr = conf_attrs['output-errors']
251                         message.verbose('output-errors updated as in {filename}: {oencoding}'.format(filename=fname, oencoding=self._oerr))
252                     else: pass
253                 else: pass

254             # End of updating/fixing. Validate the current output-errors:
255             self.check_errors(self._oerr)

256         def get_output_errors(self, fname):
257             if fname:
258                 ad3in.block_updating()
259                 self.block_updating()
260                 return self._oerr

261         def block_updating(self):
262             """no more updating _oe/_oerr"""
263             self.update_oe_allowed = False
264             self.update_oerr_allowed = False

265         def add_encoding_for_compatibility(self):
266             """Ensures compatibility with AsciiDoc2 xhtml11.conf and docbook45.conf"""
267             message.verbose('encoding set for compatibility: {current_oe}'.format(current_oe = self.oe))
268             document.attributes['encoding'] = self.oe
269         # the above function 'add_encoding...()' is edited later on, so 'forget' this 4 lines

The very first three lines …

1       # def char_len(s): return len(s)            # no longer needed in AsciiDoc3
2       # def char_decode(s): return s              # no longer needed in AsciiDoc3
3       # def char_encode(s) ...                    # no longer needed in AsciiDoc3

... are somewhat superfluous. They are only a hint for those who miss these functions in AsciiDoc3. The functions are gone, but the following lines carry some of their functionality over into AsciiDoc3. This chunk will probably be removed in the next version.
These lines, and the new classes that follow, take the place of the old functions at this spot in the source.

4       class Ad3Codec():
5           """New in AsciiDoc3. Class Ad3Codec is not used directly, but provides
6              static methods and serves as a base class to Ad3In/Ad3Out.
7              Python3's muddle concerning encoding regarding the users input, stdin/out,
8              output, and command line options happens here."""

All right, my first new class in AsciiDoc3. What is it good for? The docstring names the main issue: Ad3Codec and its two subclasses do all the encoding work around Python3 strings, which makes the input and the output flexible. On second sight, however, the user has to avoid some pitfalls, especially when using "off-the-wall" encodings.

9           ad3codec_counter = 0

10          def __init__(self):
11              Ad3Codec.ad3codec_counter += 1
12              if Ad3Codec.ad3codec_counter > 2:
13                  raise EAsciiDoc('Class Ad3Codec(): too many instances')

I confess that this way of ruling out more than two instances looks a little unusual. I want to stress that we have exactly two instances, one of each subclass Ad3In and Ad3Out, named ad3in and ad3out; see below. In the real world of AsciiDoc3 this precaution is not strictly necessary, because the program is a monolithic one without modules.
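The counting idea can be tried in isolation. The following is only a minimal sketch: the names Codec, CodecIn, CodecOut and TooManyInstances are hypothetical stand-ins for Ad3Codec, Ad3In, Ad3Out and EAsciiDoc, not the AsciiDoc3 source itself.

```python
class TooManyInstances(Exception):
    """Hypothetical stand-in for AsciiDoc3's EAsciiDoc exception."""


class Codec:
    """Base class that tolerates at most two instances in total."""

    instance_counter = 0  # shared by the base class and all subclasses

    def __init__(self):
        Codec.instance_counter += 1
        if Codec.instance_counter > 2:
            raise TooManyInstances('Codec: too many instances')


class CodecIn(Codec):
    pass


class CodecOut(Codec):
    pass


codec_in = CodecIn()    # first instance, accepted
codec_out = CodecOut()  # second instance, accepted

try:
    CodecIn()           # a third instance of any subclass is refused
    third_allowed = True
except TooManyInstances:
    third_allowed = False

print(third_allowed)
```

Note that the counter is incremented before the check, so a refused attempt still counts; this mirrors the order of the statements in lines 11-13.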

14          def check_encoding(self, encoding_name):
15              """Checks if 'encoding_name' is a valid Python codec.
16              If not, an exception is thrown and the program exits.
17              Replaces 'char_encoding()' in AsciiDoc2."""
18              try:
19                  lookup(encoding_name.lower())
20                  return
21              except LookupError as e:
22                  raise EAsciiDoc(str(e))

Look at the docstring … Please note that the program accepts only valid encodings. It is not possible to give an invalid encoding in the first conf-file (like "utf-99") and rectify this in a conf-file read later ("utf-8"), as would be allowed in AsciiDoc2.
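To see how such a check behaves outside of AsciiDoc3, here is a small sketch built on the standard library function codecs.lookup(). The helper name is_valid_encoding is an assumption made for this illustration; it returns a boolean instead of raising EAsciiDoc.

```python
from codecs import lookup


def is_valid_encoding(encoding_name):
    """Return True if Python knows a codec with this name (hypothetical helper)."""
    try:
        lookup(encoding_name.lower())
        return True
    except LookupError:
        return False


# lookup() also normalizes aliases to a canonical codec name.
canonical = lookup('UTF8').name

print(is_valid_encoding('utf-8'))    # a codec that ships with Python
print(is_valid_encoding('utf-99'))   # no such codec
print(canonical)
```

Because every call is validated on its own, an invalid name is rejected at once and cannot be "repaired" by a later setting.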

23          def check_errors(self, error_name):
24              """Checks if 'error_name' is valid. If not, an exception is thrown
25                 and the program exits."""
26              if error_name.rstrip().lower() not in ('strict', 'ignore', 'replace'):
27                  raise EAsciiDoc('invalid value for errors: {name}'.format(name=error_name))
28              return

The same logic as in "def check_encoding(…)" (lines 14-22 above) ensures a valid value for "errors".
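A stand-alone variant of this whitelist check might look as follows. It is a sketch under assumptions: ValueError replaces EAsciiDoc, the constant ALLOWED_ERRORS is invented for the example, and the function returns the normalized name, which the original does not do.

```python
ALLOWED_ERRORS = ('strict', 'ignore', 'replace')


def check_errors(error_name):
    """Return the normalized handler name or raise ValueError."""
    name = error_name.rstrip().lower()
    if name not in ALLOWED_ERRORS:
        # Name the offending value in the message to ease debugging.
        raise ValueError('invalid value for errors: {0!r}'.format(error_name))
    return name


normalized = check_errors('Strict ')   # trailing blank and case are tolerated

try:
    check_errors('loose')
    rejected = False
except ValueError:
    rejected = True

print(normalized, rejected)
```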

29          def block_updating(self):
30              """blocks immediately any further altering of
31              _ie/_oe/_ierr/_oerr, respectively."""
32              raise NotImplementedError

This will be coded in the subclasses Ad3In and Ad3Out, respectively.

34          @staticmethod
35          def update_encoding(fname, attrs, cmd_attrs):
36              """bundles the updating functions for _ie/_oe/_ierr/_oerr."""
37              if ad3in and ad3out:
38                  ad3in.update_input_encoding(fname, attrs, cmd_attrs)
39                  ad3in.update_input_errors(fname, attrs, cmd_attrs)
40                  ad3out.update_output_encoding(fname, attrs, cmd_attrs)
41                  ad3out.update_output_errors(fname, attrs, cmd_attrs)
42              else:
43                  raise NotImplementedError

Again, everything is said in the docstring; the functions are implemented in the subclasses. This definition reduces the number of lines. The "if ad3in and ad3out … else: raise NotImplementedError" check seems to be overcautious.

4.1.1. Class Ad3In()

44      class Ad3In(Ad3Codec):
45          """Exactly one instance in AsciiDoc3, handles the attributes 'input-encoding'
46             and 'input-errors' and their corresponding 'private' variables
47             '_ie' and '_ierr'."""

48          ad3in_counter = 0

49          def __init__(self):
50              super().__init__()
51              Ad3In.ad3in_counter += 1
52              if Ad3In.ad3in_counter != 1:
53                  raise EAsciiDoc('Class Ad3In: Only one instance allowed')

We have exactly one instance of Ad3In, named ad3in …

54              self._ie = 'utf-8'
55              self._ierr = 'strict'
56              self.update_ie_allowed = True
57              self.update_ierr_allowed = True

And here the main purpose of Ad3In begins: handling the input-encoding "_ie" and the input-errors "_ierr". The user has no direct access to these variables. He/she sets _ie/_ierr by defining the attributes "input-encoding" and "input-errors". This avoids improper settings: AsciiDoc3 has to guarantee that all files ending with "conf", "py", or "css" are read in mode 'utf-8', 'strict'. In contrast, the "infile" is read with the encoding set in "input-encoding". The user may choose any of the encodings that Python3 knows.
_ie and _ierr are initialized with 'utf-8' and 'strict'; this is probably the most frequently used combination and the recommended one. In particular, "strict" should only be changed to "ignore" or "replace" when you know exactly what you are doing, and only for debugging.
The boolean variable "update_ie_allowed" (and likewise "update_ierr_allowed") blocks any change to _ie when AsciiDoc3 has to preclude further access to _ie/_ierr, e.g., when "input-encoding" is set on the command line like "-a input-encoding=cp1234". All further attempts (e.g., in the conf-files) are then prohibited: update_ie_allowed = False. Another reason to block _ie/_ierr immediately arises when AsciiDoc3 recognizes the use of stdin or stdout. Handling all the possible encodings of stdin or stdout is too tricky, so setting _ie/_ierr is prohibited in that case.
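Why "strict" is the safe choice becomes visible in a tiny experiment that is independent of AsciiDoc3. The byte string below is an invented example: the word "café" encoded as latin-1 and then wrongly read as utf-8.

```python
raw = b'caf\xe9'   # 'café' in latin-1; the last byte is not valid utf-8

try:
    raw.decode('utf-8', 'strict')
    strict_result = 'decoded'
except UnicodeDecodeError:
    strict_result = 'error'            # strict refuses to guess

ignored = raw.decode('utf-8', 'ignore')     # silently drops the bad byte
replaced = raw.decode('utf-8', 'replace')   # inserts U+FFFD instead
correct = raw.decode('latin-1')             # the right encoding needs no handler

print(strict_result, ignored, replaced, correct)
```

With "ignore" and "replace" the program keeps running, but the text is damaged without any warning. With "strict" the wrong input-encoding is noticed immediately.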

59          def update_input_encoding(self, fname, conf_attrs, cmd_attrs):
60              """Input-encoding '_ie' is updated and then fixed, find this point in time at approx. line 4860 of the source code.
61                 (Search for "update_attrs(self.conf_attrs, d)" and add three lines downwards). The user can not set _ie directly:
62                 she (or he) sets the attribute "input-encoding" in command line (ranks first) or conf-files. Just before the output
63                 file is opened, '_ie' is fixed and inalterable later on. This implies, _ie is not affected by ':input-encoding: xyz'
64                 somewhere in the input-file, such a statement is ignored. "input-encoding" has to be set in the command-line or *.conf,
65                 '_ie' is initialized as 'utf-8'."""

Now we’re entering "def update_input_encoding()". This function is called every time a new attribute entry may appear, that is, whenever asciidoc3.py calls "update_attrs()" with the parameter "conf_attrs". "update_input_encoding()" receives a copy of this parameter together with "cmd_attrs", which is also available every time "update_attrs()" is called. The remaining parameter "fname" is the name of the most recently loaded configuration file.
To be more precise, update_input_encoding() is called in a row with the three similar functions "update_input_errors()", "update_output_encoding()" and "update_output_errors()" via the static method "update_encoding()" in Ad3Codec. But that doesn’t matter here.
As stressed before, in some cases _ie must not be updated any more; on the contrary, AsciiDoc3 has to ensure that no change is possible at all.

66              if self.update_ie_allowed:
67                  #message.verbose('...')
68                  message.verbose('Entering "update_input-encoding()" with file: {filename}'.format(filename=fname))
69                  # 'stdin' here seems quite impossible - this if-branch is added for 'security reasons'.
70                  if fname == 'stdin'.lower():
71                      """Stdin found, no more encoding updates any more! No change is made to _ie."""
72                      self.block_updating()
73                      message.verbose('stdin as input-file found, encodings are fixed: {filename}'.format(filename=fname))
74                  elif fname.rstrip().endswith(".conf"):             # is this necessary? TODO
75                      if cmd_attrs and ('input-encoding' in cmd_attrs):              # _ie as set in command-line, overrides conf-files
76                          self._ie = cmd_attrs['input-encoding']
77                          self.update_ie_allowed = False
78                          message.verbose('input-encoding is fixed as found in command-line: {iencoding}'.format(iencoding=self._ie))
79                      elif conf_attrs and ('input-encoding' in conf_attrs):          # _ie as set in conf-files by user
80                          self._ie = conf_attrs['input-encoding']                    # may be None or missing
81                          message.verbose('input-encoding updated as in {filename}: {iencoding}'.format(filename=fname, iencoding=self._ie))
82                      else: pass
83                  else:
84                      """fname is not == stdin and does not end with '.conf': there is nothing to do."""
85              else:
86                  # self.update_ie_allowed == False
87                  message.verbose('altering input-encoding is no longer allowed')

"update_ie_allowed" is (of course) initialized as "True" in line 56 and holds "True" until one of the following cases comes along:
case 1: stdin is found as a conf-file (this seems pretty much impossible, but is implemented as a safeguard), via block_updating() - see lines 70-73,
case 2: input-encoding is in cmd_attrs, i.e., input-encoding is set on the command line - see line 75,
case 3: asciidoc3 opens a file with an ending not in conf/css/py - see get_input_encoding(), lines 90 ff. just below,
case 4: analogous to case 1 regarding _ierr, via block_updating() - see line 162 below,
case 5: analogous to case 3 regarding _ierr, via block_updating() - see below.

If "update_ie_allowed" is "True", the function sets "_ie" in the following order of priority. First, if input-encoding is given in cmd_attrs, _ie is set to that value and further updates are blocked (lines 75-78). Second, if it is given in conf_attrs, _ie is set without blocking (lines 79-81). This means that a conf-file read later in the AsciiDoc3 config chain may alter the input-encoding again (but only if both values are valid encodings).

Lines 82-84 are a precaution / placeholder for the case that no setting is found, though an unset _ie seems impossible, since _ie is set during initialization.
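The priority rule (command line first and final, conf-files second and still changeable) can be condensed into a few lines. This is a simplified model under assumptions, not the AsciiDoc3 code: the class EncodingSetting and its attribute names are hypothetical, and the validation step is left out.

```python
class EncodingSetting:
    """Hypothetical, simplified model of the pair _ie / update_ie_allowed."""

    def __init__(self):
        self.value = 'utf-8'         # initial value, as for _ie
        self.update_allowed = True

    def update(self, conf_attrs, cmd_attrs, key='input-encoding'):
        if not self.update_allowed:
            return                   # already fixed, nothing may change
        if cmd_attrs and key in cmd_attrs:
            self.value = cmd_attrs[key]
            self.update_allowed = False    # command line wins and blocks
        elif conf_attrs and key in conf_attrs:
            self.value = conf_attrs[key]   # a later conf-file may override


# Only conf-files set the value: the conf-file read last wins.
conf_only = EncodingSetting()
conf_only.update({'input-encoding': 'latin-1'}, {})
conf_only.update({'input-encoding': 'cp1252'}, {})

# The command line sets the value: later conf-files are ignored.
with_cmd = EncodingSetting()
with_cmd.update({'input-encoding': 'latin-1'}, {'input-encoding': 'utf-16'})
with_cmd.update({'input-encoding': 'cp1252'}, {})

print(conf_only.value, with_cmd.value)
```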

88              # End of updating/fixing. Validate the current input-encoding:
89              self.check_encoding(self._ie)

Every time "update_input_encoding()" is called, we check whether _ie is a valid encoding accepted by Python3.

90          def get_input_encoding(self, fname):
91              """get the input-encoding: conf/css/py are utf-8, 'stdin' is possibly 'special',
92              asciidoc3-input-file is opened with _ie. 'stdin' and 'input-file' block any
93              further altering of '_ie'."""
94              if fname.rstrip().lower().endswith('.conf'):
95                  return 'utf-8'
96              elif fname.rstrip().lower().endswith('.css'):
97                  return 'utf-8'
98              elif fname.rstrip().lower().endswith('.py'):
99                  return 'utf-8'
100             elif fname.lower() == 'stdin':
101                 self.block_updating()
102                 if document.attributes['infile'] == 'stdin':
103                     return self._ie
104                 else:
105                     try:
106                         return getdefaultlocale()[1]
107                     except Exception:
108                         return 'utf-8'
109             # AsciiDoc3 opens 'infile' and/or a file not ending with conf/css/py
110             elif fname == document.attributes['infile'] or  \
111                     not (fname.rstrip().lower().endswith('.conf') or \
112                          fname.rstrip().lower().endswith('.css')  or \
113                          fname.rstrip().lower().endswith('.py')):
114                 self.block_updating()
115                 return self._ie
116             else:
117                 raise EAsciiDoc('AsciiDoc3 cannot open a file that is not ending with conf/css/py or "stdin" or infile')

When a Python3 program like AsciiDoc3 opens a text file, an encoding is needed. Python3 itself may fall back to a platform-dependent default when none is given, so AsciiDoc3 sets 'utf-8' explicitly at initialization. Later on AsciiDoc3 deals only with files of these kinds:
1. ending with "conf/css/py"
2. stdin (pseudofile)
3. all other endings.
Case 1 sets the encoding to utf-8 (lines 94-99). This is affected neither by the attribute "input-encoding" nor by "encoding" in cmd_attrs or conf_attrs.
Case 2: If stdin is the "input-file" (lines 100-108), any further altering of _ie (and of the other encoding parameters _oe, _ierr, _oerr) is blocked and the current encoding is fixed. If stdin is not the infile but is to be opened, AsciiDoc3 tries to detect the default locale. If an error occurs, utf-8 is returned. In both alternatives we block setting the encoding parameters.
Case 3: We find a file not ending with conf/css/py that is not stdin, or the infile itself (lines 109-117). That ends the procedure of finding _ie immediately; "doing the work" begins.
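The three cases can be summarized in one small function. This is a sketch with assumed names (choose_encoding, FIXED_SUFFIXES, and the parameters infile and user_encoding are not part of AsciiDoc3), it leaves out the blocking step, and it uses locale.getpreferredencoding() where the listing above uses getdefaultlocale().

```python
import locale

FIXED_SUFFIXES = ('.conf', '.css', '.py')   # always read as utf-8


def choose_encoding(fname, infile, user_encoding):
    """Pick the encoding for opening fname (hypothetical helper)."""
    name = fname.rstrip().lower()
    if name.endswith(FIXED_SUFFIXES):
        return 'utf-8'                       # case 1: program files
    if name == 'stdin' and infile != 'stdin':
        # case 2, stdin is not the infile: ask the locale, fall back to utf-8
        return locale.getpreferredencoding(False) or 'utf-8'
    return user_encoding                     # stdin as infile, or case 3


print(choose_encoding('asciidoc3.conf', 'doc.txt', 'latin-1'))
print(choose_encoding('doc.txt', 'doc.txt', 'latin-1'))
print(choose_encoding('stdin', 'stdin', 'latin-1'))
```

str.endswith() accepts a tuple of suffixes, which replaces the three separate tests on ".conf", ".css" and ".py".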

118         def update_input_errors(self, fname, conf_attrs, cmd_attrs):
119             """Input-errors is updated and fixed undergoing the same procedures as input-encoding.
120                _ierr has to be set in *.conf or command-line, _ierr is initialized as 'strict'."""
121             if self.update_ierr_allowed:
122                 message.verbose('Entering "update/fix input-errors" with input-file: {filename}'.format(filename=fname))
123                 # 'stdin' here seems quite impossible - this if-branch is added for 'security reasons'.
124                 if fname == 'stdin'.lower():
125                     """stdin found, no more encoding updates any more! No change is made to _ierr."""
126                     self.block_updating()
127                     message.verbose('stdin found, encodings are fixed: {filename}'.format(filename=fname))
128                 elif fname.rstrip().endswith(".conf"):
129                     if cmd_attrs and 'input-errors' in cmd_attrs:
130                         self._ierr = cmd_attrs['input-errors']
131                         self.update_ierr_allowed = False
132                         message.verbose('input-errors updated as in command-line: {iencodingerrors}'.format(iencodingerrors=self._ierr))
133                     elif conf_attrs and 'input-errors' in conf_attrs:
134                         self._ierr = conf_attrs['input-errors']
135                         message.verbose('input-errors updated as in {filename}: {iencodingerrors}'.format(filename=fname, iencodingerrors=self._ierr))
136                 else:
137                     """fname is not == stdin and does not end with '.conf': there is nothing to do."""
138             else:
139                 # self.update_ierr_allowed == False
140                 message.verbose('altering input-errors is no longer allowed')
141
142             # End of updating/fixing. Validate the current errors:
143             self.check_errors(self._ierr)

Almost the same logical approach as for "_ie"; see "def update_input_encoding(…)", line 59. So no further annotations are necessary. A small difference between these two functions arises because "_ierr" has no counterpart in AsciiDoc2.

144         def get_input_errors(self, fname):
145             if fname.lower().endswith('.conf'):
146                 return 'strict'
147             elif fname.lower().endswith('.css'):
148                 return 'strict'
149             elif fname.lower().endswith('.py'):
150                 return 'strict'
151             # AsciiDoc3 opens 'infile' and/or a file not ending with conf/css/py
152             elif fname in ('stdin', document.attributes['infile']) or  \
153                         not (fname.rstrip().lower().endswith('.conf') or \
154                         fname.rstrip().lower().endswith('.css')  or \
155                         fname.rstrip().lower().endswith('.py')):
156                 self.block_updating()
157                 return self._ierr
158             else:
159                 raise EAsciiDoc('AsciiDoc3 cannot open a file that is not ending with conf/css/py or "stdin" or infile')

160         def block_updating(self):
161             """no more updating _ie/_ierr"""
162             self.update_ie_allowed = False
163             self.update_ierr_allowed = False

And here I’d like to refer to "def get_input_encoding(…)" - the same logic again.

164
165         east_asian_widths = {'W' : 2,   # Wide
166                              'F' : 2,   # Full-width (wide)
167                              'Na': 1,   # Narrow
168                              'H' : 1,   # Half-width (narrow)
169                              'N' : 1,   # Neutral (not East Asian, treated as narrow)
170                              'A' : 1}   # Ambiguous (s/b wide in East Asian context,
171                                         # narrow otherwise, but that doesn't work)
172         """Mapping of result codes from 'unicodedata.east_asian_width()' to character
173         column widths."""
174
175         def column_width(self, s):
176             """Computes the 'widths' of utf-8-words, especially east-asian."""
177             assert type(s) == type('string')  ###!###
178             width = 0
179             for c in s:
180                 width += self.east_asian_widths[east_asian_width(c)]
181             return width

The function "column_width()" is called exactly once in AsciiDoc3 - in the function/class abc.def TODO. Its purpose is to determine the display "length" of a string by adding up the widths of its characters. This is the same as in AsciiDoc2.
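The effect is easy to try without AsciiDoc3. The sketch below is a module-level variant of the method above, written for this illustration; the width table is the same, and unicodedata.east_asian_width() comes from the standard library.

```python
from unicodedata import east_asian_width

# Same mapping as in the listing: wide and full-width characters count double.
WIDTHS = {'W': 2, 'F': 2, 'Na': 1, 'H': 1, 'N': 1, 'A': 1}


def column_width(s):
    """Sum the display widths of all characters in s."""
    return sum(WIDTHS[east_asian_width(c)] for c in s)


print(len('abc'), column_width('abc'))
print(len('日本語'), column_width('日本語'))
```

For plain ASCII text len() and column_width() agree. For the three wide characters in the second line len() returns 3, while column_width() returns 6, which is what matters, for example, when an underline has to match a title.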

4.1.2. Class Ad3Out()

This was all about Ad3In. Now we step forward to Ad3Out, but the annotations will be less comprehensive because the structure is analogous. Let’s start:

182     class Ad3Out(Ad3Codec):
183         """Exactly one instance in AsciiDoc3, handles the attributes 'output-encoding'
184            and 'output-errors' and their corresponding 'private' variables
185            '_oe' and '_oerr'."""
186
187         ad3out_counter = 0

188         def __init__(self):
189             super().__init__()
190             Ad3Out.ad3out_counter += 1
191             if Ad3Out.ad3out_counter != 1:
192                 raise EAsciiDoc('class Ad3Out: Only one instance allowed')
193             self._oe = 'utf-8'                        # this is the default
194             self._oerr = 'strict'                     # this is the default
195             self.update_oe_allowed = True
196             self.update_oerr_allowed = True

197         def update_output_encoding(self, fname, conf_attrs, cmd_attrs):
198             """Output-encoding '_oe' is updated and then fixed, find this point in time at approx. line 4860 of the source code.
199                (Search for "update_attrs(self.conf_attrs, d)" and add three lines downwards). The user can not set _oe directly:
200                she (or he) sets the attribute "output-encoding" in command line (ranks first) or conf-files. Just before the output
201                file is opened, '_oe' is fixed and inalterable later on. This implies, _oe is not affected by ':output-encoding: xyz'
202                somewhere in the input-file, such a statement is ignored. "output-encoding" has to be set in the command-line or *.conf,
203                '_oe' is initialized as 'utf-8'."""

204             if self.update_oe_allowed:
205                 message.verbose('Entering "update/fix output-encoding" with input-file: {filename}'.format(filename=fname))
206                 # 'stdin' here seems quite impossible - this if-branch is added for 'security reasons'.
207                 if fname == 'stdin'.lower():
208                     """Stdin found, no more encoding updates any more! No change is made to _oe."""
209                     self.block_updating()
210                     message.verbose('stdin found, encodings are fixed: {filename}'.format(filename=fname))
211                 elif fname.rstrip().endswith(".conf"):
212                     if cmd_attrs and 'output-encoding' in cmd_attrs:             # as set in command-line, overrides conf-files
213                         self._oe = cmd_attrs['output-encoding']
214                         self.update_oe_allowed = False
215                         message.verbose('output-encoding is fixed as found in command-line: {oencoding}'.format(oencoding=self._oe))
216                     elif conf_attrs and ('output-encoding' in conf_attrs):       # as set in conf-files from user
217                         self._oe = conf_attrs['output-encoding']                 # may be None or missing
218                         message.verbose('output-encoding updated as in {filename}: {oencoding}'.format(filename=fname, oencoding=self._oe))
219                     else: pass
220                 else:
221                     """fname is not == stdin and does not end with '.conf': there is nothing to do."""
222             else:
223                 # self.update_oe_allowed == False
224                 message.verbose('altering output-encoding is no longer allowed')

225             # End of updating/fixing. Validate the current output-encoding:
226             self.check_encoding(self._oe)

227         def get_output_encoding(self, fname):
228             if fname:
229                 ad3in.block_updating()
230                 self.block_updating()
231                 return self._oe
232             else:
233                 raise EAsciiDoc("AsciiDoc3 tries to open an unproper file")
234
235         def update_output_errors(self, fname, conf_attrs, cmd_attrs):
236             """Output-errors (_oerr) is fixed; _oerr has to be set in *.conf or command-line."""
237             if self.update_oerr_allowed:
238                 message.verbose('Entering "update/fix output-errors" with input-file: {filename}'.format(filename=fname))
239                 # 'stdin' here seems quite impossible - this if-branch is added for 'security reasons'.
240                 if fname == 'stdin'.lower():
241                     """Stdin found, no more encoding updates any more! No change is made to _oerr."""
242                     self.update_oerr_allowed = False
243                     message.verbose('stdin found, encodings are fixed: {filename}'.format(filename=fname))
244                 elif fname.rstrip().endswith(".conf"):
245                     if cmd_attrs and ('output-errors' in cmd_attrs):
246                         self._oerr = cmd_attrs['output-errors']
247                         self.update_oerr_allowed = False
248                         message.verbose('output-errors updated as in command-line: {oencoding}'.format(oencoding=self._oerr))
249                     elif conf_attrs and ('output-errors' in conf_attrs):
250                         self._oerr = conf_attrs['output-errors']
251                         message.verbose('output-errors updated as in {filename}: {oencoding}'.format(filename=fname, oencoding=self._oerr))
252                     else: pass
253                 else: pass

254             # End of updating/fixing. Validate the current output-errors:
255             self.check_errors(self._oerr)

256         def get_output_errors(self, fname):
257             if fname:
258                 ad3in.block_updating()
259                 self.block_updating()
260                 return self._oerr

261         def block_updating(self):
262             """no more updating _oe/_oerr"""
263             self.update_oe_allowed = False
264             self.update_oerr_allowed = False

Did you recognize the main difference? Of course. There are no special files ending with css/py/conf, so we don’t need any special handling. This makes the code somewhat shorter. When AsciiDoc3 starts producing the output, the encoding and the errors are immediately fixed (lines 230/259).
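The precedence described in the docstring of "update_output_encoding()" (command line first, then conf-files, then the default 'utf-8') can be condensed into a few lines. This is only a sketch: the helper name "resolve_output_encoding" is hypothetical, it ignores the per-file updating and the blocking flags, and it assumes that validating a codec name with codecs.lookup() is roughly what "check_encoding()" does.

```python
import codecs

def resolve_output_encoding(conf_attrs, cmd_attrs, default='utf-8'):
    """Hypothetical helper: command line beats conf-file, conf-file beats default."""
    if cmd_attrs and cmd_attrs.get('output-encoding'):
        encoding = cmd_attrs['output-encoding']
    elif conf_attrs and conf_attrs.get('output-encoding'):
        encoding = conf_attrs['output-encoding']
    else:
        encoding = default
    codecs.lookup(encoding)  # raises LookupError for an unknown codec name
    return encoding

print(resolve_output_encoding({'output-encoding': 'latin-1'},
                              {'output-encoding': 'utf-16'}))
```

A ':output-encoding: xyz' line inside the input file never reaches such a function, which mirrors the statement that it is ignored.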

265         def add_encoding_for_compatibility(self):
266             """Ensures compatibilty with AsciiDoc2 xhtml11.conf and docbook45.conf"""  1
267             message.verbose('encoding set for compatibility: {current_oe}'.format(current_oe = self.oe))
268             document.attributes['encoding'] = self.oe
269         # the above function 'add_encoding...()' is edited later on, so 'forget' this 4 lines
1 as said in line 269: forget about this (lines 265-269)! We’ll edit this function later, see here.

This is the end of the two classes - a good part of the 2to3-work is done (only for asciidoc3.py - not for all the conf-files, a2x.py or asciidoc.api …). After inserting the code given in lines 1-269 asciidoc3_new.py still executes fine and produces inputone.html as expected.

4.2. Eliminating BOM

In the docstring of the new "class Ad3Codec" I wrote: "concerning encoding regarding the users input". This would also include the handling of the BOM (Byte Order Mark). But to make it short: There is no need to use the BOM in the utf-8 world, see here

The Unicode Standard permits the BOM in UTF-8, but does not require or recommend its use.

— http://en.wikipedia.org/wiki/Byte_order_mark

or a more polemic opinion here

[…] the UTF-8 BOM is an abomination on this earth brought forth by Microsoft..

— https://stackoverflow.com/questions/14083111/should-i-use-encoding-declaration-in-python3

So asciidoc3.py ignores the first three bytes of the input when they are equal to b'\xef\xbb\xbf' (= UTF8_BOM). And asciidoc3.py also produces no output that begins with "UTF8_BOM"; the program does not affect or deal with the other BOMs like UTF-16 (BE) FE FF, UTF-16 (LE) FF FE, SCSU 0E FE FF and so on. Here’s the code:

BOM is ignored/outdated
## BOM is no longer used
# bom is of type 'bytes' _nr_: 4078 _
ad3new, counter = re.subn(r"UTF8_BOM = '\\xef\\xbb\\xbf' ", r"UTF8_BOM = b'\\xef\\xbb\\xbf'", ad3new)
assert counter == 1

# replace checking and handle BOM nr4116
ad3new, counter = re.subn(r'if Reader1.read\(self\):.*?4119 _', \
                          r"""if Reader1.read(self):
            if self.cursor[2].startswith(str(UTF8_BOM, encoding = 'utf-8')):
                self.cursor[2] = self.cursor[2][len(str(UTF8_BOM, encoding = 'utf-8')):]
                message.verbose('BOM utf-8 was erased, no output with BOM')
                self.bom = None  # replaces "self.bom = UTF8_BOM" -> BOM is cut and BOM == False""", ad3new, flags=re.S)
assert counter == 1

# ignore BOM for output nr4435
ad3new, counter = re.subn(r'if bom:.*?4435 _', \
                          r"""#if bom: ... # BOM is ignored in AsciiDoc3""", ad3new, flags=re.S)
assert counter == 1
## end of eliminating BOM
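Independent of the editing script, the effect of "ignoring the BOM" can be shown with a small standalone sketch. It works on raw bytes and uses the constant codecs.BOM_UTF8 from the standard library; the function name "strip_utf8_bom" is an assumption and does not appear in asciidoc3.py.

```python
import codecs

def strip_utf8_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 BOM (bytes EF BB BF) if present; other BOMs are left alone."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

raw = codecs.BOM_UTF8 + 'Hello'.encode('utf-8')
print(strip_utf8_bom(raw))          # b'Hello'
print(raw.decode('utf-8-sig'))      # the 'utf-8-sig' codec skips the BOM while decoding
```

The codec 'utf-8-sig' is an alternative when the data is decoded anyway; on the output side plain 'utf-8' never writes a BOM.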

4.3. Naming AsciiDoc3 AsciiDoc3

Until now I worked inside my directory ~ad3. To be clearer and to show the difference to the original AsciiDoc version powered by Python2, I move the results of the work so far to a new directory "~/.asciidoc3". Of course this doesn’t mean that we can start AsciiDoc3 with "asciidoc3" on the command line. We don’t have any executable of this name in the path right now. The second step in this context is the renaming of some directories, file names, and variables; as a crucial point we now have asciidoc3.conf instead of asciidoc.conf. asciidoc3.py no longer accepts a file named "asciidoc.conf"; a file of this name is ignored.
And I rename asciidocapi.py to asciidoc3api.py, asciidoc.css to asciidoc3.css. Additionally I rearrange the location of all the images to ./images. The images used by inputone.txt no longer reside in ./inputfiles.
We have to edit asciidoc3.conf to meet these requirements, e.g. change attribute "asciidoc-version" to "asciidoc3-version" and others. We’ll edit the conf/css/py and other files (xhtml11.conf, docbook45.conf …) bit by bit. From now on I mention this only if needed.
Doing as described and running

Running the newly located asciidoc3
python3 ~/.asciidoc3/asciidoc3_new.py --verbose -n -a icons -o ~/.asciidoc3/outputfiles/outputone.html ~/.asciidoc3/inputfiles/inputone.txt

the identical output as before is produced → everything’s ok.

My "new" working directory .asciidoc3 looks like this:

├── asciidoc3.conf
├── asciidoc3_new.py
├── asciidoc3_start.py
├── docs
│   ├── ad2to3.html
│   ├── ad2to3.txt
│   ├── helloworld.jpg
│   └── outputone.jpg
├── images
│   ├── helloworld.jpg
│   ├── highlighter.png
│   ├── icons
│   │   ├── callouts
│   │   │   ├── 10.png
│   │   │   ├── ...
│   │   │   ├── 15.png
│   │   │   ├── 1.png
│   │   │   ├── ...
│   │   │   └── 9.png
│   │   ├── caution.png
│   │   ├── example.png
│   │   ├── home.png
│   │   ├── important.png
│   │   ├── next.png
│   │   ├── note.png
│   │   ├── output_one_html.png
│   │   ├── prev.png
│   │   ├── README
│   │   ├── tip.png
│   │   ├── up.png
│   │   └── warning.png
│   ├── outputone.jpg
│   ├── redsquare.jpg
│   ├── smallnew.png
│   └── tiger.png
├── inputfiles
│   ├── helloworld.html
│   ├── helloworld.jpg
│   ├── helloworld.txt
│   ├── inputone.txt
│   ├── oldtables.txt
├── javascripts
│   └── asciidoc3.js
├── lang-en.conf
├── mytools
│   ├── addlineno.py
│   ├── editad2.py
│   ├── editad3.py
├── outputfiles
│   ├── outoldtables.html
│   ├── outputone.html
├── stylesheets
│   ├── asciidoc3.css
├── xhtml11.conf

5. Improving AsciiDoc3 step by step

A good part of the Asciidoc-2to3-work is done … but unfortunately, of course, not all. We have identified one more point to consider: the conf-files besides asciidoc.conf may contain some critical lines. Look at "xhtml11.conf" line 529: "<meta http-equiv= … charset=UTF-8". The attribute "encoding" is missing/improper when using "output-encoding" in AsciiDoc3. We have set this manually in "def add_encoding_for_compatibility(self)", see line 265 above. There is an analogous issue in "docbook45.conf", "html4.conf", "lang-en.conf", and "xhtml11.conf". We go through the conf-files later on to fix this as said before - and do not always declare all these steps explicitly.

5.1. Testing the Homepage, part i

Another idea is to "test" the pages of the asciidoc.org homepage. What does this mean? We go to asciidoc.org/index.html, click on "Page Source" at the left sidebar and save the file "asciidoc.org/index.txt" as "inputtwo.txt". The first step is to run this with "python2 ~/asciidoc2/asciidoc.py --verbose -n -a icons -o ~/.asciidoc3/outputfiles/outputtwo2.html ~/.asciidoc3/inputfiles/inputtwo.txt". The second step is to run the same input with asciidoc3_new.py: "python3 ~/.asciidoc3/asciidoc3_new.py --verbose -n -a icons -o ~/.asciidoc3/outputfiles/outputtwo3.html ~/.asciidoc3/inputfiles/inputtwo.txt". Both outputs have to be identical …
In other words: we run some input with asciidoc.py and with asciidoc3.py and then compare the output. When it’s not (binary) identical - check this with the help of kdiff3 -, something went wrong and we have to improve asciidoc3_new.py. Please note: it’s important to refer to the original directory "asciidoc2" (I have it installed without the dot, having ~/asciidoc2 and not ~/.asciidoc2) to make use of the original conf/css and other files.
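The comparison itself can also be scripted. The following sketch is an alternative to kdiff3, not what I used: "filecmp.cmp(..., shallow=False)" answers the question "binary identical or not", and "difflib.unified_diff" lists the differing lines. The function names are assumptions; the paths are whatever two output files you want to compare.

```python
import difflib
import filecmp

def outputs_identical(path_v2, path_v3):
    """True if both files have byte-for-byte identical content."""
    return filecmp.cmp(path_v2, path_v3, shallow=False)

def show_diff(path_v2, path_v3):
    """Return a unified diff of the two files as a list of lines."""
    with open(path_v2, encoding='utf-8') as f2, open(path_v3, encoding='utf-8') as f3:
        return list(difflib.unified_diff(f2.readlines(), f3.readlines(),
                                         fromfile=path_v2, tofile=path_v3))
```

A typical use would be to call outputs_identical() on outputtwo2.html and outputtwo3.html and print the result of show_diff() only when it returns False.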

I deliberately do not use the files given in ~/asciidoc2/examples/website/index.txt: this makes further testing easier to handle, and the steps are more informative, too. And we do not build a web-site, we want to compare one file with one file - see the remarks on layout2.css later.

Let’s do it and start:
Be sure to have the iconsdir and imagesdir attributes correctly configured in both asciidoc.conf and asciidoc3.conf. In our case it’s "imagesdir=~/.asciidoc3/images" and "iconsdir=~/.asciidoc3/images/icons" for both. A second editing step consists of changing line 187 of inputtwo.txt to

"image::highlighter.png[height=400,caption="",link="~/.asciidoc3/images/highlighter.png"]".

Doing so, we have (almost) a binary identical output, but asciidoc2 shows:

<meta http-equiv="Content-Type" content="application/xhtml+xml; charset=UTF-8" />
<meta name="generator" content="AsciiDoc3 8.6.9" />

asciidoc3 uses lower letters and skips one line:

<meta http-equiv="Content-Type" content="application/xhtml+xml; charset=utf-8" />

Editing "encoding=UTF-8" to "encoding=utf-8" in ~/asciidoc2’s asciidoc.conf makes this inconsequential difference "utf-8 vs UTF-8" vanish. And the second, also inconsequential difference regarding the "meta name generator"? Changing 'content="AsciiDoc {asciidoc-version}" ' to 'content="AsciiDoc {asciidoc3-version}" ' (one line after the charset meta-tag) does the trick. Note: we have to edit "AsciiDoc {asciidoc3-version}" at the end, too! The correct line is of course "AsciiDoc3 {asciidoc3-version}" with digit "3", but this would produce a slightly different output. BTW: We ignore the message "WARNING: inputtwo.txt: line 12: blank block title" - this is without consequences. After these changes the output is binary identical - good news!

The same procedure goes for asciidoc.org/userguide.html → Page Source → inputthree.txt
We run "python2 ~/asciidoc2/asciidoc.py --verbose -n -a icons -o ~/.asciidoc3/outputfiles/outputthree2.html ~/.asciidoc3/inputfiles/inputthree.txt" and second "python3 ~/.asciidoc3/asciidoc3_new.py --verbose -n -a icons -o ~/.asciidoc3/outputfiles/outputthree3.html ~/.asciidoc3/inputfiles/inputthree.txt" But we have different outputs and errors in processing the files. It’s a pity!
A first look shows different levels in the tocs (tables of contents): v2 has only two, v3 at least four. That is easy to adjust: in asciidoc3.conf we change "toclevels=5" to "toclevels=2" - this was just a leftover of my earlier work. After executing the two programs, v2 and v3, again, this is done.

A second look at the source of asciidoc.org/userguide.html via "Browser <Ctrl><U>" reveals that the given page-source code is not identical with the source code of the existing website-page: first we have a second css-file named "layout2.css", but that’s not the problem. This css is mandatory to build the pages of asciidoc.org with one sidebar left and the logo placed in the header - we don’t need this here. Second we have the css-files given as a link: <link rel="stylesheet" href="./asciidoc.css" type="text/css" />, in v3 they are inlined, causing many more lines; but that’s not the issue either. Neither are the different links to the images, we may ignore that. In the website-file we have a lot of additional <code>asciidoc(1)</code> snippets. Ignoring them, too, one thing remains: "asciidoc: WARNING: inputthree.txt: line 3203: include file not found: ~/.asciidoc3/inputfiles/customers.csv". So we copy this file and run it again: it works!

All right, the website-file and the asciidoc2-file are somewhat different but sufficiently equivalent. But what about asciidoc3.py? It looks nice only at first glance; we have two errors:

asciidoc3_new: ERROR: inputthree.txt: line 1342: undefined filter attribute in command:
  "{python}" "{asciidoc-file}" -b {backend} {asciidoc-args}{lang? -a "lang={lang}@"}{icons? -a icons
  -a "iconsdir={iconsdir}"}{imagesdir? -a "imagesdir={imagesdir}"}{data-uri? -a data-uri}
  -a "indir={indir}"{trace? -a "trace={trace}"}{blockname? -a "blockname={blockname}"} -s

A second similar alert shows up in "inputthree.txt: line 6031." Examining the two outputs with the help of kdiff3 shows that asciidoc3_new.py is not able to "render" the <div class="paragraph">. An example, v2 has:

<td align="left" valign="top"><div><div class="paragraph"><p>Sets the number of title levels (1..4)
reported in the table of contents (see the <em>toc</em> attribute above). Defaults to 2 and must be
used with the <em>toc</em> attribute. Example usage:</p></div></div></td>

At this location v3 keeps an empty space:

<td align="left" valign="top"><div></div></td>

To get rid of this bug, I leave producing the asciidoc.org website and go to the - as you’ll see - not so easy topic "filtering".

5.2. Filtering and subprocessing

In a first step, let’s look what filters are installed in v2:

cd ~/asciidoc2/filters/code
~/asciidoc2/asciidoc.py --filter list

~/asciidoc2/filters/graphviz
~/asciidoc2/filters/latex
~/asciidoc2/filters/music
~/asciidoc2/filters/source
~/asciidoc2/filters/code

We find five filters. We go through this list in the above order of appearance.

5.2.1. Filter "graphviz"

We try to run the graphviz filter with asciidoc.py (v2). Be sure to have the package installed, test it by typing "whereis graphviz" on the command line. You’ll see something like "graphviz: /usr/lib/graphviz /usr/share/graphviz /usr/share/man/man7/graphviz.7.gz" - the directory contains files such as "libgvplugin_core.so.6.0.0" among others.
If you don’t have graphviz installed, an "IndexError: list index out of range" appears in the following listing instead of the IOError. In both cases you will probably see some warnings and an error: (listing abridged)

$ ~/asciidoc2/asciidoc.py -v ~/asciidoc2/filters/graphviz/asciidoc-graphviz-sample.txt
$
asciidoc: reading: ~/asciidoc2/asciidoc.conf
...
asciidoc: asciidoc-graphviz-sample.txt: line 38: evaluating: {set2:target:asciidoc-graphviz-sample__1.png}
asciidoc: asciidoc-graphviz-sample.txt: line 39: filtering: "/usr/bin/python" "~/asciidoc2/filters/graphviz/graphviz2png.py" -v -o "~/asciidoc2/filters/graphviz//~/.asciidoc3/images/asciidoc-graphviz-sample__1.png" -L dot -F png -
Traceback (most recent call last):
  File "~/asciidoc2/filters/graphviz/graphviz2png.py", line 169, in <module>
    app.run()
  File "~/asciidoc2/filters/graphviz/graphviz2png.py", line 151, in run
    open(infile, 'w').writelines(lines)
IOError: [Errno 2] No such file or directory: '~/asciidoc2/filters/graphviz//~/.asciidoc3/images/asciidoc-graphviz-sample__1.txt'
asciidoc: WARNING: asciidoc-graphviz-sample.txt: line 39: filter non-zero exit code: "/usr/bin/python" "~/asciidoc2/filters/graphviz/graphviz2png.py" -v -o "~/asciidoc2/filters/graphviz//~/.asciidoc3/images/asciidoc-graphviz-sample__1.png" -L dot -F png -: returned 1
...

This output shows up on my terminal, note the "-o ~/asciidoc2/filters/graphviz//~/.asciidoc3/images/" part: this path does not exist, it’s impossible. We have to adjust it in the graphviz-filter.conf in directory ~/asciidoc2/filters/graphviz to avoid the "duplication" // and set the correct path (before / after - some whitespace added):

-o "{outdir={indir}}/{imagesdir=}{imagesdir?/}{target}" -L {layout=dot} -F {format=png}

-o                  "{imagesdir=}{imagesdir?/}{target}" -L {layout=dot} -F {format=png}

Doing so, it works!
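Why the original template produces an impossible path becomes clear when the attribute substitution is imitated with plain string operations. The sketch below uses hypothetical absolute directories in place of "~"; it only illustrates the concatenation problem and is not part of the filter. For comparison it also shows that posixpath.join() would have dropped the first component, because a later absolute component replaces everything before it.

```python
import posixpath

indir = '/home/user/asciidoc2/filters/graphviz'   # hypothetical absolute input dir
imagesdir = '/home/user/.asciidoc3/images'        # hypothetical absolute imagesdir
target = 'asciidoc-graphviz-sample__1.png'

# Original template: "{outdir={indir}}/{imagesdir=}{imagesdir?/}{target}"
concatenated = indir + '/' + imagesdir + '/' + target

# Edited template: "{imagesdir=}{imagesdir?/}{target}"
fixed = imagesdir + '/' + target

# posixpath.join() discards earlier parts when a later part is absolute.
joined = posixpath.join(indir, imagesdir, target)

print(concatenated)
print(fixed)
print(joined == fixed)
```

So the edited template is only correct as long as imagesdir is an absolute path; with a relative imagesdir the output would land relative to the current working directory.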

$ ~/asciidoc2/asciidoc.py -v ~/asciidoc2/filters/graphviz/asciidoc-graphviz-sample.txt
$
asciidoc: reading: ~/asciidoc2/asciidoc.conf
asciidoc: reading: ~/asciidoc2/filters/graphviz/asciidoc-graphviz-sample.txt
...
asciidoc: reading: ~/asciidoc2/filters/graphviz/graphviz-filter.conf
...
asciidoc: writing: ~/asciidoc2/filters/graphviz/asciidoc-graphviz-sample.html
asciidoc: asciidoc-graphviz-sample.txt: line 38: evaluating: {counter2:target-number}
asciidoc: asciidoc-graphviz-sample.txt: line 38: evaluating: {set2:target:asciidoc-graphviz-sample__1.png}
asciidoc: asciidoc-graphviz-sample.txt: line 39: filtering: "/usr/bin/python" "~/asciidoc2/filters/graphviz/graphviz2png.py" -v -o "~/.asciidoc3/images/asciidoc-graphviz-sample__1.png" -L dot -F png -
Execute: dot -Tpng "~/.asciidoc3/images/asciidoc-graphviz-sample__1.txt" > "~/.asciidoc3/images/asciidoc-graphviz-sample__1.png"
asciidoc: asciidoc-graphviz-sample.txt: line 76: filtering: "/usr/bin/python" "~/asciidoc2/filters/graphviz/graphviz2png.py" -v -o "~/.asciidoc3/images/sample2.png" -L dot -F png -
Execute: dot -Tpng "~/.asciidoc3/images/sample2.txt" > "~/.asciidoc3/images/sample2.png"
asciidoc: asciidoc-graphviz-sample.txt: line 128: filtering: "/usr/bin/python" "~/asciidoc2/filters/graphviz/graphviz2png.py" -v -o "~/.asciidoc3/images/sample3.png" -L dot -F png -
Execute: dot -Tpng "~/.asciidoc3/images/sample3.txt" > "~/.asciidoc3/images/sample3.png"

The standalone PNGs are stored in ~/.asciidoc3/images/*.png (e.g. sample3.png), the html-page as ~/asciidoc2/filters/graphviz/asciidoc-graphviz-sample.html. Take a look at them in the following two pics:

[Figure: Sample of a graphviz-generated picture]

[Figure: The graphviz-filter-sample]

Good news - the first step is successfully done: graphviz filter v2 works.

All right, let’s try the same for asciidoc3.py: Before we start, we have to copy the filter-directory to ~/.asciidoc3 - the subdirectory "code" is edited as mentioned above.

If you see no error, but something like

...
asciidoc3_new: WARNING: asciidoc-graphviz-sample.txt: line 38: missing style: [blockdef-listing]: graphviz
...

probably the appropriate conf-file (here "graphviz-filter.conf") is missing. We have it in the right place, so

~/.asciidoc3$ ./asciidoc3_new.py -v ~/.asciidoc3/filters/graphviz/asciidoc-graphviz-sample.txt
...
asciidoc3_new: asciidoc-graphviz-sample.txt: line 38: evaluating: {set2:target:asciidoc-graphviz-sample__1.png}
asciidoc3_new: FAILED: asciidoc-graphviz-sample.txt: line 39: unexpected error:
asciidoc3_new: ------------------------------------------------------------
Traceback (most recent call last):
  File "~/.asciidoc3/asciidoc3_new.py", line 5874, in asciidoc3
    document.translate(has_header) # Generate the output.                     ##_nr_: 6023 _
  File "~/.asciidoc3/asciidoc3_new.py", line 1894, in translate
    Section.translate()                                                       ##_nr_: 1666 _
  File "~/.asciidoc3/asciidoc3_new.py", line 2536, in translate
    Section.translate_body()                                                  ##_nr_: 2307 _
  File "~/.asciidoc3/asciidoc3_new.py", line 2544, in translate_body
    nxt.translate()                                                           ##_nr_: 2315 _
  File "~/.asciidoc3/asciidoc3_new.py", line 3312, in translate
    body = filter_lines(self.parameters.filter, body, self.attributes)        ##_nr_: 3083 _
  File "~/.asciidoc3/asciidoc3_new.py", line 762, in filter_lines
    filter_cmd = '"%s" %s' % (document.attributes['python'],                  ##_nr_: 808 _
  File "~/.asciidoc3/asciidoc3_new.py", line 139, in __getitem__
    return dict.__getitem__(self, key.lower())                                ##_nr_: 115 _
KeyError: 'python'

To have more verbose error messages I momentarily suppress "EAsciiDoc(Exception)". This changes only a little here - the two lines before the traceback -, but later on it helps to detect the exact problem in the source code.

ad3new, counter = re.subn(r"# Cleanup\.      .*?6047 _", r"raise # Cleanup.", ad3new, flags=re.S)
assert counter == 1

Looking at nr115 in asciidoc3_new.py:

    def __getitem__(self, key):                                 ##_nr_: 114 _
        return dict.__getitem__(self, key.lower())              ##_nr_: 115 _
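The behaviour of this "__getitem__" can be reproduced in isolation. In the sketch below "AttrDict" is a hypothetical stand-in for the real attribute dictionary, and the single entry 'python3' is an assumption (it matches what turns up later in this diary): the lookup is case-insensitive on the requested key, but a key that simply does not exist still raises KeyError.

```python
class AttrDict(dict):
    """Hypothetical stand-in: requested keys are converted to lower case."""
    def __getitem__(self, key):
        return dict.__getitem__(self, key.lower())

attrs = AttrDict({'python3': '/usr/bin/python3'})

print(attrs['Python3'])      # found, the key is lower-cased before the lookup

try:
    attrs['python']          # there is no such key, with or without lower-casing
except KeyError as err:
    print('missing attribute:', err)
```

In other words, the traceback does not point to a defect in "__getitem__" itself but to a caller asking for an attribute name that is not defined.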

I insert the missing key - some kind of brute force debugging again. This is to be rearranged later; we do not want such patches in the production code:

    def __getitem__(self, key):                                 ##_nr_: 114 _
        document.attributes['python'] = '/usr/bin/python'
        return dict.__getitem__(self, key.lower())              ##_nr_: 115 _

and now I see:

...
asciidoc3_new: asciidoc-graphviz-sample.txt: line 38: evaluating: {set2:target:asciidoc-graphviz-sample__1.png}
asciidoc3_new: asciidoc-graphviz-sample.txt: line 39: filtering: "/usr/bin/python" "~/.asciidoc3/filters/graphviz/graphviz2png.py" -v -o "~/.asciidoc3/filters/graphviz/~/.asciidoc3/images/asciidoc-graphviz-sample__1.png" -L dot -F png -
Traceback (most recent call last):
  File "./asciidoc3_new.py", line 778, in filter_lines
    output = p.communicate(os.linesep.join(lines))[0]                       ##_nr_: 822 _
  File "/usr/lib/python3.5/subprocess.py", line 1072, in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
  File "/usr/lib/python3.5/subprocess.py", line 1700, in _communicate
    input_view = memoryview(self._input)
TypeError: memoryview: a bytes-like object is required, not 'str'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./asciidoc3_new.py", line 6077, in <module>
    execute(sys.argv[0], opts, args)                                        ##_nr_: 6263 _
  File "./asciidoc3_new.py", line 6026, in execute
    asciidoc3(backend, doctype, confiles, infile, outfile, options)         ##_nr_: 6206 _
  File "./asciidoc3_new.py", line 5876, in asciidoc3
    document.translate(has_header) # Generate the output.                   ##_nr_: 6023 _
  File "./asciidoc3_new.py", line 1896, in translate
    Section.translate()                                                     ##_nr_: 1666 _
  File "./asciidoc3_new.py", line 2538, in translate
    Section.translate_body()                                                ##_nr_: 2307 _
  File "./asciidoc3_new.py", line 2546, in translate_body
    nxt.translate()                                                         ##_nr_: 2315 _
  File "./asciidoc3_new.py", line 3314, in translate
    body = filter_lines(self.parameters.filter, body, self.attributes)      ##_nr_: 3083 _
  File "./asciidoc3_new.py", line 780, in filter_lines
    raise EAsciiDoc('filter error: %s: %s' % (filter_cmd, sys.exc_info()[1]))      ##_nr_: 824 _
__main__.EAsciiDoc: filter error: "/usr/bin/python" "~/.asciidoc3/filters/graphviz/graphviz2png.py" -v -o "~/.asciidoc3/filters/graphviz/~/.asciidoc3/images/asciidoc-graphviz-sample__1.png" -L dot -F png -: memoryview: a bytes-like object is required, not 'str'
Traceback (most recent call last):
  File "~/.asciidoc3/filters/graphviz/graphviz2png.py", line 169, in <module>
    app.run()
  File "~/.asciidoc3/filters/graphviz/graphviz2png.py", line 151, in run
    open(infile, 'w').writelines(lines)
IOError: [Errno 2] No such file or directory: '~/.asciidoc3/filters/graphviz/~/.asciidoc3/images/asciidoc-graphviz-sample__1.txt'

The program runs one line further - till line 39, but we find a bundle of nasty errors.
First note the very last line: "IOError … ". We had it all before. Again we have to adjust the conf-file, here "graphviz-filter.conf". See the file after editing:

...
graphviz-style=template="graphviz{format?-{format}}-block",subs=(),posattrs=("style","target","layout","format"),filter='graphviz2png.py {verbose?-v} -o "{imagesdir=}{imagesdir?/}{target}" -L {layout=dot} -F {format=png} -'
-o "{outdir={indir}}/{imagesdir=}{imagesdir?/}{target}" -L {layout=dot} -F {format=png}

-o "{imagesdir=}{imagesdir?/}{target}" -L {layout=dot} -F {format=png}
...

Let’s try again:

~/.asciidoc3$ ./asciidoc3_new.py -v ~/.asciidoc3/filters/graphviz/asciidoc-graphviz-sample.txt
...
__main__.EAsciiDoc: filter error: "/usr/bin/python" "~/.asciidoc3/filters/...
Execute: dot -Tpng "~/.asciidoc3/images/asciidoc-graphviz-sample__1.txt" > "~/.asciidoc3/images/asciidoc-graphviz-sample__1.png"
~/.asciidoc3$ close failed in file object destructor:
sys.excepthook is missing
lost sys.stderr

The IOError is gone, but what exactly happens here? "sys.excepthook is missing"? In most cases this is caused by a piping error. Do we have a pipe hanging around? Yes! The filter graphviz2png.py uses "subprocess.Popen". Keep this in mind, we will come back to this point soon.
In the meantime we take care of another issue: graphviz2png.py is a Python2 program and executed by "/usr/bin/python" as seen in the listing of the last output. That is of course not intended. I’ll migrate graphviz2png.py to Python3:

2to3 -v -w -n -f all -f buffer -f set_literal -f idioms -f ws_comma graphviz2png.py

Comparing v2 vs v3 shows only marginal differences like (before/after):

if os.system(cmd): raise EApp, 'failed command: %s' % cmd

if os.system(cmd): raise EApp('failed command: %s' % cmd)

In addition I edit the shebang and coding lines and the copyright information.
Running again, nothing seems to have changed: the identical errors show up. Let us first eliminate the "document.attributes['python'] = '/usr/bin/python'" addition. A closer look at the definition and references of "document.attributes" reveals that I forgot to edit line 808:

...
# to make up leeway nr808
ad3new, counter = re.subn(r"document\.attributes\['python'],      ##_nr_: 808 _", \
                          r"document.attributes['python3'],       ##_nr_: 808 _", ad3new)
assert counter == 1
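The pattern used throughout the editing script - "re.subn" followed by "assert counter == 1" - can be wrapped into a small helper. This is only a sketch of the idea; the name "patch_once" and the sample source line are assumptions and not part of editad3.py.

```python
import re

def patch_once(source, pattern, replacement, flags=0):
    """Apply a regex substitution and insist that it matched exactly once."""
    patched, counter = re.subn(pattern, replacement, source, flags=flags)
    if counter != 1:
        raise ValueError('expected exactly 1 match for %r, got %d' % (pattern, counter))
    return patched

# Hypothetical one-line "source file" carrying a line-number marker.
source = "x = attrs['python'],      ##_nr_: 808 _\n"
patched = patch_once(source, r"attrs\['python'\],", "attrs['python3'],")
print(patched)
```

The check for exactly one match is the important part: zero matches mean the pattern no longer fits the source, more than one match means the patch would change lines it was never meant to touch.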

Start again:

...
asciidoc3_new: writing: ~/.asciidoc3/filters/graphviz/asciidoc-graphviz-sample.html
asciidoc3_new: asciidoc-graphviz-sample.txt: line 38: evaluating: {counter2:target-number}
asciidoc3_new: asciidoc-graphviz-sample.txt: line 38: evaluating: {set2:target:asciidoc-graphviz-sample__1.png}
asciidoc3_new: asciidoc-graphviz-sample.txt: line 39: filtering: "/usr/bin/python3" "~/.asciidoc3/filters/graphviz/graphviz2png.py" -v -o "~/.asciidoc3/images/asciidoc-graphviz-sample__1.png" -L dot -F png -
Traceback (most recent call last):
  File "asciidoc3_new.py", line 776, in filter_lines
    output = p.communicate(os.linesep.join(lines))[0]                              ##_nr_: 822 _
  File "/usr/lib/python3.5/subprocess.py", line 1072, in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
  File "/usr/lib/python3.5/subprocess.py", line 1700, in _communicate
    input_view = memoryview(self._input)
TypeError: memoryview: a bytes-like object is required, not 'str'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "asciidoc3_new.py", line 6075, in <module>
    execute(sys.argv[0], opts, args)                                               ##_nr_: 6263 _
  File "asciidoc3_new.py", line 6024, in execute
    asciidoc3(backend, doctype, confiles, infile, outfile, options)                ##_nr_: 6206 _
  File "asciidoc3_new.py", line 5874, in asciidoc3
    document.translate(has_header) # Generate the output.                          ##_nr_: 6023 _
  File "asciidoc3_new.py", line 1894, in translate
    Section.translate()                                                            ##_nr_: 1666 _
  File "asciidoc3_new.py", line 2536, in translate
    Section.translate_body()                                                       ##_nr_: 2307 _
  File "asciidoc3_new.py", line 2544, in translate_body
    nxt.translate()                                                                ##_nr_: 2315 _
  File "asciidoc3_new.py", line 3312, in translate
    body = filter_lines(self.parameters.filter, body, self.attributes)             ##_nr_: 3083 _
  File "asciidoc3_new.py", line 778, in filter_lines
    raise EAsciiDoc('filter error: %s: %s' % (filter_cmd, sys.exc_info()[1]))      ##_nr_: 824 _
__main__.EAsciiDoc: filter error: "/usr/bin/python3" "~/.asciidoc3/filters/graphviz/graphviz2png.py" -v -o "~/.asciidoc3/images/asciidoc-graphviz-sample__1.png" -L dot -F png -: memoryview: a bytes-like object is required, not 'str'
~/.asciidoc3$ Traceback (most recent call last):
  File "~/.asciidoc3/filters/graphviz/graphviz2png.py", line 171, in <module>
    app = Application()
  File "~/.asciidoc3/filters/graphviz/graphviz2png.py", line 73, in __init__
    supported_formats = format_output.split(": ")[2][:-1].split(" ")
TypeError: a bytes-like object is required, not 'str'

<endless loop>

In line 39, filtering with "/usr/bin/python3" is ok, but we have this nasty "memoryview TypeError" again, twice, and the well-known "bytes vs string" message. And we have an endless loop at the end - irritating! But do you remember the hint a few lines before about the piping error? We are looking for "subprocess.Popen()", and indeed we find the first occurrence in the line tagged "##_nr_: 822" of asciidoc3_new.py, just where the exception is thrown:

subprocess.Popen in asciidoc3.py before enabling Python3
...
    try:                                                                             ##_nr_: 819 _
        p = subprocess.Popen(filter_cmd, shell=True,                                 ##_nr_: 820 _
                stdin=subprocess.PIPE, stdout=subprocess.PIPE)                       ##_nr_: 821 _
        output = p.communicate(os.linesep.join(lines))[0]                            ##_nr_: 822 _
    except Exception:                                                                ##_nr_: 823 _
        raise EAsciiDoc('filter error: %s: %s' % (filter_cmd, sys.exc_info()[1]))    ##_nr_: 824 _
    if output:                                                                       ##_nr_: 825 _
        result = [s.rstrip() for s in output.split(os.linesep)]                      ##_nr_: 826 _
    else:                                                                            ##_nr_: 827 _
        result = []                                                                  ##_nr_: 828 _
    filter_status = p.wait()                                                         ##_nr_: 829 _
...

Finding the solution wasn’t that easy, and the clue was not obvious. After some trial and error with the Popen constructor described in the Python3 docs, the solution came along.

… 17.5.1.1. Frequently Used Arguments To support a wide variety of use cases, the Popen constructor (and the convenience functions) accept a large number of optional arguments. For most typical use cases, many of these arguments can be safely left at their default values. The arguments that are most commonly needed are: …
If encoding or errors are specified, or universal_newlines is true, the file objects stdin, stdout and stderr will be opened in text mode using the encoding and errors specified in the call or the defaults for io.TextIOWrapper. …
New in version 3.6: Added encoding and errors parameters.
Note
The newlines attribute of the file objects Popen.stdin, Popen.stdout and Popen.stderr are not updated by the Popen.communicate() method.

Changed in version 3.3: When universal_newlines is True, the class uses the encoding locale.getpreferredencoding(False) instead of locale.getpreferredencoding(). See the io.TextIOWrapper class for more information on this change. …

17.5.1.2. Popen Constructor
… bufsize will be supplied as the corresponding argument to the open() function when creating the stdin/stdout/stderr pipe file objects:
0 means unbuffered (read and write are one system call and can return short)
1 means line buffered (only usable if universal_newlines=True i.e., in a text mode)
any other positive value means use a buffer of approximately that size
negative bufsize (the default) means the system default of io.DEFAULT_BUFFER_SIZE will be used.
Changed in version 3.3.1: bufsize now defaults to -1 to enable buffering by default to match the behavior that most code expects. In versions prior to Python 3.2.4 and 3.3.1 it incorrectly defaulted to 0 which was unbuffered and allowed short reads. This was unintentional and did not match the behavior of Python 2 as most code expected. …

Popen.wait(timeout=None)
Wait for child process to terminate. Set and return returncode attribute. …

Note
This will deadlock when using stdout=PIPE or stderr=PIPE and the child process generates enough output to a pipe such that it blocks waiting for the OS pipe buffer to accept more data. Use Popen.communicate() when using pipes to avoid that. …

— https://docs.python.org/3/library/subprocess.html

What can we learn from the information given here?
i) To open stdin, stdout and stderr in text mode - and that is what we want - we have to add "universal_newlines=True". "encoding" and "errors" would be smarter, but they require Python 3.6.
ii) With the option added in i), the text is encoded and decoded using locale.getpreferredencoding(False); before Python 3.3 it was locale.getpreferredencoding().
iii) It is a good idea to add the option "bufsize=-1". This has been the default since Python 3.3.1, but stating it explicitly gives the same buffering on older 3.x versions, too.
iv) It looks safer to use p.returncode instead of p.wait() to avoid the danger of a deadlock. p.wait() is probably not necessary anyway, because "Popen.communicate()" already waits for the process to terminate and sets returncode.
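The four points can be tried out in isolation. The following is only a minimal sketch, not the asciidoc3 code: the helper name run_filter and the upper-casing child process are made up for illustration, and the child is started without a shell.

```python
import subprocess
import sys

def run_filter(cmd, lines):
    """Pipe text lines through a child process; return (output_lines, exit_status)."""
    # universal_newlines=True opens the pipes in text mode, so str goes in and comes out.
    p = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                         universal_newlines=True, bufsize=-1)
    # communicate() feeds stdin, reads stdout and waits for the child to terminate.
    output = p.communicate("\n".join(lines))[0]
    result = [s.rstrip() for s in output.splitlines()] if output else []
    # After communicate() has returned, returncode is set; no extra wait() needed.
    return result, p.returncode

# Hypothetical stand-in for a real filter: a child that upper-cases its stdin.
child = [sys.executable, "-c",
         "import sys; sys.stdout.write(sys.stdin.read().upper())"]
out_lines, status = run_filter(child, ["graph", "digraph"])
print(out_lines, status)
```

Without universal_newlines=True, the same call raises the TypeError seen in the traceback, because communicate() then expects bytes.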
We edit asciidoc3_new.py as described:

subprocess.Popen in asciidoc3.py enabling Python3
...
# adjust subprocess.Popen
ad3new, counter = re.subn(r'try:\s+##_nr_: 819 _.*?830 _', \
                          r"""try:
        p = subprocess.Popen(filter_cmd, shell=True, stdin=subprocess.PIPE,
        stdout=subprocess.PIPE, universal_newlines=True, bufsize=-1)                  ##_nr_: 821 _
        output = p.communicate(os.linesep.join(lines))[0]                             ##_nr_: 822 _
    except Exception:                                                                 ##_nr_: 823 _
        raise EAsciiDoc('filter error: %s: %s' % (filter_cmd, sys.exc_info()[1]))     ##_nr_: 824 _
    if output:                                                                        ##_nr_: 825 _
        result = [s.rstrip() for s in output.split(os.linesep)]                       ##_nr_: 826 _
    else:                                                                             ##_nr_: 827 _
        result = []                                                                   ##_nr_: 828 _
    filter_status = p.returncode                                                      ##_nr_: 829 _
    if filter_status:                                                                 ##_nr_: 830 _""", ad3new, flags=re.S)
assert counter == 1
...

But doing so we have another issue:

...
asciidoc3_new: asciidoc-graphviz-sample.txt: line 38: evaluating: {set2:target:asciidoc-graphviz-sample__1.png}
asciidoc3_new: asciidoc-graphviz-sample.txt: line 39: filtering: "/usr/bin/python3" "~/.asciidoc3/filters/graphviz/graphviz2png.py" -v -o "~/.asciidoc3/images/asciidoc-graphviz-sample__1.png" -L dot -F png -
Traceback (most recent call last):
  File "~/.asciidoc3/filters/graphviz/graphviz2png.py", line 171, in <module>
    app = Application()
  File "~/.asciidoc3/filters/graphviz/graphviz2png.py", line 73, in __init__
    supported_formats = format_output.split(": ")[2][:-1].split(" ")
TypeError: a bytes-like object is required, not 'str'
asciidoc3_new: WARNING: asciidoc-graphviz-sample.txt: line 39: filter non-zero exit code: "/usr/bin/python3" "~/.asciidoc3/filters/graphviz/graphviz2png.py" -v -o "~/.asciidoc3/images/asciidoc-graphviz-sample__1.png" -L dot -F png -: returned 1
asciidoc3_new: WARNING: asciidoc-graphviz-sample.txt: line 39: no output from filter: "/usr/bin/python3" "~/.asciidoc3/filters/graphviz/graphviz2png.py" -v -o "~/.asciidoc3/images/asciidoc-graphviz-sample__1.png" -L dot -F png -
asciidoc3_new: asciidoc-graphviz-sample.txt: line 76: filtering: "/usr/bin/python3" "~/.asciidoc3/filters/graphviz/graphviz2png.py" -v -o "~/.asciidoc3/images/sample2.png" -L dot -F png -
Traceback (most recent call last):
  File "~/.asciidoc3/filters/graphviz/graphviz2png.py", line 171, in <module>
    app = Application()
  File "~/.asciidoc3/filters/graphviz/graphviz2png.py", line 73, in __init__
    supported_formats = format_output.split(": ")[2][:-1].split(" ")
TypeError: a bytes-like object is required, not 'str'
asciidoc3_new: WARNING: asciidoc-graphviz-sample.txt: line 76: filter non-zero exit code: "/usr/bin/python3" "~/.asciidoc3/filters/graphviz/graphviz2png.py" -v -o "~/.asciidoc3/images/sample2.png" -L dot -F png -: returned 1
asciidoc3_new: WARNING: asciidoc-graphviz-sample.txt: line 76: no output from filter: "/usr/bin/python3" "~/.asciidoc3/filters/graphviz/graphviz2png.py" -v -o "~/.asciidoc3/images/sample2.png" -L dot -F png -
asciidoc3_new: asciidoc-graphviz-sample.txt: line 128: filtering: "/usr/bin/python3" "~/.asciidoc3/filters/graphviz/graphviz2png.py" -v -o "~/.asciidoc3/images/sample3.png" -L dot -F png -
Traceback (most recent call last):
  File "~/.asciidoc3/filters/graphviz/graphviz2png.py", line 171, in <module>
    app = Application()
  File "~/.asciidoc3/filters/graphviz/graphviz2png.py", line 73, in __init__
    supported_formats = format_output.split(": ")[2][:-1].split(" ")
TypeError: a bytes-like object is required, not 'str'
asciidoc3_new: WARNING: asciidoc-graphviz-sample.txt: line 128: filter non-zero exit code: "/usr/bin/python3" "~/.asciidoc3/filters/graphviz/graphviz2png.py" -v -o "~/.asciidoc3/images/sample3.png" -L dot -F png -: returned 1
asciidoc3_new: WARNING: asciidoc-graphviz-sample.txt: line 128: no output from filter: "/usr/bin/python3" "~/.asciidoc3/filters/graphviz/graphviz2png.py" -v -o "~/.asciidoc3/images/sample3.png" -L dot -F png -
...

Oh no! Have we made a step forward or not? Yes, we have - the first exception (after "line 39: filtering …") is now thrown by graphviz2png.py and not by asciidoc3_new.py! And if we think about that for a moment, it finally sinks in: we have to edit "graphviz2png.py" in the same way as asciidoc3_new.py. Now we have this:

subprocess.Popen in graphviz2png.py enabling Python3
...
format_output = subprocess.Popen(["dot", "-T?"], stderr=subprocess.PIPE, stdout=subprocess.PIPE,
                                         bufsize=-1, universal_newlines=True).communicate()[1]
...
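Why line 73 of graphviz2png.py failed before this change can be reproduced without Graphviz. This is only a sketch: the sample message below is an invented stand-in for what "dot -T?" writes to stderr, not captured output.

```python
# Invented stand-in for the stderr text of "dot -T?" (an assumption, not real output).
sample = b'Format: "?" not recognized. Use one of: canon dot png svg\n'

# In binary mode the pipe delivers bytes, and bytes.split() refuses a str separator.
try:
    sample.split(": ")
    error = None
except TypeError as exc:
    error = str(exc)

# In text mode (universal_newlines=True) the same data arrives as str,
# and the expression from graphviz2png.py works unchanged.
format_output = sample.decode("utf-8")
supported_formats = format_output.split(": ")[2][:-1].split(" ")
print(error)
print(supported_formats)
```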

And then, good news again, it works! We have successfully configured our first filter … The final step is to rename the files containing "asciidoc" in their names to "asciidoc3", and to adjust the text accordingly. This will later cause some extra work when we implement automatic testing …

5.2.2. Filter "latex"

According to "latex-filter.txt" in directory asciidoc2/doc, we check whether "latex" and "dvipng" are installed:

~/.asciidoc3$ dvipng
This is dvipng 1.15 Copyright 2002-2015 Jan-Ake Larsson

Usage: dvipng [OPTION]... FILENAME[.dvi]
Options are chosen to be similar to dvips' options where possible:
  -d #         Debug (# is the debug bitmap, 1 if not given)
  -D #         Output resolution
...

~/.asciidoc3$ latex -v
pdfTeX 3.14159265-2.6-1.40.16 (TeX Live 2015/Debian)
kpathsea version 6.2.1
Copyright 2015 Peter Breitenlohner (eTeX)/Han The Thanh (pdfTeX).
There is NO warranty.  Redistribution of this software is
covered by the terms of both the pdfTeX copyright and
...

That’s ok - I copy "latex-filter.txt" into the directory where asciidoc.py resides and try:

~/asciidoc2/doc$ python2 ~/asciidoc2/asciidoc.py --verbose -n -a icons -o ~/.asciidoc3/outputfiles/outputlatex2.html latex-filter.txt

asciidoc: reading: ~/asciidoc2/asciidoc.conf
asciidoc: reading: ~/asciidoc2/doc/asciidoc.conf
asciidoc: reading: ~/asciidoc2/doc/latex-filter.txt
asciidoc: reading: ~/asciidoc2/xhtml11.conf
asciidoc: include1: ~/asciidoc2/stylesheets/asciidoc.css
asciidoc: include1: ~/asciidoc2/javascripts/asciidoc.js
asciidoc: reading: ~/asciidoc2/filters/music/music-filter.conf
asciidoc: reading: ~/asciidoc2/filters/latex/latex-filter.conf
asciidoc: reading: ~/asciidoc2/filters/code/code-filter.conf
asciidoc: reading: ~/asciidoc2/filters/source/source-highlight-filter.conf
asciidoc: reading: ~/asciidoc2/filters/graphviz/graphviz-filter.conf
asciidoc: reading: ~/asciidoc2/lang-en.conf
asciidoc: reading: ~/asciidoc2/doc/asciidoc.conf
asciidoc: writing: ~/.asciidoc3/outputfiles/outputlatex2.html
asciidoc: latex-filter.txt: line 27: evaluating: {counter2:target-number}
asciidoc: latex-filter.txt: line 27: evaluating: {set2:target:latex-filter__1.png}
asciidoc: latex-filter.txt: line 27: filtering: "/usr/bin/python2" "~/asciidoc2/filters/latex/latex2png.py" -m -v -o "~/.asciidoc3/outputfiles/~/.asciidoc3/images/latex-filter__1.png" -
latex2png.py: directory does not exist: ~/.asciidoc3/outputfiles~/.asciidoc3/images
asciidoc: WARNING: latex-filter.txt: line 27: filter non-zero exit code: "/usr/bin/python2" "~/asciidoc2/filters/latex/latex2png.py" -m -v -o "~/.asciidoc3/outputfiles/~/.asciidoc3/images/latex-filter__1.png" -: returned 1
...

We have seen these output messages before, so we edit "latex-filter.conf" to read:

...
latex-style=template="latex-block",subs=(),posattrs=("style","target","dpi"),filter='latex2png.py -m{verbose? -v}{dpi? -D {dpi}} -o "{imagesdir=}{imagesdir?/}{target}" -'
...
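How a doubled path like the one in the log above comes about can be shown in a few lines of Python. The paths are hypothetical, and the sketch only illustrates the general effect of gluing an output directory in front of an images directory that is already absolute; it is not the attribute substitution asciidoc performs.

```python
import posixpath

# Hypothetical directories; both are absolute paths.
outdir = "/home/user/.asciidoc3/outputfiles"
imagesdir = "/home/user/.asciidoc3/images"
target = "latex-filter__1.png"

# Plain concatenation does not notice that imagesdir is absolute.
naive = outdir + "/" + imagesdir + "/" + target

# posixpath.join() drops everything before an absolute component.
joined = posixpath.join(outdir, imagesdir, target)

print(naive)
print(joined)
```

Removing the output directory prefix from the "-o" argument in the conf file has the same effect as the second variant: only the images directory is used.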

Running again, it works fine:

~/asciidoc2/doc$ python2 ~/asciidoc2/asciidoc.py --verbose -n -a icons -o ~/.asciidoc3/outputfiles/outputlatex2.html latex-filter.txt
asciidoc: reading: ~/asciidoc2/asciidoc.conf
asciidoc: reading: ~/asciidoc2/doc/asciidoc.conf
asciidoc: reading: ~/asciidoc2/doc/latex-filter.txt
asciidoc: reading: ~/asciidoc2/xhtml11.conf
...
asciidoc: latex-filter.txt: line 27: filtering: "/usr/bin/python2" "~/asciidoc2/filters/latex/latex2png.py" -m -v -o "~/.asciidoc3/images/latex-filter__1.png" -
tex:
\documentclass{article}
\usepackage{amsmath}
\usepackage{amsthm}
\usepackage{amssymb}
...
<a lot of latex messages>
...
executing: dvipng -T tight -x 1000 -z 9 -bg Transparent --truecolor -o "~/.asciidoc3/images/latex-filter__1.png" "~/.asciidoc3/images/tmpKAKsmF.dvi"  1>&2
This is dvipng 1.15 Copyright 2002-2015 Jan-Ake Larsson
...
deleting: ~/.asciidoc3/images/tmpNVgBA7.log
writing: ~/.asciidoc3/images/latex-filter__3.md5
asciidoc: latex-filter.txt: line 135: evaluating: {counter:figure-number}

As expected the output shows up in "~/.asciidoc3/outputfiles/outputlatex2.html", the pics are found here: "~/.asciidoc3/images/latex-filter__1.png" and so on.

The latex-filter-sample

Two remarks about that. First, if dvipng and/or latex aren’t installed, you get a message similar to this:

dvipng: not found
...
sh: 1: latex: not found

And second, a little bit confusing: if you run "latex-filter.txt" one more time after a successful run, "nothing" happens:

...
skipped: no change: ~/.asciidoc3/images/latex-filter__1.png
...

That means the file already exists and the filter has nothing to do this time.
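The "-m" option and the ".md5" files in the log suggest that the filter remembers a checksum of its input. A minimal sketch of such a skip test, assuming that this is roughly how it works (the function name needs_rebuild and the file handling are made up):

```python
import hashlib
import os
import tempfile

def needs_rebuild(source_text, md5_file):
    """Return True if source_text differs from the checksum stored in md5_file."""
    checksum = hashlib.md5(source_text.encode("utf-8")).digest()
    if os.path.isfile(md5_file):
        with open(md5_file, "rb") as f:
            if f.read() == checksum:
                return False          # corresponds to "skipped: no change"
    with open(md5_file, "wb") as f:   # remember the checksum for the next run
        f.write(checksum)
    return True

with tempfile.TemporaryDirectory() as tmp:
    md5_file = os.path.join(tmp, "latex-filter__1.md5")
    first = needs_rebuild("$y = x^2$", md5_file)    # no checksum yet
    second = needs_rebuild("$y = x^2$", md5_file)   # same input again
    third = needs_rebuild("$y = x^3$", md5_file)    # input changed
print(first, second, third)
```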

All right. I try the same as before to get asciidoc3_new.py working on this.

~/.asciidoc3$ python3 ./asciidoc3_new.py -v -o ~/.asciidoc3/outputfiles/latex-filter.html ./inputfiles/latex-filter.txt
...
asciidoc3_new: latex-filter.txt: line 2: Entering "update/fix output-encoding" with input-file: ~/.asciidoc3/filters/latex/latex-filter.conf
...
asciidoc3_new: writing: ~/.asciidoc3/inputfiles/latex-filter.html
asciidoc3_new: latex-filter.txt: line 27: evaluating: {counter2:target-number}
asciidoc3_new: latex-filter.txt: line 27: evaluating: {set2:target:latex-filter__1.png}
asciidoc3_new: latex-filter.txt: line 27: filtering: "/usr/bin/python3" "~/.asciidoc3/filters/latex/latex2png.py" -m -v -o "~/.asciidoc3/inputfiles/~/.asciidoc3/images/latex-filter__1.png" -
  File "~/.asciidoc3/filters/latex/latex2png.py", line 115
    raise EApp, 'failed command: %s' % cmd
              ^
SyntaxError: invalid syntax
asciidoc3_new: WARNING: latex-filter.txt: line 27: filter non-zero exit code: "/usr/bin/python3" "~/.asciidoc3/filters/latex/latex2png.py" -m -v -o "~/.asciidoc3/inputfiles/~/.asciidoc3/images/latex-filter__1.png" -: returned 1
...

Yes, that’s the expected result - I have to run 2to3 on "latex2png.py" first:

sudo 2to3 -v -w -n -f all -f buffer -f set_literal -f idioms -f ws_comma latex2png.py
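The SyntaxError comes from the Python 2 form of the raise statement, which 2to3 rewrites into a call of the exception class. A small sketch of the result (the class name EApp is taken from the traceback, the surrounding function is made up):

```python
class EApp(Exception):
    """Application exception, named like the one in latex2png.py."""

def run(cmd, ok):
    """Pretend to run cmd; complain if it failed."""
    if not ok:
        # Python 2 spelling, a SyntaxError in Python 3:
        #     raise EApp, 'failed command: %s' % cmd
        raise EApp('failed command: %s' % cmd)

try:
    run('latex sample.tex', ok=False)
except EApp as exc:
    message = str(exc)
print(message)
```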

All right, the expected changes … and again:

~/.asciidoc3$ python3 ./asciidoc3_new.py -v -o ~/.asciidoc3/outputfiles/latex-filter.html ./inputfiles/latex-filter.txt
...
asciidoc3_new: writing: ~/.asciidoc3/outputfiles/latex-filter.html
...
asciidoc3_new: latex-filter.txt: line 27: filtering: "/usr/bin/python3" "~/.asciidoc3/filters/latex/latex2png.py" -m -v -o "~/.asciidoc3/outputfiles/~/.asciidoc3/images/latex-filter__1.png" -
Traceback (most recent call last):
  File "~/.asciidoc3/filters/latex/latex2png.py", line 63, in <module>
    import os, sys, tempfile, md5
ImportError: No module named 'md5'
asciidoc3_new: WARNING: latex-filter.txt: line 27: filter non-zero exit code: "/usr/bin/python3" "~/.asciidoc3/filters/latex/latex2png.py" -m -v -o "~/.asciidoc3/outputfiles/~/.asciidoc3/images/latex-filter__1.png" -: returned 1

First we have to adjust latex-filter.conf exactly as above …

$ python3 ./asciidoc3_new.py -v -o ~/.asciidoc3/outputfiles/latex-filter.html ./inputfiles/latex-filter.txt
...
asciidoc3_new: latex-filter.txt: line 27: evaluating: {set2:target:latex-filter__1.png}
asciidoc3_new: latex-filter.txt: line 27: filtering: "/usr/bin/python3" "~/.asciidoc3/filters/latex/latex2png.py" -m -v -o "~/.asciidoc3/images/latex-filter__1.png" -
Traceback (most recent call last):
  File "~/.asciidoc3/filters/latex/latex2png.py", line 63, in <module>
    import os, sys, tempfile, md5
ImportError: No module named 'md5'
asciidoc3_new: WARNING: latex-filter.txt: line 27: filter non-zero exit code: "/usr/bin/python3" "~/.asciidoc3/filters/latex/latex2png.py" -m -v -o "~/.asciidoc3/images/latex-filter__1.png" -: returned 1
...

... the link to the image directory is correct, but there is "No module named 'md5'" in Python3, although latex2png.py requests it: "import os, sys, tempfile, md5". Indeed, md5 is now part of hashlib, so we change this line to "import os, sys, tempfile, hashlib" (and may delete or comment out the three preceding "deprecation warning" lines). And of course we have to edit the only call of this module: "checksum = md5.new(tex).digest()". See the changes:

Edit latex2png.py (i)
import os, sys, tempfile, md5
            vs.
import os, sys, tempfile, hashlib

checksum = md5.new(tex).digest()
            vs.
checksum = hashlib.new('md5', tex).digest()

Starting again:

python3 ./asciidoc3_new.py -v -o ~/.asciidoc3/outputfiles/latex-filter.html ./inputfiles/latex-filter.txt
...
asciidoc3_new: writing: ~/.asciidoc3/outputfiles/latex-filter.html
asciidoc3_new: latex-filter.txt: line 27: evaluating: {counter2:target-number}
asciidoc3_new: latex-filter.txt: line 27: evaluating: {set2:target:latex-filter__1.png}
asciidoc3_new: latex-filter.txt: line 27: filtering: "/usr/bin/python3" "~/.asciidoc3/filters/latex/latex2png.py" -m -v -o "~/.asciidoc3/images/latex-filter__1.png" -
latex2png.py: Unicode-objects must be encoded before hashing
...

Yes, I forgot: hashing only works on a bytes-like object, so the string has to be encoded first. We add errors="ignore" to be on the safe side and avoid any exception.

Edit latex2png.py (ii)
checksum = hashlib.new('md5', tex).digest()
            vs.
checksum = hashlib.new('md5', bytes(tex, encoding='utf-8', errors='ignore')).digest()
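Both steps can be checked on their own. This is only a sketch with a sample formula as input; the last line is an equivalent, shorter spelling and not what the diary uses.

```python
import hashlib

tex = r"$y = \int_0^\infty \gamma^2 \cos(x) dx$"

# Passing a str makes hashlib complain, as seen in the log above.
try:
    hashlib.new('md5', tex)
    error = None
except TypeError as exc:
    error = str(exc)

# Encoding the text first gives hashlib the bytes it needs.
data = bytes(tex, encoding='utf-8', errors='ignore')
checksum = hashlib.new('md5', data).digest()

# Equivalent shorter form (an alternative, not the spelling used in the port).
same = hashlib.md5(tex.encode('utf-8')).digest()
print(error)
print(checksum == same)
```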

With that done, let’s run the test once more.

python3 ./asciidoc3_new.py -v -o ~/.asciidoc3/outputfiles/latex-filter.html ./inputfiles/latex-filter.txt
...
asciidoc3_new: latex-filter.txt: line 27: filtering: "/usr/bin/python3" "~/.asciidoc3/filters/latex/latex2png.py" -m -v -o "~/.asciidoc3/images/latex-filter__1.png" -
tex:
\documentclass{article}
\usepackage{amsmath}
...
\pagestyle{empty}
\begin{document}
$y = \int_0^\infty \gamma^2 \cos(x) dx$
\end{document}

executing: latex ~/.asciidoc3/images/tmpoibpbni4.tex 1>&2
This is pdfTeX, Version 3.14159265-2.6-1.40.16 (TeX Live 2015/Debian) (preloaded format=latex)
 restricted \write18 enabled.
entering extended mode
(~/.asciidoc3/images/tmpoibpbni4.tex
LaTeX2e <2016/02/01>
...
Output written on tmpwghvugw3.dvi (1 page, 480 bytes).
Transcript written on tmpwghvugw3.log.
executing: dvipng -T tight -x 1000 -z 9 -bg Transparent --truecolor -o "~/.asciidoc3/images/latex-filter__3.png" "~/.asciidoc3/images/tmpwghvugw3.dvi"  1>&2
...
writing: ~/.asciidoc3/images/latex-filter__3.md5
asciidoc3_new: latex-filter.txt: line 135: evaluating: {counter:figure-number}
...

It works!

5.2.3. Filter "music"

The same procedure, let’s begin with AsciiDoc2:
According to "music-filter.txt" in asciidoc2/doc, we check whether imagemagick is installed. There is no "imagemagick" command to use on the command line; the package consists of ten or so standalone commands like "animate", "identify", or "stream".

~/asciidoc2$ sudo apt-get install imagemagick
...
~/asciidoc2$ whereis lilypond
lilypond: /usr/bin/lilypond /usr/share/lilypond /usr/share/man/man1/lilypond.1.gz /usr/share/info/lilypond
...

All right, let’s start once more:

~/asciidoc2$ python2 ~/asciidoc2/asciidoc.py --verbose -n -a icons -o ~/.asciidoc3/outputfiles/musictest2.html ~/asciidoc2/filters/music/music-filter-test.txt
asciidoc: reading: ~/asciidoc2/asciidoc.conf
asciidoc: reading: ~/asciidoc2/filters/music/music-filter-test.txt
...
asciidoc: writing: ~/.asciidoc3/outputfiles/musictest2.html
asciidoc: music-filter-test.txt: line 22: filtering: "/usr/bin/python2" "~/asciidoc2/filters/music/music2png.py" -m -v -o "~/.asciidoc3/outputfiles/~/.asciidoc3/images/music1.png" -
music2png.py: directory does not exist: ~/.asciidoc3/outputfiles~/.asciidoc3/images
...

It’s the same again, adjusting "music-filter.conf" brings us to

~/asciidoc2$ python2 ~/asciidoc2/asciidoc.py --verbose -n -a icons -o ~/.asciidoc3/outputfiles/musictest2.html ~/asciidoc2/filters/music/music-filter-test.txt
asciidoc: reading: ~/asciidoc2/asciidoc.conf
asciidoc: reading: ~/asciidoc2/filters/music/music-filter-test.txt
...
asciidoc: writing: ~/.asciidoc3/outputfiles/musictest2.html
asciidoc: music-filter-test.txt: line 22: filtering: "/usr/bin/python2" "~/asciidoc2/filters/music/music2png.py" -m -v -o "~/.asciidoc3/images/music1.png" -
executing: abc2ly -o "~/.asciidoc3/images/tmpfE04Fu.ly" "~/.asciidoc3/images/tmpfE04Fu.abc"
/usr/bin/abc2ly from LilyPond 2.18.2
Parsing `~/.asciidoc3/images/tmpfE04Fu.abc'...
Line ... Warning: inserting repeat to beginning of notes.
lilypond output to: `~/.asciidoc3/images/tmpfE04Fu.ly'...
executing: lilypond --png -o "~/.asciidoc3/images/tmpfE04Fu" "~/.asciidoc3/images/tmpfE04Fu.ly"
»~/.asciidoc3/images/tmpfE04Fu.ly« wird verarbeitet                            1
Analysieren...
...
Konvertierung nach PNG...
...
executing: convert "~/.asciidoc3/images/music2.png" -strip -gravity South -chop 0x75 -trim "~/.asciidoc3/images/music2.png"
deleting: ~/.asciidoc3/images/tmpFdTEyU.ly
1 Some messages are in German, so you may guess what my mother tongue is.

It works! The generated file resides in ~/.asciidoc3/outputfiles, the pics are found in ~/.asciidoc3/images.

The music-filter-sample

The same for v3, but a little bit faster! I omit the listing; we adjust "music-filter.conf" in exactly the same way as "latex-filter.conf". Then:

python3 ~/.asciidoc3/asciidoc3_new.py --verbose -n -a icons -o ~/.asciidoc3/outputfiles/musictest3.html ~/.asciidoc3/filters/music/music-filter-test.txt
...
asciidoc3_new: music-filter-test.txt: line 22: filtering: "/usr/bin/python3" "~/.asciidoc3/filters/music/music2png.py" -m -v -o "~/.asciidoc3/images/music1.png" -
  File "~/.asciidoc3/filters/music/music2png.py", line 93
    raise EApp, 'failed command: %s' % cmd
              ^
SyntaxError: invalid syntax

Going on:

sudo 2to3 -v -w -n -f all -f buffer -f set_literal -f idioms -f ws_comma music2png.py

No surprise at all, the expected changes again … You guessed it: "ImportError: No module named 'md5'". We change the lines as follows, from v2 to v3:

checksum = md5.new(source).digest()

checksum = hashlib.new('md5', bytes(source, encoding='utf-8', errors='ignore')).digest()

Don’t forget to delete the files generated by asciidoc2.py to see what happens now: it works!

5.2.4. Filter "source"

This filter is different from the others above: we have to set the selected "HTML source code highlighter" in asciidoc(3).conf. You may choose "source-highlight", "pygments", or "highlight". To follow all three options, be sure you have all of them installed; try "$ whereis source-highlight", which should answer "source-highlight: /usr/bin/source-highlight /usr/share/source-highlight /usr/share/man/man1/source-highlight.1.gz /usr/share/info/source-highlight.info.gz"
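The same check can be done from Python with shutil.which(), available since Python 3.3. A minimal sketch; the helper name missing_tools is made up, and the command names are the three executables used in the runs below:

```python
import shutil

def missing_tools(names):
    """Return those command names that are not found on the PATH."""
    return [name for name in names if shutil.which(name) is None]

highlighters = ["source-highlight", "highlight", "pygmentize"]
missing = missing_tools(highlighters)
if missing:
    print("not installed:", ", ".join(missing))
else:
    print("all highlighters found")
```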

Option I "source-highlight"

We begin with "source-highlight", the lines both in asciidoc.conf and asciidoc3.conf are:

...
# HTML source code highlighter (source-highlight, pygments or highlight)
source-highlighter=source-highlight
...

We start with asciidoc2.py:

$ python2 ~/asciidoc2/asciidoc.py --verbose -n -a icons -o ~/.asciidoc3/outputfiles/sourcehighlightfiltertest2.html ~/asciidoc2/filters/source/source-highlight-filter-test.txt
asciidoc: reading: ~/asciidoc2/asciidoc.conf
asciidoc: reading: ~/asciidoc2/filters/source/source-highlight-filter-test.txt
...
asciidoc: reading: ~/asciidoc2/filters/source/source-highlight-filter.conf
asciidoc: source-highlight-filter-test.txt: line 2: ifeval: "source-highlight"=="source-highlight": True
asciidoc: source-highlight-filter-test.txt: line 2: ifeval: "source-highlight"=="highlight": False
asciidoc: source-highlight-filter-test.txt: line 2: ifeval: "source-highlight"=="pygments": False
...
asciidoc: writing: ~/.asciidoc3/outputfiles/sourcehighlightfiltertest2.html
asciidoc: source-highlight-filter-test.txt: line 17: filtering: source-highlight -f xhtml -s python

Hey, it works right out of the box! Does it with asciidoc3_new.py, too?

$ python3 ~/.asciidoc3/asciidoc3_new.py --verbose -n -a icons -o ~/.asciidoc3/outputfiles/sourcehighlightfiltertest3.html ~/.asciidoc3/filters/source/source-highlight-filter-test.txt
asciidoc3_new: reading: ~/.asciidoc3/asciidoc3.conf
asciidoc3_new: reading: ~/.asciidoc3/filters/source/source-highlight-filter-test.txt
...
asciidoc3_new: source-highlight-filter-test.txt: line 2: ifeval: "source-highlight"=="pygments": False
asciidoc3_new: reading: ~/.asciidoc3/filters/source/source-highlight-filter.conf
asciidoc3_new: source-highlight-filter-test.txt: line 2: ifeval: "source-highlight"=="source-highlight": True
asciidoc3_new: source-highlight-filter-test.txt: line 2: ifeval: "source-highlight"=="highlight": False
asciidoc3_new: source-highlight-filter-test.txt: line 2: ifeval: "source-highlight"=="pygments": False
...
asciidoc3_new: writing: ~/.asciidoc3/outputfiles/sourcehighlightfiltertest3.html
asciidoc3_new: source-highlight-filter-test.txt: line 17: filtering: source-highlight -f xhtml -s python

Yes, it does. But not 100%: the two output files are not binary identical. They differ in the "meta content-type" and in the "last updated" line in the footer. We will take care of these things in a few moments, see (target is TODO) here.
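To see exactly which lines differ between two output files, difflib from the standard library is enough. A sketch with two invented three-line "documents" standing in for the real HTML files (the helper name changed_lines is made up):

```python
import difflib

def changed_lines(old_text, new_text):
    """Return the removed ('-') and added ('+') lines between two texts."""
    diff = list(difflib.unified_diff(old_text.splitlines(),
                                     new_text.splitlines(),
                                     lineterm="", n=0))
    body = diff[2:]   # skip the '---' and '+++' header lines
    return [line for line in body if line.startswith(("+", "-"))]

# Invented sample content, not the real output files.
v2_html = "<title>Test</title>\n<p>body</p>\nLast updated 2017-05-01\n"
v3_html = "<title>Test</title>\n<p>body</p>\nLast updated 2017-05-02\n"

changes = changed_lines(v2_html, v3_html)
for line in changes:
    print(line)
```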

Option II "highlight"

Next is filter "highlight", the lines both in asciidoc.conf and asciidoc3.conf are:

...
# HTML source code highlighter (source-highlight, pygments or highlight)
#source-highlighter=source-highlight
source-highlighter=highlight
...

We start with asciidoc2.py:

$ python2 ~/asciidoc2/asciidoc.py --verbose -n -a icons -o ~/.asciidoc3/outputfiles/highlightfiltertest2.html ~/asciidoc2/filters/source/source-highlight-filter-test.txt
...
asciidoc: writing: ~/.asciidoc3/outputfiles/highlightfiltertest2.html
asciidoc: source-highlight-filter-test.txt: line 17: filtering: highlight --no-doc --inline-css --out-format=xhtml --syntax=py   --encoding=utf-8
/bin/sh: 1: highlight: not found
asciidoc: WARNING: source-highlight-filter-test.txt: line 17: filter non-zero exit code: highlight --no-doc --inline-css --out-format=xhtml --syntax=py   --encoding=utf-8: returned 127
asciidoc: WARNING: source-highlight-filter-test.txt: line 17: no output from filter: highlight --no-doc --inline-css --out-format=xhtml --syntax=py   --encoding=utf-8

Yep, we forgot to install "highlight" (sudo apt install highlight), and then - it works (output omitted).
Let’s go to v3:

python3 ~/.asciidoc3/asciidoc3_new.py --verbose -n -a icons -o ~/.asciidoc3/outputfiles/highlightfiltertest3.html ~/.asciidoc3/filters/source/source-highlight-filter-test.txt
asciidoc3_new: reading: ~/.asciidoc3/asciidoc3.conf
asciidoc3_new: ...
...
asciidoc3_new: source-highlight-filter-test.txt: line 2: ifeval: "highlight"=="source-highlight": False
asciidoc3_new: source-highlight-filter-test.txt: line 2: ifeval: "highlight"=="highlight": True
asciidoc3_new: source-highlight-filter-test.txt: line 2: ifeval: "highlight"=="pygments": False
...
asciidoc3_new: writing: ~/.asciidoc3/outputfiles/highlightfiltertest3.html
asciidoc3_new: source-highlight-filter-test.txt: line 17: filtering: highlight --no-doc --inline-css --out-format=xhtml --syntax=py   --encoding=utf-8

Everything’s ok … again we have a different "last updated" in the footer.

Option III "pygmentize"

The third and last option is "pygmentize". Once it is installed (try "$ whereis pygmentize") and both the v2 and the v3 conf are set to "source-highlighter=pygments", we start with v2:

python2 ~/asciidoc2/asciidoc.py --verbose -n -a icons -o ~/.asciidoc3/outputfiles/pygmentsfiltertest2.html ~/asciidoc2/filters/source/source-highlight-filter-test.txt
...
asciidoc: reading: ~/asciidoc2/filters/source/source-highlight-filter.conf
asciidoc: source-highlight-filter-test.txt: line 2: ifeval: "pygments"=="source-highlight": False
asciidoc: source-highlight-filter-test.txt: line 2: ifeval: "pygments"=="highlight": False
asciidoc: source-highlight-filter-test.txt: line 2: ifeval: "pygments"=="pygments": True
...
asciidoc: writing: ~/.asciidoc3/outputfiles/pygmentsfiltertest2.html
asciidoc: source-highlight-filter-test.txt: line 17: filtering: pygmentize -f html -l python  -O encoding=utf-8
Pygments source code highlight sample

It works. Only one step to have all the filter-work done?

python3 ~/.asciidoc3/asciidoc3_new.py --verbose -n -a icons -o ~/.asciidoc3/outputfiles/pygmentsfiltertest3.html ~/.asciidoc3/filters/source/source-highlight-filter-test.txt
...
asciidoc3_new: writing: ~/.asciidoc3/outputfiles/pygmentsfiltertest3.html
asciidoc3_new: source-highlight-filter-test.txt: line 17: filtering: pygmentize -f html -l python  -O encoding=utf-8

Yes. If you see single-colored output only, you may have to copy "pygments.css" to the "stylesheet" directory …
Pygmentize requires Python 3.3 or higher, so we have to insert an appropriate if-clause with a hint.

... What are the system requirements? Pygments only needs a standard Python install, version 2.6 or higher or version 3.3 or higher for Python 3. No additional libraries are needed. …

— http://pygments.org/faq/

I insert five lines:

Pygments requires Python3.3
...
        filter_cmd = re.sub(r'"([^ ]+?)"', r'\1', filter_cmd)                              ##_nr_: 818 _
    if "pygmentize" in filter_cmd.strip().lower():
        if sys.version_info < (3, 3):
            message.stderr('FAILED: filter pygments requires Python 3.3 and higher')
            message.stderr("Use 'source-highlight' or 'highlight' instead - see manual chapter filter!")
            sys.exit(1)
...
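
A minimal sketch of such a version guard as a small standalone function. The name `pygments_supported` is hypothetical and not part of asciidoc3. The sketch only illustrates that comparing `sys.version_info` as a tuple stays correct for two-digit minor versions such as 3.10, where cutting the version string down to three characters yields "3.1".

```python
import sys


def pygments_supported(version_info=None):
    """Return True if the given (or the running) interpreter is Python 3.3+.

    Hypothetical helper for illustration; not part of asciidoc3.
    """
    if version_info is None:
        version_info = sys.version_info
    # Tuples compare element by element, so (3, 10) >= (3, 3) holds,
    # whereas float("3.10.4"[:3]) would give 3.1 and fail the check.
    return tuple(version_info[:2]) >= (3, 3)
```

Called without arguments, the function checks the interpreter that is currently running.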

When pygments is "on" in asciidoc3.conf, the css-file stylesheet/pygments.css is included. If you don’t make use of pygments (or any other highlighter), it may be smarter to deactivate all of them in the asciidoc(3).conf.

5.2.5. Filter "code"

To say it in the words of the manual: the filter "code-filter.py" - a Python2 program - serves as a demo for those who want to write a filter of their own. My first attempt ends like this…

python2 ~/asciidoc2/asciidoc.py -v ~/asciidoc2/filters/code/code-filter-test.txt
asciidoc: reading: ~/asciidoc2/asciidoc.conf
asciidoc: reading: ~/asciidoc2/filters/code/code-filter-test.txt
...
asciidoc: reading: ~/asciidoc2/lang-en.conf
asciidoc: writing: ~/asciidoc2/filters/code/code-filter-test.html
asciidoc: WARNING: code-filter-test.txt: line 4: missing style: [paradef-default]: python

Let’s take a look at "code-filter-test.txt". It is obviously damaged; note the two lines "code~~~~~~~~" at the beginning and at the end:

Code Filter Test
================

[python]
code~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
''' A multi-line
    comment.'''
def sub_word(mo):
        ''' Single line comment.'''
        word = mo.group('word') # Inline comment
        if word in keywords[language]:
                return quote + word + quote
        else:
                return word
code~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We edit:

Code Filter Test
================

[code, python]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
''' A multi-line
    comment.'''
def sub_word(mo):
        ''' Single line comment.'''
        word = mo.group('word') # Inline comment
        if word in keywords[language]:
                return quote + word + quote
        else:
                return word
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

but the identical warning is issued. And now we see: we have to change the block delimiter … dashes, not tildes! The "code-filter.conf" says "code-style=template=listingblock", and a listing block is delimited with dashes …

 Code Filter Test
 ================

 [code, python]
 -------------------------------
 ''' A multi-line
     comment.'''
 def sub_word(mo):
         ''' Single line comment.'''
         word = mo.group('word') # Inline comment
         if word in keywords[language]:
                 return quote + word + quote
         else:
                 return word
 -------------------------------

and now it works so far!

We have to copy this corrected file to ~/.asciidoc3/filters/code and start with v3:

~/.asciidoc3$ python3 ~/.asciidoc3/asciidoc3_new.py -v ~/.asciidoc3/filters/code/code-filter-test.txt
asciidoc3_new: reading: ~/.asciidoc3/asciidoc3.conf
...
asciidoc3_new: writing: ~/.asciidoc3/filters/code/code-filter-test.html
asciidoc3_new: code-filter-test.txt: line 14: filtering: "/usr/bin/python3" "~/.asciidoc3/filters/code/code-filter.py" -b html -l python
  File "~/.asciidoc3/filters/code/code-filter.py", line 196
    print __doc__
                ^
SyntaxError: Missing parentheses in call to 'print'
asciidoc3_new: WARNING: code-filter-test.txt: line 14: filter non-zero exit code: "/usr/bin/python3" "~/.asciidoc3/filters/code/code-filter.py" -b html -l python: returned 1
asciidoc3_new: WARNING: code-filter-test.txt: line 14: no output from filter: "/usr/bin/python3" "~/.asciidoc3/filters/code/code-filter.py" -b html -l python

Yep, 2to3 on code-filter.py:

$ cd ./filters/code
$ sudo 2to3 -v -w -n -f all -f buffer -f set_literal -f idioms -f ws_comma code-filter.py

and again:

$ python3 ~/.asciidoc3/asciidoc3_new.py -v ~/.asciidoc3/filters/code/code-filter-test.txt
...
asciidoc3_new: writing: ~/.asciidoc3/filters/code/code-filter-test.html
asciidoc3_new: code-filter-test.txt: line 14: filtering: "/usr/bin/python3" "~/.asciidoc3/filters/code/code-filter.py" -b html -l python
code-filter.py: unexpected exit status: module 'string' has no attribute 'lower'
asciidoc3_new: WARNING: code-filter-test.txt: line 14: no output from filter: "/usr/bin/python3" "~/.asciidoc3/filters/code/code-filter.py" -b html -l python

What’s this? We have to take a look at the source code. There we see some items that are no longer necessary and/or useful. We can simplify the code in many respects. I give the changes in short, showing the particular lines before/after (v2/v3):

#!/usr/bin/env python
         vs.
#!/usr/bin/env python3

import os, sys, re, string
         vs.
import os, sys, re

line = string.rstrip(line)
line = string.expandtabs(line, tabsize)
# Escape special characters.
line = string.replace(line, '&', '&amp;')
line = string.replace(line, '<', '&lt;')
line = string.replace(line, '>', '&gt;')
         vs.
line = line.rstrip()
line = line.expandtabs(tabsize)
# Escape special characters.
line = line.replace('&', '&amp;')
line = line.replace('<', '&lt;')
line = line.replace('>', '&gt;')

pos = string.find(line, inline_comment)
         vs.
pos = line.find(inline_comment)

v = string.lower(v)
         vs.
v = v.lower()
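
The before/after pairs above all follow one pattern: functions of the old "string" module become methods of the str object itself. A minimal sketch of the converted escaping step as one function; the name `escape_line` and the default `tabsize=8` are assumptions for illustration and are not taken from code-filter.py.

```python
import html


def escape_line(line, tabsize=8):
    """Strip trailing whitespace, expand tabs and escape &, < and >.

    Hypothetical helper mirroring the converted lines shown above.
    """
    line = line.rstrip()
    line = line.expandtabs(tabsize)
    # '&' has to be replaced first, otherwise the '&' of '&lt;' and
    # '&gt;' would be escaped a second time.
    line = line.replace('&', '&amp;')
    line = line.replace('<', '&lt;')
    line = line.replace('>', '&gt;')
    return line


# The three replace() calls do the same job as html.escape(..., quote=False).
assert escape_line('<a & b>') == html.escape('<a & b>', quote=False)
```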

and a very last time:

~/.asciidoc3$ python3 ~/.asciidoc3/asciidoc3_new.py -v ~/.asciidoc3/filters/code/code-filter-test.txt
...
asciidoc3_new: writing: ~/.asciidoc3/filters/code/code-filter-test.html
asciidoc3_new: code-filter-test.txt: line 14: filtering: "/usr/bin/python3" "~/.asciidoc3/filters/code/code-filter.py" -b html -l python
~/.asciidoc3$

It works fine. We have a binary identical output compared with v2.
A post in the asciidoc-group

… The code filter is didactic example I wrote to illustrate how to write a filter …, strictly speaking it probably shouldn’t even be in the distribution. If you want to take it further then the best course would be to rename your enhanced version and publish it as a filter plugin.
Cheers, Stuart

— https://groups.google.com/forum/#!topic/asciidoc/hJ5eqR9DrfI

This post from the author (of both asciidoc and the filter) made me consider skipping this filter - I did the 2to3 work for the sake of completeness. It is probably a good filter demo, but for real-world code highlighting you had better move to chapter "filter source" above.


5.2.6. Subprocess, revisited

Are we through with "subprocess"? Do you remember that we started the preceding filter chapter because of a problem when processing "inputthree" - which is effectively the same as the page source of asciidoc.org/userguide.html? This brought us the error:

asciidoc3_new: ERROR: inputthree.txt: line 1342: undefined filter attribute in command:
  "{python}" "{asciidoc-file}" -b {backend} {asciidoc-args}{lang? -a "lang={lang}@"}{icons? -a icons
  -a "iconsdir={iconsdir}"}{imagesdir? -a "imagesdir={imagesdir}"}{data-uri? -a data-uri}
  -a "indir={indir}"{trace? -a "trace={trace}"}{blockname? -a "blockname={blockname}"} -s

In addition to this, or precisely because of it, asciidoc3_new.py is not able to "render" the <div class="paragraph">.
But, when we run …

python3 ~/.asciidoc3/asciidoc3_new.py --verbose -n -a icons -o ~/.asciidoc3/outputfiles/outputthree3.html ~/.asciidoc3/inputfiles/inputthree.txt

... the same command again, the same error shows up. Was all of our "filter-work" for nothing? Yes and no.
We have to consider that filter_lines(), and with it subprocess.Popen(), is called from three places in asciidoc3_new.py: "body = filter_lines(self.parameters.filter, body, self.attributes)" in line nr2727 - class Paragraph(AbstractBlock) - and in line nr3083 - class DelimitedBlock(AbstractBlock). These two have the identical signature, and we have covered them in the filter chapter … successfully.
But "subprocess" is invoked a third time with a different signature in line nr3497: class Table, def subs_row():
"data = filter_lines(self.get_param('filter', colstyle), data, self.attributes)"
And this is just what we have to deal with now! Ok, once again:

python3 ~/.asciidoc3/asciidoc3_new.py --verbose -n -a icons -o ~/.asciidoc3/outputfiles/outputthree3.html ~/.asciidoc3/inputfiles/inputthree.txt
...
asciidoc3_new: inputthree.txt: line 474: evaluating: {counter:table-number}
asciidoc3_new: inputthree.txt: line 493: evaluating: {counter:table-number}
asciidoc3_new: ERROR: inputthree.txt: line 1342: undefined filter attribute in command: "{python}" "{asciidoc-file}" -b {backend} {asciidoc-args}{lang? -a "lang={lang}@"}{icons? -a icons -a "iconsdir={iconsdir}"}{imagesdir? -a "imagesdir={imagesdir}"}{data-uri? -a data-uri} -a "indir={indir}"{trace? -a "trace={trace}"}{blockname? -a "blockname={blockname}"} -s -
asciidoc3_new: inputthree.txt: line 1548: evaluating: {counter:table-number}
...

We can identify the sticking point in asciidoc3.conf:

#-------
# Tables
#-------
[tabledef-default]
delimiter=^\|={3,}$
...
asciidoc-style=tags="asciidoc",subs=(),filter='"{python}" "{asciidoc-file}" -b {backend} {asciidoc-args}{lang? -a "lang={lang}@"}{icons? -a icons -a "iconsdir={iconsdir}"}{imagesdir? -a "imagesdir={imagesdir}"}{data-uri? -a data-uri} -a "indir={indir}"{trace? -a "trace={trace}"}{blockname? -a "blockname={blockname}"} -s -'


[tabledef-nested]
# Same as [tabledef-default] but with different delimiter and separator.
...
asciidoc-style=tags="asciidoc",subs=(),filter='"{python}" "{asciidoc-file}" -b {backend} {asciidoc-args}{lang? -a "lang={lang}@"}{icons? -a icons -a "iconsdir={iconsdir}"}{imagesdir? -a "imagesdir={imagesdir}"}{data-uri? -a data-uri} -a "indir={indir}"{trace? -a "trace={trace}"}{blockname? -a "blockname={blockname}"} -s -'

Chapter "Tables" - that seems to fit! Probably we are in section "tabledef-default" and not in section "tabledef-nested", but that’s not the point. And we spot an "undefined filter attribute" right away: {python} is {python3} in asciidoc3_new.py, and {asciidoc-file} is {asciidoc3-file}, see above. So let’s edit asciidoc3.conf:

asciidoc-style=tags="asciidoc",subs=(),filter='"{python3}" "{asciidoc3-file}" -b {backend} {asciidoc-args}{lang? -a "lang={lang}@"}{icons? -a icons -a "iconsdir={iconsdir}"}{imagesdir? -a "imagesdir={imagesdir}"}{data-uri? -a data-uri} -a "indir={indir}" {trace? -a "trace={trace}"} {blockname? -a "blockname={blockname}"} -s -'

Trying again, it works!
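
A mismatch like the one found here ({python} vs. {python3}) can also be spotted mechanically. The following is a deliberately simplified sketch: it only looks at plain {name} references and ignores conditional forms such as {lang? …}. The function name `undefined_simple_refs` and the example attribute values are assumptions for illustration; this is not how asciidoc3_new.py resolves attributes.

```python
import re


def undefined_simple_refs(command, attributes):
    """Return the sorted names of plain {name} references missing from attributes.

    Simplified on purpose: conditional references like {icons? -a icons}
    do not match the pattern and are skipped.
    """
    names = re.findall(r'\{([A-Za-z0-9_-]+)\}', command)
    return sorted({name for name in names if name not in attributes})


# Hypothetical attribute values, only the keys matter for the check.
attrs = {'python3': '/usr/bin/python3',
         'asciidoc3-file': 'asciidoc3_new.py',
         'backend': 'xhtml11'}
old_filter = '"{python}" "{asciidoc-file}" -b {backend} -s -'
print(undefined_simple_refs(old_filter, attrs))
```

With the old filter line the sketch reports 'asciidoc-file' and 'python'; with the renamed attributes the list is empty.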
To be on the safe side regarding encoding, we add our newly invented attributes.

asciidoc(3)-style
asciidoc-style=tags="asciidoc",subs=(),filter='"{python3}" "{asciidoc3-file}" -b {backend} {asciidoc-args}{lang? -a "lang={lang}@"}{icons? -a icons -a "iconsdir={iconsdir}"}{imagesdir? -a "imagesdir={imagesdir}"}{data-uri? -a data-uri} -a "indir={indir}" {trace? -a "trace={trace}"} {blockname? -a "blockname={blockname}"} -a "input-encoding={input-encoding}" -a "input-errors={input-errors}" -a "output-encoding={output-encoding}" -a "output-errors={output-errors}" -s -'

To make these attributes available, I extend the function "add_encoding_for_compatibility()" and move its definition to class Ad3Codec(). See the note in line 265 in chapter "New Class Ad3Out(AD3Codec)" above - and we will edit this function one more time later on!

# delete old "def add_encoding_for_compatibility()"
ad3new, counter = re.subn(r'    def add_encoding_for_compatibility.*?def', r'def', ad3new, flags=re.S)
assert counter == 1

# new "def add_encoding_for_compatibility()"
ad3new, counter = re.subn(r'            raise NotImplementedError.*?class Ad3In', \
                          r'''            raise NotImplementedError

    @staticmethod
    def add_encoding_for_compatibility():
        """Ensures that encoding is not edited by user"""
        if ad3in and ad3out:
            message.verbose('encoding set for compatibility: {current_oe}'.format(current_oe = ad3out._oe))
            document.attributes['encoding'] = ad3out._oe
            document.attributes['input-encoding'] = ad3in._ie
            document.attributes['input-errors'] = ad3in._ierr
            document.attributes['output-encoding'] = ad3out._oe
            document.attributes['output-errors'] = ad3out._oerr
        else:
            raise NotImplementedError


class Ad3In''', ad3new, flags=re.S)
assert counter == 1

# new call "def add_encoding_for_compatibility()" ~##nr6018
ad3new, counter = re.subn(r"ad3out\.add_encoding_for_compatibility\(\) # DEPRECATED compatibility raesons", \
                          r"Ad3Codec.add_encoding_for_compatibility() # to ensure encoding in subprocess and conf-files", ad3new)
assert counter == 1
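
The patching pattern used throughout this diary (re.subn() followed by "assert counter == 1") can be wrapped in a small helper. This is only a sketch; the name `patch_once` is hypothetical and does not appear in the porting scripts. Unlike a bare assert, it also reports which pattern failed and how often it matched.

```python
import re


def patch_once(source, pattern, replacement, flags=0):
    """Apply a regex substitution and insist that it matched exactly once.

    Raises ValueError if the pattern matched zero times or more than once,
    so that a changed source file cannot be patched silently in the wrong way.
    """
    patched, count = re.subn(pattern, replacement, source, flags=flags)
    if count != 1:
        raise ValueError(
            'expected exactly 1 match for %r, found %d' % (pattern, count))
    return patched
```

A call such as `ad3new = patch_once(ad3new, r'...', r'...', flags=re.S)` would then replace each pair of re.subn() and assert lines.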

And still everything works fine …

We copy the new "asciidoc(3)-style" to its place at the end of "[tabledef-nested]" in asciidoc3.conf. And, to close this chapter, I edit 'asciidoc-style=tags="asciidoc",subs=()…' to 'asciidoc3-style=tags="asciidoc3",subs=()…'. The test for both [tabledef-default] and [tabledef-nested] (so far we have only tested [tabledef-default] with inputthree.txt) is given in inputfifteen.txt (yes, fifteen = 15, because it was inserted later), see here.

5.2.7. Some more asciidoc3-renaming

As seen above, the renaming of asciidoc to asciidoc3 is incomplete: "asciidoc-args" is not yet "asciidoc3-args". For the sake of completeness, I rename it in asciidoc3_new.py (don’t forget the asciidoc3.conf). While doing so, we also correct the forgotten "asciidoc-confdir" in line nr6006:

# the forgotten asciidoc-args to asciidoc3-args ##_nr_: 5976 _
ad3new, counter = re.subn(r"-args attribute\.                                ##_nr_: 5976 _", \
                          r"3-args attribute.                                ##_nr_: 5976 _", ad3new)
assert counter == 1

##nr5992
ad3new, counter = re.subn(r"document\.attributes\['asciidoc-args'] = args ", \
                          r"document.attributes['asciidoc3-args'] = args", ad3new)
assert counter == 1

# the forgotten asciidoc3-confdir ##nr6006
ad3new, counter = re.subn(r"document\.attributes\['asciidoc-confdir'], 'images/icons'\) ", \
                          r"document.attributes['asciidoc3-confdir'], 'images/icons')", ad3new)
assert counter == 1

5.2.8. Summary on Filters and Filter-Encoding

Coming back to the encoding "problem" regarding subprocess.Popen(). As cited above, it uses the encoding returned by "locale.getpreferredencoding(False)" instead of "locale.getpreferredencoding()". As a result, the encoding may sometimes differ from the attributes "input-encoding" or "output-encoding". So be careful, and don’t be surprised if the result is not what you expected. There is no obvious way to manipulate this behavior. All right, it is possible. See here …

PYTHONIOENCODING
Overrides the encoding used for stdin/stdout/stderr, in the syntax encodingname:errorhandler. The :errorhandler part is optional and has the same meaning as in str.encode(). For stderr, the :errorhandler part is ignored; the handler will always be 'backslashreplace'.

— https://docs.python.org/3.1/using/cmdline.html

... but that is not always a practicable way. Perhaps Python 3.6 offers a solution by implementing Popen(…, encoding="whatyoulike"); we’ll see. Other ideas for dealing with this are: using contextlib.redirect_stdout() and/or io.StringIO() …
In my opinion, however, that’s only a small (if any) limitation of AsciiDoc v3. By the way, this issue also exists (or existed) in AsciiDoc v2, and in most cases shell programs accept only utf-8 or ascii, or strongly recommend it.
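
For illustration, a minimal sketch of passing an explicit encoding to a child process. It relies on the `encoding` parameter that the subprocess module accepts from Python 3.6 on. The helper name `run_filter` and the little echo command are assumptions for this example; this is not the code used in asciidoc3_new.py.

```python
import subprocess
import sys

# A stand-in "filter": a child Python process that copies stdin to stdout
# byte for byte (hypothetical, for demonstration only).
ECHO_CMD = [sys.executable, '-c',
            'import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())']


def run_filter(cmd, text, encoding='utf-8'):
    """Send text to cmd and return its output, both with the given encoding.

    Requires Python 3.6 or newer because of the 'encoding' argument.
    """
    completed = subprocess.run(cmd, input=text, stdout=subprocess.PIPE,
                               encoding=encoding, check=True)
    return completed.stdout
```

With this approach the encoding of the pipe no longer depends on the locale of the machine, only on the value passed in.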

As an example look how Pygments handles encoding:

Pygments tries to be smart regarding encodings in the formatting process: If you give an encoding option, it will be used as the input and output encoding. If you give an outencoding option, it will override encoding as the output encoding. If you give an inencoding option, it will override encoding as the input encoding. If you don’t give an encoding and have given an output file, the default encoding for lexer and formatter is the terminal encoding or the default locale encoding of the system. As a last resort, latin1 is used (which will pass through all non-ASCII characters). If you don’t give an encoding and haven’t given an output file (that means output is written to the console), the default encoding for lexer and formatter is the terminal encoding (sys.stdout.encoding).

— http://pygments.org/docs/cmdline/


5.3. Testing the Homepage, part ii

Do you remember? We came to the filtering subject because asciidoc3_new.py was not able to render <div class="paragraph"> of inputthree.txt (= asciidoc.org/userguide.html). So we try a second time:

python2 ~/asciidoc2/asciidoc.py --verbose -n -a icons -o ~/.asciidoc3/outputfiles/outputthree2.html ~/.asciidoc3/inputfiles/inputthree.txt

python3 ~/.asciidoc3/asciidoc3_new.py --verbose -n -a icons -o ~/.asciidoc3/outputfiles/outputthree3.html ~/.asciidoc3/inputfiles/inputthree.txt

Now it works, the files are binary identical. I try the other pages of asciidoc.org:

  • We go on with inputfour.txt = asciidoc.org/INSTALL.html;

  • inputfive.txt = asciidoc.org/faq.html;

  • inputsix.txt = asciidoc.org/manpage.html;

  • inputseven.txt = asciidoc.org/a2x.1.html;

  • inputeight.txt = asciidoc.org/asciidocapi.html;

  • inputnine.txt = asciidoc.org/plugins.html;

  • inputten.txt = asciidoc.org/testasciidoc.html;

  • inputeleven.txt = asciidoc.org/CHANGELOG.html.

python2 ~/asciidoc2/asciidoc.py --verbose -n -a icons -o ~/.asciidoc3/outputfiles/output..1..2.html ~/.asciidoc3/inputfiles/input..1...txt

python3 ~/.asciidoc3/asciidoc3_new.py --verbose -n -a icons -o ~/.asciidoc3/outputfiles/output..1..3.html ~/.asciidoc3/inputfiles/input..1...txt
1 Replace the dots with four, five, six, seven, eight, nine, ten, eleven, respectively. To see outputeight, don’t forget to activate (= uncomment) "source-highlighter" in asciidoc(3).conf. The v2 and v3 outputs are binary identical; asciidoc.org/support.html and the cheatsheet are skipped - there is nothing new in them, or no source code is given.
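
The "binary identical" check for a whole series of output files can be scripted. A minimal sketch, assuming the file naming used in the commands above (output<name>2.html for v2, output<name>3.html for v3); the function name `compare_outputs` is hypothetical.

```python
import filecmp
import os


def compare_outputs(outdir, names):
    """Return a dict mapping each name to True if v2 and v3 output match.

    shallow=False forces a comparison of the file contents instead of
    trusting size and modification time.
    """
    results = {}
    for name in names:
        v2 = os.path.join(outdir, 'output%s2.html' % name)
        v3 = os.path.join(outdir, 'output%s3.html' % name)
        results[name] = filecmp.cmp(v2, v3, shallow=False)
    return results
```

A call like `compare_outputs(os.path.expanduser('~/.asciidoc3/outputfiles'), ['four', 'five', 'six'])` would then report the result for each page.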


5.3.1. Playing with "sys"

We have successfully finished the "Home Page"-Testing. What’s next? Let’s perform another test given in the documentation of AsciiDoc2, probably the most concise file to test a lot of AsciiDoc(3)'s features. If this file is processed as expected, we jump over a further hurdle.
I execute "testcases.txt", found in ~/asciidoc2/tests/data/testcases.txt - copy the file to directory "inputfiles" together with "testcases.conf". Starting with v2:

python2 ~/asciidoc2/asciidoc.py --verbose -n -a icons -o ~/.asciidoc3/outputfiles/testcases2.html ~/.asciidoc3/inputfiles/testcases.txt
Note If you see a WARNING in the last line of the command line output "asciidoc: WARNING: testcases.txt: line 788: {template:test-template}: template does not exist" you forgot to copy "testcases.conf".

Otherwise we see:

...
asciidoc: testcases.txt: line 192: filtering: source-highlight -f xhtml -s python
asciidoc: WARNING: testcases.txt: line 282: nested inline passthrough
asciidoc: testcases.txt: line 304: evaluating: {counter:figure-number}
...
sh: 1: cannot open ~/.asciidoc3/inputfiles/../../images/smallnew.png: No such file
asciidoc: WARNING: testcases.txt: line 326: {sys:"/usr/bin/python2" -u -c "import mimetypes,base64,sys; print 'src=\"data:'+mimetypes.guess_type(r'smallnew.png')[0]+';base64,'; base64.encode(sys.stdin,sys.stdout)" < "~/.asciidoc3/inputfiles/../../images/smallnew.png"}: non-zero exit status
asciidoc: testcases.txt: line 604: evaluating: {counter:table-number}
...
asciidoc: testcases.txt: line 788: evaluating: {template:test-template}

When we open the output file "testcases2.html" in our browser, we notice that all images and icons are missing. As an example, "…images/smallnew.png" is not found. We have to edit the paths in "testcases.txt". Afterwards, on my machine this looks as follows:

Edited "testcases.txt"
...
=== Block images

[[tiger_image]]
.Tyger tyger
image::tiger.png[Tyger tyger]

:height: 250
:width: 350
.Tyger tyger two
image::tiger.png[caption="Figure 2: ", alt="Tiger", align="center"]
:height!:
:width!:

// Images and icons directories.
:imagesdir: ../doc
image::music2.png[]

:icons:
:iconsdir:  ../images/icons
NOTE: Lorum ipsum.

:icons!:

:imagesdir: ~/.asciidoc3/images
:data-uri:
image:smallnew.png[NEW] 'testing' `123`.


:data-uri!:

=== Inline images

:imagesdir: ../images
...
...
.Tiger
[float="right"]
image::tiger.png["Tiger image"]

unfloat::[]
...

Replace "~" with your real path, e.g. "/home/myname". asciidoc2.py works with ":imagesdir: ../images".

Note This imagedir-and-iconsdir-finding-the-right-location-confusion is very often an annoying part of working with AsciiDoc(3) … Put on my ToDo-List: Refactoring of imagedir-path, perhaps "os.path.expanduser(path)" will help?
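
A minimal sketch of the ToDo idea mentioned in the note, assuming a POSIX system. The function name `resolve_imagesdir` is hypothetical, and nothing like it exists in asciidoc3_new.py at this point. It expands a leading "~" and resolves relative paths against the directory of the input file.

```python
import os


def resolve_imagesdir(imagesdir, indir):
    """Turn an imagesdir value into a normalized absolute path.

    '~' is expanded to the home directory; a relative path is taken
    relative to indir (the directory of the input file).
    """
    expanded = os.path.expanduser(imagesdir)
    if not os.path.isabs(expanded):
        expanded = os.path.join(indir, expanded)
    # normpath collapses segments like 'inputfiles/../images'.
    return os.path.normpath(expanded)
```

With such a helper, both ":imagesdir: ~/.asciidoc3/images" and ":imagesdir: ../images" would lead to the same directory when the input file lives in ~/.asciidoc3/inputfiles.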

Doing so we have still "asciidoc: WARNING: testcases.txt: line 282: nested inline passthrough", but that’s expected and ok. Scrolling down the output, everything looks fine. I give two screenshots.

Part of "testcases.html", i

testcases1_png


Part of "testcases.html", ii

testcases2_png


Let’s start the same procedure with asciidoc3_new.py:

python3 ~/.asciidoc3/asciidoc3_new.py --verbose -n -a icons -o ~/.asciidoc3/outputfiles/testcases3.html ~/.asciidoc3/inputfiles/testcases.txt

Please keep in mind that we logically have to edit xhtml11.conf before we do so (perhaps you did already): asciidoc becomes asciidoc3, see here

Part of xhtml11.conf v3
 ...
 Asciidoc3 configuration file.   1
 ...
 ...
 [header]
 ...
 <meta http-equiv="Content-Type" content="{quirks=application/xhtml+xml}{quirks?text/html}; charset={output-encoding}" />  2
 <meta name="generator" content="AsciiDoc(3?) {asciidoc3-version}" />     3
 <meta name="description" content="{description}" />
 <meta name="keywords" content="{keywords}" />
 ...
 <link rel="stylesheet" href="{stylesdir=.}/{theme=asciidoc3}.css" type="text/css" />
 ifdef::quirks[]
 <link rel="stylesheet" href="{stylesdir=.}/xhtml11-quirks.css" type="text/css" />
 endif::quirks[]
 ...
 include1::{theme%}{stylesdir=./stylesheets}/asciidoc3.css[]
 include1::{themedir}/{theme}.css[]
 ...
 <script type="text/javascript" src="{scriptsdir=.}/asciidoc3.js"></script>
 ...
 include1::{scriptsdir=./javascripts}/asciidoc3.js[]
 ...
1 almost at the top: 4th line;
2 charset=UTF-8 is also possible, but "output-encoding" is more exact;
3 if you make a change here ("AsciiDoc3"), the output v2 to v3 will differ - perhaps we do it later after testing.

Starting at last, we find "asciidoc3_new: WARNING: testcases.txt: line 282: nested inline passthrough" again - that is ok. No ERROR is thrown - that’s ok, too. And all images seem to be in place. But when we look closer - in "13.1. Block images" the small yellow "new" image is missing. A second look reveals what’s going on: it’s in the lines

:data-uri:
image:smallnew.png[NEW] 'testing' `123`.


:data-uri!:

asciidoc3.py doesn’t compute the output lines as asciidoc2 does - and the result is not even correct xhtml:

<img alt="NEW"                                              1
src="data:image/png;base64,
iVBORw0KGgoAAAANSUhEUgAAABsAAAARCAYAAAAsT9czAAAAAXNSR0IArs4c6QAAAAZiS0dEAP8A
/wD/oL2nkwAAAAlwSFlzAAAOwgAADsIBFShKgAAAAAd0SU1FB9kEGQU1DxUxRF4AAACkSURBVDjL
3VXbCoAgDD2Wwf7/awcR9mRM23QSWDQQ5i5tZxcLKSXMogUT6bvBjmNLLV1LDwBB61nttK570GSt
BKQ+81ewXlZPKAdbagEAEJXG+U5UHs3WkgFAtMpFBDDfHTSZB1VzQJjtrEfkJrK6b1pADZm0kbxE
5Rr9XtmynrnkteSjZ4csFJ7+vTb60buosv4jS999QeqPao6yPF6f8NtfzAno2HZ/Qe1mTQAAAABJ
RU5ErkJggg==" />
</span> <em>testing</em> <code>123</code>.</p></div>


<img alt="NEW"                                              2
</span> <em>testing</em> <code>123</code>.</p></div>
</div>
1 asciidoc.py
2 asciidoc3_new.py

Very soon I’ve found the relevant location in xhtml11.conf (it’s not difficult but rather obvious):

...
[image-inlinemacro]                                         1
<span class="image{role? {role}}">
<a class="image" href="{link}">
{data-uri%}<img src="{imagesdir=}{imagesdir?/}{target}" alt="{alt={target}}"{width? width="{width}"}{height? height="{height}"}{title? title="{title}"} />
{data-uri#}<img alt="{alt={target}}"{width? width="{width}"}{height? height="{height}"}{title? title="{title}"}
{data-uri#}{sys:"{python}" -u -c "import mimetypes,base64,sys; print 'src=\"data:'+mimetypes.guess_type(r'{target}')[0]+';base64,'; base64.encode(sys.stdin,sys.stdout)" < "{eval:os.path.join(r"{indir={outdir}}",r"{imagesdir=}",r"{target}")}"}" />
{link#}</a>
</span>
1 The (almost) same code resides in [image-blockmacro] just subsequent and in [callout-inlinemacro], [listtags-callout], and [admonitionblock].

The line that matters here is

"Critical" Line in xhtml11.conf
{data-uri#}{sys:"{python}" -u -c "import mimetypes,base64,sys; print 'src=\"data:'+mimetypes.guess_type(r'{target}')[0]+';base64,'; base64.encode(sys.stdin,sys.stdout)"

Why? First, {python} doesn’t exist → {python3}; and again:

asciidoc3_new: testcases.txt: line 326: evaluating: {eval:os.path.join(r"~/.asciidoc3/inputfiles",r"../images",r"smallnew.png")}
asciidoc3_new: testcases.txt: line 326: evaluating: {sys:"/usr/bin/python3" -u -c "import mimetypes,base64,sys; print 'src=\"data:'+mimetypes.guess_type(r'smallnew.png')[0]+';base64,'; base64.encode(sys.stdin,sys.stdout)" < "~/.asciidoc3/inputfiles/../images/smallnew.png"}   1
asciidoc3_new: testcases.txt: line 326: shelling: "/usr/bin/python3" -u -c "import mimetypes,base64,sys; print 'src=\"data:'+mimetypes.guess_type(r'smallnew.png')[0]+';base64,'; base64.encode(sys.stdin,sys.stdout)" < "~/.asciidoc3/inputfiles/../images/smallnew.png" > "/tmp/tmpkz4x4094"
  File "<string>", line 1
    import mimetypes,base64,sys; print 'src="data:'+mimetypes.guess_type(r'smallnew.png')[0]+';base64,'; base64.encode(sys.stdin,sys.stdout)
                                                  ^ 2
SyntaxError: invalid syntax
asciidoc3_new: WARNING: testcases.txt: line 326: {sys:"/usr/bin/python3" -u -c "import mimetypes,base64,sys; print 'src=\"data:'+mimetypes.guess_type(r'smallnew.png')[0]+';base64,'; base64.encode(sys.stdin,sys.stdout)" < "~/.asciidoc3/inputfiles/../images/smallnew.png"}: non-zero exit status
1 We have again not the correct path (depending on your imagedir - see above), and
2 some outdated Python2 expressions in our now Python3 chunk.

I give my solution right here in several steps and for better traceability in an alternate listing. The "critical" xhtml11.conf-line may be written as

python -u -c
import mimetypes,base64,sys
print 'src=\"data:'+mimetypes.guess_type(r'{target}')[0]+';base64,'
base64.encode(sys.stdin,sys.stdout)

to be executed on the command line. (To be exact, we have "python3 -u -c …" here.) All right, "python3" starts the Python3 interpreter, "-u" says there is no buffering - buffering is not a good idea when calling a second program from "inside" another, especially when the output is needed without any delay (as we expect here). The option "-c" says that the Python input is given on the command line, too. Here we have "import mimetypes …". And now comes my first solution (some whitespace added), followed by some additional information:

python3 -u -c
import mimetypes,base64,sys
ad3mtype=mimetypes.guess_type(r'{target}')[0]                 1
print('src=\"data:'+ad3mtype+';base64,')                      2
ad3iopen = open('{imagesdir=}{imagesdir?/}{target}', 'rb')    3
ad3data=base64.b64encode(ad3iopen.read()).decode()            4
ad3data = bytearray(ad3data, encoding='ascii')                5
ad3data = list(ad3data)
ad3len = len(ad3data)
ad3space = 76
ad3lines = int(ad3len / ad3space) + 1
[ad3data.insert(ad3counter * ad3space + ad3counter - 1, 10) for ad3counter in range(1, ad3lines)]
ad3data = bytes(ad3data)                                      6
ad3data = str(ad3data, encoding='ascii')                      7
print(ad3data)                                                8
ad3iopen.close()                                              9
1 ad3mtype is our target-type (=png)
2 the "print"+ad3mtype is so to avoid the quotes
3 "ad3iopen" opens the image-file in binary mode
4 computes the base64 code of the image data
5 ff.: the "bytearray" is cast to a list in order to insert a byte 10 (= \n) after every 76 characters
6 back to bytes
7 back to string
8 to "print" directly to the html-output
9 close the file to exit this "subprocess" safely

Why are we doing this? We want the image base64-coded in 76-character-wide lines to be read as "data-uri". This is not mandatory - any browser would accept the data in one row without a newline - but that would not be identical to the way asciidoc2.py works. Some kind of hack, I confess, hopefully surviving all inputs … I improve this "hack" a little in two steps: first by reducing the number of variables and expressions, then by using a "with" statement. The shortened version:

python3 -u -c
import mimetypes,base64,sys
ad3mtype=mimetypes.guess_type(r'{target}')[0]
print('src=\"data:'+ad3mtype+';base64,')
ad3iopen = open('{imagesdir=}{imagesdir?/}{target}', 'rb')
ad3data=base64.b64encode(ad3iopen.read()).decode()
ad3data = bytearray(ad3data, encoding='ascii')
ad3data = list(ad3data)
ad3lines = int( len(ad3data) / 76 ) + 1
[ad3data.insert(ad3counter * 76 + ad3counter - 1, 10) for ad3counter in range(1, ad3lines)]
print(str(bytes(ad3data), encoding='ascii'))
ad3iopen.close()

and now "with"

python3 -u -c
with open('{imagesdir=}{imagesdir?/}{target}', 'rb') as ad3iopen:
    import mimetypes, base64, sys
    ad3mtype=mimetypes.guess_type(r'{target}')[0]
    print('src=\"data:'+ad3mtype+';base64,')
    ad3data=base64.b64encode(ad3iopen.read()).decode()
    ad3data = bytearray(ad3data, encoding='ascii')
    ad3data = list(ad3data)
    ad3lines = int( len(ad3data) / 76 ) + 1
    [ad3data.insert(ad3counter * 76 + ad3counter - 1, 10) for ad3counter in range(1, ad3lines)]
    print(str(bytes(ad3data), encoding='ascii'))

That was my first refactoring step. At this point I "discovered" the module "textwrap" - Python is some kind of treasure chest. This module makes our solution easier, shorter and probably even safer (in one word: "pythonic"):

python3 -u -c
with open('{imagesdir=}{imagesdir?/}{target}', 'rb') as ad3iopen:
    import mimetypes, base64, sys, textwrap
    ad3mtype=mimetypes.guess_type(r'{target}')[0]
    print('src=\"data:'+ad3mtype+';base64,')
    ad3data=base64.b64encode(ad3iopen.read()) 1
    ad3data = list(ad3data)
    print(textwrap.fill(str(bytes(ad3data), encoding='ascii'), width=76))
    del(ad3mtype, ad3data)                    2
1 We can omit the "decode" and the "encoding='ascii'" step.
2 Probably not necessary

Packing the next four lines into two, or even into one (you may omit the "del"), seems to me too ambitious and confusing to the reader.

We can do it in one line … but we don’t
python3 -u -c
with open('{imagesdir=}{imagesdir?/}{target}', 'rb') as ad3iopen:
    import mimetypes, base64, sys, textwrap
    ad3mtype=mimetypes.guess_type(r'{target}')[0]
    print('src=\"data:'+ad3mtype+';base64,')
    print(textwrap.fill(str(bytes(list(base64.b64encode(ad3iopen.read()))), encoding='ascii'), width=76))

but it’s possible and works as expected …
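
To try the idea outside of the .conf file, the same steps can be written as an ordinary function. This is only a sketch; the name `data_uri_src` is hypothetical and nothing like it exists in asciidoc3. It also shows that textwrap.fill() with width=76 yields the same line layout as the standard library's base64.encodebytes(), apart from the trailing newline.

```python
import base64
import mimetypes
import textwrap


def data_uri_src(path):
    """Build the 'src="data:...' text for an image file.

    The base64 data is wrapped into lines of at most 76 characters; the
    closing quote is left to the surrounding template, as in xhtml11.conf.
    """
    mtype = mimetypes.guess_type(path)[0]
    with open(path, 'rb') as handle:
        raw = handle.read()
    payload = base64.b64encode(raw).decode('ascii')
    wrapped = textwrap.fill(payload, width=76)
    # encodebytes() inserts a newline after every 76 characters as well.
    assert wrapped == base64.encodebytes(raw).decode('ascii').rstrip('\n')
    return 'src="data:{};base64,\n{}'.format(mtype, wrapped)
```

base64 text contains neither whitespace nor hyphens, so textwrap has no natural break points and simply cuts the long "word" after every 76 characters.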

5.3.2. "data-uri" and the .conf-files

All together in the end we have the xhtml11.conf edited as follows:

The "Critical" Line after Refactoring
[...]
{data-uri#}{sys:"{python3}" -u -c "with open('{imagesdir=}{imagesdir?/}{target}', 'rb') as ad3iopen: import base64, mimetypes, sys, textwrap; ad3mtype=mimetypes.guess_type(r'{target}')[0]; print('src=\"data:'+ad3mtype+';base64,'); ad3data=base64.b64encode(ad3iopen.read()); ad3data = list(ad3data); print(textwrap.fill(str(bytes(ad3data), encoding='ascii'), width=76)); del(ad3mtype, ad3data)" <"{eval:os.path.join( r"{indir={outdir}}",r"{imagesdir=}",r"{target}")}"}" />
{link#}</a>
</span>
[...]

If we let run our new configuration, asciidoc3.py on "testcases.txt" works fine: the output is binary identical to v2, especially the src="data:image/png;base64, iVBORw0KGgoAA … RU5ErkJggg==" part.
To complete this, I run further tests on "data-uri" in all the contexts mentioned above that are found in xhtml11.conf and in asciidoc3.conf. "image-inlinemacro" is already successfully tested with "testcases.txt". The "filter-image-blockmacro" was tested in the filter chapter, see above.
The next is "image-blockmacro". This is checked "ok" with inputtwelve.txt, which in addition tests some other things like toggling data-uri on/off.
"tabledef-default" and "tabledef-nested" are successfully tested in inputfifteen.txt and inputsixteen.txt.

The "callout-inlinemacro" in xhtml11.conf looks very similar to the "image-inlinemacro". I give the source right here:

"callout-inlinemacro" (xhtml11.conf) after Refactoring
[...]
<img alt="{index}" src="data:image/png;base64,{sys:"{python3}" -u -c "with open('{icon={iconsdir}/callouts/{index}.png}', 'rb') as ad3iopen: import base64, sys, textwrap; ad3data=base64.b64encode(ad3iopen.read()); ad3data = list(ad3data); print(''); print(textwrap.fill(str(bytes(ad3data), encoding='ascii'), width=76)); del(ad3data)" < "{eval:os.path.join(r"{indir={outdir}}",r"{icon={iconsdir}/callouts/{index}.png}")}"}" />
[...]

There is no "import mimetypes", we have another path to the image source, and we have an additional print(''). What is that print good for? Just to enforce binary identical output compared to asciidoc2.py, nothing else. So it is not mandatory, but helpful.
And the "listtags-callout"? Here we go:

"listtags-callout" (xhtml11.conf) after Refactoring
[...]
item=<tr><td><img alt="{listindex}" src="data:image/png;base64, {sys:"{python3}" -u -c "with open('{iconsdir=}{iconsdir?/}callouts/{listindex}.png', 'rb') as ad3iopen: import base64, sys, textwrap; ad3data=base64.b64encode(ad3iopen.read()); ad3data = list(ad3data); print(textwrap.fill(str(bytes(ad3data), encoding='ascii'), width=76)); del(ad3data)" < "{eval:os.path.join(r"{indir={outdir}}",r"{icon={iconsdir}/callouts/{listindex}.png}")}"}" /></td><td>|</td></tr>
[...]

No additional "print" is necessary. "callout-inlinemacro" and "listtags-callout" are tested (successfully …) in inputtwelve.txt and inputthirteen.txt, with data-uri on and off, respectively.

Note Toggling data-uri on/off inside an input.txt seems to have no effect on callouts (a bug? But toggling data-uri within one document makes no sense either way, of course). Avoid it. Use the command-line option "-a data-uri" instead, or set the attribute in the first line of your input.txt: ":data-uri:" - that will work fine for you.

We go to "admonitionblock".

"[admonitionblock]" (xhtml11.conf) before Refactoring
[...]
<td class="icon">
{data-uri%}{icons#}<img src="{icon={iconsdir}/{name}.png}" alt="{caption}" />
{data-uri#}{icons#}<img alt="{caption}" src="data:image/png;base64,
{data-uri#}{icons#}{sys:"{python}" -u -c "import base64,sys; base64.encode(sys.stdin,sys.stdout)" < "{eval:os.path.join(r"{indir={outdir}}",r"{icon={iconsdir}/{name}.png}")}"}" />
{icons%}<div class="title">{caption}</div>
</td>
[...]

and do very similar work:

"[admonitionblock]" (xhtml11.conf) after Refactoring
[...]
<td class="icon">
{data-uri%}{icons#}<img src="{icon={iconsdir}/{name}.png}" alt="{caption}" />
{data-uri#}{icons#}<img alt="{caption}" src="data:image/png;base64,
{data-uri#}{icons#}{sys:"{python3}" -u -c "with open('{icon={iconsdir}/{name}.png}', 'rb') as ad3iopen: import base64, sys, textwrap; ad3data=base64.b64encode(ad3iopen.read()); ad3data = list(ad3data); print(textwrap.fill(str(bytes(ad3data), encoding='ascii'), width=76)); del(ad3data)" < "{eval:os.path.join(r"{indir={outdir}}",r"{icon={iconsdir}/{name}.png}")}"}" />
{icons%}<div class="title">{caption}</div>
</td>
[...]

These conditions are bundled in "inputseventeen.txt" - it works.

All right, to make things complete, we have to test all conf-files with a "data-uri"-condition. As seen before, xhtml11.conf and asciidoc3.conf are successfully checked. Let’s go to:
music-filter.conf latex-filter.conf graphviz-filter.conf html5.conf html4.conf

The easiest way to do this is to run the previous command plus "-a data-uri".

music-filter.conf "data-uri"
python3 ~/.asciidoc3/asciidoc3_new.py --verbose -n -a icons -a data-uri -o ~/.asciidoc3/outputfiles/musictest3du.html ~/.asciidoc3/filters/music/music-filter-test.txt

"musictest3du.html" means: run with asciidoc3.py plus data-uri

latex-filter.conf "data-uri"
python3 ~/.asciidoc3/asciidoc3_new.py --verbose -n -a icons -a data-uri -o ~/.asciidoc3/outputfiles/latextest3du.html ./inputfiles/latex-filter.txt

and at last

graphviz-filter.conf "data-uri"
python3 ~/.asciidoc3/asciidoc3_new.py --verbose -n -a icons -a data-uri -o ~/.asciidoc3/outputfiles/graphviztest3du.html ~/.asciidoc3/filters/graphviz/asciidoc3-graphviz-sample.txt

Yeah, these three work right out of the box!

We have to test all "conf" files, that is here the backends (html5, html4, xhtml11), with data-uri both on and off to be on the safe side. This job will be done in chapter/step "utest" (unittest).
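The backend/data-uri combinations can be listed mechanically. The following is a minimal sketch, not the "utest" step itself: the function name build_commands and the output file naming are hypothetical, while the options (-b, -n, -a, -o) are the ones used in the commands above.

```python
import itertools

BACKENDS = ('html5', 'html4', 'xhtml11')


def build_commands(script, infile, outdir):
    """Return one argv list per (backend, data-uri on/off) combination."""
    commands = []
    for backend, data_uri in itertools.product(BACKENDS, (False, True)):
        # 'du' in the file name marks a run with data-uri, as in the examples above.
        suffix = 'du' if data_uri else ''
        outfile = '{}/test_{}{}.html'.format(outdir, backend, suffix)
        argv = ['python3', script, '-b', backend, '-n', '-a', 'icons']
        if data_uri:
            argv += ['-a', 'data-uri']
        argv += ['-o', outfile, infile]
        commands.append(argv)
    return commands
```

Each list can then be handed to subprocess.run(); the sketch only builds the command lines and starts nothing by itself.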

Note Backend "html4" is outdated. Avoid it. html4 doesn’t use any css (by default), so the layout is nothing to write home about. In addition, callouts are not always rendered correctly and we have no toc …


5.4. Checking the rest of the asciidoc3.py code

5.4.1. Eliminating quirks mode

Did you notice that for a while we didn’t edit asciidoc3_new.py, but the various conf-files? To round this off, I eliminate the "quirks" option, which was already deprecated in v2. To do this, I have to edit asciidoc3.conf and xhtml11.conf.

xhtml11.conf before eliminating quirks
 [...]
 [header]
 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
     "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="{lang=en}">
 <head>
 <meta http-equiv="Content-Type" content="{quirks=application/xhtml+xml}{quirks?text/html}; charset={output-encoding}" />  1
 <meta name="generator" content="AsciiDoc {asciidoc3-version}" />
 <meta name="description" content="{description}" />
 <meta name="keywords" content="{keywords}" />
 <title>{title}</title>
 {title%}<title>{doctitle=}</title>
 ifdef::linkcss[]
 <link rel="stylesheet" href="{stylesdir=.}/{theme=asciidoc3}.css" type="text/css" />
 ifdef::quirks[]                                                                                                           1
 <link rel="stylesheet" href="{stylesdir=.}/xhtml11-quirks.css" type="text/css" />                                         1
 endif::quirks[]                                                                                                           1
 ifeval::["{source-highlighter}"=="pygments"]
 <link rel="stylesheet" href="{stylesdir=.}/pygments.css" type="text/css">
 endif::[]

 # DEPRECATED: 'pygments' attribute.
 ifdef::pygments[<link rel="stylesheet" href="{stylesdir=.}/pygments.css" type="text/css" />]

 ifdef::toc2[<link rel="stylesheet" href="{stylesdir=.}/toc2.css" type="text/css" />]
 <link rel="stylesheet" href="{stylesdir=.}/{stylesheet}" type="text/css" />
 endif::linkcss[]
 ifndef::linkcss[]
 <style type="text/css">
 include1::{theme%}{stylesdir=./stylesheets}/asciidoc3.css[]
 include1::{themedir}/{theme}.css[]
 ifdef::quirks[]                                                                                                           1
 include1::{stylesdir=./stylesheets}/xhtml11-quirks.css[]                                                                  1
 endif::quirks[]                                                                                                           1
 ifeval::["{source-highlighter}"=="pygments"]
 include1::{stylesdir=./stylesheets}/pygments.css[]
 endif::[]
 [...]
1 We delete the lines dealing with "quirks":
xhtml11.conf after eliminating quirks, part ia
 [...]
 [header]
 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
     "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="{lang=en}">
 <head>
 <meta http-equiv="Content-Type" content="application/xhtml+xml; charset={output-encoding}" />
 <meta name="generator" content="AsciiDoc {asciidoc3-version}" />
 <meta name="description" content="{description}" />
 <meta name="keywords" content="{keywords}" />
 <title>{title}</title>
 {title%}<title>{doctitle=}</title>
 ifdef::linkcss[]
 <link rel="stylesheet" href="{stylesdir=.}/{theme=asciidoc3}.css" type="text/css" />
 ifeval::["{source-highlighter}"=="pygments"]
 <link rel="stylesheet" href="{stylesdir=.}/pygments.css" type="text/css">
 endif::[]

 # DEPRECATED: 'pygments' attribute.
 ifdef::pygments[<link rel="stylesheet" href="{stylesdir=.}/pygments.css" type="text/css" />]

 ifdef::toc2[<link rel="stylesheet" href="{stylesdir=.}/toc2.css" type="text/css" />]
 <link rel="stylesheet" href="{stylesdir=.}/{stylesheet}" type="text/css" />
 endif::linkcss[]
 ifndef::linkcss[]
 <style type="text/css">
 include1::{theme%}{stylesdir=./stylesheets}/asciidoc3.css[]
 include1::{themedir}/{theme}.css[]
 ifeval::["{source-highlighter}"=="pygments"]
 include1::{stylesdir=./stylesheets}/pygments.css[]
 endif::[]
 [...]

These three lines (the very last ones) may be deleted without replacement:

xhtml11.conf after eliminating quirks, part ib
 [...]
 ifdef::quirks[]
 include::xhtml11-quirks.conf[]
 endif::quirks[]

These two lines may be erased in asciidoc3.conf:

asciidoc3.conf after eliminating quirks, part ii
 [...]
 # Uncomment to use xhtml11 quirks mode CSS.
 #quirks=
 [...]

So quirks is gone …

# message: no more 'quirks'
ad3new, counter = re.subn(r'subprocess and conf-files', \
                          r"""\g<0>
            if 'quirks' in document.attributes: message.verbose('No quirks-mode in AsciiDoc3 any more!')""", ad3new)
assert counter == 1

Note We have eliminated "quirks", option "unsafe", "oldtables", and "BOM" - but not the deprecated "old list" format or the "special section" titles. This will happen in one of the next versions of AsciiDoc3 …
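The patching pattern used throughout this diary, reduced to a toy example: in a replacement template the whole match is written \g<0>, so the matched text is kept and the new line is appended after it. The string "source" below is invented for the demonstration and is not taken from asciidoc3.py.

```python
import re

# A stand-in for the program text that is to be patched.
source = "# subprocess and conf-files\nnext_line()\n"

patched, counter = re.subn(
    r'subprocess and conf-files',
    # \g<0> stands for the whole match; \n in the template becomes a newline.
    r"\g<0>\nmessage.verbose('No quirks-mode in AsciiDoc3 any more!')",
    source)

# Exactly one place must have been patched, otherwise the pattern is wrong.
assert counter == 1
```

The "assert counter == 1" guards against a pattern that silently matches nowhere or more than once.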


5.4.2. Checking time

We have a few points left in asciidoc3.py to look for. I have marked them with "TODO".
At first we look at the functions "def time_str(t)" and "def date_str(t)":

time_…(t) in asciidoc3.py
[...]
def time_str(t):                                                                      ##_nr_: 1271 _
    """Convert seconds since the Epoch to formatted local time string."""             ##_nr_: 1272 _
    t = time.localtime(t)                                                             ##_nr_: 1273 _
    s = time.strftime('%H:%M:%S', t)                                                  ##_nr_: 1274 _
    if time.daylight and t.tm_isdst == 1:                                             ##_nr_: 1275 _
        result = s + ' ' + time.tzname[1]                                             ##_nr_: 1276 _
    else:                                                                             ##_nr_: 1277 _
        result = s + ' ' + time.tzname[0]                                             ##_nr_: 1278 _
    # Attempt to convert the localtime to the output encoding.                        ##_nr_: 1279 _
    try:
        assert type(result) == type('string')  # this line is to be deleted later
        result = bytes(result, 'utf8') # assumes that 'result' is a 'utf-8' string --> TODO
        result = result.decode(getdefaultlocale()[1])
    except Exception:
        pass
    return result
                                                                                      ##_nr_: 1285 _
def date_str(t):                                                                      ##_nr_: 1286 _
    """Convert seconds since the Epoch to formatted local date string."""             ##_nr_: 1287 _
    t = time.localtime(t)                                                             ##_nr_: 1288 _
    return time.strftime('%Y-%m-%d', t)                                               ##_nr_: 1289 _
[...]

I take a closer look: "t = time.time()" (see "class Document(object) - def update_attributes()") returns a floating point number (or None). Passed to time.localtime(), it gives us a (local) "struct_time". Accordingly, "result" is always of type str and can be encoded as 'utf-8', so we may delete the assertion. We may erase the "# assumes that … string --> TODO" comment, too. But does "result.decode(getdefaultlocale()[1])" always do the right thing? I do hope so, but a little uncertainty remains; perhaps something like "locale.LC_CTYPE" would help. I take a chance … For "def date_str(t)" there is nothing to add.

# def time_...()
ad3new, counter = re.subn(r'assert type\(result\).*?\[1]\)', \
                          r"""result = bytes(result, 'utf8')
        result = result.decode(getdefaultlocale()[1])""", ad3new, flags=re.S)
assert counter == 1
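For comparison, here is a sketch of the same function without the bytes/decode round trip. It rests on the assumption that time.strftime() and time.tzname already deliver str in Python 3, so nothing needs converting; it is not the version that ends up in asciidoc3.py. Note also that locale.getdefaultlocale() is deprecated in newer Python 3 releases.

```python
import time


def time_str(t):
    """Convert seconds since the Epoch to a formatted local time string."""
    local = time.localtime(t)
    s = time.strftime('%H:%M:%S', local)
    # tzname[1] is the daylight saving time zone name, tzname[0] the regular one.
    if time.daylight and local.tm_isdst == 1:
        return s + ' ' + time.tzname[1]
    return s + ' ' + time.tzname[0]
```

The returned text depends on the local time zone, so no fixed output is given here.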


5.4.3. Checking out "ascii-ids"

The next point to check is function "gen_id" in "class Section". We have to be sure that "base_id" is correct when attribute "ascii-ids" is set:

class Section:                                                                    ##_nr_: 2216 _
[...]
    @staticmethod                                                                 ##_nr_: 2232 _
    def gen_id(title):                                                            ##_nr_: 2233 _
        """                                                                       ##_nr_: 2234 _
        The normalized value of the id attribute is an NCName according to        ##_nr_: 2235 _
        the 'Namespaces in XML' Recommendation:                                   ##_nr_: 2236 _
        NCName          ::=     NCNameStartChar NCNameChar*                       ##_nr_: 2237 _
        NCNameChar      ::=     NameChar - ':'                                    ##_nr_: 2238 _
        NCNameStartChar ::=     Letter | '_'                                      ##_nr_: 2239 _
        NameChar        ::=     Letter | Digit | '.' | '-' | '_' | ':'            ##_nr_: 2240 _
        """                                                                       ##_nr_: 2241 _
        # Replace non-alpha numeric characters in title with underscores and      ##_nr_: 2242 _
        # convert to lower case.                                                  ##_nr_: 2243 _
        base_id = re.sub(r'(?u)\W+', '_', title).strip('_').lower()               ##_nr_: 2244 _
        if 'ascii-ids' in document.attributes:                                    ##_nr_: 2245 _
            # Replace non-ASCII characters with ASCII equivalents.                ##_nr_: 2246 _
            #import unicodedata   # is already imported
            base_id = normalize('NFKD', base_id).encode('ascii', 'ignore')
            base_id = str(base_id, encoding = 'ascii')
        assert type(base_id) == type('string')  # this line is to be deleted later
[...]

To check this, I use a little test program that contains the critical lines of "asciidoc3.py":

import re
from unicodedata import normalize

def gen_id(title):
    base_id = re.sub(r'(?u)\W+', '_', title).strip('_').lower()
    assert type(base_id) == type('string')  # this line is to be deleted later
    output1 = base_id
    base_id = normalize('NFKD', base_id).encode('ascii', 'ignore')
    base_id = str(base_id, encoding = 'ascii')
    assert type(base_id) == type('string')  # this line is to be deleted later
    f = "{:25}"
    print(f.format(title), f.format(output1), f.format(base_id))
    print()

testtuple = ('abc', 'öjkh', 'Ju Ginseng', 'JÜ Ginseng', 'Kraß', 'Hello€',
             'simson der große', 'ûlrich îgZtopz', 'abc', 'Herbert Kolloöjkh',
             'Ju þÿ', 'JÜ Ginseng', 'Conrad¢ Kr␀␀þÿ', 'Hello Norbert Plus+',
             'Stern* Tilde~', 'ûlrich ÷dL', '1•3 Hhube', 'Herbert Kolloöjkh',
             'Ju þÿ', 'JÜ Ginseng', 'Conrad¢ Kr␀␀þÿ', 'Norbert Rohrbert Plus+',
             'Stern* Tilde~', 'ûlrich ÷dL', '这是一个测试', 'Duck 这是一个测试 Donald',
             'Herbert Kolloå', 'JÜ Merin℀', 'JÜ G℈seng', 'Conrad¢ KℏjuuewQ',
             'Hello N℔rbert Minus-', 'Suz℞ Tild' )

print("title                     base_id (no ids)          base_id (ids)")
for item in testtuple:
    gen_id(item)

The output - abridged and formatted:

title                     base_id (no ids)          base_id (ids)
abc                       abc                       abc
öjkh                      öjkh                      ojkh
Ju Ginseng                ju_ginseng                ju_ginseng
JÜ Ginseng                jü_ginseng                ju_ginseng
Kraß                      kraß                      kra
Hello€                    hello                     hello
simson der große          simson_der_große          simson_der_groe
ûlrich îgZtopz            ûlrich_îgztopz            ulrich_igztopz
abc                    ï_abc                     i_abc
Herbert Kolloöjkh      herbert_kolloï_öjkh       herbert_kolloi_ojkh
Ju þÿ                     ju_þÿ                     ju_y
JÜ Ginseng                jü_ginseng                ju_ginseng
Conrad¢ Kr␀␀þÿ            conrad_kr_þÿ              conrad_kr_y
Hello Norbert Plus+       hello_norbert_plus        hello_norbert_plus
Stern* Tilde~             stern_tilde               stern_tilde
ûlrich ÷dL                ûlrich_dl                 ulrich_dl
1•3 Hhube                 1_3_hhube                 1_3_hhube
Herbert Kolloöjkh      herbert_kolloï_öjkh       herbert_kolloi_ojkh
Ju þÿ                     ju_þÿ                     ju_y
JÜ Ginseng                jü_ginseng                ju_ginseng
Conrad¢ Kr␀␀þÿ           conrad_kr_þÿ              conrad_kr_y
Norbert Rohrbert Plus+    norbert_rohrbert_plus     norbert_rohrbert_plus
Stern* Tilde~             stern_tilde               stern_tilde
ûlrich ÷dL                ûlrich_dl                 ulrich_dl
这是一个测试               这是一个测试                                    1
Duck 这是一个测试 Donald   duck_这是一个测试_donald  duck__donald          1
Herbert Kolloå            herbert_kolloå            herbert_kolloa
JÜ Merin℀                             jü_merin                  ju_merin              2
JÜ G℈seng                 jü_g_seng                 ju_g_seng
Conrad¢ KℏjuuewQ          conrad_kℏjuuewq           conrad_kjuuewq
Hello N℔rbert Minus-      hello_n_rbert_minus       hello_n_rbert_minus
Suz℞ Tild                 suz_tild                  suz_tild
1 the source shows the three columns aligned, the html-output does not - a "problem" caused by the east-asian glyphs (and not by AsciiDoc3)
2 not only the east-asian, the ℀ shows a similar effect…

... looks nice, there’s nothing to change here. We delete all the unnecessary stuff.
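Why does "ö" become "o" while "ß" simply vanishes? NFKD splits a character into base letter plus combining mark where such a decomposition exists; encode('ascii', 'ignore') then drops everything that is not ASCII. A minimal sketch (the helper name ascii_fold is made up for illustration):

```python
from unicodedata import normalize


def ascii_fold(text):
    """Decompose with NFKD, then drop every character that is not ASCII."""
    return normalize('NFKD', text).encode('ascii', 'ignore').decode('ascii')


# 'ö' decomposes into 'o' plus a combining diaeresis: the 'o' survives.
# 'ß' has no decomposition: it is dropped as a whole.
for word in ('öjkh', 'jü_ginseng', 'kraß'):
    print(word, '->', ascii_fold(word))
```

This agrees with the "ojkh", "ju_ginseng" and "kra" rows in the table above.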

# def gen_id(title)
ad3new, counter = re.subn(r'assert type\(base_id\) == type\(.*?later\n        ', r'', ad3new)
assert counter == 1


5.4.4. Checking out "open()"

And we have a third question about encoding:
In function "def system(name, args, is_macro=False, attrs=None)" (nr 837 ff.) we have two open() calls on files:
"f = open(tmp) _nr_: 902 _" and "f = open(args) _nr_: 975 _". Is 'utf-8' (the usual default on GNU/Linux) the desired encoding? Yes, it is: "f" contains the output of the shelling / filters, e.g. the "base64"-result or the included css/js-files. I refactor the two blocks into equivalent "with"-statements and name the encoding explicitly.

# f = open(tmp) with
ad3new, counter = re.subn(r'f = open\(tmp\)\s*?##_nr_: 902 _.*?906 _', \
                          r"""with open(tmp, encoding = 'utf-8') as f:
                        lines = [s.rstrip() for s in f]""", ad3new, flags=re.S)
assert counter == 1

# f = open(args) with
ad3new, counter = re.subn(r'f = open\(args\)\s*?##_nr_: 975 _.*?979 _', \
                          r"""with open(args, encoding = 'utf-8') as f:
                result = [s.rstrip() for s in f]""", ad3new, flags=re.S)
assert counter == 1
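Taken out of the re.subn() wrapping, the refactored block looks like the following sketch. The function name read_stripped_lines is hypothetical; in asciidoc3.py the lines live inside "def system(...)".

```python
def read_stripped_lines(path):
    """Read a text file as utf-8 and return its lines without trailing whitespace."""
    # The with-statement closes the file even if reading raises an exception,
    # and the explicit encoding makes the result independent of the locale.
    with open(path, encoding='utf-8') as f:
        return [line.rstrip() for line in f]
```

Without the "encoding" argument, open() falls back to the locale's preferred encoding, which is not 'utf-8' on every system.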


5.4.5. Checking out something left

Going through the entire code of asciidoc3.py once more, I found a few small improvements worthwhile:
- changing asciidoc to asciidoc3 (five times),
- moving all "import" statements to the top (csv, io),
- adding "itertools" there,
- eliminating the bad-indentation messages from a quick pylint run,
- eliminating some "unused" variables to make the program faster and more pythonic, see the example:

for i in range(int(cols)):                       1
    reader.read()
[...]

for __ in itertools_repeat(None, int(cols)):     2
    reader.read()
[...]
1 "def translate()" in "class Title": variable "i" is never used ….
2 … this looks better (I prefer __ to _ because _ is often used by module "gettext")

I do not list all the other changes here, see "editad3.py".
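A self-contained version of the loop, for trying it out. The import alias is my assumption of how "itertools_repeat" is defined, and CountingReader is a stand-in for the real reader object.

```python
from itertools import repeat as itertools_repeat


class CountingReader:
    """Stand-in for the reader: it only counts how often read() is called."""

    def __init__(self):
        self.reads = 0

    def read(self):
        self.reads += 1


def skip_lines(reader, cols):
    # repeat(None, n) yields None n times; no loop counter is created.
    for __ in itertools_repeat(None, int(cols)):
        reader.read()
```

After skip_lines(reader, '3') the counter reader.reads stands at 3, just as with the range() loop.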

Note We "test" the rendering of mathematical formulas later on: we need a2x3.py to produce PDF by dblatex, see here


6. asciidoc3api.py

Now that we have migrated most of the conf-files and the testfiles run well so far, the next step is to migrate asciidocapi.py to asciidoc3api.py. This is necessary to perform the "AsciiDoc-Selftest" described below …

Following our well-tried approach, we first test asciidocapi.py with v2: I copy "inputone.txt" (= "in001.txt") together with the needed "redsquare.jpg" to a directory far away from my working directory ~/.asciidoc3; in my case I use "/dos/db_phf/python/ad3api" (which lives on another partition on a second hard disk, but that’s of course irrelevant, you can use whatever you like). I have to add "asciidocapi.py" to this directory as well. Writing the appropriate program, named "api2.py", isn’t a challenge:

api2.py to work with asciidocapi.py
"""api2.py to test asciidocapi.py"""

from asciidocapi import AsciiDocAPI

asciidoc = AsciiDocAPI()
asciidoc.execute('in001.txt')

Now we change into the "new" working directory and let it run:

cd /dos/db_phf/python/ad3api
python2 api2.py

But the following error occurs:

api2.py throws an AsciiDocError
~: /dos/db_phf/python/ad3api$ python2 api2.py
Traceback (most recent call last):
  File "api2.py", line 4, in <module>
    asciidoc = AsciiDocAPI()
  File "/dos/db_phf/python/ad3api/asciidocapi.py", line 177, in __init__
    raise AsciiDocError('failed to locate asciidoc')
asciidocapi.AsciiDocError: failed to locate asciidoc

What happens here? We have no executable command "asciidoc", because we are working "locally" within ~/.asciidoc3 or ~/asciidoc2 and haven’t installed AsciiDoc v2. As a workaround I place a mock (a symbolic link) inside the executable path:

sudo ln --symbolic ~/asciidoc2/asciidoc.py /usr/bin/asciidoc

Doing so we find our output "in001.html" in the same directory as the input (/dos/db_phf/python/ad3api) unless we define another output directory or filename. It works.

The AsciiDoc API needs to know where "asciidoc.py" resides; there are three different ways to provide this information. The first we have seen above: the default assumes an "asciidoc" in the usual PATH containing "/usr/bin". The second is to define a new environment variable "ASCIIDOC_PY":

/dos/db_phf/python/ad3api$ echo $ASCIIDOC_PY   1

/dos/db_phf/python/ad3api$ export ASCIIDOC_PY="~/asciidoc2/asciidoc.py"
/dos/db_phf/python/ad3api$ echo $ASCIIDOC_PY
~/asciidoc2/asciidoc.py                        2
/dos/db_phf/python/ad3api$ python2 api2.py     3
1 ASCIIDOC_PY doesn’t exist, a blank line is returned
2 ASCIIDOC_PY is now set (for this session only)
3 … and we find the expected output. Be sure to deactivate the given symbolic link to see this in effect.

The third (probably easiest way for us at this moment) is to give the absolute path to "asciidoc.py":

"""api2.py to test asciidoc.py"""
from asciidocapi import AsciiDocAPI

asciidoc = AsciiDocAPI('~/asciidoc2/asciidoc.py')  1
asciidoc.attributes['data-uri'] = ''               2
asciidoc.execute('in001.txt')                      3
1 replace "~" with your home-directory, e.g. "/home/thatsme"
2 an option is given to see if everything works fine
3 … yes it does, we find the expected output. Be sure to deactivate "ASCIIDOC_PY" to see this in effect.
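The three ways can be summed up in a small lookup function. This is a sketch of the idea only, not the code of asciidocapi.py: the name locate_asciidoc and the order of precedence (explicit path, then ASCIIDOC_PY, then PATH) are my assumptions.

```python
import os
import shutil


def locate_asciidoc(explicit_path=None, environ=None):
    """Return the path of the asciidoc script, or None if nothing is found."""
    environ = os.environ if environ is None else environ
    # 1. explicit argument, 2. environment variable ASCIIDOC_PY
    candidate = explicit_path or environ.get('ASCIIDOC_PY')
    if candidate:
        # Python does not expand "~" by itself, expanduser() does.
        return os.path.expanduser(candidate)
    # 3. an executable "asciidoc" somewhere in PATH
    return shutil.which('asciidoc')
```

The expanduser() call is the programmatic form of the advice to replace "~" with your home directory.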

By the way, when you see an error like this:

~$ python2 api2.py
Traceback (most recent call last):
  File "api2.py", line 7, in <module>
    asciidoc.execute('in001.txt')
  File "/dos/db_phf/python/ad3api/asciidocapi.py", line 248, in execute
    raise AsciiDocError(self.messages[-1])
IndexError: list index out of range

you forgot to copy the input file "in001.txt" to the new working directory.

AsciiDoc v2’s asciidocapi.py works fine (which is indeed no surprise at all, is it?), but what about AsciiDoc3? Let’s move to v3 and try the same. I copy api2.py to api3.py and edit it as follows:

api3.py to work with asciidoc3api.py
"""api3.py to test asciidoc3api.py"""

from asciidoc3api import AsciiDoc3API

asciidoc3 = AsciiDoc3API('~/.asciidoc3/asciidoc3_new.py')  1
asciidoc3.execute('in001.txt')
1 and again: replace ~ with your home-directory, e.g. "/home/thatsme"

This step requires copying asciidocapi.py to asciidoc3api.py and editing it, too. First we run 2to3 on asciidocapi.py:

sudo /usr/bin/2to3-3.5 -v -w -f all -f buffer -f set_literal -f idioms -f ws_comma -n asciidocapi.py

The result is renamed to asciidoc3api.py. Besides some "ws_comma" changes we see only a few transformations (v2 vs. v3):

import __builtin__  # Because reload() is shadowed.
__builtin__.reload(self.asciidoc)
                 vs.
import builtins  # Because reload() is shadowed.
builtins.reload(self.asciidoc)

for k,v in self.attributes.items():
                 vs.
for k, v in list(self.attributes.items()):

except SystemExit, e:
                 vs.
except SystemExit as e:

So we try "python3 api3.py", but, not very surprisingly, it fails:

Traceback (most recent call last):
  File "api3.py", line 1, in <module>
    from asciidoc3api import AsciiDoc3API
ImportError: cannot import name 'AsciiDoc3API'

We have to do some work to update the code to a running v3; see the following lines v2 vs. v3:

#!/usr/bin/env python
#!/usr/bin/env python3

class AsciiDocError(Exception):
class AsciiDoc3Error(Exception):

Stores asciidoc(1) command options.
Stores asciidoc3 command options.

class AsciiDocAPI(object):
class AsciiDoc3API(object):

AsciiDoc API class.
AsciiDoc3 API class.

def __init__(self, asciidoc_py=None):
def __init__(self, asciidoc3_py=None):
....

... and so on - see "asciidoc3api.py" in your installation/tarball of AsciiDoc3. When we start our new program we have:

Traceback (most recent call last):
  File "api3.py", line 2, in <module>
    asciidoc3 = AsciiDoc3API('~/.asciidoc3/asciidoc3_new.py')
  File "/dos/db_phf/python/ad3api/asciidoc3api.py", line 181, in __init__
    self.__import_asciidoc3()
  File "/dos/db_phf/python/ad3api/asciidoc3api.py", line 214, in __import_asciidoc3
    if Version(self.asciidoc3.VERSION) < Version(MIN_ASCIIDOC3_VERSION):
TypeError: unorderable types: Version() < Version()

As a quick fix I comment out the lines dealing with VERSION:

[...]
#        if Version(self.asciidoc3.VERSION) < Version(MIN_ASCIIDOC3_VERSION):
#            raise AsciiDoc3Error(
#                'asciidoc3api %s requires asciidoc3 %s or better'
#                % (API_VERSION, MIN_ASCIIDOC3_VERSION))
[...]

and, in addition, all lines regarding doctests: we have no "doctest" in AsciiDoc3, and no version at all. We postpone the doctests and can shorten asciidoc3api.py. Once it runs with Python 3.x we’ll probably rearrange this for forthcoming versions.
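A possible later repair instead of commenting the check out: the "unorderable types" message is what Python 3 reports when a class offers no rich comparison methods. I assume here, without having verified it, that the v2 class relied on __cmp__, which Python 3 ignores. The class below is a hypothetical stand-in, not the Version class of asciidoc3api.py.

```python
import re
from functools import total_ordering


@total_ordering
class Version:
    """Comparable 'major.minor[.micro]' version number (illustration only)."""

    def __init__(self, version):
        match = re.match(r'^(\d+)\.(\d+)(?:\.(\d+))?', version)
        if not match:
            raise ValueError('invalid version number: ' + version)
        major, minor, micro = match.groups()
        # A missing micro part counts as 0, so '3.0' equals '3.0.0'.
        self.key = (int(major), int(minor), int(micro or 0))

    def __eq__(self, other):
        return self.key == other.key

    def __lt__(self, other):
        return self.key < other.key
```

total_ordering derives <=, > and >= from __eq__ and __lt__, so a comparison like Version(a) < Version(b) works again.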

Now I encounter a new error:

AttributeError: module 'builtins' has no attribute 'reload'
during executing:
    if reload:
        import builtins  # Because reload() is shadowed.
        builtins.reload(self.asciidoc3)

Looking in the Python3 docs shows that the "imp package is pending deprecation in favor of importlib".

The following constant and functions are obsolete; their functionality is available through find_module() or load_module(). They are kept around for backward compatibility:

imp.load_source(name, pathname[, file])
Load and initialize a module implemented as a Python source file and return its module object. If the module was already initialized, it will be initialized again. The name argument is used to create or access a module object. The pathname argument points to the source file. The file argument is the source file, open for reading as text, from the beginning. It must currently be a real file object, not a user-defined class emulating a file. Note that if a properly matching byte-compiled file (with suffix .pyc or .pyo) exists, it will be used instead of parsing the given source file.

— https://docs.python.org/3.0/library/imp.html

So I load imp/importlib depending on Python’s version:

[...]
import sys, os, re
# sys.version_info compares as a tuple; float(sys.version[:3]) would read '3.10' as 3.1
if (3, 0) <= sys.version_info < (3, 4):
    import imp
elif sys.version_info >= (3, 4):
    import importlib
else:
    sys.exit('Python 3.x required!')
[...]

and some lines later

[...]
        if os.path.splitext(self.cmd)[1] in ['.py', '.pyc']:
            sys.path.insert(0, os.path.dirname(self.cmd))
            try:
                try:
                    if reload:
                        import builtins  # Because reload() is shadowed.
                        # reload() expects the module object, not its name as a string
                        if sys.version_info < (3, 4): imp.reload(self.asciidoc3)
                        else: importlib.reload(self.asciidoc3)
                except ImportError:
                    raise AsciiDoc3Error('failed to import ' + self.cmd)
[...]

                if sys.version_info < (3, 4): imp.load_source('asciidoc3', self.cmd)
                else: importlib.import_module('asciidoc3')
[...]
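For newer Python versions there is also a direct importlib counterpart to imp.load_source(). The following is a sketch under the assumption that Python 3.5 or later is available (importlib.util.module_from_spec does not exist before); asciidoc3api.py does not necessarily do it this way, and the function name is made up.

```python
import importlib.util
import sys


def load_module_from_path(name, path):
    """Load the Python source file 'path' as module 'name' and return it."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    # Register the module before executing it, as a normal import would.
    sys.modules[name] = module
    spec.loader.exec_module(module)
    return module
```

With this, no entry in sys.path is needed; calling the function a second time executes the source file again.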

Once this job is done everything is o.k., and asciidoc3api.py works with all three ways of determining the location of "asciidoc3.py". I do not list the output here, but you can believe me; try it out when in doubt … And, just for your information, asciidoc3api.py has a few doctests, too! Try "python3 asciidoc3api.py -v".
We have successfully migrated asciidocapi.py to asciidoc3api.py.

7. a2x3.py

And the next step follows right away … As the last "bigger point" (?) we have to migrate a2x.py. We start - you guessed it - with a2x.py under Python2. Be sure to have all the executables installed if you want to make use of them; most users won’t need them all (I confess I had never used "w3m" before):
xsltproc, dblatex, fop, w3m, lynx, xmllint, epubcheck … Test it by typing "xsltproc" etc. on the command line.
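Instead of typing every name by hand, a few lines of Python can do the check. This is a small helper sketch and not part of a2x.py; the name missing_tools is made up.

```python
import shutil

TOOLS = ('xsltproc', 'dblatex', 'fop', 'w3m', 'lynx', 'xmllint', 'epubcheck')


def missing_tools(names=TOOLS):
    """Return the names that cannot be found as executables in PATH."""
    # shutil.which() returns None when no executable of that name is found.
    return [name for name in names if shutil.which(name) is None]


if __name__ == '__main__':
    print('missing:', ', '.join(missing_tools()) or 'nothing')
```

An empty result means that all tools are found; which of them you really need depends on the output formats you want.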

Note The file "a2x.py" assigns in line 30: CONF_DIR = '/etc/asciidoc'. There is no need to edit this at the moment (e.g. to CONF_DIR = '~/asciidoc2'). As long as asciidoc.py and a2x.py live in the same directory, things work.

Start a2x.py v2
cd ~/asciidoc2
python2 a2x.py --verbose -f pdf ~/.asciidoc3/inputfiles/in003.txt

I use "in003.txt" because it is a well-formed document which uses many features from AsciiDoc(3) - it’s the source of "asciidoc-user-manual". The output at the console would be like this:

Output of a2x with input in003.txt
~/asciidoc2$ python2 a2x.py --verbose -f pdf ~/.asciidoc3/inputfiles/in003.txt
a2x: args: ['--verbose', '-f', 'pdf', '~/.asciidoc3/inputfiles/in003.txt']
a2x: resource files: []
a2x: resource directories: ['~/asciidoc2/images', '~/asciidoc2/stylesheets']
a2x: executing: "~/asciidoc2/asciidoc.py" --backend docbook -a "a2x-format=pdf"  --verbose  --out-file "~/.asciidoc3/inputfiles/in003.xml" "~/.asciidoc3/inputfiles/in003.txt"

asciidoc: reading: ~/asciidoc2/asciidoc.conf
asciidoc: reading: ~/.asciidoc3/inputfiles/in003.txt
asciidoc: reading: ~/asciidoc2/docbook45.conf
asciidoc: reading: ~/asciidoc2/filters/music/music-filter.conf
asciidoc: reading: ~/asciidoc2/filters/latex/latex-filter.conf
asciidoc: reading: ~/asciidoc2/filters/code/code-filter.conf
asciidoc: reading: ~/asciidoc2/filters/source/source-highlight-filter.conf
asciidoc: reading: ~/asciidoc2/filters/graphviz/graphviz-filter.conf
asciidoc: reading: ~/asciidoc2/lang-en.conf
asciidoc: writing: ~/.asciidoc3/inputfiles/in003.xml
asciidoc: in003.txt: line 1342: filtering: "/usr/bin/python" "~/asciidoc2/asciidoc.py" -b docbook45  --attribute "numbered" --attribute "website=http://asciidoc3.org/" --attribute "icons" --attribute "toc" --attribute "a2x-format=pdf" --attribute "authorinitials=SJR" -a icons -a "iconsdir=~/.asciidoc3/images/icons" -a "imagesdir=~/.asciidoc3/images" -a "indir=~/.asciidoc3/inputfiles" -a "blockname=table" -s -
asciidoc: include: ~/.asciidoc3/inputfiles/customers.csv
asciidoc: in003.txt: line 3203: reading: ~/.asciidoc3/inputfiles/customers.csv
asciidoc: in003.txt: line 6031: filtering: "/usr/bin/python" "~/asciidoc2/asciidoc.py" -b docbook45  --attribute "numbered" --attribute "website=http://asciidoc3.org/" --attribute "icons" --attribute "toc" --attribute "a2x-format=pdf" --attribute "authorinitials=SJR" -a icons -a "iconsdir=~/.asciidoc3/images/icons" -a "imagesdir=~/.asciidoc3/images" -a "indir=~/.asciidoc3/inputfiles" -a "blockname=table" -s -

a2x: executing: "xmllint" --nonet --noout --valid "~/.asciidoc3/inputfiles/in003.xml"

a2x: executing: "dblatex" -t pdf -p "~/asciidoc2/dblatex/asciidoc-dblatex.xsl" -s "~/asciidoc2/dblatex/asciidoc-dblatex.sty"  -V  "~/.asciidoc3/inputfiles/in003.xml" 1

Build the book set list...
xsltproc -o /tmp/tmpvk3XfZ/doclist.txt --xinclude --xincludestyle doclist.xsl ~/.asciidoc3/inputfiles/in003.xml
Build the listings...
xsltproc -o /tmp/tmpvk3XfZ/listings.xml --xinclude --xincludestyle --param current.dir '~/.asciidoc3/inputfiles' /usr/share/dblatex/xsl/common/mklistings.xsl ~/.asciidoc3/inputfiles/in003.xml
xsltproc -o in003.rtex --xinclude --xincludestyle --param current.dir '~/.asciidoc3/inputfiles' --param listings.xml '/tmp/tmpvk3XfZ/listings.xml' /tmp/tmpvk3XfZ/custom.xsl ~/.asciidoc3/inputfiles/in003.xml
XSLT stylesheets DocBook - LaTeX 2e (0.3.7-1)
===================================================
Image 'dblatex' not found      2
Build in003.pdf
built-in module pdftex registered
no support found for ifthen    3
no support found for ifxetex
no support found for fontspec
no support found for xltxtra
no support found for fontenc
no support found for inputenc
no support found for fancybox
built-in module makeidx registered
no support found for asciidoc-dblatex
building additional files...
checking if compiling is necessary...
the output file doesn't exist
pdflatex -interaction=batchmode in003.tex
running post-compilation scripts...
[index] the index file /tmp/tmpvk3XfZ/in003.idx is empty
in003.aux MD5 checksum changed
in003.toc MD5 checksum changed
the /tmp/tmpvk3XfZ/in003.aux file has changed
pdflatex -interaction=batchmode in003.tex
running post-compilation scripts...
[index] the index file /tmp/tmpvk3XfZ/in003.idx is empty
in003.aux MD5 checksum changed
the /tmp/tmpvk3XfZ/in003.aux file has changed but no re-run required?
no new compilation is needed
running last-compilation scripts...
'in003.pdf' successfully built

a2x: deleting ~/.asciidoc3/inputfiles/in003.xml
1 dblatex is working …
2 … or not?
3 And what is this?

First we look in "~/.asciidoc3/inputfiles/" and find the output "in003.pdf"; it is written to the same directory as the input file. It looks pretty good - and it is. Here are the answers to the three questions:

1 "dblatex is working …" Yes, it is; see the footer on page 1: "(created by) dblatex/pdf",
2 "… or not?" The message is correct: we don’t need dblatex at this point (and it’s not available here either - but that’s not the whole truth, see here),
3 "And what is this?" Don’t worry about these messages. We don’t need the LaTeX packages ifthen, ifxetex, …, fancybox anyway.

So this works fine. Let’s try the alternative way to produce an even better-looking (? - as the docs say) pdf file (rename "in003.pdf" to "in003_dblatex.pdf" if you would like to compare the two outputs):

Start a2x.py v2 using fop
python2 a2x.py --verbose --fop -f pdf ~/.asciidoc3/inputfiles/in003.txt

I encounter the following message:

[...]
Making portrait pages on A4 paper (210mmx297mm)

a2x: chdir ~/asciidoc2
a2x: executing: "fop"   -fo "~/.asciidoc3/inputfiles/in003.fo" -pdf "~/.asciidoc3/inputfiles/in003.pdf"

[warning] /usr/bin/fop: No java runtime was found
[WARN] PropertyMaker - background-color="inherit" on fo:block, but no explicit value found on the parent FO.
[WARN] FOUserAgent - Invalid property value encountered in border="" (Siehe Position 118:461)
[...]

"No java runtime was found"? That’s factually incorrect: I have installed the usual "openjdk-jre" package. Don’t spend too long looking for the solution; here is what helped me (a hint was found in "askubuntu.com/questions/217421/freemind-wont-run-despite-openjdk-jre-installed", which describes a very similar approach):
Open the file "/usr/share/java-wrappers/java-wrappers.sh" as root/sudo and edit it as follows:

Editing /usr/share/java-wrappers/java-wrappers.sh
[...]

    if [ -z "$JAVA_HOME" ]; then

        # We now try to look for a reasonable JAVA_HOME.
        # First, narrow the choices according to what
        # was asked.
        #
        # Please see the list of understood jvms in
        # /usr/lib/java-wrappers/jvm-list.sh

        #DIRS=""                    1
        DIRS="$__jvm_default"       2

        # If no arguments are given, we take it as 'all'
        if test -z "$1"; then
            set all
        fi
[...]
1 comment out the original DIRS="",
2 and replace it with DIRS="$__jvm_default"

running again,

python2 a2x.py --verbose --fop -f pdf ~/.asciidoc3/inputfiles/in003.txt

the "[warning] /usr/bin/fop: No java runtime was found" message is gone. We don’t care about the many "[WARN] PropertyMaker / FOUserAgent" messages at this time. We now find the second pdf in our directory. You may rename it to "in003_fop.pdf".

There is a third way to generate a "good-looking" pdf. Run

python2 a2x.py --verbose --fop -k -f pdf ~/.asciidoc3/inputfiles/in003.txt

with the option "-k" to keep the docbook file "in003.xml", and subsequently run

dblatex ~/.asciidoc3/inputfiles/in003.xml

This produces another pdf which is slightly different from the first - maybe a configuration problem, probably caused by dblatex, not asciidoc? This approach bypasses the one-step feature of a2x, so put it on the TODO list to consider later on.

Now we switch to a2x.py v3. Of course we have to run 2to3 on the v2 program:

sudo 2to3 -v -w -f all -f buffer -f set_literal -f idioms -f ws_comma -n --add-suffix=3 ~/.asciidoc3/a2x.py

Again we see a lot of ws_comma, list/iterator, and 'except as' changes. We rename "a2x.py3" to our new "a2x3.py".
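To illustrate the kind of changes mentioned here, a minimal sketch follows. The function and variable names are made up for this example and are not taken from a2x.py; the Python2 forms are only shown as comments because they are not valid Python3.

```python
# Python2 forms (comments only, not valid in Python3):
#     except ValueError, e:
#     names = settings.keys()

def parse_number(text):
    """Return int(text), or an error message if text is not a number."""
    try:
        return int(text)
    except ValueError as e:          # 'except X, e' becomes 'except X as e'
        return "bad value: %s" % e

settings = {"toc": 1, "icons": 2}    # whitespace after commas, as ws_comma would add
names = list(settings.keys())        # keys() is a view in Python3, so 2to3 wraps it in list()
names.sort()

print(parse_number("42"))
print(names)
```

The list() wrapper is what makes the subsequent "names.sort()" work, because a dictionary view has no sort method.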

Starting a2x3.py v3
cd ~/.asciidoc3
python3 a2x3.py --verbose -f pdf ~/.asciidoc3/inputfiles/in003.txt

But that’s not a good idea … too many messages pop up. Before we can go on, there’s work to do (only if you haven’t done this before, of course):

sudo ln --symbolic ~/.asciidoc3/asciidoc3.py /usr/bin/asciidoc3

The next step is to change (if necessary) the permissions of "asciidoc3.py" and set them to "rwxrwxr--".
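If you prefer to set the permissions from a script, the symbolic form has to be translated into a numeric mode first. The following is only a sketch; the helper "symbolic_to_mode" is a hypothetical name introduced for this example and is not part of AsciiDoc3.

```python
import stat

def symbolic_to_mode(symbolic):
    """Convert a nine-character string such as 'rwxrwxr--' into an integer mode."""
    if len(symbolic) != 9:
        raise ValueError("expected nine characters, e.g. 'rwxrwxr--'")
    mode = 0
    for position, char in enumerate(symbolic):
        if char == "rwx"[position % 3]:
            mode |= 1 << (8 - position)      # leftmost character is the highest bit
        elif char != "-":
            raise ValueError("unexpected character %r at position %d" % (char, position))
    return mode

mode = symbolic_to_mode("rwxrwxr--")
print(oct(mode))                             # 0o774
print(stat.filemode(stat.S_IFREG | mode))    # -rwxrwxr--
```

The resulting value could then be passed to "os.chmod(path, mode)", which has the same effect as "chmod 774" on the command line.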
Then we have to copy "~/asciidoc2/dblatex/" and "~/asciidoc2/docbook-xsl/" to "~/.asciidoc3". While doing so, we rename "asciidoc-dblatex.sty" to "asciidoc3-dblatex.sty", "asciidoc-dblatex.xsl" to "asciidoc3-dblatex.xsl", and "asciidoc-docbook-xsl.txt" to "asciidoc3-docbook-xsl.txt".
The source of our new "a2x3.py" has to be heavily edited by hand to change every "asciidoc" to "asciidoc3" (I do not list all the changes here, see the final version of "a2x3.py"). For development only, we change "CONF_DIR = '/etc/asciidoc'" to "CONF_DIR = '~/asciidoc3'". And at this stage of "testing the code" I tag some lines with "TODO". Some examples: "VERSION = '8.6.9' # TODO" or "stdout = stderr = subprocess.PIPE # TODO" - do you remember the earlier subprocess handling? Perhaps we have to do similar work here. A third one: "# HTMLParser has problems with non-ASCII strings. … parser.feed(contents.decode(encoding)) # TODO" - is this still relevant in Python3?

Let’s try again:

python3 a2x3.py --verbose -v -f pdf ~/.asciidoc3/inputfiles/in003.txt

The output:

a2x3: args: ['--verbose', '-f', 'pdf', '~/.asciidoc3/inputfiles/in003.txt']
a2x3: resource files: []
a2x3: resource directories: ['~/.asciidoc3/images', '~/.asciidoc3/stylesheets']
a2x3: executing: "~/.asciidoc3/asciidoc3.py" --backend docbook -a "a2x3-format=pdf"  --verbose  --out-file "~/.asciidoc3/inputfiles/in003.xml" "~/.asciidoc3/inputfiles/in003.txt"
b'' 1
b'asciidoc3: reading: ~/.asciidoc3/asciidoc3.conf\nasciidoc3: Entering "update_input-encoding()" with file: ~/.asciidoc3/asciidoc3.conf\nasciidoc3: input-encoding updated as in ~/.asciidoc3/asciidoc3.conf: utf-8\nasciidoc3: Entering "update/fix input-errors" with input-file: ~/.asciidoc3/asciidoc3.conf\nasciidoc3: input-errors updated as in ~/.asciidoc3/asciidoc3.conf: strict\n
[...]
asciidoc3: in003.txt: line 6031: filtering: "/usr/bin/python3" "~/.asciidoc3/asciidoc3.py" -b docbook45  --attribute "a2x3-format=pdf" --attribute "icons" --attribute "toc" --attribute "website=http://asciidoc3.org/" --attribute "authorinitials=SJR" --attribute "numbered" -a icons -a "iconsdir=~/.asciidoc3/images/icons" -a "imagesdir=~/.asciidoc3/images" -a "indir=~/.asciidoc3/inputfiles"   -a "blockname=table" -a "input-encoding=utf-8" -a "input-errors=strict" -a "output-encoding=utf-8" -a "output-errors=strict" -s -\n'
a2x3: executing: "xmllint" --nonet --noout --valid "~/.asciidoc3/inputfiles/in003.xml"
b''
b''
a2x3: executing: "dblatex" -t pdf -p "~/.asciidoc3/dblatex/asciidoc3-dblatex.xsl" -s "~/.asciidoc3/dblatex/asciidoc3-dblatex.sty"  -V  "~/.asciidoc3/inputfiles/in003.xml"
b''
[...]
'in003.pdf' successfully built\n" 2
a2x3: deleting ~/.asciidoc3/inputfiles/in003.xml
1 First, we see that the output is given as a bytestring b'' - that is not what we expected.
2 Second, 'in003.pdf' is "successfully built".

We take care of 1 later on; I start with 2:
A look into the pdf unfortunately shows a significant difference on page 46 (v3) / page 45 (v2) regarding table7:

[Image: table7 in the v2 pdf]

and a few other tables, too.

[Image: table7 in the v3 pdf]

Looking in the source of the docbook-file "in003.xml" v2 vs. v3 shows the difference:

<section id="_example_tables">  1
<title>Example tables</title>
<table
frame="all"
rowsep="1" colsep="1"
>
<title>Simple table</title>
<?dbhtml table-width="15%"?>
<?dbfo table-width="15%"?>
<?dblatex table-width="15%"?>
<tgroup cols="3">
<colspec colname="col_1" colwidth="21*"/>
<colspec colname="col_2" colwidth="21*"/>
<colspec colname="col_3" colwidth="21*"/>
<tbody>
<row>
<entry align="left" valign="top"><simpara>1</simpara></entry>
[...]
</tbody>
</tgroup>
</table>

--- snip ---

<section id="_example_tables">  2
<title>Example tables</title>
<table
frame="all"
rowsep="1" colsep="1"
>
<title>Simple table</title>
<?dbhtml table-width="15%"?>
<?dbfo table-width="15%"?>
<?dblatex table-width="15%"?>
<tgroup cols="3">
<colspec colname="col_1" colwidth="0*"/> 3
<colspec colname="col_2" colwidth="0*"/>
<colspec colname="col_3" colwidth="0*"/>
<tbody>
<row>
<entry align="left" valign="top"><simpara>1</simpara></entry>
[...]
</tbody>
</tgroup>
</table>
1 v2, this is correct
2 v3, not correct …
3 colwidth="0*" ↯ ← something went wrong here, it should be "21*" (btw, do you see the flash glyph ↯, UTF-8 bytes 0xE2 0x86 0xAF?)

We go to "docbook45.conf" to find the crucial point:

[...]
# Tables.
[tabletags-default]
colspec=<colspec colname="col_{colnumber}" colwidth="{width!{colpcwidth}*}{width?{colabswidth}{pageunits}}"/>
bodyrow=<row>|</row>
[...]

but that’s not the place, because nothing is computed here, so we go to asciidoc3.py (btw, "asciidoc3.conf" doesn’t deal with "colwidth/colpcwidth" anyway). There we find the place of calculation in "class Table()", where I add some "print()" calls:

class Table(AbstractBlock):
[...]
    def parse_cols(self, cols, halign, valign):                                ##_nr_: 3275 _
        """                                                                    ##_nr_: 3276 _
        Build list of column objects from table 'cols', 'halign' and 'valign'  ##_nr_: 3277 _
        attributes.                                                            ##_nr_: 3278 _
        """                                                                    ##_nr_: 3279 _
        # [<multiplier>*][<align>][<width>][<style>]
        [...]
        elif round(percents) < 100:                                            ##_nr_: 3349 _
            self.error('total width less than 100%%: %s' % cols, self.start)   ##_nr_: 3350 _
        print("col.pcwidth", col.pcwidth)
        print("col.abswidth", col.abswidth)
        print("props", props)
        print("percents", percents)
        print("col.width", col.width)
        print("pcunits", pcunits)
[...]

and I do the same at the identical lines in asciidoc2.py. The output reveals the problem:

        v2                             v3
[...]
('col.pcwidth', 33)               col.pcwidth 33
('col.abswidth', '267')           col.abswidth 0
('props', 3)                      props 3
('percents', 99.99999999999999)   percents 99.99999999999999
('col.width', '1')                col.width 1
('pcunits', False)                pcunits False
('col.pcwidth', 33)               col.pcwidth 33
('col.abswidth', '267')           col.abswidth 0
('props', 3)                      props 3
('percents', 99.99999999999999)   percents 99.99999999999999
('col.width', '1')                col.width 1
('pcunits', False)                pcunits False
('col.pcwidth', 71)               col.pcwidth 71
('col.abswidth', '571')           col.abswidth 571
[...]

v3 has "col.abswidth" = 0, which doesn’t fit. I’ve tracked the cause down to the critical code at line 3340:

  col.abswidth = self.abswidth * (col.pcwidth//100) ##_nr_: 3340 _

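We don’t need the integer division here! In Python3 "//" is floor division, so "col.pcwidth//100" yields 0 for every column narrower than 100 percent. Change this with the following substitution:

# integer division in nr3340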
ad3new, counter = re.subn(r'\(col\.pcwidth//100\)', r'(col.pcwidth/100) ', ad3new)
assert counter == 1

and this bug is gone, table7 looks fine …
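The effect of the two operators can be reproduced in isolation. The sketch below is not the code of "parse_cols()"; the helper name and the total width of 800 are assumptions chosen for this example (800 happens to reproduce the value 267 seen in the v2 output for a column that takes one of three parts).

```python
def column_abswidth(table_abswidth, col_width, props, floor_division=False):
    """Return the rounded absolute width of one column (hypothetical helper)."""
    pcwidth = (float(col_width) / props) * 100       # 1 of 3 parts -> 33.33...
    if floor_division:
        fraction = pcwidth // 100                    # floor division: 0.0 below 100 percent
    else:
        fraction = pcwidth / 100                     # true division: 0.3333...
    return round(table_abswidth * fraction)

print(column_abswidth(800, 1, 3, floor_division=True))   # 0
print(column_abswidth(800, 1, 3))                        # 267
```

With floor division every column narrower than the whole table collapses to a width of 0, which matches the colwidth="0*" entries in the v3 docbook file.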
To be more precise, I compare the two pdfs byte by byte via kdiff3:

/CreationDate (D:20180218115437+01'00')
/ModDate (D:20180218115437+01'00')
/ID [<C5DDA3D67C7F315454F94CE694819526> <C5DDA3D67C7F315454F94CE694819526>]

They are not binary identical, but that doesn’t matter here. And the tool "diffpdf" says: "files seem to be identical" - all right!
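A byte comparison that ignores these three entries can be sketched in a few lines. This is only an illustration under the assumption that the entries appear in plain text as in the diff; it is no replacement for "diffpdf", and the function names and sample bytes are made up for this example.

```python
import re

# Entries that change with every build; the names are taken from the diff.
VOLATILE = re.compile(rb"/(?:CreationDate|ModDate) \(.*?\)|/ID \[.*?\]")

def strip_volatile(pdf_bytes):
    """Remove date and ID entries so that only the remaining bytes are compared."""
    return VOLATILE.sub(b"", pdf_bytes)

def same_apart_from_volatile(first, second):
    return strip_volatile(first) == strip_volatile(second)

# Two tiny stand-ins for real pdf files (not valid pdfs, just sample bytes).
first = b"%PDF-1.5\n/CreationDate (D:20180218115437+01'00')\n/ID [<C5DD> <C5DD>]\nsame content\n"
second = b"%PDF-1.5\n/CreationDate (D:20180218120102+01'00')\n/ID [<A1B2> <A1B2>]\nsame content\n"

print(first == second)                            # False
print(same_apart_from_volatile(first, second))    # True
```

For real files the bytes could be read with "pathlib.Path(name).read_bytes()".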

Now to the issue that we see bytestrings b'' on stdout: this behavior starts when a2x3.py reports in verbose mode about the executed command and prints "(stdoutdata, stderrdata, returncode)", see "def shell()":

Output of shelling in a2x3.py
[...]
def shell(cmd, raise_error=True):
    '''
    Execute command cmd in shell and return tuple
    (stdoutdata, stderrdata, returncode).
    If raise_error is True then a non-zero return terminates the application.
    '''
[...]
    verbose('executing: %s' % cmd)
    if OPTIONS.dry_run:
        return
    stdout = stderr = subprocess.PIPE            # TODO
    try:
        popen = subprocess.Popen(cmd, stdout=stdout, stderr=stderr,      1
                shell=True, env=ENV)
    except OSError as e:
        die('failed: %s: %s' % (cmd, e))
    stdoutdata, stderrdata = popen.communicate() # TODO
    if OPTIONS.verbose:
        print(stdoutdata)
        print(stderrdata)
    if popen.returncode != 0 and raise_error:    # TODO
        die('%s returned non-zero exit status %d' % (cmd, popen.returncode))
    return (stdoutdata, stderrdata, popen.returncode)
[...]
1 Do you remember? We have seen this before when dealing with filters.

So we change this line:

    try:
        popen = subprocess.Popen(cmd, stdout=stdout, stderr=stderr, shell=True,
                                 env=ENV, universal_newlines=True, bufsize=-1)

and it works!
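The difference can be seen with a tiny child process. This is a sketch, not the code of a2x3.py: it starts the running Python interpreter instead of asciidoc3, and it passes the command as a list so that no shell is involved.

```python
import subprocess
import sys

# The child process just prints one line, like a verbose message.
cmd = [sys.executable, "-c", "print('asciidoc3: reading')"]

# Without universal_newlines the pipes deliver bytes.
raw = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
raw_out, raw_err = raw.communicate()

# With universal_newlines=True the pipes are opened in text mode and deliver str.
text = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                        universal_newlines=True)
text_out, text_err = text.communicate()

print(type(raw_out).__name__, type(text_out).__name__)   # bytes str
```

Printing the bytes object is what produces the b'…' form in the verbose output; the str object prints as ordinary text.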

Warning: Be careful! "popen = subprocess.Popen( …, shell=True, …)" opens a security hole. Make sure that all files/filters that are run through the shell come from a trusted source.
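Where a command does not need shell features, one way to reduce this risk is to split the command line into a list and run it without a shell. The following sketch is not what a2x3.py does; "run_without_shell" is a hypothetical helper, and the hostile file name is invented to show the effect.

```python
import shlex
import subprocess
import sys

def run_without_shell(command_line):
    """Run a command line without a shell; return (stdout, stderr, returncode)."""
    args = shlex.split(command_line)      # split like a POSIX shell, but execute nothing
    completed = subprocess.run(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                               universal_newlines=True)
    return completed.stdout, completed.stderr, completed.returncode

# A hostile "file name" that would start a second command under shell=True.
filename = "in003.txt; echo injected"
command_line = " ".join([
    shlex.quote(sys.executable), "-c",
    shlex.quote("import sys; print(sys.argv[1])"),
    shlex.quote(filename),
])

stdout, stderr, returncode = run_without_shell(command_line)
print(stdout.strip())                     # the file name arrives as one plain argument
```

Commands that rely on pipes, redirections or variable expansion still need a shell, so this is not a drop-in replacement for every call.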

The next portion of code to take care of is "def find_resources(files, tagname, attrname, filter=None)".
First I decided to rename "filter" to "_filter". The reason is that 2to3 refactored "… or filter(attrs)" to "… or list(filter(attrs))": the parameter name "filter" collides with the built-in function "filter", which is rewritten by the 2to3 fixer of the same name. We can avoid this by excluding the fixer ("2to3 -x filter …") or, as I prefer, by renaming the variable and removing the needless "list" wrapper.
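A small sketch shows why the renamed parameter is the safer choice. The helper "select_attrs" and the predicate "is_local" are hypothetical names for this example; they only mimic the way the optional predicate is used.

```python
def select_attrs(attr_dicts, _filter=None):
    """Hypothetical helper: keep the attribute dicts accepted by the optional predicate."""
    return [attrs for attrs in attr_dicts if _filter is None or _filter(attrs)]

def is_local(attrs):
    """Predicate returning a bool, as a caller of such a helper might pass it."""
    return not attrs.get("src", "").startswith("http")

images = [{"src": "images/tiger.png"}, {"src": "http://example.org/logo.png"}]
print(select_attrs(images, is_local))      # [{'src': 'images/tiger.png'}]

# What the automatic refactoring effectively asks for: list() around a bool.
try:
    list(is_local(images[0]))
except TypeError as error:
    print("TypeError:", error)
```

Since the predicate returns a bool and not an iterable, the added list() call would not merely be superfluous, it would raise a TypeError as soon as a predicate is passed.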
In the same function we see an "encoding" variable:

a2x3.py HTMLParser before refactoring
[...]
   contents = read_file(filename)
   mo = re.search(r'\A<\?xml.* encoding="(.*?)"', contents)
   if mo:
       encoding = mo.group(1)
       parser.feed(contents.decode(encoding))
   else:
       parser.feed(contents)
   parser.close()
[...]

Oh no - I don’t like this encoding mess any more … To begin with (see the Python3 docs): "HTMLParser.feed(data) … data must be str". So here "contents" has to be str. And of course it is str, since "contents = read_file(filename)" and "def read_file(filename, mode='r')" - one of a2x3.py’s utility functions - reads (in Python3) all files in text mode, that is as str. Unless an encoding is passed, Python3 takes the default from the locale, which is usually 'utf-8' on current GNU/Linux systems. To be explicit, we could alternatively use "def read_file(filename, mode='r', encoding='utf-8')". So far so good. Because "contents" is of type "str", the unchanged code throws an error (and it did when I tried): "AttributeError: 'str' object has no attribute 'decode'".
What is to be done here? We make it easy ("keep it simple, stupid"): the encoding of the 'data' (the text outside the tags) doesn’t matter, since that information is unused. We only need the information in the tag attributes and may assume 'utf-8'. The match object "mo" and the "if" block - the five lines given above - are, strictly speaking, dispensable.
This solution is considered safe unless the forthcoming tests disprove it. See the tests here.

a2x3.py HTMLParser refactored
[...]
   contents = read_file(filename)
   mo = re.search(r'\A<\?xml.* encoding="(.*?)"', contents)
   if mo:
       #encoding = mo.group(1)
       #parser.feed(contents.decode(encoding))                    # v2
       # v3: drop unencodable characters; note that str(bytes) would yield "b'...'"
       contents = contents.encode('utf-8', errors='ignore').decode('utf-8')
   parser.feed(contents)
   parser.close()
[...]
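The two points made here, "feed()" wants str and only the tag attributes matter, can be tried with a short stand-alone sketch. The class "AttributeCollector" and the sample document are made up for this example; the sketch also shows a pitfall when converting between bytes and str in Python3.

```python
from html.parser import HTMLParser

class AttributeCollector(HTMLParser):
    """Collect the value of one attribute from every occurrence of one tag."""

    def __init__(self, tagname, attrname):
        super().__init__()
        self.tagname = tagname
        self.attrname = attrname
        self.result = []

    def handle_starttag(self, tag, attrs):
        # Also reached for '<img ... />' via the default handle_startendtag().
        attrs = dict(attrs)
        if tag == self.tagname and self.attrname in attrs:
            self.result.append(attrs[self.attrname])

contents = ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<html><body><img src="images/tiger.png" alt="Tiger"/></body></html>')

parser = AttributeCollector("img", "src")
parser.feed(contents)                 # contents is str, no decode() needed
parser.close()
print(parser.result)                  # ['images/tiger.png']

# Pitfall: str() applied to bytes returns the repr, not the decoded text.
print(str(b"abc"))                    # b'abc'
print(b"abc".decode("utf-8"))         # abc
```

To turn bytes back into text, "decode()" is the call to use; "str()" without an encoding argument only wraps the representation in a string.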

At last I add some comments, edit the copyright etc.

8. Checking out Mathematical Formulas

Let’s check whether the mechanisms providing good-looking formulas work well.

8.1. LaTeX Math

When testing this feature, we have to pay attention to the differences between the required outputs and the use with dblatex, see the "AsciiDoc(3) User Guide", chapter "Mathematical Formulas". We make use of the given file /doc/latexmath.txt and start with Python2:

~/.asciidoc3$ python2 ~/asciidoc2/a2x.py -f pdf --verbose -a icons ~/.asciidoc3/doc/latexmath.txt

The output "latexmath.pdf" is found in the source directory "~/.asciidoc3/doc/". I rename it to "latexmath2.pdf".

Let’s try the same with a2x3.py:

~/.asciidoc3$ python3 a2x3.py -f pdf --verbose -a icons ~/.asciidoc3/doc/latexmath.txt

Yes, it works right out of the box - "diffpdf": files seem to be the same.

Output of latexmath-macro

latexmath

8.2. AsciiMathML

The same procedure as before.

~/.asciidoc3$ python2 ~/asciidoc2/asciidoc.py --verbose -n -a icons -a asciimath ~/.asciidoc3/doc/asciimathml.txt

The output "asciimathml.html" is found in the source directory "~/.asciidoc3/doc/". I rename it to "asciimathml2.html".

~/.asciidoc3$ python3 asciidoc3.py --verbose -n -a icons -a asciimath ~/.asciidoc3/doc/asciimathml.txt

Be sure that the program can find/include "~/.asciidoc3/javascripts/ASCIIMathML.js". It works.

Output of asciimathml-macro

asciimathml

8.3. LaTeXMathML

Once more the identical procedure.

~/.asciidoc3$ python2 ~/asciidoc2/asciidoc.py --verbose -n -a icons -a latexmath ~/.asciidoc3/doc/latexmathml.txt

The output "latexmathml.html" is found in the source directory "~/.asciidoc3/doc/". I rename it to "latexmathml2.html".

~/.asciidoc3$ python3 asciidoc3.py --verbose -n -a icons -a latexmath ~/.asciidoc3/doc/latexmathml.txt

Be sure that the program can find/include "~/.asciidoc3/javascripts/LaTeXMathML.js". It works.

Output of latexmathml-macro

latexmathml

9. Massive Testing

We have done a considerable part of the migration (… so I do hope by all means). To prepare the "automatic" testing in the upcoming chapter, it’s time to rename more files and label them as "AsciiDoc3". A first step: from now on, the v3 of asciidoc.py is no longer named "asciidoc3_new.py" but "asciidoc3.py" - this is easily done by editing editad3.py. (This has already been applied.)

9.1. testasciidoc3.py

A second step and the beginning of the intense test procedure: why not make use of the test system delivered with AsciiDoc? It’s provided in the "tests" directory. Here we go: we change to the directory ~/asciidoc2/tests and start

~/asciidoc2/tests$ python2 testasciidoc.py --force update

WRITING: data/testcases-html4.html
WRITING: data/testcases-xhtml11.html
WRITING: data/testcases-docbook.xml
[...]
WRITING: data/article-html5.html
sh: 1: cannot open ~/.asciidoc3/images/images/smallnew.png: No such file
sh: 1: cannot open ~/.asciidoc3/images/images/tiger.png: No such file
WRITING: data/article-data-uri-html4.html
sh: 1: cannot open ~/.asciidoc3/images/images/smallnew.png: No such file
sh: 1: cannot open ~/.asciidoc3/images/images/tiger.png: No such file
[...]

The well-known "image-dir" annoyance. By the way, "testasciidoc.py" finds the required executable "asciidoc.py" via "asciidocapi.py" and the symbolic link in /usr/bin from "asciidoc" to ~/asciidoc2/asciidoc.py. We added this link earlier when migrating asciidocapi.py to asciidoc3api.py. Since we altered "asciidoc.conf" (which resides in the same directory ~/asciidoc2), we see some "No such file" messages: in the meantime we have changed some links, which now point to the asciidoc3 image directory. We will come back to this issue very soon when dealing with v3 and especially when comparing v2 and v3. But when we use the original "asciidoc.conf", everything seems to be ok; "--force update" generates 178 files in ~/asciidoc2/tests/data:

Output of testasciidoc.py (using the original - unchanged - asciidoc.conf)
~/asciidoc2/tests$ python2 testasciidoc.py --force update       1

     1  WRITING: data/testcases-html4.html
     2  WRITING: data/testcases-xhtml11.html
     3  WRITING: data/testcases-docbook.xml
     4  WRITING: data/testcases-html5.html
     5  WRITING: data/filters-test-html4.html
     6  WRITING: data/filters-test-xhtml11.html
     7  WRITING: data/filters-test-docbook.xml
     8  WRITING: data/filters-test-html5.html
     9  WRITING: data/newtables-html4.html
    10  WRITING: data/newtables-xhtml11.html
    11  WRITING: data/newtables-docbook.xml
    12  WRITING: data/newtables-html5.html
    13  WRITING: data/oldtables-html4.html
    14  WRITING: data/oldtables-xhtml11.html
    15  WRITING: data/oldtables-docbook.xml
    16  WRITING: data/oldtables-html5.html
    17  WRITING: data/source-highlight-filter-html4.html
    18  WRITING: data/source-highlight-filter-xhtml11.html
    19  WRITING: data/source-highlight-filter-docbook.xml
    20  WRITING: data/source-highlight-filter-html5.html
    21  WRITING: data/article-html4.html
    22  WRITING: data/article-xhtml11.html
    23  WRITING: data/article-docbook.xml
    24  WRITING: data/article-html5.html
    25  WRITING: data/article-data-uri-html4.html
    26  WRITING: data/article-data-uri-xhtml11.html
    27  WRITING: data/article-data-uri-html5.html
    28  WRITING: data/article-docinfo-docbook.xml
    29  WRITING: data/book-html4.html
    30  WRITING: data/book-xhtml11.html
    31  WRITING: data/book-docbook.xml
    32  WRITING: data/book-html5.html
    33  WRITING: data/book-multi-html4.html
    34  WRITING: data/book-multi-xhtml11.html
    35  WRITING: data/book-multi-docbook.xml
    36  WRITING: data/book-multi-html5.html
    37  WRITING: data/asciidoc.1-html4.html
    38  WRITING: data/asciidoc.1-xhtml11.html
    39  WRITING: data/asciidoc.1-docbook.xml
    40  WRITING: data/asciidoc.1-html5.html
    41  WRITING: data/slidy-example-slidy.html
    42  WRITING: data/asciimathml-xhtml11.html
    43  WRITING: data/asciimathml-html5.html
    44  WRITING: data/latexmathml-xhtml11.html
    45  WRITING: data/latexmathml-html5.html
    46  WRITING: data/latexmath-docbook.xml
    47  WRITING: data/latex-filter-html4.html
    48  WRITING: data/latex-filter-xhtml11.html
    49  WRITING: data/latex-filter-docbook.xml
    50  WRITING: data/latex-filter-html5.html
    51  WRITING: data/utf8-examples-html4.html
    52  WRITING: data/utf8-examples-xhtml11.html
    53  WRITING: data/utf8-examples-docbook.xml
    54  WRITING: data/utf8-examples-html5.html
    55  WRITING: data/open-block-test-html4.html
    56  WRITING: data/open-block-test-xhtml11.html
    57  WRITING: data/open-block-test-docbook.xml
    58  WRITING: data/open-block-test-html5.html
    59  WRITING: data/lang-en-article-test-docbook.xml
    60  WRITING: data/lang-en-article-test-xhtml11.html
    61  WRITING: data/lang-en-article-test-html4.html
    62  WRITING: data/lang-en-article-test-html5.html
    63  WRITING: data/lang-en-book-test-docbook.xml
    64  WRITING: data/lang-en-book-test-xhtml11.html
    65  WRITING: data/lang-en-book-test-html4.html
    66  WRITING: data/lang-en-book-test-html5.html
    67  WRITING: data/lang-en-man-test-docbook.xml
    68  WRITING: data/lang-ru-article-test-docbook.xml
    69  WRITING: data/lang-ru-article-test-xhtml11.html
    70  WRITING: data/lang-ru-article-test-html4.html
    71  WRITING: data/lang-ru-article-test-html5.html
    72  WRITING: data/lang-ru-book-test-docbook.xml
    73  WRITING: data/lang-ru-book-test-xhtml11.html
    74  WRITING: data/lang-ru-book-test-html4.html
    75  WRITING: data/lang-ru-book-test-html5.html
    76  WRITING: data/lang-ru-man-test-docbook.xml
    77  WRITING: data/lang-fr-article-test-docbook.xml
    78  WRITING: data/lang-fr-article-test-xhtml11.html
    79  WRITING: data/lang-fr-article-test-html4.html
    80  WRITING: data/lang-fr-article-test-html5.html
    81  WRITING: data/lang-fr-book-test-docbook.xml
    82  WRITING: data/lang-fr-book-test-xhtml11.html
    83  WRITING: data/lang-fr-book-test-html4.html
    84  WRITING: data/lang-fr-book-test-html5.html
    85  WRITING: data/lang-fr-man-test-docbook.xml
    86  WRITING: data/lang-de-article-test-docbook.xml
    87  WRITING: data/lang-de-article-test-xhtml11.html
    88  WRITING: data/lang-de-article-test-html4.html
    89  WRITING: data/lang-de-article-test-html5.html
    90  WRITING: data/lang-de-book-test-docbook.xml
    91  WRITING: data/lang-de-book-test-xhtml11.html
    92  WRITING: data/lang-de-book-test-html4.html
    93  WRITING: data/lang-de-book-test-html5.html
    94  WRITING: data/lang-de-man-test-docbook.xml
    95  WRITING: data/lang-hu-article-test-docbook.xml
    96  WRITING: data/lang-hu-article-test-xhtml11.html
    97  WRITING: data/lang-hu-article-test-html4.html
    98  WRITING: data/lang-hu-article-test-html5.html
    99  WRITING: data/lang-hu-book-test-docbook.xml
   100  WRITING: data/lang-hu-book-test-xhtml11.html
   101  WRITING: data/lang-hu-book-test-html4.html
   102  WRITING: data/lang-hu-book-test-html5.html
   103  WRITING: data/lang-hu-man-test-docbook.xml
   104  WRITING: data/lang-es-article-test-docbook.xml
   105  WRITING: data/lang-es-article-test-xhtml11.html
   106  WRITING: data/lang-es-article-test-html4.html
   107  WRITING: data/lang-es-article-test-html5.html
   108  WRITING: data/lang-es-book-test-docbook.xml
   109  WRITING: data/lang-es-book-test-xhtml11.html
   110  WRITING: data/lang-es-book-test-html4.html
   111  WRITING: data/lang-es-book-test-html5.html
   112  WRITING: data/lang-es-man-test-docbook.xml
   113  WRITING: data/lang-pt-BR-article-test-docbook.xml
   114  WRITING: data/lang-pt-BR-article-test-xhtml11.html
   115  WRITING: data/lang-pt-BR-article-test-html4.html
   116  WRITING: data/lang-pt-BR-article-test-html5.html
   117  WRITING: data/lang-pt-BR-book-test-docbook.xml
   118  WRITING: data/lang-pt-BR-book-test-xhtml11.html
   119  WRITING: data/lang-pt-BR-book-test-html4.html
   120  WRITING: data/lang-pt-BR-book-test-html5.html
   121  WRITING: data/lang-pt-BR-man-test-docbook.xml
   122  WRITING: data/lang-uk-article-test-docbook.xml
   123  WRITING: data/lang-uk-article-test-xhtml11.html
   124  WRITING: data/lang-uk-article-test-html4.html
   125  WRITING: data/lang-uk-article-test-html5.html
   126  WRITING: data/lang-uk-book-test-docbook.xml
   127  WRITING: data/lang-uk-book-test-xhtml11.html
   128  WRITING: data/lang-uk-book-test-html4.html
   129  WRITING: data/lang-uk-book-test-html5.html
   130  WRITING: data/lang-uk-man-test-docbook.xml
   131  WRITING: data/lang-nl-article-test-docbook.xml
   132  WRITING: data/lang-nl-article-test-xhtml11.html
   133  WRITING: data/lang-nl-article-test-html4.html
   134  WRITING: data/lang-nl-article-test-html5.html
   135  WRITING: data/lang-nl-book-test-docbook.xml
   136  WRITING: data/lang-nl-book-test-xhtml11.html
   137  WRITING: data/lang-nl-book-test-html4.html
   138  WRITING: data/lang-nl-book-test-html5.html
   139  WRITING: data/lang-nl-man-test-docbook.xml
   140  WRITING: data/lang-it-article-test-docbook.xml
   141  WRITING: data/lang-it-article-test-xhtml11.html
   142  WRITING: data/lang-it-article-test-html4.html
   143  WRITING: data/lang-it-article-test-html5.html
   144  WRITING: data/lang-it-book-test-docbook.xml
   145  WRITING: data/lang-it-book-test-xhtml11.html
   146  WRITING: data/lang-it-book-test-html4.html
   147  WRITING: data/lang-it-book-test-html5.html
   148  WRITING: data/lang-it-man-test-docbook.xml
   149  WRITING: data/lang-cs-article-test-docbook.xml
   150  WRITING: data/lang-cs-article-test-xhtml11.html
   151  WRITING: data/lang-cs-article-test-html4.html
   152  WRITING: data/lang-cs-article-test-html5.html
   153  WRITING: data/lang-cs-book-test-docbook.xml
   154  WRITING: data/lang-cs-book-test-xhtml11.html
   155  WRITING: data/lang-cs-book-test-html4.html
   156  WRITING: data/lang-cs-book-test-html5.html
   157  WRITING: data/lang-cs-man-test-docbook.xml
   158  WRITING: data/lang-ro-article-test-docbook.xml
   159  WRITING: data/lang-ro-article-test-xhtml11.html
   160  WRITING: data/lang-ro-article-test-html4.html
   161  WRITING: data/lang-ro-article-test-html5.html
   162  WRITING: data/lang-ro-book-test-docbook.xml
   163  WRITING: data/lang-ro-book-test-xhtml11.html
   164  WRITING: data/lang-ro-book-test-html4.html
   165  WRITING: data/lang-ro-book-test-html5.html
   166  WRITING: data/lang-ro-man-test-docbook.xml
   167  WRITING: data/rcs-id-marker-test-html4.html
   168  WRITING: data/rcs-id-marker-test-xhtml11.html
   169  WRITING: data/rcs-id-marker-test-docbook.xml
   170  WRITING: data/rcs-id-marker-test-html5.html
   171  WRITING: data/utf8-bom-test-html4.html
   172  WRITING: data/utf8-bom-test-xhtml11.html
   173  WRITING: data/utf8-bom-test-docbook.xml
   174  WRITING: data/utf8-bom-test-html5.html
   175  WRITING: data/deprecated-quotes-html4.html
   176  WRITING: data/deprecated-quotes-xhtml11.html
   177  WRITING: data/deprecated-quotes-docbook.xml
   178  WRITING: data/deprecated-quotes-html5.html
1 "line numbers" are added by "~$ nl …"

We switch the working directory to asciidoc3 using the command "cd ~/.asciidoc3/tests". First, make sure that our new "asciidoc3api.py" is in this directory. Second, we migrate "testasciidoc.py" to Python3 as "testasciidoc3.py":

2to3 -v -w -f all -f buffer -f set_literal -f idioms -f ws_comma -n --add-suffix=3 testasciidoc.py

I rename the result (which shows not that many changes!?) to "testasciidoc3.py" - but I have to edit a few lines manually.

Some manual editing of testasciidoc3.py
#!/usr/bin/env python3
[...]
import io
import asciidoc3api
[...]
#if sys.platform[:4] == 'java':
#    Jython cStringIO is more compatible with CPython StringIO.
#    import io as StringIO
[...]
    def generate_expected(self, backend):
        """
        Generate and return test data output for backend.
        """
        asciidoc = asciidoc3api.AsciiDoc3API()           1
        asciidoc.options.values = self.options
        asciidoc.attributes = self.attributes
1 Later I change all "asciidoc" to "asciidoc3" without further notice.

That’s all for the moment to make a beginning …

First run of testasciidoc3.py
~/.asciidoc3/tests$ python3 testasciidoc3.py --force update
WRITING: data/testcases-html4.html
Traceback (most recent call last):
  File "<string>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: '../../images/smallnew.png'    1
WRITING: data/testcases-xhtml11.html
WRITING: data/testcases-docbook.xml
WRITING: data/testcases-html5.html
WRITING: data/filters-test-html4.html
WRITING: data/filters-test-xhtml11.html
WRITING: data/filters-test-docbook.xml
WRITING: data/filters-test-html5.html
asciidoc3: WARNING: <stdin>: line 14: missing style: [blockdef-listing]: source        2
asciidoc3: WARNING: <stdin>: line 17: missing style: [blockdef-listing]: source
WRITING: data/newtables-html4.html
asciidoc3: WARNING: <stdin>: line 14: missing style: [blockdef-listing]: source
asciidoc3: WARNING: <stdin>: line 17: missing style: [blockdef-listing]: source
WRITING: data/newtables-xhtml11.html
WRITING: data/newtables-docbook.xml
asciidoc3: WARNING: <stdin>: line 14: missing style: [blockdef-listing]: source
asciidoc3: WARNING: <stdin>: line 17: missing style: [blockdef-listing]: source
WRITING: data/newtables-html5.html
Traceback (most recent call last):
  File "testasciidoc3.py", line 414, in <module>
    tests.update(number, backend, force=force)
  File "testasciidoc3.py", line 325, in update
    test.update(backend, force=force)
  File "testasciidoc3.py", line 210, in update
    self.update_expected(backend)
  File "testasciidoc3.py", line 189, in update_expected
    lines = self.generate_expected(backend)
  File "testasciidoc3.py", line 182, in generate_expected
    asciidoc.execute(infile, outfile, backend)
  File "~/.asciidoc3/tests/asciidoc3api.py", line 212, in execute
    self.asciidoc3.execute(self.cmd, opts.values, args)
  File "~/.asciidoc3/asciidoc3.py", line 6012, in execute
    asciidoc3(backend, doctype, confiles, infile, outfile, options)
  File "~/.asciidoc3/asciidoc3.py", line 5862, in asciidoc3
    document.translate(has_header) # Generate the output.
  File "~/.asciidoc3/asciidoc3.py", line 1863, in translate
    Section.translate_body()
  File "~/.asciidoc3/asciidoc3.py", line 2529, in translate_body
    nxt.translate()
  File "~/.asciidoc3/asciidoc3.py", line 5423, in translate
    raise EAsciiDoc("Deprecated old tables found --> old tables are no longer supported in AsciiDoc3.py")  3
[...]
1 This again …
2 and why this?
3 but that looks good: "newtables-html5.html" is ok. "oldtables-html4.html" is not produced as expected since we eliminated "oldtables".

1 This "FileNotFoundError" occurs because the infile gives a relative path as the source of the image that is "wrong" with regard to our directory layout. We fix this later on in several ways, see below.
2 The testfile doesn’t find "source-highlight-filter.conf" since the "HTML source code highlighter" is commented out in asciidoc3.conf.
3 As said before, this message is expected: it shows that the old-table syntax was recognized correctly and that an exception was thrown. To suppress this we comment out the matching entry in testasciidoc(3).conf. We’ll test the throwing of this exception in another explicit testcase.

We eliminate these three points one by one, starting with the third.

Editing testasciidoc(3).conf to remove "old tables"
[...]
../examples/website/newtables.txt

# %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
# Old tables
#
#
# % source
# data/oldtables.txt
#
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Source highlighter
[...]

Please keep in mind that our "testasciidoc.conf" is later renamed to testasciidoc3.conf.
Doing so, the 'EAsciiDoc("Deprecated old tables found …' is of course gone - I omit the output at this point.
Next, I activate the "source code highlighter" in asciidoc3.conf.

Set the proper "highlighter" in asciidoc3.conf
[...]
# HTML source code highlighter (source-highlight, pygments or highlight).
source-highlighter=source-highlight
# source-highlighter=pygments
# source-highlighter=highlight
# Uncomment to use deprecated quote attributes.
#deprecated-quotes=
[...]

When doing so, the 'WARNING: <stdin>: line 14: missing style: [blockdef-listing]: source …' is also gone - I omit the output at this point, too.
To deal with the 'FileNotFoundError: ../../images/smallnew.png' we edit the file 'testcases.txt' (in the end this is nothing other than the original version from the asciidoc-tarball)

Toggle the image-directory in testcases.txt
[...]
// Images and icons directories.
:imagesdir: ../../doc
image::music2.png[]

:icons:
:iconsdir:  ../../images/icons
NOTE: Lorum ipsum.

:icons!:

ifdef::backend-xhtml11[]                1
:imagesdir: ../../images
:data-uri:
image:smallnew.png[NEW] 'testing' `123`.


:data-uri!:
[...]
1 The following line ":imagesdir: .." was commented out until now. But that doesn’t work here for testasciidoc3.py.

In a similar way we have to edit "../doc/article.txt"

This "article.txt" works
[...]
Note that multi-entry terms generate separate index entries.

Here are a couple of image examples: an image:smallnew.png[]
example inline image followed by an example block image:

.Tiger block image
image::tiger.png[Tiger image]


Followed by an example table:
[...]

and "../doc/slidy-example.txt"

This "slidy-example.txt" works
[...]
conubia. Feugiat felis justo. Nunc amet nulla. Eu ac orci mollis.

.Tiger
image::tiger.png[]


Incremental Elements
 ------------------
The remaining elements on this page are incremental, press the space

[...]

NOTE: 'Note' admonition paragraph.

  * Volutpat tristique nec.
+
image::tiger.png[]
  * Iaculis commodo et.
  * Volutpat tristique nec.
[...]

Finally we have to edit testasciidoc(3).conf again to avoid testing the utf-8 BOM. Do you remember? In v3 the BOM doesn’t exist any more.

Editing testasciidoc(3).conf to discard "BOM"
[...]
% source
data/rcs-id-marker-test.txt

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
# UTF-8 BOM test
#
# % source
# data/utf8-bom-test.txt
#
# %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Deprecated quote attributes
[...]

Once we have done everything as described, all of the 'test-files' are produced without any errors or warnings:

Successful run of testasciidoc3.py
     1  WRITING: data/testcases-html4.html
     2  WRITING: data/testcases-xhtml11.html
     3  WRITING: data/testcases-docbook.xml
     4  WRITING: data/testcases-html5.html
     5  WRITING: data/filters-test-html4.html
     6  WRITING: data/filters-test-xhtml11.html
[...]
   166  WRITING: data/rcs-id-marker-test-html5.html
   167  WRITING: data/deprecated-quotes-html4.html
   168  WRITING: data/deprecated-quotes-xhtml11.html
   169  WRITING: data/deprecated-quotes-docbook.xml
   170  WRITING: data/deprecated-quotes-html5.html

Hey, what’s this - 8 files are missing?! No, keep calm, they aren’t:

Eight v2 files are not generated any more, but that’s ok!
[...]
    13  WRITING: data/oldtables-html4.html
    14  WRITING: data/oldtables-xhtml11.html
    15  WRITING: data/oldtables-docbook.xml
    16  WRITING: data/oldtables-html5.html
[...]
   171  WRITING: data/utf8-bom-test-html4.html
   172  WRITING: data/utf8-bom-test-xhtml11.html
   173  WRITING: data/utf8-bom-test-docbook.xml
   174  WRITING: data/utf8-bom-test-html5.html
[...]
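
Which files are missing can also be found mechanically. A small sketch, assuming both logs are available as text with one "WRITING: …" entry per line; the function names and the shortened sample logs are invented for this example:

```python
def written_files(log_text):
    """Collect the file names from all 'WRITING: <file>' lines of a log."""
    prefix = 'WRITING: '
    names = set()
    for line in log_text.splitlines():
        # Searching instead of startswith() tolerates the numbers added by "nl".
        pos = line.find(prefix)
        if pos != -1:
            names.add(line[pos + len(prefix):].strip())
    return names

def missing_in_v3(v2_log, v3_log):
    """Return the sorted list of files written by v2 but not by v3."""
    return sorted(written_files(v2_log) - written_files(v3_log))

# Shortened sample logs, for illustration only.
v2_log = """\
    13  WRITING: data/oldtables-html4.html
    17  WRITING: data/newtables-html4.html
   171  WRITING: data/utf8-bom-test-html4.html
"""
v3_log = """\
    13  WRITING: data/newtables-html4.html
"""
print(missing_in_v3(v2_log, v3_log))
```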

So we reach the first milestone of testing: all needed files are properly generated. But that’s only the first leg: are the files generated by asciidoc3.py "the same" as those generated by v2? This is exactly what they are intended to be - unfortunately they are not the same. Here is what we have to do.

9.2. v2v3testcases.py

I wrote a short program "v2v3testcases.py". It assumes that the files generated by "python3 testasciidoc3.py --force update" and "python2 testasciidoc.py --force update" live in the directories "~/.asciidoc3/tests/data" and "~/asciidoc2/tests/data", respectively. And then:

v2v3testcases.py
  • v2 and v3 use their standard/original AsciiDoc(3)-conf-files like asciidoc(3).conf, xhtml11.conf, html4.conf, slidy.conf etc.,

  • open subsequently all v2-files,

  • alter them if necessary,

  • write the result to /dev/shm (that is, to keep them in memory, do not write on disk - that’s much faster),

  • do the same work with all v3-files,

  • compare the matching v2/v3-files: first to be of the same size, second to have the identical hash value.
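
The steps of this list can be sketched roughly as follows. This is not the real v2v3testcases.py but a minimal illustration under some assumptions: the rule list "RULES", all function names and the demo content are invented, and a temporary directory stands in for /dev/shm.

```python
import hashlib
import os
import re
import tempfile

# Hypothetical rule list: (pattern, replacement) pairs applied to every file.
RULES = [(r'=">', r'=" />')]

def normalize(content, rules=RULES):
    """'Alter if necessary': apply all substitution rules to one file's text."""
    for pattern, replacement in rules:
        content = re.sub(pattern, replacement, content)
    return content

def prepare(src_path, work_dir, prefix):
    """Read a generated file, normalize it and write the result to work_dir."""
    with open(src_path, encoding='utf-8') as src:
        content = normalize(src.read())
    dst_path = os.path.join(work_dir, prefix + os.path.basename(src_path))
    with open(dst_path, 'w', encoding='utf-8') as dst:
        dst.write(content)
    return dst_path

def md5_of(path):
    """Return the MD5 hex digest of a file's bytes."""
    with open(path, 'rb') as f:
        return hashlib.md5(f.read()).hexdigest()

def compare(v2_path, v3_path):
    """Return (same_size, same_hash) for two prepared files."""
    same_size = os.path.getsize(v2_path) == os.path.getsize(v3_path)
    same_hash = md5_of(v2_path) == md5_of(v3_path)
    return same_size, same_hash

# Tiny demo with made-up content in two made-up source directories.
with tempfile.TemporaryDirectory() as work_dir:
    sources = {}
    for version, text in (('v2', '<img src="data:image/png;base64,JRU5ErkJggg==">\n'),
                          ('v3', '<img src="data:image/png;base64,JRU5ErkJggg==" />\n')):
        src_dir = os.path.join(work_dir, version)
        os.makedirs(src_dir)
        sources[version] = os.path.join(src_dir, 'demo.html')
        with open(sources[version], 'w', encoding='utf-8') as f:
            f.write(text)
    result = compare(prepare(sources['v2'], work_dir, 'v2'),
                     prepare(sources['v3'], work_dir, 'v3'))
print(result)
```

Checking the size first is cheap and already catches most differences; the hash then confirms that files of equal size are really identical byte by byte.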

What does this mean in detail?
"Using the original conf-files" is obvious,
"open subsequently all v2-files" is clear, but
"alter them if necessary"? An example: in v2, the data-uri-image-src end tag is something like 'JRU5ErkJggg==">', in v3 'JRU5ErkJggg==" />'. (We have altered the conf-file in that way to have closing tags.) That is irrelevant when rendering the page, but of course not when comparing the two files bytewise. So v2v3testcases.py contains the statement:

Edit v2-files, i
[...]
      v2content, counter = re.subn(r'=">', r'=" />', v2content)
[...]

A second example is the "timestamp" in the footer: "Last update …". The date is in most cases not the same in v2 compared to v3:

Edit v2-files, ii
[...]
      v2content, counter = re.subn(r'Last updated .*? CE', r'Last updated timestamp CE', v2content)
[...]

The word "timestamp" serves as a placeholder, you may choose any token you like.
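
To see the effect of this substitution, here is a small self-contained sketch. The two footer lines are invented samples (the real dates and the exact footer format may differ), and "mask_timestamp" is a hypothetical helper name:

```python
import re

TIMESTAMP = re.compile(r'Last updated .*? CE')

def mask_timestamp(content, token='timestamp'):
    """Replace the footer date by a fixed token; returns (text, count)."""
    return TIMESTAMP.subn('Last updated ' + token + ' CE', content)

# Invented sample footers with different dates.
v2_footer = 'Last updated 2017-03-01 10:15:00 CET\n'
v3_footer = 'Last updated 2018-05-12 21:40:07 CET\n'

v2_masked, v2_count = mask_timestamp(v2_footer)
v3_masked, v3_count = mask_timestamp(v3_footer)
print(v2_masked == v3_masked, v2_count, v3_count)
```

The non-greedy ".*?" stops at the first " CE" after "Last updated", so anything following it stays untouched.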

Most of the changes - you guessed it - deal with the (relative) path to the image-sources … See an example:

Edit v2-files, iii
[...]
      v2content, counter = re.subn(r'img src="([^>]*?)>', r'img src="\g<1> />', v2content)
[...]

To have all images and/or resources together in one single place, I copy all files ending with png/jpg to "dir_of_outlying_data", but that’s only for the convenience of a better visualization and not mandatory. The bytewise comparison doesn’t care about the meaning of the characters in the img-src-tag as long as they are identical.

Edit v2-files, iv
[...]
      v2content, counter = re.subn(r'<img src=".*?([^./]*?)\.png"',
                                   r'<img src="' + dir_of_outlying_data + r'\g<1>.png"', v2content)
[...]
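
A runnable sketch of this last substitution. The value of "dir_of_outlying_data" and the sample tag are assumptions for the demo, and "relocate_png" is a hypothetical wrapper. Note that the piece of the replacement containing "\g<1>" is written as a raw string, otherwise newer Python3 versions complain about an invalid escape sequence:

```python
import re

# Hypothetical value; the real directory name is not shown in this diary.
dir_of_outlying_data = '/dev/shm/outlying/'

def relocate_png(content, target_dir):
    """Point every png image source to target_dir, keeping the file name."""
    pattern = r'<img src=".*?([^./]*?)\.png"'
    replacement = r'<img src="' + target_dir + r'\g<1>.png"'
    return re.subn(pattern, replacement, content)

sample = '<img src="../../images/smallnew.png" alt="NEW">'
result, counter = relocate_png(sample, dir_of_outlying_data)
print(result, counter)
```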

Due to the in-memory processing and the use of "ProcessPoolExecutor()" ("from concurrent import futures"), "v2v3testcases.py" works very fast. The size tests are successful, but interestingly the hashes differ in two cases. Let’s scrutinize these two "special" cases.
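
How such a parallel comparison might look is sketched below. This is only an illustration, not the code of v2v3testcases.py: the function names and the shape of the input (tuples of name, v2 bytes, v3 bytes) are assumptions.

```python
from concurrent import futures
import hashlib

def digest_pair(pair):
    """Return (name, same_size, same_hash) for one (name, v2_bytes, v3_bytes)."""
    name, v2_bytes, v3_bytes = pair
    same_size = len(v2_bytes) == len(v3_bytes)
    same_hash = hashlib.md5(v2_bytes).digest() == hashlib.md5(v3_bytes).digest()
    return name, same_size, same_hash

def compare_all(pairs, executor_cls=futures.ProcessPoolExecutor):
    """Compare all pairs with a pool of workers; results keep the input order."""
    with executor_cls() as executor:
        return list(executor.map(digest_pair, pairs))
```

A call like "compare_all(pairs)" distributes the work over several processes. On platforms that start worker processes by spawning a fresh interpreter, this call has to sit under an "if __name__ == '__main__':" guard. For a quick check, "futures.ThreadPoolExecutor" can be passed as "executor_cls" instead.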

9.2.1. xml-bug testcases

Let’s first go to "newtables-docbook.xml".

v2 "newtables-docbook.xml"
[...]
<?dblatex table-width="50%"?>
<tgroup cols="3">
<colspec colname="col_1" colwidth="43*"/>
<colspec colname="col_2" colwidth="85*"/>
<colspec colname="col_3" colwidth="85*"/>
<thead>
[...]

And now let’s take a look at v3.

v3 "newtables-docbook.xml"
[...]
<?dblatex table-width="50%"?>
<tgroup cols="3">
<colspec colname="col_1" colwidth="42*"/>
<colspec colname="col_2" colwidth="85*"/>
<colspec colname="col_3" colwidth="85*"/>
<thead>
[...]

This is not the only occurrence: we have the same issue of 43 vs. 42 several times - v3 with "42" is "wrong". What happens here? The crucial point is found in "docbook45.conf":

First clue in "docbook45.conf"
[...]
# Tables.
[tabletags-default]
colspec=<colspec colname="col_{colnumber}" colwidth="{width!{colpcwidth}*}{width?{colabswidth}{pageunits}}"/>
bodyrow=<row>|</row>
headdata=<entry align="{halign}" valign="{valign}"{colspan@1:: namest="col_{colstart}" nameend="col_{colend}"}{morerows@0:: morerows="{morerows}"}>|</entry>
bodydata=<entry align="{halign}" valign="{valign}"{colspan@1:: namest="col_{colstart}" nameend="col_{colend}"}{morerows@0:: morerows="{morerows}"}>|</entry>
paragraph=<simpara>|</simpara>
[...]

This leads us to the decisive lines in "asciidoc3.py":

Crucial lines in "asciidoc3.py"
[...]
class Table(AbstractBlock):
    ALIGN = {'<':'left', '>':'right', '^':'center'}
    VALIGN = {'<':'top', '>':'bottom', '^':'middle'}
    FORMATS = ('psv', 'csv', 'dsv')
    [...]
    def parse_cols(self, cols, halign, valign):
        """
        Build list of column objects from table 'cols', 'halign' and 'valign'
        attributes.
        """
        [...]
        # Calculate column alignment and absolute and percent width values.
        percents = 0
        for col in self.columns:
            if pcunits:
                col.pcwidth = float(col.width[:-1])
            else:
                col.pcwidth = (float(col.width)/props)*100
            col.abswidth = self.abswidth * (col.pcwidth/100)
            if config.pageunits in ('cm', 'mm', 'in', 'em'):
                col.abswidth = '%.2f' % round(col.abswidth, 2)                1
            else:
                col.abswidth = '%d' % round(col.abswidth)                     1
            percents += col.pcwidth
            col.pcwidth = int(col.pcwidth)
        if round(percents) > 100:
            self.error('total width exceeds 100%%: %s' % cols, self.start)    1
        elif round(percents) < 100:
            self.error('total width less than 100%%: %s' % cols, self.start)  1

    def build_colspecs(self):
        [...]
1 These are the lines - did you spot the cause?

And what exactly causes the difference between 42 and 43? Blame it on the builtin function "round()"! Its behavior changed slightly but significantly in v3; the new strategy is often called "Banker’s rounding":

[…]
The round() function rounding strategy and return type have changed. Exact halfway cases are now rounded to the nearest even result instead of away from zero. (For example, round(2.5) now returns 2 rather than 3.) and the documentation for round: For the built-in types supporting round(), values are rounded to the closest multiple of 10 to the power minus n; if two multiples are equally close, rounding is done toward the even choice …

— https://stackoverflow.com/questions/10825926/python-3-x-rounding-behavior

Yes, and that is just what’s going on here: float 42.5 is rounded to 43 by asciidoc2.py, and to 42 by asciidoc3.py.
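
The effect is easy to reproduce in a Python3 shell. A tiny sketch with a few halfway values chosen for this illustration (they can be represented exactly as floats):

```python
# Python3: exact halfway cases are rounded to the nearest even integer.
halfway_cases = [41.5, 42.5, 43.5]
rounded = [round(x) for x in halfway_cases]
print(rounded)                 # [42, 42, 44]

# The same expression as in parse_cols(), applied to 42.5.
formatted = '%d' % round(42.5)
print(formatted)               # 42
```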
There are various suggestions to enforce the v2-behaviour. My idea was to add 0.5 and then "floor" the sum:

from math import floor as math_floor
...
col.abswidth = '%d' % math_floor(col.abswidth + 0.5)

Another user (see the just given stackoverflow link) offers the proposal

round2=lambda x,y=None: round(x+1e-15,y)

Probably both solutions are sufficient to fix our bug. But I prefer to go the "official" way (also given via this link): we make use of the class "Decimal" from the module "decimal", which lets us choose the preferred rounding mode. That does the trick.

Avoid "Banker’s rounding" in "asciidoc3.py"
[...]
from decimal import Decimal, ROUND_HALF_UP
[...]
class Table(AbstractBlock):
    ALIGN = {'<':'left', '>':'right', '^':'center'}
    VALIGN = {'<':'top', '>':'bottom', '^':'middle'}
    FORMATS = ('psv', 'csv', 'dsv')
    [...]
    def parse_cols(self, cols, halign, valign):
        """
        Build list of column objects from table 'cols', 'halign' and 'valign'
        attributes.
        """
        [...]
        # Calculate column alignment and absolute and percent width values.
        percents = 0
        for col in self.columns:
            if pcunits:
                col.pcwidth = float(col.width[:-1])
            else:
                col.pcwidth = (float(col.width)/props)*100
            col.abswidth = self.abswidth * (col.pcwidth/100)
            if config.pageunits in ('cm', 'mm', 'in', 'em'):
                col.abswidth = '%.2f' % round(col.abswidth, 2)          1
            else:
                col.abswidth = '%d' % Decimal(str(col.abswidth)).quantize(Decimal('1'), \
                                                                          rounding=ROUND_HALF_UP)
            percents += col.pcwidth
            col.pcwidth = int(col.pcwidth)
        if Decimal(str(percents)).quantize(Decimal('1'), \
                                           rounding=ROUND_HALF_UP) > 100:
            self.error('total width exceeds 100%%: %s' % cols, self.start)
        elif Decimal(str(percents)).quantize(Decimal('1'), \
                                             rounding=ROUND_HALF_UP) < 100:
            self.error('total width less than 100%%: %s' % cols, self.start)

    def build_colspecs(self):
        [...]
1 No need of "quantize()" here because we are rounding to 2 digits precision.
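
To compare the approaches side by side, here is a small sketch. The helper names "round_half_up" and "floor_half_up" are invented for this example and do not occur in asciidoc3.py:

```python
from decimal import Decimal, ROUND_HALF_UP
from math import floor

def round_half_up(value):
    """Round to the nearest integer; ties go away from zero (like v2 round())."""
    # str() yields the short repr of the float, as in the listing above.
    return int(Decimal(str(value)).quantize(Decimal('1'), rounding=ROUND_HALF_UP))

def floor_half_up(value):
    """The 'add 0.5 and floor' idea; ties always go towards +infinity."""
    return floor(value + 0.5)

for x in (41.5, 42.5, -2.5):
    # Columns: value, Python3 round(), floor variant, Decimal variant
    print(x, round(x), floor_half_up(x), round_half_up(x))
```

For positive values such as column widths both workarounds agree. They differ only for negative halfway cases, where the Decimal variant rounds away from zero.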

Doing so, v2v3testcases.py passes for "newtables-docbook.xml"; our example now shows the expected 43 vs. 43 - ok.

9.2.2. base64-bug testcases

The second "special case" occurs when comparing the two files v2slidy-example-slidy.html and v3slidy-example-slidy.html. Interestingly "os.path.getsize()" reports equal sizes, but "hashlib.md5(…read()).hexdigest()" does not match: the two files aren’t binary identical (as they should be!). A look with kdiff3 shows why. Near the end of the file, in the block '<img alt="slidy-example__1.png" src="data:image/png;base64, iVBORw0KGgoAAA …5CYII=" />', a few bytes of the base64 code differ: "EMFRM4AyMN2EM" vs. "IDGww1BYyhdTo" (v2 vs. v3). The resulting image is the same; viewing it in a browser you’ll see no difference. I could reproduce this with some other input to the "music filter". The resulting pngs look the same, but aren’t identical. So the cause has to be inside the "music2png.py" of the music filter? I confess, I didn’t find out what’s going on here in detail and least of all how to fix it. Because this issue doesn’t affect the visual result I postpone any further examination. Let’s do it when asciidoc3.py runs - if it’s necessary at all …
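
Instead of a visual diff, the first differing position of two files of equal size can also be located with a few lines of Python. A minimal sketch; the two byte strings are shortened stand-ins built around the fragments quoted above, not the real file contents, and "first_difference" is a hypothetical helper:

```python
import hashlib

def first_difference(a, b):
    """Return the offset of the first differing byte of two byte strings,
    or None if they agree over the length of the shorter one."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    return None

# Stand-in data: same length, a few different bytes in the middle.
v2_data = b'base64,iVBORw0KGgoAAA EMFRM4AyMN2EM 5CYII='
v3_data = b'base64,iVBORw0KGgoAAA IDGww1BYyhdTo 5CYII='

same_size = len(v2_data) == len(v3_data)
same_hash = hashlib.md5(v2_data).hexdigest() == hashlib.md5(v3_data).hexdigest()
print(same_size, same_hash, first_difference(v2_data, v3_data))
```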

9.2.3. Summary on testcases

We have reached the second milestone of testing: all 170 (one hundred and seventy) AsciiDoc3 "internal" tests are passed successfully! We can use this test as a quick check after every step of editing asciidoc3.py.

9.2.4. Multiprocessing version testcases.py

To make testing even faster as part of the forthcoming "unittest" I wrote a second version of testcases.py. This

10. (Not) The End

Note This documentation text ends for the present at this point …
I’ll complete the missing parts step by step - just visit the homepage https://asciidoc3.org every once in a while to see the progress.