Takes a character vector, x, representing a 'LaTeX' file and outputs a tree structure, with the structure specified by layersNames and layersCmd.

It assumes that x represents a 'LaTeX' file that has been checked to compile properly before any modification is made.

Note, however, that this function only moves lines around; it does not split a line in two.

Usage

StructureDocument(x, layersNames, layersCmd)

Arguments

x

A character vector in which each element represents one line of the 'LaTeX' document.

layersNames

A character vector in which each element represents the environment name to be searched for, used as cmdName as described in FindBegin and FindEnd.

layersCmd

A character vector with the same length as layersNames, in which each element represents the command to be searched for inside the environment, used as cmdName as described in FindCommand.

Value

It returns a named list that recreates the tree structure identified by layersNames and layersCmd in the text file x.

It first divides the document into two lists:

preamble

Contains a character vector with everything before the \begin{document}

document

Contains the tree structure identifying the document
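As a rough illustration of this first split only, the following base R sketch separates a character vector at the first line that contains \begin{document}. The helper split_preamble is hypothetical and is not part of the package; in particular, it leaves document as a plain character vector rather than a tree, and it assumes that the \begin{document} line itself goes into document.

```r
# Hypothetical helper, not the package's implementation: split the lines
# of a 'LaTeX' file at the first line containing \begin{document}.
split_preamble <- function(x) {
  # fixed = TRUE treats the pattern as a literal string, not a regex
  start <- which(grepl("\\begin{document}", x, fixed = TRUE))[1]
  if (is.na(start)) stop("No \\begin{document} line found")
  list(
    # Everything strictly before the \begin{document} line
    preamble = x[seq_len(start - 1)],
    # Assumption: the \begin{document} line itself belongs to `document`
    document = x[start:length(x)]
  )
}

x <- c(
  "\\documentclass{exam}",
  "\\usepackage{amsmath}",
  "\\begin{document}",
  "Some text",
  "\\end{document}"
)
parts <- split_preamble(x)
names(parts)
```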

The naming convention for each layer of the document is as follows, writing <layerName> and <layerCmd> for the environment name and the command of that layer.

Everything found before the first environment is placed in a character vector named prior_to_<layerName>. After the first environment <layerName> ends, everything from that \end{<layerName>} onwards is assumed to correspond to the next environment, and it is placed in the prior part of that one.

post_to_<layerName>

prior_to_<layerName>

Includes everything up to the first \begin{<layerName>}, without including that line

1_<layerName>_begin_<layerName>

Includes the \begin{<layerName>} for the 1st section, and everything until it finds the first \<layerCmd>

1_<layerName>_1_<layerCmd>

Includes everything from the first \<layerCmd> until the second \<layerCmd>, without including the line in which the second command is found

1_<layerName>_2_<layerCmd>

Includes everything from the second \<layerCmd> until the third one, and so on until the last \<layerCmd> is found

1_<layerName>_end_<layerName>

It includes the \end{<layerName>} for the 1st section.

...

It then repeats the same structure for the next environment, changing the naming convention to start with 2_<...>, and so on until the last environment

post_to_<layerName>

After the last layer ends with \end{<layerName>}, the rest of the lines are placed in this last character vector
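To make the naming convention concrete, here is a small base R sketch that builds the element names for a single layer. The helper layer_element_names and its argument cmds_per_env are hypothetical and are not part of the package; the sketch assumes every environment contains at least one \<layerCmd>, and it only generates names without parsing any file.

```r
# Hypothetical helper: build the element names of one layer, given how
# many \<layerCmd> commands each environment of that layer contains.
layer_element_names <- function(layerName, layerCmd, cmds_per_env) {
  # Assumption: each environment has at least one command
  stopifnot(all(cmds_per_env >= 1))
  per_env <- lapply(seq_along(cmds_per_env), function(i) {
    prefix <- paste0(i, "_", layerName, "_")
    c(
      paste0(prefix, "begin_", layerName),
      paste0(prefix, seq_len(cmds_per_env[i]), "_", layerCmd),
      paste0(prefix, "end_", layerName)
    )
  })
  c(
    paste0("prior_to_", layerName),
    unlist(per_env),
    paste0("post_to_", layerName)
  )
}

# Two "questions" environments, with 2 and 1 \question commands
nms <- layer_element_names("questions", "question", cmds_per_env = c(2, 1))
print(nms)
```

With these inputs the first environment contributes names starting with 1_questions_ and the second with 2_questions_, bracketed by prior_to_questions and post_to_questions.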

This structure is applied recursively to each i_<layerName>_j_<layerCmd> of the previous layer to find the structure for the next layer. The result is a tree of lists, with names that identify the whole structure, and the terminal node of each branch is always a character vector.

IMPORTANT NOTE: This function only rearranges the lines of the document; it cannot split a line in two. If you want to make sure that two things always stay together, put them both on the same line. This is intentional, to force a clearer structure on the document that will be parsed.
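The consequence of working line by line can be seen in a small base R sketch. It is not the package's code: it simply assumes that a new piece starts at every line containing \question, and groups the lines accordingly. Because whole lines are the unit, two commands written on one line end up in the same piece.

```r
lines <- c(
  "\\question First \\question Second",  # two commands on one line
  "  \\choice A",
  "\\question Third",
  "  \\choice B"
)

# Assumption: a piece starts at each line that contains "\question"
starts <- grepl("\\\\question", lines)

# cumsum() labels every line with the index of the latest starting line
group <- cumsum(starts)

# split() groups whole lines; a line is never cut in two
pieces <- split(lines, group)
length(pieces)
```

Here "First" and "Second" stay in the same piece, while "Third" starts a new one, so only two pieces are found even though the text contains three \question commands.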

In summary, the sketch of the tree structure would be:

  • preamble

  • document

    • prior_to_layerName[1]

    • 1_layerName[1]_begin_layerName[1]

    • 1_layerName[1]_1_layerCmd[1]

      • prior_to_layerName[2]

      • 1_layerName[2]_begin_layerName[2]

      • 1_layerName[2]_1_layerCmd[2]

        • Continues...

      • 1_layerName[2]_2_layerCmd[2]

        • Continues...

      • ...

      • post_to_layerName[2]

    • 2_layerName[1]_begin_layerName[1]

    • 2_layerName[1]_1_layerCmd[1]

      • ...

    • ...

    • n_layerName[1]_end_layerName[1]

    • post_to_layerName[1]
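The shape sketched above can be explored with a hand-written toy tree. Everything in the following base R example is illustrative: the tree is typed by hand rather than produced by StructureDocument, the second-layer command is written as choice for readability, and leaf_paths is a hypothetical helper. It shows that every branch ends in a character vector, and that flattening the tree gives back the lines in their original order.

```r
# Hand-written toy tree; the names follow the convention described
# above, but the content is made up for illustration.
toy <- list(
  preamble = c("\\documentclass{exam}"),
  document = list(
    prior_to_questions = c("\\begin{document}"),
    `1_questions_begin_questions` = c("\\begin{questions}"),
    `1_questions_1_question` = list(
      prior_to_choices = c("\\question What is 1 + 1?"),
      `1_choices_begin_choices` = c("\\begin{choices}"),
      `1_choices_1_choice` = c("\\choice 1"),
      `1_choices_2_choice` = c("\\CorrectChoice 2"),
      `1_choices_end_choices` = c("\\end{choices}"),
      post_to_choices = character(0)
    ),
    `1_questions_end_questions` = c("\\end{questions}"),
    post_to_questions = c("\\end{document}")
  )
)

# Hypothetical helper: collect the path of names leading to each leaf.
leaf_paths <- function(node, path = character(0)) {
  if (is.character(node)) {
    # A character vector is a terminal node of the tree
    return(paste(path, collapse = "$"))
  }
  unlist(
    Map(function(child, nm) leaf_paths(child, c(path, nm)), node, names(node)),
    use.names = FALSE
  )
}

paths <- leaf_paths(toy)

# Flattening the tree recovers the lines, in document order
all_lines <- unlist(toy, use.names = FALSE)
```

Each entry of paths can be read as a chain of `$` accesses into the list, for example document$1_questions_1_question$prior_to_choices (with backticks around names that start with a digit when typed in R).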

If no \<layerCmd> is found inside an environment, everything inside that environment is placed in the begin_<layerName> part and, instead of the numbered elements, an empty element named empty_<layerCmd> is added in the middle.

Details

Both layersNames and layersCmd must have the same length, since for each index i, layersNames[i] and layersCmd[i] refer to one layer of the tree structure of the document. Subsequent layers must be found inside previous layers.

If the structure of the document is found to be incomplete, the function throws an error.
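The two requirements above can be checked up front with a few lines of base R. The function check_layers below is a hypothetical pre-check and not the validation the package performs: it only compares the lengths of the two arguments and counts the lines containing \begin{<layerName>} and \end{<layerName>} for each layer, which is a much weaker test than verifying the full nesting.

```r
# Hypothetical pre-check, not the package's own validation.
check_layers <- function(x, layersNames, layersCmd) {
  if (length(layersNames) != length(layersCmd)) {
    stop("layersNames and layersCmd must have the same length")
  }
  for (name in layersNames) {
    # Count lines (not occurrences) containing each delimiter;
    # fixed = TRUE matches the text literally
    n_begin <- sum(grepl(paste0("\\begin{", name, "}"), x, fixed = TRUE))
    n_end <- sum(grepl(paste0("\\end{", name, "}"), x, fixed = TRUE))
    if (n_begin != n_end) {
      stop("Unbalanced environment: ", name)
    }
  }
  invisible(TRUE)
}

x <- c("\\begin{questions}", "\\question A", "\\end{questions}")

# Balanced input passes
ok <- check_layers(x, "questions", "question")

# Dropping the \end{questions} line triggers an error
err <- tryCatch(
  check_layers(x[-length(x)], "questions", "question"),
  error = function(e) conditionMessage(e)
)
```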

See also

FindStructure for more information on the details of how the layers are found.

Other Structuring Document: CompileDocument(), DivideFile(), FindStructure, IsWellSectioned()

Examples

file <- system.file(
    "extdata",
    "ExampleTexDocuments",
    "exam_testing_jsonparser.tex",
    package = "TexExamRandomizer"
)
x <- readLines(file)
layersNames <- c("questions", "choices")
layersCmd <- c("question", "(choice|CorrectChoice)")
doc <- StructureDocument(x, layersNames, layersCmd)