Function that takes a character vector, x, representing a 'LaTeX' file, and outputs a tree whose structure is specified by layersNames and layersCmd.
It assumes that x represents a 'LaTeX' file that has been checked to compile properly before any modification is made.
Note, however, that this function only moves lines around; it does not split a line in two.
Arguments
- x
A character vector; each element represents one line of the 'LaTeX' document.
- layersNames
A character vector, with each element representing the environment name to be searched for as cmdName, as described in FindBegin and FindEnd.
- layersCmd
A character vector with the same length as layersNames, with each element representing the command to be searched for as cmdName, as described in FindCommand.
Value
It returns a named list that recreates, for the text file x, the tree structure identified by layersNames and layersCmd.
It first divides the document into two lists:
- preamble
Contains a character vector identifying everything before the \begin{document}
- document
Contains the tree structure identifying the document
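The first split can be pictured with a few lines of base R. This is only a sketch of the idea, not the package's implementation: the sample lines are made up, and it assumes that the \begin{document} line itself belongs to the document part. Here document is left as a plain character vector, whereas the function described on this page turns it into a tree.

```r
# A tiny, made-up LaTeX document, one element per line.
x <- c(
  "\\documentclass{exam}",
  "\\usepackage{amsmath}",
  "\\begin{document}",
  "\\begin{questions}",
  "\\question What is 1 + 1?",
  "\\end{questions}",
  "\\end{document}"
)

# Index of the first line containing \begin{document}
# (fixed = TRUE matches the text literally, not as a regular expression).
begin_doc <- grep("\\begin{document}", x, fixed = TRUE)[1]

# Assumption: the \begin{document} line goes into the document part.
parts <- list(
  preamble = x[seq_len(begin_doc - 1)],
  document = x[begin_doc:length(x)]
)
str(parts)
```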
The naming convention for each layer of the document is as follows, using the placeholders <layerName> and <layerCmd>.
Note the convention first: everything found before the first environment is placed in a character vector named prior_to_<layerName>.
After the first environment <layerName> ends, it assumes that everything from that \end{<layerName>} onwards corresponds to the next environment, and places it in the prior part of that one. Whatever follows the last environment goes into post_to_<layerName>.
- prior_to_<layerName>
Includes everything up to the first \begin{<layerName>}, without including that line.
- 1_<layerName>_begin_<layerName>
Includes the \begin{<layerName>} for the 1st section, and everything until it finds the first \<layerCmd>.
- 1_<layerName>_1_<layerCmd>
Includes everything from the 1st \<layerCmd> until the second \<layerCmd>, without including the line in which the second command is found.
- 1_<layerName>_2_<layerCmd>
Same thing, and it keeps going until the last \<layerCmd> is found.
- 1_<layerName>_end_<layerName>
Includes the \end{<layerName>} for the 1st section.
- ...
It then repeats the same structure for the next environment, changing the naming convention to start with 2_<...>, and so on until it has processed the last environment.
- post_to_<layerName>
After the last layer ends with \end{<layerName>}, it places the rest of the lines into this last character vector.
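To make the naming convention concrete, the following base R sketch generates the names listed above for a single layer. The helper layer_names() and its argument n_cmds are hypothetical and not part of the package; the result follows the convention as this page describes it and has not been checked against the package's actual output. It also assumes every environment contains at least one \<layerCmd>.

```r
# Hypothetical helper: builds the element names for one layer.
# n_cmds[i] is the number of \<layerCmd> found in the i-th environment.
layer_names <- function(layerName, layerCmd, n_cmds) {
  # Assumption: at least one environment, each with at least one command.
  stopifnot(length(n_cmds) >= 1, all(n_cmds >= 1))
  out <- paste0("prior_to_", layerName)
  for (i in seq_along(n_cmds)) {
    out <- c(
      out,
      paste0(i, "_", layerName, "_begin_", layerName),
      paste0(i, "_", layerName, "_", seq_len(n_cmds[i]), "_", layerCmd),
      paste0(i, "_", layerName, "_end_", layerName)
    )
  }
  c(out, paste0("post_to_", layerName))
}

# Two "questions" environments: the first with two \question, the second with one.
layer_names("questions", "question", n_cmds = c(2, 1))
```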
This structure is applied recursively to each i_<layerName>_j_<layerCmd> of the previous layer to find the structure for the next layer.
The result is a tree of lists, with names that identify the whole structure, and the ending node of each branch is always a character vector.
IMPORTANT NOTE: This function only rearranges the lines of the document; it cannot split a line. So if you want to make sure that some pieces always stay together, put them on the same line. This is intentional, to force a clearer structure on the document that will be parsed.
In summary, a sketch of the tree structure would be:
preamble
Document
    prior_to_layerName[1]
    1_layerName[1]_begin_layerName[1]
    1_layerName[1]_1_layerCmd[1]
        prior_to_layerName[2]
        1_layerName[2]_begin_layerName[2]
        1_layerName[2]_1_layerCmd[2]
            Continues...
        1_layerName[2]_2_layerCmd[2]
            Continues...
        ...
        post_to_layerName[2]
    2_layerName[1]_begin_layerName[1]
    2_layerName[1]_1_layerCmd[1]
        ...
    ...
    n_layerName[1]_end_layerName[1]
    post_to_layerName[1]
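The shape of such a tree can be mirrored with a hand-built nested list in base R. This is an illustration only: the lines, the simplified layer command "choice", the element name document, and the placement of each line are assumptions made for the example, not output produced by the function. Because every leaf is a character vector and the lines are only rearranged, flattening the tree gives back the original lines in order.

```r
# Made-up document lines, one element per line.
x <- c(
  "\\documentclass{exam}",
  "\\begin{document}",
  "\\begin{questions}",
  "\\question What is 1 + 1?",
  "\\begin{choices}",
  "\\choice 1",
  "\\choice 2",
  "\\end{choices}",
  "\\end{questions}",
  "\\end{document}"
)

# Hand-built tree following the naming convention (assumed placement of lines).
doc <- list(
  preamble = x[1],
  document = list(
    prior_to_questions = x[2],
    `1_questions_begin_questions` = x[3],
    `1_questions_1_question` = list(
      prior_to_choices = x[4],
      `1_choices_begin_choices` = x[5],
      `1_choices_1_choice` = x[6],
      `1_choices_2_choice` = x[7],
      `1_choices_end_choices` = x[8],
      post_to_choices = character(0)
    ),
    `1_questions_end_questions` = x[9],
    post_to_questions = x[10]
  )
)

# Names that start with a digit need [[ ]] or backticks to be accessed.
second_choice <- doc$document[["1_questions_1_question"]][["1_choices_2_choice"]]

# Flattening the tree recovers the original lines in their original order.
flat <- unlist(doc, use.names = FALSE)
identical(flat, x)
```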
If a \<layerCmd> is not found inside an environment, everything inside that environment is placed in the begin_<layerName> part and, instead of the numbered sections, an empty character list named empty_<layerCmd> is added in the middle.
Details
Both layersNames and layersCmd must have the same length, since for each index i, layersNames[i] and layersCmd[i] refer to one layer of the tree structure of the document. Subsequent layers must be found inside previous layers.
If it finds that the structure of the document is incomplete, it throws an error.
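The two requirements above can be approximated with a quick pre-check in base R. The function check_layers() below is hypothetical and much cruder than whatever the package does internally: it only compares the lengths of the two arguments and counts \begin{...} and \end{...} lines for each environment name.

```r
# Hypothetical pre-check; not the package's own validation.
check_layers <- function(x, layersNames, layersCmd) {
  if (length(layersNames) != length(layersCmd)) {
    stop("layersNames and layersCmd must have the same length")
  }
  for (name in layersNames) {
    # Count lines that open and close this environment (literal matching).
    n_begin <- sum(grepl(paste0("\\begin{", name, "}"), x, fixed = TRUE))
    n_end <- sum(grepl(paste0("\\end{", name, "}"), x, fixed = TRUE))
    if (n_begin != n_end) {
      stop("Unbalanced environment: ", name)
    }
  }
  invisible(TRUE)
}

x_ok <- c("\\begin{questions}", "\\question A", "\\end{questions}")
check_layers(x_ok, "questions", "question")

# Dropping the \end{questions} line makes the check fail.
x_bad <- x_ok[-3]
result <- tryCatch(
  check_layers(x_bad, "questions", "question"),
  error = function(e) conditionMessage(e)
)
result
```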
See also
FindStructure for more information on the details of how the layers are found.
Other Structuring Document: CompileDocument(), DivideFile(), FindStructure, IsWellSectioned()
Examples
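library(TexExamRandomizer)

# Locate the example exam shipped with the package.
file <- system.file(
  "extdata",
  "ExampleTexDocuments",
  "exam_testing_jsonparser.tex",
  package = "TexExamRandomizer"
)
# Read the file as a character vector, one element per line.
x <- readLines(file)
# Layer 1: "questions" environments, split by \question.
# Layer 2: "choices" environments, split by \choice or \CorrectChoice.
layersNames <- c("questions", "choices")
layersCmd <- c("question", "(choice|CorrectChoice)")
doc <- StructureDocument(x, layersNames, layersCmd)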