Built-ins for strings
These built-ins act on a string left-value. However, if the left-value is a number, a date/time/date-time, or a boolean, it will be automatically converted to a string according to the current number-, date/time/date-time- and boolean-format settings (which are the same formatters that are applied when inserting such values with ${...}).
boolean
The string converted to a boolean value. The string must be true or false (case sensitive!), or must be in the format specified by the boolean_format setting.
If the string is not in the appropriate format, an error will abort template processing when you try to access this built-in.
cap_first
The string with the very first word of the string capitalized. For the precise meaning of “word” see the word_list built-in. Example:
The output:
In the case of "- green mouse", the first word is the -.
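The "first word" rule can be sketched in Python as follows. This is only an emulation of the behavior described above, not FreeMarker code, and the function name is hypothetical. It assumes that a word starts at the first non-white-space character.

```python
def cap_first(s: str) -> str:
    """Upper-case the first character of the first word, leaving the rest untouched."""
    for i, ch in enumerate(s):
        if not ch.isspace():
            # The first non-white-space character starts the first word.
            return s[:i] + ch.upper() + s[i + 1:]
    return s  # empty or all-white-space string: nothing to capitalize


print(repr(cap_first("  green mouse")))
print(repr(cap_first("- green mouse")))
```

The second call returns its input unchanged, because the first word is the - and it has no upper case form.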
capitalize
The string with all words capitalized. For the precise meaning of “word” see the word_list built-in. Example:
The output:
chop_linebreak
Returns the string without the line-break at its very end if there was a line-break, otherwise the unchanged string. If the string ends with multiple line-breaks, only the last line-break is removed.
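A minimal Python sketch of this rule is shown below. Which character sequences count as a line-break is not stated here, so treating \r\n, \n and \r as line-breaks is an assumption of the sketch, and the function name is hypothetical.

```python
def chop_linebreak(s: str) -> str:
    """Remove at most one line-break from the very end of the string."""
    # Assumption: a line-break is "\r\n", "\n" or "\r"; "\r\n" is checked first
    # so that it is removed as a single unit.
    for ending in ("\r\n", "\n", "\r"):
        if s.endswith(ending):
            return s[: -len(ending)]
    return s


print(repr(chop_linebreak("line\n\n")))
```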
contains
Returns whether the substring specified as the parameter to this built-in occurs in the string. For example:
This will output:
date, time, datetime
The string value converted to a date, time, or date-time value. It will expect the format specified by the date_format, time_format and datetime_format settings. If the string is not in the appropriate format, an error will abort template processing when you try to access this built-in.
You can also specify the format explicitly like ?datetime.format
(and hence also as ?datetime["format"]
) or ?datetime("format")
; these three forms do the same. The format can be specified similarly with ?date and ?time too. For the syntax and meaning of format values see the possible values of the date_format, time_format and datetime_format settings. Example:
To prevent misunderstandings, the left-hand value need not be a string literal. For example, when you read data from XML DOM (from where all values come as unparsed strings), you may do things like order.confirmDate?date.xs
to convert the string value to a real date.
Of course, the format also can be a variable, like in "..."?datetime(myFormat)
.
ends_with
Returns whether this string ends with the substring specified in the parameter. For example "ahead"?ends_with("head")
returns boolean true
. Also, "head"?ends_with("head")
will return true
.
ensure_ends_with
If the string doesn’t end with the substring specified as the 1st parameter, it adds it after the string, otherwise it returns the original string. For example, both "foo"?ensure_ends_with("/") and "foo/"?ensure_ends_with("/") return "foo/".
ensure_starts_with
If the string doesn’t start with the substring specified as the 1st parameter, it adds it before the string, otherwise it returns the original string. For example, both "foo"?ensure_starts_with("/") and "/foo"?ensure_starts_with("/") return "/foo".
If you specify two parameters, then the 1st parameter is interpreted as a Java regular expression, and if it doesn’t match the beginning of the string, then the string specified as the 2nd parameter is added before the string. For example someURL?ensure_starts_with("[a-zA-Z]+://", "http://")
will check if the string starts with something that matches "[a-zA-Z]+://"
(note that no ^
is needed), and if it doesn’t, it prepends "http://"
.
This method also accepts a 3rd flags parameter. As calling with 2 parameters implies "r"
there (i.e., regular expression mode), you rarely need this. One notable case is when you don’t want the 1st parameter to be interpreted as a regular expression, only as plain text, but you want the comparison to be case-insensitive, in which case you would use "i"
as the 3rd parameter.
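The one-, two- and three-parameter forms can be emulated roughly like this in Python. It is a sketch under stated assumptions: the function and parameter names are hypothetical, only the i and r flags are handled, and Python's re syntax differs from Java regular expressions in some details.

```python
import re


def ensure_starts_with(s, start, to_prepend=None, flags=None):
    """Emulate the 1-, 2- and 3-parameter forms described above."""
    if to_prepend is None:
        # One parameter: plain-text prefix check.
        return s if s.startswith(start) else start + s
    if flags is None:
        flags = "r"  # calling with 2 parameters implies regular expression mode
    pattern = start if "r" in flags else re.escape(start)
    re_flags = re.IGNORECASE if "i" in flags else 0
    # re.match only matches at the beginning of the string, so no ^ is needed.
    return s if re.match(pattern, s, re_flags) else to_prepend + s


print(ensure_starts_with("example.com", "[a-zA-Z]+://", "http://"))
print(ensure_starts_with("HTTP://example.com", "http://", "http://", "i"))
```

The last call shows the "notable case": plain-text comparison that ignores case, so nothing is prepended.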
esc
Escapes the value with the current output format, and prevents the auto-escaping of the returned value (to avoid double escaping). Because of auto-escaping, you usually only need this where auto-escaping was disabled:
In templates, where auto-escaping is on, using it is redundant:
This built-in works by converting the string value to a markup output value, by escaping the string with the current output format, and using the result as the markup. The resulting markup output value belongs to the current output format at the point of the invocation.
This built-in can also be applied on markup output values, which it will bypass without change, as far as the input markup output value belongs to the current output format. If it doesn’t, then the markup has to be converted to the current output format, which currently will be only successful if that value was created by escaping plain text (usually, with ?esc
).
This built-in can’t be used where the current output format is a non-markup output format. An attempt to do so will cause a parse-time error.
This built-in is not related to the deprecated escape and noescape directives. In fact, the parser will prevent using them on the same place, to prevent confusion.
groups
This is used only with the result of the matches
built-in. See there…
html (deprecated)
Note: This built-in is deprecated by the auto-escaping mechanism. To prevent double escaping and confusion in general, using this built-in on places where auto-escaping is active is a parse-time error. To help migration, this built-in silently bypasses HTML markup output values without changing them.
The string as HTML markup. That is, the string with all:
- < replaced with &lt;
- > replaced with &gt;
- & replaced with &amp;
- " replaced with &quot;
- ' replaced with &#39;
When inserting the value of an attribute, always quote it, or else it can be exploited by attackers! This is WRONG: <input name="user" value=${user?xhtml}>. This is good: <input name="user" value="${user?xhtml}">.
Note that in HTML pages usually you want to use this built-in for all interpolations. You can spare a lot of typing and lessen the chances of accidental mistakes by using the escape directive.
index_of
Returns the index within this string of the first occurrence of the specified substring. For example, "abcabc"?index_of("bc")
will return 1 (don’t forget that the index of the first character is 0). Also, you can specify the index to start the search from: "abcabc"?index_of("bc", 2)
will return 4. There is no restriction on the numerical value of the second parameter: if it is negative, it has the same effect as if it were zero, and if it is greater than the length of this string, it has the same effect as if it were equal to the length of this string. Decimal values will be truncated to integers.
If the 1st parameter does not occur as a substring in this string (starting from the given index, if you use the second parameter), then it returns -1.
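The handling of the second parameter can be sketched in Python as below. The function name is hypothetical and this only emulates the rules stated above. The explicit clamping matters because Python's own str.find treats a negative start as an offset from the end.

```python
def index_of(s: str, sub: str, start=0) -> int:
    """Index of the first occurrence of sub at or after start, or -1."""
    start = int(start)                  # decimal values are truncated to integers
    start = max(0, min(start, len(s)))  # negative acts as 0, too large acts as len(s)
    return s.find(sub, start)


print(index_of("abcabc", "bc"))
print(index_of("abcabc", "bc", 2))
```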
j_string
Escapes the string with the escaping rules of Java language string literals, so it’s safe to insert the value into a string literal. Note that it will not add quotation marks around the inserted value; it is meant to be used inside the string literal.
All characters under UCS code point 0x20 will be escaped. When they have no dedicated escape sequence in the Java language (like \n
, \t
, etc.), they will be replaced with a UNICODE escape (\uXXXX
).
Example:
will output:
js_string
Escapes the string with the escaping rules of JavaScript language string literals, so it’s safe to insert the value into a string literal. Note that it will not add quotation marks around the inserted value; it is meant to be used inside the string literal.
When inserting into a JavaScript string literal that’s inside an HTML attribute, you also must escape the value with HTML escaping. Thus, if you don’t have automatic HTML escaping, this is WRONG: <p onclick="alert('${message?js_string}')">, and this is good: <p onclick="alert('${message?js_string?html}')">.
Example:
will output:
The exact escaping rules are:
- " is escaped as \"
- ' is escaped as \'
- \ is escaped as \\
- / is escaped as \/ if the / is directly after < in the escaped string, or if it’s at the beginning of the escaped string
- > is escaped as \> if the > is directly after ]] or -- in the escaped string, or if it’s at the beginning of the escaped string, or if there’s only a ] or - before it at the beginning of the escaped string
- < is escaped as \u003C if it’s followed by ? or ! in the escaped string, or if it’s at the end of the escaped string
- Control characters in UCS code point ranges U+0000…U+001f and U+007f…U+009f are escaped as \r, \n, etc., or as \xXX where there’s no special escape for them in JavaScript.
- Control characters with UCS code point U+2028 (Line separator) and U+2029 (Paragraph separator) are escaped as \uXXXX, as they are source code line-breaks in ECMAScript.
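The rules in the list above can be emulated with a short Python function. This is a sketch, not FreeMarker's implementation: the function name is hypothetical, and both the set of control characters that get a dedicated escape (\n, \r, \t, \b, \f) and the upper case hex digits are assumptions, since the text only says "\r, \n, etc.".

```python
def js_string(s: str) -> str:
    """Escape s for use inside a JavaScript string literal, per the listed rules."""
    # Assumption: these are the control characters with a dedicated escape.
    named = {"\n": "\\n", "\r": "\\r", "\t": "\\t", "\b": "\\b", "\f": "\\f"}
    out = []
    for i, ch in enumerate(s):
        prev = s[:i]            # everything before the current character
        nxt = s[i + 1:i + 2]    # the next character, or "" at the end
        code = ord(ch)
        if ch in "\"'\\":
            out.append("\\" + ch)
        elif ch == "/" and (i == 0 or s[i - 1] == "<"):
            out.append("\\/")
        elif ch == ">" and (prev in ("", "]", "-") or prev.endswith(("]]", "--"))):
            out.append("\\>")
        elif ch == "<" and nxt in ("", "?", "!"):
            out.append("\\u003C")
        elif ch in named:
            out.append(named[ch])
        elif code < 0x20 or 0x7F <= code <= 0x9F:
            out.append("\\x%02X" % code)
        elif code in (0x2028, 0x2029):
            out.append("\\u%04X" % code)
        else:
            out.append(ch)
    return "".join(out)


print(js_string('</script> and "quotes"'))
```

The conditional rules for /, > and < exist so that the escaped value cannot close a script element, a CDATA section or an HTML comment by accident.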
json_string
Escapes the string with the escaping rules of JSON language string literals, so it’s safe to insert the value into a string literal. Note that it will not add quotation marks around the inserted value; it is meant to be used inside the string literal.
This will not escape ' characters, since JSON strings must be quoted with ".
The escaping rules are almost identical to those documented for js_string. The differences are that ' is not escaped at all, that > is escaped as \u003E (not as \>), and that \uXXXX escapes are used instead of \xXX escapes.
keep_after
Removes the part of the string that is not after the first occurrence of the given substring. For example:
will print
If the parameter string is not found, it will return an empty string. If the parameter string is a 0-length string, it will return the original string unchanged.
This method accepts an optional flags parameter, as its 2nd parameter:
will print
keep_after_last
Same as keep_after, but keeps the part after the last occurrence of the parameter, rather than after the first. Example:
will print
while with keep_after
you would get bar.txt
.
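The difference between keep_after and keep_after_last, including the edge cases stated for keep_after, can be sketched in Python as below. The function names mirror the built-ins but are only an emulation without flags support. Returning an empty string from keep_after_last when nothing is found is an assumption carried over from keep_after.

```python
def keep_after(s: str, sep: str) -> str:
    """Keep the part after the first occurrence of sep."""
    i = s.find(sep)
    # Not found: empty string. Empty sep: find() returns 0, so s is returned unchanged.
    return "" if i == -1 else s[i + len(sep):]


def keep_after_last(s: str, sep: str) -> str:
    """Keep the part after the last occurrence of sep."""
    i = s.rfind(sep)
    return "" if i == -1 else s[i + len(sep):]  # assumption: same "not found" result


print(keep_after("foo.bar.txt", "."))
print(keep_after_last("foo.bar.txt", "."))
```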
keep_before
Removes the part of the string that starts with the given substring. For example:
will print
If the parameter string is not found, it will return the original string unchanged. If the parameter string is a 0-length string, it will return an empty string.
This method accepts an optional flags parameter, as its 2nd parameter:
will print
keep_before_last
Same as keep_before, but keeps the part before the last occurrence of the parameter, rather than before the first. Example:
will print
while with keep_before
you would get foo
.
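A matching Python sketch for keep_before and keep_before_last follows. Again this is only an emulation with hypothetical helper functions and no flags support. Returning the original string from keep_before_last when nothing is found is an assumption carried over from keep_before.

```python
def keep_before(s: str, sep: str) -> str:
    """Keep the part before the first occurrence of sep."""
    i = s.find(sep)
    # Not found: the original string. Empty sep: find() returns 0, giving "".
    return s if i == -1 else s[:i]


def keep_before_last(s: str, sep: str) -> str:
    """Keep the part before the last occurrence of sep."""
    i = s.rfind(sep)
    return s if i == -1 else s[:i]  # assumption: same "not found" result


print(keep_before("foo.bar.txt", "."))
print(keep_before_last("foo.bar.txt", "."))
```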
last_index_of
Returns the index within this string of the last (rightmost) occurrence of the specified substring. It returns the index of the first (leftmost) character of the substring. For example: "abcabc"?last_index_of("ab")
will return 3. Also, you can specify the index to start the search from. For example, "abcabc"?last_index_of("ab", 2)
will return 0. Note that the second parameter indicates the maximum index of the start of the substring. There is no restriction on the numerical value of the second parameter: if it is negative, it has the same effect as if it were zero, and if it is greater than the length of this string, it has the same effect as if it were equal to the length of this string. Decimal values will be truncated to integers.
If the 1st parameter does not occur as a substring in this string (before the given index, if you use the second parameter), then it returns -1.
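The meaning of the second parameter (the maximum index at which the match may start) can be illustrated with a small Python emulation. The function name is hypothetical. The search window ends at start + len(sub), so that a match beginning exactly at start is still found.

```python
def last_index_of(s: str, sub: str, start=None) -> int:
    """Index of the last occurrence of sub that starts at or before start, or -1."""
    if start is None:
        start = len(s)
    else:
        start = max(0, min(int(start), len(s)))  # truncate, then clamp to 0..len(s)
    # The match may begin at start at the latest, hence the window end below.
    return s.rfind(sub, 0, start + len(sub))


print(last_index_of("abcabc", "ab"))
print(last_index_of("abcabc", "ab", 2))
```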
left_pad
If it’s used with 1 parameter, then it inserts spaces at the beginning of the string until it reaches the length that is specified as the parameter. If the string is already as long as or longer than the specified length, then it does nothing. For example, this:
will output this:
If it’s used with 2 parameters, then the 1st parameter means the same as if you were using the built-in with only 1 parameter, and the second parameter specifies what to insert instead of space characters. For example:
will output this:
The 2nd parameter can be a string whose length is greater than 1. Then the string will be inserted periodically, for example:
will output this:
The 2nd parameter must be a string value, and it must be at least 1 character long.
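The padding rules can be sketched in Python like this. The function name is hypothetical. How a multi-character filling is aligned is an assumption here: the sketch simply repeats it from the left edge of the result.

```python
def left_pad(s: str, min_length: int, filling: str = " ") -> str:
    """Pad s on the left until it is at least min_length characters long."""
    if not filling:
        raise ValueError("the filling must be at least 1 character long")
    missing = min_length - len(s)
    if missing <= 0:
        return s  # already as long as or longer than requested
    # Assumption: the filling pattern restarts at the left edge of the result.
    repeats = missing // len(filling) + 1
    return (filling * repeats)[:missing] + s


print("[" + left_pad("a", 5) + "]")
print("[" + left_pad("a", 5, "-") + "]")
```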
length
The number of characters in the string.
lower_case
The lower case version of the string. For example "GrEeN MoUsE"?lower_case
will be "green mouse"
.
matches
This built-in determines if the string exactly matches the pattern. Also, it returns the list of matching sub-strings. The return value is a multi-type value:
- Boolean: true, if the entire string matches the pattern, otherwise false. For example, "fooo"?matches('fo*') is true, but "fooo bar"?matches('fo*') is false.
- Sequence: the list of matched substrings of the string. Possibly a 0 length sequence.
For example:
will print:
If the regular expression contains groups (parentheses), then you can access them with the groups built-in:
This will print:
Notes regarding the behavior of the groups built-in:
- It works both with substring matches and with the result of entire string matching (as it was shown in the above example).
- The first item in the sequence that groups returns is the whole substring matched by the regular expression. Hence, the index of the first explicit regular expression group (in other words, of the first (…) in the regular expression) is 1, and not 0. Also, because of this, the size of the sequence is one more than the number of explicit regular expression groups.
- The size of the sequence returned by groups only depends on the number of explicit groups in the regular expression, and so it will be the same (non-0) even if there was no match found for the regular expression. Attempting to access an item of the sequence (as in res?groups[1]) when there was no match will cause an error. Thus, before accessing the groups, you should always check if there was any match (as in <#if res>access the groups here</#if>).
- When there’s a match for the regular expression, but not for a certain explicit group inside the regular expression, then for that group the sequence will contain a 0 length string. So accessing a group that matches nothing is safe, as long as the containing regular expression has matched something.
matches accepts an optional 2nd parameter, the flags. Note that it doesn’t support flag f, and ignores the r flag.
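The three views of a match result (the boolean, the sequence of matched substrings, and the groups) can be approximated with Python's re module. This is a sketch with hypothetical helper names. Python's regular expression dialect is close to, but not the same as, the Java one used by FreeMarker.

```python
import re


def matches_entirely(s: str, pattern: str) -> bool:
    """The boolean view: true only if the entire string matches."""
    return re.fullmatch(pattern, s) is not None


def matched_substrings(s: str, pattern: str) -> list:
    """The sequence view: every matching substring, possibly none."""
    return [m.group(0) for m in re.finditer(pattern, s)]


def groups(match) -> list:
    """Index 0 is the whole match; groups that matched nothing become ''."""
    return [match.group(0)] + [g or "" for g in match.groups()]


m = re.fullmatch(r"(\w+)/(\w+)(\.\w+)?", "aa/rx")
if m:  # always check for a match before touching the groups
    print(groups(m))
```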
no_esc
Prevents the auto-escaping of a value. For example:
This works by converting the string value to a markup output value, which uses the string as the markup as is, and belongs to the current output format at the point of the invocation.
This built-in can also be applied on markup output values, which it will bypass without change, as far as the input markup output value belongs to current output format. If it doesn’t, then the markup has to be converted to the current output format, which currently will be only successful if that value was created by escaping plain text (usually, with ?esc
).
This built-in can’t be used where the current output format is a non-markup output format. An attempt to do so will cause a parse-time error.
This built-in is not related to the deprecated escape and noescape directives. In fact, the parser will prevent using them on the same place, to prevent confusion.
number
The string converted to numerical value. The number must be in “computer language” format. That is, it must be in the locale independent form, where the decimal separator is dot, and there’s no grouping.
This built-in recognizes numbers in the format that the FreeMarker template language uses. Additionally, it recognizes scientific notation (e.g. "1.23E6", "1.5e-8"). It also recognizes all XML Schema number formats, like NaN, INF, -INF, plus the Java-native formats Infinity and -Infinity.
If the string is not in the appropriate format, an error will abort template processing when you try to access this built-in.
replace
It is used to replace all occurrences of a string in the original string with another string. It does not deal with word boundaries. For example:
will print:
The replacing occurs in left-to-right order. This means that this:
will print:
If the 1st parameter is an empty string, then all occurrences of the empty string will be replaced, like "foo"?replace("","|")
will evaluate to "|f|o|o|"
.
replace accepts an optional flags parameter, as its 3rd parameter.
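Python's str.replace happens to follow the same left-to-right, non-overlapping rule, so it can illustrate the cases above. This is only an analogy with a hypothetical wrapper that handles the f (first only) flag and no other flags.

```python
def replace(s: str, old: str, new: str, flags: str = "") -> str:
    """Plain-text replacement, scanning from left to right."""
    if "f" in flags:
        return s.replace(old, new, 1)  # first occurrence only
    return s.replace(old, new)


print(replace("aaaaa", "aaa", "X"))   # the leftmost match wins, leaving "aa"
print(replace("foo", "", "|"))        # the empty string occurs around every character
```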
right_pad
This is the same as left_pad, but it inserts the characters at the end of the string instead of the beginning of the string.
Example:
This will output this:
remove_beginning
Removes the parameter substring from the beginning of the string, or returns the original string if it doesn’t start with the parameter substring. For example:
will print:
remove_ending
Removes the parameter substring from the ending of the string, or returns the original string if it doesn’t end with the parameter substring. For example:
will print:
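Both built-ins follow the same pattern, which can be sketched in Python as below. The helper names are hypothetical and this only emulates the rule described.

```python
def remove_beginning(s: str, prefix: str) -> str:
    """Drop prefix if s starts with it, otherwise return s unchanged."""
    return s[len(prefix):] if s.startswith(prefix) else s


def remove_ending(s: str, suffix: str) -> str:
    """Drop suffix if s ends with it, otherwise return s unchanged."""
    # len(s) - len(suffix) is used instead of a negative index so that an
    # empty suffix leaves the string intact.
    return s[:len(s) - len(suffix)] if s.endswith(suffix) else s


print(remove_beginning("abcdef", "abc"))
print(remove_ending("abcdef", "def"))
```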
rtf (deprecated)
This built-in is deprecated by the auto-escaping mechanism. To prevent double escaping and confusion in general, using this built-in on places where auto-escaping is active is a parse-time error. To help migration, this built-in silently bypasses RTF markup output values without changing them.
The string as Rich text (RTF text). That is, the string with all:
- \ replaced with \\
- { replaced with \{
- } replaced with \}
split
It is used to split a string into a sequence of strings along the occurrences of another string. For example:
will print:
Note that each occurrence of the separator is assumed to come before a new item (except with the "r" flag, see later), thus:
will print:
split
accepts an optional flags parameter, as its 2nd parameter. There’s a historical glitch with the r
(regular expression) flag; it removes the empty elements from the end of the resulting list, so with ?split(",", "r")
in the last example the last ""
would be missing from the output.
If the 1st parameter is an empty string, the string will be split to characters.
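The three cases (plain separator, the "r" flag glitch, and the empty separator) can be emulated in Python as follows. The function name is hypothetical and only the r flag is handled. The pattern is assumed to contain no capturing groups, because Python's re.split would otherwise also return the captured text.

```python
import re


def split(s: str, sep: str, flags: str = "") -> list:
    """Split s along sep, mimicking the rules described above."""
    if sep == "":
        return list(s)  # empty separator: split to characters
    if "r" in flags:
        parts = re.split(sep, s)
        # The historical glitch: empty items at the end are dropped.
        while parts and parts[-1] == "":
            parts.pop()
        return parts
    return s.split(sep)


print(split("some,,test,text,", ","))
print(split("some,,test,text,", ",", "r"))
```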
To check if a string ends with something and append it otherwise, use the ensure_ends_with built-in.
starts_with
Returns whether this string starts with the specified substring. For example "redirect"?starts_with("red") returns boolean true. Also, "red"?starts_with("red") will return true.
To check if a string starts with something and prepend it otherwise, use the ensure_starts_with built-in.
string (when used with a string value)
Does nothing, just returns the string as-is. The exception is that if the value is a multi-type value (e.g. it is both string and sequence at the same time), then the resulting value will be only a simple string, not a multi-type value. This can be utilized to prevent the artifacts of multi-typing.
substring (deprecated)
This built-in is deprecated by slicing expressions, like str[from..<toExclusive], str[from..], and str[from..*maxLength].
A warning if you are processing XML: Since slicing expressions work both for sequences and strings, and since XML nodes are typically both sequences and strings at the same time, there the equivalent expression is someXmlNode?string[from..<toExclusive] and exp?string[from..], as without ?string it would slice the node sequence instead of the text value of the node.
Some of the typical use-cases of string slicing are covered by convenient built-ins: remove_beginning, remove_ending, keep_before, keep_after, keep_before_last, keep_after_last.
Synopsis: exp?substring(from, toExclusive)
, also callable as exp?substring(from)
A substring of the string. from is the index of the first character. It must be a number that is at least 0 and less than or equal to toExclusive, or else an error will abort the template processing. The toExclusive is the index of the character position after the last character of the substring, or in other words, it is one greater than the index of the last character. It must be a number that is at least 0 and less than or equal to the length of the string, or else an error will abort the template processing. If the toExclusive is omitted, then it defaults to the length of the string. If a parameter is a number that is not an integer, only the integer part of the number will be used.
Example:
The output:
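The parameter checks described in the synopsis can be written out as a small Python function. This is an emulation with a hypothetical name, where ValueError stands in for the error that aborts template processing.

```python
def substring(s: str, from_index, to_exclusive=None) -> str:
    """Emulate exp?substring(from, toExclusive) with its range checks."""
    from_index = int(from_index)  # only the integer part is used
    to_exclusive = len(s) if to_exclusive is None else int(to_exclusive)
    if not 0 <= to_exclusive <= len(s):
        raise ValueError("toExclusive must be between 0 and the string length")
    if not 0 <= from_index <= to_exclusive:
        raise ValueError("from must be between 0 and toExclusive")
    return s[from_index:to_exclusive]


print(substring("abc", 1))
print(substring("abc", 1, 2))
```

Unlike a bare Python slice, which silently clamps out-of-range indexes, the explicit checks reproduce the "error will abort" behavior.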
trim
The string without leading and trailing white-space. Example:
The output:
truncate, truncate_…
Cuts off the end of a string if that’s necessary to keep it under the length given as parameter, and appends a terminator string ([...] by default) to indicate that the string was truncated. Example (assuming default FreeMarker configuration settings):
Things to note above:
- The string is returned as is if its length doesn’t exceed the specified length (16 in this case).
- When the string exceeded that length, its end was cut off in a way so that together with the added terminator string ([...] here) its length won’t exceed 16. The result length is possibly shorter than 16, for the sake of better look (see later). Actually, the result length can also be longer than the parameter length, when the desired length is shorter than the terminator string alone, in which case the terminator is still returned as is. Also, an algorithm other than the default might choose to return a longer string, as the length parameter is in principle just a hint for the desired visual length.
- truncate prefers cutting at a word boundary, rather than mid-word; however, if doing so would give a result that’s shorter than 75% of the length specified with the argument, it falls back to cutting mid-word. In the last line of the above example, “This[...]” would be too short (11 < 16 * 75%), so it was cut mid-word instead.
- If the cut happened at a word boundary, there’s a space between the word end and the terminator string, otherwise there’s no space between them. Only whitespace is treated as word separator, not punctuation, so this generally gives intuitive results.
Adjusting truncation rules
Truncation rules can be influenced right in the template:
- Specifying if the truncation should happen at word boundary or not:
  - truncate_w will always truncate at word boundary. For example, difficultName?truncate_w(16) returns “This […]”, rather than “This isonev[…]” (as seen in the earlier example).
  - truncate_c will truncate at any character, not just at word ends. For example, longName?truncate_c(16) returns “This is a t[…]”, rather than “This is a […]” (as seen in the earlier example). This tends to give a string length closer to the length specified, but still not an exact length, as it removes white-space before the terminator string, and re-adds a space if we are just after the end of a word, etc.
- Specifying the terminator string (instead of relying on its default): truncate and all truncate_... built-ins have an additional optional parameter for it. After that, a further optional parameter can specify the assumed length of the terminator string (otherwise its real length will be used). Example:
When the terminator string starts with dot (.
) or ellipsis (…
), the default algorithm will remove the dots and ellipses that the terminator touches, to prevent ending up with more than 3 dots at the end:
Using markup as terminator string
Each truncation built-in has a variation whose name ends with _m
(for markup). These allow using markup (like HTML) as terminator, which is useful if you want the terminator to be styled differently than the truncated text. By default the markup terminator is <span class='truncateTerminator'>[&#8230;]</span> (where &#8230; prints an ellipsis character), but of course this can be changed with the truncate_builtin_algorithm configuration setting (see earlier). Example (see the variables used in earlier example):
Note above that the terminator string was considered to be only 3 characters long ('['
, '…'
, ']'
) by the truncation built-ins, because inside the terminator string they only count the characters outside HTML/XML tags and comments, and they can also interpret numeric character references (but not other entity references). (The same applies when they decide if the terminator starts with dot or ellipsis; preceding tags/comments are skipped, etc.)
If a markup terminator is used (like above), the return value of the truncate..._m
built-in will be markup as well, which means that auto-escaping won’t escape it. Of course, the content of the truncated string itself will be still auto-escaped:
uncap_first
The opposite of cap_first. The string with the very first word of the string un-capitalized.
upper_case
The upper case version of the string. For example "GrEeN MoUsE"?upper_case will be "GREEN MOUSE".
url
The string after URL escaping. This means that all non-US-ASCII and reserved URL characters will be escaped with %XX. For example:
The output will be (assuming that the charset used for the escaping is an US-ASCII compatible charset):
Note that it escapes all reserved URL characters (/
, =
, &
, …etc), so this encoding can be used for encoding query parameter values, for example:
Above no HTML encoding (?html) was needed, because URL escaping escapes all reserved HTML characters anyway. But watch: always quote the attribute value, and always with normal quotation mark ("), never with apostrophe quotation mark ('), because apostrophe quotation mark is not escaped by the URL escaping.
To do URL escaping a charset must be chosen that will be used for calculating the escaped parts (%XX). For example:
Furthermore, you can explicitly specify a charset for a single URL escaping as the parameter to the built-in:
url_path
This is the same as the url built-in, except that it doesn’t escape slash (/) characters. This is meant to be used for converting paths (like paths coming from the OS or some content repository) that use slash (not backslash!) to a path that can be inserted into a URL. The most common reason why this conversion is needed is that folder names or file names might contain non-US-ASCII letters (“national” characters).
Just like with the url built-in, the desired URL escaping charset (or as a fall back, the output encoding) must be set in the FreeMarker configuration settings, or else the built-in will give an error. Otherwise, you have to specify the charset like somePath?url_path('utf-8').
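The difference between url and url_path, and the role of the charset, can be illustrated with Python's urllib.parse.quote. This is only an analogy with hypothetical wrapper names. Python's set of characters left unescaped is not identical to FreeMarker's (for example, Python escapes the apostrophe), so the sketch shows the principle rather than the exact output of the built-ins.

```python
from urllib.parse import quote


def url(s: str, charset: str = "utf-8") -> str:
    """Percent-escape everything reserved, including the slash."""
    return quote(s, safe="", encoding=charset)


def url_path(s: str, charset: str = "utf-8") -> str:
    """Same, but slashes are kept so the value stays usable as a path."""
    return quote(s, safe="/", encoding=charset)


print(url("a/b c"))
print(url_path("a/b c"))
print(url("\u00e1", "utf-8"), url("\u00e1", "iso-8859-1"))
```

The last line shows why a charset must be chosen: the same letter gives different %XX sequences under different charsets.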
word_list
A sequence that contains all words of the string in the order as they appear in the string. Words are continuous character sequences that contain any character but white-space. Example:
will output:
xhtml (deprecated)
This built-in is deprecated by the auto-escaping mechanism. To prevent double escaping and confusion in general, using this built-in on places where auto-escaping is active is a parse-time error. To help migration, this built-in silently bypasses HTML markup output values without changing them.
The string as XHTML text. That is, the string with all:
- < replaced with &lt;
- > replaced with &gt;
- & replaced with &amp;
- " replaced with &quot;
- ' replaced with &#39;
The only difference between this built-in and the xml built-in is that the xhtml built-in escapes ' as &#39; instead of as &apos;, because some older browsers don’t know &apos;.
Warning!
When inserting the value of an attribute, always quote it, or else it can be exploited by attackers! This is WRONG: <input name="user" value=${user?xhtml}/>. These are good: <input name="user" value="${user?xhtml}"/>, <input name="user" value='${user?xhtml}'/>.
xml (deprecated)
This built-in is deprecated by the auto-escaping mechanism. To prevent double escaping and confusion in general, using this built-in on places where auto-escaping is active is a parse-time error. To help migration, this built-in silently bypasses XML and HTML markup output values without changing them.
The string as XML text. That is, the string with all:
- < replaced with &lt;
- > replaced with &gt;
- & replaced with &amp;
- " replaced with &quot;
- ' replaced with &apos;
When inserting the value of an attribute, always quote it, or else it can be exploited by attackers! This is WRONG: <input name="user" value=${user?xml}/>. These are good: <input name="user" value="${user?xml}"/>, <input name="user" value='${user?xml}'/>.
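The xml and xhtml escaping rules differ only in how the apostrophe is written (&apos; versus &#39;), which a short Python emulation makes easy to see. The helper names are hypothetical. Note that & has to be replaced first, or the ampersands of the other entities would be escaped again.

```python
def escape_markup(s: str, apostrophe: str) -> str:
    """Apply the five replacements; & must go first to avoid double escaping."""
    return (s.replace("&", "&amp;")
             .replace("<", "&lt;")
             .replace(">", "&gt;")
             .replace('"', "&quot;")
             .replace("'", apostrophe))


def xml(s: str) -> str:
    return escape_markup(s, "&apos;")


def xhtml(s: str) -> str:
    return escape_markup(s, "&#39;")


print(xml("<a href='x'>T&J</a>"))
print(xhtml("<a href='x'>T&J</a>"))
```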
Common flags
Many string built-ins accept an optional string parameter, the so called “flags”. In this string, each letter influences a certain aspect of the behavior of the built-in. For example, letter i
means that the built-in should not differentiate the lower and upper-case variation of the same letter. The order of the letters in the flags string is not significant.
This is the complete list of letters (flags):
- i: Case insensitive: do not differentiate the lower and upper-case variation of the same letter.
- f: First only. That is, replace/find/etc. only the first occurrence of something.
- r: The substring to find is a regular expression.
- m: Multi-line mode for regular expressions. In multi-line mode the expressions ^ and $ match just after or just before, respectively, a line terminator or the end of the string. By default these expressions only match at the beginning and the end of the entire string. Note that ^ and $ don’t match the line-break character itself.
- s: Enables dot-all mode for regular expressions (same as Perl single-line mode). In dot-all mode, the expression . matches any character, including a line terminator. By default this expression does not match line terminators.
- c: Permits whitespace and comments in regular expressions.
Example:
This outputs this:
This is the table of built-ins that use these common flags, and which supports which flags:
Built-in | i (ignore case) | r (reg. exp.) | m (multi-line mode) | s (dot-all mode) | c (whitesp. and comments) | f (first only)
---|---|---|---|---|---|---
replace | Yes | Yes | Only with r | Only with r | Only with r | Yes
split | Yes | Yes | Only with r | Only with r | Only with r | No
matches | Yes | Ignored | Yes | Yes | Yes | No
keep_after | Yes | Yes | Yes | Yes | Yes | Ignored
keep_after_last | Yes | Yes | Yes | Yes | Yes | Ignored
keep_before | Yes | Yes | Yes | Yes | Yes | Ignored
keep_before_last | Yes | Yes | Yes | Yes | Yes | Ignored
ensure_starts_with | Yes | Ignored | Yes | Yes | Yes | Ignored
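How the flag letters combine for replace can be sketched by mapping them onto Python's re flags. This is an emulation with a hypothetical function name, not FreeMarker's code. The replacement is always inserted literally here (no group references), and Python's regular expression syntax differs from Java's in some details.

```python
import re

# Flags that only have an effect together with "r", as in the replace row above.
REGEX_ONLY = {"m": re.MULTILINE, "s": re.DOTALL, "c": re.VERBOSE}


def replace_with_flags(s: str, what: str, replacement: str, flags: str = "") -> str:
    """Replace occurrences of what in s, honoring the i, f, r, m, s and c letters."""
    regex_mode = "r" in flags
    re_flags = re.IGNORECASE if "i" in flags else 0
    if regex_mode:
        for letter, value in REGEX_ONLY.items():
            if letter in flags:
                re_flags |= value
    pattern = what if regex_mode else re.escape(what)
    count = 1 if "f" in flags else 0  # in re.sub, count=0 means "replace all"
    # A function is used as the replacement so that it is never interpreted.
    return re.sub(pattern, lambda m: replacement, s, count=count, flags=re_flags)


s = "foo bAr baar"
for flags in ("", "i", "if", "r", "ri", "rif"):
    what = "ba*" if "r" in flags else "ba"
    print(repr(flags), replace_with_flags(s, what, "XY", flags))
```

The order of the letters in the flags string does not matter in this sketch either, since each letter is looked up independently.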