SubString Heater, extracting substrings from a string
The SubString Heater is used to extract a part of a string. After you drag and drop the Heater into the Designer workspace, double-click its icon to open the configuration popup. Alternatively, you can open the same properties popup from the Heater's context menu (right click).
In the popup you can specify the part of the incoming string you wish to extract and deliver forward for subsequent processing.
Start Offset: With this parameter you define the offset position in the incoming string you wish to start copying characters from. If the string is shorter than this, an empty substring is delivered.
Length: The maximum number of characters to copy from the incoming string, beginning at the Start Offset position, and output as a substring. A value of zero means that everything up to the end of the string is copied.
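The interplay of the two parameters can be sketched in a few lines of Python. This is only an illustration of the behaviour described above, not the Heater's actual implementation: the function name substring is hypothetical, and the sketch assumes that offsets are counted from 1, as the address example in this section suggests.

```python
def substring(text: str, start_offset: int, length: int = 0) -> str:
    """Illustrative model of Start Offset and Length.

    Assumption: start_offset is 1-based (the first character is position 1).
    """
    if start_offset < 1 or length < 0:
        raise ValueError("start_offset must be >= 1 and length must be >= 0")
    begin = start_offset - 1          # convert to Python's 0-based index
    if begin >= len(text):
        return ""                     # string is shorter than the offset
    if length == 0:
        return text[begin:]           # zero means: up to the end of the string
    return text[begin:begin + length] # at most `length` characters


print(substring("90471 Nuernberg", 1, 5))
print(substring("90471 Nuernberg", 7))
```

Because Length is a maximum, asking for more characters than remain simply returns whatever is left after the Start Offset.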
We have a CSV text file with German address data, in which the postcode always has five digits and precedes the name of the town within a single field, separated from it by a single space. For subsequent processing, this field will be imported into an SQL database table where the information is stored as two separate fields.
...;Hauptstrasse 123;90471 Nuernberg;Deutschland/Germany;...
In this case we can simply use two SubString Heaters and feed the same input into both (via a Clone Heater, if necessary).
1. SubString Heater for the postcode
Start Offset = 1
Length = 5
2. SubString Heater for the city
Start Offset = 7
Length = 0
Note: In this case the dividing space at position 6 is assumed to be present, but it is effectively dropped because neither substring includes it.
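For comparison, the same split can be reproduced outside the Designer. The following Python sketch reads the visible part of the sample row with the standard csv module and applies the two offset/length pairs to the combined field. The helper name split_postcode_city is hypothetical, and the translation of the offsets into slices assumes that the Heater counts positions from 1.

```python
import csv
import io


def split_postcode_city(field: str) -> tuple[str, str]:
    """Split a 'postcode town' field with the two settings used above."""
    postcode = field[0:5]   # Start Offset = 1, Length = 5
    city = field[6:]        # Start Offset = 7, Length = 0 (to end of string)
    return postcode, city


# Visible part of the sample row; the elided leading and trailing fields are omitted.
row_text = "Hauptstrasse 123;90471 Nuernberg;Deutschland/Germany"
row = next(csv.reader(io.StringIO(row_text), delimiter=";"))

postcode, city = split_postcode_city(row[1])
print(postcode)
print(city)
```

The space at index 5 falls between the two slices, which mirrors how the two Heaters silently drop the separator.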
Extract substrings between delimiter strings: If you check the Activate extended settings option you can extract substrings that are enclosed by delimiter strings that you specify.
Specify the start and end delimiter strings to either side of "<- begins with and ends with ->". If you also check the Case insensitive compare option then the search for the start and end delimiter strings will treat equivalent upper/lower cases as matching.
The Which substring option lets you select which of the resulting substrings you wish to extract.
- "first only" will extract only the first substring found
- "last only" will return only the last when several substrings are found
- "all" will return all substrings found, concatenated into a single string and separated by the content of the Separator string field below this option. Note that the separator can consist of one or several characters.
The illustrated settings will extract the content of all list item tags (<li> ... </li>) in an HTML text file. If more than one instance is found, the extracted substrings are concatenated with a "#" character as the separator.
... <li>First element</li> ... <li>Second element</li> ... <LI>Third element</LI> ...
First element#Second element#Third element
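The delimiter-based extraction shown above can be approximated with a regular expression. The Python sketch below is an illustration under assumptions, not the Heater's own logic: the function name extract_between and its parameters are hypothetical, matches are taken to be non-overlapping and as short as possible, and an empty string is returned when nothing is found.

```python
import re


def extract_between(text: str, start: str, end: str, which: str = "all",
                    separator: str = "", ignore_case: bool = False) -> str:
    """Return the substring(s) enclosed by the start and end delimiters."""
    flags = re.DOTALL | (re.IGNORECASE if ignore_case else 0)
    # re.escape treats the delimiters as literal text, not as patterns.
    pattern = re.escape(start) + "(.*?)" + re.escape(end)
    found = re.findall(pattern, text, flags)
    if not found:
        return ""                      # assumption: nothing found yields ""
    if which == "first only":
        return found[0]
    if which == "last only":
        return found[-1]
    if which == "all":
        return separator.join(found)
    raise ValueError(f"unknown selection: {which!r}")


html = ("... <li>First element</li> ... <li>Second element</li> "
        "... <LI>Third element</LI> ...")
print(extract_between(html, "<li>", "</li>", "all", "#", ignore_case=True))
```

Without the case insensitive comparison, the upper-case <LI> entry would not match the delimiters, so only the first two elements would be returned.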