

# String and Search Functions
<a name="sql-reference-string-and-search-functions"></a>

The topics in this section describe the string and search functions for Amazon Kinesis Data Analytics streaming SQL.

**Topics**
+ [CHAR\$1LENGTH / CHARACTER\$1LENGTH](sql-reference-char-length.md)
+ [INITCAP](sql-reference-initcap.md)
+ [LOWER](sql-reference-lower.md)
+ [OVERLAY](sql-reference-overlay.md)
+ [POSITION](sql-reference-position.md)
+ [REGEX\$1REPLACE](sql-reference-regex-replace.md)
+ [SUBSTRING](sql-reference-substring.md)
+ [TRIM](sql-reference-trim.md)
+ [UPPER](sql-reference-upper.md)

# CHAR\$1LENGTH / CHARACTER\$1LENGTH
<a name="sql-reference-char-length"></a>

```
 CHAR_LENGTH | CHARACTER_LENGTH ( <character-expression> )
```

Returns the length in characters of the string passed as the input argument. Returns null if input argument is null.

## Examples
<a name="sql-reference-char-length-examples"></a>


|  |  | 
| --- |--- |
|  <pre>CHAR_LENGTH('one')</pre>  |  3  | 
|  <pre>CHAR_LENGTH('')</pre>  |  0  | 
|  <pre>CHARACTER_LENGTH('fred')</pre>  |  4  | 
|  <pre>CHARACTER_LENGTH( cast (null as varchar(16) )</pre>  |  null  | 
|  <pre>CHARACTER_LENGTH( cast ('fred' as char(16) )</pre>  |  16  | 

## Limitations
<a name="sql-reference-char-length-limitations"></a>

Amazon Kinesis Data Analytics streaming SQL does not support the optional USING CHARACTERS \$1 OCTETS clause. This is a departure from the SQL:2008 standard.

# INITCAP
<a name="sql-reference-initcap"></a>

```
INITCAP ( <character-expression> )
```

Returns a converted version of the input string such that the first character of each space-delimited word is upper-cased, and all other characters are lower-cased.

## Examples
<a name="sql-reference-initcap-examples"></a>


| Function | Result | 
| --- | --- | 
|  INITCAP('Each FIRST lEtTeR is cAPITALIZED')  |  Each First Letter Is Capitalized  | 

## 
<a name="sqlrf-initcap-notes"></a>

**Note**  
The INITCAP function is not part of the SQL:2008 standard. It is an Amazon Kinesis Data Analytics extension.

# LOWER
<a name="sql-reference-lower"></a>

```
LOWER ( <character-expression> )
```

Converts a string to all lower-case characters. Returns null if input argument is null, and the empty string if the input argument is an empty string.

## Examples
<a name="sql-reference-lower-examples"></a>


| Function | Result | 
| --- | --- | 
|  LOWER('abcDEFghi123')  |  abcdefghi123  | 

# OVERLAY
<a name="sql-reference-overlay"></a>

```
 OVERLAY ( <original-string>
           PLACING <replacement-string>
           FROM <start-position>
           [ FOR <string-length> ]
         )
 <original-string> := <character-expression>
 <replacement-string> := <character-expression>
 <start-position> := <integer-expression>
 <string-length> := <integer-expression>
```

The OVERLAY function is used to replace a portion of the first string argument (the original string) with the second string argument (the replacement string).

The start position indicates the character position in the original string where the replacement string should be overlaid. The optional string length parameter determines how many characters of the original string to replace (if not specified, it defaults to the length of the replacement string). If there are more characters in the replacement string than are left in the original string, the remaining characters are simply appended.

If the start position is greater than the length of the original string, the replacement string is simply appended. If the start position is less than 1, then ( 1 - start position) characters of the replacement string is prepended to the result, and the rest overlaid on the original (see examples below).

If the string length is less than zero, an exception is raised.

If any of the input arguments are null, the result is null.

## Examples
<a name="sqlrf-overlay-examples"></a>


| Function | Result | 
| --- | --- | 
|  OVERLAY ('12345' PLACING 'foo' FROM 1)  |  foo45  | 
|  OVERLAY ('12345' PLACING 'foo' FROM 0)  |  foo345  | 
|  OVERLAY ('12345' PLACING 'foo' FROM -2)  |  foo12345  | 
|  OVERLAY ('12345' PLACING 'foo' FROM 4)  |  123foo  | 
|  OVERLAY ('12345' PLACING 'foo' FROM 17)  |  12345foo  | 
|  OVERLAY ('12345' PLACING 'foo' FROM 2 FOR 0)  |  1foo2345  | 
|  OVERLAY ('12345' PLACING 'foo' FROM 2 FOR 2)  |  1foo45  | 
|  OVERLAY ('12345' PLACING 'foo' FROM 2 FOR 9)  |  1foo  | 

## Limitations
<a name="sqlrf-overlay-limitations"></a>

Amazon Kinesis Data Analytics does not support the optional USING CHARACTERS \$1 OCTETS clause defined in SQL:2008; USING CHARACTERS is simply assumed. Strict SQL:2008 also requires that a start position less than 1 return a null result, rather than the behavior described above. These are departures from the standard.

# POSITION
<a name="sql-reference-position"></a>

```
 POSITION ( <search-string> IN <source-string> )
 search-string := <character-expression>
 source-string := <character-expression>
```

The POSITION function searches for the first input argument (the search string) within the second input argument (the source string).

If the search string is found within the source string, POSITION returns the character position of the first instance of the search string (subsequent instances are ignored). If the search string is the empty string, POSITION returns 1.

If the search string is not found, POSITION returns 0.

If either the search string or the source string is null, POSITION returns null.

## Examples
<a name="sql-reference-position-examples"></a>


| Function | Result | 
| --- | --- | 
|  POSITION ('findme' IN '1234findmeXXX')  |  5  | 
|  POSITION ('findme' IN '1234not-hereXXX')  |  0  | 
|  POSITION ('1' IN '1234567')  |  1  | 
|  POSITION ('7' IN '1234567')  |  7  | 
|  POSITION ('' IN '1234567')  |  1  | 

## Limitations
<a name="sql-reference-position-limitations"></a>

Amazon Kinesis Data Analytics streaming SQL does not support the optional USING CHARACTERS \$1 OCTETS clause defined in SQL:2008; USING CHARACTERS is simply assumed. This is a departure from the standard.

# REGEX\$1REPLACE
<a name="sql-reference-regex-replace"></a>

REGEX\$1REPLACE replaces a substring with an alternative substring. It returns the value of the following Java expression.

```
java.lang.String.replaceAll(regex, replacement)
```

## Syntax
<a name="sql-reference-regex-replace-syntax"></a>

```
REGEX_REPLACE(original VARCHAR(65535), regex VARCHAR(65535), replacement VARCHAR(65535), startPosition int, occurence int)

RETURNS VARCHAR(65535)
```

## Parameters
<a name="sql-reference-regex-replace-parameters"></a>

*original*

The string on which to execute the regex operation.

*regex*

The [regular expression](https://en.wikipedia.org/wiki/Regular_expression) to match. If the encoding for *regex* doesn't match the encoding for *original*, an error is written to the error stream.

*replacement*

The string to replace *regex* matches in the *original* string. If the encoding for *replacement* doesn't match the encoding for *original* or *regex*, an error is written to the error stream.

*startPosition*

The first character in the *original* string to search. If *startPosition* is less than 1, an error is written to the error stream. If *startPosition* is greater than the length of *original*, then *original* is returned.

*occurence*

The occurrence of the string that matches the *regex* expression to replace. If *occurence* is 0, all substrings matching *regex* are replaced. If *occurence* is less than 0, an error is written to the error stream.

## Example
<a name="sql-reference-regex-replace-example"></a>

### Example Dataset
<a name="w2aac22c33c17c11b2"></a>

The examples following are based on the sample stock dataset that is part of [Getting Started Exercise](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html) in the *Amazon Kinesis Analytics Developer Guide*. 

To run each example, you need an Amazon Kinesis Analytics application that has the input stream for the sample stock ticker. To learn how to create an Analytics application and configure the input stream for the sample stock ticker, see [Getting Started Exercise](https://docs.aws.amazon.com/kinesisanalytics/latest/dev/get-started-exercise.html) in the *Amazon Kinesis Analytics Developer Guide*. 

The sample stock dataset has the schema following.

```
(ticker_symbol  VARCHAR(4),
sector          VARCHAR(16),
change          REAL,
price           REAL)
```

### Example 1: Replace All String Values in a Source String with a New Value
<a name="w2aac22c33c17c11b4"></a>

In this example, all character strings in the `sector` field are replaced if they match a regular expression.

```
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
        ticker_symbol VARCHAR(4), 
        SECTOR VARCHAR(24), 
        CHANGE REAL, 
        PRICE REAL);

CREATE OR REPLACE PUMP "STREAM_PUMP" AS INSERT INTO "DESTINATION_SQL_STREAM"

SELECT STREAM   TICKER_SYMBOL,
                REGEX_REPLACE(SECTOR, 'TECHNOLOGY', 'INFORMATION TECHNOLOGY', 1, 0);
                CHANGE,
                PRICE
FROM "SOURCE_SQL_STREAM_001"
```

The preceding example outputs a stream similar to the following.

![\[Table showing stock data with columns for time, ticker symbol, sector, change, and price.\]](http://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/images/sql-reference-regex-replace.png)


## Notes
<a name="sql-reference-regex-replace-notes"></a>

REGEX\$1REPLACE is not part of the SQL:2008 standard. It is an Amazon Kinesis Data Analytics streaming SQL extension.

REGEX\$1REPLACE returns `null` if any parameters are `null`.

# SUBSTRING
<a name="sql-reference-substring"></a>

```
 SUBSTRING ( <source-string> FROM <start-position> [ FOR <string-length> ] )
 SUBSTRING ( <source-string>, <start-position> [ , <string-length> ] )
 SUBSTRING ( <source-string> SIMILAR <pattern> ESCAPE <escape-char> )
 <source-string> := <character-expression>
 <start-position> := <integer-expression>
 <string-length> := <integer-expression>
 <regex-expression> := <character-expression>
 <pattern> := <character-expression>
 <escape-char> := <character-expression>
```

SUBSTRING extracts a portion of the source string specified in the first argument. Extraction starts at the value of *start-position* or the first expression matching the value of *regex-expression*.

If a value is specified for *string-length*, only that number of characters is returned. If there aren't that many characters left in the string, only the characters that are left are returned. If *string-length* is not specified, the string length defaults to the remaining length of the input string.

If the start position is less than 1, then the start position is interpreted as if it is 1 and the string length is reduced by (1–start position). For examples, see following. If the start position is greater than the number of characters in the string, or the length parameter is 0, the result is an empty string.

## Parameters
<a name="sql-reference-substring-parameters"></a>

*source-string*

The string to search for positional or regular-expression matches.

*start-position*

The first character of *source-string* to return. If *start-position* is greater than the length of *source-string*, SUBSTRING returns null.

*string-length*

The number of characters from *source-string* to return.

*regex-expression*

A pattern of characters to match and return from *source-string*. Only the first match is returned.

*pattern*

A three-part pattern of characters that consists of the following:
+ The string to be found before the returned substring
+ The returned substring
+ The string to be found after the returned substring

The parts are delimited by a double quotation mark (") and a specified escape character. For more information, see [Similar...Escape](#sql-reference-substring-examples-similar) Samples following.

## Examples
<a name="sql-reference-substring-examples"></a>

### FROM/ FOR
<a name="sql-reference-substring-examples-from-for"></a>


| Function | Result | 
| --- | --- | 
|  SUBSTRING('123456789' FROM 3 FOR 4)  |  3456  | 
|  SUBSTRING('123456789' FROM 17 FOR 4)  |  <empty string>  | 
|  SUBSTRING('123456789' FROM -1 FOR 4)  |  12  | 
|  SUBSTRING('123456789' FROM 6 FOR 0)  |  <empty string>  | 
|  SUBSTRING('123456789' FROM 8 FOR 4)  |  89  | 

### FROM Regex
<a name="sql-reference-substring-examples-from"></a>


| Function | Result | 
| --- | --- | 
| SUBSTRING('TECHNOLOGY' FROM 'L[A-Z]\$1') | LOGY | 
| SUBSTRING('TECHNOLOGY' FROM 'FOO') | null | 
| SUBSTRING('TECHNOLOGY' FROM 'O[A-Z]') | OL | 

### Numeric
<a name="sql-reference-substring-examples-numeric"></a>


| Function | Result | 
| --- | --- | 
|  SUBSTRING('123456789', 3, 4)  |  3456  | 
|  SUBSTRING('123456789', 7, 4)  |  789  | 
|  SUBSTRING('123456789', 10, 4)  |  null  | 

### Similar...Escape
<a name="sql-reference-substring-examples-similar"></a>


| Function | Result | 
| --- | --- | 
|  SUBSTRING('123456789' SIMILAR '23\$1"456\$1"78' ESCAPE '\$1')  |  456  | 
|  SUBSTRING('TECHNOLOGY' SIMILAR 'TECH%"NOLO%"GY' ESCAPE '%')  |  NOLO  | 

## Notes
<a name="sql-reference-substring-notes"></a>
+ Amazon Kinesis Data Analytics streaming SQL doesn't support the optional 'USING CHARACTERS \$1 OCTETS' clause defined in SQL:2008. USING CHARACTERS is simply assumed.
+ The second and third forms of the SUBSTRING function listed preceding (using a regular expression, and using commas rather than FROM...FOR) are not part of the SQL:2008 standard. They are part of the streaming SQL extension to Amazon Kinesis Data Analytics.

# TRIM
<a name="sql-reference-trim"></a>

```
TRIM ( [ [ <trim-specification> ] [ <trim-character> ] FROM ] <trim-source> )
 <trim-specification> := LEADING | TRAILING | BOTH
 <trim-character> := <character-expression>
 <trim-source> := <character-expression>
```



TRIM removes instances of the specified trim-character from the beginning and/or end of the trim-source string as dictated by the trim-specification (that is, LEADING, TRAILING, or BOTH). If LEADING is specified, only repetitions of the trim character at the beginning of the source string are removed. If TRAILING is specified, only repetitions of the trim character at the end of the source string are removed. If BOTH is specified, or the trim specifier is left out entirely, then repetitions are removed from both the beginning and end of the source string.

If the trim-character is not explicitly specified, it defaults to the space character (' '). Only one trim character is allowed; specifying an empty string or a string longer than one character results in an exception.

If either input is null, null is returned.

## Examples
<a name="sql-reference-trim-examples"></a>


| Function | Result | 
| --- | --- | 
|  <pre>TRIM(' Trim front and back ')</pre>  |  'Trim front and back'  | 
|  <pre>TRIM (BOTH FROM ' Trim front and back ')</pre>  |  'Trim front and back'  | 
|  <pre>TRIM (BOTH ' ' FROM ' Trim front and back ')</pre>  |  'Trim front and back'  | 
|  <pre>TRIM (LEADING 'x' FROM 'xxxTrim frontxxx')</pre>  |  'Trim frontxxx'  | 
|  <pre>TRIM (TRAILING 'x' FROM 'xxxTrimxBackxxx')</pre>  |  'xxxTrimxBack'  | 
|  <pre>TRIM (BOTH 'y' FROM 'xxxNo y to trimxxx')</pre>  |  'xxxNo y to trimxxx'  | 

# UPPER
<a name="sql-reference-upper"></a>

```
< UPPER ( <character-expression> )
```

Converts a string to all upper-case characters. Returns null if the input argument is null, and the empty string if the input argument is an empty string.

## Examples
<a name="sqlrf-upper-examples"></a>


| Function | Result | 
| --- | --- | 
|  UPPER('abcDEFghi123')  |  ABCDEFGHI123  | 