Regular expression patterns [[:upper:]] vs [A-Z]
Note: the comparisons in this article also apply to the [[:lower:]] vs [a-z] regexp patterns.
Oracle regular expressions (regexp) support both [[:upper:]] and [A-Z] to find uppercase letters. At first glance they appear the same. Even regex101.com defines them as the same:
[[:upper:]]: Matches uppercase letters. Equivalent to [A-Z]. The double square brackets is not a typo, POSIX notation demands it.
There is a slight difference between the two. [A-Z] only matches the 26 letters of the English alphabet, whereas [[:upper:]] also matches uppercase letters with diacritics such as Ê (E with a circumflex accent or, as we learned in French class, "e avec un chapeau"). The following example highlights the difference using the demo Oracle emp table:
-- Change the "A" in Martin to an A with an umlaut (Ä)
update emp
set ename = 'MÄRTIN'
where empno = 7654;
-- [A-Z]
select *
from emp
where 1=1
and empno = 7654
and regexp_like(ename, '^[A-Z]+$')
;
-- Returns:
/*
No data found
*/
-- [[:upper:]]
select ename
from emp
where 1=1
and empno = 7654
and regexp_like(ename, '^[[:upper:]]+$')
;
-- Returns:
/*
ENAME
------
MÄRTIN
*/
-- Look at the underlying byte values
select ename, dump(ename)
from emp
where empno = 7654
;
/*
ENAME DUMP(ENAME)
------ -----------------------------------
MÄRTIN Typ=1 Len=7: 77,195,132,82,84,73,78
*/
-- You can see the second character (bytes 195,132) falls outside the ASCII A-Z range (65-90)
-- Reset
update emp
set ename = 'MARTIN'
where empno = 7654
;
As you can see, the results are different: [[:upper:]] matched the special character. The following description from this Stack Overflow post highlights the difference:
[A-Z] matches only an ASCII uppercase letter, that is, a letter from A through Z. There are other, non-ASCII uppercase letters (e.g., in languages other than English).
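The same comparison can be tried without updating any table by testing string literals against dual. This is only a sketch: it assumes a Unicode database character set (such as AL32UTF8) and the default binary sort, and the column aliases are made up for illustration. It also includes the [[:lower:]] vs [a-z] pair mentioned in the note at the top.

```sql
-- Side-by-side comparison against string literals (no table changes needed).
-- Assumption: Unicode database character set and default (binary) NLS_SORT.
-- Column aliases are illustrative only.
select
  case when regexp_like('MÄRTIN', '^[A-Z]+$')       then 'match' else 'no match' end as upper_range,
  case when regexp_like('MÄRTIN', '^[[:upper:]]+$') then 'match' else 'no match' end as upper_class,
  case when regexp_like('märtin', '^[a-z]+$')       then 'match' else 'no match' end as lower_range,
  case when regexp_like('märtin', '^[[:lower:]]+$') then 'match' else 'no match' end as lower_class
from dual
;
```

Going by the emp results earlier in the article, the two range columns should report no match and the two class columns should report a match, though the outcome may vary with your NLS settings.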
If you use regular expressions in your code, do not change everything from [A-Z] to [[:upper:]] without consideration. There are times when you may want to keep [A-Z] in place (for example, lookup codes). I tend to use [[:upper:]] for user-entered fields when it makes sense.
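Before switching a pattern, it may help to see which existing rows would be treated differently by the two patterns. The query below is a rough audit sketch against the demo emp table used above. It lists names made up entirely of uppercase letters where at least one letter falls outside A-Z. Adapt the table and column names to your own schema.

```sql
-- Audit sketch: rows that pass the POSIX class check but fail the ASCII range check.
-- Assumption: run against the demo emp table; swap in your own table and column.
select empno, ename
from emp
where 1=1
and regexp_like(ename, '^[[:upper:]]+$')
and not regexp_like(ename, '^[A-Z]+$')
;
```

If this returns no rows, the two patterns currently behave the same for that column, and the choice only matters for future data.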