Oracle中的varchar2类型及其字符串截取

1、varchar2类型介绍

1.1 Oracle数据库常用的数据类型

介绍varchar2类型的同时,我们顺便了解下Oracle数据库支持的其他数据类型。
如下摘自Oracle官方文档Built-in Data Type Summary,简单介绍了Oracle中常用的数据类型。

Code Data Type Description Memo
1 VARCHAR2(size [BYTE | CHAR]) Variable-length character string having maximum length size bytes or characters. Maximum size is 4000 bytes or characters, and minimum is 1 byte or 1 character. You must specify size for VARCHAR2.BYTE indicates that the column will have byte length semantics. CHAR indicates that the column will have character semantics. varchar2中指定的长度参数,指的是最多可以存储的长度,实际存储超过这个长度的内容会报错。类型中var的意思是说,存储小于该长度的内容时,实际占用的存储根据具体存储内容长度而定。比如定义varchar2(10)实际存储内容为3就占用长度为3的存储空间。相比于定长的char,可以节省存储空间。
1 NVARCHAR2(size) Variable-length Unicode character string having maximum length size characters. The number of bytes can be up to two times size for AL16UTF16 encoding and three times size for UTF8 encoding. Maximum size is determined by the national character set definition, with an upper limit of 4000 bytes. You must specify size for NVARCHAR2. 主要是用来存储Unicode字符的
2 NUMBER [ (p [, s]) ] Number having precision p and scale s. The precision p can range from 1 to 38. The scale s can range from -84 to 127. Both precision and scale are in decimal digits. A NUMBER value requires from 1 to 22 bytes.
2 FLOAT [(p)] A subtype of the NUMBER data type having precision p. A FLOAT value is represented internally as NUMBER. The precision p can range from 1 to 126 binary digits. A FLOAT value requires from 1 to 22 bytes.
8 LONG Character data of variable length up to 2 gigabytes, or 231 -1 bytes. Provided for backward compatibility.
12 DATE Valid date range from January 1, 4712 BC, to December 31, 9999 AD. The default format is determined explicitly by the NLS_DATE_FORMAT parameter or implicitly by the NLS_TERRITORY parameter. The size is fixed at 7 bytes. This data type contains the datetime fields YEAR, MONTH, DAY, HOUR, MINUTE, and SECOND. It does not have fractional seconds or a time zone. 表示的日期范围可以是公元前4712年1月1日至公元9999年12月31日。You can add and subtract constants to and from a DATE value, and these numbers will be interpreted as numbers of days. For example, SYSDATE+1 will be tomorrow. You cannot multiply or divide DATE values.
100 BINARY_FLOAT 32-bit floating point number. This data type requires 4 bytes.
101 BINARY_DOUBLE 64-bit floating point number. This data type requires 8 bytes.
180 TIMESTAMP [(fractional_seconds_precision)] Year, month, and day values of date, as well as hour, minute, and second values of time, where fractional_seconds_precision is the number of digits in the fractional part of the SECOND datetime field. Accepted values of fractional_seconds_precision are 0 to 9. The default is 6. The default format is determined explicitly by the NLS_TIMESTAMP_FORMAT parameter or implicitly by the NLS_TERRITORY parameter. The size is 7 or 11 bytes, depending on the precision. This data type contains the datetime fields YEAR, MONTH, DAY, HOUR, MINUTE, and SECOND. It contains fractional seconds but does not have a time zone.
181 TIMESTAMP [(fractional_seconds_precision)] WITH TIME ZONE All values of TIMESTAMP as well as time zone displacement value, where fractional_seconds_precision is the number of digits in the fractional part of the SECOND datetime field. Accepted values are 0 to 9. The default is 6. The default format is determined explicitly by the NLS_TIMESTAMP_FORMAT parameter or implicitly by the NLS_TERRITORY parameter. The size is fixed at 13 bytes. This data type contains the datetime fields YEAR, MONTH, DAY, HOUR, MINUTE, SECOND, TIMEZONE_HOUR, and TIMEZONE_MINUTE. It has fractional seconds and an explicit time zone.
231 TIMESTAMP [(fractional_seconds_precision)] WITH LOCAL TIME ZONE All values of TIMESTAMP WITH TIME ZONE, with the following exceptions: Data is normalized to the database time zone when it is stored in the database. When the data is retrieved, users see the data in the session time zone. The default format is determined explicitly by the NLS_TIMESTAMP_FORMAT parameter or implicitly by the NLS_TERRITORY parameter. The size is 7 or 11 bytes, depending on the precision.
182 INTERVAL YEAR [(year_precision)] TO MONTH Stores a period of time in years and months, where year_precision is the number of digits in the YEAR datetime field. Accepted values are 0 to 9. The default is 2. The size is fixed at 5 bytes.
183 INTERVAL DAY [(day_precision)] TO SECOND [(fractional_seconds_precision)] Stores a period of time in days, hours, minutes, and seconds, where day_precision is the maximum number of digits in the DAY datetime field. Accepted values are 0 to 9. The default is 2. fractional_seconds_precision is the number of digits in the fractional part of the SECOND field. Accepted values are 0 to 9. The default is 6. The size is fixed at 11 bytes.
23 RAW(size) Raw binary data of length size bytes. Maximum size is 2000 bytes. You must specify size for a RAW value.
24 LONG RAW Raw binary data of variable length up to 2 gigabytes.
69 ROWID Base 64 string representing the unique address of a row in its table. This data type is primarily for values returned by the ROWID pseudocolumn.
208 UROWID [(size)] Base 64 string representing the logical address of a row of an index-organized table. The optional size is the size of a column of type UROWID. The maximum size and default is 4000 bytes.
96 CHAR [(size [BYTE | CHAR])] Fixed-length character data of length size bytes or characters. Maximum size is 2000 bytes or characters. Default and minimum size is 1 byte. BYTE and CHAR have the same semantics as for VARCHAR2.
96 NCHAR[(size)] Fixed-length character data of length size characters. The number of bytes can be up to two times size for AL16UTF16 encoding and three times size for UTF8 encoding. Maximum size is determined by the national character set definition, with an upper limit of 2000 bytes. Default and minimum size is 1 character.
112 CLOB A character large object containing single-byte or multibyte characters. Both fixed-width and variable-width character sets are supported, both using the database character set. Maximum size is (4 gigabytes - 1) * (database block size).
112 NCLOB A character large object containing Unicode characters. Both fixed-width and variable-width character sets are supported, both using the database national character set. Maximum size is (4 gigabytes - 1) * (database block size). Stores national character set data.
113 BLOB A binary large object. Maximum size is (4 gigabytes - 1) * (database block size).
114 BFILE Contains a locator to a large binary file stored outside the database. Enables byte stream I/O access to external LOBs residing on the database server. Maximum size is 4 gigabytes.

需要注意的是,如果在声明varchar2类型长度的时候,没有显式指定是BYTE还是CHAR,缺省的情况取决于数据库参数NLS_LENGTH_SEMANTICS。具体可以通过如下sql查询,如下是一个例子:

select name,value from v$parameter where name='nls_length_semantics';

结果:

NAME VALUE
nls_length_semantics BYTE

在这个例子中,缺省的情况,就是按照字节长度来计算。

1.2 Oracle中varchar2和varchar的区别

截止现在,varchar和varchar2是没有区别的。
按照ANSI的标准,varchar在Oracle中是被保留的,用来后续支持区分NULL和空字符串。VARCHAR2现在不区分NULL和空字符串,以后也不会区分。如果程序对于NULL和空字符串等同有依赖,就应该使用varchar2类型。

2、Oracle中截取字符串

2.1 substr函数

官网说明如下:

{ SUBSTR
| SUBSTRB
| SUBSTRC
| SUBSTR2
| SUBSTR4
}
(char, position [, substring_length ])

The SUBSTR functions return a portion of char, beginning at character position, substring_length characters long. SUBSTR calculates lengths using characters as defined by the input character set. SUBSTRB uses bytes instead of characters. SUBSTRC uses Unicode complete characters. SUBSTR2 uses UCS2 code points. SUBSTR4 uses UCS4 code points.
If position is 0, then it is treated as 1.
If position is positive, then Oracle Database counts from the beginning of char to find the first character.
If position is negative, then Oracle counts backward from the end of char.
If substring_length is omitted, then Oracle returns all characters to the end of char. If substring_length is less than 1, then Oracle returns null.
char can be any of the datatypes CHAR, VARCHAR2, NCHAR, NVARCHAR2, CLOB, or NCLOB. Both position and substring_length must be of datatype NUMBER, or any datatype that can be implicitly converted to NUMBER, and must resolve to an integer. The return value is the same datatype as char. Floating-point numbers passed as arguments to SUBSTR are automatically converted to integers.
用法介绍:

substr 函数:截取字符串
语法:SUBSTR(string,start, [length])
string:表示源字符串,即要截取的字符串。
start:开始位置,从1开始查找。如果start是负数,则从string字符串末尾开始算起。
length:可选项,表示截取字符串长度。

下面是几个例子:

SELECT '截取所有字符串' func, 'SUBSTR(''Hello SQL!'', 1)' expr, SUBSTR('Hello SQL!', 1) expr_val FROM dual union all
SELECT '从第2个字符开始,截取到末尾' func, 'SUBSTR(''Hello SQL!'', 2)' expr, SUBSTR('Hello SQL!', 2) expr_val FROM dual union all
SELECT '从倒数第4个字符开始,截取到末尾' func, 'SUBSTR(''Hello SQL!'', -4)' expr, SUBSTR('Hello SQL!', -4) expr_val FROM dual union all
SELECT '从第3个字符开始,截取6个字符' func, 'SUBSTR(''Hello SQL!'', 3, 6)' expr, SUBSTR('Hello SQL!', 3, 6) expr_val FROM dual union all
SELECT '从倒数第4个字符开始,截取3个字符' func, 'SUBSTR(''Hello SQL!'', -4, 3)' expr, SUBSTR('Hello SQL!', -4, 3) expr_val FROM dual union all
SELECT '从截取开始位置超过字符串长度' func, 'SUBSTR(''Hello SQL!'', 20, 3)' expr, SUBSTR('Hello SQL!', 20, 3) expr_val FROM dual

结果如下:


substr例子

注:上图中的-表示空。

2.2 根据指定字符出现的位置截取字符串

这里需要用到oracle中的instr函数,用法如下。

instr 函数:返回子字符串在源字符串中的位置
语法:INSTR(string,child_string,[start],[show_time])
string:表示源字符串。
child_string:子字符串,即要查找的字符串。
start:可选项,开始位置,默认从1开始。如果为负数,则从右向左检索。
show_time:可选项,表示子字符串第几次出现在源字符串当中,默认第1次,负数则报错。
返回指定字符出现的位置从1开始,如果找不到返回0

下面是几个例子:

SELECT '查找字符串中l出现的位置' func, 'INSTR(''Hello SQL!'', ''l'')' expr, INSTR('Hello SQL!', 'l') expr_val FROM dual union all
SELECT '查找字符串中H出现的位置' func, 'INSTR(''Hello SQL!'', ''H'')' expr, INSTR('Hello SQL!', 'H') expr_val FROM dual union all
SELECT '查找字符串中l出现的位置' func, 'INSTR(''Hello SQL!'', ''l'', 1)' expr, INSTR('Hello SQL!', 'l', 1) expr_val FROM dual union all
SELECT '查找字符串中ll出现的位置' func, 'INSTR(''Hello SQL!'', ''ll'')' expr, INSTR('Hello SQL!', 'll') expr_val FROM dual union all
SELECT '查找字符串中aa出现的位置' func, 'INSTR(''Hello SQL!'', ''aa'')' expr, INSTR('Hello SQL!', 'aa') expr_val FROM dual union all
SELECT '查找字符串中l第2次出现的位置' func, 'INSTR(''Hello SQL!'', ''l'', 1, 2)' expr, INSTR('Hello SQL!', 'l', 1, 2) expr_val FROM dual

结果如下:


instr例子

结合instr比如我们想截取某个字符出现位置之前的字符,下面是一个例子。

-- 截取空格前的内容
SELECT SUBSTR('Hello SQL!', 1, INSTR('Hello SQL!', ' ')-1) FROM dual --Hello
-- 截取空格后的内容
SELECT SUBSTR('Hello SQL!', INSTR('Hello SQL!', ' ')+1) FROM dual --SQL!

2.3 substrb按字节截取字符串

2.3.1 Oracle数据中文占几个字节

可以通过查看如下参数(查询当前数据库的字符集)来了解当前的oracle环境中一个汉字占几个字节:

SELECT * FROM v$nls_parameters WHERE PARAMETER='NLS_CHARACTERSET';

如果value=ZHS16GBK,那么一个汉字占用2个字节;如果value=AL32UTF8,那么一个汉字占用3个字节。
另外一个参数也可以:

select userenv('language') from dual;

结果如下:

USERENV('LANGUAGE')
AMERICAN_AMERICA.AL32UTF8

这里AL32UTF8说明当前环境下,一个汉字占3个字节。
实际上,也可以通过语句进行测试,如下:

select 'length(''美喵泡泡'')' expr, length('美喵泡泡') expr_val from dual union all
select 'lengthb(''美喵泡泡'')' expr, lengthb('美喵泡泡') expr_val from dual 

结果如下:

expr expr_val
length('美喵泡泡') 4
lengthb('美喵泡泡') 12

2.3.2 Oracle多字节数据截取

如下是几个跨字节截取的例子:

select 'dump(''美喵泡泡'')' expr, dump('美喵泡泡') expr_val from dual union all
select 'substrb(''美喵泡泡'', 1, 3)' expr, substrb('美喵泡泡', 1, 3) expr_val from dual union all
select 'dump(substrb(''美喵泡泡'', 1, 3))' expr, dump(substrb('美喵泡泡', 1, 3)) expr_val from dual union all
select 'substrb(''美喵泡泡'', 1, 5)' expr, substrb('美喵泡泡', 1, 5) expr_val from dual union all
select 'dump(substrb(''美喵泡泡'', 1, 5))' expr, dump(substrb('美喵泡泡', 1, 5)) from dual union all
select 'substr(substrb(''美喵泡泡'', 1, 5), 1, 2)' expr, substr(substrb('美喵泡泡', 1, 5), 1, 2) expr_val from dual union all
select 'dump(substr(substrb(''美喵泡泡'', 1, 5), 1, 2))' expr, dump(substr(substrb('美喵泡泡', 1, 5), 1, 2)) expr_val from dual union all
select 'substr(substrb(''美喵泡泡'', 1, 5), 1, 3)' expr, substr(substrb('美喵泡泡', 1, 5), 1, 3) expr_val from dual union all
select 'dump(substr(substrb(''美喵泡泡'', 1, 5), 1, 3))' expr, dump(substr(substrb('美喵泡泡', 1, 5), 1, 3)) expr_val from dual union all
select 'substr(substrb(''美喵泡泡'', 1, 5), 1, 4)' expr, substr(substrb('美喵泡泡', 1, 5), 1, 4) expr_val from dual union all
select 'dump(substr(substrb(''美喵泡泡'', 1, 5), 1, 4))' expr, dump(substr(substrb('美喵泡泡', 1, 5), 1, 4)) expr_val from dual

结果如下:


按字节截取例子

这个地方还是需要注意的,按照字节截取的时候,落单的两个字节会被oracle识别为两个字符。

参考资料

最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 213,616评论 6 492
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 91,020评论 3 387
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 159,078评论 0 349
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 57,040评论 1 285
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 66,154评论 6 385
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 50,265评论 1 292
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 39,298评论 3 412
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 38,072评论 0 268
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 44,491评论 1 306
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 36,795评论 2 328
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 38,970评论 1 341
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 34,654评论 4 337
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 40,272评论 3 318
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 30,985评论 0 21
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 32,223评论 1 267
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 46,815评论 2 365
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 43,852评论 2 351