This post is basically a collection of resources.
1. What are URI, URN and URL?
A URI is a uniform resource identifier. A URL is a uniform resource locator: a URI that, besides identifying a resource, also says how to locate it. URNs name resources but do not specify how to locate them. The mailto, news, and isbn URIs shown in the Examples section below are examples of URNs. URLs and URNs are both subsets of URIs, but normally URNs and URLs don't intersect. Refer to RFC 2396: Uniform Resource Identifiers (URI): Generic Syntax for the syntax of a URI instance. The Java class description of URI also gives a very good summary of URIs, besides providing some useful resources.
2. Syntax
Below is a little breakdown of the syntax of a URI, taken directly from the previously mentioned Java API spec.
At the highest level a URI reference (hereinafter simply "URI") in string form has the syntax:
[scheme:]scheme-specific-part[#fragment]
where square brackets [...] delineate optional components and the characters : and # stand for themselves. With a further breakdown on scheme-specific-part we can get a URI in the string form:
[scheme:][//authority][path][?query][#fragment]
where the characters :, /, ?, and # stand for themselves. With a further breakdown on authority we can get a more general form:
[scheme:][//[user-info@]host[:port]][path][?query][#fragment]
where the characters @ and : stand for themselves.
Note: the scheme of a URL is called its protocol, such as "http" or "ftp".
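To make the syntax breakdown concrete, here is a minimal sketch that splits a URI string into its components with the java.net.URI getters. The class name UriComponents, the helper describe, and the example.com address are made up for illustration; only the URI methods themselves come from the Java API.

```java
import java.net.URI;

class UriComponents {
    // Returns one "name=value" line per component of the given URI string.
    // Components that are not present come back as null (or -1 for the port).
    static String describe(String text) {
        URI uri = URI.create(text); // throws IllegalArgumentException on bad syntax
        return "scheme=" + uri.getScheme() + "\n"
             + "scheme-specific-part=" + uri.getSchemeSpecificPart() + "\n"
             + "authority=" + uri.getAuthority() + "\n"
             + "user-info=" + uri.getUserInfo() + "\n"
             + "host=" + uri.getHost() + "\n"
             + "port=" + uri.getPort() + "\n"
             + "path=" + uri.getPath() + "\n"
             + "query=" + uri.getQuery() + "\n"
             + "fragment=" + uri.getFragment();
    }

    public static void main(String[] args) {
        // Hypothetical URI that exercises every optional component.
        System.out.println(describe("http://user@example.com:8080/docs/guide?lang=en#intro"));
    }
}
```

Passing an opaque URI such as mailto:java-net@java.sun.com to the same helper should leave the authority, host, path, and query as null, since only the scheme and scheme-specific part are defined for it.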
3. Examples
An opaque URI is an absolute URI whose scheme-specific part does not begin with a slash character ('/'). Opaque URIs are not subject to further parsing. Some examples of opaque URIs are:
mailto:java-net@java.sun.com /*URN*/
news:comp.lang.java /*URN*/
urn:isbn:096139210x /*URN*/
A hierarchical URI is either an absolute URI whose scheme-specific part begins with a slash character, or a relative URI, that is, a URI that does not specify a scheme. Some examples of hierarchical URIs are:
http://java.sun.com/j2se/1.3/ /*URL*/
docs/guide/collections/designfaq.html#28
../../../demo/jfc/SwingSet2/src/SwingSet2.java
file:///~/calendar
URIs are also categorized as absolute or relative. An absolute URI specifies a scheme; a URI that is not absolute is called a relative URI.
In the examples above, comments of the form "/*...*/" mark which URIs are URLs and which are URNs.
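The two classifications can be checked directly with URI.isOpaque() and URI.isAbsolute(). The sketch below runs them over the example URIs from this section; the class name UriKinds and the helper classify are illustrative names, not part of the Java API.

```java
import java.net.URI;

class UriKinds {
    // Labels a URI string as absolute/relative and opaque/hierarchical.
    static String classify(String text) {
        URI uri = URI.create(text);
        String form = uri.isAbsolute() ? "absolute" : "relative";
        String kind = uri.isOpaque() ? "opaque" : "hierarchical";
        return form + ", " + kind;
    }

    public static void main(String[] args) {
        String[] examples = {
            "mailto:java-net@java.sun.com",
            "news:comp.lang.java",
            "urn:isbn:096139210x",
            "http://java.sun.com/j2se/1.3/",
            "docs/guide/collections/designfaq.html#28",
            "../../../demo/jfc/SwingSet2/src/SwingSet2.java",
            "file:///~/calendar"
        };
        for (String example : examples) {
            System.out.println(example + " -> " + classify(example));
        }
    }
}
```

Note that a relative URI is never opaque: without a scheme, isOpaque() returns false, which matches the definition of hierarchical URIs given above.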
4. Character categories
The following is quoted from the resources listed at the end of this post, keeping only the parts that matter:
RFC 2396 specifies precisely which characters are permitted in the various components of a URI reference. The following categories, most of which are taken from that specification, are used below to describe these constraints:
| Category | Characters |
|---|---|
| alpha | The US-ASCII alphabetic characters, 'A' through 'Z' and 'a' through 'z' |
| digit | The US-ASCII decimal digit characters, '0' through '9' |
| alphanum | All alpha and digit characters |
| unreserved | All alphanum characters together with those in the string "_-!.~'()*" |
| punct | The characters in the string ",;:$&+=" |
| reserved | All punct characters together with those in the string "?/[]@" |
| escaped | Escaped octets, that is, triplets consisting of the percent character ('%') followed by two hexadecimal digits ('0'-'9', 'A'-'F', and 'a'-'f') |
| other | The Unicode characters that are not in the US-ASCII character set, are not control characters (according to the Character.isISOControl method), and are not space characters (according to the Character.isSpaceChar method) (Deviation from RFC 2396, which is limited to US-ASCII) |
The set of all legal URI characters consists of the unreserved, reserved, escaped, and other characters.
Escaped octets, quotation, encoding, and decoding
RFC 2396 allows escaped octets to appear in the user-info, path, query, and fragment components. Escaping serves two purposes in URIs:
- To encode non-US-ASCII characters when a URI is required to conform strictly to RFC 2396 by not containing any other characters.
- To quote characters that are otherwise illegal in a component. The user-info, path, query, and fragment components differ slightly in terms of which characters are considered legal and illegal.
These purposes are served in this class by three related operations:
- A character is encoded by replacing it with the sequence of escaped octets that represent that character in the UTF-8 character set. The Euro currency symbol ('\u20AC'), for example, is encoded as "%E2%82%AC". (Deviation from RFC 2396, which does not specify any particular character set.)
- An illegal character is quoted simply by encoding it. The space character, for example, is quoted by replacing it with "%20". UTF-8 contains US-ASCII, hence for US-ASCII characters this transformation has exactly the effect required by RFC 2396.
- A sequence of escaped octets is decoded by replacing it with the sequence of characters that it represents in the UTF-8 character set. UTF-8 contains US-ASCII, hence decoding has the effect of de-quoting any quoted US-ASCII characters as well as that of decoding any encoded non-US-ASCII characters. If a decoding error occurs when decoding the escaped octets then the erroneous octets are replaced by '\uFFFD', the Unicode replacement character.
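The three operations above can be observed through java.net.URI itself. In this minimal sketch the multi-argument constructor quotes the space, toASCIIString() additionally encodes the Euro sign, and getPath() decodes the result again. The class name UriEscaping, the helper build, and the example.com host are assumptions made for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

class UriEscaping {
    // Builds a URI whose path contains a space (illegal in a path)
    // and the Euro sign '\u20AC' (a legal "other" character).
    static URI build() {
        try {
            return new URI("http", "example.com", "/a b/\u20AC", null);
        } catch (URISyntaxException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        URI uri = build();
        // Quoting: the space becomes %20, the Euro sign is preserved.
        System.out.println(uri.getRawPath());
        // Encoding: the Euro sign becomes %E2%82%AC as well.
        System.out.println(uri.toASCIIString()); // http://example.com/a%20b/%E2%82%AC
        // Decoding: escaped octets are turned back into characters.
        URI parsed = URI.create("http://example.com/a%20b/%E2%82%AC");
        System.out.println(parsed.getPath().equals(uri.getPath())); // true
    }
}
```

The getRaw* accessors return components exactly as they appear in the URI string, while the plain getters decode escaped octets, so the choice between them depends on whether the value is going back into a URI or being shown to a user.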
5. Java Impl.
The relevant classes are URI, URL, URLClassLoader, URLConnection, URLDecoder, URLEncoder and URLStreamHandler, all located in package java.net.
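One point worth knowing about the classes listed above: URLEncoder and URLDecoder implement the application/x-www-form-urlencoded format used for HTML form data, which is not the same as the URI escaping described in section 4 (a space becomes "+", not "%20"). A minimal sketch, with FormEncoding and its two helpers as illustrative names:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class FormEncoding {
    // Encodes a form value as application/x-www-form-urlencoded using UTF-8.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Reverses encode(): '+' becomes a space, %XX octets are decoded as UTF-8.
    static String decode(String value) {
        try {
            return URLDecoder.decode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String encoded = encode("a b&c=d");
        System.out.println(encoded);         // a+b%26c%3Dd
        System.out.println(decode(encoded)); // a b&c=d
    }
}
```

For building the path or fragment of a URI, the multi-argument URI constructors are usually the better fit, since they apply the per-component quoting rules instead of the form-data rules.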
Resources:
- RFC 2396: Uniform Resource Identifiers (URI): Generic Syntax
- RFC 2732: Format for Literal IPv6 Addresses in URL's
- Java™ 2 Platform Standard Edition 5.0 API Specification