A little bit more...

Tuesday, December 26, 2006

A Brief Summary of Compiler Principles (Beta)

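I did quite poorly in compiler principles, even though it is a fairly core course in my field. As with many other courses, I never got a top grade in the exam, yet I am genuinely interested in the material (probably nobody believes that: interested, but did poorly? Who would believe it?!). So I revisit it from time to time, summarize the ideas (only part of them for now; I will add the rest as I come across them) and write them down, so that next time I can get back up to speed faster. Corrections are welcome; this is essentially a set of study notes.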

1. Concepts and Fundamentals

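Programming languages can be divided into compiled languages (also called static, semi-static or semi-dynamic languages, such as C/C++) and interpreted languages (also called dynamic languages, such as C#, Java and most scripting languages), and "compilation" does somewhat different work for the two. (Strictly speaking, C# and Java are compiled to an intermediate bytecode that a virtual machine then interprets or JIT-compiles; that is why they are grouped with the interpreted languages here.) Running high-level languages on an interpreter or virtual machine is a trend, and modern object-oriented languages tend to be, or will increasingly be, designed that way: dynamism costs some performance, but as computers get faster we are usually willing to trade performance for the flexibility of dynamic languages. Below I try to describe the compilation process for both kinds of language, pointing out where a step applies to only one of them.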

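An amusing aside that may help you understand compilation right away: the English word "compile" is used not only in computing but also in publishing, where you compile material into a book and then publish it. So to compile essentially means to gather scattered pieces (your own source files, language libraries, etc.) into one ordered, usable whole (a target program).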

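Compilation is generally divided into the following stages: preprocessing (sometimes folded into lexical analysis); lexical analysis (the scanner); syntax analysis; semantic analysis; code optimization; and target code generation. Two data structures typically interact with all of these stages: the symbol table and the error handler. Each stage usually appears as a corresponding module in the compiler. Linking and loading are also related to compilation. For interpreted languages, however, the up-front compilation step covers fewer of these stages (in the simplest case it stops after syntax analysis, while Java's javac also performs semantic checks and emits bytecode), and the remaining work, including linking, loading, and the interaction with the symbol table and error handler, happens at execution time.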

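A compiler is usually structured as a pipeline (the pipeline design pattern). Think of how pipes are used on the bash command line; the difference is that a compiler has the notion of a "pass", one complete traversal of the source or of an intermediate representation.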

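To promote reuse and ease of construction, compiler technology splits compilation into a front end and a back end. There is also a body of theory (such as bootstrapping) that guides how to use one language to produce a compiler for another, for example how to write the compiler for a language in that same language.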

2. Preprocessing and Lexical Analysis

These two stages are broadly similar for compiled and interpreted languages.

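Preprocessing essentially cleans up and tidies the source code, for example by removing comments and expanding macros (or equivalent inline constructs). Modern programming languages tend to move as much work as possible into the early stages of compilation (as long as theory allows it and it does not disturb the overall compiler architecture); for instance, the preprocessor may check the spelling of reserved words and catch simple syntax errors.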
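A toy preprocessor makes the two jobs named above concrete. This is a minimal sketch under simplifying assumptions: the class `ToyPreprocessor` is hypothetical, it only understands `//` and `/* */` comments and object-like `#define NAME value` macros, and comment markers inside string literals are not handled. It is not how a real C preprocessor is implemented.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical toy preprocessor: strips comments, then expands
// object-like "#define NAME value" macros.
class ToyPreprocessor {
    private static final Pattern DEFINE =
            Pattern.compile("^\\s*#define\\s+(\\w+)\\s+(.*)$");

    static String preprocess(String source) {
        // Step 1: remove /* ... */ block comments (across lines) and // line comments.
        String noComments = source.replaceAll("(?s)/\\*.*?\\*/", "")
                                  .replaceAll("//[^\\n]*", "");

        Map<String, String> macros = new LinkedHashMap<>();
        StringBuilder out = new StringBuilder();
        for (String line : noComments.split("\n")) {
            Matcher m = DEFINE.matcher(line);
            if (m.matches()) {
                // Step 2a: remember the macro; the directive itself produces no output.
                macros.put(m.group(1), m.group(2).trim());
                continue;
            }
            // Step 2b: replace each known macro name that appears as a whole word.
            for (Map.Entry<String, String> e : macros.entrySet()) {
                line = line.replaceAll("\\b" + Pattern.quote(e.getKey()) + "\\b",
                                       Matcher.quoteReplacement(e.getValue()));
            }
            out.append(line).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String src = "#define SIZE 10\nint a[SIZE]; // array\n/* note */int b = SIZE;\n";
        System.out.print(preprocess(src));
    }
}
```

Running `main` should print `int a[10]; ` and `int b = 10;` on two lines: the comments are gone and `SIZE` has been replaced before any tokenizing happens.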

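As for comments, modern programming languages increasingly favor comments whose semantics can be recognized by tools, such as Java's Javadoc format and annotations. These comments form a language of their own, and some of them (such as annotations) can supplement or constrain the semantics of the code they annotate.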

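Lexical analysis mainly replaces the keywords, identifiers, constants and so on that appear in the source code with an intermediate representation (tokens). This is also the stage where the symbol table and the error handler are first set up.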
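To illustrate how a scanner replaces source text with tokens while filling a symbol table and reporting errors, here is a hand-written lexer for a made-up C-like fragment. The class `ToyLexer`, its token spellings and its keyword set are all illustrative assumptions, not any real compiler's design.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical hand-written scanner for a tiny C-like fragment.
// Tokens are plain strings such as "KW(int)", "ID#0", "NUM(42)", "OP(=)".
class ToyLexer {
    private static final Set<String> KEYWORDS = Set.of("int", "if", "while", "return");
    private static final String OPERATORS = "+-*/=;(){}<>";

    // Symbol table: identifier -> index, assigned in order of first appearance.
    final Map<String, Integer> symbols = new LinkedHashMap<>();

    List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            int start = i;
            if (Character.isWhitespace(c)) {
                i++;                                   // whitespace only separates tokens
            } else if (Character.isLetter(c) || c == '_') {
                while (i < src.length()
                        && (Character.isLetterOrDigit(src.charAt(i)) || src.charAt(i) == '_')) {
                    i++;
                }
                String word = src.substring(start, i);
                if (KEYWORDS.contains(word)) {
                    tokens.add("KW(" + word + ")");
                } else {
                    // Enter the identifier once; the token just refers to its table slot.
                    symbols.putIfAbsent(word, symbols.size());
                    tokens.add("ID#" + symbols.get(word));
                }
            } else if (Character.isDigit(c)) {
                while (i < src.length() && Character.isDigit(src.charAt(i))) {
                    i++;
                }
                tokens.add("NUM(" + src.substring(start, i) + ")");
            } else if (OPERATORS.indexOf(c) >= 0) {
                tokens.add("OP(" + c + ")");
                i++;
            } else {
                // Minimal "error handler": report the offending character and its offset.
                throw new IllegalArgumentException("unexpected '" + c + "' at offset " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        ToyLexer lexer = new ToyLexer();
        System.out.println(lexer.tokenize("int x = y + 42;"));
        System.out.println(lexer.symbols);
    }
}
```

For `int x = y + 42;` this prints `[KW(int), ID#0, OP(=), ID#1, OP(+), NUM(42), OP(;)]` and `{x=0, y=1}`: later stages work with the token list and look names up in the table instead of re-reading the raw text.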

3. Syntax Analysis

4. Semantic Analysis

5. Code Optimization

6. Target Code Generation

7. Loading

8. Compiling, Loading and Executing Interpreted Languages

Below I describe the compilation process of interpreted languages, roughly taking Java as the representative.

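Running javac turns Java source files into binary class files; this is the whole of Java's so-called compilation step (javac parses and type-checks the source, then emits bytecode). A class file can be understood as an intermediate representation whose format is defined by the JVM specification, and each class file contains a constant pool.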

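When you run a Java program (a class file) with the java command, the JVM takes over the entire process, from the class file to a running program that accepts user input.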

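The JVM first uses its built-in bootstrap class loader to load the core classes it needs (from rt.jar, i18n.jar, etc.); this step is the same for every Java program. Next, the extension class loader loads the classes in the JAVA_HOME\lib\ext directory (it is a separate loader: its parent is the bootstrap loader, and it is in turn the parent of the system class loader). Finally, the system class loader loads the classes on the CLASSPATH, in a defined order and as they are needed. You can see the loading process in detail by running a Java program with the following option: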

Setting the parameter -verbose:class on the java command line prints a trace of the class loading process.
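The loader hierarchy can also be observed from inside a program. The sketch below (the class name `LoaderChain` is hypothetical) walks `ClassLoader.getParent()` starting from a class's defining loader. The bootstrap loader is represented by `null`, which is why a core class such as `String` reports no loader at all. The concrete loader class names depend on the JDK: on Java 8 and earlier you should see the application and extension loaders, while from Java 9 on the extension mechanism was removed and a platform class loader takes its place.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: list the loader chain from a class's defining loader up to bootstrap.
class LoaderChain {
    static List<String> chain(Class<?> c) {
        List<String> names = new ArrayList<>();
        for (ClassLoader l = c.getClassLoader(); l != null; l = l.getParent()) {
            names.add(l.getClass().getName());
        }
        names.add("<bootstrap>");   // getClassLoader()/getParent() return null for bootstrap
        return names;
    }

    public static void main(String[] args) {
        System.out.println("String:      " + chain(String.class));
        System.out.println("LoaderChain: " + chain(LoaderChain.class));
    }
}
```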

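For the full class loader hierarchy, see the corresponding articles in Resources. Each loaded class is represented by an instance of the Class class; these Class instances carry everything the JVM knows internally about the class. A class loaded by a given loader is visible to that loader and to all of its child loaders, but not to other loaders. In other words, if one of those other loaders also loads the same class, the JVM treats the two resulting Class instances as two different classes.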

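The step after loading is linking, which consists of verification (checking that the loaded class is well-formed), preparation (allocating space for static storage and for JVM-internal data structures such as the method table, comparable to a C++ vtable), and resolution (resolving the classes referenced by a class, which may in turn trigger recursive loading and linking).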

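Next comes initialization, which runs the initializers of static variables and the static initializer blocks (static {}). After that, new instances of classes are created and the program enters its running state.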
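The separation between loading and initialization can be seen with `Class.forName(name, false, loader)`, which loads a class without initializing it; its static initializers run only at the first active use. A minimal sketch (the names `InitDemo` and `Lazy` are hypothetical). Because a class is initialized at most once per loader, the trace is computed once and cached.

```java
// Hypothetical demo: loading a class does not run its static initializers;
// the first active use (here, reading a static field) does.
class InitDemo {
    static final StringBuilder LOG = new StringBuilder();
    private static String result;   // initialization happens once, so cache the trace

    static class Lazy {
        static int value = compute();                 // static field initializer
        static { LOG.append("static-block;"); }       // static initializer block

        private static int compute() {
            LOG.append("field-initializer;");
            return 42;
        }
    }

    static synchronized String run() throws ClassNotFoundException {
        if (result == null) {
            // Load Lazy without initializing it (initialize = false).
            Class<?> c = Class.forName(InitDemo.class.getName() + "$Lazy",
                                       false, InitDemo.class.getClassLoader());
            LOG.append("loaded ").append(c.getSimpleName()).append(';');
            int v = Lazy.value;                        // first active use: initialization runs now
            LOG.append("value=").append(v).append(';');
            result = LOG.toString();
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run());
    }
}
```

`run()` should return `loaded Lazy;field-initializer;static-block;value=42;`: the class is already loaded before any static code runs, and the field initializer and the static block then execute in textual order.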

Below is a very loose analogy between this process and the compilation of a compiled language, to make it easier to understand.

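Loading classes, creating their Class instances and allocating space during initialization is perhaps a little like code generation in a compiled language: it produces something runnable, except that the result is never persisted to disk. Verification during linking re-checks, at the bytecode level, part of what syntax and semantic analysis check in a compiler (well-formedness, type safety), so in a sense the order is reversed compared with a compiled language. Finally, creating class instances is somewhat like loading a binary program file (comparable to the Class instances) and running it: the JVM likewise creates a frame for every method invocation, similar to an activation record, holding the method's local variables, its operand stack and a reference to the run-time constant pool, which the JVM specification describes as supporting dynamic linking.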

(Draft published temporarily; I will keep updating and extending it.)

Resources:

  1. Hou Wenyong and Zhang Dongmo, Compiler Principles, Publishing House of Electronics Industry
  2. The JavaTM Virtual Machine Specification Second Edition
  3. The Java Language Specification, Third Edition
  4. Kingsoft PowerWord (Jinshan Ciba) dictionary
  5. Java programming dynamics, Part 1: Classes and class loading
  6. Understanding Extension Class Loading
  7. Bootstrap class loader


Thursday, December 21, 2006

About Regular Expressions

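Note: The author waives all rights to this post. It is excerpted from other material purely for my own study; please respect the copyright of the original authors.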

1. Basic Concepts

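Regular expressions were first proposed by the mathematician Stephen Kleene in 1956. A regular expression is not a dedicated programming language, but a standard notation that can be used to find and replace text in a file or string. Regular expressions have gone through several stages of development, and the current standard has been approved by ISO (the International Organization for Standardization) and recognized by the Open Group.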

2. Regular Expression Basics

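A regular expression is made up of ordinary characters and metacharacters. Ordinary characters include upper- and lowercase letters and digits, while metacharacters have special meanings.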
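Metacharacter / Description

.
Matches any single character. For example, the regular expression r.t matches the strings "rat", "rut" and "r t", but not "root".

$
Matches the end of a line. For example, the regular expression weasel$ matches the end of the string "He's a weasel", but not the string "They are a bunch of weasels.".

^
Matches the beginning of a line. For example, the regular expression ^When in matches the beginning of the string "When in the course of human events", but not "What and When in the".

*
Matches zero or more occurrences of the character immediately before it. For example, the regular expression .* matches any number of any characters.

\
The quoting (escape) character, used to match the metacharacters listed here as ordinary characters. For example, the regular expression \$ matches a dollar sign rather than the end of a line; likewise, \. matches a period rather than acting as the any-character wildcard.

[ ]
[c1-c2]
[^c1-c2]
Matches any one of the characters inside the brackets. For example, the regular expression r[aou]t matches "rat", "rot" and "rut", but not "ret". A hyphen inside the brackets specifies a range of characters, so [0-9] matches any digit, and several ranges can be combined, as in [A-Za-z], which matches any upper- or lowercase letter. Another important use is exclusion: to match any character outside the given set (its complement), put ^ right after the opening bracket. For example, [^269A-Z] matches any character except 2, 6, 9 and the uppercase letters.

\< \>
Match the beginning (\<) and end (\>) of a word. For example, the regular expression \<the\> matches "the" in the string "for the wise", but not the "the" in "otherwise". Note: not all software supports these metacharacters.

\( \)
Define the expression between \( and \) as a group and save the text it matches in a temporary area (a regular expression can save up to 9 of these), which can then be referenced as \1 through \9.

|
Combines two alternatives with a logical OR. For example, the regular expression (him|her) matches "it belongs to him" and "it belongs to her", but not "it belongs to them.". Note: not all software supports this metacharacter.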
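Java's java.util.regex package accepts most of the syntax in this table, with two differences worth knowing: groups are written with plain parentheses ( ) instead of \( \), and word boundaries use \b instead of \< and \>. The sketch below (the class name `RegexDemo` is hypothetical) replays the table's examples. Each call uses `Matcher.find()`, so it looks for a match anywhere in the input, and backslashes are doubled because they sit inside Java string literals.

```java
import java.util.regex.Pattern;

// Hypothetical demo replaying the metacharacter table with java.util.regex.
class RegexDemo {
    // true if the pattern matches somewhere in the input
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("r.t", "r t"));                        // . any one character: true
        System.out.println(found("weasel$", "He's a weasel"));          // $ end of input: true
        System.out.println(found("^When in", "What and When in the"));  // ^ start of input: false
        System.out.println(found("\\$", "costs $5"));                   // \ literal dollar sign: true
        System.out.println(found("r[aou]t", "ret"));                    // [ ] one of a, o, u: false
        System.out.println(found("^[^269A-Z]$", "7"));                  // [^ ] complement: true
        System.out.println(found("\\bthe\\b", "otherwise"));            // \b word boundary: false
        System.out.println(found("(\\w+) \\1", "it is is"));            // group + back-reference: true
        System.out.println(found("him|her", "it belongs to them."));    // | alternation: false
    }
}
```

Note that in Java `$` and `^` match at the end and start of the whole input by default; compiling with `Pattern.MULTILINE` makes them match at every line, as in the table.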


3. Support in Programming Languages and Environments

4. Examples

Published temporarily; to be continued.

Resources:
  1. Unveiling the Mystery of Regular Expression Syntax

  2. The Tao of Regular Expressions


Monday, November 20, 2006

Basics of Javascript

Note:
My recent posts about the basics or an overview of something
mostly cite selected materials from the sources listed in the Resources section of
each post. They serve only for personal study and learning. If you like, you can
take any part or all of them as desired. It would be my pleasure.

Overview

JavaScript is a scripting language best known for scripting HTML pages in web
browsers. In the official specification its core language is called ECMAScript.

Built-in Features

Datatypes and Values

All numbers in JavaScript are represented as 64-bit floating-point values
(i.e., similar to double in java and C++).

Conversion between strings and numbers can be done in several ways, in both
directions. Numbers are automatically converted to strings when needed, and
strings are likewise converted to numbers.

Numbers to strings:

var n = 100;
var s = n + " bottles of beer.";   // "100 bottles of beer."

var n_as_string = n + "";          // "100"

var string_value = String(n);      // explicit conversion function

string_value = n.toString();       // same result via the toString() method

Strings to numbers:

var product = "21" * "2"; // the number 42

var number = string_value - 0;
// (Note: subtracting zero forces a numeric conversion, whereas adding zero
// to a string value results in string concatenation.)

number = Number(string_value);

// And parseInt(), parseFloat().

In JavaScript, functions are values that can be manipulated
by JavaScript code. This means that functions can be stored in variables, arrays,
and objects, and that they can be passed as arguments to other
functions.

Functions can be defined in three ways:

function square(x) { return x*x; }

var square = function(x) { return x*x; };
// the function name in a function literal is optional

var square = new Function("x", "return x*x");
// awkward, less useful and less efficient

An object is a collection of named values. These named values are usually
referred to as properties of the object. Properties of objects are, in
many ways, just like JavaScript variables; they can contain any type of data,
including arrays, functions, and other objects. Objects in JavaScript can serve
as associative arrays (recall the same concept in Delphi/Pascal, if you
know that language); that is, they can associate arbitrary data values with
arbitrary strings.

image.width
image.height

image["width"]
image["height"]

Arrays may contain any type of JavaScript data, including references to other
arrays or to objects or functions. Also note that JavaScript does not support
multidimensional arrays, except as arrays of arrays. Finally, because JavaScript
is an untyped language, the elements of an array do not all need to be of the
same type, as they do in typed languages like Java.

A corresponding object class is defined for each of the three key primitive
datatypes. That is, besides supporting the number, string, and boolean datatypes,
JavaScript also supports Number, String, and Boolean classes. JavaScript can
flexibly convert values from one type to another. When you use a string in an
object context, i.e., when you try to access a property or method of the string,
JavaScript internally creates a String wrapper object for the string value. Note
that the String object created when you use a string in an object context is a
transient one.

Primitive types are manipulated by value, and reference types, as the name
suggests, are manipulated by reference. Numbers and booleans are primitive types:
they are small, fixed-size values that are easily manipulated at the low levels of
the JavaScript interpreter. Objects, on the other hand, are reference types.
Arrays and functions, which are specialized types of objects, are therefore also
reference types.

Since strings (primitive type, not the wrapper) are immutable in JavaScript,
there is no way to tell whether strings are passed by value or by reference.

Variables

There's no fundamental difference in JavaScript between variables and the
properties of objects.

When the JavaScript interpreter starts up, one of the first things it does,
before executing any JavaScript code, is create a global object. The properties
of this object are the global variables of JavaScript programs. When you declare
a global JavaScript variable, what you are actually doing is defining a property
of the global object.

The JavaScript interpreter initializes the global object with a number of
properties that refer to predefined values and functions. For example, the
Infinity, parseInt, and Math properties refer to the
number infinity, the predefined parseInt( ) function, and the
predefined Math object, respectively.

In top-level code (i.e., JavaScript code that is not part of a function), you
can use the JavaScript keyword this to refer to the global object.

In client-side JavaScript, the Window object serves as the global object for all
JavaScript code contained in the browser window it represents. This global Window
object has a self-referential window property that can be used instead of this
to refer to the global object. The Window object defines the core global
properties, such as parseInt and Math, and also global client-side properties,
such as navigator and screen.

For local variables, while the body of a function is executing, the function
arguments and local variables are stored as properties of another special
object. This object is known as the call object.

Each time the JavaScript interpreter begins to execute a function, it creates
a new execution context for that function. JavaScript code that is not part of
any function, by contrast, runs in an execution context that uses the global
object for variable definitions. A JavaScript implementation may allow multiple
"global" execution contexts. The obvious example is client-side JavaScript, in
which each separate browser window, or each frame within a window, defines a
separate global execution context.

Object Support

ECMAScript does not contain proper classes such as those in C++, Smalltalk,
or Java. An ECMAScript object is an unordered collection of properties each with
zero or more attributes.

It turns out that every JavaScript object includes an internal
reference to another object, known as its prototype
object. All
functions have a prototype property that is automatically created and
initialized when the function is defined. The initial value of the
prototype property is an object with a single property. This property
is named constructor and refers back to the constructor function with
which the prototype is associated.

Property inheritance occurs only when you read property values, not when you
write them. If you set the property p in an object o that inherits that property
from its prototype, what happens is that you create a new property p directly in
o. Now that o has its own property named p, it no longer inherits the value of p
from its prototype.

Navigator Object

The JavaScript navigator object is the object representation of the client web
browser (the "navigator" program) that is being used. It is not itself the
top-level object: it is available as a property of the global Window object
(window.navigator).

DOM Object

Overview

The goal of the W3C DOM group is to define a programmatic interface for XML and
HTML. The DOM is a platform- and language-neutral interface. It is separated into
three parts: Core, HTML, and XML. The Core DOM provides a low-level set of
objects that can represent any structured document.

DOM is being designed at several levels:

  • "Level 1. This concentrates on the actual core, HTML, and XML document
    models. It contains functionality for document navigation and manipulation.

  • Level 2. Includes a style sheet object model, and defines functionality for
    manipulating the style information attached to a document. It also enables
    traversals on the document, defines an event model and provides support for XML
    namespaces.

  • Level 3. Will address document loading and saving, as well as content models
    (such as DTDs and schemas) with document validation support. In addition, it
    will also address document views and formatting, key events and event groups.
    First public working drafts are available.

  • Further Levels. These may specify some interface with the possibly
    underlying window system, including some ways to prompt the user. They may also
    contain a query language interface, and address multithreading and
    synchronization, security, and repository."

Resources

  1. ECMAScript Language Specification 3rd edition

  2. Ajax in Action

  3. The CTDP JavaScript Manual Version 0.6.0, December 31, 2000

  4. W3C Document Object Model (DOM)

  5. DOM objects and methods

  6. JavaScript - The Definitive Guide, 5th Edition


This is a rough draft, published temporarily.

Wednesday, November 15, 2006

Study XSLT Tutorial

Overview

XSL = EXtensible Stylesheet Language (a style sheet language for XML)

XSL consists of three parts:

  • XSLT - a language for transforming XML documents

  • XPath - a language for navigating in XML documents

  • XSL-FO - a language for formatting XML documents


The root element that declares the document to be an XSL style sheet is <xsl:stylesheet> or <xsl:transform>.

Note: <xsl:stylesheet> and <xsl:transform> are completely synonymous and either can be used!

More Color On The Overview
An XSLT style sheet consists of a set of template rules, each of which takes the form "if this condition is encountered in the input, then generate the following output." The order of the rules is immaterial, and there is a conflict-resolution algorithm applied when several rules match the same input. One respect in which XSLT differs from serial text processing languages, however, is that the input is not processed sequentially line by line. Rather, the input XML document is treated as a tree structure, and each template rule is applied to a node in the tree. The template rule itself can decide which nodes to process next, so the input is not necessarily scanned in its original document order. [via]

Use XSL To Transform an XML Document

First declare an XSL document and then define its templates:

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
<body>
<h2>My CD Collection</h2>
<table border="1">
<tr bgcolor="#9acd32">
<th>Title</th>
<th>Artist</th>
</tr>
<xsl:for-each select="catalog/cd">
<tr>
<td><xsl:value-of select="title"/></td>
<td><xsl:value-of select="artist"/></td>
</tr>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>

</xsl:stylesheet>

Then specify the stylesheet in your xml source document, simply like this: <?xml-stylesheet type="text/xsl" href="cdcat.xsl"?>.

The match attribute is used to associate a template with nodes of the XML source document; its value is an XPath expression (strictly speaking, an XSLT pattern). match="/" associates the template with the root of the source document, i.e., the whole document.

In the element <xsl:for-each select="catalog/cd">, "catalog/cd" is matched against the structure of the XML document (case-sensitively; after all, an XSL stylesheet is itself an XML document). The value of the select attribute (a little like "select" in SQL) is an XPath expression.

We can also filter the output from the XML file by adding a criterion to the select attribute of the <xsl:for-each> element:

<xsl:for-each select="catalog/cd[artist='Bob Dylan']">

Legal filter operators are:

  • = (equal)

  • != (not equal)

  • < (less than; inside an XSL attribute value it must be written as &lt;)

  • > (greater than; may also be written as &gt;)
Note: 'Bob Dylan' must match exactly what is between <artist> and </artist>, including any whitespace and line breaks.

We can use an <xsl:sort> element inside the <xsl:for-each> to sort the output.

To add an if statement use the syntax below:
<xsl:if test="expression">
...
...some output if the expression is true...
...
</xsl:if>
For example:
<xsl:for-each select="catalog/cd">
<xsl:if test="price > 10">
<tr>
<td><xsl:value-of select="title"/></td>
<td><xsl:value-of select="artist"/></td>
</tr>
</xsl:if>
</xsl:for-each>

For more complex conditional tests, you can filter the output with <xsl:choose>, <xsl:when> and <xsl:otherwise>.

Without a select attribute, <xsl:apply-templates> applies whichever templates match to the children of the current node. With the select attribute, you can be pickier about exactly which nodes are processed, and, by using several <xsl:apply-templates> elements in sequence, in what order.

Referring to the XSL file directly in an XML document requires an XSLT-aware browser. There are alternatives for the transformation, though. First, JavaScript on the client side can invoke a standalone XML parser, such as the MS XML Parser, to do the transformation. Second, a server-side language (e.g., ASP, JSP, Python) can do the transformation, which also meets cross-browser needs.
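For the server-side option, Java's built-in javax.xml.transform (TrAX) API can apply a stylesheet without any browser support. A minimal sketch, assuming the stylesheet and the XML are available as strings; the class name `ServerSideXslt` is hypothetical, and the stylesheet is a trimmed version of the CD example combined with the price > 10 test. In a JSP or servlet you would write to the response's writer instead of a StringWriter.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical server-side helper: apply an XSLT stylesheet with the JDK's TrAX API.
class ServerSideXslt {
    // Trimmed CD stylesheet: one table row per CD whose price is greater than 10.
    static final String XSL =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:output method='html'/>"
          + "<xsl:template match='/'>"
          + "<table><xsl:for-each select='catalog/cd'>"
          + "<xsl:if test='price &gt; 10'>"
          + "<tr><td><xsl:value-of select='title'/></td><td><xsl:value-of select='artist'/></td></tr>"
          + "</xsl:if></xsl:for-each></table>"
          + "</xsl:template></xsl:stylesheet>";

    static String transform(String xml) throws Exception {
        // Compile the stylesheet, then run it over the source document.
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<catalog>"
                + "<cd><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>"
                + "<cd><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>"
                + "</catalog>";
        System.out.println(transform(xml));
    }
}
```

In a real server you would typically compile the stylesheet once with `TransformerFactory.newTemplates()` and create a new Transformer per request, since a Transformer is not thread-safe.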

The xsl:attribute element can be used to add attributes to result elements whether created by literal result elements in the stylesheet or by instructions such as xsl:element.

Little Tricks

1. Use <...select="@width"> to identify the attribute of an element, in which case width is the attribute name. The XPath expression ../@title selects the title attribute of the element that is the parent of the current node.

2. Use curly braces ({}) around an expression to specify an attribute value template, e.g., <h1><a href="{../link}"><xsl:apply-templates/></a></h1> (here ".." selects the parent of the current node). See the following example for more details:
The following example creates an img result element from a photograph element in the source; the value of the src attribute of the img element is computed from the value of the image-dir variable and the string-value of the href child of the photograph element; the value of the width attribute of the img element is computed from the value of the width attribute of the size child of the photograph element:

<xsl:variable name="image-dir">/images</xsl:variable>

<xsl:template match="photograph">
<img src="{$image-dir}/{href}" width="{size/@width}"/>
</xsl:template>


With this source

<photograph>
<href>headquarters.jpg</href>
<size width="300"/>
</photograph>


the result would be

<img src="/images/headquarters.jpg" width="300"/>
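The attribute value template example above can be run as-is through the JDK's XSLT processor. A minimal sketch (the class name `AvtDemo` is hypothetical); the XML declaration is suppressed only to make the output easier to read.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo: run the attribute-value-template example through TrAX.
class AvtDemo {
    static final String XSL =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:variable name='image-dir'>/images</xsl:variable>"
          + "<xsl:template match='photograph'>"
          // each {...} inside a literal result element's attribute is evaluated as XPath
          + "<img src='{$image-dir}/{href}' width='{size/@width}'/>"
          + "</xsl:template></xsl:stylesheet>";

    static final String SOURCE =
            "<photograph><href>headquarters.jpg</href><size width='300'/></photograph>";

    static String transform(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform(SOURCE));
    }
}
```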

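3. The order in which template rules appear in the stylesheet generally means nothing to the XSLT processor. The exception is a tie: if several rules match a node with the same import precedence and priority, XSLT 1.0 allows the processor either to report an error or to pick the one that occurs last in the stylesheet.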

4. The XSLT processor uses the most specific template it can find to process each node of the source tree. So a template such as <xsl:template match="*|@*|text()">, which matches any element, attribute or text node, has a low default priority and is used only for nodes that no more specific template matches. Similarly, when <xsl:template match="channel/title"> exists, <xsl:template match="title"> is not used for title elements inside channel, only for other title elements.
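The effect of default priorities can be checked with a small experiment. In this sketch (the class name `PriorityDemo` and the RSS-like input are hypothetical) the catch-all template has priority -0.5, match="title" has priority 0 and match="channel/title" has priority 0.5, so each title is handled by the most specific rule that matches it. The catch-all here only continues with child elements, to keep the output short.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo of XSLT 1.0 default template priorities.
class PriorityDemo {
    static final String XSL =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:output method='text'/>"
          // catch-all, default priority -0.5: just continue with the child elements
          + "<xsl:template match='*|@*|text()'><xsl:apply-templates select='*'/></xsl:template>"
          // single name test, default priority 0
          + "<xsl:template match='title'>[title:<xsl:value-of select='.'/>]</xsl:template>"
          // two-step pattern, default priority 0.5
          + "<xsl:template match='channel/title'>[channel-title:<xsl:value-of select='.'/>]</xsl:template>"
          + "</xsl:stylesheet>";

    static final String RSS =
            "<rss><channel><title>News</title><item><title>Story</title></item></channel></rss>";

    static String transform(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform(RSS));   // [channel-title:News][title:Story]
    }
}
```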

Conclusion

Conceptually (and in practice this is almost always the case), you can think of an XSLT transformation like this: the input XML source document is parsed into a source tree (much like a DOM tree), the stylesheet is likewise parsed into a tree, and the XSLT processor's job is to build a result tree from the source tree according to the stylesheet (mostly its template rules). Figure 1 illustrates the process.

Figure 1. Operation of an XSLT Processor

Resources:

1. XSLT Tutorial

2. XSL Transformations (XSLT) Version 1.0

3. Working with XML: Comparing XSLT 2.0 and XQuery

4. What kind of language is XSLT?

5. Book: XSLT Quickly

6. Saxon: Anatomy of an XSLT processor

Tuesday, November 14, 2006

A Little Trick: XML Data Embedded in HTML

You can embed XML containing the data you want to display in an HTML document. The line of code that does the embedding looks like this:
<xml id="cdcat" src="cd_catalog.xml"></xml>

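But there is one detail to get right. In the tutorial's example the file name, cd_catalog.xml, mirrors the structure of the document (a CATALOG of CD records); what the data binding actually depends on is that structure itself: the table's datasrc points at the island's id, one table row is repeated for each record element (CD), and each datafld names a child element of that record. For example, below is a fragment of the source document: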
<?xml ...?>
<CATALOG>
<CD>
<...


The XML document containing this fragment is referred to as "cd_catalog.xml" in the embedding HTML document. Below is the whole example.

The XML document containing the data:
<?xml version="1.0" encoding="ISO-8859-1"?>
<CATALOG>
<CD>
<TITLE>Empire Burlesque</TITLE>
<ARTIST>Bob Dylan</ARTIST>
<COUNTRY>USA</COUNTRY>
<COMPANY>Columbia</COMPANY>

<PRICE>10.90</PRICE>
<YEAR>1985</YEAR>
</CD>
<CD>
<TITLE>Hide your heart</TITLE>
<ARTIST>Bonnie Tyler</ARTIST>
<COUNTRY>UK</COUNTRY>

<COMPANY>CBS Records</COMPANY>
<PRICE>9.90</PRICE>
<YEAR>1988</YEAR>
</CD> ... </CATALOG>

The HTML document embedding the XML data:
<html>
<body>

<xml id="cdcat" src="cd_catalog.xml"></xml>

<table border="3" datasrc="#cdcat">

<tr>
<td><span datafld="ARTIST"></span></td>
<td><span datafld="TITLE"></span></td>
</tr>

</table>

</body>
</html>

Click this link to see the live example. As the tutorial mentions, it seems to work only in IE 5.0 or later, not in Firefox.

Resources:

XML Data Island

Monday, November 13, 2006

The Java SE 6 Platform Quiz

The following quiz answers are quoted from The Java SE 6 Platform Quiz:

1. What scripting language can you use in the Java SE 6 platform?

Answer (E): The Mozilla Rhino engine implements the JavaScript technology scripting language and is available in the core Java Runtime Environment (JRE). However, the scripting API allows you to use any scripting engine that conforms with JSR 223.

2. What is the normalization of Unicode text?

Answer (C): The Java SE 6 platform provides the public java.text.Normalizer class, which allows you to convert text data to common composed or decomposed forms, allowing for accurate comparisons and searches on text. Before the Java SE 6 platform release, the Normalizer class had been hidden in the Java platform. The class is now a public API.
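A minimal illustration of what normalization buys you: the same visible character "é" can be stored as one code point or as "e" followed by a combining acute accent, and the two strings are unequal until both are normalized to the same form. The class name `NormalizeDemo` is hypothetical; `Normalizer.normalize` and the `Normalizer.Form` constants are the API the answer refers to.

```java
import java.text.Normalizer;

// Hypothetical demo: composed vs. decomposed forms of the same character.
class NormalizeDemo {
    static final String COMPOSED = "\u00E9";      // é as a single code point
    static final String DECOMPOSED = "e\u0301";   // e + COMBINING ACUTE ACCENT

    // Compare two strings after normalizing both to the composed form (NFC).
    static boolean sameAfterNfc(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFC));
    }

    public static void main(String[] args) {
        System.out.println(COMPOSED.equals(DECOMPOSED));              // false: different chars
        System.out.println(sameAfterNfc(COMPOSED, DECOMPOSED));       // true
        System.out.println(Normalizer.normalize(COMPOSED, Normalizer.Form.NFD).length()); // 2
    }
}
```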

3. How do you launch your host's default browser to view a specific URL?

Answer (B): The Desktop API allows your program to launch applications associated with certain file types on the host platform. The current implementation can launch a web browser, text editor, and email application.

4. How can I sort JTable content?

Answer (D): A javax.swing.table.TableRowSorter wraps your existing TableModel. You can configure it to filter or sort your JTable contents.
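A small sketch of that answer. It uses TableRowSorter directly on a DefaultTableModel, with the two CDs from the XML catalog on this blog as sample rows, so it runs without opening a window; in a real GUI you would pass the sorter to `JTable.setRowSorter()` (or simply call `JTable.setAutoCreateRowSorter(true)`). The class name `SorterDemo` is hypothetical.

```java
import java.util.List;
import javax.swing.RowFilter;
import javax.swing.RowSorter;
import javax.swing.SortOrder;
import javax.swing.table.DefaultTableModel;
import javax.swing.table.TableRowSorter;

// Hypothetical demo: sort and filter a table model without showing a JTable.
class SorterDemo {
    static TableRowSorter<DefaultTableModel> sortedByTitle() {
        DefaultTableModel model = new DefaultTableModel(
                new Object[][] {{"Hide your heart", "Bonnie Tyler"},
                                {"Empire Burlesque", "Bob Dylan"}},
                new Object[] {"Title", "Artist"});
        TableRowSorter<DefaultTableModel> sorter = new TableRowSorter<>(model);
        // Sort ascending on column 0 (Title); the model itself is left untouched.
        sorter.setSortKeys(List.of(new RowSorter.SortKey(0, SortOrder.ASCENDING)));
        return sorter;
    }

    public static void main(String[] args) {
        TableRowSorter<DefaultTableModel> sorter = sortedByTitle();
        // View row 0 now shows model row 1 ("Empire Burlesque").
        System.out.println(sorter.convertRowIndexToModel(0));
        // Keep only titles starting with "Hide".
        sorter.setRowFilter(RowFilter.regexFilter("^Hide", 0));
        System.out.println(sorter.getViewRowCount());
    }
}
```

The sorter only maps view indices to model indices, which is why `convertRowIndexToModel` is needed whenever code reads data for a row the user selected.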

5. What is the correct annotation to use to export a method as a web service operation using Java API for XML Web Services (JAX-WS), version 2.0?

Answer (B): The @WebMethod annotation is used to mark a method that is exposed as a web service operation. Note that the @WebService annotation is used to specify that the class is a web service or that the interface defines a web service. The programmer will likely use the @WebService annotation in conjunction with the @WebMethod annotation. See the article "Introducing JAX-WS 2.0 With the Java SE 6 Platform, Part 1" for more information.

6. In JDK 6, the JMX Monitor API now uses a thread pool to increase performance. What is the purpose of the JMX Monitor API?

Answer (D): The JMX Monitor API allows an application to sample an attribute property of an MBean periodically and send a notification event if it passes a given threshold. It now uses a thread pool instead of creating a thread for each monitor. Another improvement is the ability to monitor a value within a complex type.

7. JDK 6 incorporates an advanced version of the SwingWorker class into core Java technology. What is the purpose of the SwingWorker class?

Answer (D): Since the 1998 publication of SwingWorker in the article "Threads and Swing," developers have continuously requested that it be moved into core. At the 2004 JavaOne conference, the Desktop team presented a new version of SwingWorker that included generification, use of the concurrency package, and PropertyChangeListener support. Much of this functionality assists with interthread communication. The Java SE 6 platform release incorporates a similar version of SwingWorker that greatly assists developers in processing GUI-driven functionality off the event-dispatching thread, indicating status and progress and aggregating the results.
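A minimal sketch of the pattern: `doInBackground()` runs on a worker thread and `get()` retrieves its result. Here `get()` is called directly from the calling thread so the example runs without a GUI (the class name `WorkerDemo` is hypothetical). In a real Swing program you would not block the event-dispatching thread like this; you would override `done()`, or use `publish()`/`process()` and `setProgress()` for intermediate results.

```java
import java.util.concurrent.ExecutionException;
import javax.swing.SwingWorker;

// Hypothetical demo: compute a result off the calling thread with SwingWorker.
class WorkerDemo {
    static int sumInBackground(int n) throws InterruptedException, ExecutionException {
        SwingWorker<Integer, Void> worker = new SwingWorker<Integer, Void>() {
            @Override
            protected Integer doInBackground() {
                // Runs on a SwingWorker pool thread, not on the caller's thread.
                int sum = 0;
                for (int i = 1; i <= n; i++) {
                    sum += i;
                }
                return sum;
            }
        };
        worker.execute();     // schedule doInBackground()
        return worker.get();  // block until the result is ready (fine outside the EDT)
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sumInBackground(10));   // 55
    }
}
```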

8. What is the best Java platform to use with the upcoming release of the Microsoft Windows Vista operating system?

Answer (A): The Java SE 6 platform release works best with the latest user interface (UI) enhancements of Windows Vista. According to a recent blog entry by Chet Haase: "The primary delivery of Java for Vista is Java SE 6; that release has received most of our focus during the Vista beta release timeframe." Go to the JDK 6 Project site to download the most recent version. The release is pretty close to final, so it is working very well at this point. All of the serious Windows Vista problems have been fixed in this release for months, so it is a particularly good test vehicle for Java technology on Vista.

9. In the Java SE 6 platform, what key tuning option(s) are needed to achieve high performance?

Answer (D): See the blog entry "No Tuning Required: Java SE Out-of-Box Vs. Tuned Performance" for a comparison of out-of-box and hand-tuned performance.

10. The Java SE 6 platform delivers a technology that can greatly improve performance by reducing unnecessary synchronization overhead. It allows a thread to lock and unlock an object with minimal use of atomic operations. What is this technology called?

Answer (B): The technique called store-free biased locking eliminates all synchronization-related atomic operations on uncontended object monitors. The technique supports the bulk transfer of object ownership from one thread to another, and the selective disabling of the optimization where unprofitable, using epoch-based bulk rebiasing and revocation. It has been implemented in the production version of the Java HotSpot virtual machine (VM) and has yielded significant performance improvements on a range of benchmarks and applications.

Three ways of validating an XML document with Java

With the release of Java 5.0 (in 2004), JAXP 1.3 became available, and one of the new features it provides is a brand-new Schema Validation Framework.

The new framework decouples the validation of an instance document from parsing, making it an independent process. The validation APIs live in the new package javax.xml.validation and let developers obtain, from a compiled schema, a Validator and/or a ValidatorHandler, which are used to validate XML against that schema. Alternatively, a compiled Schema instance can be passed to a parser factory so that the parser validates while it parses. So the new Schema Validation Framework provides roughly two ways. Besides these two, setting the uncompiled schema source on the parser is still available for backward compatibility. As the first article and its accompanying example code in the Resources section show, the new validation framework improves performance, efficiency and flexibility.

Below are simple code snippets illustrating how XML documents are validated in each of these three ways.

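1. Set an uncompiled schema (since JAXP 1.2):
import java.io.File;
import java.io.IOException;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

private static void saxParseJAXP1_2(String xmlFile, DefaultHandler dh,
String schemaFile) {
try {
SAXParserFactory spf = SAXParserFactory.newInstance();
spf.setNamespaceAware(true);
// JAXP 1.2 style: turn on validation, then name the schema language and source
spf.setValidating(true);
SAXParser sp = spf.newSAXParser();
// (XMLConstants was added in JAXP 1.3; with JAXP 1.2 alone use the literal
// "http://www.w3.org/2001/XMLSchema")
sp.setProperty(
"http://java.sun.com/xml/jaxp/properties/schemaLanguage",
XMLConstants.W3C_XML_SCHEMA_NS_URI);
sp.setProperty(
"http://java.sun.com/xml/jaxp/properties/schemaSource",
schemaFile);
// validation errors go to dh.error(), which DefaultHandler ignores unless overridden
sp.parse(new File(xmlFile), dh);
} catch (ParserConfigurationException e) {
e.printStackTrace();
} catch (SAXException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
}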

2. Set a compiled schema instance (since JAXP 1.3):
private static void saxParseSetSchemaJAXP1_3(String xmlFile, DefaultHandler dh,
String schemaFile) {
try {
SchemaFactory sf = SchemaFactory.newInstance(
XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema schema = sf.newSchema(new File(schemaFile));
SAXParserFactory spf = SAXParserFactory.newInstance();
spf.setNamespaceAware(true);
spf.setSchema(schema);
SAXParser sp = spf.newSAXParser();
sp.parse(new File(xmlFile), dh);
} catch (ParserConfigurationException e) {
e.printStackTrace();
} catch (SAXException e) {
e.printStackTrace();
} catch (Exception e) {
e.printStackTrace();
}
}

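3. Validator (since JAXP 1.3):
import java.io.File;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;

private static void saxParseValidateJAXP1_3(String xmlFile,
ErrorHandler dh, String schemaFile) {
try {
SchemaFactory sf = SchemaFactory.newInstance(
XMLConstants.W3C_XML_SCHEMA_NS_URI);
Validator validator = sf.newSchema(
new File(schemaFile)).newValidator();

validator.setErrorHandler(dh);
// validates the document on its own; no parser is configured here
validator.validate(new StreamSource(xmlFile));
} catch (Exception e) {
e.printStackTrace();
}
}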

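It's worth noting that the first two ways apply to both DOM and SAX parsing (DocumentBuilderFactory offers the same setSchema() method and accepts the same schemaLanguage/schemaSource attributes through setAttribute()), while the third way, the standalone Validator, is independent of parsing altogether and accepts any javax.xml.transform.Source, such as a StreamSource, SAXSource or DOMSource.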

Update (20061113):

Basics of using Schema

Be aware of the concepts of an XML target namespace and "source namespaces". The names defined in a schema are said to belong to its target namespace. Definitions and declarations in a schema can refer to names that may belong to other namespaces; the fourth article in Resources calls those "source namespaces". A little more on simple and complex types: an element that contains no attributes or child elements can be defined with a simple type, predefined or user-defined, such as string, integer, decimal or time. Elements with attributes or child elements must have a complex type. XML Schema has a huge amount of further detail that is not covered here but can be found here.

Simple example

An XML instance document:
<?xml version = "1.0" encoding = "utf-8"?>
<SONGS xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
xsi:noNamespaceSchemaLocation='mySong.xsd'>
<SONG genre = "pop">
<TITLE > Hot Cop </TITLE>
<COMPOSER > Jacques Morali
</COMPOSER>
<COMPOSER>Henri Belolo</COMPOSER>
<COMPOSER>Victor Willis</COMPOSER>
<PRODUCER>Jacques Morali</PRODUCER>
<PUBLISHER>PolyGram Records</PUBLISHER>
<LENGTH>6:20</LENGTH>
<YEAR>1978</YEAR>
<ARTIST>Village People</ARTIST>
</SONG>
</SONGS>

The corresponding schema definition:
<?xml version="1.0" encoding="UTF-8" ?>
<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>
<xsd:element name="SONGS">
<xsd:complexType>
<xsd:sequence>
<xsd:element ref="SONG" minOccurs='1' maxOccurs='unbounded' />
</xsd:sequence>
</xsd:complexType>
</xsd:element>
<xsd:element name="SONG">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="TITLE" type="xsd:string" />
<xsd:element name="COMPOSER" type="xsd:string" maxOccurs='unbounded' />
<xsd:element name="PRODUCER" type="xsd:string" maxOccurs='unbounded' />
<xsd:element name="PUBLISHER" type="xsd:string" maxOccurs='unbounded' />
<xsd:element name="LENGTH" type="xsd:string" />
<xsd:element name="YEAR" type="xsd:gYear" />
<xsd:element name="ARTIST" type="xsd:string" maxOccurs='unbounded' />
</xsd:sequence>
<xsd:attribute name="genre" type="xsd:string" />
</xsd:complexType>
</xsd:element>
</xsd:schema>
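To connect this example with the third (Validator) approach described above, the sketch below validates the SONGS document against this schema with javax.xml.validation. Both documents are inlined as strings so the class is self-contained; they are compacted copies of the ones above (with a single COMPOSER), and the class name `SongValidator` is hypothetical. With no ErrorHandler set, the Validator throws a SAXException on the first error.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical demo: validate the SONGS instance against the schema above.
class SongValidator {
    static final String XSD =
            "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
          + "<xsd:element name='SONGS'><xsd:complexType><xsd:sequence>"
          + "<xsd:element ref='SONG' minOccurs='1' maxOccurs='unbounded'/>"
          + "</xsd:sequence></xsd:complexType></xsd:element>"
          + "<xsd:element name='SONG'><xsd:complexType><xsd:sequence>"
          + "<xsd:element name='TITLE' type='xsd:string'/>"
          + "<xsd:element name='COMPOSER' type='xsd:string' maxOccurs='unbounded'/>"
          + "<xsd:element name='PRODUCER' type='xsd:string' maxOccurs='unbounded'/>"
          + "<xsd:element name='PUBLISHER' type='xsd:string' maxOccurs='unbounded'/>"
          + "<xsd:element name='LENGTH' type='xsd:string'/>"
          + "<xsd:element name='YEAR' type='xsd:gYear'/>"
          + "<xsd:element name='ARTIST' type='xsd:string' maxOccurs='unbounded'/>"
          + "</xsd:sequence><xsd:attribute name='genre' type='xsd:string'/>"
          + "</xsd:complexType></xsd:element></xsd:schema>";

    static final String SAMPLE = "<SONGS><SONG genre='pop'><TITLE>Hot Cop</TITLE>"
          + "<COMPOSER>Jacques Morali</COMPOSER><PRODUCER>Jacques Morali</PRODUCER>"
          + "<PUBLISHER>PolyGram Records</PUBLISHER><LENGTH>6:20</LENGTH>"
          + "<YEAR>1978</YEAR><ARTIST>Village People</ARTIST></SONG></SONGS>";

    static boolean isValid(String xml) throws Exception {
        // Compile the schema once into a Schema object, then get a Validator from it.
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        Validator validator = schema.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {   // no ErrorHandler set: the first error is thrown
            System.out.println("invalid: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid(SAMPLE));                                    // true
        System.out.println(isValid(SAMPLE.replace("<TITLE>Hot Cop</TITLE>", ""))); // false
    }
}
```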

Resources:

1. Easy and Efficient XML Processing: Upgrade to JAXP 1.3

2. Java 2 Platform Standard Edition 5.0 API Specification

3. Java 2 Platform Standard Edition 1.4.2 API Specification

4. The basics of using XML Schema to define elements

5. XML Schema Part 0: Primer Second Edition

Saturday, November 11, 2006

Comment on the W3C DOM and its implementations in different programming languages

First I have to confess that I'm quite unfamiliar with XML processing. I've done it extensively only once, in Delphi, for a project I was involved in.

These days I'm studying techniques and technologies for XML processing with Java, and I gave an overview of them in a previous post. In particular, I mentioned the DOM way. It is based on the DOM (Document Object Model), a standard object model for XML maintained by the World Wide Web Consortium (W3C). Here I first give some simple details about the DOM itself.

Take the simple XML document shown below:
<?xml version="1.0" encoding="UTF-8" ?>
<song genre="rock">
<name>My December</name>
<singer>Linkin Park</singer>
</song>

Its DOM tree should look like this (E indicates an element node and T indicates a text node):
E:song
|--T:characters(whitespace)
|--E:name---T:characters(My December)
|--T:characters(whitespace)
|--E:singer---T:characters(Linkin Park)
|--T:characters(whitespace)
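You can see these text nodes yourself with a minimal sketch. It parses the same document with the JDK's default DocumentBuilderFactory and lists each child of the root. The class and method names are hypothetical. The document is inlined with a newline between tags so that the whitespace nodes appear.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomChildren {

    // The song document from above, with a newline between tags.
    static final String SONG_XML =
            "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n"
            + "<song genre=\"rock\">\n"
            + "<name>My December</name>\n"
            + "<singer>Linkin Park</singer>\n"
            + "</song>";

    /** Parses an XML string with the JDK's default (non-validating) parser. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    /** Describes each child of the root as E:name or T:text, showing newlines as backslash-n. */
    static String describeRootChildren(Document doc) {
        NodeList children = doc.getDocumentElement().getChildNodes();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (sb.length() > 0) sb.append(' ');
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                sb.append("E:").append(n.getNodeName());
            } else if (n.getNodeType() == Node.TEXT_NODE) {
                sb.append("T:").append(n.getNodeValue().replace("\n", "\\n"));
            } else {
                sb.append("?:").append(n.getNodeName());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Document doc = parse(SONG_XML);
        System.out.println(doc.getDocumentElement().getChildNodes().getLength()); // 5
        System.out.println(describeRootChildren(doc)); // T:\n E:name T:\n E:singer T:\n
    }
}
```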

As depicted above, the root node song has five child nodes, two of which have child nodes of their own. I want to emphasize the text nodes here. Before I started digging into XML processing recently, I didn't even know that so-called text nodes existed, because Delphi just ignores them. So in Delphi the DOM tree looks like this:
E:song---E:name
|--E:singer

Only elements appear as nodes. I think this is quite intuitive, though the official DOM structure is certainly more complete in theory. But whitespace and other text nodes make processing the parsed XML more complicated. An example is worth a thousand words. Let's see how this simple XML document is handled differently in Java and in Delphi:

In Java (exceptions are left unhandled):
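// xmlFileName (hypothetical variable): path to the XML document above
Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new File(xmlFileName));
Element root = doc.getDocumentElement();
NodeList list = root.getChildNodes();
// printStr is a simple helper method that prints a line.
// Items 0, 2 and 4 are whitespace text nodes, so the elements sit at indexes 1 and 3.
printStr("name: " + list.item(1).getFirstChild().getNodeValue());
printStr("singer: " + list.item(3).getFirstChild().getNodeValue());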

In Delphi:
var
  XMLDoc: IXMLDocument;
  XMLNode: IXMLNode;
  i: integer;
  str: string;
begin
  str := '';
  XMLDoc := TXMLDocument.Create(nil);
  // (load the XML document here, e.g. with XMLDoc.LoadFromFile)
  XMLNode := XMLDoc.ChildNodes.Nodes['song'];
  // Whitespace text nodes are not exposed, so every child here is an element.
  for i := 0 to XMLNode.ChildNodes.Count - 1 do
  begin
    str := str + XMLNode.ChildNodes.Nodes[i].NodeValue;
  end;
end;

Clearly, the Java version is more awkward, and it gets more complicated as the XML document gets longer. The whitespace text nodes sit between the elements, so the element nodes can't be accessed sequentially. In contrast, with text nodes ignored, the Delphi version is quite clear and adapts to documents of any size. As far as I know, besides Java, many other DOM implementations (at least JavaScript's) expose text nodes, including whitespace-only ones.

So developers use various kinds of helper methods to get around this awkwardness.
Method 1:
private Node getNodeByName(final NodeList list, final String name) {
    for (int i = 0; i < list.getLength(); i++) {
        final Node node = list.item(i);
        // Whitespace text nodes are named "#text", so they never match and are skipped.
        if (name.equals(node.getNodeName())) {
            return node;
        }
    }
    return null; // not found
}

Method 2:

...
NodeList list = e.getChildNodes();
for (int i = 0; i < list.getLength(); i++) {
    Node n = list.item(i);
    // Skip whitespace text nodes (and any other non-element nodes).
    if (!(n instanceof Element)) { continue; }
    nsFixup((Element) n, map, false);
}

And I believe there must be more.
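One more variant of the same idea is to collect the element children once into a list, which then behaves like Delphi's ChildNodes. This is a minimal sketch; childElements, childText and parseRoot are hypothetical helper names. It relies on getTextContent(), which is part of DOM Level 3 and available since Java 5.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ChildElements {

    /** Returns only the element children of parent, skipping text, comment and other nodes. */
    static List<Element> childElements(Element parent) {
        List<Element> result = new ArrayList<>();
        NodeList list = parent.getChildNodes();
        for (int i = 0; i < list.getLength(); i++) {
            Node n = list.item(i);
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) n);
            }
        }
        return result;
    }

    /** Returns the text of the first child element with the given tag name, or null. */
    static String childText(Element parent, String name) {
        for (Element e : childElements(parent)) {
            if (name.equals(e.getTagName())) {
                return e.getTextContent();
            }
        }
        return null;
    }

    /** Parses an XML string and returns its root element. */
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element song = parseRoot("<song genre=\"rock\">\n<name>My December</name>\n"
                + "<singer>Linkin Park</singer>\n</song>");
        System.out.println(childElements(song).size());              // 2
        System.out.println("name: " + childText(song, "name"));     // name: My December
        System.out.println("singer: " + childText(song, "singer")); // singer: Linkin Park
    }
}
```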

So far I really don't see any benefit in keeping text nodes visible. If you know of one, please tell me.

Friday, November 10, 2006

Notes on Unicode, UTF and Other Character Encodings

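Two encoding standards that follow the same specification
Unicode 3.0 (the latest version is 5.0) and ISO 10646. Since Unicode 2.0, Unicode has used the same character repertoire and code points as ISO 10646-1. ISO 10646 is also called UCS (Universal Character Set).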

Some terms:
UTF: Unicode/UCS Transformation Format

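UTF-16: a 16-bit encoding. It is essentially the two-byte form of Unicode, plus extra room (rarely needed) for uncommon characters and future expansion. Common characters lie in 0-0xFFFF. Including the extension space, the range is 0-0x10FFFF, so the largest code point needs 21 bits. ISO 10646 defines the corresponding extension space. UTF-16 is a variable-length code built from 16-bit units, so its byte layout depends on the CPU's byte order. For example, the Unicode code point of "汉" is 6C49. When it is written to a file, should 6C come first or 49? Writing 6C first is big endian; writing 49 first is little endian. For CJK text, UTF-16 is more compact than UTF-8 (2 bytes per character instead of 3). It is widely used as the in-memory form of Unicode, for example by Windows and Java.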
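A quick way to see the byte-order issue is to ask Java for the bytes of "汉" (U+6C49) in each UTF-16 variant. This is a minimal sketch. The class and helper names are hypothetical, and it relies only on java.nio.charset.StandardCharsets (Java 7+).

```java
import java.nio.charset.StandardCharsets;

public class Utf16ByteOrder {

    /** Formats bytes as upper-case hex pairs separated by spaces, e.g. "6C 49". */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String han = "\u6C49"; // the character discussed above
        System.out.println(hex(han.getBytes(StandardCharsets.UTF_16BE))); // 6C 49 (big endian)
        System.out.println(hex(han.getBytes(StandardCharsets.UTF_16LE))); // 49 6C (little endian)
        // Java's plain "UTF-16" encoder writes big-endian data preceded by the mark FE FF.
        System.out.println(hex(han.getBytes(StandardCharsets.UTF_16)));   // FE FF 6C 49
    }
}
```

The leading FE FF in the last line is the byte order mark (BOM). It lets a reader tell which byte order the file uses.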

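UTF-8: UTF-16 is essentially the raw Unicode value with no transformation, so its bytes can include 0x00. That byte has a special meaning in operating systems and C, where it terminates a string, so UTF-16 does not fit ASCII-based byte handling. That is why Unicode is sometimes transformed into UTF-8 instead. UTF-8 leaves ASCII unchanged as single bytes and encodes every other character with a variable number of bytes. Characters up to 0xFFFF take at most 3 bytes, and characters in the supplementary range take 4 bytes, so a character needs 1-4 bytes in total. UTF-8 does not depend on CPU byte order, so it can be exchanged between different platforms.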
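The variable lengths can be checked with a small sketch that counts UTF-8 bytes and UTF-16 code units for a few sample code points. The samples include U+20000, an ideograph from the supplementary range. The class and method names are hypothetical; the counts come from the JDK's own encoders.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Lengths {

    /** Number of bytes the code point occupies in UTF-8. */
    static int utf8Length(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Number of 16-bit UTF-16 code units (Java chars) the code point needs. */
    static int utf16Units(int codePoint) {
        return Character.charCount(codePoint);
    }

    public static void main(String[] args) {
        int[] samples = {0x41, 0xE9, 0x6C49, 0x20000}; // A, e-acute, the "han" character, a CJK Ext. B ideograph
        for (int cp : samples) {
            System.out.printf("U+%04X: %d UTF-8 byte(s), %d UTF-16 unit(s)%n",
                    cp, utf8Length(cp), utf16Units(cp));
        }
        // Plain ASCII "A" in UTF-16BE starts with a zero byte, the C string terminator.
        byte[] a16 = "A".getBytes(StandardCharsets.UTF_16BE);
        System.out.println(a16[0] == 0); // true
    }
}
```

The sample outputs are 1, 2, 3 and 4 UTF-8 bytes respectively. Only U+20000 needs two UTF-16 units (a surrogate pair), which is what makes UTF-16 variable-length.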

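UCS-2: close to UTF-16, but a fixed 2-byte encoding that covers only 0-0xFFFF. It has no surrogate pairs for the supplementary range.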

UCS-4: a 4-byte encoding. For characters in 0-0xFFFF it is just the UCS-2 value preceded by two all-zero bytes.

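Internal code (内码): the character encoding used inside the operating system. Early operating systems had language-dependent internal codes. Modern Windows uses Unicode internally and adapts to each language through code pages, so the notion of an "internal code" has become blurred. Microsoft usually calls the encoding given by the default code page the internal code. In special contexts it also says its internal code is Unicode, for example when dealing with the GB18030 issue.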

Character set (charset): a collection of characters. Unicode, for example, is a character set.

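Character encoding (encoding): the rule for interpreting binary data as characters. An encoding can represent only a limited set of characters, and it is usually designed for one particular character set. For example, UTF-8 and UTF-16 are two character encodings, and each can represent every character in the Unicode character set.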

Chinese national standard (GB) encodings:

GB 13000: fully equivalent to ISO 10646-1/Unicode 2.1. It will follow future changes to the ISO 10646/Unicode standards.

GBK: an extension of GB2312. It adds the Unicode 2.1 CJK unified ideographs that lie outside the GB2312 repertoire, plus some characters that are not in Unicode.

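GB 18030-2000: based on GB 13000, it is an extended version of GBK for Unicode 3.0 and covers all of Unicode. It has the same status as UTF-8 and UTF-16, i.e. it is a Unicode encoding form. It is variable-length, using 1, 2 or 4 bytes per character.
GB 18030 is backward compatible with GB2312/GBK.
GB 18030 is a mandatory standard for all non-handheld, non-embedded computer systems in China.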

Update (20061114): ISO 8859-1:

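ISO/IEC 8859-1, also called Latin-1 or "Western European", is the first 8-bit character set in the ISO/IEC 8859 series. It is based on ASCII. It adds 96 letters and symbols in the otherwise unused range 0xA0-0xFF for languages written in the Latin alphabet with diacritics.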

Other notes:

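UCS only specifies how characters are numbered; it does not say how those numbers are transmitted or stored. For example, the UCS code of "汉" is 6C49. I could transmit and store it as the four ASCII characters "6C49", or in UTF-8 as the three consecutive bytes E6 B1 89. What matters is that both sides of the communication agree. UTF-8, UTF-7 and UTF-16 are all widely accepted schemes. A particular advantage of UTF-8 is that it is fully compatible with ASCII, though not with the upper half (0x80-0xFF) of ISO 8859-1. UTF stands for "UCS Transformation Format".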
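The two claims above can be checked in Java: that "汉" becomes E6 B1 89 in UTF-8, and that UTF-8 matches ASCII but not Latin-1. This is a minimal sketch with hypothetical class and method names. sameAsLatin1 is only meaningful for strings whose characters all fall within Latin-1.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8Compat {

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** True if s encodes to the same bytes in UTF-8 and ISO-8859-1 (Latin-1 input assumed). */
    static boolean sameAsLatin1(String s) {
        return Arrays.equals(utf8(s), s.getBytes(StandardCharsets.ISO_8859_1));
    }

    public static void main(String[] args) {
        byte[] expected = {(byte) 0xE6, (byte) 0xB1, (byte) 0x89};
        System.out.println(Arrays.equals(utf8("\u6C49"), expected)); // true
        System.out.println(sameAsLatin1("Hello"));                   // true: pure ASCII
        System.out.println(sameAsLatin1("\u00E9"));                  // false: E9 vs C3 A9
    }
}
```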

A code page is the character encoding for a particular language's script. For example, the code page for GBK is CP936, for BIG5 it is CP950, and for GB2312 it is CP20936.

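Windows has the concept of a default code page, i.e. which encoding is used by default to interpret characters. Suppose Windows Notepad opens a text file whose content is the byte stream BA BA D7 D6. How should Windows interpret it?

Should it use Unicode, GBK, BIG5 or ISO 8859-1? Interpreted as GBK, the bytes give the two characters "汉字". Interpreted with another encoding, they may map to no character at all, or to the wrong characters. "Wrong" here means not what the author of the text intended, and that is how mojibake (garbled text) arises.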
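The Notepad example can be reproduced by decoding the same four bytes with two different charsets. This is a minimal sketch with a hypothetical class name. It assumes a JDK that includes the extended charsets; standard JDK builds ship GBK.

```java
import java.nio.charset.Charset;

public class CodePageDemo {

    // The byte stream from the Notepad example above.
    static final byte[] BYTES = {(byte) 0xBA, (byte) 0xBA, (byte) 0xD7, (byte) 0xD6};

    /** Decodes BYTES with the named charset, as a text editor would with its default code page. */
    static String decode(String charsetName) {
        return new String(BYTES, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        System.out.println(decode("GBK").equals("\u6C49\u5B57")); // true: the intended two characters
        // The same bytes read as Latin-1 become four unrelated symbols (mojibake).
        System.out.println(decode("ISO-8859-1").equals("\u00BA\u00BA\u00D7\u00D6")); // true
    }
}
```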

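  The answer is that Windows interprets the byte stream in a text file according to the current default code page. You can set the default code page under Regional Options in Control Panel. The "ANSI" option in Notepad's Save As dialog simply means saving with the default code page's encoding.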

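  Windows's internal code is Unicode, so technically it can support several code pages at once. If a file states which encoding it uses and the user has installed the matching code page, Windows can display it correctly. An HTML file, for example, can specify its charset.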

Fun fact:

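  The word "endian" comes from Gulliver's Travels. The civil war in Lilliput started over whether an egg should be cracked at the big end (Big-Endian) or the little end (Little-Endian) when eating it. Six rebellions broke out over it: one emperor lost his life and another lost his crown.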

  In Chinese, endian is usually translated as 字节序 ("byte order"), and big endian and little endian are called 大尾 ("big tail") and 小尾 ("little tail").

Resources:
1. Understanding character encodings and Unicode, ISO 10646, UCS, UTF8, UTF16, GBK and GB2312, Internationalization Support, USENIX.CN (a fairly detailed introduction)

2. 无废话XML (No-Nonsense XML)

3. A brief explanation of UCS, UTF, BMP, BOM and related terms

4. Chinese encoding handling (1): encodings and character sets

5. ISO 8859-1

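Note: the sources referenced here are not all official or authoritative, so there may well be errors. This post is for study only. I take no responsibility for errors in it, and corrections are welcome.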

XML Processing With Java Overview

There are basically two ways of processing XML with Java. One is the DOM way, which is based on a tree structure. The other is the SAX way, which is event-driven and stream-based. The bad news is that both ways have pros and cons. The good news is that we can pick whichever one meets the needs of a given situation.

The DOM way
DOM, the Document Object Model, is a standard specification released by the W3C. It is a tree-like structure that represents the structure of an XML document, and it is usually what we parse an XML document into before manipulating it. Most programmers find it intuitive to work with. With it we can easily get what we want from an XML document, such as element names, attributes and element values. The price is that we have to read the entire XML document before any manipulation. It is parsed into a DOM object, and everything is held in memory. This is inefficient and sometimes impossible, especially for extremely large documents. Besides DOM, some unofficial object models are also in use, such as JDOM, XOM and DOM4J.
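To make the DOM workflow concrete, here is a minimal sketch of the parse, modify, serialize cycle. It uses only JAXP classes that ship with the JDK. The class name, method name and sample document are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class DomRoundTrip {

    /** Parses xml into a DOM, sets the root's genre attribute, and serializes it back. */
    static String changeGenre(String xml, String newGenre) {
        try {
            // 1. The whole document is read and held in memory as a tree.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // 2. Any node can now be reached and modified in place.
            Element root = doc.getDocumentElement();
            root.setAttribute("genre", newGenre);
            // 3. An identity Transformer writes the tree back out as text.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("XML round trip failed", e);
        }
    }

    public static void main(String[] args) {
        // The output now carries genre="pop" instead of genre="rock".
        System.out.println(changeGenre(
                "<song genre=\"rock\"><name>My December</name></song>", "pop"));
    }
}
```

Step 1 is where the memory cost described above comes from: nothing can be modified until the entire tree has been built.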

The SAX way

It is a stream-like, event-based way: we can process a document while we are reading it. It is very flexible but more complicated than the DOM way. It is flexible because the SAX event stream can be redirected to another process or document. It is complicated because an event handler (usually a DefaultHandler subclass or a ContentHandler implementation) must first be written. The handler is then registered with the parser (or an XMLReader). There are other disadvantages too. Because the document is processed as a stream, it is impossible to modify it or to move backward in the data stream. However, it is possible to make some simple structural changes (not to the data itself) with an XSL transformation. In general, the SAX way is much faster than the DOM way.
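Here is a minimal sketch of the "write a handler, then register it with the parser" pattern. It collects the text of every TITLE element as the parser streams past it. The class names and sample document are hypothetical, modeled on the song examples.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxTitles {

    /** Event handler: reacts to start/end tags and character data as they arrive. */
    static class TitleHandler extends DefaultHandler {
        final List<String> titles = new ArrayList<>();
        private StringBuilder current; // non-null only while inside a TITLE element

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("TITLE".equals(qName)) {
                current = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // characters() may be called several times for one text run, so accumulate.
            if (current != null) {
                current.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("TITLE".equals(qName)) {
                titles.add(current.toString().trim());
                current = null;
            }
        }
    }

    /** Registers the handler with a SAX parser and streams through xml once. */
    static List<String> titles(String xml) {
        try {
            TitleHandler handler = new TitleHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.titles;
        } catch (Exception e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<SONGS><SONG><TITLE>Hot Cop</TITLE></SONG>"
                + "<SONG><TITLE>My December</TITLE></SONG></SONGS>";
        System.out.println(titles(xml)); // [Hot Cop, My December]
    }
}
```

The handler never sees the document as a whole. It only keeps the state it needs (the current title buffer), which is why SAX can handle documents too large for DOM.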

What makes up "XML processing"

So-called XML processing (sometimes just called parsing) consists of several aspects:
Validation
Data Modification and Retrieval
Transformation
Data Query

Examples
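As one example of the data query aspect, JAXP 1.3 (Java 5) added the javax.xml.xpath API. The sketch below selects song titles by genre. The sample document, class and method names are hypothetical. The genre string is pasted into the expression unescaped, so it must not contain a single quote.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathQuery {

    // Hypothetical sample data.
    static final String SONGS =
            "<SONGS>"
            + "<SONG genre='pop'><TITLE>Hot Cop</TITLE></SONG>"
            + "<SONG genre='rock'><TITLE>My December</TITLE></SONG>"
            + "</SONGS>";

    /** Returns the TITLE text of every SONG whose genre attribute equals genre. */
    static List<String> titlesByGenre(String xml, String genre) {
        // Assumes genre contains no single quote, since it is pasted into the expression.
        String expr = "/SONGS/SONG[@genre='" + genre + "']/TITLE";
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expr, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> titles = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                titles.add(nodes.item(i).getTextContent());
            }
            return titles;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("XPath query failed: " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(titlesByGenre(SONGS, "rock")); // [My December]
        System.out.println(titlesByGenre(SONGS, "jazz")); // []
    }
}
```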

Higher-Level Applications
What is mentioned above covers only the basic aspects of XML processing. From a broader perspective, there are many other, higher-level applications of XML and XML processing.

Published as a draft; it still needs further refinement.

Wednesday, July 26, 2006

Always prototype the new features you are going to add

Wow, this new feature is awesome. It’s super cool! I wonder if we could add it to our product. OK, I’m gonna do it right now.

But wait…

First, develop a prototype with the new feature added. Then see how the feature functions within your product and how it works together with other features. If all of this works fine, change your artifacts to add it to your product extensively and “aggressively”. This is safer.

Never do so-called feasibility analysis just in your head or on paper. You’re cheating yourself. Done that way, your analysis will almost always leave you excited, and you will end up telling your colleagues the words in the first paragraph of this post.

We went through this self-deceiving process. It turned out that the new feature didn’t actually work as we expected. Yet we had to keep the code we had modified while adding it, even though we knew it was useless. Removing it would have wasted more time and introduced new risks. This may be a lesson you can learn from.

It is worth noting that the beta policy of most Web 2.0 applications takes prototyping new features to the extreme. Iterative development is also the essence of contemporary software development.

One post of my development diary series…to be continued.



Help me with automating KDE environment settings

I have posted this on some tech forums, but no one seems willing to help, so I’m also posting it here. Any help, or even a response that makes no sense, would be much appreciated.

I’m new to Linux and now I have to do this. I’d appreciate any tips from the experts here.

The task has several main parts:
1. Desktop shortcut, background, etc;
2. Kicker (Panel), start menu, custom menu, etc; and
3. Konqueror, Konsole, etc.

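KDE uses Cascading Configuration Files (at least KDE 3.1+ does). There are settings for all users and per-user custom settings, with the main configuration files in /usr/share/config and ~/.kde/share/config respectively. My current strategy is a script that takes the configuration files of a customized standard desktop environment and uses them to overwrite the ones under ~/.kde/share/config. In fact, I overwrite the whole share folder.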

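Everything else seems to work, but the overwritten kickerrc file, which configures Kicker (the second part above), has no effect. Whether it is placed in /usr/share/config or in ~/.kde/share/config, the settings in kickerrc are not applied. Moreover, suppose I modify the Panel by hand (for example, by adding or removing an applet) and then log in to the KDE session again. The modified configuration then overwrites ~/.kde/share/config/kickerrc. It seems that kickerrc merely reflects the current panel settings; the system does not configure the current panel from kickerrc.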

I’ve searched a lot online but haven’t found the reason. Does anyone here know?

Update: I’m using Red Hat Enterprise Linux 3. I’ve forgotten which update; I’ll add it here once I check :).

How to capture key combination press in java

Though it may look trivial to anyone who has mastered it, I want to post some useful code here. It serves as a note to myself and as a guideline for anyone who hasn’t dealt with this problem before.

Code snippet (catching “Ctrl+Shift+`”):

// Both Ctrl and Shift held (old-style masks from java.awt.event.InputEvent).
public static final int CTRL_SHIFT_MASK =
        KeyEvent.SHIFT_MASK | KeyEvent.CTRL_MASK;

// Inside keyPressed(KeyEvent evt). _vc, inputRecorder, inputEnabled,
// holdInput() and clearInput() are members of the surrounding application class.
int keyCode = evt.getKeyCode();
if ((evt.getModifiers() & CTRL_SHIFT_MASK) != 0) { // Ctrl and/or Shift is held
    if (inputEnabled) holdInput(true);

    // Press “Ctrl+Shift+`” to toggle the view-only option.
    if ((evt.getModifiers() & CTRL_SHIFT_MASK) == CTRL_SHIFT_MASK &&
            keyCode == KeyEvent.VK_BACK_QUOTE) {
        boolean viewOnly = !_vc.getConfigManager().isViewOnly();
        _vc.getConfigManager().setViewOnly(viewOnly);
        if (!viewOnly) {
            inputRecorder.setPaused(false);
            inputRecorder.setLastLogTime(System.currentTimeMillis());
        } else {
            inputRecorder.setPaused(true);
        }
        clearInput();
        return;
    }
}

Here a key combination means one or more modifier keys (Ctrl, Shift, Alt) plus an ordinary character key, or a combination of modifier keys alone.
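For newer code, the extended modifiers (getModifiersEx() with the *_DOWN_MASK constants) are the recommended API; getModifiers() is deprecated in recent JDKs. The InputEvent.getModifiersEx() documentation shows an on-mask/off-mask idiom for matching an exact combination. Below is a minimal sketch of that idiom for Ctrl+Shift+`. The class and method names are hypothetical. The check is split into a pure function so it can be tested without creating AWT events.

```java
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;

public class KeyCombo {

    // Modifiers that must be held down for the shortcut.
    static final int ON_MASK = InputEvent.CTRL_DOWN_MASK | InputEvent.SHIFT_DOWN_MASK;
    // Modifiers that must NOT be held, so that e.g. Ctrl+Alt+Shift+` does not match.
    static final int OFF_MASK = InputEvent.ALT_DOWN_MASK | InputEvent.META_DOWN_MASK
            | InputEvent.ALT_GRAPH_DOWN_MASK;

    /** True for exactly Ctrl+Shift+` given extended modifiers and a key code. */
    static boolean isCtrlShiftBackQuote(int modifiersEx, int keyCode) {
        return (modifiersEx & (ON_MASK | OFF_MASK)) == ON_MASK
                && keyCode == KeyEvent.VK_BACK_QUOTE;
    }

    /** Convenience overload for use inside KeyListener.keyPressed(KeyEvent). */
    static boolean isCtrlShiftBackQuote(KeyEvent evt) {
        return isCtrlShiftBackQuote(evt.getModifiersEx(), evt.getKeyCode());
    }

    public static void main(String[] args) {
        System.out.println(isCtrlShiftBackQuote(ON_MASK, KeyEvent.VK_BACK_QUOTE));                   // true
        System.out.println(isCtrlShiftBackQuote(InputEvent.CTRL_DOWN_MASK, KeyEvent.VK_BACK_QUOTE)); // false
        System.out.println(isCtrlShiftBackQuote(ON_MASK | InputEvent.ALT_DOWN_MASK,
                KeyEvent.VK_BACK_QUOTE));                                                            // false
    }
}
```

Unlike the != 0 test in the snippet above, the on/off masks reject both partial combinations and combinations with extra modifiers.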

The end.

About Me

I'm finishing my master's degree in Software Engineering, Computer Science. I believe in, and have been following, what Forrest Gump's mama said: you have to do the best with what God gave you.