Archive for May, 2013

Most programmers learned only procedural programming at school, so they are used to processing a queue (for example, a list of web pages to download) with a loop. NodeJS, a language in the lambda-calculus tradition, handles I/O asynchronously through callbacks, so a plain loop does not wait for one download to finish before starting the next. To download one page, wait until it completes, and only then download the next, you need recursion: the function calls itself from the completion callback, one task at a time, until all tasks have finished. Without this technique, such a program is nearly impossible to write.

The following example uses NodeJS and recursion to download thousands of web pages, one after another:

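var httpGet = require('http-get');

// Download page number `page`, then start the next download from inside
// the completion callback, so only one request is in flight at a time.
function getPage(page) {
	var url = "http://localhost/test.php?page=" + page;
	httpGet.get({url: url}, '/tmp/temp' + page + '.html', function (err, result) {
		if (err) {
			console.log(err);	// stop the chain at the first error
		} else if (page < 3000) {
			getPage(page + 1);	// recurse only after this page has been saved
		}
	});
}

getPage(1);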
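
The pattern is not specific to NodeJS. Below is a minimal sketch in Java (the language of the next post), assuming Java 8+ `CompletableFuture`. `fakeDownload` is a hypothetical stand-in that only records the page number on a background thread instead of fetching anything; the point is that the next task is started from the completion of the previous one, not from a loop.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class SequentialDownloads {
    // Records the order in which the (simulated) downloads finish.
    static final List<Integer> finished = Collections.synchronizedList(new ArrayList<Integer>());

    // Hypothetical stand-in for an asynchronous HTTP download.
    static CompletableFuture<Void> fakeDownload(final int page) {
        return CompletableFuture.runAsync(() -> finished.add(page));
    }

    // Start page `page`; only when it completes, recursively start page + 1.
    static CompletableFuture<Void> getPage(final int page, final int last) {
        return fakeDownload(page).thenCompose(ignored ->
                page < last ? getPage(page + 1, last)
                            : CompletableFuture.<Void>completedFuture(null));
    }

    public static void main(String[] args) {
        getPage(1, 5).join();          // wait for the whole chain
        System.out.println(finished);  // [1, 2, 3, 4, 5]
    }
}
```

Because each `fakeDownload` is only invoked after the previous stage completes, the pages always finish in order, exactly as in the NodeJS version.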

I have seen articles online claiming that it is easy to use Java's InputStream to open a web page encoded in Big5-HKSCS and convert it directly to UTF-8.

Following the method those articles describe, I wrote a Java program that downloads a Hong Kong Government web page, to test whether the content comes out as UTF-8.

The source code is as follows:

import java.net.URL;
import java.net.URLConnection;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.BufferedReader;
import java.util.regex.Pattern;
import java.util.regex.Matcher;
// Downloads a Big5-HKSCS page and prints the text of every <option> element.
class Building{
	public static void main(String args[]) throws Exception{
		URL bdgSite=new URL("https://bmis.buildingmgt.gov.hk/chi/building.php?count=0&ordfield=&district_id=0");
		StringBuffer s=new StringBuffer();
		String inputLine;
		URLConnection fc=bdgSite.openConnection();
		// Decode the response bytes as Big5-HKSCS; the resulting Strings are Unicode.
		BufferedReader in=new BufferedReader(new InputStreamReader(fc.getInputStream(),"Big5_HKSCS"));
		// Read the whole page into one buffer (line breaks are dropped).
		while((inputLine=in.readLine())!=null)
			s.append(inputLine);
		in.close();
		// Capture the text between <option ...> and </option>, e.g. the district names.
		Pattern p1=Pattern.compile("<option[^>]+>([^<]+)</option>");
		Matcher m1=p1.matcher(s);
		while(m1.find()){
			// println encodes the String with the JVM's default encoding (file.encoding).
			System.out.println(m1.group(1));
		}
	}
}

In the program, I use a regular expression to extract the names of Hong Kong's 18 districts in Chinese. The character 「埗」 in 深水埗 (Sham Shui Po) is a good test case, because it cannot be displayed with ordinary Big5 encoding.
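
To see what the regular expression in the program above actually captures, the extraction step can be tried offline on a small HTML string. This is a sketch: the class and method names and the sample HTML are made up for illustration, and the names are in English to keep the test independent of any encoding. Note that `[^>]+` requires at least one character after `<option`, so a bare `<option>` tag without attributes is not matched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OptionExtractor {
    // Same pattern as in the Building program.
    private static final Pattern OPTION = Pattern.compile("<option[^>]+>([^<]+)</option>");

    // Returns the text of every <option ...>text</option> in the input.
    static List<String> extractOptions(CharSequence html) {
        List<String> names = new ArrayList<String>();
        Matcher m = OPTION.matcher(html);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        String html = "<select name=\"district_id\">"
                + "<option value=\"1\">Sham Shui Po</option>"
                + "<option value=\"2\">Yau Tsim Mong</option>"
                + "<option>Skipped</option>"
                + "</select>";
        System.out.println(extractOptions(html)); // [Sham Shui Po, Yau Tsim Mong]
    }
}
```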

I use macOS with the Terminal output set to UTF-8, and the output was garbled. When I switched the Terminal to Big5-HKSCS, the output looked normal. At first this suggested that Java decoded the web page correctly but did not convert the content into Unicode or UTF-8, leaving the output in Big5-HKSCS.

However, the same program ran correctly on my Ubuntu 13.04 desktop with JDK 1.7.0_21. I eventually found that in the Java settings on Mac OS, the default file encoding is not UTF-8 (most likely it is ISO-8859-1), and this default also controls how System.out writes text to the terminal. To make the program run correctly, I had to add the option -Dfile.encoding=UTF-8:

java -Dfile.encoding=UTF-8 Building
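
Instead of relying on a command-line flag, a program can also choose its output encoding itself. Once the page has been decoded into Java Strings it is already Unicode, so only the step that writes bytes to the terminal needs an explicit charset. A minimal sketch, assuming Java 7+ (for `StandardCharsets`); the class and method names are hypothetical:

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

public class Utf8Console {
    // Wraps a byte stream in a writer that always encodes as UTF-8,
    // independent of the JVM's default file.encoding.
    static PrintWriter utf8Writer(OutputStream out) {
        return new PrintWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8), true);
    }

    public static void main(String[] args) {
        // FileDescriptor.out is the process's standard output.
        PrintWriter out = utf8Writer(new FileOutputStream(FileDescriptor.out));
        // Sham Shui Po written with Unicode escapes, so the source file itself
        // does not depend on the compiler's default encoding.
        out.println("\u6DF1\u6C34\u57D7");
    }
}
```

In the Building program, the same idea would mean printing `m1.group(1)` through such a writer instead of `System.out`.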

Recently, I upgraded the desktop at my workplace from Ubuntu 12.10 to Ubuntu 13.04. After the upgrade finished, I found that phpmyadmin no longer worked.

The web browser showed an HTTP 500 (Internal Server Error), so I checked the Apache 2 error log:

cat /var/log/apache2/error.log

In the log, I found the following:

[Mon May 20 10:23:07 2013] [error] [client 127.0.0.1] PHP Stack trace:
[Mon May 20 10:23:07 2013] [error] [client 127.0.0.1] PHP   1. {main}() /usr/share/phpmyadmin/index.php:0
[Mon May 20 10:23:07 2013] [error] [client 127.0.0.1] PHP   2. require_once() /usr/share/phpmyadmin/index.php:13
[Mon May 20 10:23:07 2013] [error] [client 127.0.0.1] PHP   3. require() /usr/share/phpmyadmin/libraries/common.inc.php:614
[Mon May 20 10:23:26 2013] [error] [client 127.0.0.1] PHP Fatal error:  require_once(): Failed opening required './libraries/php-gettext/gettext.inc' (include_path='.') in /usr/share/phpmyadmin/libraries/select_lang.lib.php on line 370

It looked as if PHP could not find the gettext library. A solution I found online was to edit the apache.conf shipped with phpmyadmin and append :/usr/share/php/php-gettext/ to the end of the php_admin_value open_basedir line. The updated file looks like this:

<Directory /usr/share/phpmyadmin>
	Options FollowSymLinks
	DirectoryIndex index.php

	<IfModule mod_php5.c>
		AddType application/x-httpd-php .php

		php_flag magic_quotes_gpc Off
		php_flag track_vars On
		php_flag register_globals Off
		php_admin_flag allow_url_fopen Off
		php_value include_path .
		php_admin_value upload_tmp_dir /var/lib/phpmyadmin/tmp
		php_admin_value open_basedir /usr/share/phpmyadmin/:/etc/phpmyadmin/:/var/lib/phpmyadmin/:/usr/share/php/php-gettext/
	</IfModule>
</Directory>
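
Why this fixes the error can be modeled roughly as follows. open_basedir is a colon-separated list of directories, and PHP refuses to open any file outside them. The sketch below is a simplified Java model, not PHP's implementation: it only tests whether an already-resolved path starts with one of the listed directories. It also assumes (this is not stated in the log) that libraries/php-gettext inside phpmyadmin is a symlink into /usr/share/php/php-gettext/; PHP resolves symlinks before checking, so the real path falls outside the original list.

```java
public class OpenBasedirCheck {
    // Simplified model: allowed if the resolved path starts with one of the
    // colon-separated directory entries (each entry ending in '/').
    static boolean isAllowed(String openBasedir, String resolvedPath) {
        for (String dir : openBasedir.split(":")) {
            if (!dir.isEmpty() && resolvedPath.startsWith(dir)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String before = "/usr/share/phpmyadmin/:/etc/phpmyadmin/:/var/lib/phpmyadmin/";
        String after = before + ":/usr/share/php/php-gettext/";
        // Assumed resolved location of gettext.inc (see the note above).
        String gettext = "/usr/share/php/php-gettext/gettext.inc";
        System.out.println(isAllowed(before, gettext)); // false
        System.out.println(isAllowed(after, gettext));  // true
    }
}
```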

Then restart Apache 2:

sudo /etc/init.d/apache2 restart

Wow, phpmyadmin worked normally again!