Thursday, April 3, 2008

Perl by Example: English Dictionary In 22 Lines

[Update #1] This post has been cited by Princeton's WordNet Project as a nice Perl extension. You can find a lot more information about WordNet here.

[Update #2] Many thanks to my brother Costas for buying me the ultimate Perl book: "Programming Perl, 3rd edition" by Larry Wall. Thanx bro! :)

WordNet is an online English dictionary from Princeton University. It contains well over a hundred thousand words and it keeps growing. The resource is freely accessible and there are interfaces for practically every programming language. It is not hard to build one of your own, so I seize the opportunity to introduce a really special programming language: Perl. Of course, a crappy blog post cannot introduce a language as extensive as Perl, but additional Perl articles are sure to follow!

For the record, Perl was designed as a so-called 'glue' language for Unix, for doing things that were somewhat difficult with other tools. Created in the late 80's by Larry Wall as a one-man project for a fast reporting tool, it has since spread to practically every kind of machine and application. I have never used Perl for large GUI-heavy projects, but it has been a killer for simple but heavy tasks.

Whatever the objections, you may find Perl great for one simple but rare thing: Perl has culture. Its creator and all the people actively involved in Perl's development share a number of values: freedom, innovation and a unique sense of humor. To explain, I have two Perl books in my library: "Learning Perl" and "Programming Perl", affectionately known in the Perl community as the "Llama Book" and the "Camel Book" for the animals on their covers (the camel is the Perl mascot, as shown in the figure). For me they are the easiest programming books I have ever read. Anything else I use as a reference, because I find programming books too hard to follow for long. But these are different. Their writing style and humor make you read them in literally one breath.

Anyway, the best way to introduce a language is by example, so here is mine for Perl: building an English dictionary based on Princeton's WordNet. Believe it or not, all in all it takes 22 lines to do it! You can see the program's interface in the following video. When you are done, you might want to check the source code, which is commented for better understanding.


The Perl source code follows:

my $word;                                  # declares a scalar variable. Perl has two basic kinds of data: scalars and plurals (arrays, hashes)
our $message = "\nPlease enter a word to look for:\n>";
print $message;                            # prints our message
while ($word = <>)                         # reads the word the user searches for
{
    chomp($word);                          # remove the trailing newline (the '\n')
    if ($word eq "<") { system("cls"); print $message; }   # if user types "<", clear the screen ('cls' is the Windows command)
    else {
        print "Connecting..";
        my $url = "http://wordnet.princeton.edu/perl/webwn" . "?" . "s=" . $word;   # WordNet URL
        system("wget", "-q", "-Oindex.html", $url);   # download the results (list form, so the shell never interprets the word)
        open(FILE, "<", "index.html") or die "Cannot open index.html: $!";   # open the result web page locally
        print "\rOk.Results:";
        while (<FILE>)                     # for every line in file
        {
            if (/<\s*b>\Q$word\E<\/b>[^\(]+\(([^\)]*)\)/)   # match a pattern, don't be scared! (\Q..\E keeps the word literal)
            { print "\n#def#$1"; }         # ..and print the word definition!
        }
        close FILE;                        # close file
        print "\n>";
    }
}
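
One thing the program does not handle is a word or phrase with characters that are not allowed in a URL, such as the space in "ice cream", because the input is appended to the address exactly as typed. Here is a minimal sketch of how the query could be percent-encoded first. The helper names escape_query and wordnet_url are my own invention for illustration, and the sketch assumes plain ASCII input.

```perl
use strict;
use warnings;

# Percent-encode every character except letters, digits and - . _ ~
# (the "unreserved" set). Assumption: the input is plain ASCII.
sub escape_query {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-._~])/sprintf("%%%02X", ord($1))/ge;
    return $text;
}

# Hypothetical helper: builds the same WordNet URL as the program above,
# but with the search word encoded.
sub wordnet_url {
    my ($word) = @_;
    return "http://wordnet.princeton.edu/perl/webwn?s=" . escape_query($word);
}

print wordnet_url("ice cream"), "\n";
```

With this in place, the space becomes %20 and the address stays in one piece when it is handed to wget.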



What might scare you off is the pattern matching line, especially if you have never come across regular expressions before. The logic is this: in the WordNet result page, each word definition comes with the original word in bold and the explanation in parentheses. For example, if we search for 'developer' it may have the following format:
...developer...(a man who develops)...
...developer...(sometimes referring to people creating software)...and so on
What the pattern does is check whether the line has the word we searched for in bold, and then extract the sentence in parentheses.
When it comes to pattern matching, Perl is the king.
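
To see the pattern at work without going online, here is a small sketch that runs it against a few made-up lines. The sample lines are my own, shaped like the format described above and not copied from a real WordNet page, and the definitions helper is a hypothetical name used only here. The \Q..\E around the word is an addition that makes Perl treat the search word literally even if it contains characters like '+' or '.'.

```perl
use strict;
use warnings;

# Made-up sample lines in the shape described above (not real WordNet output).
my @lines = (
    '<li><b>developer</b>, <i>S:</i> (a man who develops)</li>',
    '<li><b>builder</b> (someone who builds things)</li>',
    '<li><b>developer</b> (sometimes referring to people creating software)</li>',
);

# Hypothetical helper: return every definition found for $word in the lines.
sub definitions {
    my ($word, @html) = @_;
    my @found;
    for my $line (@html) {
        # the word in bold, then anything that is not "(",
        # then capture whatever sits between "(" and ")"
        if ($line =~ /<\s*b>\Q$word\E<\/b>[^\(]+\(([^\)]*)\)/) {
            push @found, $1;
        }
    }
    return @found;
}

print "#def#$_\n" for definitions('developer', @lines);
```

Only the two 'developer' lines match, so the 'builder' line is skipped and the text inside each pair of parentheses ends up in $1.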

That's all. If you already have a Perl distribution, you can test it yourself. If you don't and you would like to give it a try, there are mainly two distributions for Windows. One is from ActiveState, and you can download its Windows Perl distribution, named ActivePerl, here (just click the download button). The other is called Strawberry Perl; it recently came out of beta and you can find it here. I haven't tried it, but it may well be good.

Needless to say, Perl is free and always will be! So ride the camel and enjoy!