... has been quite effective. I have already solved some easy to intermediate problems with less code. I am far from mastering the language, but this is one of the few times I have seen results this quickly.

Learning path

I needed to do some data mining and I was tired of juggling sed, awk and other Unix utilities plus some PHP. I wanted one solution. Although I have Python on my list of languages to learn (as well as Ruby), by coincidence I stumbled upon an old Perl 4 tutorial from the early 1990s! It really whetted my appetite, as it had some good exercises to get the hang of the basics. After finishing this short tutorial I wanted something more detailed, so I went to the bookshop and bought Learning Perl, 6th Edition (the 6th edition came out in June 2011; this book is also called "the Llama"). This title is absolutely fantastic and quite thorough. It also has some good exercises that you really should do to reinforce what you have read. I understand most of the concepts now. After this one you can continue with Intermediate Perl and Mastering Perl by the same authors.

What I like about this language

Perl is very robust; there are few to no things you cannot do with it. It has strong support for regular expressions, which is of particular interest for what I am going to use the language for: text parsing. (I think Python is just as powerful; I can probably tell in a few months, because it is next on my agenda.)

The shortcuts in syntax take a bit of time to get used to. There are special variables that save code. In a loop, $_ automatically holds the current item, and many functions work on it without you having to mention it. See the next code example:

   	#!/usr/bin/perl
   	use strict;
   	use warnings;
   	open my $post, '<', 'blogpost.txt' or die "Cannot open blogpost.txt: $!";
   	while (<$post>) {   # each line read is stored in $_
   		chomp; # strip the trailing newline from $_
   		print; # perl knows you want to print $_, the internal loop variable
   	}
   	close $post;

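Another example is matching a regular expression. When the string you want to test is already in $_, you can leave out the variable and the =~ operator:

   	# instead of
   	if ($sentence =~ /under/) {
   		# ...
   	}
   	# you can write (this tests $_)
   	if (/under/) {
   		# ...
   	}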
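
To see the implicit match at work in a complete script, here is a minimal sketch. The sample lines are made up for illustration and do not come from any real file; only the /under/ pattern is taken from the snippet above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample lines, invented for this example
my @lines = (
    'the cat sat under the table',
    'the dog slept on the sofa',
    'nobody understands regexes at first',
);

my @matches;
for (@lines) {                      # each element is aliased to $_
    push @matches, $_ if /under/;   # shorthand for: $_ =~ /under/
}

print "$_\n" for @matches;
```

This prints the first and the third line. The third one matches because "understands" contains "under"; use /\bunder\b/ if you only want the whole word.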

Compact syntax

I like the distinction in variable names: $ for a scalar, @ for an array (a list) and % for a hash. If you assign an array to another array you get all of its elements, but if you assign it to a scalar you get the number of elements. There are many examples like this where you save typing. The language is well designed (thanks, Larry).
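
The difference between assigning an array to an array and assigning it to a scalar can be sketched as follows. The variable names and the sample data are assumptions chosen for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @langs = ('Perl', 'Python', 'Ruby');  # hypothetical sample data

my @copy  = @langs;   # list context: copies all elements
my $count = @langs;   # scalar context: number of elements

print "copy:  @copy\n";    # copy:  Perl Python Ruby
print "count: $count\n";   # count: 3
```

Note that this applies to an array variable. A literal comma-separated list assigned to a scalar gives you its last element, not the count.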

Counting words in a text

Here is an example of how easy it is to build a hash of the words found in this post with their frequencies (printing only the words that occur more than 10 times):

   	#!/usr/bin/perl
   
   	use strict;
   	use warnings;
   
   	my %words = (); # the hash in which we are going to store the words
   
   	# this post as a text file
   	open my $in, '<', '2011.08.13_first_week_perl.txt'
   		or die "Cannot open input file: $!";
   
   	while (<$in>) {
   		chomp; # strip the trailing \n newline character
   
   		# break each line down into separate words;
   		# split with no arguments splits $_ on whitespace
   		foreach my $word (split) {
   			# store the word as key and add 1 for each occurrence
   			$words{$word}++;
   		}
   	}
   
   	close $in;
   
   	# print the words in descending order of frequency
   	foreach my $w (sort { $words{$b} <=> $words{$a} } keys %words) {
   		printf "%-10s %5d\n", $w, $words{$w} if $words{$w} > 10;
   	}

Outputs:

   	to            24
   	I             20
   	the           20
   	a             19
   	you           17
   	in            16
   	is            16
   	and           13
   	of            12
   	this          11
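
The script above filters on a count threshold rather than picking a fixed number of words, and it treats "The" and "the" as different keys. A minimal sketch of a variant that folds case, ignores punctuation and takes the first N entries with an array slice could look like this. The sample text, the $top variable and the alphabetical tie-break are assumptions for illustration, not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample text standing in for the blog post file
my $text = "The cat and the dog. The bird, the cat and THE fish.";

my %words;
# lc folds case; the match in list context returns every run of word characters
$words{$_}++ for lc($text) =~ /\w+/g;

# sort by descending count, break ties alphabetically for a predictable order
my @sorted = sort { $words{$b} <=> $words{$a} || $a cmp $b } keys %words;

my $top = 3;                        # how many entries to show
$top = @sorted if $top > @sorted;   # avoid slicing past the end
my @top_words = @sorted[0 .. $top - 1];

printf "%-10s %5d\n", $_, $words{$_} for @top_words;
```

With the sample text this lists "the" (5), "and" (2) and "cat" (2). To run it on a file, read the lines in a while loop as in the script above and apply the same match to each line.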

I gave another example in my previous blog post, where I converted text into HTML (based on some simple rules). I am using it now as I write this post. It is one of the many ways you can use Perl to get things done and save time.

I haven't even experimented with Perl modules yet, and there is a lot of cool stuff I haven't mentioned. Read Learning Perl! More in another blog post, I guess...

Conclusion:

  • Perl is a powerful and versatile language. You can use it for almost anything.
  • If you want to parse text with regular expressions, Perl is a very good candidate.
  • It has a compact syntax that takes a bit of getting used to but is not hard to learn. You will end up writing less code.
  • I enjoy writing code in Perl more than in PHP (this might also be the "wow, this is new/cool" effect; call it love at first sight).
  • There are great resources: the O'Reilly books mentioned above are highly recommended. When you start googling for solutions you find a very active community.
  • For regular expressions, Mastering Regular Expressions is a great title to read in parallel.


Bob Belderbos

Software Developer, Pythonista, Data Geek, Student of Life.