5.2. To Import Flashcards From Other Formats

If you already have a collection of words or phrases in a plain-text (UTF-8 encoded) file whose fields are separated by a Space, Tab, Semicolon, Colon, or Comma, use the Deck->Import from CSV menu entry to import the collection into the deck, as described in Section 2.6.

NOTE: A word of caution: Granule expects a valid UTF-8 input file. Always check your input file with a UTF-8 compliant text editor (gedit, for example).
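
If no UTF-8 aware editor is at hand, the same check can be scripted. The following is a minimal sketch that relies on the Encode module shipped with Perl. The function name is_valid_utf8 is made up for this example and is not part of Granule:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: return 1 if the byte string is well-formed UTF-8,
# 0 otherwise.
sub is_valid_utf8 {
    my ($bytes) = @_;
    # FB_CROAK makes decode() die on the first malformed sequence;
    # LEAVE_SRC keeps decode() from modifying $bytes.
    my $ok = eval {
        Encode::decode('UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);
        1;
    };
    return $ok ? 1 : 0;
}

# Check the file named on the command line, if there is one.
if (@ARGV) {
    open(my $in, '<:raw', $ARGV[0]) or die "Cannot open $ARGV[0]: $!\n";
    local $/;                     # read the whole file at once
    my $bytes = <$in>;
    close($in);
    print is_valid_utf8($bytes) ? "valid UTF-8\n" : "NOT valid UTF-8\n";
}
```

Saved under a name of your choice, the script takes the input file as its only argument and reports whether the file can be imported safely.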

Figure 5-1. An Example of CSV Input File

The rest of this section is for those who need finer control over parsing the input file format. They must resort to the magic of a scripting language such as Perl.

Many on-line dictionaries and flashcard collections are available on the Web. Because Granule's internal data format is so simple, you can easily convert a word list from any reasonable format to the Deck format.

To illustrate the point, we are going to look at one such script written in Perl. This script converts a list of GRE vocabulary words created by Oleg Smirnov to the Deck format.

The original list of words comes in the following format:

 
q: abacus
a: frame with balls for calculating

q: abate
a: to lessen to subside

q: abdication
a: giving up control authority

     ...
	

A question word (denoted with 'q:') is followed by a line with the answer word or phrase (denoted with 'a:'), followed by an empty line. The Perl script is very simple: it scans each line and creates a Card for each question/answer pair:

 
#!/usr/bin/perl

# You might want to change these to your liking
#--------------------------------------------------
#
$sound_path="/usr/share/WyabdcRealPeopleTTS/";
$user_name="";                                   # Your name

#
#--------------------------------------------------
...

# Both the from-file and the to-file arguments are required
if (@ARGV < 2) {
  die "\nMissing to-file argument!\n";
}

open(INFILE, "<", $ARGV[0]) || die "\nCannot open from-file!\n";
open(OUTFILE, ">", $ARGV[1]) || die "\nCannot create to-file!\n";

$date = `date`;
chomp($date);

$card_id = `date +%s`;          # Number of seconds since Jan 1 1970
chomp($card_id);

# If no name was configured above, look up the full name of the
# current user in /etc/passwd
if ($user_name eq "") {
  $login_name = `whoami`;
  chomp($login_name);
  open(PASSWD, "<", "/etc/passwd") || die "\nCannot open /etc/passwd\n";
  while (<PASSWD>) {
    chomp;
    ($login,$passd,$uid,$gid,$gcos,$rest) = split(/:/);
    if ($login eq $login_name) {
      ($user_name,$rest) = split(/,/,$gcos);   # full name is the first GECOS field
      last;
    }
  }
  close(PASSWD);
}

# Write out the header
#
print OUTFILE "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n";
print OUTFILE "<!DOCTYPE deck SYSTEM \"/etc/xml/granule/granule.dtd\">\n";
print OUTFILE "<deck>\n";
print OUTFILE "  <author>", $user_name, "</author>\n";
print OUTFILE "  <description>Created on ", $date, "</description>\n";
print OUTFILE "  <sound_path>", $sound_path, "</sound_path>\n";

while(<INFILE>) {
  s/\r?\n$//;                          # strip the line ending (LF or CR+LF)
  ($tag,$data) = split(/:/, $_, 2);    # split on the first colon only
  if (!defined($tag) || $tag eq "") {  # skip empty lines
    next;
  }
  $data = "" unless defined($data);
  $data =~ s/^\s+//;            # eat leading whitespace
  $data =~ s/\s+$//;            # remove extra whitespace at the end

  if ($tag eq "q") {
    print OUTFILE  "<card id=\"_", $card_id, "\">\n";
    $card_id += 1;
    print OUTFILE "  <front>", $data, "</front>\n";
  }
  elsif ($tag eq "a") {
    print OUTFILE "  <back>", $data, "</back>\n";
    print OUTFILE "  <back_example></back_example>\n";
    print OUTFILE "</card>\n";
  }
} # while(INFILE)

print OUTFILE "</deck>\n";

close(INFILE);
close(OUTFILE);

$time_now=`date +%s`;
chomp($time_now);
$time_now -=$card_id;

# Card IDs start at the current time and grow by one per card, so a
# large batch can run the IDs ahead of the clock.
# To avoid duplicates, wait for the current time to catch up.
#
if ($time_now < 0) {
  $time_now *= -1;
  print "Don't use this script in the next ", $time_now/60.0, " minutes\n";
}
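
One thing the script above does not do is escape characters that are special in XML. If a question or an answer contains '&', '<' or '>', the resulting Deck file is no longer well-formed XML. A minimal sketch of a helper that could be applied to $data before it is printed follows; the name xml_escape is hypothetical and is not part of the original script:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script):
# replace the characters that are special in XML text content.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must come first, or it would re-escape the entities below
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Example: wrap the data wherever the script prints it
print "  <back>", xml_escape("cause & effect"), "</back>\n";
# prints:   <back>cause &amp; effect</back>
```

The GRE word list used here contains no such characters, which is presumably why the script gets away without this step.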

	

As you can see, there is nothing difficult about the whole conversion process, and you can use any scripting language you want. If your input data file is in an encoding other than UTF-8, you can always use the "iconv" utility, which converts files from one encoding to another.
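
If iconv is not installed, the conversion can also be done from Perl with the Encode module. The following is a minimal sketch that assumes the input file is in ISO-8859-1; that encoding is only an assumption made for this example, so substitute the real encoding of your file. The function name to_utf8 is hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: convert a byte string from the $from encoding
# to UTF-8 encoded bytes.
sub to_utf8 {
    my ($bytes, $from) = @_;
    my $text = Encode::decode($from, $bytes);   # bytes -> Perl characters
    return Encode::encode('UTF-8', $text);      # characters -> UTF-8 bytes
}

# Usage: <script> from-file to-file
if (@ARGV == 2) {
    open(my $in,  '<:raw', $ARGV[0]) or die "Cannot open $ARGV[0]: $!\n";
    open(my $out, '>:raw', $ARGV[1]) or die "Cannot create $ARGV[1]: $!\n";
    # Converting line by line is safe for a single-byte encoding
    # such as ISO-8859-1 (assumed here).
    while (my $line = <$in>) {
        print {$out} to_utf8($line, 'ISO-8859-1');
    }
    close($in);
    close($out) or die "Cannot finish writing $ARGV[1]: $!\n";
}
```

The converted file can then be fed to the conversion script or to the CSV import.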

You can find the script, convert_gre_1162.pl, and the imported Deck file, GRE_1162_en_en.dkf, in the Example section of Granule's web site.

Good luck!