Granule User's Manual
Chapter 5. File Formats And Conversions
If you already have a collection of words or phrases in an ASCII (UTF-8) file, with fields separated by one of the Space, Tab, Semicolon, Colon, or Comma separators, use the Deck->Import from CSV menu entry to import your collection into the deck, as described in Section 2.6.
NOTE: Granule expects a valid UTF-8 input file. Always check your input file with a UTF-8 compliant text editor (gedit, etc.).
The rest of this section is for those who need finer control over parsing the input file format. For that, you must resort to the magic of a scripting language such as Perl.
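If you would rather check the file from the command line than open it in an editor, the check can also be scripted. The following is only a minimal sketch, not part of Granule: the helper names is_valid_utf8 and invalid_utf8_lines and the script name check_utf8.pl are made up for illustration. It relies on the standard Encode module, whose decode() dies on malformed input when called with FB_CROAK.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: returns 1 if the byte string is well-formed UTF-8,
# 0 otherwise.
sub is_valid_utf8 {
    my ($bytes) = @_;
    # FB_CROAK makes decode() die on malformed input;
    # LEAVE_SRC keeps decode() from modifying $bytes in place.
    return eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC); 1 } ? 1 : 0;
}

# Hypothetical helper: returns the numbers of the lines in a file
# that are not valid UTF-8.
sub invalid_utf8_lines {
    my ($path) = @_;
    open(my $in, '<:raw', $path) or die "Cannot open $path: $!";
    my @bad;
    while (my $line = <$in>) {
        push @bad, $. unless is_valid_utf8($line);
    }
    close($in);
    return @bad;
}

# Command-line use (file name is an example): perl check_utf8.pl words.txt
if (@ARGV) {
    my @bad = invalid_utf8_lines($ARGV[0]);
    print(@bad ? "Invalid UTF-8 on line(s): @bad\n" : "File is valid UTF-8\n");
}
```

Reading the file in raw mode matters here: the check has to see the bytes exactly as they are stored on disk, before Perl applies any decoding of its own.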
There are tons of on-line dictionaries and flashcard collections that you may find on the Web. Because Granule's internal data format is so simple, you can easily convert any word list from any reasonable format to Granule.
To illustrate the point, we are going to look at one such script written in Perl. This script converts a list of GRE vocabulary words created by Oleg Smirnov to the Deck format.
The original list of words comes in the following format:
    q: abacus
    a: frame with balls for calculating

    q: abate
    a: to lessen to subside

    q: abdication
    a: giving up control authority

    ...
A question word (denoted with 'q:') is followed by a line with the answer word or phrase (denoted with 'a:'), followed by an empty line. The Perl script is very simple - it scans each line and creates a Card for each question/answer pair:
    001 #!/usr/bin/perl
    002
    003 # You might want to change these to your liking
    004 #--------------------------------------------------
    005 #
    006 $sound_path="/usr/share/WyabdcRealPeopleTTS/";
    007 $user_name="";   # Your name
    008
    009 #
    010 #--------------------------------------------------
    ...
    015
    016 if ($ARGV[1] eq "") {
    017     die "\nMissing to-file argument!\n";
    018 }
    019
    020 open(INFILE,"< $ARGV[0]") || die "\nCannot open from-file!";
    021 open(OUTFILE,"> $ARGV[1]") || die "\nCannot create to-file!";
    022
    023 $date = `date`;
    024 chop ($date);
    025
    026 $card_id = `date +%s`;   # Number of seconds since Jan 1 1970
    027 chop ($card_id);
    028
    029 if ($user_name eq "") {
    030     $login_name = `whoami`;
    031     chop($login_name);
    032     open(PASSWD, "/etc/passwd") || die "\nCannot open /etc/passwd";
    033     while (<PASSWD>) {
    034         chop;
    035         ($login,$passd,$uid,$gid,$gcos,$rest) = split(/:/);
    036         if ($login eq $login_name) {
    037             ($user_name,$rest) = split(/,/,$gcos);
    038             last;
    039         }
    040     }
    041     close(PASSWD);
    042 }
    043
    044 # Write out the header
    045 #
    046 print OUTFILE "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n";
    047 print OUTFILE "<!DOCTYPE deck SYSTEM \"/etc/xml/granule/granule.dtd\">\n";
    048 print OUTFILE "<deck>\n";
    049 print OUTFILE " <author>", $user_name, "</author>\n";
    050 print OUTFILE " <description>Created on ", $date, "</description>\n";
    051 print OUTFILE " <sound_path>", $sound_path, "</sound_path>\n";
    052
    053 while(<INFILE>) {
    054     chop($_);
    055     chop($_);
    056     ($tag,$data) = split(':');
    057     if ($tag eq "") {
    058         next;
    059     }
    060     $data =~ s/^\s+//;   # eat leading whitespace
    061     $data =~ s/\s+$//;   # remove extra whitespace at the end
    062
    063     if ($tag eq "q") {
    064         print OUTFILE "<card id=\"_", $card_id, "\">\n";
    065         $card_id += 1;
    066         print OUTFILE " <front>", $data, "</front>\n";
    067     }
    068     elsif ($tag eq "a") {
    069         print OUTFILE " <back>", $data, "</back>\n";
    070         print OUTFILE " <back_example></back_example>\n";
    071         print OUTFILE "</card>\n";
    072     }
    073 } # while(INFILE)
    074
    075 print OUTFILE "</deck>\n";
    076
    077 close(OUTFILE);
    078
    079 $time_now=`date +%s`;
    080 $time_now -=$card_id;
    081
    082 # When we convert a batch of cards, we might overrun Card ID numbers.
    083 # To avoid duplicates, wait for the current time to catch up.
    084 #
    085 if ($time_now < 0) {
    086     $time_now *= -1;
    087     print "Don't use this script in the next ", $time_now/60.0, " minutes\n";
    088 }
    089
Line 001 declares that the script is to be interpreted by Perl.
Line 006 defines the sound dictionary path. If you don't have one installed, leave it as is.
Line 007 defines the creator's name. If left empty, the script will try to figure out who you are from your login name.
Lines 016-018 test to see if the output file name has been given on the command line.
Line 020 opens the input data file.
Line 021 opens the output file.
Lines 023-024 get the current date and time and chop off the trailing newline character.
Lines 026-027 create the first ID. As mentioned, each ID is the Card's creation timestamp in UNIX seconds.
Lines 029 through 042 try to figure out the user's real name from the login name and the matching entry in the /etc/passwd file.
Lines 046 through 051 write out the header information. User name, creation date, and sound dictionary path values are used.
Line 053 starts reading the input file, one line at a time.
Lines 054-055 get rid of the newline characters at the end of the line.
Lines 056-059 split the line into two tokens - the characters before and after the ':' separator (tag and data). If the line is empty, we skip to the next line.
Line 060 removes leading whitespace, and line 061 removes trailing whitespace from the data side of the line.
Lines 063-067 test to see if the tag is 'q' (question) and if so, start the Card's definition block. On line 065, we increase the card_id for the next Card to be one second apart.
Lines 068-072 test to see if the tag is 'a' (answer) and if so, finish the Card's block by writing the answer, an empty example field, and the closing tag.
Line 077 closes the output file.
Lines 079 through 088 deal with the fact that our last value of card_id might be well into the future if we convert a whole bunch of cards. In the case of our example file, there are 1162 cards to convert, so the last ID ends up about 1162 seconds ahead of the clock, and Perl does the conversion in the blink of an eye - we have to wait about 20 minutes before we can create new Cards.
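The parsing and output steps walked through above (lines 053 through 072) can also be sketched as a few small subroutines. This is a variation for illustration, not the script from Granule's web site: the names parse_line, xml_escape, and cards_from_lines are hypothetical, and two behaviors are assumptions added here. The split is limited to two fields so that a ':' inside an answer is kept, and the characters &, <, and > are escaped so that the output stays well-formed XML.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split "q: abacus" into ("q", "abacus").
# Returns an empty list for blank lines and lines without a ':'.
sub parse_line {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;                      # drop an LF or CRLF line ending
    my ($tag, $data) = split(/:/, $line, 2);   # limit 2: later ':' stay in data
    return unless defined $tag && defined $data;
    for ($tag, $data) { s/^\s+//; s/\s+$//; }  # trim both ends
    return ($tag, $data);
}

# Hypothetical helper: escape the characters that are special in XML text.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;                      # must run before the others
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Hypothetical helper: turn q:/a: lines into <card> blocks, numbering
# the cards from $first_id upward, one ID per card.
sub cards_from_lines {
    my ($first_id, @lines) = @_;
    my $card_id = $first_id;
    my $front;
    my @cards;
    for my $line (@lines) {
        my ($tag, $data) = parse_line($line);
        next unless defined $tag;
        if ($tag eq 'q') {
            $front = $data;                    # remember the question
        }
        elsif ($tag eq 'a' && defined $front) {
            push @cards,
                qq{<card id="_$card_id">\n}
              . "  <front>" . xml_escape($front) . "</front>\n"
              . "  <back>"  . xml_escape($data)  . "</back>\n"
              . "  <back_example></back_example>\n"
              . "</card>\n";
            $card_id++;
            undef $front;
        }
    }
    return @cards;
}

# Demo on the first two entries of the sample list, starting at the current time.
print cards_from_lines(time,
    "q: abacus\n", "a: frame with balls for calculating\n", "\n",
    "q: abate\n",  "a: to lessen to subside\n");
```

Unlike the original script, this sketch holds the question until its answer arrives, so a 'q:' line with no matching 'a:' line never produces an unclosed card element.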
As you can see, there is nothing difficult about the whole conversion process - you can use any scripting language you want. If your input data file is in an encoding other than UTF-8, you can always use the "iconv" utility, which converts files from one encoding to another.
You can find the script, convert_gre_1162.pl, and the imported Deck file, GRE_1162_en_en.dkf, in the Example section of Granule's web site.
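If you prefer to do the encoding conversion inside a Perl script rather than with iconv, the standard Encode module can do the same job. The sketch below is an illustration only: the helper name to_utf8, the script name to_utf8.pl, and the file names are made up, and ISO-8859-1 is just an example source encoding.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: re-encode a byte string from $from_encoding to UTF-8.
sub to_utf8 {
    my ($bytes, $from_encoding) = @_;
    my $chars = decode($from_encoding, $bytes);   # bytes -> Perl characters
    return encode('UTF-8', $chars);               # characters -> UTF-8 bytes
}

# Command-line use (all names are examples):
#   perl to_utf8.pl ISO-8859-1 words_latin1.txt words_utf8.txt
if (@ARGV == 3) {
    my ($from_encoding, $in_path, $out_path) = @ARGV;
    open(my $in,  '<:raw', $in_path)  or die "Cannot open $in_path: $!";
    open(my $out, '>:raw', $out_path) or die "Cannot create $out_path: $!";
    local $/;                                     # slurp the whole file at once
    print {$out} to_utf8(scalar <$in>, $from_encoding);
    close($in);
    close($out) or die "Cannot finish writing $out_path: $!";
}
```

The roughly equivalent iconv command would be iconv -f ISO-8859-1 -t UTF-8 words_latin1.txt > words_utf8.txt. Either way, you need to know the source encoding beforehand, since neither tool detects it for you.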
Good luck!