Programming and Scripting :: MyDSL info parser



Quote
The comments are full of odd characters that break things. Using blobs might help, but then I'm not sure about search capabilities... It is the last 10% of the data that is difficult.

If it's 10%, can those characters be changed? Or would you effectively be reformatting the info files altogether?

Quote
Do 750 items warrant using an SQL database? It would be easy moving forward.

It would be a lot easier moving forward.

I think I will take a second look. I have wanted to make an sqlite GUI for DSL. If I could exclude the comments it would be a snap.

The program I wrote on the server side to generate the HTML tables is an awk program. It works partly by position and partly by parsing field names. I have been wanting to replace it too. By using sqlite I can drop the awk program for the website tables and have a nice GUI for the client side.
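Parsing purely by field name, and escaping the result for SQL, could look roughly like this. It is only a sketch in plain Lua: parse_info, sql_quote and insert_statement are made-up names, the "info" table and its columns are hypothetical, "Comments" is an assumed field name, and the sample record is invented. It also assumes every field name starts a line with a capital letter, as "Location:" and "Title:" do.

```lua
-- Hypothetical helper: split one info record into a table keyed by field name.
-- Lines that do not look like "Name: value" are treated as continuations of
-- the previous field (for example a multi-line comment). A continuation line
-- that itself looks like "Word: text" would be mistaken for a new field.
function parse_info(text)
  local fields, last = {}, nil
  for line in string.gmatch(text, "[^\n]+") do
    local name, value = string.match(line, "^(%u[%w_%-]*):%s*(.-)%s*$")
    if name then
      fields[name] = value
      last = name
    elseif last then
      fields[last] = fields[last] .. "\n" .. string.match(line, "^%s*(.-)%s*$")
    end
  end
  return fields
end

-- Standard SQL string literal: wrap in single quotes, double any inside.
function sql_quote(s)
  return "'" .. (string.gsub(s or "", "'", "''")) .. "'"
end

-- Hypothetical table and column names, for illustration only.
function insert_statement(f)
  return string.format(
    "INSERT INTO info (location, title, comments) VALUES (%s, %s, %s);",
    sql_quote(f.Location), sql_quote(f.Title), sql_quote(f.Comments))
end

local sample = "Location: testing\nTitle: foo.dsl  \nComments: it's a test\n  second line\n"
print(insert_statement(parse_info(sample)))
```

Quote doubling only covers the apostrophe problem. Bytes in mixed encodings would still go into the database as-is.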

I made a variant that might be closer to what you were talking about. It creates two pulldown menus, but I can't say if they might have performance issues on slow machines.

The "Download and Install" button currently just prints the command that would ordinarily be executed. The "Download" button does nothing. I was debating whether or not to have a separate action to simply download the extension to a chosen directory for future use.

Code Sample
#!/bin/murgaLua
-- mydsl info browser, 2008 mikshaw
--
-- Changelog (yyyy/mm/dd)
--   2008/01/14: Fixed display of incorrect duplicate info file
--               Check for trailing spaces in title string
--   2008/01/13: start of project

listfile="/home/dsl/WORK/mydsl_master_list/list.txt"

inputfile=io.open(listfile)
if not inputfile then
 print("can't open "..listfile)
 os.exit(1)
end

data=inputfile:read("*a")
inputfile:close()

w=fltk:Fl_Window(350,400,"MyDSL Info Search")

b_loc=fltk:Fl_Choice(100,10,240,30,"category: ")
b_loc:callback(
function()
 -- empty the file menu, then refill it with the titles in the chosen category
 while b_files:size()>1 do b_files:remove(0) end
 b_files:redraw()
 for i=1,info_count do
   if info[i].location==b_loc:text() then
     b_files:add(info[i].title)
   end
 end
 b_files:value(0)
 b_files:do_callback()
end
)

b_files=fltk:Fl_Choice(100,40,240,30,"file: ")
b_files:callback(
function()
 for i=1,info_count do
   if info[i].title == b_files:text() and info[i].location == b_loc:text() then
     info_display_buffer:text(info[i].text)
     break
   end
 end
end
)

info_display=fltk:Fl_Text_Display(10,70,330,280)
--info_display:textfont(fltk.FL_COURIER)
info_display:textfont(fltk.FL_SCREEN)
info_display:textsize(12)
info_display_buffer=fltk:Fl_Text_Buffer()
info_display:buffer(info_display_buffer)

download=fltk:Fl_Button(10,350,120,30,"Download")
install=fltk:Fl_Button(130,350,120,30,"Download and Install")
install:callback(
function()
 print("mydsl-load "..b_files:text().." "..b_loc:text())
end
)

info={}
info_count=0
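-- breaks up each info file into a separate string
-- a record runs from one line starting with "Location:" up to the next one;
-- the scan does not consume the following header, and the last record
-- runs to the end of the data
records="\n"..data
pos=string.find(records,"\nLocation:",1,true)
while pos do
 nextpos=string.find(records,"\nLocation:",pos+1,true)
 s=string.sub(records,pos+1,nextpos or -1)
 info_count=info_count+1
 info[info_count]={
 location=string.match(s,"Location:%s*(.-)%s*\n"),
 title=string.match(s,"Title:%s*(.-)%s*\n"),
 text=s
 }
 b_loc:add(info[info_count].location)
 pos=nextpos
end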

b_loc:value(0)
b_loc:do_callback()
w:resizable(info_display)
w:show()
Fl:run()
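
On the question of whether the pulldown menus might be slow on old machines: both callbacks scan every record each time they fire. With around 750 records that is probably cheap, but if it ever matters, the records could be grouped once up front. A minimal sketch in plain Lua, where build_index is a hypothetical helper and the "apps" category and foo.dsl are invented sample data:

```lua
-- Hypothetical helper: group records by location once, so a menu callback
-- can look up its entries directly instead of scanning every record.
function build_index(info)
  local index = {}
  for i = 1, #info do
    local rec = info[i]
    local loc = rec.location or "unknown"
    if not index[loc] then
      index[loc] = { titles = {}, by_title = {} }
    end
    local group = index[loc]
    -- skip records without a title; keep the first of any duplicates
    if rec.title and not group.by_title[rec.title] then
      table.insert(group.titles, rec.title)
      group.by_title[rec.title] = rec
    end
  end
  return index
end

local sample = {
  { location = "testing", title = "zsh.uci",    text = "Location: testing\nTitle: zsh.uci\n" },
  { location = "testing", title = "wvdial.dsl", text = "Location: testing\nTitle: wvdial.dsl\n" },
  { location = "apps",    title = "foo.dsl",    text = "Location: apps\nTitle: foo.dsl\n" },
}
local index = build_index(sample)
for _, title in ipairs(index["testing"].titles) do print(title) end
print(index["apps"].by_title["foo.dsl"].text)
```

The category callback would then add index[b_loc:text()].titles to the file menu, and the file callback would read by_title[b_files:text()].text.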

There seem to be two issues so far that I don't know how to deal with: one is with the list file itself, and one I think is in my script.

There are a few infos in the text file that are run together, such as wvdial.dsl and zsh.uci in testing. This may be the result of a missing newline at the end of the file?
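
If a missing final newline really is the cause, the fix belongs wherever the master list is assembled. How list.txt is generated isn't shown here, so this is only a sketch of the idea in plain Lua, with join_infos as a hypothetical helper and shortened sample records:

```lua
-- Hypothetical helper: concatenate info texts, making sure each one ends
-- with a newline so the next "Location:" header starts on its own line.
function join_infos(texts)
  local parts = {}
  for i = 1, #texts do
    local t = texts[i]
    if t ~= "" and string.sub(t, -1) ~= "\n" then
      t = t .. "\n"
    end
    parts[#parts + 1] = t
  end
  return table.concat(parts)
end

local joined = join_infos({
  "Location: testing\nTitle: wvdial.dsl",   -- no trailing newline
  "Location: testing\nTitle: zsh.uci\n",
})
print(joined)
```

Plain concatenation of these two strings would produce "Title: wvdial.dslLocation: testing" on a single line, which is the kind of run-together record described above.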

Some infos will not display in this tool, such as japanese_fonts.tar.gz in testing. I'm not sure why but it may be related to a mix of spaces and tabs. I thought %s* covered both of these, though.

EDIT: fixed the second issue. It was caused by trailing spaces that were getting included in the title string.
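
For anyone following along, the difference the trailing %s* makes can be seen in isolation. A small sketch in plain Lua; the two function names are made up and the sample record is invented:

```lua
-- Capture the title without trimming: whitespace before the newline is kept.
function title_untrimmed(record)
  return string.match(record, "Title:%s*(.-)\n")
end

-- Capture the title with a trailing %s*: the lazy capture stops before any
-- run of spaces or tabs that precedes the newline.
function title_trimmed(record)
  return string.match(record, "Title:%s*(.-)%s*\n")
end

-- A title line with a leading tab and trailing spaces and a tab.
local record = "Location: testing\nTitle: \tjapanese_fonts.tar.gz \t \n"
print("[" .. title_untrimmed(record) .. "]")
print("[" .. title_trimmed(record) .. "]")
```

%s does cover both spaces and tabs. One caveat is that it also matches newlines, so an empty "Title:" line would let the leading %s* run on and capture the following line instead.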

I think you are likely seeing the same issues that I have seen when trying to import into an SQL database.

The info files come from users all over the globe.
Interestingly enough, running the 'file' command on the individual .info files reveals the following:

ASCII text
ASCII English text
ISO-8859 text
ISO-8859 English text
UTF-8 Unicode English text

This mix, together with the array of special characters in the comments field (which, by the way, does not even exist in every .info file), has been challenging to say the least.

Not sure which of these encodings murgaLua supports, or how, or even the bash version within DSL.
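
Lua strings are plain byte strings, so one option is to normalize everything to UTF-8 at the byte level before parsing or importing. The sketch below is only that, a sketch: the function names are made up, the UTF-8 check is simplified (it checks lead and continuation bytes but does not reject every invalid sequence), and it assumes the "ISO-8859" files are ISO-8859-1, which the 'file' output does not actually confirm.

```lua
-- Simplified structural check: every lead byte must be followed by the
-- right number of continuation bytes (0x80-0xBF). Pure ASCII passes too.
function is_utf8(s)
  local i, n = 1, #s
  while i <= n do
    local b = string.byte(s, i)
    local extra
    if b < 0x80 then extra = 0
    elseif b >= 0xC2 and b <= 0xDF then extra = 1
    elseif b >= 0xE0 and b <= 0xEF then extra = 2
    elseif b >= 0xF0 and b <= 0xF4 then extra = 3
    else return false end
    for j = i + 1, i + extra do
      local c = string.byte(s, j)
      if not c or c < 0x80 or c > 0xBF then return false end
    end
    i = i + extra + 1
  end
  return true
end

-- ISO-8859-1 bytes 0x80-0xFF map to code points U+0080-U+00FF, which
-- UTF-8 encodes as two bytes each.
function latin1_to_utf8(s)
  return (string.gsub(s, "[\128-\255]", function(c)
    local b = string.byte(c)
    return string.char(0xC0 + math.floor(b / 64), 0x80 + b % 64)
  end))
end

-- Assumption: anything that is not valid UTF-8 is ISO-8859-1.
function to_utf8(s)
  if is_utf8(s) then return s end
  return latin1_to_utf8(s)
end

local latin1 = "caf\233"        -- "café" in ISO-8859-1
local already = "caf\195\169"   -- "café" in UTF-8
print(to_utf8(latin1) == already, to_utf8(already) == already)
```

This is a heuristic. An ISO-8859 file whose high bytes happen to form valid UTF-8 sequences would be left untouched, though that is rare in ordinary text.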
