Damn Small Linux :: Damn Small Linux Board: The DSL Forums

Topic: Recursive-strip C comments

stupid_idiot

Group: Members
Posts: 344
Joined: Oct. 2006

Posted: Dec. 07 2007,12:41

Big Fat Warning:
Hi all,
None of the methods described here work reliably.
I haven't found a workable solution yet.
Everything I've posted is just a preliminary attempt.
Please do not use ANY of these examples in their unmodified form on any source code!!!
Problems:
-- No reliable way to preserve preprocessor directives ("#define", "#ifdef", "#endif", etc.).
-- "Missing semicolon" compilation errors after deleting comment blocks (the comment blocks seemed to be acting as semicolons by marking the end of a function).
-- Most probably there are many more problems that I haven't run into yet.

I am very new to sed and regular expressions, and:
I am not even sure sed is the right tool for this purpose (stripping source code).
I am also a total Perl newbie (I have only just started reading the 'Llama Book', AKA 'Learning Perl, Third Edition' from O'Reilly).
(I wholeheartedly recommend the 'Llama Book' to anyone who is new to Perl -- it is very well written!)
Speaking as a total Perl newbie:
IMHO, someone who is versed in Perl may be able to produce a better solution in Perl.

Code Sample

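# 1. Strip comments from every header (the original is replaced only if cpp succeeds).
find ./ -name "*.h" | \
while IFS= read -r i; do
  cpp -P -fpreprocessed "$i" > "$i.tmp" \
    && mv "$i.tmp" "$i"
done
# 2. Delete lines that are empty or contain only spaces.
find ./ -name "*.h" | \
while IFS= read -r i; do
  sed -i '/^ *$/d' "$i"
done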

This will recursively strip all comments and empty lines from every C header ('*.h') in the current directory and its subdirectories. This would help reduce the size of '-dev' extensions.
Any improvements are most welcome!
Thanks.

Explanation:

Code Sample

cpp -P -fpreprocessed "$i" > "$i.tmp" && mv "$i.tmp" "$i"

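1. Process each '.h' file with 'cpp' and, only if cpp succeeds ('&&'), overwrite the original file with the processed output. '-fpreprocessed' tells cpp that the input is already preprocessed, so it removes comments without expanding macros or processing most directives; '-P' stops cpp from adding linemarkers (lines like '# 1 "foo.h"') to the output.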

Code Sample

sed -i '/^ *$/d' "$i"

2. '/^ *$/d' -- Delete lines that consist of zero or more spaces from the start ('^') to the end ('$') of the line, i.e. empty lines and lines that contain only spaces. Lines containing TABs are not matched.
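Because '/^ *$/d' only allows SPACEs, a line holding a TAB (common in C sources) is not treated as blank. A small comparison, assuming sample input generated with printf; the POSIX bracket expression [[:space:]] is the standard way to match both SPACEs and TABs:

```sh
#!/bin/sh
# Sample input: code, an empty line, a spaces-only line and a TAB-only line.
sample() { printf 'int a;\n\n   \n\t\nint b;\n'; }

# Original pattern: SPACEs only, so the TAB-only line survives.
sample | sed '/^ *$/d'

# [[:space:]] matches SPACEs and TABs, so all three blank lines are deleted.
sample | sed '/^[[:space:]]*$/d'
```

In the scripts in this thread, the equivalent change would be sed -i '/^[[:space:]]*$/d' "$i".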

Also possibly useful:
1. 'whitespace_stripper.sh'

Code Sample

#!/bin/sh
for i in "$@"; do
  sed -i '/^ *$/d' "$i"
done

e.g. 'whitespace_stripper.sh file1 file2 file3 file4 [...]'
2. 'stripsh' (strip shell scripts)

Code Sample

for i in "$@";
do sed -i -e 's/\t*#.*//g' \
-e 's/\ *#.*//g' \
-e '/^#/d' \
-e '/^\t*#/d' \
-e '/^ *#/d' \
-e '/^ *$/d' \
"$i"; done

e.g. 'stripsh file1 file2 file3 file4 [...]'
NOTE: This script has a problem -- it also deletes the "#!" interpreter line at the top of any script, for example:

Code Sample

#!/bin/sh

You have to put that line back after running the script.
Question for everyone: How do we make 'sed' ignore lines that begin with "#!"?
Thank you very much!

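Explanation of the above script:
's/\t*#.*//g' -- (Substitution) Pattern: any number of TABs ("\t", a GNU sed extension), followed by a "#", followed by any number of any character (".*"). Replacement: nothing, i.e. the matching part of the line is deleted; this does NOT delete the entire line. Because '\t*' can also match zero TABs, this removes everything from the first "#" to the end of the line wherever it appears, including the "#!" line and any "#" inside a string or in '$#'. As a result no "#" is left on the line, so the "#" rules that follow never match.

's/\ *#.*//g' -- (Substitution) The same, with SPACEs instead of TABs.

'/^#/d' -- (Deletion) Delete lines that begin with a "#".

'/^\t*#/d' -- (Deletion) Delete lines that begin with any number of TABs, followed by a "#" (i.e. bash and perl comments).

'/^ *#/d' -- (Deletion) Delete lines that begin with any number of SPACEs, followed by a "#" (i.e. bash and perl comments).

'/^ *$/d' -- (Deletion) Delete lines that contain nothing but SPACEs (or nothing at all) up to the end-of-line ("$"), i.e. empty lines, including lines emptied by the substitutions above.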
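On the question of making sed ignore "#!" lines: sed commands can be limited to an address, and a block on line 1 can branch past the deleting commands when that line starts with "#!". A minimal sketch, assuming GNU sed (for -i); both function names are hypothetical. The first keeps stripsh's behaviour of cutting everything after any "#"; the second is a more conservative variant that only deletes lines which are entirely comments, so a "#" inside a string, '$#' or '${#var}' is never touched (trailing comments are kept). Neither understands here-documents, so a here-document line starting with "#" would still be removed.

```sh
#!/bin/sh
# Both functions assume GNU sed (-i edits files in place); names are
# hypothetical. The separate -e pieces are joined with newlines, so
#   1{/^#!/b
#   }
# means: on line 1, if it starts with "#!", branch to the end of the
# script, which prints the line unchanged and skips the deleting commands.

# Like 'stripsh': cut from the first '#' to end of line (still also cuts
# '#' inside strings or '$#'), then drop blank lines; keeps the "#!" line.
stripsh_keep_shebang() {
  for i in "$@"; do
    sed -i \
      -e '1{/^#!/b' -e '}' \
      -e 's/[[:space:]]*#.*//' \
      -e '/^[[:space:]]*$/d' \
      "$i"
  done
}

# Conservative variant: only delete lines that are entirely a comment
# (optional leading SPACEs/TABs, then '#'), plus blank lines.
strip_comment_lines() {
  for i in "$@"; do
    sed -i \
      -e '1{/^#!/b' -e '}' \
      -e '/^[[:space:]]*#/d' \
      -e '/^[[:space:]]*$/d' \
      "$i"
  done
}
```

Usage is the same as stripsh, e.g. 'stripsh_keep_shebang file1 file2 [...]' after sourcing the functions.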

curaga

Group: Members
Posts: 2163
Joined: Feb. 2007

Posted: Dec. 07 2007,18:49

Well, (GNU) find uses the current directory by default, so the first command could be just

find -name "*.h" |

Nice anyway

How about adding a help option, and then calling it remove-comments.sh?

There's also this issue: if a file has bash-style comments (lines starting with #), cpp will bail out and the .tmp file will be left behind.
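The leftover '.tmp' problem can be handled by replacing the original only when the command succeeds and deleting the temporary file otherwise. A minimal sketch, assuming a hypothetical helper named replace_if_ok (not part of any script in this thread); it takes the file first and the filter command after it, and appends the file name as the command's last argument, which matches how cpp is called here:

```sh
#!/bin/sh
# replace_if_ok FILE CMD [ARGS...]  (hypothetical helper name)
# Runs "CMD ARGS FILE" and, only if it exits successfully, replaces FILE
# with its output. On failure the temporary file is deleted, FILE is left
# untouched, and the function returns 1.
replace_if_ok() {
  file=$1
  shift
  if "$@" "$file" > "$file.tmp"; then
    mv "$file.tmp" "$file"
  else
    rm -f "$file.tmp"
    echo "skipped (command failed): $file" >&2
    return 1
  fi
}

# Possible use with the cpp command from the first post:
#   find . -name '*.h' | while IFS= read -r i; do
#     replace_if_ok "$i" cpp -P -fpreprocessed
#   done
```

Files that cpp rejects are then reported on stderr and left exactly as they were.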

Here's my go:

Code Sample

#!/bin/sh
ext=h
case $1 in
-h* | --h*) cat << EOF
Use $0 in the top directory of your sources to strip all headers of comments.

$0 -ext sh operates on .sh files instead of .h
EOF
exit 0 ;;
-ext) ext=$2;; esac

find -name "*.$ext" | \
while read i; do \
cpp -P -fpreprocessed "$i" > "$i.tmp"
mv "$i.tmp" "$i"; done
find -name "*.$ext" | \
while read i; do \
sed -i -e '/^ *$/d' -e '/^# /d' "$i"; done


mikshaw

Group: Members
Posts: 4856
Joined: July 2004

Posted: Dec. 08 2007,01:54

I don't really see a whole lot of use for removing comments from headers, unless you see headers as being nothing more than dependencies for compiling. If you are a software developer, those comments can be very useful.

The size of the final product (the compiled binaries and libraries) can easily be reduced with the strip command, which removes symbol and debugging information.


jpeters

Group: Members
Posts: 804
Joined: April 2006

Posted: Dec. 08 2007,04:00

One difference: stupid's version doesn't remove lines beginning with '#!' (maintaining the space-after-'#' requirement).

Edit: I guess curaga's script needs line 14 (the cpp line) joined to line 15 (the mv line) with " \ && ", as in stupid's version; then it works (as long as everyone puts a space after the '#' in their comments).

Interesting code (although I agree with mikshaw in favor of keeping comments and blank lines).