1. Simple counting commands: find + wc
Usage 1: count the total number of lines in the C files under the current directory
$ find . -name '*.c' | xargs wc -l
Output:
...
592 ./lzma920/C/Util/SfxSetup/SfxSetup.c
88 ./lzma920/C/Xz.c
33 ./lzma920/C/XzCrc64.c
875 ./lzma920/C/XzDec.c
497 ./lzma920/C/XzEnc.c
306 ./lzma920/C/XzIn.c
30884 total
A small improvement: print only the total number of lines:
$ find . -name '*.c' -print0 | xargs -0 cat | wc -l
Output:
30884
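The switch to -print0 and -0 in that command is more than cosmetic: it keeps file names containing spaces in one piece. Here is a small sketch of the difference, run in a throwaway directory; the file names are made up for illustration.

```shell
#!/bin/sh
# Demo in a temporary directory (file names are invented for this sketch).
demo=$(mktemp -d)
printf 'a\nb\n' > "$demo/my file.c"   # two lines, name contains a space
printf 'c\n'    > "$demo/plain.c"     # one line

# Whitespace-delimited: xargs splits "my file.c" into two arguments,
# cat fails on both halves, and only plain.c gets counted.
unsafe=$(find "$demo" -name '*.c' | xargs cat 2>/dev/null | wc -l)

# NUL-delimited: each path reaches cat as a single argument.
safe=$(find "$demo" -name '*.c' -print0 | xargs -0 cat | wc -l)

echo "without -print0: $((unsafe)) lines"
echo "with -print0:    $((safe)) lines"
rm -rf "$demo"
```

The first count silently comes up short, which is why the NUL-delimited form is the safer habit on a real source tree.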
What about the '*.h' files? Run the whole thing again? And what about several file types at once? One more improvement:
$ find . \( -name '*.h' -o -name '*.c' \) -print0 | xargs -0 cat | wc -l
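Or, more compactly, match several extensions with one regular expression (GNU find; a glob such as '*.[hc]' can only match single-character extensions):
$ find . -regextype posix-extended -regex '.*\.(h|c|cpp|cc|hpp)' -print0 | xargs -0 cat | wc -l
Output of the '*.h' plus '*.c' count: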
74575
Whoa, the header files account for that much work?!
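To see how that total splits between file types, a per-extension breakdown helps. This is only a sketch: lines_by_ext is a hypothetical helper name, and it uses find's -exec ... + in place of xargs so that an extension with no matching files simply counts as 0.

```shell
#!/bin/sh
# lines_by_ext DIR EXT...
# Print "<ext> <lines>" for every extension given (without the dot).
# Hypothetical helper, not part of any tool mentioned in this post.
lines_by_ext() {
    dir=$1
    shift
    for ext in "$@"; do
        # cat all matching files together and count the lines
        n=$(find "$dir" -type f -name "*.$ext" -exec cat {} + | wc -l)
        echo "$ext $((n))"
    done
}
```

For example, lines_by_ext . c h cpp cc hpp prints one line per file type for the current directory.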
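But blank lines should not count as code.
Comments should not count as code either (grumbling: why on earth not? Do programmers write comments the way people write novels? Comments are an asset for maintaining the code, no less important than the code itself).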
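With nothing but find and grep you can get part of the way there. The sketch below drops blank lines and lines that hold only a // comment; count_code_lines is a hypothetical helper name, and /* ... */ block comments are deliberately left alone because a line-based grep cannot track them.

```shell
#!/bin/sh
# count_code_lines DIR
# Count lines in *.c and *.h files under DIR, skipping
#   - blank (whitespace-only) lines
#   - lines containing only a // comment
# Assumption: block comments (/* ... */) are still counted as code.
count_code_lines() {
    find "$1" -type f \( -name '*.h' -o -name '*.c' \) -exec cat {} + |
        grep -v '^[[:space:]]*$' |
        grep -vc '^[[:space:]]*//'
}
```

Run it as count_code_lines . from the source root. A line with code followed by a trailing // comment still counts as code.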
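That is still not professional enough, so on to the following tools:
2. cloc and SLOCCount
Here come the professional tools.
I will skip the introductions to cloc and SLOCCount; read their documentation for that. Below, cloc serves as the example of what such a tool produces.
- Install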
$ sudo apt-get install cloc
- Usage
Count the code under the root directory of the 7-Zip source tree:
$ cloc .
Result:
1512 text files.
1179 unique files.
223 files ignored.
http://cloc.sourceforge.net v 1.60 T=6.76 s (156.1 files/s, 30121.9 lines/s)
--------------------------------------------------------------------------
Language                      files        blank      comment         code
--------------------------------------------------------------------------
C++                             447        15589         6199       111225
C/C++ Header                    475         6759         2259        26356
C                                56         2427          722        19067
C#                               20          470          229         3846
make                             32          470            0         3715
Java                             14          428           22         3077
Assembly                          4          110           10          457
MSBuild scripts                   1            0            0           90
Teamcenter def                    6            6            0           60
--------------------------------------------------------------------------
SUM:                           1055        26259         9441       167893
--------------------------------------------------------------------------
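Pretty cool, right? SLOCCount has an even cooler feature: besides counting the code, it estimates development effort. I was short on time and neither experimented much nor read the official documentation, so my first guess was that it relied on git or svn commit history, which would leave the question of what happens with a bare source tree and no repository. The report itself answers that: the estimates come from the Basic COCOMO model, which needs only the line count. Here is the result: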
Creating filelist for code
Creating filelist for lzma920
Have a non-directory at the top, so creating directory top_dir
Adding /media/joshua/My_Resource/Research/Open_Source/7-zip/./lzma920.tar.bz2 to top_dir
Categorizing files.
Finding a working MD5 command....
Found a working MD5 command.
Warning: in lzma920, number of duplicates=255
Computing results.
SLOC    Directory       SLOC-by-Language (Sorted)
131716  code            cpp=114807,ansic=16460,asm=449
32960   lzma920         cpp=20380,ansic=5657,cs=3846,java=3077
0       top_dir         (none)
Totals grouped by language (dominant language first):
cpp:         135187 (82.09%)
ansic:        22117 (13.43%)
cs:            3846 (2.34%)
java:          3077 (1.87%)
asm:            449 (0.27%)
Total Physical Source Lines of Code (SLOC) = 164,676
Development Effort Estimate, Person-Years (Person-Months) = 42.51 (510.12)
(Basic COCOMO model, Person-Months = 2.4 * (KSLOC**1.05))
Schedule Estimate, Years (Months) = 2.23 (26.72)
(Basic COCOMO model, Months = 2.5 * (person-months**0.38))
Estimated Average Number of Developers (Effort/Schedule) = 19.09
Total Estimated Cost to Develop = $ 5,742,532
(average salary = $56,286/year, overhead = 2.40).
SLOCCount, Copyright (C) 2001-2004 David A. Wheeler
SLOCCount is Open Source Software/Free Software, licensed under the GNU GPL.
SLOCCount comes with ABSOLUTELY NO WARRANTY, and you are welcome to
redistribute it under certain conditions as specified by the GNU GPL license;
see the documentation for details.
Please credit this data as "generated using David A. Wheeler's 'SLOCCount'."
It even works out what the developers cost. Seriously, heh!!
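The estimates are less mysterious than they look: the report prints both Basic COCOMO formulas, so the numbers can be recomputed from the SLOC total alone. A minimal sketch with awk follows; cocomo_estimate is a hypothetical helper name, and the salary and overhead defaults are simply the values shown in the report above.

```shell
#!/bin/sh
# cocomo_estimate SLOC [SALARY] [OVERHEAD]
# Re-applies the two Basic COCOMO formulas quoted in the SLOCCount report.
# Hypothetical helper; defaults are taken from the report, not from
# SLOCCount's source code.
cocomo_estimate() {
    awk -v sloc="$1" -v salary="${2:-56286}" -v overhead="${3:-2.40}" 'BEGIN {
        ksloc  = sloc / 1000
        effort = 2.4 * ksloc ^ 1.05      # person-months
        months = 2.5 * effort ^ 0.38     # schedule in months
        printf "effort_person_months %.2f\n", effort
        printf "schedule_months %.2f\n", months
        printf "average_developers %.2f\n", effort / months
        printf "cost_usd %.0f\n", effort / 12 * salary * overhead
    }'
}

cocomo_estimate 164676
```

Fed the 164,676 SLOC from the report, this should reproduce the effort, schedule, and average-developer figures above. So the "efficiency" numbers say nothing about the actual 7-Zip developers; they are a fixed function of code size.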