What Good Habits Do Great Programmers Have?

A good answer from Hacker News. Different situations call for different habits:

In a tiny company (or an experimental project), code is written far more than it is read, so optimize for writing.
In a mature company, code is read far more than it is written, so optimize for reading.

Meta-habit: learn to adopt different habits for different situations. With that in mind, some techniques I've found useful for various situations:

"Researchey" green-field development for data-science-like problems:

  1. If it can be done manually first, do it manually. You'll gain an intuition for how you might approach it.
  2. Collect examples. Start with a spreadsheet of data that highlights the data you have available.
  3. Make it work for one case before you make it work for all cases.
  4. Build debugging output into your algorithm itself. You should be able to dump the intermediate results of each step and inspect them manually with a text editor or web browser (see the sketch below).
  5. Don't bother with unit tests - they're useless until you can define what correct behavior is, and when you're doing this sort of programming, by definition you can't.
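
A minimal Python sketch of point 4, with invented stage names and a made-up `debug_out/` directory: each step of a toy pipeline dumps its intermediate result to a JSON file you can open in a text editor or browser.

```python
import json
from pathlib import Path

DEBUG_DIR = Path("debug_out")   # hypothetical dump directory for intermediate results
DEBUG_DIR.mkdir(exist_ok=True)

def dump(stage, data):
    """Write one stage's intermediate result to a JSON file for manual inspection."""
    (DEBUG_DIR / f"{stage}.json").write_text(json.dumps(data, indent=2, default=str))

def run_pipeline(records):
    cleaned = [r for r in records if r.get("value") is not None]
    dump("01_cleaned", cleaned)

    scored = [{**r, "score": r["value"] * 2} for r in cleaned]   # stand-in "algorithm" step
    dump("02_scored", scored)

    top = sorted(scored, key=lambda r: r["score"], reverse=True)[:10]
    dump("03_top", top)
    return top

run_pipeline([{"id": 1, "value": 3}, {"id": 2, "value": None}, {"id": 3, "value": 7}])
```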

Maintenance programming for a large, unfamiliar codebase:
  6. Take a look at filesizes. The biggest files usually contain the meat of the program, or at least a dispatcher that points to the meat of the program. main.cc is usually tiny and useless for finding your way around. (A quick script for this is sketched below.)
  7. Single-step through the program with a debugger, starting at the main dispatch loop. You'll learn a lot about control flow.
  8. Look for data structures, particularly ones that are passed into many functions as parameters. Most programs have a small set of key data structures; find them and orienting yourself to the rest becomes much easier.
  9. Write unit tests. They're the best way to confirm that your understanding of the code is actually how the code works.
  10. Remove code and see what breaks. (Don't check it in though!)
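
For point 6, a throwaway script along these lines is usually enough (the extension list is just a guess to adapt per codebase): it prints the largest source files, which is usually where the meat is.

```python
from pathlib import Path

# Extensions to treat as "source"; adjust for the codebase at hand.
SOURCE_EXTS = {".c", ".cc", ".cpp", ".h", ".py", ".java", ".go", ".rs"}

files = [
    (p.stat().st_size, p)
    for p in Path(".").rglob("*")
    if p.is_file() and p.suffix in SOURCE_EXTS
]

# The biggest files usually contain the meat of the program.
for size, path in sorted(files, reverse=True)[:20]:
    print(f"{size:>10,}  {path}")
```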

Performance work:
  11. Don't, unless you've built it and it's too slow for users. Have performance targets for how much you need to improve, and stop when you hit them.
  12. Before all else (even profiling!), build a set of benchmarks representing typical real-world use. Don't let your performance regress unless you're very certain you're stuck at a local maximum and there's a better global solution just around the corner. (And if that's the case, tag your branch in the VCS so you can back out your changes if you're wrong.)
  13. Many performance bottlenecks are at the intersection between systems. Collect timing stats in any RPC framework, and have some way of propagating & visualizing the time spent for a request to make its way through each server, as well as which parts of the request happen in parallel and where the critical path is.
  14. Profile.
  15. Oftentimes you can get big initial wins by avoiding unnecessary work. Cache your biggest computations, and lazily evaluate things that are usually not needed (see the sketch below).
  16. Don't ignore constant factors. Sometimes an algorithm with asymptotically worse performance will perform better in practice because it has much better cache locality. You can identify opportunities for this in the functions that are called a lot.
  17. When you've got a flat profile, there are often still very significant gains that can be obtained through changing your data structures. Pay attention to memory use; often shrinking memory requirements speeds up the system significantly through less cache pressure. Pay attention to locality, and put commonly-used data together. If your language allows it (shame on you, Java), eliminate pointer-chasing in favor of value containment.
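
A minimal illustration of point 15 using only the standard library (the function and class names are placeholders): memoize an expensive computation, and defer a rarely needed one until it is first accessed.

```python
from functools import lru_cache, cached_property

@lru_cache(maxsize=None)
def expensive_report(day):
    """Pretend this aggregates a large dataset; the result is cached per day."""
    return sum(i * i for i in range(1_000_000)) + day

class Request:
    def __init__(self, payload):
        self.payload = payload

    @cached_property
    def parsed(self):
        """Only a minority of requests ever need this, so parse lazily
        on first access and remember the result on the instance."""
        return {k: v for k, v in (p.split("=") for p in self.payload.split("&"))}

expensive_report(1)      # slow the first time
expensive_report(1)      # served from the cache
r = Request("a=1&b=2")
print(r.parsed)          # parsed only now, then memoized
```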

General code hygiene:
  18. Don't build speculatively. Make sure there's a customer for every feature you put in.
  19. Control your dependencies carefully. That library you pulled in for one utility function may have helped you save an hour implementing the utility function, but it adds many more places where things can break - deployment, versioning, security, logging, unexpected process deaths.
  20. When developing for yourself or a small team, let problems accumulate and fix them all at once (or throw out the codebase and start anew). When developing for a large team, never let problems accumulate; the codebase should always be in a state where a new developer could look at it and say "I know what this does and how to change it." This is a consequence of the reader:writer ratio - startup code is written a lot more than it is read and so readability matters little, but mature code is read much more than it is written. (Switching to the latter culture when you need to develop like the former to get users & funding & stay alive is left as an exercise for the reader.)

> Take a look at filesizes. The biggest files usually contain the meat of the program, or at least a dispatcher that points to the meat of the program. main.cc is usually tiny and useless for finding your way around.

This is my #1 pet peeve with GitHub. When I first look at an unfamiliar repo, I want to get a sense of what the code is about and what it looks like. The way I do that with a local project is by looking at the largest files first. But GitHub loves their clean uncluttered interface so much, they won't show me the file sizes!

I think this Chrome extension called "GitHub Repository Size" might be exactly what you are looking for.

Yes, absolutely! I really want a whole-repo tree view of files along with their sizes and file types.

Check out Octotree if you're using Chrome. No file sizes still, but I've found that when you just want to quickly explore some potential new source this beats having to clone the repo first.

> Collect examples. Start with a spreadsheet of data that highlights the data you have available.

This is true not just for data science but when trying to solve any numerical problem. Using a spreadsheet (or an R / Python notebook) to implement the algorithm and getting some results has helped me in the past to really understand the problem and avoid dead ends.
For example, when building an FX pricing system, I was able to use a spreadsheet to describe how the pricing algorithm would work and explain it to the traders (the end users). We could tweak the calculations and make sure things were clear to all before implementing and deploying the algorithm.
Great advice!
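
The same prototype-first idea fits in a few lines of a notebook as easily as a spreadsheet. The pricing rule below is entirely made up for illustration (a mid price with a spread that widens with trade size); the point is that the end users can eyeball the numbers before anything is built for real.

```python
# Hypothetical, deliberately oversimplified pricing rule; numbers are illustrative only.
MID = 1.0850            # EUR/USD mid price
BASE_SPREAD = 0.0001    # 1 pip
SIZE_FACTOR = 0.00002   # extra spread per million notional

def quote(notional_millions):
    half_spread = (BASE_SPREAD + SIZE_FACTOR * notional_millions) / 2
    return round(MID - half_spread, 5), round(MID + half_spread, 5)

for size in (1, 5, 10, 50):
    bid, ask = quote(size)
    print(f"{size:>3}M  bid={bid}  ask={ask}")
```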

Great advice. One nit to pick:

> Don't ignore constant factors. Sometimes an algorithm with asymptotically worse performance will perform better in practice because it has much better cache locality.
Forget the cache, sometimes they're just plain faster (edit in response to comment: I mean faster for your use case). I've e.g. found that convolutions can be much faster with the naive algorithm than with an FFT in a pretty decent set of cases. (Edit: To be specific, these cases necessarily only occur for "sufficiently small" vectors, but it turned out that was a larger size than I expected.) Caching doesn't necessarily explain it I think, it can just simply be extra computation that doesn't end up paying off.
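
If you want to see where the crossover lies on your own machine, a quick benchmark along these lines (direct convolution via NumPy versus SciPy's FFT-based one) is enough; the exact numbers will of course vary with hardware and sizes.

```python
import timeit
import numpy as np
from scipy.signal import fftconvolve

rng = np.random.default_rng(0)

for n in (16, 64, 256, 1024, 4096):
    a = rng.standard_normal(n)
    b = rng.standard_normal(n)
    # Time the naive O(n^2) convolution against the FFT-based one.
    direct = timeit.timeit(lambda: np.convolve(a, b), number=200)
    fft = timeit.timeit(lambda: fftconvolve(a, b), number=200)
    print(f"n={n:>5}  direct={direct:.4f}s  fft={fft:.4f}s")
```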

Good correction, but a small second nit.

> sometimes they're just plain faster
Not faster for sufficiently large N (by definition).
But your general point is correct.
I've best seen this expressed in Rob Pike's 5 Rules of Programming [0], Rule 3:
Rule 3. Fancy algorithms are slow when n is small, and n is usually small. Fancy algorithms have big constants. Until you know that n is frequently going to be big, don't get fancy.
[0] http://users.ece.utexas.edu/~adnan/pike.html

> Not faster for sufficiently large N (by definition).

True, but supposedly researchers keep publishing algorithms with lower complexity that will be faster only if N is, like, 10^30 or so.
Or so Sedgewick keeps telling us.

Great comment! I have one point to add to '"Researchey" green-field development for data-science-like problems':

6. Use assertions for defining your expectations at each stage of the algorithm - they will make debugging much easier.
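
In the spirit of that suggestion, a small sketch with invented invariants: assert what you believe about each intermediate result, so a wrong assumption fails loudly at the stage that produced it.

```python
def normalize(values):
    total = sum(values)
    assert total > 0, "expected at least one positive value before normalizing"
    normalized = [v / total for v in values]
    # Invariant we expect after this stage: the weights sum to ~1.
    assert abs(sum(normalized) - 1.0) < 1e-9
    return normalized

def top_k(weights, k):
    assert k <= len(weights), "asked for more items than exist"
    result = sorted(weights, reverse=True)[:k]
    # Every selected weight should be at least as large as any discarded one.
    assert min(result) >= max(sorted(weights)[: len(weights) - k], default=0.0)
    return result

print(top_k(normalize([3, 1, 6]), 2))
```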

To expand on your #2: I work with a lot of traders. One antipattern I noticed is that when there's a problem with the data, they'll do all sorts of permutations and aggregations and then scratch their chins and ponder it for hours.
Go to the fucking source and find an example of the problem! Read it line by line, usually it will be obvious what happened.
Corollary: don't assume your data is correct; most outliers in a large data set are problems with the data itself. Build a few columns that serve as sanity checks. One good example is a column that shows the distance between this sequence number and the last; anything >1 is a dropped message.
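
A sketch of that sanity-check column with pandas (the schema here is hypothetical): take the difference between consecutive sequence numbers and flag anything greater than 1 as a dropped message.

```python
import pandas as pd

# Toy market-data feed; 'seq' is the exchange sequence number (hypothetical schema).
df = pd.DataFrame({
    "seq":   [101, 102, 103, 106, 107],
    "price": [9.99, 10.01, 10.00, 10.40, 10.05],
})

df["seq_gap"] = df["seq"].diff()    # distance from the previous message
dropped = df[df["seq_gap"] > 1]     # anything > 1 means messages were lost

print(dropped)   # the price outlier at seq 106 coincides with a 3-message gap
```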

Great list. The only thing I'd add is to lean towards a clever and often simple architecture when modelling a solution; it will often beat clever programming.

Addition to Performance/2: Synchronization costs are typically the biggest deal in applications that involve I/O (e.g. hard drive or network). Try an average database transaction with and without synchronization. 1) On sqlite3, it's dozens vs hundreds of milliseconds. Bigger databases, probably not much difference. 2) Look up NFS sync issues. It's a huge speed/safety tradeoff. 3) On some file systems, a Debian installation may take 10 minutes or 90 minutes depending on whether you disabled sync (the eatmydata command).
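
A rough way to see the sqlite3 effect for yourself; this is only a sketch and the timings will vary widely with disk and OS. It times the same per-row-commit inserts with synchronous=FULL versus synchronous=OFF.

```python
import sqlite3
import tempfile
import time
from pathlib import Path

def timed_inserts(sync_mode, rows=200):
    path = Path(tempfile.mkdtemp()) / "test.db"
    conn = sqlite3.connect(str(path))
    conn.execute(f"PRAGMA synchronous={sync_mode}")
    conn.execute("CREATE TABLE t (id INTEGER, payload TEXT)")
    start = time.perf_counter()
    for i in range(rows):
        conn.execute("INSERT INTO t VALUES (?, ?)", (i, "x" * 100))
        conn.commit()                 # one transaction per row, as in the comment
    elapsed = time.perf_counter() - start
    conn.close()
    return elapsed

print(f"synchronous=FULL: {timed_inserts('FULL'):.3f}s")
print(f"synchronous=OFF:  {timed_inserts('OFF'):.3f}s")
```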

This is a great comment! If you have a blog, could you make it a blog post? It deserves to be read more widely.

I've been tempted to start a blog...I've gotten a few requests on HN...but I'm still in the "rather be a programmer than a web celebrity" phase. I'm afraid it'll be too much of a distraction from my projects. Plus, I usually write better in response to a prompt than coming up with content in a vacuum.

You could perhaps consider turning such a 'response to a prompt' into a full-blown post on whatever easy-to-use blogging system (Medium, wordpress.com, hell, even pastebin or its ilk?), and linking to it alongside a comment. On HN, I often check specific commenters' activity, or new comments in a thread that can be days old, because I often find hidden gems. A link to more elaboration would definitely count as such. Perhaps an audience of even just a dozen or so people like me might be worth it.
EDIT: Coincidentally (I swear), there are exactly a dozen comments that positively engage with your comment!

Could you put a link back to the original source in the doc? That way you capture the comments & discussion, which has a bunch of useful clarifications.

Point #5 about unit tests is so true; I wish I had known it before jumping on the bandwagon. I wasted so much time writing TDD code only to learn later that the specs had changed. This can save you an insane amount of pain and time.

I hate printers. I haven't used one in months. However, I just printed this comment and I'm hanging it up in my office. Perfect.

You're a genius! How long have you been in the development world, if I may ask?

12 years, plus college, a gap year, and a couple internships & projects in high school.

It's helped that it's really been 12 (well, 13+change) years of experience, rather than one year repeated 12 times. Each year has brought something new and different that's just out of my comfort zone.
Original thread: https://news.ycombinator.com/item?id=14709076
