First ever half marathon trial

Tried my first ever half marathon today - the Yarra Boulevard run.

A new way to experience a familiar routine: Yarra Boulevard from Kew to Hawthorn, a route I usually take on foot or by bike.

Terrible weather conditions today, windy and hot. I had to stop several times for water to avoid getting dehydrated.

After 17 km my left calf started cramping, forcing me to slow the pace. After 19 km my right calf started cramping too, but I kept moving forward.

A new personal best Performance Level of 55 after the run. It also reminded me that there are limits inside me, but I know I can finish it.

First ever half marathon trial

Quotes from Johnnie Walker commercial:

“When the sun came shining and I was strolling and the wheat fields waving and the dust clouds rolling. As the fog was lifting, a voice was chanting, this land was made for you and me.”

“As I was walking that ribbon of highway, I saw above me that endless sky way. I saw below me that golden valley, Esta tierra fue hecha para ti y para mí.”

“I’ve roamed and rambled and I’ve followed my footsteps to the sparkling sands over diamond deserts, and all around me a voice was sounding, this land was made for you and me.”

Keep running!

On the Mathematical Cultivation of Programmers

Many readers will have seen the recruiting advertisement Google ran a few years ago. Its first problem went like this:

{first 10-digit prime found in consecutive digits of e}.com

That is, the first prime number formed by 10 consecutive digits appearing in e.

It is said that this problem appeared on large billboards at the exits of many American subway stations at the time. If you solved it correctly and typed the answer into your browser's address bar, you were taken to the next round of tests; the whole process was like a mathematical maze, continuing until you became a Google employee.
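Just for fun, here is a small Python sketch of how one might brute-force that billboard puzzle. The number of digits generated (150) and the choice of primality test are my own assumptions for illustration; they are not part of the original advertisement.

```python
from decimal import Decimal, getcontext

def e_digits(n_digits: int) -> str:
    """Digits of e (decimal point removed), computed from the series sum 1/k!."""
    getcontext().prec = n_digits + 10              # keep some guard digits
    e, term, k = Decimal(1), Decimal(1), 1
    while term > Decimal(10) ** -(n_digits + 5):
        term /= k
        e += term
        k += 1
    return str(e).replace(".", "")[:n_digits]

def is_prime(n: int) -> bool:
    """Deterministic Miller-Rabin, valid for n < 3.3 * 10**24."""
    if n < 2:
        return False
    small = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    for p in small:
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d, s = d // 2, s + 1
    for a in small:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False
    return True

digits = e_digits(150)                             # "271828182845..."
for i in range(len(digits) - 9):
    if digits[i] != "0" and is_prime(int(digits[i:i + 10])):
        print(digits[i:i + 10])                    # 7427466391
        break
```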

Another example is an interview question Intel used one year:

Stefan Banach died on August 31, 1945. In a certain year of his life, the year itself was exactly the square of his age in that year.
Question: in which year was he born? Can you solve this seemingly simple math problem quickly?
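A quick brute-force sketch in Python settles it; the only assumption added below is a plausible human lifespan, used to rule out absurd answers:

```python
# Find a birth year b such that some year y of Banach's life (he died in
# 1945) satisfies y == (y - b) ** 2, i.e. the year equals the square of
# his age in that year.
for age in range(1, 100):
    year = age * age                    # the year that equals age squared
    birth = year - age
    if year <= 1945 and 1945 - birth < 100:      # assumed plausible lifespan
        print(birth)                    # 1892  (44 ** 2 = 1936, aged 44 then)
```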

The following is a recruiting test question from Microsoft, the world's largest software company:

Two primes separated by a single number are called a prime pair, for example 5 and 7, or 17 and 19. Prove that the number between the two primes of a pair is always divisible by 6
(assuming both primes are greater than 6). Now prove that no prime pair can consist of three primes.
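The proofs are left to the reader, but a quick empirical check in Python at least makes both claims believable (the search bounds below are arbitrary); the key observation for the second claim is that one of any three numbers of the form p, p + 2, p + 4 is divisible by 3:

```python
def is_prime(n: int) -> bool:
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

# Claim 1: for a prime pair p, p + 2 with p > 6, the number in between
# is divisible by 6.
for p in range(7, 10_000):
    if is_prime(p) and is_prime(p + 2):
        assert (p + 1) % 6 == 0

# Claim 2: no "pair" of three primes p, p + 2, p + 4 exists (apart from
# 3, 5, 7, which is excluded once the primes must exceed 6).
triples = [(p, p + 2, p + 4) for p in range(7, 100_000)
           if is_prime(p) and is_prime(p + 2) and is_prime(p + 4)]
print(triples)   # []
```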

There are many, many more questions like these. At first glance they all look like pure mathematics problems, yet some of the world's most famous companies use them in their recruiting tests, which shows how seriously they take the mathematical foundation of new employees. Mathematics questions and applied-programming questions are the most sharply targeted category in the interviews of many large software companies; they probe a candidate's mathematical ability together with their computing ability. A senior consultant at one consulting firm put it this way: Microsoft is a computer software company, so of course it expects its employees to have a certain level of computing and mathematical ability, and the interview naturally tests for it. Microsoft's interview questions examine how well candidates have mastered the fundamentals, how well they can apply them, and even implicitly test their grasp of basic computing principles. Questions like these really are "vicious", and vicious enough to pick out the right people.

Professor Cao Guangfu of the School of Mathematics at Sichuan University once said: "How much a university student achieves later in life has a great deal to do with his mathematical cultivation."

Every computer science undergraduate knows the feeling: the hardest courses in the major are discrete mathematics, compiler principles, and data structures, and courses such as combinatorics, cryptography, and computer graphics are also a real struggle for many. Plenty of students who consider themselves good at databases feel out of their depth when they meet mathematically heavy concepts such as normal forms, functional dependency, and transitive dependency. All of this comes down to a weak mathematical foundation, or simply a lack of mathematical knowledge. Mathematics is the foundation of computing; that is why the graduate entrance examination for computer science uses the hardest mathematics paper (Mathematics I), and it is also what has driven the rapid growth of new interdisciplinary programs such as Mathematics and Applied Software, and Information and Computing Science.

Many brilliant programmers were themselves top mathematics students. As is well known, Bill Gates always had outstanding grades in mathematics and once even hoped to become a mathematics professor. Fred Wright, head of the mathematics department at his alma mater, Lakeside School, said of his student: "He could solve an algebra or computer problem in the simplest possible way, and he could use mathematics to find a shortcut through a problem. In all my years of teaching I have never seen a mathematical talent like his. He can stand comparison with the excellent mathematicians I have worked with for many years. Of course, Bill excelled in every respect, not only mathematics; his knowledge is extremely broad, and mathematics is just one of his many strengths."

Qiu Bojun, chairman of Kingsoft and an influence on a whole generation of Chinese programmers, scored full marks in mathematics in the college entrance examination, which makes the same point. People with a strong mathematical foundation, once they are familiar with a programming language, can quickly grasp the essence of an algorithm, apply it fluently, and may well write algorithms with clearly better time and space complexity.

A considerable share of the problems solved in programming involve scientific computation of one kind or another. What kind of foundation does that demand of a programmer? Turning a real-world problem into a program means abstracting the problem and building a sound mathematical model; only then can we build a well-designed program. From this it is easy to see how important mathematics is to programming. Algorithms and the theory of computation are the soul of programming; they are the tools through which a programmer exercises rigorous, sharp thinking, and every programming language tries to let them shine. A programmer needs a certain mathematical cultivation, not only for programming itself but also to develop logical thinking and a rigorous programming style. Mathematics trains our thinking, helps us solve real problems, and can even help us study philosophy better. Why are some people helpless in front of a scientific computing program? They can read every line of the code, yet cannot predict what the program will produce, and only half understand its structure and function; hand them a slightly complicated mathematical formula and they may not know how to turn it into a program. Many programmers are still stuck writing simple MIS systems, laying out MDI interfaces, writing simple classes, or implementing queries in SQL, and they steer clear of any programming work that needs mathematics. Of course, writing an accumulator or a tax-rate converter is easy, because it needs no advanced mathematics at all.

A veteran with more than ten years of development experience once said: "The essence of every program is logic. You may already have a good command of the technology, but only when you have raised your logical ability can you become a professional programmer. To use an analogy: you have mastered all eighteen kinds of weapons, sword, spear, staff and club, yet you lack the strength, so you can never take the battlefield. For a programmer, that strength is logical ability (in essence a person's mathematical cultivation; note, not mathematical knowledge)."

A programmer's mathematical cultivation cannot be built in a day. Cultivation is not the same as knowledge: cultivation takes a long process, while a piece of knowledge may be learned in a short time. Below are some of my personal views on how programmers can develop and improve their mathematical cultivation.

First, recognize how important it is. For an excellent programmer, a certain mathematical cultivation is both important and necessary. Mathematics is the foundation of the natural sciences, and computer science is in fact a branch of mathematics. Computer theory is a fusion of a great deal of mathematics: software engineering needs graph theory, cryptography needs number theory, software testing needs combinatorics, and writing programs needs even more, set theory, queueing theory, discrete mathematics, statistics, and of course calculus. One defining feature of computer science is how quickly its information and knowledge are refreshed. As mathematics and computer theory have grown together, branches such as data mining, pattern recognition, and neural networks have developed rapidly, and cybernetics, fuzzy mathematics, the theory of dissipative structures, and fractal science have all pushed software theory and information management technology forward. Strictly speaking, a programmer without a solid mathematical foundation is not a qualified programmer; many books on computer algorithms are themselves handbooks of mathematical knowledge and its computer implementation.

Second, accumulate mathematical knowledge and cultivate your spatial thinking and logical judgment. Mathematics has so many branches that no one can learn them all in a single lifetime; functional analysis, chaos theory, and various nonlinear problems are not things you master in a few days. Mathematical cultivation is not about how much mathematics you know, but it does require a programmer to learn mathematics well and to connect a piece of it quickly with the problem at hand. Many masters of science were not trained as mathematicians, yet they understood mathematics deeply and observed it keenly, and from that a whole series of new disciplines was born: computational chemistry, computational biology, bioinformatics, chemoinformatics, computational physics, computational materials science, and so on. Mathematics is the foundation of the natural sciences, and computing, as a union of theory and practice, needs the essence of mathematics woven into it all the more. The computer itself was born out of mathematics; the humble binary system of 0 and 1 is an ancient mathematical topic. Programming is a highly creative profession, and it asks programmers to have both a certain mathematical cultivation and a certain store of mathematical knowledge, so that mathematical principles and ideas can be applied to everyday programming work. Learning never ends; continuous study is the only road to a deeper cultivation.

Third, use mathematics in practice. Some universities offer a course called Mathematical Modelling. I took it as an undergraduate; it is a very rich course that ties many disciplines to mathematics and solves real problems of production and daily life through mathematical models, many of which are implemented as computer programs. I took part in mathematical modelling contests both as an undergraduate and as a graduate student, gained a great deal of experience, and improved my own mathematical cultivation further. In fact, seen from a certain angle, programming today is itself a process of mathematical modelling, and the quality of the model decides the success or failure of the system; the ideas of mathematical modelling are now used across many computing disciplines, not only in program design and algorithm analysis. Remember that mathematics is a science whose charm shows itself in practice, and programs are written to solve real problems, so the two should be brought together as much as possible. In this respect, cryptography is, to my mind, the field that applies mathematics most deeply and most widely: behind every good encryption algorithm stands a mathematical theory, elliptic curves, the knapsack problem, the theory of primes, and so on. An excellent programmer should apply mathematics flexibly as the work requires, build up some modelling ability, be good at summing up what he learns, and gradually make his mathematical knowledge more complete and his cultivation deeper.

Fourth, reform how programmers are trained and taught. Many training systems are flawed: from the very start they demand that students quickly master some language, put the language at the centre, and skim over the core ideas of algorithms and the mathematics behind them. This turns many programmers into machines that recite programs, which is bad both for their growth and for their ability to solve new problems. In my own long experience of training programmers and teaching computing I have used some methods that depart from the traditional ones, with some success. Beginners often hit a mental block while writing a program, or feel unable to start on a slightly harder one, so I use small mathematical problems before class to spark interest. These are not mere brain-teasers; many are genuinely representative mathematical exercises. Using math problems as a warm-up for programming lets students sharpen their thinking on them. I remember an expert once saying that doing math problems regularly makes you smarter, while staying away from them for a long time dulls the mind. Classic problems train both the rigour and the agility of a student's thinking. Many people may shrug this off, yet some seemingly simple problems cannot be answered quickly, and the brain only becomes more agile the more it is used. Don't believe me? If you are interested, try the following problem and see whether you can find the answer within one minute; it is only an exercise from a primary-school math book. Many people think their mathematics is solid, yet reportedly more than 90% of them cannot give a correct answer within an hour. Try it, if you think I'm wrong.

Prove: AB + AC > DB + DC (where D is an interior point of triangle ABC)
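For readers who want to check their answer afterwards, here is one standard argument. Extend BD until it meets side AC at a point E. In triangle ABE, the triangle inequality gives AB + AE > BE = BD + DE; in triangle DEC it gives DE + EC > DC. Adding the two inequalities and using AE + EC = AC, then cancelling DE from both sides, yields AB + AC > DB + DC.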

Finally, learn widely, ask questions, and read good books, read the classics. Here I recommend two classic algorithms textbooks that most of you probably already know; much of their content is really an introduction to mathematics.

The first is Introduction to Algorithms, by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Its principal authors are from the computer science department at MIT, and one of them, Ronald L. Rivest, received the Turing Award for his contribution to the RSA public-key cryptosystem. The book is today's standard textbook on algorithms: the computer science departments of many top American universities use it, some Chinese universities have adopted it for their algorithms courses, and professionals cite it constantly. It covers essentially all of the classic algorithms, and every program is given in pseudocode, which makes the book universal; programmers developing in any language can use it as a reference. The writing is accessible, and it works well both as a textbook and for self-study.

The other is Donald E. Knuth's The Art of Computer Programming, which many people will already know. Knuth spent the most brilliant years of his life in the computer science department at Stanford University, is a recipient of the ACM Turing Award, and is the undisputed giant of this field. There is a joke that a programmer who does not know Knuth is like a physicist who does not know Einstein, a mathematician who does not know Euler, or a chemist who does not know Dalton. This monumental work, usually abbreviated TAOCP, is broad and profound, covering nearly everything that matters in the algorithms and theory of computer programming. Only three volumes have been published so far: Fundamental Algorithms, Seminumerical Algorithms, and Sorting and Searching (as I write this, Volume 4 has just come out, and I rushed to buy a copy). Drawing on a great deal of mathematics, it analyses algorithms from different application areas, studies their complexity, that is, their time and space efficiency, and discusses which algorithms suit which problems; its theoretical and practical value is recognized by computer professionals the world over. Many of the terms it introduced and the results it established have become standard terminology and widely cited results in the field. The author has also studied the history of the relevant areas in depth, so alongside its many research results the book gives an excellent account of their origins and development, something rarely seen in scientific writing anywhere. As for its value, Bill Gates's words say enough: "If you think you're a really good programmer, read Knuth's Art of Computer Programming. If you can read the whole thing, be sure to send me a résumé." The author's mathematical depth gives the book its rigorous style, and although it is not written in today's popular programming languages, that does not diminish its standing as the "epic of programming" in the slightest, for the simple reason that the design ideas it carries will never go out of date. Unless your English is a real obstacle, I recommend reading the English edition. I read the English edition myself; it cost a fair amount of money and time, but it was worth every bit of it.

In short, if you want to become a programmer with potential and a future, or to stand out among programmers, you must develop a good mathematical cultivation. And remember: for anyone who wants to write every kind of program with real fluency,

mathematics is the soul of the program.


Geoffrey Hinton interview

About this course

If you want to break into cutting-edge AI, this course will help you do so. Deep learning engineers are highly sought after, and mastering deep learning will give you numerous new career opportunities.

Deep learning is also a new “superpower” that will let you build AI systems that just were not possible a few years ago. In this course, you will learn the foundations of deep learning. When you finish this class, you will:

  • Understand the major technology trends driving Deep Learning
  • Be able to build, train and apply fully connected deep neural networks
  • Know how to implement efficient (vectorized) neural networks
  • Understand the key parameters in a neural network’s architecture

This course also teaches you how Deep Learning actually works, rather than presenting only a cursory or surface-level description. So after completing it, you will be able to apply deep learning to your own applications.

If you are looking for a job in AI, after this course you will also be able to answer basic interview questions. This is the first course of the Deep Learning Specialization.

Lecture transcript

As part of this course by deeplearning.ai, I hope to not just teach you the technical ideas in deep learning, but also to introduce you to some of the people, some of the heroes, in deep learning.

The people who invented so many of the ideas that you will learn about in this course and in this specialization. In these videos, I also hope to ask these leaders of deep learning to give you career advice: how you can break into deep learning, and how you can do research or find a job in deep learning.

As the first of this interview series, I am delighted to present to you an interview with Geoffrey Hinton.

AN: I think that at this point you, more than anyone else on this planet, have invented so many of the ideas behind deep learning. And a lot of people have been calling you the godfather of deep learning, although it was not until we were chatting a few minutes ago that I realized you think I'm the first one to call you that, which I'm quite happy to have done. But what I want to ask is, many people know you as a legend; I want to ask about the personal story behind the legend. So, going way back, how did you get involved in AI and machine learning and neural networks?

So when I was at high school, I had a classmate who was always better than me at everything, he was a brilliant mathematician. And he came into school one day and said, did you know the brain uses holograms? And I guess that was about 1966, and I said, sort of what’s a hologram? And he explained that in a hologram you can chop off half of it, and you still get the whole picture. And that memories in the brain might be distributed over the whole brain. And so I guess he’d read about Lashley’s experiments, where you chop off bits of a rat’s brain and discover that it’s very hard to find one bit where it stores one particular memory. So that’s what first got me interested in how does the brain store memories.

And then when I went to university, I started off studying physiology and physics. I think when I was at Cambridge, I was the only undergraduate doing physiology and physics. Then I gave up on that and tried to do philosophy, because I thought that might give me more insight, but philosophy seemed to me to be lacking in ways of telling when someone had said something false. So then I switched to psychology, and in psychology they had very, very simple theories, which seemed to me hopelessly inadequate for explaining what the brain was doing. So then I took some time off and became a carpenter. And then I decided that I'd try AI, and went off to Edinburgh to study AI with Longuet-Higgins. He had done very nice work on neural networks, but he'd just given up on neural networks and been very impressed by Winograd's thesis. So when I arrived he thought I was doing this old-fashioned stuff and that I ought to start on symbolic AI. We had a lot of fights about that, but I just kept on doing what I believed in.

AN: And then what?

I eventually got a PhD in AI, and then I couldn't get a job in Britain. But I saw this very nice advertisement for Sloan Fellowships in California, and I managed to get one of those. And I went to California, and everything was different there. In Britain, neural nets were regarded as kind of silly, whereas in California, Don Norman and David Rumelhart were very open to ideas about neural nets. It was the first time I'd been somewhere where thinking about how the brain works, and thinking about how that might relate to psychology, was seen as a very positive thing. And it was a lot of fun there; in particular, collaborating with David Rumelhart was great.

AN: I see, great. So this was when you were at UCSD, and you and Rumelhart around what, 1982, wound up writing the seminal back prop paper, right?

Actually, it was more complicated than that.

In, I think, early 1982, David Rumelhart and I, and Ron Williams, between us developed the back prop algorithm; it was mainly David Rumelhart's idea. We discovered later that many other people had invented it. David Parker had invented it, probably after us but before we'd published. Paul Werbos had published it already, quite a few years earlier, but nobody paid it much attention. And there were other people who'd developed very similar algorithms; it's not clear exactly what's meant by back prop. But using the chain rule to get derivatives was not a novel idea.

AN: I see, so why do you think it was your paper that helped the community latch on to back prop so much? It feels like your paper marked an inflection point in the acceptance of this algorithm.

So we managed to get a paper into Nature in 1986. And I did quite a lot of political work to get the paper accepted. I figured out that one of the referees was probably going to be Stuart Sutherland, who was a well known psychologist in Britain. And I went to talk to him for a long time, and explained to him exactly what was going on. And he was very impressed by the fact that we showed that back prop could learn representations for words. And you could look at those representations, which are little vectors, and you could understand the meaning of the individual features. So we actually trained it on little triples of words about family trees, like Mary has mother Victoria. And you’d give it the first two words, and it would have to predict the last word. And after you trained it, you could see all sorts of features in the representations of the individual words. Like the nationality of the person there, what generation they were, which branch of the family tree they were in, and so on. That was what made Stuart Sutherland really impressed with it, and I think that’s why the paper got accepted.

AN: Very early word embeddings, and you’re already seeing learned features of semantic meanings emerge from the training algorithm.

Yes, so from a psychologist's point of view, what was interesting was that it unified two completely different strands of ideas about what knowledge was like. There was the old psychologists' view that a concept is just a big bundle of features, and there's lots of evidence for that. And then there was the AI view of the time, which is a formal, structuralist view: a concept is defined by how it relates to other concepts, and to capture a concept you'd have to use something like a graph structure or maybe a semantic net. And what this back propagation example showed was that you could give it the information that would go into a graph structure, in this case a family tree, and it could convert that information into features in such a way that it could then use the features to derive new consistent information, i.e. generalize. But the crucial thing was this to and fro between the graphical representation, or the tree-structured representation of the family tree, and a representation of the people as big feature vectors: from the graph-like representation you could get feature vectors, and from the feature vectors you could get more of the graph-like representation.

AN: So this is 1986?

In the early 90s, Bengio showed that you can actually take real data, you could take English text, and apply the same techniques there, and get embeddings for real words from English text, and that impressed people a lot.

AN: I guess recently we've been talking a lot about how faster computers, like GPUs and supercomputers, are driving deep learning. I didn't realize that back between 1986 and the early 90s, it sounds like between you and Bengio, there were already the beginnings of this trend.

Yes, it was a huge advance. In 1986, I was using a Lisp machine which did less than a tenth of a megaflop. And by about 1993 or thereabouts, people were getting ten megaflops. So there was a factor of 100, and that's the point at which it became easy to use, because computers were just getting faster.

AN: Over the past several decades, you’ve invented so many pieces of neural networks and deep learning. I’m actually curious, of all of the things you’ve invented, which of the ones you’re still most excited about today?

So I think the most beautiful one is the work I did with Terry Sejnowski on Boltzmann machines. We discovered there was this really, really simple learning algorithm that applied to great big densely connected nets where you could only see a few of the nodes. So it would learn hidden representations, and it was a very simple algorithm. And it looked like the kind of thing you should be able to get in a brain, because each synapse only needed to know about the behavior of the two neurons it was directly connected to, and the information that was propagated was the same. There were two different phases, which we called wake and sleep, but in the two different phases you're propagating information in just the same way. Whereas in something like back propagation, there's a forward pass and a backward pass, and they work differently; they're sending different kinds of signals. So I think that's the most beautiful thing.

And for many years it looked like just a curiosity, because it seemed much too slow. But then later on I gave up a little bit of the beauty, settled for just one iteration in a somewhat simpler net, and that gave restricted Boltzmann machines, which actually worked effectively in practice. So in the Netflix competition, for example, restricted Boltzmann machines were one of the ingredients of the winning entry.

AN: And in fact, a lot of the recent resurgence of neural nets and deep learning, starting about 2007, was the restricted Boltzmann machine and deep belief net work that you and your lab did.

Yes, so that's another of the pieces of work I'm very happy with: the idea that you could train a restricted Boltzmann machine, which just had one layer of hidden features, and learn one layer of features. Then you could treat those features as data and do it again, and then treat the new features you'd learned as data and do it again, as many times as you liked. So that was nice, and it worked in practice. And then Yee Whye Teh realized that the whole thing could be treated as a single model, but it was a weird kind of model: at the top you had a restricted Boltzmann machine, but below that you had a sigmoid belief net, which was something that had been invented many years earlier. So it was a directed model, and what we'd managed to come up with by training these restricted Boltzmann machines was an efficient way of doing inference in sigmoid belief nets. Around that time, there were people doing neural nets who would use densely connected nets but didn't have any good ways of doing probabilistic inference in them. And you had people doing graphical models who, unlike my children, could do inference properly, but only in sparsely connected nets. What we managed to show was a way of learning these deep belief nets so that there's an approximate form of inference that's very fast; it just happens in a single forward pass, and that was a very beautiful result. And you could guarantee that each time you learned an extra layer of features there was a bound: each time you learned a new layer, you got a new bound, and the new bound was always better than the old bound.

AN: The variational bounds, showing as you add layers. Yes, I remember that video.

So that was the second thing I was really excited about. And I guess the third thing was the work on variational methods. It turns out people in statistics had done similar work earlier, but we didn't know about that. We managed to make EM work a whole lot better by showing that you didn't need to do a perfect E step; you could do an approximate E step. EM was a big algorithm in statistics, and we'd shown a big generalization of it.

And in particular, in 1993 I guess, with Van Camp, I did a paper that was, I think, the first variational Bayes paper, where we showed that you could actually do a version of Bayesian learning that was far more tractable by approximating the true posterior with a Gaussian. And you could do that in a neural net. I was very excited by that.

AN: I see. Wow, right. Yep, I think I remember all of these papers; your paper with Van Camp, I spent many hours reading over that. And I think some of the algorithms you use today, or some of the algorithms that lots of people use almost every day, are what, things like dropout, or I guess ReLU activations, came from your group?

Yes and no. So other people have thought about rectified linear units. And we actually did some work with restricted Boltzmann machines showing that a ReLU was almost exactly equivalent to a whole stack of logistic units. And that’s one of the things that helped ReLUs catch on.

AN: I was really curious about that. The ReLU paper had a lot of math showing that this function can be approximated with this really complicated formula. Did you do that math so your paper would get accepted into an academic conference, or did all that math really influence the development of max(0, x)?

That was one of the cases where actually the math was important to the development of the idea. So I knew about rectified linear units, obviously, and I knew about logistic units. And because of the work on Boltzmann machines, all of the basic work was done using logistic units. And so the question was, could the learning algorithm work in something with rectified linear units? And by showing the rectified linear units were almost exactly equivalent to a stack of logistic units, we showed that all the math would go through.

AN: I see. And it provided the inspiration; today tons of people use ReLU and it just works, without necessarily needing to understand the original motivation.

Yeah, one thing I noticed later, when I went to Google, I guess in 2014, I gave a talk at Google about using ReLUs and initializing with the identity matrix, because the nice thing about ReLUs is that if you keep replicating the hidden layers and you initialize with the identity, it just copies the pattern in the layer below. And so I was showing that you could train networks with 300 hidden layers, and you could train them really efficiently if you initialized with the identity. But I didn't pursue that any further, and I really regret not pursuing it. We published one paper showing you could initialize recurrent nets like that, but I should have pursued it further, because later on these residual networks are really that kind of thing.

AN: Over the years I've heard you talk a lot about the brain. I've heard you talk about the relationship between back prop and the brain. What are your current thoughts on that?

I'm actually working on a paper on that right now. I guess my main thought is this: if it turns out that back prop is a really good algorithm for doing learning, then for sure evolution could have figured out how to implement it. I mean, you have cells that could turn into either eyeballs or teeth. Now, if cells can do that, they can for sure implement back propagation, and presumably there's huge selective pressure for it. So I think the neuroscientists' idea that it doesn't look plausible is just silly. There may be some subtle implementation of it. And I think the brain probably has something that may not be exactly back propagation, but it's quite close to it. Over the years, I've come up with a number of ideas about how this might work. In 1987, working with Jay McClelland, I came up with the recirculation algorithm, where the idea is you send information round a loop, and you try to make it so that things don't change as information goes around this loop. The simplest version would be: you have input units and hidden units, and you send information from the input to the hidden and then back to the input, and then back to the hidden and then back to the input, and so on. And what you want is to train an auto-encoder, but you want to train it without having to do back propagation. So you just train it to try and get rid of all variation in the activities. The idea is that the learning rule for a synapse is: change the weight in proportion to the pre-synaptic input and in proportion to the rate of change of the post-synaptic input. But in recirculation, you're trying to make the old post-synaptic input be good and the new one be bad, so you're changing in that direction.

We invented this algorithm before neuroscientists came up with spike-timing-dependent plasticity. Spike-timing-dependent plasticity is actually the same algorithm but the other way round, where the new thing is good and the old thing is bad in the learning rule: you're changing the weight in proportion to the pre-synaptic activity times the new post-synaptic activity minus the old one. Later on, in 2007, I realized that if you took a stack of restricted Boltzmann machines and trained it up, then after it was trained you had exactly the right conditions for implementing back propagation by just trying to reconstruct. If you looked at the reconstruction error, that reconstruction error would actually tell you the derivative of the discriminative performance. And at the first deep learning workshop in 2007, I gave a talk about that, which was almost completely ignored. Later on, Yoshua Bengio took up the idea and has actually done quite a lot more work on that, and I've been doing more work on it myself. I think this idea that if you have a stack of auto-encoders, then you can get derivatives by sending activity backwards and looking at reconstruction errors, is a really interesting idea and may well be how the brain does it.

AN: One other topic that I know you think about, and that I hear you're still working on, is how to deal with multiple time scales in deep learning. So, can you share your thoughts on that?

Yes, so actually that goes back to my first years as a graduate student. The first talk I ever gave was about using what I called fast weights: weights that adapt rapidly but decay rapidly, and can therefore hold short-term memory. I showed in a very simple system in 1973 that you could do true recursion with those weights. And what I mean by true recursion is that the neurons that are used for representing things get re-used for representing things in the recursive call, and the weights that hold the actual knowledge get re-used in the recursive call. That leads to the question: when you pop out of a recursive call, how do you remember what it was you were in the middle of doing? Where's that memory? Because you used the neurons for the recursive call. And the answer is you can put that memory into fast weights, and you can recover the activities of the neurons from those fast weights.

And more recently, working with Jimmy Ba, we actually got a paper published using fast weights for recursion like that. (AN: I see.) So that was quite a big gap: the first model, from 1973, went unpublished, and Jimmy Ba's model was in 2015 or 2016, I think. So it's about 40 years later.

AN: And, I guess, one other idea you've had for quite a few years now, over five years I think, is capsules. Where are you with that?

Okay, so I'm back to the state I'm used to being in, which is I have this idea I really believe in and nobody else believes in it. I submit papers about it and they get rejected, but I really believe in this idea and I'm just going to keep pushing it. It hinges on a couple of key ideas. One is about how you represent multi-dimensional entities: you can represent a multi-dimensional entity with just a little vector of activities, as long as you know there's only one of them. So the idea is that in each region of the image, you'll assume there's at most one of a particular kind of feature, and then you'll use a bunch of neurons whose activities represent the different aspects of that feature: within that region, exactly what are its x and y coordinates? What orientation is it at? How fast is it moving? What color is it? How bright is it? And stuff like that. So you can use a whole bunch of neurons to represent different dimensions of the same thing, provided there's only one of them. That's a very different way of doing representation from what we're normally used to in neural nets. Normally in neural nets, we just have a great big layer and all the units go off and do whatever they do; we don't think of bundling them up into little groups that represent different coordinates of the same thing. So I think we need this extra structure. And then there's the other idea that goes with that.

AN: So this means that within the representation, you partition the representation into different subsets to represent different things, right, rather than

I call each of those subsets a capsule. And the idea is that a capsule is able to represent an instance of a feature, but only one, and it represents all the different properties of that feature. It's a feature that has a lot of properties, as opposed to a normal neuron in a normal neural net, which has just one scalar property.

And then what you can do, if you've got that, is something that normal neural nets are very bad at, which is what I call routing by agreement. Let's suppose you want to do segmentation and you have something that might be a mouth and something else that might be a nose, and you want to know whether you should put them together to make one thing. So the idea is you have a capsule for a mouth that has the parameters of the mouth, and a capsule for a nose that has the parameters of the nose. Then, to decide whether to put them together or not, you get each of them to vote for what the parameters should be for a face. If the mouth and the nose are in the right spatial relationship, they will agree. And when you get two capsules at one level voting for the same set of parameters at the next level up, you can assume they're probably right, because agreement in a high-dimensional space is very unlikely. That's a very different way of doing filtering than what we normally use in neural nets. So I think this routing by agreement is going to be crucial for getting neural nets to generalize much better from limited data. I think it'll be very good at dealing with changes in viewpoint, very good at doing segmentation, and I'm hoping it will be much more statistically efficient than what we currently do in neural nets, which is, if you want to deal with changes in viewpoint, you just give it a whole bunch of different viewpoints and train on them all.

AN: I see, right, so rather than feedforward, supervised learning, you can learn this in some different way.

Well, I still plan to do it with supervised learning, but the mechanics of the forward pass are very different. It's not a pure forward pass in the sense that there are little bits of iteration going on, where you think you've found a mouth and you think you've found a nose, and you use a little bit of iteration to decide whether they should really go together to make a face. And you can do back prop through that iteration. So you can try and train it discriminatively, and we're working on that now in my group in Toronto. I now have a little Google team in Toronto, part of the Brain team. That's what I'm excited about right now.

AN: I see, great, yeah. Look forward to that paper when that comes out. You worked in deep learning for several decades. I’m actually really curious, how has your thinking, your understanding of AI changed over these years?

So I guess a lot of my intellectual history has been around back propagation and how to use back propagation, how to make use of its power. To begin with, in the mid 80s, we were using it for discriminative learning, and it was working well. Then I decided, by the early 90s, that actually most human learning was going to be unsupervised learning. I got much more interested in unsupervised learning, and that's when I worked on things like the wake-sleep algorithm.

AN: And your comments at that time really influenced my thinking as well. So when I was leading Google Brain, our first project put a lot of work into unsupervised learning because of your influence.

Right, and I may have misled you. Because in the long run, I think unsupervised learning is going to be absolutely crucial. But you have to sort of face reality. And what’s worked over the last ten years or so is supervised learning. Discriminative training, where you have labels, or you’re trying to predict the next thing in the series, so that acts as the label. And that’s worked incredibly well.

I still believe that unsupervised learning is going to be crucial, and things will work incredibly much better than they do now when we get that working properly, but we haven’t yet.

AN: Yeah, I think many of the senior people in deep learning, including myself, remain very excited about it. It’s just none of us really have almost any idea how to do it yet. Maybe you do, I don’t feel like I do.

Variational autoencoders, where you use the reparameterization trick, seemed to me like a really nice idea. And generative adversarial nets also seemed to me to be a really nice idea. I think generative adversarial nets are one of the sort of biggest ideas in deep learning that's really new. I'm hoping I can make capsules that successful, but right now generative adversarial nets, I think, have been a big breakthrough.

AN: What happened to sparsity and slow features, which were two of the other principles for building unsupervised models?

I was never as big on sparsity as you were, buddy. But slow features, I think, is a mistake: you shouldn't say slow. The basic idea is right, but you shouldn't go for features that don't change, you should go for features that change in predictable ways. So here's a sort of basic principle about how you model anything: you take your measurements and you apply nonlinear transformations to them until you get to a representation as a state vector in which the action is linear. So you don't just pretend it's linear, like you do with Kalman filters; you actually find a transformation from the observables to the underlying variables where linear operations, like matrix multiplies on the underlying variables, will do the work.

So for example, if you want to change viewpoints, if you want to produce the image from another viewpoint, what you should do is go from the pixels to coordinates. And once you've got to the coordinate representation, which is the kind of thing I'm hoping capsules will find, you can then do a matrix multiply to change viewpoint, and then you can map it back to pixels.

AN: Right, that’s why you did all that.

I think that’s a very, very general principle.

AN: That's why you did all that work on face synthesis, right? Where you take a face and compress it to a very low-dimensional vector, and then you can fiddle with that and get back other faces.

I had a student who worked on that, I didn’t do much work on that myself.

AN: Now I’m sure you still get asked all the time, if someone wants to break into deep learning, what should they do? So what advice would you have? I’m sure you’ve given a lot of advice to people in one on one settings, but for the global audience of people watching this video. What advice would you have for them to get into deep learning?

Okay, so my advice is, sort of, read the literature, but don't read too much of it. This is advice I got from my advisor, which is very unlike what most people say. Most people say you should spend several years reading the literature and then you should start working on your own ideas. And that may be true for some researchers, but for creative researchers I think what you want to do is read a little bit of the literature and notice something that you think everybody is doing wrong. I'm contrarian in that sense: you look at it and it just doesn't feel right, and then you figure out how to do it right. And then, when people tell you that's no good, just keep at it. I have a very good principle for helping people keep at it, which is: either your intuitions are good or they're not. If your intuitions are good, you should follow them and you'll eventually be successful. If your intuitions are not good, it doesn't matter what you do.

AN: Inspiring advice, might as well go for it.

You might as well trust your intuitions. There’s no point not trusting them.

AN: I usually advise people to not just read, but replicate published papers. And maybe that puts a natural limit on how many you can do, because replicating results is pretty time-consuming.

Yes, it's true that when you're trying to replicate a published paper, you discover all the little tricks necessary to make it work.

The other advice I have is: never stop programming. Because if you give a student something to do, and they're not so good, they'll come back and say it didn't work, and the reason it didn't work will be some little decision they made that they didn't realize was crucial. Whereas if you give it to a good student, like Yee Whye Teh for example, you can give him anything and he'll come back and say it worked. I remember doing this once, and I said, but wait a minute, Yee Whye, since we last talked I realized it couldn't possibly work for the following reason. And he said, yeah, I realized that right away, so I assumed you didn't mean that.

AN: I see, yeah, that’s great, yeah. Let’s see, any other advice for people that want to break into AI and deep learning?

I think that’s basically, read enough so you start developing intuitions. And then, trust your intuitions and go for it, don’t be too worried if everybody else says it’s nonsense.

AN: And I guess there’s no way to know if others are right or wrong when they say it’s nonsense, but you just have to go for it, and then find out.

Right, but there is one thing, which is, if you think it's a really good idea and other people tell you it's complete nonsense, then you know you're really onto something. One example of that is when I first came up with variational methods. I sent mail explaining it to a former student of mine called Peter Brown, who knew a lot about this kind of thing. He showed it to some people who worked with him, brothers, they were twins, I think, and he told me later what they said. They said, either this guy's drunk or he's just stupid, so they really, really thought it was nonsense. Now, it could have been partly the way I explained it, because I explained it in intuitive terms. But when you have what you think is a good idea and other people think it's complete rubbish, that's the sign of a really good idea.

AN: I see, and research topics, new grad students should work on capsules and maybe unsupervised learning, any other?

One good piece of advice for new grad students is: see if you can find an advisor who has beliefs similar to yours. Because if you work on stuff that your advisor feels deeply about, you'll get a lot of good advice and time from your advisor. If you work on stuff your advisor's not interested in, you'll get some advice, but it won't be nearly so useful.

AN: I see, and last one on advice for learners, how do you feel about people entering a PhD program? Versus joining a top company, or a top research group?

Yeah, it's complicated. I think right now what's happening is, there aren't enough academics trained in deep learning to educate all the people that we need educated in universities. There just isn't the faculty bandwidth there, but I think that's going to be temporary. I think what's happened is, most departments have been very slow to understand the kind of revolution that's going on. I kind of agree with you that it's not quite a second industrial revolution, but it's something on nearly that scale. There's a huge sea change going on, basically because our relationship to computers has changed: instead of programming them, we now show them, and they figure it out. That's a completely different way of using computers, and computer science departments are built around the idea of programming computers. They don't understand that this showing computers is going to be as big as programming computers, and that half the people in the department should be people who get computers to do things by showing them. So my department refuses to acknowledge that it should have lots and lots of people doing this. They think they've got a couple, maybe a few more, but not too many. And in that situation, you have to rely on the big companies to do quite a lot of the training. So Google is now training people, we call them Brain Residents, and I suspect the universities will eventually catch up.

AN: In fact, maybe a lot of students have figured this out. In a lot of the top 50 programs, over half of the applicants actually want to work on showing, rather than programming.

Yeah, cool,

AN: Yeah, in fact, to give credit where it's due, whereas deeplearning.ai is creating a Deep Learning Specialization, as far as I know the first deep learning MOOC was actually yours, taught on Coursera back in 2012. And somewhat strangely, that's also roughly when you first published the RMSprop algorithm.

Right, yes, well, as you know, that was because you invited me to do the MOOC. And then, when I was very dubious about doing it, you kept pushing me to do it. So it was very good that I did, although it was a lot of work.

AN: Yes, and thank you for doing that. I remember you complaining to me about how much work it was, and you staying up late at night, but I think many, many learners have benefited from your first MOOC, so I'm very grateful to you for it. So, over the years, I've seen you embroiled in debates about paradigms for AI, and whether there's been a paradigm shift in AI. Can you share your thoughts on that?

Yes, happily. So I think that in the early days, back in the 50s, people like von Neumann and Turing didn't believe in symbolic AI; they were far more inspired by the brain. Unfortunately, they both died much too young, and their voices weren't heard. And in the early days of AI, people were completely convinced that the representations you need for intelligence were symbolic expressions of some kind, a sort of cleaned-up logic where you could do non-monotonic things, not quite logic but something like logic, and that the essence of intelligence was reasoning. What's happened now is there's a completely different view, which is that what a thought is, is just a great big vector of neural activity; so contrast that with a thought being a symbolic expression. I think the people who thought that thoughts were symbolic expressions just made a huge mistake. What comes in is a string of words, and what comes out is a string of words, and because of that, strings of words are the obvious way to represent things, so they thought what must be in between was a string of words, or something like a string of words. I think what's in between is nothing like a string of words. I think the idea that thoughts must be in some kind of language is as silly as the idea that understanding the layout of a spatial scene must be in pixels: pixels come in, and if we had a dot matrix printer attached to us, then pixels would come out, but what's in between isn't pixels. So I think thoughts are just these great big vectors, and that big vectors have causal powers: they cause other big vectors. And that's utterly unlike the standard AI view that thoughts are symbolic expressions.

AN: I see, good, I guess AI is certainly coming round to this new point of view these days.

Some of it, I think a lot of people in AI still think thoughts have to be symbolic expressions.

AN: Thank you very much for doing this interview. It was fascinating to hear how deep learning has evolved over the years, as well as how you're still helping drive it into the future. So thank you, Geoff.

Well, thank you for giving me this opportunity.

AN: Thank you.


Shrimp Painting

Today I acquired a painting, to my great delight.

The composition is rigorous yet grand; sparse and dense passages are well balanced and echo one another from afar, and the brushwork is so lively that the shrimp seem to leap off the paper. The three characters of Qi Baishi's signature are vigorous and forceful, yet carry a childlike innocence.

Calligraphy and painting set each other off: the open areas could let a horse gallop through, while the dense areas are airtight. The mounting, too, is in a class of its own, with evenly spaced dots along the border and a dark background setting off the light-toned picture.

A rare masterpiece indeed, and a so-called "expert" has certified it as an authentic work beyond any doubt.

Shrimp Painting

This painting has one more marvellous property: steam it in a bamboo steamer for 10 minutes and the colours come alive, it gains a texture you can feel, and the artistic mood becomes even better.

Why Read Books?

Maiden

A father and son were drinking tea. The son asked, "Why do you make me read books?"

The father answered, "Let me put it this way. If you have read books, when you drink this tea you will say: 'The liquor is a clear, glowing red, the aroma as delicate as orchid, the taste full and pure, round as a poem, the aftertaste sweet and mellow, lingering on the tongue; it is intoxicating, like drifting between heaven and earth. Truly a supreme tea!'"

And if you have not read books, you will just say: "Damn! This tea ain't bad."

Maiden

Lately I have been hooked on the TV show 《中国诗词大会》 (the Chinese Poetry Conference), which set me thinking: why do we read, and how do we find good words to describe our moods and feelings?

Someone once asked: since we forget most of the books we read in the end, what is the point of reading? The best answer I have seen goes: "When I was a child I ate a great deal of food. I can no longer remember most of what I ate, but I know it has become my bones and my flesh." Reading is the same. Without your noticing, it has already shaped your thinking, your words and deeds, and the person you appear to be.

1. When you are happy
You can say:
With the spring breeze at my back, my horse's hooves fly;
in a single day I have seen all the flowers of Chang'an.

Instead of only being able to say:
Haha, haha, haha, haha, hahaha

2. When you are sad
You can say:
How much sorrow can one heart hold?
As much as a river of spring water flowing east.

Instead of only being able to say:
My heart hurts so much

3. When you see a handsome man
You can say:
On the path he stands like jade;
a young gentleman without equal in this world.

Instead of only being able to say:
Damn, he's so handsome!
Damn, damn, damn, he's too handsome

4. When you see a beautiful woman
You can say:
In the north there is a beauty, peerless, standing alone.

Instead of only being able to say:
Wow, she's so beautiful
Wow, she's really beautiful

5. When you run into a scoundrel
You can say:
I met the wrong man and misjudged his character.

Instead of only being able to say:
I must have been blind

6. When you declare your love to someone
You can say:
The mountain has trees and the trees have branches;
my heart delights in you, yet you do not know.

Instead of only being able to say:
I love you, till the end of time, till the seas run dry and the rocks crumble

7. When you miss someone
You can say:
My belt grows ever looser, yet I feel no regret;
for her I am content to waste away.

Instead of only being able to say:
I miss you to death

8. When your heart is broken
You can say:
If only life were as it was when we first met,
why should the autumn wind sadden the painted fan?
So easily has my old love's heart changed,
yet he says that hearts are quick to change.

Instead of only being able to cry out, ten thousand times over:
lan shou, xiang gu (internet slang for "feeling awful, want to cry")

9. When you get married
You can say:
A moment of this spring night is worth a thousand gold;
the flowers shed their fragrance and the moon casts its shade.

Instead of only being able to say:
Hehe, hehe, hehehe

10. When you break up
You can say:
Rather than cling to each other in hardship,
better to forget each other in the rivers and lakes.

Instead of only being able to say:
We're just not right for each other

11. When you see the desert and the Gobi
You can say:
In the great desert a lone column of smoke rises straight;
over the long river the setting sun hangs round.

Instead of only being able to say:
Oh my, it's nothing but sand

12. When you see the glow of the setting sun
You can say:
The sunset clouds fly together with the lone wild duck;
the autumn waters share one colour with the endless sky.

Instead of only being able to say:
Holy crap
So many birds
So beautiful
So damn beautiful

C10k / C10M Challenge

C10k / C10M Challenge

History

The term was coined in 1999 by Dan Kegel, citing the Simtel FTP host, cdrom.com, which served 10,000 clients at once over 1 gigabit per second Ethernet in that year. The term has since been used for the general issue of handling a large number of clients, with similar numeronyms for larger numbers of connections, most recently C10M in the 2010s.

By the early 2010s millions of connections on a single commodity 1U server became possible: over 2 million connections (WhatsApp, 24 cores, using Erlang on FreeBSD), 10–12 million connections (MigratoryData, 12 cores, using Java on Linux).

C10k (concurrently handling 10k connections) is a technical challenge posed in 1999: how can a single server provide FTP service to 10,000 clients simultaneously on a 1 GHz CPU with 2 GB of RAM and a 1 Gbps network? After 2010, as hardware advanced, the problem was extended to C10M: how to sustain 10 million concurrent connections, or handle 1 million new connections per second, on an 8-core CPU with 64 GB of RAM and a 10 Gbps network. (In each era, that class of hardware cost roughly USD 1,200.)

The C10k / C10M problem pushes software and hardware to their limits from a purely technical angle. Once it is solved, the problems of cost and efficiency are solved along with it.
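As a concrete illustration of the event-driven, non-blocking style that servers typically use to attack C10k, here is a minimal single-threaded echo server sketch built on Python's standard selectors module (which uses epoll or kqueue where available). The port and buffer size are arbitrary, and a real server would also need write buffering and proper error handling:

```python
import selectors
import socket

sel = selectors.DefaultSelector()      # epoll on Linux, kqueue on BSD/macOS

def accept(server_sock):
    conn, _addr = server_sock.accept()
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, handle)

def handle(conn):
    data = conn.recv(4096)
    if data:
        conn.send(data)                # echo back (unbuffered, for brevity)
    else:                              # empty read: client closed the socket
        sel.unregister(conn)
        conn.close()

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(("0.0.0.0", 9000))
server.listen(1024)
server.setblocking(False)
sel.register(server, selectors.EVENT_READ, accept)

while True:                            # one thread serves every connection
    for key, _events in sel.select():
        key.data(key.fileobj)          # dispatch to accept() or handle()
```

The point of the pattern is that each connection costs only a socket and a little bookkeeping, rather than a thread or a process, which is what makes tens of thousands of concurrent connections feasible on modest hardware.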

Network data flow through kernel


Plato and Socrates - The Wheat Stalk Problem

The meaning of Love, Marriage and Life

One day, Plato asked his teacher Socrates, “What is love? How can I find it?” Socrates answered, “There is a vast wheat field in front of you. Walk forward without turning back, and pick only one stalk. If you find the most magnificent stalk, then you have found love.” Plato walked forward, and before long he returned empty-handed, having picked nothing. His teacher asked, “Why did you not pick any stalk?” Plato answered, “Because I could pick only once, and yet I could not turn back. I did find the most magnificent stalk, but did not know whether there were any better ones ahead, so I did not pick it. As I walked further, the stalks that I saw were never as good as that earlier one, so I did not pick any in the end.”

Socrates then said, “And that is LOVE.”

On another day, Plato asked Socrates, “What is marriage? How can I find it?” His teacher answered, “There is a thriving forest in front of you. Walk forward without turning back, and chop down only one tree. If you find the tallest tree, then you have found marriage.” Plato walked forward, and before long he returned with a tree. The tree was not bad, but it was not tall either; it was only an ordinary tree, not the best, just a good one. His teacher asked, “Why did you chop down such an ordinary tree?” Plato answered, “Based on my previous experience, I had walked through the field and returned with empty hands. This time, I saw this tree and felt that it was the first good tree I had seen, so I chopped it down and brought it back. I did not want to miss the chance.”

Socrates then said, “And that is MARRIAGE.”

On another day, Plato asked his teacher, “What is life?” Socrates asked him to go into the forest again, this time allowed to walk back and forth, and to pluck the most beautiful flower. Plato walked forward, but after three days he had still not come back. His teacher went to find him, and when he saw Plato camping in the forest, he asked, “Have you found the most beautiful flower?” Plato pointed at a flower near his camp and answered, “This is the most beautiful flower!” “Why didn’t you bring it out?” Socrates asked. “Because if I picked it, it would droop. And even if I didn’t pick it, it would die in a couple of days anyway. So I have been living by its side while it blooms. When it droops, I will go and find another one. This is already the second most beautiful flower I have found!”

Socrates then said, “You’ve got the truth of LIFE.”

“Love” is the most beautiful thing that can happen to a person; it is an opportunity whose worth you do not realize while you have it, but only once it is gone, like the field of stalks.

“Marriage” is like the tree you chopped down; it is a compromise: you pick the first good thing you see and learn to live a happy life with it. Having an affair is alluring, like lightning - bright, but gone so quickly that you can neither catch it nor keep it.

“Life” is to follow and enjoy every beautiful moment of living. That is why you should enjoy your life wherever you live.

One day, the ancient Greek philosopher Plato asked his teacher Socrates what love is. His teacher told him to go into the wheat field and pick the largest, most golden stalk in the whole field; he could pick only once, and could only walk forward, never turning back. Plato did as his teacher said. He came out of the field empty-handed. When his teacher asked why he had picked nothing, he said: “Because I could only pick once and could not turn back, even when I saw a large, golden stalk I did not pick it, since I did not know whether a better one lay ahead. Walking on, I found that none of the stalks matched the ones I had already seen; the largest, most golden stalk in the field had long since been passed by. So in the end I picked nothing.”

Socrates said: “That is love.”

Another day, Plato asked his teacher what marriage is. His teacher told him to go into the forest and cut down the largest, most luxuriant tree in the whole forest, one fit to stand at home as a Christmas tree; again he could choose only once and could only walk forward, never turning back. Plato did as his teacher said. This time he came back with a perfectly ordinary tree, not very luxuriant but not too bad either. When his teacher asked why he had brought back such an ordinary tree, he said: “With the last lesson in mind, when I had walked most of the way and was tired and still empty-handed, I saw this tree and felt that, even though there were many more trees in the forest, it was not bad at all, so I cut it down, lest I come out with nothing again.”

Socrates said: “That is marriage.”

On yet another day, Plato asked Socrates what life is. Socrates again sent him into the forest, this time allowed to wander freely, to bring back the most beautiful flower. With the earlier lessons behind him, Plato set off full of confidence, but after three days and three nights he had still not returned. Socrates went into the forest to look for him and found that Plato had set up camp there. Socrates asked him: “Have you found the most beautiful flower?” Plato pointed to a flower beside him and said: “This is the most beautiful flower.” Socrates asked: “Then why not bring it out?” Plato answered: “If I picked it, it would wither at once. And even if I did not pick it, it would still die within a few days. So while it is in bloom I live beside it. When it fades, I will go and find the next one. This is already the second most-beautiful flower I have found.”

Socrates said: “You have understood the true meaning of life.”

Love gives us experience and memories; marriage then depends on a wise decision and on holding it well; and after these trials we finally understand that life is a matter of cherishing and keeping watch over what we have.

A Mathematical Solution to the Wheat Stalk Problem

Now let us look at this problem from a mathematical angle.

Suppose there are n stalks of wheat in total. We adopt the following strategy: for the first k stalks we pick nothing and only remember the best one seen so far, calling its size d (it might be weight, or volume); then, from stalk k + 1 onwards, we pick the first stalk that beats d, and otherwise pick nothing.

For a fixed k, if the best stalk happens to be at position i (k < i ≤ n), then for it to be the one selected, the best among the first i - 1 stalks must lie within the first k, which happens with probability k / (i - 1). Summing over all possible i, we get the total probability P(k) of selecting the best stalk when the first k stalks are used only as a reference:

P(k) = Σ_{i=k+1..n} (1/n) · k/(i-1) = (k/n) · Σ_{i=k+1..n} 1/(i-1)

Let k / n = x, and assume n is sufficiently large. The formula above can then be rewritten as:
  
P(x) ≈ x · ∫_x^1 (1/t) dt = -x · ln(x)

Differentiating -x · ln(x) with respect to x and setting the derivative to zero gives the optimal value of x, which turns out to be the reciprocal of the mysterious constant Euler studied: 1 / e.

Therefore k = n / e.

So if you want to pick the largest stalk out of n stalks, you should treat the first n / e stalks purely as a reference, and then, from stalk k + 1 onwards, pick the first stalk that is larger than the best of the first k.

e = 2.718281828459

1 / e = 0.36787944117144

Other examples

1. On each floor from the first to the tenth, a diamond lies at the elevator door, each of a different size. You ride the elevator from the first floor to the tenth; the doors open once on each floor, and you may take a diamond only once. How do you get the largest one?

First of all, no strategy can guarantee that you get the largest diamond, but you can maximize the probability of getting it. 10 / e ≈ 3.67, which rounds up to 4. Take nothing on the first four floors and only note the largest diamond seen; after that, take the first diamond that is larger than it.

2. The secretary problem. In probability and game theory, the secretary problem (also known by names such as the marriage problem, the stopping problem, the "quit while you're ahead" problem, the sultan's dowry problem, and the fussy suitor problem) goes as follows:

You need to hire a secretary, and n candidates come for interviews. You interview one candidate at a time and must decide immediately after each interview whether to hire that person; if you decide not to, the candidate will not come back. During an interview you can always judge a candidate's suitability precisely and compare them with everyone interviewed before. What strategy maximizes the probability of hiring the most suitable candidate?
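The 1/e rule behind all of these examples is easy to sanity-check with a small Monte Carlo simulation. The sketch below is only illustrative: the 100 candidates, the uniformly random scores, and the 100,000 trials are all arbitrary choices.

```python
import math
import random

def simulate(n: int, trials: int = 100_000) -> float:
    """Estimate the success probability of the 1/e stopping rule:
    skip the first k = n/e candidates, then take the first one that
    beats everything seen so far."""
    k = int(n / math.e)
    wins = 0
    for _ in range(trials):
        scores = [random.random() for _ in range(n)]
        bar = max(scores[:k])                  # best of the reference group
        chosen = next((s for s in scores[k:] if s > bar), None)
        if chosen is not None and chosen == max(scores):
            wins += 1
    return wins / trials

print(simulate(100))   # roughly 0.37, close to 1 / e = 0.3679
```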


Why Functional Programming?

Why Functional Programming?

Why Static Types?

  • Performance - method calls are faster, because there is no need to work out at runtime which method is being called.
  • Reliability - the compiler verifies the correctness of the program, so crashes at runtime are less likely.
  • Maintainability - unfamiliar code is easier to work with, because you can see the types of the objects the code uses.
  • Tool support - static typing lets IDEs provide reliable refactoring, precise code completion, and other features.
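A small illustration of these points, using Python's optional type hints (a static checker such as mypy is assumed for the reliability claim, and the function itself is hypothetical):

```python
def word_count(text: str) -> dict[str, int]:
    """Count how often each word occurs; the signature documents the intent."""
    counts: dict[str, int] = {}
    for word in text.split():
        counts[word] = counts.get(word, 0) + 1
    return counts

total: int = sum(word_count("to be or not to be").values())   # 6

# A static checker (e.g. mypy) rejects the next line before the program
# ever runs, while a purely dynamic program would only fail at runtime:
# bad = word_count(42)   # error: argument has incompatible type "int"
```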

Benefits of Functional Programming

  • First-class functions - treat a function (a small piece of behaviour) as a value: store it in a variable, pass it as an argument, or return it from another function.
  • Immutability - use immutable objects, which guarantees that their state cannot change after they are created.
  • No side effects - use pure functions, which produce the same result for the same input, do not modify the state of other objects, and do not interact with the outside world.

Moving Parts

  • Conciseness

Functional-style code is more elegant and more succinct than the corresponding imperative code, because treating functions as values gives you more powerful abstractions and lets you avoid duplicated code.

Suppose you have two similar pieces of code that carry out similar tasks but differ in the details. You can easily extract the common part of the logic into a single function and pass the differing parts to it as parameters. Those parameters are themselves functions, and you can express them with a concise syntax for anonymous functions known as lambda expressions (see the sketch after this list).

  • Thread safety

One of the biggest sources of bugs in multithreaded programs is modifying the same data from different threads without proper synchronization. If you use immutable data structures and pure functions, such unsafe modifications simply cannot happen, and there is no need to design complicated synchronization schemes for them.

  • Easier testing

Functions without side effects can be tested in isolation, because there is no need to write a lot of setup code to construct the whole environment they depend on.
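A rough sketch of these ideas in Python (the function names and the toy tasks are invented for illustration): the shared "filter then combine" logic is written once, the varying behaviour is passed in as lambdas, and the pure function at the end can be tested in isolation.

```python
from functools import reduce

def combine_if(items, keep, combine, start):
    """Extract the common logic once; the differing parts arrive as functions."""
    return reduce(combine, filter(keep, items), start)

numbers = [1, 2, 3, 4, 5, 6]

# Two similar tasks with slightly different details share one implementation:
sum_of_evens = combine_if(numbers, lambda x: x % 2 == 0, lambda a, b: a + b, 0)
product_of_odds = combine_if(numbers, lambda x: x % 2 == 1, lambda a, b: a * b, 1)
print(sum_of_evens, product_of_odds)   # 12 15

# A pure function over an immutable tuple: same input, same output,
# no external state touched, trivially testable in isolation.
def normalise(values: tuple) -> tuple:
    total = sum(values)
    return tuple(v / total for v in values)

print(normalise((1, 1, 2)))            # (0.25, 0.25, 0.5)
```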

Functional programming, views a program as a mathematical function which is evaluated to produce a result value. That function may call upon nested functions, which in turn may call upon more nested functions. A nested function evaluates to produce a result. From there, that result is passed on to the enclosing function, which uses the nested function values to calculate its own return value. To enable functions to easily pass data to and from other functions, functional programming languages typically define data structures in the most generic possible way, as a collection of (any) things. They also allow functions to be passed to other functions as if they were data parameters. A function in this paradigm is not allowed to produce any side effects such as modifying a global variable that maintains state information. Instead, it is only allowed to receive parameters and perform some operations on them in order to produce its return value. Executing a functional program involves evaluating the outermost function, which in turn causes evaluation of all the nested functions, recursively down to the most basic functions that have no nested functions.

Why is functional programming a big deal?

  • Clarity

Programming without side effects creates code that is easier to follow - a function is completely described by what goes in and what comes out. A function that produces the right answer today will produce the right answer tomorrow. This creates code that is easier to debug, easier to test, and easier to re-use.

  • Brevity

In functional languages, data is implicitly passed from a nested function to its parent function, via a general-purpose collection data type. This makes functional programs much more compact than those of other paradigms, which require substantial “housekeeping” code to pass data from one function to the next.

  • Efficiency

Because functions do not have side effects, operations can be re-ordered or performed in parallel in order to optimize performance, or can be skipped entirely if their result is not used by any other function.


Customisation of Filco Majestouch 2 Mechanical Keyboard

A Filco Majestouch 2 Tenkeyless mechanical keyboard, with Cherry Brown switches. Its keycaps were replaced with GMK Honeywell keycaps from Originative Co. (https://originative.co/products/honeywell) soon after it was bought.

GMK Honeywell

GMK Honeywell keycaps, made in Germany

Replacing the Filco Majestouch 2's keycaps with the GMK Honeywell set.

Replace with GMK Honeywell keycaps

Replace with GMK Honeywell keycaps

Replace with GMK Honeywell keycaps

Replace with GMK Honeywell keycaps

Replace with GMK Honeywell keycaps

I later joined the Massdrop campaign for the Lambo 80% Anodized Aluminum Case for Filco 87 TKL (https://www.massdrop.com/buy/lambo-80-anodized-aluminum-case-for-filco-87-tkl).

Lambo 80% Anodized Aluminum Case for Filco 87 TKL

After a few months of waiting, the case was delivered, shipped from the USA.

Lambo 80% Anodized Aluminum Case

Lambo 80% Anodized Aluminum Case

Time to get my hands warmed up and dirty.

Replace the case

Replace the case

Replace the case

Replace the case

Replace the case

Now, show time.

Show time

Show time

Show time

Show time

Show time

Show time

Show time