In a recent keynote at SOCC, Jeff Dean of Google listed a number of design patterns for system design and a number of challenges for the future. I wrote them down, and thought I might as well share them.
You can find the presentation here (it begins with something else, so skip forward a few slides; Linux users: install Moonlight).
He starts off noting a shift that happened over the last 5 to 10 years: (small) devices interact with services provided by large data centers. This allows clients to use large bursts of computational power, as in the case of a single Google search, which runs across thousands of servers.
Then he goes on with the typical introductions to MapReduce (with a map tile generation example) and BigTable (mentioning what's new since the paper). He also mentions Colossus (next-gen GFS) and talks about Spanner, a cross data center storage and computing system.
And then we get to the system design experiences and design patterns. Some of these are very generic, others are specific to distributed systems. What follows is just a tight summary; the actual slides and talk go into more detail on each of these.
1. Break large, complex systems down into many services with few dependencies.
Services are easy to test and deploy, allow lots of experiments, can be reimplemented without affecting clients, and let small teams work independently.
A single google.com search touches over 100 services.
Personal note: this is important in any design, distributed or not. It is the purpose of the module system in our Daisy/Kauri Runtime system.
2. A protocol description language is a must.
See protocol buffers.
Servers ignore tags they don't understand, but pass the information through.
Personal note: in the XML world, this is also known as the "must ignore" pattern.
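The "must ignore" idea can be sketched in a few lines. This is not the real protocol buffers wire format, just a simplified tag/value model to show how a server can act only on the tags it knows while passing unknown ones through unchanged (all names here are invented for illustration):

```python
# Minimal sketch of the "must ignore" / pass-through pattern, using a plain
# dict of tag -> value instead of the real protocol buffers wire format.

KNOWN_TAGS = {"user_id", "query"}  # tags this (hypothetical) server version knows

def process(message):
    """Act on known tags, but preserve unknown ones when re-serializing."""
    known = {tag: v for tag, v in message.items() if tag in KNOWN_TAGS}
    unknown = {tag: v for tag, v in message.items() if tag not in KNOWN_TAGS}
    # ... act on `known` ...
    response = dict(known)
    response.update(unknown)  # pass unknown tags through untouched
    return response

msg = {"user_id": 42, "query": "weather", "experiment_flag": "b7"}
out = process(msg)
# "experiment_flag" survives even though this server doesn't understand it
```

An older server in the middle of a pipeline can thus forward fields introduced by a newer client, which is what makes rolling upgrades across thousands of servers practical.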
3. Ability to estimate the performance of a system design without actually having to build it: do 'back of the envelope' calculations. See the slide on numbers everyone should know. Know your basic building blocks (understand their implementation at a high level).
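The talk's worked example of such a calculation can be reproduced as a few lines of arithmetic: rendering a page of 30 image thumbnails, each a 256 KB read from disk, comparing serial reads against reads spread in parallel across disks (the seek time and transfer rate are the round numbers from the slides):

```python
# Back-of-envelope calculation for the thumbnail-page example from the talk.

DISK_SEEK_S = 0.010    # ~10 ms per disk seek
DISK_READ_BPS = 30e6   # ~30 MB/s sequential transfer
THUMB_BYTES = 256 * 1024
N = 30

# Design 1: read the 30 thumbnails one after another from one disk.
serial = N * (DISK_SEEK_S + THUMB_BYTES / DISK_READ_BPS)

# Design 2: issue the 30 reads in parallel across different disks.
parallel = DISK_SEEK_S + THUMB_BYTES / DISK_READ_BPS

print(f"serial:   {serial * 1000:.0f} ms")    # roughly 560 ms
print(f"parallel: {parallel * 1000:.0f} ms")  # roughly 20 ms
```

A factor of ~30 in latency, discovered before writing a line of the actual system.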
4. When designing and building infrastructure, it is important not to try to be all things to all people; don't build infrastructure just for its own sake: identify common needs and address them.
5. Design for growth, but don't design to scale infinitely: 5x to 50x growth is good to consider; 1000x probably requires a rethink and rewrite.
6. Single master, thousands of servers. The master orchestrates global operation of the system, but client interaction with the master is fairly minimal. Often there is a hot standby of the master. This is simpler to reason about, but scales less far (thousands of workers, not hundreds of thousands).
7. Canary requests: odd requests sometimes crash a server process, and when the same request is sent to many servers at once, all of them might crash. Therefore, first send the request to one server.
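A minimal sketch of the canary idea, with servers simulated as plain Python callables (the real thing would involve RPCs and timeouts, all elided here):

```python
# Sketch of canary requests: try the request on one server first, and only
# fan it out to the rest if the canary survives it.

def fan_out(request, servers):
    canary, rest = servers[0], servers[1:]
    try:
        results = [canary(request)]       # the canary request
    except Exception:
        return None                       # poisonous request: don't fan out
    results += [s(request) for s in rest] # safe: now send to everyone else
    return results

ok = lambda req: f"handled {req}"
print(fan_out("query", [ok, ok, ok]))
```

The cost is one extra round-trip of latency on every fan-out; the benefit is that a query-of-death takes down one process instead of a thousand.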
8. Tree distribution of requests, to avoid many outgoing RPC requests from one server.
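The shape of such a distribution tree can be sketched as follows; this is just the bookkeeping of who forwards to whom, with the actual RPCs left out:

```python
# Sketch of tree-shaped request distribution: instead of one root opening N
# connections, each node forwards to at most `fanout` children, so no single
# machine issues more than `fanout` outgoing RPCs.

def build_tree(servers, fanout):
    """Return {parent: [children]} for a fanout-ary distribution tree."""
    tree = {s: [] for s in servers}
    for i, s in enumerate(servers[1:], start=1):
        parent = servers[(i - 1) // fanout]
        tree[parent].append(s)
    return tree

servers = [f"s{i}" for i in range(13)]
tree = build_tree(servers, fanout=3)
max_out = max(len(children) for children in tree.values())
print(max_out)  # 3: no server fans out to more than 3 others
```

With 13 servers and a fanout of 3, the root talks to only 3 machines instead of 12, at the cost of one extra hop of depth.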
9. Use backup requests to minimize latency. This avoids waiting on a few slow machines when a request is sent to thousands of machines.
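A minimal sketch of a backup request against two replicas, simulated here with sleeping functions (a real implementation would also cancel the loser; that is omitted):

```python
# Sketch of backup requests: ask one replica; if it hasn't answered within a
# small delay, send the same request to a second replica and take whichever
# answers first.

import concurrent.futures
import time

def hedged_call(replicas, request, backup_after=0.05):
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(replicas[0], request)]
        done, _ = concurrent.futures.wait(futures, timeout=backup_after)
        if not done:  # primary is slow: fire the backup request
            futures.append(pool.submit(replicas[1], request))
        done, _ = concurrent.futures.wait(
            futures, return_when=concurrent.futures.FIRST_COMPLETED)
        return done.pop().result()

slow = lambda req: (time.sleep(0.5), f"slow:{req}")[1]
fast = lambda req: f"fast:{req}"
print(hedged_call([slow, fast], "q"))  # the backup replica wins
```

The trade-off is a small amount of duplicated work in exchange for cutting off the tail of the latency distribution.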
10. Use multiple smaller units per machine, to minimize recovery time when a machine crashes and to allow fine-grained load balancing. See the many tablets per tablet server in BigTable.
Personal note: I found this a key insight in understanding how scalable stores or indexes work, in contrast to, say, a more traditional partitioned RDBMS setup (see earlier blog). Besides BigTable/HBase, this idea is also applied in Elasticsearch and Katta.
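Why small units speed up recovery is easy to see in a sketch: when a machine dies, its tablets are scattered over all survivors, so each survivor does only a small share of the recovery work, in parallel (machine and tablet names are made up):

```python
# Sketch of fine-grained partitioning: many small "tablets" per machine, so
# a dead machine's load is spread across all survivors rather than dumped
# onto a single replacement.

def reassign(assignment, dead, survivors):
    """Scatter the dead machine's tablets round-robin over the survivors."""
    for i, tablet in enumerate(assignment.pop(dead)):
        assignment[survivors[i % len(survivors)]].append(tablet)
    return assignment

assignment = {
    "m1": ["t1", "t2", "t3"],
    "m2": ["t4", "t5", "t6"],
    "m3": ["t7", "t8", "t9"],
}
reassign(assignment, dead="m1", survivors=["m2", "m3"])
print(assignment)  # m1's tablets are now split between m2 and m3
```

With one big partition per machine, recovery time is the time to reload that whole partition somewhere; with many small tablets it is roughly that time divided by the number of surviving machines.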
11. Range distribution of data, not hash. This allows users to reason about, and control, locality across keys.
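A small sketch of range partitioning: tablets own contiguous key ranges, and the owning tablet is found by binary search over the split points, so adjacent keys usually land on the same tablet (the split points here are arbitrary):

```python
# Sketch of range partitioning: a sorted list of split keys defines which
# tablet owns a key, so nearby keys share a tablet - unlike hash
# partitioning, which scatters them.

import bisect

SPLITS = ["g", "p"]  # tablet 0: keys < "g"; tablet 1: "g"-"o..."; tablet 2: >= "p"

def tablet_for(key):
    return bisect.bisect_right(SPLITS, key)

print(tablet_for("apple"), tablet_for("apricot"))  # 0 0: same tablet
print(tablet_for("zebra"))                         # 2
```

A scan over a key range then touches only a handful of tablets, and an application can choose key prefixes to keep related data together.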
12. Elastic systems. Avoid both overcapacity and undercapacity; design to shrink and grow capacity. Do something reasonable in case of overload, e.g. disable certain features (reduce the size of the index searched, disable the spelling-correction tip, ...).
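"Do something reasonable" under overload can be sketched as tiered feature shedding; the thresholds and feature names below are invented for illustration, echoing the examples from the talk:

```python
# Sketch of graceful degradation: as load rises, drop optional features
# before refusing requests outright. Thresholds are made up.

def plan_request(load):
    """Return which features to serve at a given load level (0.0 - 1.0+)."""
    features = {"search": True, "spelling_tip": True, "full_index": True}
    if load > 0.7:
        features["spelling_tip"] = False  # shed a nicety first
    if load > 0.9:
        features["full_index"] = False    # search a smaller index
    if load > 1.2:
        features["search"] = False        # last resort: reject the request
    return features

print(plan_request(0.5))   # everything on
print(plan_request(0.95))  # degraded, but still answering
```

The point is that a degraded answer is almost always better than no answer, and far better than a cascading collapse.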
13. One interface, multiple implementations. E.g. in search, the combination of freshness and massive size is nearly impossible to satisfy in a single system, so partition the problem into subproblems.
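The search example can be sketched as one front end merging two very different backends behind a common interface; all class names and behavior here are invented to illustrate the shape, not Google's actual design:

```python
# Sketch of "one interface, multiple implementations": callers see a single
# search() entry point, while freshness and massive scale are handled by
# separate backends.

class BaseIndex:
    """Huge index, rebuilt rarely: solves the massive-size subproblem."""
    def search(self, q):
        return [f"base-doc for {q}"]

class FreshIndex:
    """Tiny index of very recent documents: solves the freshness subproblem."""
    def search(self, q):
        return [f"fresh-doc for {q}"]

class FrontEnd:
    def __init__(self):
        self.backends = [FreshIndex(), BaseIndex()]

    def search(self, q):
        # One interface for callers; each backend optimizes its own subproblem.
        return [hit for b in self.backends for hit in b.search(q)]

print(FrontEnd().search("socc"))
```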
14. Add sufficient monitoring/status/debugging hooks.
He ends with some challenges for the future:
1. Adaptivity in world-wide systems. Challenge: automatic, dynamic world-wide placement of data and computation to minimize latency and/or cost.
2. Building applications on top of weakly consistent storage systems. Challenge: a general model of consistency choice, explained and codified. Challenge: easy-to-use abstractions for resolving conflicting updates to multiple versions of a piece of state.
3. Distributed system abstractions. Cf. MapReduce: are there unifying abstractions for other kinds of distributed systems problems?