Building a concurrent system ultimately boils down to:
- Partitioning the data into chunks that can be separately acted upon
- Applying computations against those chunks to produce results
The smaller or more fine-grained the chunks, the more concurrent activity will be possible. In theory the closer one can get to one chunk per core the better but in reality it’s rare (a function of throughput and size of calculation) one needs to do computation across all chunks simultaneously such that a core can be assigned many chunks any one of which it will dispatch operations against at a moment in time.
There are many solutions for building concurrent systems but those that provide some abstraction which makes request routing easy to implement are likely to work best as it makes re-balancing of computation easier. One shouldn’t immediately assume that message passing is the answer as there are many ways to achieve routing (e.g. via DNS).
Any solution represents a transparency tradeoff. If for example routing is hidden inside of the solution, this can make it easy to get something up and running but we might find it difficult to transition from one box to a multi-box deployment. There are many tradeoffs to be made and for any case where control is given to the developer/architect it’s likely there will be libraries/frameworks to ease the initial implementation burden, programming languages alone will not be enough (Scala makes such a differentiation quite difficult given it’s language extension capabilities).
One aspect discussed less often is the difference between processing on a set of cores all in one box versus processing across a set of cores on many boxes. The latter brings the following challenges all related to the fallacies of distributed computing:
- Cores are more likely to become inaccessible
- The latency of an operation can become substantially more variable
- Any centralised functions (e.g. job scheduler or watchdogs) are more vulnerable to becoming isolated from the resources they manage such that processing ceases.
The latency factor is particularly challenging as few concurrent approaches make it sufficiently explicit that developers/architects are encouraged to be appropriately mindful.
Thus far, as has been the case throughout our history, the solutions are polarising into those that work within the confines of a single box and those that work across multiple boxes with the emphasis on the former. I fully expect developers and architects to fall into the old trap of using a single-box solution to solve a multi-box problem with all the associated issues. Of the solutions that work across multiple boxes, very few account fully for the impact of the network.
Comments Off
The act of design includes:
- Consideration of many possible technology options
- Examination and identification of constraints
- Thought about the pros and cons of using various patterns and styles
- Comparing various splits of role and responsibility
- Looking at various tradeoffs of complexity versus function
- Formulation of opinion on possible future directions of system growth
A large proportion of this information is lost when the design document is written, because the focus is typically on providing a (notionally) definitive view of how a system should be structured which might be in the form of a bunch of UML diagrams or merely a collection of Visio-type diagrams and explanation of what each of the boxes in the diagrams does. At the code-level there is almost no chance that any of this information will have been retained. Yet this information is of high value since it:
- is the explanation as to why a design is the way it is
- provides reviewers with a clearer view of what was and was not considered
- forms the basis for assessment of the maturity of a designer and can be used for coaching/mentoring
- can provide insight for those with less experience
- contains assumptions which if breached by changes in circumstance would dictate a re-design
- dictates to a large extent how suitable for purpose a design might be
Thus I believe It’s important to expose elements of the act of design via documentation alongside the design itself, conversations during the design work etc.
Comments Off
Jim Waldo writes:
My own conclusion is that system design is really a matter of technique, a way of thinking rather than a subject that can be taught in a particular course. It might be possible to build a program that teaches system design by putting students through a series of courses that hone their system design skills as they move through the subject matter of the courses. Such a series of courses would, in effect, be a formalized version of the apprenticeship that is now the way people acquire their system design technique…..
…..Even worse than not being visible to the customer, work done on designing the system is not visible to the management of the company that is developing the system. Even though managers will pay lip service to the teaching of The Mythical Man Month, there is still the worry that engineers who aren’t producing code are not doing anything useful. While there are few companies that explicitly measure productivity in lines-of-code per week, there is still pressure to produce something that can be seen. The notion that design can take weeks or months and that during that time little or no code will be written is hard to sell to managers. Harder still is selling the notion that any code that does get written will be thrown away, which often appears to be regression rather than progress.
In such an environment lip service often extends to technical strategy as well.
Comments Off
From On Designing and Deploying Internet-Scale Services (James Hamilton – Windows Live Services Platform):
We have long believed that 80% of operations issues originate in design and development, so this section on overall service design is the largest and most important. When systems fail, there is a natural tendency to look first to operations since that is where the problem actually took place. Most operations issues, however, either have their genesis in design and development or are best solved there. Throughout the sections that follow, a consensus emerges that firm separation of development, test, and operations isn’t the most effective approach in the services world. The trend we’ve seen when looking across many services is that low-cost administration correlates highly with how closely the development, test, and operations teams work together.
Yep, I’m a firm believer too….
1 Comment »