<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>olympum</title>
	<atom:link href="http://www.olympum.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.olympum.com</link>
	<description>random thoughts from a 0xCAFED00D</description>
	<lastBuildDate>Sun, 11 Nov 2012 13:48:05 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>Convert Mailbox from Mail.app to Microsoft Outlook 2011 in Lion</title>
		<link>http://www.olympum.com/architecture/convert-maibox-from-mail-app-to-microsoft-outlook-2011-in-lion/</link>
		<comments>http://www.olympum.com/architecture/convert-maibox-from-mail-app-to-microsoft-outlook-2011-in-lion/#comments</comments>
		<pubDate>Fri, 09 Nov 2012 07:45:14 +0000</pubDate>
		<dc:creator>Bruno Fernandez-Ruiz</dc:creator>
				<category><![CDATA[Architecture]]></category>

		<guid isPermaLink="false">http://www.olympum.com/?p=602</guid>
		<description><![CDATA[In Mac OS X 10.7 (Lion) and 10.8 (Mountain Lion), it&#8217;s not currently possible to export mailboxes from Mail.app to something that Microsoft Outlook 2011 can import directly. After a bit of research, and plenty of forum threads showing people&#8217;s frustration, including <a href="http://support.microsoft.com/kb/2598783">Microsoft&#8217;s reluctance to fix Outlook 2011 for Mac</a> (they decided instead to just disable the feature; see the release note &#8220;Import from Apple Mail is disabled in Outlook on Mac OS X 10.7 Lion&#8221;), I found the issue was due to the <code>FSTypeCode</code> not being set by Mail.app, which is actually really easy to fix.&#8230;]]></description>
				<content:encoded><![CDATA[<p>In Mac OS X 10.7 (Lion) and 10.8 (Mountain Lion), it&#8217;s not currently possible to export mailboxes from Mail.app to something that Microsoft Outlook 2011 can import directly. After a bit of research, and plenty of forum threads showing people&#8217;s frustration, including <a href="http://support.microsoft.com/kb/2598783">Microsoft&#8217;s reluctance to fix Outlook 2011 for Mac</a> (they decided instead to just disable the feature; see the release note &#8220;Import from Apple Mail is disabled in Outlook on Mac OS X 10.7 Lion&#8221;), I found the issue was due to the <code>FSTypeCode</code> not being set by Mail.app, which is actually really easy to fix.</p>

<p>First, in Mail.app, I export my mailbox. In my case, after I select my Archive mailbox, I get:</p>

<pre><code>$ ls -1
Inbox2007.mbox
</code></pre>

<p>From Lion onwards, the Apple mbox format is actually a folder that contains a few files inside:</p>

<pre><code>$ ls -1 Inbox2007.mbox
Info.plist
mbox
table_of_contents
</code></pre>

<p>Here <code>mbox</code> is the real mbox file we need to import into Outlook. But if we try now (Import -> Contacts or messages from a text file -> Import messages from an MBOX-format text file), we&#8217;ll see that the mbox file is greyed out in the Finder. This is because the <code>FSTypeCode</code> is not set:</p>

<pre><code>$ mdls Inbox2007.mbox/mbox | grep 'FSTypeCode'
kMDItemFSTypeCode = ""
</code></pre>

<p>What we need to do is to change the code to <code>'TEXT'</code>. We can do this with a simple command (I used <code>find</code> since I exported several mboxes at once):</p>

<pre><code>$ find . -name '*.mbox' -exec SetFile -t 'TEXT' {}/mbox \; -print
./Inbox2007.mbox
</code></pre>

<p>If we now test for the <code>FSTypeCode</code>, we verify it&#8217;s correctly set as <code>'TEXT'</code>.</p>

<pre><code>$ mdls Inbox2007.mbox/mbox | grep 'FSTypeCode'
kMDItemFSTypeCode              = "TEXT"
</code></pre>

<p>We can now finally go back to Outlook and import the mbox file: Import -> Contacts or messages from a text file -> Import messages from an MBOX-format text file.</p>
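
<p>For scripting the same fix, here is a minimal Python sketch; it is not part of the workflow above. It assumes the exported <code>*.mbox</code> folders sit somewhere under a given directory and that <code>SetFile</code> (from Apple&#8217;s developer tools) is on the path. The helper names <code>find_mbox_files</code> and <code>set_type_code</code> are hypothetical, and by default the command is only printed, not run.</p>

<pre><code>import subprocess
from pathlib import Path


def find_mbox_files(root):
    """Return the inner 'mbox' file of every exported '*.mbox' folder under root."""
    found = []
    for folder in sorted(Path(root).rglob("*.mbox")):
        inner = folder / "mbox"
        if folder.is_dir() and inner.is_file():
            found.append(inner)
    return found


def set_type_code(mbox_file, dry_run=True):
    """Build (and optionally run) the SetFile command used in this post."""
    command = ["SetFile", "-t", "TEXT", str(mbox_file)]
    if not dry_run:
        # Assumption: SetFile is installed, so this branch only works on a Mac.
        subprocess.run(command, check=True)
    return command


if __name__ == "__main__":
    for path in find_mbox_files("."):
        print(" ".join(set_type_code(path)))
</code></pre>

<p>Running it with <code>dry_run=True</code> first makes it easy to check which files would be touched before changing anything.</p>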
]]></content:encoded>
			<wfw:commentRss>http://www.olympum.com/architecture/convert-maibox-from-mail-app-to-microsoft-outlook-2011-in-lion/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Failing to Scale Out Push Web Services</title>
		<link>http://www.olympum.com/future/failing-to-scale-out-push-web-services/</link>
		<comments>http://www.olympum.com/future/failing-to-scale-out-push-web-services/#comments</comments>
		<pubDate>Thu, 23 Feb 2012 05:28:11 +0000</pubDate>
		<dc:creator>Bruno Fernandez-Ruiz</dc:creator>
				<category><![CDATA[Future]]></category>

		<guid isPermaLink="false">http://www.olympum.com/future/failing-to-scale-out-push-web-services/</guid>
		<description><![CDATA[<strong>Problem</strong>: <em>on the web, enable a large number of message producers to send a
very large number of messages to a much larger number of message consumers</em>.
Example: allow 100,000 publishers to send a total of 1 million messages per
second to 100 million concurrently connected consumers.

We are dealing with the problem of <em>connection channels</em>, an abstraction that
allows a producer to distribute a message to many connected consumers. Our
challenge is to design a distributed channel delivery mechanism that can scale
out to millions of connected consumers.&#8230;]]></description>
				<content:encoded><![CDATA[<p><strong>Problem</strong>: <em>on the web, enable a large number of message producers to send a
very large number of messages to a much larger number of message consumers</em>.
Example: allow 100,000 publishers to send a total of 1 million messages per
second to 100 million concurrently connected consumers.</p>

<p>We are dealing with the problem of <em>connection channels</em>, an abstraction that
allows a producer to distribute a message to many connected consumers. Our
challenge is to design a distributed channel delivery mechanism that can scale
out to millions of connected consumers. Throughout, our assumption is that
this is a stateless delivery system, i.e. messages are either delivered or
dropped and no persistence guarantee exists; if a consumer is not connected,
it will miss the message.</p>

<p>The naïve approach is to perform <strong>consistent hashing by channel</strong>. In this
model, each channel and all its consumers are in the same server. Since the
channel identifier is part of the URI, the load balancer can effectively
perform this operation, and we can add servers as required without requiring
re-balancing. When we have many channels per server, the distribution is
eventually uniform. Problems arise however as some channels have an order of
magnitude more consumers than other channels. There is also a problem if a
channel has more consumers than a server can sustain.</p>
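
<p>As an illustration of hashing by channel, here is a minimal consistent-hash ring in Python. It is a sketch under assumptions, not the code of any particular load balancer: the class name <code>HashRing</code>, the use of SHA-256 and the 100 virtual points per server are all choices made for this example.</p>

<pre><code>import bisect
import hashlib


def ring_position(key):
    """Map a string to a position on the ring (a large integer)."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, servers, replicas=100):
        self.replicas = replicas   # virtual points per server, to smooth the spread
        self._points = []          # sorted ring positions
        self._owner = {}           # ring position to server name
        for server in servers:
            self.add(server)

    def add(self, server):
        for i in range(self.replicas):
            point = ring_position("%s#%d" % (server, i))
            self._owner[point] = server
            bisect.insort(self._points, point)

    def server_for(self, channel):
        """Return the first server clockwise from the channel's position."""
        index = bisect.bisect(self._points, ring_position(channel)) % len(self._points)
        point = self._points[index]
        return self._owner[point]
</code></pre>

<p>With this structure, <code>HashRing(["s1", "s2", "s3"]).server_for("channel-42")</code> always picks the same server, and calling <code>add("s4")</code> only moves the channels that now fall on <code>s4</code>; every other channel stays where it was. It does nothing, however, for a single channel that outgrows its server.</p>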

<p>To solve the limitations of hashing by channel, we can instead perform
<strong>consistent hashing by channel and connection (consumer)</strong>. In this model,
each consumer is consistently assigned to a pool of servers and we can add
servers without having to re-balance consumers among servers. The channel
stores a list of all the consumer identifiers and channels are consistently
hashed across servers. To deliver a message, the load balancer will find the
server holding the channel, and dispatch the request. The channel will look up
the list of consumer identifiers and again apply the consistent hashing
algorithm to reach all the consumers.</p>

<p>Although the hashing by channel and connection is conceptually simple, it
presents significant operability challenges. First, the loss of the server
holding the channel metadata and list of connected consumers will require a
watchdog to clean up all the stale consumer connections. Second, as consumers
join and disappear, the channel server needs to maintain a consistent
view of the list of consumers by means of locks, and performance degrades
as the number of consumers grows very large. Third, the more uniformly
consumers connect across the nodes, the more chattiness occurs.
At some point, all nodes will have consumer connections for a given
channel. In order to fulfill every operation, we must issue <code>N</code> requests,
one to each node, where <code>N</code> is the number of nodes in the cluster. For the cluster to
be able to process and deliver <code>M</code> messages, every node must be capable of
processing <code>N*M</code> messages. This design will be limited in the number of
connections it can hold, because of the centralized channel-consumer tracking
problem, and will also only scale to the maximum request processing capacity
of an individual node.</p>

<p>We can solve some of the operability challenges by removing the channel
management of consumer connections, and instead of keeping a list, keeping the
visibility of the peer nodes. Here the thinking is that since we will
asymptotically reach the point where all nodes hold consumer connections for a
given channel, all we really need to do is keep a list of all nodes in the
cluster. Some centralized agent keeps a directory of all active peers holding
consumer connections, e.g. a ZooKeeper ensemble.</p>

<p><img src="/wordpress/wp-content/uploads/2012/02/design.png" alt="Ring-based Cluster" /></p>

<p>Consumers get uniformly connected to the nodes in the cluster by a &#8220;good&#8221;
load-balancing scheme. Since any node can hold connections to consumers on any
channel, there is no snapshot of a channel&#8217;s consumers, and to
identify all consumers connected to a channel it is necessary to
interrogate all nodes in the cluster. Although this design improves on the
previous ones in that it allows scaling to an unbounded number of connections,
it will still only scale to the message processing throughput of an individual
node.</p>
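
<p>A toy in-memory simulation makes the throughput limit visible. It is only a sketch: <code>Node</code> and <code>Cluster</code> are hypothetical classes, the list of nodes stands in for the directory of peers, and round-robin assignment stands in for the &#8220;good&#8221; load balancer.</p>

<pre><code>class Node:
    def __init__(self, name):
        self.name = name
        self.subscribers = {}   # channel to list of consumer ids connected here
        self.handled = 0        # messages this node had to process

    def connect(self, channel, consumer):
        self.subscribers.setdefault(channel, []).append(consumer)

    def deliver(self, channel, message):
        self.handled += 1
        return [(consumer, message) for consumer in self.subscribers.get(channel, [])]


class Cluster:
    def __init__(self, size):
        # Stand-in for the directory of active peers (e.g. kept in ZooKeeper).
        self.nodes = [Node("node-%d" % i) for i in range(size)]
        self._next = 0

    def connect(self, channel, consumer):
        # Round-robin placement approximates a uniform load balancer.
        node = self.nodes[self._next % len(self.nodes)]
        self._next += 1
        node.connect(channel, consumer)

    def publish(self, channel, message):
        deliveries = []
        for node in self.nodes:   # every peer is asked, since any may hold consumers
            deliveries.extend(node.deliver(channel, message))
        return deliveries
</code></pre>

<p>Publishing 10 messages to a 4-node cluster leaves every node with <code>handled == 10</code>, and the same holds for 40 or 400 nodes: adding nodes adds room for connections, but each node still sees every message.</p>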

<p>A popular alternative to the directory of nodes is the tree of nodes. In this
model, we start with a single node. As we reach the maximum number of
connections the node can hold, we add two new nodes. The original node still
accepts messages from publishers, but brokers the delivery to the two new
nodes. As those nodes themselves become saturated, we add a new layer of four
nodes. And so forth. This approach has the same limitation as the one using a
directory of nodes, i.e. the maximum throughput is bounded by that of an
individual node.</p>

<p>We&#8217;ve seen how to hold the connections to an unbounded number of consumers, but
not how to deliver an unbounded number of messages. These solutions scale very
well to tens of thousands of messages and millions of active connected
consumers, but have an upper limit. For most producers out there, that upper
limit is probably high enough to be fine. But such a limit exists in push-based
systems.</p>

<p>Both the messaging literature and messaging praxis have historically
preferred pull-based models to push-based ones. In a pull
model, consumers come back to the broker to fetch messages, at each consumer&#8217;s
own rate, and the problem is therefore no longer dispatching across millions
of connections. Pull-based messaging systems choose to store the messages until
consumers come back to fetch them. In fact, the only messaging system we know
of that scales to millions of messages and millions of consumers uses
store-and-forward: SMTP.</p>
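
<p>The pull model can be sketched in a few lines of Python. This is a deliberately minimal illustration: <code>StoreAndForwardChannel</code> is a hypothetical name, messages live in an in-memory list, and retention, durability and authentication are all left out.</p>

<pre><code>class StoreAndForwardChannel:
    def __init__(self):
        self._log = []              # stored messages, oldest first

    def publish(self, message):
        """Store the message and return the offset just past it."""
        self._log.append(message)
        return len(self._log)

    def fetch(self, offset, limit=100):
        """Return (messages, next_offset) for a consumer resuming at offset."""
        batch = self._log[offset:offset + limit]
        return batch, offset + len(batch)
</code></pre>

<p>Each consumer remembers only its own offset and calls <code>fetch</code> whenever it is ready, so a slow consumer falls behind without holding a connection open, and the broker never has to dispatch to anyone.</p>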

<p>As much as I may think that the techniques that enable web push models, such as
HTTP streaming, long-poll and WebSockets, are genuinely useful to solve point
problems, they are not techniques we can use to implement Internet-scale
push-based web services, as they are fundamentally based on a PubSub model. The
scalability of PubSub under high load remains an unresolved research question
and as such is not a paradigm we should apply at Internet scale.</p>

<p>In fact, I am now <em>almost</em> convinced that we&#8217;ve been looking at this in the
wrong way, and that the right solution to this problem is a store-and-forward
solution, where web consumers connect at their own rate to fetch messages and
intermediaries throttle concurrent connection rates in order to achieve linear
scalability. Essentially, this is a web of <em>partially connected
store-and-forward almost real-time async data peers</em>. And that&#8217;s a mouthful, but a really
exciting one.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.olympum.com/future/failing-to-scale-out-push-web-services/feed/</wfw:commentRss>
		<slash:comments>14</slash:comments>
		</item>
		<item>
		<title>The NodeJS Innovation Advantage</title>
		<link>http://www.olympum.com/architecture/the-nodejs-innovation-advantage/</link>
		<comments>http://www.olympum.com/architecture/the-nodejs-innovation-advantage/#comments</comments>
		<pubDate>Sat, 28 Jan 2012 21:40:27 +0000</pubDate>
		<dc:creator>Bruno Fernandez-Ruiz</dc:creator>
				<category><![CDATA[Architecture]]></category>

		<guid isPermaLink="false">http://www.olympum.com/?p=363</guid>
		<description><![CDATA[Thesis: <em>&#8220;when building large scale distributed systems, high performance functional programming languages provide the quickest turnaround from idea to concept; however such advantage disappears as we move from concept to production, and the overall time from idea to production across all programming languages is of the same order of magnitude&#8221;</em>.

<a href="http://www.olympum.com/wordpress/wp-content/uploads/2012/01/nodejs_innov_advantage.png"><img src="http://www.olympum.com/wordpress/wp-content/uploads/2012/01/nodejs_innov_advantage.png" alt="The NodeJS innovation advantage" title="nodejs_innov_advantage" width="510" height="366" class="aligncenter size-full wp-image-364" /></a>

I posted this diagram, without justification, yesterday evening on <a href="http://twitter.com/olympum">Twitter</a>, in an attempt to gauge the reactions of the community. Thank you to all of you who commented. With the experiment done, let me now provide my thesis and hopefully address most of the feedback so far.&#8230;]]></description>
				<content:encoded><![CDATA[<p>Thesis: <em>&#8220;when building large scale distributed systems, high performance functional programming languages provide the quickest turnaround from idea to concept; however such advantage disappears as we move from concept to production, and the overall time from idea to production across all programming languages is of the same order of magnitude&#8221;</em>.</p>

<p><a href="http://www.olympum.com/wordpress/wp-content/uploads/2012/01/nodejs_innov_advantage.png"><img src="http://www.olympum.com/wordpress/wp-content/uploads/2012/01/nodejs_innov_advantage.png" alt="The NodeJS innovation advantage" title="nodejs_innov_advantage" width="510" height="366" class="aligncenter size-full wp-image-364" /></a></p>

<p>I posted this diagram, without justification, yesterday evening on <a href="http://twitter.com/olympum">Twitter</a>, in an attempt to gauge the reactions of the community. Thank you to all of you who commented. With the experiment done, let me now provide my thesis and hopefully address most of the feedback so far. I intend to keep this post fluid and update it as the conversation evolves.</p>

<p>Now, let me qualify the thesis. First, this is a thesis applicable to multiple domains, whether that&#8217;s a low-level network switch appliance, an HTTP proxy gateway or a web application. As a consequence, this is not a framework comparison; it&#8217;s not about Rails vs Django vs Play vs Express vs &#8230;. Nor is this an argument about dynamically vs statically typed programming languages, although that definitely plays a partial role in the thesis.</p>

<p>Mine is a thesis about the complete <em>team</em> productivity by programming language and across the full product life-cycle from inception to retirement. The thesis is purely based on my observations throughout the years. There is no data to back it up, only anecdotal evidence. In summary, mine is a qualitative statement, not a quantitative one.</p>

<p>If the thesis were true, this would mean that teams using high-performance and functional programming languages can iterate more quickly from idea to concept. Such quick turnaround allows a constant validation of ideas in the code. Because it is quickly possible to see the idea running, we can afford to have more ideas. The quicker we are able to iterate between idea and concept, the more we are innovating.</p>

<p>Such quick turnaround has a real trade-off and a false trade-off.</p>

<p>Moving these concepts to production is difficult. Whether it&#8217;s Scheme, or Lisp, or Clojure, or JavaScript, it becomes clear that we are &#8220;naked&#8221;. Except in &#8220;safe but useless&#8221; languages, such as Haskell, in most &#8220;useful but unsafe&#8221; programming languages, the developer has to compensate for the dynamic nature of the language. The developer has to provide the <em>strictness</em>, and to a degree perform the job of the compiler. In statically-typed and imperative programming languages, the compile infrastructure helps the developer ensure the correctness of the code. In dynamically-typed and functional programming languages (by the way, yes, these are orthogonal concepts, but they are usually associated), the developer needs to structure code and provide the necessary checks, assertions, unit tests, regression tests and modular design. The addition of type hinting and type inference does not change this; it only shifts effort between &#8220;before concept&#8221; and &#8220;after concept&#8221;.</p>

<p>High-performance functional programming languages trade strictness for innovation potential. Without the additional investment in checks and tests, these systems are not <em>predictable</em>. The time we gained upfront, we pay for later. I&#8217;d argue such a trade-off is good, and allows quick iteration to find the right idea.</p>

<p>The second trade-off is about tech debt. The time to production is identical across languages, but because the developer in high-performance functional programming languages had to compensate for the lack of compiler infrastructure and runtime guardrails by adding completeness of tests and checks, the system is easier to maintain and has overall a better architecture. This is almost counterintuitive, and I think most people don&#8217;t think this way because they move too quickly from concept to production, and don&#8217;t implement the necessary strictness. Developers tend to move quickly from concept to production, and thereby accrue technical debt.</p>

<p>Finally, I believe server-side JavaScript and Google V8 provide such a high-performance functional programming language, in the skin of an imperative one, and enable this quick turnaround from idea to concept. This is possibly one of the key reasons we are seeing such an explosion in adoption. I only hope developers start to realize the trade-off they are making, and that in order to avoid accruing technical debt, they had better invest in checks and tests before they rush into production.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.olympum.com/architecture/the-nodejs-innovation-advantage/feed/</wfw:commentRss>
		<slash:comments>45</slash:comments>
		</item>
		<item>
		<title>A Home Backup Strategy</title>
		<link>http://www.olympum.com/future/a-home-backup-strategy/</link>
		<comments>http://www.olympum.com/future/a-home-backup-strategy/#comments</comments>
		<pubDate>Fri, 09 Dec 2011 06:07:53 +0000</pubDate>
		<dc:creator>Bruno Fernandez-Ruiz</dc:creator>
				<category><![CDATA[Future]]></category>

		<guid isPermaLink="false">http://www.olympum.com/future/a-home-backup-strategy/</guid>
		<description><![CDATA[For years I&#8217;ve been continuously fighting with backups. I have not been
particularly good or consistent at it. We&#8217;ve been okay with Time Machine and
Carbon Copy Cloner, but the recent addition of a digital SLR to our gadget
collection has meant running out of space on our shared home drive.

Historically I&#8217;d been using a WD My Book 2&#215;1 TB RAID1 array connected to an
Apple Airport Extreme and shared via AirDisk (afp). We had Time Machine back up
all the computers in the house to the AirDisk.&#8230;]]></description>
				<content:encoded><![CDATA[<p>For years I&#8217;ve been continuously fighting with backups. I have not been
particularly good or consistent at it. We&#8217;ve been okay with Time Machine and
Carbon Copy Cloner, but the recent addition of a digital SLR to our gadget
collection has meant running out of space on our shared home drive.</p>

<p>Historically I&#8217;d been using a WD My Book 2&#215;1 TB RAID1 array connected to an
Apple Airport Extreme and shared via AirDisk (afp). We had Time Machine back up
all the computers in the house to the AirDisk. We&#8217;d also store our photos,
videos, etc. on the AirDisk. Aside from running out of space, we had no good
offsite strategy. Given that our digital picture collection is continuously
becoming more and more valuable as a family history artifact, I thought it was
time to address this properly, rather than just adding more space.</p>

<p>I started by classifying the risk associated with each data type:</p>

<ul>
<li><strong>Personal files</strong>: document scans, photos, videos, etc. This is the
critical part of the equation. In case of fire or theft we&#8217;d have no
replacement and the loss would be HUGE.</li>
<li><strong>Shared Media</strong>: movies, music, etc. These are regrettable if lost, but can
be restored. I know how to find them again, even though it might be painful or
costly.</li>
<li><strong>Backups</strong>: the Time Machine backups from our home computers. Since we
have the laptops plus the backup, I was not too worried about losing this.</li>
</ul>

<p>Since not all data is born equal, the required reliability levels vary. For
personal files, I want to be covered against hardware failures. For the other
kinds, I don&#8217;t mind that much.</p>

<p>Additionally, for videos, I want to have the ability to connect the storage
unit directly to my computer while editing (my experience editing over the
network has not been great).</p>

<p>Finally, I really wanted an offsite backup for our personal data.</p>

<p>Given these constraints, I struggled to find a good (cost-effective)
combination, and since I think I&#8217;ve found something that works well for me, I
thought I&#8217;d share with others just in case this helps:</p>

<ul>
<li>Airport Extreme Station 802.11n 2nd Generation (100/1000) (any 1 Gbps switch
would do).</li>
<li>Buffalo LinkStation Pro Duo 2&#215;3 TB (NAS), connected to AES via 1 Gbps
ethernet. Set up as RAID1 and exposing two afp shares (Personal, Media) and a
Time Machine &#8216;Backup&#8217; share that each computer uses to create a separate
sparsebundle.</li>
<li>2x WD My Book 2&#215;1 TB on RAID0, mounted as &#8216;usbdisk&#8217;, connected via USB to
the LinkStation and used to back up incrementally every night from the NAS
internal disks. Every week I rotate the USB WD drive units between office
and home.</li>
<li>If I want to edit video, I unmount the USB drive and connect it via FireWire to
my computer for doing video edits. When I remount it, I manually copy the
edits back to the NAS internal drives. This is the only manual step in the
process.</li>
<li>The media files are exposed through the embedded server in the LinkStation
via UPnP and iTunes Server. This allows me to use Boxee on any computer (and
a patched Apple TV 2nd gen) to watch our videos. We also share iTunes music
this way without having to share libraries.</li>
</ul>

<p>I ended up putting all the data under RAID1, but I think the cost is low since
we mostly only read from the array (except the backups). With this setup:</p>

<ul>
<li>We have Time Machine for all computers in the house, happening over the
network and transparently.</li>
<li>We can access all data both via afp mounts as well as directly via USB/FW
for intensive reads / edits.</li>
<li>We have an offsite copy.</li>
</ul>

<p>Overall, I am happy with the setup. I was initially worried about using a
home-grade NAS, but even though the LS is slow for writes, it&#8217;s actually fast
for reads, so I am happy.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.olympum.com/future/a-home-backup-strategy/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>The Praxis of Event Loops</title>
		<link>http://www.olympum.com/future/the-praxis-of-event-loops/</link>
		<comments>http://www.olympum.com/future/the-praxis-of-event-loops/#comments</comments>
		<pubDate>Sat, 15 Oct 2011 15:27:47 +0000</pubDate>
		<dc:creator>Bruno Fernandez-Ruiz</dc:creator>
				<category><![CDATA[Future]]></category>

		<guid isPermaLink="false">http://www.olympum.com/future/the-praxis-of-event-loops/</guid>
		<description><![CDATA[In a theoretical world, given the ability for a processor to run an infinite
number of threads, we could prove the following statements (attribution
purposely omitted):

<ul>
<li>If you do more CPU than I/O, use threads.</li>
<li>If you do more I/O than CPU, use more threads.</li>
</ul>

which would allow us to conclude with the following corollary:

<blockquote>
  at full utilization, threads and events have the same theoretical
  throughput.
</blockquote>

Such an argument ignores <strong>praxis</strong>: it is a purely <strong>theoretical</strong> debate,
disconnected from the reality of scaling services.&#8230;]]></description>
				<content:encoded><![CDATA[<p>In a theoretical world, given the ability for a processor to run an infinite
number of threads, we could prove the following statements (attribution
purposely omitted):</p>

<ul>
<li>If you do more CPU than I/O, use threads.</li>
<li>If you do more I/O than CPU, use more threads.</li>
</ul>

<p>which would allow us to conclude with the following corollary:</p>

<blockquote>
  <p>at full utilization, threads and events have the same theoretical
  throughput.</p>
</blockquote>

<p>Such an argument ignores <strong>praxis</strong>: it is a purely <strong>theoretical</strong> debate,
disconnected from the reality of scaling services.</p>

<p>Yahoo! serves over 20 billion daily requests through its edge services
(remote proxies and caches throughout the world). These intermediate servers
are doing pure IO workloads, handling slow client IO and handing connections
off to the origin servers through Yahoo&#8217;s pipes. It is critical that we
minimize the CPU cost per connection, so that a host only maxes out its CPU at
the maximum number of connections it can hold.</p>

<p>The hosts on Yahoo!&#8217;s edge network run exclusively event loops, and have been
doing so for over a decade, originally with Inktomi Traffic Server, then with
Yahoo Traffic Server, and now with Apache Traffic Server. The design
throughout is the same: a few &#8220;master&#8221; event loop threads, usually one per
core, and a small pool of worker threads. In total, some 20 to 50 threads
per server. With this design, Yahoo! is able to scale to hundreds of thousands
of connections per server. It is currently still impossible, <em>in practice</em>, to
run a server with one thread per connection at that scale and still serve data.</p>

<p>Another practical need for event loops occurs at the other end of the serving
stack. Resolving a search query follows a general pattern of parsing and
rewriting the query, followed by fetching potential search results, and
finally doing document re-ranking. The first and last phases are CPU
intensive. The fetch operation is purely an IO workload that performs a
scatter-gather operation which fans out to hundreds or thousands of back-end
servers holding the search index across tens of columns. As a consequence, for
every client connection, it&#8217;s possible to require one thousand upstream
connections. When the upstream index servers become slow, which is a common
failure situation, or perhaps in scenarios where we have to fetch data from a
remote data center, the number of connections in the system grows to tens of
thousands. It is also important that we keep all three phases running in the
same process to avoid serialization and transfer costs, essentially forcing us
to mix CPU and IO intensive workloads. It is currently still impossible, <em>in
practice</em>, to perform this type of data intensive processing without using
event loops.</p>
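
<p>The scatter-gather fetch phase can be sketched on a single-threaded event loop with Python&#8217;s <code>asyncio</code>. This is only an illustration, not Yahoo&#8217;s implementation: <code>query_index_server</code> is a hypothetical stand-in that sleeps instead of doing network IO, and the server count, delay and timeout are made-up values.</p>

<pre><code>import asyncio


async def query_index_server(server_id, query, delay=0.01):
    # Stand-in for a network call to one index server; the sleep models the IO wait.
    await asyncio.sleep(delay)
    return {"server": server_id, "hits": ["%s-doc-%d" % (query, server_id)]}


async def scatter_gather(query, server_count, timeout=1.0):
    """Fan the query out to every server and gather whatever answers in time."""
    tasks = [asyncio.create_task(query_index_server(i, query))
             for i in range(server_count)]
    done, pending = await asyncio.wait(tasks, timeout=timeout)
    for task in pending:          # slow servers are abandoned rather than waited on
        task.cancel()
    hits = []
    for task in done:
        hits.extend(task.result()["hits"])
    return sorted(hits)


if __name__ == "__main__":
    results = asyncio.run(scatter_gather("event loops", 1000))
    print(len(results), "results gathered on a single thread")
</code></pre>

<p>All the upstream requests are in flight at once, yet only one thread is involved; while the requests wait on IO, the loop is free to do other work, which is the property the fetch phase relies on.</p>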

<p>Unlike Yahoo&#8217;s services, which combine an event loop with a handful of threads
per core, Node.JS is designed around a single-threaded event loop per core. For pure IO
workloads this provides the simplicity required to design
software that scales to thousands of concurrent active connections, as long as
nothing is blocking. I find it unfortunate that some developers have not
internalized this and are trying to run CPU intensive applications using
Node.JS. Inferring that Node.JS is a failure because these badly designed
applications fail is an unnecessary and unfair
generalization. Node.JS has a field of <em>practical</em> applicability, and like any
tool, a seasoned practitioner should know when, and when not, to use it.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.olympum.com/future/the-praxis-of-event-loops/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Markdown, An Open Document Workflow</title>
		<link>http://www.olympum.com/future/markdown-an-open-document-workflow/</link>
		<comments>http://www.olympum.com/future/markdown-an-open-document-workflow/#comments</comments>
		<pubDate>Sat, 15 Oct 2011 14:27:55 +0000</pubDate>
		<dc:creator>Bruno Fernandez-Ruiz</dc:creator>
				<category><![CDATA[Future]]></category>

		<guid isPermaLink="false">http://www.olympum.com/future/markdown-an-open-document-workflow/</guid>
		<description><![CDATA[I&#8217;ve been using <a href="http://daringfireball.net/projects/markdown/">Markdown</a> since
2006, taking all my notes at work using a simple text editor and using the
Markdown text markup format. I also use Markdown for writing down thoughts and
posting to this blog. I rarely, if ever, use Word or even TeX/LaTeX. I treat
markdown as my source format and I generate all my target formats using
<a href="http://fletcherpenney.net/multimarkdown/">multimarkdown</a>: PDF, HTML, RTF,
etc.

I am happy with this process. I use a standard editor I find in any operating
system I happen to work with.&#8230;]]></description>
				<content:encoded><![CDATA[<p>I&#8217;ve been using <a href="http://daringfireball.net/projects/markdown/">Markdown</a> since
2006, taking all my notes at work using a simple text editor and using the
Markdown text markup format. I also use Markdown for writing down thoughts and
posting to this blog. I rarely, if ever, use Word or even TeX/LaTeX. I treat
markdown as my source format and I generate all my target formats using
<a href="http://fletcherpenney.net/multimarkdown/">multimarkdown</a>: PDF, HTML, RTF,
etc.</p>

<p>I am happy with this process. I use a standard editor I find in any operating
system I happen to work with. I use a standard document format I know I will
be able to open in years to come: plain text. I use source control to version
my text files and synchronize between computers. This works everywhere I have
git and I am not locked into any particular cloud or tool or format or &#8230;</p>

<p>As an editor, I use <a href="http://jblevins.org/projects/markdown-mode/">emacs markdown
mode</a> and it serves me well. And
it&#8217;s not about <a href="http://www.gnu.org/s/emacs/">emacs</a>: you may use
<a href="http://www.vim.org/">vi</a>, or <a href="http://macromates.com/">TextMate</a>, or whatever.
The point is that your favorite text editor, whichever it is, is probably good
enough. I have also recently started using <a href="http://markedapp.com/">Marked</a> as
a convenience for previewing the transformed markdown output without having to
continuously switch back to the browser or do <code>C-c C-c p</code> all the time.</p>

<p>I also use <a href="http://jekyllrb.com/">Jekyll</a> to transform some of my markdown
text files into a static site, with <a href="http://git-scm.com/">git</a> as the glue.
It works everywhere I have git.</p>

<p>Recently I have had a need to bring my markdown files to the iPhone and the
iPad. And although one may find some specialized git iOS clients for things
like <a href="https://github.com/">github</a>, there is no general git client for iOS,
integrated with a text viewer, as far as I know.</p>

<p>Given the lack of a shared file system in iOS, whichever app I use must have
both a text editor, ideally with support for markdown preview, and sync
capabilities. <a href="http://itunes.apple.com/us/app/id396073482?mt=8">Nocs</a> is
exactly that. It uses <a href="http://db.tt/4rTs9QST">Dropbox</a> to sync your files, and
it has an embedded text editor with markdown support. It fits perfectly into
my workflow. I continue to use emacs on the laptop, and Nocs on the iPad.
Dropbox syncs between laptop and tablet, and git between computers.</p>

<p>I am sure there are ways I could simplify the flow. But for now, this meets my
requirements.</p>

<p>Finally, I am not at all keen on using tools like
<a href="http://www.evernote.com/">Evernote</a>, since as far as I am concerned I would be
giving up both the freedom and the future-proof nature of plain text. I have
literally thousands of text notes accumulated over the years, and I have the
reassurance that I will always be able to access and edit my notes in years to
come. I don&#8217;t want to risk putting my documents onto a proprietary platform.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.olympum.com/future/markdown-an-open-document-workflow/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Railroad Diagrams from EBNF</title>
		<link>http://www.olympum.com/future/railroad-diagrams-from-ebnf/</link>
		<comments>http://www.olympum.com/future/railroad-diagrams-from-ebnf/#comments</comments>
		<pubDate>Mon, 10 Oct 2011 21:04:33 +0000</pubDate>
		<dc:creator>Bruno Fernandez-Ruiz</dc:creator>
				<category><![CDATA[Future]]></category>

		<guid isPermaLink="false">http://www.olympum.com/future/railroad-diagrams-from-ebnf/</guid>
		<description><![CDATA[I am playing with a new query language. I am defining the grammar as EBNF, but
I want to show railroad diagrams for those readers that are more graphical and
less familiar with BNF. I&#8217;ve found limited support for generating syntax
diagrams from EBNF. I&#8217;ve found a few tools, some working better than others:

<ul>
<li><a href="http://www.informatik.uni-freiburg.de/~thiemann/haskell/ebnf2ps/">Ebnf2ps</a>
(Haskell). This is the only tool I have not been able to get to work. I seem
to be missing AFM fonts in my TeX installation and I am not sure I want to
spend time figuring out how to generate the AFM files.</li>&#8230;</ul>]]></description>
				<content:encoded><![CDATA[<p>I am playing with a new query language. I am defining the grammar as EBNF, but
I want to show railroad diagrams for those readers that are more graphical and
less familiar with BNF. I&#8217;ve found limited support for generating syntax
diagrams from EBNF. I&#8217;ve found a few tools, some working better than others:</p>

<ul>
<li><a href="http://www.informatik.uni-freiburg.de/~thiemann/haskell/ebnf2ps/">Ebnf2ps</a>
(Haskell). This is the only tool I have not been able to get to work. I seem
to be missing AFM fonts in my TeX installation and I am not sure I want to
spend time figuring out how to generate the AFM files.</li>
<li>SQLite <a href="http://www.sqlite.org/docsrc/doc/tip/art/syntax/bubble-generator.tcl?mimetype=text/plain">bubble generator</a>
(Tk/Tcl). Strictly speaking, this tool does not consume EBNF grammars but a
custom DSL. If I didn&#8217;t care about EBNF, this would be the best tool.</li>
<li><a href="https://github.com/featurist/node-ebnf-diagram">node-ebnf-diagram</a>
(Javascript). Although it works, I have two issues with it. One is that it
can only generate PNG files. That would not be too bad if it weren&#8217;t for the
second issue: the tool does not automatically resize the canvas, and it
requires explicit width and height input. If I don&#8217;t find anything else,
I&#8217;ll probably end up using it.</li>
<li><a href="http://www.emacswiki.org/emacs/EbnfToPsPackage">ebnf2ps.el</a> (Emacs Lisp).
It works as advertised. The only issue I found is that the diagrams
generated have a small white gap on the lines on the right hand side.</li>
<li><a href="http://www.antlr.org/download.html">ANTLRWorks</a> (Java). Bundled with antlr,
it fits the task. Once you are in the game of defining the grammar in Java,
why not just go ahead and use the same tool to generate not only the parser
but the diagrams? This is what the tool does. Even if you are not doing a
Java parser/lexer, this is a good tool to use for documentation purposes.</li>
</ul>

<p>I am using ANTLRWorks, generating all the diagrams from the command line as part of my markdown transform pipeline:</p>

<pre><code>java -cp antlrworks-1.1.4.jar org.antlr.works.Console -f yql.g -o output/ -sd eps
</code></pre>

<p>It works very well.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.olympum.com/future/railroad-diagrams-from-ebnf/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Ahead with Node.JS and Google V8</title>
		<link>http://www.olympum.com/architecture/ahead-with-node-js-and-google-v8/</link>
		<comments>http://www.olympum.com/architecture/ahead-with-node-js-and-google-v8/#comments</comments>
		<pubDate>Sun, 11 Sep 2011 15:34:12 +0000</pubDate>
		<dc:creator>Bruno Fernandez-Ruiz</dc:creator>
				<category><![CDATA[Architecture]]></category>

		<guid isPermaLink="false">http://www.olympum.com/?p=347</guid>
		<description><![CDATA[It has been 10 months since I posted about Google V8. But somebody restarted
<a href="http://news.ycombinator.com/item?id=2982684">a thread on Hacker News</a>
about <a href="http://www.olympum.com/future/answering-jason-on-v8-governance-and-impact-to-nodejs/">my old blog
post</a>.
So now I am compelled to briefly say where we are.

We have continued and extended our investment in Node.JS. I can tell you that
what the teams are doing is transformative and pure awesomeness. Unfortunately
this is as much as I can tell you right now, but really soon you&#8217;ll start
hearing what we&#8217;ve done.&#8230;]]></description>
				<content:encoded><![CDATA[<p>It has been 10 months since I posted about Google V8. But somebody restarted
<a href="http://news.ycombinator.com/item?id=2982684">a thread on Hacker News</a>
about <a href="http://www.olympum.com/future/answering-jason-on-v8-governance-and-impact-to-nodejs/">my old blog
post</a>.
So now I am compelled to briefly say where we are.</p>

<p>We have continued and extended our investment in Node.JS. I can tell you that
what the teams are doing is transformative and pure awesomeness. Unfortunately
this is as much as I can tell you right now, but really soon you&#8217;ll start
hearing what we&#8217;ve done. In every single demo of the technology we have done,
internal and external, the feedback has been fantastic. I am very happy and
fortunate to have a team of super-stars working on this.</p>

<p>As for my concern about being locked into Google V8 and not being able to
support the software, things have changed since I last blogged. First, Google
has been very, very supportive in addressing V8 bugs whenever they came up.</p>

<p>Secondly, here at Yahoo! we donated some code to the fine folks at Mozilla to
<a href="https://github.com/bfrancojr/v8monkey">wrap the Spidermonkey API with the V8
API</a>. Mozilla went on to create a full
implementation of <a href="http://blog.zpao.com/post/4620873765/about-that-hybrid-v8monkey-engine">Node.JS on
Spidermonkey</a>
and provide the architectural reassurance we needed. I am still surprised,
however, by the discussion on the blogosphere and twitterspace months ago,
which showed how many developers don&#8217;t seem to pay attention to open source
governance. When you have the responsibility for a multi-million dollar
technology investment, you want to make sure you cover all your bases.</p>

<p>Finally, we have partnered closely with the fine folks at Joyent and are
working through the final stages of negotiation for how we&#8217;ll work together
going forward.</p>

<p>Let me say that I am really excited about the technology innovation that
Node.JS is bringing to the industry, whether that is to create a new
generation of I/O intensive servers, or to bring Javascript to the
server-side. I see a bright future for Node.JS.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.olympum.com/architecture/ahead-with-node-js-and-google-v8/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Answering Jason on V8 governance and impact to NodeJS</title>
		<link>http://www.olympum.com/future/answering-jason-on-v8-governance-and-impact-to-nodejs/</link>
		<comments>http://www.olympum.com/future/answering-jason-on-v8-governance-and-impact-to-nodejs/#comments</comments>
		<pubDate>Sun, 06 Feb 2011 05:30:44 +0000</pubDate>
		<dc:creator>Bruno Fernandez-Ruiz</dc:creator>
				<category><![CDATA[Future]]></category>

		<guid isPermaLink="false">http://www.olympum.com/future/answering-jason-on-v8-governance-and-impact-to-nodejs/</guid>
		<description><![CDATA[<strong>Update (2011/9/11)</strong>: this post is picking up attention again 8 months later, so I&#8217;ve <a href="http://www.olympum.com/architecture/ahead-with-node-js-and-google-v8/">written an update</a> on where we stand.

<hr />

Jason Hoffman (Chief Scientist, Founder at Joyent) has posted some <a href="http://joyeur.com/2011/02/05/on-brunos-concern-about-the-current-coupling-of-node-js-and-v8/">good
questions to
me</a>,
based on my original <a href="http://www.olympum.com/future/nodejs-to-v8-or-not-to-v8/">nodejs and
V8</a> post. Let me
summarise Jason&#8217;s questions and comments into three key messages:

<blockquote>
  It&#8217;s Joyent&#8217;s responsibility that NodeJS runs well period. We&#8217;re not
  afraid of a language VM. [...] actual node.js committers (who all work
  at Joyent) know quite a bit and have pretty good relations with the V8
  team.&#8230;</blockquote>]]></description>
				<content:encoded><![CDATA[<p><strong>Update (2011/9/11)</strong>: this post is picking up attention again 8 months later, so I&#8217;ve <a href="http://www.olympum.com/architecture/ahead-with-node-js-and-google-v8/">written an update</a> on where we stand.</p>

<hr/>

<p>Jason Hoffman (Chief Scientist, Founder at Joyent) has posted some <a href="http://joyeur.com/2011/02/05/on-brunos-concern-about-the-current-coupling-of-node-js-and-v8/">good
questions to
me</a>,
based on my original <a href="http://www.olympum.com/future/nodejs-to-v8-or-not-to-v8/">nodejs and
V8</a> post. Let me
summarise Jason&#8217;s questions and comments into three key messages:</p>

<blockquote>
  <p>It&#8217;s Joyent&#8217;s responsibility that NodeJS runs well period. We&#8217;re not
  afraid of a language VM. [...] actual node.js committers (who all work
  at Joyent) know quite a bit and have pretty good relations with the V8
  team.</p>
</blockquote>

<p>That&#8217;s a very honourable goal and perfectly within Joyent&#8217;s capabilities and track record. I am <em>not</em> debating that.</p>

<blockquote>
  <p>If there are actual technical problems with V8’s reliability or ???
  and these affect the use of node.js in production then I’d like to
  see details.</p>
</blockquote>

<p>The discussion is not really whether I have technical production problems or
not with V8 (within NodeJS); all software has bugs. My issue is about project
governance. Let&#8217;s say such a problem affecting reliability exists and that
Joyent fixes it by developing a critical patch for V8, a patch that the
upstream maintainer does not deem necessary to merge. For how long would
Joyent maintain and test such patches?</p>

<p>Another aspect of governance: intellectual property rights.</p>

<p>My point: it&#8217;s not whether Google&#8217;s V8 project is open or not, it&#8217;s that since
governance of V8 may become a problem <em>in the future</em>, it requires a solution
<em>now</em> so that large organisations can put their full weight behind NodeJS.
Joyent&#8217;s weight behind NodeJS is necessary, but not sufficient. I prefer
addressing the problems in the code.</p>

<blockquote>
  <p>node.js is only implemented on V8 but that’s only because we’re
  going to focus on making node.js awesome first. [...] After all, we
  are still working on getting it to 1.0</p>
</blockquote>

<p>Open source software is commonly designed to make it easy to replace parts or
components that may be at risk of being legally tainted or patent encumbered
(mono&#8217;s architecture is one example of such isolation). I would put V8 in the
same bucket of &#8220;risky parts&#8221;, although for different reasons. Given the lack
of definition on Google&#8217;s V8 project governance, and the current tight
coupling between node and V8, I wonder how realistic it will be to implement a
different VM post-1.0 if we haven&#8217;t got the right layer of abstraction
pre-1.0. Or perhaps the problem (the reassurance of governance) can be
addressed by working under the umbrella of a foundation that ensures such
governance.</p>

<p>Let me end by saying that I am still as committed as ever to NodeJS and that I
really believe in Joyent&#8217;s engineering strength to make NodeJS 1.0 a reality.
Joyent is putting their money where their mouth is. We know they are working
hard making node <em>awesome</em>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.olympum.com/future/answering-jason-on-v8-governance-and-impact-to-nodejs/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>NodeJS: To V8 or not to V8</title>
		<link>http://www.olympum.com/future/nodejs-to-v8-or-not-to-v8/</link>
		<comments>http://www.olympum.com/future/nodejs-to-v8-or-not-to-v8/#comments</comments>
		<pubDate>Sun, 06 Feb 2011 00:05:01 +0000</pubDate>
		<dc:creator>Bruno Fernandez-Ruiz</dc:creator>
				<category><![CDATA[Future]]></category>

		<guid isPermaLink="false">http://www.olympum.com/future/nodejs-to-v8-or-not-to-v8/</guid>
		<description><![CDATA[<strong>Update:</strong> Jason Hoffman (Chief Scientist, Founder of Joyent) has written a <a href="http://joyeur.com/2011/02/05/on-brunos-concern-about-the-current-coupling-of-node-js-and-v8/">very good response</a> to this post. Obviously I owe him some responses, which are <a href="http://www.olympum.com/future/answering-jason-on-v8-governance-and-impact-to-nodejs/">in a separate post</a>.

If you have not watched <a href="http://www.yuiblog.com/blog/2010/08/30/yui-theater-douglas-crockford-crockford-on-javascript-scene-6-loopage-52-min/">Douglas Crockford&#8217;s video lecture on server-side
Javascript</a>,
I recommend you do that first before reading further into this post.

I have been saying for a while that <a href="http://www.olympum.com/internet/why-node-js-matters/">server-side Javascript
matters</a>. We, at Yahoo!,
see <a href="http://developer.yahoo.com/blogs/ydn/posts/2010/11/on-deck-yuiconf-2010-with-a-focus-on-yui-yql-and-node-js/">a bright future in server-side
Javascript</a>
and are making a big investment in it.&#8230;]]></description>
				<content:encoded><![CDATA[<p><strong>Update:</strong> Jason Hoffman (Chief Scientist, Founder of Joyent) has written a <a href="http://joyeur.com/2011/02/05/on-brunos-concern-about-the-current-coupling-of-node-js-and-v8/">very good response</a> to this post. Obviously I owe him some responses, which are <a href="http://www.olympum.com/future/answering-jason-on-v8-governance-and-impact-to-nodejs/">in a separate post</a>.</p>

<p>If you have not watched <a href="http://www.yuiblog.com/blog/2010/08/30/yui-theater-douglas-crockford-crockford-on-javascript-scene-6-loopage-52-min/">Douglas Crockford&#8217;s video lecture on server-side
Javascript</a>,
I recommend you do that first before reading further into this post.</p>

<p>I have been saying for a while that <a href="http://www.olympum.com/internet/why-node-js-matters/">server-side Javascript
matters</a>. We, at Yahoo!,
see <a href="http://developer.yahoo.com/blogs/ydn/posts/2010/11/on-deck-yuiconf-2010-with-a-focus-on-yui-yql-and-node-js/">a bright future in server-side
Javascript</a>
and are making a big investment in it. But if you <a href="http://twitter.com/olympum">follow me on
twitter</a>, you&#8217;ll know that I am also looking into
ensuring high availability of server-side Javascript-based services in
production. Which really comes down to something like: to V8 or not to V8.</p>

<p>NodeJS is currently tightly coupled to Google&#8217;s V8 engine. V8 was not designed
as a server-side engine, but as a browser-based engine. Furthermore, V8 was
designed squarely to run in Chrome&#8217;s multi-process model. As much as I think
V8 is a brilliant piece of engineering, it&#8217;s software that was not designed to
run on a server.</p>

<p>More to the point, it&#8217;s really up to Google to work with the community to make
V8 work on the server-side. Sometimes Google is responsive, but sometimes it
might not be. It depends on how fixing a bug or applying a patch aligns
with Google&#8217;s product roadmap and plans. I don&#8217;t know Google&#8217;s plans, and I
suspect most NodeJS committers don&#8217;t know either.</p>

<p>Maybe for some folks this might not seem like a big problem. And probably, if
you are running a site with a few thousand daily page views, it might actually
not be a big deal. But to Yahoo!, and <a href="http://amix.dk/blog/post/19577">to
others</a>, it&#8217;s a big deal and we believe this
is fundamental to the success of the NodeJS project.</p>

<p>Writing a Javascript runtime that does not fail is hard. Even the really smart
V8 folks have explicitly designed V8 on the assumption that failures will
happen, and to safeguard the browser when they do. In a browser, a JS engine
failure is an inconvenience to the user: damn, you lost a tab. In a
thread-per-request blocking I/O server design it&#8217;s also not a big deal: you
lose one in-flight request. But in an event-driven web server, it&#8217;s a major
flaw: you lose thousands of in-flight requests you have already accepted.</p>

<p>For NodeJS to scale to billions of page views like Yahoo!&#8217;s, we need to
make sure the Javascript engine / VM behind Node is rock-solid for server-side
loads, i.e. it fails extremely rarely.</p>

<p>Google may invest in supporting V8 on the server-side, just like the Mozilla
folks do. Or somebody else might invest and ensure V8 is rock-solid on the
server and Google may merge the patches nicely. Or maybe not, and the
community may need to fork V8. Or something else &#8230; Nobody really knows.</p>

<p>Either way, it&#8217;s time for the NodeJS community to realise there is a
roadblock and discuss it openly.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.olympum.com/future/nodejs-to-v8-or-not-to-v8/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
	</channel>
</rss>
