Why Use SmartOS?

While at ChefConf 2013 Hack Day, Irene Yam, Director, Customer Marketing at Joyent, told me that she is often asked "Why should I move from Linux to SmartOS?"

At this same event, I was also asked what is SmartOS, what does Joyent do, and even "Which linux distro does SmartOS run on?"

First of all, SmartOS is based on illumos, which is an open source fork of OpenSolaris, which was based on Sun's Solaris 10 operating system. So SmartOS does not run on top of any operating system. It is the operating system. At Joyent, we consider it to be a hypervisor, as we provision Linux, Windows, FreeBSD, and other virtual machines running on kvm, which we ported from Linux to SmartOS (announced at KVM Forum 2011).

Also on SmartOS, we support OS virtualization, so you can provision a SmartMachine. As with kvm, a SmartMachine appears to be a complete machine, with its own hardware (storage, network, and processors), as well as the operating system and libraries. Unlike kvm, there is no extra layer of virtualization. This means that different SmartMachines provisioned on the same hardware are actually sharing the operating system (SmartOS).

So, why would you want to use SmartMachines or SmartOS as opposed to Linux?

At a high level, here are the main reasons:

  • Performance
  • Observability
  • Reliability

Let's take a closer look at each of these. We'll concentrate on virtual machines, though much of the following would also apply to bare metal.

Performance

Generally, when speaking of performance, people are interested in the latency and/or throughput of the network(s) and storage, and the utilization of processors on the system. An excellent discussion of the performance of SmartMachines versus kvm virtual machines (and other virtualization mechanisms, see virtualization) can be found at Brendan's blog: Virtualization Performance: Zones, KVM, Xen. For raw computations, there should be little difference between SmartOS and Linux. Having said that, there may be differences due to the way in which tasks are scheduled to run. I may write more about scheduling differences in a future blog post.

For networking, SmartOS makes use of a kernel mechanism called crossbow. In addition to providing virtual network devices, crossbow offers better throughput and latency on SmartMachines compared to Linux virtual machines, with less jitter (spikes in latency) and better "goodput" (fewer retransmissions, for instance).

For virtualization purposes, when running on a SmartMachine, the code path for doing network I/O is the same as when running on SmartOS itself (i.e., bare metal). Linux and Windows VMs running on kvm on SmartOS are using crossbow underneath, but also run additional networking code within the operating system of the virtual machine, thus adding overhead.

For file systems, SmartOS uses ZFS. Every SmartMachine runs in its own ZFS dataset, and each Linux/Windows virtual machine gets its own ZFS volume. Among other techniques for getting excellent performance, ZFS caches file system data and metadata very aggressively through the use of the Adaptive Replacement Cache (ARC). ZFS also makes use of pipelining and transactions to help get better performance. See here for more details on ZFS performance.

As with the network, a SmartMachine runs directly on SmartOS to do disk I/O, whereas Linux and Windows virtual machines run additional code within the guest OS when they perform disk I/O.

Observability

To me, this is the number one reason for using SmartOS. What do I mean by "observability"? By this I mean the capability to see what is being done by application(s), libraries, and the operating system itself (the entire software stack from application to hardware). Being able to see what your software stack is doing makes it easier to debug and troubleshoot problems, and gives you the opportunity to identify and fix performance issues.

Traditionally, to do debugging, and often to do performance analysis, one would add instrumentation to the code being debugged or analyzed. This might be via a debugger, print statements, or another intrusive mechanism. And often, finding the correct place to add the instrumentation would be a hit-or-miss affair. Also, adding instrumentation often involves rebuilding and restarting the application or system, and often the instrumentation itself has an effect on the problem (it can even make the problem "go away," by hiding timing issues).

For observability, SmartOS provides a tool called DTrace. DTrace allows you to see what your entire software stack is doing at any point in time. And it does this without the need to add instrumentation to your existing or new code. DTrace is meant to be safe to use on production systems. There is no need to rebuild or restart anything. And there are methods for determining what should be traced (see, for example, Brendan Gregg on the USE Method).

The DTrace mechanism can be used for:

  • **Debugging** - For instance, you can use DTrace to trace entry and return from functions, and to print out arguments and return values. You can also use it to determine how a function is getting called (a "stacktrace"), and to see what the code path is for a given event.
  • **Performance Analysis** - DTrace allows you to get nanosecond timing information. You can use it to sample where the CPUs are spending time, as well as to determine why and for how long applications are being blocked from making forward progress.
  • **Code Coverage** - DTrace can be used to determine whether or not code is getting executed. This is important for thorough testing.
  • **Finding out WTF is happening** - Often, you need to understand how a piece of software works. This might be for troubleshooting, but it could also be for adding new features. This is true even when you wrote the code. Having access to millions of lines of source code may be enough to understand everything, but it is very useful to have a tool that can show you the code you need to be reading.
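
To give a feel for the first two uses above (tracing function entry and return, and timing them), here is a rough analogy in Python. It is not DTrace and not DTrace's own scripting language; it only illustrates the idea of observing a function from the outside without editing it. It relies on the standard `sys.setprofile` hook, and the names `trace_calls` and `add` are hypothetical, made up for this sketch.

```python
import sys
import time


def trace_calls(target_name, func, *args):
    """Run func(*args) and record entry/return events for any Python
    function whose name is target_name, without modifying that function."""
    events = []   # tuples of (kind, function name, detail, elapsed_ns)
    starts = []   # stack of entry timestamps, so nested calls pair up

    def profiler(frame, event, arg):
        # Ignore every function except the one we were asked to watch.
        if frame.f_code.co_name != target_name:
            return
        if event == "call":
            starts.append(time.perf_counter_ns())
            # At entry, the frame's locals hold the call arguments.
            events.append(("entry", target_name, dict(frame.f_locals), 0))
        elif event == "return":
            elapsed = time.perf_counter_ns() - starts.pop()
            # For a "return" event, arg is the value being returned.
            events.append(("return", target_name, arg, elapsed))

    sys.setprofile(profiler)
    try:
        result = func(*args)
    finally:
        sys.setprofile(None)   # always switch tracing back off
    return result, events


def add(a, b):
    # Ordinary code under observation: no print statements, no hooks.
    return a + b
```

Calling `trace_calls("add", add, 2, 3)` returns the result together with one "entry" event holding the arguments and one "return" event holding the return value and the elapsed nanoseconds. Unlike this sketch, which only sees Python functions inside a single process, DTrace covers the whole stack, including libraries and the kernel.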

DTrace is slowly but surely coming to Linux (see DTrace on Linux). SystemTap is not quite the same. It doesn't seem to be meant for use on production systems.

Reliability

SmartOS comes indirectly from Solaris, which has a long-standing reputation for being a reliable and secure system.

In addition, SmartOS uses ZFS. ZFS checksums everything to ensure data is not corrupted, and uses a copy-on-write mechanism to ensure that live data is never overwritten. ZFS file systems are always consistent on disk. ZFS has been in production since 2006. You should not experience data loss or corruption when using it. It is easy to administer, and removes the need for a separate volume manager. And it is meant for growth, and should be especially useful for big data.
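
The two ideas in that paragraph, checksumming and copy-on-write, can be sketched with a toy in-memory store. This is a simplified model and not how ZFS is implemented: the class name `CowStore`, its methods, and the choice of SHA-256 are assumptions made for illustration. What it does borrow from ZFS is the layout, where the checksum is kept next to the pointer to a block rather than inside the block itself.

```python
import hashlib


class CowStore:
    """Toy copy-on-write store with checksums (illustration only)."""

    def __init__(self):
        self.blocks = {}    # block id -> bytes; written once, never updated
        self.root = {}      # name -> (block id, checksum of that block)
        self.next_id = 0

    def write(self, name, data):
        # Copy-on-write: new data always goes to a fresh block,
        # so the block holding the previous version is left intact.
        block_id = self.next_id
        self.next_id += 1
        self.blocks[block_id] = bytes(data)
        checksum = hashlib.sha256(data).hexdigest()
        # Build a new root and switch to it in one step, so a reader
        # sees either the old version or the new one, never a mix.
        new_root = dict(self.root)
        new_root[name] = (block_id, checksum)
        self.root = new_root

    def read(self, name):
        block_id, checksum = self.root[name]
        data = self.blocks[block_id]
        # Verify on every read, so corruption is detected, not returned.
        if hashlib.sha256(data).hexdigest() != checksum:
            raise IOError("checksum mismatch for " + name)
        return data
```

Because the old root and old blocks are untouched by a write, keeping a reference to a previous root behaves like a cheap snapshot in this model. If a stored block is altered behind the store's back, the next `read` raises an error instead of handing back bad data.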

Other Features

In addition to ZFS and DTrace, SmartOS uses zones. Zones are used both for SmartMachines and Linux/Windows virtual machines. Every SmartMachine and Linux/Windows virtual machine runs in its own zone. Processes running within a zone do not have access to processes running in other zones. Each zone is given a ZFS dataset or volume(s) for its storage needs. Running kvm instances within their own zones means that if a virtual machine is somehow compromised, it ends up in a place where it cannot do any damage to other tenants (zones) on the system. Every zone has its own virtual network stack via crossbow that is separate from other zones' network stacks (i.e., you cannot sniff packets belonging to other zones). The devices within a zone are not accessible from within other zones.

Resource allocation and management is also controlled for each zone. Resources that are controlled include the amount of memory a zone can use, the amount of CPU a zone can use, and the amount of network bandwidth a zone can use. In addition, a mechanism for prioritizing disk I/O exists so that one zone does not use up the available bandwidth when there is contention for disk bandwidth.

On SmartOS, there is a mechanism for managing services called the Service Management Facility (SMF). SMF is a replacement for the /etc/init.d scripts that exist on other systems. You can still use the /etc/init.d scripts for backward compatibility. SMF gives you much finer-grained control over managing services, and a standard way to handle things like log files and configuration of a service.

Why Use Linux?

So, why still use Linux? For many people, familiarity with Linux means that moving to SmartOS can involve a steep learning curve. Having said that, both SmartOS and Linux are Unix-based systems, and the differences are not really that hard to learn. For help, you can use The Linux-to-SmartOS Cheat Sheet.

Another reason for continuing to use Linux is that Linux has many more drivers, and much more software has been written for Linux. Many applications that have been written for Linux are easily portable to SmartOS, but the degree of difficulty depends on how much Linux-specific code is in the application. Many developers write something meant to be portable, but test on various Linux flavors to decide whether or not they are successful. Testing should be done on Linux, SmartOS, and FreeBSD at a minimum.



Post written by Mr. Max Bruning