Incomplete I/O
A collection of ones and zeroes that fell out of a loose connection somewhere.
Content related to AI and Large Language Models (LLMs).
With the sporadic, yet increasing, sound of Artificial Intelligence Agent Foot-Guns¹ going off in the distance, I’m left wondering whether we’ve forgotten the lessons of the past. Or perhaps, in the rush to use Large Language Models (LLMs) for everything, we’re forgetting to put up the same guard rails we did to protect ourselves from Organic Intelligence “hallucinations”. What am I missing?
It seems like every other day, we’re reading about someone’s experience with LLM-driven automation, such as Agentic AI, ending in tears and with fewer toes. Entire datasets and backups gone. Whole software repositories reduced to a smoking crater.
This isn’t an anti-AI post by any stretch. I’m not an AI naysayer, preaching about the impending AI-pocalypse. But I’m also not a member of the AI cult, drinking the Kool-Aid without checking the ingredients. Like any tool our industry has ever developed, once you see past the hype, there’s value to be had, but also caution to be exercised.
I started my first software development job in 2000. I already had some years of running production systems in previous roles at that point, all using the only thing we had back then – Organic Intelligence. But my employer at this software development company didn’t, on my first day, hand me the keys to their customers’ production databases to drop as I saw fit. So why are folks out there granting these agents the ability to wipe out a whole dataset?
A few years further into my career, I was running security and other courses for telecommunications companies, utility companies and government. One of the key security concepts we used to drive home was to grant the least privilege necessary for a given user. So why are folks out there granting agents permission to wipe out entire software repos? In most organisations I’ve worked for, we all generally have to create a pull request and have it reviewed before it’s merged.
My (admittedly relatively new) experience with self-hosting LLMs and agents suggests that there are numerous ways to separate different concerns for different agent functions. It’s also definitely possible to have LLMs write test cases that would also act as an extra set of checks and balances.
A previous employer and mentor of mine once described the difference between a software engineer or software developer and someone who can simply write code. In some ways, the code written is less important than some of the other things that the job entails.
LLMs can write code, but there’s a lot they’re missing that an engineer brings. The same likely applies for other automated tasks.
So whilst we’re offloading the writing of code (arguably the most fun part) to AI, perhaps we need to remember to take care of the rest of the job that AI can’t, and to supervise it with the same sorts of guardrails we would any human with Organic Intelligence.
¹ The term foot-gun is common slang for tools or functionalities within a technology that grant the user all the power they need to shoot themselves in the foot.
Content related to the GNU/Linux operating system.
The Linux kernel exposes a great deal of per-process information through the /proc filesystem interface. This can be very helpful in things like forensics and automation.
We’ll look at a few common, useful objects under the proc filesystem, but you can do no harm looking around, so please do.
Under /proc, there is a subdirectory for each process ID (PID) currently running on the system. There is also a special symbolic link called self that points to the PID subdirectory for the current process (your shell). For example:
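(The PID shown here is made up, but the behaviour is what you’d see.)

```
$ echo $$
269123
$ cd /proc/self && pwd -P
/proc/269123
```

Because cd and pwd are shell built-ins, /proc/self here resolves to the shell’s own PID directory.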
We’ll take a look at a few examples below, and potentially look at others in future articles.
Let’s say, for example, we see a running process called ./foo.
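A sketch of what that might look like in ps output (the PID matches the one used below; the other fields are illustrative):

```
$ ps -ef | grep foo
matthew   269792  268955  0 10:12 pts/3    00:00:07 ./foo
```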
We can see that it was started in the directory containing the executable, but we can’t see from this output which directory that is, or the full path to the executable that was started. There are multiple ways to find out, but we can quickly answer both of these questions in a way that is also automation-friendly.
Under the /proc/<pid> virtual directory (where <pid> is the process ID of the process we’re interested in), we can see two symbolic links of interest:

- the exe symlink, which points to the binary used to instantiate this running process, and
- the cwd symlink, which points to the current working directory of the process.

Using the PID 269792 belonging to the process in the output above, we can see both of these files and what they point to:
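Something like the following (the timestamps are illustrative; the paths are the ones discussed below):

```
$ ls -l /proc/269792/exe /proc/269792/cwd
lrwxrwxrwx 1 matthew matthew 0 Aug 12 10:15 /proc/269792/cwd -> /home/matthew/src/rust/foo
lrwxrwxrwx 1 matthew matthew 0 Aug 12 10:15 /proc/269792/exe -> /home/matthew/src/rust/foo/foo
```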
From this, we can tell that the program foo was started from the binary /home/matthew/src/rust/foo/foo, and that its current working directory (the default location for reading and writing files) is the same directory the binary was executed from, /home/matthew/src/rust/foo.
As a side benefit, the ls -l output above also shows the owner of these entries. The proc filesystem sets their ownership to match the process, so this also tells us which user the process is running as.
Now that we know where this process was started from, we likely want to know more about what it’s doing. One potential avenue of investigation would be to see what files it has open on the filesystem. The fd subdirectory contains symbolic links that point to files open by that process. Using the above process as our example again:
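An illustrative sketch of that output (the terminal device and the extra fd number are assumptions):

```
$ ls -l /proc/269792/fd
total 0
lrwx------ 1 matthew matthew 64 Aug 12 10:16 0 -> /dev/pts/3
lrwx------ 1 matthew matthew 64 Aug 12 10:16 1 -> /dev/pts/3
lrwx------ 1 matthew matthew 64 Aug 12 10:16 2 -> /dev/pts/3
l-wx------ 1 matthew matthew 64 Aug 12 10:16 3 -> /tmp/.data/log
```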
We can see that file descriptors 0, 1 and 2 point to a terminal, which probably makes them the stdin, stdout, and stderr for the process. We can also see that it has a file open called log in the /tmp/.data directory. That might be a good place to continue our search.
We’ve taken a quick look at a few objects under the /proc directory that can be used for forensics or in automation to find out information about currently running processes in real time. We’ll likely look into others in the future. Suggestions welcome.
Content related to Web Assembly (WASM).
At Versatus, we’ve chosen Web Assembly (WASM) as a target for our Smart Contract execution, as well as other general compute use cases such as Serverless Functions. As with any technology, it’s easy to get caught up in all of the hype. This quick article aims to lay out some of the reasons why we’ve chosen to use Web Assembly for these use cases.
The Web3 space is littered with single-purpose, custom languages for writing things like Smart Contracts. There are technical reasons why this is the case for some chains, but it increases the barrier to entry for new developers and isn’t necessary. By targeting Web Assembly for compiled Web3 components, developers are free to use the language of their choice in the IDE of their choice with the debugging and testing tools of their choice.
Just as we shouldn’t necessarily need to dictate which programming language a Web3 developer writes their code in, we feel that they shouldn’t care what hardware the code is eventually executed on. They certainly shouldn’t have to compile their code for one architecture when testing their code and for another architecture entirely when they wish to deploy and run their code.
Using the ever-growing set of tools for compiling many common (and some uncommon) languages to the common Web Assembly instruction set gives us a common execution target regardless of the programming language and regardless of the underlying hardware.
By default, Web Assembly is just a simple instruction set for a virtual CPU. There are no networks or files or devices. This alone gives us a very small surface area to secure. We want to secure the network and the general population from malicious software, and we want to protect the resources of the hosts running the software from misuse or overuse. At the same time, we also want to protect the running software from outside attack. Out of the box, Web Assembly makes it easier for us to contain and secure the executing software.
The Web Assembly System Interface (WASI) is a set of extensions to WASM that allow us to selectively and carefully enable some basic I/O functionality, including potential file access or potential network access. Rather than these being a simple on/off switch, we can strictly limit these calls and resources and whitelist only what we know to be necessary and safe.
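As a small illustration of the idea (using the off-the-shelf wasmtime runtime here purely as a stand-in, not necessarily what Versatus ships), a WASI host can pre-open a single directory for a module and nothing else:

```
# Only the pre-opened directory is visible inside the sandbox; no other
# files, and no network, are reachable from the guest by default.
$ wasmtime run --dir=. my_module.wasm
```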
It is possible for someone like Versatus to extend a WASM runtime to include new functionality. This can be done in a number of ways, including adding new WASI calls, modifying the underlying implementation of existing WASI calls, or making use of the proposed component model to add new functionalities.
Consider the case where a developer is fairly new to Web3, understands how Content Addressable Storage (CAS) works in Web3, but doesn’t have the bandwidth to totally come up to speed with the inner workings of IPFS and the APIs needed to interact with it. With some extension to the WASM runtime shipped by Versatus, the following Rust code excerpt could potentially be all that is needed by that developer to retrieve the contents of a file from IPFS:
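A minimal sketch of that idea, with a placeholder standing in for a real content ID:

```rust
use std::fs;

fn main() -> std::io::Result<()> {
    // The /ipfs prefix is a virtual path that the extended WASM runtime would
    // intercept, fetching the content from IPFS on the program's behalf.
    // The CID below is a placeholder, not a real content ID.
    let contents = fs::read_to_string("/ipfs/<some-content-id>")?;
    println!("{}", contents);
    Ok(())
}
```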
The virtual path /ipfs could be intercepted and handled by Versatus, allowing us to handle the complexities of IPFS and making it trivial for a developer to work with Web3 content.
Web Assembly (WASM) isn’t just for web browsers. It does help us to solve a number of issues around portability, security and simplicity, as described above. It will change the way a number of types of applications and services are deployed more widely in the industry. It isn’t perfect for every situation, and we at Versatus have other use cases where we won’t be using Web Assembly. We don’t believe that it will change the way operating systems work any time soon, as is suggested in this great article, but it will certainly have a lot of impact on the way some types of applications are developed, deployed and maintained in the future.
Content related to OpenSSH and the SSH protocol.
If you have used ssh to create a remote terminal session into a Linux or Unix machine, or if you have used related file transfer tools such as scp or sftp to transfer files between machines, you’ll be familiar with the idea that you’re establishing a point-to-point network connection, using the SSH protocol, from one machine (the client) to another (the server).
In the picture below, the laptop on the right is on the same network as the three servers (alvin, simon and theodore). It is able to establish that client-server connection, authenticate, and you have your ssh session.
But what about the laptop on the left? Due to the firewall and probably some stuff called NAT, it’s unlikely that the laptop on the left is able to establish network connectivity to the servers (alvin, simon and theodore) directly. When you’re at home or at the office, you can access those local resources, but when you’re remote, you can’t.
In these cases, the manager of the firewall might enable ssh on the firewall (or on a machine near it in a DMZ) to act as a jump host. This would allow you to ssh from the laptop on the left into the firewall machine, and from there ssh to alvin, simon or theodore behind the firewall.
Here’s what an example session might look like:
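Something along these lines, with jump-host as an illustrative hostname and matthew as an illustrative user:

```
# First hop: from the laptop to the jump host
laptop$ ssh matthew@jump-host

# Second hop: from the jump host on to a server behind the firewall
jump-host$ ssh matthew@simon
simon$
```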
This approach works, but does have some serious limitations:
- Using scp or sftp might mean copying files to the jump host first and then on to the real destination, and the jump host may not have the capacity or a writeable filesystem.
- SSH ~ escape commands need an extra ~ per ssh hop, which can be difficult to keep track of after a few hops.

It is possible to have the ssh command handle the extra step for us by wrapping the SSH session we want inside an SSH session to the jump host. Using the ProxyJump configuration option (or the -J short option on the command line), we can have ssh automatically establish the session to the jump host. From there, ssh will tunnel our session to the servers over the jump host session. For example:
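A hypothetical single command from the laptop, using the same illustrative names as above:

```
laptop$ ssh -J matthew@jump-host matthew@simon
simon$
```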
This has many advantages over the two manual steps above, many stemming from the fact that the session to simon in the above example is from laptop to simon, being tunnelled through jump-host. This allows us to use tools like scp to transfer files without having to store them on the jump host as part of the process. For example:
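An illustrative sketch (the filename and destination path are made up):

```
laptop$ scp -o ProxyJump=matthew@jump-host ./report.txt matthew@simon:/tmp/
```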
Older versions of scp don’t support the short -J option, so here we specify the equivalent long-form configuration option using -o.
If you have multiple jump hosts to pass through, you can give the -J option a comma-separated list of them, and ssh will hop through each in turn.
As mentioned, the -J option establishes a session to a jump host and then tunnels our SSH session to the remote server. OpenSSH is able to tunnel all kinds of things, including web browser traffic. By using the -D option and specifying an unused port number, ssh can listen on your local client machine (laptop in the example above) as a SOCKS5 proxy. It will then tunnel any web requests it receives over the SSH session to the jump host, and make the request on the remote network.
For example:
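Picking 1080 as the local SOCKS port here, though any unused port will do:

```
laptop$ ssh -D 1080 matthew@jump-host
```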
Then simply configure your web browser to use localhost as a SOCKS5 proxy with the port number specified with the -D option. From there, you ought to be able to browse content inside your private network (perhaps hosted on simon, alvin and theodore), with the web browser traffic being tunnelled (encrypted) over your SSH session through the jump host.