Vesta uses complete filesystem and environment encapsulation based on the chroot system call. Each time you set up a new tool (or even a new version of a tool) to run under Vesta's control, you need to construct a complete filesystem containing everything it needs to run. Discovering all the files a tool uses sometimes takes trial and error. This page describes a number of techniques you can use to do this.
Background
When the Vesta builder runs a tool, the tool is restricted to an isolated subset of the filesystem. This subset is defined by the value of ./root at the point when _run_tool is called.
Vesta's tool launching process uses the chroot(2) and execve(2) system calls. Together they restrict filesystem access to a directory under Vesta's control and completely define the environment for each tool. Your system may have man pages that describe these calls in detail, though that's probably more information than you need. If you're not familiar with the concept of chroot, Wikipedia's description may also be helpful.
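The following minimal Python sketch shows how chroot and execve combine. It launches a command inside a prepared root directory and passes an explicitly supplied environment. This is an illustration only, not Vesta's actual tool launcher. The name run_encapsulated is hypothetical. os.chroot normally requires root privileges (on Linux, the CAP_SYS_CHROOT capability), and os.waitstatus_to_exitcode needs Python 3.9 or later.

```python
import os


def run_encapsulated(root, argv, env):
    """Run argv[0] with `root` as its "/" and `env` as its whole environment.

    Hypothetical helper for illustration; not how Vesta launches tools.
    Returns the tool's exit code, or 127 if the chroot or exec step failed.
    """
    pid = os.fork()
    if pid == 0:
        # Child: restrict the filesystem view, then replace this process
        # image with the tool. Only the variables in `env` are passed on.
        try:
            os.chroot(root)
            os.chdir("/")
            os.execve(argv[0], argv, env)
        finally:
            # Reached only if chroot failed (e.g. insufficient privileges)
            # or exec failed (e.g. the tool or its loader is missing from
            # the new root). A successful execve never returns here.
            os._exit(127)
    # Parent: wait for the tool and decode its exit status.
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)
```

Suppose this runs with sufficient privileges and an empty directory as the root. It then returns 127, because the directory contains neither /bin/true nor the loader that /bin/true depends on. This mirrors what happens when those components are missing from ./root.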
Tools that you run under Vesta will usually need a number of things other than the tool itself included in the filesystem:
- A "loader" program that's provided as part of the operating system. It bootstraps each new process: it loads the executable code into memory, loads any shared libraries and resolves symbol references to them, and performs other process initialization.
- Shared libraries that provide functionality not directly linked into the program.
  - The C run-time library (usually named something like "libc.so") is required by most programs.
  - The C++ run-time library (usually named something like "libstdc++.so") is usually needed by programs written in C++.
  - Different programs use many other shared libraries as well. (See the section "Determining Needed Shared Libraries" below.)
- Interpreters (for programs written in languages such as Perl or Python) or run-time environments (such as a program that implements the Java virtual machine).
  - These usually come with a collection of ancillary files that they need to operate correctly.
- System configuration files from /etc.
Constructing a Filesystem from OS Components
Most modern operating systems have a system for dividing the installed files into different named components. Each such component represents a subset of the files and directories that make up the operating system. Many Linux systems use either the RedHat package manager or the Debian package manager, but these are just two examples.
To make it easier to construct the filesystem needed to run a tool, we import entire OS components into Vesta and allow the user to simply give a list of OS component names which need to be in the filesystem. For example, on a Debian system in order to construct a filesystem needed to run the lexical analyzer generator flex, we would ask for:
- The OS component for flex itself
- The OS component for the m4 macro pre-processor (which recent versions of flex use)
- The OS component for the C run-time library
This can be done with a single function call:
./build_root(<"flex", "m4", "libc6">)
This is of course just one approach implemented on top of Vesta in the Vesta SDL language, but we do think it's a useful one. For more on this see std_env which describes the standard build environment.
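Outside of SDL, the idea behind build_root can be modeled as a merge of per-component file lists into one directory tree. The Python sketch below illustrates only that idea and is not the std_env implementation. The merge_components name and the shape of the manifest dictionary are hypothetical. On a Debian system, such a manifest could be populated from the output of `dpkg -L <package>`.

```python
def merge_components(names, manifest):
    """Combine the files of several OS components into a single root.

    names:    component names, e.g. ["flex", "m4", "libc6"]
    manifest: hypothetical mapping of component name -> {path: content}

    Returns one {path: content} dict. Raises KeyError for an unknown
    component and ValueError if two components supply different contents
    for the same path.
    """
    root = {}
    owner = {}  # path -> name of the component that first supplied it
    for name in names:
        if name not in manifest:
            raise KeyError(f"unknown OS component: {name}")
        for path, content in manifest[name].items():
            if path in root and root[path] != content:
                raise ValueError(
                    f"{path} differs between {owner[path]} and {name}")
            root.setdefault(path, content)
            owner.setdefault(path, name)
    return root
```

Conflicts are treated as errors rather than resolved silently. Two components that disagree about a file usually indicate a version mismatch worth investigating.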
Evaluator: -fsdeps
One of the Vesta evaluator's debug flags is "-fsdeps". This will print out one line for each dependency recorded by a filesystem access as a tool runs. Here's an example from the introduction to writing bridges:
% vmake -fsdeps [...]
0/hostname: grep a
0/FS dependency: !/./root/.WD/grep
0/FS dependency: N/./root/bin/grep
0/FS dependency: !/./root/lib
0/FS dependency: !/./root/usr
0/Error: invoking _run_tool: [...]
The first character of each dependency path indicates what kind of operation the tool was performing:
- ! means that the tool searched for a file or directory that wasn't present.
- N means that the tool used a file.
  - Normally this means that the file was opened and read, but it may mean that it was examined with stat(2).
In this case we can see that the tool looked for /.WD/grep and found that it didn't exist, then looked for /bin/grep and used that file. It then looked for /lib and /usr, neither of which existed, and failed. (This example is specifically meant to illustrate what happens when you leave out key filesystem components such as the loader and the C run-time library.)
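When -fsdeps output is long, it can help to separate the paths a tool looked for but didn't find from the ones it used. The missing paths are usually the candidates for adding to ./root. Here is a small Python sketch that assumes the "0/FS dependency: <kind><path>" line format shown above. The parse_fsdeps name is hypothetical. Kind characters other than ! and N are simply grouped under their own key.

```python
import re

# Matches lines such as "0/FS dependency: !/./root/lib". Anything before
# "FS dependency:" (such as the "0/" prefix) is ignored.
_FSDEP = re.compile(r"FS dependency: (\S)(/\S*)")
_ROOT_PREFIX = "/./root"


def parse_fsdeps(output):
    """Group -fsdeps lines by their leading kind character.

    Returns a dict such as {"!": [...], "N": [...]}. Each path has the
    "/./root" prefix removed so it reads as the tool saw it.
    """
    deps = {}
    for line in output.splitlines():
        m = _FSDEP.search(line)
        if not m:
            continue  # not a dependency line (tool command, errors, ...)
        kind, path = m.groups()
        if path.startswith(_ROOT_PREFIX):
            path = path[len(_ROOT_PREFIX):] or "/"
        deps.setdefault(kind, []).append(path)
    return deps
```

Given the example output above, this returns ["/.WD/grep", "/lib", "/usr"] under "!" and ["/bin/grep"] under "N".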
For more on dependencies and how they are recorded when tools run, see HowCachingWorks and the description of how _run_tool's dependency recording can be controlled.
Evaluator: -evalcalls
Another Vesta evaluator debug flag is "-evalcalls". This prints one message for every call-back to the evaluator requesting information about some part of the encapsulated filesystem. It usually doesn't provide any useful information beyond what "-fsdeps" shows, and it is mostly of interest to developers working on modifying Vesta. However, it's worth knowing about and may be useful in some obscure cases.
Evaluator: -stop-before/after-tool
@@@ Not written yet @@@
Determining Needed Shared Libraries
As mentioned above, tools usually need some shared libraries to run. There's usually a command you can run to list the shared libraries needed by an executable, though the command varies by operating system:
- Linux:
% ldd /usr/bin/gcc
    libc.so.6 => /lib/i686/libc.so.6 (0x40028000)
    /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
- MacOS X/Darwin:
% otool -L /usr/bin/gcc
/usr/bin/gcc:
    /usr/lib/libiconv.2.dylib (compatibility version 5.0.0, current version 5.0.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 71.0.0)
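On Linux, ldd output can be turned into a list of files to include in a tool's filesystem. This includes the loader, which ldd reports alongside the libraries. The Python sketch below handles the common ldd line shapes ("name => /path (addr)", "/path (addr)", "name => not found", and the file-less "linux-vdso.so.1 (addr)"). The parse_ldd name is hypothetical, and otool -L output would need different parsing.

```python
def parse_ldd(output):
    """Return (paths, not_found) from text printed by ldd.

    paths:     absolute library paths (including the loader) to place in
               the tool's filesystem, in first-seen order
    not_found: library names ldd could not resolve on this system
    """
    paths, not_found = [], []
    for line in output.splitlines():
        line = line.strip()
        if "=>" in line:
            name, target = (s.strip() for s in line.split("=>", 1))
            if target.startswith("not found"):
                not_found.append(name)
                continue
            candidate = target.split()[0] if target else ""
        else:
            # "/lib64/ld-linux-x86-64.so.2 (0x...)" names the loader
            # directly; "linux-vdso.so.1 (0x...)" has no file at all.
            candidate = line.split()[0] if line else ""
        if candidate.startswith("/") and candidate not in paths:
            paths.append(candidate)
    return paths, not_found
```

Given the Linux example above, this returns ["/lib/i686/libc.so.6", "/lib/ld-linux.so.2"] with nothing under not_found.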
Monitoring System Calls
Sometimes it's helpful to monitor the system calls made by a tool. There's usually some utility you can use to do this, though you'll need to import it into Vesta and include it in your tool's filesystem. The exact command depends on the operating system:
- Linux: strace
- Solaris: truss
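strace writes one line per system call, so a trace can be reduced to two lists: files the tool found and files it looked for but didn't find. A trace can be captured with, for example, `strace -f -o trace.txt <tool>`. The Python sketch below handles strace's format only, not truss's. The summarize_strace name and the set of path-taking calls it recognizes are assumptions. Different kernels and C libraries use different calls (open vs. openat, stat vs. newfstatat or statx), which is why several are listed.

```python
import re

# Calls whose first path argument shows which files a tool touched.
_PATH_CALLS = {"open", "openat", "stat", "lstat", "newfstatat", "statx",
               "access", "faccessat", "execve", "readlink", "readlinkat"}
# The path is the first quoted argument, optionally preceded by AT_FDCWD.
_CALL = re.compile(r'(\w+)\((?:AT_FDCWD, )?"([^"]*)"')


def summarize_strace(output):
    """Split strace output into (found, missing) path lists.

    A path counts as missing when its call returned -1 ENOENT. Such paths
    often point at something that needs to be added to the tool's root.
    """
    found, missing = [], []
    for line in output.splitlines():
        m = _CALL.search(line)
        if not m or m.group(1) not in _PATH_CALLS:
            continue
        path = m.group(2)
        target = missing if re.search(r"= -1 ENOENT\b", line) else found
        if path not in target:
            target.append(path)
    return found, missing
```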
Environment Variables
Some tools use environment variables during their processing, and if these aren't set correctly the tool may not operate correctly. Unfortunately, the system-call monitors described above cannot show which environment variables a tool uses: getenv is a library call, not a system call, so strace cannot monitor it. If the tool's documentation doesn't tell you, you may need to resort to examining the tool's internals.
For compiled binaries, environment variable names usually appear in the output of strings(1) run on the binary (though many other strings will obviously be included as well). As environment variable names typically follow the convention of using uppercase, underscores, and digits, you can usually find them with a simple filter:
% strings /usr/bin/gcc | grep '^[A-Z_0-9]*$'
[...]
PATH
GCC_EXEC_PREFIX
COMPILER_PATH
LIBRARY_PATH
LPATH
BINUTILS
_ROOT
POSIX
LC_COLLATE
LC_CTYPE
LC_MONETARY
LC_NUMERIC
LC_TIME
LC_MESSAGES
LC_ALL
LC_XXX
LANGUAGE
LANG
TMPDIR
TEMP
[...]
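The same filter can be written in Python, which is handy when strings(1) isn't available. The sketch below approximates `strings FILE | grep '^[A-Z_0-9]*$'`. It extracts runs of at least four printable ASCII characters (strings' default minimum length) and keeps only those made entirely of uppercase letters, digits, and underscores. The env_var_candidates name is hypothetical. As with the shell pipeline, expect false positives such as "POSIX".

```python
import re

# Like strings(1)'s default: runs of at least 4 printable ASCII characters.
_PRINTABLE_RUN = re.compile(rb"[\t\x20-\x7e]{4,}")
# The same filter as grep '^[A-Z_0-9]*$' applied to each extracted string.
_ENV_LIKE = re.compile(r"[A-Z_0-9]+")


def env_var_candidates(data):
    """Return strings from binary data that look like environment variables."""
    candidates = []
    for run in _PRINTABLE_RUN.findall(data):
        text = run.decode("ascii")
        if _ENV_LIKE.fullmatch(text):
            candidates.append(text)
    return candidates
```

For example, `env_var_candidates(open("/usr/bin/gcc", "rb").read())` produces a list comparable to the shell output above.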
For scripts, you can usually come up with a simple pattern which will find environment variable references:
- /bin/sh shell scripts:
% grep '\$[A-Z_0-9]' foo.sh
[...] $USER [...]
[...] $PATH [...]
- Perl:
% grep 'ENV{.*}' foo.pl
[...] $ENV{PATH} [...]
[...] $ENV{'EDITOR'} [...]
- Python:
% grep 'os\.getenv(.*)' foo.py
[...] os.getenv("HOME") [...]
- Tcl:
% grep 'env(.*)' foo.tcl
[...] $env(TMPDIR) [...]
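If you have many scripts to check, the same idea can be automated. The Python sketch below applies per-language regular expressions and returns the variable names found. The find_env_refs name and its patterns are illustrative assumptions. They are slightly broader than the grep commands above: they also match ${VAR} in shell scripts and os.environ in Python. Like any text search, they will miss variable names built at run time.

```python
import re

# Hypothetical per-language patterns; each captures the variable name.
_PATTERNS = {
    "sh": [r"\$\{?([A-Z_][A-Z_0-9]*)"],
    "perl": [r"\$ENV\{\s*['\"]?(\w+)['\"]?\s*\}"],
    "python": [r"os\.getenv\(\s*['\"](\w+)['\"]",
               r"os\.environ(?:\.get\(\s*|\[\s*)['\"](\w+)['\"]"],
    "tcl": [r"\benv\((\w+)\)"],
}


def find_env_refs(source, language):
    """Return the sorted, de-duplicated variable names referenced in source."""
    names = set()
    for pattern in _PATTERNS[language]:
        names.update(re.findall(pattern, source))
    return sorted(names)
```

For example, `find_env_refs(open("foo.pl").read(), "perl")` would report PATH and EDITOR for a script containing the Perl lines shown above.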