Drop The Chaos and Use Component(1)

Let's face it, CommonJS has won — AMD has lost. Let's move on to embracing this fact.

ES6 modules have adopted a CommonJS-like style, so both Node.js and the browser will eventually run the same model. However, native ES6 modules are still far away, and the specification says nothing about distribution (which is a good thing).

That's where component(1) steps in. For those unaware, component(1) is a client-side package manager and build tool. It lets you embrace a modularized approach to building apps. component(1) supports JavaScript, CSS, templates and various other kinds of assets, all bundled up in a consistent manner. The JavaScript side uses a CommonJS environment, allowing you to code exactly like you would in Node.

Note: I'll refer to the project as component(1), and to the abstract concept simply as component.

// component(1) JavaScript
var dom = require('dom');
var debug = require('debug');

The important part, and what separates component(1) from tools like Browserify, is its specific focus on the client-side. See, while Node has many modules available, they simply aren't built with the browser in mind. Browserify feels like (and is) a hack.

The Bower Problem

Bower has become the most popular package manager for client-side assets. Cool. But it poses a substantial problem: it leaves you in charge of managing all the assets it fetched for you. This process is completely ad-hoc. Better get Grunt, gulp, or Makefiles to do some work.

Moreover, it's not only the act of building everything that suffers; the runtime and format of these modules are chaotic. Some bundle minified assets, others don't. Some bundle concatenated files, some don't. Some bundle AMD-compatible JavaScript files, some don't. Most, however, fill their modules with globals.

This results in an extremely poor experience for the user and community.

component(1) solves this problem because it's a package manager and build tool. Thus, it's a lot more opinionated about things than say, Bower.

Community

The community has embraced these broken tools wholeheartedly. Most of the Angular world, for example, lives with tools like Grunt and Bower.

Want an Angular bootstrap module that just works? Or how about a tooltip module that provides just tooltips, dammit? Sorry, but these tools are making things worse. Sure, Bower solves the package-manager issue, but that's only part of the pie.

Have you ever wanted to get a dropdown module that works, in a matter of seconds? Again, only dropdowns, that's it. Or modals, or animations, or views, or...

There might be some of these modules on Bower, except they are not consistent, not composable, and will most likely use globals. Globals are not the answer.

Dive into component(1)

You can install component(1) with npm:

npm install component -g

You'll then have access to the component executable.

component(1) follows a metadata format similar to npm's package.json. Here's a component.json file:

{
  "name": "something",
  "version": "0.0.1",
  "repo": "user/something",
  "dependencies": {
    "component/button": "*"
  },
  "development": {
    "visionmedia/mocha": "*"
  },
  "scripts": [
    "index.js"
  ],
  "templates": [
    "template.html"
  ],
  "styles": [
    "style.css"
  ]
}

You'll notice that you're specifying each file manually, and that's ok! It might be annoying to some (globs are in the works), but if you use component(1) in the manner it was designed, this shouldn't be a problem.

Unix Philosophy

The goal is to try and build small modules that do one thing — and do it extremely well.

That's what component(1) follows. Don't build giant components. Specifying each file isn't a big deal, because you shouldn't have that many files per component. If you have more than 5, then you're probably doing more than one thing.

Local Components

Because you want a modular environment, you'll want lots of components. However, you don't want to create a separate repo on GitHub for each one (especially for a proprietary project). You'll want to use local components.

{
  "name": "moduleA",
  "version": "0.0.1",
  "dependencies": {
    "component/dom": "*"
  },
  "paths": [ "." ],
  "local": [
    "moduleB",
    "moduleC",
    "moduleD",
    "moduleE",
    "moduleF"
  ]
}

You can specify arbitrary paths and component(1) will find the local components.

Installing & Building

It's super simple. component(1) comes with a suite of tools, including a default builder. Yes, default. The builder executable just uses the component-builder module. It supports middleware and extra options.

Let's set up a new demo module called demo in a new folder. We'll include a simple component.json file.

{
  "name": "demo",
  "version": "0.0.1",
  "repo": "user/demo",
  "dependencies": {},
  "scripts": [
    "index.js"
  ]
}

The index.js file is just empty at this point.

Let's install component/dom, a modular, jQuery-esque DOM manipulation library built for component(1). It isn't meant to have feature parity with jQuery, and it's built up from many smaller pieces.

component install component/dom --save

Now let's use that component within our index.js file:

// demo/index.js
var dom = require('dom');

dom('body')
  .addClass('foo')
  .removeClass('fah')
  .on('click', function() {
    console.log('foo');
  });

While the component is called component/dom, component(1) aliases each component to the name property within its component.json. This lets you forget the usernames attached to component names (as they might be hard to remember).

We can build our app with a single command:

component build --standalone demo

We'll have generated a new build/ folder with a build.js file inside.

<html>
  <head></head>
  <body>
    <script src="build/build.js"></script>
  </body>
</html>

When the component you're building is going to be loaded via HTML, you'll want to build it with the --standalone {name} option. This adds code to automatically require the component when it loads.

Consistency

Why mess with grunt, gulp, etc... to manage building your dependencies? component(1) offers the simplest and most effective route to organizing assets for the client-side.

Above all, the previous example used:

  • 1 script load
  • 0 globals!
  • 2 commands
  • 0 effort

Globals are a massive code smell. They show you're not using appropriate tooling to solve the problem. component(1) solves this problem.

Higher-Kinded Polymorphism

Today I'm going to go through higher-kinded polymorphism: what it is (briefly) and what its uses are. It's extremely fascinating, at least for me, to go through these relatively complex, highly-abstracted subjects.


Let's talk about types first, specifically the types people are used to, for example Int, String, Boolean, etc... These are types we can use right away; they classify values directly. They are more formally called proper types.

Now that we can work with proper types, let's go further. Many (but not all) statically-typed languages have a form of generics, for example List<T>, Map<K,V>, etc... Generics (type parameters) and type constructors (functions that construct other types) form the basis for further abstraction on top of proper types.

Let's go through what List<T> really is. List is a type abstracting over a type T. This is called a type constructor; given a type T, we construct the type List<T>. Moreover, it should be clear that List<T> is not a proper type. That is, it's not a concrete type; it needs a parameter to be passed before a usable type can be formed. That's the type constructor's job. List<Int>, however, is a concrete type; it's finished, because we've passed the appropriate type parameter.

How can we distinguish this type constructor from proper types? It's a type abstracting over another (proper) type. Such types are called first-order types, because we only abstract over a type once.

Using a more formal syntax for describing the kinds of types, we can say a proper type is of kind *, whereas first-order types are of kinds like * -> * and * -> * -> *.

Let's go through * -> *. It means: given a type, produce another type. That's exactly what a type constructor does. We can visualize it as Int -> List<Int>. The next form, * -> * -> *, is almost the same thing: given a pair of types, produce a third type. A common example is a simple key-value map: Map<K,V> is of kind * -> * -> *.
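To ground the notation, here's a small sketch using Rust's generics purely for illustration (Rust doesn't expose kinds in its syntax, so the kind annotations live in comments; the helper functions are invented for this example):

```rust
use std::collections::HashMap;

// Kinds, informally:
//   i32            : *             a proper type
//   Vec<_>         : * -> *        a type constructor
//   HashMap<_, _>  : * -> * -> *   a two-parameter type constructor

// Vec<_> applied to i32 yields the concrete type Vec<i32>.
fn make_list() -> Vec<i32> {
    vec![1, 2, 3]
}

// HashMap<_, _> applied to (String, i32): both parameters supplied,
// so the result is a concrete, usable type.
fn make_map() -> HashMap<String, i32> {
    let mut m = HashMap::new();
    m.insert("answer".to_string(), 42);
    m
}

fn main() {
    assert_eq!(make_list().len(), 3);
    assert_eq!(make_map()["answer"], 42);
}
```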

This is all great. We're able to express more abstract types, such as Map<K,V>, List<T>, etc... and avoid repetition. But what if we could abstract further? Imagine a type Foo that takes a type parameter M, as in Foo<M>. What if M is itself a type constructor?


That's where higher-kinded polymorphism comes into play. Instead of having a type parameter that's a proper type (as in List<T>, where T is a proper type), we have the type parameter be a type constructor. Thus, we require a type that will itself produce another type. Alternatively, we can say: a type that abstracts over a type that abstracts over a type. Very much abstracted. That's a higher-kinded type.

Thus, the form is: (* -> *) -> * where (* -> *) is a type constructor. We also call these second-order types. For types greater than first-order, we call them higher-kinded.

Following, say, Rust, we could have:

trait Higher<M<_>> {
    // ...
}

That's if we follow the currently defined syntax for generics. However, I feel like the following (used only for defining higher-kinded types) feels a lot more natural:

trait Higher<M[_]> {
    // ...
}

The square brackets would perform the same function as the angle brackets, but would be used in a higher-order type context.

Now, the _ is simply a placeholder symbol (we don't want to deal with that type, so just infer it and move on). We could have written M[X] instead; but often you don't care about that type parameter.
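For a taste of what a trait like Higher<M[_]> would buy us, here's a rough sketch of a Functor abstraction encoded with parameterized associated types (a feature newer Rust versions support as "generic associated types"); the trait name, Inner, and Wrapped are invented for illustration and are not part of any RFC:

```rust
// Wrapped<B> plays the role of the M[_] "hole": it lets the implementor
// say which type constructor it is, applied to a fresh parameter B.
trait Functor {
    type Inner;
    type Wrapped<B>;
    fn fmap<B, F: FnMut(Self::Inner) -> B>(self, f: F) -> Self::Wrapped<B>;
}

// Option is a type constructor (* -> *), so it can be a Functor.
impl<A> Functor for Option<A> {
    type Inner = A;
    type Wrapped<B> = Option<B>;
    fn fmap<B, F: FnMut(A) -> B>(self, mut f: F) -> Option<B> {
        self.map(|a| f(a))
    }
}

fn main() {
    // Mapping over the wrapped value without unwrapping it by hand.
    assert_eq!(Some(3).fmap(|x| x + 1), Some(4));
    let none: Option<i32> = None;
    assert_eq!(none.fmap(|x| x + 1), None);
}
```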


Now that we've gone through higher-kinded polymorphism, let's go through its uses.

You may have heard of terms such as Monads, Monoids, Functors, Semigroups, etc... These are all fairly popular in functional languages, some terms more so than others. They are all represented with higher-kinded types.

Almost every aggregation problem can be represented as Monoids. For example, HyperLogLog is a monoid. HyperLogLog is actually a composition of monoids, because it contains operations such as Min, which is also a monoid.
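As an illustration, here's a minimal sketch of a Monoid trait and the Min monoid mentioned above; the trait is hypothetical (not from any crate), written in current Rust syntax:

```rust
// A monoid: an identity element plus an associative combine operation.
trait Monoid {
    fn empty() -> Self;
    fn combine(self, other: Self) -> Self;
}

// Min is a monoid: the identity is the largest value,
// and combine keeps the smaller of the two.
#[derive(Debug, PartialEq, Clone, Copy)]
struct Min(i64);

impl Monoid for Min {
    fn empty() -> Self { Min(i64::MAX) }
    fn combine(self, other: Self) -> Self { Min(self.0.min(other.0)) }
}

// Any aggregation over a monoid is just a fold with empty and combine.
fn aggregate<M: Monoid>(items: Vec<M>) -> M {
    items.into_iter().fold(M::empty(), |acc, x| acc.combine(x))
}

fn main() {
    assert_eq!(aggregate(vec![Min(5), Min(2), Min(9)]), Min(2));
}
```

Because combine is associative, such an aggregation can be split across machines and merged in any order, which is exactly why structures like HyperLogLog compose so well.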

Monoids are an extremely powerful primitive, as well as Monads, Functors, etc...

All of these come from abstract algebra and category theory, which is a pretty insane field.


How Does This Relate To Rust?

Well, Rust currently doesn't support higher-kinded polymorphism. But I'm in the process of writing up an RFC for it and experimenting with various implementations.

Within Rust, you typically use a lot of Option<T> and Result<T, E>, and may end up with quite a few pattern-matching pyramids. This can get quite crazy and messy.

Luckily, Monads are the perfect tool for the job. Haskell uses them extensively for working with its equivalent option type, and they reduce the ugliness quite a bit.
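To make the pattern-matching pyramid concrete, here's a small sketch in current Rust syntax (the parsing functions are invented for this example): Option's and_then is its monadic bind, and it flattens the nesting.

```rust
// A nested-match "pyramid": parse two integers, succeed only if both parse.
fn parse_pair(a: &str, b: &str) -> Option<(i32, i32)> {
    match a.parse::<i32>() {
        Ok(x) => match b.parse::<i32>() {
            Ok(y) => Some((x, y)),
            Err(_) => None,
        },
        Err(_) => None,
    }
}

// The same logic with Option's monadic bind: the pyramid flattens,
// and failure short-circuits automatically.
fn parse_pair_flat(a: &str, b: &str) -> Option<(i32, i32)> {
    a.parse::<i32>().ok()
        .and_then(|x| b.parse::<i32>().ok().map(|y| (x, y)))
}

fn main() {
    assert_eq!(parse_pair("1", "2"), Some((1, 2)));
    assert_eq!(parse_pair_flat("1", "2"), Some((1, 2)));
    assert_eq!(parse_pair_flat("1", "oops"), None);
}
```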

For the most part, these concepts are almost strictly exclusive to functional languages like Scala or Haskell (you can hack yourself some Monads with lots of C++ templates, if you wish), but imagine having these powerful primitives in a systems language!

I'll be following up this blog post with one that focuses on proper examples, specifically for Rust, and the advantages they bring.

Resources:

Reddit thread within the Rust subreddit

Using LLVM From Within Rust

Continuing with my previous theme of writing about lower-level concepts from within Rust, we're going to get started using LLVM with Rust.

As I stated in my previous blog post, Rust is amazing at interoperating with C and even C++, which is hard to come by. However, because LLVM is written in C++, it introduces the need to link against libc++ and create wrappers around C++'s classes and types.

Luckily (or maybe not, considering Rust's compiler is built using LLVM), we already have usable Rust bindings in place. However, there's virtually no documentation on these bindings beyond the source and Rust's compiler itself, which has a lot more going on than simple LLVM usage.


Now, the first order of business is to link against rustc, the Rust compiler library. I'll also note that Rust is very good at handling this sort of thing without any extra arguments to rustc, the compiler executable.

Let's begin to define our program. We'll create a new file called bin.rs, its full path being src/bin.rs. This is standard Rust practice.

fn main() {

}

Because rustc (the library) is a dynamic library, sitting somewhere in your Rust installation directories, we'll need to tell Rust where to find it.

extern mod rustc;

fn main() {}

This tells Rust that we want to link against an external module. However, depending on your installation, you might have multiple versions of Rust installed, or previous versions having leftover files on your system. We'll need to be explicit about which version we should link against.

extern mod rustc = "rustc#0.10-pre";

fn main() {}

I'm currently building against the master branch of Rust, so you can replace the version accordingly.


The main LLVM wrapping types are under rustc::lib::llvm. These include ModuleRef, ContextRef, etc... However, all the functions are defined under rustc::lib::llvm::llvm. Let's include some of those modules to make things less verbose when calling any LLVM functions.

extern mod rustc = "rustc#0.10-pre";
use rustc::lib::llvm::llvm;

fn main() {}

Great! Now we'll need to fill out our main function. We'll first need to wrap everything in an unsafe block.

...
fn main() {
  unsafe {
    ...
  }
}

This isn't going to be an LLVM tutorial. You can find out how to use LLVM itself in its respective documentation. All the docs use C++, so you'll have to do some work translating that over to Rust land.


We'll need three things created:

  • Module
  • Context
  • Builder

We'll only create a single, global context, but you'd typically want multiple contexts if you need isolation between, say, multiple threads.

We won't be using the builder, so it's optional at this stage, unless you want to emit some useful LLVM IR.

extern mod rustc = "rustc#0.10-pre";

use rustc::lib::llvm::llvm;

fn main() {
    unsafe {
        // Create our first global context.
        let llvm_context = llvm::LLVMContextCreate();

        // Create our module `mod1` within our context.
        let llvm_module = "mod1".with_c_str(|buf| {
            llvm::LLVMModuleCreateWithNameInContext(buf, llvm_context)
        });

        // Create a useless builder.
        let builder = llvm::LLVMCreateBuilderInContext(llvm_context);

        // Dump the output of the LLVM module in IR format.
        llvm::LLVMDumpModule(llvm_module);
    }
}

That's it. We can build the program with:

rustc src/bin.rs -o bin/rustllvm

Once run, you should get the following output:

; ModuleID = 'mod1'

That's it! Pretty simple, eh?

You can check out the full source code on GitHub.

A Just-in-time Compiler In Rust

Today, we're going to build a simple, very simple, JIT compiler in Rust. Rust is a safe, concurrent, and practical language that aims to replace C++ and become a better systems language.

Now, we'll only be implementing the actual just-in-time compiler, not a language compiler, nor the encoding of machine instructions; the latter requires a lot of knowledge about a CPU instruction specification, such as x86.

If you'd like to use a production grade JIT compiler, there's LLVM and LibJit, just to name a couple.


What exactly is a just-in-time (JIT) compiler? I think the following quote does it justice.

Whenever a program, while running, creates and runs some new executable code which was not part of the program when it was stored on disk, it’s a JIT. - Eli Bendersky


Before we get started, you'll need to get yourself a copy of the Rust compiler. The current version, as of this writing, is 0.9. I'll be updating this article to be applicable to future versions. But if I don't get to it in time, let me know.

Be aware that the Rust compiler takes a fairly long time to compile, mainly because it's a bootstrapped compiler; the compiler itself is written in Rust. Yeah, that might be confusing, but it makes the development of the language much more streamlined at the cost of more complexity. Currently, as far as I'm aware, the Rust compiler must compile itself 3 times. It also has to compile its dependencies, such as LLVM and libuv, but only once.

Also, make sure you have enough RAM (about 1.5GB) leftover, or dedicated for the compilation process of the Rust compiler. If you start paging/swapping, it'll compile much slower.

Safety & Interop

Let's begin by talking about safety, considering it's one of Rust's core principles. Creating a JIT compiler was one of the first things I tried implementing within Rust. Why? Well, it took me straight down to the core system, so I was able to clearly see how good Rust was at handling low-level programming and C interop.

Most of the constructs within a JIT compiler that handle any logic will be unsafe. Remember, I'm talking about the JIT being the logic that handles the dynamic execution of machine code. If you built a VM around this, or a full JIT compiler, you could minimize the amount of unsafe code you use. It's much easier to prove to yourself that 1000 lines of code are correct and let the compiler prove the rest, rather than having to prove nearly everything correct yourself. This is what Rust offers: a way to minimize the amount of unsafe code you write and read.

The end result would ultimately be to expose safe interfaces to all logic. That's how we're going to work.

Rust Modules

Let's define the files, effectively modules, that we're going to write.

Here's our project's directory contents:

src/
  main.rs
  raw.rs
  region.rs
  safe.rs

We'll be compiling all our examples with rustc, Rust's compiler executable:

mkdir -p bin && rustc -Z debug-info src/main.rs -o bin/jit

We simply create a new folder bin that will hold the compiled program. You can place this into a Makefile if you'd like, for simplicity.

main.rs will contain our program logic. This will contain our examples that will use our JIT compiler.

raw.rs contains the C function interfaces. libc is linked by default with Rust programs, so no extra effort is needed to include the library.

region.rs will hold a Rust-idiomatic struct MappedRegion that contains some raw pointers, which are unsafe. We'll also implement a couple of traits for it.

safe.rs will contain safe interfaces to native C functions.

Let's begin with the raw.rs module. We'll begin by including the libc module.

// src/raw.rs
use std::libc;

Next, we'll want to define some external functions, C functions. These are all unsafe.

// src/raw.rs
use std::libc;

extern {}

You don't need to fully understand the next function interfaces. They're interfaces to libc functions, such as mmap, memcpy, etc...

// src/raw.rs
use std::libc;

extern {

    pub fn mmap(
        addr : *libc::c_char,
        length : libc::size_t,
        prot : libc::c_int,
        flags  : libc::c_int,
        fd   : libc::c_int,
        offset : libc::off_t
    ) -> *u8;

    pub fn munmap(
        addr : *u8,
        length : libc::size_t
    ) -> libc::c_int;

    pub fn mprotect(
        addr: *libc::c_void,
        length: libc::size_t,
        prot: libc::c_int
    ) -> libc::c_int;

    pub fn memcpy(
        dest: *libc::c_void,
        src: *libc::c_void,
        n: libc::size_t
    ) -> *libc::c_void;
}

Tip: A *T defines an immutable raw pointer, equivalent to const T* in C. *mut T is equivalent to T*, a normal C pointer.

We don't need to define the actual contents of these functions, because they will be included in the compiled binary by linking with libc.

Next, we'll also include some flags that interop with these functions.

// src/raw.rs
use std::libc;

extern {...}

pub static PROT_NONE   : libc::c_int = 0x0;
pub static PROT_READ   : libc::c_int = 0x1;
pub static PROT_WRITE  : libc::c_int = 0x2;
pub static PROT_EXEC   : libc::c_int = 0x4;

pub static MAP_SHARED  : libc::c_int = 0x1;
pub static MAP_PRIVATE : libc::c_int = 0x2;

Awesome, we're now done with the src/raw.rs module. As you can see, it's dead simple to interop with C code.

Because Rust is trying to provide a way to build safer software, we need to take advantage of this. Thus, we want to wrap the main portion of the state in a safe struct.

But, before we get into that, let's go through how we can execute machine code dynamically.

Execution

The first thing we need is to store our instructions somewhere in memory. Ok, so how about malloc? While malloc works for traditional data, we have a very specific requirement: we need to control the memory's protection flags, and malloc doesn't give us that ability.

mmap is exactly what we need: it creates a new memory-mapped region with custom protection. Initially, we only need to read and write to the region. But we'll then need to change the protection to turn off writing and enable execution. By default, you cannot execute normal malloced memory. Well, you can try, but the program will blow up for security reasons. Having a memory region that is both writable and executable is dangerously insecure. That's why we need to do it in steps.

  • Allocate a new memory region of the size we need.
  • Make the region readable and writable.
  • Commit our instructions to that region.
  • Make it read-only and executable.

Let's get started on src/region.rs. When we create a new memory region with mmap, we'll receive a pointer to the beginning of the new memory block. This isn't safe, so we'll wrap it in safer constructs.

// src/region.rs
use std::os;
use std;

mod raw;

Let's include the specific modules we need from the standard library, then define a local module we need to use — our raw module containing the C interfaces/prototypes.

// src/region.rs
use std::os;
use std;

mod raw;

pub struct MappedRegion {
    addr: *u8,
    len: u64
}

We're defining a new struct that wraps the dirty, raw pointer and holds the length of the memory region.

The following trait implementations are for printing and memory deallocation, respectively.

// src/region.rs
...

impl std::fmt::Default for MappedRegion {
    fn fmt(value: &MappedRegion, f: &mut std::fmt::Formatter) {
        write!(f.buf, "MappedRegion\{ {}, {} \}",
          value.addr, value.len
        );
    }
}

// src/region.rs
...

impl Drop for MappedRegion {
    #[inline(never)]
    fn drop(&mut self) {
        unsafe {
            if raw::munmap(self.addr, self.len) < 0 {
                fail!(format!("munmap({}, {}): {}", 
                  self.addr,
                  self.len, 
                  os::last_os_error()
                ));
            }
        }
    }
}

I'm not going to go into much detail on the Drop trait, but this destructor will be called whenever the owner of the MappedRegion instance has gone out of scope.


As with the Rust model of providing safe interfaces, we need to define friendlier functions than the raw C functions.

Let's create our safe.rs module.

// src/safe.rs
use region::MappedRegion;
use std::libc::{c_char, size_t, c_void};
use std::libc;
use std::os;

mod raw;
mod region;

We'll start off by including some modules.

// src/safe.rs
...

pub fn mmap(size: u64) -> Result<~MappedRegion, ~str> {
    unsafe {
        let buf = raw::mmap(
            0 as *libc::c_char,
            size,
            libc::PROT_READ | libc::PROT_WRITE,
            libc::MAP_PRIVATE | libc::MAP_ANON,
            -1,
            0
        );

        if buf == -1 as *u8 {
          Err(os::last_os_error())
        } else {
          Ok(~MappedRegion{ addr: buf, len: size })
        }
    }
}

Now we've defined a safe version of mmap. Thus, we don't accept unsafe inputs or outputs; no null pointers allowed.

Result<~MappedRegion, ~str>

Result is one of Rust's types that provides safer and more expressive code. Result has two cases: Ok and Err. This allows us to pattern match accordingly.

// example
let region = match mmap(1024) {
  Ok(r) => r,
  Err(err) => fail!(err)
};

We also return an owned pointer to the MappedRegion if the function succeeds. We could've returned the instance by value, but we'll need to pass it around to multiple functions, so we want to reduce copying.


libc::PROT_READ | libc::PROT_WRITE

These flags are important. They define what protection/permissions the memory region has. We can now read and write to the memory region.

Remember that the contents of this safe function are wrapped within an unsafe block. This is needed to interop with naked, unsafe code.

Remember, it's much easier to understand and prove the correctness of one thousand lines of clearly-marked unsafe code, which can be reasoned about, than hundreds of thousands of lines that might be unsafe anywhere, which cannot.

We can now create a new memory-mapped region using safe interfaces. But the memory doesn't have anything written to it yet. We'll use the memcpy function to copy the machine instructions into the memory region. Let's write a safe interface around the native function.

// src/safe.rs
...

pub fn memcpy(region: &MappedRegion, contents: &[u8]) {
    unsafe {
        raw::memcpy(
            region.addr as * c_void,
            contents.as_ptr() as *c_void,
            region.len as size_t);
        assert_eq!(*(contents.as_ptr()), *region.addr);
    }
}

We take a reference, or borrowed pointer, to a MappedRegion:

region: &MappedRegion

And a reference, or borrowed pointer, to a vector of u8s:

contents: &[u8]

Again, we operate within an unsafe block. memcpy works practically the same as it does in C.


// src/safe.rs
...

pub fn mprotect(region: &MappedRegion, contents: &[u8]) {
    unsafe {
        if raw::mprotect(
            region.addr as *libc::c_void,
            contents.len() as libc::size_t,
            libc::PROT_READ | libc::PROT_EXEC
        ) == -1 {
            fail!("err: mprotect failed to protect the memory region.");
        }
    }
}

The last function we need to define takes the same arguments as memcpy. This is how we transform the memory region to be read-only and executable after we write to it.


Now we have src/safe.rs, src/raw.rs, and src/region.rs completed. We can now put these pieces together to make a functional JIT compiler.

Let's move onto our src/main.rs file.

We'll start by defining our crate:

// src/main.rs
#[crate_id = "jiter#0.0.1"];
#[desc = "Jiter"];
#[crate_type = "bin"];
#[license = "MIT"];

Include some modules we'll need:

// src/main.rs
...
use std::cast;
use region::MappedRegion;

Include some local modules:

// src/main.rs
...
mod raw;
mod region;
mod safe;

Before we go any further, let's go through the machine code we'll be generating. We're defining a function (all JIT code needs to be wrapped in some sort of callable function) that takes a single integer as its only input, adds four to it, and returns the result. We'll be using the standard C calling convention for x86-64: the argument arrives in %rdi and the result is returned in %rax.

mov %rdi, %rax
add $4, %rax
ret

Which is compiled/encoded to:

0x48 0x89 0xf8       // mov %rdi, %rax
0x48 0x83 0xc0 0x04  // add $4, %rax
0xc3                 // ret

We can express the code using a vector within Rust.

let code = [
  0x48, 0x89, 0xf8,       // mov %rdi, %rax
  0x48, 0x83, 0xc0, 0x04, // add $4, %rax
  0xc3                    // ret
];

Let's define our main function.

// src/main.rs
...

fn main() {
  let code = [
    0x48, 0x89, 0xf8,       // mov %rdi, %rax
    0x48, 0x83, 0xc0, 0x04, // add $4, %rax
    0xc3                    // ret
  ];

  let region = match safe::mmap(code.len() as u64) {
      Ok(r) => r,
      Err(err) => fail!(err)
  };

  type AddFourFn = extern "C" fn(int) -> int;
  let Add = jit_func::<AddFourFn>(region, code);
  println!("Add(4): {}", Add(4));
}

This is the same pattern as the earlier example using the safe mmap function. We assign the new MappedRegion to region.

// src/main.rs
...
let region = match safe::mmap(code.len() as u64) {
    Ok(r) => r,
    Err(err) => fail!(err)
};
...

Rust is extremely good at interoperating with other runtimes, such as C. We'll define a basic function pointer type that takes an int and returns an int. We define it as a type alias to make it easier to use.

// src/main.rs
...
type AddFourFn = extern "C" fn(int) -> int;
...

Next comes the magic bit: jit_func. We haven't defined this function yet; we'll get to that shortly.

We'll pass our function pointer type as a generic argument, along with the code (our encoded x86-64 instructions) and region, which holds our mmapped memory block.

// src/main.rs
...
let Add = jit_func::<AddFourFn>(region, code);
...

Call the awesome function:

// src/main.rs
...
println!("Add(4): {}", Add(4)); // Add(4): 8
...

Let's define the jit_func function. This should be placed right before the main function.

// src/main.rs
...
fn jit_func<T>(region: &MappedRegion, contents: &[u8]) -> T {
    unsafe {
        safe::memcpy(region, contents);
        safe::mprotect(region, contents);
        assert_eq!(*(contents.as_ptr()), *region.addr);
        cast::transmute(region.addr)
    }
}

fn main() {...}

So, the basic steps are:

  • safe::memcpy: Copy the instructions into the memory block.
  • safe::mprotect: Make the memory read-only, but executable.
  • assert_eq!(*(contents.as_ptr()), *region.addr);: Sanity-check that the first byte of the memory block matches the first byte of our vector.
  • cast::transmute(region.addr): We need to transform a raw pointer, which points to the beginning of the JIT function, to a C-style function pointer.

The result of cast::transmute(region.addr) will be cast to the specific JIT function type (the generic argument T). In this example, it's the function type extern "C" fn(int) -> int.


Also, the reason we didn't put the safe::mmap call inside the jit_func function is that we'd run into a segfault while trying to execute our JITed function. As soon as jit_func returned, our mmapped memory would be deallocated, because the MappedRegion unique pointer would have gone out of scope.

A better way would be to split the functionality into many steps.

type AddFourFn = extern "C" fn(int) -> int;
let add = jit::func::<AddFourFn>();
add.emit_mov("%rdi", "%rax");
add.emit_add(4, "%rax");
add.emit_ret();
let addFn = add.getFunction();
addFn(4);

This would include having instruction encoding for various architectures and such.

Or you could just use LLVM or LibJit. But it's a good educational experience to learn, understand, and implement a JIT compiler yourself.

You can view the full source code on GitHub.

