
How To Use Low-Level OS Tools To Solve A High-Level Configuration Problem

Gahl Saraf
May 29, 2022

During onboarding, a customer reached out to our support team to report a problem they had encountered. As onboarding is extremely important to us, we jumped right into resolving the issue.

According to their report, the main process of one of their containers exited immediately after starting. They tried `raftt restart` to bring it back up, but to no avail. The log files (`raftt logs ui`) showed that the process reached a certain point successfully and then stopped. However, when run locally with `docker run`, the process ran successfully! Intriguing… 🙂

Metadata:

- Process command line: `yarn start`

- `react-scripts` version: 3.4.1

- Container image: based on Ubuntu

### Starting to debug: creating working and broken setups

Our first priority was to create two setups, one in which the problem reproduced and one in which `yarn` worked fine.

We started by `ssh-ing` into the container and running `yarn start` manually. Surprisingly, this worked! That meant there was a difference between how `raftt` ran the process and how it ran from the command line. We first suspected a working directory or environment variable mismatch, but quickly confirmed these were identical.

For the broken setup, we just restarted the process through `raftt`: `raftt restart service ui`.

With these two setups, we could move on to the next stage:

### Finding the differences

`yarn start` is a complex command. It spawns a shell script which execs node, which in turn creates a whole tree of processes. To find the problem efficiently, we reached for `strace`, which logs all system calls made by a process. Running `strace -ff -o /tmp/good yarn start` follows child processes and writes a separate `/tmp/good.PID` file for each process spawned. The result looked something like this:

```bash
root@ui-test:/tmp# ls
good.492  good.499  good.506  good.513  good.520  good.527  good.534  good.541
good.493  good.500  good.507  good.514  good.521  good.528  good.535  good.542
good.494  good.501  good.508  good.515  good.522  good.529  good.536  good.543
good.495  good.502  good.509  good.516  good.523  good.530  good.537  good.544
good.496  good.503  good.510  good.517  good.524  good.531  good.538  good.545
good.497  good.504  good.511  good.518  good.525  good.532  good.539
good.498  good.505  good.512  good.519  good.526  good.533  good.540

```

Some files were much larger than others. Some of the processes ended after being sent a SIGTERM, while others exited successfully.
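
One way to triage a batch like this without opening every file is to look at the last line of each trace: strace closes a trace with `+++ exited with N +++` for a normal exit, or `+++ killed by SIGNAL +++` when a signal ended the process. The following is a minimal sketch of that idea. `classifyTrace` is a hypothetical helper name (not part of strace or raftt), and the sample traces are made up for illustration rather than taken from the customer's environment.

```javascript
// Classify how a traced process ended, based on the last non-empty line
// of its strace output. `classifyTrace` is a hypothetical helper name.
function classifyTrace(traceText) {
  const lines = traceText.split('\n').filter((line) => line.trim() !== '');
  const last = lines.length > 0 ? lines[lines.length - 1] : '';

  const exited = last.match(/^\+\+\+ exited with (\d+) \+\+\+$/);
  if (exited) {
    return { kind: 'exited', code: Number(exited[1]) };
  }
  const killed = last.match(/^\+\+\+ killed by (SIG[A-Z0-9]+)/);
  if (killed) {
    return { kind: 'killed', signal: killed[1] };
  }
  // The trace was cut off, or the process outlived the strace session.
  return { kind: 'unknown' };
}

// Made-up sample traces (assumption: simplified, not real captured output).
const cleanExit = 'close(3) = 0\nexit_group(0) = ?\n+++ exited with 0 +++\n';
const terminated = 'epoll_wait(3,  <unfinished ...>\n+++ killed by SIGTERM +++\n';

console.log(classifyTrace(cleanExit));
console.log(classifyTrace(terminated));
```

In practice each `good.PID` file would be read with `fs.readFileSync` and passed to the helper, which makes it easy to list only the processes that did not exit cleanly.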

Running the same command through raftt (`raftt restart service ui -- strace -ff -o /tmp/bad yarn start`) got us a similar batch of `bad.PID` files.

One of the last lines printed by `yarn start`, in both the successful and unsuccessful flows, was `Starting the development server...`. Running `grep -r "Starting the development server" .` yielded a single file, one of the larger ones. Looking for the `exec` syscall in that file (strace logs it as `execve`) pointed us to the process being run: `/usr/local/bin/node /app/node_modules/react-app-rewired/scripts/start.js`. A quick test showed that running this directly behaves the same as running `yarn start`: it works from a shell but not through `raftt`. The code being run is this: [https://github.com/timarney/react-app-rewired/blob/master/scripts/start.js](https://github.com/timarney/react-app-rewired/blob/master/scripts/start.js), which leads to [https://github.com/replicatedhq/react-scripts/blob/master/scripts/start.js](https://github.com/replicatedhq/react-scripts/blob/master/scripts/start.js)

The older version of the code that the customer was using, 3.4.1, contained this bit (added in [Commit 7e6d6cd](https://github.com/facebook/create-react-app/commit/7e6d6cd05f3054723c8b015c813e13761659759e)):
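
The grep-then-find-the-exec step can also be scripted. The sketch below assumes the trace files have already been read into an object keyed by file name. `findExecForMarker`, the file names, and the trace contents are all hypothetical and made up for illustration. It relies only on the fact that strace logs a successful exec as an `execve(...) = 0` line.

```javascript
// Given trace contents keyed by file name, find the traces that mention a
// marker string and report the program each one exec'ed.
// `findExecForMarker` is a hypothetical helper name.
function findExecForMarker(traces, marker) {
  const hits = [];
  for (const [fileName, text] of Object.entries(traces)) {
    if (!text.includes(marker)) continue;
    // A successful exec appears as: execve("path", ["argv0", ...], ...) = 0
    const exec = text.split('\n').find((line) => /^execve\(.*\) = 0$/.test(line));
    hits.push({ fileName, exec: exec || null });
  }
  return hits;
}

// Made-up sample data (assumption: simplified, not real captured output).
const traces = {
  'trace.1': [
    'execve("/usr/bin/yarn", ["yarn", "start"], 0x7ffc0 /* 10 vars */) = 0',
    '+++ exited with 0 +++',
  ].join('\n'),
  'trace.2': [
    'execve("/usr/local/bin/node", ["/usr/local/bin/node", "/app/node_modules/react-app-rewired/scripts/start.js"], 0x7ffc0 /* 10 vars */) = 0',
    'write(1, "Starting the development server."..., 35) = 35',
  ].join('\n'),
};

const matches = findExecForMarker(traces, 'Starting the development server');
console.log(matches);
```

Only the trace that wrote the marker is reported, together with the `execve` line that names the binary and script to rerun on their own.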


```js
if (isInteractive || process.env.CI !== 'true') {
  // Gracefully exit when stdin ends
  process.stdin.on('end', function() {
    devServer.close();
    process.exit();
  });
  process.stdin.resume();
}
```

The `ui` container did not have `CI` defined in its env, so this handler was installed, and the process exits as soon as `stdin` reaches end-of-file!
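
The effect is easy to reproduce outside of `react-scripts`. The sketch below is a simplified stand-in, not the real `start.js`: the child script copies the check above and uses a timer in place of the dev server. The parent starts it with `stdio: 'ignore'` for stdin, which makes Node.js attach the null device to the child's fd 0. `runWithoutStdin` and the printed messages are hypothetical names chosen for this example.

```javascript
const { spawnSync } = require('child_process');

// Child script mirroring the react-scripts 3.4.1 check, with a timer standing
// in for the dev server. Simplified stand-in, not the real start.js.
const childScript = `
const isInteractive = process.stdout.isTTY;
if (isInteractive || process.env.CI !== 'true') {
  process.stdin.on('end', function () {
    console.log('stdin ended, exiting');
    process.exit();
  });
  process.stdin.resume();
}
setTimeout(function () { console.log('server still running'); }, 500);
`;

// Run the child with stdin connected to the null device, the way a
// supervisor that never attaches stdin would start it.
function runWithoutStdin(ciValue) {
  const env = { ...process.env };
  delete env.CI; // make sure an inherited CI value does not skew the result
  if (ciValue !== undefined) env.CI = ciValue;
  const result = spawnSync(process.execPath, ['-e', childScript], {
    stdio: ['ignore', 'pipe', 'inherit'],
    env,
    encoding: 'utf8',
  });
  return result.stdout;
}

console.log('CI unset:', JSON.stringify(runWithoutStdin(undefined)));
console.log('CI=true: ', JSON.stringify(runWithoutStdin('true')));
```

With `CI` unset, the child should hit the `end` handler and exit before the timer fires. With `CI=true`, the handler is never installed and the timer gets to run.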

Our code that starts the process looked like this:

```go
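// Start the service's entrypoint and forward its output streams.
cmd := exec.CommandContext(ctx, args.entrypoint, args.args...)
cmd.Stdout = args.stdout
cmd.Stderr = args.stderr
cmd.Dir = args.workdir
// cmd.Stdin is never set, so os/exec connects the child's stdin to the
// null device (os.DevNull) and its first read returns EOF.
...
if err := cmd.Start(); err != nil {
...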
```


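We didn’t start the process with `stdin`! With nothing attached, the child's `stdin` reaches end-of-file right away, the `end` handler fires, and the dev server shuts itself down. From an interactive shell, `stdin` is the terminal and stays open, which is why the manual run worked. We quickly tried adding `CI=true` to the env, which solved the problem.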

### So let’s wrap it up

Both Docker and Kubernetes start the container's main process without stdin by default, so from that perspective Raftt’s behavior was correct. We just need to be aware of it and use one of the following workarounds:

- Set `stdin_open: true` in the docker-compose file

- Set `CI=true` in the environment variables

- Upgrade to a newer version of `react-scripts` :p

Want to learn more? Try our beta now at [raftt.io](http://raftt.io/)
