I am trying to build Jupyter notebook in docker following the guide here:
https://github.com/cordon-thiago/airflow-spark
and got an error with exit code: 8.
I ran:
$ docker build --rm --force-rm -t jupyter/pyspark-notebook:3.0.1 .
the building stops at the code:
RUN wget -q $(wget -qO- https://www.apache.org/dyn/closer.lua/spark/spark-${APACHE_SPARK_VERSION}/spark-${APACHE_SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz\?as_json | \
python -c "import sys, json; content=json.load(sys.stdin); print(content['preferred']+content['path_info'])") && \
echo "${spark_checksum} *spark-${APACHE_SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz" | sha512sum -c - && \
tar xzf "spark-${APACHE_SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz" -C /usr/local --owner root --group root --no-same-owner && \
rm "spark-${APACHE_SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz"
with error message like below:
=> ERROR [4/9] RUN wget -q $(wget -qO- https://www.apache.org/dyn/closer.lua/spark/spark-3.0.1/spark-3.0.1-bin-hadoop2.7.tgz?as_json | python -c "import sys, json; content=json.load(sys.stdin); 2.3s
------
> [4/9] RUN wget -q $(wget -qO- https://www.apache.org/dyn/closer.lua/spark/spark-3.0.1/spark-3.0.1-bin-hadoop2.7.tgz?as_json | python -c "import sys, json; content=json.load(sys.stdin); print(content[
'preferred']+content['path_info'])") && echo "F4A10BAEC5B8FF1841F10651CAC2C4AA39C162D3029CA180A9749149E6060805B5B5DDF9287B4AA321434810172F8CC0534943AC005531BB48B6622FBE228DDC *spark-3.0.1-bin-hadoop2.7.
tgz" | sha512sum -c - && tar xzf "spark-3.0.1-bin-hadoop2.7.tgz" -C /usr/local --owner root --group root --no-same-owner && rm "spark-3.0.1-bin-hadoop2.7.tgz":
------
executor failed running [/bin/bash -o pipefail -c wget -q $(wget -qO- https://www.apache.org/dyn/closer.lua/spark/spark-${APACHE_SPARK_VERSION}/spark-${APACHE_SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz\
?as_json | python -c "import sys, json; content=json.load(sys.stdin); print(content['preferred']+content['path_info'])") && echo "${spark_checksum} *spark-${APACHE_SPARK_VERSION}-bin-hadoop${HADOOP_
VERSION}.tgz" | sha512sum -c - && tar xzf "spark-${APACHE_SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz" -C /usr/local --owner root --group root --no-same-owner && rm "spark-${APACHE_SPARK_VERSION}
-bin-hadoop${HADOOP_VERSION}.tgz"]: exit code: 8
Really appreciate if someone can enlighten me on this. Thanks!
The exit code 8 is likely from wget meaning an error response from the server. As an example, this path that the Dockerfile tries to wget from isn't valid anymore: https://www.apache.org/dyn/closer.lua/spark/spark-3.0.1/spark-3.0.1-bin-hadoop2.7.tgz
From the issues on the repo, it appears that Apache version 3.0.1 is no longer valid so you should override the APACHE_SPARK version to 3.0.2 with a --build-arg:
docker build --rm --force-rm \
--build-arg spark_version=3.0.2 \
-t jupyter/pyspark-notebook:3.0.2 .
EDIT
See comment below for more, the command that worked was:
docker build --rm --force-rm \
--build-arg spark_version=3.1.1 \
--build-arg hadoop_version=2.7 \
-t jupyter/pyspark-notebook:3.1.1 .
And updated spark checksum to reflect the version for 3.1.1: https://downloads.apache.org/spark/spark-3.1.1/spark-3.1.1-bin-hadoop2.7.tgz.sha512
For this answer to be relevant in the future, it will likely need to update versions and checksum again for the latest spark/hadoop versions.
Is it possible to generate a Dockerfile from an image? I want to know for two reasons:
I can download images from the repository but would like to see the recipe that generated them.
I like the idea of saving snapshots, but once I am done it would be nice to have a structured format to review what was done.
How to generate or reverse a Dockerfile from an image?
You can. Mostly.
Notes: It does not generate a Dockerfile that you can use directly with docker build; the output is just for your reference.
alias dfimage="docker run -v /var/run/docker.sock:/var/run/docker.sock --rm alpine/dfimage"
dfimage -sV=1.36 nginx:latest
It will pull the target docker image automatically and export Dockerfile. Parameter -sV=1.36 is not always required.
Reference: https://hub.docker.com/r/alpine/dfimage
Now hub.docker.com shows the image layers with detail commands directly, if you choose a particular tag.
Bonus
If you want to know which files are changed in each layer
alias dive="docker run -ti --rm -v /var/run/docker.sock:/var/run/docker.sock wagoodman/dive"
dive nginx:latest
On the left, you see each layer's command, on the right (jump with tab), the yellow line is the folder that some files are changed in that layer
(Use SPACE to collapse dir)
Old answer
below is the old answer, it doesn't work any more.
$ docker pull centurylink/dockerfile-from-image
$ alias dfimage="docker run -v /var/run/docker.sock:/var/run/docker.sock --rm centurylink/dockerfile-from-image"
$ dfimage --help
Usage: dockerfile-from-image.rb [options] <image_id>
-f, --full-tree Generate Dockerfile for all parent layers
-h, --help Show this message
To understand how a docker image was built, use the
docker history --no-trunc command.
You can build a docker file from an image, but it will not contain everything you would want to fully understand how the image was generated. Reasonably what you can extract is the MAINTAINER, ENV, EXPOSE, VOLUME, WORKDIR, ENTRYPOINT, CMD, and ONBUILD parts of the dockerfile.
The following script should work for you:
#!/bin/bash
docker history --no-trunc "$1" | \
sed -n -e 's,.*/bin/sh -c #(nop) \(MAINTAINER .*[^ ]\) *0 B,\1,p' | \
head -1
docker inspect --format='{{range $e := .Config.Env}}
ENV {{$e}}
{{end}}{{range $e,$v := .Config.ExposedPorts}}
EXPOSE {{$e}}
{{end}}{{range $e,$v := .Config.Volumes}}
VOLUME {{$e}}
{{end}}{{with .Config.User}}USER {{.}}{{end}}
{{with .Config.WorkingDir}}WORKDIR {{.}}{{end}}
{{with .Config.Entrypoint}}ENTRYPOINT {{json .}}{{end}}
{{with .Config.Cmd}}CMD {{json .}}{{end}}
{{with .Config.OnBuild}}ONBUILD {{json .}}{{end}}' "$1"
I use this as part of a script to rebuild running containers as images:
https://github.com/docbill/docker-scripts/blob/master/docker-rebase
The Dockerfile is mainly useful if you want to be able to repackage an image.
The thing to keep in mind, is a docker image can actually just be the tar backup of a real or virtual machine. I have made several docker images this way. Even the build history shows me importing a huge tar file as the first step in creating the image...
I somehow absolutely missed the actual command in the accepted answer, so here it is again, bit more visible in its own paragraph, to see how many people are like me
$ docker history --no-trunc <IMAGE_ID>
A bash solution :
docker history --no-trunc $argv | tac | tr -s ' ' | cut -d " " -f 5- | sed 's,^/bin/sh -c #(nop) ,,g' | sed 's,^/bin/sh -c,RUN,g' | sed 's, && ,\n & ,g' | sed 's,\s*[0-9]*[\.]*[0-9]*\s*[kMG]*B\s*$,,g' | head -n -1
Step by step explanations:
tac : reverse the file
tr -s ' ' trim multiple whitespaces into 1
cut -d " " -f 5- remove the first fields (until X months/years ago)
sed 's,^/bin/sh -c #(nop) ,,g' remove /bin/sh calls for ENV,LABEL...
sed 's,^/bin/sh -c,RUN,g' remove /bin/sh calls for RUN
sed 's, && ,\n & ,g' pretty print multi command lines following Docker best practices
sed 's,\s*[0-9]*[\.]*[0-9]*\s*[kMG]*B\s*$,,g' remove layer size information
head -n -1 remove last line ("SIZE COMMENT" in this case)
Example:
~ dih ubuntu:18.04
ADD file:28c0771e44ff530dba3f237024acc38e8ec9293d60f0e44c8c78536c12f13a0b in /
RUN set -xe
&& echo '#!/bin/sh' > /usr/sbin/policy-rc.d
&& echo 'exit 101' >> /usr/sbin/policy-rc.d
&& chmod +x /usr/sbin/policy-rc.d
&& dpkg-divert --local --rename --add /sbin/initctl
&& cp -a /usr/sbin/policy-rc.d /sbin/initctl
&& sed -i 's/^exit.*/exit 0/' /sbin/initctl
&& echo 'force-unsafe-io' > /etc/dpkg/dpkg.cfg.d/docker-apt-speedup
&& echo 'DPkg::Post-Invoke { "rm -f /var/cache/apt/archives/*.deb /var/cache/apt/archives/partial/*.deb /var/cache/apt/*.bin || true"; };' > /etc/apt/apt.conf.d/docker-clean
&& echo 'APT::Update::Post-Invoke { "rm -f /var/cache/apt/archives/*.deb /var/cache/apt/archives/partial/*.deb /var/cache/apt/*.bin || true"; };' >> /etc/apt/apt.conf.d/docker-clean
&& echo 'Dir::Cache::pkgcache ""; Dir::Cache::srcpkgcache "";' >> /etc/apt/apt.conf.d/docker-clean
&& echo 'Acquire::Languages "none";' > /etc/apt/apt.conf.d/docker-no-languages
&& echo 'Acquire::GzipIndexes "true"; Acquire::CompressionTypes::Order:: "gz";' > /etc/apt/apt.conf.d/docker-gzip-indexes
&& echo 'Apt::AutoRemove::SuggestsImportant "false";' > /etc/apt/apt.conf.d/docker-autoremove-suggests
RUN rm -rf /var/lib/apt/lists/*
RUN sed -i 's/^#\s*\(deb.*universe\)$/\1/g' /etc/apt/sources.list
RUN mkdir -p /run/systemd
&& echo 'docker' > /run/systemd/container
CMD ["/bin/bash"]
Update Dec 2018 to BMW's answer
chenzj/dfimage - as described on hub.docker.com regenerates Dockerfile from other images. So you can use it as follows:
docker pull chenzj/dfimage
alias dfimage="docker run -v /var/run/docker.sock:/var/run/docker.sock --rm chenzj/dfimage"
dfimage IMAGE_ID > Dockerfile
This is derived from #fallino's answer, with some adjustments and simplifications by using the output format option for docker history. Since macOS and Gnu/Linux have different command-line utilities, a different version is necessary for Mac. If you only need one or the other, you can just use those lines.
#!/bin/bash
case "$OSTYPE" in
linux*)
docker history --no-trunc --format "{{.CreatedBy}}" $1 | # extract information from layers
tac | # reverse the file
sed 's,^\(|3.*\)\?/bin/\(ba\)\?sh -c,RUN,' | # change /bin/(ba)?sh calls to RUN
sed 's,^RUN #(nop) *,,' | # remove RUN #(nop) calls for ENV,LABEL...
sed 's, *&& *, \\\n \&\& ,g' # pretty print multi command lines following Docker best practices
;;
darwin*)
docker history --no-trunc --format "{{.CreatedBy}}" $1 | # extract information from layers
tail -r | # reverse the file
sed -E 's,^(\|3.*)?/bin/(ba)?sh -c,RUN,' | # change /bin/(ba)?sh calls to RUN
sed 's,^RUN #(nop) *,,' | # remove RUN #(nop) calls for ENV,LABEL...
sed $'s, *&& *, \\\ \\\n \&\& ,g' # pretty print multi command lines following Docker best practices
;;
*)
echo "unknown OSTYPE: $OSTYPE"
;;
esac
It is not possible at this point (unless the author of the image explicitly included the Dockerfile).
However, it is definitely something useful! There are two things that will help to obtain this feature.
Trusted builds (detailed in this docker-dev discussion
More detailed metadata in the successive images produced by the build process. In the long run, the metadata should indicate which build command produced the image, which means that it will be possible to reconstruct the Dockerfile from a sequence of images.
If you are interested in an image that is in the Docker hub registry and wanted to take a look at Dockerfile?.
Example:
If you want to see the Dockerfile of image "jupyter/datascience-notebook" type the word "Dockerfile" in the address bar of your browser as shown below.
https://hub.docker.com/r/jupyter/datascience-notebook/
https://hub.docker.com/r/jupyter/datascience-notebook/Dockerfile
Note:
Not all the images have Dockerfile, for example, https://hub.docker.com/r/redislabs/redisinsight/Dockerfile
Sometimes this way is much faster than searching for Dockerfile in Github.
docker pull chenzj/dfimage
alias dfimage="docker run -v /var/run/docker.sock:/var/run/docker.sock --rm chenzj/dfimage"
dfimage image_id
Below is the output of the dfimage command:
$ dfimage 0f1947a021ce
FROM node:8
WORKDIR /usr/src/app
COPY file:e76d2e84545dedbe901b7b7b0c8d2c9733baa07cc821054efec48f623e29218c in ./
RUN /bin/sh -c npm install
COPY dir:a89a4894689a38cbf3895fdc0870878272bb9e09268149a87a6974a274b2184a in .
EXPOSE 8080
CMD ["npm" "start"]
it is possible in just two step. First pull the image then run docker history command. also, shown in SS.
docker pull kalilinux/kali-rolling
docker history --format "{{.CreatedBy}}" kalilinux/kali-rolling --no-trunc
What is image2df
image2df is tool for Generate Dockerfile by an image.
This tool is very useful when you only have docker image and need to generate a Dockerfile whit it.
How does it work
Reverse parsing by history information of an image.
How to use this image
# Command alias
echo "alias image2df='docker run -v /var/run/docker.sock:/var/run/docker.sock --rm cucker/image2df'" >> ~/.bashrc
. ~/.bashrc
# Excute command
image2df <IMAGE>
See help
docker run --rm cucker/image2df --help
For example
$ echo "alias image2df='docker run -v /var/run/docker.sock:/var/run/docker.sock --rm cucker/image2df'" >> ~/.bashrc
$ . ~/.bashrc
$ docker pull mysql
$ image2df mysql
========== Dockerfile ==========
FROM mysql:latest
RUN groupadd -r mysql && useradd -r -g mysql mysql
RUN apt-get update && apt-get install -y --no-install-recommends gnupg dirmngr && rm -rf /var/lib/apt/lists/*
ENV GOSU_VERSION=1.12
RUN set -eux; \
savedAptMark="$(apt-mark showmanual)"; \
apt-get update; \
apt-get install -y --no-install-recommends ca-certificates wget; \
rm -rf /var/lib/apt/lists/*; \
dpkgArch="$(dpkg --print-architecture | awk -F- '{ print $NF }')"; \
wget -O /usr/local/bin/gosu "https://github.com/tianon/gosu/releases/download/$GOSU_VERSION/gosu-$dpkgArch"; \
wget -O /usr/local/bin/gosu.asc "https://github.com/tianon/gosu/releases/download/$GOSU_VERSION/gosu-$dpkgArch.asc"; \
export GNUPGHOME="$(mktemp -d)"; \
gpg --batch --keyserver hkps://keys.openpgp.org --recv-keys B42F6819007F00F88E364FD4036A9C25BF357DD4; \
gpg --batch --verify /usr/local/bin/gosu.asc /usr/local/bin/gosu; \
gpgconf --kill all; \
rm -rf "$GNUPGHOME" /usr/local/bin/gosu.asc; \
apt-mark auto '.*' > /dev/null; \
[ -z "$savedAptMark" ] || apt-mark manual $savedAptMark > /dev/null; \
apt-get purge -y --auto-remove -o APT::AutoRemove::RecommendsImportant=false; \
chmod +x /usr/local/bin/gosu; \
gosu --version; \
gosu nobody true
RUN mkdir /docker-entrypoint-initdb.d
RUN apt-get update && apt-get install -y --no-install-recommends \
pwgen \
openssl \
perl \
xz-utils \
&& rm -rf /var/lib/apt/lists/*
RUN set -ex; \
key='A4A9406876FCBD3C456770C88C718D3B5072E1F5'; \
export GNUPGHOME="$(mktemp -d)"; \
gpg --batch --keyserver ha.pool.sks-keyservers.net --recv-keys "$key"; \
gpg --batch --export "$key" > /etc/apt/trusted.gpg.d/mysql.gpg; \
gpgconf --kill all; \
rm -rf "$GNUPGHOME"; \
apt-key list > /dev/null
ENV MYSQL_MAJOR=8.0
ENV MYSQL_VERSION=8.0.24-1debian10
RUN echo 'deb http://repo.mysql.com/apt/debian/ buster mysql-8.0' > /etc/apt/sources.list.d/mysql.list
RUN { \
echo mysql-community-server mysql-community-server/data-dir select ''; \
echo mysql-community-server mysql-community-server/root-pass password ''; \
echo mysql-community-server mysql-community-server/re-root-pass password ''; \
echo mysql-community-server mysql-community-server/remove-test-db select false; \
} | debconf-set-selections \
&& apt-get update \
&& apt-get install -y \
mysql-community-client="${MYSQL_VERSION}" \
mysql-community-server-core="${MYSQL_VERSION}" \
&& rm -rf /var/lib/apt/lists/* \
&& rm -rf /var/lib/mysql && mkdir -p /var/lib/mysql /var/run/mysqld \
&& chown -R mysql:mysql /var/lib/mysql /var/run/mysqld \
&& chmod 1777 /var/run/mysqld /var/lib/mysql
VOLUME [/var/lib/mysql]
COPY dir:2e040acc386ebd23b8571951a51e6cb93647df091bc26159b8c757ef82b3fcda in /etc/mysql/
COPY file:345a22fe55d3e6783a17075612415413487e7dba27fbf1000a67c7870364b739 in /usr/local/bin/
RUN ln -s usr/local/bin/docker-entrypoint.sh /entrypoint.sh # backwards compat
ENTRYPOINT ["docker-entrypoint.sh"]
EXPOSE 3306 33060
CMD ["mysqld"]
reference
My intention is to run a GUI jar file in Docker so I could automate commands with xdotool and may view it by x11vnc.
This is my Dockerfile:
# WEB 0.1
FROM ubuntu:14.04
RUN apt-get update \
&& apt-get install -y \
default-jre \
x11vnc \
xdotool \
xsel \
xvfb \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
RUN DISPLAY=:1.0 \
&& export DISPLAY \
&& mkdir /root/.vnc \
&& x11vnc -storepasswd 1234 /root/.vnc/passwd \
&& Xvfb :1 -screen 0 493x476x8 & \
x11vnc -display :1.0 -usepw -forever &
ENTRYPOINT ["java"]
CMD ["-jar", "/var/bin/program.jar"]
I run it with:
docker run \
--name program-jar \
-p 5090:5900 \
-v /var/bin/program-jar/:/var/bin/ \
-d program-jar:0.1
But inside this container it is not defined $DISPLAY and is not running x11vnc and Xvfb
root#62febbc0b8f9:/# echo $DISPLAY
root#62febbc0b8f9:/# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 11.7 0.9 4226956 98588 ? Ssl 14:30 0:01 java -jar /var/bin/program.jar
root 26 0.2 0.0 18188 3268 ? Ss 14:30 0:00 /bin/bash
root 41 0.0 0.0 15580 2044 ? R+ 14:30 0:00 ps aux
root#62febbc0b8f9:/#
(If I run those commands in RUN inside bash it work... don't know why RUN seems not to work when run the docker build)
docker uses a layer file system when you RUN it create a separate layer for the installation it is NOT use to run a program but it is use to download source code or build from source code etc. for example RUN mvn package
The way you should do this is create a shell script commonly they call it bootstrap.sh you copy that into your container COPY bootstrap.sh /app or something like that you can then put in this command
#!/bin/bash
DISPLAY=:1.0 \
&& export DISPLAY \
&& mkdir /root/.vnc \
&& x11vnc -storepasswd 1234 /root/.vnc/passwd \
&& Xvfb :1 -screen 0 493x476x8 & \
x11vnc -display :1.0 -usepw -forever &
java -jar /var/bin/program.jar
into your shell script and the last command in your dockerfile change it to CMD ./bootstrap.sh something like that
add to your docker run
command
-v $HOME/.Xauthority:/home/developer/.Xauthority -v /tmp/.X11-unix:/tmp/.X11-unix:ro
and if you need some EXPORT, the
ENV
directive
is designed for this, see
https://docs.docker.com/engine/reference/builder/#env
I have the following ffmpeg-cli command which does not produce the described effect in documentation. Could this be a bug, or I have something wrong with the command.
ffmpeg \
-y \
-i small.mp4 \
-i monkey/monkey_%04d.png \
-filter_complex "[0:v][1:v]overlay=enable='between(t,1,5)'[out1]" \
-map '[out1]' \
output.mp4
I expect it to overlay the #1 stream on top of #0 between seconds 1 and 5.
You may download the test tarball from this link:
https://drive.google.com/file/d/0BxIQVP1zErDPYXRveG9hN0c0Qjg/view?usp=sharing
It includes assets for the test case.
The build I tried with:
ffmpeg-3.0.2-64bit-static (available online)
FFmpeg is a time-based processor i.e. it aligns packets by timestamps, so you have to align the start of the image sequence to the start of the overlay.
ffmpeg \
-y \
-i small.mp4 \
-i monkey/monkey_%04d.png \
-filter_complex "[1:v]setpts=PTS-STARTPTS+(1/TB)[1v]; \
[0:v][1v]overlay=enable='between(t,1,5)'[out1]" \
-map '[out1]' \
output.mp4
I am trying to run the CVB on a directory of plain text files, following the procedure outlined below. However, I am not able to see the vectordump (step 6). Run without the "-c csv" flag the generated file is empty. However, if I use the flag "-c csv" the generated file starts with a series of numbers followed by an alphabetically organized series of unigrams (see below)
#1,10,1163,12,121,13,14,141,1462,15,16,17,185,1901,197,2,201,2227,23,283,298,3,331,35,4,402,4351,445,5,57,58,6,68,7,9,987,a.m,ab,abc,abercrombie,abercrombies,ability
Can someone point out what I am doing wrong?
thank you
0: Set Paths
> export HDFS_PATH=/path/to/hdfs/
> export LOCAL_PATH=/path/to/localfs
1: Put docs in HDFS using hadoop fs -put [-put ... ]
> hadoop fs -put $LOCAL_PATH/test $HDFS_PATH/rawdata
2: Generate sequence files (of Text) from a directory
> mahout seqdirectory \
-i $HDFS_PATH/rawdata \
-o $HDFS_PATH/sequenced \
-c UTF-8 -chunk 5
3- Generate sparse Vector from Text sequence files
> mahout seq2sparse \
-i $HDFS_PATH/sequenced \
-o $HDFS_PATH/sparseVectors \
-ow --maxDFPercent 85 --namedVector --weight tf
4- rowid: : Map SequenceFile to {SequenceFile, SequenceFile}
> mahout rowid \
-i $HDFS_PATH/sparseVectors/tfidf-vectors \
-o $HDFS_PATH/matrix
5- run cvb
> mahout cvb \
-i $HDFS_PATH/matrix/matrix \
-o $HDFS_PATH/test-lda \
-k 100 -ow -x 40 \
-dict $HDFS_PATH/sparseVectors/dictionary.file-0 \
-dt $HDFS_PATH/test-lda-topics \
-mt $HDFS_PATH/test-lda-model
6- Dump vectors from a sequence file to text
> mahout vectordump \
-i $HDFS_PATH/test-lda-topics/part-m-00000 \
-o $LOCAL_PATH/vectordump \
-vs 10 -p true \
-d $HDFS_PATH/sparseVectors/dictionary.file-0 \
-dt sequencefile \
-sort $HDFS_PATH/test-lda-topics/part-m-00000 \
-c csv
; cat $LOCAL_PATH/vectordump
The problem was in step 4. In step 3 I am generating TF vectors (--weight tf) but in step 4 I was running the rowid job (which converts the <Text, VectorWritable> tuples of tf-vectors, to <IntWritable, VectorWritable> which cvb expects) with tfidf-vectors.
So changing step 4 from:
> mahout rowid \
-i $HDFS_PATH/sparseVectors/tfidf-vectors \
-o $HDFS_PATH/matrix
to
> mahout rowid \
-i $HDFS_PATH/sparseVectors/tf-vectors\
-o $HDFS_PATH/matrix
fixes the problem.