Import data into Cassandra using Docker on Windows - docker

Hi, I am trying to load data into Cassandra running in Docker. Unfortunately, I can't make it work. I'm pretty sure the path is correct, as I copied and pasted it directly from the file's Properties window. May I know if there is an alternative way to solve this?
P.S. I am using Windows 11 and the latest Cassandra 4.1.
cqlsh:cds502_project> COPY data (id)
... FROM 'D:\USM\Data Science\CDS 502 Big data storage and management\Assignment\Project\forest area by state.csv'
... WITH HEADER = TRUE;
Using 7 child processes
Starting copy of cds502_project.data with columns [id].
Failed to import 0 rows: OSError - Can't open 'D:\\USM\\Data Science\\CDS 502 Big data storage and management\\Assignment\\Project\\forest area by state.csv' for reading: no matching file found, given up after 1 attempts
Processed: 0 rows; Rate: 0 rows/s; Avg. rate: 0 rows/s
0 rows imported from 0 files in 0.246 seconds (0 skipped).
Above are my command and its result. I followed https://www.geeksforgeeks.org/export-and-import-data-in-cassandra/ exactly, and it works when I create the data inside the Docker container, export it, and re-import it, but it does not work when I use external data.
I also noticed that the CSV file I exported from Cassandra in Docker is missing on my laptop but can be accessed by Docker.

The behaviour you are observing is what is expected from Docker: a container's filesystem is isolated from the host by default. Docker has a docker cp command (Kubernetes has the analogous kubectl cp) that copies data from outside a container to inside it and vice versa. You can either use that command to move your data in or out of the container, or push your CSV into the container by baking it into a Docker image.
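For example (a sketch; the container name cassandra and the destination path are assumptions), the file could be copied into the running container and then loaded from the container-side path:
docker cp "D:\USM\Data Science\CDS 502 Big data storage and management\Assignment\Project\forest area by state.csv" cassandra:/tmp/forest.csv
and then, inside cqlsh: COPY data (id) FROM '/tmp/forest.csv' WITH HEADER = TRUE;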

You need to leverage a Docker bind mount (the -v flag) in order to access local files within your container: docker run -it -v <host path>:<container path> ...
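A minimal sketch for this case (the container name and the /import mount point are assumptions; Docker Desktop must be allowed to share the D: drive):
docker run -d --name cassandra -p 9042:9042 -v "D:\USM\Data Science\CDS 502 Big data storage and management\Assignment\Project":/import cassandra:4.1
after which the import works against the container-side path: COPY data (id) FROM '/import/forest area by state.csv' WITH HEADER = TRUE;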
See references below:
https://www.digitalocean.com/community/tutorials/how-to-share-data-between-the-docker-container-and-the-host
https://www.docker.com/blog/file-sharing-with-docker-desktop/

Related

My docker container keeps instantly closing when trying to run an image for bigcode-tools

I'm new to Docker, and I'm not quite sure how to deal with this situation.
So I'm trying to run a docker container in order to replicate some results from a research paper, specifically from here: https://github.com/danhper/bigcode-tools/blob/master/doc/tutorial.md
(image link: https://hub.docker.com/r/tuvistavie/bigcode-tools/).
I'm using a Windows machine, and every time I try to run the Docker image (via docker run -p 80:80 tuvistavie/bigcode-tools), it instantly closes. I've tried running other images, such as getting-started, but that image doesn't close instantly.
I've looked at some other potential workarounds, like using -dit, but since the instructions require setting an alias/doskey for a docker run command, using the alias and chaining it with other commands multiple times just queues up containers, because the port is tied to the alias.
Like in the instructions from the GitHub link, I'm trying to set an alias/doskey to make api calls to pull data, but I am unable to get any data nor am I getting any errors when performing the calls on the command prompt.
Sorry for the long question, and thank you for your time!
Going in order of the instructions:
0. I can run this; it added the image to my Docker Desktop.
1.
Since I'm using a Windows machine, I had to use 'set' instead of 'export'.
I'm not exactly sure what the $ is meant for in UNIX and whether it has significant meaning, but from my understanding the whole purpose is to create a directory named 'bigcode-workspace'.
Instead of 'alias', I needed to use doskey.
Since -dit prevented my image from instantly closing, I added that in as well, but I'm not 100% sure what it means. Running plain docker run (...) resulted in the image instantly closing.
When it came to using the doskey alias + another command, I've tried:
(doskey macro) (another command)
(doskey macro) ^& (another command)
(doskey macro) $T (another command)
This also seemed to be using a GitHub API call, so I also added --token=(github_token), but that didn't change anything either.
Because the later steps require the data pulled from here, I am unable to progress any further.
It looks like this image is designed to be used as a command-line utility, so it should not run continuously; instead you run it via the docker-bigcode alias for each task.
$BIGCODE_WORKSPACE is an environment-variable expansion here, so on a Windows machine it's %BIGCODE_WORKSPACE%. You might want to set this variable in Settings->System->About->Advanced System Settings, because variables set with the SET command apply to the current command prompt session only. Or you can specify the path directly, without an environment variable.
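For example (a sketch; the path is an assumption), the variable can also be persisted from a command prompt with setx, which writes it to the registry for future sessions:
setx BIGCODE_WORKSPACE C:\bigcode-workspace
Note that setx takes effect in new command prompt sessions only, not the current one.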
As for the alias, I would just create a batch file with the following content:
docker run -p 6006:6006 -v %BIGCODE_WORKSPACE%:/bigcode-tools/workspace tuvistavie/bigcode-tools %*
This will run the specified command, appending the batch file's parameters at the end. You might need to add double quotes if the BIGCODE_WORKSPACE path contains spaces, as in the sketch below.
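A sketch of such a batch file, say docker-bigcode.bat (the filename is an assumption), with the quoting applied:
@echo off
docker run -p 6006:6006 -v "%BIGCODE_WORKSPACE%":/bigcode-tools/workspace tuvistavie/bigcode-tools %*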

Docker Desktop on Hyper-V - bind mount do not propagate inotify on file copy

I have Docker Desktop installed on my dev machine, with WSL 2 disabled, and I have shared my entire C:/ drive in Docker Desktop's file-sharing settings.
Then I have a container with a .NET 6 (Core) application inside that uses a FileSystemWatcher to observe one directory and read any file that is pasted into it.
I read in several articles on the internet that WSL 2 does not propagate file notifications from the Windows file system to the underlying Linux distribution that Docker runs on, hence there is no way to bind the directory I have to watch into the app's container. So I switched to the old Hyper-V backend of Docker.
I run the container with the following command:
docker run `
--name mlc-importer `
-v C:/temp/DZBank:/opt/docker/mlc_importer/dfs/DZBank `
-v C:\temp\appsettings.json:/app/appsettings.json `
-v C:\temp\log4net.config:/app/log4net.config `
mlc-importer
The container starts and begins watching for new files. The strange thing is that when I cut a file and paste it into the directory, the app in the container registers the new file and reads it, but when I copy the file and paste it into the directory, the app in the container does not register or read it.
Can someone help me? I can't figure out where the problem comes from.
Thanks in advance,
Julian
I managed to solve my problem, and I'll post the solution here in case somebody encounters the same issue.
The problem was in the file itself. I found this out when I started a new container with only Debian, installed inotify-tools, and bind-mounted the same path. When I copied the file and pasted it into the bound directory, the output was three MODIFY events. When I cut the file and pasted it into the directory instead, the events were one CREATE and two MODIFY.
So with copy: three MODIFY events; with cut: one CREATE and two MODIFY events.
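For reference, that check can be reproduced with a throwaway container along these lines (the image tag and mount point are assumptions; the first command runs on the host, the other two inside the container):
docker run -it --rm -v C:/temp/DZBank:/watch debian bash
apt-get update && apt-get install -y inotify-tools
inotifywait -m /watch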
Then I inspected the copied file's Properties and saw that Windows had flagged it with a security warning ("This file came from another computer...") and an Unblock checkbox. When I checked the checkbox and hit OK, everything worked. And since the app in the container (from the post) hooks only into the "file created" callback, it does not trigger when the file is only modified.
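If it is indeed that blocked-file flag, one way to clear it up front (a suggestion not from the original post; the file path is an assumption) is PowerShell's Unblock-File cmdlet, which removes the flag before the file is copied:
Unblock-File -Path 'C:\temp\DZBank\yourfile.ext'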
Hope this helps someone with a similar problem.

How to access scala shell using docker image for spark

I just downloaded this Docker image to set up a Spark cluster with two worker nodes. The cluster is up and running; however, I want to submit my Scala file to this cluster, and I am not able to start spark-shell in it.
When I was using another docker image, I was able to start it using spark-shell.
Can someone please explain whether I need to install Scala separately in the image, or whether there is a different way to start it?
UPDATE
Here is the error:
bash: spark-shell: command not found
root@a7b0682ff17d:/opt/spark# ls /home/shangupta/Scripts/
ProfileData.json demo.scala queries.scala
TestDataGeneration.sql input.scala
root@a7b0682ff17d:/opt/spark# spark-shell /home/shangupta/Scripts/input.scala
bash: spark-shell: command not found
root@a7b0682ff17d:/opt/spark#
You're getting command not found because the PATH isn't correctly established.
Use the absolute path /opt/spark/bin/spark-shell.
Also, I'd suggest packaging your Scala project as an uber JAR and submitting that with spark-submit, unless you have no external dependencies or don't mind adding --packages/--jars manually. A sketch of both follows.
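As a sketch (the script path comes from the question; the class and JAR names are assumptions), the shell can be started by absolute path and told to load the script, and a packaged uber JAR can be submitted similarly:
/opt/spark/bin/spark-shell -i /home/shangupta/Scripts/input.scala
/opt/spark/bin/spark-submit --class your.main.Class your-app-assembly.jar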

How to create tun interface inside Docker container image?

I'm trying to create a Docker image with a /dev/net/tun device so that the image can be used across Linux, Mac and Windows host machines. The device does not need access to the host's network interface.
Note that passing --device /dev/net/tun:/dev/net/tun to docker run is undesirable because this only works on Linux.
Once the container is started, I can manually add the device by running:
$ sudo mkdir /dev/net
$ sudo mknod /dev/net/tun c 10 200
$ sudo ip tuntap add mode tap tap
but when I add these lines to the Dockerfile it results in an error:
Step 35/46 : RUN mkdir /dev/net
---> Running in 5475f2e4b778
Removing intermediate container 5475f2e4b778
---> c6f8e2998e1a
Step 36/46 : RUN mknod /dev/net/tun c 10 200
---> Running in fdb0ed813cdb
mknod: /dev/net/tun: No such file or directory
The command '/bin/sh -c mknod /dev/net/tun c 10 200' returned a non-zero code: 1
I believe the crux here is creating a filesystem node from within a Docker build step. Is this possible?
The /dev directory is special, and Docker build steps cannot really put anything there. That is also mentioned in an answer to question 56346114.
A device in /dev isn't a file with data in it, but a placeholder: an address, a pointer, a link to driver code in kernel memory that does something when the node is accessed. Such driver code in memory is not something that a Docker image would hold.
I got device creation working in a container by putting your command-line code in an .sh script that wraps the app we really want to run.
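A minimal sketch of such a wrapper script (the script and app paths are assumptions), to be set as the image's ENTRYPOINT so the node exists before the app starts:
#!/bin/sh
# Create the TUN device node at container start, not at build time
mkdir -p /dev/net
[ -e /dev/net/tun ] || mknod /dev/net/tun c 10 200
exec /app/your-app "$@"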
I managed to work around this by programmatically creating the TUN device in our software that needs it (which is mostly unit tests). In the setup of the program we can create a temporary file node with major/minor code 10/200:
// Headers assumed: <stdio.h> (snprintf), <stdlib.h> (rand), <sys/stat.h>
// (mknod, S_IFCHR), <sys/sysmacros.h> (makedev), <net/if.h> (IFNAMSIZ);
// ASSERT_GE comes from the unit-test framework.
// Create a random temporary filename. We are not using tmpfile() or the
// usual suspects because we need to create the temp file using mknod(),
// below.
snprintf(tmp_filename_, IFNAMSIZ, "/tmp/ect_%d_%d", rand(), rand());
// Create a temporary file node for use as a TUN interface.
// Device 10, 200 is the device code for a TAP/TUN device.
// See https://www.kernel.org/doc/Documentation/admin-guide/devices.txt
int result = mknod(tmp_filename_, S_IFCHR | 0644, makedev(10, 200));
if (result < 0) {
  perror("Failed to make temporary file");
}
ASSERT_GE(result, 0);
and then in the tear-down of the program we close and delete the temporary file.
One remaining issue is that this program only works when run as the root user, because otherwise it lacks the cap_net_admin and cap_net_raw capabilities. That is another annoyance that can be worked around.
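One possible workaround (a sketch, not from the original answer; the binary name is an assumption, and since the node is created with mknod() the cap_mknod capability is needed as well) is to grant file capabilities to the test binary instead of running it as root:
sudo setcap cap_net_admin,cap_net_raw,cap_mknod+ep ./your_test_binary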

Cayley docker isn't writing data

I'm using (or trying to use) the Cayley Docker image from here: https://github.com/saidimu/cayley-docker
I created a data dir at /var/lib/apps/cayley/data and dropped in the .cfg file I was instructed to create:
{
"database": "myapp",
"db_path": "/var/lib/apps/cayley/data",
"listen_host": "0.0.0.0"
}
I ran docker cayley with:
docker run -d -p 64210:64210 -v /var/lib/apps/cayley/data/:/data saidimu/cayley:v0.4.0
and it runs fine; I'm looking at its UI in the browser. I add a triple or two, and I get success messages.
Then I go to the query interface and try to list any vertices:
> g.V
and there is nothing to be found (I think):
{
"result": null
}
and there is nothing written in the data directory I created.
Any ideas why data isn't being written?
Edit: just to be sure there wasn't something wrong with my data directory, I ran the volume-mounted Neo4j Docker image against the same directory, and it saved data correctly. So that eliminates some possibilities.
I cannot comment yet, but I think that to obtain results from your query you need to use the All keyword:
g.V().All() // Print All the vertices
OR
g.V().Limit(20) // Limits the results to 20
If that was not your problem, I can edit this answer and share my Dockerfile, which is derived from the same Dockerfile you are using.
You may refer to the repository below to learn more about how to use Cayley's APIs, the data format in Cayley, and some other topics like N-Triples, N-Quads, and RDF:
Cayley API usage examples (mocha test cases)
Clearly designed entry-level N-Quads data to get you started, in the project's test/data directory
