Trying to sense files in smb server from airflow DAG using Filesensor - connection

I have few files in smb server. My job is to sense everyday using filesensor if files exist in that path. If they exist, I have to process them and move them to S3 Bucket. But I am not able to connect with smb server.
1.I mounted smb server on my machine and tried to access files from it. It worked on my local.I could read and process.So I created a bash script with mounting commands and credentials and called that script from BashOperator in DAG.
Then realised that I must have to have a connection id to sense if files exist in that location using Filesensor.
2.So I installed samba provider on top of airflow and gave a connection by entering hostname, login and password. When I run the dag, I am getting this message.
WARNING - Unable to find an extractor. task_type=FileSensor airflow_dag_id=CSV_Upload_To_S3 task_id=Get_CDS_From_Qlik airflow_run_id=manual__2022-12-05T10:26:41.802952+00:00
[2022-12-05, 10:26:46 UTC] {factory.py:122} ERROR - Did not find openlineage.yml and OPENLINEAGE_URL is not set
[2022-12-05, 10:26:46 UTC] {factory.py:43} WARNING - Couldn't initialize transport; will print events to console.
Am I missing something in giving connection? Is there more to it?

Related

MLflow: Unable to store artifacts to S3

I'm running my mlflow tracking server in a docker container on a remote server and trying to log mlflow runs from local computer with the eventual goal that anyone on my team can send their run data to the same tracking server. I've set the tracking URI to be http://<ip of remote server >:<port on docker container>. I'm not explicitly setting any of the AWS credentials on the local machine because I would like to just be able to train locally and log to the remote server (run data to RDS and artifacts to S3). I have no problem logging my runs to an RDS database but I keep getting the following error when it get to the point of trying to log artifacts: botocore.exceptions.NoCredentialsError: Unable to locate credentials. Do I have to have the credentials available outside of the tracking server for this to work (ie: on my local machine where the mlflow runs are taking place)? I know that all of my credentials are available in the docker container that is hosting the tracking server. I've be able to upload files to my S3 bucket using the aws cli inside of the container that hosts my tracking server so I know that it as access. I'm confused by the fact that I can log to RDS but not S3. I'm not sure what I'm doing wrong at this point. TIA.
Yes, apparently I do need to have the credentials available to the local client as well.

Connecting to a remote ArangoDB dockerized server

I am a beginner in regards to ArangoDB and I am trying to deploy my first project using it. The website is PHP based - what I did is that I created an Arango Docker container on Digital Ocean so that I can access it from the browser with the ipv4 provided. Public access to port 8529 is enabled. Locally, I am able to modify the .config file in order to point to the corresponding ip and I can painlessly retrieve data.
As a hosting provider I am using one.com. When uploading the same files that I am able to run locally on my own domain I get the following error:
["_database":"ArangoDBClient\Connection":private]=> string(7) "_system" } ArangoDBClient\ConnectException: cannot connect to endpoint 'tcp://xxx.xx.xxx.xxx:8529/': Connection timed out
I want to mention that I have also tried out ArangoOasis. No luck with it - I get the same error. Been at it for quite a few weeks - I would very much use some guidance. Even what to do next as I am out of ideas and documentation to read.

kafka connect in distributed mode is not generating logs specified via log4j properties

I have been using Kafka Connect in my work setup for a while now and it works perfectly fine.
Recently I thought of dabbling with few connectors of my own in my docker based kafka cluser with just one broker (ubuntu:18.04 with kafka installed) and a separate node acting as client for deploying connector apps.
Here is the problem:
Once my broker is up and running, I login to the client node (with no broker running,just the vanilla kafka installation), i setup the class path to point to my connector libraries. Also the KAFKA_LOG4J_OPTS environment variable to point to the location of log file to generate with debug mode enabled.
So every time i start the kafka worker using command:
nohup /opt//bin/connect-distributed /opt//config/connect-distributed.properties > /dev/null 2>&1 &
the connector starts running, but I don't see the log file getting generated.
I have tried several changes but nothing works out.
QUESTIONS:
Does this mean that connect-distributed.sh doesn't generate the log file after reading the variable
KAFKA_LOG4J_OPTS? and if it does, could someone explain how?
NOTE:
(I have already debugged the connect-distributed.sh script and tried the options where daemon mode is included and not included, by default if KAFKA_LOG4J_OPTS is not provided, it uses the connect-log4j.properties file in config directory, but even then no log file is getting generated).
OBSERVATION:
Only when I start zookeeper/broker on the client node, then provided KAFKA_LOG4J_OPTS value is picked and logs start getting generated but nothing related to the Kafka connector. I have already verified the connectivity b/w the client and the broker using kafkacat
The interesting part is:
The same process i follow in my workpalce and logs start getting generated every time the worker (connnect-distributed.sh) is started, but I haven't' been to replicate the behaviors in my own setup). And I have no clue what I am missing here.
Could someone provide some reasoning, this is really driving me mad.

Connecting to scality/s3 server between docker containers

We are using a python based solution which shall load and store files from S3. For developing and local testing we are using a vagrant environment with docker and docker-compose. We have two docker-compose defintions - one for the assisting backend services (mongo, restheart, redis and s3) and the other one containing the python based REST API exposing solution using the backend services.
When our "front-end" docker-compose group interacts with restheart this works fine (using the name of the restheart container as server host in http calls). When we are doing the same with scality/s3 server this does not work.
The interesting part is, that we have created a test suite for using the scality/s3 server from a python test suite running on the host (windows10) over the forwarded ports through vagrant to the docker container of scality/s3 server within the docker-compose group. We used the endpoint_url localhost and it works perfect.
In the error case (when frontend web service wants to write to S3) the "frontend" service always responds with:
botocore.exceptions.ClientError: An error occurred (InvalidURI) when calling the CreateBucket operation: Could not parse the specified URI. Check your restEndpoints configuration.
And the s3server always responds with http 400 and the message:
s3server | {"name":"S3","clientIP":"::ffff:172.20.0.7","clientPort":49404,"httpMethod":"PUT","httpURL":"/raw-data","time":1521306054614,"req_id":"e385aae3c04d99fc824d","level":"info","message":"received request","hostname":"cdc8a2f93d2f","pid":83}
s3server | {"name":"S3","bytesSent":233,"clientIP":"::ffff:172.20.0.7","clientPort":49404,"httpMethod":"PUT","httpURL":"/raw-data","httpCode":400,"time":1521306054639,"req_id":"e385aae3c04d99fc824d","elapsed_ms":25.907569,"level":"info","message":"responded with error XML","hostname":"cdc8a2f93d2f","pid":83}
We are calling the scality with this boto3 code:
s3 = boto3.resource('s3',
aws_access_key_id='accessKey1',
aws_secret_access_key='verySecretKey1',
endpoint_url='http://s3server:8000')
s3_client = boto3.client('s3',
aws_access_key_id='accessKey1',
aws_secret_access_key='verySecretKey1',
endpoint_url='http://s3server:8000')
s3.create_bucket(Bucket='raw-data') # here the exception comes
bucket = self.s3.Bucket('raw-data')
This issue is quite common. In your config.json file, which you mount in your Docker container, I assume, there is a restEndpoints section, where you must associate a domain name with a default region. What that means is your frontend domain name should be specified in there, matching a default region.
Do note that that default region does not prevent you from using other regions: it's just where your buckets will be created if you don't specify otherwise.
In the future, I'd recommend you open an issue directly on the Zenko Forum, as this is where most of the community and core developpers are.
Cheers,
Laure

scp files through gateway to remote machine

I can't figure out how to scp a file to another machine if there is a gateway connecting my client machine to the remote server. From my client machine I can connect to both the gateway and subsequently to the remote server using SSH without any problems.
When I try to scp my directory dir to the remote server I have no clue how to move past the gateway, because my ssh connection is actually an two-step approach. Scp'ing dir to the gateway first fails, with the remark "Permission denied".
Something like
~$: scp -r /var/www/dir usrname#remotesrv.com:/var/www/dircp
doesn't work and the only approach I found so far involves public/private keys. Is it only possible to copy files through a gateway with keys? And if that's so, can somebody tell me how to overcome the problem with copy&pasting into the terminal which sometimes just won't work (using Ubuntu 11.10). Already installed autokey hoping to circumvent buggy Ubuntu shortcuts by changing them to another hotkey, but the program is crashing all the time.
I would appreciate your help in one way or another!

Resources