Jenkins errors out on large file size

I am new to Jenkins. I am trying to build a repo for production on the server, but when I click 'Build Now' all files succeed except one large file, which fails with:
FTP: Caught exception [IOException caught while copying.] Sleeping for [10,000]ms before trying again
The file is an MP4 of 137 MB.
I updated both the Publish Over FTP and Publish Over SSH plugins, and I still have the same problem.
Help, please.

There is a default timeout setting in the SSH plugin. You can set it to however long your task takes to complete, or simply set it to 0 (zero) to avoid any timeout. With 0, it will basically keep the connection open.
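If it is unclear whether the failure is really a timeout, one rough way to check, sketched below, is to time the same upload from the build machine outside Jenkins; the server, credentials, and file name are placeholders, not values from this job.
# Hypothetical check: time a manual FTP upload of the large file to see whether
# it takes longer than the plugin's configured timeout.
time curl -T video.mp4 ftp://ftp.example.com/upload/ --user deploy:secret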

Related

Why do Jest tests SOMETIMES fail on CircleCI?

I have Jest tests that run against a Dockerized Neo4j database, and sometimes they fail on CircleCI. The error message for all 25+ of them is:
thrown: "Exceeded timeout of 5000 ms for a hook.
#*******api: Use jest.setTimeout(newTimeout) to increase the timeout value, if this is a long-running test."
Since they fail only sometimes, like once in 25 runs, I am wondering whether jest.setTimeout will solve the issue. I was able to make them fail locally by setting jest.setTimeout(10), but I am not sure how to debug this further, or whether something other than a small timeout (default 5000 ms) could be the issue here. I would understand if 1 in 25 runs or a few tests failed, or if all the other suites failed, but only a single file, with all the tests within that file, is failing. And it is always the same file, never some other file, for this reason.
Additional information: locally, that single file runs in less than 1000 ms when connected to the staging database, which is huge compared to the Dockerized one, which only has a few files at the time of running.
For anyone who sees this, I was able to solve this by adding the --maxWorkers=2 flag to the test command in my CircleCI config. See here for details: https://support.circleci.com/hc/en-us/articles/360005442714-Your-test-tools-are-smart-and-that-s-a-problem-Learn-about-when-optimization-goes-wrong-
Naman's answer is perfect! I couldn't believe it but it really solved my problem. Just to be extra clear on how to do it:
I changed the test script in my package.json from jest to jest --maxWorkers=2. Then I pushed, and it solved my error.
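For illustration, the change roughly amounts to the following when invoking Jest directly; the npx call and the flag values below are assumptions about this setup, not taken from the original package.json.
# Run Jest with a capped worker pool, mirroring the accepted fix.
npx jest --maxWorkers=2
# CLI counterpart of the jest.setTimeout call mentioned in the question,
# in case the hook/test timeout itself also needs raising.
npx jest --testTimeout=30000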

Job stuck after manual job deletion

I performed a manual cleanup by deleting job folders directly in the filesystem, and now I find a stuck running job that I cannot abort.
I've tried the answers here to force it to stop, but they don't work, as Jenkins is not able to find the job in the system.
Additionally, when I click on the running job I get a 404 error:
"Problem accessing <route_to_job_that_doesnt_exist_anymore>"
Reason: Not found
Is there something I can do to abort this running job without restarting the server?
A way to stop a build (as in actually aborting it) is to add /stop at the end of the job URL, after the build number.
Example: http://myjenkins/project/123/stop
If this doesn't work, there is also the "Hard Kill": instead of adding /stop you add /kill. I guess you need admin access for that POST action.
I don't know, though, whether it works for jobs that no longer exist on the Jenkins host because their folders are missing from the filesystem.
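As a sketch, the same endpoints can be hit from the command line instead of the browser; the URL mirrors the example above, the credentials are placeholders, and depending on the Jenkins security settings a CSRF crumb may also be required.
# Abort the build (same as opening the /stop URL).
curl -X POST --user admin:API_TOKEN http://myjenkins/project/123/stop
# "Hard Kill" variant, which usually needs admin permissions.
curl -X POST --user admin:API_TOKEN http://myjenkins/project/123/kill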

AWS CodeDeploy deployments failing intermittently with no logging information and no deployment-ID-specific directory getting created

We have a Jenkins-triggered build setup, with AWS CodeDeploy configured as a post-build action. The first deployment of the day went as follows.
It kept showing In Progress in the AWS CodeDeploy console indefinitely, even after 10 minutes.
Jenkins then timed out, resulting in the following last status.
These are the last log lines from the deployment logs:
Deployment status: InProgress; instances: {Pending: 0,InProgress: 3,Succeeded: 0,Failed: 0,Skipped: 0}
Exceeded maximum polling time of 500000 milliseconds.
Deployment status: InProgress; instances: {Pending: 0,InProgress: 3,Succeeded: 0,Failed: 0,Skipped: 0}
Deployment did not succeed. Final status: InProgress
I had to manually stop the deployment from the AWS console.
Our custom-generated log file showed that the script corresponding to the AfterInstall step executed to the end.
A deployment-ID-specific directory was created in the designated place /opt/codedeploy-agent/deployment-root/3dfdc563-66c5-47a0-98f8-01605d25a6e9/, and the following were the last lines from the /opt/codedeploy-agent/deployment-root/deployment-logs/codedeploy-agent-deployments.log file (these are not fatal errors and are not expected to keep the build hanging):
[2017-08-10 07:10:22.484] [d-C2A5P270O][stderr]ls: cannot access tests/hiphop_errors.txt: No such file or directory
[2017-08-10 07:10:27.511] [d-C2A5P270O][stderr]cat: /opt/codedeploy-agent/deployment-root/3dfdc563-66c5-47a0-98f8-01605d25a6e9/d-C2A5P270O/deployment-archive/tests/dummy_nginx_access_logs.txt: No such file or directory
The following are screenshots from the AWS CodeDeploy console.
The details view for a single instance shows nothing about the error.
Then I ran dos2unix on all the hook files (the files that get executed at the various deployment steps, i.e. AfterInstall, BeforeInstall, etc.) on the build server, just to be sure, because I had faced similar issues earlier after copy-pasting Windows files/code. After this, I took two more builds, and now I see the following:
The deployment failed on all the instances. The AWS CodeDeploy console showed Failed against every instance, with no details.
No deployment-ID-specific directories were created in the designated place this time, unlike in the previous case, and no deployment logs were created at all.
I am clueless again because CodeDeploy is behaving strangely. Yesterday we observed a similar issue of log files not being generated; a reinstall of the CodeDeploy agent was done and deployment logs were then created fine. But how many times should we blindly do a fresh install?
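For reference, the dos2unix pass described above can be scripted roughly as follows; the scripts/ directory is an assumption about where the deployment hook files live in this repository.
# Normalise Windows line endings on every hook script (directory name assumed).
find scripts/ -type f -name '*.sh' -exec dos2unix {} +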
Update
Adding the contents of the /var/log/aws/codedeploy-agent/codedeploy-agent.log file for the case where the deployment kept showing In Progress indefinitely, as suggested in the answer by #EmptyArsenal (I do not notice anything wrong in the log):
2017-08-21 11:13:47 INFO [codedeploy-agent(1983)]: Version file found in /opt/codedeploy-agent/.version.
2017-08-21 11:13:47 INFO [codedeploy-agent(1983)]: [Aws::CodeDeployCommand::Client 200 0.065601 0 retries] poll_host_command(host_identifier:"arn:aws:ec2:us-east-1:377703961998:instance/i-e551e37d")
2017-08-21 11:13:47 INFO [codedeploy-agent(1983)]: Version file found in /opt/codedeploy-agent/.version.
2017-08-21 11:13:47 INFO [codedeploy-agent(1983)]: [Aws::CodeDeployCommand::Client 200 0.044413 0 retries] put_host_command_acknowledgement(diagnostics:nil,host_command_identifier:"WyJjb20uYW1hem9uLmFwb2xsby5kZXBsb3ljb250cm9sLmRvbWFpbi5Ib3N0Q29tbWFuZElkZW50aWZpZXIiLHsiZGVwbG95bWVudElkIjoiQ29kZURlcGxveS91cy1lYXN0LTEvUHJvZC9hcm46YXdzOnNkczp1cy1lYXN0LTE6Mzc3NzAzOTYxOTk4OmRlcGxveW1lbnQvZC1XNDFCV0tLN08iLCJob3N0SWQiOiJhcm46YXdzOmVjMjp1cy1lYXN0LTE6Mzc3NzAzOTYxOTk4Omluc3RhbmNlL2ktZTU1MWUzN2QiLCJjb21tYW5kTmFtZSI6IkFmdGVySW5zdGFsbCIsImNvbW1hbmRQb3NpdGlvbiI6NSwiY29tbWFuZEF0dGVtcHQiOjF9XQ==")
2017-08-21 11:13:47 INFO [codedeploy-agent(1983)]: Version file found in /opt/codedeploy-agent/.version.
2017-08-21 11:13:47 INFO [codedeploy-agent(1983)]: [Aws::CodeDeployCommand::Client 200 0.027061 0 retries] get_deployment_specification(deployment_execution_id:"CodeDeploy/us-east-1/Prod/arn:aws:sds:us-east-1:377703961998:deployment/d-W41BWKK7O",host_identifier:"arn:aws:ec2:us-east-1:377703961998:instance/i-e551e37d")
In the case where the hosts never start executing lifecycle events, it's almost always because the agent isn't installed, isn't running, or permissions are not set up properly. You should only have to install the agent once, so I'm not sure why you had to reinstall. Perhaps the agent died and simply wasn't running (though it should restart on its own).
I would check out logs at /var/log/aws/codedeploy-agent/codedeploy-agent.log. Those are the agent logs and are not deployment specific. If the agent is crashing, you should see information there.
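For example, something along these lines on one of the instances; the service name assumes a standard agent install on Amazon Linux or Ubuntu, and the log paths are the ones already referenced in the question.
# Is the agent process running?
sudo service codedeploy-agent status
# Agent-level log (not deployment-specific).
sudo tail -n 200 /var/log/aws/codedeploy-agent/codedeploy-agent.log
# Deployment-specific log referenced in the question.
sudo tail -n 200 /opt/codedeploy-agent/deployment-root/deployment-logs/codedeploy-agent-deployments.log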
As for the deployment-specific error you're seeing, it looks like a script error to me. Either the files you're trying to access don't exist, or you don't have the proper permissions to interact with them. You can fix that in your appspec. If those commands are failing anywhere, you might want to run a deployment without them to validate that the errors really aren't fatal.
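One way to make the errors visibly non-fatal is to guard the optional files inside the hook script, as in the sketch below; the exact hook script and file paths are assumptions based on the question's log lines.
# Inside the AfterInstall hook: skip optional test fixtures if they are absent,
# instead of letting ls/cat print errors to stderr.
for f in tests/hiphop_errors.txt tests/dummy_nginx_access_logs.txt; do
  if [ -f "$f" ]; then
    cat "$f"
  else
    echo "Skipping missing optional file: $f"
  fi
done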

Waiting for an available agent / Waiting for an agent to be requested

(26.07.2016) I am using TFS 2015 Update 3 in a VM.
When I try to queue a build through the web interface or from Team Explorer, I get the message from the title ("Waiting for an available agent" / "Waiting for an agent to be requested").
Then I restart all services related to TFS in services.msc and then after some time it starts working again.
So this happens too often.
I have a custom pool running.
Is there a way to debug this behaviour?
Examining the Log files
Link to Worker log file
Link to Agent log file
The exception occurs in this order:
Checking if artifacts directory exists C:\workspaces\agent\_work\2\a
Deleting artifacts directory
System.ComponentModel.Win32Exception (0x80004005): The directory is not empty
at Microsoft.TeamFoundation.Common.FileSpec.DeleteDirectoryLongPath(String path, Boolean recursive, Boolean followJunctionPoints)
The weird thing is that queueing a new build works most of the time; this happens only sporadically.
It could be that I had a file from that folder open in Notepad with many tabs open. I will observe whether this issue persists and report back.
If this is happening sporadically, it might be that a long path exists here in the artifacts directory:
C:\workspaces\agent\_work\2\a
Or there was a cancelled build which left the artifact directory half cleaned, which exposed a bug in the cleaning step.
The 2.x agent isn't subject to long-path limits (.NET Core), but it only works with TFS 2017+:
https://github.com/Microsoft/vsts-agent
We can troubleshoot, but it would be good to get to 2017+ (2018 QU3 is out) with a 2.x agent.
If that's not an option, message me and we can dig into what I think is a cancel / state bug.

Jenkins - Publish Over CIFS Plugin error

I'm using the Publish Over CIFS Plugin and I continually get an error, even though the copy succeeds. What I'm trying to do is copy all the contents of a build-results directory, and all its assets, to the remote host. However, I get an error message that I can't explain, and the online help is failing me.
In the Transfers section I have only one block, and this is the setup:
Source files: build/123.456/**
Remove prefix: build/
Remote directory: builds/this_release/Latest/
Below are the error messages I get.
CIFS: Connecting from host [my-host]
CIFS: Connecting with configuration [to-host] ...
CIFS: Disconnecting configuration [to-host] ...
ERROR: Exception when publishing, exception message [A Transfer Set must contain Source files - if you really want to include everything, set Source files to **/ or **\]
Build step 'Send build artifacts to a windows share' changed build result to UNSTABLE
What I don't understand is that the files under 'build/123.456/', and its sub-directories, get copied as I wanted, but I still get an error. Any suggestions on how to correct that? I've tried removing the '**' and it still works, but I still get an error.
Actually, I've found the reason for my error.
I had a second (empty) Transfer Set defined in my job, with no fields filled in.
That empty set was the reason for the error message.
