Trace for function name from the output of cuda-memcheck - memory

I'm running to cuda-memcheck to debug my code and the output is as follows
========= Program hit cudaErrorCudartUnloading (error 29) due to "driver shutting down" on CUDA API call to cudaFree.
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 [0x2e40d3]
========= Host Frame:./nmt [0x53526]
========= Host Frame:./nmt [0xfbd9]
terminate called after throwing an instance of '========= Host Frame:/lib/x86_64-linux-gnu/libc.so.6 [0x3c259]
========= Host Frame:/lib/x86_64-linux-gnu/libc.so.6 [0x3c2a5]
========= Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xfc) [0x21ecc]
thrust::system::system_error'
========= Host Frame:./nmt [0x530a]
=========
what(): driver shutting down
========= Error: process didn't terminate successfully
========= Internal error (20)
========= No CUDA-MEMCHECK results found
Is it possible to tell from the line Host Frame:./nmt [0x53526] where is broken in the code? If so, how can I do that?

As #talonmies indicated (I suspect he will not mind if I post a CW answer), the cuda-memcheck tool provides additional stack back tracing capability, which can be enabled with the --show-backtrace switch added to the command line.
The back trace may consist of both host and device functions (i.e. host and device back traces.)
If the application has been also compiled with host debug symbol information (e.g. -g on linux) then cuda-memcheck can show function names for the host functions in the host backtrace.
Additional usage information is available in the documentation.

For me cuda-memcheck with different sub-tools such as memcheck, racecheck, initcheck and synccheck often produces host backtraces without the line numbers or even without the host functions mentioned. Searching on the internet only shows this question, but I already pass -g or even -g3 to the host compiler, and --show-backtrace flag to cuda-memcheck is said in the docs to be yes by default (passing it explicitly doesn't help). So I do the following with the backtrace:
Consider your compiled program is called a.out and you get a line in the host backtrace like Host Frame:./nmt [0x530a]. Then open your program in cuda-gdb with:
cuda-gdb a.out
Then, let your program load all the shared libraries (at least up to a point in main() function). Enter the following in cuda-gdb prompt:
b main
r
Then, look up the function name with:
info symbol 0x530a
Or look up the line number with:
info line *0x530a
Where 0x530a is the address cuda-memcheck printed for you. I guess NVIDIA could automate this easily (as well as demangling the host function names where they are printed).

Related

getting this error while trying to run cube js

I'm getting this error while trying to run cube js with the default command in the getting started docs. I've started this in a folder and running it in docker.
Warning. There is no cube.js file. Continue with environment variables
πŸ”₯ Cube Store (0.28.31) is assigned to 3030 port.
Warning. Option apiSecret is required in dev mode. Cube.js has generated it as e3b8c5a35fe378f4d481ada777e5f3c4
πŸ”“ Authentication checks are disabled in developer mode. Please use NODE_ENV=production to enable it.
πŸ¦… Dev environment available at http://localhost:4000
πŸš€ Cube.js server (0.28.31) is listening on 4000
2021-09-03 15:06:01,512 INFO [cubestore::http::status] <pid:17> Serving status probes at 0.0.0.0:3031
2021-09-03 15:06:01,515 INFO [cubestore::metastore] <pid:17> Using existing metastore in /cube/conf/.cubestore/data/metastore
thread '
main
' panicked at '
called `Result::unwrap()` on an `Err` value: Error { message: "IO error: While fsync: a directory: Invalid argument" }
', /project/cubestore/src/metastore/mod.rs:1542:40
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Cube Store Start Error: undefined
I guess it’s corrupted metastore due to it was incorrectly shutdown for you locally. Could you please try to drop the .cubestore directory?

How to debug headless pdf printing problems in chrome?

Note: this is not (directly) a question about how to print PDF in chrome, instead this is a question about how to get more information when printing fails.
In short: I cannot solve a printing PDF problem, which occurs only for certain (presumably large) pages and could use some assistance in debugging the actual issue.
Background: I am using the chromedriver (v83) and chromium-browser (v83) to print PDF files from webpages by utilizing python selenium. I am building a docker image to contain the required dependencies for this. I have tried to use Debian (buster and stretch) as well as Alpine base images, but all eventually result in the same error, when trying to print some pages. The odd thing is that for other (smaller) pages the printing works, but when many assets and pages are to be printed, the printing fails. I might add that this docker images is eventually being run inside of a Kubernetes cluster, where I assigned up to 4GB of RAM.
What code am I running?
This project was written for python3, so here are some relevant code fragments. Please note that I removed all error handling and waiting for the page loads to complete here.
from selenium import webdriver
appState = {
"recentDestinations": [
{
"id": "Save as PDF",
"origin": "local"
}
],
"selectedDestinationId": "Save as PDF",
"version": 2
}
def get_chrome_options(headless: bool, enable_logging: bool) -> Options:
chrome_options = webdriver.ChromeOptions()
profile = {'printing.print_preview_sticky_settings.appState': json.dumps(appState)}
chrome_options.add_experimental_option('prefs', profile)
if headless:
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--window-size=1920,1080')
chrome_options.add_argument('--disable-gpu')
chrome_options.add_argument('--disable-web-security')
chrome_options.add_argument('-–allow-file-access-from-files')
chrome_options.add_argument('--run-all-compositor-stages-before-draw')
chrome_options.add_argument('--kiosk-printing')
if enable_logging:
chrome_options.add_argument('--enable-logging')
return chrome_options
def print_the_page(url):
driver = webdriver.Chrome(chrome_options=get_chrome_options(headless, enable_logging))
driver.execute(driver_command=Command.GET, params={'url': url})
command_url = f"{driver.command_executor._url}/session/{driver.session_id}/chromium/send_command_and_get_result"
response = driver.command_executor._request('POST', command_url, json.dumps({'cmd': 'Page.printToPDF', 'params': {}}))
Then what happens?
For some pages this fails - meaning - there is this message in the response:
{'status': 500, 'value': '{"value":{"error":"unknown error","message":"unknown error: unhandled inspector error: {\\"code\\":-32000,\\"message\\":\\"Printing failed\\"}\\n (Session info: headless chrome=83.0.4103.116)","stacktrace":""}}'}
[UPDATE]
I have managed to produce some more error output when using the --print-to-pdf option directly, which seems to hint at an "out-of-memory" issue here:
[0923/135406.102857:WARNING:discardable_shared_memory_manager.cc(194)] Less than 64MB of free space in temporary directory for shared memory files: 23
[0923/135406.110108:WARNING:dns_config_service_posix.cc(341)] Failed to read DnsConfig.
[0923/135406.180892:WARNING:dns_config_service_posix.cc(341)] Failed to read DnsConfig.
[0923/135406.613221:FATAL:memory.cc(38)] Out of memory. size=796176
Received signal 6
r8: 00007fa6f39dadc4 r9: 0000000000000000 r10: 0000000000000008 r11: 0000000000000246
r12: 0000557efd1b0660 r13: 0000000000000000 r14: 00007fa6f39db240 r15: 0000000000000043
di: 0000000000000002 si: 00007fa6f39dac90 bp: 00007fa6f39dac90 bx: 0000000000000000
dx: 0000000000000000 ax: 0000000000000000 cx: 00007fa6fd347a71 sp: 00007fa6f39dac88
ip: 00007fa6fd347a71 efl: 0000000000000246 cgf: 002b000000000033 erf: 0000000000000000
trp: 0000000000000000 msk: 0000000000000000 cr2: 0000000000000000
[end of stack trace]
Calling _exit(1). Core file will not be generated.
[0923/135406.626313:ERROR:headless_shell.cc(399)] Abnormal renderer termination.
I will note here that I have been running this docker container locally on my machine (which has more than enough RAM) as well as on a Kubernetes cluster where this image has requested 4GB RAM. I also monitored the RAM usage and it didn't seem to be an issue - although - that could be illusive if the RAM usage is so radically high that chrome just fails and you never really see that in the overall RAM usage.
[UPDATE 2]
I have tried to use the --print-to-pdf option again, but I am seeing issues with that as well. The resources are loading, but the printing still fails.
β”‚ [0923/144355.169080:ERROR:bus.cc(393)] Failed to connect to the bus: Failed to connect to socket /var/run/dbus/system_bus_socket: No such file or directory
...
β”‚ [0923/141758.393923:WARNING:dns_config_service_posix.cc(341)] Failed to read DnsConfig. β”‚
β”‚ [0923/141758.401925:ERROR:zygote_host_impl_linux.cc(262)] Failed to adjust OOM score of renderer with pid 32: Permission denied (13) β”‚
β”‚ [0923/141758.413475:ERROR:zygote_host_impl_linux.cc(262)] Failed to adjust OOM score of renderer with pid 36: Permission denied (13)
... loading all the resources ...
β”‚ [0923/141824.611661:ERROR:print_render_frame_helper.cc(1889)] Printing failed. β”‚
β”‚ [0923/141824.612439:ERROR:headless_shell.cc(562)] Print to PDF failed
What's the question(s)?
How can I get more information about why the "Printing failed" - unfortunately the "unknown error: unhandled inspector error" hasn't given me any ideas about how to proceed.
Are there potentially any additional flags to get more debug output from chrome or is there a log somewhere that I should be able to find?
What else have I tried?
I have initially been running this under Debian buster with the latest google-chrome and chromium binaries (v85). I have switched to an Alpine base image and chromium - hoping that this might change something, but it didn't.
I have experimented with setting up Xvfb ${DISPLAY} -screen ${SCREEN} ${RESOLUTION} & in Docker, but it didn't seem to have any effect either.
I have tried to switch to using the direct cli google-chrome --print-to-pdf= option, but since it's a page that requires to pass through login authentication, I could only get the login page to print and it also seems to have some not so nice formatting issues.
I have been running this on my machine, outside of Docker, and was able to print as expected, but as soon as I put the same code inside a Docker container, it fails.
Unfortunately, I cannot share the page where this fails with you.
The relevant warning from your logs seems to be this:
[0923/135406.102857:WARNING:discardable_shared_memory_manager.cc(194)] Less than 64MB of free space in temporary directory for shared memory files: 23
The problem appears to stem from Docker's mounted /dev/shm being too small for Chromium to do things like you're trying to do.
I found a closed bug report against chromium referencing this issue in certain limited environments such as AWS Lambda and Docker, it was fixed in chromium v65 behind a command line flag --disable-dev-shm-usage.
The last few comments reference another bug report (now closed) about this issue in chromium v83 where the command line flag was not working properly. It has been fixed in version 84 - per comment 28:
You can find the fix in current stable release of Chrome (version 84.0.4147.89 and above).
You've indicated you're using chromium v83, so you'll need to update at least version 84.0.4147.89, then use the command line flag --disable-dev-shm-usage.

NVIDIA Jetson Nano with Realsense 435i via Isaac - Camera not found

I posted about this over on the Isaac forums, but listing it here for visibility as well. I am trying to get the Isaac Realsense examples working on a Jetson Nano with my 435i (firmware downgraded to 5.11.15 per the Isaac documentation), but I've been unable to so far. I've got a Nano flashed with Jetpack4.3 and have installed all dependencies on both the desktop and the Nano. The realsense-viewer works fine, so I know the camera is functioning properly and is being detected by the Nano. However, when I run ./apps/samples/realsense_camera/realsense_camera it throws an error:
ERROR engine/alice/components/Codelet.cpp#229: Component 'camera/realsense' of type 'isaac::RealsenseCamera' reported FAILURE:
No device connected, please connect a RealSense device
ERROR engine/alice/backend/event_manager.cpp#42: Stopping node 'camera' because it reached status 'FAILURE'
I've attached the log of this output as well. I get the same error running locally on my desktop, but that's running through WSL so I was willing to write that off. Any suggestions would be greatly appreciated!
0m2020-06-15 17:18:20.620 INFO engine/alice/tools/websight.cpp#166: Loading websight...0m
33m2020-06-15 17:18:20.621 WARN engine/alice/backend/application_json_loader.cpp#174: This application does not have an explicit scheduler configuration. One will be autogenerated to the best of the system's abilities if possible.0m
0m2020-06-15 17:18:20.622 INFO engine/alice/backend/redis_backend.cpp#40: Successfully connected to Redis server.
0m
33m2020-06-15 17:18:20.623 WARN engine/alice/backend/backend.cpp#201: This application does not have an execution group configuration. One will be autogenerated to the best of the systems abilities if possible.0m
33m2020-06-15 17:18:20.623 WARN engine/gems/scheduler/scheduler.cpp#337: No default execution groups specified. Attempting to create scheduler configuration for 4 remaining cores. This may be non optimal for the system and application.0m
0m2020-06-15 17:18:20.623 INFO engine/gems/scheduler/scheduler.cpp#290: Scheduler execution groups are:0m
0m2020-06-15 17:18:20.623 INFO engine/gems/scheduler/scheduler.cpp#299: __BlockerGroup__: Cores = [3], Workers = No0m
0m2020-06-15 17:18:20.623 INFO engine/gems/scheduler/scheduler.cpp#299: __WorkerGroup__: Cores = [0, 1, 2], Workers = Yes0m
0m2020-06-15 17:18:20.660 INFO engine/alice/backend/modules.cpp#226: Loaded module 'packages/realsense/librealsense_module.so': Now has 45 components total0m
0m2020-06-15 17:18:20.679 INFO engine/alice/backend/modules.cpp#226: Loaded module 'packages/rgbd_processing/librgbd_processing_module.so': Now has 51 components total0m
0m2020-06-15 17:18:20.696 INFO engine/alice/backend/modules.cpp#226: Loaded module 'packages/sight/libsight_module.so': Now has 54 components total0m
0m2020-06-15 17:18:20.720 INFO engine/alice/backend/modules.cpp#226: Loaded module 'packages/viewers/libviewers_module.so': Now has 83 components total0m
90m2020-06-15 17:18:20.720 DEBUG engine/alice/application.cpp#348: Loaded 83 components: isaac::RealsenseCamera, isaac::alice::BufferAllocatorReport, isaac::alice::ChannelMonitor, isaac::alice::CheckJetsonPerformanceModel, isaac::alice::CheckOperatingSystem, isaac::alice::Config, isaac::alice::ConfigBridge, isaac::alice::ConfigLoader, isaac::alice::Failsafe, isaac::alice::FailsafeHeartbeat, isaac::alice::InteractiveMarkersBridge, isaac::alice::JsonToProto, isaac::alice::LifecycleReport, isaac::alice::MessageLedger, isaac::alice::MessagePassingReport, isaac::alice::NodeStatistics, isaac::alice::Pose, isaac::alice::Pose2Comparer, isaac::alice::PoseFromFile, isaac::alice::PoseInitializer, isaac::alice::PoseMessageInjector, isaac::alice::PoseToFile, isaac::alice::PoseToMessage, isaac::alice::PoseTree, isaac::alice::PoseTreeJsonBridge, isaac::alice::PoseTreeRelink, isaac::alice::ProtoToJson, isaac::alice::PyCodelet, isaac::alice::Random, isaac::alice::Recorder, isaac::alice::RecorderBridge, isaac::alice::Replay, isaac::alice::ReplayBridge, isaac::alice::Scheduling, isaac::alice::Sight, isaac::alice::SightChannelStatus, isaac::alice::Subgraph, isaac::alice::Subprocess, isaac::alice::TcpPublisher, isaac::alice::TcpSubscriber, isaac::alice::Throttle, isaac::alice::TimeOffset, isaac::alice::TimeSynchronizer, isaac::alice::UdpPublisher, isaac::alice::UdpSubscriber, isaac::map::Map, isaac::map::ObstacleAtlas, isaac::map::OccupancyGridMapLayer, isaac::map::PolygonMapLayer, isaac::map::WaypointMapLayer, isaac::navigation::DistanceMap, isaac::navigation::NavigationMap, isaac::navigation::RangeScanModelClassic, isaac::navigation::RangeScanModelFlatloc, isaac::rgbd_processing::DepthEdges, isaac::rgbd_processing::DepthImageFlattening, isaac::rgbd_processing::DepthImageToPointCloud, isaac::rgbd_processing::DepthNormals, isaac::rgbd_processing::DepthPoints, isaac::rgbd_processing::FreespaceFromDepth, isaac::sight::AliceSight, isaac::sight::SightWidget, isaac::sight::WebsightServer, isaac::viewers::BinaryMapViewer, isaac::viewers::ColorCameraViewer, isaac::viewers::DepthCameraViewer, isaac::viewers::Detections3Viewer, isaac::viewers::DetectionsViewer, isaac::viewers::FiducialsViewer, isaac::viewers::FlatscanViewer, isaac::viewers::GoalViewer, isaac::viewers::ImageKeypointViewer, isaac::viewers::LidarViewer, isaac::viewers::MosaicViewer, isaac::viewers::ObjectViewer, isaac::viewers::OccupancyMapViewer, isaac::viewers::PointCloudViewer, isaac::viewers::PoseTrailViewer, isaac::viewers::SegmentationCameraViewer, isaac::viewers::SegmentationViewer, isaac::viewers::SkeletonViewer, isaac::viewers::TensorViewer, isaac::viewers::TrajectoryListViewer, 0m
33m2020-06-15 17:18:20.723 WARN engine/alice/application.cpp#164: The function Application::findComponentByName is deprecated. Please use `getNodeComponentOrNull` instead. Note that the new method requires a node name instead of a component name. (argument: 'websight/isaac.sight.AliceSight')0m
0m2020-06-15 17:18:20.723 INFO engine/alice/application.cpp#255: Starting application 'realsense_camera' (instance UUID: 'e24992d0-af66-11ea-8bcf-c957460c567e') ...0m
90m2020-06-15 17:18:20.723 DEBUG engine/gems/scheduler/execution_groups.cpp#476: Launching 0 pre-start job(s)0m
90m2020-06-15 17:18:20.723 DEBUG engine/gems/scheduler/execution_groups.cpp#485: Replaying 0 pre-start event(s)0m
90m2020-06-15 17:18:20.723 DEBUG engine/gems/scheduler/execution_groups.cpp#476: Launching 0 pre-start job(s)0m
90m2020-06-15 17:18:20.723 DEBUG engine/gems/scheduler/execution_groups.cpp#485: Replaying 0 pre-start event(s)0m
0m2020-06-15 17:18:20.723 INFO engine/alice/backend/asio_backend.cpp#33: Starting ASIO service0m
0m2020-06-15 17:18:20.727 INFO packages/sight/WebsightServer.cpp#216: Sight webserver is loaded0m
0m2020-06-15 17:18:20.727 INFO packages/sight/WebsightServer.cpp#217: Please open Chrome Browser and navigate to http://<ip address>:30000m
33m2020-06-15 17:18:20.727 WARN engine/alice/backend/codelet_canister.cpp#225: Codelet 'websight/isaac.sight.AliceSight' was not added to scheduler because no tick method is specified.0m
33m2020-06-15 17:18:20.728 WARN engine/alice/components/Codelet.cpp#53: Function deprecated. Set tick_period to the desired tick paramater0m
33m2020-06-15 17:18:20.728 WARN engine/alice/backend/codelet_canister.cpp#225: Codelet '_check_operating_system/isaac.alice.CheckOperatingSystem' was not added to scheduler because no tick method is specified.0m
33m2020-06-15 17:18:20.728 WARN engine/alice/components/Codelet.cpp#53: Function deprecated. Set tick_period to the desired tick paramater0m
33m2020-06-15 17:18:20.730 WARN engine/alice/components/Codelet.cpp#53: Function deprecated. Set tick_period to the desired tick paramater0m
1;31m2020-06-15 17:18:20.741 ERROR engine/alice/components/Codelet.cpp#229: Component 'camera/realsense' of type 'isaac::RealsenseCamera' reported FAILURE:
No device connected, please connect a RealSense device
0m
1;31m2020-06-15 17:18:20.741 ERROR engine/alice/backend/event_manager.cpp#42: Stopping node 'camera' because it reached status 'FAILURE'0m
33m2020-06-15 17:18:20.743 WARN engine/alice/backend/codelet_canister.cpp#225: Codelet 'camera/realsense' was not added to scheduler because no tick method is specified.0m
0m2020-06-15 17:18:21.278 INFO packages/sight/WebsightServer.cpp#113: Server connected / 10m
0m2020-06-15 17:18:30.723 INFO engine/alice/backend/allocator_backend.cpp#57: Optimized memory CPU allocator.0m
0m2020-06-15 17:18:30.724 INFO engine/alice/backend/allocator_backend.cpp#66: Optimized memory CUDA allocator.0m

Jenkins is spawning a lot of daemon processes and server crashes

I've recently installed Jenkins on a cheap VM on Azure. The specs are very low, since I use this server for testing the setup: 1vCPU & 1GB RAM. There will usually only be 1 build at the same time, with a max. of 3, in very rare occassions.
During the build process from Jenkins quite frequently my server would crash completely and stay so for +- 10 - 15 minutes until being able to be used again.
I checked the processes on the server and this is the result:
The full line is like this:
/etc/alternatives/java -Dcom.sun.akuma.Daemon=daemonized -Djava.awt.headless=true -DJENKINS_HOME=/var/lib/jenkins -jar /usr/lib/jenkins/jenkins.war --logfile=/var/log/jenkins/jenkins.log --webroot=/var/cache/jenkins/war --daemon --httpPort=8080 --debug=5 --handlerCountMax=100 --handlerCountMaxIdle=20
It is the same for every single one of those daemons, not a single parameter is different.
Is this normal behavior, and is this the reason why my server is crashing? Or are my specs just too low for Jenkins to run on to?
Thanks in advance!
EDIT:
My jenkins.log file looks pretty normal except for one NullPointerException that keeps coming back up:
2020-01-08 12:43:17.702+0000 [id=148] WARNING h.ExpressionFactory2$JexlExpression#evaluate: Caught exception evaluating: h.filterDescriptors(it,attrs.descriptors) in /configure. Reason: java.lang.NullPointerException: Descriptor list is null for context 'class hudson.model.Hudson' in thread 'Handling GET /configure from 85.154.65.124 : qtp2085857771-148 Jenkins/configure.jelly GlobalLibraries/config.jelly LibraryConfiguration/config.jelly SCMRetriever/DescriptorImpl/config.jelly MultiSCM/DescriptorImpl/config.jelly'
java.lang.NullPointerException: Descriptor list is null for context 'class hudson.model.Hudson' in thread 'Handling GET /configure from 85.154.65.124 : qtp2085857771-148 Jenkins/configure.jelly GlobalLibraries/config.jelly LibraryConfiguration/config.jelly SCMRetriever/DescriptorImpl/config.jelly MultiSCM/DescriptorImpl/config.jelly'
at hudson.model.DescriptorVisibilityFilter.apply(DescriptorVisibilityFilter.java:73)
...

Flume NullPointerExceptions on checkpoint

I've setup a file to file source/sink , just as a test of basic flume functionality.
Im currently using the "exec" source, with the command being "tail -F mytmpfile".
In my script, I continuously echo "....." >> mytmpfile , so that the tail command constitutes a stream.
However, I've started seeing the following exception in the flume logs:
java.lang. IllegalStateException: Channel closed [channel=c1]. Due to
java.lang.NullPointerException: null
at org.apache.flume.channel.file.FileChannel.createTransaction(FileChannel.java:353)
at org.apache.flume.channel.BasicChannelSemantics.getTransaction(BasicChannelSemantics.java:122)
at org.apache.flume.sink.RollingFileSink.process(RollingFileSink.java:183)
at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
at java.lang.Thread.run(Thread.java:662) Caused by: java.lang.NullPointerException
at org.apache.flume.channel.file.Log.writeCheckpoint(Log.java:895)
at org.apache.flume.channel.file.Log.replay(Log.java:406)
at org.apache.flume.channel.file.FileChannel.start(FileChannel.java:303)
at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:236)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
... 1 more
Any thoughts on where this NullPointerException is coming from? It appears from scanning the code that maybe it related to a missing folder or directory. But I cant find the exact line on the git hub branches.
This is using apache-flume-1.3.1.23-...
In the past I've had problems with file channels, and they've normally boiled down to two problems:
1) If you're running multiple agents on the same box, make sure you configure them to have separate dataDirs and checkpointDir.
2) On Linux boxes, check that your tmpfs isn't near its capacity. If it's getting full, flume will complain. Try stopping the flume agent, unmount tmpfs, enlarge it, remount and restart the agent.

Resources