Hi, I am implementing an email client application. My requirement is to monitor all the mailboxes available on a specified IMAP server. I created a separate TCP connection for each mailbox, but I keep getting disconnected from the IMAP server. I am testing against Gmail and Yahoo. Is there any restriction on opening multiple connections from the same IP to a particular IMAP server, particularly with Gmail and Yahoo?
Or is there any way to monitor all the mailboxes over a single connection without IMAP NOTIFY, which does not seem to be supported by either Gmail or Yahoo?
Please help me out...
This is something which I have answered on Stack Overflow before, but which is now only available via the Wayback Machine. The question was about how to "kill too many parallel IMAP connections". It is reprinted below; the core takeaway is that, for some reason, most server administrators prefer a small number of short-lived connections over more connections that stay active for a long time yet spend most of it silently idling in the background. What they do not get is that the IMAP protocol is designed with long-lived connections in mind, and trying to prevent them wastes resources, because the clients will constantly resync mailboxes as they hop among them.
The original answer follows:
Nope, it's a very wrong idea. IMAP is designed so that monitoring a single mailbox takes one connection; in most IMAP server implementations, this means a single process. However, unless the client the user is using is terribly broken, all these connections enter the IDLE mode. In IDLE, the clients are passively notified about any updates to the mailbox state. If you kill these connections, the clients would have to actively poll for changes in many mailboxes. Now decide for yourself: what is worse, having ten processes sitting idle, or one process doing heavy polling every two minutes? Which of these solutions would consume more energy, CPU time and I/O operations? That's for the number of parallel connections.
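To make the IDLE mechanism concrete, here is roughly what the exchange from RFC 2177 looks like on the wire (the tag and message number are arbitrary):

    C: a002 IDLE
    S: + idling
    ...time passes; a new message arrives...
    S: * 43 EXISTS
    C: DONE
    S: a002 OK IDLE terminated

The client parks in IDLE and the server pushes the untagged * 43 EXISTS response the moment the mailbox changes; no polling happens at all.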
The second question was about the long-lived connections. Again, this is a critical aspect of IMAP: each connection carries a lot of associated state information which is rather expensive to obtain. Unless your server implements certain extensions and your clients use them (ESEARCH, CONDSTORE and QRESYNC are the crucial bits), opening a mailbox can require O(n) operations. I don't know how many messages your users have, but do you really want to transfer, say, message flags for 250k messages just because you decided to kill a connection that had been active for "too long"?
Finally, any reasonable IMAP server vendor offers a way to configure a per-user session limit on the number of concurrent processes. Using that is much better than maintaining a script for ad-hoc killing of "unused" connections.
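For example, in Dovecot (other servers have equivalent knobs; this is just one vendor's syntax), that limit is a one-line setting rather than a kill script:

    # dovecot.conf: cap concurrent IMAP sessions per user+IP pair
    protocol imap {
      mail_max_userip_connections = 10
    }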
If you would like to learn more about the synchronization process, my thesis about using IMAP on clients with flaky network and limited resources describes what the clients have to do in order to show an updated view of mailboxes to their users.
Related
I would like to understand networking services with a large user base a bit better so that I know how to approach a project I am busy with.
The following statements that I make may be incorrect but they still lead to the question that I want to ask...
Please consider Skype and TeamViewer clients. It seems that both keep persistent network connections open to their respective servers. They use these persistent connections to initiate additional connections. Some of these connections are created by means of Hole Punching if the clients are behind NATs. They are then used for direct Peer-to-Peer communications.
Now, according to http://expandedramblings.com/index.php/skype-statistics/, Skype has 300 million users and 4.9 million daily active users. I would assume that most of those 4.9 million users have their client apps running most of the day. That is a lot of connections to the Skype servers open at any given time.
So to my question: is this feasible, or at least acceptable? I mean, wouldn't it be better not to keep a network connection open while idle, especially when there are so many connections open to the servers at once? The only reason I can think of is that it would be the only way to properly do hole punching. Technically, how is this achieved on the server side?
Is this feasible or at least acceptable?
Feasible it certainly is: you already mention two popular apps that do it, so it is very doable in practice.
As for acceptable: for a start, no internet authority (e.g. the IETF) has ever said it is unacceptable to have long-lived connections, even with low traffic.
Furthermore, the only components for which this matters are network elements that keep connection/flow state. These are the endpoints themselves and so-called middleboxes like NATs and firewalls. For the client this is only one connection; the server is usually fine-tuned by the application developers (who made this choice) themselves, so for these it is acceptable. For middleboxes it's simple: they have no choice, as they are designed to work with all kinds of flows, including long-lived persistent connections.
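As an illustration of how cheap this is for the endpoint, a minimal Java sketch of such a long-lived control connection (the hostname and port are placeholders; real apps also send application-level heartbeats on top of this):

    import java.net.Socket;

    public class ControlChannel {
        public static void main(String[] args) throws Exception {
            // One long-lived control connection to a hypothetical rendezvous server.
            Socket control = new Socket("rendezvous.example.com", 5000);
            // Let the OS send keepalive probes on the idle connection so that
            // NATs and firewalls along the path keep their flow state alive.
            control.setKeepAlive(true);
            // ... block on reads here, waiting for pushed notifications ...
            int firstByte = control.getInputStream().read();
        }
    }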
I mean, wouldn't it be better to not have a network connection open while idle, especially when there are so many connections open to the servers at once?
Not at all. First of all, it could be much slower, as you'd need to set up a full connection before each control-plane call. This is especially noticeable if your RTT is big or if the servers do some complicated connection proxying/redirection for load-balancing/localization purposes.
Besides that, it would historically have made incoming calls difficult for a huge number of users. Many ISPs block, or used to block, unknown incoming connections from the internet by means of a firewall. Similarly, if you are behind a NAT device that does not support UPnP or PCP, you can't open a port to listen on at your public IP address. So you need the persistent connection even aside from hole punching.
The only reason I can think of is that it would be the only way to properly do hole punching. Technically, how is this achieved on the server side?
Technically, you can't do proper hole punching as soon as the NAT devices maintain a full <src-ip, src-port, dest-ip, dest-port, protocol> (classic 5-tuple) flow match. In that case the best you can do with "hole punching" is to set up a proxy between the peers.
What hole punching relies on is a NAT flow lookup that only matches on <src-ip, src-port, protocol> upstream and <dest-ip, dest-port, protocol> downstream when doing the translation. In that case both clients just set up a connection to the server, their IP and port get translated, and the server passes the translated address to the other client. The other client can now start sending packets to that translated <ip, port> combination, which should work because the NAT ignores the server's IP/port. But even if the particular NAT works like this, some security device (e.g. a stateful firewall) might detect the session hijacking and drop the packets anyway.
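A minimal sketch of the client side of that dance in Java; the rendezvous protocol is invented for illustration (assume the server simply replies with the peer's translated "ip:port" as text):

    import java.net.DatagramPacket;
    import java.net.DatagramSocket;
    import java.net.InetAddress;

    public class HolePunchClient {
        public static void main(String[] args) throws Exception {
            DatagramSocket sock = new DatagramSocket();            // NAT maps our src ip:port on first send
            InetAddress server = InetAddress.getByName("1.2.3.4"); // rendezvous server (placeholder)

            // 1. Register with the server; this creates the NAT mapping
            //    and lets the server observe our public ip:port.
            byte[] hello = "register".getBytes();
            sock.send(new DatagramPacket(hello, hello.length, server, 9000));

            // 2. Server replies with the peer's public "ip:port" (simplified protocol).
            byte[] buf = new byte[64];
            DatagramPacket resp = new DatagramPacket(buf, buf.length);
            sock.receive(resp);
            String[] peer = new String(resp.getData(), 0, resp.getLength()).split(":");
            InetAddress peerIp = InetAddress.getByName(peer[0]);
            int peerPort = Integer.parseInt(peer[1]);

            // 3. Both sides now fire packets at each other's translated address.
            //    This only works if the NAT reuses the same mapping regardless of
            //    destination (endpoint-independent); a symmetric NAT breaks it.
            byte[] punch = "punch".getBytes();
            sock.send(new DatagramPacket(punch, punch.length, peerIp, peerPort));
        }
    }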
Nowadays you would rather use UPnP to open up a port to listen on at your public IP, which is much easier where supported.
Cowboy is a web server written in Erlang. It spawns a new process for each request and then uses that process for subsequent requests if the client uses HTTP pipelining (sending multiple requests on the same socket one after another without waiting for the responses, assuming the responses will be sent back in the same order as the requests).
This is fine, but if you want to use this web server for building a realtime web app, it has one problem: when the socket is closed, for instance because of client network problems, the process representing that socket on the server is terminated. That means you can't use that process for storing session data, because in a realtime web app you probably want to live beyond the end of the HTTP request (if long polling is used, for instance), keep some state associated with the connected client, and think of him as "online" even after the HTTP request has ended.
In SockJS, this is solved by spawning one more process for each client (each session id).
So if you have 2000 clients using WebSockets, you will have around 4000 processes: one Cowboy process representing the socket and one more keeping the session state alive in case the Cowboy process is terminated (for instance because of network problems).
THE QUESTION IS: I am relatively new to Erlang, so I don't know whether it makes much sense as a performance improvement, but I am thinking about rewriting the Cowboy web server a bit so that the process representing the realtime connection does not end until I want it to (the process stays alive even when the underlying WebSocket is terminated).
This would eliminate the need for one extra session process per client. So instead of 4000 processes you would have just 2000. Could that be a big performance boost in Erlang?
Erlang is pretty good with processes, but too much of anything isn't good. Using processes as direct mappings to sessions is not a good idea. Why not do it logically? I assume you can have some in-memory storage, say ETS, or even Mnesia.
If you are using WebSockets to communicate, each user is connected via one such process; however, you simply map a certain random unique session key to each individual process, and hence to each individual user.
%% One record per connected user: the session key is the stable identity,
%% the socket pid is just the currently attached connection.
-record(client, {web_sock_pid, session_key, username}).
If the process exits, and the client end has a way of reconnecting, then once it re-identifies itself as the same user the session key still holds; only the pid of the attached process has changed, and that does not matter.
If it is NOT WebSockets, but just HTTP REST/JSON/JSONP/XML services, then it is even easier. Use ETS tables in RAM: a new session is stored with the parameters defining that session, and on each request the session key comes along with the other parameters. Message delivery is by Comet or by frequent checks from the client end.
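For readers who do not speak Erlang, the same bookkeeping idea sketched in Java (all names here are made up): the session key, not the connection, is the stable identity.

    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical sketch: sessions survive reconnects, connections do not.
    class SessionRegistry<C> {
        // session key -> currently attached connection
        private final ConcurrentHashMap<String, C> byKey = new ConcurrentHashMap<>();

        void attach(String sessionKey, C conn) { byKey.put(sessionKey, conn); } // (re)connect
        void detach(String sessionKey) { byKey.remove(sessionKey); }            // socket died
        C lookup(String sessionKey) { return byKey.get(sessionKey); }           // route a message
    }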
Sounds like you are doing some premature optimization, if you ask me.
Erlang processes are very inexpensive. You shouldn't really have to worry about spawning too many processes.
Write it with two processes per WebSocket, then do some measurements to see where it uses the most memory and wastes the most CPU cycles.
The use case is to have a server connect to thousands of users' email accounts and sniff incoming mail, in Java, preferably with JavaMail and a Spring Integration/AMQP/RabbitMQ-type scalable infrastructure, using IMAP IDLE-style connections and adding server processing nodes as needed.
A single inbound channel is easy with the IMAP IDLE inbound adapter; you could configure a few in XML. But if you need a persistent fleet of thousands of these listener/IMAP IDLE channel adapters, and need to add new user connections dynamically for server-side processing, that is a challenge. You also need fault tolerance: if the Java listener dies or the server reboots, all these listeners and their configuration should come back up, rather than rebuilding thousands of connections from scratch; and if some connections lose their IDLE receive capability, you want to recover them without rebuilding all user connections.
Any ideas are welcome, as I have searched a lot but could not find anything. This seems to be a significant scalability issue around keeping email receive connections open.
If you want to use the IMAP IDLE command to listen for new messages using JavaMail, you'll need one thread per mailbox, which is likely to impact your scalability. Even keeping thousands of connections open might be an issue.
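For reference, this is the shape of the per-mailbox IDLE loop in JavaMail (host and credentials are placeholders); the blocking idle() call is exactly why each mailbox ties up a thread:

    import java.util.Properties;
    import javax.mail.Folder;
    import javax.mail.Session;
    import javax.mail.event.MessageCountAdapter;
    import javax.mail.event.MessageCountEvent;
    import com.sun.mail.imap.IMAPFolder;
    import com.sun.mail.imap.IMAPStore;

    public class IdleWatcher {
        public static void main(String[] args) throws Exception {
            Session session = Session.getInstance(new Properties());
            IMAPStore store = (IMAPStore) session.getStore("imaps");
            store.connect("imap.example.com", "user@example.com", "secret"); // placeholders

            IMAPFolder inbox = (IMAPFolder) store.getFolder("INBOX");
            inbox.open(Folder.READ_ONLY);
            inbox.addMessageCountListener(new MessageCountAdapter() {
                @Override public void messagesAdded(MessageCountEvent e) {
                    System.out.println(e.getMessages().length + " new message(s)");
                }
            });

            // idle() blocks this thread until the server pushes an update, then
            // returns, so we loop to stay in IDLE. One mailbox costs one thread.
            while (true) {
                inbox.idle();
            }
        }
    }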
You don't say how quickly you need to react to new messages. Unless you have near real time requirements, it might be better to poll a subset of mailboxes every so often, eventually cycling through all the mailboxes.
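A rough sketch of that polling alternative (the batch size and interval are arbitrary): a small scheduler cycles through the account list in batches instead of pinning one IDLE thread per mailbox.

    import java.util.List;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class MailboxPoller {
        private final List<String> accounts; // identifiers of all mailboxes to watch
        private int cursor = 0;

        MailboxPoller(List<String> accounts) { this.accounts = accounts; }

        void start() {
            ScheduledExecutorService pool = Executors.newScheduledThreadPool(4);
            // Every 30 seconds, poll the next batch of 100; over successive
            // runs the cursor wraps around and covers every mailbox.
            pool.scheduleWithFixedDelay(this::pollNextBatch, 0, 30, TimeUnit.SECONDS);
        }

        private synchronized void pollNextBatch() {
            for (int i = 0; i < 100 && !accounts.isEmpty(); i++) {
                String account = accounts.get(cursor);
                cursor = (cursor + 1) % accounts.size();
                // connect to `account`, check for new mail, disconnect (omitted)
            }
        }
    }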
You'll need to deal with the fault tolerance issues yourself, using checkpointing or transactions or whatever seems appropriate for your application.
The other option is perhaps to take a look at something like Akka, with actors performing the async I/O. You'll need to ditch the JavaMail package and parse the IMAP commands yourself, but there are lots of packages out there to do that. Would love to hear if you found a better solution.
For my portfolio software I have been using fetchmail to read from a Google email account over IMAP, and life has been great. Thanks to the miracle of the IMAP IDLE connection, my triggers fire in near-realtime due to server push, much sooner than periodic polling would otherwise allow.
In my basic .fetchmailrc setup, in which a brokerage customer's account emails trade notifications to a dedicated Gmail/Google Apps box, I've had
    poll imap.gmail.com proto imap
      user "youraddress@yourdomain-OR-gmail.com" pass "yoMama"
      keep nofetchall ssl idle mimedecode limit 29000 no rewrite
      mda "myCustomSpecialMDAhandler.sh %F %T"
Trouble is, now I need to support reading from multiple email boxes and hand off the emails to other specialized MDA scripts I wrote. No problem, just add more poll lines to .fetchmailrc, right? Well, that doesn't work when the other accounts also use imap.gmail.com. What ends up happening is that while one account reads fine (and not necessarily the first one listed, though usually yes), the other gets "socket error" all day and its emails remain untouched, unread. I can't figure out why, and I'm not even sure whether it's some mechanism at imap.gmail.com, e.g. limiting each host to one IMAP connection. That doesn't seem right, since I have kept IMAP connections to many separate Gmail and Google Apps accounts from the same client (like Thunderbird) for years and never noticed this exclusivity problem.
I haven't tried launching multiple fetchmail daemons with separate -f config files (assuming they wouldn't conflict), or additionally deploying one or more getmail-style fetchers. I am still trying to avoid that kind of mess; it scales badly with the number of boxes I have to monitor.
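For what it's worth, the separate-daemons variant would look roughly like this; each instance needs its own PID file via fetchmail's --pidfile option, otherwise the second invocation just wakes the already-running daemon (intervals and file names here are made up):

    fetchmail -d 60 -f fetcher.1.rc --pidfile ~/.fetchmail.1.pid
    fetchmail -d 60 -f fetcher.2.rc --pidfile ~/.fetchmail.2.pid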
I don't have the reference at hand, but somewhere in fetchmail's docs I recall reading that idle is not so much an IMAP feature as an optional fetchmail trick, which has the (nasty for me) side effect of choking off all other defined accounts from polling until the connection is cut by some external event or timeout. So at least that would vindicate Google.
Credit to Carl's Whine Rack blog for some tips.
For now I periodically run killall fetchmail; fetchmail -f fetcher.$[$RANDOM % $numaccounts].rc from crontab to cycle through the accounts, each defined individually in fetcher.1.rc, fetcher.2.rc, etc. Acceptable while email events are relatively infrequent.
I have over 50 clients connected to one server (a low-end server running Windows Server 2003). Every time there is a power failure or switch failure, the clients disconnect from the server; the server itself might stay up during these incidents (if a power backup is installed). When the clients come back, they automatically detect the server and initiate a connection procedure, at which point the server starts dishing out the relevant data to them. It is at this point that some clients start freezing, because the server is not quick enough to dish out the data and so it blocks the rest of the clients.
I have implemented a crude method to control this client storm, but I am asking whether you have better algorithms for this kind of task.
NB: I am using Asta socket components in a Delphi application, but I don't mind examples from different fields.
Similar to network collision-detection protocols, perhaps clients could wait a random period of time before initiating their connection at startup?
In addition to the random startup delay suggested by Bremen, implement some sort of "too busy; try again later" message in your protocol. Rejecting a client with a short message should not be a problem for 50, 100, or even 1000 clients. Have the clients respond by waiting a random delay and retrying, with exponential backoff.
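A sketch of that client-side retry loop (the constants are arbitrary, and Java is used only for illustration since the app itself is Delphi):

    import java.util.Random;

    public class Reconnector {
        public static void connectWithBackoff() throws InterruptedException {
            Random rnd = new Random();
            long delayMs = 1000; // initial backoff
            while (true) {
                // Random jitter spreads the clients out so they do not all
                // stampede the server at the same instant after an outage.
                Thread.sleep(delayMs + rnd.nextInt(1000));
                if (tryConnect()) {
                    return; // connected and got our data
                }
                // Server said "too busy" (or was unreachable): back off exponentially.
                delayMs = Math.min(delayMs * 2, 60_000);
            }
        }

        private static boolean tryConnect() {
            // Placeholder: open the socket and handle a "too busy; retry later" reply.
            return false;
        }
    }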
The solution also depends on your preferences. Is it OK for you to reject the connection requests, or to send a busy message?
Another option is to start sending data to the clients in a round-robin manner. To this end you can have different threads responsible for sending data to different clients. The advantage in that case is that none of the clients will be starved.
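A sketch of the round-robin idea (a single dispatcher thread is shown for brevity; the per-client-thread variant just moves the loop body into workers):

    import java.util.List;

    public class RoundRobinSender {
        // Send one bounded chunk to each client per pass, so that a slow
        // client cannot monopolize the server after a reconnect storm.
        static void dishOut(List<ClientConn> clients) {
            boolean anyPending = true;
            while (anyPending) {
                anyPending = false;
                for (ClientConn c : clients) {
                    if (c.hasPendingData()) {
                        c.sendNextChunk(); // bounded write, never the whole backlog
                        anyPending = true;
                    }
                }
            }
        }

        interface ClientConn {
            boolean hasPendingData();
            void sendNextChunk();
        }
    }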