Thursday, May 8, 2014

TCP - Cloud Architecture on Azure for the Internet of Things - Stack Overflow


I'm working on a server architecture, hosted on Windows Azure, for sending and receiving messages from remote embedded devices. The front-facing servers maintain persistent TCP connections with these devices, and I need a way to communicate with them from the backend.


Problem facts:



  • Devices: ~10,000

  • Frequency of messages each device sends up to the servers: 1/min

  • Frequency of messages originating server side (e.g. from user actions, scheduled triggers, etc.): 100/day

  • Average size of message payload: 64 bytes


Upward communication


The devices send up messages very frequently (sensor readings). The constraints on that data are not very strong: we can aggregate/insert the sensor readings in a batched manner, and they don't require in-order guarantees. I think the best way of handling them is to put them in a Storage Queue and have a worker process poll the queue at intervals and dump that data. Of course, I'll have to make sure the worker process does this frequently enough so that the queue doesn't back up indefinitely. The max batch size for a single retrieval from an Azure Storage Queue is 32 messages, but I'm thinking of accumulating more than that before writing out: something like publishing to the data store every 1,000 readings or every 30 seconds, whichever comes first.
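At ~10,000 devices each sending one reading per minute, that's roughly 167 messages per second arriving on the queue, so a 1,000-reading buffer would fill every six seconds or so at full load. Here's a minimal sketch of that draining loop, assuming the azure-storage-queue Python SDK; the queue name and the store_batch() sink are hypothetical stand-ins for the real data store:

```python
import time
from azure.storage.queue import QueueClient

# Assumption: store_batch() stands in for the real bulk insert
# into whatever data store ends up holding the sensor readings.
def store_batch(readings):
    ...  # e.g. a bulk INSERT of up to 1,000 readings

queue = QueueClient.from_connection_string("<connection-string>", "readings")

BATCH_LIMIT = 1_000     # flush after this many readings...
FLUSH_INTERVAL = 30.0   # ...or after this many seconds, whichever comes first

buffer = []
last_flush = time.monotonic()

while True:
    # A single receive call returns at most 32 messages, so keep pulling
    # pages to fill the buffer across calls.
    for msg in queue.receive_messages(max_messages=32, visibility_timeout=60):
        buffer.append(msg.content)
        queue.delete_message(msg)  # ack immediately; readings tolerate rare loss

    expired = time.monotonic() - last_flush >= FLUSH_INTERVAL
    if buffer and (len(buffer) >= BATCH_LIMIT or expired):
        store_batch(buffer)
        buffer.clear()
        last_flush = time.monotonic()

    time.sleep(1)  # crude poll interval; back off further when the queue is empty
```

Deleting each message before the batch is stored trades durability for simplicity, which seems acceptable given the weak constraints on this data; if that ever changes, the delete can move after store_batch() at the cost of handling redelivered duplicates.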


Downward communication


The server sends down updates and notifications much less frequently. This is a slightly harder problem, as I can see two viable paradigms here (with some blending in between). I could either:



  1. Create a Service Bus Queue for each device (or one queue with thousands of subscriptions - the limit on the number of queues is 10,000)

  2. Have a state table housed in a DB that contains the latest "state" of a specific message type that will be sent to the devices


With option 1, the application server simply enqueues a message in a fire-and-forget manner. On the front-end servers, however, there are quite a few things that have to happen. Concerns I can see include:



  • Monitoring 10k queues (or many subscriptions off of a queue - the Azure SDK apparently reuses connections for subscriptions to the same queue)

  • Connection Management

    • Need to stop monitoring a queue when its device disconnects

    • Need to expire messages if device is disconnected for an extended period of time (so that queue isn't backed up)

    • Need to enable some type of "refresh" mechanism to update device's complete state when it goes back online



The good news is that Service Bus queues are durable, and with sessions, messages can be made to arrive in FIFO order.
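As a rough illustration of the fire-and-forget enqueue side, and of how a per-message TTL addresses the expiry concern above, here's a sketch assuming the azure-servicebus Python SDK (a much newer client than what shipped in 2014); the per-device queue naming scheme and the one-hour TTL are assumptions, not recommendations:

```python
from datetime import timedelta
from azure.servicebus import ServiceBusClient, ServiceBusMessage

# Assumptions: one queue per device, named "device-<id>" (hypothetical
# naming scheme), with TTL set so messages for a long-disconnected
# device expire instead of backing the queue up.
def send_to_device(client: ServiceBusClient, device_id: str, payload: bytes) -> None:
    msg = ServiceBusMessage(
        payload,
        time_to_live=timedelta(hours=1),  # expire if the device stays offline
        session_id=device_id,             # sessions give per-device FIFO ordering
    )
    with client.get_queue_sender(f"device-{device_id}") as sender:
        sender.send_messages(msg)  # fire-and-forget from the app server's view

client = ServiceBusClient.from_connection_string("<connection-string>")
send_to_device(client, "dev-0042", b"set-reporting-interval:60")
```

On the receive side, each front-facing server would hold one receiver per connected device and close it on disconnect, which covers the "stop monitoring on disconnect" concern above.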


With option 2, the DB would house a table that maintains state for all of the devices. This table would be checked periodically by the front-facing servers (every few seconds or so) for state changes written to it by the application server. The front-facing servers would then dispatch to the devices. This removes the requirement for FIFO queueing, the reasoning being that each message contains the latest state and doesn't have to compete with other messages destined for the same device. The message is ephemeral: if delivery fails, it will be resent when the device reconnects and requests a refresh, or at the next check interval of the front-facing server.
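A minimal sketch of that state table and poll loop, using Python's built-in sqlite3 as a stand-in for the real DB (table, column, and function names are all hypothetical); a monotonically increasing version column lets the front-facing server detect changes without a separate "dirty" flag:

```python
import sqlite3

# sqlite3 is only a stand-in here; the real deployment would use SQL Azure
# or similar, with the same shape of table.
db = sqlite3.connect("state.db")
db.execute("""CREATE TABLE IF NOT EXISTS device_state (
    device_id  TEXT PRIMARY KEY,
    state      BLOB,
    version    INTEGER NOT NULL DEFAULT 0)""")

def write_state(device_id, state):
    # Application server side: upsert the latest state, bumping the version.
    db.execute(
        """INSERT INTO device_state (device_id, state, version) VALUES (?, ?, 1)
           ON CONFLICT(device_id) DO UPDATE SET state = excluded.state,
                                                version = version + 1""",
        (device_id, state))
    db.commit()

def dispatch_to_device(device_id, state):
    ...  # write to the device's open TCP connection

def poll_changes(known_versions):
    # Front-facing server side: every few seconds, find rows newer than what
    # was last dispatched, and send only the latest state per device.
    for device_id, state, version in db.execute(
            "SELECT device_id, state, version FROM device_state"):
        if version > known_versions.get(device_id, 0):
            dispatch_to_device(device_id, state)
            known_versions[device_id] = version
```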


In this scenario, the need for queues seems to be removed, but the DB becomes the bottleneck here, and I fear it's not as scalable.


These are both viable approaches, and I feel this question is already becoming too long (although I can provide more detail if necessary). I just wanted to get a feel for what's possible, what's usually done, whether there's something fundamental I'm missing, and what cloud services I can take advantage of so I don't reinvent the wheel.




