Are "jobs" just disguised RPC in a RESTful application?

In a RESTful web application design, you typically first identify the resources in your application, the nouns. For example, imagine you're writing a library app and you're working on adding items to a catalogue. So you've got catalogues and items as your nouns.

You then decide to implement the operations on items, which live in the catalogue. In REST, HTTP verbs map onto the operations you want to perform: POST = create a new resource when you don't know what its identifier should be; PUT = update; DELETE = delete; GET = query resources. So you might end up with:

| HTTP request | Operation performed on resource | Returns |
|---|---|---|
| GET /catalogue/items?term=potter | Retrieve items containing the term "potter" | Representation of items, with a 200 OK status code |
| POST /catalogue/items (request body contains representation of new item) | Add a new item to the catalogue | 201 Created status code, with Location header set to URI of new resource, and representation of resource in response body |
| PUT /catalogue/items/<control number> (request body contains representation of updated item) | Replace existing representation with an updated one | 200 OK status code |
| DELETE /catalogue/items/<control number> | Remove item at the specified location | 204 No Content status code |

Fairly typical REST.
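As a rough sketch of the table above, here is how those four operations might map onto an in-memory catalogue in Python. Everything here is an assumption made for illustration: the `Catalogue` class, the `title` field, the sequential control numbers, and the 404 for unknown items (which the table doesn't cover). Each method returns a (status, headers, body) tuple rather than talking to a real HTTP stack.

```python
from itertools import count


class Catalogue:
    """In-memory stand-in for the catalogue resource (hypothetical)."""

    def __init__(self):
        self._items = {}
        self._ids = count(1)  # assumed control numbers: "1", "2", "3", ...

    def get_items(self, term):
        # GET /catalogue/items?term=<term>
        matches = [
            item for item in self._items.values()
            if term.lower() in item["title"].lower()
        ]
        return 200, {}, matches

    def post_item(self, item):
        # POST /catalogue/items: the server picks the identifier
        control_number = str(next(self._ids))
        stored = dict(item, control_number=control_number)
        self._items[control_number] = stored
        return 201, {"Location": f"/catalogue/items/{control_number}"}, stored

    def put_item(self, control_number, item):
        # PUT /catalogue/items/<control number>: replace the representation
        if control_number not in self._items:
            return 404, {}, None  # assumption: unknown items are an error
        self._items[control_number] = dict(item, control_number=control_number)
        return 200, {}, self._items[control_number]

    def delete_item(self, control_number):
        # DELETE /catalogue/items/<control number>
        if self._items.pop(control_number, None) is None:
            return 404, {}, None
        return 204, {}, None
```

Calling `post_item({"title": "..."})` on a fresh `Catalogue` gives back a 201 with a `Location` of `/catalogue/items/1`, which is the URI you'd then PUT to or DELETE.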

Then you realise you want to upload a whole pile of items at once, embedded in a single request for efficiency; but you don't want the client to have to wait while the items are inserted into the catalogue, properly indexed and so on. Maybe it will take 5 minutes or something, and you don't want to leave a web client hanging. Or perhaps you want to upload only a single item, but once items are uploaded they are put into a queue for processing by another system, so there's a wait.

What are your choices? Here are some ideas, partly gleaned from the RESTful Web Services book:

  • Don't allow bulk uploads. You can only upload one item at a time, and you just have to wait until the operation completes and you get your proper status code back. Not really a solution, though.
  • Allow bulk uploads, process the items, and return a multi-status 207 code with the response. The response body contains a list of response codes and status reports, one for each uploaded item. But you still have to wait for all the processing to finish before you can return anything.
  • Allow a bulk upload, but return a 202 Accepted status, spawning asynchronous jobs to do the processing in the background. The response body can contain the URIs for each uploaded item. Each item then has a status which can be queried by asking for the resource again. For example, when an item is first uploaded, you get its URI back as a Location; when you GET that, the object has its status set to Inactive or Under processing or something. When processing is complete, you get a status like Complete on the item instead. The catch is that your resource representations are polluted with status information, which could be good or bad.
  • As above, but your request to /catalogue/items returns a 202 along with a handle to a "transaction" or "job" resource which wraps the resources you uploaded. Effectively, you treat the upload itself as a resource with a status you can query; the resources attached to that resource don't have to be polluted with status information, but you perhaps lose the granularity of individual status codes on resources. Or maybe you produce one transaction resource per uploaded resource?
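To make the third option a bit more concrete, here is a minimal sketch of a bulk upload that answers straight away with 202 Accepted and lets each item carry its own status. The `AsyncCatalogue` class, the `status` field, and the `process_pending` method are all hypothetical names; `process_pending` just stands in for whatever background worker or queue would really do the indexing.

```python
class AsyncCatalogue:
    """Sketch of option three: items carry their own processing status."""

    def __init__(self):
        self.items = {}    # control number -> item representation
        self.pending = []  # control numbers still waiting to be processed

    def bulk_upload(self, new_items):
        # POST /catalogue/items with many items: respond without waiting
        uris = []
        for item in new_items:
            control_number = str(len(self.items) + 1)  # assumed id scheme
            self.items[control_number] = dict(item, status="Under processing")
            self.pending.append(control_number)
            uris.append(f"/catalogue/items/{control_number}")
        return 202, {"items": uris}

    def process_pending(self):
        # Stand-in for the background worker: finish every queued item
        while self.pending:
            control_number = self.pending.pop(0)
            self.items[control_number]["status"] = "Complete"

    def get_item(self, control_number):
        # GET /catalogue/items/<control number>: status is part of the item
        item = self.items.get(control_number)
        return (200, item) if item is not None else (404, None)
```

A client would call `bulk_upload`, keep the returned URIs, and poll `get_item` until the status flips to `Complete`. The fourth option would look much the same, except that the status would live on a separate upload resource returned with the 202, and the items themselves would stay clean.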

However, what I'm not so keen on is the idea of a job or service being a resource. Why? Well, if I want to create items in my catalogue, I don't want to wrap them in a job and post them to /jobs; if I want to query my items, I don't want to have to go to a query service at /services/query or similar.

What these paths hint at to me is that the path represents an operation rather than a resource. Calling them is effectively RPC: you pass the resources you want to act on as arguments to the procedure you're calling. Often, there's also some implicit resource hidden away behind the job or service. Compare:

  • GET /catalogue/items?term=potter: the catalogue is visible, and we know we're querying items within it
  • GET /services/query?term=potter: here there's an implicit catalogue and its items behind the service; effectively, these objects are passed invisibly to the procedure we're calling; also what we're querying is not explicit

Or:

  • POST /catalogue/items: we're appending a new item to the catalogue; we can infer that our new item will then be available at /catalogue/items/<some identifier>
  • POST /jobs: job is an amorphous category, and we could post pretty much any type of resource into it; and there aren't any hints from the API about how to get at those resources once we've posted them

It's kind of like the difference between object-oriented design (REST) and procedural design (RPC). While a job might look like a resource, my opinion is that it's really an amorphous wrapper around the real resource you should be representing. Typically, jobs get introduced to cope with asynchronous updates; I'd prefer to see asynchronous operations occurring on proper resources, but exposed using the batch processing approaches outlined above. Otherwise I fear you might lose your resources inside some vague blob of a "job" or "service".
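The object-oriented vs. procedural analogy can be sketched in a few lines of Python. This is only an illustration of the analogy, not of any real web framework; `Catalogue`, `submit_job` and `query_service` are made-up names, and the `title` field is assumed.

```python
class Catalogue:
    """Resource style: the catalogue is explicit and answers the verbs itself."""

    def __init__(self):
        self.items = []

    def add(self, item):
        # Roughly POST /catalogue/items
        self.items.append(item)
        return len(self.items)  # identifier of the newly added item

    def find(self, term):
        # Roughly GET /catalogue/items?term=<term>
        return [i for i in self.items if term.lower() in i["title"].lower()]


# Procedure style: the catalogue is hidden state behind the endpoints.
_hidden_catalogue = Catalogue()


def submit_job(payload):
    # Roughly POST /jobs: the payload could be anything; a list of items is assumed
    for item in payload:
        _hidden_catalogue.add(item)
    return "accepted"  # no hint about where the items ended up


def query_service(term):
    # Roughly GET /services/query?term=<term>: what is being queried is implicit
    return _hidden_catalogue.find(term)
```

Both styles end up running the same code; the difference is whether the caller can see the catalogue and find the items again afterwards, or only sees the procedure.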

Comments

How is REST similar to object-orientation?

"It's kind of like the difference between object-oriented design (REST) and procedural design (RPC)."

Would an OOP developer constrain themselves to a meager handful of methods, only get/set/delete? Of course not: you would code into your classes whatever verbs, actions or methods are required by the given application. That is the principal shortcoming of REST: in the web development sphere, it has been bound to the methods of a single protocol, HTTP, and as such it inherits HTTP's limitations.

RPC over HTTP doesn't share that limitation: you may define and implement any verb required by the application, in this case a batch of requests. Why not send a POST body of RESTful URIs, and get back a multi-part response or something similar?

REST and RPC both have their advantages. REST is simple to implement, RPC less so. RPC is unaffected by whatever transport layer it travels over; not the case with REST. So use both, judiciously! Don't box yourself in.

And don't pit them against each other, that's holy war turf, unnecessary. As web application complexities increase, remaining flexible about data messaging will broaden your solution space.

I agree, Kevin: I often feel constrained by REST's meagre handful of verbs, and have trouble mapping certain operations. The comparison I was trying to make was between having resources which know how to respond to verbs (like objects) vs. services which act on resources (like procedures). I would definitely consider RPC-style web applications if I felt they were necessary. I wasn't trying to pit them head to head in a death match; just interested in the way they focus the mind on different aspects of the design. I suppose ideally what I'd want would be remote object calls (ROC). Though I believe that's what Java RMI and CORBA are for, isn't it...?