These days, social networks are pervasive, and it’s virtually impossible to avoid interacting with at least a few of them. Moreover, because so many people use them, they hold a wealth of interesting data.
A typical example of such a popular network is Twitter, with more than 500 million tweets sent each day. Wouldn’t it be useful to query Twitter for exactly the tweets you want and then process them in bulk? The ability to dig efficiently through heaps of social interactions is one of the core promises of Big Data, and a valuable one. In this blog post, I will show you how to do it with CloverETL.
First, you need to register an application with Twitter so that you can access its API later. Log in to https://dev.twitter.com/apps and select “Create new application”. Fill in the name, description, and website as you see fit, leave the Callback URL field empty, and submit the form.
After you submit the form, you’ll see a page with the application details. Its OAuth settings section lists the “Consumer key” and “Consumer secret”; you’ll need both to connect from CloverETL.
Further down the page, in the “Your access token” section, click the “Create my access token” button. Creating the token can take a moment, so wait a few seconds and then reload the page. You should now see your “Access token” and “Access token secret”. CloverETL needs these two values as well.
With that, you’re done working on the Twitter side. Let’s now proceed to CloverETL.
In CloverETL, the Twitter API is called through the HTTPConnector component. To configure it, you need to specify these five attributes:
- OAuth Consumer key
- OAuth Consumer secret
- OAuth Access token
- OAuth Access token secret
- URL
All OAuth attributes come from the Twitter application you registered above. The URL depends on which REST API method you want to use. For example, https://api.twitter.com/1.1/search/tweets.json?q=%40CloverETL searches for tweets related to @CloverETL (%40 is the URL-encoded form of the @ character).
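HTTPConnector is expected to sign each request itself once the four OAuth attributes are set, so you never have to write this code. Still, it can help to see what those four values are actually used for. The following is a minimal, standard-library-only Python sketch of OAuth 1.0a HMAC-SHA1 signing (as described in RFC 5849) applied to the search URL above. The function names and the placeholder credentials are hypothetical, and the sketch covers only a simple GET request without a body.

```python
import base64
import hashlib
import hmac
import secrets
import time
from typing import Dict, List, Tuple
from urllib.parse import parse_qsl, quote, urlencode, urlsplit

SEARCH_ENDPOINT = "https://api.twitter.com/1.1/search/tweets.json"


def pct(value: str) -> str:
    # OAuth 1.0a percent-encoding: only A-Z, a-z, 0-9, '-', '.', '_', '~' stay unescaped.
    return quote(value, safe="")


def search_url(query: str) -> str:
    # '@CloverETL' -> https://api.twitter.com/1.1/search/tweets.json?q=%40CloverETL
    return SEARCH_ENDPOINT + "?" + urlencode({"q": query}, quote_via=quote)


def signature_base_string(method: str, url: str, oauth_params: Dict[str, str]) -> str:
    parts = urlsplit(url)
    # Base URL without the query string (this sketch does not strip default ports).
    base_url = f"{parts.scheme.lower()}://{parts.netloc.lower()}{parts.path}"
    # Query parameters and OAuth parameters are encoded, sorted, and joined together.
    params: List[Tuple[str, str]] = parse_qsl(parts.query, keep_blank_values=True)
    params += list(oauth_params.items())
    encoded = sorted((pct(k), pct(v)) for k, v in params)
    param_string = "&".join(f"{k}={v}" for k, v in encoded)
    return "&".join([method.upper(), pct(base_url), pct(param_string)])


def authorization_header(method: str, url: str, consumer_key: str, consumer_secret: str,
                         token: str, token_secret: str,
                         nonce: str = "", timestamp: str = "") -> str:
    oauth = {
        "oauth_consumer_key": consumer_key,
        "oauth_nonce": nonce or secrets.token_hex(16),
        "oauth_signature_method": "HMAC-SHA1",
        "oauth_timestamp": timestamp or str(int(time.time())),
        "oauth_token": token,
        "oauth_version": "1.0",
    }
    base = signature_base_string(method, url, oauth)
    # Signing key: consumer secret and access token secret, each encoded, joined by '&'.
    key = f"{pct(consumer_secret)}&{pct(token_secret)}"
    digest = hmac.new(key.encode(), base.encode(), hashlib.sha1).digest()
    oauth["oauth_signature"] = base64.b64encode(digest).decode()
    return "OAuth " + ", ".join(f'{pct(k)}="{pct(v)}"' for k, v in sorted(oauth.items()))
```

For instance, `authorization_header("GET", search_url("@CloverETL"), ...)` called with your consumer key, consumer secret, access token, and access token secret yields the value that would go into the request's `Authorization` header. Note that the consumer key and access token travel in the header, while the two secrets only ever feed the HMAC key.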
The result is returned in JSON format. You can either store it in a file (the component's Output file URL attribute) or map the response content to an output port and process it with downstream components (e.g., JSONReader).
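If you route the response to an output port, something downstream still has to walk the JSON; in the graph that is JSONReader, and the Python sketch below only mirrors the idea. It assumes the v1.1 search response layout, where matching tweets sit in a top-level `statuses` array and each status carries `id_str`, `created_at`, `text`, and a nested `user.screen_name`. `SAMPLE_RESPONSE` is a hypothetical, heavily trimmed payload, not real Twitter output, and `extract_tweets` is an illustrative name.

```python
import json
from typing import Dict, List

# Hypothetical, trimmed example of a search/tweets.json payload (real responses carry many more fields).
SAMPLE_RESPONSE = """
{
  "statuses": [
    {"id_str": "101", "created_at": "Mon Jan 06 10:00:00 +0000 2014",
     "text": "Trying out @CloverETL", "user": {"screen_name": "alice"}},
    {"id_str": "102", "created_at": "Mon Jan 06 11:30:00 +0000 2014",
     "text": "ETL graphs for tweets", "user": {"screen_name": "bob"}}
  ],
  "search_metadata": {"count": 15}
}
"""


def extract_tweets(payload: str) -> List[Dict[str, str]]:
    """Flatten each status into one flat record, one record per tweet."""
    data = json.loads(payload)
    records = []
    for status in data.get("statuses", []):
        records.append({
            "id": status.get("id_str", ""),
            "created_at": status.get("created_at", ""),
            # The author's handle lives in a nested "user" object.
            "user": status.get("user", {}).get("screen_name", ""),
            "text": status.get("text", ""),
        })
    return records


for record in extract_tweets(SAMPLE_RESPONSE):
    print(record["user"], record["text"])
```

The sketch reads `id_str` rather than the numeric `id` because tweet IDs can exceed the range that double-precision number parsers represent exactly, so the string form is the safer field to map.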
The attached example graph queries the current Twitter trends and the tweets related to them, parses the returned JSON to extract tweet attributes, and stores them in an XML file.
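The graph itself is not reproduced here, but the shape of that pipeline (trends in, tweets out, XML at the end) can be sketched in a few lines of Python. This assumes the v1.1 `trends/place.json` response, a JSON array whose elements each carry a `trends` list with a `name` per trend. It builds one search URL per trend name and serializes already-extracted tweet records (with hypothetical fields `id`, `created_at`, `user`, `text`) to XML using `xml.etree.ElementTree`. The element names (`tweets`, `tweet`, and so on) are hypothetical; the attached graph may well use a different layout.

```python
import json
import xml.etree.ElementTree as ET
from typing import Dict, List
from urllib.parse import quote, urlencode

SEARCH_ENDPOINT = "https://api.twitter.com/1.1/search/tweets.json"


def trend_search_urls(trends_payload: str) -> Dict[str, str]:
    """Map each trend name to a search URL for tweets related to it."""
    urls = {}
    for location in json.loads(trends_payload):  # trends/place returns a JSON array
        for trend in location.get("trends", []):
            name = trend["name"]
            urls[name] = SEARCH_ENDPOINT + "?" + urlencode({"q": name}, quote_via=quote)
    return urls


def tweets_to_xml(records_by_trend: Dict[str, List[Dict[str, str]]]) -> str:
    """Serialize tweet records, grouped by trend, into one XML document string."""
    root = ET.Element("tweets")
    for trend, records in records_by_trend.items():
        for record in records:
            tweet = ET.SubElement(root, "tweet", trend=trend, id=record["id"])
            for field in ("created_at", "user", "text"):
                # ElementTree escapes characters such as '&' and '<' in tweet text.
                ET.SubElement(tweet, field).text = record[field]
    return ET.tostring(root, encoding="unicode")


# Hypothetical inputs standing in for the HTTPConnector/JSONReader output.
trends_json = '[{"trends": [{"name": "#CloverETL"}, {"name": "Big Data"}]}]'
print(trend_search_urls(trends_json))
print(tweets_to_xml({"#CloverETL": [
    {"id": "101", "created_at": "Mon Jan 06 10:00:00 +0000 2014", "user": "alice", "text": "ETL & tweets"},
]}))
```

To produce an XML file rather than a string, write the returned string to disk, or wrap the root element in `ET.ElementTree(root)` and call its `write(path, encoding="utf-8", xml_declaration=True)` method.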
And with that, you’ve now waded through the noise to find exactly what you’re looking for.