Machine learning is a subfield of artificial intelligence in which computer systems learn from data without being explicitly programmed. Companies like Google, Yelp, Facebook, and HubSpot use machine learning for a variety of applications. For instance, Yelp uses machine learning to help its human employees categorize millions of photographs. Google’s DeepDream, the neural network program that “dreams” and produces psychedelic images, is another well-known example. Facebook uses machine learning in its Messenger service to eliminate spam. In short, it is safe to say that machine learning is pretty hot right now. If you are into technology and are looking for open source tools in an enterprise-level language like Java to get you started, read on.

1. Weka

Developed by the University of Waikato, New Zealand, the Waikato Environment for Knowledge Analysis (Weka) is an open source suite of machine learning software. While the original version of Weka was intended as a tool for the agricultural domain, the Java-based version (Weka 3) is intended for education, research, and data mining. It supports the standard data mining tasks: data preprocessing, clustering, classification, regression, visualization, and feature selection.
Weka can access SQL databases via Java Database Connectivity (JDBC) and process the results returned by a database query. The YouTube channel WekaMOOC hosts a number of video playlists on data mining with Weka. You can download Weka 3 from its website, which also links to other resources such as online courses.

2. Massive Online Analysis (MOA)

Also developed by the University of Waikato, New Zealand, MOA is free, open source software that allows you to build and run machine learning or data mining experiments on data streams. If your use case involves real-time data streams, you’re in luck, because that’s exactly what MOA was designed for! MOA’s simple UI and easy integration with Weka land it in our second spot. MOA is particularly popular in the data mining field because of its community, and its extensive documentation is useful for beginners. MOA is available for download from its website, along with an introductory video to get you started.
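To make the idea of learning from a data stream more concrete, here is a minimal plain-Java sketch of test-then-train (prequential) evaluation, where each incoming instance is first used to test the model and then to update it. This is not MOA's API: the names `StreamLearner`, `MajorityClassLearner`, and `prequentialAccuracy` are hypothetical and exist only for illustration, and the learner is deliberately trivial.

```java
// Plain-Java sketch of test-then-train evaluation on a stream.
// All names here are hypothetical; this does not use MOA.
final class PrequentialSketch {

    // Minimal contract for a model that learns one instance at a time.
    interface StreamLearner {
        int predict(double[] features);
        void train(double[] features, int label);
    }

    // Trivial incremental learner: predicts the most frequent label seen so far.
    static final class MajorityClassLearner implements StreamLearner {
        private final int[] counts = new int[2]; // binary labels 0 and 1

        @Override
        public int predict(double[] features) {
            return counts[1] > counts[0] ? 1 : 0; // ties fall back to label 0
        }

        @Override
        public void train(double[] features, int label) {
            counts[label]++;
        }
    }

    // Each instance is used for testing first and for training second,
    // in arrival order, so the model is always scored on unseen data.
    static double prequentialAccuracy(StreamLearner learner, double[][] xs, int[] ys) {
        int correct = 0;
        for (int i = 0; i < xs.length; i++) {
            if (learner.predict(xs[i]) == ys[i]) {
                correct++;
            }
            learner.train(xs[i], ys[i]);
        }
        return xs.length == 0 ? 0.0 : (double) correct / xs.length;
    }

    public static void main(String[] args) {
        double[][] xs = new double[8][1]; // features are ignored by this learner
        int[] ys = {1, 1, 0, 1, 1, 0, 1, 1};
        System.out.println(prequentialAccuracy(new MajorityClassLearner(), xs, ys));
    }
}
```

Because the model never sees an instance before being scored on it, no separate held-out test set is needed, which suits data that arrives continuously. Any learner that implements the hypothetical `StreamLearner` interface could be swapped in.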

3. Environment for Developing KDD-Applications Supported by Index-Structures (ELKI)

ELKI was developed at the Ludwig Maximilian University of Munich, Germany, and is used for developing advanced data mining algorithms. The ELKI framework is written in Java and has a modular architecture, which makes it a good fit for researchers and students. The ELKI library offers a large collection of configurable algorithm parameters that are particularly useful for benchmarking algorithms. ELKI has been used in data science, spaceflight, and traffic prediction. The sky’s the limit with this open source software suite; you can download it and check out a few tutorials on the project website.

4. RapidMiner

RapidMiner is a cross-platform data science platform that provides an integrated environment for data preparation, machine learning, deep learning, text mining, and predictive analytics. It has a great graphical user interface and a Java API for developing your own applications. RapidMiner is offered in free, small, medium, and large editions. The free edition is limited to 10,000 data rows and 1 logical processor. The large edition, which costs $10,000 per user per year, gives you unlimited data rows and logical processors, along with features like RapidMiner’s Turbo Prep, which accelerates data preparation by visually blending and enriching data so that analytics teams can work with data faster.

5. Java Statistical Analysis Tool (JSAT)

JSAT is a popular library for getting into machine learning quickly. It was developed by Edward Raff as a completely open source project, and Raff humbly boasts that JSAT is usually faster than Weka! The best part about JSAT is that it’s pure Java: the code is self-contained, with no external dependencies, so it’s worth a look.


6. Machine Learning for Language Toolkit (MALLET)

MALLET was developed by Andrew McCallum with students from UMass Amherst and UPenn. It is a Java-based toolkit for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text. MALLET also includes routines for transforming text into numerical representations that can be processed efficiently. MALLET is open source, and both the toolkit and its documentation are available online.
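As a rough illustration of what transforming text into a numerical representation means, the sketch below builds simple bag-of-words count vectors in plain Java. It does not use MALLET: the class `BagOfWordsSketch` and its `fit` and `transform` methods are hypothetical names chosen for this example, and the tokenizer is a deliberately naive assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Plain-Java bag-of-words sketch; hypothetical names, not MALLET's API.
final class BagOfWordsSketch {

    // Maps each distinct token to a vector index, in order of first appearance.
    private final Map<String, Integer> vocabulary = new LinkedHashMap<>();

    // Naive tokenizer: lowercases and splits on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // First pass: grow the vocabulary from a collection of documents.
    void fit(List<String> documents) {
        for (String document : documents) {
            for (String token : tokenize(document)) {
                vocabulary.putIfAbsent(token, vocabulary.size());
            }
        }
    }

    // Second pass: count how often each vocabulary token occurs in one document.
    int[] transform(String document) {
        int[] counts = new int[vocabulary.size()];
        for (String token : tokenize(document)) {
            Integer index = vocabulary.get(token);
            if (index != null) {
                counts[index]++; // tokens outside the vocabulary are skipped
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("Java is fast", "Java is everywhere, is it not?");
        BagOfWordsSketch bow = new BagOfWordsSketch();
        bow.fit(docs);
        for (String doc : docs) {
            System.out.println(Arrays.toString(bow.transform(doc)));
        }
    }
}
```

Each document becomes a fixed-length integer array, which is the kind of input that classifiers and clustering algorithms can work with directly.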

7. Deeplearning4j

Deeplearning4j is a deep learning library for Java and the Java virtual machine. It serves as a framework for deep learning algorithms. Deeplearning4j is open source under the Apache License 2.0 and can be used for topic modeling and vector space modeling. Some of its most common applications include cybersecurity, anomaly detection, recommendation systems for e-commerce sites, and image recognition. Deeplearning4j is an Eclipse Foundation project with an active community, so resources and support are easy to find.

8. Google BigQuery

Google BigQuery is a cloud-based solution that enables interactive analysis of large datasets and works in conjunction with Google Cloud Storage. Since BigQuery is a fully managed service, it doesn’t require you to invest in setting up a data warehouse or in the staff needed to operate one. Google BigQuery brags about processing queries “rocket fast”, fast enough to analyze terabytes of data in seconds and petabytes of data in minutes. Pricing is based on the number of bytes processed. For large queries, Google allocates the required computing resources for you so that you don’t have to worry about scaling. Google BigQuery encrypts and replicates your data across multiple data centers for maximum durability and service uptime.
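Since pricing depends on bytes processed, a back-of-the-envelope estimate is simple arithmetic. The following plain-Java sketch shows one way to compute it. The rate passed to `estimateCost` is a placeholder rather than Google's actual price, and billing per tebibyte (2^40 bytes) is an assumption made for this example, so check the current pricing page before relying on any number.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Back-of-the-envelope cost estimate for a query billed by bytes processed.
// QueryCostSketch and estimateCost are hypothetical names, not part of any Google SDK.
final class QueryCostSketch {

    // Assumption: the billing unit is one tebibyte (2^40 bytes).
    static final long BYTES_PER_TIB = 1L << 40;

    // Returns bytesProcessed / 2^40 * pricePerTiB, rounded to two decimal places.
    static BigDecimal estimateCost(long bytesProcessed, BigDecimal pricePerTiB) {
        return BigDecimal.valueOf(bytesProcessed)
                .multiply(pricePerTiB)
                .divide(BigDecimal.valueOf(BYTES_PER_TIB), 2, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        BigDecimal placeholderRate = new BigDecimal("5.00"); // hypothetical price per TiB
        long bytesProcessed = 3 * BYTES_PER_TIB;             // a query that scans 3 TiB
        System.out.println(estimateCost(bytesProcessed, placeholderRate));
    }
}
```

Using `BigDecimal` instead of `double` avoids rounding surprises when working with currency amounts.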

9. Mahout

Mahout is a distributed linear algebra framework and mathematically expressive Scala DSL designed to let data scientists quickly implement their own algorithms. Mahout comes with Java libraries for common maths operations and primitive Java collections. The core algorithms of Mahout include distributed linear algebra, preprocessors, regression, clustering and recommenders. Support for MapReduce algorithms is being gradually phased out.



Client Success Manager

Sindhu is a tenacious and impassioned digital product and project manager specializing in driving client success across complex healthcare technology implementations and integrations. She is a certified Agile Scrum Master and holds advanced degrees in computer science and software engineering. Her philosophy is that “work is where the heart is,” and she believes the key to success is creating a solid, supportive, and cohesive team.