Based on extensive data-mining experience in Front Office Trading systems at Blue Chip clients such as Morgan Stanley, JP Morgan, Santander, BAML, Credit Suisse, ING and RABO, the Muppix Team has developed a free Data Science Toolkit in Unix/Linux, SQL & Excel for Consultants and Entrepreneurs to extract and analyse unstructured information from diverse data sources.
The Muppix Team provides innovative, value-driven Professional Services and Training to help you quickly make sense of large-scale data without needing any computer skills.
Enroll in our SQL Crash Course entitled
"Effective in SQL in 4 hours"
given at our facilities in Bosch en Duin, near Bilthoven, just north of Utrecht, or on site. (Click on Training at the top of this page to book.)
MUPPIX Toolkit 3.3 is now available for download. It includes SQL & Excel commands in the Reference Card Spreadsheet: try out the keywords on your own text and see what the commands extract from a sample of your data.
Muppix enables you to easily slice, dice and make use of very large data sets on any PC or Mac.
It uses industry-strength Linux to do the heavy lifting, but each Muppix command is described in a simple keyword language, so the commands are easy to find.
Our mission is that any non-technical professional can extract data insights within 5 minutes!
As a data scientist, I spend quite a bit of time on the command-line, especially when there's data to be obtained, scrubbed, or explored - Jeroen Janssens
Part of the skillset of a data scientist is knowing how to obtain a sufficient corpus of usable data, possibly from multiple sources, and possibly from sites which require specific query syntax. A data scientist should know how to do this from the command line, e.g. in a Un*X Environment - Hilary Mason
Few tools are more indispensable to my work than Unix. Manipulating data into different formats, performing transformations, and conducting exploratory data analysis (EDA) is the lingua franca of data science. The coffers of Unix hold many simple tools, which by themselves are powerful, but when chained together facilitate complex data manipulations. Although languages like R and Python are invaluable for data analysis, I find Unix to be superior in many scenarios for quick and simple data cleaning, idea prototyping, and understanding data. - Seth Brown
While it's sometimes difficult to remember all of the parameters for the Unix commands, getting familiar with them has been beneficial to my productivity and allowed me to avoid many headaches when working with large text files... Writing a script in Python/Ruby/Perl would probably take a few minutes and then even more time for the script to actually complete. Thankfully, the Unix Utilities exist and they're awesome. - Greg Reda
Whenever you need to work with data, don’t overlook the Unix “hand tools.” Once you get used to working on the Unix command line, you’ll find that it’s often faster than the alternatives. And the more you use these tools, the more fluent you’ll become. - Mike Loukides
80% of time on data cleaning & exploring
Huge demand for technical decision makers
Exponential increase in volume, complexity and sources of data