This course is a hands-on introduction to tools and techniques for
data mining and machine learning analysis of data from the built environment.
A wide range of data (e.g., text, image, acoustic, geospatial) will be collected from the built environment and
organized into tabular and non-tabular (e.g., hierarchy and graph) models. We will use techniques from data science to
clean, transform, and integrate data into formats suitable for analysis, data mining procedures to identify structural
patterns and association relationships within the data, and machine learning algorithms to predict system behavior,
estimate attainable levels of performance, and identify short- and long-term planning needs.
Students will learn how to write and run programs in Python, and work with large datasets stored and
managed (e.g., cleaned, transformed, merged) in Pandas. Then, we will work step-by-step toward the use of
object-oriented software architectures. Students will be introduced to software tools and algorithms for
data mining and machine learning analysis with binary/regression decision trees, multilayer neural networks,
variational autoencoders, object detection algorithms, and large language models.
Fig 1. Object detection with YOLOv8.
Applications will be motivated by a wide range of data sources,
including experimental studies on material behavior,
delays in air transportation,
spatial and temporal distributions of motor vehicle accidents,
and energy consumption of buildings in cities.
The semester will conclude with completion and presentation of a term project.
Graduate Student vs Undergraduate Student Expectations
Since this is a hands-on project-based course, graduate and undergraduate students will both work on the
solution of problem sets and a project relating to the topics covered in class.
Undergraduate students will be permitted to work in pairs/groups to complete problem set activities.
Students seeking graduate course credit will be required to work individually.
Graduate students will also be strongly encouraged to explore opportunities for developing projects
that will benefit their graduate research and/or lead to publication in a conference/journal outlet.
SPRING SEMESTER, 2026
The topics will be as follows:
Part 1: Data and Information Management in the Built Environment (2 weeks)
Learning about the Built Environment
Topic: Modern Infrastructure Systems and Near-Term Challenges.
Topic: Engineering Sensor Systems.
Topic: Large-scale Urban and Global Sensing.
Opportunities for AI, Data Mining and Machine Learning
Topic: AI in the 1980s, 1990s, 2000s, and the post-2010 era.
Topic: Recent Advances in Data Mining/AI/Machine Learning.
Topic: Cyber-Physical and Digital Twin Systems.
Data-Driven System Development
Topic: Data-driven decision making.
Topic: Tabular and Non-tabular data models.
Part 2: Getting Started with Python (1 week)
Introduction to Engineering Software Development
Topic: Evolution of computer languages over the past 50 years.
Topic: Low- and high-level languages
Topic: Scripting languages versus compiled languages
Python, Part I: Getting Started
Topic: Writing and Running a Simple Python Program.
Topic: Software Productivity Tools: pip3, Jupyter Notebook.
Topic: Basic programming (data types, expressions, control structures, functions).
Topic: Built-in collections (lists, dictionaries, sets).
Topic: Basic input/output (e.g., CSV and JSON files).
Part 3: Modeling Tabular and Non-Tabular Data (2 weeks)
Python, Part II: Tabular Data and Dataset Transformation
Topic: Working with NumPy (1-D, 2-D, 3-D arrays).
Topic: Working with Pandas (series and dataframes).
Topic: Basic operations, reading CSV files, cleaning data.
Topic: Filtering, grouping, aggregating and merging dataframes.
Python, Part III: Object Modeling
Topic: Objects and Classes.
Topic: Association, Inheritance, Composition Relationships.
Topic: Working with Groups of Objects.
Python, Part IV: Non-Tabular Data
Topic: Tree and Graph Data Structures.
Topic: Working with OpenStreetMap datafiles.
Part 4: Data Representation and Preprocessing (1 week)
Data Representation and Quality Assessment
Topic: Real-World Urban, Government and Geographic Data Portals
Topic: Common data formats (txt, csv, json, osm, arff)
Topic: Data quality assessment (accuracy, consistency, and reliability).
Data Preprocessing and Transformation
Topic: Extract-Transform-Load (ETL) processes
Topic: Feature extraction and generation
Topic: One-hot encoding techniques
Topic: Dimensionality reduction/principal component analysis
Part 5: Hands-on Data Mining (2 weeks)
Data Mining Concepts and Methods
Topic: Classification, Association, Clustering
Topic: Binary and Regression Decision Trees and Rules
Topic: Theoretical Considerations (Gini, Entropy, Information Gain)
Topic: Metrics of Evaluation
Working with Data Mining Packages
Topic: Data Mining with Python (Applications)
Topic: Data Mining with Weka (Applications)
Part 6: Hands-On Machine Learning (3 weeks)
Machine Learning Concepts and Methods
Topic: Machine Learning Capabilities
Topic: Taxonomy of Machine Learning Problems
Topic: Types of Machine Learning Systems
Topic: Urban Applications of Machine Learning
Perceptron Models and Multilayer Neural Networks
Topic: Perceptron Models (Building Block of Machine Learning)
Topic: Activation Functions, Loss Functions, Metrics of Evaluation
Topic: Training Neural Networks (Backpropagation and Optimization Algorithms)
Topic: Multilayer Network Architectures and Capabilities
Topic: Neural Networks with One/Two Hidden Layers
Working with Machine Learning Frameworks
Topic: Software Setup (TensorFlow, Keras, PyTorch)
Topic: Working with TensorFlow, Keras and PyTorch
Part 7: Advanced Topics (Class Interest and Time Permitting)
Object Detection with YOLO
Topic: Object detection with YOLO
Modeling Sequences
Topic: Recurrent Neural Networks (new for 2026)
Autoencoders and Variational Autoencoders
Topic: Autoencoder and Variational Autoencoder Neural Networks (new for 2026)
Students will complete individual homework assignments,
and work in small teams on a data mining/machine learning project.
ENCE 688P ONLINE / SYNCHRONOUS CLASS SESSIONS
Mark Austin.
Synchronous class sessions and office hours will be Monday and Wednesday at 5:00-6:15 pm.
Join Zoom Meeting:
https://umd.zoom.us/j/6517468335
For each lecture I will post the "lecture content" (pdf) and a "recorded video" (zoom video)
to the notes from class page.
I will also post handouts and links to
interesting web sites on the notes from class page.
Even if you just want to drop in to catch up, that'll be fine too!
If the class time doesn't work for you, send an e-mail (austin "at" umd.edu) and we will work something out.
Submission of Homework and Project Work
Homework will be posted on the notes from class web page.
Please submit your homework as a zip file and send either as
an attachment to an e-mail or via Dropbox.
Also, please indicate in your e-mail subject heading the class and purpose
of the e-mail, e.g.,
ENCE688P: Homework 2 ...
Class Text and Resources
Text not required, but there will be lots of
class handouts on data mining and machine learning.
There is a great data mining text:
Witten I.H., Frank E., Hall M.A., and Pal C.J., Data Mining: Practical Machine Learning Tools and Techniques,
Fourth Edition, Morgan Kaufmann, 2017,
which you might consider getting -- well worth the money!
Course Assessment and Exam Schedule
Course assessment will be as follows:
Homework (50%).
Mid-semester project proposal and presentation (20%).
End-of-semester project/report in data mining/machine learning (30%).
Note.
No mid-term exams. No final exam. Let's focus on making fabulous projects!
Accommodation for students with disabilities will be made.
At the end of the semester, please participate in the evaluation of courses through CourseEvalUM.
Your feedback is confidential and an important means of improving the course in future semesters.
Download Python and Java
Download Python 3.X.
It seems that Apple ships its Macs with Python 2.X pre-installed.
But for the purposes of this class I am going to assume you have Python 3.7 (or Python 3.8) installed.
This detail matters because Python 3 is not backward compatible with Python 2 (Strike 1 against Python).
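A quick way to confirm which interpreter you are running is sketched below. The (3, 7) minimum is simply the version mentioned above, and the helper name `version_ok` is made up for illustration; adjust both to suit your setup.

```python
import sys

# Assumed minimum (major, minor) version, taken from the text above.
REQUIRED = (3, 7)


def version_ok(version_info=sys.version_info, required=REQUIRED):
    """Return True if the given version is at least the required (major, minor)."""
    return tuple(version_info[:2]) >= tuple(required)


if version_ok():
    print("Python version looks fine:", sys.version.split()[0])
else:
    print("Python is too old for this class:", sys.version.split()[0])
```

Save this in a file and run it with the same command you plan to use in class (for example `python3`), since a machine can have more than one Python installed.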
Download Java.
I have Java 11 on my laptop, but you are
certainly welcome to download a more recent version.
Download Apache Ant. Apache Ant is a Java library and command-line build tool that
manages the compilation of Java programs and execution of programs and test cases. Extremely useful.
If your computer is a Mac, think about downloading Homebrew and then using brew to
automatically download and install the Ant packages on your machine.
Note (Oct, 2020). I just installed homebrew on my iMac (home) running Catalina (10.15) and, sadly, Apple has not
made this process easy. Here's the bottom line to get things working:
Homebrew uses something called the "command line tools", which is an addition to Xcode.
So before you can install Homebrew, you need to download and install the command line tools.
I downloaded the dmg file for the command line tools from the Apple developers web site --
just create an account, it's free. Run the installation program and the tools will be put
in /Library/Developer/CommandLineTools/
Then, to install Homebrew, cut-and-paste the command listed on the Homebrew page into a terminal window
running the bash shell. The installation only takes a few minutes and you are good to go.
For example, to install ant, simply type: brew install ant at the prompt in a terminal window.
Java Language Specification SE8 Release 2015 ( pdf )
This document contains a detailed description of Java SE8 (handy if you want to know exactly what Java 8 does).
Standard and Extension Packages (i.e., packages java.* and javax.*) in Java 6
( pdf ).
Data Analysis with Pandas.
Real-world data sources are nearly always very messy.
Pandas is an open source data analysis
and manipulation tool, built on top of Python.
Click here to install pandas 1.1.4 (Released October, 2020).
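To give a feel for what "messy" means in practice, here is a minimal sketch of a typical cleanup with pandas. The building names, months, and kWh readings are invented for illustration and are not course data; the steps (normalize text keys, drop duplicates, drop missing readings, aggregate) are just one reasonable order of operations.

```python
import io

import pandas as pd

# Hypothetical messy readings: inconsistent case, a stray space,
# a missing value, and an exact duplicate row.
raw = io.StringIO(
    "building,month,kwh\n"
    "Hall A ,Jan,1200\n"
    "hall a,Feb,\n"
    "Hall B,Jan,950\n"
    "Hall B,Jan,950\n"
    "Hall B,Feb,1010\n"
)

df = pd.read_csv(raw)

# Normalize the text key so "Hall A " and "hall a" refer to the same building.
df["building"] = df["building"].str.strip().str.title()

# Drop exact duplicate rows, then rows that have no energy reading.
clean = df.drop_duplicates().dropna(subset=["kwh"])

# Aggregate: total consumption per building.
totals = clean.groupby("building")["kwh"].sum()
print(totals)
```

Whether a missing reading should be dropped, as here, or filled in some other way depends on the dataset and the question being asked.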