
ENCE 688X/ENCE 489X: Data Mining and Machine Learning for the Built Environment


Mark Austin,
Department of Civil and Environmental Engineering,
University of Maryland, College Park.

CLASS

Notes from Class: [ Spring 2026 ]

PROJECTS

Project Abstracts: [ Fall 2020 ] [ Fall 2021 ] [ Spring 2022 ] [ Spring 2026 ]

GOALS

This course is a hands-on introduction to tools and techniques for data mining and machine learning analysis of
data from the built environment.

A wide range of data (e.g., text, image, acoustic, geospatial) will be collected from the built environment and
organized into tabular and non-tabular (e.g., hierarchy and graph) models. We will use techniques from data science to
clean, transform, and integrate data into formats suitable for analysis; data mining procedures to identify structural
patterns and association relationships within the data; and machine learning algorithms to predict system behavior and
attainable levels of performance, and to identify short- and long-term planning needs.
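To make the tabular/non-tabular distinction concrete, here is a minimal sketch using only the Python standard library. The road-segment data and the helper names `read_table` and `to_graph` are hypothetical, made up for illustration: the same facts are held first as a table (one row per segment) and then as a graph (an adjacency map keyed by intersection).

```python
import csv
import io

# Hypothetical road-segment data (illustration only): one row per segment.
SEGMENTS_CSV = """from_node,to_node,length_m
A,B,120
B,C,80
A,C,250
"""

def read_table(text):
    """Parse CSV text into a list of row dictionaries (a tabular model)."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["length_m"] = float(row["length_m"])  # convert the numeric column
    return rows

def to_graph(rows):
    """Convert segment rows into an undirected adjacency map (a graph model)."""
    graph = {}
    for row in rows:
        a, b, length = row["from_node"], row["to_node"], row["length_m"]
        graph.setdefault(a, {})[b] = length  # edge a -> b
        graph.setdefault(b, {})[a] = length  # edge b -> a (undirected)
    return graph

rows = read_table(SEGMENTS_CSV)
graph = to_graph(rows)
print(len(rows))           # 3
print(sorted(graph["A"]))  # ['B', 'C']
```

The table is convenient for filtering and aggregating records; the graph answers connectivity questions (e.g., which intersections are adjacent to A) directly.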

Students will learn how to write and compile programs in Python, and work with large datasets stored and
managed (e.g., cleaned, transformed, merged) in Pandas. Then, we will work step-by-step toward the use of
object-oriented software architectures. Students will be introduced to software tools and algorithms for
data mining and machine learning analysis with binary/regression decision trees, multilayer neural networks,
variational autoencoders, object detection algorithms, and large language models.

Fig 1. Object detection with YOLOv8.

Applications will be motivated by a wide range of data sources, including experimental studies on material behavior,
delays in air transportation, spatial and temporal distributions of motor vehicle accidents, and energy consumption of buildings in cities.

The semester will conclude with completion and presentation of a term project.


Graduate Student vs Undergraduate Student Expectations

Since this is a hands-on, project-based course, both graduate and undergraduate students will work on
problem sets and a project relating to the topics covered in class.

Undergraduate students will be permitted to work in pairs/groups to complete problem set activities.
Students seeking graduate course credit will be required to work individually.

Graduate students will also be strongly encouraged to explore opportunities for developing projects
that will benefit their graduate research and/or lead to publication in a conference/journal outlet.


SPRING SEMESTER, 2026

The topics will be as follows:

Part 1: Data and Information Management in the Built Environment (2 weeks)

  • Learning about the Built Environment
    Topic: Modern Infrastructure Systems and Near-Term Challenges.
    Topic: Engineering Sensor Systems.
    Topic: Large-scale Urban and Global Sensing.
  • Opportunities for AI, Data Mining and Machine Learning
    Topic: AI in the 1980s, 1990s, 2000s, and post 2010 era.
    Topic: Recent Advances in Data Mining/AI/Machine Learning.
    Topic: Cyber-Physical and Digital Twin Systems.
  • Data-Driven System Development
    Topic: Data-driven decision making.
    Topic: Tabular and Non-tabular data models.

Part 2: Getting Started with Python (1 week)

  • Introduction to Engineering Software Development
    Topic: Evolution of computer languages over the past 50 years.
    Topic: Low- and high-level languages.
    Topic: Scripting languages versus compiled languages.
  • Python, Part I: Getting Started
    Topic: Writing and Compiling a Simple Python Program.
    Topic: Software Productivity Tools: pip3, Jupyter Notebook.
    Topic: Basic programming (data types, expressions, control structures, functions).
    Topic: Builtin collections (lists, dictionaries, sets).
    Topic: Basic input/output (e.g., CSV and JSON files).

Part 3: Modeling Tabular and Non-Tabular Data (2 weeks)

  • Python, Part II: Tabular Data and Dataset Transformation
    Topic: Working with NumPy (1-D, 2-D, 3-D arrays).
    Topic: Working with Pandas (series and dataframes).
    Topic: Basic operations, reading CSV files, cleaning data.
    Topic: Filtering, grouping, aggregating and merging dataframes.
  • Python, Part III: Object Modeling
    Topic: Objects and Classes.
    Topic: Association, Inheritance, Composition Relationships.
    Topic: Working with Groups of Objects.
  • Python, Part IV: Non-Tabular Data
    Topic: Tree and Graph Data Structures.
    Topic: Working with OpenStreetMap datafiles.

Part 4: Data Representation and Preprocessing (1 week)

  • Data Representation and Quality Assessment
    Topic: Real-World Urban, Government and Geographic Data Portals
    Topic: Common data formats (txt, csv, json, osm, arff)
    Topic: Data quality assessment (accuracy, consistency, and reliability).
  • Data Preprocessing and Transformation
    Topic: Extract-Transform-Load (ETL) processes
    Topic: Feature extraction and generation
    Topic: One-hot encoding techniques
    Topic: Dimensionality reduction/principal component analysis

Part 5: Hands-on Data Mining (2 weeks)

  • Data Mining Concepts and Methods
    Topic: Classification, Association, Clustering
    Topic: Binary and Regression Decision Trees and Rules
    Topic: Theoretical Considerations (Gini, Entropy, Information Gain)
    Topic: Metrics of Evaluation
  • Working with Data Mining Packages
    Topic: Data Mining with Python (Applications)
    Topic: Data Mining with Weka (Applications)

Part 6: Hands-On Machine Learning (3 weeks)

  • Machine Learning Concepts and Methods
    Topic: Machine Learning Capabilities
    Topic: Taxonomy of Machine Learning Problems
    Topic: Types of Machine Learning Systems
    Topic: Urban Applications of Machine Learning
  • Perceptron Models and Multilayer Neural Networks
    Topic: Perceptron Models (Building Block of Machine Learning)
    Topic: Activation Functions, Loss Functions, Metrics of Evaluation
    Topic: Training Neural Networks (Backpropagation and Optimization Algorithms)
    Topic: Multilayer Network Architectures and Capabilities
    Topic: Neural Networks with One/Two Hidden Layers
  • Working with Machine Learning Frameworks
    Topic: Software Setup (TensorFlow, Keras, PyTorch)
    Topic: Working with TensorFlow, Keras and PyTorch

Part 7: Advanced Topics (Class Interest and Time Permitting)

  • Object Detection with YOLO
    Topic: Object detection with YOLO
  • Modeling Sequences
    Topic: Recurrent Neural Networks (new for 2026)
  • Autoencoders and Variational Autoencoders
    Topic: Autoencoder and Variational Autoencoder Neural Networks (new for 2026)
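The splitting criteria listed under Part 5 (Gini, Entropy, Information Gain) can be sketched in a few lines of Python. This is an illustrative implementation under our own assumptions (class labels held in plain lists, a binary split into two child lists), not code from the course or from Weka; the yes/no labels are made up.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy in bits: H = -sum(p * log2(p)) over class proportions p."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity: 1 - sum(p^2) over class proportions p."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, children):
    """H(parent) minus the size-weighted entropy of the child subsets."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical labels: a balanced parent node split into two purer children.
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 4

print(entropy(parent))                                   # 1.0
print(gini(parent))                                      # 0.5
print(round(information_gain(parent, [left, right]), 3)) # 0.278
```

Each child has entropy of about 0.722 bits, so this split gains roughly 0.278 bits; a decision-tree learner would compare such values across candidate splits and choose the largest.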

Students will complete individual homework assignments, and work in small teams on a data mining/machine learning project.

ENCE 688P ONLINE / SYNCHRONOUS CLASS SESSIONS

  • Mark Austin. Synchronous class sessions and office hours will be held Monday and Wednesday, 5:00-6:15 pm.
    Join Zoom Meeting: https://umd.zoom.us/j/6517468335

    For each lecture I will post the "lecture content" (pdf) and a "recorded video" (zoom video) to the notes from class page.

    I will also post handouts and links to interesting web sites on the notes from class page.

    Even if you just want to drop in to catch up, that's fine too!
    If these times don't work for you, send an e-mail (austin "at" umd.edu) and we will work something out.


Submission of Homework and Project Work

  • Homework will be posted on the notes from class web page.
    Please submit your homework as a zip file, either as an e-mail attachment or via Dropbox.
    Also, please indicate the class and purpose of the e-mail in the subject heading, e.g.,
        ENCE688P: Homework 2 ...
    


Class Text and Resources

  • No text is required, but there will be lots of class handouts on data mining and machine learning.
  • There is a great data mining text: Witten I.H., Frank E., Hall M.A., and Pal C.J., Data Mining: Practical Machine Learning Tools and Techniques,
    Fourth Edition, Morgan Kaufmann, 2017, which you might consider getting; it is well worth the money!


Course Assessment and Exam Schedule

Course assessment will be as follows:

  • Homework (50%).
  • Mid-semester project proposal and presentation (20%).
  • End-of-semester project/report in data mining/machine learning (30%).

Note.

  • No mid-term exams. No final exam. Let's focus on making fabulous projects!
  • Accommodation for students with disabilities will be made.
  • At the end of the semester, please participate in the evaluation of courses through CourseEvalUM.
    Your feedback is confidential and an important means of improving the course in future semesters.

Download Python and Java

  • Download Python 3.X. Older versions of macOS shipped with Python 2.X pre-installed,
    but for the purposes of this class I am going to assume you have Python 3.7 (or Python 3.8) installed.
    This detail matters because Python 3 is not backwards compatible with Python 2 (Strike 1 against Python).
  • Download Java. I have Java 11 on my laptop, but you are
    certainly welcome to download a more recent version.
  • Download Apache Ant. Apache Ant is a Java library and command-line tool that
    manages the compilation of Java programs and the execution of programs and test cases. Extremely useful.
  • Download the Eclipse IDE.


  • If your computer is a Mac, consider installing Homebrew and then using brew to
    download and install Ant automatically.

    Note (Oct. 2020). I just installed Homebrew on my iMac (home) running Catalina (10.15) and, sadly, Apple has not
    made this process easy. Here's the bottom line to get things working:

    Homebrew depends on the "Command Line Tools", an add-on to Xcode.
    So before you can install Homebrew, you need to download and install the Command Line Tools.
    I downloaded the dmg file for the Command Line Tools from the Apple Developer website
    (just create an account; it's free). Run the installation program and the tools will be put
    in /Library/Developer/CommandLineTools/

    Then, to install Homebrew, cut and paste the command listed on the Homebrew page into a terminal window
    running the bash shell. The installation takes only a few minutes and you are good to go.
    For example, to install Ant, simply type: brew install ant at the prompt in a terminal window.
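Because Python 3 is not backwards compatible with Python 2 (for example, `print "hello"` is a syntax error in Python 3, and `7 / 2` evaluates to `3` in Python 2 but `3.5` in Python 3), it is worth confirming which interpreter you are actually running. The following is a minimal sketch; the function name `check_python` is our own, and the 3.7 threshold simply mirrors the assumption stated above.

```python
import sys

REQUIRED = (3, 7)  # minimum version assumed for this class (from the text above)

def check_python(required=REQUIRED, version=None):
    """Return True if the given (or running) interpreter meets the minimum version."""
    if version is None:
        version = sys.version_info
    # Compare only (major, minor), e.g. (3, 8) >= (3, 7)
    return tuple(version[:2]) >= required

if __name__ == "__main__":
    found = sys.version.split()[0]
    if check_python():
        print("OK: running Python " + found)
    else:
        print("Please install Python 3.7 or later (found " + found + ")")
```

From a terminal, `python3 --version` reports the same information without a script.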


Python and Java Programming Resources



AI and ML Software

  • Artificial Intelligence: A Modern Approach (AIMA). Code website on GitHub.
  • Apache OpenNLP: A Machine Learning-based Toolkit for the processing of Natural Language Text.
    Written completely in Java. Source code: GitHub.



Real-World Datasets


Working with Real-World Data


Data Science, Data Mining, Neural Networks





Digital Twins


Algorithms and Software for Anomaly Detection


Time Series

  • Darts: Time Series Made Easy in Python.


Big Data Algorithms + Tutorials (useful techniques that are beyond this course)


Miscellaneous Real-World Applications and Resources

Developed in August 2020 by Mark Austin
Copyright © 2020-2026, Department of Civil and Environmental Engineering, University of Maryland