
Data Science Portfolio Tips

Jun 17, 2025

Summary

  • The meeting covered strategies for making data science portfolios and resumes stand out, emphasizing unique, open-source projects over projects built on stock datasets.
  • The QAT open-source word search project was introduced, with details on its features, current limitations, and opportunities for community contribution.
  • Future improvements and optimization needs were discussed, as well as channels for collaboration (GitHub, Discord) and support for contributors.

Action Items

  • As needed – Project owner: Update the README and project documentation as new features or improvements are added.
  • As needed – Contributors: Propose and implement new features, optimizations, or bug fixes; communicate updates via Discord or GitHub.
  • As needed – Project owner: Provide support to contributors via Discord/email upon request.
  • As needed – Contributors: Test new queries and features, and check that test cases and documentation formatting are correct.

Portfolio & Resume Guidance for Data Science Roles

  • Many applicants submit resumes with standard “stock” projects (e.g., Kaggle Titanic, housing price predictions), which do not help them stand out in an increasingly competitive field.
  • While stock projects are valuable for learning, candidates are encouraged to contribute to open-source or uniquely challenging projects to differentiate themselves.
  • The channel plans to offer members opportunities to join collaborative open-source efforts that better showcase practical skills.

QAT Open-Source Word Search Project Overview

  • The QAT project replicates and extends the functionality of the original QAT word search tool, supporting complex queries that may involve up to a billion computations.
  • The project is hosted on GitHub, is accessible through a Streamlit app, and is supported through a dedicated Discord channel.
  • Key advancements over the original QAT include:
    • Flexible output limits and match limits
    • Customizable word lists (not static)
    • Query timeouts to allow for performance management
    • Running multiple queries in parallel
  • Project documentation includes README, feature lists, known limitations, and important links (e.g., word lists, Streamlit app).
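The parallel-query, timeout, and limit features listed above can be illustrated with a minimal Python sketch. It is not the project's code: the names `run_query`, `run_queries`, and `QueryResult`, the parameters `max_matches` and `timeout_s`, and the use of standard regular expressions as the query language are all assumptions made for illustration. Each query checks its own deadline while scanning the word list, so a slow query stops itself and returns partial results instead of being killed from outside.

```python
import re
import time
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class QueryResult:
    pattern: str
    matches: list[str] = field(default_factory=list)
    timed_out: bool = False


def run_query(pattern: str, words: list[str],
              max_matches: int = 100, timeout_s: float = 1.0) -> QueryResult:
    """Scan the word list for full matches of `pattern` (a regex, by assumption),
    stopping early when either the match limit or the time budget is reached."""
    regex = re.compile(pattern)
    deadline = time.monotonic() + timeout_s
    result = QueryResult(pattern)
    for word in words:
        if time.monotonic() > deadline:
            # Cooperative timeout: keep whatever was found so far.
            result.timed_out = True
            break
        if regex.fullmatch(word):
            result.matches.append(word)
            if len(result.matches) >= max_matches:
                break  # match limit reached
    return result


def run_queries(patterns: list[str], words: list[str], **limits) -> list[QueryResult]:
    """Run several queries concurrently; results come back in input order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda p: run_query(p, words, **limits), patterns))


if __name__ == "__main__":
    words = ["cat", "cart", "coat", "dog", "tact"]
    for r in run_queries([r"c..t", r"..g"], words, max_matches=1):
        print(r.pattern, r.matches, r.timed_out)
```

Threads keep the sketch short, but under CPython's global interpreter lock they do not speed up CPU-bound matching. A `ProcessPoolExecutor` with a module-level worker function (lambdas cannot be pickled) would be the usual swap if queries need true parallelism.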

Features & Demonstrations

  • Supports various pattern-matching and variable-based queries (e.g., anagrams, wildcard searches, length constraints).
  • Current functionality was demonstrated through live and documented examples, including debugging cases and advanced query combinations.
  • Some performance optimizations are in place, but further work is needed as feature complexity grows.
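As a rough illustration of how wildcard, anagram, and length-constrained queries can be combined, the Python sketch below filters a word list. The mini-syntax (`.` for any letter, `*` for any run of letters, `@` for a vowel, `#` for a consonant) and the function names `pattern_to_regex`, `is_anagram`, and `search` are hypothetical and only loosely modeled on QAT-style queries; the project's actual syntax may differ.

```python
import re
from collections import Counter
from typing import Optional

VOWELS = "aeiou"


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a small, hypothetical wildcard syntax into a compiled regex:
    '.' any one letter, '*' any run of letters (possibly empty),
    '@' any vowel, '#' any consonant; other letters match themselves."""
    parts = []
    for ch in pattern.lower():
        if ch == ".":
            parts.append("[a-z]")
        elif ch == "*":
            parts.append("[a-z]*")
        elif ch == "@":
            parts.append(f"[{VOWELS}]")
        elif ch == "#":
            parts.append("[b-df-hj-np-tv-z]")  # a-z minus the five vowels
        elif "a" <= ch <= "z":
            parts.append(ch)
        else:
            raise ValueError(f"unsupported pattern character: {ch!r}")
    return re.compile("".join(parts))


def is_anagram(word: str, letters: str) -> bool:
    """True if `word` uses exactly the multiset of `letters`."""
    return Counter(word.lower()) == Counter(letters.lower())


def search(words: list[str], pattern: Optional[str] = None,
           anagram_of: Optional[str] = None,
           min_len: int = 1, max_len: Optional[int] = None) -> list[str]:
    """Return the words that satisfy every constraint that was supplied."""
    regex = pattern_to_regex(pattern) if pattern else None
    hits = []
    for word in words:
        w = word.lower()
        if len(w) < min_len or (max_len is not None and len(w) > max_len):
            continue  # length constraint (cheapest check first)
        if regex is not None and not regex.fullmatch(w):
            continue  # wildcard pattern
        if anagram_of is not None and not is_anagram(w, anagram_of):
            continue  # anagram constraint
        hits.append(word)
    return hits


if __name__ == "__main__":
    words = ["listen", "silent", "enlist", "tinsel", "stone", "tones", "onset"]
    print(search(words, anagram_of="listen"))  # ['listen', 'silent', 'enlist', 'tinsel']
    print(search(words, pattern=".....", anagram_of="notes"))  # ['stone', 'tones', 'onset']
```

Checking length first, then the regex, then the letter count is one simple way to keep the cheapest filters in front; reordering or pre-indexing the word list by length would be natural next steps for the speed work the notes mention.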

Known Issues & Future Improvements

  • Planned features not yet supported:
    • Multiple variables in a row (e.g., repeated variables)
    • Partial variables and advanced decomposition
    • More flexible variable reordering
    • Word scoring display from word list metadata
    • Option for users to remove words from their word lists and download updated lists
    • Debug mode vs. QAT output mode toggle in the UI
  • Formatting and documentation improvements will be required as new test cases are added.
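Repeated and adjacent variables generally call for backtracking, because the matcher has to try different ways of splitting the word. The Python sketch below assumes, as a hypothetical convention, that uppercase letters are variables standing for non-empty substrings and lowercase letters are literals, so `AA` matches a word made of the same chunk twice. `match_vars` is an illustrative name, not part of the project.

```python
from typing import Optional


def match_vars(pattern: str, word: str,
               bindings: Optional[dict[str, str]] = None) -> Optional[dict[str, str]]:
    """Return the first variable bindings that make `pattern` match `word`,
    or None if none work. Uppercase letters are variables (non-empty
    substrings); any other character must match literally."""
    if bindings is None:
        bindings = {}
    if not pattern:
        # Success only if the whole word has been consumed.
        return bindings if not word else None
    head, rest = pattern[0], pattern[1:]
    if head.isupper():
        if head in bindings:
            # Variable already bound: the word must repeat the same substring here.
            value = bindings[head]
            if word.startswith(value):
                return match_vars(rest, word[len(value):], bindings)
            return None
        # Unbound variable: try each non-empty prefix, backtracking on failure.
        for end in range(1, len(word) + 1):
            found = match_vars(rest, word[end:], {**bindings, head: word[:end]})
            if found is not None:
                return found
        return None
    # Literal character.
    if word and word[0] == head:
        return match_vars(rest, word[1:], bindings)
    return None


if __name__ == "__main__":
    words = ["murmur", "redder", "tartar", "banana", "redo"]
    print([w for w in words if match_vars("AA", w) is not None])  # ['murmur', 'tartar']
```

The matcher tries every split for each unbound variable, so its worst-case cost grows quickly with the number of variables, which ties this feature closely to the speed optimizations the project is seeking.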

Community Involvement & Support

  • Contributors are encouraged to reach out via Discord, email, or YouTube comments for support or to suggest features.
  • There is an open invitation to collaborate; contributions are welcome for optimization (especially query speed), bug fixes, and feature enhancements.
  • Contributors may freely include their involvement and improvements in their resumes.

Decisions

  • Launch QAT as an open-source project and invite community contributions, to help participants build more distinctive portfolios and to accelerate progress through collaborative development.

Open Questions / Follow-Ups

  • What is the definitive process for reviewing and merging contributor code?
  • Where should new features and optimizations be prioritized: speed, advanced query types, or user customization?
  • How will advanced contributors coordinate on Discord vs. GitHub issues?