The meeting covered strategies for making data science portfolios and resumes stand out, emphasizing the importance of unique, open-source projects over stock datasets.
The QAT open-source word search project was introduced, with details on its features, current limitations, and opportunities for community contribution.
Future improvements and optimization needs were discussed, as well as channels for collaboration (GitHub, Discord) and support for contributors.
Action Items
As needed – Project owner: Update the README and project documentation as new features or improvements are added.
As needed – Contributors: Propose and implement new features, optimizations, or bug fixes; communicate updates via Discord or GitHub.
As needed – Project owner: Provide support to contributors via Discord/email upon request.
As needed – Contributors: Test new queries and features, ensuring all test cases and formatting in documentation are correct.
Portfolio & Resume Guidance for Data Science Roles
Many applicants submit resumes with standard “stock” projects (e.g., Kaggle Titanic, housing price predictions), which do not help them stand out in an increasingly competitive field.
While stock projects are valuable for learning, candidates are encouraged to contribute to open-source or uniquely challenging projects to differentiate themselves.
The channel plans to offer members opportunities to join collaborative open-source efforts that better showcase practical skills.
QAT Open-Source Word Search Project Overview
The QAT project replicates and extends existing word search functionality, supporting complex queries that may involve up to a billion computations.
The project is hosted on GitHub and accessible via both a Streamlit app and a dedicated Discord channel.
Key advancements over the original QAT include:
Flexible output limits and match limits
Customizable word lists (not static)
Query timeouts to allow for performance management
Running multiple queries in parallel
Project documentation includes a README, feature lists, known limitations, and important links (e.g., word lists, Streamlit app).
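The match limits, query timeouts, and parallel queries listed above could be combined roughly as in the sketch below. This is an illustration under stated assumptions, not the project's code: run_query, run_queries, max_matches, and timeout_s are hypothetical names, and plain regular expressions stand in for the real query language.

```python
import re
import time
from concurrent.futures import ThreadPoolExecutor


def run_query(pattern, words, max_matches=100, timeout_s=2.0):
    """Scan a word list, stopping at the match limit or the deadline.

    All names and defaults here are hypothetical, chosen for illustration.
    """
    regex = re.compile(pattern)
    deadline = time.monotonic() + timeout_s
    matches = []
    timed_out = False
    for word in words:
        # Check the clock on every word so a slow query cannot run forever.
        if time.monotonic() > deadline:
            timed_out = True
            break
        if regex.fullmatch(word):
            matches.append(word)
            if len(matches) >= max_matches:
                break  # match limit reached
    return {"pattern": pattern, "matches": matches, "timed_out": timed_out}


def run_queries(patterns, words, **limits):
    """Run several queries concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(lambda p: run_query(p, words, **limits), patterns))


if __name__ == "__main__":
    sample = ["cat", "cot", "cut", "dog"]
    for result in run_queries(["c.t", "d.g"], sample, max_matches=2):
        print(result)
```

Threads keep the sketch short, but on standard CPython builds they do not speed up CPU-bound matching. A process pool would likely be the better fit for heavy queries, at the cost of passing the word list between processes.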
Features & Demonstrations
Supports various pattern-matching and variable-based queries (e.g., anagrams, wildcard searches, length constraints).
Demonstrated current functionality through live and documented examples, including debugging cases and advanced combinations.
Performance optimizations are in place, but further work is needed as feature complexity grows.
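To make the query types mentioned above concrete, here is a minimal sketch of wildcard, anagram, and length-constrained searches over a small word list. The symbols ('.' for one letter, '*' for any run of letters), the function names, and the sample words are placeholder assumptions for illustration and are not taken from the project's query syntax or word lists.

```python
import re
from collections import Counter

# Hypothetical sample list; the project itself supports customizable word lists.
WORDS = ["listen", "silent", "enlist", "tinsel", "stone", "notes", "cat", "cot", "cut"]


def wildcard(pattern, words):
    """Return words matching a pattern where '.' is one letter and '*' is any run of letters."""
    regex = re.compile("".join(
        "[a-z]" if ch == "." else "[a-z]*" if ch == "*" else re.escape(ch)
        for ch in pattern
    ))
    return [w for w in words if regex.fullmatch(w)]


def anagrams(letters, words):
    """Return words that use exactly the given letters, in any order."""
    target = Counter(letters)
    return [w for w in words if Counter(w) == target]


def by_length(words, min_len, max_len):
    """Return words whose length falls within the inclusive range."""
    return [w for w in words if min_len <= len(w) <= max_len]


if __name__ == "__main__":
    print(wildcard("c.t", WORDS))
    print(anagrams("listen", WORDS))
    print(by_length(WORDS, 5, 5))
```

Each helper scans the whole list once, which is fine for a demonstration. Queries that combine several variables multiply the work quickly, which is where the optimization needs noted above come in.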
Known Issues & Future Improvements
Planned features not yet supported:
Multiple variables in a row (e.g., repeated variables)
Partial variables and advanced decomposition
More flexible variable reordering
Word scoring display from word list metadata
Option for users to remove words from their word lists and download updated lists
Debug mode vs. QAT output mode toggle in the UI
Formatting and documentation improvements required as new test cases are added.
Community Involvement & Support
Contributors are encouraged to reach out via Discord, email, or YouTube comments for support or to suggest features.
Open invitation to collaborate, with contributions welcome in feature optimization (especially speed), bug fixes, and enhancements.
Contributors may freely include their involvement and improvements in their resumes.
Decisions
Launch QAT as an open-source project and invite community contributions. Rationale: to help participants build more distinctive portfolios and accelerate progress through collaborative development.
Open Questions / Follow-Ups
What is the definitive process for reviewing and merging contributor code?
Where should new features and optimizations be prioritized: speed, advanced query types, or user customization?
How will advanced contributors coordinate on Discord vs. GitHub issues?