Data Scientist/Engineer/Architect & Quantitative Analyst
"You never fail until you stop trying" - Albert Einstein
Hello! My name is Hair (Ha-ee-R), a seasoned Data Scientist and Engineer with a passion for machine learning, statistics, and NLP, and a strong interest in quantitative finance. I'm currently pursuing an MSc in Data Science and Business Analytics. I hold a major in Computer Science and a double minor in Statistics and Linguistics from McGill University, and have more than 4 combined years of professional experience. Beyond my technical pursuits, I'm an avid article writer, a lifelong learner, and a casual fingerstyle guitarist. I also enjoy singing and dancing, and take pride in being a polyglot: Spanish, English, French, some Italian, Portuguese, Mandarin Chinese, Japanese, and more!
Skilled in Data Science and Machine Learning, with a strong foundation in Deep Learning and Natural Language Processing. Experienced in building machine learning models and leveraging Generative AI. Well-acquainted with NLP technologies, including large language models (LLMs) such as ChatGPT. Adept throughout the ML life cycle, from data collection to deployment.
Data Engineering and DevOps. Experienced in constructing and optimizing scalable data pipelines, ensuring data integrity and availability across various platforms. Proficient in integrating diverse data sources and leveraging cloud solutions to facilitate efficient data processing and storage. Demonstrated expertise in both batch and stream processing, adapting to the dynamic needs of data-driven environments.
During my MSc, I also gained advanced skills in Financial Engineering, including expertise in quantitative portfolio optimization techniques such as Minimum Variance and Maximum Diversification portfolios. Skilled in both parametric and non-parametric security modeling, including multivariate approaches such as Gaussian models and copulas. Proficient in quantitative asset management and risk mitigation.
Multi-class image classification challenge on the modified MNIST dataset, in which each example image contains three different digits. The challenge is to classify each image correctly by identifying the highest digit and outputting it as the target.
An innovative portfolio diversification algorithm using financial engineering and Beta-VAE stock embeddings.
AI-powered behavioral analysis framework for WallStreetBets investor discussions using topic clustering, sentiment analysis, text mining, and data analytics with Python, HuggingFace, Streamlit, and Google Cloud.
Goal: risk management framework for estimating the risk of a book of European call options on the S&P 500, taking into account risk drivers such as the underlying and implied volatility. A rigorous theoretical and applied approach is employed using the R programming language, RStudio, and Git/GitHub.
Algorithmic momentum-replication strategy based on the S&P 500, leveraging advanced statistical and machine learning techniques such as feature engineering, feature selection, and regression and classification methods (e.g. Elastic Net, Lasso, Random Forests). By combining technical and traditional portfolio optimization techniques, and under certain hypotheses, we achieve backtesting performance comparable to the S&P 500.
Modeling of the accidents observed in the last 10 years to provide the city of Montreal with a ranking of the 1864 intersections in terms of safety (from the most dangerous to the least dangerous), so that it can prioritize the riskiest intersections with the aim of improving infrastructure.
Several algorithmic trading strategies for Equity and Options trading that employ advanced statistical and machine-learning techniques. The strategies were developed and tested on QuantConnect.
Minimum-variance portfolio optimization with rolling window in R.
Why is it so hard to predict stocks? In this article, I present a complete forecasting pipeline using an ARIMA model in R, including data preprocessing and engineering.
An NLP information extraction application for curricula vitae.
Improved automatic text summarization using centroid embeddings combined with Latent Dirichlet Allocation (LDA), extending the algorithm described in the original paper.
A simple wrapper for the Python Matplotlib library... with cats.
A couple of images I produced using the well-known neural style transfer technique with different style images.
Initially built as a final project for my database systems class, this is a full database application I created from scratch using PostgreSQL and Python, including the design, the schema, and a friendly user interface.
After taking half a web development course on Udemy, and googling every 2 seconds, I learned a couple of things. The page you are navigating right now is the result of that effort :)
Multi-class text classification challenge on a Reddit dataset, combining NLP analysis with traditional machine learning pipelines built with Python libraries such as nltk and sklearn.