MLOps & Interviews

DVC + S3

DVC can store its tracked data in a remote that points to an S3 bucket.
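As a rough illustration of what pointing a remote at S3 involves, here is a minimal sketch that only builds and prints the DVC CLI calls, without running them. The bucket name, prefix, remote name, and data path are hypothetical placeholders, and the sketch assumes DVC was installed with S3 support (for example `pip install "dvc[s3]"`) and that AWS credentials are already configured.

```python
import shlex


def dvc_s3_setup_commands(bucket, prefix="dvcstore", remote_name="storage", data_path="data"):
    """Return the DVC CLI calls that set up an S3 remote and upload tracked data.

    All argument values are illustrative; replace them with your own.
    """
    # Remote URL in the form s3://<bucket>/<prefix>
    url = f"s3://{bucket}/{prefix}".rstrip("/")
    return [
        ["dvc", "init"],                                    # run once, inside a Git repository
        ["dvc", "remote", "add", "-d", remote_name, url],   # -d marks it as the default remote
        ["dvc", "add", data_path],                          # start tracking the data with DVC
        ["dvc", "push"],                                    # upload tracked data to the remote
    ]


if __name__ == "__main__":
    # Print the commands instead of executing them, so nothing is changed by accident.
    for cmd in dvc_s3_setup_commands("my-example-bucket"):  # hypothetical bucket name
        print(shlex.join(cmd))
```

A collaborator who clones the Git repository would then typically run `dvc pull` to fetch the same data from the bucket.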

Question! 1

Answer!

It makes collaboration between data scientists easier, since everyone pushes to and pulls from the same central store.
S3 scales with the data, whereas a local copy is limited by the disk's storage capacity.
S3 is also durable and secure, with data replication capabilities.

Practicing!

Create another repository and repeat the procedures from the previous handout.

Some important steps:

Question! 2

Question! 3

Question! 4

Question! 5

Question! 6

Attention!

After finishing the class, delete the bucket you created!
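One possible way to do this cleanup from Python is sketched below. It assumes the `boto3` package and configured AWS credentials, which the handout does not mention; the bucket name in the usage comment is a hypothetical placeholder. S3 only deletes empty buckets, so the objects are removed first. Deleting the bucket from the AWS console achieves the same result.

```python
def empty_and_delete_bucket(bucket):
    """Remove everything stored in `bucket`, then delete the bucket itself.

    `bucket` is expected to behave like a boto3 `s3.Bucket` resource.
    This is irreversible, so double-check the bucket name before calling it.
    """
    # Delete current objects.
    bucket.objects.all().delete()
    # Delete older versions and delete markers, in case versioning was enabled.
    bucket.object_versions.all().delete()
    # The bucket is now empty and can be removed.
    bucket.delete()


# Usage sketch (assumes boto3 is installed; the bucket name is hypothetical):
#
#   import boto3
#   bucket = boto3.resource("s3").Bucket("my-example-bucket")
#   empty_and_delete_bucket(bucket)
```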
