Tip #3: Watch Your Own Data

Read the previous tip: Do Transfer Learning When You Can

If you decide to use a CNN-based model architecture, make sure there is a good reason for it: your input data should have some continuity between neighbors in at least one dimension, meaning that nearby values are related and their order carries information. If this is not the case, a CNN may not be the best choice, and you should consider another model such as a support vector machine (SVM).
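One rough way to check for neighbor continuity in numeric tabular data is to compare how strongly adjacent features correlate with how strongly randomly chosen feature pairs correlate. The sketch below is a simple heuristic, not a standard statistical test, and it assumes numeric, non-constant features stored in their natural order; the function names and the synthetic "signal-like" data are hypothetical. If adjacent features are not noticeably more correlated than random pairs, the feature order probably carries little information, which leaves a CNN's local filters little to exploit.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (assumes neither is constant)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def neighbor_vs_random(rows, n_pairs=200, seed=0):
    """Return (mean |r| of adjacent feature pairs, mean |r| of random feature pairs).

    rows: list of samples, each a list of numeric features in their natural order.
    """
    cols = list(zip(*rows))  # one tuple per feature
    n = len(cols)
    adjacent = [abs(pearson(cols[j], cols[j + 1])) for j in range(n - 1)]
    rng = random.Random(seed)
    random_pairs = []
    for _ in range(n_pairs):
        i, j = rng.sample(range(n), 2)  # two distinct feature indices
        random_pairs.append(abs(pearson(cols[i], cols[j])))
    return statistics.fmean(adjacent), statistics.fmean(random_pairs)


def make_smooth_rows(n_samples=200, n_features=50, width=5, seed=1):
    """Hypothetical signal-like data: moving averages of Gaussian noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_samples):
        noise = [rng.gauss(0, 1) for _ in range(n_features + width - 1)]
        rows.append([sum(noise[k:k + width]) / width for k in range(n_features)])
    return rows


smooth = make_smooth_rows()
# Same values with the feature order scrambled: neighboring columns
# are no longer related to each other.
order = list(range(len(smooth[0])))
random.Random(2).shuffle(order)
shuffled = [[row[k] for k in order] for row in smooth]

print("ordered:  adjacent=%.2f random=%.2f" % neighbor_vs_random(smooth))
print("shuffled: adjacent=%.2f random=%.2f" % neighbor_vs_random(shuffled))
```

On the ordered data, adjacent features should correlate far more strongly than random pairs; after shuffling, the two scores should be similar. The second case is closer to typical tabular data, where a model such as an SVM is often the more natural first choice.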

Image data are well suited to CNN models, as are signal traces and DNA/protein sequence data. There is an important difference between them, however: image data have continuity in two dimensions, while signal traces and sequence data have continuity in only one. Design your filters accordingly, using 2D filters for image data and 1D filters for signal trace and sequence data.
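To make the 1D/2D distinction concrete, here is a minimal pure-Python sketch that assumes no deep learning framework (in PyTorch, the corresponding layers are `torch.nn.Conv1d` and `torch.nn.Conv2d`). The helper names and toy inputs are hypothetical. Note that a one-hot encoded DNA sequence is stored as a 2D array (positions x 4 channels), but it is still a 1D convolution: the filter slides only along the sequence and sums over the A/C/G/T channels at each position.

```python
def one_hot(seq, alphabet="ACGT"):
    """Encode a DNA string as a (length x 4) list of one-hot rows."""
    return [[1.0 if base == letter else 0.0 for letter in alphabet] for base in seq]


def conv1d(x, kernel):
    """Valid 1D convolution (cross-correlation, as in most deep learning libraries).

    x has shape (length x channels) and kernel has shape (width x channels).
    The filter slides along the sequence axis only; the channels (A, C, G, T)
    are summed over at each position, never slid across.
    """
    width, channels = len(kernel), len(kernel[0])
    return [
        sum(x[i + k][c] * kernel[k][c] for k in range(width) for c in range(channels))
        for i in range(len(x) - width + 1)
    ]


def conv2d(image, kernel):
    """Valid 2D cross-correlation: the filter slides along both rows and columns."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(w - kw + 1)
        ]
        for i in range(h - kh + 1)
    ]


# 1D: a width-4 filter matching the motif "TATA" scores 4.0 where the motif occurs.
motif_filter = one_hot("TATA")
print(conv1d(one_hot("GCTATAGC"), motif_filter))  # [2.0, 0.0, 4.0, 0.0, 2.0]

# 2D: a 2x2 vertical-edge filter on a tiny grayscale image (dark left, bright right).
image = [[0, 0, 1, 1] for _ in range(4)]
edge_filter = [[-1, 1], [-1, 1]]
print(conv2d(image, edge_filter))  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```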

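Look at your data to decide which filter sizes to try. For example, if your sequence data appear to have a 4-base repeat pattern and a 20-base repeat pattern, it might make sense to use two convolutional layers with 4x1 or 5x1 filters and a pooling layer between them: the first layer can pick up the 4-base pattern, and the pooling step widens the window each second-layer filter sees to roughly 20 bases, enough to capture the longer pattern.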
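Two small calculations can help with this. The sketch below first computes a self-match profile (the fraction of positions where a base equals the base `lag` positions later); peaks hint at repeat periods, although multiples of a period also peak, so look for the smallest lag in each family. Second, it computes the receptive field of a stack of 1D layers using the standard recurrence, which shows that two stride-1 convolutions with 5-wide filters see only 9 bases, while a width-4, stride-4 pooling layer between them widens this to 24. The function names, toy sequence, and layer settings are hypothetical choices for illustration.

```python
def match_rate(seq, lag):
    """Fraction of positions i where seq[i] == seq[i + lag]."""
    n_pairs = len(seq) - lag
    return sum(seq[i] == seq[i + lag] for i in range(n_pairs)) / n_pairs


def self_match_profile(seq, max_lag):
    """Match rate for every lag from 1 to max_lag; peaks hint at repeat periods."""
    return {lag: match_rate(seq, lag) for lag in range(1, max_lag + 1)}


def receptive_field(layers):
    """Number of input positions seen by one output unit of a stack of 1D layers.

    layers: (kernel_size, stride) pairs from input to output, covering both
    convolution and pooling layers. Uses r += (k - 1) * jump, where jump is the
    product of the strides of all earlier layers.
    """
    r, jump = 1, 1
    for kernel_size, stride in layers:
        r += (kernel_size - 1) * jump
        jump *= stride
    return r


# Hypothetical toy sequence: a 4-base repeat embedded in a 20-base repeat unit.
seq = ("ACGT" * 4 + "TTAA") * 10
profile = self_match_profile(seq, 24)
for lag in (1, 4, 20):
    print(lag, round(profile[lag], 2))

# Two conv layers, without and with a width-4, stride-4 pooling layer between them.
print(receptive_field([(5, 1), (5, 1)]))          # 9
print(receptive_field([(5, 1), (4, 4), (5, 1)]))  # 24
print(receptive_field([(4, 1), (4, 4), (4, 1)]))  # 19
```

On real data the peaks will be noisier than in this toy sequence, so treat the profile as a hint for which filter sizes to try rather than a rule.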

Keep in mind that every modeling task is different, so examine your own data to decide how to design your models.

Read next tip: Follow Tested Practices

Need assistance with your AI/deep learning project? We may be able to help. Take a look at the introduction to our team of bioinformaticians, see some of the advantages of working with our team, and check out our FAQ page!

Send us an inquiry, chat with us online (during our business hours 9-5 Mon-Fri U.S. Central Time), or reach us in other ways!
