本文描述了Android Content Provider实现时,sqlite数据库出现lock问题的大致原因和解决思路;
虽然最终可以归结为线程同步和资源竞争问题,但是对于trouble shooting却并非如此简单.
[Symptom]
From the logcat:
08-22 22:32:30.499 475 1692 W sqliteDatabase: database lock has not been available for 30 sec. Current Owner of the lock is 25. Continuing to wait in thread: 30
08-22 22:32:32.850 475 1836 W sqliteDatabase: database lock has not been available for 30 sec. Current Owner of the lock is 25. Continuing to wait in thread: 31
08-22 22:32:44.000 475 1059 E sqliteCursor.cpp: Bailing on database busy retry
08-22 22:32:44.000 475 1059 E sqliteQuery: exception: database is locked: retrycount exceeded; query: SELECT storage_id FROM files WHERE (_data=?)
08-22 22:32:44.000 475 1059 E DatabaseUtils: Writing exception to parcel
08-22 22:32:44.000 475 1059 E DatabaseUtils: android.database.sqlite.sqliteDatabaseLockedException: database is locked: retrycount exceeded
08-22 22:32:44.000 475 1059 E DatabaseUtils: at android.database.sqlite.sqliteQuery.nativeFillWindow(Native Method)
08-22 22:32:44.000 475 1059 E DatabaseUtils: at android.database.sqlite.sqliteQuery.fillWindow(sqliteQuery.java:86)
08-22 22:32:44.000 475 1059 E DatabaseUtils: at android.database.sqlite.sqliteCursor.fillWindow(sqliteCursor.java:164)
08-22 22:32:44.000 475 1059 E DatabaseUtils: at android.database.sqlite.sqliteCursor.getCount(sqliteCursor.java:156)
08-22 22:32:44.000 475 1059 E DatabaseUtils: at android.database.CursorToBulkCursorAdaptor.count(CursorToBulkCursorAdaptor.java:184)
08-22 22:32:44.000 475 1059 E DatabaseUtils: at android.content.ContentProviderNative.onTransact(ContentProviderNative.java:117)
08-22 22:32:44.000 475 1059 E DatabaseUtils: at android.os.Binder.execTransact(Binder.java:338)
08-22 22:32:44.000 475 1059 E DatabaseUtils: at dalvik.system.NativeStart.run(Native Method)
..........................................
08-22 22:33:36.420 475 1836 E sqliteCursor.cpp: Bailing on database busy retry
08-22 22:33:36.420 475 1836 E sqliteQuery: exception: database is locked: retrycount exceeded; query: SELECT _id FROM files WHERE (_data=?)
08-22 22:33:36.430 475 1836 E DatabaseUtils: Writing exception to parcel
08-22 22:33:36.430 475 1836 E DatabaseUtils: android.database.sqlite.sqliteDatabaseLockedException: database is locked: retrycount exceeded
08-22 22:33:36.430 475 1836 E DatabaseUtils: at android.database.sqlite.sqliteQuery.nativeFillWindow(Native Method)
08-22 22:33:36.430 475 1836 E DatabaseUtils: at android.database.sqlite.sqliteQuery.fillWindow(sqliteQuery.java:86)
08-22 22:33:36.430 475 1836 E DatabaseUtils: at android.database.sqlite.sqliteCursor.fillWindow(sqliteCursor.java:164)
08-22 22:33:36.430 475 1836 E DatabaseUtils: at android.database.sqlite.sqliteCursor.getCount(sqliteCursor.java:156)
08-22 22:33:36.430 475 1836 E DatabaseUtils: at android.database.CursorToBulkCursorAdaptor.count(CursorToBulkCursorAdaptor.java:184)
08-22 22:33:36.430 475 1836 E DatabaseUtils: at android.content.ContentProviderNative.onTransact(ContentProviderNative.java:117)
08-22 22:33:36.430 475 1836 E DatabaseUtils: at android.os.Binder.execTransact(Binder.java:338)
08-22 22:33:36.430 475 1836 E DatabaseUtils: at dalvik.system.NativeStart.run(Native Method)
..........................................
08-22 22:34:06.639 475 1059 W sqliteDatabase: database lock has not been available for 30 sec. Current Owner of the lock is 30. Continuing to wait in thread: 25
08-22 22:34:28.630 475 1692 E sqliteCursor.cpp: Bailing on database busy retry
08-22 22:34:28.630 475 1692 E sqliteQuery: exception: database is locked: retrycount exceeded; query: SELECT _id FROM images
08-22 22:34:28.640 475 1692 E MediaScanner: sqlException in MediaScanner.scan()
08-22 22:34:28.640 475 1692 E MediaScanner: android.database.sqlite.sqliteDatabaseLockedException: database is locked: retrycount exceeded
08-22 22:34:28.640 475 1692 E MediaScanner: at android.database.sqlite.sqliteQuery.nativeFillWindow(Native Method)
08-22 22:34:28.640 475 1692 E MediaScanner: at android.database.sqlite.sqliteQuery.fillWindow(sqliteQuery.java:86)
08-22 22:34:28.640 475 1692 E MediaScanner: at android.database.sqlite.sqliteCursor.fillWindow(sqliteCursor.java:164)
08-22 22:34:28.640 475 1692 E MediaScanner: at android.database.sqlite.sqliteCursor.getCount(sqliteCursor.java:156)
08-22 22:34:28.640 475 1692 E MediaScanner: at android.media.MediaScanner.prescan(MediaScanner.java:1132)
08-22 22:34:28.640 475 1692 E MediaScanner: at android.media.MediaScanner.scanDirectories(MediaScanner.java:1315)
08-22 22:34:28.640 475 1692 E MediaScanner: at com.android.providers.media.MediaScannerService.scan(MediaScannerService.java:111)
08-22 22:34:28.640 475 1692 E MediaScanner: at com.android.providers.media.MediaScannerService.access$300(MediaScannerService.java:49)
08-22 22:34:28.640 475 1692 E MediaScanner: at com.android.providers.media.MediaScannerService$ServiceHandler.handleMessage(MediaScannerService.java:271)
08-22 22:34:28.640 475 1692 E MediaScanner: at android.os.Handler.dispatchMessage(Handler.java:99)
08-22 22:34:28.640 475 1692 E MediaScanner: at android.os.Looper.loop(Looper.java:137)
08-22 22:34:28.640 475 1692 E MediaScanner: at com.android.providers.media.MediaScannerService.run(MediaScannerService.java:193)
08-22 22:34:28.640 475 1692 E MediaScanner: at java.lang.Thread.run(Thread.java:856)
08-22 22:34:28.640 475 1692 D MediaProvider: [MediaScannerService] broadcast ACTION_MEDIA_SCANNER_FINISHED with file:///mnt/sdcard
..........................................
08-22 22:34:58.719 475 1692 W sqliteDatabase: database lock has not been available for 30 sec. Current Owner of the lock is 25. Continuing to wait in thread: 30
08-22 22:34:58.969 475 1836 W sqliteDatabase: database lock has not been available for 30 sec. Current Owner of the lock is 25. Continuing to wait in thread: 31
..................
08-22 22:35:22.369 475 1059 E sqliteCursor.cpp: Bailing on database busy retry
08-22 22:35:22.369 475 1059 E sqliteQuery: exception: database is locked: retrycount exceeded; query: SELECT _id FROM files WHERE (_data=?)
08-22 22:35:22.369 475 1059 E DatabaseUtils: Writing exception to parcel
08-22 22:35:22.369 475 1059 E DatabaseUtils: android.database.sqlite.sqliteDatabaseLockedException: database is locked: retrycount exceeded
08-22 22:35:22.369 475 1059 E DatabaseUtils: at android.database.sqlite.sqliteQuery.nativeFillWindow(Native Method)
08-22 22:35:22.369 475 1059 E DatabaseUtils: at android.database.sqlite.sqliteQuery.fillWindow(sqliteQuery.java:86)
08-22 22:35:22.369 475 1059 E DatabaseUtils: at android.database.sqlite.sqliteCursor.fillWindow(sqliteCursor.java:164)
08-22 22:35:22.369 475 1059 E DatabaseUtils: at android.database.sqlite.sqliteCursor.getCount(sqliteCursor.java:156)
08-22 22:35:22.369 475 1059 E DatabaseUtils: at android.database.CursorToBulkCursorAdaptor.count(CursorToBulkCursorAdaptor.java:184)
08-22 22:35:22.369 475 1059 E DatabaseUtils: at android.content.ContentProviderNative.onTransact(ContentProviderNative.java:117)
08-22 22:35:22.369 475 1059 E DatabaseUtils: at android.os.Binder.execTransact(Binder.java:338)
08-22 22:35:22.369 475 1059 E DatabaseUtils: at dalvik.system.NativeStart.run(Native Method)
..........................................
08-22 22:35:52.380 475 1692 W sqliteDatabase: database lock has not been available for 30 sec. Current Owner of the lock is 31. Continuing to wait in thread: 30
08-22 22:35:52.949 475 486 W sqliteDatabase: database lock has not been available for 30 sec. Current Owner of the lock is 31. Continuing to wait in thread: 19
..........................................
08-22 22:36:15.209 475 1836 E sqliteCursor.cpp: Bailing on database busy retry
08-22 22:36:15.209 475 1836 E sqliteQuery: exception: database is locked: retrycount exceeded; query: SELECT _id,date_modified,_data,artist_id FROM audio ORDER BY _id
08-22 22:36:15.209 475 1836 E DatabaseUtils: Writing exception to parcel
08-22 22:36:15.209 475 1836 E DatabaseUtils: android.database.sqlite.sqliteDatabaseLockedException: database is locked: retrycount exceeded
08-22 22:36:15.209 475 1836 E DatabaseUtils: at android.database.sqlite.sqliteQuery.nativeFillWindow(Native Method)
08-22 22:36:15.209 475 1836 E DatabaseUtils: at android.database.sqlite.sqliteQuery.fillWindow(sqliteQuery.java:86)
08-22 22:36:15.209 475 1836 E DatabaseUtils: at android.database.sqlite.sqliteCursor.fillWindow(sqliteCursor.java:164)
08-22 22:36:15.209 475 1836 E DatabaseUtils: at android.database.sqlite.sqliteCursor.getCount(sqliteCursor.java:156)
08-22 22:36:15.209 475 1836 E DatabaseUtils: at android.database.CursorToBulkCursorAdaptor.count(CursorToBulkCursorAdaptor.java:184)
08-22 22:36:15.209 475 1836 E DatabaseUtils: at android.content.ContentProviderNative.onTransact(ContentProviderNative.java:117)
08-22 22:36:15.209 475 1836 E DatabaseUtils: at android.os.Binder.execTransact(Binder.java:338)
08-22 22:36:15.209 475 1836 E DatabaseUtils: at dalvik.system.NativeStart.run(Native Method)
..........................................
08-22 22:36:15.229 2036 2049 W dalvikvm: threadid=11: thread exiting with uncaught exception (group=0x40adb9f0)
08-22 22:36:15.229 2036 2049 E test : Exception
08-22 22:36:15.229 2036 2049 E AndroidRuntime: FATAL EXCEPTION: pool-1-thread-1
08-22 22:36:15.229 2036 2049 E AndroidRuntime: android.database.sqlite.sqliteException: database is locked: retrycount exceeded
08-22 22:36:15.229 2036 2049 E AndroidRuntime: at android.database.DatabaseUtils.readExceptionFromParcel(DatabaseUtils.java:180)
08-22 22:36:15.229 2036 2049 E AndroidRuntime: at android.database.DatabaseUtils.readExceptionFromParcel(DatabaseUtils.java:136)
08-22 22:36:15.229 2036 2049 E AndroidRuntime: at android.content.ContentProviderProxy.query(ContentProviderNative.java:358)
08-22 22:36:15.229 2036 2049 E AndroidRuntime: at android.content.ContentResolver.query(ContentResolver.java:311)
08-22 22:36:15.229 2036 2049 E AndroidRuntime: at com.xxxx.mediaextra.provider.MediaExtraUpdater.getAudioInfoListFromMediaDb(MediaExtraUpdater.java:629)
08-22 22:36:15.229 2036 2049 E AndroidRuntime: at com.xxxx.mediaextra.provider.MediaExtraUpdater.access$500(MediaExtraUpdater.java:61)
08-22 22:36:15.229 2036 2049 E AndroidRuntime: at com.xxxx.mediaextra.provider.MediaExtraUpdater$2.run(MediaExtraUpdater.java:200)
08-22 22:36:15.229 2036 2049 E AndroidRuntime: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1076)
08-22 22:36:15.229 2036 2049 E AndroidRuntime: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:569)
08-22 22:36:15.229 2036 2049 E AndroidRuntime: at java.lang.Thread.run(Thread.java:856)
08-22 22:36:15.249 258 593 W ActivityManager: sendCrashtoMTBF!!
[Reason]
1. The Provider-use side
The App with pid 2049 uses ContentResolver proxy to query Audio Info and gets an exception string in the reply parcel from the Provider-Impl side.
The following line output by pid 2049
08-22 22:36:15.229 2036 2049 E AndroidRuntime: android.database.sqlite.sqliteException: database is locked: retrycount exceeded,is the EXCEPTION interpretion string.
The call realtion throw the exception.
query(...) => readExceptionFromParcel(String reply) => readExceptionFromParcel(Parcel reply,String msg,int code)
326 public Cursor ContentResolver::query(Uri url,String[] projection,String selection,
327 String[] selectionArgs,String sortOrder) throws RemoteException {
328 BulkCursorToCursorAdaptor adaptor = new BulkCursorToCursorAdaptor();
..............
353 data.writeString(sortOrder);
354 data.writeStrongBinder(adaptor.getObserver().asBinder());
355
356 mRemote.transact(IContentProvider.QUERY_TRANSACTION,data,reply,0);
357
358 DatabaseUtils.readExceptionFromParcel(reply);
359
..............
}
163 private static final void DatabaseUtils::readExceptionFromParcel(Parcel reply,int code) {
164 switch (code) {
165 case 2:
166 throw new IllegalArgumentException(msg);
167 case 3:
168 throw new UnsupportedOperationException(msg);
169 case 4:
170 throw new sqliteAbortException(msg);
171 case 5:
172 throw new sqliteConstraintException(msg);
173 case 6:
174 throw new sqliteDatabaseCorruptException(msg);
175 case 7:
176 throw new sqliteFullException(msg);
177 case 8:
178 throw new sqliteDiskIOException(msg);
179 case 9:
180 throw new sqliteException(msg);
181 default:
182 reply.readException(code,msg);
183 }
184 }
2. The Provider-Implemention side.
The Binder Thread #x series are 1692,1836,1059 and 486. Worker thread with a small pid value 486 may be one main worker thread.
When one thread wants to modify the database,it acquires and holds a lock on the sqliteDatabase.
The first two-line like log is printed in sqliteDatabase::lock@sqliteDatabase.java,when the binder worker threads do while-try to get lock on the sqliteDatabase::mLock.
407 private static final long LOCK_WAIT_PERIOD = 30L;
408 private void sqliteDatabase::lock(String sql,boolean forced) {
409 // make sure this method is NOT being called from a 'synchronized' method
410 if (Thread.holdsLock(this)) {
411 Log.w(TAG,"don't lock() while in a synchronized method");
412 }
413 verifyDbIsOpen();
414 if (!forced && !mLockingEnabled) return;
415 boolean done = false;
416 long timeStart = SystemClock.uptimeMillis();
417 while (!done) {
418 try {
419 // try to acquire the lock within 30sec
420 done = mLock.tryLock(LOCK_WAIT_PERIOD,TimeUnit.SECONDS);
421 if (!done) {
422 // lock not acquired in NSec. print a message and stacktrace saying the lock
423 // has not been available for 30sec.
424 Log.w(TAG,"database lock has not been available for " + LOCK_WAIT_PERIOD +
425 " sec. Current Owner of the lock is " + mLock.getOwnerDescription() +
426 ". Continuing to wait in thread: " + Thread.currentThread().getId());
427 }
428 } catch (InterruptedException e) {
429 // ignore the interruption
430 }
431 }
432 if (sqliteDebug.DEBUG_LOCK_TIME_TRACKING) {
433 if (mLock.getHoldCount() == 1) {
434 // Use elapsed real-time since the cpu may sleep when waiting for IO
435 mLockAcquiredWallTime = SystemClock.elapsedRealtime();
436 mLockAcquiredThreadTime = Debug.threadcpuTimeNanos();
437 }
438 }
439 if (sql != null) {
440 if (ENABLE_DB_SAMPLE) {
441 logTimeStat(sql,timeStart,GET_LOCK_LOG_PREFIX);
442 }
443 }
444 }
The claim of Bailing on database busy is output in nativeFillWindow@android_database_sqliteQuery.cpp
If one process tries to access a sqlite file that is locked by another process,the default behavior is to return sqlITE_BUSY.
And if the table in the database is locked by another process,sqlITE_LOCKED is returned.
22 public static final int sqlITE_BUSY = 5; /* The database file is locked */
23 public static final int sqlITE_LOCKED = 6; /* A table in the database is locked */
The code of nativeFillWindow@android_database_sqliteQuery.cpp list as follows.
38 static jint nativeFillWindow(JNIEnv* env,jclass clazz,jint databasePtr,
39 jint statementPtr,jint windowPtr,jint startPos,jint offsetParam) {
40 sqlite3* database = reinterpret_cast<sqlite3*>(databasePtr);
41 sqlite3_stmt* statement = reinterpret_cast<sqlite3_stmt*>(statementPtr);
42 CursorWindow* window = reinterpret_cast<CursorWindow*>(windowPtr);
59 // We assume numRows is initially 0.
60 LOG_WINDOW("Window: numRows = %d,size = %d,freeSpace = %d",
61 window->getNumRows(),window->size(),window->freeSpace());
62
63 int numColumns = sqlite3_column_count(statement);
64 status_t status = window->setNumColumns(numColumns);
65 if (status) {
66 LOGE("Failed to change column count from %d to %d",window->getNumColumns(),numColumns);
67 jniThrowException(env,"java/lang/IllegalStateException","numColumns mismatch");
68 return 0;
69 }
70
71 int retryCount = 0;
72 int totalRows = 0;
73 int addedRows = 0;
74 bool windowFull = false;
75 bool gotException = false;
76 const bool countAllRows = (startPos == 0); // when startPos is 0,we count all rows
77 while (!gotException && (!windowFull || countAllRows)) {
78 int err = sqlite3_step(statement);
79 if (err == sqlITE_ROW) {
80 LOG_WINDOW("Stepped statement %p to row %d",statement,totalRows);
} else if (err == sqlITE_DONE) {
182 // All rows processed,bail
183 LOG_WINDOW("Processed all rows");
184 break;
185 } else if (err == sqlITE_LOCKED || err == sqlITE_BUSY) {
186 // The table is locked,retry
187 LOG_WINDOW("Database locked,retrying");
188 if (retryCount > 50) {
189 LOGE("Bailing on database busy retry");
190 throw_sqlite3_exception(env,database,"retrycount exceeded");
191 gotException = true;
192 } else {
193 // Sleep to give the thread holding the lock a chance to finish
194 usleep(1000);
195 retryCount++;
196 }
197 } else {
198 throw_sqlite3_exception(env,database);
199 gotException = true;
200 }
201 }
202
203 LOG_WINDOW("Resetting statement %p after fetching %d rows and adding %d rows"
204 "to the window in %d bytes",
205 statement,totalRows,addedRows,window->size() - window->freeSpace());
206 sqlite3_reset(statement);
207
208 // Report the total number of rows on request.
209 if (startPos > totalRows) {
210 LOGE("startPos %d > actual rows %d",startPos,totalRows);
211 }
212 return countAllRows ? totalRows : 0;
213}
When the binder worker thread succeed in acquiring the sqliteDatabase::mLock in while-try loop,it uses jni nativeFillWindow() via sqlite api to query sqlite and fill the CursorWindow;
If the table or the file is locked by another process,it makes a 50-times try. If retrycount exceeded,the thread Bails on database busy and exception string is passed to the client.
[Conclusion]
So,the reason is that the sqliteDatabase high-level lock doesn't well protect the database access,and threads get into sqlite-level lock problem.
It is a sqlite access competition issue.
Why the thread has acquired sqliteDatabase::mLock BUT encounters to sqlite-level's lcok problem.
There may be three reasons,
a. The threads use different sqliteDatabase instance(should be singleton),so not the sole sqliteDatabase::mLock.
b. Some thread forgets closing the sqlite and releasing sqlite lock whenit leaves.
c. Some thread is in a long time wasted sqlite batch transaction.
For one table or file-db,it is recommended to use one sqliteDatabase instance which uses its lock to protect the db.
And pay attention to the comment "// make sure this method is NOT being called from a 'synchronized' method" in sqliteDatabase::lock().
Pls check com.xxxx.ppppp.provider.MediaExtraUpdater's code; For further investigating,com.xxxx.Metadatacleanup source code is needed.